@jatinwadhwa921
Backmerging with Msft commits

daijh and others added 30 commits July 9, 2025 04:04
### Description
Fix a broken URL and numbering in the ordered list in README.md.

### Motivation and Context
See Above.
…#25345)

### Description
For ScatterND, if the indices are empty (nothing to update), the operation reduces to a copy, so the early return should be moved to after the copy is made.
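A minimal NumPy sketch of the reference semantics (illustrative only, not the actual kernel code): the copy happens unconditionally, and empty indices simply mean no updates are applied.

```python
import numpy as np

def scatter_nd_reference(data, indices, updates):
    """Reference ScatterND: copy the input, then apply updates at `indices`."""
    output = np.copy(data)          # the copy must happen before any early return
    if indices.size == 0:           # nothing to update -> the result is just the copy
        return output
    for idx in np.ndindex(indices.shape[:-1]):
        output[tuple(indices[idx])] = updates[idx]
    return output

data = np.arange(12, dtype=np.float32).reshape(3, 4)
empty_indices = np.empty((0, 1), dtype=np.int64)
empty_updates = np.empty((0, 4), dtype=np.float32)
assert np.array_equal(scatter_nd_reference(data, empty_indices, empty_updates), data)
```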
### Description
- Provides utility functions that serialize an `OrtGraph` to a
`GraphProto` or `ModelProto`.
- Header-only file that can be copied to a project that builds with ORT
and ONNX.
- Available in
[include/onnxruntime/core/providers/utils/ort_graph_to_proto.h](https://github.com/microsoft/onnxruntime/blob/adrianl/ep-abi-ort-graph-to-onnx-protobuf/include/onnxruntime/core/providers/utils/ort_graph_to_proto.h)
- Updates the `Node_GetSubgraphs` API function to also return the
attribute names associated with each subgraph. This is required to
determine which subgraph corresponds to a given attribute.
- Adds `Graph_GetNumOperatorSets` and `Graph_GetOperatorSets` API
functions to get the opset version for each domain.



### Motivation and Context
Provide a utility to facilitate porting of existing execution providers
to the new EP ABI. The utilities introduced by this PR convert an
`OrtGraph` into an ONNX protobuf representation, which some existing EPs
currently convert to their internal representation. Ideally, we would
prefer a more direct conversion from an `OrtGraph` to the EP's internal
representation, but this is a large effort. These utilities enable an
incremental transition.
The library is not used; C++ itself already provides `std::optional`.
…tCacheManager (microsoft#25276)

### Description
<!-- Describe your changes. -->
This PR is to move buffer release or cache from OnRefresh to
ReleaseBuffer in BucketCacheManager.

### Motivation and Context
`OnRefresh` is executed only after a batch (16) of EP runs, so within a batch the buffers cannot actually be reused, which wastes GPU buffer resources. This PR proposes a straightforward optimization: release or cache the buffer early in `ReleaseBuffer` instead of in `OnRefresh`, improving buffer cache/release efficiency and therefore peak and average GPU memory usage. The experimental results below show a reasonable memory improvement without perf regressions.
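A toy sketch of the idea (illustrative Python, not the actual WebGPU C++ BucketCacheManager; `allocate_gpu_buffer`/`free_gpu_buffer` are stand-in helpers): buffers are cached or freed as soon as they are released, instead of being queued until the next refresh.

```python
from collections import defaultdict

def allocate_gpu_buffer(size):
    return bytearray(size)  # stand-in for a real GPU allocation

def free_gpu_buffer(buffer):
    pass  # stand-in for returning memory to the device

class ToyBucketCache:
    """Illustrative bucket cache showing release-time (early) caching."""

    def __init__(self, max_buffers_per_bucket=4):
        self.buckets = defaultdict(list)   # buffer size -> list of free buffers
        self.max_per_bucket = max_buffers_per_bucket

    def acquire(self, size):
        bucket = self.buckets[size]
        return bucket.pop() if bucket else allocate_gpu_buffer(size)

    def release(self, buffer, size):
        # Early-release optimization: cache (or free) the buffer immediately,
        # so later ops within the same batch of EP runs can reuse it.
        bucket = self.buckets[size]
        if len(bucket) < self.max_per_bucket:
            bucket.append(buffer)
        else:
            free_gpu_buffer(buffer)

    def on_refresh(self):
        # Previously the caching/freeing logic lived here and ran only once per
        # batch of runs, delaying reuse and inflating peak memory.
        pass
```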

#### Phi3
| Optimization Strategy | Peak Memory (MB) | Avg Memory (MB) | Token Gen Latency (ms) | Tokens/sec |
| -- | -- | -- | -- | -- |
| Default Bucket | 3603.83 | 3127.05 | 7.17 | 139.50 |
| Default Bucket with Early Release Optimization | 3534.77 (+1.92%) | 3073.97 (+1.70%) | 7.14 (+0.36%) | 140.01 (+0.36%) |

#### Deepseek-R1
| Optimization Strategy | Peak Memory (MB) | Avg Memory (MB) | Token Gen Latency (ms) | Tokens/sec |
| -- | -- | -- | -- | -- |
| Default Bucket | 2089.03 | 1716.15 | 6.07 | 164.67 |
| Default Bucket with Early Release Optimization | 2034.00 (+2.63%) | 1674.49 (+2.43%) | 6.09 (-0.20%) | 164.34 (-0.20%) |

#### LLama3.2-1B
| Optimization Strategy | Peak Memory (MB) | Avg Memory (MB) | Token Gen Latency (ms) | Tokens/sec |
| -- | -- | -- | -- | -- |
| Default Bucket | 1736.03 | 1424.64 | 3.37 | 296.53 |
| Default Bucket with Early Release Optimization | 1659.78 (+4.39%) | 1366.78 (+4.06%) | 3.41 (-1.09%) | 293.34 (-1.08%) |
### Description
- Adds multithreaded, vectorized implementations of DequantizeLinear for int8 and uint8 inputs:
  - Intel SSE2
  - ARM NEON
- All other architectures fall back to a multithreaded scalar reference implementation (the previous implementation was not multithreaded); the per-element formula is sketched below.
- **Note**: only enabled if ORT is built for client/on-device workloads (`ORT_CLIENT_PACKAGE_BUILD` is defined).
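For reference, the per-element computation that the SIMD and multithreaded paths implement is the standard DequantizeLinear formula. A minimal NumPy sketch of the scalar reference (per-tensor quantization; not the MLAS code):

```python
import numpy as np

def dequantize_linear_reference(x_q, scale, zero_point):
    """y = (x_q - zero_point) * scale, computed in floating point."""
    return (x_q.astype(np.int32) - np.int32(zero_point)) * np.float32(scale)

x_q = np.array([0, 127, 255], dtype=np.uint8)
print(dequantize_linear_reference(x_q, scale=0.02, zero_point=128))  # [-2.56 -0.02  2.54]
```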

INT8 DequantizeLinear latency on Intel Core i9-10920X with 4 intra op
threads (SSE 2 implementation)

| Number of elements | Baseline latency (us) | Multithreaded+SIMD latency (us) | Speedup |
| --- | --- | --- | --- |
| 10 K | 1 | 1 | 1 |
| 20 K | 2 | 2 | 1 |
| 40 K | 5 | 5 | 1 |
| 80 K | 11 | 4 | 2.75 |
| 100 K | 14 | 5 | 2.80 |
| 150 K | 21 | 7 | 3.00 |
| 200 K | 28 | 8 | 3.50 |
| 400 K | 68 | 15 | 4.53 |
| 600 K | 107 | 21 | 5.10 |
| 800 K | 142 | 28 | 5.07 |
| 1 M | 187 | 42 | 4.45 |
| 2 M | 376 | 102 | 3.69 |
| 4 M | 880 | 236 | 3.73 |
| 6 M | 1547 | 557 | 2.78 |
| 8 M | 2438 | 1097 | 2.22 |
| 10 M | 3192 | 1464 | 2.18 |
| 100 M | 38718 | 17733 | 2.18 |

INT8 DequantizeLinear latency on Snapdragon 8cx gen 3 @ 3.4GHz with 4
intra op threads (NEON implementation)

| Number of elements | Baseline latency (us) | Multithreaded+SIMD latency (us) | Speedup |
| --- | --- | --- | --- |
| 10 K | 1 | 1 | 1 |
| 20 K | 1 | 1 | 1 |
| 40 K | 3 | 3 | 1 |
| 80 K | 7 | 4 | 1.75 |
| 100 K | 9 | 3 | 3.00 |
| 150 K | 14 | 5 | 2.80 |
| 200 K | 18 | 6 | 3.00 |
| 400 K | 38 | 10 | 3.80 |
| 600 K | 61 | 15 | 4.07 |
| 800 K | 76 | 19 | 4.00 |
| 1 M | 98 | 24 | 4.08 |
| 2 M | 204 | 48 | 4.25 |
| 4 M | 424 | 112 | 3.79 |
| 6 M | 677 | 384 | 1.76 |
| 8 M | 919 | 621 | 1.48 |
| 10 M | 1132 | 776 | 1.46 |
| 100 M | 11842 | 10566 | 1.12 |
### Motivation and Context
Improves latency of quantized QDQ models with large DQs that dominate the inference latency.
### Description
It is an extension of the [Smooth Softmax](microsoft#21867) feature. The difference is that each head has a learnable smooth factor that is added to the denominator of the softmax. The smooth factor acts like an extra element that joins the softmax.

The smooth factor enters the softmax as follows:
```math
softmax_{i} = \frac{\exp(x_{i})}{\exp(s) + \sum_{j} \exp(x_{j})}
```

The `head_sink` input is a float tensor whose length equals the number of attention heads. For the h-th head, `head_sink[h]` is used as the smooth factor s. When `head_sink` is not provided, a constant 0 is used as the smooth factor.
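A minimal NumPy sketch of this per-head softmax with a sink (illustrative only, not the MLAS kernel):

```python
import numpy as np

def smooth_softmax_with_sink(scores, head_sink=None):
    """scores: (num_heads, seq_len) attention logits; head_sink[h] is the smooth factor s for head h."""
    num_heads = scores.shape[0]
    sink = np.zeros(num_heads, dtype=scores.dtype) if head_sink is None else head_sink
    # Subtract the per-head max (including the sink) for numerical stability.
    m = np.maximum(scores.max(axis=-1), sink)
    exp_scores = np.exp(scores - m[:, None])
    denom = np.exp(sink - m) + exp_scores.sum(axis=-1)
    return exp_scores / denom[:, None]

scores = np.random.randn(4, 8).astype(np.float32)
probs = smooth_softmax_with_sink(scores, head_sink=np.zeros(4, dtype=np.float32))
assert np.all(probs.sum(axis=-1) <= 1.0)  # rows sum to less than 1 because of the sink term
```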

Changes:
- [x] Update operator spec to add an optional new input `head_sink`
- [x] Implement CPU (MLAS) kernel.
- [x] Update test_gqa_cpu.py to test it.

CUDA kernel will be updated later in a separate PR.
Fix: the `Microsoft.ML.OnnxRuntime.Managed.nupkg` artifact from the GPU pipeline does not have a package version.


![image](https://github.com/user-attachments/assets/4a6135ab-4774-4aa6-aeb1-d5b06948ba8f)
Fixes naming conflicts that occur when the expand-pool2d-squeeze logic (implemented as a reshape) is invoked during ONNX -> QNN op lowering. A model with multiple 1D pool ops would hit this issue.
- Added TopK in registry.py so as to create QDQ nodes for the op
- Ensure that both the input and output quantization params are equal
- Added unit test to verify the creation of QDQ nodes for TopK

### Description:

Added support for creating QDQ nodes for TopK when quantizing with the ORT static quantization tool

### Motivation and Context:

There is already support for forming a node unit for the TopK operator when QDQ nodes are present and both the input and output quantization params are equal, but there was no support for creating QDQ nodes for the TopK operator in the ORT static quantization tool.
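A hedged usage sketch with the ORT static quantization tool (the model paths, input name, and calibration data are illustrative):

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantFormat, QuantType, quantize_static

class RandomDataReader(CalibrationDataReader):
    """Illustrative calibration reader; a real one should feed representative inputs."""
    def __init__(self, num_samples=8):
        self.samples = iter(
            [{"input": np.random.rand(1, 3, 224, 224).astype(np.float32)} for _ in range(num_samples)]
        )

    def get_next(self):
        return next(self.samples, None)

quantize_static(
    model_input="model_fp32.onnx",                    # illustrative paths
    model_output="model_qdq.onnx",
    calibration_data_reader=RandomDataReader(),
    quant_format=QuantFormat.QDQ,                     # emit QuantizeLinear/DequantizeLinear nodes
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
    op_types_to_quantize=["Conv", "MatMul", "TopK"],  # TopK now gets Q/DQ with matching input/output params
)
```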
…microsoft#25185)

### Description
Add input rank range checks for WebNN ops.


### Motivation and Context
- refactor the WebNN op input rank check
- add validation for various ops
- use the `gemm` op as an example of performing rank checks on the inputs of decomposed ops

@Honry @fdwr PTAL
### Description

The parser no longer links against the plugin library but instead loads it dynamically. Because of that, I think we should also make the library optional in ORT. @chilo-ms
…f nodes (microsoft#25191)

Added an API that creates a sub-graph from a set of nodes in an
OrtGraph.
This API is needed when porting `GetCapability` to the EP ABI, when an EP wants to check whether a 'sub-graph' of the graph is supported by the hardware backend.
### Description

This change is a follow up to microsoft#25130.

- consume duktape from vcpkg if --use_vcpkg is specified
- ~~add a Windows CI pipeline for dynamic WGSL template~~ (Will do in a
separate PR)
- upgrade wgsl-template package from 0.1.10 to 0.1.13
  - support adding contribop folder as input
Add a build option to enable defaults more appropriate for client/on-device workloads.
The initial use case is to set the default thread pool `allow_spinning` policy, which we want to default to 0/false for builds targeting client/on-device workloads.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
`from` is not an own property of `Float16Array` but an inherited function, so we can use `Float16Array['from']` to check whether it is available.
Add a new API, `Node_GetEpType`, to get the EP that the node is assigned to run on.

This API is needed when porting the plugin TRT EP's `GetCapability`, where the EP needs to know whether the subgraph(s) of a control flow node are assigned to it before adding the control flow op to the support list.
### Description
Enable DSP queue polling when performance profile is burst
### Description
 - Set context priority to low when workload type is Efficient
 - Set context priority to command line configured value if Default
 - Error out otherwise (invalid argument)
…osoft#25356)

Add the compile API `ModelCompilationOptions_SetEpContextBinaryInformation` to set the folder path and model name so that the EP knows the right place to dump the [model_name]_[ep].bin file.
### Description

Windows WebGPU CI: add build matrix for wgsl template
Use `inputShape.length - 1` instead of `inputShape.length` to avoid
out-of-bounds access.
Description (reference:
GHSA-5crp-9r3c-p9vr)
Newtonsoft.Json prior to version 13.0.1 is vulnerable to insecure defaults due to improper handling of expressions with a high nesting level, which leads to a StackOverflow exception or high CPU and RAM usage. Exploiting this vulnerability results in Denial of Service (DoS).

To mitigate the issue, either update Newtonsoft.Json to 13.0.1 or set the MaxDepth parameter in the JsonSerializerSettings:
```
JsonConvert.DefaultSettings = () => new JsonSerializerSettings { MaxDepth = 128 };
```
This file is the only place using `JsonConvert`, so I blindly put this
fix and hope the warning will disappear.
Change the API name to `Node_GetEpName` to avoid confusion.
For plugin EPs, the EP factory can use whatever name it registered with ORT, so name the API `Node_GetEpName` to align with `OrtEpFactory.GetName`.
### Description

This PR fixes the number of hidden layers used during the export of
Whisper by always using the number of hidden layers in the decoder.

### Motivation and Context

Most of the Whisper models contain the same number of hidden layers in
the encoder and decoder. However, Whisper large v3 turbo contains 32
hidden layers in the encoder and only 4 hidden layers in the decoder.

This PR also fixes [this
issue](microsoft/onnxruntime-genai#1611).
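For illustration, the asymmetry can be checked from the transformers config (assuming access to the Hugging Face hub; the repo id is the public turbo checkpoint):

```python
from transformers import WhisperConfig

cfg = WhisperConfig.from_pretrained("openai/whisper-large-v3-turbo")
# large-v3-turbo is asymmetric, so the export must size decoder-side structures
# (e.g. past key/value caches) from the decoder's layer count, not the encoder's.
print(cfg.encoder_layers, cfg.decoder_layers)  # expected: 32 4
```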
dependabot bot and others added 20 commits July 14, 2025 13:45
…transformers/models/llama (microsoft#25328)

Bumps [transformers](https://github.com/huggingface/transformers) from
4.48.0 to 4.52.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/huggingface/transformers/releases">transformers's
releases</a>.</em></p>
<blockquote>
<h2>Patch release v4.51.3</h2>
<p>A mix of bugs were fixed in this patch; very exceptionally, we
diverge from semantic versioning to merge GLM-4 in this patch
release.</p>
<ul>
<li>Handle torch ver in flexattn (<a
href="https://redirect.github.com/huggingface/transformers/issues/37400">#37400</a>)</li>
<li>handle torch version edge cases (<a
href="https://redirect.github.com/huggingface/transformers/issues/37399">#37399</a>)</li>
<li>Add glm4 (<a
href="https://redirect.github.com/huggingface/transformers/issues/37388">#37388</a>)</li>
</ul>
<h1>Patch Release 4.51.2</h1>
<p>This is another round of bug fixes, but they are a lot more minor and
outputs were not really affected!</p>
<ul>
<li>Fix Llama4 offset (<a
href="https://redirect.github.com/huggingface/transformers/issues/37414">#37414</a>)
by <a
href="https://github.com/Cyrilvallez"><code>@​Cyrilvallez</code></a></li>
<li>Attention Quantization with FBGemm &amp; TP (<a
href="https://redirect.github.com/huggingface/transformers/issues/37384">#37384</a>)
by <a
href="https://github.com/MekkCyber"><code>@​MekkCyber</code></a></li>
<li>use rms_norm_eps for the L2Norm for Llama4 (<a
href="https://redirect.github.com/huggingface/transformers/issues/37418">#37418</a>)
by <a
href="https://github.com/danielhanchen"><code>@​danielhanchen</code></a></li>
<li>mark llama4 as not supported with fa2 (<a
href="https://redirect.github.com/huggingface/transformers/issues/37416">#37416</a>)
by <a
href="https://github.com/winglian"><code>@​winglian</code></a></li>
</ul>
<h1>Patch release v4.51.1</h1>
<p>Since the release of Llama 4, we have fixed a few issues that we are
now releasing in patch v4.51.1</p>
<ul>
<li>Fixing flex attention for torch=2.6.0 (<a
href="https://redirect.github.com/huggingface/transformers/issues/37285">#37285</a>)</li>
<li>more fixes for post-training llama4 (<a
href="https://redirect.github.com/huggingface/transformers/issues/37329">#37329</a>)</li>
<li>Remove HQQ from caching allocator warmup (<a
href="https://redirect.github.com/huggingface/transformers/issues/37347">#37347</a>)</li>
<li>fix derived berts _init_weights (<a
href="https://redirect.github.com/huggingface/transformers/issues/37341">#37341</a>)</li>
<li>Fix init empty weights without accelerate (<a
href="https://redirect.github.com/huggingface/transformers/issues/37337">#37337</a>)</li>
<li>Fix deepspeed with quantization (<a
href="https://redirect.github.com/huggingface/transformers/issues/37324">#37324</a>)</li>
<li>fix llama4 training (<a
href="https://redirect.github.com/huggingface/transformers/issues/37319">#37319</a>)</li>
<li>fix flex attn when optional args aren't passed (<a
href="https://redirect.github.com/huggingface/transformers/issues/37327">#37327</a>)</li>
<li>Multiple llama4 fixe (<a
href="https://redirect.github.com/huggingface/transformers/issues/37353">#37353</a>)</li>
</ul>
<p>Thanks all for your patience</p>
<h2>v4.51.0: Llama 4, Phi4-Multimodal, DeepSeek-v3, Qwen3</h2>
<h2>New Model Additions</h2>
<h3>Llama 4</h3>
<p><img
src="https://github.com/user-attachments/assets/d613b292-94b0-4902-9dc7-2d00693222e4"
alt="image" /></p>
<p>Llama 4, developed by Meta, introduces a new auto-regressive
Mixture-of-Experts (MoE) architecture.This generation includes two
models:</p>
<ul>
<li>The highly capable Llama 4 Maverick with 17B active parameters out
of ~400B total, with 128 experts.</li>
<li>The efficient Llama 4 Scout also has 17B active parameters out of
~109B total, using just 16 experts.</li>
</ul>
<p>Both models leverage early fusion for native multimodality, enabling
them to process text and image inputs. Maverick and Scout are both
trained on up to 40 trillion tokens on data encompassing 200 languages
(with specific fine-tuning support for 12 languages including Arabic,
Spanish, German, and Hindi).</p>
<p>For deployment, Llama 4 Scout is designed for accessibility, fitting
on a single server-grade GPU via on-the-fly 4-bit or 8-bit quantization,
while Maverick is available in BF16 and FP8 formats. These models are
released under the custom Llama 4 Community License Agreement, available
on the model repositories</p>
<p>Getting started with Llama 4 using transformers is straightforward.
Make sure you have transformers v4.51.0 or later installed:</p>
<pre><code>pip install -U transformers[hf_xet]
&lt;/tr&gt;&lt;/table&gt; 
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/huggingface/transformers/commit/945727948c1143a10ac6f7d811aa58bb0d126b5b"><code>9457279</code></a>
Release: v4.52.1</li>
<li><a
href="https://github.com/huggingface/transformers/commit/eaa301673a0a7a1a8c5d3f11c046d1592a7ae16b"><code>eaa3016</code></a>
Revert parallelism temporarily (<a
href="https://redirect.github.com/huggingface/transformers/issues/38240">#38240</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/b5f494632c0fff2527dd3140423408644a9b0076"><code>b5f4946</code></a>
Protect ParallelInterface</li>
<li><a
href="https://github.com/huggingface/transformers/commit/113424bcd53b92600f77d82f48add0a60fb41556"><code>113424b</code></a>
Release: v4.52.0</li>
<li><a
href="https://github.com/huggingface/transformers/commit/f834d368f6a21ed54188d9c96fbb9013b1d2c75f"><code>f834d36</code></a>
[gemma3] fix bidirectional attention mask (<a
href="https://redirect.github.com/huggingface/transformers/issues/38080">#38080</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/2edb0e4b4dda8172d5628ca7497a4125f28bf6fc"><code>2edb0e4</code></a>
[mllama] fix loading and inference (<a
href="https://redirect.github.com/huggingface/transformers/issues/38223">#38223</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/390f153469dfdc793e7a9c7eb4822ea76f4f796a"><code>390f153</code></a>
Add padding-free to bamba (<a
href="https://redirect.github.com/huggingface/transformers/issues/35861">#35861</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/2a79471318a9b7b16706f3bb5cd833c7e81919a6"><code>2a79471</code></a>
Fixing Bitnet after use_rms_norm introduction (<a
href="https://redirect.github.com/huggingface/transformers/issues/38229">#38229</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/9661896083c9d983341afa45cc4b84af01706e72"><code>9661896</code></a>
Enable Quantize KV Cache for Mistral Model (<a
href="https://redirect.github.com/huggingface/transformers/issues/35042">#35042</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/1c2f36b480e02c9027d2523746d34e27b39e01a4"><code>1c2f36b</code></a>
parallelism goes brrr (<a
href="https://redirect.github.com/huggingface/transformers/issues/37877">#37877</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/huggingface/transformers/compare/v4.48.0...v4.52.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=transformers&package-manager=pip&previous-version=4.48.0&new-version=4.52.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.12.2 to 0.12.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/releases">ruff's
releases</a>.</em></p>
<blockquote>
<h2>0.12.3</h2>
<h2>Release Notes</h2>
<h3>Preview features</h3>
<ul>
<li>[<code>flake8-bugbear</code>] Support non-context-manager calls in
<code>B017</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19063">#19063</a>)</li>
<li>[<code>flake8-use-pathlib</code>] Add autofixes for
<code>PTH100</code>, <code>PTH106</code>, <code>PTH107</code>,
<code>PTH108</code>, <code>PTH110</code>, <code>PTH111</code>,
<code>PTH112</code>, <code>PTH113</code>, <code>PTH114</code>,
<code>PTH115</code>, <code>PTH117</code>, <code>PTH119</code>,
<code>PTH120</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19213">#19213</a>)</li>
<li>[<code>flake8-use-pathlib</code>] Add autofixes for
<code>PTH203</code>, <code>PTH204</code>, <code>PTH205</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/18922">#18922</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>[<code>flake8-return</code>] Fix false-positive for variables used
inside nested functions in <code>RET504</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/18433">#18433</a>)</li>
<li>Treat form feed as valid whitespace before a line continuation (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19220">#19220</a>)</li>
<li>[<code>flake8-type-checking</code>] Fix syntax error introduced by
fix (<code>TC008</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19150">#19150</a>)</li>
<li>[<code>pyupgrade</code>] Keyword arguments in <code>super</code>
should suppress the <code>UP008</code> fix (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19131">#19131</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>[<code>flake8-pyi</code>] Make example error out-of-the-box
(<code>PYI007</code>, <code>PYI008</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19103">#19103</a>)</li>
<li>[<code>flake8-simplify</code>] Make example error out-of-the-box
(<code>SIM116</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19111">#19111</a>)</li>
<li>[<code>flake8-type-checking</code>] Make example error
out-of-the-box (<code>TC001</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19151">#19151</a>)</li>
<li>[<code>flake8-use-pathlib</code>] Make example error out-of-the-box
(<code>PTH210</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19189">#19189</a>)</li>
<li>[<code>pycodestyle</code>] Make example error out-of-the-box
(<code>E272</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19191">#19191</a>)</li>
<li>[<code>pycodestyle</code>] Make example not raise unnecessary
<code>SyntaxError</code> (<code>E114</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19190">#19190</a>)</li>
<li>[<code>pydoclint</code>] Make example error out-of-the-box
(<code>DOC501</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19218">#19218</a>)</li>
<li>[<code>pylint</code>, <code>pyupgrade</code>] Fix syntax errors in
examples (<code>PLW1501</code>, <code>UP028</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19127">#19127</a>)</li>
<li>[<code>pylint</code>] Update <code>missing-maxsplit-arg</code> docs
and error to suggest proper usage (<code>PLC0207</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/18949">#18949</a>)</li>
<li>[<code>flake8-bandit</code>] Make example error out-of-the-box
(<code>S412</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19241">#19241</a>)</li>
</ul>
<h2>Contributors</h2>
<ul>
<li><a
href="https://github.com/AlexWaygood"><code>@​AlexWaygood</code></a></li>
<li><a
href="https://github.com/BurntSushi"><code>@​BurntSushi</code></a></li>
<li><a href="https://github.com/Gankra"><code>@​Gankra</code></a></li>
<li><a
href="https://github.com/InSyncWithFoo"><code>@​InSyncWithFoo</code></a></li>
<li><a
href="https://github.com/LaBatata101"><code>@​LaBatata101</code></a></li>
<li><a
href="https://github.com/MatthewMckee4"><code>@​MatthewMckee4</code></a></li>
<li><a
href="https://github.com/MeGaGiGaGon"><code>@​MeGaGiGaGon</code></a></li>
<li><a
href="https://github.com/MichaReiser"><code>@​MichaReiser</code></a></li>
<li><a
href="https://github.com/NamelessGO"><code>@​NamelessGO</code></a></li>
<li><a
href="https://github.com/UnboundVariable"><code>@​UnboundVariable</code></a></li>
<li><a
href="https://github.com/abhijeetbodas2001"><code>@​abhijeetbodas2001</code></a></li>
<li><a href="https://github.com/carljm"><code>@​carljm</code></a></li>
<li><a
href="https://github.com/charliermarsh"><code>@​charliermarsh</code></a></li>
<li><a
href="https://github.com/chirizxc"><code>@​chirizxc</code></a></li>
<li><a
href="https://github.com/danparizher"><code>@​danparizher</code></a></li>
<li><a
href="https://github.com/dhruvmanila"><code>@​dhruvmanila</code></a></li>
<li><a href="https://github.com/fdosani"><code>@​fdosani</code></a></li>
<li><a
href="https://github.com/github-actions"><code>@​github-actions</code></a></li>
<li><a
href="https://github.com/ibraheemdev"><code>@​ibraheemdev</code></a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md">ruff's
changelog</a>.</em></p>
<blockquote>
<h2>0.12.3</h2>
<h3>Preview features</h3>
<ul>
<li>[<code>flake8-bugbear</code>] Support non-context-manager calls in
<code>B017</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19063">#19063</a>)</li>
<li>[<code>flake8-use-pathlib</code>] Add autofixes for
<code>PTH100</code>, <code>PTH106</code>, <code>PTH107</code>,
<code>PTH108</code>, <code>PTH110</code>, <code>PTH111</code>,
<code>PTH112</code>, <code>PTH113</code>, <code>PTH114</code>,
<code>PTH115</code>, <code>PTH117</code>, <code>PTH119</code>,
<code>PTH120</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19213">#19213</a>)</li>
<li>[<code>flake8-use-pathlib</code>] Add autofixes for
<code>PTH203</code>, <code>PTH204</code>, <code>PTH205</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/18922">#18922</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>[<code>flake8-return</code>] Fix false-positive for variables used
inside nested functions in <code>RET504</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/18433">#18433</a>)</li>
<li>Treat form feed as valid whitespace before a line continuation (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19220">#19220</a>)</li>
<li>[<code>flake8-type-checking</code>] Fix syntax error introduced by
fix (<code>TC008</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19150">#19150</a>)</li>
<li>[<code>pyupgrade</code>] Keyword arguments in <code>super</code>
should suppress the <code>UP008</code> fix (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19131">#19131</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>[<code>flake8-pyi</code>] Make example error out-of-the-box
(<code>PYI007</code>, <code>PYI008</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19103">#19103</a>)</li>
<li>[<code>flake8-simplify</code>] Make example error out-of-the-box
(<code>SIM116</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19111">#19111</a>)</li>
<li>[<code>flake8-type-checking</code>] Make example error
out-of-the-box (<code>TC001</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19151">#19151</a>)</li>
<li>[<code>flake8-use-pathlib</code>] Make example error out-of-the-box
(<code>PTH210</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19189">#19189</a>)</li>
<li>[<code>pycodestyle</code>] Make example error out-of-the-box
(<code>E272</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19191">#19191</a>)</li>
<li>[<code>pycodestyle</code>] Make example not raise unnecessary
<code>SyntaxError</code> (<code>E114</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19190">#19190</a>)</li>
<li>[<code>pydoclint</code>] Make example error out-of-the-box
(<code>DOC501</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19218">#19218</a>)</li>
<li>[<code>pylint</code>, <code>pyupgrade</code>] Fix syntax errors in
examples (<code>PLW1501</code>, <code>UP028</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19127">#19127</a>)</li>
<li>[<code>pylint</code>] Update <code>missing-maxsplit-arg</code> docs
and error to suggest proper usage (<code>PLC0207</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/18949">#18949</a>)</li>
<li>[<code>flake8-bandit</code>] Make example error out-of-the-box
(<code>S412</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19241">#19241</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/astral-sh/ruff/commit/5bc81f26c8a820835067280153a279658477ccf2"><code>5bc81f2</code></a>
Bump 0.12.3 (<a
href="https://redirect.github.com/astral-sh/ruff/issues/19279">#19279</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/6908e2682f14792898cb8f9e4d920021da022307"><code>6908e26</code></a>
Filter <code>ruff_linter::VERSION</code> out of SARIF output tests (<a
href="https://redirect.github.com/astral-sh/ruff/issues/19280">#19280</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/25c429556421ddd6f715f5aaf906610e0c564606"><code>25c4295</code></a>
[ty] Avoid stale diagnostics for open files diagnostic mode (<a
href="https://redirect.github.com/astral-sh/ruff/issues/19273">#19273</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/426fa4bb12d8c47185800ba14dd5b4e721fd2c29"><code>426fa4b</code></a>
[ty] Add signature help provider to playground (<a
href="https://redirect.github.com/astral-sh/ruff/issues/19276">#19276</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/b0b65c24ff01dc9095f17b3768cf2b9a336a5a8c"><code>b0b65c2</code></a>
[ty] Initial implementation of signature help provider (<a
href="https://redirect.github.com/astral-sh/ruff/issues/19194">#19194</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/08bc6d25899501d690c37a87d6da51951280dfc5"><code>08bc6d2</code></a>
Add simple integration tests for all output formats (<a
href="https://redirect.github.com/astral-sh/ruff/issues/19265">#19265</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/f2ae12bab33d80d52caa3047775371fca83f6e96"><code>f2ae12b</code></a>
[<code>flake8-return</code>] Fix false-positive for variables used
inside nested functio...</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/965f415212f4f9f3ef855b647d53e892e6913828"><code>965f415</code></a>
[ty] Add a <code>--quiet</code> mode (<a
href="https://redirect.github.com/astral-sh/ruff/issues/19233">#19233</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/83b5bbf004bf2e47dd4ca5c049930894856547f1"><code>83b5bbf</code></a>
Treat form feed as valid whitespace before a line continuation (<a
href="https://redirect.github.com/astral-sh/ruff/issues/19220">#19220</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/87f6f08ef53edc2cbe8632d612f6d4fd016fe2ff"><code>87f6f08</code></a>
[ty] Make <code>check_file</code> a salsa query (<a
href="https://redirect.github.com/astral-sh/ruff/issues/19255">#19255</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/ruff/compare/0.12.2...0.12.3">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ruff&package-manager=pip&previous-version=0.12.2&new-version=0.12.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
Dependabot will merge this PR once CI passes on it, as requested by
@fs-eire.

[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description

Update Qnn default version to 2.36.1.250708

Co-authored-by: Jeff Kilpatrick <jkilpat@qti.qualcomm.com>
…ries (microsoft#25365)

### Description
<!-- Describe your changes. -->
Add a vendor id to OrtEpFactory; it's easier to get the vendor id than the name on other platforms.
Update the selection policy to prefer a match on vendor id, with a fallback to vendor name.

Add the default ORT logger to CreateEpFactories.
The OrtEpFactory currently has no way to log informational messages or issues.
CreateEp is already given the session logger for use by the OrtEp instance, so that part is covered.

Misc cleanups: make usage of ORT_API2_STATUS and ORT_API_T consistent in onnxruntime_ep_c_api.h, and set ort_version_supported in some EP factories where it was missed.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Vendor id is easier to match against OrtHardwareDevice when doing auto
EP selection.
OrtEpFactory should have a logger. 
Last chance to clean up APIs before the 1.23 release.
- Add common rank range validation to base_op_builder.cc
- Handle specific rank range validation for the remaining ops
- Remove duplicated input_shape validation
- Fix some typos BTW
microsoft#25401)

### Description
<!-- Describe your changes. -->
Fix some test setups where having both EPs in the same build wasn't expected.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
The SigLIP architecture inside the vision encoder should not use a causal mask on the attention. This change fixes the Phi-4 MM accuracy issues we have seen.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
1. Add an optional output to the CPU implementation of the GQA op for storing attention scores (QK). The buffer has shape (B, N, S, T) and can be either fp16 or fp32, depending on the type of the other inputs (see the sketch below).
2. Add a `qk_output` attribute to GQA, which controls whether attention scores are saved before or after softmax is applied.
3. Add unit tests to cover this use case.
4. Add asserts in other EPs in case this feature is used.
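An illustrative NumPy sketch of what the optional QK output contains (shapes only; it ignores the KV-head grouping and any masking the real kernel applies):

```python
import numpy as np

def gqa_qk_scores(Q, K, after_softmax=False):
    """Q: (B, N, S, H), K: (B, N, T, H) -> attention scores of shape (B, N, S, T)."""
    scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(Q.shape[-1])
    if after_softmax:  # the qk_output attribute selects pre- or post-softmax scores
        scores = np.exp(scores - scores.max(-1, keepdims=True))
        scores /= scores.sum(-1, keepdims=True)
    return scores

B, N, S, T, H = 1, 8, 4, 16, 64
qk = gqa_qk_scores(np.random.rand(B, N, S, H), np.random.rand(B, N, T, H))
assert qk.shape == (B, N, S, T)
```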
…oft#25408)

[QNN-EP] Support GridSample in linear mode for ONNX opset 20+.
The current limitation is stricter than necessary -- only reject linear mode when targeting the QNN CPU backend.
### Description
<!-- Describe your changes. -->
Fix vendor and device id conversion from SetupApi info.
Detect the Remote Display Adapter and skip it; otherwise a bogus device appears when you're connected to a machine over remote desktop.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description

Bugfix: crash when dim_value is 0



### Motivation and Context

Thanks to @skottmckay who found the bug.
### Description
- Adding test framework and initial batch of test cases for the QNN EP GPU backend.

### Motivation and Context
- To ensure the QNN EP GPU backend does not break as ongoing changes are committed to the other QNN backends mainly.
### Description

Fix cuda build error when DEBUG_GENERATION is defined.

### Motivation and Context

In microsoft#24821, a dumping API was removed:
`void Print(const char* name, int index, bool end_line)`
but the related code was not updated.

In MatMulNBits, there is a recent change to add bfloat16 support, but the tensor dumper only supports BFloat16 and not __nv_bfloat16. This PR adds functions to support __nv_bfloat16 in the CUDA tensor dumper.
### Description
This commit applies the WGSL template to `MatMulNBitsWideTile` to improve code readability and enable more flexible data handling.

As part of this change, support for 4-bit and 8-bit shaders has been
consolidated, and a common `CEIL_DIV` utility has been introduced. The
previous `ShaderUsage::UseUniform` and
`ShaderUsage::UseIndicesTypeAlias` flags are no longer necessary and
have been removed.

### Motivation and Context
See above
1. Update the docker images to install system updates (per vulnerability management requirements).
2. Disable the DNNL pipelines since
     a. there was no active development, and
     b. the code is incompatible with CMake 4.x.
3. Disable the migraphx pipeline due to license issues (conda is not free unless you only use conda-forge packages).
4. Change all UBI8-based images to use AlmaLinux 8.

I will make the base images public. They are under internal review.
Fixes Error:

Could not find com.qualcomm.qti:qnn-runtime:2.36.1
The nuget packaging pipeline fails with 
Could not find com.qualcomm.qti:qnn-runtime:2.36.1


https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=866702&view=results
…evice id. (microsoft#25427)

### Description
<!-- Describe your changes. -->
Restore ability to handle "VEN_QCOM" from an ACPI entry.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
@jatinwadhwa921 jatinwadhwa921 requested a review from ankitm3k July 17, 2025 07:49
@ankitm3k ankitm3k merged commit acb29b6 into ovep-develop Jul 17, 2025
6 of 8 checks passed
@ankitm3k ankitm3k deleted the sync_msft_17_7_25 branch July 17, 2025 08:48