
@jatinwadhwa921

Backmerging with Msft commits

jiafatom and others added 30 commits May 2, 2025 21:38
### Description
Integrate some Neural Compressor code, since the ORT side of that repo is
in maintenance mode.



### Motivation and Context
Enable k-quant quantization.
### Description
Add initial selection policy implementations.

Update device discovery
- get vendor and vendor id for CPU from cpuid_info
- trim metadata to known useful fields
- NPU detection via dxcore only

Bug fixes/updates from PRs for C# and Python bindings

Add some tests for selection policy
- TODO: Add more tests

### Motivation and Context
Desire to boil oceans.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
C# API updates for auto ep selection and the compilation API.
Also includes bugfix to OrtKeyValuePairs::Remove.

### Motivation and Context

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
As titled.



### Motivation and Context
The dependency is no longer needed.
…icrosoft#24629)

### Description
- Enables automatic selection of QNN EP for PREFER_NPU policy
- Fixes cpuid vendor id for Qualcomm to be `'Q' | ('C' << 8) | ('O' <<
16) | ('M' << 24);`
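
The packed vendor id above stores one ASCII character per byte, with the first character in the lowest byte. A minimal Python sketch of the same packing follows; the helper name `pack_vendor_id` is hypothetical and is not part of ORT.

```python
def pack_vendor_id(tag: str) -> int:
    """Pack a 4-character ASCII tag into a 32-bit integer, first character in the lowest byte."""
    if len(tag) != 4:
        raise ValueError("tag must be exactly 4 characters")
    value = 0
    for i, ch in enumerate(tag):
        # Character i occupies bits [8*i, 8*i + 7].
        value |= ord(ch) << (8 * i)
    return value


qcom_id = pack_vendor_id("QCOM")
# Equivalent to reading the ASCII bytes as a little-endian integer.
assert qcom_id == int.from_bytes(b"QCOM", "little")
print(hex(qcom_id))  # 0x4d4f4351
```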

Sample code from unit test:
```c++
// Tests autoEP feature to automatically select an EP that supports the NPU.
// Currently only works on Windows.
TEST_F(QnnHTPBackendTests, AutoEp_PreferNpu) {
  ASSERT_ORTSTATUS_OK(Ort::GetApi().RegisterExecutionProviderLibrary(*ort_env, kQnnExecutionProvider,
                                                                     ORT_TSTR("onnxruntime_providers_qnn.dll")));

  Ort::SessionOptions so;
  so.SetEpSelectionPolicy(OrtExecutionProviderDevicePolicy_PREFER_NPU);

  const ORTCHAR_T* ort_model_path = ORT_MODEL_FOLDER "nhwc_resize_sizes_opset18.quant.onnx";
  Ort::Session session(*ort_env, ort_model_path, so);
  EXPECT_TRUE(SessionHasEp(session, kQnnExecutionProvider));

  ASSERT_ORTSTATUS_OK(Ort::GetApi().UnregisterExecutionProviderLibrary(*ort_env, kQnnExecutionProvider));
}
```

### Motivation and Context
A recent feature allows ORT to automatically select an EP according to
policies set by the user (e.g., prefer npu or prefer gpu). This PR
allows QNN EP to be potentially selected when the user sets the
`PREFER_NPU` policy.
…the same type… (microsoft#24633)

### Description
Fix debug assertion when there are two devices of the same type that
don't match the vendor. e.g. WebGPU and DML.

### Motivation and Context
Under wasm we can't check for Metal by looking at the backend because
it will always be WEBGPU.
Because of this we take the DP4A path on Metal, which results in
sub-optimal performance.
Use the vendor to check for Metal instead.
…microsoft#24640)

### Description
Enable use_vcpkg for the QNN NuGet package build and the Python arm64ec build.
### Description
Python API updates for auto ep selection and the compilation API.
- Adds Python API `SessionOptions.add_provider()` (equivalent to C API's
`SessionOptionsAppendExecutionProvider`)
- Adds Python API `SessionOptions.add_provider_for_devices()`
(equivalent to C API's `SessionOptionsAppendExecutionProvider_V2`)
- Adds Python API `SessionOptions.set_provider_selection_policy()`
(equivalent to C API's `SessionOptionsSetEpSelectionPolicy`)
- Adds Python API class `ModelCompiler` to compile models (wraps C API's
`OrtModelCompilationOptions` and `CompileModel()`)
- TODO: Finish delegate callback. Need to add a `void*` parameter to
delegate function.

### Sample program that uses autoep APIs
Adapted from a unit test.
```python
    def test_cuda_prefer_gpu_and_inference(self):
        """
        Test selecting CUDA EP via the PREFER_GPU policy and running inference.
        """
        ep_lib_path = "onnxruntime_providers_cuda.dll"
        ep_registration_name = "CUDAExecutionProvider"

        if sys.platform != "win32":
            self.skipTest("Skipping test because device discovery is only supported on Windows")

        if not os.path.exists(ep_lib_path):
            self.skipTest(f"Skipping test because EP library '{ep_lib_path}' cannot be found")

        onnxrt.register_execution_provider_library(ep_registration_name, os.path.realpath(ep_lib_path))

        # Set a policy to prefer GPU. Cuda should be selected.
        sess_options = onnxrt.SessionOptions()
        sess_options.set_provider_selection_policy(onnxrt.OrtExecutionProviderDevicePolicy.PREFER_GPU)
        self.assertTrue(sess_options.has_providers())

        # Run sample model and check output
        sess = onnxrt.InferenceSession(get_name("mul_1.onnx"), sess_options=sess_options)

        x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32)
        input_name = sess.get_inputs()[0].name
        res = sess.run([], {input_name: x})
        output_expected = np.array([[1.0, 4.0], [9.0, 16.0], [25.0, 36.0]], dtype=np.float32)
        np.testing.assert_allclose(output_expected, res[0], rtol=1e-05, atol=1e-08)
```
### Sample program that uses compile APIs
Adapted from a unit test that compiles using EP selection policy.
```python
    def test_compile_with_files_prefer_npu_policy(self):
        """
        Tests compiling a model (to/from files) using an EP selection policy (PREFER_NPU).
        """
        ep_lib_path = "onnxruntime_providers_qnn.dll"
        ep_registration_name = "QNNExecutionProvider"
        onnxrt.register_execution_provider_library(ep_registration_name, ep_lib_path)

        input_model_path = get_name("nhwc_resize_scales_opset18.onnx")
        output_model_path = os.path.join(self._tmp_dir_path, "model.compiled0.onnx")

        session_options = onnxrt.SessionOptions()
        session_options.set_provider_selection_policy(onnxrt.OrtExecutionProviderDevicePolicy.PREFER_NPU)

        model_compiler = onnxrt.ModelCompiler(
            session_options,
            input_model_path,
            embed_compiled_data_into_model=True,
            external_initializers_file_path=None,
        )
        model_compiler.compile_to_file(output_model_path)
        self.assertTrue(os.path.exists(output_model_path))
        onnxrt.unregister_execution_provider_library(ep_registration_name)
```

Adapted from a unit test that compiles using explicit EPs.
```python
    def test_compile_with_input_and_output_files(self):
        """
        Tests compiling a model (to/from files) using explicit EP.
        """
        provider = None
        provider_options = dict()
        if "QNNExecutionProvider" in available_providers:
            provider = "QNNExecutionProvider"
            provider_options["backend_type"] = "htp"
        # TODO(adrianlizarraga): Allow test to run for other compiling EPs (e.g., OpenVINO)

        input_model_path = get_name("nhwc_resize_scales_opset18.onnx")
        output_model_path = os.path.join(self._tmp_dir_path, "model.compiled1.onnx")

        session_options = onnxrt.SessionOptions()
        if provider:
            session_options.add_provider(provider, provider_options)

        model_compiler = onnxrt.ModelCompiler(
            session_options,
            input_model_path,
            embed_compiled_data_into_model=True,
            external_initializers_file_path=None,
        )
        model_compiler.compile_to_file(output_model_path)
        self.assertTrue(os.path.exists(output_model_path))
```

### Motivation and Context
…ite-default (microsoft#24607)

Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite)
from 6.2.6 to 6.3.4.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/vitejs/vite/blob/main/packages/vite/CHANGELOG.md">vite's
changelog</a>.</em></p>
<blockquote>
<h2><!-- raw HTML omitted -->6.3.4 (2025-04-30)<!-- raw HTML omitted
--></h2>
<ul>
<li>fix: check static serve file inside sirv (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19965">#19965</a>)
(<a
href="https://github.com/vitejs/vite/commit/c22c43de612eebb6c182dd67850c24e4fab8cacb">c22c43d</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19965">#19965</a></li>
<li>fix(optimizer): return plain object when using <code>require</code>
to import externals in optimized dependenci (<a
href="https://github.com/vitejs/vite/commit/efc5eab253419fde0a6a48b8d2f233063d6a9643">efc5eab</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19940">#19940</a></li>
<li>refactor: remove duplicate plugin context type (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19935">#19935</a>)
(<a
href="https://github.com/vitejs/vite/commit/d6d01c2292fa4f9603e05b95d81c8724314c20e0">d6d01c2</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19935">#19935</a></li>
</ul>
<h2><!-- raw HTML omitted -->6.3.3 (2025-04-24)<!-- raw HTML omitted
--></h2>
<ul>
<li>fix: ignore malformed uris in tranform middleware (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19853">#19853</a>)
(<a
href="https://github.com/vitejs/vite/commit/e4d520141bcd83ad61f16767348b4a813bf9340a">e4d5201</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19853">#19853</a></li>
<li>fix(assets): ensure ?no-inline is not included in the asset url in
the production environment (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/1949">#1949</a>
(<a
href="https://github.com/vitejs/vite/commit/16a73c05d35daa34117a173784895546212db5f4">16a73c0</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19496">#19496</a></li>
<li>fix(css): resolve relative imports in sass properly on Windows (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19920">#19920</a>)
(<a
href="https://github.com/vitejs/vite/commit/ffab44270488f54ae344801024474b597249071b">ffab442</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19920">#19920</a></li>
<li>fix(deps): update all non-major dependencies (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19899">#19899</a>)
(<a
href="https://github.com/vitejs/vite/commit/a4b500ef9ccc9b19a2882156a9ba8397e69bc6b2">a4b500e</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19899">#19899</a></li>
<li>fix(ssr): fix execution order of re-export (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19841">#19841</a>)
(<a
href="https://github.com/vitejs/vite/commit/ed29dee2eb2e3573b2bc337e1a9124c65222a1e5">ed29dee</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19841">#19841</a></li>
<li>fix(ssr): fix live binding of default export declaration and hoist
exports getter (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19842">#19842</a>)
(<a
href="https://github.com/vitejs/vite/commit/80a91ff82426a4c88d54b9f5ec9a4205cb13899b">80a91ff</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19842">#19842</a></li>
<li>perf: skip sourcemap generation for renderChunk hook of
import-analysis-build plugin (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19921">#19921</a>)
(<a
href="https://github.com/vitejs/vite/commit/55cfd04b10f98cde7a96814a69b9813543ea79c2">55cfd04</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19921">#19921</a></li>
<li>test(ssr): test <code>ssrTransform</code> re-export deps and test
stacktrace with first line (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19629">#19629</a>)
(<a
href="https://github.com/vitejs/vite/commit/9399cdaf8c3b2efd5f4015d57dc3b0e4e5b91a9d">9399cda</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19629">#19629</a></li>
</ul>
<h2><!-- raw HTML omitted -->6.3.2 (2025-04-18)<!-- raw HTML omitted
--></h2>
<ul>
<li>fix: match default asserts case insensitive (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19852">#19852</a>)
(<a
href="https://github.com/vitejs/vite/commit/cbdab1d6a30e07263ec51b2ca042369e736adec6">cbdab1d</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19852">#19852</a></li>
<li>fix: open first url if host does not match any urls (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19886">#19886</a>)
(<a
href="https://github.com/vitejs/vite/commit/6abbdce3d77990409e12380e72c7ec9dd3f8bec5">6abbdce</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19886">#19886</a></li>
<li>fix(css): respect <code>css.lightningcss</code> option in css
minification process (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19879">#19879</a>)
(<a
href="https://github.com/vitejs/vite/commit/b5055e0dd4c0e084115c3dbfead5736a54807e0c">b5055e0</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19879">#19879</a></li>
<li>fix(deps): update all non-major dependencies (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19698">#19698</a>)
(<a
href="https://github.com/vitejs/vite/commit/bab4cb92248adf6b9b18df12b2bf03889b0bd1eb">bab4cb9</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19698">#19698</a></li>
<li>feat(css): improve lightningcss messages (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19880">#19880</a>)
(<a
href="https://github.com/vitejs/vite/commit/c713f79b5a4bd98542d8dbe4c85ba4cce9b1f358">c713f79</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19880">#19880</a></li>
</ul>
<h2><!-- raw HTML omitted -->6.3.1 (2025-04-17)<!-- raw HTML omitted
--></h2>
<ul>
<li>fix: avoid using <code>Promise.allSettled</code> in preload function
(<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19805">#19805</a>)
(<a
href="https://github.com/vitejs/vite/commit/35c7f35e2b67f2158ededf2af58ecec53b3f16c5">35c7f35</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19805">#19805</a></li>
<li>fix: backward compat for internal plugin <code>transform</code>
calls (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19878">#19878</a>)
(<a
href="https://github.com/vitejs/vite/commit/a152b7cbac72e05668f8fc23074d531ecebb77a5">a152b7c</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19878">#19878</a></li>
</ul>
<h2>6.3.0 (2025-04-16)</h2>
<ul>
<li>fix(hmr): avoid infinite loop happening with
<code>hot.invalidate</code> in circular deps (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19870">#19870</a>)
(<a
href="https://github.com/vitejs/vite/commit/d4ee5e8655a85f4d6bebc695b063d69406ab53ac">d4ee5e8</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19870">#19870</a></li>
<li>fix(preview): use host url to open browser (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19836">#19836</a>)
(<a
href="https://github.com/vitejs/vite/commit/50034340401b4043bb0b158f18ffb7ae1b7f5c86">5003434</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19836">#19836</a></li>
</ul>
<h2>6.3.0-beta.2 (2025-04-11)</h2>
<ul>
<li>fix: addWatchFile doesn't work if base is specified (fixes <a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19792">#19792</a>)
(<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19794">#19794</a>)
(<a
href="https://github.com/vitejs/vite/commit/8bed1de5710f2a097af0e22a196545446d98f988">8bed1de</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19792">#19792</a>
<a
href="https://redirect.github.com/vitejs/vite/issues/19794">#19794</a></li>
<li>fix: correct the behavior when multiple transform filter options are
specified (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19818">#19818</a>)
(<a
href="https://github.com/vitejs/vite/commit/7200deec91a501fb84734e23906f80808734540c">7200dee</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19818">#19818</a></li>
<li>fix: fs check with svg and relative paths (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19782">#19782</a>)
(<a
href="https://github.com/vitejs/vite/commit/62d7e81ee189d65899bb65f3263ddbd85247b647">62d7e81</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19782">#19782</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/vitejs/vite/commit/b040d547a17c4bfe8aba44534228667a50612318"><code>b040d54</code></a>
release: v6.3.4</li>
<li><a
href="https://github.com/vitejs/vite/commit/c22c43de612eebb6c182dd67850c24e4fab8cacb"><code>c22c43d</code></a>
fix: check static serve file inside sirv (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19965">#19965</a>)</li>
<li><a
href="https://github.com/vitejs/vite/commit/efc5eab253419fde0a6a48b8d2f233063d6a9643"><code>efc5eab</code></a>
fix(optimizer): return plain object when using <code>require</code> to
import externals ...</li>
<li><a
href="https://github.com/vitejs/vite/commit/d6d01c2292fa4f9603e05b95d81c8724314c20e0"><code>d6d01c2</code></a>
refactor: remove duplicate plugin context type (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19935">#19935</a>)</li>
<li><a
href="https://github.com/vitejs/vite/commit/db9eb97b2f530a3985b29c5d1a529772f1ab1893"><code>db9eb97</code></a>
release: v6.3.3</li>
<li><a
href="https://github.com/vitejs/vite/commit/e4d520141bcd83ad61f16767348b4a813bf9340a"><code>e4d5201</code></a>
fix: ignore malformed uris in tranform middleware (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19853">#19853</a>)</li>
<li><a
href="https://github.com/vitejs/vite/commit/55cfd04b10f98cde7a96814a69b9813543ea79c2"><code>55cfd04</code></a>
perf: skip sourcemap generation for renderChunk hook of
import-analysis-build...</li>
<li><a
href="https://github.com/vitejs/vite/commit/ffab44270488f54ae344801024474b597249071b"><code>ffab442</code></a>
fix(css): resolve relative imports in sass properly on Windows (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19920">#19920</a>)</li>
<li><a
href="https://github.com/vitejs/vite/commit/16a73c05d35daa34117a173784895546212db5f4"><code>16a73c0</code></a>
fix(assets): ensure ?no-inline is not included in the asset url in the
produc...</li>
<li><a
href="https://github.com/vitejs/vite/commit/9399cdaf8c3b2efd5f4015d57dc3b0e4e5b91a9d"><code>9399cda</code></a>
test(ssr): test <code>ssrTransform</code> re-export deps and test
stacktrace with first ...</li>
<li>Additional commits viewable in <a
href="https://github.com/vitejs/vite/commits/v6.3.4/packages/vite">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ft#24641)

### Description

This PR adds a package version check for the dev channel.

The check should help avoid automatically publishing packages with
versions like "-rc.*" to the dev channel.

### Motivation and Context
### Description
Add support for selection policy delegate
- split API function into one for the policy enum and one for the
delegate
- add `void*` for user state
  - required to wire up using the delegate in other languages.

Add C# support for specifying the selection policy delegate.

Address comments from initial C# autoep support PR.

### Motivation and Context
### Description
This PR adds the support for 8-bit quantization in the `MatMulNBits`
operation in WebGPU.

It does the following:
1. Unifies on `MatMulNBitsProgram` as the fallback path. This was
originally the generation path for block size = 32; it now supports any
block size without limitations. The original, more complicated programs
are removed.
2. Enables `MatMulNBitsWideTileProgram` for all platforms.
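
For readers unfamiliar with block-wise quantization, here is a minimal Python sketch of dequantizing 8-bit values that share one scale and one zero point per block, for an arbitrary block size. It illustrates the general idea only: the function name, the flat row layout and the sample values are assumptions for illustration, and this is not the WebGPU shader code from the PR.

```python
def dequantize_blockwise(q_values, scales, zero_points, block_size):
    """Dequantize a row of unsigned 8-bit values; each block shares a scale and zero point."""
    out = []
    for i, q in enumerate(q_values):
        block = i // block_size  # index of the block this element belongs to
        out.append((q - zero_points[block]) * scales[block])
    return out


# Six values with block_size=4: the last block is only partially filled.
row = [128, 130, 126, 128, 0, 255]
print(dequantize_blockwise(row, scales=[0.5, 0.25], zero_points=[128, 128], block_size=4))
# [0.0, 1.0, -1.0, 0.0, -32.0, 31.75]
```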
### Description
If indices is a scalar (0-dimensional tensor), the Gather op produces an
incorrect output shape.
Fix this Gather op bug in the VSINPU EP.
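
As a reference for the expected behavior, the ONNX Gather output shape is `data_shape[:axis] + indices_shape + data_shape[axis + 1:]`, so a scalar index removes the gathered axis. A small Python sketch of that rule follows; the helper name is hypothetical and this is not the VSINPU EP code.

```python
def gather_output_shape(data_shape, indices_shape, axis=0):
    """Output shape of ONNX Gather: data_shape[:axis] + indices_shape + data_shape[axis + 1:]."""
    rank = len(data_shape)
    if not -rank <= axis < rank:
        raise ValueError("axis out of range")
    axis %= rank  # normalize a negative axis
    return tuple(data_shape[:axis]) + tuple(indices_shape) + tuple(data_shape[axis + 1:])


# A scalar index has shape (), so the gathered axis disappears.
print(gather_output_shape((3, 4, 5), (), axis=1))    # (3, 5)
# A 1-element index tensor keeps the axis with size 1.
print(gather_output_shape((3, 4, 5), (1,), axis=1))  # (3, 1, 5)
```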



### Motivation and Context

Signed-off-by: Kee <xuke537@hotmail.com>
### Description
Fix type mismatch using float in place of unsigned int.


### Motivation and Context
fix shader compile; don't know how this made it past ci
…24645)

### Description
Python Cuda Publishing pipeline references old test pipeline
### Description

The random failure on Web CI is hard to investigate because it's not
reproducible. Add this step to upload the log to help investigate the
issue.
…ft#24650)

### Description
Fix the outputSize computation causing duplicate indices. The outputSize
should be the size of indices tensor without counting the last
dimension.
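
To make the corrected size concrete: in ScatterND the last dimension of the indices tensor holds the coordinates of one index tuple, so the number of updates is the product of all the other dimensions. A minimal Python sketch follows; the function name is hypothetical and this is not the shader code.

```python
import math


def scatter_nd_update_count(indices_shape):
    """Number of index tuples in a ScatterND indices tensor: product of all dims except the last."""
    if len(indices_shape) < 1:
        raise ValueError("indices must have rank >= 1")
    return math.prod(indices_shape[:-1])


print(scatter_nd_update_count((4, 2)))     # 4 index tuples, each with 2 coordinates
print(scatter_nd_update_count((2, 3, 1)))  # 6 index tuples, each with 1 coordinate
```

Counting the last dimension as well would yield 8 and 6 work items respectively, which for the first shape means each index tuple is visited twice.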



### Motivation and Context
Fix the issue microsoft#24070
### Description

The header file "dawn/dawn_proc.h" is only used in a non-monolithic build of
Dawn.
The patch optimizes pool operators when the output size is small and the
kernel size is large.

### Description



### Motivation and Context
…crosoft#24634)

### Description
Follow up to microsoft#24614

Example Python program (adapted from unit tests) that specifies a custom
EP selection function to select the OrtEpDevice(s) used for compiling:
```python
    def test_compile_with_ep_selection_delegate(self):
        # ...
        # User's custom EP selection function.
        def my_delegate(
            ep_devices: Sequence[onnxrt.OrtEpDevice],
            model_metadata: dict[str, str],
            runtime_metadata: dict[str, str],
            max_selections: int,
        ) -> Sequence[onnxrt.OrtEpDevice]:
            self.assertTrue(len(model_metadata) > 0)
            self.assertTrue(ep_devices and max_selections > 0)

            # Select the first and last devices (if there are more than one)
            selected_devices = [ep_devices[0]]
            if max_selections >= 2 and len(ep_devices) > 1:
                selected_devices.append(ep_devices[-1])  # ORT CPU EP is always last

            return selected_devices

        session_options = onnxrt.SessionOptions()
        session_options.set_provider_selection_policy_delegate(my_delegate)

        model_compiler = onnxrt.ModelCompiler(
            session_options,
            input_model_path,
            embed_compiled_data_into_model=True,
            external_initializers_file_path=None,
        )
        model_compiler.compile_to_file(output_model_path)
```

How to raise an exception from the Python EP selection function:
```python
        # User's custom EP selection function.
        custom_error_message = "MY ERROR"

        def my_delegate_that_fails(
            ep_devices: Sequence[onnxrt.OrtEpDevice],
            model_metadata: dict[str, str],
            runtime_metadata: dict[str, str],
            max_selections: int,
        ) -> Sequence[onnxrt.OrtEpDevice]:
            self.assertTrue(len(ep_devices) >= 1)
            raise ValueError(custom_error_message)

        sess_options = onnxrt.SessionOptions()
        sess_options.set_provider_selection_policy_delegate(my_delegate_that_fails)

        # Create session and expect ORT to raise a Fail exception that contains our message.
        with self.assertRaises(Fail) as context:
            onnxrt.InferenceSession(get_name("mul_1.onnx"), sess_options=sess_options)
        self.assertIn(custom_error_message, str(context.exception))
```
### Motivation and Context
…e APIs (microsoft#24661)

### Description
Fixes documentation errors in comments within onnxruntime_c_api.h and
onnxruntime_cxx_api.h.



### Motivation and Context
The [Generate C/C++ API
docs](https://github.com/microsoft/onnxruntime/actions/runs/14855108283/job/41706460753#logs)
action is failing with error:

```shell
Run mkdir -p build/doxygen
/mnt/vss/_work/onnxruntime/onnxruntime/include/onnxruntime/core/session/onnxruntime_cxx_api.h:775: error: explicit link request to 'OrtKeyValuePair' could not be resolved (warning treated as error, aborting now)
```
### Description
Added ScatterND operator to Native WebGPU EP.



### Motivation and Context
Required to increase coverage.
…icrosoft#24666)

### Description
Handle the user's selection policy delegate throwing or returning too many
selections in the C# code, and create an error message.
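
The two failure modes handled here can be sketched in Python as a thin wrapper around a user callback. This is an illustrative analogue only: the wrapper name, the simplified two-argument delegate signature and the error strings are assumptions, not the C# implementation.

```python
def run_selection_delegate(delegate, ep_devices, max_selections):
    """Call a user delegate; return (selected_devices, error_message), where error_message is None on success."""
    try:
        selected = list(delegate(ep_devices, max_selections))
    except Exception as exc:
        # Convert the user's exception into an error message instead of letting it propagate.
        return [], f"Selection delegate raised: {exc}"
    if len(selected) > max_selections:
        return [], (
            f"Selection delegate returned {len(selected)} devices; "
            f"at most {max_selections} are allowed"
        )
    return selected, None


def failing_delegate(devices, max_selections):
    raise ValueError("MY ERROR")


print(run_selection_delegate(lambda d, m: d[:1], ["npu", "gpu", "cpu"], 2))  # (['npu'], None)
print(run_selection_delegate(lambda d, m: d, ["npu", "gpu", "cpu"], 2)[1])
print(run_selection_delegate(failing_delegate, ["cpu"], 1)[1])  # Selection delegate raised: MY ERROR
```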


### Motivation and Context
… unit tests (microsoft#24667)

### Description
Cleans up the usage of `ep_name` and `ep_registration_name` in the
autoEP Python unit tests.



### Motivation and Context
Addresses comments from a previous PR:
microsoft#24634
> nit: the registration name and EP names don't need to match. could we
call this 'ep_name' to avoid potentially creating an assumption that
they always do?
- Use ResizeNearestNeighbor Op for Resize with interpolation_mode=Nearest and rank-4 inputs.
 - Add a Unit test to verify the modified translation.

### Description
ResizeNearestNeighbor Op is faster for Resize with interpolation_mode=Nearest and rank-4 inputs.



### Motivation and Context
This commit matches Resize Op behavior in QNN-EP with QNN Offline converter path. This fix also improves inference time.
clementperon and others added 26 commits May 13, 2025 17:51
### Description
Builds are not reproducible; remove information that is specific to the
local machine from the build.



### Motivation and Context
Reproducible builds are important to ensure the package is reliable.

Signed-off-by: Andrew Davis <afd@ti.com>
Signed-off-by: Clément Péron <peron.clem@gmail.com>
Co-authored-by: Andrew Davis <afd@ti.com>
### Description
Publish Windows debug symbols not only to Azure DevOps but also to
msdl.microsoft.com. See
https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/symbol-path
for how to consume it.
### Description
Fix typos in multiple files


### Motivation and Context
### Description
Expose and test GetTensorSizeInBytes in C# and Python

### Motivation and Context

microsoft#24680
### Description

See the [example
repository](https://github.com/jordanozang/onnxruntime_minimal_static)
for a minimal example of using the static CMake Config.
- Add libraries built during the static onnxruntime build
(onnxruntime_common, onnxruntime_mlas, etc.) to the onnxruntime export
set. Additionally, add an onnxruntime::onnxruntime interface target that
behaves much the same as that target in the shared build case.
`find_package(onnxruntime REQUIRED)` and
`target_link_libraries(example PRIVATE onnxruntime::onnxruntime)`
should now work.
- Minor modifications to ensure that dependency targets like Boost::mp11
are treated as imported targets and not part of the build interface.
- Static webgpu builds currently do not generate this CMake export.




### Motivation and Context
- Resolves Issue microsoft#21351
- Builds on Pull Request microsoft#21348
WebNN doesn't provide a dedicated op for `MatMulInteger`. This PR
supports `MatMulInteger` by decomposing it into `DequantizeLinear A, B
-> MatMul -> Cast (to int32)` and makes some code optimizations along the way.
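
The decomposition can be sketched on plain nested lists as follows. This is a minimal illustration rather than the WebNN EP code: the helper names are hypothetical, and a dequantization scale of 1.0 is assumed so that the float result is an exact integer before the cast.

```python
def dequantize_linear(matrix, zero_point, scale=1.0):
    """DequantizeLinear: (x - zero_point) * scale, element-wise, producing floats."""
    return [[(x - zero_point) * scale for x in row] for row in matrix]


def matmul(a, b):
    """Float matrix product of a (M x K) and b (K x N)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cast_to_int32(matrix):
    """Cast step; values are already integral here because scale is 1.0."""
    return [[int(v) for v in row] for row in matrix]


def matmul_integer(a, b, a_zero_point=0, b_zero_point=0):
    """MatMulInteger expressed as DequantizeLinear A, B -> MatMul -> Cast."""
    a_f = dequantize_linear(a, a_zero_point)
    b_f = dequantize_linear(b, b_zero_point)
    return cast_to_int32(matmul(a_f, b_f))
```

With the scale fixed at 1.0, the result equals the integer product of the zero-point-shifted inputs, which is what `MatMulInteger` defines.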
…24753)

### Description
Some CPUs don't show up in SetupApi info for some reason.

Create default entry if that is the case.

Manually tested by disabling the lookup of GUID_DEVCLASS_PROCESSOR info.
Not sure of a better way to test.

### Motivation and Context
Fix a crash: other code assumes there will always be a CPU device.
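
The fallback can be pictured with the sketch below. It is not the device discovery code itself: the dictionary layout and the name `ensure_cpu_device` are assumptions made for this example.

```python
def ensure_cpu_device(discovered):
    """Return the device list, adding a default CPU entry if discovery found none."""
    if any(d["type"] == "CPU" for d in discovered):
        return list(discovered)
    # Hypothetical default entry; the real fields come from device discovery.
    default_cpu = {"type": "CPU", "vendor": "unknown", "metadata": {}}
    return list(discovered) + [default_cpu]
```

Callers that assume a CPU device exists can then rely on the returned list containing one.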
This PR is a follow-up to microsoft#24547. Previously, issues with the
pipelines prevented me from modifying these files. Those issues are now resolved.
The
[RotaryEmbedding](https://onnx.ai/onnx/operators/onnx__RotaryEmbedding.html#rotaryembedding)
op has been released in opset 23 and has some differences compared to
the original contributed op:
- The order of input indexes changed
- The position_ids input is optional
- If the input is 3D, the num_heads must be provided
- If it is full rotation, we need to slice the gathered cosine/sine to
get the shape [batch_size, sequence_length, rotary_embedding_dim / 2]
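
The gather-and-slice step in the last bullet can be sketched on nested lists. This is an illustration under assumptions: the cache is taken to be a `[max_position, width]` table indexed by position id, and the helper name `gather_and_slice` is hypothetical.

```python
def gather_and_slice(cache, position_ids, rotary_embedding_dim):
    """Gather cache rows per position, then keep the first rotary_embedding_dim // 2 columns.

    cache:        [max_position, width] nested list (cosine or sine table)
    position_ids: [batch_size, sequence_length] nested list of ints
    returns:      [batch_size, sequence_length, rotary_embedding_dim // 2]
    """
    half = rotary_embedding_dim // 2
    return [[cache[pos][:half] for pos in batch] for batch in position_ids]
```

The same helper would be applied to the cosine and the sine table separately.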
### Description
Fix typos in multiple files


### Motivation and Context

Signed-off-by: co63oc <co63oc@users.noreply.github.com>
### Description
Fix typos in bert_defs.cc and contrib_defs.cc


### Motivation and Context

Signed-off-by: co63oc <co63oc@users.noreply.github.com>
### Description
Use aligned loads and preloading. This gives a ~10% token generation
speedup.

### Motivation and Context
Optimize perf
- Currently, QNN EP supports MaxPool only for rank-4 inputs.
- This change adds support for rank-3 inputs by adding Reshapes before and after the Op so that MaxPool always receives a rank-4 input.
- When a rank-3 input is converted to rank 4, the stride, pads, dilations and kernel size attributes are updated to match.
- Added unit tests that take rank-3 inputs to validate MaxPool on the NPU.

### Description
This change extends the support of QNN EP's MaxPool operation to handle input tensors of rank 3. To achieve this, Reshape Ops are added before and after the MaxPool Op to ensure that the input to MaxPool is always of rank 4, as required. Additionally, the attributes such as stride, pads, dilations, and kernel size are updated accordingly to accommodate the conversion of rank 3 inputs to rank 4. Unit tests have been added to validate the functionality of MaxPool on NPU with rank 3 inputs.

### Motivation and Context
This change makes QNN EP's MaxPool operation more flexible by supporting a broader range of input tensor ranks. Previously, the operation supported only rank-4 inputs, which limited compatibility in some scenarios. Supporting rank-3 inputs removes that limitation and ensures that the MaxPool op is offloaded to the NPU (QNN HTP backend).
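
The attribute conversion can be sketched as below. This is a minimal illustration, not the QNN EP code: it assumes the rank-3 input `[N, C, L]` is reshaped to `[N, C, 1, L]` (a unit height axis), and all function names are hypothetical.

```python
def maxpool_1d_attrs_to_2d(kernel_shape, strides, dilations, pads):
    """Lift 1-D MaxPool attributes to 2-D by adding a unit height axis."""
    (k,), (s,), (d,) = kernel_shape, strides, dilations
    pad_begin, pad_end = pads
    return {
        "kernel_shape": [1, k],
        "strides": [1, s],
        "dilations": [1, d],
        # ONNX pads order for 2-D: [h_begin, w_begin, h_end, w_end]
        "pads": [0, pad_begin, 0, pad_end],
    }


def rank3_to_rank4_shape(shape):
    """Shape for the Reshape inserted before MaxPool: [N, C, L] -> [N, C, 1, L]."""
    n, c, length = shape
    return [n, c, 1, length]


def rank4_to_rank3_shape(shape):
    """Shape for the Reshape inserted after MaxPool: [N, C, 1, L] -> [N, C, L]."""
    n, c, _, length = shape
    return [n, c, length]
```

The unit height axis uses kernel size 1, stride 1, dilation 1 and zero padding, so pooling along it leaves the data unchanged.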
Remove onnxruntime-mlas section
The `symbolFolder` parameter in publish-symbolrequestprod-api.yml
was actually unused. The YAML was copied from another
project and I didn't check the code.
Delete
tools/ci_build/github/azure-pipelines/templates/py-packaging-training-cuda-stage.yml.
…iven. (microsoft#24781)

### Description

Always write to profiling file if `profiling_file_path` is given.

### Motivation and Context

Previously, on Windows, if the ETW path was enabled, the profiling data
was not written to the file even if `profiling_file_path` was given.
I thought that this behavior was confusing.
…ft#24779)

### Description
Adds MultiHeadAttention operator support to MIGraphX EP to leverage the
existing MIGraphX parser and implementation.


### Motivation and Context
Needed for model enablement.
### Description
Enables MIGraphX EP to use MIGraphX's QuickGelu parser and
op.


### Motivation and Context

Required for model support
### Description
Memtype MemHandle is applicable only to graph IO tensors. For other tensors, we can leave it as RAW.


### Motivation and Context
Compose failed for some models because Memtype was set to MemHandle for static tensors.
…soft#24752)

Enable MaxPool Op with "auto_pad" param set as VALID.
VALID runs with all pad values set to 0.

### Description
Remove the assert from QNN_EP for MaxPool Op with "auto_pad" as VALID
since the Op with this config is supported on QNN backend.



### Motivation and Context
QNN_EP rejects MaxPool Op with "auto_pad" set to VALID, reporting that the QNN Pool Op does not support this config.
QNN Pool Op supports auto_pad=VALID and all the pad values are set to 0.
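
For reference, the effect of `auto_pad=VALID` on one spatial dimension can be sketched as follows. This follows the standard zero-padding pooling formula and is not taken from the QNN EP code; the function names are hypothetical, and the input is assumed to be at least as large as the effective kernel.

```python
def maxpool_valid_output_size(input_size, kernel, stride=1, dilation=1):
    """Output length of one spatial dim when no padding is applied (VALID)."""
    effective_kernel = (kernel - 1) * dilation + 1
    # Assumes input_size >= effective_kernel.
    return (input_size - effective_kernel) // stride + 1


def valid_pads(num_spatial_dims):
    """VALID means every begin and end pad is 0."""
    return [0] * (2 * num_spatial_dims)
```

For a 2-D pool, `valid_pads(2)` gives the all-zero pad list that the text above refers to.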

Signed-off-by: quic-ankus <quic_ankus@quicinc.com>
@jatinwadhwa921 jatinwadhwa921 requested a review from ankitm3k May 16, 2025 06:19
@ankitm3k ankitm3k merged commit 080f66b into ovep-develop May 16, 2025
6 of 8 checks passed
@ankitm3k ankitm3k deleted the sync_msft_16_5_25 branch May 16, 2025 06:42