forked from microsoft/onnxruntime
Sync with Microsoft ONNX Runtime - 28/08/2025 #797
Merged
### Description
Fixed the `ORT_API_CALL` macro by replacing `_stdcall` with `__stdcall`.

### Motivation and Context
Recently, I found an issue that prevents ONNX Runtime from being built using the MinGW toolchain on Windows. After investigating, I discovered that the ONNX Runtime C API header contains a typo in the `ORT_API_CALL` preprocessor macro. It is incorrectly defined as `_stdcall` instead of the correct `__stdcall` (with two leading underscores). This causes build failures on compilers like MinGW that are strict about this syntax.
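The role of such a macro can be sketched as follows. This is a minimal illustration, not the ONNX Runtime header: `EXAMPLE_API_CALL`, `ExampleAdd`, `ExampleAddFn`, and `CallThroughPointer` are hypothetical names, and the only assumption taken from the description above is that `__stdcall` (two leading underscores) is the portable spelling on Windows toolchains.

```cpp
// Hypothetical stand-in for a calling-convention macro such as ORT_API_CALL.
// On Windows it expands to __stdcall; on other platforms it expands to nothing.
#if defined(_WIN32)
#define EXAMPLE_API_CALL __stdcall
#else
#define EXAMPLE_API_CALL
#endif

// The macro appears on the function itself...
int EXAMPLE_API_CALL ExampleAdd(int a, int b) { return a + b; }

// ...and on any function-pointer type that refers to it, so both sides agree
// on the calling convention.
using ExampleAddFn = int(EXAMPLE_API_CALL*)(int, int);

int CallThroughPointer(ExampleAddFn fn, int a, int b) { return fn(a, b); }
```

Keeping the convention in a single macro means a spelling fix like the one in this commit only has to be made in one place.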
…ft#25782)
### Description
Allow custom `CMAKE_C_STANDARD` and `CMAKE_CXX_STANDARD`.

Fixes microsoft#25756
### Description
This change adds support for Q2 (2-bit) quantized MatMulNBits in WebGPU.

### Motivation and Context
An alternative way to support BitNet models is to add lower-bit support to MatMulNBits; this reuses our shaders and is more maintainable than a separate op. The model size grows slightly; however, for a 2B-parameter model, the size difference between 1.58 bpw and 2 bpw is only about 100 MB. The simpler dequantization also improves performance: on an Intel Xe GPU, matmul looks to be about 20% faster with Q2 weights than with Q4 weights for the same matrix dimensions.

The Q2 version of the BitNet model is here: https://huggingface.co/sushraja/bitnet-b1.58-2B-4T-fp16-onnx/tree/main/bitnet_q2
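To make the 2-bit scheme and the size estimate concrete, here is a minimal CPU-side sketch. It is not the WebGPU shader from this change: the packing order (four values per byte, lowest bits first), the zero point used in the test, and the names `UnpackQ2`, `DequantizeQ2`, and `WeightBytes` are all assumptions chosen for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumed layout for illustration only: four 2-bit values per byte, lowest bits first.
std::array<std::uint8_t, 4> UnpackQ2(std::uint8_t packed) {
  std::array<std::uint8_t, 4> values{};
  for (std::size_t i = 0; i < values.size(); ++i) {
    values[i] = static_cast<std::uint8_t>((packed >> (2 * i)) & 0x3);
  }
  return values;
}

// Affine dequantization: real = (q - zero_point) * scale.
float DequantizeQ2(std::uint8_t q, float scale, std::uint8_t zero_point) {
  return static_cast<float>(static_cast<int>(q) - static_cast<int>(zero_point)) * scale;
}

// Weight storage in bytes for a given parameter count and bits per weight.
double WeightBytes(double parameter_count, double bits_per_weight) {
  return parameter_count * bits_per_weight / 8.0;
}
```

For 2e9 parameters, `WeightBytes(2e9, 2.0) - WeightBytes(2e9, 1.58)` is about 1.05e8 bytes, which is consistent with the roughly 100 MB difference quoted above.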
In the flash attention algorithm, each thread in a subgroup needs to
access the same range (0-15) of data in workgroup memory `q_tile` and
`v_tile`. If we use `subgroupShuffle`, there will be bank conflicts for
`var k_local = k_tile[capped_sg_id][i];` since the sg_size is 32 and
threads 16-31 access the same bank address.
To avoid the bank conflicts, all threads can read the same
workgroup-memory address directly; this is a broadcast, which is well
optimized on NVIDIA GPUs. We see a ~10% improvement for phi4 prefill (1K
tokens) on an NVIDIA RTX 2000 Ada, and the gain grows as the input
(total_sequence_length) gets longer (~12% for 2K).
Before
```
Batch size: 1, prompt tokens: 1000, tokens to generate: 128
Prompt processing (time to first token):
avg (us): 2.0991e+06
avg (tokens/s): 476.394
p50 (us): 2.08457e+06
stddev (us): 36140.3
n: 5 * 1000 token(s)
Token generation:
avg (us): 25477.8
avg (tokens/s): 39.2498
p50 (us): 25028.2
stddev (us): 4841.89
n: 635 * 1 token(s)
```
After
```
Batch size: 1, prompt tokens: 1000, tokens to generate: 128
Prompt processing (time to first token):
avg (us): 1.91138e+06
avg (tokens/s): 523.183
p50 (us): 1.92379e+06
stddev (us): 44768
n: 5 * 1000 token(s)
Token generation:
avg (us): 25237.2
avg (tokens/s): 39.624
p50 (us): 24860.9
stddev (us): 4874.52
n: 635 * 1 token(s)
```
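The reported throughput figures follow directly from the average prompt-processing latencies above. A small sketch of that arithmetic, with the hypothetical helper names `TokensPerSecond` and `ImprovementPercent`:

```cpp
// Tokens per second from a token count and an average latency in microseconds.
double TokensPerSecond(double tokens, double avg_microseconds) {
  return tokens / (avg_microseconds * 1e-6);
}

// Relative throughput gain, in percent, of `after` over `before`.
double ImprovementPercent(double before, double after) {
  return (after / before - 1.0) * 100.0;
}
```

With the numbers from the two runs, `TokensPerSecond(1000, 2.0991e6)` is about 476.4 and `TokensPerSecond(1000, 1.91138e6)` is about 523.2, so the prefill gain is roughly 9.8%, in line with the ~10% stated for the 1K case.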
Quoting cppreference.com:
```
(the [[noreturn]] attribute) Indicates that the function will not return control flow to the calling function after it finishes (e.g. functions that terminate the application, throw exceptions, loop indefinitely, etc.). This attribute applies to the name of the function being declared in function declarations only. If a function previously declared with `[[noreturn]]` is invoked and that invocation eventually returns, the behavior is runtime-undefined.
```
The `SafeIntOn*` member functions immediately throw, so if they are used in a function with a non-void return type, g++ 14 issues a warning that there exist control paths in the function where no value is returned. Fix this by marking the member functions explicitly noreturn. This is needed so onnxruntime builds correctly with `-Wall -Wextra`.
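A minimal sketch of the pattern, assuming nothing about the real SafeInt handlers: `ThrowOverflow` and `CheckedAdd` are hypothetical names that only mirror the shape of "a handler that always throws, called at the end of a value-returning function".

```cpp
#include <limits>
#include <stdexcept>

// Hypothetical error handler standing in for a SafeInt-style callback.
// [[noreturn]] tells the compiler that control never comes back from this call.
[[noreturn]] void ThrowOverflow() { throw std::overflow_error("integer overflow"); }

int CheckedAdd(int a, int b) {
  const long long wide = static_cast<long long>(a) + static_cast<long long>(b);
  if (wide >= std::numeric_limits<int>::min() && wide <= std::numeric_limits<int>::max()) {
    return static_cast<int>(wide);
  }
  ThrowOverflow();
  // No return statement is needed here: because ThrowOverflow() is [[noreturn]],
  // the compiler does not see a path that falls off the end of the function.
}
```

Without the attribute, the final call looks like an ordinary statement and a missing-return warning on this function would be expected under `-Wall -Wextra`.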
…icrosoft#25832)
This PR addresses accessibility issues with focus indicators on the ONNX Runtime website documentation where contrast ratios were insufficient for keyboard navigation users. The accessibility audit revealed that focus states for key navigation elements like "Learn more about ONNX Runtime & Generative AI", "Quickstart", "Tutorials", "Install ONNX Runtime", and "Hardware Acceleration" had contrast ratios as low as 1.152:1, well below the WCAG 2.1 AA requirement of 3:1 for UI components.

## Changes Made

### 1. Enhanced List Group Item Focus Contrast
- **Before**: `color: #555` on `background-color: #f5f5f5` (6.8:1 ratio)
- **After**: `color: #333` on `background-color: #f5f5f5` (**11.6:1 ratio**)

### 2. Improved Info List Group Item Focus Contrast
- **Before**: `color: #31708f` on `background-color: #c4e3f3` (4.1:1 ratio)
- **After**: `color: #1e4a5f` on `background-color: #c4e3f3` (**7.1:1 ratio**)

### 3. Added Visible Focus Indicators for Form Inputs
Previously, search and filter inputs only removed the default outline (`outline: 0`) without providing alternative focus indicators, making them inaccessible to keyboard users.
- **Added**: `border: 2px solid #0050C5` and `background-color: #f8f9fa` on focus
- **Contrast ratio**: **6.7:1** (exceeds requirements)

## Accessibility Compliance
All changes now exceed WCAG 2.1 AA standards:
- ✅ **3:1 minimum** for UI components and focus indicators
- ✅ **4.5:1 minimum** for normal text (all exceed 7:1)
- ✅ **Keyboard navigation** fully supported with visible focus indicators
- ✅ **Screen reader compatibility** improved with clear focus states

## Impact
- Low vision users can now clearly see focused elements during keyboard navigation
- All mentioned navigation elements meet accessibility standards
- No functionality broken - purely visual accessibility enhancements
- Compliance with MAS 1.4.11 Non-text Contrast requirements

## Files Modified
- `csharp/ApiDocs/_exported_templates/default/styles/docfx.css` - Enhanced input focus indicators
- `csharp/ApiDocs/_exported_templates/default/styles/docfx.vendor.css` - Improved text contrast ratios

Fixes microsoft#24995.

---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: MaanavD <24942306+MaanavD@users.noreply.github.com>
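The contrast ratios quoted in this commit can be reproduced with the WCAG 2.1 relative-luminance formula. The sketch below is an independent calculation, not code from the PR; the function names are hypothetical and colors are passed as `0xRRGGBB` integers.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Linearize one 8-bit sRGB channel as defined for WCAG 2.1 relative luminance.
double LinearizeChannel(std::uint8_t channel) {
  const double c = channel / 255.0;
  return c <= 0.03928 ? c / 12.92 : std::pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a 0xRRGGBB color.
double RelativeLuminance(std::uint32_t rgb) {
  const double r = LinearizeChannel(static_cast<std::uint8_t>((rgb >> 16) & 0xFF));
  const double g = LinearizeChannel(static_cast<std::uint8_t>((rgb >> 8) & 0xFF));
  const double b = LinearizeChannel(static_cast<std::uint8_t>(rgb & 0xFF));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio (lighter + 0.05) / (darker + 0.05); ranges from 1 to 21.
double ContrastRatio(std::uint32_t first, std::uint32_t second) {
  const double a = RelativeLuminance(first);
  const double b = RelativeLuminance(second);
  return (std::max(a, b) + 0.05) / (std::min(a, b) + 0.05);
}
```

`ContrastRatio(0x333333, 0xF5F5F5)` evaluates to roughly 11.6 and `ContrastRatio(0x555555, 0xF5F5F5)` to roughly 6.8, matching the before and after values listed for the list group item.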
…ft#25819)
The DocFX tab controls on onnxruntime.ai were not accessible via keyboard navigation, violating MAS 2.1.1 keyboard accessibility requirements. Users could not navigate between language tabs (Python, C#, Java, JavaScript, C++) using keyboard-only input.

## Problem
The existing implementation in `docfx.js` only handled mouse click events but lacked keyboard event handlers. This prevented keyboard users from:
- Navigating between tabs using arrow keys
- Activating tabs using Enter/Space keys
- Jumping to first/last tabs using Home/End keys

## Solution
Added comprehensive keyboard navigation support following the WAI-ARIA tabs design pattern:
```javascript
// Added keyboard event listener alongside existing click handler
container.addEventListener('keydown', function (event) {
  return handleKeyDown(event, state);
});
```
The `handleKeyDown` function implements:
- **Arrow key navigation**: Left/Right and Up/Down keys move focus between tabs with wrapping
- **Tab activation**: Enter and Space keys activate the focused tab
- **Quick navigation**: Home/End keys jump to first/last tabs
- **Proper focus management**: Only the active tab has `tabIndex="0"`, others have `tabIndex="-1"`
- **Event handling**: `preventDefault()` and `stopPropagation()` for handled keys

## Accessibility Features
- Follows WAI-ARIA tabs pattern specifications
- Maintains proper ARIA attributes (`role="tab"`, `aria-selected`, etc.)
- Provides visual focus indicators via existing CSS
- Supports both horizontal and vertical arrow key navigation
- Implements circular navigation (wrapping at boundaries)

## Testing
Validated functionality with comprehensive keyboard navigation tests:
- ✅ Arrow keys navigate between tabs with proper wrapping
- ✅ Enter/Space keys activate focused tabs and switch content panels
- ✅ Home/End keys jump to first/last tabs correctly
- ✅ Focus management works with proper `tabIndex` handling
- ✅ Visual feedback shows focused vs selected tab states

This ensures keyboard users can fully access all tab functionality without requiring mouse interaction.

Fixes microsoft#24997.

---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: MaanavD <24942306+MaanavD@users.noreply.github.com>
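The wrapping behaviour described for the arrow, Home, and End keys comes down to modular index arithmetic. The sketch below shows only that arithmetic and is written in C++ rather than the JavaScript of `docfx.js`; `NavKey` and `NextTabIndex` are hypothetical names, and the real `handleKeyDown` is not reproduced here.

```cpp
#include <cstddef>

// Hypothetical key categories: Previous covers Left/Up, Next covers Right/Down.
enum class NavKey { Previous, Next, Home, End };

// Returns the index of the tab that should receive focus.
// Assumes tab_count > 0 and current < tab_count.
std::size_t NextTabIndex(std::size_t current, std::size_t tab_count, NavKey key) {
  switch (key) {
    case NavKey::Previous:
      return (current + tab_count - 1) % tab_count;  // wraps from first to last
    case NavKey::Next:
      return (current + 1) % tab_count;              // wraps from last to first
    case NavKey::Home:
      return 0;
    case NavKey::End:
      return tab_count - 1;
  }
  return current;
}
```

Adding `tab_count` before subtracting one keeps the unsigned index from underflowing when focus moves left from the first tab.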
This pull request introduces several improvements and refactorings to the quantized Mixture-of-Experts (QMoE) operator in ONNX Runtime, focusing on enhanced support for FP32 mode, improved SwiGLU activation handling, and better test coverage. The most important changes are grouped below by theme.

### Operator Registration and Type Support
- Added explicit registration and support for `QMoE` operator with both `MLFloat16` and `float` data types, enabling FP32 (non-quantized) mode in addition to quantized modes. This includes updates to kernel registration and schema/type constraints.

### SwiGLU Activation Improvements
- Refactored `ApplySwiGLUActivation` to accept configurable `activation_alpha` and `activation_beta` parameters, matching CUDA behavior and allowing flexibility in activation function tuning. Also, dropped support for non-interleaved memory layouts (now not implemented).
- Now reads `activation_alpha` and `activation_beta` attributes from operator parameters, defaulting to values appropriate for SwiGLU.

### QMoE Operator Implementation Refactor
- Refactored the QMoE operator to clarify separation between quantized and FP32 implementations, and restructured internal methods for better maintainability. Added template parameterization for data types and improved handling of expert weights and biases.

### Shape Checking and Layout
- Removed legacy shape/layout support in QMoE input validation, enforcing only the new memory layout for expert weights and improving consistency and forward compatibility.

### Test and Documentation Updates
- Updated unit tests for QMoE to use correct zero-point values for quantized weights (e.g., 0x88 for int4, 128 for int8), ensuring that test cases accurately reflect expected zero-output behavior for zero weights. Also clarified comments and expected outputs for SwiGLU and quantized scenarios.

These changes collectively improve the flexibility, correctness, and maintainability of the QMoE operator in ONNX Runtime.
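As a rough illustration of a SwiGLU activation with configurable `alpha` and `beta` over an interleaved layout, consider the sketch below. The formula `gate * sigmoid(alpha * gate) * (linear + beta)` and the `[gate0, linear0, gate1, linear1, ...]` ordering are assumptions made for this example; the commit text does not spell out the exact definition used by `ApplySwiGLUActivation`, and `SwiGLUInterleaved` is a hypothetical name.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed formula for illustration: out = gate * sigmoid(alpha * gate) * (linear + beta).
// Assumed interleaved layout: input = [gate0, linear0, gate1, linear1, ...].
std::vector<float> SwiGLUInterleaved(const std::vector<float>& input, float alpha, float beta) {
  std::vector<float> output(input.size() / 2);
  for (std::size_t i = 0; i < output.size(); ++i) {
    const float gate = input[2 * i];
    const float linear = input[2 * i + 1];
    const float sigmoid = 1.0f / (1.0f + std::exp(-alpha * gate));
    output[i] = gate * sigmoid * (linear + beta);
  }
  return output;
}
```

With `alpha = 1` and `beta = 0` this reduces to the familiar `silu(gate) * linear` form, and a zero gate always yields a zero output.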
Unit test result
```
sRunning test: batch_size=1, sequence_length=8, quant_bits=4, use_swiglu=True, swiglu_interleaved=True
Parity check - SwiGLU(interleaved=True) 4-bit: max_diff = 0.000372
.Running test: batch_size=1, sequence_length=8, quant_bits=8, use_swiglu=True, swiglu_interleaved=True
Parity check - SwiGLU(interleaved=True) 8-bit: max_diff = 0.000392
.Running test: batch_size=1, sequence_length=32, quant_bits=4, use_swiglu=True, swiglu_interleaved=True
Parity check - SwiGLU(interleaved=True) 4-bit: max_diff = 0.000470
.Running test: batch_size=1, sequence_length=32, quant_bits=8, use_swiglu=True, swiglu_interleaved=True
Parity check - SwiGLU(interleaved=True) 8-bit: max_diff = 0.000442
.Running test: batch_size=4, sequence_length=8, quant_bits=4, use_swiglu=True, swiglu_interleaved=True
Parity check - SwiGLU(interleaved=True) 4-bit: max_diff = 0.000470
.Running test: batch_size=4, sequence_length=8, quant_bits=8, use_swiglu=True, swiglu_interleaved=True
Parity check - SwiGLU(interleaved=True) 8-bit: max_diff = 0.000442
.Running test: batch_size=4, sequence_length=32, quant_bits=4, use_swiglu=True, swiglu_interleaved=True
Parity check - SwiGLU(interleaved=True) 4-bit: max_diff = 0.000609
.Running test: batch_size=4, sequence_length=32, quant_bits=8, use_swiglu=True, swiglu_interleaved=True
Parity check - SwiGLU(interleaved=True) 8-bit: max_diff = 0.000702
.
----------------------------------------------------------------------
Ran 9 tests in 46.754s

OK (skipped=1)
```
---------
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
…ile build (microsoft#25849)
### Description
`ABSL_FLAGS_STRIP_NAMES` is set to 1 by default, which disables flag registration when building for Android, iPhone, and "embedded devices". As a result, running onnxruntime_perf_test on Android reports that flags are not registered.

<img width="872" height="182" alt="image (2)" src="https://github.com/user-attachments/assets/eb6a6772-cdff-4d60-a3c7-4352477e956c" />

Set `ABSL_FLAGS_STRIP_NAMES` to 0 by default for all builds.
### Description
The phi4 mini model in Edge uses ai.onnx v21. Without this change, a `MemcpyToHost` node is inserted, which slows down generation.

This change uses subgroupShuffle for sg_size=64 to perform the matmul. It also uses a loop instead of loop unrolling to reduce register pressure. Phi4 prefill for 1K tokens drops from 11.32s to 8.8s on a Qualcomm Adreno X1-85 GPU.
### Description
This change adds CUDA Graph support to the NV TensorRT RTX Execution Provider (EP).

### Motivation and Context
Integrating CUDA Graphs into the NV TRT RTX EP provides:
- Lower latency by minimizing per-kernel launch overhead.
- Better throughput for repeated inference runs.
- Improved efficiency on GPUs that are sensitive to kernel launch overhead.

---------
Co-authored-by: Maximilian Mueller <maximilianm@nvidia.com>
Co-authored-by: Gaurav Garg <gaugarg@nvidia.com>
### Description
Enable the einsum op with QK equations for attention in the QNN EP.

### Motivation and Context
The current einsum op in QNN doesn't support equations with uppercase letters. Loosen this constraint to allow more use cases.

Signed-off-by: Mu-Chein Hsu <quic_muchhsu@quicinc.com>
…#25833)
### Description
While memory profiling some models, I noticed multiple file mapping failures in `WindowsEnv::MapFileIntoMemory()`. Although it correctly checks that the mapping offset is granularity aligned, it calculates the offset as page aligned.

Also, when saving external tensors, we do not need to align big tensors to the Windows allocation granularity or to anything else that is platform dependent. Set the alignment to 4096 for all platforms. Granularity matters only for calculating the mapping address.

### Motivation and Context
Multiple file mapping failures for certain models. This also saves some hundreds of MB for some models.
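The difference between page alignment and granularity alignment can be shown with plain integer arithmetic. This is a sketch, not the `WindowsEnv` code: `AlignDown`, `AlignUp`, `MappingRequest`, and `PlanMapping` are hypothetical names, and the 65536-byte granularity used in the usage note is the value commonly reported by Windows, assumed here for illustration.

```cpp
#include <cstdint>

// Round value down to a multiple of alignment (alignment must be non-zero).
constexpr std::uint64_t AlignDown(std::uint64_t value, std::uint64_t alignment) {
  return value - (value % alignment);
}

// Round value up to a multiple of alignment.
constexpr std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
  return AlignDown(value + alignment - 1, alignment);
}

// Hypothetical result type: where to start the mapping, and how many bytes to
// skip inside the mapped view to reach the requested file offset.
struct MappingRequest {
  std::uint64_t mapped_offset;
  std::uint64_t bytes_to_skip;
};

MappingRequest PlanMapping(std::uint64_t file_offset, std::uint64_t granularity) {
  const std::uint64_t base = AlignDown(file_offset, granularity);
  return {base, file_offset - base};
}
```

For a tensor at file offset 70000, page alignment gives `AlignDown(70000, 4096) == 69632`, which is not a multiple of 65536, whereas `PlanMapping(70000, 65536)` starts the view at 65536 and skips 4464 bytes. This mirrors the kind of mismatch described above.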
### Description
Fix packaging pipelines.

### Motivation and Context
During CI and local builds, `Ort::Status()` gets inherited from the base due to using directives; however, that does not work for the packaging pipelines. Having a default constructor is important for storing `Status` in containers if needed.
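Why a default constructor matters for containers can be seen with a small stand-in type. `ExampleStatus` and `LookupIsOk` below are hypothetical and unrelated to the real `Ort::Status` implementation; the sketch only assumes a status object that owns an optional error message.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical status type: a null message means "OK". Not the real Ort::Status.
class ExampleStatus {
 public:
  ExampleStatus() = default;  // required by std::map::operator[] and vector::resize
  explicit ExampleStatus(std::string message)
      : message_(std::make_unique<std::string>(std::move(message))) {}

  bool IsOk() const { return message_ == nullptr; }

 private:
  std::unique_ptr<std::string> message_;
};

// operator[] default-constructs an entry when the key is missing, so this only
// compiles if ExampleStatus has an accessible default constructor.
bool LookupIsOk(std::map<std::string, ExampleStatus>& statuses, const std::string& key) {
  return statuses[key].IsOk();
}
```

If the default constructor were removed, `statuses[key]` would fail to compile, which is the kind of breakage an explicitly declared default constructor avoids.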
Bumps [actions/setup-java](https://github.com/actions/setup-java) from 4 to 5. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/actions/setup-java/releases">actions/setup-java's releases</a>.</em></p> <blockquote> <h2>v5.0.0</h2> <h2>What's Changed</h2> <h3>Breaking Changes</h3> <ul> <li>Upgrade to node 24 by <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/888">actions/setup-java#888</a></li> </ul> <p>Make sure your runner is updated to this version or newer to use this release. v2.327.1 <a href="https://github.com/actions/runner/releases/tag/v2.327.1">Release Notes</a></p> <h3>Dependency Upgrades</h3> <ul> <li>Upgrade Publish Immutable Action by <a href="https://github.com/HarithaVattikuti"><code>@HarithaVattikuti</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/798">actions/setup-java#798</a></li> <li>Upgrade eslint-plugin-jest from 27.9.0 to 28.11.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/setup-java/pull/730">actions/setup-java#730</a></li> <li>Upgrade undici from 5.28.5 to 5.29.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/setup-java/pull/833">actions/setup-java#833</a></li> <li>Upgrade form-data to bring in fix for critical vulnerability by <a href="https://github.com/gowridurgad"><code>@gowridurgad</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/887">actions/setup-java#887</a></li> <li>Upgrade actions/checkout from 4 to 5 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/setup-java/pull/896">actions/setup-java#896</a></li> </ul> <h3>Bug Fixes</h3> <ul> <li>Prevent default installation of JetBrains pre-releases by <a 
href="https://github.com/priyagupta108"><code>@priyagupta108</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/859">actions/setup-java#859</a></li> <li>Improve Error Handling for Setup-Java Action to Help Debug Intermittent Failures by <a href="https://github.com/gowridurgad"><code>@gowridurgad</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/848">actions/setup-java#848</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/gowridurgad"><code>@gowridurgad</code></a> made their first contribution in <a href="https://redirect.github.com/actions/setup-java/pull/848">actions/setup-java#848</a></li> <li><a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> made their first contribution in <a href="https://redirect.github.com/actions/setup-java/pull/888">actions/setup-java#888</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/setup-java/compare/v4...v5.0.0">https://github.com/actions/setup-java/compare/v4...v5.0.0</a></p> <h2>v4.7.1</h2> <h2>What's Changed</h2> <h3>Documentation changes</h3> <ul> <li>Add Documentation to Recommend Using GraalVM JDK 17 Version to 17.0.12 to Align with GFTC License Terms by <a href="https://github.com/aparnajyothi-y"><code>@aparnajyothi-y</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/704">actions/setup-java#704</a></li> <li>Remove duplicated GraalVM section in documentation by <a href="https://github.com/Marcono1234"><code>@Marcono1234</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/716">actions/setup-java#716</a></li> </ul> <h3>Dependency updates:</h3> <ul> <li>Upgrade <code>@action/cache</code> from 4.0.0 to 4.0.2 by <a href="https://github.com/aparnajyothi-y"><code>@aparnajyothi-y</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/766">actions/setup-java#766</a></li> <li>Upgrade <code>@actions/glob</code> from 0.4.0 to 0.5.0 by <a 
href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/744">actions/setup-java#744</a></li> <li>Upgrade ts-jest from 29.1.2 to 29.2.5 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/743">actions/setup-java#743</a></li> <li>Upgrade <code>@action/cache</code> to 4.0.3 by <a href="https://github.com/aparnajyothi-y"><code>@aparnajyothi-y</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/773">actions/setup-java#773</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/setup-java/compare/v4...v4.7.1">https://github.com/actions/setup-java/compare/v4...v4.7.1</a></p> <h2>v4.7.0</h2> <h2>What's Changed</h2> <ul> <li>Configure Dependabot settings by <a href="https://github.com/HarithaVattikuti"><code>@HarithaVattikuti</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/722">actions/setup-java#722</a></li> <li>README Update: Added a permissions section by <a href="https://github.com/benwells"><code>@benwells</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/723">actions/setup-java#723</a></li> <li>Upgrade <code>cache</code> from version 3.2.4 to 4.0.0 by <a href="https://github.com/aparnajyothi-y"><code>@aparnajyothi-y</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/724">actions/setup-java#724</a></li> <li>Upgrade <code>@actions/http-client</code> from 2.2.1 to 2.2.3 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/728">actions/setup-java#728</a></li> <li>Upgrade <code>actions/publish-immutable-action</code> from 0.0.3 to 0.0.4 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/727">actions/setup-java#727</a></li> 
<li>Upgrade <code>@types/jest</code> from 29.5.12 to 29.5.14 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/729">actions/setup-java#729</a></li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/actions/setup-java/commit/dded0888837ed1f317902acf8a20df0ad188d165"><code>dded088</code></a> Bump actions/checkout from 4 to 5 (<a href="https://redirect.github.com/actions/setup-java/issues/896">#896</a>)</li> <li><a href="https://github.com/actions/setup-java/commit/0913e9a06eb8b69c62db76aa61f580c2b3a5b4e0"><code>0913e9a</code></a> Upgrade to node 24 (<a href="https://redirect.github.com/actions/setup-java/issues/888">#888</a>)</li> <li><a href="https://github.com/actions/setup-java/commit/e9343db97e09d87a3c50e544105d99fe912c204b"><code>e9343db</code></a> Bumps form-data (<a href="https://redirect.github.com/actions/setup-java/issues/887">#887</a>)</li> <li><a href="https://github.com/actions/setup-java/commit/ae2b61dbc685e60e4427b2e8ed4f0135c6ea8597"><code>ae2b61d</code></a> Bump undici from 5.28.5 to 5.29.0 (<a href="https://redirect.github.com/actions/setup-java/issues/833">#833</a>)</li> <li><a href="https://github.com/actions/setup-java/commit/c190c18febcf6c040d80b10ea201a05a2c320263"><code>c190c18</code></a> Bump eslint-plugin-jest from 27.9.0 to 29.0.1 (<a href="https://redirect.github.com/actions/setup-java/issues/730">#730</a>)</li> <li><a href="https://github.com/actions/setup-java/commit/67aec007b3fcabe15ca665bfccc1e255dd52e30d"><code>67aec00</code></a> Fix: prevent default installation of JetBrains pre-releases (<a href="https://redirect.github.com/actions/setup-java/issues/859">#859</a>)</li> <li><a href="https://github.com/actions/setup-java/commit/ebb356cc4e59bcf94f518203228485f5d40e4b58"><code>ebb356c</code></a> Improve Error Handling for Setup-Java Action to Help 
Debug Intermittent Failu...</li> <li><a href="https://github.com/actions/setup-java/commit/f4f1212c880fdec8162ea9a6493f4495191887b4"><code>f4f1212</code></a> Update publish-immutable-actions.yml (<a href="https://redirect.github.com/actions/setup-java/issues/798">#798</a>)</li> <li>See full diff in <a href="https://github.com/actions/setup-java/compare/v4...v5">compare view</a></li> </ul> </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…at info (microsoft#25841)
### Description
This PR adds a new API that applications can use to verify compatibility of a precompiled model with the underlying system, using only the compatibility info string from the model's metadata.

### Motivation and Context
- This is a feature to enable apps to check compatibility of a precompiled model without necessarily having the model locally on the device. This enables precompiled models to be stored remotely and downloaded once the application has been able to confirm the validity of a given model with EPs on the device.

### Testing
- New unit tests pass
- For regression testing, built a private version of WinML + AMD NPU EP with these changes. Ran the Cpp Selfcontained Desktop sample successfully; ran with compilation and also re-ran using the already-compiled model to verify that session initialization continued to work as expected.

---------
Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>
)
### Description
According to the [WebNN spec](https://www.w3.org/TR/webnn/#api-mlgraphbuilder-batchnorm), batchNorm should use the input names "mean" and "variance" instead of "input_mean" and "input_var".

### Motivation and Context
This issue causes any BatchNorm with mean/variance inputs to fall back to Wasm.
…dLayerNorm (microsoft#25850)
### Description
Use shaders similar to those of SkipSimplifiedLayerNorm in SimplifiedLayerNorm to fix the performance issues with SimplifiedLayerNorm.

### Motivation and Context
Prior to this change, generation in BitNet was bottlenecked on SimplifiedLayerNorm:

<img width="332" height="378" alt="image" src="https://github.com/user-attachments/assets/3bc16ac1-ef7d-46bf-b403-92fc9192a2df" />

With this change, performance has improved to match SkipSimplifiedLayerNorm:

<img width="699" height="179" alt="image" src="https://github.com/user-attachments/assets/30009d85-d5d9-4585-987a-b39ecf52e0b5" />
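For readers unfamiliar with the op, simplified layer normalization is commonly understood as RMS normalization: each element is divided by the root mean square of the row and multiplied by a learned scale. The sketch below is a plain CPU reference under that assumption, not the WebGPU shader changed in this commit; `SimplifiedLayerNormRow` is a hypothetical name.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Reference RMS normalization of one row:
//   y[i] = x[i] / sqrt(mean(x^2) + epsilon) * scale[i]
// Assumes x is non-empty and scale has the same length as x.
std::vector<float> SimplifiedLayerNormRow(const std::vector<float>& x,
                                          const std::vector<float>& scale,
                                          float epsilon) {
  float sum_of_squares = 0.0f;
  for (float value : x) {
    sum_of_squares += value * value;
  }
  const float mean_square = sum_of_squares / static_cast<float>(x.size());
  const float inverse_rms = 1.0f / std::sqrt(mean_square + epsilon);

  std::vector<float> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] = x[i] * inverse_rms * scale[i];
  }
  return y;
}
```

The reduction over the row (the sum of squares) is the part a GPU shader has to parallelize, which is why the choice of shader structure affects performance so strongly.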
…s int32 (microsoft#25646)
### Description
This PR makes DequantizeLinear support a non-zero zero_point when the input data type is int32.

### Motivation and Context
For the WebNN use case, there are scenarios in which the input data type is int32 and the zero_point for DequantizeLinear is not zero.
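The computation involved is the standard per-tensor dequantization `y = (x - zero_point) * scale`. The sketch below is a scalar reference for the int32 case and not the WebNN EP code; `DequantizeLinearInt32` is a hypothetical name, and the subtraction is done in 64 bits as a precaution chosen here so that extreme int32 inputs cannot overflow.

```cpp
#include <cstdint>
#include <vector>

// Per-tensor dequantization of int32 data: y = (x - zero_point) * scale.
std::vector<float> DequantizeLinearInt32(const std::vector<std::int32_t>& x,
                                         float scale,
                                         std::int32_t zero_point) {
  std::vector<float> y;
  y.reserve(x.size());
  for (std::int32_t value : x) {
    // Widen before subtracting so the difference cannot overflow int32.
    const std::int64_t centered =
        static_cast<std::int64_t>(value) - static_cast<std::int64_t>(zero_point);
    y.push_back(static_cast<float>(centered) * scale);
  }
  return y;
}
```

With `zero_point = 10` and `scale = 0.5`, the inputs `{10, 12, 8}` dequantize to `{0, 1, -1}`; with a zero point of 0 the same inputs would give `{5, 6, 4}`, which shows why ignoring a non-zero zero point changes the result.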
preetha-intel
approved these changes
Aug 28, 2025
preetha-intel
left a comment
Backmerging with Master
Description
Synchronizing intel/onnxruntime ovep-develop branch with latest changes from microsoft/onnxruntime master branch.