Backmerging with Msft commits #762

jatinwadhwa921 · 2025-07-29T06:34:08Z

Backmerging with Msft commits

### Description  Add new allocator type of OrtReadOnlyAllocator to enable providing a separate allocator that is only used for initializers. Update the SessionState logic to support this allocator type being provided, and use it when doing device allocations for initializers. ### Motivation and Context  Performance.

### Description This PR patches the features provided for this PR microsoft#25476, this provides a stable fix for the GPU plugin with upcoming OV toolkit v2025.2.1 --------- Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: bfilipek <bartlomiej.filipek@intel.com> Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com> Co-authored-by: n1harika <niharika.sathish@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel.com> Co-authored-by: Jaskaran Singh Nagi <jaskaran.singh.nagi@intel.com> Co-authored-by: Eric Crawford <eric.r.crawford@intel.com> Co-authored-by: Sushanth Rajasankar <44513542+sushraja-msft@users.noreply.github.com> Co-authored-by: Scott McKay <skottmckay@gmail.com> Co-authored-by: Seungtaek Kim <seungtaek.kim.94@gmail.com> Co-authored-by: co63oc <co63oc@users.noreply.github.com> Co-authored-by: Jambay Kinley <jambaykinley@microsoft.com> Co-authored-by: Hector Li <hecli@microsoft.com> Co-authored-by: Jian Chen <cjian@microsoft.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com> Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com> Co-authored-by: Alessio Soldano <services@soldano.it> Co-authored-by: Changming Sun <chasun@microsoft.com> Co-authored-by: Ashish Garg <quic_ashigarg@quicinc.com> Co-authored-by: Ashish Garg <ashigarg@qti.qualcomm.com> Co-authored-by: Jie Chen <jie.a.chen@intel.com> Co-authored-by: wp <webgraphics@intel.com> Co-authored-by: Satya Kumar Jandhyala <satya.k.jandhyala@gmail.com> Co-authored-by: Prathik Rao <prathik.rao@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: Jianhui Dai <jianhui.j.dai@intel.com> Co-authored-by: xhcao <xinghua.cao@intel.com> Co-authored-by: Wanming Lin <wanming.lin@intel.com> Co-authored-by: Mark Schofield <mschofie@microsoft.com> Co-authored-by: jiangzhaoming 
<zhaoming.jiang@microsoft.com> Co-authored-by: Yi-Hong Lyu <yilyu@microsoft.com> Co-authored-by: vraspar <vrajang@outlook.com> Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com> Co-authored-by: saurabh <saurabh1.kale@intel.com> Co-authored-by: Ranjit Ranjan <165394499+ranjitshs@users.noreply.github.com> Co-authored-by: Baiju Meswani <bmeswani@microsoft.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com> Co-authored-by: Pallavi Gupta <pallavi.gupta@intel.com> Co-authored-by: Nikolay Proshunin <nikolay.proshunin@intel.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: Javier Martinez <javier.e.martinez@intel.com> Co-authored-by: Bartlomiej Filipek <bartlomiej.filipek@intel.com> Co-authored-by: bopeng1234 <bo.peng@intel.com> Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com> Co-authored-by: TejalKhade28 <tejal.khade@intel.com> Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com> Co-authored-by: Yaru Du <yaru.du@intel.com> Co-authored-by: Ryan Metcalfe <107415876+RyanMetcalfeInt8@users.noreply.github.com> Co-authored-by: Dvoretckii, Mikhail <mikhail.dvoretckii@intel.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com> Co-authored-by: Fei Chen <feich@microsoft.com> Co-authored-by: qti-yuduo <yuduow@qti.qualcomm.com> Co-authored-by: Akupadhye <aupadhye@qti.qualcomm.com> Co-authored-by: Wang Ning <ning4.wang@intel.com> Co-authored-by: Maximilian Müller <44298237+gedoensmax@users.noreply.github.com> Co-authored-by: George Wu <jywu@microsoft.com> Co-authored-by: quic-calvnguy <quic_calvnguy@quicinc.com> Co-authored-by: Wei-Sheng Chin <wschin@outlook.com> Co-authored-by: quic-hungjuiw <quic_hungjuiw@quicinc.com> Co-authored-by: Ian Hunter <ianfhunter@gmail.com> Co-authored-by: kunal-vaishnavi 
<115581922+kunal-vaishnavi@users.noreply.github.com> Co-authored-by: Jeff Kilpatrick <jkilpatrick@qti.qualcomm.com> Co-authored-by: Jeff Kilpatrick <jkilpat@qti.qualcomm.com> Co-authored-by: Nenad Banfic <46795300+nenad1002@users.noreply.github.com> Co-authored-by: derdeljan-msft <derdeljan@microsoft.com> Co-authored-by: Ryan Metcalfe <ryan.metcalfe@intel.com>

Previously the machine pool had a user-assigned managed identity (UMI) that was used for accessing the blob storage. The UMI has now been removed to improve security, so we baked the data into the VM image instead.

…tValues (microsoft#25482)

### Description

- Adds APIs to get information (file path, file offset, byte size) for initializers with data in external files. This allows EPs to do their own custom memory-mapping of initializer data. By default, EPs that don't have specific requirements can still use `ValueInfo_GetInitializerValue` to get an `OrtValue` with memory-mapped initializer data.
- Updates `OrtGraph` to only load `OrtValue` for external initializers on demand. This prevents having to memory map all external initializers before the first call to `OrtEp::GetCapability`.

Follow up to microsoft#25320

New API functions:

| Function | Summary |
|-----------|--------------|
| `ValueInfo_GetExternalInitializerInfo` | Get `OrtExternalInitializerInfo` from `OrtValueInfo` (or `NULL`). Must be released with `ReleaseExternalInitializerInfo` |
| `ReleaseExternalInitializerInfo` | Releases the `OrtExternalInitializerInfo` instance |
| `ExternalInitializerInfo_GetFilePath` | Returns the relative path to the file that stores the initializer's data |
| `ExternalInitializerInfo_GetFileOffset` | Returns the byte offset within the file where the initializer's data is stored |
| `ExternalInitializerInfo_GetByteSize` | Returns the size in bytes of the initializer's data within the file |

### Motivation and Context

---------

Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
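The triple described for external initializers (file path, file offset, byte size) is enough for a consumer to map only the bytes it needs. The following is a minimal sketch in Python that does not call the ORT API: `ExternalInfo`, `map_initializer`, and `demo` are hypothetical names standing in for the values the getters return, and the file path is assumed to be relative to the model's directory.

```python
import mmap
import os
import tempfile
from typing import NamedTuple


class ExternalInfo(NamedTuple):
    """Hypothetical stand-in for the values returned by the three getters."""
    file_path: str    # relative path to the external data file
    file_offset: int  # byte offset of the initializer within the file
    byte_size: int    # size in bytes of the initializer's data


def map_initializer(model_dir: str, info: ExternalInfo) -> bytes:
    """Memory-map the data file and copy out only the initializer's bytes."""
    full_path = os.path.join(model_dir, info.file_path)
    with open(full_path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            end = info.file_offset + info.byte_size
            if end > len(mapped):
                raise ValueError("initializer extends past the end of the file")
            return mapped[info.file_offset:end]


def demo() -> bytes:
    """Write a small fake data file and read back one 'initializer'."""
    with tempfile.TemporaryDirectory() as model_dir:
        with open(os.path.join(model_dir, "weights.bin"), "wb") as f:
            f.write(b"\x00" * 16 + b"ABCDEFGH" + b"\x00" * 8)
        return map_initializer(model_dir, ExternalInfo("weights.bin", 16, 8))
```

A real EP would likely keep the mapping alive and hand the device a view into it rather than copying the bytes out; the copy here only keeps the sketch self-contained.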

### Description Use the license file from QNN SDK to make sure it's up to date. --------- Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>

…nt. (microsoft#25465) ### Description  Add arena that uses EP API so that an EP library can be self-sufficient. Remove cross stream sharing from BFCArena. Nothing is using it and it creates a dependency on synchronizing streams inside the arena implementation. Tried to simplify the Stream/Notification usage. Current setup adds an AllocOnStream to OrtAllocator. There's no stream aware Free at this point as ORT does not attach the Stream to the memory usage so can't pass it in to the Free call. ### Motivation and Context  If ORT adds BFCArena to an OrtAllocator from the EP we have OrtAllocator -> IAllocator wrapper -> BFCArena IAllocator [-> OrtAllocator wrapper for external usage]. The EP managing its own arena is much simpler. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

…icrosoft#25390) ### Description  Adjusts the concat operator to batch inputs based on maxStorageBuffersPerShaderStage, allowing an unlimited number of inputs. ### Motivation and Context  Fixes the patchtst model for transformers.js <img width="960" height="367" alt="{31C75CD1-7A7D-48E3-A090-FB153925D165}" src="https://github.com/user-attachments/assets/f5772709-80b7-4a05-8927-40f496be908c" />
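The batching idea can be sketched as follows. This is an illustration in plain Python, not the WebGPU shader code: it assumes 1-D inputs and assumes that each pass reserves one storage-buffer binding for the output, so at most `max_storage_buffers - 1` inputs are handled per pass. The function name `concat_in_batches` is hypothetical.

```python
from typing import List, Sequence


def concat_in_batches(inputs: Sequence[Sequence[float]],
                      max_storage_buffers: int) -> List[float]:
    """Concatenate 1-D inputs using several passes of bounded size.

    Assumption: each pass binds one buffer for the output, leaving
    max_storage_buffers - 1 bindings for inputs.
    """
    inputs_per_pass = max_storage_buffers - 1
    if inputs_per_pass < 1:
        raise ValueError("need at least two storage buffers per pass")

    output = [0.0] * sum(len(x) for x in inputs)
    offset = 0  # where the next tensor is written in the output
    for start in range(0, len(inputs), inputs_per_pass):
        batch = inputs[start:start + inputs_per_pass]
        # One simulated dispatch: copy this batch into the shared output.
        for tensor in batch:
            output[offset:offset + len(tensor)] = tensor
            offset += len(tensor)
    return output
```

Because every pass writes at a running offset into the same output, the number of inputs is bounded only by the number of passes, not by the per-stage buffer limit.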

### Description Implements Attention(23) for CPU. The backend tests from onnx were wrong for Attention (see onnx/onnx#7142), so the onnx version needs to be updated to make all tests pass. The implementation matches the reference implementation after onnx was fixed. --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Ti-Tai Wang <titaiwang@microsoft.com> Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>

@chilo-ms

…t are not correctly excluded (microsoft#25502) ### Description This change respects initializers that are external but already loaded in memory. This is required due to an optimization that leaves it to the backend to read a mapped memory area. @chilo-ms can you help run the CI and merge this change? --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

### Description  1. Implemented the required changes for the EP factory. ### Motivation and Context  These changes are required for WinML GA.

### Description

Fixes for the OrtGraphToProto utilities that EPs can copy and modify:

- When serializing `OrtGraph` to ONNX protobuf, do not set an `onnx::TensorShapeProto` for `onnx::ValueInfo` if the shape has no dimension entries. Otherwise, the shape incorrectly looks like a scalar.
- Add `ORT_OP_ATTR_GRAPH` to the enum values returned by the `OpAttr_GetType` C API function. This allows the OrtGraphToProto utilities to skip processing subgraph attributes, which can be retrieved via a different API, but return an error on any unsupported attribute type.

### Motivation and Context

…ft#25534) ### Description 1. Upgrade onnxruntime-Ubuntu2204-AMD-CPU machine pool to Ubuntu 24.04, which can fix some vulnerability management issues. 2. Fix some packaging pipeline issues and remove some unused code blocks from dml-vs-2022.yml

…#25484) WebNN requires the zeroPoint and scale of a QDQ op to have the same shape. However, ONNX allows [1] as a scalar shape, and some models use [1] as the shape of x_zero_point. We should explicitly set the shape of scale to that of x_zero_point.

Co-authored-by: microsoft-github-policy-service[bot] <77245923+microsoft-github-policy-service[bot]@users.noreply.github.com>

…KleidiAI (microsoft#25187)

This PR introduces the initial integration of KleidiAI-optimized microkernels into ONNX Runtime's MLAS backend, focusing on support for:

- SGEMM
- IGEMM
- Dynamic Quantized MatMuls

Key changes: Implements overrides for MlasGemmBatch, MlasGemmPackBSize, and MlasGemmPackB using KleidiAI where applicable. Applies dispatch logic based on TransA == CblasNoTrans and SME2 availability. Supports float32 and int8 GEMM workloads with conditionally invoked SME2 paths. Maintains fallback paths to default MLAS implementations to ensure coverage and stability.

**Known Issues / Next Steps:** Requesting feedback specifically on the API structure: Does the new MLAS interface design align with long-term extensibility? Are the dispatch points and override boundaries well-structured?

Indicative Performance figures: The kernels added are particularly effective for Conv2D operators:

* Based on KleidiAI SME running mobilenet_v1_ssd_f32 on Mac Mini M4 on a single thread

<img width="815" height="308" alt="image" src="https://github.com/user-attachments/assets/e39a7fef-1370-4332-83a3-1f3a80b29da4" />

---------

Signed-off-by: Damien Dooley <damien.dooley@arm.com>
Co-authored-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
Co-authored-by: Declan Flavin <declan.flavin@arm.com>
Co-authored-by: Colm Donelan <colm.donelan@arm.com>
Co-authored-by: Damien Dooley <damdoo01@ip-10-249-28-46.eu-west-1.compute.internal>


### Description

- LPBQ encoding is Qualcomm's alternative quantization encoding format for Block Quantization.
- Add translation logic to read the LPBQ pattern on MatMul weights in a QDQ ONNX model exported by AIMET Quantizer.
- Prepare the corresponding QNN quantization param for applying LowPowerBlockQuantization on MatMul weights.
- Apply LPBQ fusions only for the NPU backend, as currently only the NPU backend supports the LPBQ encoding format.

### Motivation and Context

- This is required to accelerate accuracy-sensitive large language models like Phi-3.5 efficiently on Qualcomm's NPU accelerator.

### Description Corrected dtype_name for the respective float16 implementations. Previously MLFloat16 would return bf16 rather than fp16, and vice versa. ### Motivation and Context It looked wrong but passed the tests. I don't understand the test suite well enough to improve it, but I'd be willing to act on any pointers.

It reduces the pipeline time by about 30 minutes. The tests still take about 1 hour, which should also be reduced.

…ded but not constant. (microsoft#25544) ### Description  In DynamicQuantizeMatMul KleidiAI-specific prepacking logic, handle case where B zero point input is provided but not constant. In this case, we should not prepack. Add some unit tests that test the prepacking code path. Add check for ARM SME instructions in DynamicQuantizeMatMul before calling `MlasDynamicQGemmBatch()` and associated functions. ### Motivation and Context  Follow up to microsoft#25187

### Description ### Motivation and Context Fix the build break on Windows+Ninja

### Description Fixes the packaging pipeline. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

This PR uses the existing RunOption `gpu_graph_id` to control whether graph capture is skipped. When the WebGPU EP option `enableGraphCapture` is enabled, setting `gpu_graph_id = -1` in RunOption skips graph capture; any other value takes the graph capture path for each `session.run`. If `gpu_graph_id` is not specified in RunOption, the value of `enableGraphCapture` decides whether the graph capture path is taken.
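The decision described in this change can be summarized as a small truth table. The sketch below is a Python illustration only, not the WebGPU EP code: `should_capture_graph` is a hypothetical name, `None` models an unspecified `gpu_graph_id`, and the case where `enableGraphCapture` is disabled but `gpu_graph_id` is set is assumed to mean no capture, which the description does not state explicitly.

```python
from typing import Optional


def should_capture_graph(enable_graph_capture: bool,
                         gpu_graph_id: Optional[int]) -> bool:
    """Return True when a run should take the graph capture path."""
    if not enable_graph_capture:
        # Assumption: with the EP option off, the run option cannot turn
        # capture on.
        return False
    if gpu_graph_id is None:
        # Not specified in RunOption: follow the EP option, which is enabled.
        return True
    # -1 is the sentinel that skips capture for this run.
    return gpu_graph_id != -1
```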

### Description  Refactor to split out classes and make things easier to find. ### Motivation and Context  Cleanup

…ml (microsoft#25552) ### Description Yesterday I updated the machine images, which now have Python preinstalled, so the installation steps are no longer needed. Remove them to avoid conflicts. Also refactor the YAML file a little: the templates now use parameterized Python versions instead of a matrix strategy.

Additional equation support for QNN EP on einsum op.

- **DynamicQuantizeMatMul - handle case where B zero point input is provided but not constant. (microsoft#25544)**
- **Refactor plugin EP support (microsoft#25541)**
- **Remove the python installation steps from win-qnn-arm64-ci-pipeline.yml (microsoft#25552)**

### Description This change is based on microsoft#25135. Upgrade xnnpack and several related third-party dependencies, including pthreadpool, cpuinfo, and kleidiai. This change also updates the xnnpack execution provider code to accommodate changes in the xnnpack API. Average pooling qu8 is removed because the corresponding microkernel no longer seems to exist in xnnpack.

…te (microsoft#25553) This PR fixes webgpu_pix_frame_generator by adding the present mode to the surface configuration. This new attribute is required by the latest Dawn to render frames.

### Description This implements the SwiGLU activation for MoE and qMoE. The activation corresponds to https://github.com/triton-lang/triton/blob/main/python/triton_kernels/triton_kernels/swiglu.py. Also update test_parity_moe.py to enable the qMoE test in CI pipelines. ### Motivation and Context This is a naive implementation of the activation. Since the activation reduces each row to half its length, we cannot directly use an epilogue. The current implementation needs an extra buffer to run the SwiGLU kernel. In the future, we might look at alternatives that do not need an extra buffer.
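To show why the activation halves each row, here is a minimal sketch of the textbook SwiGLU form, `silu(gate) * linear`, in plain Python. It is an assumption-laden illustration, not the kernel from this change: the split of a row into a first-half gate and a second-half linear part is assumed, and the linked Triton kernel may use a different layout or extra parameters. The names `silu` and `swiglu_row` are hypothetical.

```python
import math
from typing import List


def silu(x: float) -> float:
    """x * sigmoid(x), written to avoid overflow for large negative x."""
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def swiglu_row(row: List[float]) -> List[float]:
    """Apply SwiGLU to one row; the output has half as many elements.

    Assumption: the first half of the row is the gate and the second half
    is the linear part.
    """
    if len(row) % 2 != 0:
        raise ValueError("row length must be even")
    half = len(row) // 2
    gate, linear = row[:half], row[half:]
    return [silu(g) * v for g, v in zip(gate, linear)]
```

Each output element depends on two input elements, so an in-place elementwise epilogue over the GEMM output does not fit, which is consistent with the note about needing an extra buffer.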

### Description Fixes documentation error in onnxruntime_c_api.h: parameter name mismatch for `Graph_GetGraphView` ### Motivation and Context Fix errors in the GitHub action for generating the C/C++ documentation from public header files.

skottmckay and others added 30 commits July 24, 2025 08:10

Move Linux CUDA pipelines to H100 (microsoft#25523)

bfa2c91

Fix QNN SDK download problem (microsoft#25520)

52fd75f

Previously the machine pool had a user-assigned managed identity (UMI) that was used for accessing the blob storage. The UMI has now been removed to improve security, so we baked the data into the VM image instead.

Qnn license file update (microsoft#25158)

e0ad805

### Description Use the license file from QNN SDK to make sure it's up to date. --------- Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>

Update .config/1espt/PipelineAutobaseliningConfig.yml (microsoft#25450)

8152168

Auto-generated baselines by 1ES Pipeline Templates (microsoft#25536)

11aebeb

Co-authored-by: microsoft-github-policy-service[bot] <77245923+microsoft-github-policy-service[bot]@users.noreply.github.com>

[build] upgrade to use Node.js in docker image (microsoft#25529)

ff83f53


Split windows_tensorrt.yml to two parts (microsoft#25528)

29c20cb

It reduces the pipeline time by about 30 minutes. The tests still take about 1 hour, which should also be reduced.

upgrade emsdk to v4.0.11 (microsoft#25477)

b214da5

### Description ### Motivation and Context Fix the build break on Windows+Ninja

[build] Fix the file copy in get_docker_image.py (microsoft#25548)

7c0c29d

### Description Fixes the packaging pipeline. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[QNN EP] Support more Einsum equation: bhwc,hkc->bhwk (microsoft#25518)

413d38d

Additional equation support for QNN EP on einsum op.
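For reference, the equation `bhwc,hkc->bhwk` contracts over `c` while keeping `h` as a shared index, so `out[b][h][w][k] = sum over c of x[b][h][w][c] * y[h][k][c]`. The sketch below spells that out with nested lists in plain Python; it only illustrates the equation's semantics and is not how the QNN EP implements it. The function name is hypothetical.

```python
from typing import List

Tensor4 = List[List[List[List[float]]]]
Tensor3 = List[List[List[float]]]


def einsum_bhwc_hkc_to_bhwk(x: Tensor4, y: Tensor3) -> Tensor4:
    """Reference evaluation of the Einsum equation bhwc,hkc->bhwk."""
    B, H, W, C = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    K = len(y[0])
    if len(y) != H or len(y[0][0]) != C:
        raise ValueError("y must have shape (H, K, C) matching x")
    return [[[[sum(x[b][h][w][c] * y[h][k][c] for c in range(C))
               for k in range(K)]
              for w in range(W)]
             for h in range(H)]
            for b in range(B)]
```

Seen per value of `h`, this is a matrix product of a `(W, C)` slice of `x` with the transpose of a `(K, C)` slice of `y`.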

shaoboyan091 and others added 4 commits July 28, 2025 15:08

Fix webgpu_pix_frame_generator by adding missing present mode attribu…

38e660c

…te (microsoft#25553) This PR fixes webgpu_pix_frame_generator by adding the present mode to the surface configuration. This new attribute is required by the latest Dawn to render frames.

Merge branch 'master' into synccc_msft_29_7_25

1833f04

jatinwadhwa921 requested a review from ankitm3k July 29, 2025 06:34

ankitm3k approved these changes Jul 29, 2025


ankitm3k merged commit 420ec3a into ovep-develop Jul 29, 2025
6 of 8 checks passed

ankitm3k deleted the synccc_msft_29_7_25 branch July 29, 2025 08:20


22 participants
