forked from microsoft/onnxruntime
Backmerging with Msft Commits #724
Merged
### Description
This PR sets the option that adds support for the `DecoderMaskedMultiHeadAttention` (DMMHA) kernel inside `MultiHeadAttention` (MHA) to false by default.

### Motivation and Context
Models containing the extra inputs for DMMHA (i.e. `past_sequence_length` and `cache_indirection`) have some runtime issues. Additionally, not all execution providers implement the DMMHA kernel inside MHA, so they do not support these extra inputs.
…#25140) This PR optimizes the Intel path for subgroup_matrix_matmul_nbits by removing the per-thread load of matrix A and instead using subgroupMatrixLoad directly from global memory, reducing SLM usage and bandwidth pressure.
- Removed `var<workgroup> tile_A` and the loadSHMA helper function.
- Updated inner loop to compute a global offset and call subgroupMatrixLoad on input_a.
- Adjusted indexing and stride parameters to match the global layout.
### Description
This change replaces the previous zero-extend + 16-bit accumulation sequence with a single `wasm_i32x4_relaxed_dot_i8x16_i7x16_add` operation to compute row sums directly on 8-bit data.

### Motivation and Context
This update eliminates unpacking overhead and lifts the former constraints on k stride.
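To illustrate the idea, here is a minimal Python emulation of a dot-and-accumulate over 16 8-bit lanes. It assumes the row sum is obtained by multiplying against a vector of ones, which the description above does not state; the helper names `relaxed_dot_i8x16_i7x16_add` and `row_sum` are hypothetical and only model the arithmetic, not the actual MLAS kernel.

```python
def relaxed_dot_i8x16_i7x16_add(a, b, acc):
    """Emulate the lane arithmetic: each of the 4 i32 lanes receives the sum of
    4 adjacent 8-bit products plus the incoming accumulator lane."""
    assert len(a) == 16 and len(b) == 16 and len(acc) == 4
    return [
        acc[lane] + sum(a[4 * lane + j] * b[4 * lane + j] for j in range(4))
        for lane in range(4)
    ]


def row_sum(row):
    """Sum a row of 8-bit values 16 at a time (assumption: multiply by ones)."""
    ones = [1] * 16          # every value fits in 7 bits
    acc = [0] * 4            # four 32-bit accumulator lanes
    for k in range(0, len(row), 16):
        chunk = row[k:k + 16]
        chunk = chunk + [0] * (16 - len(chunk))  # zero-pad the tail
        acc = relaxed_dot_i8x16_i7x16_add(chunk, ones, acc)
    return sum(acc)          # final horizontal reduction
```

With a ones vector, no widening of the 8-bit data is needed before accumulation, which is the overhead the change describes removing.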
Add support for reverse slice and enable all unit tests for it. This will fix microsoft#24744 with the new WebGPU EP. I need to make a similar fix for JSEP.
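For readers unfamiliar with reverse slicing, the following is a small sketch of ONNX `Slice` semantics on a single axis with a negative step, written from the operator specification's clamping rules. The function name `onnx_slice_1d` is hypothetical and this is not the WebGPU EP implementation.

```python
def onnx_slice_1d(data, start, end, step=1):
    """Slice a 1-D list following ONNX Slice rules for one axis."""
    n = len(data)
    if step == 0:
        raise ValueError("step must be non-zero")
    # Negative indices count from the end of the axis.
    if start < 0:
        start += n
    if end < 0:
        end += n
    if step > 0:
        start = min(max(start, 0), n)
        end = min(max(end, 0), n)
    else:
        # Reverse slice: start is clamped to [0, n-1], end to [-1, n-1].
        start = min(max(start, 0), n - 1)
        end = min(max(end, -1), n - 1)
    return [data[i] for i in range(start, end, step)]
```

Note that reaching element 0 in a reverse slice requires a very negative `end` (for example the minimum 64-bit integer), because `end = -1` is interpreted as the last element.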
### Description
Re-enable unit tests in Android CI build.

### Motivation and Context
The CI build is not running the unit tests. It should run them.
- Implemented MeanOpBuilder to support ONNX Mean operator in QNN EP.
- Decomposed Mean into a sequence of element-wise Add operations followed by a Div Op.
- Added unit tests for Mean op running on HTP.

### Description
Adds support for the ONNX Mean operator in QNN EP via Add + Div decomposition.

### Motivation and Context
Enables execution of models using Mean on QNN backend, improving Op support.
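The Add + Div decomposition can be sketched in plain Python as follows. This is only an illustration of the arithmetic on equally shaped 1-D inputs; `mean_via_add_div` is a hypothetical name and broadcasting, which the ONNX Mean operator allows, is not modeled.

```python
def mean_via_add_div(inputs):
    """Element-wise mean of N equally shaped 1-D inputs as (N-1) Adds and one Div."""
    if not inputs:
        raise ValueError("Mean needs at least one input")
    acc = list(inputs[0])
    for tensor in inputs[1:]:
        # One element-wise Add per additional input.
        acc = [a + b for a, b in zip(acc, tensor)]
    count = float(len(inputs))
    # A single Div by the number of inputs finishes the mean.
    return [a / count for a in acc]
```

For example, `mean_via_add_div([[1, 2], [3, 4], [5, 6]])` accumulates `[9, 12]` and divides by 3.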
…t#25173)
### Description
Per the Windows team's CyberEO requirement, do not disable the warnings at the project level.
…rosoft#25172) ### Description Fixed onnxruntime_mlas_test requiring /bigobj in MSVC Debug mode ### Motivation and Context microsoft#24741 microsoft#25169
### Description
Adding the following ORT EP APIs:
- `GetPreferredDataLayout()`
- `SetDynamicOptions()`
- `OnRunStart()`
- `OnRunEnd()`

### Motivation and Context
Expose additional EP APIs.
…ft#25171)
### Description
* Re-enable tests and remove workarounds that were introduced as part of a QNN <= 2.31 upgrade but are no longer necessary.

### Motivation and Context
QNN/QAIRT releases about once a month. As ONNX Runtime adopts these new versions, some number of tests are often found to be impacted. Consequently, tests are skipped and tolerances are loosened. This change reverts as many of those workarounds as possible that were made for QNN upgrades between 2.17 and 2.31, inclusive.

The most recent few releases were intentionally not examined to minimize impact on users on old versions and to avoid lock-in to the bleeding edge.

---------
Co-authored-by: Jeff Kilpatrick <jkilpat@qti.qualcomm.com>
…ecified in GetCapability (microsoft#25137)
### Description
- Add ability to drop constant initializers for fused nodes specified in GetCapability.
- Rework how an EP specifies nodes that should be fused into one node within GetCapability.
  - Instead of passing the set of nodes as arguments to `GraphSupportInfo_AddNodesToFuse()`, the EP creates an `OrtNodeFusionOptions` object to specify the nodes and other relevant options. This makes it easier to extend the API in the future since we can't add more parameters to an existing function, but we can add more functions that modify an options object.

### Motivation and Context
Add more functionality missing from GetCapability() in the EP ABI.
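The options-object rationale can be illustrated with a short sketch. All names here (`NodeFusionOptions`, `set_drop_constant_initializers`, `add_nodes_to_fuse`) are hypothetical stand-ins chosen for illustration and are not the actual ORT C API; the point is only that new settings arrive as new functions while the consuming function keeps its signature.

```python
class NodeFusionOptions:
    """Hypothetical stand-in for an options object such as OrtNodeFusionOptions."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.drop_constant_initializers = False  # default keeps old behavior


def set_drop_constant_initializers(options, value):
    """A setter added later; existing callers are unaffected."""
    options.drop_constant_initializers = bool(value)
    return options


def add_nodes_to_fuse(support_info, options):
    """Fixed signature: everything configurable travels inside `options`."""
    support_info.append({
        "nodes": tuple(options.nodes),
        "drop_constant_initializers": options.drop_constant_initializers,
    })
```

A caller that never touches the new setter gets the default, so adding further options later does not break it.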
### Description
In TensorRT 10.12, weakly-typed network and related APIs have been marked deprecated. Ignore these deprecated API warnings for the Windows build.

---------
Signed-off-by: Kevin Chen <kevinch@nvidia.com>
…soft#25176) Remove --enable_wcos. The flag is for the old WinML code only.
microsoft#25159)
### Description
Updates the `OrtGraph` implementation to take advantage of the work done in PR microsoft#23979, which sets up the infrastructure to store initializers as `OrtValue` instances in the `onnxruntime::Graph`.

There still needs to be a second part to the [aforementioned PR](microsoft#23979) to ensure that all initializers are stored as `OrtValue`s in the Graph.
### Description
Support bfloat16 for MatMulNBits in CUDA.

### Motivation and Context
For LLM models that use the bfloat16 data type.
### Description
Use lintrunner to format *.cu and *.cuh files.

### Motivation and Context
Some CUDA code is not formatted. This will make the style consistent.
1. Delete the ROCM EP, because there is no active development and we have another AMD GPU EP (MIGraphX) to use.
2. Delete the WASM64 build option, because the feature was incomplete. We will likely need to reimplement it later, but we will delete it for now (I already discussed it with @fs-eire).
3. Delete the kernel explorer Python extension, which was solely used by the ROCM EP.
4. Delete the Triton-related build options, which were never really put to use.
5. Add a pull request pipeline for the MIGraphX EP.

The following cmake options are removed:
- onnxruntime_USE_ROCM
- onnxruntime_ENABLE_WEBASSEMBLY_MEMORY64
- onnxruntime_ENABLE_TRITON
- onnxruntime_USE_COMPOSABLE_KERNEL
- onnxruntime_USE_COMPOSABLE_KERNEL_CK_TILE
- onnxruntime_USE_ROCBLAS_EXTENSION_API
- onnxruntime_USE_TRITON_KERNEL
- onnxruntime_BUILD_KERNEL_EXPLORER
- onnxruntime_BUILD_CACHE
- MSVC_Z7_OVERRIDE
### Description
Add allocator and data transfer infrastructure for the plugin EP API.

Allocators are created via the OrtEpFactory using the OrtMemoryInfo that is added to the OrtEpDevice instances the factory returns. This allows allocators to be created outside of an inference session and shared.

When a library is loaded, a default instance of each allocator is added to the shared allocators if there is no existing allocator (e.g. a user-provided custom allocator). CreateSharedAllocator can be used to replace this default instance with a user-configured one, e.g. to add an arena or provide other configuration options that are passed through to the OrtEpFactory's CreateAllocator function.

Similarly, IDataTransfer is supported by the factory implementing OrtDataTransferImpl, which will also enable data transfer outside of a session. That will be added in a future PR, as the synchronization requirements need to be figured out and will affect the public API.

---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Exclude lean attention from the Linux build.

### Motivation and Context
Previously, lean attention was built on Linux but not on Windows. It is not used in Gen AI so far, so we disable it in the build to reduce binary size and build time.
…icrosoft#25017)
### Description
MatMul+Add->Gemm fusion when AttentionFusion isn't enabled.

### Motivation and Context
The graph transformation [MatMulAddFusion](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/optimizer/matmul_add_fusion.cc) folds an `ONNX::MatMul` followed by an `ONNX::Add` into `ONNX::GEMM`. However, it [intentionally skips the portion that belongs to the "Attention Pattern"](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/optimizer/matmul_add_fusion.cc#L21). This results in poor performance on the QNN EP (and other EPs that do not run the *AttentionFusion transformers) due to unfused MatMul + Add pairs.

With this change, additional GEMM nodes are fused *post* AttentionFusions.
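As a rough illustration of what a MatMul + Add to Gemm rewrite does, here is a toy pass over a topologically ordered list of node dictionaries. It is a simplified sketch under stated assumptions, not the logic in `matmul_add_fusion.cc`: the node representation and the name `fuse_matmul_add` are hypothetical, and shape checks, graph outputs, and the attention-pattern exclusion are all ignored.

```python
def fuse_matmul_add(nodes):
    """Fuse MatMul -> Add pairs into a single Gemm node.

    Each node is {"op": str, "inputs": [names], "output": name}.
    A pair is fused only when the Add is the sole consumer of the MatMul output.
    """
    producer = {n["output"]: n for n in nodes}
    uses = {}
    for n in nodes:
        for name in n["inputs"]:
            uses[name] = uses.get(name, 0) + 1

    removed = set()  # outputs of MatMul nodes absorbed into a Gemm
    result = []
    for n in nodes:
        if n["op"] == "Add" and len(n["inputs"]) == 2:
            for idx, name in enumerate(n["inputs"]):
                src = producer.get(name)
                if src is not None and src["op"] == "MatMul" and uses[name] == 1:
                    bias = n["inputs"][1 - idx]
                    n = {"op": "Gemm",
                         "inputs": src["inputs"] + [bias],
                         "output": n["output"]}
                    removed.add(src["output"])
                    break
        result.append(n)
    return [n for n in result
            if not (n["op"] == "MatMul" and n["output"] in removed)]
```

The Gemm is emitted at the position of the original Add, so any node that produces the bias still precedes it.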
This PR adds an additional `Node_GetAttributes` C API for EP ABI use. It's based on microsoft#24887.
Delete the legacy patches related to protobuf, which were added in microsoft#14279 and microsoft#15878, to simplify the ONNX patches.
### Description
Debug Windows build fails with unreachable code warning due to change added in microsoft#25161. Use an `else` to avoid the warning.

```
\onnxruntime\test\contrib_ops\matmul_4bits_test.cc(559,1): error C2220: the following warning is treated as an error [\build\Windows.vs22\Debug\onnxruntime_test_all.vcxproj]
\onnxruntime\test\contrib_ops\matmul_4bits_test.cc(559,1): warning C4702: unreachable code [\build\Windows.vs22\Debug\onnxruntime_test_all.vcxproj]
\onnxruntime\test\contrib_ops\matmul_4bits_test.cc(561,1): warning C4702: unreachable code [\build\Windows.vs22\Debug\onnxruntime_test_all.vcxproj]
...
```
Update pytorch to address https://nvd.nist.gov/vuln/detail/CVE-2025-32434
…yout sensitive ops (microsoft#25147)
### Description
Add `IExecutionProvider::ShouldConvertDataLayoutForOp()` to allow EPs to customize layout sensitive ops. Move existing hardcoded EP-specific logic out of layout transformer code.

Add `OrtEp::ShouldConvertDataLayoutForOp` to ABI EP API to allow similar customization by plugin EPs.

### Motivation and Context
Enable layout sensitive op customization through internal EP interface and the ABI EP API.
This automated commit updates the vcpkg dependency to version 2025.06.13 and its corresponding commit hash ef7dbf94b919.
…crosoft#25200)
### Description
Add back the linker option to make stack non-executable, which was accidentally lost here: microsoft#22646

This just adds back the option in the same place where it was.

### Motivation and Context
After upgrading to 1.22.0 we saw this warning:

OpenJDK 64-Bit Server VM warning: You have loaded library /opt/vespa-deps/lib64/libonnxruntime.so.1.22.0 which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
…torV2 (microsoft#25220) ### Description Distinguish between the memory types when creating a shared environment allocator for CUDAExecutionProvider ### Motivation and Context Fixed microsoft#25211
…oft#25218) ### Description Added missing "mem_info" parameter into CPUAllocator constructor ### Motivation and Context Without the correct mem_info, CudaPinned allocator is mapped with wrong (default) "Cpu" memory_info.
microsoft#25222)
### Description
EP implementations need to be able to read vendor id and device id to implement OrtDataTransferImpl::CanCopy correctly.
### Description
Add the LUID to metadata.

### Motivation and Context
Requested by partners.
### Description
Currently only the build is enabled. The testing step fails with an error like this:

```
1: /onnxruntime_src/onnxruntime/core/providers/webgpu/webgpu_context.cc:87 onnxruntime::webgpu::WebGpuContext::Initialize(const onnxruntime::webgpu::WebGpuBufferCacheConfig&, int, bool)::<lambda()>::<lambda(wgpu::RequestAdapterStatus, wgpu::Adapter, wgpu::StringView, wgpu::Adapter*)> status == wgpu::RequestAdapterStatus::Success was false. Failed to get a WebGPU adapter: No supported adapters
1:
```
### Description
Add telemetry error logging to InferenceSession::Run() methods to track runtime errors that occur during inference execution.

### Motivation and Context
Currently, we do not have any telemetry error logging in the InferenceSession class to track runtime errors that occur during inference execution. This data would allow us to prioritize and identify the most frequent errors.
fix build error and typo
### Description
Revise existing PoolOpBuilder to support rank-5 inputs. Note that HTP only supports PoolAvg3d but not PoolMax3d.

### Motivation and Context
Enable HSM-Net model support, which contains 3D AveragePool.
This pull request introduces telemetry enhancements for logging execution provider (EP) auto-selection events in ONNX Runtime. The changes include adding new methods to log EP selection data, updating classes to support session ID retrieval, and integrating telemetry logging into the session provider policy context.

### Telemetry Enhancements:
* **Added `LogAutoEpSelection` Method**: Introduced a new virtual method in `Telemetry` and its override in `WindowsTelemetry` to log session-specific EP auto-selection data, including requested and available EP IDs and selection policies.

### Session ID Support:
* **Added `GetCurrentSessionId` Method**: Added a method to `InferenceSession` to retrieve the current session ID, enabling session-specific telemetry logging.

### Integration with Provider Policy Context:
* **Telemetry Logging in EP Selection**: Integrated telemetry logging into `ProviderPolicyContext::SelectEpsForSession`, capturing requested and available EP IDs, along with the selection policy type, and invoking the telemetry provider's `LogAutoEpSelection` method.
…icrosoft#25188)
### Description
When EPContext model generation is enabled and some nodes fall back to the CPU, and those CPU nodes depend on external data, ORT by default forces all external data to be embedded into the newly generated EPContext model. ORT used to create a dummy external initializer file with the maximum size threshold to force all initializer data to be dumped into the generated ONNX model file. Internally, a "./model_ext_ini.bin" file was created and then removed at the end of the call. This causes problems if multiple sessions do the same thing.

This fix avoids creating the temporary empty external initializer file by adding a flag that forces all external data to be embedded into the newly generated EPContext model.
### Description
This PR adds an additional `OpAttr_GetName` C API for EP ABI use.
ankitm3k
approved these changes
Jul 2, 2025