forked from microsoft/onnxruntime
Sync with Microsoft ONNX Runtime - 03/11/2025 #843
Merged
Conversation
### Description
Fix a build error in the tvOS build.

### Motivation and Context
Currently the build fails:
```
CMake Error at /Users/runner/work/onnxruntime-custom/onnxruntime-custom/tvOS/RelWithDebInfo/_deps/flatbuffers-src/CMakeLists.txt:636 (install):
  install TARGETS given no BUNDLE DESTINATION for MACOSX_BUNDLE executable target "flatc".
-- CMAKE_CXX_FLAGS: -DNDEBUG -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -O3 -pipe -g -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -DCPUINFO_SUPPORTED
```
… EP python wheel (microsoft#26115)

### Description
- CUDA Runtime added as a dependency of the NV EP python wheel.
- TRT RTX DLLs and license copied into the wheel, as public wheels for TRT RTX are currently unavailable.
- Use `onnxruntime.preload_dlls()` to load the CUDA Runtime DLL.
- The python package name for the NV EP is `onnxruntime-trt-rtx`.

### Motivation and Context
Enables out-of-the-box usage of the NV TensorRT RTX EP python wheel. @gaugarg-nv @thevishalagarwal @gedoensmax @ishwar-raut1
### Description
Picked other models to replace those that were removed.

### Motivation and Context
Measure
Upgrade cpuinfo to a newer version to keep it up to date.
### Description
Reduce the time blocked waiting for shaders to be compiled.

### Motivation and Context
Optimize the responsiveness of the application when running ort-web in the main thread. See microsoft#25882.
…soft#26400)

### Description
This PR fuses GeneratePositionIDs into FusedQKRotaryEmbedding, which eliminates one kernel call.

### Motivation and Context
Previously, for GQA, the processing flow was:
`SplitPackedQKVProgram -> GeneratePositionIDs -> FusedQKRotaryEmbedding -> FlashAttention`
After this change, the pipeline becomes:
`SplitPackedQKVProgram -> FusedQKRotaryEmbedding -> FlashAttention`
On an NV 5080, token generation speed improved ~4% (128 tps -> 133 tps).
…er support (microsoft#26394)

### Description
Currently the files needed for wasm aren't exported from the package at all. The files in question are:
```
ort-wasm-simd-threaded.wasm
ort-wasm-simd-threaded.jsep.wasm
ort-wasm-simd-threaded.asyncify.wasm
ort-wasm-simd-threaded.mjs
ort-wasm-simd-threaded.jsep.mjs
ort-wasm-simd-threaded.asyncify.mjs
```
This PR changes that and adds them to the `exports` field in the `package.json`.

### Motivation and Context
Bundlers like `webpack` use the `copyPlugin` to move those files into the `public` directory so the files can be accessed at a stable URL. However, more advanced, "state of the art" bundlers like `vite` are able to [import asset URLs directly](https://vite.dev/guide/assets.html#explicit-url-imports). Vite takes the asset and moves it to the public assets folder (possibly renaming the asset, adding a hash, etc.). The imported value is then the bundled asset's final URL. Those URLs can be used directly in `env.wasm.wasmPaths`. In Vite's case, the full code example is:
```js
import wasmUrl from 'onnxruntime-web/ort-wasm-simd-threaded.wasm?url';
import mjsUrl from 'onnxruntime-web/ort-wasm-simd-threaded.mjs?url';

env.wasm.wasmPaths = {
  wasm: wasmUrl,
  mjs: mjsUrl,
};
```
With these added exports we can leverage more of the bundler's capabilities, and in Vite's case there is no need for any additional config. It just works. When importing we also get proper suggestions:
<img width="1604" height="498" alt="imports" src="https://github.com/user-attachments/assets/2678ccc2-ae46-4289-aa6e-607ecbc5388b" />

----

I would like additional tests to ensure that the exports are available, but I couldn't make the `e2e` tests work on my system. I would appreciate some guidance on that topic.
### Description
- Updates the `ORT_API_VERSION` value in onnxruntime_c_api.h to `24`.
- Edits the documentation for the `TensorTypeAndShape_HasShape` API function to indicate the correct API version (1.24).
- Adds a `static_assert` to ensure that API functions for 1.23 are not added or removed (see the sketch below).

### Motivation and Context
The version of ORT was previously updated to 1.24.0, but we forgot to update `ORT_API_VERSION`.
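The PR's actual `static_assert` isn't shown here; as a rough illustration of the idea (the struct, names, and layout below are hypothetical, not ORT's real API table), an assert on a member offset can freeze a released API surface:

```c++
// Hypothetical sketch, not ORT's actual code: pin the 1.23 portion of an
// API function table so inserting or removing a member breaks the build.
#include <cstddef>

struct ApiTable {
  void (*FuncA)();                        // released in 1.23
  void (*FuncB)();                        // released in 1.23 (last 1.23 entry)
  void (*TensorTypeAndShape_HasShape)();  // new in 1.24: append-only
};

// If anyone adds, removes, or reorders a 1.23 function, this offset changes
// and compilation fails with a clear message.
static_assert(offsetof(ApiTable, TensorTypeAndShape_HasShape) == 2 * sizeof(void*),
              "1.23 API functions must not be added or removed");
```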
…rosoft#26420) Add nightly GPU pipelines for CUDA 13.
### Description
[js] Upgrade ESLint from v8 to v9 for the /js/ subfolder.
Previously, `model_proto` was passed by name, which triggered a copy-constructor call instead of move construction. Using `std::move(model_proto)` ensures that the object is constructed via move semantics, reducing unnecessary memory allocation and copy overhead.

Co-authored-by: liumingyue <mingyue@xilinx.com>
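As a minimal, self-contained sketch of the pattern (with a stand-in `ModelProto` type, since the real one is a protobuf message):

```c++
#include <string>
#include <utility>
#include <vector>

struct ModelProto {
  std::vector<std::string> big_payload;  // stand-in for a large protobuf message
};

class Model {
 public:
  // Taking the parameter by value and moving it into the member lets the
  // caller decide: pass an lvalue to copy, or an rvalue to move.
  explicit Model(ModelProto model_proto) : model_proto_(std::move(model_proto)) {}

 private:
  ModelProto model_proto_;
};

int main() {
  ModelProto model_proto;
  model_proto.big_payload.assign(1000, std::string(1024, 'x'));

  // Before: Model model(model_proto);   // copies the whole payload
  Model model(std::move(model_proto));   // moves; no payload copy
}
```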
### Description
Change the Xcode version from 16.4 to 16.2, according to what's available on the build agent image. Adjust casing in the `xcodebuild -destination` argument.

### Motivation and Context
Fix iOS packaging pipeline issues.
…26422)

### Description
Use the MayInplace hint for the FusedConv NCHW CPU kernel, which allows the allocation planner to reuse the optional "sum" input buffer as the kernel's output buffer. It potentially saves [this copy](https://github.com/microsoft/onnxruntime/blob/06004826cc99dd8b8b92dbf000db3d3525716f22/onnxruntime/core/providers/cpu/nn/conv.cc#L209) when the buffers can be shared (which is the case most of the time).

### Motivation and Context
The kernel already had logic to save the copy, but the kernel def was missing the hint to the allocation planner. Found while investigating improvements to a Conv model's performance on ARM64.
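The real change adds a `MayInplace(input_index, output_index)`-style entry to the kernel definition; as a self-contained sketch of what that hint buys (the "sum" input index and the epilogue shape here are assumptions for illustration, not the actual FusedConv code):

```c++
// When the planner aliases the "sum" input Z to the output Y, the kernel can
// accumulate into Z's buffer directly instead of computing Y separately and
// then copying Z's contribution into a fresh output buffer.
#include <cstddef>
#include <vector>

void fused_conv_epilogue(const std::vector<float>& conv_out,
                         std::vector<float>& z_reused_as_output) {
  // Without the hint: y = conv_out + z, then copy y into the output buffer.
  // With the hint: z's buffer *is* the output; add conv_out in place.
  for (std::size_t i = 0; i < z_reused_as_output.size(); ++i) {
    z_reused_as_output[i] += conv_out[i];
  }
}
```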
### Description
Add unidirectional support for MHA. Also updated `GetCapability` to make sure Attention only runs on WebGPU under certain conditions.
This pull request enables conditionally registering GQA with total_sequence_length on GPU. It resolves the issue that a MemcpyToHost node is generated when graph capture is enabled (refer to microsoft#25868). This is the last piece of functionality needed to support graph capture in the WebGPU EP in ORT. The main changes ensure that when graph capture is enabled, sequence length information is read from GPU buffers instead of CPU memory, and shader code generation adapts accordingly. This enables more efficient execution and compatibility with graph-captured models. In this PR, we still get the total sequence length from the `seqlen_k` tensor rather than the `total_seqlen_tensor` tensor, to stay consistent with other parts of the code. In a follow-up PR, we can refactor all places to use `total_seqlen_tensor` directly instead of `seqlen_k` when graph capture is enabled.
ONNX's ScatterND and ScatterElements limit their indices input to int64, but some WebNN backends only support int32 indices. As a workaround for such backends, we can insert a Cast operation to convert the data type.
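As a minimal sketch of the narrowing such an inserted Cast performs (a plain standalone function, not the WebNN EP's actual implementation), int64 ONNX indices are converted to the int32 the backend supports, which is safe whenever every index fits in 32 bits:

```c++
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

std::vector<int32_t> cast_indices_to_int32(const std::vector<int64_t>& indices) {
  std::vector<int32_t> out;
  out.reserve(indices.size());
  for (int64_t v : indices) {
    // Guard the narrowing: valid whenever the indexed dimension is < 2^31.
    if (v > std::numeric_limits<int32_t>::max() ||
        v < std::numeric_limits<int32_t>::min()) {
      throw std::range_error("index does not fit in int32");
    }
    out.push_back(static_cast<int32_t>(v));
  }
  return out;
}
```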
### Description
We don't want to adjust the dispatch size when running `Conv1d`.
…#26234)

### Description
Adds APIs to allow a plugin EP to create a virtual `OrtHardwareDevice` that can be used for model cross-compilation. For example, this allows an EP to create a compiled model for NPU on a device that does not have an NPU.

#### Application code
An application must explicitly allow registered plugin EPs to create virtual devices. This is currently done by using a registration name that ends in the `".virtual"` suffix. Ex:
```c++
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

#include "onnxruntime_cxx_api.h"
#include "onnxruntime_ep_device_ep_metadata_keys.h"

const char* ep_registration_name = "my_ep_lib.virtual";  // IMPORTANT: ".virtual" suffix is a signal to EP library
ort_env->RegisterExecutionProviderLibrary(ep_registration_name, "my_ep.dll");

std::vector<Ort::ConstEpDevice> ep_devices = ort_env->GetEpDevices();

// ep_devices includes an OrtEpDevice from "my_ep.dll" that uses a virtual OrtHardwareDevice.
auto it = std::find_if(ep_devices.begin(), ep_devices.end(),
                       [](const Ort::ConstEpDevice& device) {
                         return device.EpName() == std::string("MyEpName");
                       });
Ort::ConstEpDevice virtual_ep_device = *it;  // assumes the device was found

// App can look in HW metadata to check if it is virtual
Ort::ConstHardwareDevice virtual_hw_device = virtual_ep_device.Device();
std::unordered_map<std::string, std::string> metadata = virtual_hw_device.Metadata().GetKeyValuePairs();
assert(metadata[kOrtHardwareDevice_MetadataKey_IsVirtual] == "1");

// App can use the virtual OrtEpDevice in a session to, for example, compile a model
// ...
```

#### Plugin EP code
This PR introduces a new _optional_ C API function in the `OrtEpFactory` struct called `SetEnvironmentOptions` that allows ORT to pass options (as key/value pairs) to an EP factory. Currently, the only key supported is `"allow_virtual_devices"`, which indicates to the EP factory that creating virtual devices is allowed.

When the application registers a plugin EP library, ORT creates the library's EP factories and checks if they implement the `SetEnvironmentOptions` API function. If so, ORT calls `ep_factory.SetEnvironmentOptions` with `"allow_virtual_devices"` set to `"1"` if the EP registration name set by the application ends in the `".virtual"` suffix (or `"0"` otherwise).

Here's an example implementation of `OrtEpFactory::SetEnvironmentOptions` taken from a [test plugin EP that supports a virtual GPU](https://github.com/microsoft/onnxruntime/tree/adrianl/plugin-ep-specify-ort-hw-device/onnxruntime/test/autoep/library/example_plugin_ep_virt_gpu):
```c++
/*static*/
OrtStatus* ORT_API_CALL EpFactoryVirtualGpu::SetEnvironmentOptionsImpl(OrtEpFactory* this_ptr,
                                                                       const OrtKeyValuePairs* options) noexcept {
  auto* factory = static_cast<EpFactoryVirtualGpu*>(this_ptr);

  const char* value = factory->ort_api_.GetKeyValue(options, "allow_virtual_devices");
  if (value != nullptr) {
    factory->allow_virtual_devices_ = strcmp(value, "1") == 0;
  }

  return nullptr;
}
```
An EP factory can create a virtual hardware device within `OrtEpFactory::GetSupportedDevices` by using a new API function called `CreateHardwareDevice`. The EP factory is expected to own the hardware device instance, which should be released when the factory is destroyed via `ReleaseHardwareDevice`. The [test plugin EP shows an implementation](https://github.com/microsoft/onnxruntime/blob/d87f8b86406525f5801a7a9933b1ced1eb40940c/onnxruntime/test/autoep/library/example_plugin_ep_virt_gpu/ep_factory.cc#L86) of `OrtEpFactory::GetSupportedDevices` that creates a virtual GPU device.
```c++
/*static*/
OrtStatus* ORT_API_CALL EpFactoryVirtualGpu::GetSupportedDevicesImpl(OrtEpFactory* this_ptr,
                                                                     const OrtHardwareDevice* const* /*devices*/,
                                                                     size_t /*num_devices*/,
                                                                     OrtEpDevice** ep_devices,
                                                                     size_t max_ep_devices,
                                                                     size_t* p_num_ep_devices) noexcept {
  size_t& num_ep_devices = *p_num_ep_devices;
  auto* factory = static_cast<EpFactoryVirtualGpu*>(this_ptr);

  num_ep_devices = 0;

  // Create a virtual OrtHardwareDevice if the application indicated it is allowed (e.g., for cross-compiling).
  // This example EP creates a virtual GPU OrtHardwareDevice and adds a new OrtEpDevice that uses the virtual GPU.
  if (factory->allow_virtual_devices_ && num_ep_devices < max_ep_devices) {
    OrtKeyValuePairs* hw_metadata = nullptr;
    factory->ort_api_.CreateKeyValuePairs(&hw_metadata);
    factory->ort_api_.AddKeyValuePair(hw_metadata, kOrtHardwareDevice_MetadataKey_IsVirtual, "1");

    auto* status = factory->ep_api_.CreateHardwareDevice(OrtHardwareDeviceType::OrtHardwareDeviceType_GPU,
                                                         factory->vendor_id_,
                                                         /*device_id*/ 0,
                                                         factory->vendor_.c_str(),
                                                         hw_metadata,
                                                         &factory->virtual_hw_device_);
    // ...

    OrtEpDevice* virtual_ep_device = nullptr;
    status = factory->ort_api_.GetEpApi()->CreateEpDevice(factory, factory->virtual_hw_device_,
                                                          ep_metadata, ep_options, &virtual_ep_device);
    // ...

    ep_devices[num_ep_devices++] = virtual_ep_device;
```
…lues early (microsoft#26345)

### Description
Converts weights early and reverts "Properly remove in-memory references (microsoft#25652)". This reverts commit 3ca49d8 and makes appropriate adjustments for the current state of the code.

This PR is made possible by, and follows on the heels of, microsoft#26263 and microsoft#25833. Previous history: microsoft#23979, microsoft#25320, microsoft#25626, microsoft#25652.

The first change (microsoft#26263) allows us to convert initializers to OrtValues early and save lots of memory at model loading time. Specifically, for the Phi-4-mini-instruct-INT4 model, before and after look like this:

**Before**
<img width="1204" height="124" alt="Before change DEBUG 2025-10-16 144819" src="https://github.com/user-attachments/assets/674ff75b-057f-498a-a906-0140d59d46e6" />

**After**
<img width="997" height="114" alt="After change DEBUG 2025-10-16 144819" src="https://github.com/user-attachments/assets/df1783af-7f50-4cd2-b3ad-6868f23be53f" />

The two peaks represent memory usage at optimization time (8.1 GB before) and after weights memory mapping (6.5 GB). After this change, the corresponding numbers are 3.5 GB and 4.7 GB respectively. Most of the savings during the optimization phase come from `ConstantFolding`, where we are able to reuse the resulting OrtValues directly for the new initializers.

This PR concludes a series of PRs converting initializers to OrtValues. Memory consumption before the conversion began was 9.3 GB and 6.7 GB respectively. We are saving almost 6 GB during optimization and 2 GB at steady state.

<img width="1175" height="139" alt="image" src="https://github.com/user-attachments/assets/80e7d228-8a8e-4316-8e04-b02c2be30f04" />

The model also loads about 12 seconds faster.

An example of ConstantFolding being one of the top contributors, where we duplicate memory for a higher peak before Resolve takes care of no-longer-used initializers:

<img width="1100" height="558" alt="Snapshot 3 Peak on ConstantFolding Transpose Optimizer" src="https://github.com/user-attachments/assets/95545abd-3f99-46d9-862e-bbf27cbb5b40" />
<img width="1060" height="600" alt="Snapshot 4 Peak AddInitializer from ConstantFolding" src="https://github.com/user-attachments/assets/dd457ec6-23ee-4efd-8c60-625d5faad61e" />
<img width="325" height="160" alt="image" src="https://github.com/user-attachments/assets/37c1194d-f683-49a7-afb1-073dfbb9bbfc" />

### Motivation and Context
Reduce memory usage.
…ovider-bridge API (microsoft#26448)

### Description
There is a memory leak whenever an EP uses the provider-bridge API to create a `std::unique_ptr<onnx::TypeProto>`. The `onnx::TypeProto` is not properly deleted due to a missing `operator delete()` override for the `TypeProto` wrapper class. This delete-operator override is necessary because the onnx library may use custom allocators. A sketch of the pattern follows the list of affected EPs below.

Affected EPs (run `git grep -rn "TypeProto::Create()" onnxruntime/core/providers/`):
- QNN EP: happens when QNN EP creates an EPContext model. To reproduce, run the `onnxruntime_provider_tests` with `--gtest_filter=*ContextBinary*`.
  - https://github.com/microsoft/onnxruntime/blob/860d0853a6eefdf19e21b0e9982bde2ffbc8a65d/onnxruntime/core/providers/qnn/builder/onnx_ctx_model_helper.cc#L73
- OpenVINO EP: happens during QDQ stripping:
  - https://github.com/microsoft/onnxruntime/blob/860d0853a6eefdf19e21b0e9982bde2ffbc8a65d/onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc#L76
  - https://github.com/microsoft/onnxruntime/blob/860d0853a6eefdf19e21b0e9982bde2ffbc8a65d/onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc#L473
  - https://github.com/microsoft/onnxruntime/blob/860d0853a6eefdf19e21b0e9982bde2ffbc8a65d/onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc#L654
- NV EP:
  - https://github.com/microsoft/onnxruntime/blob/860d0853a6eefdf19e21b0e9982bde2ffbc8a65d/onnxruntime/core/providers/nv_tensorrt_rtx/nv_execution_provider_helper.cc#L213
- VitisAI EP:
  - https://github.com/microsoft/onnxruntime/blob/860d0853a6eefdf19e21b0e9982bde2ffbc8a65d/onnxruntime/core/providers/vitisai/imp/node_arg.cc#L94
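A hypothetical sketch of the bug pattern (class and allocator names are invented for illustration, not ORT's actual wrapper): if a class routes `operator new` through a custom allocator but does not override `operator delete`, `delete p` (and thus `unique_ptr`'s destructor) frees through the global deallocator, mismatching the allocation.

```c++
#include <cstddef>
#include <cstdlib>
#include <memory>

void* custom_alloc(std::size_t n) { return std::malloc(n); }  // stand-in for
void custom_free(void* p) { std::free(p); }                   // onnx's allocators

class TypeProtoWrapper {
 public:
  static void* operator new(std::size_t n) { return custom_alloc(n); }

  // The fix: a matching override so deletion goes through the same allocator.
  // Without it, ::operator delete is used, which does not match custom_alloc.
  static void operator delete(void* p) { custom_free(p); }
};

int main() {
  std::unique_ptr<TypeProtoWrapper> p(new TypeProtoWrapper());
}  // unique_ptr's destructor now calls the matching operator delete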
…microsoft#26439)

### Description
Fixes microsoft#26294.

When using the old model compilation approach (session option configuration), ORT should verify that the generated output model does not already exist. Importantly, this check should be done _before_ calling an EP's compile() method. This PR fixes the check, which was unintentionally disabled by a previous PR (microsoft#25455). Note that this check also (correctly) happens _after_ calling the EP's compile() method, but it is better to catch the error early if we can.

### Motivation and Context
Fixes a regression in the older compilation workflow.
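A hedged sketch of the early check (the function shape and error handling are illustrative, not ORT's actual code): reject an existing output model before the EP's compile() is invoked, not only after.

```c++
#include <filesystem>
#include <stdexcept>
#include <string>

void CompileModel(const std::string& input_model_path,
                  const std::string& output_model_path) {
  // Fail fast, before any expensive EP compilation work happens.
  if (std::filesystem::exists(output_model_path)) {
    throw std::runtime_error("Output model already exists: " + output_model_path);
  }
  // ... invoke the EP's compile() and write the output model ...
}
```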
### Description
This change fixes a bug that causes a crash on macOS (and potentially other platforms using libc) at `OrtReleaseEnv`. Instead of using static variables, the objects are now function-local statics so that the compiler can handle the destruction order correctly.

### Motivation and Context
Fixes microsoft#24579

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
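A minimal sketch of the pattern (types and names are illustrative): a function-local static is constructed on first use and destroyed in reverse order of construction, so teardown at `OrtReleaseEnv` time can no longer touch an already-destroyed object.

```c++
#include <string>

struct Registry {
  std::string name{"ort-global-state"};
};

// Before (problematic): a namespace-scope static; destruction order across
// translation units is unspecified, so other teardown code may use it after
// it has been destroyed.
//   static Registry g_registry;

// After: constructed on first call; the runtime sequences its destruction
// relative to other function-local statics.
Registry& GetRegistry() {
  static Registry registry;
  return registry;
}

int main() {
  return GetRegistry().name.empty() ? 1 : 0;
}
```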
### Description
Adds `Elu`, `Exp`, and `Softplus` to the CoreML EP.

### Motivation and Context
We use them in LeelaChessZero.

---------

Co-authored-by: borg323 <borg323@users.noreply.github.com>
### Description
.NET MAUI 8 is out of support. See: https://dotnet.microsoft.com/en-us/platform/support/policy/maui

We started seeing errors about this in the NuGet packaging pipeline:
```
##[error]C:\Program Files\dotnet\sdk\9.0.306\Sdks\Microsoft.NET.Sdk\targets\Microsoft.NET.EolTargetFrameworks.targets(38,5): Error NETSDK1202: The workload 'net8.0-ios' is out of support and will not receive security updates in the future. Please refer to https://aka.ms/maui-support-policy for more information about the support policy.
```
This change updates net8.0 mobile target framework monikers to net9.0.

### Motivation and Context
Fix packaging pipeline.
### Description
- Use total inference time instead of submission time for output statistics calculation.

### Motivation and Context
- The min, max, and other statistics reported for inference were using device submission time instead of the inference time. @ishwar-raut1 @gaugarg-nv @thevishalagarwal @umangb-09 @gedoensmax
1. Add the safeint header include path to MLAS.
2. Fix syntax errors in sqnbitgemm and LASX code.
Fixing the dump-model-ops feature for the MIGraphX EP on Windows. The feature wasn't functional because the file-saving rules on Windows are the opposite of those on Linux. The feature now gives us the opportunity to generate and save the ONNX model after subgraph optimizations, before compiling it. This way we can inspect what the model graph looks like after optimizations and reuse the optimized model.

---------

Co-authored-by: Uros Petkovic <urpektov@amd.com>
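The PR doesn't spell out which saving rule differs, so the following is an assumption on my part, not confirmed by the PR: a classic Windows-vs-Linux difference when saving a serialized model is CRLF translation in text mode, avoided by opening the stream in binary mode.

```c++
// Assumption for illustration only: write the serialized ONNX protobuf bytes
// unmodified on both platforms by forcing binary mode (text mode on Windows
// would translate line endings and corrupt the file).
#include <fstream>
#include <string>

void SaveModel(const std::string& path, const std::string& serialized_onnx) {
  std::ofstream out(path, std::ios::out | std::ios::binary);
  out.write(serialized_onnx.data(),
            static_cast<std::streamsize>(serialized_onnx.size()));
}
```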
Add support for bias and weight_index, move subgroup_matrix_matmul_nbits to a template, and make the program callable from other ops.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
ankitm3k approved these changes on Nov 3, 2025.
Description
Synchronizing the intel/onnxruntime ovep-develop branch with the latest changes from the microsoft/onnxruntime master branch.