Description

Synchronizing intel/onnxruntime ovep-develop branch with latest changes from microsoft/onnxruntime master branch.

Motivation and Context

hans00 and others added 30 commits October 26, 2025 23:46
### Description
<!-- Describe your changes. -->

To fix build error.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Currently the build fails.

```
-- CMAKE_CXX_FLAGS: -DNDEBUG -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -O3 -pipe -g -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -DCPUINFO_SUPPORTED
CMake Error at /Users/runner/work/onnxruntime-custom/onnxruntime-custom/tvOS/RelWithDebInfo/_deps/flatbuffers-src/CMakeLists.txt:636 (install):
  install TARGETS given no BUNDLE DESTINATION for MACOSX_BUNDLE executable
  target "flatc".
```
… EP python wheel (microsoft#26115)

### Description
- CUDA Runtime added as a dependency for the NV EP python wheel
- TRT RTX DLLs and license are copied into the wheel, since public wheels for TRT RTX are currently unavailable
- Use onnxruntime.preload_dlls() to load the CUDA Runtime DLL
- The python package name for the NV EP is onnxruntime-trt-rtx


### Motivation and Context
Enables out-of-the-box usage for the NV TensorRT RTX EP python wheel


@gaugarg-nv @thevishalagarwal @gedoensmax @ishwar-raut1
### Description
<!-- Describe your changes. -->
Picked other models to replace those that were removed.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Measure
Upgrade cpuinfo to a newer version to keep it up to date.
### Description

Reduce the time blocked waiting for the shader to be compiled.

### Motivation and Context

Tries to optimize the responsiveness of the application when running
ort-web on the main thread. See microsoft#25882.
…soft#26400)

### Description
This PR fuses GeneratePositionIDs into FusedQKRotaryEmbedding, which
eliminates one kernel call.

### Motivation and Context
Previously, for GQA, the processing flow was:
`SplitPackedQKVProgram -> GeneratePositionIDs -> FusedQKRotaryEmbedding
-> FlashAttention`

After this change, the pipeline becomes:
`SplitPackedQKVProgram -> FusedQKRotaryEmbedding -> FlashAttention`

On an NV5080, token generation speed improved by ~4% (128 tps -> 133 tps).
…er support (microsoft#26394)

### Description
Currently the files needed for wasm aren't exported from the
package at all.
The files in question are:
```
ort-wasm-simd-threaded.wasm
ort-wasm-simd-threaded.jsep.wasm
ort-wasm-simd-threaded.asyncify.wasm
ort-wasm-simd-threaded.mjs
ort-wasm-simd-threaded.jsep.mjs
ort-wasm-simd-threaded.asyncify.mjs
```

This PR changes that and adds them to `exports` field in the
`package.json`.

### Motivation and Context

Bundlers like `webpack` use the `copyPlugin` to move those files into
the `public` directory, so the files can be accessed by a stable URL.
However, more advanced, state-of-the-art bundlers like `vite` are
able to [import asset urls
directly](https://vite.dev/guide/assets.html#explicit-url-imports). Vite
takes the asset, moves it to the public assets folder (possibly renaming
the asset, adding a hash, etc.). The imported value is then the bundled
asset's final URL.

Those URLs can then be used in `env.wasm.wasmPaths` directly. In
Vite's case the full code example is:
```js
import wasmUrl from 'onnxruntime-web/ort-wasm-simd-threaded.wasm?url';
import mjsUrl from 'onnxruntime-web/ort-wasm-simd-threaded.mjs?url';

env.wasm.wasmPaths = {
  wasm: wasmUrl,
  mjs: mjsUrl,
};
```

With these added exports we can leverage more of the bundlers'
capabilities, and in Vite's case there is no need to add any
additional config. It just works.

When importing we also get proper suggestions:
<img width="1604" height="498" alt="imports"
src="https://github.com/user-attachments/assets/2678ccc2-ae46-4289-aa6e-607ecbc5388b"
/>


----

I would like additional tests to ensure that the exports are available,
but I couldn't make the `e2e` tests work on my system. I would
appreciate some guidance on that topic.
### Description
- Updates the `ORT_API_VERSION` value in onnxruntime_c_api.h to `24`.
- Edits documentation for the `TensorTypeAndShape_HasShape` API function
to indicate the correct API version (1.24).
- Adds `static_assert` to ensure that API functions for 1.23 are not
added or removed.



### Motivation and Context
The version of ORT was previously updated to 1.24.0 but we forgot to
update `ORT_API_VERSION`.
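
For illustration, a minimal hedged sketch of that `static_assert` pattern (hypothetical struct names and sizes, not ORT's actual API table): pinning the frozen 1.23 slice of the function table makes adding or removing a 1.23 function a compile error.

```c++
// Hypothetical sketch: freeze the 1.23 slice of the API function table.
#include <cstddef>

struct OrtApi_1_23 { void* fns[300]; };  // stand-in for the frozen 1.23 functions
struct OrtApi_1_24 {                     // 1.24 may only append new entries
  OrtApi_1_23 v1_23;
  void* TensorTypeAndShape_HasShape;     // new in 1.24
};

static_assert(offsetof(OrtApi_1_24, TensorTypeAndShape_HasShape) == sizeof(OrtApi_1_23),
              "API functions for 1.23 must not be added or removed");
```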
### Description
[js] Upgrade ESLint from v8 to v9 for subfolder /js/


### Motivation and Context
Previously, `model_proto` was passed by name, which triggered a copy-constructor
call instead of move construction. Using
`std::move(model_proto)` ensures that the object is constructed via move
semantics, avoiding unnecessary memory allocation and copy overhead.
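
A minimal sketch of the pattern, with stand-in types rather than the actual ORT classes:

```c++
// Take the proto by value and move it into the member: callers that pass an
// rvalue pay for a move instead of a deep copy of the whole proto.
#include <string>
#include <utility>

struct ModelProto { std::string serialized; };  // stand-in for onnx::ModelProto

class Model {
 public:
  explicit Model(ModelProto model_proto) : model_proto_(std::move(model_proto)) {}
 private:
  ModelProto model_proto_;
};

// Model m(std::move(proto));  // move-constructs the member
// Model m2(proto);            // copies once into the parameter, then moves
```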

Co-authored-by: liumingyue <mingyue@xilinx.com>
### Description
<!-- Describe your changes. -->

Change Xcode version from 16.4 to 16.2, according to what's available on
the build agent image.

Adjust casing in `xcodebuild -destination` argument.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Fix iOS packaging pipeline issues.
…26422)

### Description
Use the MayInplace hint for the FusedConv NCHW CPU kernel, which allows
the allocation planner to reuse the optional "sum" input buffer as the
kernel's output buffer. This will potentially save [this
copy](https://github.com/microsoft/onnxruntime/blob/06004826cc99dd8b8b92dbf000db3d3525716f22/onnxruntime/core/providers/cpu/nn/conv.cc#L209)
when the buffers can be shared (in most cases they can).



### Motivation and Context
The kernel already had logic to save the copy but the kernel def was
missing the hint to the allocation planner.

Found while investigating improvements to a Conv model's performance on
ARM64
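
A self-contained toy of the effect (illustrative code, not ORT's kernel): when the output may alias the optional "sum" input's buffer, the kernel accumulates in place instead of allocating a fresh output and copying "sum" into it first.

```c++
#include <cstddef>
#include <cstdio>
#include <vector>

// Output aliases the "sum" input buffer: accumulate in place, no extra copy.
void fused_conv_add_inplace(std::vector<float>& sum_in_out, const std::vector<float>& conv_out) {
  for (std::size_t i = 0; i < sum_in_out.size(); ++i) sum_in_out[i] += conv_out[i];
}

int main() {
  std::vector<float> sum{1.f, 2.f, 3.f};      // optional "sum" input, reused as output
  std::vector<float> conv{10.f, 20.f, 30.f};  // result of the convolution
  fused_conv_add_inplace(sum, conv);
  std::printf("%g %g %g\n", sum[0], sum[1], sum[2]);  // prints: 11 22 33
}
```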
### Description

Add unidirectional support for MHA. Also updated `GetCapability` to make
sure Attention only runs on WebGPU under certain conditions.
This pull request enables conditionally registering GQA with
total_sequence_length on GPU. It resolves the issue that a
MemcpyToHost is generated when graph capture is enabled (refer to
microsoft#25868). This is the last functional piece needed to support
graph capture in the WebGPU EP in ORT.

The main changes ensure that when graph capture is enabled, sequence
length information is read from GPU buffers instead of CPU memory, and
shader code generation adapts accordingly. This enables more efficient
execution and compatibility with graph-captured models.

In this PR, we still get the total sequence length from the `seqlen_k`
tensor rather than from `total_seqlen_tensor`, to stay consistent with
other parts. In the next PR, we can refactor all places to use
`total_seqlen_tensor` directly instead of `seqlen_k` when graph capture
is enabled.
ONNX's ScatterND and ScatterElements limit their indices input to int64,
but some WebNN backends only support int32 indices. As a workaround for
such backends, we can insert a Cast operation to convert the data type.
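A self-contained sketch of that workaround (an illustrative helper, not the WebNN EP's actual code): narrow the int64 indices to int32 before handing them to the backend, guarding against values that don't fit.

```c++
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

std::vector<int32_t> CastIndicesToInt32(const std::vector<int64_t>& indices) {
  std::vector<int32_t> out;
  out.reserve(indices.size());
  for (int64_t v : indices) {
    // Indices outside the int32 range cannot be represented on such backends.
    assert(v >= std::numeric_limits<int32_t>::min() &&
           v <= std::numeric_limits<int32_t>::max());
    out.push_back(static_cast<int32_t>(v));
  }
  return out;
}
```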
### Description
<!-- Describe your changes. -->

We don't want to adjust the dispatch size when running `Conv1d`.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
…#26234)

### Description
Adds APIs to allow a plugin EP to create a virtual `OrtHardwareDevice`
that can be used for model cross-compilation. For example, this allows
an EP to create a compiled model for NPU on a device that does not have
an NPU.

#### Application code
An application must explicitly allow registered plugin EPs to create
virtual devices. This is currently done by using a registration name
that ends in the `".virtual"` suffix. Ex:

```c++
#include "onnxruntime_cxx_api.h"
#include "onnxruntime_ep_device_ep_metadata_keys.h"


const char* ep_registration_name = "my_ep_lib.virtual";  // IMPORTANT: ".virtual" suffix is a signal to EP library
ort_env->RegisterExecutionProviderLibrary(ep_registration_name, "my_ep.dll");

std::vector<Ort::ConstEpDevice> ep_devices = ort_env->GetEpDevices();

// ep_devices includes an OrtEpDevice from "my_ep.dll" that uses a virtual OrtHardwareDevice.
auto it = std::find_if(ep_devices.begin(), ep_devices.end(),
                       [](const Ort::ConstEpDevice& device) {
                         return device.EpName() == std::string("MyEpName");
                       });
Ort::ConstEpDevice virtual_ep_device = *it;  // assumes the device was found

// App can look in the HW metadata to check if the device is virtual
Ort::ConstHardwareDevice virtual_hw_device = virtual_ep_device.Device();
std::unordered_map<std::string, std::string> metadata = virtual_hw_device.Metadata().GetKeyValuePairs();
assert(metadata[kOrtHardwareDevice_MetadataKey_IsVirtual] == "1");

// App can use the virtual OrtEpDevice in a session to, for example, compile a model
// ...
```

#### Plugin EP code
This PR introduces a new _optional_ C API function in the `OrtEpFactory`
struct called `SetEnvironmentOptions` that allows ORT to pass options
(as key/value pairs) to an EP factory. Currently, the only key supported
is `"allow_virtual_devices"`, which indicates to the EP factory that
creating virtual devices is allowed.

When the application registers a plugin EP library, ORT creates the
library's EP factories and checks if they implement the
`SetEnvironmentOptions` API function. If so, ORT calls
`ep_factory.SetEnvironmentOptions` with `"allow_virtual_devices"` set to
`"1"` if the EP registration name set by the application ends in the
`".virtual"` suffix (or `"0"` otherwise).

Here's an example implementation of
`OrtEpFactory::SetEnvironmentOptions` taken from a [test plugin EP that
supports a virtual
GPU](https://github.com/microsoft/onnxruntime/tree/adrianl/plugin-ep-specify-ort-hw-device/onnxruntime/test/autoep/library/example_plugin_ep_virt_gpu):
```c++
/*static*/
OrtStatus* ORT_API_CALL EpFactoryVirtualGpu::SetEnvironmentOptionsImpl(OrtEpFactory* this_ptr,
                                                                       const OrtKeyValuePairs* options) noexcept {
  auto* factory = static_cast<EpFactoryVirtualGpu*>(this_ptr);
  const char* value = factory->ort_api_.GetKeyValue(options, "allow_virtual_devices");

  if (value != nullptr) {
    factory->allow_virtual_devices_ = strcmp(value, "1") == 0;
  }

  return nullptr;
}
```

An EP factory can create a virtual hardware device within
`OrtEpFactory::GetSupportedDevices` by using a new API function called
`CreateHardwareDevice`. The EP factory is expected to own the hardware
device instance, which should be released when the factory is destroyed
via `ReleaseHardwareDevice`.

The [test plugin EP shows an
implementation](https://github.com/microsoft/onnxruntime/blob/d87f8b86406525f5801a7a9933b1ced1eb40940c/onnxruntime/test/autoep/library/example_plugin_ep_virt_gpu/ep_factory.cc#L86)
of `OrtEpFactory::GetSupportedDevices` that creates a virtual GPU
device.

```c++
/*static*/
OrtStatus* ORT_API_CALL EpFactoryVirtualGpu::GetSupportedDevicesImpl(OrtEpFactory* this_ptr,
                                                                     const OrtHardwareDevice* const* /*devices*/,
                                                                     size_t /*num_devices*/,
                                                                     OrtEpDevice** ep_devices,
                                                                     size_t max_ep_devices,
                                                                     size_t* p_num_ep_devices) noexcept {
  size_t& num_ep_devices = *p_num_ep_devices;
  auto* factory = static_cast<EpFactoryVirtualGpu*>(this_ptr);

  num_ep_devices = 0;

  // Create a virtual OrtHardwareDevice if application indicated it is allowed (e.g., for cross-compiling).
  // This example EP creates a virtual GPU OrtHardwareDevice and adds a new OrtEpDevice that uses the virtual GPU.
  if (factory->allow_virtual_devices_ && num_ep_devices < max_ep_devices) {
    OrtKeyValuePairs* hw_metadata = nullptr;
    factory->ort_api_.CreateKeyValuePairs(&hw_metadata);
    factory->ort_api_.AddKeyValuePair(hw_metadata, kOrtHardwareDevice_MetadataKey_IsVirtual, "1");

    auto* status = factory->ep_api_.CreateHardwareDevice(OrtHardwareDeviceType::OrtHardwareDeviceType_GPU,
                                                         factory->vendor_id_,
                                                         /*device_id*/ 0,
                                                         factory->vendor_.c_str(),
                                                         hw_metadata,
                                                         &factory->virtual_hw_device_);

    // ...

    OrtEpDevice* virtual_ep_device = nullptr;
    status = factory->ort_api_.GetEpApi()->CreateEpDevice(factory, factory->virtual_hw_device_, ep_metadata,
                                                          ep_options, &virtual_ep_device);

    // ...

    ep_devices[num_ep_devices++] = virtual_ep_device;
  }

  return nullptr;
}
```

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
…lues early (microsoft#26345)

### Description
Converts weights early and reverts "Properly remove in-memory references
(microsoft#25652)".
This reverts commit 3ca49d8 and makes
appropriate adjustments for the current state of the code.

This PR is made possible by, and follows on the heels of:
microsoft#26263
microsoft#25833.

Previous history:
microsoft#23979
microsoft#25320
microsoft#25626
microsoft#25652

The first change (microsoft#26263)
allows us to convert initializers to OrtValues early and save lots of
memory at model loading time.

Specifically, for the Phi-4-mini-instruct-INT4 model, the before and
after look like this:

**Before**
<img width="1204" height="124" alt="Before change DEBUG 2025-10-16
144819"
src="https://github.com/user-attachments/assets/674ff75b-057f-498a-a906-0140d59d46e6"
/>

**After**

<img width="997" height="114" alt="After change DEBUG 2025-10-16 144819"
src="https://github.com/user-attachments/assets/df1783af-7f50-4cd2-b3ad-6868f23be53f"
/>

The two peaks represent memory usage at optimization time (8.1 GB before)
and after weights memory mapping (6.5 GB). After this change, the
corresponding numbers are 3.5 GB and 4.7 GB respectively.
Most of the savings during the optimization phase come from
`ConstantFolding`, where we are able to reuse the resulting OrtValues
directly for the new initializers.
This PR concludes a series of PRs converting initializers to OrtValues.

Memory consumption before the conversion began was 9.3 GB and 6.7 GB
respectively. We are saving almost 6 GB during optimization and 2 GB in
the steady state.

(Screenshot: memory profile from before the OrtValue conversion work began.)

The model also loads about 12 seconds faster.

An example of ConstantFolding being one of the top contributors: we
duplicate memory, producing a higher peak, until Resolve disposes of the
no-longer-used initializers.

(Screenshot: "Snapshot 3: Peak on ConstantFolding / Transpose Optimizer".)

<img width="1060" height="600" alt="Snapshot 4 Peak AddInitializer from
ConstantFolding"
src="https://github.com/user-attachments/assets/dd457ec6-23ee-4efd-8c60-625d5faad61e"
/>

<img width="325" height="160" alt="image"
src="https://github.com/user-attachments/assets/37c1194d-f683-49a7-afb1-073dfbb9bbfc"
/>


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Reduce memory usage.
…ovider-bridge API (microsoft#26448)

### Description
There is a memory leak whenever an EP uses the provider-bridge API to
create a `std::unique_ptr<onnx::TypeProto>`. The `onnx::TypeProto` is
not properly deleted due to a missing `operator delete()` override for
the `TypeProto` wrapper class. This delete operator override is
necessary because the onnx library may use custom allocators.
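
A minimal sketch of the underlying C++ rule (with a stand-in allocator, not onnx's actual one): a class whose `operator new` routes through a custom allocator needs a matching `operator delete`, otherwise deletion goes through the wrong deallocator.

```c++
#include <cstddef>
#include <cstdlib>
#include <memory>

void* CustomAlloc(std::size_t n) { return std::malloc(n); }  // stand-in allocator
void CustomFree(void* p) { std::free(p); }

struct TypeProtoWrapper {
  static void* operator new(std::size_t n) { return CustomAlloc(n); }
  static void operator delete(void* p) { CustomFree(p); }  // the override that was missing
};

int main() {
  std::unique_ptr<TypeProtoWrapper> p(new TypeProtoWrapper);  // freed via the override
}
```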

Affected EPs:
Run `git grep -rn "TypeProto::Create()" onnxruntime/core/providers/`

- QNN EP: Happens when QNN EP creates an EPContext model. To reproduce,
run the `onnxruntime_provider_tests` with
`--gtest_filter=*ContextBinary*`.
-
https://github.com/microsoft/onnxruntime/blob/860d0853a6eefdf19e21b0e9982bde2ffbc8a65d/onnxruntime/core/providers/qnn/builder/onnx_ctx_model_helper.cc#L73.
- OpenVINO EP: Happens during QDQ stripping:
-
https://github.com/microsoft/onnxruntime/blob/860d0853a6eefdf19e21b0e9982bde2ffbc8a65d/onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc#L76
-
https://github.com/microsoft/onnxruntime/blob/860d0853a6eefdf19e21b0e9982bde2ffbc8a65d/onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc#L473
-
https://github.com/microsoft/onnxruntime/blob/860d0853a6eefdf19e21b0e9982bde2ffbc8a65d/onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc#L654
- NV EP:
-
https://github.com/microsoft/onnxruntime/blob/860d0853a6eefdf19e21b0e9982bde2ffbc8a65d/onnxruntime/core/providers/nv_tensorrt_rtx/nv_execution_provider_helper.cc#L213
- VitisAI EP:
-
https://github.com/microsoft/onnxruntime/blob/860d0853a6eefdf19e21b0e9982bde2ffbc8a65d/onnxruntime/core/providers/vitisai/imp/node_arg.cc#L94


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
…microsoft#26439)

### Description
Fixes microsoft#26294

When using the old model compilation approach (session option
configuration), ORT should verify that the generated output model does
not already exist. Importantly, this check should be done _before_
calling an EP's compile() method. This PR fixes this check, which was
unintentionally disabled by a [previous PR](microsoft#25455).

Note that this check also (correctly) happens _after_ calling the EP's
compile() method, but it is better to catch it early if we can.
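
A hedged sketch of the restored check order (illustrative names, not ORT's internal code):

```c++
#include <filesystem>
#include <stdexcept>
#include <string>

// Called *before* invoking the EP's compile(), so we fail fast instead of
// overwriting an existing output model after an expensive compilation.
void EnsureOutputModelDoesNotExist(const std::string& output_path) {
  if (std::filesystem::exists(output_path)) {
    throw std::runtime_error("output model already exists: " + output_path);
  }
}
// Order: EnsureOutputModelDoesNotExist(path) -> ep.Compile(...) -> re-check as a backstop.
```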



### Motivation and Context
Fixes a regression in the older compilation workflow.
### Description

This change fixes a bug that causes a crash on macOS (and potentially
other platforms using libc) at `OrtReleaseEnv`.

Instead of using static variables, they are now function-local statics so that the compiler can handle the destruction order correctly.
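
A minimal sketch of the pattern (illustrative, not the actual ORT code): a namespace-scope static has unspecified destruction order across translation units, while a function-local static is destroyed in a well-defined order relative to its construction.

```c++
#include <mutex>

struct EnvState {
  std::mutex lock;
  // ... other global environment state ...
};

// Constructed on first call; destroyed in reverse order of construction of
// function-local statics, so teardown (e.g., at OrtReleaseEnv) no longer
// races against an already-destroyed global.
EnvState& GetEnvState() {
  static EnvState state;
  return state;
}
```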


### Motivation and Context

Fixes microsoft#24579

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
### Description
Adds `Elu`, `Exp`, and `Softplus` to CoreML.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
We use them in LeelaChessZero.

---------

Co-authored-by: borg323 <borg323@users.noreply.github.com>
### Description
<!-- Describe your changes. -->

.NET MAUI 8 is out of support. See here:
https://dotnet.microsoft.com/en-us/platform/support/policy/maui

We started seeing errors about this in the NuGet packaging pipeline.
```
##[error]C:\Program Files\dotnet\sdk\9.0.306\Sdks\Microsoft.NET.Sdk\targets\Microsoft.NET.EolTargetFrameworks.targets(38,5): Error NETSDK1202: The workload 'net8.0-ios' is out of support and will not receive security updates in the future. Please refer to https://aka.ms/maui-support-policy for more information about the support policy.
```

This change updates net8.0 mobile target framework monikers to net9.0.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Fix packaging pipeline.
### Description
- Use total inference time instead of the submission time for output
statistics calculation


### Motivation and Context
- The min, max, and other statistics reported for inference were using
device submission time instead of the inference time.


@ishwar-raut1 @gaugarg-nv @thevishalagarwal @umangb-09 @gedoensmax
1. Add the SafeInt header include path to MLAS
2. Fix syntax errors in sqnbitgemm and LASX
borg323 and others added 4 commits November 1, 2025 08:49
Fixes the dump model ops feature for the MIGraphX EP on Windows. The feature
wasn't functional because of the file-saving format rules on Windows, which
are the opposite of those on Linux.

The current state of the feature gives us the opportunity to generate and save
the ONNX model after subgraph optimizations, before compiling it. This way
we can see what the model graph looks like after optimizations, and we can
use the optimized model.

---------

Co-authored-by: Uros Petkovic <urpektov@amd.com>
Add support for bias and weight_index, move subgroup_matrix_matmul_nbits
to a template, and make the program callable from other ops.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Jaswanth51 Jaswanth51 requested a review from ankitm3k November 3, 2025 03:38
@ankitm3k ankitm3k merged commit 323cfeb into ovep-develop Nov 3, 2025
6 of 8 checks passed
@ankitm3k ankitm3k deleted the sync_msft_03112025 branch November 3, 2025 05:55