tianleiwu and others added 16 commits February 12, 2026 10:35
Some PRs that use core/common/inlined_containers.h can cause failures in
the CUDA CI pipeline.

```
E:\_work\_temp\build\RelWithDebInfo\vcpkg_installed\x64-windows-static-md\include\absl/hash/internal/hash.h(481): error #68-D: integer conversion resulted in a change of sign [E:\_work\_temp\build\RelWithDebInfo\onnxruntime_providers_cuda.vcxproj]
          sizeof(T) == -1,
                       ^
  Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

E:\_work\_temp\build\RelWithDebInfo\vcpkg_installed\x64-windows-static-md\include\absl/hash/hash.h(337): error #549-D: variable "s" is used before its value is set [E:\_work\_temp\build\RelWithDebInfo\onnxruntime_providers_cuda.vcxproj]
        return s;
               ^
E:\_work\_temp\build\RelWithDebInfo\vcpkg_installed\x64-windows-static-md\include\absl/container/internal/raw_hash_set.h(468): error #69-D: integer conversion resulted in truncation [E:\_work\_temp\build\RelWithDebInfo\onnxruntime_providers_cuda.vcxproj]
          static_cast<uint16_t>(reinterpret_cast<uintptr_t>(&seed));
                      ^
  3 errors detected in the compilation of "E:/_work/onnxruntime/onnxruntime/onnxruntime/contrib_ops/cuda/sparse/block_mask.cu".
```

This change adds a patch to Abseil to mitigate those failures.
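
For reference, NVCC front-end diagnostics like these can be silenced around a third-party include with `nv_diag_suppress` pragmas. The sketch below only illustrates that pattern, using the diagnostic numbers from the log above; the actual Abseil patch may take a different form.

```cpp
// Sketch of NVCC diagnostic suppression around a third-party header.
// The numbers correspond to the diagnostics in the log above.
#if defined(__CUDACC__)
#pragma nv_diag_suppress 68   // integer conversion resulted in a change of sign
#pragma nv_diag_suppress 549  // variable is used before its value is set
#pragma nv_diag_suppress 69   // integer conversion resulted in truncation
#endif
#include "absl/container/inlined_vector.h"
#if defined(__CUDACC__)
#pragma nv_diag_default 68    // restore default behavior afterwards
#pragma nv_diag_default 549
#pragma nv_diag_default 69
#endif
```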


This solution has been verified to be effective in PR #27087.
BUG #27068

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
### Description
Enables 64-bit UDMA mode for device architectures v81 and above.

### Motivation and Context
Supports 64-bit UDMA mode so models run efficiently on HTP targets v81 and above.
### Description
Re-uses weight files and their underlying memory maps across shared
contexts.

### Motivation and Context
This reduces resident memory when different EP shared-context sets
reference the same weight file.
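
A minimal sketch of the sharing pattern (`MappedWeightFile` and `GetOrCreateMapping` are illustrative names, not the actual ORT classes): a weak-pointer cache keyed by file path lets every shared context that references the same weight file reuse a single mapping.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Placeholder for a weight file handle plus its memory-mapped region.
struct MappedWeightFile {};

std::shared_ptr<MappedWeightFile> GetOrCreateMapping(const std::string& path) {
  static std::mutex cache_mutex;
  static std::map<std::string, std::weak_ptr<MappedWeightFile>> cache;

  std::lock_guard<std::mutex> lock(cache_mutex);
  if (auto existing = cache[path].lock()) {
    return existing;  // another shared context already mapped this file
  }
  auto mapping = std::make_shared<MappedWeightFile>();  // mmap the file here
  cache[path] = mapping;
  return mapping;
}
```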

Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>
…entations (#27213)

### Description

The WebGPU EP's ConvTranspose operator failed to properly validate the bias
tensor shape in both the TypeScript and C++ implementations. An undefined
`group` attribute caused NaN in validation checks, allowing invalid bias
tensors to pass.

**TypeScript Changes**
(`js/web/lib/wasm/jsep/webgpu/ops/conv-transpose.ts`):

- **Parse time default**: Set `group` to 1 when undefined (line 135 in
`parseConvTransposeAttributes`)
  ```typescript
  const group = (attributes.group as number) ?? 1; // per ONNX spec
  ```

- **Enhanced bias validation** (lines 182-192 in `validateInputs`):
  - Check bias is 1D before accessing dimensions
  - Validate bias size matches output channels: `weight.dims[1] * group`
  - Descriptive errors showing actual vs expected values
  ```typescript
  if (inputs.length === 3) {
    if (inputs[2].dims.length !== 1) {
      throw new Error('invalid bias: bias must be 1D tensor');
    }
    const featureMaps = inputs[1].dims[1] * attributes.group;
    if (inputs[2].dims[0] !== featureMaps) {
      throw new Error(
        `invalid bias: bias size (${inputs[2].dims[0]}) must be equal to output channels (${featureMaps})`,
      );
    }
  }
  ```

**C++ Changes**
(`onnxruntime/core/providers/webgpu/nn/conv_transpose.cc`):

- **Added bias validation** (lines 61-71 in `ComputeInternal`):
  - Validates bias is 1D tensor
  - Validates bias size matches output channels (`num_output_channels = group * filter_shape[1]`)
  - Uses consistent error messages with TypeScript implementation
  ```cpp
  // Validate bias shape if provided
  if (has_bias) {
    const auto& bias_shape = bias->Shape();
    if (bias_shape.NumDimensions() != 1) {
      return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "invalid bias: bias must be 1D tensor");
    }
    if (bias_shape[0] != num_output_channels) {
      return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "invalid bias: bias size (", bias_shape[0],
                             ") must be equal to output channels (", num_output_channels, ")");
    }
  }
  ```

**Code Formatting**:
- Applied prettier formatting to ensure TypeScript code adheres to
project style guidelines (120 character line width, proper line breaks
for long error messages)

### Motivation and Context

Addresses an issue where tests with intentionally invalid bias shapes were
incorrectly passing in the WebGPU EP. The fix ensures:
- Invalid bias shapes are properly rejected in both TypeScript and C++
implementations
- NaN bugs prevented across all code paths using `group` attribute in
TypeScript
- Clear error messages for debugging
- Consistent validation logic across both WebGPU backend implementations
- Code passes all linting and formatting checks

Note: The C++ implementation already handles `group` attribute
defaulting to 1 in the ConvAttributes base class, so only bias
validation needed to be added.


<details>

<summary>Original prompt</summary>

> 
> ----
> 
> *This section details the original issue you should resolve*
> 
> <issue_title>[Web] WebGPU EP's ConvTranspose input validation seems
loose</issue_title>
> <issue_description>### Describe the issue
> 
> As title.
> 
> The WebGPU EP's ConvTranspose operator neglects to check if the bias
is of the expected shape. See tests added in
#27209. The WebGPU EP
"passes" those tests when a failure of some sort is expected (preferably
along the lines of bias is not of the expected shape). Not sure if this
is masking a bug of some sort.
> 
> ### To reproduce
> 
> Run tests in #27209 with
the WebGPU EP
> 
> ### Urgency
> 
> Not urgent
> 
> ### ONNX Runtime Installation
> 
> Built from Source
> 
> ### ONNX Runtime Version or Commit ID
> 
> Run tests in PR branch
#27209
> 
> ### Execution Provider
> 
> 'webgpu' (WebGPU)</issue_description>
> 


</details>




- Fixes #27210


---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: guschmue <22941064+guschmue@users.noreply.github.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This PR adds the frameworkName field to critical Windows ML telemetry
events to ensure proper event attribution and prevent data loss.

The frameworkName field ensures that Windows ML events are not lost and do
not require joins with events that may have been emitted outside the time
span the processing scripts examine for long-running apps/processes. Each
event is now self-contained with framework identification.

The following telemetry events now include the frameworkName field:

1. **SessionCreationStart** - Logs when session creation begins
2. **SessionCreation** - Logs session creation details including model
metadata
3. **RuntimeError** - Logs runtime errors (both DEBUG and release
builds)
4. **RuntimePerf** - Logs runtime performance metrics including total
runs and duration
5. **AutoEpSelection** - Logs automatic execution provider selection
policy and results
6. **ProviderOptions** - Logs execution provider configuration options

All events now include TraceLoggingString(ORT_CALLER_FRAMEWORK,
"frameworkName") to maintain consistent framework identification across
the telemetry pipeline.
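
As a sketch of what such an event emission looks like (the provider name and GUID below are placeholders, and the `ORT_CALLER_FRAMEWORK` value is a stand-in; only the `TraceLoggingString(..., "frameworkName")` field mirrors the actual change):

```cpp
#include <windows.h>
#include <TraceLoggingProvider.h>

// Placeholder provider; registration via TraceLoggingRegister is omitted.
TRACELOGGING_DEFINE_PROVIDER(g_telemetry_provider, "Example.WindowsML.Telemetry",
                             (0xa1b2c3d4, 0xe5f6, 0x4711, 0x80, 0x01, 0x02, 0x03,
                              0x04, 0x05, 0x06, 0x07));

#define ORT_CALLER_FRAMEWORK "WinML"  // stand-in value for this sketch

void LogSessionCreationStart() {
  // Every event now carries frameworkName, making it self-contained.
  TraceLoggingWrite(g_telemetry_provider, "SessionCreationStart",
                    TraceLoggingString(ORT_CALLER_FRAMEWORK, "frameworkName"));
}
```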

---------

Co-authored-by: Angela Serrano Brummett <angelser@microsoft.com>
This change extends CUDA architecture handling to support
family-specific codes (suffix 'f') introduced in CUDA 12.9, aligning
with updates made to Triton Inference Server repositories (backend and
onnxruntime_backend).

Changes:
1. Added CUDAARCHS environment variable support (standard CMake variable)
   - Allows users to override the architecture list via environment variable
   - Takes precedence when CMAKE_CUDA_ARCHITECTURES is not set

2. Extended regex patterns to recognize the family code suffix 'f'
   - Supports codes like 100f, 110f, 120f for CC 10.x, 11.x, 12.x families
   - Preserves the 'f' suffix during the parsing phase

3. Updated normalization logic to handle family codes
   - Family codes (ending with 'f') are preserved without adding a -real suffix
   - Traditional codes continue to receive -real or -a-real suffixes
   - Architecture-specific codes (with 'a') remain unchanged

4. Extended architecture support lists
   - Added SM 110 to ARCHITECTURES_WITH_KERNELS
   - Added SM 110 to ARCHITECTURES_WITH_ACCEL

Family-specific codes (introduced in CUDA 12.9/Blackwell) enable forward
compatibility within a GPU family. For example, 100f runs on CC 10.0,
10.3, and future 10.x devices, using features common across the family.
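
The classification rules above can be summarized in a short sketch; C++ is used here purely for illustration, since the real normalization lives in the CMake scripts.

```cpp
#include <iostream>
#include <regex>
#include <string>

// Illustration of the normalization rules: family codes ("100f") pass
// through unchanged, while bare and 'a'-suffixed numbers get "-real".
std::string Normalize(const std::string& arch) {
  static const std::regex kToken(R"(^([0-9]+)(a|f)?$)");
  std::smatch m;
  if (!std::regex_match(arch, m, kToken)) return arch;  // e.g. "80-real" already normalized
  if (m[2] == "f") return arch;                         // family code: no -real suffix
  return arch + "-real";                                // traditional or architecture-specific
}

int main() {
  for (const std::string a : {"100f", "90a", "80", "75-real"}) {
    std::cout << a << " -> " << Normalize(a) << "\n";
  }
}
```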

Usage examples:
- `CUDAARCHS="75;80;90;100f;110f;120f" cmake ..`
- `cmake -DCMAKE_CUDA_ARCHITECTURES="75-real;80-real;90-real;100f;120f" ..`
- `python build.py --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="100f;110f"`

The implementation supports mixed formats in the same list:
- Traditional: 75-real, 80-real, 90-real
- Architecture-specific: 90a-real (CC 9.0 only)
- Family-specific: 100f, 110f, 120f (entire family)

Note: Current defaults still use bare numbers (75;80;90;100;120) which
normalize to architecture-specific codes with 'a' suffix. Users who want
family-specific behavior should explicitly use the 'f' suffix via
CUDAARCHS environment variable or CMAKE_CUDA_ARCHITECTURES.

References:
- NVIDIA Blackwell and CUDA 12.9 Family-Specific Architecture Features:
https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/
- Triton Inference Server backend updates (commit f5e901f)

### Description

ArrayFeatureExtractor was vulnerable to out-of-bounds reads when
provided negative indices. The bounds check only validated upper bounds
(`y_data[i] >= stride`) but not lower bounds, allowing negative values
to read arbitrary heap memory.

**Changes:**
- Added negative index validation in `array_feature_extractor.cc` (line 76): `y_data[i] < 0 || y_data[i] >= stride` (see the sketch below)
- Updated the error message to clarify the valid range: `must be in [0, stride)`
- Added test case `InvalidInputNegativeY` to verify rejection of negative indices
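
A minimal sketch of the corrected predicate (the real check lives inline in `array_feature_extractor.cc`):

```cpp
#include <cstdint>

// Reject both negative indices and indices at or beyond the stride;
// the valid range is [0, stride).
bool IsValidIndex(int64_t index, int64_t stride) {
  return index >= 0 && index < stride;
}
```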

**Example exploitation:**
```python
# Previously allowed, causing heap leak
y_data = np.array([-10], dtype=np.int64)
results = session.run(["z"], {"x": x_data, "y": y_data})  # Reads unintended memory
```

Now returns `INVALID_ARGUMENT` with diagnostic message.

### Motivation and Context

Security vulnerability allowing heap memory disclosure through negative
index values bypassing bounds validation. The operator accesses
`x_data[y_data[j]]` at line 98 without ensuring `y_data[j] >= 0`.


<details>

<summary>Original prompt</summary>

> 
> ----
> 
> *This section details the original issue you should resolve*
> 
> <issue_title>Out-of-Bounds Read Leading to Heap Leak</issue_title>
> <issue_description>The vulnerability being exploited is a heap leak
caused by an out-of-bounds read in ONNX Runtime’s ArrayFeatureExtractor
operator. The root cause is insufficient bounds checking on the index
input, allowing negative values to access unintended memory regions.
> 
> POC: the attached files show the code and its output.
> 
> Per Copilot:
> Type: Out-of-bounds read (OOB read) in ONNX Runtime's ArrayFeatureExtractor operator
> Affected Version: ≤ 1.23.2 (latest at time of report)
> Root Cause:
> In the file onnxruntime/core/providers/cpu/ml/array_feature_extractor.cc, the code checks if y_data[i] <= stride (where stride is the total length), but does not check if y_data[i] >= 0.
> This means a negative index can be used, causing an out-of-bounds read and leaking heap memory values.
> 
> Example: Supplying a negative value in y_data (e.g., y_data = [-10])
bypasses bounds checking and reads unintended memory, exposing heap
data.
> 
> 
> FINDER'S NOTES ------------
> 
> Detailed information is in the attachment, which includes complete steps to reproduce the problem.
> 
> Save the model
> ```
> import numpy as np
> import onnx
> from onnx import helper, TensorProto, checker
> 
> x_shape = [10, 1]
> x_dtype = TensorProto.INT64
> 
> y_shape = [1]
> y_dtype = TensorProto.INT64
> 
> z_dtype = TensorProto.INT64
> z_shape = [10, 1]
> 
> node = helper.make_node(
>     op_type="ArrayFeatureExtractor",
>     inputs=["x", "y"],
>     outputs=["z"],
>     domain="ai.onnx.ml",
> )
> 
> input_x = helper.make_tensor_value_info("x", x_dtype, x_shape)
> input_y = helper.make_tensor_value_info("y", y_dtype, y_shape)
> output_z = helper.make_tensor_value_info("z", z_dtype, z_shape)
> 
> graph = helper.make_graph(
>     nodes=[node],
>     name="ArrayFeatureExtractor_Test",
>     inputs=[input_x, input_y],
>     outputs=[output_z],
> )
> 
> opset_imports = [
>     helper.make_opsetid("", 15),
>     helper.make_opsetid("ai.onnx.ml", 3),
> ]
> 
> model = helper.make_model(
>     graph,
>     opset_imports=opset_imports,
>     producer_name="onnx-example",
> )
> 
> onnx.save(model, "array_feature_extractor_manual.onnx")
> ```
> 
> Load the model
> ```
> import onnxruntime as ort
> import numpy as np
> 
> session = ort.InferenceSession(
>     "array_feature_extractor_manual.onnx",
>     providers=["CPUExecutionProvider"],
> )
> 
> x_data = np.arange(10, dtype=np.int64).reshape(10, 1)
> y_data = np.array([-10], dtype=np.int64)
> 
> print(x_data)
> print("Index:", y_data)
> 
> results = session.run(["z"], {"x": x_data, "y": y_data})
> z_output = results[0]
> print(z_output)
> ```</issue_description>
> 


</details>




- Fixes #27265


---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hariharans29 <9969784+hariharans29@users.noreply.github.com>
## Description

A user reported a build error in #27269.

This PR addresses several build issues and compilation warnings in the
CUDA provider and associated contrib ops. These fixes ensure a clean
build and improved compatibility with different CUDA versions
(specifically CUDA 13.1) and compilers.

## Changes

### 1. Fix ShardedMoE Compilation Error
- Resolved a "no matching function for call to CheckInputs" error in
sharded_moe.cc
- Updated the `moe_helper::CheckInputs` call to provide the required
`zero_points` arguments (passing `nullptr`), aligning with the updated
function signature.

### 2. Suppress CUDA 13.1 System Header Warnings
- Added GCC/Clang diagnostic pragmas to suppress `-Wunused-parameter`
warnings in `cuda_fp4.h`.
- These warnings were causing build failures in environments where
warnings are treated as errors.
- Affected files:
    - onnxruntime/core/providers/cuda/cuda_common.h
    - onnxruntime/core/providers/cuda/cuda_type_conversion.h
    - onnxruntime/contrib_ops/cuda/llm/cutlass_type_conversion.h
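
The suppression follows the usual push/ignore/pop pattern; the sketch below shows the idea, though the exact placement in the files above may differ.

```cpp
// Silence the third-party warning only around the offending include
// (requires a CUDA toolkit that provides cuda_fp4.h).
#if defined(__GNUC__) || defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-parameter"
#endif
#include <cuda_fp4.h>
#if defined(__GNUC__) || defined(__clang__)
#pragma GCC diagnostic pop
#endif
```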

### 3. Resolve Sign-Comparison Warnings
- Fixed several `-Wsign-compare` warnings that were being treated as errors:
  - **Pad Op:** Changed the loop variable type to `size_t` in onnxruntime/core/providers/cuda/tensor/pad.cc.
  - **Distributed Reshape:** Added explicit casts to `size_t` for `int64_t` comparisons in onnxruntime/contrib_ops/cuda/collective/distributed_reshape.cc.
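
Both fix styles reduce to the same idea, illustrated below with hypothetical names (this is not the actual Pad/Reshape code):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

void Example(const std::vector<int>& dims, int64_t expected_rank) {
  for (size_t i = 0; i < dims.size(); ++i) {  // unsigned loop variable
    (void)i;
  }
  // Explicit cast at the comparison site, once the value is known non-negative.
  if (dims.size() == static_cast<size_t>(expected_rank)) {
    // ...
  }
}
```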

## Verification
- The build now completes successfully without errors or warnings using `--cmake_extra_defines onnxruntime_USE_NCCL=ON`.
- Builds tested with CUDA 12.8, 13.0, and 13.1.1.
This PR resolves flakiness and accuracy issues in the
`MatMulNBitsLutGemm` operator.

## Root Cause Analysis

The `MatMulNBitsLutGemm` operator exhibited non-deterministic flakiness
and numerical accuracy issues. This analysis covers the root causes
addressed by the changes.

## Identified Root Causes

### 1. Data Race in
[LutGemmPackQuantBData](https://github.com/microsoft/onnxruntime/blob/cee825d34d533ca325bfd8f8269c86133ae512e6/onnxruntime/core/mlas/lib/qlutgemm.cpp#L166-L295)
- **Issue**: The weight packing loop was parallelized across output
features ($N$). Since T-MAC packs multiple features into a single byte,
concurrent updates to the same byte caused bit-level corruption.
- **Fix**: Serialized the sub-byte accumulation phase of the weight
packing process.
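
The hazard can be shown schematically (this is not the actual MLAS packing code): with 2-bit weights, four features share one byte, so the `|=` becomes a racy read-modify-write once the loop is parallelized across $N$.

```cpp
#include <cstdint>
#include <thread>
#include <vector>

int main() {
  std::vector<uint8_t> packed(1, 0);
  // Non-atomic read-modify-write into a byte shared by four features.
  auto pack = [&](int n, uint8_t q) {
    packed[n / 4] |= static_cast<uint8_t>(q << ((n % 4) * 2));
  };
  std::thread t0(pack, 0, 0b11);
  std::thread t1(pack, 1, 0b10);  // same destination byte as feature 0
  t0.join();
  t1.join();
  // Serialized packing always yields 0b1011; the racy version can lose bits.
}
```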

### 2. Thread-Safety in Global Configuration Map
- **Issue**: `tmac_kernel_configs` (a static `std::unordered_map`) was
accessed concurrently. Map insertions or rehashing during initialization
could invalidate references held by other threads.
- **Fix**: Added `std::mutex` protection and modified the parameter
getter to return by value.
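
A sketch of the fix pattern, with hypothetical config fields: guard the static map with a mutex and return a copy, so callers never hold references that a rehash could invalidate.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

struct TmacKernelConfig { int tile_n; int tile_k; };  // hypothetical fields

TmacKernelConfig GetKernelConfig(const std::string& key) {
  static std::unordered_map<std::string, TmacKernelConfig> configs;
  static std::mutex configs_mutex;

  std::lock_guard<std::mutex> lock(configs_mutex);
  auto it = configs.find(key);
  if (it == configs.end()) {
    it = configs.emplace(key, TmacKernelConfig{16, 64}).first;  // lazy init
  }
  return it->second;  // return by value, never a reference into the map
}
```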

### 3. Tiling Dimension Mismatch and Buffer Safety
- **Issue**: The orchestrator used batch size ($M$) for kernel
configuration, while weights are tiled by features ($N$). Additionally,
the kernel lacked clamping for partial tiles, leading to potential
overruns.
- **Fix**: Synchronized tiling logic by using $N$ for initialization,
passing `TotalN` for parameter retrieval, and implementing explicit
clamping and tail-case handling in the AVX2 kernel.
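
The clamping idea, sketched with illustrative names (`TileN`, `TotalN`):

```cpp
#include <algorithm>
#include <cstddef>

void ProcessFeatureTiles(size_t TotalN, size_t TileN) {
  for (size_t n0 = 0; n0 < TotalN; n0 += TileN) {
    const size_t n_end = std::min(n0 + TileN, TotalN);  // tail-case clamp
    for (size_t n = n0; n < n_end; ++n) {
      (void)n;  // stand-in for per-feature work; never reads past TotalN
    }
  }
}
```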

### Verification Results
- `MatMulNBitsLutGemm.Float32_2Bits_Asymmetric_Batch32_256x256` passed
100 consecutive iterations.
- The full MatMul2Bits suite passed all 10 tests with the standard **0.15f** tolerance.
### Description

This PR restores Java support on macOS arm64 and fixes Jar testing
failures on the new AcesShared pool.

#### Background

Commit `5ed340f7a51f3cbdb62577a874daf2b3f23d6a93`
(#26252) moved macOS builds
to a faster pool (AcesShared) which reduced build time by 85%, but this
pool doesn't have JDK installed and ADO's `JavaToolInstaller` doesn't
support macOS. As a result, Java binaries for macOS arm64 were
temporarily removed.

#### Changes

1. Enable Java builds & tests on macOS ARM64:
   * Install JDK 17: Added a script to install JDK 17 via Homebrew if missing on the agent.
   * Install Maven: Added a fallback to install Maven using curl (since wget is missing on macOS) and configured it to use the dynamically resolved JAVA_HOME.
   * Pipeline updates: Updated jar_package_testing.yml and final-jar-testing-linux.yml to run correctly on AcesShared.
2. Fix C API tests on macOS ARM64:
   * Pool migration: Updated c-api-noopenmp-test-pipelines.yml to use AcesShared with the correct ImageOverride.
   * Template enhancements: Updated nuget/templates/test_macos.yml to support dynamic AgentPool and PoolDemands.
   * Fix missing artifact: Modified mac-cpu-packaging-steps.yml to explicitly copy libcustom_op_library.dylib into the testdata folder of the artifact, resolving DllNotFoundException in EndToEndTests.

### Motivation and Context

Ensures robust CI coverage for macOS ARM64 (Apple Silicon) for both the
Java and C APIs while using the efficient AcesShared pool.

### Testing

- Final_Jar_Testing_MacOS passed:
  https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=1081961&view=logs&j=f1f8e11e-a9fa-53e5-cd29-3ba2c1988550&t=f4fafe98-de38-519c-0045-d220f6898d47
### Description
Adds ARM64 Windows Python packages to the build.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This pull request addresses a few issues with the
Microsoft.ML.OnnxRuntime.Foundry package:

- Builds arm64 (as opposed to the previous arm64ec) for Windows arm64.
- Signs the NuGet package.
- Updates the target props to check whether onnxruntime.dll exists before
attempting to copy it. This fixes a bug where installing any non-arm64
package on an arm64 machine (for example, using Microsoft.ML.OnnxRuntime.Gpu
on Windows arm64) always tried to copy the win-arm64 onnxruntime.dll,
which does not exist.
- Takes a dependency on Microsoft.ML.OnnxRuntime.Gpu.Linux for the
Foundry package.
## Summary

This PR addresses persistent native library loading issues in the ONNX
Runtime NuGet package, specifically on macOS and Linux, by implementing
a robust DllImportResolver. It also includes necessary pipeline and
packaging adjustments to ensure required macOS artifacts are correctly
located and validated during CI.

## Problem
#27263 reports that
`Unable to load shared library 'onnxruntime.dll' or one of its
dependencies` occurs. This was caused by
#26415, since that commit
hard-coded onnxruntime.dll even for Linux and macOS (the correct
filename is libonnxruntime.so on Linux and libonnxruntime.dylib
on macOS).

The NuGet test pipeline has been broken for a while, so we also needed to
fix the pipeline to test our change. It had the following issues:
* The macOS NuGet package is for arm64, but the vmImage `macOS-15` is x64.
* The macOS NuGet test needs libcustom_op_library.dylib, but it is not
copied from artifacts to the test environment.
* The macOS artifact contains libonnxruntime.dylib and
libonnxruntime.1.24.1.dylib, where libonnxruntime.dylib is a symlink. This
causes an issue since the latter is excluded by the nuspec.
* The macOS NuGet test uses models from the onnx repo. However, the latest
onnx has some models with data types like float8 that are not supported by
C#, so those model tests failed.
* The Linux NuGet test uses the docker file Dockerfile.package_ubuntu_2404_gpu,
but the docker build failed due to the libnvinfer-headers-python-plugin-dev
and libnvinfer-win-builder-resource10 versions.

## Changes

### 1. Robust C# DLL Resolution

The DllImportResolver has been enhanced to handle various deployment
scenarios where standard .NET resolution might fail:

- **Platform-Specific Naming**: Maps extension-less library names
(`onnxruntime`, `ortextensions`) to appropriate filenames
(`onnxruntime.dll`, `libonnxruntime.so`, `libonnxruntime.dylib`) based
on the OS.
- **Multi-Stage Probing**:
1. **Default Loading**: Attempts `NativeLibrary.TryLoad` with the mapped
name.
2. **NuGet `runtimes` Probing**: If the above fails, it probes the
`runtimes/{rid}/native/` subdirectories relative to the assembly
location, covering common RIDs (`win-x64`, `linux-arm64`, `osx-arm64`,
etc.).
3. **Base Directory Fallback**: As a final attempt, it looks in
`AppContext.BaseDirectory`.
- **Case-Sensitivity Handling**: Ensures lowercase extensions are used
on Windows to prevent lookup failures on case-sensitive filesystems.

### 2. macOS CI/Packaging Improvements

- **Templates (test_macos.yml)**:
    - Updated to extract artifacts from TGZ files.
    - Ensures `libcustom_op_library.dylib` is placed in the expected location (`testdata/testdata`) for end-to-end tests.
    - Initializes the ONNX submodule to provide required test data.
- **Node.js**:
    - Restored the Node.js macOS test stage in c-api-noopenmp-test-pipelines.yml, configured to run on the ARM64 pool (`AcesShared`).
    - Updated the test_macos.yml template to support custom agent pools (similar to the NuGet template).
- **Pipeline Config**: Adjusted agent pool selection and demands for macOS jobs to ensure stable execution.
- **Binary Robustness**: The `copy_strip_binary.sh` script now ensures `libonnxruntime.dylib` is a real file rather than a symlink, improving NuGet packaging reliability.

### 3. Test Refinements

- **Inference Tests**: Skips a specific set of pretrained-model test
cases on macOS that are currently known to be flaky or unsupported in
that environment, preventing noise in the CI results.

## Verification

### Pipelines
- [x] Verified in `NuGet_Test_MacOS`.
- [x] Verified in `NuGet_Test_Linux`.
- [x] Verified in Windows test pipelines.

### Net Effect
The C# bindings are now significantly more resilient to different
deployment environments. The CI process for macOS is also more robust,
correctly handling the artifacts required for comprehensive NuGet
validation.
…nels (#27176)

### Description
Updates the `BaseTester` class used by the `onnxruntime_provider_test`
tool to support plugin EPs that use a kernel registry but compile other
nodes. For example, TRT EP only uses registered kernels for Memcpy*
nodes, but compiles every other node.

Without this change, plugin EPs that use a mix of compiled nodes and
registered kernels cannot be tested with `onnxruntime_provider_test`.



### Motivation and Context
Fix #27125 

It does fix the build issue on Linux, but I am not entirely sure whether
this is the optimal fix.
This change records the service name(s), if any, as part of the
SessionCreation/ProcessInfo events.
We cache the service names after the first time we calculate them in
order to avoid unnecessary overhead.
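
A hypothetical shape of that caching (the function name comes from this change; the body is illustrative):

```cpp
#include <string>
#include <vector>

const std::vector<std::wstring>& GetServiceNamesForCurrentProcess() {
  // Computed once, thread-safely, on first use; later calls reuse the result.
  static const std::vector<std::wstring> cached = [] {
    std::vector<std::wstring> names;
    // ... enumerate services hosted in this process and collect their names ...
    return names;
  }();
  return cached;
}
```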

These changes enable deeper understanding of ORT usage, since multiple
services can run inside an application in svchost, which currently
obscures our understanding of which services/use cases are most popular.
Understanding which services are actually being used can help prioritize
more investments in making ORT better targeted to end users.

We have tested that the logic in GetServiceNamesForCurrentProcess
accurately returns the service name(s) for a given process.
@titaiwangms self-requested a review February 12, 2026 23:58
@tianleiwu merged commit f34d11d into rel-1.24.2 on February 13, 2026
(95 of 101 checks passed)
@tianleiwu deleted the tlwu/rel-1.24.2_cherry_pick_round_1 branch February 13, 2026 00:01