# ORT 1.24.2 release cherry pick round 1 #27330
Some PRs that use `core/common/inlined_containers.h` can cause failures in the CUDA CI pipeline:

```
E:\_work\_temp\build\RelWithDebInfo\vcpkg_installed\x64-windows-static-md\include\absl/hash/internal/hash.h(481): error #68-D: integer conversion resulted in a change of sign [E:\_work\_temp\build\RelWithDebInfo\onnxruntime_providers_cuda.vcxproj]
      sizeof(T) == -1,
      ^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

E:\_work\_temp\build\RelWithDebInfo\vcpkg_installed\x64-windows-static-md\include\absl/hash/hash.h(337): error #549-D: variable "s" is used before its value is set [E:\_work\_temp\build\RelWithDebInfo\onnxruntime_providers_cuda.vcxproj]
      return s;
             ^
E:\_work\_temp\build\RelWithDebInfo\vcpkg_installed\x64-windows-static-md\include\absl/container/internal/raw_hash_set.h(468): error #69-D: integer conversion resulted in truncation [E:\_work\_temp\build\RelWithDebInfo\onnxruntime_providers_cuda.vcxproj]
      static_cast<uint16_t>(reinterpret_cast<uintptr_t>(&seed));
      ^
3 errors detected in the compilation of "E:/_work/onnxruntime/onnxruntime/onnxruntime/contrib_ops/cuda/sparse/block_mask.cu".
```

This change adds a patch to Abseil to mitigate those failures. The solution has been verified to be effective in PR #27087.

BUG #27068

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
### Description
Enable 64-bit UDMA mode for device architecture v81 or later.

### Motivation and Context
Support 64-bit UDMA mode to run models efficiently on HTP targets v81 and above.
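For reference, a minimal sketch of targeting the HTP backend from the Python API, assuming the onnxruntime-qnn package. `backend_path` and `htp_arch` are existing QNN EP provider options; whether 64-bit UDMA mode is surfaced as its own session option, and under what key, is not stated in this PR, so no such key is shown.

```python
# Hedged sketch: select the QNN HTP backend and a target HTP generation.
# "backend_path" and "htp_arch" are real QNN EP options; the value "81" for
# a v81 target is an assumption based on the description above.
import onnxruntime as ort

qnn_options = {
    "backend_path": "libQnnHtp.so",  # HTP backend library
    "htp_arch": "81",                # HTP architecture generation
}
session = ort.InferenceSession(
    "model.onnx",
    providers=[("QNNExecutionProvider", qnn_options)],
)
```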
### Description
Reuse weight files and their underlying memory maps across shared contexts.

### Motivation and Context
This reduces resident memory when different EP shared-context sets reference the same weight file.

Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>
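A minimal Python sketch of the reuse idea (not ORT's implementation): keep one read-only memory map per weight-file path, so every shared context that references that path shares the same mapping and the OS keeps a single set of resident pages per file.

```python
import mmap

_weight_maps: dict[str, mmap.mmap] = {}

def get_weight_map(path: str) -> mmap.mmap:
    # First caller maps the file; later contexts get the cached mapping.
    if path not in _weight_maps:
        with open(path, "rb") as f:
            _weight_maps[path] = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return _weight_maps[path]
```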
…entations (#27213)

### Description
The WebGPU EP's ConvTranspose operator failed to properly validate the bias tensor shape in both the TypeScript and C++ implementations. An undefined `group` attribute caused NaN in validation checks, allowing invalid bias tensors to pass.

**TypeScript Changes** (`js/web/lib/wasm/jsep/webgpu/ops/conv-transpose.ts`):

- **Parse-time default**: Set `group` to 1 when undefined (line 135 in `parseConvTransposeAttributes`)

  ```typescript
  const group = (attributes.group as number) ?? 1; // per ONNX spec
  ```

- **Enhanced bias validation** (lines 182-192 in `validateInputs`):
  - Check bias is 1D before accessing dimensions
  - Validate bias size matches output channels: `weight.dims[1] * group`
  - Descriptive errors showing actual vs. expected values

  ```typescript
  if (inputs.length === 3) {
    if (inputs[2].dims.length !== 1) {
      throw new Error('invalid bias: bias must be 1D tensor');
    }
    const featureMaps = inputs[1].dims[1] * attributes.group;
    if (inputs[2].dims[0] !== featureMaps) {
      throw new Error(
        `invalid bias: bias size (${inputs[2].dims[0]}) must be equal to output channels (${featureMaps})`,
      );
    }
  }
  ```

**C++ Changes** (`onnxruntime/core/providers/webgpu/nn/conv_transpose.cc`):

- **Added bias validation** (lines 61-71 in `ComputeInternal`):
  - Validates bias is a 1D tensor
  - Validates bias size matches output channels (`num_output_channels = group * filter_shape[1]`)
  - Uses error messages consistent with the TypeScript implementation

  ```cpp
  // Validate bias shape if provided
  if (has_bias) {
    const auto& bias_shape = bias->Shape();
    if (bias_shape.NumDimensions() != 1) {
      return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "invalid bias: bias must be 1D tensor");
    }
    if (bias_shape[0] != num_output_channels) {
      return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
                             "invalid bias: bias size (", bias_shape[0],
                             ") must be equal to output channels (", num_output_channels, ")");
    }
  }
  ```

**Code Formatting**:
- Applied prettier formatting so the TypeScript code adheres to project style guidelines (120-character line width, proper line breaks for long error messages)

### Motivation and Context
Addresses an issue where tests with intentionally invalid bias shapes were incorrectly passing in the WebGPU EP. The fix ensures:
- Invalid bias shapes are properly rejected in both the TypeScript and C++ implementations
- NaN bugs are prevented across all TypeScript code paths that use the `group` attribute
- Clear error messages for debugging
- Consistent validation logic across both WebGPU backend implementations
- Code passes all linting and formatting checks

Note: The C++ implementation already defaults the `group` attribute to 1 in the ConvAttributes base class, so only bias validation needed to be added.

<details>
<summary>Original prompt</summary>

> <issue_title>[Web] WebGPU EP's ConvTranspose input validation seems loose</issue_title>
> <issue_description>### Describe the issue
>
> As title.
>
> The WebGPU EP's ConvTranspose operator neglects to check if the bias is of the expected shape. See tests added in #27209. The WebGPU EP "passes" those tests when a failure of some sort is expected (preferably along the lines of "bias is not of the expected shape"). Not sure if this is masking a bug of some sort.
>
> ### To reproduce
>
> Run tests in #27209 with the WebGPU EP
>
> ### Urgency
>
> Not urgent
>
> ### ONNX Runtime Installation
>
> Built from Source
>
> ### ONNX Runtime Version or Commit ID
>
> Run tests in PR branch #27209
>
> ### Execution Provider
>
> 'webgpu' (WebGPU)</issue_description>
</details>

- Fixes #27210

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: guschmue <22941064+guschmue@users.noreply.github.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This PR adds the `frameworkName` field to critical Windows ML telemetry events to ensure proper event attribution and prevent data loss.

The `frameworkName` field is added to ensure that Windows ML events are not lost and do not require joins with events that might have been emitted outside the time span the processing scripts check for long-running apps/processes. This allows each event to be self-contained with framework identification.

The following telemetry events now include the `frameworkName` field:

1. **SessionCreationStart** - Logs when session creation begins
2. **SessionCreation** - Logs session creation details including model metadata
3. **RuntimeError** - Logs runtime errors (both DEBUG and release builds)
4. **RuntimePerf** - Logs runtime performance metrics including total runs and duration
5. **AutoEpSelection** - Logs automatic execution provider selection policy and results
6. **ProviderOptions** - Logs execution provider configuration options

All events now include `TraceLoggingString(ORT_CALLER_FRAMEWORK, "frameworkName")` to maintain consistent framework identification across the telemetry pipeline.

---------

Co-authored-by: Angela Serrano Brummett <angelser@microsoft.com>
This change extends CUDA architecture handling to support family-specific codes (suffix 'f') introduced in CUDA 12.9, aligning with updates made to the Triton Inference Server repositories (backend and onnxruntime_backend).

Changes:

1. Added CUDAARCHS environment variable support (standard CMake variable)
   - Allows users to override the architecture list via an environment variable
   - Takes precedence when CMAKE_CUDA_ARCHITECTURES is not set
2. Extended regex patterns to recognize the family code suffix 'f'
   - Supports codes like 100f, 110f, 120f for CC 10.x, 11.x, 12.x families
   - Preserves the 'f' suffix during the parsing phase
3. Updated normalization logic to handle family codes
   - Family codes (ending with 'f') are preserved without adding a -real suffix
   - Traditional codes continue to receive -real or -a-real suffixes
   - Architecture-specific codes (with 'a') remain unchanged
4. Extended architecture support lists
   - Added SM 110 to ARCHITECTURES_WITH_KERNELS
   - Added SM 110 to ARCHITECTURES_WITH_ACCEL

Family-specific codes (introduced in CUDA 12.9/Blackwell) enable forward compatibility within a GPU family. For example, 100f runs on CC 10.0, 10.3, and future 10.x devices, using features common across the family.

Usage examples:
- `CUDAARCHS="75;80;90;100f;110f;120f" cmake ..`
- `cmake -DCMAKE_CUDA_ARCHITECTURES="75-real;80-real;90-real;100f;120f" ..`
- `python build.py --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="100f;110f"`

The implementation supports mixed formats in the same list:
- Traditional: 75-real, 80-real, 90-real
- Architecture-specific: 90a-real (CC 9.0 only)
- Family-specific: 100f, 110f, 120f (entire family)

Note: Current defaults still use bare numbers (75;80;90;100;120), which normalize to architecture-specific codes with an 'a' suffix. Users who want family-specific behavior should explicitly use the 'f' suffix via the CUDAARCHS environment variable or CMAKE_CUDA_ARCHITECTURES.

References:
- NVIDIA Blackwell and CUDA 12.9 Family-Specific Architecture Features: https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/
- Triton Inference Server backend updates (commit f5e901f)
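A simplified Python model of the normalization rules above (the real logic lives in CMake, and it additionally maps some bare numbers to '-a-real' variants, which is omitted here):

```python
import re

def normalize(arch: str) -> str:
    # Simplified sketch of the suffix rules described in this change.
    if re.fullmatch(r"\d+f", arch):  # family code (CUDA 12.9+): no -real added
        return arch
    if re.fullmatch(r"\d+", arch):   # bare traditional code: build real SASS only
        return arch + "-real"
    return arch                      # 90a, 80-real, etc. pass through unchanged

print([normalize(a) for a in "75;90a;100f;120".split(";")])
# ['75-real', '90a', '100f', '120-real']
```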
### Description
ArrayFeatureExtractor was vulnerable to out-of-bounds reads when
provided negative indices. The bounds check only validated upper bounds
(`y_data[i] >= stride`) but not lower bounds, allowing negative values
to read arbitrary heap memory.
**Changes:**
- Added negative index validation in `array_feature_extractor.cc` line
76: `y_data[i] < 0 || y_data[i] >= stride`
- Updated error message to clarify valid range: `must be in [0, stride)`
- Added test case `InvalidInputNegativeY` to verify rejection of
negative indices
**Example exploitation:**
```python
# Previously allowed, causing heap leak
y_data = np.array([-10], dtype=np.int64)
results = session.run(["z"], {"x": x_data, "y": y_data}) # Reads unintended memory
```
Now returns `INVALID_ARGUMENT` with diagnostic message.
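A hedged sketch of the post-fix behavior, reusing the POC model from the issue quoted below; the exact exception text may differ:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("array_feature_extractor_manual.onnx",
                               providers=["CPUExecutionProvider"])
x_data = np.arange(10, dtype=np.int64).reshape(10, 1)
try:
    # A negative index is now rejected instead of reading out of bounds
    session.run(["z"], {"x": x_data, "y": np.array([-10], dtype=np.int64)})
except Exception as e:
    print(e)  # INVALID_ARGUMENT: ... must be in [0, stride) ...
```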
### Motivation and Context
Security vulnerability allowing heap memory disclosure through negative
index values bypassing bounds validation. The operator accesses
`x_data[y_data[j]]` at line 98 without ensuring `y_data[j] >= 0`.
<details>
<summary>Original prompt</summary>
>
> ----
>
> *This section details the original issue you should resolve*
>
> <issue_title>Out-of-Bounds Read Leading to Heap Leak</issue_title>
> <issue_description>The vulnerability being exploited is a heap leak
caused by an out-of-bounds read in ONNX Runtime’s ArrayFeatureExtractor
operator. The root cause is insufficient bounds checking on the index
input, allowing negative values to access unintended memory regions.
>
> POC: the attached files show the code and its output
>
> Per Copilot:
> Type: Out-of-bounds read (OOB read) in ONNX Runtime’s
ArrayFeatureExtractor operator
> Affected Version: ≤ 1.23.2 (latest at time of report)
> Root Cause:
> In the file
onnxruntime/core/providers/cpu/ml/array_feature_extractor.cc, the code
checks if y_data[i] <= stride (where stride is the total length), but
does not check if y_data[i] >= 0.
> This means a negative index can be used, causing an out-of-bounds read
and leaking heap memory values.
>
> Example: Supplying a negative value in y_data (e.g., y_data = [-10])
bypasses bounds checking and reads unintended memory, exposing heap
data.
>
>
> **Finder's Notes**
>
> Detailed information is in the attachment, which includes complete steps to reproduce the problem.
>
> Save the model
> ```python
> import numpy as np
> import onnx
> from onnx import helper, TensorProto, checker
>
> x_shape = [ 10,1]
> x_dtype = TensorProto.INT64
>
> y_shape = [1]
> y_dtype = TensorProto.INT64
>
> z_dtype = TensorProto.INT64
> z_shape = [ 10,1]
>
> node = helper.make_node(
> op_type="ArrayFeatureExtractor",
> inputs=["x", "y"],
> outputs=["z"],
> domain="ai.onnx.ml"
> )
>
> input_x = helper.make_tensor_value_info(
> "x", x_dtype, x_shape
> )
>
> input_y = helper.make_tensor_value_info(
> "y", y_dtype, y_shape
> )
>
> output_z = helper.make_tensor_value_info(
> "z", z_dtype, z_shape
> )
>
> graph = helper.make_graph(
> nodes=[node],
> name="ArrayFeatureExtractor_Test",
> inputs=[input_x, input_y],
> outputs=[output_z]
> )
>
>
> opset_imports = [
> helper.make_opsetid("", 15),
> helper.make_opsetid("ai.onnx.ml", 3),
> ]
>
> model = helper.make_model(
> graph,
> opset_imports=opset_imports,
> producer_name="onnx-example"
> )
>
>
> onnx.save(model, "array_feature_extractor_manual.onnx")
> ```
>
> Load the model
> ```python
> import onnxruntime as ort
> import numpy as np
> session = ort.InferenceSession("array_feature_extractor_manual.onnx",
providers=["CPUExecutionProvider"])
>
>
> x_data = np.arange(10, dtype=np.int64).reshape( 10,1)
>
>
> y_data = np.array([-10], dtype=np.int64)
>
> print(x_data)
> print("Index:", y_data)
>
>
> results = session.run(
> ["z"],
> {"x": x_data, "y": y_data}
> )
>
> z_output = results[0]
>
> print(z_output)
> ```</issue_description>
</details>
- Fixes #27265
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hariharans29 <9969784+hariharans29@users.noreply.github.com>
## Description
A user reported a build error in #27269. This PR addresses several build issues and compilation warnings in the CUDA provider and associated contrib ops. These fixes ensure a clean build and improved compatibility with different CUDA versions (specifically CUDA 13.1) and compilers.

## Changes

### 1. Fix ShardedMoE Compilation Error
- Resolved a "no matching function for call to CheckInputs" error in sharded_moe.cc.
- Updated the `moe_helper::CheckInputs` call to provide the required `zero_points` arguments (passing `nullptr`), aligning with the updated function signature.

### 2. Suppress CUDA 13.1 System Header Warnings
- Added GCC/Clang diagnostic pragmas to suppress `-Wunused-parameter` warnings in `cuda_fp4.h`.
- These warnings were causing build failures in environments where warnings are treated as errors.
- Affected files:
  - onnxruntime/core/providers/cuda/cuda_common.h
  - onnxruntime/core/providers/cuda/cuda_type_conversion.h
  - onnxruntime/contrib_ops/cuda/llm/cutlass_type_conversion.h

### 3. Resolve Sign-Comparison Warnings
- Fixed several `-Wsign-compare` warnings that were being treated as errors:
  - **Pad Op:** Changed the loop variable type to `size_t` in onnxruntime/core/providers/cuda/tensor/pad.cc.
  - **Distributed Reshape:** Added explicit casts to `size_t` for `int64_t` comparisons in onnxruntime/contrib_ops/cuda/collective/distributed_reshape.cc.

## Verification
- The build now completes successfully without errors or warnings using `--cmake_extra_defines onnxruntime_USE_NCCL=ON`.
- Builds tested with CUDA 12.8, 13.0, and 13.1.1.
This PR resolves flakiness and accuracy issues in the `MatMulNBitsLutGemm` operator.

## Root Cause Analysis
The `MatMulNBitsLutGemm` operator exhibited non-deterministic flakiness and numerical accuracy issues. This analysis covers the root causes addressed by the changes.

## Identified Root Causes

### 1. Data Race in [LutGemmPackQuantBData](https://github.com/microsoft/onnxruntime/blob/cee825d34d533ca325bfd8f8269c86133ae512e6/onnxruntime/core/mlas/lib/qlutgemm.cpp#L166-L295)
- **Issue**: The weight packing loop was parallelized across output features ($N$). Since T-MAC packs multiple features into a single byte, concurrent updates to the same byte caused bit-level corruption (see the packing sketch after this summary).
- **Fix**: Serialized the sub-byte accumulation phase of the weight packing process.

### 2. Thread-Safety in Global Configuration Map
- **Issue**: `tmac_kernel_configs` (a static `std::unordered_map`) was accessed concurrently. Map insertions or rehashing during initialization could invalidate references held by other threads.
- **Fix**: Added `std::mutex` protection and modified the parameter getter to return by value.

### 3. Tiling Dimension Mismatch and Buffer Safety
- **Issue**: The orchestrator used batch size ($M$) for kernel configuration, while weights are tiled by features ($N$). Additionally, the kernel lacked clamping for partial tiles, leading to potential overruns.
- **Fix**: Synchronized the tiling logic by using $N$ for initialization, passing `TotalN` for parameter retrieval, and implementing explicit clamping and tail-case handling in the AVX2 kernel.

### Verification Results
- `MatMulNBitsLutGemm.Float32_2Bits_Asymmetric_Batch32_256x256` passed 100 consecutive iterations.
- The full MatMul2Bits suite passed all 10 tests with the standard **0.15f** tolerance.
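To make root cause 1 concrete, here is an illustrative Python sketch (not the MLAS code) of 2-bit packing in which four output features land in one byte; the `|=` is a read-modify-write, so updating adjacent features from different threads can corrupt the shared byte:

```python
def pack_2bit(weights):
    # weights: iterable of 2-bit values (0..3), one per output feature
    packed = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        byte, shift = divmod(i, 4)
        # Non-atomic update of a byte shared by four features: racy if the
        # loop over features is parallelized, as it was before this fix.
        packed[byte] |= (w & 0x3) << (2 * shift)
    return packed

print(list(pack_2bit([1, 2, 3, 0, 1])))  # [57, 1]: features 0-3 all land in byte 0
```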
### Description
This PR restores Java support on macOS arm64 and fixes Jar testing failures on the new AcesShared pool.

#### Background
Commit `5ed340f7a51f3cbdb62577a874daf2b3f23d6a93` (#26252) moved macOS builds to a faster pool (AcesShared), which reduced build time by 85%, but this pool doesn't have a JDK installed and ADO's `JavaToolInstaller` doesn't support macOS. As a result, Java binaries for macOS arm64 were temporarily removed.

#### Changes
1. Enable Java builds & tests on macOS arm64:
   * Install JDK 17: Added a script to install JDK 17 via Homebrew if missing on the agent.
   * Install Maven: Added a fallback to install Maven using curl (since wget is missing on macOS) and configured it to use the dynamically resolved JAVA_HOME.
   * Pipeline updates: Updated jar_package_testing.yml and final-jar-testing-linux.yml to run correctly on AcesShared.
2. Fix C API tests on macOS arm64:
   * Pool migration: Updated c-api-noopenmp-test-pipelines.yml to use AcesShared with the correct ImageOverride.
   * Template enhancements: Updated nuget/templates/test_macos.yml to support dynamic AgentPool and PoolDemands.
   * Fix missing artifact: Modified mac-cpu-packaging-steps.yml to explicitly copy libcustom_op_library.dylib into the testdata folder of the artifact, resolving DllNotFoundException in EndToEndTests.

### Motivation and Context
To ensure robust CI coverage for macOS arm64 (Apple Silicon) for both the Java and C APIs while using the efficient AcesShared pool.

### Testing
- Final_Jar_Testing_MacOS passed: https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=1081961&view=logs&j=f1f8e11e-a9fa-53e5-cd29-3ba2c1988550&t=f4fafe98-de38-519c-0045-d220f6898d47
### Description
Adds arm64 Windows Python packages to the build.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This pull request addresses a few issues with the Microsoft.ML.OnnxRuntime.Foundry package:
- Builds arm64, as opposed to the previous arm64ec, for Windows arm64.
- Signs the NuGet package.
- Updates the target props to check whether onnxruntime.dll exists before attempting to copy it. This is a bugfix: if one installs any non-arm64 package on an arm64 machine (for example, Microsoft.ML.OnnxRuntime.Gpu on Windows arm64), it always tries to copy the win-arm64 onnxruntime.dll, which does not exist.
- Takes a dependency on Microsoft.ML.OnnxRuntime.Gpu.Linux for the Foundry package.
## Summary
This PR addresses persistent native library loading issues in the ONNX Runtime NuGet package, specifically on macOS and Linux, by implementing a robust DllImportResolver. It also includes necessary pipeline and packaging adjustments to ensure required macOS artifacts are correctly located and validated during CI.

## Problem
#27263 reports `Unable to load shared library 'onnxruntime.dll' or one of its dependencies`. It was caused by #26415, since that commit hard-coded onnxruntime.dll even for Linux and macOS (the correct filename is libonnxruntime.so on Linux and libonnxruntime.dylib on macOS).

The NuGet test pipeline has been broken for a while, so we also need to fix the pipeline to test our change. It has the following issues:
* The macOS NuGet package is for arm64, but the vmImage `macOS-15` is x64.
* The macOS NuGet test needs libcustom_op_library.dylib, but it is not copied from artifacts to the test environment.
* The macOS artifact contains libonnxruntime.dylib and libonnxruntime.1.24.1.dylib, where libonnxruntime.dylib is a symlink. This causes an issue since the latter is excluded by the nuspec.
* The macOS NuGet test uses models from the onnx repo. However, latest onnx has some models with data types like float8 that are not supported by C#, so those model tests failed.
* The Linux NuGet test uses the docker file Dockerfile.package_ubuntu_2404_gpu, but the docker build failed due to the libnvinfer-headers-python-plugin-dev and libnvinfer-win-builder-resource10 versions.

## Changes

### 1. Robust C# DLL Resolution
The DllImportResolver has been enhanced to handle various deployment scenarios where standard .NET resolution might fail:
- **Platform-Specific Naming**: Maps extension-less library names (`onnxruntime`, `ortextensions`) to appropriate filenames (`onnxruntime.dll`, `libonnxruntime.so`, `libonnxruntime.dylib`) based on the OS.
- **Multi-Stage Probing** (sketched after this entry):
  1. **Default Loading**: Attempts `NativeLibrary.TryLoad` with the mapped name.
  2. **NuGet `runtimes` Probing**: If the above fails, it probes the `runtimes/{rid}/native/` subdirectories relative to the assembly location, covering common RIDs (`win-x64`, `linux-arm64`, `osx-arm64`, etc.).
  3. **Base Directory Fallback**: As a final attempt, it looks in `AppContext.BaseDirectory`.
- **Case-Sensitivity Handling**: Ensures lowercase extensions are used on Windows to prevent lookup failures on case-sensitive filesystems.

### 2. macOS CI/Packaging Improvements
- **Templates (test_macos.yml)**:
  - Updated to extract artifacts from TGZ files.
  - Ensures `libcustom_op_library.dylib` is placed in the expected location (`testdata/testdata`) for end-to-end tests.
  - Initializes the ONNX submodule to provide required test data.
- **Node.js**:
  - Restored the Node.js macOS test stage in c-api-noopenmp-test-pipelines.yml, configured to run on the arm64 pool (`AcesShared`).
  - Updated the test_macos.yml template to support custom agent pools (similar to the NuGet template).
- **Pipeline Config**: Adjusted agent pool selection and demands for macOS jobs to ensure stable execution.
- **Binary Robustness**: The `copy_strip_binary.sh` script now ensures `libonnxruntime.dylib` is a real file rather than a symlink, improving NuGet packaging reliability.

### 3. Test Refinements
- **Inference Tests**: Skips a specific set of pretrained-model test cases on macOS that are currently known to be flaky or unsupported in that environment, preventing noise in the CI results.

## Verification

### Pipelines
- [x] Verified in `NuGet_Test_MacOS`.
- [x] Verified in `NuGet_Test_Linux`.
- [x] Verified in Windows test pipelines.

### Net Effect
The C# bindings are now significantly more resilient to different deployment environments. The CI process for macOS is also more robust, correctly handling the artifacts required for comprehensive NuGet validation.
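For illustration only, a Python analog of the probing order described above; the actual resolver is C# using `NativeLibrary.TryLoad`, and the RID list here is abbreviated:

```python
import os
import platform

def mapped_name(stem: str) -> str:
    # Map an extension-less name to the platform's shared-library filename.
    system = platform.system()
    if system == "Windows":
        return f"{stem}.dll"
    if system == "Darwin":
        return f"lib{stem}.dylib"
    return f"lib{stem}.so"

def probe(stem: str, assembly_dir: str, base_dir: str):
    name = mapped_name(stem)
    candidates = [os.path.join(assembly_dir, name)]      # 1. default load path
    for rid in ("win-x64", "linux-arm64", "osx-arm64"):  # 2. runtimes/{rid}/native
        candidates.append(
            os.path.join(assembly_dir, "runtimes", rid, "native", name))
    candidates.append(os.path.join(base_dir, name))      # 3. base-directory fallback
    return next((p for p in candidates if os.path.exists(p)), None)

print(probe("onnxruntime", ".", "."))  # None unless the library is present
```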
…nels (#27176)

### Description
Updates the `BaseTester` class used by the `onnxruntime_provider_test` tool to support plugin EPs that use a kernel registry but compile other nodes. For example, the TRT EP only uses registered kernels for Memcpy* nodes, but compiles every other node.

### Motivation and Context
Without this change, plugin EPs that use a mix of compiled nodes and registered kernels cannot be tested with `onnxruntime_provider_test`.
Fix #27125.

This does fix the build issue on Linux, but I am not entirely sure whether it is the optimal fix.
edgchen1 reviewed on Feb 12, 2026: tools/ci_build/github/azure-pipelines/templates/mac-cpu-packing-jobs.yml

edgchen1 previously approved these changes on Feb 12, 2026

baijumeswani previously approved these changes on Feb 12, 2026
This change records the service name(s), if any, as part of the SessionCreation/ProcessInfo events. We cache the service names after the first time we compute them in order to avoid unnecessary overhead.

These changes enable a deeper understanding of ORT usage: multiple services can run inside one application in svchost, which currently obscures which services/use cases are most popular. Understanding which services are actually being used can help prioritize investments in making ORT better targeted to end users.

We have tested that the logic in GetServiceNamesForCurrentProcess accurately returns the service name for a given process.

Commit: df00e91
baijumeswani approved these changes on Feb 12, 2026

titaiwangms approved these changes on Feb 12, 2026
This PR cherry-picks the commits described above for the 1.24.2 release.