[OVEP] OpenVINO EP 1.26.0 Development Release Updates#28297
Merged
adrianlizarraga merged 372 commits into microsoft:main on May 1, 2026
…759) * ov_factory: Use 'GPU_DEVICE_ID' property to match with ORT device_id * clean up comment
Sync msft 24 7 25
Backmerging with Msft commits
Sync with Microsoft ONNX Runtime - 31/07/2025
* Add on-the-fly bfloat16->float16 conversion pass * Fix undetected bfloat16 initializers * Remove the option and make the logic implicit * Add tests * Rename detection function * Fix CI for strict aliasing rules --------- Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com>
Mild weight as input implemented to keep quantization parameters as initializers for QDQ nodes
Sync with Microsoft ONNX Runtime - 05/08/2025
Sync with Microsoft ONNX Runtime - 07/08/2025
Sync with Microsoft ONNX Runtime - 08/08/2025
DeQuantizeLinear is dangling which needs to be handled in capability.cc
Sync with Microsoft ONNX Runtime - 12/08/2025
Not setting default precision if it is not set via provider option.
#776) * Fix failing case where input onnx model is used with shared context enabled * Update onnxruntime/core/providers/openvino/openvino_execution_provider.cc Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [OVEP] Support for providing layout to input/output to OpenVINO * [OVEP] Minor bug fixes for layout feature
Sync with Microsoft ONNX Runtime - [18/08/2025]
CVS-165537 Minor fixes to partially enable contrib op tests
Sync with Microsoft ONNX Runtime - 2404202
#1046) * Fix incorrect device selection(NPU) with python app running OVEP GPU backend * apply reviwer comment * apply new change * remove last change and extra space * remove space * cleaning * fix the name * Revert "cleaning" This reverts commit c58f3d6. * Revert "remove space" This reverts commit ba1939b. * Revert "remove last change and extra space" This reverts commit 0cc2294. * Revert "apply new change" This reverts commit bddb18b. * Revert "fix the name" This reverts commit c800fdc. * revert back new changes --------- Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com> Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Sync with Microsoft ONNX Runtime - 26042026
lint fixes for OVEP develop branch
Backmerging PR
Contributor
Author
@adrianlizarraga Please review & merge this PR.
Contributor
Pull request overview
Upstream sync of Intel’s OpenVINO EP development branch into ONNX Runtime main, adding OpenVINO 2026.0/2026.1 support and updating OVEP behavior/tests around external initializers, stateful CausalLM KV-cache handling, and profiling/perf-count dumping.
Changes:
- Add OV 2026.0/2026.1 version handling and expand supported initializer/op capability metadata.
- Update KV-cache reorder/stateful infer-request interfaces and logic (incl. new failure behavior on `RewindKVCache(index>0)` when reorder is enabled).
- Improve external initializer handling/weight sharing + add/adjust OVEP tests; add env-driven perf-count CSV dumping; adjust perftest to reset outputs for data-dependent shapes.
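The new rewind failure behavior can be pictured with a small, self-contained sketch. The class below is hypothetical (the names `KVCacheState`, `SetReorderEnabled`, `Append`, and `Rewind` are illustrative and are not OVEP's API); it models only the guard described above, under the assumption that a rewind to index 0 remains allowed as a full reset.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a stateful infer request. It tracks only the
// logical cache length and whether KV-cache reorder is enabled.
class KVCacheState {
 public:
  void SetReorderEnabled(bool enabled) { reorder_enabled_ = enabled; }

  // Pretend that `n` more tokens were written to the cache.
  void Append(std::size_t n) { length_ += n; }

  // Rewind the logical cache length to `index` tokens.
  void Rewind(std::size_t index) {
    // Assumption for this sketch: with reorder enabled, only a full reset
    // (index == 0) is accepted; any partial rewind is rejected.
    if (reorder_enabled_ && index > 0) {
      throw std::runtime_error(
          "partial rewind is not supported while KV-cache reorder is enabled");
    }
    if (index > length_) {
      throw std::out_of_range("rewind index exceeds current cache length");
    }
    length_ = index;
  }

  std::size_t Length() const { return length_; }

 private:
  bool reorder_enabled_ = false;
  std::size_t length_ = 0;
};
```

Callers that previously relied on partial rewinds with reorder enabled would, in this model, need to catch the exception or fall back to a full reset.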
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| onnxruntime/test/providers/openvino/openvino_ep_ext_init.cc | Refactors/extends OVEP external-initializer tests; adds new edge-case coverage (dynamic embed + empty raw_data). |
| onnxruntime/test/providers/cpu/tensor/quantize_linear_test.cc | Adjusts skips/exclusions for QuantizeLinear coverage across EPs. |
| onnxruntime/test/perftest/ort_test_session.cc | Resets outputs between runs to handle data-dependent output shapes in perf tests. |
| onnxruntime/test/contrib_ops/matmul_4bits_test.cc | Updates tolerances and removes OpenVINO build guard to broaden test coverage. |
| onnxruntime/test/contrib_ops/fused_matmul_op_test.cc | Narrows OpenVINO exclusion to specific failing cases; keeps TensorRT excluded for unsupported dtype. |
| onnxruntime/test/contrib_ops/embed_layer_norm_op_test.cc | Only excludes OpenVINO when 3rd output is requested; otherwise runs normally. |
| onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc | Switches offset/length parsing to from_chars for large (>4GB) external-data metadata. |
| onnxruntime/core/providers/openvino/ov_versions/data_ops.h | Adds V_2026_0 / V_2026_1 version enums. |
| onnxruntime/core/providers/openvino/ov_versions/data_ops.cc | Updates supported types/ops by OV version; adds FLOAT8 initializer types and ReduceSum no-dimension support. |
| onnxruntime/core/providers/openvino/ov_versions/capability.cc | Bumps default OV version mapping and filters orphaned graph outputs for subgraph outputs. |
| onnxruntime/core/providers/openvino/ov_stateful_patch_utils.cc | Updates cache reorder fusion to enforce mutual exclusivity of beam_idx vs src_idx/dst_idx paths. |
| onnxruntime/core/providers/openvino/ov_shared_context.h | Adds external-weight re-add validation and improves metadata insertion under lock. |
| onnxruntime/core/providers/openvino/ov_shared_context.cc | Adds locking around device-tensor mapping creation to fix races. |
| onnxruntime/core/providers/openvino/ov_interface.h | Renames KV-cache reorder API and adds cleanup hook for reorder status. |
| onnxruntime/core/providers/openvino/ov_interface.cc | Implements reorder tensor population/validation, cleanup after inference, and updated rewind behavior. |
| onnxruntime/core/providers/openvino/ov_bin_manager.cc | Adds bounds checks and changes mapped-blob tensor view construction for import. |
| onnxruntime/core/providers/openvino/openvino_provider_factory.cc | Updates provider option handling (NPU dynamic-shape defaulting; preserve user compilation params; preserve factory device_type). |
| onnxruntime/core/providers/openvino/openvino_provider_dllmain.cc | Calls ov::shutdown() on DLL unload. |
| onnxruntime/core/providers/openvino/openvino_execution_provider.cc | Renames dynamic option hook to call SetReorderKVCacheStatus. |
| onnxruntime/core/providers/openvino/ibackend.h | Makes Infer non-const and renames reorder interface. |
| onnxruntime/core/providers/openvino/exceptions.h | Adds human-readable exception type strings and adjusts error formatting. |
| onnxruntime/core/providers/openvino/backends/basic_backend.h | Adds perf-count dump helpers/state and updates interfaces for non-const infer + KV reorder rename. |
| onnxruntime/core/providers/openvino/backends/basic_backend.cc | Implements env-driven perf-count CSV dumping and skips compilation params on precompiled blob import. |
| onnxruntime/core/providers/openvino/backend_utils.h | Adds perf-count dump path accessor and changes perf-count print signatures. |
| onnxruntime/core/providers/openvino/backend_utils.cc | Implements ORT_OPENVINO_PERF_COUNT handling and CSV formatting for profiling output. |
| onnxruntime/core/providers/openvino/backend_manager.h | Renames backend-manager reorder interface. |
| onnxruntime/core/providers/openvino/backend_manager.cc | Updates export error messaging, gates OVEP QDQ stripping by OV version, adjusts external-initializer embedding heuristic, and updates debug model dumping. |
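
The `qdq_stripping.cc` row mentions parsing offsets and lengths larger than 4 GB with `from_chars`. A minimal sketch of that idea follows, assuming the metadata values arrive as decimal strings; the helper name `ParseU64` is hypothetical and is not taken from the PR.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse a decimal offset/length string into a 64-bit
// unsigned value. Returns std::nullopt on empty input, trailing garbage,
// a sign character, or overflow.
std::optional<std::uint64_t> ParseU64(std::string_view text) {
  if (text.empty()) return std::nullopt;
  std::uint64_t value = 0;
  const char* first = text.data();
  const char* last = first + text.size();
  const auto [ptr, ec] = std::from_chars(first, last, value);
  // Require that the whole string was consumed and no error was reported.
  if (ec != std::errc{} || ptr != last) return std::nullopt;
  return value;
}
```

Because the target type is `std::uint64_t`, a value such as `5000000000` (above the 32-bit limit) parses without truncation, and malformed input is reported through the return value rather than an exception.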
Comments suppressed due to low confidence (1)
onnxruntime/core/providers/openvino/ov_interface.cc:534
`PreProcessInferRequest()` sets `src_idx`/`dst_idx` twice: the block starting at `if (is_kvcache_reorder_added)` is duplicated later in the same function. This duplicates allocations/fills and makes the logic harder to maintain. Remove the second block (or factor into a helper) so the reorder tensors are prepared exactly once per inference.
if (is_kvcache_reorder_added) {
  ov::Shape dst_idx_shape = ovInfReq.get_tensor("dst_idx").get_shape();
  const auto kv_num_heads = dst_idx_shape[1];
  const auto kv_head_size = dst_idx_shape[3];
  if (kv_src_indices.size() > 0) {
    ov::Tensor src_idx_tensor = ov::Tensor(ov::element::i32, {kv_src_indices.size()});
    const auto src_idx_ptr = src_idx_tensor.data<int32_t>();
    for (size_t i = 0; i < kv_src_indices.size(); ++i) {
      src_idx_ptr[i] = static_cast<int32_t>(kv_src_indices[i]);
    }
    ovInfReq.set_tensor("src_idx", src_idx_tensor);
    ov::Tensor dst_idx_tensor = ov::Tensor(ov::element::i32, {1, kv_num_heads, kv_dst_indices.size(), kv_head_size});
    const auto dst_idx_ptr = dst_idx_tensor.data<int32_t>();
    for (size_t i = 0; i < kv_num_heads; ++i) {
      for (size_t j = 0; j < kv_dst_indices.size(); ++j) {
        std::fill_n(dst_idx_ptr + (i * kv_dst_indices.size() + j) * kv_head_size, kv_head_size, kv_dst_indices[j]);
      }
    }
    ovInfReq.set_tensor("dst_idx", dst_idx_tensor);
  } else {
    FillTensor("src_idx", ov::element::i32, {0}, 0);
    FillTensor("dst_idx", ov::element::i32, {1, kv_num_heads, 0, kv_head_size}, 0);
  }
}
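
The `dst_idx` fill in the snippet above broadcasts each destination index across a whole `[head, position]` row of a `{1, kv_num_heads, N, kv_head_size}` tensor. The sketch below reproduces just that layout with a plain `std::vector` so it can be checked without OpenVINO; `BuildDstIdx` is a hypothetical helper name, and using `int32_t` for the index values is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical standalone helper (not part of OVEP): builds the flat,
// row-major buffer for a tensor of shape {1, num_heads, dst.size(), head_size}
// in which every element of row (head, j) holds dst[j].
std::vector<std::int32_t> BuildDstIdx(const std::vector<std::int32_t>& dst,
                                      std::size_t num_heads,
                                      std::size_t head_size) {
  std::vector<std::int32_t> out(num_heads * dst.size() * head_size);
  for (std::size_t h = 0; h < num_heads; ++h) {
    for (std::size_t j = 0; j < dst.size(); ++j) {
      // Same offset arithmetic as the reviewed snippet.
      std::fill_n(out.data() + (h * dst.size() + j) * head_size, head_size, dst[j]);
    }
  }
  return out;
}
```

Factoring the fill into one helper like this is one way to follow the review suggestion of preparing the reorder tensors in a single place; an empty `dst` yields an empty buffer, which corresponds to the zero-length branch in the snippet.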
Copilot AI added a commit to intel/onnxruntime that referenced this pull request on Apr 30, 2026
Agent-Logs-Url: https://github.com/intel/onnxruntime/sessions/ccb21443-e4ea-4375-8aaa-2f953e78af4f Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Copilot AI added a commit to intel/onnxruntime that referenced this pull request on May 1, 2026
Agent-Logs-Url: https://github.com/intel/onnxruntime/sessions/b680222b-444d-4bb6-a487-d6a402683cea Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Agent-Logs-Url: https://github.com/intel/onnxruntime/sessions/b680222b-444d-4bb6-a487-d6a402683cea Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Contributor
@adrianlizarraga Addressed your review comments.
MayureshV1 approved these changes on May 1, 2026
adrianlizarraga approved these changes on May 1, 2026
tianleiwu added a commit that referenced this pull request on May 3, 2026
## Summary

Periodic upstream sync of Intel's OVEP branch (`ovep_1_26_release`) into ORT main. All changes are scoped to the OpenVINO EP and its tests.

### OpenVINO 2026.0 / 2026.1 support

- Add `V_2026_0` / `V_2026_1` version enums; `capability.cc` default bumped to `V_2026_1`.
- Register FLOAT8E4M3FN / FLOAT8E5M2 initializer types on CPU / GPU / NPU.
- Disable OVEP-level QDQ-stripping on OV ≥ 2026.1 (OV handles it internally).
- Add `ReduceSum` to no-dimension-supported ops.

### KV-cache / stateful CausalLM

- Rename `ReorderKVCache` → `SetReorderKVCacheStatus` across backend interfaces.
- Populate `src_idx` / `dst_idx` in `PreProcessInferRequest` with shape validation; clean state after inference and on `RewindKVCache`.
- `FuseCacheReorder`: `beam_idx` and `src_idx`/`dst_idx` paths are now mutually exclusive; reject models that already carry reorder inputs.
- **Behavior change:** `RewindKVCache(index > 0)` now throws when reorder is enabled (physical KV-cache eviction pass is a TODO).

### NPU / provider options

- Force `disable_dynamic_shapes=true` on NPU unless `enable_causallm` is set.
- Preserve user-supplied `NPU_COMPILATION_MODE_PARAMS`; skip it when importing precompiled blobs.
- Preserve factory-level `device_type` when session options don't override it (fixes NPU mis-selection from Python).
- **Behavior change:** removed the `ORT_OPENVINO_NPU_COMPILER_TYPE` env override; OV's default NPU compiler is now used.

### External initializers / weight sharing

- Drop the 32 MB embed threshold: always externalize when multiple external initializers are in memory.
- `DumpOpenVINOEPModel` rebuilds a self-contained proto when initializer data was stripped.
- `AddExternalWeight` validates re-adds against existing offset/size/location (parity with ABI EP); fix race in device-tensor mapping.
- `ov_bin_manager`: bounds-checked pointer view over mapped weights (fixes read-only blob import).
- `qdq_stripping`: use `std::from_chars` so offsets/lengths > 4 GB parse correctly.

### Perf-count dump

- New `ORT_OPENVINO_PERF_COUNT=<dir>` env var writes per-subgraph CSV (`Layer Name,Status,Layer Type,Real Time (us),Exec Type`), replacing the old stdout-only debug dump. Requires `ov::enable_profiling` on the compiled model; logs a warning and no-ops otherwise.

### Misc

- **API:** `IBackend::Infer` is no longer `const` (needed for perf-dump bookkeeping).
- Filter orphaned graph outputs from OVEP sub-graphs.
- Better error message for "cannot export dynamically compiled model" (points to `reshape_input`).
- Human-readable `ovep_exception::type` strings.
- `ov::shutdown()` on DLL unload.

### Tests

- Add `OVEP_ExtInit_DynamicEmbed_Tests` and `OVEP_ExtInit_EmptyRawData_Tests`; refactor setup into `SetUpTestSuite`.
- Narrow OVEP exclusions in `embed_layer_norm`, `fused_matmul`, `matmul_4bits`, `quantize_linear` (skip only unsupported sub-cases).
- `perftest`: reset outputs per run to support data-dependent output shapes (e.g. NonZero).
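
The perf-count dump described in the summary can be sketched as follows. Only the environment variable name and the CSV header come from the text above; the `PerfRow` struct, the function names, the choice to treat an empty variable as unset, and the absence of CSV escaping are all assumptions made for illustration and do not reflect the actual OVEP implementation.

```cpp
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one profiled layer.
struct PerfRow {
  std::string layer_name;
  std::string status;
  std::string layer_type;
  std::int64_t real_time_us;
  std::string exec_type;
};

// Returns the dump directory if ORT_OPENVINO_PERF_COUNT is set and non-empty.
// Treating an empty value as "unset" is an assumption of this sketch.
std::optional<std::string> PerfCountDir() {
  const char* dir = std::getenv("ORT_OPENVINO_PERF_COUNT");
  if (dir == nullptr || *dir == '\0') return std::nullopt;
  return std::string(dir);
}

// Formats rows under the CSV header quoted in the summary.
// No quoting or escaping is applied, so fields must not contain commas.
std::string FormatPerfCsv(const std::vector<PerfRow>& rows) {
  std::ostringstream out;
  out << "Layer Name,Status,Layer Type,Real Time (us),Exec Type\n";
  for (const PerfRow& r : rows) {
    out << r.layer_name << ',' << r.status << ',' << r.layer_type << ','
        << r.real_time_us << ',' << r.exec_type << '\n';
  }
  return out.str();
}
```

A caller would check `PerfCountDir()` once and, if it returns a directory, write the string from `FormatPerfCsv` to one file per subgraph inside it.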
## Testing

Validated against the OpenVINO versions this release targets (2025.3 – 2026.1) on CPU / GPU / NPU:

- New OVEP tests pass: `OVEP_ExtInit_Tests`, `OVEP_ExtInit_DynamicEmbed_Tests`, `OVEP_ExtInit_EmptyRawData_Tests`
- Narrowed contrib-op exclusions verified against EmbedLayerNorm, FusedMatMul, MatMulNBits, QuantizeLinear
- Stateful CausalLM flow exercised for KV-cache reorder + rewind
- `ORT_OPENVINO_PERF_COUNT=<dir>` verified to produce per-subgraph CSVs
- 2+ GB external-initializers-in-memory model loads on CPU / GPU / NPU

---------

Signed-off-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
Co-authored-by: Ryan Metcalfe <107415876+RyanMetcalfeInt8@users.noreply.github.com>
Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com>
Co-authored-by: Jaswanth Gannamaneni <jaswanth.gannamaneni@intel.com>
Co-authored-by: Klimenko, Mikhail <mikhail.klimenko@intel.com>
Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com>
Co-authored-by: n1harika <niharika.sathish@intel.com>
Co-authored-by: TejalKhade28 <tejal.khade@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: liang <gxgaoliang@126.com>
Co-authored-by: Javier Martinez <javier.e.martinez@intel.com>
Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: Garth Long <garth.long@intel.com>
Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>
Co-authored-by: derdeljan-msft <derdeljan@microsoft.com>
Co-authored-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
Co-authored-by: Akshay Sonawane <111780983+apsonawane@users.noreply.github.com>
Co-authored-by: Christopher Warrington <chwarr@microsoft.com>
Co-authored-by: Ishwar Raut <iraut@nvidia.com>
Co-authored-by: Gaurav Garg <gaugarg@nvidia.com>
Co-authored-by: Xinpeng Dou <15529241576@163.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: adrastogi <aditya.rastogi@microsoft.com>
Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>
Co-authored-by: qti-hungjuiw <hungjuiw@qti.qualcomm.com>
Co-authored-by: qti-yuduo <yuduow@qti.qualcomm.com>
Co-authored-by: Pradeep Sakhamoori <psakhamoori@microsoft.com>
Co-authored-by: Adam Pocock <adam.pocock@oracle.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: mingyue <131847423+mingyueliuh@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Susanta Bhattacharjee <susanta.bhattacharjee@intel.com>
Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
Co-authored-by: Jozef Wludzik <jozef.wludzik@intel.com>
Co-authored-by: Bartlomiej Filipek <bartlomiej.filipek@intel.com>
Co-authored-by: Kotomi-Du <yaru.du@intel.com>
Co-authored-by: Rajeev Sekar <rajeevsekar21@gmail.com>
Co-authored-by: Mayuresh M Varerkar <mayuresh.m.varerkar@intel.com>
Co-authored-by: Mikhail Dvoretckii <mikhail.dvoretckii@intel.com>
Co-authored-by: bopeng1234 <bo.peng@intel.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fs-eire <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Wenqin Yang <wenqin.yang@intel.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com>
Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: Christian Bourjau <cbourjau@users.noreply.github.com>
Co-authored-by: Xiaofei Han <xiaofeihan@microsoft.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: chunghow-qti <chunghow@qti.qualcomm.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: Jiawei Shao <jiawei.shao@intel.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: czekun <chen.zekun@intel.com>
Co-authored-by: Ryan Metcalfe <ryan.metcalfe@intel.com>
Co-authored-by: Jaskaran Singh Nagi <jaskaran.singh.nagi@intel.com>
Co-authored-by: ai-fw-intg <sys_ai_fw_intg@intel.com>
Co-authored-by: Rajeev Sekar <rajeev.sekar@intel.com>
Co-authored-by: RajeevSekar <117911837+RajeevSekar@users.noreply.github.com>
Co-authored-by: Nazanin Beheshti <nazanin.beheshti@intel.com>