Sync with Microsoft ONNX Runtime - 26042026#1064
Merged
ankitm3k merged 7 commits into ovep-develop on Apr 28, 2026
Conversation
…uested (microsoft#28027)

### Description
When MultiHeadAttention has only 1 output (no present_key/present_value outputs), past key/value inputs should be completely ignored, matching CPU EP semantics. The WebGPU EP was passing pastKey/pastValue TensorViews to shader creation functions even when outputCount <= 1, which affected shader cache keys and allowed past data to leak into the attention computation.

This caused the test "MultiHeadAttention Basic, one head and head-size=4 with pastKey and pastValue" to fail with output [17,18,19,20] (pastValue data) instead of the expected [9,10,11,12] (V data). The failing output matches exactly what happens when past IS used: Q·pastKey=75 dominates Q·K=35, so softmax gives ~100% weight to pastValue.

### Fix
In `applyAttention()`, introduce `effectivePastKey`/`effectivePastValue` that are set to `undefined` when `outputCount <= 1`. All downstream usage (shader creation, input arrays) uses these effective values instead of the raw parameters. This ensures:

- Shader cache keys correctly reflect the "no past" configuration
- Past tensors are never passed to any shader creation function
- Behavior matches the CPU EP (which ignores past when present outputs are null)
- GQA is unaffected (always has outputCount >= 3)
- Vanilla Attention is unaffected (always passes undefined for past)
### Description
In the CPU RNN operator's `Assign_Y_h` function, when `sequence_lens` contains a value of 0, the computation `sequence_lens[batch] - 1 = -1` produces a negative offset into the Y output buffer. `CopyVector` then reads `hidden_size` floats from heap memory before the buffer, leaking heap data into the `Y_h` output tensor. LSTM and GRU already handle zero-length sequences correctly (early return + zero-fill in the compute path), but the basic RNN operator had neither protection.

### Changes
- **rnn.cc `Compute()`**: Add an early return when `max_sequence_length == 0` that zero-fills the Y and Y_h outputs and returns immediately (matches the existing LSTM/GRU pattern)
- **rnn.cc `Assign_Y_h()`**: Add a bounds check on `last_time_step` before computing the buffer offset, guarding against both a negative index (`seq_lens = 0`) and an index >= `seq_length`; zero-fill Y_h for invalid entries

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…icrosoft#28241)

### Description
CI Python packaging pipelines now specify their packaging type (nightly vs. release) via an explicit pipeline parameter rather than the implicitly defined pipeline variable `NIGHTLY_BUILD`.

### Motivation and Context
An explicit parameter is much less error-prone than an implicitly defined pipeline variable.
### Description
Fixes three ICM incidents:

- https://portal.microsofticm.com/imp/v5/incidents/details/31000000572208/summary
- https://portal.microsofticm.com/imp/v5/incidents/details/31000000573313/summary
- https://portal.microsofticm.com/imp/v5/incidents/details/31000000575583/summary

### Motivation and Context
Fix ICM issues

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This pull request makes a small change to the CUDA label encoder kernel to address unused-parameter warnings. It marks the `attr_name` parameter of the `TryGetScalarTensorAttribute` function as unused when building with the plugin execution provider.

- Code quality improvement: marked the `attr_name` parameter as unused with `ORT_UNUSED_PARAMETER(attr_name);` to suppress compiler warnings when building with `BUILD_CUDA_EP_AS_PLUGIN`.
### Description
Pass the base timestamp for VitisAI profiling. Notify the EP that profiling has started, passing the base timestamp (in nanoseconds since epoch). The VitisAI EP can use this to:

1. Calculate relative timestamps (event_ts - base_ts) for the profiling timeline
2. Store the absolute base timestamp if needed for other purposes

### Motivation and Context
The default onnxruntime profiling JSON file contains only offset timestamps; it does not provide the base timestamp to the VitisAI EP. To combine the VitisAI timeline profiling info with the onnxruntime profiling JSON, we need to pass the base timestamp to the VitisAI EP.

---------

Signed-off-by: Andrew Luo <junpengl@amd.com>
Co-authored-by: Andrew Luo <junpengl@amd.com>
ankitm3k
approved these changes
Apr 28, 2026
Automated daily backmerge from ORT main to ovep-develop. No conflicts detected. Do NOT squash or rebase - use merge commit only.