
Sync with Microsoft ONNX Runtime - 26042026 #1064

Merged
ankitm3k merged 7 commits into ovep-develop from sync_msft_26042026
Apr 28, 2026
Conversation

@ai-fw-intg

Automated daily backmerge from ORT main to ovep-develop. No conflicts detected. Do NOT squash or rebase - use merge commit only.

vraspar and others added 7 commits April 27, 2026 11:40
…uested (microsoft#28027)

### Description
When MultiHeadAttention has only 1 output (no present_key/present_value
outputs), past key/value inputs should be completely ignored, matching
CPU EP semantics. The WebGPU EP was passing pastKey/pastValue
TensorViews to shader creation functions even when outputCount <= 1,
which affected shader cache keys and allowed past data to leak into the
attention computation.

This caused the test "MultiHeadAttention Basic, one head and head-size=4
with pastKey and pastValue" to fail with output [17,18,19,20] (pastValue
data) instead of expected [9,10,11,12] (V data). The failing output
matches exactly what happens when past IS used: Q·pastKey=75 dominates
Q·K=35, so softmax gives ~100% weight to pastValue.

### Fix
In `applyAttention()`, introduce `effectivePastKey`/`effectivePastValue`
that are set to `undefined` when `outputCount <= 1`. All downstream
usage (shader creation, input arrays) uses these effective values
instead of the raw parameters. This ensures:
- Shader cache keys correctly reflect the "no past" configuration
- Past tensors are never passed to any shader creation function
- Behavior matches CPU EP (which ignores past when present outputs are
null)
- GQA is unaffected (always has outputCount >= 3)
- Vanilla Attention is unaffected (always passes undefined for past)
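The failure mode described above can be checked with a few lines of arithmetic. This is a hedged sketch, not the WebGPU shader code: it assumes the standard scaled dot-product softmax with head_size = 4 (scale = 1/sqrt(4) = 0.5) and uses the dot products and value rows quoted in the description to show why the leaked past data dominates the output.

```cpp
#include <cmath>
#include <vector>

// Softmax weight assigned to the pastValue row, given the two raw
// dot products Q.pastKey and Q.K and the attention scale factor.
inline double past_weight(double q_dot_past_key, double q_dot_k, double scale) {
  const double sp = q_dot_past_key * scale;
  const double sk = q_dot_k * scale;
  const double m = std::max(sp, sk);  // subtract max for numerical stability
  const double wp = std::exp(sp - m);
  const double wk = std::exp(sk - m);
  return wp / (wp + wk);
}

// Attention output row: convex mix of the pastValue row and the V row.
inline std::vector<double> attn_row(double p_past,
                                    const std::vector<double>& past_v,
                                    const std::vector<double>& v) {
  std::vector<double> out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = p_past * past_v[i] + (1.0 - p_past) * v[i];
  return out;
}
```

With Q·pastKey = 75 and Q·K = 35, the softmax puts essentially all weight on the past row, so the output reproduces pastValue [17,18,19,20] instead of V [9,10,11,12], matching the observed test failure.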
### Description
In the CPU RNN operator's `Assign_Y_h` function, when `sequence_lens`
contains a value of 0, the computation `sequence_lens[batch] - 1 = -1`
produces a negative offset into the Y output buffer. `CopyVector` then
reads `hidden_size` floats from heap memory before the buffer, leaking
heap data into the `Y_h` output tensor.

LSTM and GRU already handle zero-length sequences correctly (early
return + zero-fill in compute path), but the basic RNN operator had
neither protection.


### Changes
- **rnn.cc `Compute()`**: add an early return when `max_sequence_length == 0`, zero-filling the Y and Y_h outputs and returning immediately (matches the existing LSTM/GRU pattern)
- **rnn.cc `Assign_Y_h()`**: add a bounds check on `last_time_step` before computing the buffer offset, guarding against both a negative index (`seq_lens = 0`) and an index >= `seq_length`, and zero-filling Y_h for invalid entries
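The bounds check can be sketched as follows. This is an illustrative stand-in for the actual `Assign_Y_h` code, not a copy of rnn.cc: the function name, signature, and the assumed Y layout of [seq_length, batch_size, hidden_size] are simplifications for the example.

```cpp
#include <algorithm>
#include <vector>

// Sketch: copy each batch entry's last valid time step of Y into Y_h,
// zero-filling when sequence_lens[batch] is 0 or out of range instead
// of computing a negative (or too-large) offset into the Y buffer.
inline void assign_y_h_sketch(const std::vector<float>& Y,
                              std::vector<float>& Y_h,
                              const std::vector<int>& sequence_lens,
                              int seq_length, int batch_size, int hidden_size) {
  for (int b = 0; b < batch_size; ++b) {
    const int last_time_step = sequence_lens[b] - 1;
    float* dst = Y_h.data() + static_cast<std::size_t>(b) * hidden_size;
    if (last_time_step < 0 || last_time_step >= seq_length) {
      // seq_lens = 0 (or an invalid length): zero-fill Y_h rather than
      // read hidden_size floats from before the start of Y.
      std::fill(dst, dst + hidden_size, 0.0f);
      continue;
    }
    const float* src = Y.data() +
        static_cast<std::size_t>(last_time_step * batch_size + b) * hidden_size;
    std::copy(src, src + hidden_size, dst);
  }
}
```

Without the `last_time_step < 0` branch, a zero-length sequence makes `src` point before `Y.data()`, which is the heap read the fix removes.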

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…icrosoft#28241)

### Description

CI Python packaging pipelines now specify their packaging type (nightly
vs. release) via an explicit pipeline parameter rather than the
implicitly defined pipeline var `NIGHTLY_BUILD`.

### Motivation and Context

Much less error prone than an implicitly defined pipeline variable.
### Description
Fixes three ICM incidents:


https://portal.microsofticm.com/imp/v5/incidents/details/31000000572208/summary

https://portal.microsofticm.com/imp/v5/incidents/details/31000000573313/summary

https://portal.microsofticm.com/imp/v5/incidents/details/31000000575583/summary


### Motivation and Context
Fix ICM issues

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This pull request makes a small change to the CUDA label encoder kernel
to address unused parameter warnings. The change marks the `attr_name`
parameter as unused in the `TryGetScalarTensorAttribute` function when
building with the plugin execution provider.

* Code quality improvement: marked the `attr_name` parameter as unused with `ORT_UNUSED_PARAMETER(attr_name);` to suppress compiler warnings when building with `BUILD_CUDA_EP_AS_PLUGIN`.
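The pattern is a one-liner; a minimal self-contained sketch follows. The macro body matches the void-cast idiom commonly used for this purpose, but the function name and signature here are simplified stand-ins, not the real kernel code.

```cpp
// Simplified stand-in for ONNX Runtime's macro: a void-cast marks a
// parameter as intentionally unused, silencing -Wunused-parameter.
#define ORT_UNUSED_PARAMETER(x) (void)(x)

// Illustrative shape of the fix: under the plugin-EP build, attr_name
// is never referenced, so it is explicitly marked unused.
inline bool TryGetScalarTensorAttributeSketch(const char* attr_name, float* out) {
  ORT_UNUSED_PARAMETER(attr_name);  // plugin build path: parameter unused
  *out = 0.0f;
  return false;  // sketch: no attribute found
}
```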
### Description
Pass the base timestamp for VitisAI profiling.

Notify the EP that profiling has started, providing the base timestamp (in
nanoseconds since epoch). The VitisAI EP can use this to:
1. Calculate relative timestamps (event_ts - base_ts) for the profiling
timeline
2. Store the absolute base timestamp if needed for other purposes
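The timeline-merging arithmetic this enables is simple; a hedged sketch (the function name is illustrative, not the EP's API):

```cpp
#include <cstdint>

// Given the absolute profiling start (nanoseconds since epoch) and an
// absolute event timestamp, derive the same relative offset that the
// onnxruntime profiling JSON records, so both timelines can be aligned.
inline int64_t relative_ts(int64_t event_ts_ns, int64_t base_ts_ns) {
  return event_ts_ns - base_ts_ns;
}
```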



### Motivation and Context
The default onnxruntime profiling JSON file contains only offset
timestamps; it does not provide the base timestamp to the VitisAI EP. To
combine the VitisAI timeline profiling info with the default onnxruntime
profiling JSON, we need to pass the base timestamp to the VitisAI EP.

---------

Signed-off-by: Andrew Luo <junpengl@amd.com>
Co-authored-by: Andrew Luo <junpengl@amd.com>
@ankitm3k ankitm3k merged commit 6750358 into ovep-develop Apr 28, 2026
6 of 8 checks passed
@ankitm3k ankitm3k deleted the sync_msft_26042026 branch April 28, 2026 18:29
