@jatinwadhwa921
Backmerging with Msft Commits

wcy123 and others added 11 commits July 7, 2025 10:25
### Description

Add a new ORT API `GetSessionOptionConfigEntries`.

### Motivation and Context

microsoft#24887 allows plugin EPs to interface with ORT through a binary-stable interface. microsoft#24445 allows an EP to handle the extraction of EP options from the session option configurations. For an EP such as the VitisAI EP to meet these requirements, a plugin EP must be able to access all config entries in a session option.

```c++
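    // Fetch every config entry stored in the session options.
    OrtKeyValuePairs* kvps = nullptr;
    auto status = GetSessionOptionConfigEntries(session_option, &kvps);
    if (status != nullptr) {
        throw status;
    }
    // Own the returned pairs so they are released when leaving scope.
    std::unique_ptr<OrtKeyValuePairs, void (*)(OrtKeyValuePairs*)>
        config_entries(kvps,
                       ort_api.ReleaseKeyValuePairs);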
    const char* const* keys = nullptr;
    const char* const* values = nullptr;
    size_t num_keys = 0;
    // Get keys and values from the config entries
    Ort::GetApi().GetKeyValuePairs(config_entries.get(), &keys, &values, &num_keys);

    for (size_t i = 0; i < num_keys; ++i) {
        // process keys[i] and values[i]
    }
```
…5192)

### Description
This PR optimizes the Intel GPU path for the
`DP4AMatMulNBitsSmallMProgram` by tuning `tile_size` and
`tile_size_k_vec`.



### Motivation and Context
With this change, we achieved a >8% performance boost on Intel iGPUs
(Xe-LP and Xe2-LPG) for the phi-4-mini-accuracy4 model.
Follow up microsoft#24980
Fix microsoft#24556

Add ONNX RotaryEmbedding(23) following
https://github.com/onnx/onnx/blob/main/docs/Operators.md#RotaryEmbedding.
The PR uses contrib op RotaryEmbedding implementation under the hood.

The main difference between this op and the contrib op is that the
`position_ids` input of ONNX RotaryEmbedding is optional. When it is not
provided, `cos_cache` and `sin_cache` should be 3D.
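
The two cache layouts can be illustrated with a small indexing sketch. It assumes the shapes given in the ONNX operator documentation: a 2D cache of shape `(max_position_id_plus_1, rotary_dim / 2)` when `position_ids` is present, and a 3D cache of shape `(batch_size, sequence_length, rotary_dim / 2)` when it is absent. `RotaryCacheOffset` is a hypothetical helper written for this illustration, not code from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: offset of the first cos/sin value for token (b, s)
// in a flattened cache. half_dim is the last dimension of both caches.
std::size_t RotaryCacheOffset(
    const std::optional<std::vector<std::int64_t>>& position_ids,
    std::size_t b, std::size_t s,
    std::size_t sequence_length, std::size_t half_dim) {
  if (position_ids.has_value()) {
    // 2D cache: the row is selected by position_ids[b, s].
    const auto pos =
        static_cast<std::size_t>((*position_ids)[b * sequence_length + s]);
    return pos * half_dim;
  }
  // 3D cache: the row is selected directly by (b, s).
  return (b * sequence_length + s) * half_dim;
}
```

In the 3D case the caller has effectively pre-gathered the cache rows per token, so no position lookup is needed.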
…mization (microsoft#25296)

In the context of a model containing EPContext nodes, it's highly
unlikely that two EPContext nodes will produce the same results.
Furthermore, the EquivalenceClass constructor includes the node and all
its attributes in the hash calculation, which can be particularly
time-consuming when the "ep_cache_context" attribute contains a large
binary blob.

Therefore, we exclude the EPContext op from common subexpression elimination (CSE).
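
A minimal sketch of the idea, under the assumption that the exclusion amounts to an op-type check made before any hashing takes place. `NodeInfo`, `IsCseCandidate`, and `CollectCseCandidates` are hypothetical names for illustration only and do not reflect the actual ORT optimizer code.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a graph node; only the fields needed here.
struct NodeInfo {
  std::string op_type;
  std::string name;
};

// EPContext nodes are skipped up front, so their (possibly very large)
// attributes are never hashed.
bool IsCseCandidate(const NodeInfo& node) {
  return node.op_type != "EPContext";
}

// Collect the nodes that would go on to equivalence-class hashing.
std::vector<const NodeInfo*> CollectCseCandidates(
    const std::vector<NodeInfo>& nodes) {
  std::vector<const NodeInfo*> candidates;
  for (const NodeInfo& node : nodes) {
    if (IsCseCandidate(node)) {
      candidates.push_back(&node);
    }
  }
  return candidates;
}
```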
…oft#25285)

### Description



Support smooth softmax in the non-FlashAttention (non-FA) GQA implementation.


This change depends on:
- microsoft#25269



Work items:

- [x] support smooth softmax
- [x] support bias
- [x] support head sink (per-head smooth softmax)

The following will not be included in this PR:
- support for FlashAttention
- support sliding window
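
For readers unfamiliar with the terms in the work items, here is a rough sketch of the math. It assumes the common definition in which smooth softmax adds an extra term to the softmax denominator, and a head sink makes that term a per-head logit (a sink of 0 gives the plain smooth softmax, `exp(x_i) / (1 + sum_j exp(x_j))`). `SoftmaxWithSink` is a hypothetical helper, not the kernel from this PR.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax over `logits` with one extra "sink" logit that takes part in the
// normalization but produces no output. The outputs therefore sum to < 1.
std::vector<float> SoftmaxWithSink(const std::vector<float>& logits,
                                   float sink) {
  // Include the sink in the max so the exponentials stay numerically stable.
  float max_val = sink;
  for (float x : logits) {
    max_val = std::max(max_val, x);
  }

  float denom = std::exp(sink - max_val);
  std::vector<float> out(logits.size());
  for (std::size_t i = 0; i < logits.size(); ++i) {
    out[i] = std::exp(logits[i] - max_val);
    denom += out[i];
  }
  for (float& p : out) {
    p /= denom;
  }
  return out;
}
```

With per-head sinks, each attention head would call this with its own `sink` value; passing `0.0f` for every head reduces to the uniform smooth softmax.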
### Description
1. Fix the build break in the NV TRT RTX EP.
### Description

Fix the Windows build with MSVC 17.14.7 and CUDA 12.9.1.

The build failed with:
`CUDACOMPILE : nvcc error : 'cudafe++' died with status 0xC0000005
(ACCESS_VIOLATION)`

The cause is unknown (possibly a cudafe++ bug). The code change resolves the
issue; I verified it on two machines.
…icrosoft#25308)

### Description
- Infer `OrtDevice` for a plugin EP from the registered `OrtMemoryInfo`
for device memory.
- Fix potential `nullptr` dereference when a `PluginExecutionProvider`
tries to log a message without a valid logger. Now, constructing a
`PluginExecutionProvider` requires passing a valid logger.



### Motivation and Context
Address a `TODO` to properly set the `OrtDevice` for a
`PluginExecutionProvider` instance.
@jatinwadhwa921 jatinwadhwa921 requested a review from ankitm3k July 8, 2025 09:12
@jatinwadhwa921 jatinwadhwa921 merged commit 8264e36 into ovep-develop Jul 8, 2025
6 of 8 checks passed
@ankitm3k ankitm3k deleted the sync_msft_8_7_25 branch July 8, 2025 10:09