fix: thread ep parameter through to WinMLSession by DingmaomaoBJTU · Pull Request #404 · microsoft/winml-cli

DingmaomaoBJTU · 2026-04-27T10:14:06Z

Summary

- The --ep flag was silently dropped in the model construction path: WinMLAutoModel received it but never forwarded it to WinMLPreTrainedModel → WinMLSession.
- Added an ep parameter to WinMLPreTrainedModel.__init__() and forwarded it to WinMLSession.
- Passed ep at all three winml_class(...) construction sites in WinMLAutoModel (from_pretrained, from_onnx skip-build path, from_onnx build path).
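
The forwarding described in the summary can be pictured with a minimal sketch. The class bodies below are simplified stand-ins rather than the real winml-cli code: only the class names and the `ep` parameter come from this PR, while the constructor signatures, the `model_path` argument, and the way `winml_class` is chosen are assumptions made for illustration.

```python
from typing import Optional


class WinMLSession:
    """Stand-in for the real session class; it only records its arguments."""

    def __init__(self, model_path: str, device: str = "cpu", ep: Optional[str] = None):
        self.model_path = model_path
        self.device = device
        self.ep = ep


class WinMLPreTrainedModel:
    """Stand-in model wrapper (hypothetical signature)."""

    def __init__(self, model_path: str, device: str = "cpu", ep: Optional[str] = None):
        # The fix: accept ep here and pass it on instead of dropping it.
        self.session = WinMLSession(model_path, device=device, ep=ep)


class WinMLAutoModel:
    @classmethod
    def from_onnx(cls, model_path: str, device: str = "cpu", ep: Optional[str] = None):
        # Hypothetical: the real code selects a concrete class per model type.
        winml_class = WinMLPreTrainedModel
        # The fix: every construction site forwards ep.
        return winml_class(model_path, device=device, ep=ep)


model = WinMLAutoModel.from_onnx("model.onnx", device="npu", ep="qnn")
print(model.session.ep)
```

If any layer in the chain omits the keyword, the session sees `ep=None` and silently takes the default path, which is the failure mode this PR addresses.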

Default-path behavior change

Previously, when no --ep was set, _build_session_options always fell through to set_provider_selection_policy(...) (PREFER_NPU/GPU/CPU) and let ORT pick the EP. This PR also changes that path: for non-CPU devices, it now first calls _find_ep_for_device and, if a match is found, uses add_provider_for_devices directly — bypassing the device policy. This is intentional: it gives us explicit control over which physical device is used (matching what we already do for the --ep path) and avoids WinML EP registration issues that the providers= string-based path cannot handle.
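
As a rough illustration of the default path described above, the sketch below models only the decision, not the ONNX Runtime calls themselves. `EpDevice` and `plan_default_path` are hypothetical names, and the returned tuples simply name which session-options method the real code would invoke; treat the whole block as an assumption-laden outline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EpDevice:
    """Hypothetical stand-in for one entry of ort.get_ep_devices()."""
    ep_name: str
    device_type: str  # "cpu", "gpu" or "npu"


def plan_default_path(device: str, ep_devices: List[EpDevice]) -> Tuple[str, str]:
    """Decide how the session options are configured when no --ep is given."""
    if device != "cpu":
        # New behavior: look for an EP registered for this device type first.
        for candidate in ep_devices:
            if candidate.device_type == device:
                return ("add_provider_for_devices", candidate.ep_name)
    # CPU, or no match found: fall through to the device policy.
    return ("set_provider_selection_policy", f"PREFER_{device.upper()}")


registered = [EpDevice("DmlExecutionProvider", "gpu")]
print(plan_default_path("gpu", registered))
print(plan_default_path("npu", registered))
print(plan_default_path("cpu", registered))
```

The first match wins in this sketch, so the result depends on the order in which EPs are registered.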

Fixes #402

🤖 Generated with Claude Code

The `--ep` flag was silently dropped in the model construction path. `WinMLAutoModel.from_pretrained()` and `from_onnx()` received the `ep` value but never forwarded it to `WinMLPreTrainedModel`, which in turn never passed it to `WinMLSession`. This caused `ModelCompiler` to fall back to policy-based EP selection, which it does not support, resulting in an empty provider type string and a runtime crash. Fixes #402

Policy-based EP selection (set_provider_selection_policy) does not work for InferenceSession on the current ORT build — nodes end up with an empty provider type string. When no explicit ep is set, fall back to _DEVICE_TO_EP to resolve device ("gpu"→"dml", "npu"→"qnn") and use add_provider_for_devices instead. Fixes #402

ort.get_ep_devices() may not list DML, which causes _find_ep_device to return None and the session to fall back to the broken policy-based path. Instead, resolve the providers list directly from the EP name map and pass it via the InferenceSession(providers=...) parameter, which does not depend on get_ep_devices(). Fixes #402

_find_ep_device previously matched only on ep_name and returned the first hit. When multiple devices share the same EP (e.g., integrated + discrete GPU both using DmlExecutionProvider), this could select the wrong physical device. Now also matches on OrtHardwareDeviceType, consistent with the pattern used in runtime_checker_query and winml.py. Fixes #402
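
The matching rule introduced in this commit can be sketched as follows. The `EpDevice` record, the `find_ep_device` signature, and the `ExampleExecutionProvider` entries are all hypothetical; the real code compares against `OrtHardwareDeviceType` values, which are replaced here by plain strings so the example runs without ONNX Runtime.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EpDevice:
    """Hypothetical stand-in for one entry of ort.get_ep_devices()."""
    ep_name: str
    device_type: str  # plain string in place of OrtHardwareDeviceType
    label: str


def find_ep_device(
    ep_devices: List[EpDevice], ep_name: str, device_type: str
) -> Optional[EpDevice]:
    """Return the first entry matching both the EP name and the device type."""
    for candidate in ep_devices:
        if candidate.ep_name == ep_name and candidate.device_type == device_type:
            return candidate
    return None


registry = [
    EpDevice("ExampleExecutionProvider", "cpu", "host CPU"),
    EpDevice("ExampleExecutionProvider", "gpu", "GPU adapter"),
]

# A name-only match would have returned the CPU entry listed first.
hit = find_ep_device(registry, "ExampleExecutionProvider", "gpu")
print(hit.label if hit else None)
```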

Replace the static _DEVICE_TO_EP mapping (gpu→dml, npu→qnn) with runtime discovery via ort.get_ep_devices() filtered by device type. This correctly handles machines with non-default EPs (e.g., CUDA or MIGraphX on GPU instead of DML). Fixes #402

When e2e_eval builds a model then benchmarks the resulting .onnx file, it calls _run_onnx_benchmark which created WinMLSession with only device but not ep. This was the actual failing path — config.ep was available but never forwarded to the session. Fixes #402

QNN supports GPU via Qualcomm Adreno backend, but _EP_DEVICE_MAP hardcoded it to NPU only. Change to "npu/gpu" and update _DEVICE_EP_MAP generation to split multi-device strings so QNN appears in both the NPU and GPU device lists. Fixes #402

Two issues fixed:
1. Explicit EP (--ep qnn) no longer filters by device type in _find_ep_device. QNN reports as NPU in get_ep_devices() but can target GPU; trust the user's choice.
2. InferenceSession now uses _build_session_options() (which calls add_provider_for_devices, working with WinML EP registry) instead of the providers= string parameter (which tries standard DLL loading and fails for WinML-registered EPs like QNN). Falls back to providers= only when _build_session_options returns policy-based options.

Fixes #402

WinML-registered EPs (e.g. QNN) do not support the providers= parameter in InferenceSession. Remove _resolve_providers and the conditional providers= path entirely. EP is now configured exclusively via add_provider_for_devices in _build_session_options, or left to ORT device policy in the fallback case.

timenick

Code review — 4 issues found. The core fix (threading ep from WinMLAutoModel → WinMLPreTrainedModel → WinMLSession) looks correct. Comments below are on the supporting changes in session.py.

🤖 Generated with Claude Code

- Remove unused `device` parameter from `_find_ep_device`; the caller intentionally skips device-type filtering (QNN reports as NPU but can target GPU), so the parameter was dead code that contradicted the comment at the call site
- Update `_build_session_options` docstring to document that `"cpu"` is excluded from the add_provider_for_devices path and falls through to policy-based selection
- Document registry-order dependency in `_find_ep_for_device`: when multiple EPs match the same device type the first one wins; callers that need a specific EP should set `self._ep` to bypass discovery

DingmaomaoBJTU requested a review from a team as a code owner April 27, 2026 10:14

DingmaomaoBJTU added 7 commits April 27, 2026 20:44

fix: map QNN EP to both NPU and GPU

66a0148


xieofxie reviewed Apr 28, 2026

Comment thread src/winml/modelkit/session/session.py Outdated

DingmaomaoBJTU added 2 commits April 29, 2026 14:09

Merge branch 'main' into fix/thread-ep-to-session

4fb1607

timenick reviewed Apr 29, 2026

Comment thread src/winml/modelkit/session/session.py Outdated

Comment thread src/winml/modelkit/session/session.py

Comment thread src/winml/modelkit/session/session.py

Comment thread src/winml/modelkit/session/session.py

timenick approved these changes Apr 29, 2026

DingmaomaoBJTU merged commit ef2db0b into main Apr 29, 2026
9 checks passed

DingmaomaoBJTU deleted the fix/thread-ep-to-session branch April 29, 2026 07:09

This was referenced May 6, 2026

fix(perf): forward --ep/--device through perf --module path and narrow EP discovery #440

Merged

fix(session): suppress VerifyEachNodeIsAssignedToAnEp warning for explicit EP #571

Open

DingmaomaoBJTU merged 11 commits into main from fix/thread-ep-to-session