feat: use resolve device in session creation by xieofxie · Pull Request #708 · microsoft/winml-cli

xieofxie · 2026-05-22T06:15:50Z

Use resolve_device in session creation to align device ep resolution with other commands.

Currently we use a manual mapping in resolve_device / resolve_eps/

We could migrate to PREFER_XPU resolution with a temp model + real session later, but it is no harm to unify how we do it.

Closes #700
Closes #656

xieofxie · 2026-05-22T08:14:27Z

after #712

…_device

DingmaomaoBJTU

Good direction — unifying session creation through resolve_device/resolve_eps eliminates a lot of duplicated logic and makes the EP selection path consistent across all commands. The _find_ep_device removal is clean.

However, there are a few issues, one of which is a crash-level bug.

Also note: DEVICE_POLICY_MAP (lines 46–51) is now dead code — no remaining caller after the policy-based fallback was removed. Should be deleted (or kept only if you plan to re-introduce policy fallback). Similarly, the class docstring (lines 98–106) still says "Policy-based device selection" and "does NOT use explicit EP provider names" which is now the opposite of what the code does.

timenick

Reviewed the resolve_device unification. The shape is right but I want to flag two behavior changes that may be unintentional, a missing regression-test anchor, and three minor / nits.

…-less CI After #708, WinMLSession with device='auto' resolves to a concrete EP via resolve_device() and force-binds it through add_provider_for_devices. On the Windows CI runner, the WinML EP registry advertises phantom NPU/GPU EP devices even without real hardware — force-binding those EPs crashes natively in InferenceSession creation (#726). The previously-failing test_run_uses_epcontext_after_compile and the adjacent device='auto' run/compile tests don't depend on auto-resolution semantics; they exercise general session mechanics. Switch them to device='cpu' so the tested behaviour is observable on CPU-only CI. test_providers_are_valid_and_include_fallback asserted the pre-#708 'auto falls back to CPU' behaviour, which #708 intentionally removed. Delete it — test_cpu_provider_always_available already covers the CPU case explicitly. Coverage note: there is no longer a unit/CI test for device='auto' runtime behaviour. The remaining device='auto' tests cover only constructor/io_config paths (no EP binding). End-to-end coverage on real hardware should live behind an e2e marker.

@e2e

…o e2e The CI flake on test_run_uses_epcontext_after_compile (and similar device='auto' compile/run tests) traces to #708: after that PR, device='auto' force-binds the first resolve_device-chosen EP via add_provider_for_devices, which segfaults natively when the WinML EP registry advertises phantom NPU/GPU EPs on a hardware-less Windows CI runner (#726). Audit the affected device='auto' tests against existing CPU-explicit coverage in the same file: - test_run_uses_epcontext_after_compile redundant with test_compile_is_idempotent - test_basic_inference redundant with test_explicit_cpu_provider + perf tests - test_inference_auto_compiles implicit in every other run-without-compile test - test_state_transitions redundant with test_ep_name_is_none/after_compile - test_reset_returns_to_initialized redundant with test_reset_clears_error_state - test_providers_are_valid_and_include_fallback asserted pre-#708 fallback behaviour that #708 removed All six are redundant. Delete them rather than mechanically rewriting to device='cpu'. Keep test_inference_with_torch_tensor (switched to device='cpu'): only test covering the torch.Tensor input-conversion path. Add test_auto_device_runtime_smoke to tests/e2e/test_session.py under the existing @e2e class marker. End-to-end coverage of the resolve_device -> add_provider_for_devices -> InferenceSession path now lives where it can rely on real hardware being present.

…o e2e (#727) ## Summary Closes #726. After #708, `WinMLSession(device="auto")` resolves to a concrete EP via `resolve_device()` and force-binds it through `add_provider_for_devices`. On the Windows CI runner the WinML EP registry advertises phantom NPU/GPU EP devices even without real hardware — force-binding those EPs segfaults natively in `InferenceSession` creation, surfacing as `Process completed with exit code 1` with no pytest traceback. The crash is non-deterministic (#719 happened to pass on a re-run while #717 failed on the same commit), so every PR is exposed to a random failure until main is fixed. ## Approach I first considered rewriting the affected tests to `device="cpu"`. On audit, almost all of them duplicate existing CPU-explicit coverage in the same file — they used `device="auto"` as a convenience, not to exercise auto-resolution semantics. Drop the redundant ones instead. | Test | Overlap | |---|---| | `test_run_uses_epcontext_after_compile` | `test_compile_is_idempotent` (compile()→COMPILED) | | `test_basic_inference` | `test_explicit_cpu_provider` + 5 perf tests already call `run(sample)` on cpu | | `test_inference_auto_compiles` | Implicit in every other test that calls `run()` without prior `compile()` | | `test_state_transitions` | `test_ep_name_is_none_before_compile` + `test_ep_name_after_compile` cover state transitions | | `test_reset_returns_to_initialized` | `test_reset_clears_error_state` exercises `reset()` | | `test_providers_are_valid_and_include_fallback` | Asserted pre-#708 'auto falls back to CPU' behaviour that #708 intentionally removed; `test_cpu_provider_always_available` covers the CPU-explicit case | Six tests deleted, one kept and converted: - **`test_inference_with_torch_tensor`** → `device="cpu"`. Sole test covering `torch.Tensor` input → numpy conversion path. ## Restoring `device="auto"` runtime coverage Added `test_auto_device_runtime_smoke` to [tests/e2e/test_session.py](tests/e2e/test_session.py) under the existing `@pytest.mark.e2e` class marker. End-to-end coverage of `resolve_device → add_provider_for_devices → InferenceSession` now lives where real hardware can be assumed. ## Verification ``` tests\unit\session\test_winml_session.py =========== 33 passed, 6 skipped in 3.02s =========== ``` The 5 fewer-passing-than-before are exactly the deleted redundant tests; nothing else moved.

hualxie added 2 commits May 22, 2026 14:05

use resolve_device in session

1a9ac23

nit

543b8f0

xieofxie requested a review from a team as a code owner May 22, 2026 06:15

github-advanced-security AI found potential problems May 22, 2026

View reviewed changes

Comment thread src/winml/modelkit/session/session.py Fixed

bugfix

b3a2688

xieofxie commented May 22, 2026

View reviewed changes

Comment thread tests/unit/session/test_winml_session.py

Merge remote-tracking branch 'origin/main' into hualxie/merge_resolve…

a2d79f4

…_device

DingmaomaoBJTU requested changes May 25, 2026

View reviewed changes

This comment was marked as outdated.

Sign in to view

timenick reviewed May 25, 2026

View reviewed changes

Comment thread src/winml/modelkit/session/session.py Outdated

Comment thread src/winml/modelkit/session/session.py

Comment thread src/winml/modelkit/session/session.py

Comment thread src/winml/modelkit/session/session.py

Comment thread tests/unit/session/test_winml_session.py

hualxie added 3 commits May 25, 2026 13:09

fix comments

40601e2

cool

b6b01f1

add test

9fd2ef5

DingmaomaoBJTU approved these changes May 25, 2026

View reviewed changes

xieofxie merged commit 3cf0de5 into main May 25, 2026
9 checks passed

xieofxie deleted the hualxie/merge_resolve_device branch May 25, 2026 06:37

timenick mentioned this pull request May 25, 2026

[session] WinMLSession(device='auto') crashes on hardware-less Windows CI after #708 #726

Closed

timenick mentioned this pull request May 25, 2026

test(session): drop redundant device='auto' tests + move auto smoke to e2e #727

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: use resolve device in session creation#708

feat: use resolve device in session creation#708
xieofxie merged 7 commits into
mainfrom
hualxie/merge_resolve_device

xieofxie commented May 22, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

xieofxie commented May 22, 2026

Uh oh!

DingmaomaoBJTU left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as outdated.

Uh oh!

timenick left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

xieofxie commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

xieofxie commented May 22, 2026

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as outdated.

Uh oh!

timenick left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

xieofxie commented May 22, 2026 •

edited

Loading