feat: use resolve device in session creation#708
Conversation
|
after #712 |
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good direction — unifying session creation through resolve_device/resolve_eps eliminates a lot of duplicated logic and makes the EP selection path consistent across all commands. The _find_ep_device removal is clean.
However, there are a few issues, one of which is a crash-level bug.
Also note: DEVICE_POLICY_MAP (lines 46–51) is now dead code — no remaining caller after the policy-based fallback was removed. Should be deleted (or kept only if you plan to re-introduce policy fallback). Similarly, the class docstring (lines 98–106) still says "Policy-based device selection" and "does NOT use explicit EP provider names" which is now the opposite of what the code does.
timenick
left a comment
There was a problem hiding this comment.
Reviewed the resolve_device unification. The shape is right but I want to flag two behavior changes that may be unintentional, a missing regression-test anchor, and three minor / nits.
…-less CI After #708, WinMLSession with device='auto' resolves to a concrete EP via resolve_device() and force-binds it through add_provider_for_devices. On the Windows CI runner, the WinML EP registry advertises phantom NPU/GPU EP devices even without real hardware — force-binding those EPs crashes natively in InferenceSession creation (#726). The previously-failing test_run_uses_epcontext_after_compile and the adjacent device='auto' run/compile tests don't depend on auto-resolution semantics; they exercise general session mechanics. Switch them to device='cpu' so the tested behaviour is observable on CPU-only CI. test_providers_are_valid_and_include_fallback asserted the pre-#708 'auto falls back to CPU' behaviour, which #708 intentionally removed. Delete it — test_cpu_provider_always_available already covers the CPU case explicitly. Coverage note: there is no longer a unit/CI test for device='auto' runtime behaviour. The remaining device='auto' tests cover only constructor/io_config paths (no EP binding). End-to-end coverage on real hardware should live behind an e2e marker.
…o e2e The CI flake on test_run_uses_epcontext_after_compile (and similar device='auto' compile/run tests) traces to #708: after that PR, device='auto' force-binds the first resolve_device-chosen EP via add_provider_for_devices, which segfaults natively when the WinML EP registry advertises phantom NPU/GPU EPs on a hardware-less Windows CI runner (#726). Audit the affected device='auto' tests against existing CPU-explicit coverage in the same file: - test_run_uses_epcontext_after_compile redundant with test_compile_is_idempotent - test_basic_inference redundant with test_explicit_cpu_provider + perf tests - test_inference_auto_compiles implicit in every other run-without-compile test - test_state_transitions redundant with test_ep_name_is_none/after_compile - test_reset_returns_to_initialized redundant with test_reset_clears_error_state - test_providers_are_valid_and_include_fallback asserted pre-#708 fallback behaviour that #708 removed All six are redundant. Delete them rather than mechanically rewriting to device='cpu'. Keep test_inference_with_torch_tensor (switched to device='cpu'): only test covering the torch.Tensor input-conversion path. Add test_auto_device_runtime_smoke to tests/e2e/test_session.py under the existing @e2e class marker. End-to-end coverage of the resolve_device -> add_provider_for_devices -> InferenceSession path now lives where it can rely on real hardware being present.
…o e2e (#727) ## Summary Closes #726. After #708, `WinMLSession(device="auto")` resolves to a concrete EP via `resolve_device()` and force-binds it through `add_provider_for_devices`. On the Windows CI runner the WinML EP registry advertises phantom NPU/GPU EP devices even without real hardware — force-binding those EPs segfaults natively in `InferenceSession` creation, surfacing as `Process completed with exit code 1` with no pytest traceback. The crash is non-deterministic (#719 happened to pass on a re-run while #717 failed on the same commit), so every PR is exposed to a random failure until main is fixed. ## Approach I first considered rewriting the affected tests to `device="cpu"`. On audit, almost all of them duplicate existing CPU-explicit coverage in the same file — they used `device="auto"` as a convenience, not to exercise auto-resolution semantics. Drop the redundant ones instead. | Test | Overlap | |---|---| | `test_run_uses_epcontext_after_compile` | `test_compile_is_idempotent` (compile()→COMPILED) | | `test_basic_inference` | `test_explicit_cpu_provider` + 5 perf tests already call `run(sample)` on cpu | | `test_inference_auto_compiles` | Implicit in every other test that calls `run()` without prior `compile()` | | `test_state_transitions` | `test_ep_name_is_none_before_compile` + `test_ep_name_after_compile` cover state transitions | | `test_reset_returns_to_initialized` | `test_reset_clears_error_state` exercises `reset()` | | `test_providers_are_valid_and_include_fallback` | Asserted pre-#708 'auto falls back to CPU' behaviour that #708 intentionally removed; `test_cpu_provider_always_available` covers the CPU-explicit case | Six tests deleted, one kept and converted: - **`test_inference_with_torch_tensor`** → `device="cpu"`. Sole test covering `torch.Tensor` input → numpy conversion path. ## Restoring `device="auto"` runtime coverage Added `test_auto_device_runtime_smoke` to [tests/e2e/test_session.py](tests/e2e/test_session.py) under the existing `@pytest.mark.e2e` class marker. End-to-end coverage of `resolve_device → add_provider_for_devices → InferenceSession` now lives where real hardware can be assumed. ## Verification ``` tests\unit\session\test_winml_session.py =========== 33 passed, 6 skipped in 3.02s =========== ``` The 5 fewer-passing-than-before are exactly the deleted redundant tests; nothing else moved.
Use resolve_device in session creation to align device ep resolution with other commands.
Currently we use a manual mapping in resolve_device / resolve_eps/
We could migrate to PREFER_XPU resolution with a temp model + real session later, but it is no harm to unify how we do it.
Closes #700
Closes #656