test(session): drop redundant device='auto' tests + move auto smoke to e2e #727

Merged: timenick merged 5 commits into main from zhiwang/fix-session-auto-device-crash on May 25, 2026

@timenick (Collaborator)

Summary

Closes #726.

After #708, WinMLSession(device="auto") resolves to a concrete EP via resolve_device() and force-binds it through add_provider_for_devices. On the Windows CI runner, the WinML EP registry advertises phantom NPU/GPU EP devices even when no such hardware is present. Force-binding those EPs segfaults in native code during InferenceSession creation, so the job fails with "Process completed with exit code 1" and no pytest traceback.

The crash is non-deterministic (#719 happened to pass on a re-run while #717 failed on the same commit), so every PR is exposed to a random failure until main is fixed.
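
For readers unfamiliar with this failure mode, the following is a minimal, self-contained sketch of why a native crash leaves no pytest traceback. It is not code from this PR: `os.abort()` stands in for the native segfault, and `run_isolated` is a hypothetical helper name. The child interpreter dies before Python can format an exception, so the parent only sees a non-zero exit code.

```python
import subprocess
import sys

# Child program that dies natively. os.abort() is a stand-in (assumption)
# for the segfault inside InferenceSession creation described above.
CHILD = "import os; os.abort()"


def run_isolated(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter so a native crash cannot kill the caller."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=60,
    )


result = run_isolated(CHILD)
crashed = result.returncode != 0                 # the process ended abnormally
has_traceback = "Traceback" in result.stderr     # no Python-level traceback is expected
print(f"crashed={crashed} python_traceback={has_traceback}")
```

Running the risky call in a child process is one way to turn such a crash into an ordinary assertion failure; the PR itself takes a different route and moves the device="auto" coverage to e2e.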

Approach

I first considered rewriting the affected tests to device="cpu". On audit, almost all of them duplicate existing CPU-explicit coverage in the same file: they used device="auto" as a convenience, not to exercise auto-resolution semantics. The redundant ones are dropped instead.

| Test | Overlap |
| --- | --- |
| test_run_uses_epcontext_after_compile | test_compile_is_idempotent (compile()→COMPILED) |
| test_basic_inference | test_explicit_cpu_provider + 5 perf tests already call run(sample) on cpu |
| test_inference_auto_compiles | Implicit in every other test that calls run() without prior compile() |
| test_state_transitions | test_ep_name_is_none_before_compile + test_ep_name_after_compile cover state transitions |
| test_reset_returns_to_initialized | test_reset_clears_error_state exercises reset() |
| test_providers_are_valid_and_include_fallback | Asserted pre-#708 'auto falls back to CPU' behaviour that #708 intentionally removed; test_cpu_provider_always_available covers the CPU-explicit case |

Six tests deleted, one kept and converted:

  • test_inference_with_torch_tensor → device="cpu". Sole test covering the torch.Tensor input → numpy conversion path.

Restoring device="auto" runtime coverage

Added test_auto_device_runtime_smoke to tests/e2e/test_session.py under the existing @pytest.mark.e2e class marker. End-to-end coverage of resolve_device → add_provider_for_devices → InferenceSession now lives where real hardware can be assumed.

Verification

tests\unit\session\test_winml_session.py
=========== 33 passed, 6 skipped in 3.02s ===========

The passing count is 5 lower than before, which corresponds exactly to the deleted redundant tests; nothing else moved.

…o e2e

The CI flake on test_run_uses_epcontext_after_compile (and similar
device='auto' compile/run tests) traces to #708: after that PR,
device='auto' force-binds the first resolve_device-chosen EP via
add_provider_for_devices, which segfaults natively when the WinML EP
registry advertises phantom NPU/GPU EPs on a hardware-less Windows CI
runner (#726).

Audit the affected device='auto' tests against existing CPU-explicit
coverage in the same file:

  - test_run_uses_epcontext_after_compile  redundant with test_compile_is_idempotent
  - test_basic_inference                   redundant with test_explicit_cpu_provider + perf tests
  - test_inference_auto_compiles           implicit in every other run-without-compile test
  - test_state_transitions                 redundant with test_ep_name_is_none/after_compile
  - test_reset_returns_to_initialized      redundant with test_reset_clears_error_state
  - test_providers_are_valid_and_include_fallback  asserted pre-#708 fallback
                                                   behaviour that #708 removed

All six are redundant. Delete them rather than mechanically rewriting
to device='cpu'.

Keep test_inference_with_torch_tensor (switched to device='cpu'): only
test covering the torch.Tensor input-conversion path.

Add test_auto_device_runtime_smoke to tests/e2e/test_session.py under
the existing @e2e class marker. End-to-end coverage of the
resolve_device -> add_provider_for_devices -> InferenceSession path
now lives where it can rely on real hardware being present.
@timenick requested a review from a team as a code owner May 25, 2026 08:29
Comment thread tests/e2e/test_session.py
timenick added 2 commits May 25, 2026 16:38
Review feedback on #727: a few specific assertions from the deleted
device='auto' tests weren't pinned elsewhere on device='cpu':

- not is_compiled -> run() -> is_compiled  (implicit lazy-compile contract)
- outputs['C'].dtype == np.float32         (output dtype on a fp32 model)

Add both to test_cpu_provider_always_available so the contract stays
covered in PR-level CI without resurrecting redundant test methods.

Review feedback on #727: the new auto-device e2e test should pin the
assertions that the deleted unit tests covered, since this is now the
home of device='auto' runtime coverage.

Expand test_auto_device_runtime_smoke to include:
  - state == INITIALIZED before any work
  - not is_compiled before run (lazy-compile contract)
  - state == COMPILED after run
  - outputs['C'].dtype == np.float32
  - second run keeps COMPILED state
  - reset() -> INITIALIZED + not is_compiled

Add test_auto_device_explicit_compile_writes_epcontext to replace the
deleted test_run_uses_epcontext_after_compile (covers the explicit
compile() + run() ordering).
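
The assertion sequence listed for test_auto_device_runtime_smoke can be sketched against a toy state machine. Everything below is an assumption for illustration: `FakeSession`, `State`, `check_runtime_contract` and the input names `A`/`B` are hypothetical and not the real WinMLSession API; only the output name 'C' and the INITIALIZED/COMPILED states come from the commit message. The dtype check is left out because the sketch avoids numpy.

```python
from enum import Enum


class State(Enum):
    INITIALIZED = "initialized"
    COMPILED = "compiled"


class FakeSession:
    """Hypothetical stand-in for a session; not the real WinMLSession."""

    def __init__(self) -> None:
        self.state = State.INITIALIZED

    @property
    def is_compiled(self) -> bool:
        return self.state is State.COMPILED

    def compile(self) -> None:
        # Idempotent: compiling twice leaves the session COMPILED.
        self.state = State.COMPILED

    def run(self, inputs: dict) -> dict:
        # Lazy-compile contract: run() compiles on first use.
        if not self.is_compiled:
            self.compile()
        return {"C": [a + b for a, b in zip(inputs["A"], inputs["B"])]}

    def reset(self) -> None:
        self.state = State.INITIALIZED


def check_runtime_contract(session: FakeSession) -> None:
    sample = {"A": [1.0, 2.0], "B": [3.0, 4.0]}
    assert session.state is State.INITIALIZED   # before any work
    assert not session.is_compiled              # nothing compiled yet
    outputs = session.run(sample)
    assert session.state is State.COMPILED      # run() compiled lazily
    assert outputs["C"] == [4.0, 6.0]
    session.run(sample)                         # second run: only state matters, no binding
    assert session.state is State.COMPILED
    session.reset()
    assert session.state is State.INITIALIZED
    assert not session.is_compiled


check_runtime_contract(FakeSession())
```

The second `run` call is deliberately left unbound, since it only exercises state preservation.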
Comment thread tests/e2e/test_session.py Fixed
CodeQL flagged the second 'outputs = session.run(...)' as an unused
local. The second run only exercises state preservation; drop the
binding rather than fabricate an extra assertion.
@timenick merged commit 6361251 into main May 25, 2026
9 checks passed
@timenick deleted the zhiwang/fix-session-auto-device-crash branch May 25, 2026 09:11
Development

Successfully merging this pull request may close these issues.

[session] WinMLSession(device='auto') crashes on hardware-less Windows CI after #708

3 participants