test(session): drop redundant device='auto' tests + move auto smoke to e2e #727
Merged
…o e2e

The CI flake on test_run_uses_epcontext_after_compile (and similar device='auto' compile/run tests) traces to #708: after that PR, device='auto' force-binds the first resolve_device-chosen EP via add_provider_for_devices, which segfaults natively when the WinML EP registry advertises phantom NPU/GPU EPs on a hardware-less Windows CI runner (#726).

Audit the affected device='auto' tests against existing CPU-explicit coverage in the same file:
- test_run_uses_epcontext_after_compile: redundant with test_compile_is_idempotent
- test_basic_inference: redundant with test_explicit_cpu_provider + perf tests
- test_inference_auto_compiles: implicit in every other run-without-compile test
- test_state_transitions: redundant with test_ep_name_is_none/after_compile
- test_reset_returns_to_initialized: redundant with test_reset_clears_error_state
- test_providers_are_valid_and_include_fallback: asserted pre-#708 fallback behaviour that #708 removed

All six are redundant. Delete them rather than mechanically rewriting them to device='cpu'.

Keep test_inference_with_torch_tensor (switched to device='cpu'): it is the only test covering the torch.Tensor input-conversion path.

Add test_auto_device_runtime_smoke to tests/e2e/test_session.py under the existing @e2e class marker. End-to-end coverage of the resolve_device -> add_provider_for_devices -> InferenceSession path now lives where it can rely on real hardware being present.
xieofxie reviewed May 25, 2026
Review feedback on #727: a few specific assertions from the deleted device='auto' tests weren't pinned elsewhere on device='cpu':
- not is_compiled -> run() -> is_compiled (implicit lazy-compile contract)
- outputs['C'].dtype == np.float32 (output dtype on a fp32 model)

Add both to test_cpu_provider_always_available so the contract stays covered in PR-level CI without resurrecting redundant test methods.
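The lazy-compile contract pinned on device='cpu' can be sketched against a stand-in session. `FakeSession`, its constructor, and the toy `C = A + B` model are hypothetical illustrations, not the project's `WinMLSession`. The real test also checks `outputs['C'].dtype == np.float32`; that check is left out here so the sketch needs no numpy.

```python
from typing import Dict, List


class FakeSession:
    """Hypothetical stand-in for a CPU-bound session; only lazy compile is modelled."""

    def __init__(self, device: str = "cpu") -> None:
        self.device = device
        self.is_compiled = False
        self.compile_calls = 0  # counts real compilations, to show idempotence

    def compile(self) -> None:
        # Idempotent: a second call is a no-op.
        if not self.is_compiled:
            self.compile_calls += 1
            self.is_compiled = True

    def run(self, feeds: Dict[str, List[float]]) -> Dict[str, List[float]]:
        # Lazy compile: the first run() compiles if compile() was never called.
        self.compile()
        # Toy model (assumption): C = A + B, elementwise.
        return {"C": [a + b for a, b in zip(feeds["A"], feeds["B"])]}


def check_lazy_compile_contract(session: FakeSession) -> Dict[str, List[float]]:
    """The not is_compiled -> run() -> is_compiled assertions, in order."""
    assert not session.is_compiled  # nothing compiled before the first run
    outputs = session.run({"A": [1.0, 2.0], "B": [3.0, 4.0]})
    assert session.is_compiled      # run() compiled implicitly
    return outputs
```

Because the contract only concerns when compilation happens, it can be checked on any device, which is why pinning it on device='cpu' keeps it in PR-level CI.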
Review feedback on #727: the new auto-device e2e test should pin the assertions that the deleted unit tests covered, since this is now the home of device='auto' runtime coverage. Expand test_auto_device_runtime_smoke to include:
- state == INITIALIZED before any work
- not is_compiled before run (lazy-compile contract)
- state == COMPILED after run
- outputs['C'].dtype == np.float32
- second run keeps COMPILED state
- reset() -> INITIALIZED + not is_compiled

Add test_auto_device_explicit_compile_writes_epcontext to replace the deleted test_run_uses_epcontext_after_compile (covers the explicit compile() + run() ordering).
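A sketch of the assertion order listed above, run against a hypothetical `FakeAutoSession` and a hypothetical `SessionState` enum. Only `INITIALIZED` and `COMPILED` are modelled; the real session's device resolution, EP binding, and EPContext output are not reproduced, and the dtype check is again omitted to avoid a numpy dependency. As in the later CodeQL fix, the second `run()` result is not bound, since only state preservation is checked.

```python
from enum import Enum, auto
from typing import Dict, List


class SessionState(Enum):
    # Hypothetical: only the two states the smoke test asserts on.
    INITIALIZED = auto()
    COMPILED = auto()


class FakeAutoSession:
    """Hypothetical device="auto" stand-in; no EP resolution or binding happens."""

    def __init__(self) -> None:
        self.state = SessionState.INITIALIZED

    @property
    def is_compiled(self) -> bool:
        return self.state is SessionState.COMPILED

    def compile(self) -> None:
        self.state = SessionState.COMPILED

    def run(self, feeds: Dict[str, List[float]]) -> Dict[str, List[float]]:
        if not self.is_compiled:
            self.compile()  # lazy compile on first run
        return {"C": list(feeds["A"])}  # toy identity model

    def reset(self) -> None:
        self.state = SessionState.INITIALIZED


def runtime_smoke(session: FakeAutoSession, sample: Dict[str, List[float]]) -> None:
    """Same order of checks listed for test_auto_device_runtime_smoke."""
    assert session.state is SessionState.INITIALIZED  # before any work
    assert not session.is_compiled                    # lazy: not compiled yet
    session.run(sample)
    assert session.state is SessionState.COMPILED     # compiled by run()
    session.run(sample)  # second run: result unused, only state is checked
    assert session.state is SessionState.COMPILED
    session.reset()
    assert session.state is SessionState.INITIALIZED
    assert not session.is_compiled


def explicit_compile_then_run(session: FakeAutoSession, sample: Dict[str, List[float]]) -> None:
    """compile() first, then run(): the ordering the explicit-compile e2e test covers."""
    session.compile()
    assert session.state is SessionState.COMPILED
    session.run(sample)
    assert session.state is SessionState.COMPILED
```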
CodeQL flagged the second 'outputs = session.run(...)' as an unused local. The second run only exercises state preservation; drop the binding rather than fabricate an extra assertion.
xieofxie approved these changes May 25, 2026
Summary
Closes #726.
After #708, `WinMLSession(device="auto")` resolves to a concrete EP via `resolve_device()` and force-binds it through `add_provider_for_devices`. On the Windows CI runner, the WinML EP registry advertises phantom NPU/GPU EP devices even without real hardware. Force-binding those EPs segfaults natively during `InferenceSession` creation, which surfaces as `Process completed with exit code 1` with no pytest traceback.

The crash is non-deterministic (#719 happened to pass on a re-run while #717 failed on the same commit), so every PR is exposed to a random failure until main is fixed.
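A minimal pure-Python model of this failure mode. Everything in it is hypothetical: `EpDevice`, `resolve_device_sketch` and its assumed preference order (NPU, then GPU, then CPU), and `bind_and_create_session` are illustrative stand-ins, not the WinML or ONNX Runtime API. The native segfault is modelled as a Python exception so the sketch can run, and the non-determinism is not reproduced; the sketch only shows why trusting the registry picks a phantom EP on a hardware-less runner.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EpDevice:
    ep_name: str            # EP name as advertised by the (hypothetical) registry
    kind: str               # "NPU", "GPU" or "CPU"
    hardware_present: bool  # whether the device really exists on this machine


# Assumed "auto" preference order: accelerators first, CPU as last resort.
PREFERENCE = ("NPU", "GPU", "CPU")


def resolve_device_sketch(advertised: Sequence[EpDevice]) -> EpDevice:
    """Return the most-preferred advertised device; trusts the registry blindly."""
    for kind in PREFERENCE:
        for dev in advertised:
            if dev.kind == kind:
                return dev
    raise RuntimeError("no execution provider advertised")


def bind_and_create_session(dev: EpDevice) -> str:
    """Hypothetical stand-in for force-binding the EP and creating the session.

    The real failure is a native segfault with no Python traceback; it is
    modelled as a RuntimeError here so the sketch stays runnable.
    """
    if not dev.hardware_present:
        raise RuntimeError(f"crash: bound phantom {dev.kind} EP {dev.ep_name!r}")
    return dev.ep_name


# A hardware-less CI runner whose registry still advertises NPU and GPU EPs.
CI_REGISTRY = (
    EpDevice("PhantomNpuEP", "NPU", hardware_present=False),
    EpDevice("PhantomGpuEP", "GPU", hardware_present=False),
    EpDevice("CPUExecutionProvider", "CPU", hardware_present=True),
)

if __name__ == "__main__":
    chosen = resolve_device_sketch(CI_REGISTRY)
    print("auto resolved to:", chosen.kind)
    try:
        bind_and_create_session(chosen)
    except RuntimeError as exc:
        print(exc)
    cpu = next(d for d in CI_REGISTRY if d.kind == "CPU")
    print("explicit cpu:", bind_and_create_session(cpu))
```

In this model, selecting the CPU entry explicitly never touches the phantom entries, which mirrors why the CPU-explicit unit tests are unaffected.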
Approach
I first considered rewriting the affected tests to `device="cpu"`. On audit, almost all of them duplicate existing CPU-explicit coverage in the same file: they used `device="auto"` as a convenience, not to exercise auto-resolution semantics. Drop the redundant ones instead.

| Deleted test | Already covered by |
|---|---|
| `test_run_uses_epcontext_after_compile` | `test_compile_is_idempotent` (`compile()` → `COMPILED`) |
| `test_basic_inference` | `test_explicit_cpu_provider` + 5 perf tests already call `run(sample)` on cpu |
| `test_inference_auto_compiles` | every other test that calls `run()` without a prior `compile()` |
| `test_state_transitions` | `test_ep_name_is_none_before_compile` + `test_ep_name_after_compile` cover state transitions |
| `test_reset_returns_to_initialized` | `test_reset_clears_error_state` exercises `reset()` |
| `test_providers_are_valid_and_include_fallback` | `test_cpu_provider_always_available` covers the CPU-explicit case |

Six tests deleted, one kept and converted:

- `test_inference_with_torch_tensor` → `device="cpu"`. Sole test covering the `torch.Tensor` input → numpy conversion path.

Restoring `device="auto"` runtime coverage

Added `test_auto_device_runtime_smoke` to tests/e2e/test_session.py under the existing `@pytest.mark.e2e` class marker. End-to-end coverage of `resolve_device → add_provider_for_devices → InferenceSession` now lives where real hardware can be assumed.

Verification
The 5 fewer passing tests compared with before are exactly the deleted redundant tests; nothing else moved.