
fix: winml config --device npu silently succeeds when no NPU EP is available #431

@DingmaomaoBJTU


Summary

When --device npu is specified but no NPU execution provider (VitisAI, QNN) is available, winml config emits a WARNING but still exits 0 and writes a config with "execution_provider": "qnn", guaranteeing a failure at compile/inference time.

Context

There are two independent device-availability checks that can disagree:

  1. Hardware detection (_get_available_devices()): WMI-based, detects NPU silicon → adds \"npu\" to available_devices
  2. EP detection (_get_available_eps()): ORT/WinML provider enumeration → may not find QNNExecutionProvider or VitisAIExecutionProvider (e.g. QNN SDK not installed, wrong ORT build)

When these disagree, the tool currently warns and proceeds — creating a config that will fail downstream.
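The disagreement can be shown with a small self-contained check. This is a hypothetical sketch, not code from device.py: the function name npu_ep_mismatch is made up, the device and EP lists are passed in instead of coming from WMI or ORT, and only the NPU-compatible EP names are taken from the WARNING in the repro below.

```python
# Hypothetical illustration; the real checks are _get_available_devices() (WMI)
# and _get_available_eps() (ORT/WinML). Lists are passed in for clarity.

# NPU-compatible EPs as listed in the WARNING printed by winml config.
NPU_COMPATIBLE_EPS = ["VitisAIExecutionProvider", "QNNExecutionProvider"]


def npu_ep_mismatch(hardware_devices: list[str], available_eps: list[str]) -> bool:
    """Return True when NPU silicon is detected but no NPU-compatible EP is registered."""
    has_npu_hardware = "npu" in hardware_devices
    has_npu_ep = any(ep in available_eps for ep in NPU_COMPATIBLE_EPS)
    return has_npu_hardware and not has_npu_ep


# The repro machine: WMI reports npu/gpu/cpu, but ORT exposes no VitisAI/QNN EP.
hardware = ["npu", "gpu", "cpu"]
eps = [
    "CPUExecutionProvider",
    "DmlExecutionProvider",
    "NvTensorRTRTXExecutionProvider",
    "OpenVINOExecutionProvider",
]
print(npu_ep_mismatch(hardware, eps))  # True
```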

Repro

On a machine with NPU hardware but no QNN/VitisAI EP (e.g. standard pip install onnxruntime, no QNN SDK):

winml config -m microsoft/resnet-50 --device npu --precision int8 -o out.json

Observed:

WARNING - Device 'npu' requested but no compatible EP found.
  Compatible EPs: ['VitisAIExecutionProvider', 'QNNExecutionProvider'].
  Available EPs:  ['CPUExecutionProvider', 'DmlExecutionProvider',
                   'NvTensorRTRTXExecutionProvider', 'OpenVINOExecutionProvider'].
INFO  - Device resolved: npu (available: npu, gpu, cpu)
✅ Config saved to: out.json          ← exits 0

out.json contains "execution_provider": "qnn", an EP that was just flagged as unavailable.

Expected: Either an error exit (the EP is not available, so the config would be unusable) or, at minimum, output that surfaces the mismatch instead of following the "no compatible EP" warning with a "Device resolved: npu" success message.

Current State

  • src/winml/modelkit/sysinfo/device.py:183–191: resolve_device() warns but unconditionally returns (device, available_devices) for explicit device requests
  • src/winml/modelkit/sysinfo/device.py:77–109: _get_available_devices() is hardware-only (WMI); it does not consult EP availability
  • src/winml/modelkit/config/build.py:279–284 and :573–578: callers log "Device resolved" using available_devices from hardware detection, contradicting the EP-level warning

Desired State

When an explicit --device <X> is requested but no compatible EP is available, one of:

  • Option A (stricter): resolve_device() raises an error — the user asked for a device that cannot work. The CLI should print a clear error and exit non-zero.
  • Option B (softer): EP availability gates available_devices — if no NPU EP is found, "npu" is not reported as available, and --device npu triggers a clear "device unavailable" error.

Option A is simpler and more actionable for users.
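A minimal sketch of Option A, under stated assumptions: the signature differs from the real resolve_device() (hardware devices and EPs are passed in instead of detected), the "gpu" and "cpu" entries in COMPATIBLE_EPS and the auto preference order are assumptions, and only the "npu" EP list comes from the observed WARNING. Explicit requests without a compatible EP raise ValueError; auto keeps falling through.

```python
# Hypothetical sketch of Option A; not the actual resolve_device() in
# src/winml/modelkit/sysinfo/device.py.

# The "npu" entry comes from the issue's WARNING; "gpu" and "cpu" are ASSUMED.
COMPATIBLE_EPS = {
    "npu": ["VitisAIExecutionProvider", "QNNExecutionProvider"],
    "gpu": ["DmlExecutionProvider"],
    "cpu": ["CPUExecutionProvider"],
}
AUTO_ORDER = ["npu", "gpu", "cpu"]  # assumed preference order for --device auto


def _has_compatible_ep(device: str, available_eps: list[str]) -> bool:
    return any(ep in available_eps for ep in COMPATIBLE_EPS.get(device, []))


def resolve_device(
    requested: str, available_devices: list[str], available_eps: list[str]
) -> tuple[str, list[str]]:
    if requested == "auto":
        # Existing behavior: silently fall through to the first usable device.
        for device in AUTO_ORDER:
            if device in available_devices and _has_compatible_ep(device, available_eps):
                return device, available_devices
        return "cpu", available_devices  # assumed last-resort default
    if not _has_compatible_ep(requested, available_eps):
        # Option A: an explicit request that cannot work is an error, not a warning.
        raise ValueError(
            f"Device '{requested}' requested but no compatible EP found. "
            f"Install one of: {COMPATIBLE_EPS.get(requested, [])}. "
            f"Available EPs: {available_eps}."
        )
    return requested, available_devices


hw = ["npu", "gpu", "cpu"]
eps = ["CPUExecutionProvider", "DmlExecutionProvider"]
print(resolve_device("auto", hw, eps))  # ('gpu', ['npu', 'gpu', 'cpu'])
```

With only the CPU and DirectML EPs registered, auto resolves to "gpu" under these assumptions, while resolve_device("npu", hw, eps) raises ValueError naming both missing NPU EPs.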

Acceptance Criteria

  • winml config --device npu exits non-zero with a clear error message when no NPU EP (VitisAI, QNN) is available in the current environment
  • The error message names the missing compatible EPs so the user knows what to install
  • --device auto is unaffected: if no NPU EP is available, auto silently falls through to GPU/CPU (existing behavior is correct)
  • uv run pytest tests/ passes
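At the CLI layer, the first two criteria only require turning that ValueError into a non-zero exit with the EP names in the message. The sketch below is framework-agnostic and hypothetical: main() and the stub resolve_device() are illustrative names, the real winml entry point may use a CLI library with its own error handling, and the stub hard-codes the NPU-compatible EPs from the repro.

```python
import sys

NPU_COMPATIBLE_EPS = ["VitisAIExecutionProvider", "QNNExecutionProvider"]


def resolve_device(requested: str, available_eps: list[str]) -> str:
    """Hypothetical stub: only checks the NPU case described in this issue."""
    if requested == "npu" and not any(ep in available_eps for ep in NPU_COMPATIBLE_EPS):
        raise ValueError(
            "Device 'npu' requested but no compatible EP found. "
            f"Install one of: {NPU_COMPATIBLE_EPS}. Available EPs: {available_eps}."
        )
    return requested


def main(device: str, available_eps: list[str]) -> int:
    """Return a process exit code instead of writing a config that cannot work."""
    try:
        resolved = resolve_device(device, available_eps)
    except ValueError as err:
        print(f"ERROR - {err}", file=sys.stderr)
        return 1  # non-zero exit; no config is written
    print(f"Device resolved: {resolved}")
    return 0


exit_code = main("npu", ["CPUExecutionProvider", "DmlExecutionProvider"])
print(f"exit code: {exit_code}")  # exit code: 1
```

A real entry point would pass the returned code to sys.exit().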

Technical Notes

  • The fix is localized to resolve_device() in src/winml/modelkit/sysinfo/device.py:181–191. For explicit devices (non-auto), a missing compatible EP should raise ValueError rather than log a warning.
  • _get_available_devices() (hardware detection) and _get_available_eps() (EP detection) deliberately use different mechanisms; this is intentional for the auto path. The fix should not merge them; it should only make the explicit-device path strict about the mismatch.
  • The two call sites in build.py (lines 279, 573) don't need changes if the error is raised inside resolve_device().
  • CLAUDE.md naming convention: EP and device acronyms are uppercase (QNN, NPU, GPU, CPU).

Related Files

  • src/winml/modelkit/sysinfo/device.py:146–191: resolve_device(), the fix site
  • src/winml/modelkit/sysinfo/device.py:77–109: _get_available_devices() (hardware)
  • src/winml/modelkit/sysinfo/device.py:112–143: _get_available_eps() (ORT/WinML)
  • src/winml/modelkit/config/build.py:279–284: first caller of resolve_device()
  • src/winml/modelkit/config/build.py:573–578: second caller of resolve_device()
  • tests/sysinfo/: add a test that mocks _get_available_eps() to return only the CPU EP, calls resolve_device("npu"), and asserts ValueError
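The test described for tests/sysinfo/ could look roughly like the following. To keep it runnable on its own, it patches a stand-in namespace rather than the real module; in the repository the patch target would presumably be winml.modelkit.sysinfo.device._get_available_eps (import path assumed from the src/ layout). It uses unittest so it needs only the standard library; pytest also collects unittest.TestCase classes.

```python
import types
import unittest
from unittest import mock

# Stand-in for src/winml/modelkit/sysinfo/device.py. Names mirror the issue,
# but these bodies are HYPOTHETICAL and exist only to make the test runnable.
device = types.SimpleNamespace(
    _get_available_devices=lambda: ["npu", "gpu", "cpu"],
    _get_available_eps=lambda: ["QNNExecutionProvider", "CPUExecutionProvider"],
)


def _resolve_device(requested):
    # Looks the helpers up on `device` at call time, so patching them takes effect.
    eps = device._get_available_eps()
    compatible = ["VitisAIExecutionProvider", "QNNExecutionProvider"]
    if requested == "npu" and not any(ep in eps for ep in compatible):
        raise ValueError(f"No compatible EP for 'npu'; install one of {compatible}")
    return requested, device._get_available_devices()


device.resolve_device = _resolve_device


class ResolveDeviceTest(unittest.TestCase):
    def test_explicit_npu_without_npu_ep_raises(self):
        with mock.patch.object(
            device, "_get_available_eps", return_value=["CPUExecutionProvider"]
        ):
            with self.assertRaises(ValueError) as ctx:
                device.resolve_device("npu")
        # The message should name the EPs the user needs to install.
        self.assertIn("QNNExecutionProvider", str(ctx.exception))

    def test_explicit_npu_with_qnn_ep_resolves(self):
        resolved, _ = device.resolve_device("npu")
        self.assertEqual(resolved, "npu")
```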

Labels

  • 0430 bugbash: Bugs found during 0430 bug bash
  • NPU: NPU specific
  • P0: Critical (blocking, crash, data loss)
  • bug: Something isn't working
  • dev experience: Developer experience improvements
  • hardware: Hardware related
  • triaged: Issue has been triaged
