Summary
When running winml perf --device npu (or any command that resolves device → EP) on an AMD machine, the tool hardcodes qnn as the NPU provider. Since QNNExecutionProvider is not present on AMD hardware, the session logs a warning, falls through to policy-based selection, and likely degrades to CPU.
Context
The NPU EP is platform-dependent:
- Qualcomm / Intel NPU → QNNExecutionProvider (short name qnn)
- AMD NPU (Ryzen AI) → VitisAIExecutionProvider (short name vitisai)
The current code has "npu": "qnn" hardcoded as the single NPU provider, so any AMD NPU machine hits the wrong EP.
Observed behavior (screenshot)
winml perf -m openai/clip-vit-base-patch32 --device npu --iterations 100
...
[2026-04-30T15:16:37] WARNING: EP 'qnn' (QNNExecutionProvider) not found in available devices
The model runs and the reported device is still npu, but the QNN warning indicates that the session fell back to policy-based selection, likely on CPU.
Root Cause
Three locations share the same hardcoded assumption:
- src/winml/modelkit/config/precision.py:66, the _DEVICE_TO_PROVIDER dict:
      _DEVICE_TO_PROVIDER: dict[str, str | None] = {
          "npu": "qnn",  # ← wrong for AMD
          "gpu": "dml",
          "cpu": None,
      }
  Used at precision.py:296 to set compile_provider during resolve_precision().
- src/winml/modelkit/commands/compile.py:280-283, in _resolve_compile_provider():
      provider = _DEVICE_TO_PROVIDER.get(device.lower())
      if provider is None:
          return "cpu" if device.lower() == "cpu" else "qnn"  # ← also hardcoded
      return provider
- src/winml/modelkit/session/session.py:438-451: once self._ep = "qnn" is propagated from the compile config, _build_session_options tries to find QNNExecutionProvider via _find_ep_device(). On AMD it returns None → warning fires → falls through to policy-based selection.
The session already has the right mechanism for discovery (_find_ep_for_device, session.py:487-510), but it is bypassed when self._ep is pre-set to "qnn".
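The bypass described above can be modelled in a few lines. This is a simplified sketch, not the code in session.py: choose_ep, SHORT_TO_FULL, and NPU_EPS are hypothetical names introduced only for illustration, and the real session logic may differ in detail.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("winml.sketch")

# Assumption: full EP names for the short names mentioned in this issue.
SHORT_TO_FULL = {
    "qnn": "QNNExecutionProvider",
    "vitisai": "VitisAIExecutionProvider",
}
# Assumption: the NPU-capable EPs, as described for _DEVICE_EP_MAP["npu"].
NPU_EPS = ("VitisAIExecutionProvider", "QNNExecutionProvider")


def choose_ep(preset_ep: str | None, device: str, available: set[str]) -> str | None:
    """Return the EP a session would bind to, or None for policy-based selection."""
    if preset_ep is not None:
        full_name = SHORT_TO_FULL[preset_ep]
        if full_name in available:
            return full_name
        # The pre-set EP is missing: warn and skip device-based discovery entirely.
        logger.warning("EP %r (%s) not found in available devices", preset_ep, full_name)
        return None
    if device == "npu":
        # Device-based discovery: first NPU EP that is actually present.
        return next((ep for ep in NPU_EPS if ep in available), None)
    return None


amd = {"VitisAIExecutionProvider", "CPUExecutionProvider"}
hardcoded = choose_ep("qnn", "npu", amd)   # None: warning fires, discovery is bypassed
discovered = choose_ep(None, "npu", amd)   # "VitisAIExecutionProvider"
```

With the EP left unset, the same AMD machine resolves to VitisAIExecutionProvider, which is why removing the hardcoded value upstream may be enough for the session path.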
Desired State
When --device npu is requested:
- On Qualcomm/Intel: resolves to qnn (QNNExecutionProvider), which is the current behavior and should be kept
- On AMD Ryzen AI: resolves to vitisai (VitisAIExecutionProvider)
The EP selection for NPU should inspect which NPU EP is actually available at runtime (via ort.get_ep_devices() or _get_available_eps() from sysinfo/device.py) rather than hardcoding qnn.
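One possible shape for the runtime lookup is sketched below. It is only an illustration under stated assumptions: pick_npu_provider, NPU_EP_PRIORITY, and EP_SHORT_NAMES are hypothetical names, and the priority order is taken from the Technical Notes rather than from the codebase. The list of available EPs is passed in, so the function itself performs no discovery.

```python
from __future__ import annotations

from collections.abc import Iterable

# Assumption: candidate NPU EPs, in the priority order suggested in this issue.
NPU_EP_PRIORITY: tuple[str, ...] = (
    "VitisAIExecutionProvider",
    "QNNExecutionProvider",
)

# Short names as used in this issue.
EP_SHORT_NAMES: dict[str, str] = {
    "QNNExecutionProvider": "qnn",
    "VitisAIExecutionProvider": "vitisai",
}


def pick_npu_provider(available_eps: Iterable[str]) -> str | None:
    """Return the short name of the first available NPU EP, or None if there is none."""
    available = set(available_eps)
    for ep in NPU_EP_PRIORITY:
        if ep in available:
            return EP_SHORT_NAMES[ep]
    return None


# Possible call site (not executed here); in this codebase the argument would
# more likely come from _get_available_eps():
#   import onnxruntime as ort
#   provider = pick_npu_provider(ort.get_available_providers())
```

Returning None when no NPU EP is present leaves the caller free to decide whether to warn, fail, or fall back to CPU explicitly.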
Acceptance Criteria
- winml perf --device npu on an AMD machine uses VitisAIExecutionProvider without warnings
- winml perf --device npu on a Qualcomm/Intel machine continues to use QNNExecutionProvider
- _DEVICE_TO_PROVIDER["npu"] is no longer a static string: it is either removed or replaced with a runtime lookup
- winml compile --device npu resolves to vitisai on AMD
Technical Notes
- sysinfo/device.py already has the right EP-to-device map (_EP_DEVICE_MAP), where both QNNExecutionProvider and VitisAIExecutionProvider map to "npu". The inverse _DEVICE_EP_MAP["npu"] already contains both.
- resolve_device() (sysinfo/device.py:146) uses _DEVICE_EP_MAP and _get_available_eps() correctly and returns the right device string. The problem is downstream: get_provider_for_device() then re-maps that device back to a single hardcoded EP.
- The fix should replace get_provider_for_device("npu") with a function that inspects _get_available_eps() and picks the first available NPU EP from _DEVICE_EP_MAP["npu"] (priority: VitisAIExecutionProvider, QNNExecutionProvider, or whatever is present).
- precision.py is pure decision logic (no I/O, no imports of sysinfo at module level), so keep sysinfo calls in callers (build.py, compile.py) rather than inside precision.py to preserve testability.
Related Files
- src/winml/modelkit/config/precision.py:64-68: _DEVICE_TO_PROVIDER dict (primary fix location)
- src/winml/modelkit/config/precision.py:296: compile_provider assignment in resolve_precision()
- src/winml/modelkit/commands/compile.py:271-284: _resolve_compile_provider() secondary hardcode
- src/winml/modelkit/session/session.py:438-451: EP matching in _build_session_options() (symptom surface)
- src/winml/modelkit/sysinfo/device.py:37-58: _EP_DEVICE_MAP / _DEVICE_EP_MAP (already correct, use this)
- src/winml/modelkit/sysinfo/device.py:112-143: _get_available_eps() (runtime EP discovery)
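To illustrate the testability point from the Technical Notes, the sketch below keeps the decision logic pure and lets the caller inject the discovered NPU provider. CompileConfig and resolve_compile_config are hypothetical stand-ins, not the actual resolve_precision() signature; the gpu and cpu values mirror the _DEVICE_TO_PROVIDER dict quoted under Root Cause.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CompileConfig:
    """Hypothetical, reduced stand-in for the resolved compile settings."""

    device: str
    compile_provider: str | None


def resolve_compile_config(device: str, npu_provider: str | None) -> CompileConfig:
    """Pure decision logic: the NPU provider is supplied by the caller.

    No EP discovery happens here, so the function can be unit-tested on any
    machine regardless of which execution providers are installed.
    """
    device = device.lower()
    if device == "npu":
        return CompileConfig(device, npu_provider)
    if device == "gpu":
        return CompileConfig(device, "dml")
    return CompileConfig(device, None)


# A caller such as compile.py would perform discovery first, then pass the
# result in; here the discovered value is simulated for an AMD machine.
amd_config = resolve_compile_config("NPU", npu_provider="vitisai")
```

Because discovery stays in the caller, a unit test can cover the AMD and Qualcomm paths by passing "vitisai" or "qnn" directly, without mocking the runtime.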