Summary
wmk compile --ep qnn silently falls back to the OpenVINO EP when the QNN SDK is not installed, and emits no warning. The output file is misleadingly named *_qnn_ctx.onnx but contains an OpenVINO binary, which causes downstream wmk perf to crash.
Repro steps
# 1. On a machine without QNN SDK (Intel NPU + OpenVINO EP available)
wmk compile -m <quantized.onnx> --output-dir out/ --ep qnn
# Observe: no error, but logs show OpenVINO was used:
# Session created with policy qnn, providers: ['OpenVINOExecutionProvider', 'CPUExecutionProvider']
# Output: out/resnet_quant_qnn_ctx.onnx + out/resnet_quant_qnn_ctx_OpenVINOExecutionProvider.bin
# 2. Run perf without --device (auto-device picks GPU, crashes)
wmk perf -m out/resnet_quant_qnn_ctx.onnx --iterations 10
# Error: Nv EP could not deserialize engine from cache: ...OpenVINOExecutionProvider.bin
# 3. Workaround: explicit --device npu
wmk perf -m out/resnet_quant_qnn_ctx.onnx --iterations 10 --device npu
# Works correctly
Root cause
compiler/stages/compile.py:69 passes EP name "qnn" as the device parameter to WinMLSession:
winml_session = session_cls(
    onnx_path=model_path,
    device=context.execution_provider,  # "qnn" is an EP name, not a device
)
session.py:409-410 _build_session_options() doesn't find "qnn" in DEVICE_POLICY_MAP and silently defaults to PREFER_NPU:
policy = DEVICE_POLICY_MAP.get(
    device.lower(), ort.OrtExecutionProviderDevicePolicy.PREFER_NPU  # silent fallback
)
ORT's PREFER_NPU policy then walks the fallback chain (QNN → OpenVINO → DML → CPU) without error.
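One way to remove the silent default is to make the device lookup strict, so an EP name such as "qnn" is rejected instead of being mapped to PREFER_NPU. The following is only a sketch under assumptions: DEVICE_POLICY_MAP here is a hypothetical stand-in (its real keys and values are not shown in this report, and plain strings replace the ort.OrtExecutionProviderDevicePolicy members), and resolve_device_policy is an illustrative name, not an existing function.

```python
# Hypothetical stand-in for the real DEVICE_POLICY_MAP in session.py.
# Strings are used in place of ort.OrtExecutionProviderDevicePolicy members
# so the sketch runs without onnxruntime installed.
DEVICE_POLICY_MAP = {
    "npu": "PREFER_NPU",
    "gpu": "PREFER_GPU",
    "cpu": "PREFER_CPU",
}


def resolve_device_policy(device: str) -> str:
    """Return the policy for a device name, raising on unknown names."""
    key = device.lower()
    try:
        return DEVICE_POLICY_MAP[key]
    except KeyError:
        valid = ", ".join(sorted(DEVICE_POLICY_MAP))
        # Fail loudly instead of defaulting to PREFER_NPU.
        raise ValueError(
            f"Unknown device {device!r}; expected one of: {valid}"
        ) from None
```

With a lookup like this, passing "qnn" as the device would surface the compile.py mix-up immediately rather than at wmk perf time.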
Expected behavior
- If user explicitly requests --ep qnn and QNN SDK is not available, wmk compile should error (or at minimum warn loudly) rather than silently using a different EP.
- Output file naming should reflect the actual EP used.
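Both expectations could be enforced with a post-creation check along these lines. This is a minimal sketch, not the project's code: EP_PROVIDER_NAMES, RequestedEPUnavailableError, check_requested_ep, and context_model_name are all hypothetical names, and the output naming pattern is inferred from the file names in the repro steps.

```python
# Assumed mapping from --ep values to ONNX Runtime provider names.
# The short keys are illustrative; wmk's real option values may differ.
EP_PROVIDER_NAMES = {
    "qnn": "QNNExecutionProvider",
    "openvino": "OpenVINOExecutionProvider",
}


class RequestedEPUnavailableError(RuntimeError):
    """Raised when the session did not load the explicitly requested EP."""


def check_requested_ep(requested_ep: str, active_providers: list[str]) -> str:
    """Return the provider name for requested_ep, or raise if it is not active."""
    expected = EP_PROVIDER_NAMES.get(requested_ep.lower())
    if expected is None:
        raise ValueError(f"Unknown execution provider: {requested_ep!r}")
    if expected not in active_providers:
        raise RequestedEPUnavailableError(
            f"--ep {requested_ep} was requested, but the session is using "
            f"{active_providers}"
        )
    return expected


def context_model_name(model_stem: str, provider: str) -> str:
    """Build the output file name from the provider that was actually used."""
    tag = provider.removesuffix("ExecutionProvider").lower()
    return f"{model_stem}_{tag}_ctx.onnx"
```

In a real pipeline, active_providers could come from the session's get_providers() call, which is the list shown in the log line from step 1 of the repro.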
Environment
- wmk sys output: Intel AI Boost NPU, RTX 5080 GPU, QNN SDK "Not found", OpenVINO EP available
- OS: Windows 11, ORT 1.23.3 (windowsml)
🤖 Generated with Claude Code