Summary
winml optimize fails with an OptimizationError (ORT NOT_IMPLEMENTED) when the input .onnx is the output of winml build — a compiled QNN EPContext model. The ort_graph pipe tries to create an InferenceSession over the EPContext model without the QNN EP, which ORT cannot execute.
Repro
# Step 1: build a compiled QNN model
winml build -c config.json -m microsoft/resnet-50 -o resnet_build/
# Step 2: try to optimize the compiled output
winml optimize -m resnet_build/model.onnx -o resnet_build/model_optimized.onnx
Error
2026-04-03 11:45:34,355 - winml.modelkit.optim.optimizer - INFO - ⚙ Executing ort_graph...
2026-04-03 11:45:34,373 - winml.modelkit.optim.optimizer - ERROR - ✗ ort_graph failed:
ONNX Runtime optimization failed: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED :
Could not find an implementation for EPContext(1) node with name
'QNNExecutionProvider_QNN_16222565441986465013_1_0'
Full traceback: optim/pipes/graph.py:571 → onnxruntime_inference_collection.py:485 → OptimizationError
Root Cause
The ort_graph optimization pipe creates an ORT InferenceSession on the input model using only CPU/default providers. A QNN EPContext model contains EPContext nodes that are opaque blobs for the QNN execution provider — they cannot be executed (or optimized) by the standard ORT session without loading the QNN EP.
Expected behavior
winml optimize should detect that the input is an EPContext model (e.g. by checking for EPContext op nodes) and either:
- Skip the ort_graph pipe (which is not applicable to pre-compiled models), or
- Emit a clear error: "Input model is a compiled EPContext artifact and cannot be re-optimized. Run winml optimize on the original ONNX model before compilation."
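One possible shape for this check, as a minimal sketch rather than the actual winml code: scan the model's top-level graph for nodes whose op type is EPContext (the op named in the error above), then either drop ort_graph from the pipe list or raise with the proposed message. The names find_epcontext_nodes, select_pipes, load_model, CompiledModelError and the strict flag are hypothetical and do not come from the winml codebase. The sketch also assumes that pipes are identified by name and that EPContext nodes appear at the top level of the graph, not inside subgraphs.

```python
from typing import Iterable, List

EPCONTEXT_OP = "EPContext"  # op type reported in the ORT error message


class CompiledModelError(RuntimeError):
    """Hypothetical error type; winml's OptimizationError could be used instead."""


def find_epcontext_nodes(model) -> List[str]:
    """Return the names of top-level EPContext nodes in a ModelProto-like object.

    Assumption: subgraphs (If/Loop bodies) are not searched.
    """
    return [node.name for node in model.graph.node if node.op_type == EPCONTEXT_OP]


def select_pipes(model, pipes: Iterable[str], strict: bool = False) -> List[str]:
    """Return the pipes to run; for compiled models, skip ort_graph or raise.

    strict=False models the "skip" option, strict=True the "clear error" option.
    """
    pipes = list(pipes)
    if not find_epcontext_nodes(model):
        return pipes  # regular ONNX model: nothing to change
    if strict:
        raise CompiledModelError(
            "Input model is a compiled EPContext artifact and cannot be "
            "re-optimized. Run winml optimize on the original ONNX model "
            "before compilation."
        )
    return [p for p in pipes if p != "ort_graph"]


def load_model(path: str):
    """Load only the graph structure; weights are not needed for this check."""
    import onnx  # imported lazily so the checks above do not require onnx

    return onnx.load(path, load_external_data=False)
```

With something like this in place, the optimizer could call select_pipes(load_model(path), pipes) before creating any InferenceSession, so the failure surfaces before ORT is invoked.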
Notes
- Reporter: AdinaTru
- Input model: output of winml config + winml build for ResNet-50 on Qualcomm device
- Same root cause as the winml quantize EPContext failure (see related issue)