bug: winml quantize crashes with NOT_IMPLEMENTED when input model is a QNN EPContext model #257

@DingmaomaoBJTU

Summary

winml quantize fails with an ORT NOT_IMPLEMENTED error when the input .onnx is a compiled QNN EPContext model (e.g. output of winml build). ORT's calibration step tries to create an InferenceSession over the EPContext model using the CPU provider, which cannot execute EPContext nodes.

Repro

# model.onnx is the output of `winml config` + `winml build` for ResNet-50
winml quantize -m C:\Users\...\resnet-50\model.onnx \
  --weight-type int8 --activation-type uint16 \
  -o C:\Users\...\resnet-50\quantize\model_w8a16.onnx

Error

[2026-04-03T11:54:41] ERROR: Quantization failed
Traceback (most recent call last):
  File "winml/modelkit/quant/quantizer.py", line 166, in quantize_onnx
    quantize(...)
  ...
  File "onnxruntime/quantization/calibrate.py", line 235, in create_inference_session
    self.infer_session = onnxruntime.InferenceSession(...)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED :
Could not find an implementation for EPContext(1) node with name
'QNNExecutionProvider_QNN_16222565441986465013_1_0'

Root Cause

ORT's static quantization calibrator creates an InferenceSession on the input model using the default (CPU) provider. A QNN EPContext model contains EPContext nodes that only the QNN execution provider can execute; the CPU provider has no implementation for them, so session creation fails with NOT_IMPLEMENTED.

Quantization of a pre-compiled EPContext model is not a meaningful operation anyway, as the weights are embedded in an opaque binary blob.

Expected behavior

winml quantize should detect that the input is an EPContext model and fail early with a clear message:

"Input model is a compiled QNN EPContext artifact. Run winml quantize on the original ONNX model before compilation."

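One way the early check could look, as a minimal sketch rather than the actual winml implementation: load only the graph structure with the `onnx` package and look for nodes whose `op_type` is `EPContext`. The function names `find_epcontext_nodes` and `ensure_not_epcontext` and the choice of `ValueError` are hypothetical; only top-level graph nodes are scanned, on the assumption that compiled EPContext nodes appear there.

```python
EPCONTEXT_OP_TYPE = "EPContext"

# Message text taken from the expected behavior described in this issue.
EPCONTEXT_ERROR = (
    "Input model is a compiled QNN EPContext artifact. "
    "Run winml quantize on the original ONNX model before compilation."
)


def find_epcontext_nodes(model):
    """Return the names of top-level EPContext nodes in an ONNX ModelProto.

    Only model.graph.node is scanned; nested subgraphs are not visited.
    """
    return [
        node.name
        for node in model.graph.node
        if node.op_type == EPCONTEXT_OP_TYPE
    ]


def ensure_not_epcontext(model_path):
    """Hypothetical pre-flight check to run before calibration starts."""
    # Third-party dependency, imported here so find_epcontext_nodes can be
    # used on an already-loaded model without importing onnx again.
    import onnx

    # Only the graph structure is needed, so external weight files are skipped.
    model = onnx.load(model_path, load_external_data=False)
    names = find_epcontext_nodes(model)
    if names:
        raise ValueError(f"{EPCONTEXT_ERROR} (found EPContext node: {names[0]})")
```

Calling `ensure_not_epcontext(path)` at the start of the quantize command would surface the message above before ORT tries to create a CPU InferenceSession. The same helper could in principle be reused by other commands that hit this failure.
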
Notes

  • Reporter: AdinaTru
  • Input model: output of winml config + winml build for ResNet-50 on Qualcomm device
  • Same root cause as the winml optimize EPContext failure (see related issue)

Metadata

Labels

NPU (NPU specific), QDQ (QDQ quantization)
