bug: winml quantize crashes with NOT_IMPLEMENTED when input model is a QNN EPContext model #257

## Summary

`winml quantize` fails with an ORT `NOT_IMPLEMENTED` error when the input `.onnx` is a compiled QNN EPContext model (e.g. output of `winml build`). ORT's calibration step tries to create an `InferenceSession` over the EPContext model using the CPU provider, which cannot execute `EPContext` nodes.

## Repro

```bash
# model.onnx is the output of `winml config` + `winml build` for ResNet-50
winml quantize -m 'C:\Users\...\resnet-50\model.onnx' \
  --weight-type int8 --activation-type uint16 \
  -o 'C:\Users\...\resnet-50\quantize\model_w8a16.onnx'
```

## Error

```
[2026-04-03T11:54:41] ERROR: Quantization failed
Traceback (most recent call last):
  File "winml/modelkit/quant/quantizer.py", line 166, in quantize_onnx
    quantize(...)
  ...
  File "onnxruntime/quantization/calibrate.py", line 235, in create_inference_session
    self.infer_session = onnxruntime.InferenceSession(...)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED :
Could not find an implementation for EPContext(1) node with name
'QNNExecutionProvider_QNN_16222565441986465013_1_0'
```

## Root Cause

ORT's static quantization calibrator creates an `InferenceSession` on the input model using the default (CPU) providers. A QNN EPContext model contains `EPContext` nodes that only the QNN execution provider can execute, so the CPU provider raises `NOT_IMPLEMENTED` when the session is created.

Quantizing a pre-compiled EPContext model is not a meaningful operation in any case, because the weights are embedded in an opaque binary blob.

## Expected behavior

`winml quantize` should detect that the input is an EPContext model and fail early with a clear message:
> "Input model is a compiled QNN EPContext artifact. Run `winml quantize` on the original ONNX model before compilation."
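
One possible shape for that early check, as a minimal sketch rather than the actual `winml` implementation: scan the graph for nodes whose `op_type` is `EPContext` before handing the model to the calibrator. The names `find_epcontext_nodes` and `ensure_not_epcontext` are hypothetical and do not exist in `winml`. The sketch assumes the `onnx` Python package is available and only inspects the top-level graph, not nested subgraphs.

```python
from typing import Iterable, List

EPCONTEXT_OP_TYPE = "EPContext"


def find_epcontext_nodes(nodes: Iterable) -> List[str]:
    """Return the names of all nodes whose op_type is EPContext.

    Accepts any iterable of objects exposing `op_type` and `name`,
    such as `model.graph.node` from a loaded onnx.ModelProto.
    """
    return [node.name for node in nodes if node.op_type == EPCONTEXT_OP_TYPE]


def ensure_not_epcontext(model_path: str) -> None:
    """Hypothetical pre-flight check: raise ValueError for a compiled model."""
    import onnx  # imported lazily so find_epcontext_nodes has no dependencies

    # Only the graph structure is needed, so skip loading external tensor data.
    model = onnx.load(model_path, load_external_data=False)
    names = find_epcontext_nodes(model.graph.node)
    if names:
        raise ValueError(
            "Input model is a compiled QNN EPContext artifact "
            f"(found EPContext node {names[0]!r}). Run `winml quantize` on "
            "the original ONNX model before compilation."
        )
```

Calling `ensure_not_epcontext(path)` at the start of the quantize command would surface the message above instead of the ORT traceback. The same helper could plausibly be reused by other commands that run a CPU session over the input model.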

## Notes

- Reporter: AdinaTru
- Input model: output of `winml config` + `winml build` for ResNet-50 on Qualcomm device
- Same root cause as the `winml optimize` EPContext failure (see related issue)
