fix: guard optimize/quantize/compile against EPContext models (#256 #257)#265

Merged
DingmaomaoBJTU merged 3 commits into main from fix/epcontext-guard-optimize-quantize
Apr 9, 2026

@DingmaomaoBJTU
Collaborator

Summary

winml optimize and winml quantize crash with an ORT NOT_IMPLEMENTED error when given a compiled QNN EPContext model. The crash happens deep inside ORT's session initialization, which gives no indication that the real problem is that the input model is already compiled.

Add an is_compiled_onnx() guard at the start of all three commands that expect an uncompiled ONNX model: winml optimize, winml quantize, and winml compile. The guard on winml compile catches re-compilation attempts.

Changes

All guards use the existing is_compiled_onnx() helper from winml.modelkit.onnx.
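
The PR does not show how is_compiled_onnx() decides that a model is compiled. One plausible check, sketched here purely as an assumption, is to look for a graph node whose op_type is EPContext, the operator named in the ORT error quoted in this PR. The function name has_epcontext_node and the fake_model stand-in are hypothetical; with the onnx package, the object returned by onnx.load() exposes the same graph.node / op_type shape.

```python
from types import SimpleNamespace

# Operator type that marks a node holding a precompiled EP context.
EP_CONTEXT_OP = "EPContext"


def has_epcontext_node(model) -> bool:
    """Return True if any top-level graph node is an EPContext node.

    Hypothetical helper: `model` is any object shaped like an ONNX
    ModelProto (model.graph.node, each node with an .op_type string).
    This is not necessarily how is_compiled_onnx() is implemented.
    """
    return any(node.op_type == EP_CONTEXT_OP for node in model.graph.node)


def fake_model(*op_types):
    """Build a stand-in object so the sketch runs without the onnx package."""
    nodes = [SimpleNamespace(op_type=t) for t in op_types]
    return SimpleNamespace(graph=SimpleNamespace(node=nodes))


print(has_epcontext_node(fake_model("Conv", "Relu")))  # False
print(has_epcontext_node(fake_model("EPContext")))     # True
```

This sketch only inspects top-level nodes; a real helper might also need to consider subgraphs or the node's operator domain.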

Before / After

winml optimize — Before:

Loading model...
Running optimizer...
ERROR ✗ ort_graph failed: ONNX Runtime optimization failed: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED :
  Could not find an implementation for EPContext(1) node with name
  'QNNExecutionProvider_QNN_16222565441986465013_1_0'
Error: Optimization failed: ONNX Runtime optimization failed: ...

winml optimize — After:

Error: model.onnx is a compiled EPContext model and cannot be optimized.
Run 'winml optimize' on the original ONNX model before compilation.

winml quantize — Before:

Running quantization...
ERROR: Quantization failed
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED :
  Could not find an implementation for EPContext(1) node with name
  'QNNExecutionProvider_QNN_16222565441986465013_1_0'
Error: Quantization failed

winml quantize — After:

Error: model.onnx is a compiled EPContext model and cannot be quantized.
Run 'winml quantize' on the original ONNX model before compilation.

winml compile (bonus) — After:

Error: model_ctx.onnx is already a compiled EPContext model and cannot be re-compiled.
Run 'winml compile' on the original ONNX model.
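
A minimal sketch of a shared guard that could produce the "After" messages above. Everything here is illustrative: ensure_not_compiled, compiled_model_message, and CompiledModelError are hypothetical names, the is_compiled argument stands in for is_compiled_onnx() (whose signature is not shown in this PR), and printing only the file name is an assumption based on the sample output.

```python
from pathlib import Path
from typing import Callable

# Hypothetical verb table used to phrase the message for each command.
_PAST_PARTICIPLE = {"optimize": "optimized", "quantize": "quantized"}


class CompiledModelError(Exception):
    """Hypothetical error raised when a compiled model is passed in."""


def compiled_model_message(model_path: str, command: str) -> str:
    """Build the user-facing message for a compiled EPContext model."""
    name = Path(model_path).name  # assumption: only the file name is shown
    if command == "compile":
        return (
            f"{name} is already a compiled EPContext model and cannot be re-compiled.\n"
            f"Run 'winml compile' on the original ONNX model."
        )
    return (
        f"{name} is a compiled EPContext model and cannot be {_PAST_PARTICIPLE[command]}.\n"
        f"Run 'winml {command}' on the original ONNX model before compilation."
    )


def ensure_not_compiled(
    model_path: str, command: str, is_compiled: Callable[[str], bool]
) -> None:
    """Raise before any real work starts if the model is already compiled."""
    if is_compiled(model_path):
        raise CompiledModelError(compiled_model_message(model_path, command))


try:
    ensure_not_compiled("model.onnx", "optimize", is_compiled=lambda _: True)
except CompiledModelError as exc:
    print(f"Error: {exc}")
```

Passing the predicate as an argument keeps the sketch runnable without the winml package; in the actual commands the guard would presumably call is_compiled_onnx() directly.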

Closes #256
Closes #257

Running winml optimize or winml quantize on a compiled QNN EPContext model
crashes deep in ORT with a NOT_IMPLEMENTED error. Add an is_compiled_onnx()
guard before processing starts in all three commands so users get a clear,
actionable error message instead of an ORT traceback.

Also guards winml compile against re-compiling an already-compiled model,
which would silently produce incorrect output.
@DingmaomaoBJTU requested a review from a team as a code owner April 8, 2026 03:07
Collaborator

@timenick left a comment

Two suggestions on guard placement — see inline comments.

Comment thread src/winml/modelkit/commands/optimize.py Outdated
Comment thread src/winml/modelkit/commands/quantize.py Outdated
@DingmaomaoBJTU
Collaborator Author

Fixed in 90a8503 — moved both guards to fail-fast positions:

  • optimize.py: guard now fires right after the model is None check, before any config parsing or console output
  • quantize.py: guard now fires right after configure_logging(), before type resolution, output-path computation, or any console output
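
The fail-fast ordering described in the bullets above can be sketched as follows. This is not the code from optimize.py: run_optimize, the events list, and the is_compiled predicate are hypothetical stand-ins that only show the guard running before config parsing or any console output.

```python
from typing import Callable, List, Optional


def run_optimize(
    model: Optional[str],
    is_compiled: Callable[[str], bool],
    events: List[str],
) -> List[str]:
    """Hypothetical command body; `events` records side effects in order."""
    if model is None:
        raise ValueError("a model path is required")

    # Guard sits right after the None check, before any other work.
    if is_compiled(model):
        raise RuntimeError(
            f"{model} is a compiled EPContext model and cannot be optimized."
        )

    events.append("parse config")
    events.append("print: Loading model...")
    events.append("print: Running optimizer...")
    return events


log: List[str] = []
try:
    run_optimize("model_ctx.onnx", is_compiled=lambda _: True, events=log)
except RuntimeError as exc:
    print(f"Error: {exc}")
print(log)  # [] because nothing ran before the guard fired
```

Because the guard precedes every other step, a rejected model leaves no partial console output such as "Loading model..." ahead of the error.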

@DingmaomaoBJTU enabled auto-merge (squash) April 9, 2026 02:16
@DingmaomaoBJTU merged commit f917a12 into main Apr 9, 2026
8 checks passed
@DingmaomaoBJTU deleted the fix/epcontext-guard-optimize-quantize branch April 9, 2026 02:20