Background
PR #681 (commit 5b088fd9) added a normalization step inside export_pytorch() that runs optimize_onnx(model, output) with no flags on every successful export. This is the right default for everyone who consumes the output of export_pytorch directly — including raw API callers and the winml export CLI.
However, the build pipeline (run_optimize_analyze_loop in src/winml/modelkit/build/common.py) also calls optimize_onnx immediately after the export stage:
https://github.com/microsoft/winml-cli/blob/main/src/winml/modelkit/build/common.py#L82-L88
    # 1. Optimize
    optimize_onnx(
        model=model_path,
        output=optimized_path,
        **onnx_kwargs,
        **config.optim,
    )
When **onnx_kwargs and **config.optim are both empty, the build path performs a second normalization-only pass over a model that was already normalized inside export_pytorch. For multi-GB WinML models this is a noticeable wall-time and disk-I/O regression — most observable in winml build, winml eval, and scripts/e2e_eval/run_eval.py.
In all other cases (config.optim carries fusion flags, onnx_kwargs is non-empty, or the input is a private ONNX model that didn't go through export_pytorch) the second optimize_onnx call is still required.
Goal
Eliminate the redundant optimize_onnx pass when, and only when, the build pipeline is about to re-normalize a model that export_pytorch already normalized — without breaking the case where additional fusion flags or a non-export_pytorch-produced model genuinely need optimizing.
Option 1 — ad-hoc --skip-pre-normalization flag (short-term, internal)
Add a temporary CLI flag and plumb it through the build API:
- winml build gains --skip-pre-normalization. When it is set and **onnx_kwargs and **config.optim are both empty, run_optimize_analyze_loop skips the step-1 optimize_onnx call and treats model_path as already-normalized (copies/aliases it as optimized_path).
- scripts/e2e_eval/run_eval.py passes the flag through to its winml build subprocess, since the e2e eval pipeline reuses an export_pytorch-produced model.
- The check guards against accidental misuse: if the caller actually provided fusion flags, the flag is ignored (silently or with a warning), so we never skip an intended optimization.
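The guard described in the bullets above could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper name run_step1_optimize, the skip_pre_normalization parameter, and passing optimize_onnx in as a callable are all hypothetical and are not taken from the actual run_optimize_analyze_loop code.

```python
import shutil
import warnings
from pathlib import Path
from typing import Any, Callable, Mapping


def run_step1_optimize(
    model_path: Path,
    optimized_path: Path,
    *,
    skip_pre_normalization: bool,
    onnx_kwargs: Mapping[str, Any],
    optim: Mapping[str, Any],
    optimize_onnx: Callable[..., None],
) -> bool:
    """Run or skip the step-1 optimize pass (hypothetical helper).

    Returns True when the pass was skipped, False when optimize_onnx ran.
    """
    has_extra_work = bool(onnx_kwargs) or bool(optim)

    if skip_pre_normalization and not has_extra_work:
        # Nothing beyond normalization was requested, so treat model_path as
        # already normalized and expose it under optimized_path.
        # Assumption: a plain file copy is enough. A model that stores weights
        # in external data files would need those copied or linked as well.
        shutil.copyfile(model_path, optimized_path)
        return True

    if skip_pre_normalization:
        # The caller asked to skip but also supplied flags: never drop an
        # intended optimization, so ignore the skip request and say so.
        warnings.warn(
            "--skip-pre-normalization ignored because optimization flags were provided",
            stacklevel=2,
        )

    optimize_onnx(model=model_path, output=optimized_path, **onnx_kwargs, **optim)
    return False
```

Returning a boolean lets the caller log whether the pass was skipped. Whether the real implementation copies, hard-links, or simply aliases the path is left open here, as it is in the text.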
Pros: minimal blast radius, ships in a day, immediately reclaims the wall-time regression for the two internal pipelines that matter (winml build, run_eval.py).
Cons: explicit opt-in coupling between caller and pipeline; no benefit for external users who don't know to set the flag; technical debt that has to be removed once Option 2 lands.
Option 2 — detect "already normalized" inside optimize_onnx (long-term, public)
Design and implement a signal that lets optimize_onnx recognize a model that is already in its normalized form, and short-circuit when the call has no further work to do (kwargs all empty / defaulted). Possible directions to investigate (not committing to any of these yet):
- Model-level metadata stamp (e.g. winml.normalized.version in metadata_props) written by _normalize_exported_model and read by optimize_onnx as a fast-path check.
- A cheap structural fingerprint compared against the post-normalize invariant set (value_info populated, no dangling shape mismatches, fused-op presence, etc.).
- A capability registry handshake: optimize_onnx enumerates which capabilities each enabled flag would change vs. the current model state, and skips the pipeline if the diff is empty.
Each of these has trade-offs around staleness (a model touched by a third-party tool after normalization), opset/version skew, and false positives that would silently skip an intended optimization. Picking one is out of scope for this issue — the point is to mark this as the durable replacement for Option 1 and capture the directions worth probing.
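To make the first direction concrete, here is a minimal sketch of a metadata-stamp fast path. It is not a proposed design. A plain dict stands in for the model's metadata_props (in an ONNX ModelProto these are key/value string entries), and the helper names, the NORMALIZER_VERSION value, and the rule that any supplied kwarg disables the fast path are all assumptions made for illustration.

```python
from typing import Dict, Mapping

# Key taken from the example in the text; the version value is hypothetical.
STAMP_KEY = "winml.normalized.version"
NORMALIZER_VERSION = "1"


def stamp_normalized(metadata_props: Dict[str, str]) -> None:
    """Record that normalization ran, and with which normalizer version."""
    metadata_props[STAMP_KEY] = NORMALIZER_VERSION


def can_skip_optimize(
    metadata_props: Mapping[str, str],
    optimize_kwargs: Mapping[str, object],
) -> bool:
    """Return True only when the call would be a pure re-normalization."""
    if optimize_kwargs:
        # Any explicit flag means there may be real work to do.
        # Simplification: kwargs passed at their default values are not
        # recognized as "defaulted" here and also disable the fast path.
        return False
    # A missing stamp, or one written by a different normalizer version,
    # falls through to the full pipeline (guards against version skew).
    return metadata_props.get(STAMP_KEY) == NORMALIZER_VERSION
```

This sketch covers version skew only. It does not address the staleness case above: a third-party tool that rewrites the graph but preserves metadata would still pass the check.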
Pros: caller-transparent; benefits every consumer of optimize_onnx, not just the two internal pipelines; lets the ad-hoc flag retire.
Cons: non-trivial design work; needs a robust contract that survives format/opset evolution and external tooling.
Action items
- Option 1: --skip-pre-normalization in winml build + threading in run_eval.py
References
- export_pytorch normalization (merged in 5b088fd9)
- src/winml/modelkit/build/common.py, run_optimize_analyze_loop: step 1 optimize_onnx call
- src/winml/modelkit/commands/build.py: winml build CLI surface
- scripts/e2e_eval/run_eval.py: internal e2e eval pipeline