Avoid duplicate normalization between export_pytorch and run_optimize_analyze_loop #684

@vortex-captain

Background

PR #681 (commit 5b088fd9) added a normalization step inside export_pytorch() that runs optimize_onnx(model, output) with no flags on every successful export. This is the right default for everyone who consumes the output of export_pytorch directly — including raw API callers and the winml export CLI.

However, the build pipeline (run_optimize_analyze_loop in src/winml/modelkit/build/common.py) also calls optimize_onnx immediately after the export stage:

https://github.com/microsoft/winml-cli/blob/main/src/winml/modelkit/build/common.py#L82-L88

# 1. Optimize
optimize_onnx(
    model=model_path,
    output=optimized_path,
    **onnx_kwargs,
    **config.optim,
)

When **onnx_kwargs and **config.optim are both empty, the build path performs a second normalization-only pass over a model that was already normalized inside export_pytorch. For multi-GB WinML models this is a noticeable wall-time and disk-I/O regression — most observable in winml build, winml eval, and scripts/e2e_eval/run_eval.py.

In all other cases (config.optim carries fusion flags, onnx_kwargs is non-empty, or the input is a private ONNX model that didn't go through export_pytorch) the second optimize_onnx call is still required.
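The condition above can be written as a small predicate. This is a minimal sketch, not code from the repository: `second_pass_is_redundant` and its `produced_by_export_pytorch` parameter are hypothetical names, and it assumes `onnx_kwargs` and `config.optim` behave like plain mappings.

```python
from typing import Any, Mapping


def second_pass_is_redundant(
    onnx_kwargs: Mapping[str, Any],
    optim: Mapping[str, Any],
    produced_by_export_pytorch: bool,
) -> bool:
    """Return True only when step 1 would repeat a normalization-only pass.

    Hypothetical helper: the name and the produced_by_export_pytorch
    parameter are illustrative and do not exist in the code base.
    """
    # Any extra option means optimize_onnx has real work to do.
    has_extra_work = bool(onnx_kwargs) or bool(optim)
    # A model that did not come from export_pytorch was never normalized.
    return produced_by_export_pytorch and not has_extra_work
```

The sketch is deliberately conservative: any non-empty mapping counts as work, even if every value in it happens to be a default.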

Goal

Eliminate the redundant optimize_onnx pass when, and only when, the build pipeline is about to re-normalize a model that export_pytorch already normalized — without breaking the case where additional fusion flags or a non-export_pytorch-produced model genuinely need optimizing.

Option 1 — ad-hoc --skip-pre-normalization flag (short-term, internal)

Add a temporary CLI flag and plumb it through the build API:

  • winml build gains --skip-pre-normalization. When set and **onnx_kwargs and **config.optim are both empty, run_optimize_analyze_loop skips the step-1 optimize_onnx call and treats model_path as already-normalized (copies/aliases it as optimized_path).
  • scripts/e2e_eval/run_eval.py passes the flag through to its winml build subprocess, since the e2e eval pipeline reuses an export_pytorch-produced model.
  • The check guards against accidental misuse: if the caller actually provided fusion flags, the flag is ignored (either silently or with a warning) so we never skip an intended optimization.
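The skip-and-guard behavior described in these bullets could look roughly like the sketch below. It is an illustration under assumptions, not the actual change: `run_step1_optimize`, the `skip_pre_normalization` parameter, and the injected `optimize_fn` (standing in for `optimize_onnx`) are hypothetical, and the sketch picks the "warn" variant of the guard.

```python
import logging
import shutil
from pathlib import Path
from typing import Any, Callable, Mapping

logger = logging.getLogger(__name__)


def run_step1_optimize(
    model_path: Path,
    optimized_path: Path,
    onnx_kwargs: Mapping[str, Any],
    optim: Mapping[str, Any],
    skip_pre_normalization: bool,
    optimize_fn: Callable[..., Any],
) -> bool:
    """Run or skip step 1; return True if optimize_fn was called.

    Hypothetical helper: optimize_fn stands in for optimize_onnx.
    """
    has_extra_work = bool(onnx_kwargs) or bool(optim)
    if skip_pre_normalization and not has_extra_work:
        # Treat model_path as already normalized and alias it by copying.
        if model_path.resolve() != optimized_path.resolve():
            shutil.copyfile(model_path, optimized_path)
        return False
    if skip_pre_normalization:
        # Flags were supplied, so skipping would drop an intended optimization.
        logger.warning("--skip-pre-normalization ignored: optimization flags are set")
    optimize_fn(model=model_path, output=optimized_path, **onnx_kwargs, **optim)
    return True
```

One caveat with the copy step: if a model stores its weights as ONNX external data, copying only the `.onnx` file would not bring those files along, so a real implementation would need to handle that case or alias the path instead.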

Pros: minimal blast radius, ships in a day, immediately reclaims the wall-time regression for the two internal pipelines that matter (winml build, run_eval.py).
Cons: explicit opt-in coupling between caller and pipeline; no benefit for external users who don't know to set the flag; technical debt that has to be removed once Option 2 lands.

Option 2 — detect "already normalized" inside optimize_onnx (long-term, public)

Design and implement a signal that lets optimize_onnx recognize a model that is already in its normalized form, and short-circuit when the call has no further work to do (kwargs all empty / defaulted). Possible directions to investigate (not committing to any of these yet):

  • Model-level metadata stamp (e.g. winml.normalized.version in metadata_props) written by _normalize_exported_model and read by optimize_onnx as a fast-path check.
  • A cheap structural fingerprint compared against the post-normalize invariant set (value_info populated, no dangling shape mismatches, fused-op presence, etc.).
  • A capability registry handshake — optimize_onnx enumerates which capabilities each enabled flag would change vs. the current model state, and skips the pipeline if the diff is empty.

Each of these has trade-offs around staleness (a model touched by a third-party tool after normalization), opset/version skew, and false positives that would silently skip an intended optimization. Picking one is out of scope for this issue — the point is to mark this as the durable replacement for Option 1 and capture the directions worth probing.
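To make the first direction concrete, here is a rough sketch of a metadata-stamp fast path. It is exploratory only: the function names and the version value are made up, the stamp key is the example from the bullet above, and treating falsy option values as "defaulted" is an assumption. The functions accept any iterable of entries with `.key` and `.value` attributes, which is the shape of `ModelProto.metadata_props` in the `onnx` package, so the example runs without loading a model.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Mapping

# Key taken from the example in this issue; the version value is made up.
STAMP_KEY = "winml.normalized.version"
CURRENT_NORMALIZER_VERSION = "1"


def is_already_normalized(metadata_props: Iterable[Any]) -> bool:
    """Check for a stamp that matches the current normalizer version."""
    for entry in metadata_props:
        if entry.key == STAMP_KEY:
            # A stamp written by a different version is treated as stale.
            return entry.value == CURRENT_NORMALIZER_VERSION
    return False


def can_short_circuit(metadata_props: Iterable[Any], kwargs: Mapping[str, Any]) -> bool:
    """Skip only when the model is stamped and no option has a truthy value."""
    # Assumption: an option left at a falsy value counts as "defaulted".
    no_work_requested = not any(kwargs.values())
    return no_work_requested and is_already_normalized(metadata_props)


# Stand-in for a stamped model's metadata_props.
example_props = [SimpleNamespace(key=STAMP_KEY, value=CURRENT_NORMALIZER_VERSION)]
assert can_short_circuit(example_props, {})
```

A stamp on its own does not address the staleness concern: a third-party tool could modify the graph and leave the stamp in place, so this check would likely need to be paired with some form of integrity check.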

Pros: caller-transparent; benefits every consumer of optimize_onnx, not just the two internal pipelines; lets the ad-hoc flag retire.
Cons: non-trivial design work; needs a robust contract that survives format/opset evolution and external tooling.

Action items

  • Land Option 1 as an interim measure (--skip-pre-normalization in winml build + threading in run_eval.py)
  • Open a design ticket for Option 2 and pick a direction
  • Once Option 2 lands, remove Option 1's ad-hoc flag

References

  • PR #681, feat(export): normalize exported ONNX in-place via optimize_onnx: added export_pytorch normalization (merged in 5b088fd9)
  • src/winml/modelkit/build/common.py, run_optimize_analyze_loop: step 1 optimize_onnx call
  • src/winml/modelkit/commands/build.py: winml build CLI surface
  • scripts/e2e_eval/run_eval.py: internal e2e eval pipeline

Metadata

Labels

triaged: Issue has been triaged
