Avoid duplicate normalization between export_pytorch and run_optimize_analyze_loop

## Background

PR #681 (commit `5b088fd9`) added a normalization step inside `export_pytorch()` that runs `optimize_onnx(model, output)` with **no flags** on every successful export. This is the right default for everyone who consumes the output of `export_pytorch` directly — including raw API callers and the `winml export` CLI.

However, the build pipeline (`run_optimize_analyze_loop` in `src/winml/modelkit/build/common.py`) **also** calls `optimize_onnx` immediately after the export stage:

https://github.com/microsoft/winml-cli/blob/main/src/winml/modelkit/build/common.py#L82-L88

```python
# 1. Optimize
optimize_onnx(
    model=model_path,
    output=optimized_path,
    **onnx_kwargs,
    **config.optim,
)
```

When `**onnx_kwargs` **and** `**config.optim` are both empty, the build path performs a second normalization-only pass over a model that was already normalized inside `export_pytorch`. For multi-GB WinML models this is a noticeable wall-time and disk-I/O regression — most observable in `winml build`, `winml eval`, and `scripts/e2e_eval/run_eval.py`.

In all other cases (`config.optim` carries fusion flags, `onnx_kwargs` is non-empty, or the input is a private ONNX model that didn't go through `export_pytorch`) the second `optimize_onnx` call is still required.

## Goal

Eliminate the redundant `optimize_onnx` pass when, and only when, the build pipeline is about to re-normalize a model that `export_pytorch` already normalized — without breaking the case where additional fusion flags or a non-`export_pytorch`-produced model genuinely need optimizing.

## Option 1 — ad-hoc `--skip-pre-normalization` flag (short-term, internal)

Add a temporary CLI flag and plumb it through the build API:

- `winml build` gains `--skip-pre-normalization`. When set **and** `**onnx_kwargs` and `**config.optim` are both empty, `run_optimize_analyze_loop` skips the step-1 `optimize_onnx` call and treats `model_path` as already-normalized (copies/aliases it as `optimized_path`).
- `scripts/e2e_eval/run_eval.py` passes the flag through to its `winml build` subprocess, since the e2e eval pipeline reuses an `export_pytorch`-produced model.
- The check guards against accidental misuse: if the caller actually provided fusion flags or other `onnx_kwargs`, the flag is ignored (either silently or with a warning) so that an intended optimization is never skipped.
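
The guard described in the bullets above could look roughly like the following. This is a minimal sketch, not the actual `run_optimize_analyze_loop` code: the helper name `run_step1_optimize`, its parameters, and the warning text are hypothetical, and `optimize_fn` stands in for `optimize_onnx` so the example stays self-contained.

```python
import shutil
import warnings
from pathlib import Path
from typing import Any, Callable, Mapping


def run_step1_optimize(
    model_path: Path,
    optimized_path: Path,
    optimize_fn: Callable[..., Any],  # stand-in for optimize_onnx (assumption)
    onnx_kwargs: Mapping[str, Any],
    optim: Mapping[str, Any],
    skip_pre_normalization: bool = False,  # hypothetical API mirror of the CLI flag
) -> bool:
    """Run or skip step 1. Returns True when the optimize call was skipped."""
    has_work = bool(onnx_kwargs) or bool(optim)

    if skip_pre_normalization and not has_work:
        # Only normalization was requested, and the caller asserts the model
        # is already normalized: reuse it as the "optimized" output.
        if model_path.resolve() != optimized_path.resolve():
            shutil.copyfile(model_path, optimized_path)
        return True

    if skip_pre_normalization:
        # Flags were provided, so skipping would drop an intended optimization.
        warnings.warn(
            "--skip-pre-normalization ignored: optimization flags were provided",
            stacklevel=2,
        )

    optimize_fn(model=model_path, output=optimized_path, **onnx_kwargs, **optim)
    return False
```

The sketch copies only the `.onnx` file itself. If the model stores its weights in external data files, as multi-GB ONNX models typically do, a real implementation would have to copy or alias those as well.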

**Pros:** minimal blast radius, ships in a day, and immediately recovers the wall-time lost to the regression in the two internal pipelines that matter (`winml build`, `run_eval.py`).
**Cons:** explicit opt-in coupling between caller and pipeline; no benefit for external users who don't know to set the flag; technical debt that has to be removed once Option 2 lands.

## Option 2 — detect "already normalized" inside `optimize_onnx` (long-term, public)

Design and implement a signal that lets `optimize_onnx` recognize a model that is already in its normalized form, and short-circuit when the call has no further work to do (kwargs all empty / defaulted). Possible directions to investigate (not committing to any of these yet):

- Model-level metadata stamp (e.g. `winml.normalized.version` in `metadata_props`) written by `_normalize_exported_model` and read by `optimize_onnx` as a fast-path check.
- A cheap structural fingerprint compared against the post-normalize invariant set (`value_info` populated, no dangling shape mismatches, fused-op presence, etc.).
- A capability registry handshake — `optimize_onnx` enumerates which capabilities each enabled flag would change vs. the current model state, and skips the pipeline if the diff is empty.

Each of these has trade-offs around staleness (a model touched by a third-party tool after normalization), opset/version skew, and false positives that would silently skip an intended optimization. Picking one is out of scope for this issue — the point is to mark this as the durable replacement for Option 1 and capture the directions worth probing.
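
To make the staleness concern concrete, here is one possible shape for the metadata-stamp direction, offered as an illustration rather than a proposal. The metadata is modeled as a plain `dict`; in an ONNX model the equivalent entries would live in `ModelProto.metadata_props`. The digest key, the version string, and all function names are hypothetical. Only `winml.normalized.version` comes from the text above.

```python
import hashlib
from typing import Any, Dict, Mapping

STAMP_VERSION_KEY = "winml.normalized.version"  # key name from the issue text
STAMP_DIGEST_KEY = "winml.normalized.digest"    # hypothetical companion key
NORMALIZER_VERSION = "1"                        # hypothetical version string


def graph_digest(graph_bytes: bytes) -> str:
    """Hash the serialized graph so later edits invalidate the stamp."""
    return hashlib.sha256(graph_bytes).hexdigest()


def stamp(metadata: Dict[str, str], graph_bytes: bytes) -> None:
    """Record that normalization ran, and on which exact graph contents."""
    metadata[STAMP_VERSION_KEY] = NORMALIZER_VERSION
    metadata[STAMP_DIGEST_KEY] = graph_digest(graph_bytes)


def is_already_normalized(
    metadata: Mapping[str, str],
    graph_bytes: bytes,
    requested_flags: Mapping[str, Any],
) -> bool:
    """Fast-path check: True only if there is provably nothing left to do."""
    if requested_flags:
        return False  # extra optimization was requested; never short-circuit
    if metadata.get(STAMP_VERSION_KEY) != NORMALIZER_VERSION:
        return False  # unstamped, or stamped by a different normalizer version
    # A third-party tool that rewrote the graph changes the digest.
    return metadata.get(STAMP_DIGEST_KEY) == graph_digest(graph_bytes)
```

Pairing the version with a content digest covers the "touched by a third-party tool" case, but hashing is not free for multi-GB models, so whether it is affordable would need measuring during the Option 2 design work.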

**Pros:** caller-transparent; benefits every consumer of `optimize_onnx`, not just the two internal pipelines; lets the ad-hoc flag retire.
**Cons:** non-trivial design work; needs a robust contract that survives format/opset evolution and external tooling.

## Action items

- [ ] Land Option 1 as an interim measure (`--skip-pre-normalization` in `winml build`, passed through by `run_eval.py`)
- [ ] Open a design ticket for Option 2 and pick a direction
- [ ] Once Option 2 lands, remove Option 1's ad-hoc flag

## References

- PR #681 — added `export_pytorch` normalization (merged in `5b088fd9`)
- `src/winml/modelkit/build/common.py` `run_optimize_analyze_loop` — step 1 `optimize_onnx` call
- `src/winml/modelkit/commands/build.py` — `winml build` CLI surface
- `scripts/e2e_eval/run_eval.py` — internal e2e eval pipeline

