perf: HF model ID uses AOT build pipeline, ONNX file uses raw JIT — inconsistent results and divergent code paths

## Summary

`winml perf` currently dispatches on the form of the `-m` argument (HF model ID vs. `.onnx` file path) and runs two completely different benchmark pipelines for what users perceive as the same operation:

- `winml perf -m hf/model` → `PerfBenchmark.run()` (`commands/perf.py:280`), which calls `WinMLAutoModel.from_pretrained(...)`. This runs the **full AOT pipeline**: export → optimize → quantize → compile, then benchmarks the compiled artifact.
- `winml perf -m path/to/model.onnx` → `_run_onnx_benchmark(...)` (`commands/perf.py:952`), which calls `WinMLSession(onnx_path=...)` directly. This **bypasses the build pipeline entirely** — it's effectively a raw ORT JIT load with whatever EP the device resolves to.

The branch is at `commands/perf.py:1297` (`is_onnx = model_path.suffix.lower() == ".onnx"`).
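To make the dispatch concrete, here is a minimal standalone sketch of the suffix check quoted above. The function name `pick_pipeline` and the labels `"onnx_jit"` / `"hf_aot"` are illustrative assumptions, not identifiers from the codebase; only the `suffix.lower() == ".onnx"` comparison comes from the issue.

```python
from pathlib import Path


def pick_pipeline(model_arg: str) -> str:
    """Return which benchmark path the suffix check would select.

    The return labels are hypothetical names used only for this sketch.
    """
    model_path = Path(model_arg)
    # Same comparison as the dispatcher: case-insensitive ".onnx" suffix.
    is_onnx = model_path.suffix.lower() == ".onnx"
    return "onnx_jit" if is_onnx else "hf_aot"


print(pick_pipeline("microsoft/resnet-50"))  # hf_aot
print(pick_pipeline("path/to/model.onnx"))   # onnx_jit
```

Note that the decision depends only on the string shape of the argument, which is why the same model can land in either pipeline.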

## Why this matters

1. **Apples-to-oranges numbers.** If a user runs `winml perf -m microsoft/resnet-50` and then `winml perf -m <the_output.onnx>`, the two latency numbers don't agree, even though the file was just produced by the first command. The first goes through quantize+compile (so the benchmarked artifact may not even be `<the_output.onnx>` — it's the compiled context model). The second loads the raw file as-is.

2. **Flag semantics silently change.** Several CLI flags are HF-path only:
   - `--no-quantize`, `--rebuild`, `--ignore-cache`, `--precision`: silently no-op on the ONNX path.
   - `--shape-config`: ONNX path prints a warning and discards it (`commands/perf.py:1301`).
   The user has no way to ask "benchmark this ONNX after quantize+compile".

3. **Duplicated benchmark logic.** The perf loop, hardware-monitor setup, model-info printing, and stats collection exist in **both** `PerfBenchmark.run()` and `_run_onnx_benchmark()` (with helpers `_run_simple_loop` / `_run_monitored_loop`). Any future change to the timing/stats code has to be made in two places.

4. **Dead code.** `PerfBenchmark._load_model` at `commands/perf.py:322-353` has a branch for `is_onnx and model_path.exists()` calling `WinMLAutoModel.from_onnx(...)`. This branch is unreachable from the CLI because the outer dispatcher at `commands/perf.py:1297` already routes `.onnx` files away.
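As an illustration of point 2, the following sketch shows one possible way to detect HF-only flags on the ONNX path so they can be reported instead of silently ignored. The helper `find_ignored_flags` and the constant `HF_ONLY_FLAGS` are hypothetical; the flag names are the ones listed above, and the real CLI's argument parsing may look quite different.

```python
from typing import List, Sequence

# Flags the issue lists as having no effect (or being discarded) on the ONNX path.
HF_ONLY_FLAGS = (
    "--no-quantize",
    "--rebuild",
    "--ignore-cache",
    "--precision",
    "--shape-config",
)


def find_ignored_flags(argv: Sequence[str], is_onnx: bool) -> List[str]:
    """Return the HF-only flags present in argv when benchmarking an ONNX file."""
    if not is_onnx:
        return []
    return [
        flag
        for flag in HF_ONLY_FLAGS
        # Match both "--flag value" and "--flag=value" spellings.
        if any(arg == flag or arg.startswith(flag + "=") for arg in argv)
    ]


ignored = find_ignored_flags(["-m", "model.onnx", "--precision=int8", "--rebuild"], is_onnx=True)
if ignored:
    print("These flags have no effect for .onnx inputs:", ", ".join(ignored))
```

Whether the CLI should warn or exit with an error in this case is a design choice left open here.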

## Suggested direction

Unify the two paths around `WinMLAutoModel`, which already exposes both `from_pretrained` (HF) and `from_onnx` (ONNX) constructors. The CLI dispatcher decides which constructor to call; everything downstream (benchmark loop, monitor, stats, report) shares one implementation. Then:

- `winml perf -m model.onnx` (raw, no transforms) → today's default for `.onnx` paths, achieved via `WinMLBuildConfig(export=None, optim=None, quant=None, compile=...)` or equivalent.
- `winml perf -m model.onnx --quantize ... --compile ...` → opt into transforms on a pre-exported ONNX, producing meaningful numbers comparable to the HF flow.
- `--shape-config` / `--precision` / `--no-quantize` etc. mean the same thing on both paths (or error clearly when not applicable).

This also lets us delete `_run_onnx_benchmark` and the dead `from_onnx` branch in `_load_model`.
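A rough sketch of the proposed shape, under stated assumptions: `StubModel`, `load_from_pretrained`, and `load_from_onnx` are hypothetical stand-ins for `WinMLAutoModel.from_pretrained` / `WinMLAutoModel.from_onnx`, whose real signatures are not shown in this issue. The point is only that the dispatcher picks a constructor and a single `benchmark` function handles timing for both inputs.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Dict


@dataclass
class StubModel:
    """Stand-in for whatever the real constructors return (assumption)."""

    source: str
    calls: int = 0

    def run(self) -> None:
        # A real model would execute one inference here.
        self.calls += 1


def load_from_pretrained(model_id: str) -> StubModel:
    # Placeholder for the HF constructor path.
    return StubModel(source=f"hf:{model_id}")


def load_from_onnx(onnx_path: str) -> StubModel:
    # Placeholder for the ONNX constructor path.
    return StubModel(source=f"onnx:{onnx_path}")


def load_model(model_arg: str) -> StubModel:
    """The only place that branches on the input type."""
    if model_arg.lower().endswith(".onnx"):
        return load_from_onnx(model_arg)
    return load_from_pretrained(model_arg)


def benchmark(model: StubModel, warmup: int = 2, iterations: int = 5) -> Dict[str, float]:
    """Single timing loop shared by both input types."""
    for _ in range(warmup):
        model.run()
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        model.run()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "iterations": iterations,
        "mean_ms": statistics.mean(samples_ms),
        "min_ms": min(samples_ms),
    }


for arg in ("microsoft/resnet-50", "path/to/model.onnx"):
    model = load_model(arg)
    stats = benchmark(model)
    print(model.source, stats["iterations"])
```

With this layout, a change to the timing or stats code touches `benchmark` only, and hardware monitoring or reporting could be layered on the same function.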
