perf: HF model ID uses AOT build pipeline, ONNX file uses raw JIT — inconsistent results and divergent code paths #596

@xieofxie

Description

Summary

winml perf currently dispatches on the form of the -m argument (HF model ID vs. .onnx file path) and runs two completely different benchmark pipelines for what users perceive as the same operation:

  • winml perf -m hf/model → PerfBenchmark.run() (commands/perf.py:280), which calls WinMLAutoModel.from_pretrained(...). This runs the full AOT pipeline (export → optimize → quantize → compile) and then benchmarks the compiled artifact.
  • winml perf -m path/to/model.onnx → _run_onnx_benchmark(...) (commands/perf.py:952), which calls WinMLSession(onnx_path=...) directly. This bypasses the build pipeline entirely: it is effectively a raw ORT JIT load with whatever EP the device resolves to.

The branch is at commands/perf.py:1297 (is_onnx = model_path.suffix.lower() == ".onnx").
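To make the routing rule concrete, here is a minimal sketch that reproduces only the suffix check quoted above. The function name choose_pipeline and the return labels are hypothetical; the real dispatcher does much more than return a string.

```python
from pathlib import Path


def choose_pipeline(model_arg: str) -> str:
    """Hypothetical stand-in for the branch at commands/perf.py:1297."""
    model_path = Path(model_arg)
    is_onnx = model_path.suffix.lower() == ".onnx"
    # .onnx paths skip the build pipeline (raw ORT JIT load);
    # anything else is treated as an HF model ID and goes through the AOT build.
    return "raw_onnx" if is_onnx else "hf_aot"


print(choose_pipeline("microsoft/resnet-50"))  # hf_aot
print(choose_pipeline("out/model.ONNX"))       # raw_onnx
```

The decision depends only on the file extension, so nothing about the requested flags influences which pipeline runs.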

Why this matters

  1. Apples-to-oranges numbers. If a user runs winml perf -m microsoft/resnet-50 and then winml perf -m <the_output.onnx>, the two latency numbers disagree, even though the file was just produced by the first command. The first run goes through quantize+compile, so the benchmarked artifact may not even be <the_output.onnx>; it is the compiled context model. The second run loads the raw file as-is.

  2. Flag semantics silently change. Several CLI flags are HF-path only:

    • --no-quantize, --rebuild, --ignore-cache, --precision: silently ignored on the ONNX path.
    • --shape-config: the ONNX path prints a warning and discards it (commands/perf.py:1301).

     As a result, the user has no way to ask "benchmark this ONNX after quantize+compile".

  3. Duplicated benchmark logic. The perf loop, hardware-monitor setup, model-info printing, and stats collection exist in both PerfBenchmark.run() and _run_onnx_benchmark() (with helpers _run_simple_loop / _run_monitored_loop). Any future change to the timing/stats code has to be made in two places.

  4. Dead code. PerfBenchmark._load_model (commands/perf.py:322-353) has a branch, taken when is_onnx and model_path.exists(), that calls WinMLAutoModel.from_onnx(...). This branch is unreachable from the CLI because the outer dispatcher at commands/perf.py:1297 already routes .onnx files away.
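Point 3 is easier to fix if the timing and stats live in one function that only needs a callable to invoke. The sketch below is a hypothetical helper (run_benchmark and LatencyStats are illustrative names, not existing code in commands/perf.py); it omits hardware monitoring and uses a nearest-rank p90.

```python
import math
import statistics
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class LatencyStats:
    iterations: int
    mean_ms: float
    p50_ms: float
    p90_ms: float
    min_ms: float
    max_ms: float


def run_benchmark(infer: Callable[[], object], warmup: int = 5, iterations: int = 50) -> LatencyStats:
    """Single timing/stats loop that any model source could share (hypothetical helper)."""
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    for _ in range(warmup):
        infer()  # warm-up calls are not timed
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        infer()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    ordered = sorted(samples_ms)
    # Nearest-rank p90: the smallest sample with at least 90% of samples at or below it.
    p90_ms = ordered[math.ceil(0.9 * len(ordered)) - 1]
    return LatencyStats(
        iterations=iterations,
        mean_ms=statistics.fmean(samples_ms),
        p50_ms=statistics.median(samples_ms),
        p90_ms=p90_ms,
        min_ms=ordered[0],
        max_ms=ordered[-1],
    )


if __name__ == "__main__":
    stats = run_benchmark(lambda: sum(range(10_000)), warmup=3, iterations=20)
    print(f"mean {stats.mean_ms:.3f} ms, p90 {stats.p90_ms:.3f} ms")
```

Because the loop only sees a callable, the HF and ONNX paths could both pass in their loaded model and get identical timing semantics.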

Suggested direction

Unify the two paths around WinMLAutoModel, which already exposes both from_pretrained (HF) and from_onnx (ONNX) constructors. The CLI dispatcher decides which constructor to call; everything downstream (benchmark loop, monitor, stats, report) shares one implementation. Then:

  • winml perf -m model.onnx (raw, no transforms) → today's default for .onnx paths, achieved via WinMLBuildConfig(export=None, optim=None, quant=None, compile=...) or equivalent.
  • winml perf -m model.onnx --quantize ... --compile ... → opt into transforms on a pre-exported ONNX, producing meaningful numbers comparable to the HF flow.
  • --shape-config / --precision / --no-quantize etc. mean the same thing on both paths (or error clearly when not applicable).
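One possible way to map options for a pre-exported ONNX onto a single config is sketched below. BuildConfig is a hypothetical stand-in shaped like the WinMLBuildConfig(export=..., optim=..., quant=..., compile=...) call above, not the real class; the stage values ("int8", "npu") are placeholders, and the rule that --rebuild errors when no stage is enabled is an assumed policy illustrating "error clearly when not applicable".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BuildConfig:
    """Hypothetical stand-in shaped like WinMLBuildConfig; a stage set to None is skipped."""
    export: Optional[str] = None
    optim: Optional[str] = None
    quant: Optional[str] = None
    compile: Optional[str] = None


def config_for_onnx(quantize: Optional[str] = None,
                    compile_target: Optional[str] = None,
                    rebuild: bool = False) -> BuildConfig:
    """Turn CLI-style options for a pre-exported .onnx file into one build config."""
    # Assumption: the file is already exported, so export and optim stay off here.
    config = BuildConfig(quant=quantize, compile=compile_target)
    enabled = [stage for stage in (config.quant, config.compile) if stage is not None]
    if rebuild and not enabled:
        # Error clearly instead of silently ignoring a flag that has nothing to act on.
        raise ValueError("--rebuild has no effect: no build stage is enabled for this .onnx file")
    return config


raw = config_for_onnx()                                        # raw load, no transforms
opted_in = config_for_onnx(quantize="int8", compile_target="npu")  # placeholder values
```

With no options the config has every stage off, matching the raw behavior; opting in fills only the requested stages.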

Unifying the two paths also lets us delete _run_onnx_benchmark and the dead from_onnx branch in _load_model.
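The proposed shape, where the CLI picks a constructor and everything downstream is shared, can be sketched as follows. StubAutoModel and StubModel are hypothetical stand-ins that only mirror the names from_pretrained and from_onnx mentioned in this issue; their signatures here are assumptions, and no real WinML or ORT calls are made.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class StubModel:
    """Hypothetical stand-in for a loaded model; records only how it was built."""
    source: str
    location: str

    def __call__(self) -> None:
        pass  # real code would run one inference here


class StubAutoModel:
    """Mirrors the two constructor names from the issue; not the real WinMLAutoModel API."""

    @classmethod
    def from_pretrained(cls, model_id: str) -> StubModel:
        return StubModel(source="hf", location=model_id)

    @classmethod
    def from_onnx(cls, onnx_path: str) -> StubModel:
        return StubModel(source="onnx", location=onnx_path)


def load_model(model_arg: str) -> StubModel:
    # The CLI decides only which constructor to call.
    if Path(model_arg).suffix.lower() == ".onnx":
        return StubAutoModel.from_onnx(model_arg)
    return StubAutoModel.from_pretrained(model_arg)


def perf(model_arg: str) -> str:
    model = load_model(model_arg)
    # Everything after loading (loop, monitor, stats, report) would be one shared path.
    model()
    return f"benchmarked {model.source}:{model.location}"
```

In this shape there is no separate ONNX benchmark function left to maintain, which is the deletion the paragraph above describes.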

Metadata

Labels

P1 (High: major feature broken or significant UX impact), bug (Something isn't working), triaged (Issue has been triaged)
