fix(build): optional config, autoconf status display, EP compatibility fixes, and perf improvements #477

Merged
DingmaomaoBJTU merged 21 commits into main from qiowu/perf-improvements on May 14, 2026

@DingmaomaoBJTU DingmaomaoBJTU commented May 8, 2026

Summary

winml build UX improvements

  • -c/--config is now optional: omit it to auto-generate the build config from -m <model> directly (one-step workflow; -m becomes required when -c is omitted)
  • WinMLBuildConfig gains a top-level auto field (default True); the build pipeline sets it to False after the optimize + autoconf loop and saves it into winml_build_config.json as part of the build output — this marks the config as "already resolved" so CI/CD pipelines or subsequent winml build -c runs can reuse it without re-triggering the analyzer
  • Setup header now shows (autoconf on) / (autoconf off) next to the config name; summary section prints the saved config path after the final artifact
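
The one-step workflow and the `auto` flag described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `BuildConfig`, `parse_args`, `load_or_generate`, and `run_build` are hypothetical names, and only the `auto` field, the `-c`/`-m` flags, and the `winml_build_config.json` file name come from the description.

```python
import argparse
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class BuildConfig:
    """Hypothetical stand-in for WinMLBuildConfig; only `auto` is from the PR text."""

    model: str
    auto: bool = True  # True: the optimize + autoconf loop still has to run


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="winml build")
    parser.add_argument("-c", "--config", default=None)
    parser.add_argument("-m", "--model", default=None)
    args = parser.parse_args(argv)
    # -m only becomes mandatory when no config file is given.
    if args.config is None and args.model is None:
        parser.error("-m is required when -c is omitted")
    return args


def load_or_generate(args):
    if args.config is not None:
        return BuildConfig(**json.loads(Path(args.config).read_text()))
    return BuildConfig(model=args.model)  # auto-generated config, auto=True


def run_build(config, out_dir, analyzer):
    if config.auto:
        analyzer(config)     # placeholder for the optimize + autoconf loop
        config.auto = False  # mark the config as already resolved
    out_path = Path(out_dir) / "winml_build_config.json"
    out_path.write_text(json.dumps(asdict(config), indent=2))
    return out_path
```

Feeding the saved file back through `-c` then skips the analyzer, because the stored `auto` value is `False`.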

Performance

  • Skip PyTorch weight loading in from_pretrained() when the ONNX artifact is already cached — use AutoConfig instead (~1 s vs ~60 s on a warm cache)
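
A rough sketch of the cache-hit shortcut, under stated assumptions: `resolve_model_type` and its injected callables are hypothetical. In the PR the lightweight path is an AutoConfig fetch; here plain callables stand in so the example runs without transformers installed.

```python
from pathlib import Path


def resolve_model_type(model_id, onnx_path, fetch_config, load_weights):
    """Return (model_type, model); model is None when the ONNX cache is warm."""
    if Path(onnx_path).is_file():
        # Warm cache: the small config is enough to learn model_type.
        return fetch_config(model_id)["model_type"], None
    # Cold cache: the full model is still needed to export ONNX.
    model = load_weights(model_id)
    return model["model_type"], model
```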

EP compatibility workaround

  • Add _suppress_ep_registration_stderr() to silence the WinApp SDK 2.0 / WinML 1.8 API version mismatch warning printed to native stderr on every run:
    The requested API version [24] is not available, only API versions [1, 23] are supported.
    Current ORT Version is: 1.23.5
    
    Root cause: SDK 2.0 EP DLLs target ORT API v24; installed WinML runtime is still v1.8 (ORT 1.23.5, max API v23). No functional impact — ORT falls back cleanly. Workaround uses SetStdHandle (Win32) + os.dup2 (CRT) because the DLL writes via GetStdHandle(STD_ERROR_HANDLE), bypassing Python's stderr. Remove once WinML runtime upgrades to 2.0.
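
A minimal sketch of this kind of fd-level redirect, assuming a context manager named `suppress_native_stderr` (a hypothetical name; the real helper is `_suppress_ep_registration_stderr()` and may differ in detail). The Win32 branch only runs on Windows and reads the original handle before calling `os.dup2`.

```python
import contextlib
import os
import sys

STD_ERROR_HANDLE = 0xFFFFFFF4  # (DWORD)-12 in the Win32 headers


@contextlib.contextmanager
def suppress_native_stderr():
    """Point fd 2 at the null device for the duration of the block."""
    if sys.stderr is not None:
        sys.stderr.flush()
    kernel32 = old_handle = None
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        # Explicit prototypes keep 64-bit HANDLE values from being truncated.
        kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
        kernel32.GetStdHandle.restype = wintypes.HANDLE
        kernel32.SetStdHandle.argtypes = [wintypes.DWORD, wintypes.HANDLE]
        kernel32.SetStdHandle.restype = wintypes.BOOL
        # Read the Win32 handle before dup2 changes it.
        old_handle = kernel32.GetStdHandle(STD_ERROR_HANDLE)
    saved_fd = os.dup(2)
    null_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(null_fd, 2)
        yield
    finally:
        os.dup2(saved_fd, 2)  # restore the CRT-level descriptor
        if kernel32 is not None:
            kernel32.SetStdHandle(STD_ERROR_HANDLE, old_handle)
        os.close(null_fd)
        os.close(saved_fd)
```

Python-level redirection such as `contextlib.redirect_stderr` would not help here, since it only swaps `sys.stderr` and leaves fd 2 untouched.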

Bug fixes

winml perf --ep openvino --device cpu not working

winml perf -m <hf-model> --ep openvino --device cpu was broken. Root cause: when loading a HF model ID, perf explicitly triggered EPContext compilation via generate_hf_build_config(), which called WinMLCompileConfig.for_provider(compile_provider) without forwarding device. As a result, for_openvino() set no device_type in provider_options, so OpenVINO compiled a GPU/NPU blob instead of a CPU-specific one. When WinMLSession later tried to load that blob on CPU, it raised INVALID_GRAPH.

Two fixes:

  1. for_provider() / for_openvino() / for_qnn() / for_vitisai() now all accept device and forward it to provider_options["device_type"], so the compiled EPContext is device-specific and cache keys don't collide across devices
  2. winml perf now defaults to --no-compile (skip EPContext compilation during benchmarking); use --compile to opt in. Perf is for benchmarking existing artifacts, not building new ones — compilation belongs in winml build
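
The first fix can be illustrated with a trimmed, hypothetical stand-in for `WinMLCompileConfig`. The `cache_key()` helper and the upper-casing of the device string are assumptions made for this sketch; only the method names and the `provider_options["device_type"]` key come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class CompileConfig:
    """Hypothetical, trimmed stand-in for WinMLCompileConfig."""

    provider: str
    provider_options: dict = field(default_factory=dict)

    @classmethod
    def for_openvino(cls, device=None):
        options = {}
        if device is not None:
            # Assumption: the device string is stored upper-cased.
            options["device_type"] = device.upper()
        return cls("openvino", options)

    @classmethod
    def for_provider(cls, provider, device=None):
        factories = {"openvino": cls.for_openvino}
        # The bug was a call like factories[provider]() that dropped `device`.
        return factories[provider](device=device)

    def cache_key(self):
        # Hypothetical key: including the options keeps CPU and NPU blobs apart.
        parts = [self.provider]
        parts += [f"{k}={v}" for k, v in sorted(self.provider_options.items())]
        return "|".join(parts)
```

With the device forwarded, `for_provider("openvino", "cpu")` and `for_provider("openvino", "npu")` produce different options and therefore different cache keys.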

- Suppress native stderr during EP DLL registration so OpenVINO's
  API version mismatch warning no longer surfaces to users
- Skip loading PyTorch model weights in from_pretrained() when the
  ONNX artifact is already cached; use AutoConfig instead (~1s vs ~60s)
- Fix CPU device incorrectly setting compile_provider, which caused
  unnecessary EPContext compilation on CPU
@DingmaomaoBJTU DingmaomaoBJTU requested a review from a team as a code owner May 8, 2026 13:33
…mand

- WinMLBuildConfig gains a top-level `auto` field (default True); set to
  False after the optimize+autoconf loop so the saved winml_build_config.json
  can be reused without re-running the analyzer
- build command: -c/--config is now optional; when omitted, config is
  auto-generated via generate_build_config() and -m becomes required
- Setup header now shows (autoconf on/off) next to the config name
- Summary section prints the saved winml_build_config.json path after the final artifact
… for explicit EP on CPU

- WinMLCompileConfig.for_qnn() and for_vitisai() now accept a device
  parameter and write it into provider_options["device_type"], matching
  the existing for_openvino() pattern. for_provider() passes device via
  lambda for qnn, vitisai, and openvino.
- precision.resolve_precision(): drop the special-case CPU branch that
  discarded an explicit --ep. Replace with a single rule: ep wins when
  set, otherwise fall back to _DEVICE_TO_PROVIDER[resolved_device]
  (which already maps cpu -> None).
- session.compile(): revert unnecessary EP-fallback retry block; failed
  session creation now raises CompilationError directly.
- Add _suppress_ep_registration_stderr() workaround to silence the
  WinApp SDK 2.0 / WinML 1.8 API version mismatch printed to native
  stderr during EP DLL registration (see comment for full context)
- Revert _suppress_native_output to stdout-only (main branch behavior)
- Fix E501 line-length violations in configs.py and console.py
- Fix test to use _suppress_ep_registration_stderr instead of the
  removed suppress_stderr param; fix F401/N806/TRY002 lint errors
@DingmaomaoBJTU DingmaomaoBJTU changed the title from "perf: suppress EP warning, skip model load on cache hit, fix CPU compile" to "feat(build): optional config, autoconf status display, EP compatibility fixes, and perf improvements" May 9, 2026
@DingmaomaoBJTU DingmaomaoBJTU requested a review from tezheng May 9, 2026 09:43
@DingmaomaoBJTU DingmaomaoBJTU changed the title from "feat(build): optional config, autoconf status display, EP compatibility fixes, and perf improvements" to "fix(build): optional config, autoconf status display, EP compatibility fixes, and perf improvements" May 9, 2026
DingmaomaoBJTU added a commit that referenced this pull request May 13, 2026
Port winml build UX improvements from #477:
- -c/--config is now optional; omit it to auto-generate config from -m
  (-m becomes required when -c is omitted)
- WinMLBuildConfig gains auto: bool = True; pipeline sets it to False
  after the optimize+autoconf loop and saves it into winml_build_config.json
  so subsequent runs skip the analyzer on reuse
- Setup header shows (autoconf on)/(autoconf off) next to config name
- Summary section prints the saved config path after the final artifact
- Patch compile config with CLI --device after config load (build.py)
- Fix WinMLSession receiving EP name instead of device string (compile.py)
- Use device_type from provider_options for EPContext filename search
- Add glob fallback for EPContext discovery across device variants
Compile is now disabled by default in the build command globally.
The perf command no longer needs its own flag.
Remove the LoadPhase from WinMLAutoModel.from_pretrained(). The cache
check and heavy pytorch model load now happen inside build_hf_model()
which already handles both cases. Replace with a lightweight AutoConfig
fetch for model_type resolution only.
- config/build.py: remove dead else-branch (policy.device is always
  concrete since resolve_device never returns "auto"); make the
  CPU/GPU -> quant=None guarantee unconditional
- commands/build.py: pass device to generate_build_config so the
  precision policy is applied at generation time; use
  resolve_quant_compile_config in _patch_device to clear quant for
  CPU/GPU without hardcoding device names
… explicit EP

- Comment out _suppress_ep_registration_stderr to make EP registration
  output visible for debugging
- CPU shortcut in _build_session_options now skips when an explicit EP
  is set, allowing e.g. OpenVINO to bind correctly on CPU
- Move the no_compile=True default inside the auto-generate branch
  so config-file builds inherit compile settings as before
- Export resolve_quant_compile_config from config package __init__.py
  so _patch_device can import it without reaching into internal modules
…n-time WinML init

conftest.pytest_collection_modifyitems skips WinML EP discovery when no non-e2e
items carry @pytest.mark.ep. The new TestOpenVINODeviceRouting class was only
marked with @pytest.mark.ep("openvino"), causing registry.register_to_ort() +
ort.get_ep_devices() to be called during collection on the CI runner (which lacks
matching hardware), hanging for the full 30-minute job timeout.

Adding @pytest.mark.e2e excludes the class from the EP-discovery guard and from
the '-m "not e2e"' CI filter, so the commands job completes without hanging.
Comment thread src/winml/modelkit/session/session.py Outdated
Comment thread tests/unit/session/test_winml_session.py
Comment thread src/winml/modelkit/compiler/configs.py Outdated
Comment thread src/winml/modelkit/compiler/stages/compile.py Outdated
Comment thread src/winml/modelkit/build/common.py Outdated
DingmaomaoBJTU and others added 2 commits May 14, 2026 09:19
…CPU EP fallback

Four test failures fixed:

1. _suppress_native_output: add suppress_stderr=False param so callers can
   also redirect stderr to the same log/devnull destination as stdout.

2. _suppress_ep_registration_stderr: capture old_w32 (Win32 STD_ERROR_HANDLE)
   BEFORE os.dup2(null_fd, 2). os.dup2 on Windows calls SetStdHandle
   internally, so reading GetStdHandle after the redirect returned the devnull
   handle instead of the original — making the restore a no-op and leaving the
   handle pointing at a closed fd.

3. compile(): when an explicit EP fails on device=cpu (e.g. OpenVINO-CPU
   INVALID_GRAPH), retry with a fresh PREFER_CPU SessionOptions so
   CPUExecutionProvider handles the model as a transparent fallback.

4. precision.py: `ep="cpu"` must produce `compile_provider=None` (CPU never needs
   EPContext compilation). The previous `ep if ep` expression forwarded "cpu"
   as-is; fix to `ep if (ep and ep != "cpu")`.
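
The rule in item 4 can be sketched as a small function. The function name `resolve_compile_provider` is hypothetical, and the non-CPU entries of `_DEVICE_TO_PROVIDER` are placeholder values; the source only confirms that `cpu` maps to `None`.

```python
# Only "cpu" -> None is confirmed by the PR text; the other entries are
# illustrative placeholders.
_DEVICE_TO_PROVIDER = {"cpu": None, "gpu": "dml", "npu": "qnn"}


def resolve_compile_provider(ep, device):
    """An explicit, non-CPU EP wins; otherwise fall back to the device mapping."""
    return ep if (ep and ep != "cpu") else _DEVICE_TO_PROVIDER[device]
```

An explicit `--ep openvino --device cpu` keeps `openvino` as the compile provider, while `--ep cpu` or no `--ep` at all on CPU yields `None`.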
Comment thread src/winml/modelkit/commands/build.py Outdated
Comment thread src/winml/modelkit/build/common.py Outdated
- session.py: re-enable _suppress_ep_registration_stderr() call in
  _init_winml_eps_once; the function was wired correctly but the call was
  accidentally commented out, leaving the workaround as dead code

- compiler/configs.py: drop quantize param from for_qnn/for_openvino/
  for_vitisai — introduced and immediately deprecated in the same commit
  with no callers; remove the DeprecationWarning shim and unused
  `import warnings`

- compiler/stages/compile.py: remove glob fallback sorted(...)[0] that
  silently picked the alphabetically-first EPContext when stem_<device>_ctx
  and stem_ctx both missed; let the existing "EPContext model not found"
  warning fire instead so the failure is explicit

- build/common.py + commands/build.py: move config.auto = False from
  run_optimize_analyze_loop (unexpected mutation) to the call site in
  build.py right after the loop returns

- commands/build.py: clarify --no-compile/--compile help text to distinguish
  config-file mode (inherits compile section) from auto-generate mode
  (compilation off by default)
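
The EPContext lookup after the glob fallback was removed might look roughly like this. `find_epcontext` is a hypothetical name and the `.onnx` suffix is an assumption; the `stem_<device>_ctx` then `stem_ctx` order comes from the commit message.

```python
from pathlib import Path


def find_epcontext(directory, stem, device=None):
    """Try <stem>_<device>_ctx first, then <stem>_ctx; never guess beyond that."""
    candidates = []
    if device:
        candidates.append(f"{stem}_{device.lower()}_ctx.onnx")
    candidates.append(f"{stem}_ctx.onnx")
    for name in candidates:
        path = Path(directory) / name
        if path.is_file():
            return path
    # No glob over other device variants: the caller reports
    # "EPContext model not found" instead of loading the wrong blob.
    return None
```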
Comment thread src/winml/modelkit/config/build.py Outdated
Comment thread src/winml/modelkit/models/auto.py Outdated
Resolved conflict in src/winml/modelkit/commands/perf.py; dropped two
leftover resolved_device assignments whose results were never consumed
(ruff RUF059/F821).
Comment thread src/winml/modelkit/session/session.py Outdated
When an explicit EP fails, raise CompilationError immediately so the
caller sees the real error. The previous fallback to CPUExecutionProvider
was too broad — it silently swapped out any EP (qnn, dml, openvino) for
CPU without surfacing the substitution to the user.

Remove TestOpenVINOCpuFallback unit test that was testing the now-deleted
fallback behavior.
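
A small sketch of the fail-fast behavior, assuming a hypothetical `create_session` wrapper and a local `CompilationError` class standing in for the project's exception:

```python
class CompilationError(RuntimeError):
    """Hypothetical stand-in for the project's CompilationError."""


def create_session(factory, ep):
    """Create a session for `ep`; surface failures instead of swapping in CPU."""
    try:
        return factory(ep)
    except Exception as exc:
        # Chain the original error so the caller sees the real cause.
        raise CompilationError(f"session creation failed for EP {ep!r}") from exc
```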
When --device is not passed on the CLI it is None, which bypasses the
function-signature default of "auto" and caused an AttributeError on
device.lower(). Normalize None -> "auto" at the top of resolve_device.
resolve_device expects a non-None string; passing None (the previous
CLI default) caused AttributeError on device.lower(). Fix at the call
site: default --device to "auto" and update the function annotation.

Revert the defensive (device or "auto") workaround added to resolve_device.
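
The call-site fix can be sketched with argparse. The body of `resolve_device` here is a placeholder (the real function presumably probes hardware); the point is only that the CLI default is the string "auto", so `device.lower()` always receives a string.

```python
import argparse


def resolve_device(device: str = "auto") -> str:
    """Expects a non-None string; the "auto" branch is a placeholder."""
    device = device.lower()
    if device == "auto":
        return "cpu"  # placeholder for real hardware detection
    return device


def build_parser():
    parser = argparse.ArgumentParser(prog="winml")
    # default="auto" rather than None, so an omitted flag never
    # bypasses the function's own default.
    parser.add_argument("--device", default="auto")
    return parser
```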
@DingmaomaoBJTU DingmaomaoBJTU merged commit ed7dbfd into main May 14, 2026
9 checks passed
@DingmaomaoBJTU DingmaomaoBJTU deleted the qiowu/perf-improvements branch May 14, 2026 03:38
DingmaomaoBJTU added a commit that referenced this pull request May 19, 2026
…dation (#673)

## Summary

- **`analyze_result.json`**: The full static analysis result is now
written to the build output folder after every analyze pass (each pass
overwrites the previous), so users can inspect node-level compatibility
after a build.
- **`--device npu` fix**: `winml build -m <model> --device npu`
previously always failed with _"quant.task is required"_. This was a
regression introduced in ed7dbfd (#477).

## Details

### analyze_result.json
- `analyze_onnx()` gains an `output_path` parameter; when set, it writes
`AnalysisResult.to_json()` to disk after each call
- `run_optimize_analyze_loop` / `_run_analyze_loop` thread
`analyze_output_path` through to every `analyze_onnx()` call
- `build/hf.py`, `build/onnx.py`, and the CLI's `_run_optimize_stage`
all supply `analyze_result_path`
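
As a rough illustration of the `output_path` behavior, assuming a placeholder result dictionary in place of the real `AnalysisResult`:

```python
import json
from pathlib import Path


def analyze_onnx(model_path, output_path=None):
    """Placeholder analysis; writes the result to disk when output_path is set."""
    # Hypothetical result shape; the real AnalysisResult has its own to_json().
    result = {"model": str(model_path), "unsupported_nodes": []}
    if output_path is not None:
        # Each call overwrites the previous file, matching the per-pass behavior.
        Path(output_path).write_text(json.dumps(result, indent=2), encoding="utf-8")
    return result
```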

### _patch_device regression fix
`_patch_device` was replacing the entire `cfg.quant` object with the
result of `resolve_quant_compile_config()`, which only carries
`weight_type`/`activation_type`. This silently dropped `task` and
`model_name` that `generate_build_config()` had already set, causing
validation to fail for any device that requires quantization (i.e. NPU).

Fix: when an existing quant config is present, only update the precision
fields instead of replacing the whole object.

```python
# Before
cfg.quant = resolved_quant  # drops task, model_name

# After
if resolved_quant is None:
    cfg.quant = None
elif cfg.quant is None:
    cfg.quant = resolved_quant
else:
    cfg.quant.weight_type = resolved_quant.weight_type
    cfg.quant.activation_type = resolved_quant.activation_type
```
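
A self-contained version of the "After" branch, using hypothetical dataclasses in place of the real config types, shows that `task` and `model_name` survive the patch:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuantConfig:
    """Hypothetical stand-in; field names follow the description above."""

    weight_type: str
    activation_type: str
    task: Optional[str] = None
    model_name: Optional[str] = None


@dataclass
class BuildConfig:
    quant: Optional[QuantConfig] = None


def patch_quant(cfg, resolved_quant):
    if resolved_quant is None:
        cfg.quant = None            # CPU/GPU: no quantization
    elif cfg.quant is None:
        cfg.quant = resolved_quant  # nothing to preserve
    else:
        # Update precision only; task and model_name stay intact.
        cfg.quant.weight_type = resolved_quant.weight_type
        cfg.quant.activation_type = resolved_quant.activation_type
```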
DingmaomaoBJTU added a commit that referenced this pull request May 26, 2026
)

## Summary

- ORT's pybind module writes `Init provider bridge failed.` directly to
native stderr (fd 2 / Win32 `STD_ERROR_HANDLE`), bypassing Python's
`logging`/`warnings` systems entirely
- Added `utils/native_stderr.py` — a dedicated module for capturing and
replaying native stderr output from ORT/QNN
- `suppress_ep_registration_stderr()` context manager redirects fd 2 via
pipe, re-emits captured lines through Python logging
- `replay_ort_startup_logs()` public API for deferred replay after
logging is configured
- Fixed **64-bit HANDLE truncation bug**: set proper
`argtypes`/`restype` via `ctypes.wintypes` for `GetStdHandle` and
`SetStdHandle`
- Fixed **dup2 restore ordering**: UCRT's `dup2` for fds 0-2 internally
calls `SetStdHandle`, so `GetStdHandle` must be captured before `dup2`
and restored after
- **Platform-gated**: no-op on non-Windows — Linux/macOS CI gets a plain
`import onnxruntime` with zero fd overhead
- `constants.py` restored to leaf-level constants-only module
- Downgraded HF/optimum/transformers logging noise to appropriate levels
- Added 10 unit tests covering fd capture, ANSI strip, Win32 HANDLE
restore, and replay API

Reference: #477

---------

Co-authored-by: vortex-captain <75063846+vortex-captain@users.noreply.github.com>
Co-authored-by: Yi Ren <reny@microsoft.com>
Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Yue Sun <yuesu@microsoft.com>

3 participants