fix(build): optional config, autoconf status display, EP compatibility fixes, and perf improvements by DingmaomaoBJTU · Pull Request #477 · microsoft/winml-cli

DingmaomaoBJTU · 2026-05-08T13:33:15Z

Summary

`winml build` UX improvements

- -c/--config is now optional: omit it to auto-generate the build config from -m <model> directly (one-step workflow; -m becomes required when -c is omitted)
- WinMLBuildConfig gains a top-level auto field (default True); the build pipeline sets it to False after the optimize + autoconf loop and saves it into winml_build_config.json as part of the build output. This marks the config as "already resolved" so CI/CD pipelines or subsequent winml build -c runs can reuse it without re-triggering the analyzer
- Setup header now shows (autoconf on) / (autoconf off) next to the config name; summary section prints the saved config path after the final artifact
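
The lifecycle of the auto flag can be sketched roughly as follows. This is a minimal illustration, not the project's code: `BuildConfig`, `run_analyzer`, `build`, `load`, and `ANALYZER_RUNS` are hypothetical stand-ins. Only the `auto` field name, its default of True, and the winml_build_config.json filename come from the description above.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

ANALYZER_RUNS = []  # records which models were analyzed (for illustration only)


@dataclass
class BuildConfig:
    """Hypothetical stand-in for WinMLBuildConfig; only `auto` is from the PR."""

    model: str
    auto: bool = True  # True: the analyzer may still adjust this config


def run_analyzer(cfg: BuildConfig) -> None:
    """Placeholder for the optimize + autoconf loop (assumed behavior)."""
    ANALYZER_RUNS.append(cfg.model)


def build(cfg: BuildConfig, out_dir: Path) -> Path:
    """Run the analyzer only for unresolved configs, then save the config."""
    if cfg.auto:
        run_analyzer(cfg)
        cfg.auto = False  # mark the config as already resolved
    saved = out_dir / "winml_build_config.json"
    saved.write_text(json.dumps(asdict(cfg), indent=2))
    return saved


def load(path: Path) -> BuildConfig:
    """Reload a saved config, e.g. for a later `winml build -c` style run."""
    return BuildConfig(**json.loads(path.read_text()))
```

Under these assumptions, a second `build(load(saved), out_dir)` call skips `run_analyzer` because the reloaded config carries `auto=False`.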

Performance

Skip PyTorch weight loading in from_pretrained() when the ONNX artifact is already cached — use AutoConfig instead (~1 s vs ~60 s on a warm cache)
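
A rough sketch of the cache-hit shortcut, under stated assumptions: `resolve_model_type` and its two callables are hypothetical, and the real code paths are not shown in this PR page. In practice the cheap callable would wrap something like `transformers.AutoConfig.from_pretrained(model_id).model_type`, while the expensive one would load the full PyTorch model.

```python
from pathlib import Path
from typing import Callable


def resolve_model_type(
    onnx_path: Path,
    load_config: Callable[[], str],
    load_full_model: Callable[[], str],
) -> str:
    """Return the model type, avoiding the weight load when the ONNX file is cached.

    Both callables are hypothetical hooks that return a model_type string.
    """
    if onnx_path.exists():
        # Cache hit: a config-only fetch is enough to resolve the model type.
        return load_config()
    # Cache miss: the full model is needed anyway to export the ONNX artifact.
    return load_full_model()
```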

EP compatibility workaround

Add _suppress_ep_registration_stderr() to silence the WinApp SDK 2.0 / WinML 1.8 API version mismatch warning printed to native stderr on every run:
```
The requested API version [24] is not available, only API versions [1, 23] are supported.
Current ORT Version is: 1.23.5
```
Root cause: SDK 2.0 EP DLLs target ORT API v24; installed WinML runtime is still v1.8 (ORT 1.23.5, max API v23). No functional impact — ORT falls back cleanly. Workaround uses SetStdHandle (Win32) + os.dup2 (CRT) because the DLL writes via GetStdHandle(STD_ERROR_HANDLE), bypassing Python's stderr. Remove once WinML runtime upgrades to 2.0.
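
To illustrate why the redirect has to happen below Python's `sys.stderr`, here is a minimal sketch of the CRT-level half of the technique (`os.dup2` on file descriptor 2). The name `suppress_native_stderr` is hypothetical and this is not the PR's `_suppress_ep_registration_stderr()`. The Win32 `SetStdHandle` half is deliberately omitted, because its handle save/restore ordering is Windows-specific and is not reproduced here.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_native_stderr():
    """Temporarily point file descriptor 2 at the null device.

    Replacing sys.stderr would not help: native code writes to fd 2
    directly, so the descriptor itself has to be redirected.
    """
    sys.stderr.flush()  # do not lose Python-level output buffered so far
    saved_fd = os.dup(2)  # keep a duplicate of the real stderr
    null_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(null_fd, 2)  # fd 2 now discards everything written to it
        yield
    finally:
        os.dup2(saved_fd, 2)  # restore the original stderr
        os.close(null_fd)
        os.close(saved_fd)
```

Anything written to fd 2 inside the `with suppress_native_stderr():` block is discarded, and the original descriptor is restored even if the body raises.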

Bug fixes

`winml perf --ep openvino --device cpu` not working

winml perf -m <hf-model> --ep openvino --device cpu was broken. Root cause: when loading a HF model ID, perf explicitly triggered EPContext compilation via generate_hf_build_config(), which called WinMLCompileConfig.for_provider(compile_provider) without forwarding device. As a result, for_openvino() set no device_type in provider_options, so OpenVINO compiled a GPU/NPU blob instead of a CPU-specific one. When WinMLSession later tried to load that blob on CPU, it raised INVALID_GRAPH.

Two fixes:

- for_provider() / for_openvino() / for_qnn() / for_vitisai() now all accept and forward device → provider_options["device_type"], so the compiled EPContext is device-specific and cache keys don't collide across devices
- winml perf now defaults to --no-compile (skip EPContext compilation during benchmarking); use --compile to opt in. Perf is for benchmarking existing artifacts, not building new ones; compilation belongs in winml build
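
The device-forwarding fix can be pictured with a small sketch. `CompileConfig` and `cache_key` are hypothetical stand-ins for WinMLCompileConfig and whatever key the real cache uses. Only the `for_provider` name and the `provider_options["device_type"]` key come from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CompileConfig:
    """Hypothetical stand-in for WinMLCompileConfig."""

    provider: str
    provider_options: dict = field(default_factory=dict)

    @classmethod
    def for_provider(cls, provider: str, device: Optional[str] = None) -> "CompileConfig":
        options = {}
        if device is not None:
            # Forward the device so the compiled blob targets it explicitly.
            options["device_type"] = device
        return cls(provider=provider, provider_options=options)

    def cache_key(self) -> str:
        """Illustrative cache key: provider plus its sorted options."""
        opts = ",".join(f"{k}={v}" for k, v in sorted(self.provider_options.items()))
        return f"{self.provider}[{opts}]"
```

Without the forwarded device, a CPU request and an NPU request would produce the same options and therefore the same key, which is the collision the fix avoids.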

- Suppress native stderr during EP DLL registration so OpenVINO's API version mismatch warning no longer surfaces to users
- Skip loading PyTorch model weights in from_pretrained() when the ONNX artifact is already cached; use AutoConfig instead (~1s vs ~60s)
- Fix CPU device incorrectly setting compile_provider, which caused unnecessary EPContext compilation on CPU

…mand
- WinMLBuildConfig gains a top-level `auto` field (default True); set to False after the optimize+autoconf loop so the saved winml_build_config.json can be reused without re-running the analyzer
- build command: -c/--config is now optional; when omitted, config is auto-generated via generate_build_config() and -m becomes required
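
The "-m becomes required when -c is omitted" rule could be enforced along these lines. This is only a sketch using `argparse` for illustration: the project's actual CLI framework is not shown here, and `parse_build_args` and the long option name `--model` are assumptions.

```python
import argparse


def parse_build_args(argv):
    """Parse build arguments; require -m only when no config file is given."""
    parser = argparse.ArgumentParser(prog="winml build")
    parser.add_argument("-c", "--config", default=None)
    # The long form --model is an assumption; the PR only shows -m.
    parser.add_argument("-m", "--model", default=None)
    args = parser.parse_args(argv)
    if args.config is None and args.model is None:
        # parser.error prints usage and raises SystemExit(2).
        parser.error("-m is required when -c/--config is omitted")
    return args
```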

- Setup header now shows (autoconf on/off) next to the config name
- Summary section prints the saved winml_build_config.json path after the final artifact

… for explicit EP on CPU
- WinMLCompileConfig.for_qnn() and for_vitisai() now accept a device parameter and write it into provider_options["device_type"], matching the existing for_openvino() pattern. for_provider() passes device via lambda for qnn, vitisai, and openvino.
- precision.resolve_precision(): drop the special-case CPU branch that discarded an explicit --ep. Replace with a single rule: ep wins when set, otherwise fall back to _DEVICE_TO_PROVIDER[resolved_device] (which already maps cpu -> None).
- session.compile(): revert unnecessary EP-fallback retry block; failed session creation now raises CompilationError directly.

- Add _suppress_ep_registration_stderr() workaround to silence the WinApp SDK 2.0 / WinML 1.8 API version mismatch printed to native stderr during EP DLL registration (see comment for full context)
- Revert _suppress_native_output to stdout-only (main branch behavior)
- Fix E501 line-length violations in configs.py and console.py
- Fix test to use _suppress_ep_registration_stderr instead of the removed suppress_stderr param; fix F401/N806/TRY002 lint errors

Port winml build UX improvements from #477:
- -c/--config is now optional; omit it to auto-generate config from -m (-m becomes required when -c is omitted)
- WinMLBuildConfig gains auto: bool = True; pipeline sets it to False after the optimize+autoconf loop and saves it into winml_build_config.json so subsequent runs skip the analyzer on reuse
- Setup header shows (autoconf on)/(autoconf off) next to config name
- Summary section prints the saved config path after the final artifact

- Patch compile config with CLI --device after config load (build.py)
- Fix WinMLSession receiving EP name instead of device string (compile.py)
- Use device_type from provider_options for EPContext filename search
- Add glob fallback for EPContext discovery across device variants

Compile is now disabled by default in the build command globally. The perf command no longer needs its own flag.

Remove the LoadPhase from WinMLAutoModel.from_pretrained(). The cache check and heavy pytorch model load now happen inside build_hf_model() which already handles both cases. Replace with a lightweight AutoConfig fetch for model_type resolution only.

- config/build.py: remove dead else-branch (policy.device is always concrete since resolve_device never returns "auto"); make the CPU/GPU -> quant=None guarantee unconditional
- commands/build.py: pass device to generate_build_config so the precision policy is applied at generation time; use resolve_quant_compile_config in _patch_device to clear quant for CPU/GPU without hardcoding device names

… explicit EP
- Comment out _suppress_ep_registration_stderr to make EP registration output visible for debugging
- CPU shortcut in _build_session_options now skips when an explicit EP is set, allowing e.g. OpenVINO to bind correctly on CPU

- Move the no_compile=True default inside the auto-generate branch so config-file builds inherit compile settings as before
- Export resolve_quant_compile_config from config package __init__.py so _patch_device can import it without reaching into internal modules

…n-time WinML init

conftest.pytest_collection_modifyitems skips WinML EP discovery when no non-e2e items carry @pytest.mark.ep. The new TestOpenVINODeviceRouting class was only marked with @pytest.mark.ep("openvino"), causing registry.register_to_ort() + ort.get_ep_devices() to be called during collection on the CI runner (which lacks matching hardware), hanging for the full 30-minute job timeout. Adding @pytest.mark.e2e excludes the class from the EP-discovery guard and from the '-m "not e2e"' CI filter, so the commands job completes without hanging.

…CPU EP fallback

Four test failures fixed:
1. _suppress_native_output: add suppress_stderr=False param so callers can also redirect stderr to the same log/devnull destination as stdout.
2. _suppress_ep_registration_stderr: capture old_w32 (Win32 STD_ERROR_HANDLE) BEFORE os.dup2(null_fd, 2). os.dup2 on Windows calls SetStdHandle internally, so reading GetStdHandle after the redirect returned the devnull handle instead of the original, making the restore a no-op and leaving the handle pointing at a closed fd.
3. compile(): when an explicit EP fails on device=cpu (e.g. OpenVINO-CPU INVALID_GRAPH), retry with a fresh PREFER_CPU SessionOptions so CPUExecutionProvider handles the model as a transparent fallback.
4. precision.py: ep="cpu" must produce compile_provider=None (CPU never needs EPContext compilation). The previous ep if ep expression forwarded "cpu" as-is; fix to ep if (ep and ep != "cpu").
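
Item 4 can be read together with the earlier "ep wins when set" rule as a small decision function. The sketch below is an interpretation, not the project's precision.py: `resolve_compile_provider` is a hypothetical name, the fallback to the device table is assumed from the other commit message, and every table entry except `cpu -> None` is a placeholder.

```python
from typing import Optional

# Only the cpu -> None entry is stated in the commit messages;
# the gpu and npu entries are placeholder assumptions.
_DEVICE_TO_PROVIDER = {"cpu": None, "gpu": "dml", "npu": "qnn"}


def resolve_compile_provider(ep: Optional[str], resolved_device: str) -> Optional[str]:
    """Pick the provider to compile for, or None when no compilation is needed."""
    if ep and ep != "cpu":
        # An explicit non-CPU EP wins, even when the device is cpu.
        return ep
    # No usable explicit EP: fall back to the device's default provider.
    return _DEVICE_TO_PROVIDER[resolved_device]
```

With this rule, `--ep openvino --device cpu` keeps openvino as the compile provider, while `--ep cpu` never triggers EPContext compilation.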

- session.py: re-enable _suppress_ep_registration_stderr() call in _init_winml_eps_once; the function was wired correctly but the call was accidentally commented out, leaving the workaround as dead code
- compiler/configs.py: drop quantize param from for_qnn/for_openvino/for_vitisai (introduced and immediately deprecated in the same commit with no callers); remove the DeprecationWarning shim and unused `import warnings`
- compiler/stages/compile.py: remove glob fallback sorted(...)[0] that silently picked the alphabetically-first EPContext when stem_<device>_ctx and stem_ctx both missed; let the existing "EPContext model not found" warning fire instead so the failure is explicit
- build/common.py + commands/build.py: move config.auto = False from run_optimize_analyze_loop (unexpected mutation) to the call site in build.py right after the loop returns
- commands/build.py: clarify --no-compile/--compile help text to distinguish config-file mode (inherits compile section) from auto-generate mode (compilation off by default)
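
The EPContext lookup without the glob fallback might look roughly like this. It is a sketch under assumptions: `find_epcontext` is a hypothetical name and the `.onnx` suffix is a guess. Only the `stem_<device>_ctx` / `stem_ctx` naming order and the "EPContext model not found" warning text come from the commit message.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def find_epcontext(out_dir: Path, stem: str, device: Optional[str]) -> Optional[Path]:
    """Look for a device-specific EPContext file first, then the generic one."""
    candidates = []
    if device:
        candidates.append(out_dir / f"{stem}_{device}_ctx.onnx")  # suffix assumed
    candidates.append(out_dir / f"{stem}_ctx.onnx")
    for path in candidates:
        if path.is_file():
            return path
    # No glob fallback: report the miss instead of guessing a file.
    logger.warning(
        "EPContext model not found (tried: %s)",
        ", ".join(p.name for p in candidates),
    )
    return None
```

Returning None on a miss keeps the failure visible, whereas picking the alphabetically first match could silently load a blob compiled for another device.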

Resolved conflict in src/winml/modelkit/commands/perf.py; dropped two leftover resolved_device assignments whose results were never consumed (ruff RUF059/F821).

When an explicit EP fails, raise CompilationError immediately so the caller sees the real error. The previous fallback to CPUExecutionProvider was too broad — it silently swapped out any EP (qnn, dml, openvino) for CPU without surfacing the substitution to the user. Remove TestOpenVINOCpuFallback unit test that was testing the now-deleted fallback behavior.

When --device is not passed on the CLI it is None, which bypasses the function-signature default of "auto" and caused an AttributeError on device.lower(). Normalize None -> "auto" at the top of resolve_device.

resolve_device expects a non-None string; passing None (the previous CLI default) caused AttributeError on device.lower(). Fix at the call site: default --device to "auto" and update the function annotation. Revert the defensive (device or "auto") workaround added to resolve_device.
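
The call-site fix can be illustrated with a short sketch. `argparse` is used only for illustration since the CLI framework is not shown here, `parse_device` is a hypothetical helper, and the body of `resolve_device` is a placeholder that maps "auto" to "cpu" instead of probing hardware.

```python
import argparse


def resolve_device(device: str) -> str:
    """Placeholder resolver: expects a string and never returns "auto"."""
    device = device.lower()  # raises AttributeError if device is None
    # Assumption: real code would detect hardware; "cpu" stands in for that.
    return "cpu" if device == "auto" else device


def parse_device(argv) -> str:
    """Give --device a string default so resolve_device never sees None."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--device", default="auto")
    return parser.parse_args(argv).device
```

Defaulting at the parser keeps `resolve_device` strict about its input type, so no defensive `device or "auto"` is needed inside it.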

…dation (#673)

## Summary

- **`analyze_result.json`**: The full static analysis result is now written to the build output folder after every analyze pass (each pass overwrites the previous), so users can inspect node-level compatibility after a build.
- **`--device npu` fix**: `winml build -m <model> --device npu` previously always failed with _"quant.task is required"_. This was a regression introduced in ed7dbfd (#477).

## Details

### analyze_result.json

- `analyze_onnx()` gains an `output_path` parameter; when set, it writes `AnalysisResult.to_json()` to disk after each call
- `run_optimize_analyze_loop` / `_run_analyze_loop` thread `analyze_output_path` through to every `analyze_onnx()` call
- `build/hf.py`, `build/onnx.py`, and the CLI's `_run_optimize_stage` all supply `analyze_result_path`

### _patch_device regression fix

`_patch_device` was replacing the entire `cfg.quant` object with the result of `resolve_quant_compile_config()`, which only carries `weight_type`/`activation_type`. This silently dropped `task` and `model_name` that `generate_build_config()` had already set, causing validation to fail for any device that requires quantization (i.e. NPU).

Fix: when an existing quant config is present, only update the precision fields instead of replacing the whole object.

```python
# Before
cfg.quant = resolved_quant  # drops task, model_name

# After
if resolved_quant is None:
    cfg.quant = None
elif cfg.quant is None:
    cfg.quant = resolved_quant
else:
    cfg.quant.weight_type = resolved_quant.weight_type
    cfg.quant.activation_type = resolved_quant.activation_type
```

)

## Summary

- ORT's pybind module writes `Init provider bridge failed.` directly to native stderr (fd 2 / Win32 `STD_ERROR_HANDLE`), bypassing Python's `logging`/`warnings` systems entirely
- Added `utils/native_stderr.py`, a dedicated module for capturing and replaying native stderr output from ORT/QNN
- `suppress_ep_registration_stderr()` context manager redirects fd 2 via pipe, re-emits captured lines through Python logging
- `replay_ort_startup_logs()` public API for deferred replay after logging is configured
- Fixed **64-bit HANDLE truncation bug**: set proper `argtypes`/`restype` via `ctypes.wintypes` for `GetStdHandle` and `SetStdHandle`
- Fixed **dup2 restore ordering**: UCRT's `dup2` for fds 0-2 internally calls `SetStdHandle`, so `GetStdHandle` must be captured before `dup2` and restored after
- **Platform-gated**: no-op on non-Windows; Linux/macOS CI gets a plain `import onnxruntime` with zero fd overhead
- `constants.py` restored to leaf-level constants-only module
- Downgraded HF/optimum/transformers logging noise to appropriate levels
- Added 10 unit tests covering fd capture, ANSI strip, Win32 HANDLE restore, and replay API

Reference: #477

---------

Co-authored-by: vortex-captain <75063846+vortex-captain@users.noreply.github.com>
Co-authored-by: Yi Ren <reny@microsoft.com>
Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Yue Sun <yuesu@microsoft.com>
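
A capture-and-replay flow of this kind could be sketched as below. It is not the `utils/native_stderr.py` module: `capture_native_stderr`, `replay`, and the logger name are hypothetical, and the sketch swaps the pipe for a temporary file so that a large burst of output cannot fill a pipe buffer and block. The Win32 handle save/restore is also left out.

```python
import contextlib
import logging
import os
import re
import sys
import tempfile

_ANSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")  # matches ANSI color/style codes
logger = logging.getLogger("native_stderr_demo")  # hypothetical logger name


@contextlib.contextmanager
def capture_native_stderr(captured: list):
    """Redirect fd 2 into a temp file, then append the cleaned lines to `captured`."""
    sys.stderr.flush()
    saved_fd = os.dup(2)
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        try:
            os.dup2(tmp.fileno(), 2)  # native writes now land in the temp file
            yield
        finally:
            os.dup2(saved_fd, 2)  # restore the real stderr first
            os.close(saved_fd)
            tmp.seek(0)
            text = tmp.read().decode("utf-8", errors="replace")
            for raw in text.splitlines():
                line = _ANSI.sub("", raw).strip()
                if line:
                    captured.append(line)


def replay(captured: list) -> None:
    """Re-emit captured lines through logging once it has been configured."""
    for line in captured:
        logger.warning("native stderr: %s", line)
    captured.clear()
```

Deferring `replay` until logging is configured lets the captured lines follow the user's chosen log level and format instead of appearing as raw console noise.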

DingmaomaoBJTU requested a review from a team as a code owner May 8, 2026 13:33

DingmaomaoBJTU added 4 commits May 9, 2026 12:17

feat: show autoconf status and build config path in build output

ef05944

- Setup header now shows (autoconf on/off) next to the config name
- Summary section prints the saved winml_build_config.json path after the final artifact

DingmaomaoBJTU changed the title ~~perf: suppress EP warning, skip model load on cache hit, fix CPU compile~~ feat(build): optional config, autoconf status display, EP compatibility fixes, and perf improvements May 9, 2026

DingmaomaoBJTU requested a review from tezheng May 9, 2026 09:43

DingmaomaoBJTU mentioned this pull request May 9, 2026

bug: winml perf --ep openvino --device cpu fails with INVALID_GRAPH (OV routes to NPU despite cpu flag) #569

Closed

DingmaomaoBJTU changed the title ~~feat(build): optional config, autoconf status display, EP compatibility fixes, and perf improvements~~ fix(build): optional config, autoconf status display, EP compatibility fixes, and perf improvements May 9, 2026

DingmaomaoBJTU mentioned this pull request May 12, 2026

feat(build): optional -c/--config, auto field, autoconf UX (#477) #603

Closed

merge

99478d4

DingmaomaoBJTU added 7 commits May 13, 2026 20:33

refactor(perf): remove --no-compile/--compile option

61ff350

Compile is now disabled by default in the build command globally. The perf command no longer needs its own flag.

timenick reviewed May 14, 2026

DingmaomaoBJTU and others added 2 commits May 14, 2026 09:19

Merge branch 'main' into qiowu/perf-improvements

db30727

xieofxie reviewed May 14, 2026

Comment thread src/winml/modelkit/commands/build.py Outdated

xieofxie reviewed May 14, 2026

Comment thread src/winml/modelkit/build/common.py Outdated

xieofxie reviewed May 14, 2026

Comment thread src/winml/modelkit/config/build.py Outdated

xieofxie reviewed May 14, 2026

Comment thread src/winml/modelkit/models/auto.py Outdated

DingmaomaoBJTU added 2 commits May 14, 2026 10:57

Merge branch 'main' into qiowu/perf-improvements

11ccc69

Resolved conflict in src/winml/modelkit/commands/perf.py; dropped two leftover resolved_device assignments whose results were never consumed (ruff RUF059/F821).

fix(auto): restore ep=resolved_ep for inference wrapper construction

fe0bbe7

timenick reviewed May 14, 2026

Comment thread src/winml/modelkit/session/session.py Outdated

DingmaomaoBJTU added 3 commits May 14, 2026 11:21

fix(sysinfo): treat None device as auto in resolve_device

118e560

When --device is not passed on the CLI it is None, which bypasses the function-signature default of "auto" and caused an AttributeError on device.lower(). Normalize None -> "auto" at the top of resolve_device.

xieofxie approved these changes May 14, 2026

DingmaomaoBJTU merged commit ed7dbfd into main May 14, 2026
9 checks passed

DingmaomaoBJTU deleted the qiowu/perf-improvements branch May 14, 2026 03:38

chinazhangchao linked an issue May 14, 2026 that may be closed by this pull request

[winml config] [P0] --precision int8 silently produces uint8/uint8 #525

Closed

chinazhangchao removed a link to an issue May 14, 2026

[winml config] [P0] --precision int8 silently produces uint8/uint8 #525

Closed

DingmaomaoBJTU mentioned this pull request May 22, 2026

fix: suppress ORT native stderr, fix HANDLE bug, clean up warnings #709

Merged

DingmaomaoBJTU merged 21 commits into main from qiowu/perf-improvements

3 participants
