fix(build): optional config, autoconf status display, EP compatibility fixes, and perf improvements #477
Merged
- Suppress native stderr during EP DLL registration so OpenVINO's API version mismatch warning no longer surfaces to users
- Skip loading PyTorch model weights in from_pretrained() when the ONNX artifact is already cached; use AutoConfig instead (~1s vs ~60s)
- Fix CPU device incorrectly setting compile_provider, which caused unnecessary EPContext compilation on CPU
…mand
- WinMLBuildConfig gains a top-level `auto` field (default True); set to False after the optimize+autoconf loop so the saved winml_build_config.json can be reused without re-running the analyzer
- build command: -c/--config is now optional; when omitted, config is auto-generated via generate_build_config() and -m becomes required
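The optional `-c` rule from this commit can be illustrated with a small argparse sketch. It is not the project's parser: `parse_build_args` and the long option `--model` are hypothetical names, and only `-c/--config` and `-m` come from the commit message.

```python
import argparse
from typing import List


def parse_build_args(argv: List[str]) -> argparse.Namespace:
    """Hypothetical parser: -c is optional, -m is required only when -c is omitted."""
    parser = argparse.ArgumentParser(prog="winml build")
    parser.add_argument("-c", "--config", default=None, help="path to a build config")
    # "--model" is an assumed long name; the source only shows "-m".
    parser.add_argument("-m", "--model", default=None, help="model to build")
    args = parser.parse_args(argv)
    if args.config is None and args.model is None:
        # parser.error() prints a usage message and raises SystemExit(2).
        parser.error("-m is required when -c/--config is omitted")
    return args
```

With this shape, `parse_build_args(["-m", "some/model"])` takes the auto-generate path, while an empty argument list is rejected before any build work starts.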
- Setup header now shows (autoconf on/off) next to the config name
- Summary section prints the saved winml_build_config.json path after the final artifact
… for explicit EP on CPU
- WinMLCompileConfig.for_qnn() and for_vitisai() now accept a device parameter and write it into provider_options["device_type"], matching the existing for_openvino() pattern. for_provider() passes device via lambda for qnn, vitisai, and openvino.
- precision.resolve_precision(): drop the special-case CPU branch that discarded an explicit --ep. Replace with a single rule: ep wins when set, otherwise fall back to _DEVICE_TO_PROVIDER[resolved_device] (which already maps cpu -> None).
- session.compile(): revert unnecessary EP-fallback retry block; failed session creation now raises CompilationError directly.
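The single rule described for resolve_precision() can be sketched as below. This is an illustration under assumptions: `pick_compile_provider` is a hypothetical helper, and the table holds only the `cpu -> None` entry stated in the commit message plus one made-up row so the fallback path returns something. A later commit in this conversation additionally treats ep="cpu" as "no compile provider", which this sketch does not cover.

```python
from typing import Dict, Optional

# Placeholder table. Only "cpu" -> None is stated in the commit message;
# the "npu" row is an invented example value, not the project's mapping.
_DEVICE_TO_PROVIDER: Dict[str, Optional[str]] = {
    "cpu": None,
    "npu": "example_npu_ep",
}


def pick_compile_provider(ep: Optional[str], resolved_device: str) -> Optional[str]:
    """An explicit EP wins; otherwise use the default provider for the device."""
    return ep if ep else _DEVICE_TO_PROVIDER[resolved_device]
```

The point of the rule is that an explicit `--ep` is never discarded just because the device is CPU, while the no-EP case on CPU still resolves to no compile provider.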
- Add _suppress_ep_registration_stderr() workaround to silence the WinApp SDK 2.0 / WinML 1.8 API version mismatch printed to native stderr during EP DLL registration (see comment for full context)
- Revert _suppress_native_output to stdout-only (main branch behavior)
- Fix E501 line-length violations in configs.py and console.py
- Fix test to use _suppress_ep_registration_stderr instead of the removed suppress_stderr param; fix F401/N806/TRY002 lint errors
DingmaomaoBJTU added a commit that referenced this pull request May 13, 2026
Port winml build UX improvements from #477:
- -c/--config is now optional; omit it to auto-generate config from -m (-m becomes required when -c is omitted)
- WinMLBuildConfig gains auto: bool = True; pipeline sets it to False after the optimize+autoconf loop and saves it into winml_build_config.json so subsequent runs skip the analyzer on reuse
- Setup header shows (autoconf on)/(autoconf off) next to config name
- Summary section prints the saved config path after the final artifact
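The life cycle of the `auto` flag can be sketched with a small dataclass. This is a simplified stand-in, not the real WinMLBuildConfig: `BuildConfigSketch`, `run_build`, and `load_config` are hypothetical names, and the optimize + autoconf loop is reduced to a comment.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class BuildConfigSketch:
    model: str
    auto: bool = True  # True: the analyzer/autoconf loop has not resolved this config yet


def run_build(cfg: BuildConfigSketch, out_dir: str) -> str:
    """Run the (elided) autoconf loop if needed, then save the resolved config."""
    if cfg.auto:
        # ... the optimize + autoconf loop would run here ...
        cfg.auto = False  # mark the config as already resolved
    path = os.path.join(out_dir, "winml_build_config.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(asdict(cfg), fh, indent=2)
    return path


def load_config(path: str) -> BuildConfigSketch:
    with open(path, encoding="utf-8") as fh:
        return BuildConfigSketch(**json.load(fh))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        saved = run_build(BuildConfigSketch(model="some/model"), out_dir)
        print(load_config(saved).auto)  # the reloaded config skips the analyzer
```

Because the flag is persisted as False, a second run that loads the saved file goes straight past the `if cfg.auto` branch.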
- Patch compile config with CLI --device after config load (build.py)
- Fix WinMLSession receiving EP name instead of device string (compile.py)
- Use device_type from provider_options for EPContext filename search
- Add glob fallback for EPContext discovery across device variants
Compile is now disabled by default in the build command globally. The perf command no longer needs its own flag.
Remove the LoadPhase from WinMLAutoModel.from_pretrained(). The cache check and heavy PyTorch model load now happen inside build_hf_model(), which already handles both cases. Replace it with a lightweight AutoConfig fetch for model_type resolution only.
- config/build.py: remove dead else-branch (policy.device is always concrete since resolve_device never returns "auto"); make the CPU/GPU -> quant=None guarantee unconditional
- commands/build.py: pass device to generate_build_config so the precision policy is applied at generation time; use resolve_quant_compile_config in _patch_device to clear quant for CPU/GPU without hardcoding device names
… explicit EP
- Comment out _suppress_ep_registration_stderr to make EP registration output visible for debugging
- CPU shortcut in _build_session_options now skips when an explicit EP is set, allowing e.g. OpenVINO to bind correctly on CPU
- Move no_compile=True default inside the auto-generate branch so config-file builds inherit compile settings as before
- Export resolve_quant_compile_config from config package __init__.py so _patch_device can import it without reaching into internal modules
…n-time WinML init
conftest.pytest_collection_modifyitems skips WinML EP discovery when no non-e2e
items carry @pytest.mark.ep. The new TestOpenVINODeviceRouting class was only
marked with @pytest.mark.ep("openvino"), causing registry.register_to_ort() +
ort.get_ep_devices() to be called during collection on the CI runner (which lacks
matching hardware), hanging for the full 30-minute job timeout.
Adding @pytest.mark.e2e excludes the class from the EP-discovery guard and from
the '-m "not e2e"' CI filter, so the commands job completes without hanging.
timenick reviewed May 14, 2026
…CPU EP fallback

Four test failures fixed:
1. _suppress_native_output: add suppress_stderr=False param so callers can also redirect stderr to the same log/devnull destination as stdout.
2. _suppress_ep_registration_stderr: capture old_w32 (Win32 STD_ERROR_HANDLE) BEFORE os.dup2(null_fd, 2). os.dup2 on Windows calls SetStdHandle internally, so reading GetStdHandle after the redirect returned the devnull handle instead of the original, making the restore a no-op and leaving the handle pointing at a closed fd.
3. compile(): when an explicit EP fails on device=cpu (e.g. OpenVINO-CPU INVALID_GRAPH), retry with a fresh PREFER_CPU SessionOptions so CPUExecutionProvider handles the model as a transparent fallback.
4. precision.py: ep="cpu" must produce compile_provider=None (CPU never needs EPContext compilation). The previous `ep if ep` expression forwarded "cpu" as-is; fix to `ep if (ep and ep != "cpu")`.
xieofxie reviewed May 14, 2026
xieofxie reviewed May 14, 2026
- session.py: re-enable _suppress_ep_registration_stderr() call in _init_winml_eps_once; the function was wired correctly but the call was accidentally commented out, leaving the workaround as dead code
- compiler/configs.py: drop quantize param from for_qnn/for_openvino/for_vitisai, which was introduced and immediately deprecated in the same commit with no callers; remove the DeprecationWarning shim and unused `import warnings`
- compiler/stages/compile.py: remove glob fallback sorted(...)[0] that silently picked the alphabetically-first EPContext when stem_<device>_ctx and stem_ctx both missed; let the existing "EPContext model not found" warning fire instead so the failure is explicit
- build/common.py + commands/build.py: move config.auto = False from run_optimize_analyze_loop (unexpected mutation) to the call site in build.py right after the loop returns
- commands/build.py: clarify --no-compile/--compile help text to distinguish config-file mode (inherits compile section) from auto-generate mode (compilation off by default)
xieofxie reviewed May 14, 2026
xieofxie reviewed May 14, 2026
Resolved conflict in src/winml/modelkit/commands/perf.py; dropped two leftover resolved_device assignments whose results were never consumed (ruff RUF059/F821).
timenick reviewed May 14, 2026
When an explicit EP fails, raise CompilationError immediately so the caller sees the real error. The previous fallback to CPUExecutionProvider was too broad — it silently swapped out any EP (qnn, dml, openvino) for CPU without surfacing the substitution to the user. Remove TestOpenVINOCpuFallback unit test that was testing the now-deleted fallback behavior.
When --device is not passed on the CLI it is None, which bypasses the function-signature default of "auto" and caused an AttributeError on device.lower(). Normalize None -> "auto" at the top of resolve_device.
resolve_device expects a non-None string; passing None (the previous CLI default) caused AttributeError on device.lower(). Fix at the call site: default --device to "auto" and update the function annotation. Revert the defensive (device or "auto") workaround added to resolve_device.
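The call-site fix can be sketched as follows. This is a minimal illustration, not the project's code: `make_parser` is a hypothetical helper, and the body of `resolve_device` is a placeholder because the real detection logic is not shown in this conversation.

```python
import argparse


def resolve_device(device: str) -> str:
    """Expects a non-None string, as the commit message states."""
    device = device.lower()  # raises AttributeError if a caller passes None
    if device == "auto":
        # Placeholder result: the real hardware detection is not shown in the source.
        return "cpu"
    return device


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="winml")
    # Defaulting to "auto" at the call site means resolve_device never receives None.
    parser.add_argument("--device", default="auto")
    return parser
```

Keeping the default in the parser leaves `resolve_device` with a strict `str` contract instead of a defensive `device or "auto"` inside the function.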
xieofxie approved these changes May 14, 2026
This was referenced May 15, 2026
Closed
DingmaomaoBJTU added a commit that referenced this pull request May 19, 2026
…dation (#673)

## Summary

- **`analyze_result.json`**: The full static analysis result is now written to the build output folder after every analyze pass (each pass overwrites the previous), so users can inspect node-level compatibility after a build.
- **`--device npu` fix**: `winml build -m <model> --device npu` previously always failed with _"quant.task is required"_. This was a regression introduced in ed7dbfd (#477).

## Details

### analyze_result.json

- `analyze_onnx()` gains an `output_path` parameter; when set, it writes `AnalysisResult.to_json()` to disk after each call
- `run_optimize_analyze_loop` / `_run_analyze_loop` thread `analyze_output_path` through to every `analyze_onnx()` call
- `build/hf.py`, `build/onnx.py`, and the CLI's `_run_optimize_stage` all supply `analyze_result_path`

### _patch_device regression fix

`_patch_device` was replacing the entire `cfg.quant` object with the result of `resolve_quant_compile_config()`, which only carries `weight_type`/`activation_type`. This silently dropped `task` and `model_name` that `generate_build_config()` had already set, causing validation to fail for any device that requires quantization (i.e. NPU).

Fix: when an existing quant config is present, only update the precision fields instead of replacing the whole object.

```python
# Before
cfg.quant = resolved_quant  # drops task, model_name

# After
if resolved_quant is None:
    cfg.quant = None
elif cfg.quant is None:
    cfg.quant = resolved_quant
else:
    cfg.quant.weight_type = resolved_quant.weight_type
    cfg.quant.activation_type = resolved_quant.activation_type
```
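To see why the field-wise update matters, the fix can be exercised on stand-in objects. This is a sketch under assumptions: `QuantSketch`, `BuildCfgSketch`, and `patch_quant` are hypothetical stand-ins for the project's classes, and the field values are invented for the demo.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuantSketch:
    weight_type: str
    activation_type: str
    task: Optional[str] = None        # set earlier by config generation
    model_name: Optional[str] = None  # set earlier by config generation


@dataclass
class BuildCfgSketch:
    quant: Optional[QuantSketch] = None


def patch_quant(cfg: BuildCfgSketch, resolved_quant: Optional[QuantSketch]) -> None:
    """Mirror of the 'After' logic: update precision fields, keep everything else."""
    if resolved_quant is None:
        cfg.quant = None
    elif cfg.quant is None:
        cfg.quant = resolved_quant
    else:
        cfg.quant.weight_type = resolved_quant.weight_type
        cfg.quant.activation_type = resolved_quant.activation_type
```

Patching a config whose quant section already carries `task` and `model_name` now changes only the two precision fields, so later validation still finds `quant.task`.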
DingmaomaoBJTU added a commit that referenced this pull request May 26, 2026
)

## Summary

- ORT's pybind module writes `Init provider bridge failed.` directly to native stderr (fd 2 / Win32 `STD_ERROR_HANDLE`), bypassing Python's `logging`/`warnings` systems entirely
- Added `utils/native_stderr.py`, a dedicated module for capturing and replaying native stderr output from ORT/QNN
- `suppress_ep_registration_stderr()` context manager redirects fd 2 via pipe, re-emits captured lines through Python logging
- `replay_ort_startup_logs()` public API for deferred replay after logging is configured
- Fixed **64-bit HANDLE truncation bug**: set proper `argtypes`/`restype` via `ctypes.wintypes` for `GetStdHandle` and `SetStdHandle`
- Fixed **dup2 restore ordering**: UCRT's `dup2` for fds 0-2 internally calls `SetStdHandle`, so `GetStdHandle` must be captured before `dup2` and restored after
- **Platform-gated**: no-op on non-Windows; Linux/macOS CI gets a plain `import onnxruntime` with zero fd overhead
- `constants.py` restored to leaf-level constants-only module
- Downgraded HF/optimum/transformers logging noise to appropriate levels
- Added 10 unit tests covering fd capture, ANSI strip, Win32 HANDLE restore, and replay API

Reference: #477

---------

Co-authored-by: vortex-captain <75063846+vortex-captain@users.noreply.github.com>
Co-authored-by: Yi Ren <reny@microsoft.com>
Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Yue Sun <yuesu@microsoft.com>
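The pipe-based capture of fd 2 can be sketched at the CRT file-descriptor level. This is not the code in `utils/native_stderr.py`: `capture_fd2` and `replay` are hypothetical names, the sketch omits the Win32 `GetStdHandle`/`SetStdHandle` handling and the platform gate described above, and it reads the pipe only after the block ends, so it suits short messages that fit in the pipe buffer.

```python
import contextlib
import logging
import os
import sys
from typing import Iterator, List

logger = logging.getLogger("native_stderr_sketch")


@contextlib.contextmanager
def capture_fd2(sink: List[str]) -> Iterator[None]:
    """Redirect fd 2 into a pipe for the block, then append captured lines to sink."""
    sys.stderr.flush()                # keep Python-level buffered text out of the pipe
    saved_fd = os.dup(2)              # remember the original stderr target
    read_fd, write_fd = os.pipe()
    os.dup2(write_fd, 2)              # fd 2 now points at the pipe's write end
    os.close(write_fd)
    try:
        yield
    finally:
        sys.stderr.flush()
        os.dup2(saved_fd, 2)          # restore; this closes the last write end
        os.close(saved_fd)
        with os.fdopen(read_fd, "r", encoding="utf-8", errors="replace") as reader:
            sink.extend(line.rstrip("\r\n") for line in reader)


def replay(lines: List[str], level: int = logging.WARNING) -> None:
    """Re-emit captured lines through Python logging once it is configured."""
    for line in lines:
        logger.log(level, "%s", line)
```

A caller would wrap the noisy import or registration call in `with capture_fd2(lines):` and call `replay(lines)` later; output larger than the pipe buffer would need a reader thread, which this sketch leaves out.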
Summary

`winml build` UX improvements
- `-c`/`--config` is now optional: omit it to auto-generate the build config from `-m <model>` directly (one-step workflow; `-m` becomes required when `-c` is omitted)
- `WinMLBuildConfig` gains a top-level `auto` field (default `True`); the build pipeline sets it to `False` after the optimize + autoconf loop and saves it into `winml_build_config.json` as part of the build output. This marks the config as "already resolved" so CI/CD pipelines or subsequent `winml build -c` runs can reuse it without re-triggering the analyzer
- Setup header shows `(autoconf on)` / `(autoconf off)` next to the config name; summary section prints the saved config path after the final artifact

Performance
- Skip loading PyTorch model weights in `from_pretrained()` when the ONNX artifact is already cached; use `AutoConfig` instead (~1 s vs ~60 s on a warm cache)

EP compatibility workaround
- Add `_suppress_ep_registration_stderr()` to silence the WinApp SDK 2.0 / WinML 1.8 API version mismatch warning printed to native stderr on every run. Uses `SetStdHandle` (Win32) + `os.dup2` (CRT) because the DLL writes via `GetStdHandle(STD_ERROR_HANDLE)`, bypassing Python's stderr. Remove once WinML runtime upgrades to 2.0.

Bug fixes

`winml perf --ep openvino --device cpu` not working

`winml perf -m <hf-model> --ep openvino --device cpu` was broken. Root cause: when loading a HF model ID, `perf` explicitly triggered EPContext compilation via `generate_hf_build_config()`, which called `WinMLCompileConfig.for_provider(compile_provider)` without forwarding `device`. As a result, `for_openvino()` set no `device_type` in `provider_options`, so OpenVINO compiled a GPU/NPU blob instead of a CPU-specific one. When `WinMLSession` later tried to load that blob on CPU, it raised `INVALID_GRAPH`.

Two fixes:
- `for_provider()` / `for_openvino()` / `for_qnn()` / `for_vitisai()` now all accept and forward `device` → `provider_options["device_type"]`, so the compiled EPContext is device-specific and cache keys don't collide across devices
- `winml perf` now defaults to `--no-compile` (skip EPContext compilation during benchmarking); use `--compile` to opt in. Perf is for benchmarking existing artifacts, not building new ones; compilation belongs in `winml build`
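The device forwarding in the first fix can be sketched as below. This is a simplified stand-in rather than the real WinMLCompileConfig: `CompileConfigSketch` and `_with_device` are hypothetical, and the device string is passed through unchanged because the expected casing of `device_type` is not stated in this conversation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CompileConfigSketch:
    provider: str
    provider_options: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def _with_device(cls, provider: str, device: Optional[str]) -> "CompileConfigSketch":
        # Only write device_type when a device was actually given.
        options = {"device_type": device} if device else {}
        return cls(provider=provider, provider_options=options)

    @classmethod
    def for_openvino(cls, device: Optional[str] = None) -> "CompileConfigSketch":
        return cls._with_device("openvino", device)

    @classmethod
    def for_qnn(cls, device: Optional[str] = None) -> "CompileConfigSketch":
        return cls._with_device("qnn", device)

    @classmethod
    def for_vitisai(cls, device: Optional[str] = None) -> "CompileConfigSketch":
        return cls._with_device("vitisai", device)

    @classmethod
    def for_provider(cls, provider: str, device: Optional[str] = None) -> "CompileConfigSketch":
        # Each lambda closes over `device`, so every factory receives it.
        factories = {
            "openvino": lambda: cls.for_openvino(device=device),
            "qnn": lambda: cls.for_qnn(device=device),
            "vitisai": lambda: cls.for_vitisai(device=device),
        }
        return factories[provider]()
```

With the device recorded in `provider_options`, configs built for different devices compare unequal, which is the property that keeps a CPU build from reusing a blob compiled for another device.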