Skip to content

fix(perf): forward --ep/--device through perf --module path and narrow EP discovery#440

Merged
xieofxie merged 7 commits into
mainfrom
hualxie/fix_per
May 7, 2026
Merged

fix(perf): forward --ep/--device through perf --module path and narrow EP discovery#440
xieofxie merged 7 commits into
mainfrom
hualxie/fix_per

Conversation

@xieofxie
Copy link
Copy Markdown
Contributor

@xieofxie xieofxie commented May 6, 2026

Summary

Two related fixes for --ep / --device not flowing all the way through:

commands/perf.py--module path dropped CLI flags

_perf_modules did not accept device, ep, or precision, so winml perf -m bert-base-uncased --module BertAttention --device npu --ep qnn ignored those flags and ran on default device/EP. Plumbed them through to generate_hf_build_config, build_hf_model, and WinMLSession.

session/session.py — explicit-EP path ignored device

_find_ep_device(ep_name) and _find_ep_for_device(device) were two methods, and the explicit-EP branch only matched by EP name. On systems where one EP exposes multiple OrtEpDevice entries (e.g. QNN-on-NPU and QNN-on-GPU), --ep qnn --device gpu could return QNN-on-NPU. Merged into a single _find_ep_device(ep_name=None, device=None) with AND'd filters so both intents are honored.

Tests

  • tests/unit/commands/test_perf_module.py::TestPerfModuleParameterForwarding — invokes the --module CLI with --device npu --ep qnn, mocks the four boundary calls, asserts each receives the expected kwargs. Verified to fail on the unfixed code (KeyError: 'device').
  • tests/unit/session/test_winml_session.py::TestFindEpDevice — 5 cases covering ep_name only / device only / both-must-match / no-match / device="auto" no-op. 4 of 5 fail on the old API.

@xieofxie xieofxie requested a review from a team as a code owner May 6, 2026 08:11
@DingmaomaoBJTU
Copy link
Copy Markdown
Collaborator

Code review

Found 2 issues:

  1. Explicit-EP path AND-filters by device type, reversing the intentional design from PR fix: thread ep parameter through to WinMLSession #404. QNN is registered as OrtHardwareDeviceType.NPU in ORT's EP registry even when targeting GPU workloads. With this change, --ep qnn --device gpu finds no match (NPU ≠ GPU) and silently falls back to policy-based selection — precisely the failure mode the PR fix: thread ep parameter through to WinMLSession #404 comment "Don't filter by device type — trust the user's --ep choice (e.g., QNN reports as NPU in get_ep_devices but can target GPU)" was protecting against. The PR's motivating case (--ep qnn --device cpu) works because QNN-on-CPU gets a distinct OrtEpDevice entry, but that assumption may not hold for GPU.

https://github.com/microsoft/WinML-ModelKit/blob/256c2ade1dcf48160a7b841a3b3e55d2d5dd4481/src/winml/modelkit/session/session.py#L439-L444

  1. device="auto" bypasses the device filter in _find_ep_device, returning the first registered EP unconditionally. The old _find_ep_for_device returned None when DEVICE_TO_DEVICE_TYPE.get(device.upper()) was None (which includes "auto", since it is not a key in the dict), triggering policy-based EP selection. The new unified function's guard (if device else None) only catches None/empty string, not the string "auto". When WinMLSession is constructed with the default device="auto" and self._ep is unset, the no-EP discovery branch at line 456 calls _find_ep_device(device="auto"), which now returns the first entry in ort.get_ep_devices() instead of falling through to policy.

https://github.com/microsoft/WinML-ModelKit/blob/256c2ade1dcf48160a7b841a3b3e55d2d5dd4481/src/winml/modelkit/session/session.py#L499-L508

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

hualxie added 6 commits May 7, 2026 09:51
Previously returned an arbitrary ep_device when both ep_name and
device were unset (or device='auto' with no ep_name). Now returns
None unless at least one filter is effective.
Both call sites pass device as a str. Drop the None branch and
make device the first positional arg.
When user passes --device auto, the log now shows both the requested
device and the device type of the matched ep_device, e.g.:
  Explicit EP: qnn (QNNExecutionProvider) device=auto -> NPU
@xieofxie
Copy link
Copy Markdown
Contributor Author

xieofxie commented May 7, 2026

Code review

Found 2 issues:

  1. Explicit-EP path AND-filters by device type, reversing the intentional design from PR fix: thread ep parameter through to WinMLSession #404. QNN is registered as OrtHardwareDeviceType.NPU in ORT's EP registry even when targeting GPU workloads. With this change, --ep qnn --device gpu finds no match (NPU ≠ GPU) and silently falls back to policy-based selection — precisely the failure mode the PR fix: thread ep parameter through to WinMLSession #404 comment "Don't filter by device type — trust the user's --ep choice (e.g., QNN reports as NPU in get_ep_devices but can target GPU)" was protecting against. The PR's motivating case (--ep qnn --device cpu) works because QNN-on-CPU gets a distinct OrtEpDevice entry, but that assumption may not hold for GPU.

https://github.com/microsoft/WinML-ModelKit/blob/256c2ade1dcf48160a7b841a3b3e55d2d5dd4481/src/winml/modelkit/session/session.py#L439-L444

  1. device="auto" bypasses the device filter in _find_ep_device, returning the first registered EP unconditionally. The old _find_ep_for_device returned None when DEVICE_TO_DEVICE_TYPE.get(device.upper()) was None (which includes "auto", since it is not a key in the dict), triggering policy-based EP selection. The new unified function's guard (if device else None) only catches None/empty string, not the string "auto". When WinMLSession is constructed with the default device="auto" and self._ep is unset, the no-EP discovery branch at line 456 calls _find_ep_device(device="auto"), which now returns the first entry in ort.get_ep_devices() instead of falling through to policy.

https://github.com/microsoft/WinML-ModelKit/blob/256c2ade1dcf48160a7b841a3b3e55d2d5dd4481/src/winml/modelkit/session/session.py#L499-L508

🤖 Generated with Claude Code

  • If this code review was useful, please react with 👍. Otherwise, react with 👎.

Fix the 2nd one

The 1st one is hallucination. QNN ep will always be registered as two devices (for one dll), so --ep qnn --device gpu will work

Comment thread src/winml/modelkit/sysinfo/device.py
@xieofxie xieofxie merged commit 6b8b004 into main May 7, 2026
9 checks passed
@xieofxie xieofxie deleted the hualxie/fix_per branch May 7, 2026 02:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants