fix(perf): forward --ep/--device through perf --module path and narrow EP discovery #440
Merged
Code review
Found 2 issues:
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎.
added 6 commits
May 7, 2026 09:51
Previously returned an arbitrary ep_device when both ep_name and device were unset (or device='auto' with no ep_name). Now returns None unless at least one filter is effective.
Both call sites pass device as a str. Drop the None branch and make device the first positional arg.
When user passes --device auto, the log now shows both the requested device and the device type of the matched ep_device, e.g.: Explicit EP: qnn (QNNExecutionProvider) device=auto -> NPU
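The behavior described in the three commit messages above can be sketched as follows. This is a minimal illustration, not the code from `session/session.py`: `EpDevice`, `find_ep_device`, and `describe_match` are hypothetical names, and the real implementation works on `OrtEpDevice` entries whose exact attributes are not shown in this PR.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EpDevice:
    """Hypothetical stand-in for one OrtEpDevice entry."""
    ep_name: str      # e.g. "QNNExecutionProvider"
    device_type: str  # e.g. "NPU", "GPU", "CPU"


def find_ep_device(
    devices: Sequence[EpDevice],
    device: str,
    ep_name: Optional[str] = None,
) -> Optional[EpDevice]:
    """Return the first entry matching every effective filter, else None."""
    # "auto" means the caller did not constrain the device type.
    wanted_type = None if device.lower() == "auto" else device.upper()

    # With no effective filter, return None instead of an arbitrary entry.
    if wanted_type is None and ep_name is None:
        return None

    for entry in devices:
        if ep_name is not None and entry.ep_name != ep_name:
            continue
        if wanted_type is not None and entry.device_type != wanted_type:
            continue
        return entry  # both filters (where set) matched
    return None


def describe_match(ep_alias: str, device: str, match: EpDevice) -> str:
    """Format a log line showing the requested device and the matched type."""
    return (
        f"Explicit EP: {ep_alias} ({match.ep_name}) "
        f"device={device} -> {match.device_type}"
    )


# One EP registered as two devices, as in the QNN case discussed here.
registered = [
    EpDevice("QNNExecutionProvider", "NPU"),
    EpDevice("QNNExecutionProvider", "GPU"),
]
match = find_ep_device(registered, "auto", "QNNExecutionProvider")
print(describe_match("qnn", "auto", match))
```

With `device="auto"` and an EP name, only the name filter is effective, so the first registered entry for that EP is returned and the log line reports its device type.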
Fixed the 2nd one. The 1st one is a hallucination. The QNN EP will always be registered as two devices (for one DLL), so
xieofxie
commented
May 7, 2026
DingmaomaoBJTU
approved these changes
May 7, 2026
Summary
Two related fixes for `--ep`/`--device` not flowing all the way through:

`commands/perf.py`: `--module` path dropped CLI flags
`_perf_modules` did not accept `device`, `ep`, or `precision`, so `winml perf -m bert-base-uncased --module BertAttention --device npu --ep qnn` ignored those flags and ran on the default device/EP. Plumbed them through to `generate_hf_build_config`, `build_hf_model`, and `WinMLSession`.

`session/session.py`: explicit-EP path ignored device
`_find_ep_device(ep_name)` and `_find_ep_for_device(device)` were two methods, and the explicit-EP branch only matched by EP name. On systems where one EP exposes multiple `OrtEpDevice` entries (e.g. QNN-on-NPU and QNN-on-GPU), `--ep qnn --device gpu` could return QNN-on-NPU. Merged into a single `_find_ep_device(ep_name=None, device=None)` with AND'd filters so both intents are honored.

Tests
- `tests/unit/commands/test_perf_module.py::TestPerfModuleParameterForwarding`: invokes the `--module` CLI with `--device npu --ep qnn`, mocks the four boundary calls, and asserts each receives the expected kwargs. Verified to fail on the unfixed code (`KeyError: 'device'`).
- `tests/unit/session/test_winml_session.py::TestFindEpDevice`: 5 cases covering ep_name only / device only / both-must-match / no-match / `device="auto"` no-op. 4 of 5 fail on the old API.
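
The parameter-forwarding test pattern described above can be illustrated with a small sketch. It is not the test from this PR: `perf_modules`, `build`, and `make_session` are hypothetical stand-ins, and only two boundary calls are mocked here rather than the four the real test covers.

```python
from unittest import mock


def perf_modules(model_id, module, *, device, ep, precision, build, make_session):
    """Hypothetical entry point that must forward the CLI flags unchanged."""
    # Forward every flag to the build step instead of relying on defaults.
    model_path = build(
        model_id, module=module, device=device, ep=ep, precision=precision
    )
    # The session must see the same device/EP the user asked for.
    return make_session(model_path, device=device, ep=ep)


def check_forwarding():
    """Mock both boundaries and assert each one receives the expected kwargs."""
    build = mock.Mock(return_value="model.onnx")
    make_session = mock.Mock(return_value="session")

    result = perf_modules(
        "bert-base-uncased",
        "BertAttention",
        device="npu",
        ep="qnn",
        precision=None,
        build=build,
        make_session=make_session,
    )

    build.assert_called_once_with(
        "bert-base-uncased",
        module="BertAttention",
        device="npu",
        ep="qnn",
        precision=None,
    )
    make_session.assert_called_once_with("model.onnx", device="npu", ep="qnn")
    return result


check_forwarding()
```

Asserting on the exact kwargs at each boundary is what catches a dropped flag: if an intermediate function stops passing `device` or `ep`, the mock's recorded call no longer matches and the assertion fails.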