Qualcomm AI Engine Direct - QNN ExecuTorch Intermediate Output Debugger #15735
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15735
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit b154f22 with merge base a042763 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
|
It seems a dependency is missing. Do you want to introduce this dependency in general? |
|
Thank you for enabling this feature and for the detailed documentation! Just minor comments, can we move |
|
@winskuo-quic are you able to try story llm and see what the cos similarity is? |
Pull Request Overview
This PR introduces the QNN ExecuTorch Intermediate Output Debugger, a comprehensive debugging tool for comparing intermediate tensor outputs between QNN and CPU execution to identify accuracy issues in model inference.
Key Changes
- New Debugging Framework: Added a complete debugger implementation with support for custom metrics, multiple output formats (SVG, CSV, raw files), and per-layer accuracy comparison
- API and Utilities: Introduced `QNNIntermediateDebugger` class with configurable evaluators and output format options
- Enhanced Testing: Updated test infrastructure to validate intermediate output dumping and comparison for various model architectures
Reviewed Changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 21 comments.
| File | Description |
|---|---|
| backends/qualcomm/debugger/qnn_intermediate_debugger.py | Core debugger implementation for matching and processing QNN vs CPU intermediate tensors |
| backends/qualcomm/debugger/metrics_evaluator.py | Base class and concrete metric evaluators (cosine similarity, ATOL, MSE) for tensor comparison |
| backends/qualcomm/debugger/format_outputs.py | Output formatting utilities for SVG graph visualization, CSV export, and raw tensor dumps |
| backends/qualcomm/debugger/README.md | Comprehensive documentation with tutorial, design flow, and usage examples |
| examples/qualcomm/util_scripts/qnn_intermediate_debugger_demo.py | End-to-end demo script using InceptionV3 to showcase debugger capabilities |
| examples/qualcomm/utils.py | Added debugger integration to build_executorch_binary() with spelling fixes |
| backends/qualcomm/tests/utils.py | Enhanced test utilities to validate intermediate output dumping and comparison |
| backends/qualcomm/tests/test_qnn_delegate.py | Added new tests for intermediate debugger and refactored existing tests |
| backends/qualcomm/qnn_preprocess.py | Moved axis order metadata cleanup logic from partitioner to preprocessor |
| backends/qualcomm/partition/qnn_partitioner.py | Removed axis order cleanup (now handled in preprocessor) |
| backends/qualcomm/builders/node_visitor.py | Added QNN tensor name tracking to node metadata for debugger |
| backends/qualcomm/utils/constants.py | Added QCOM_TENSOR_NAME constant for intermediate output tracking |
| examples/qualcomm/executor_runner/qnn_executor_runner.cpp | Increased debug buffer size from 20MB to 100MB |
| examples/qualcomm/qnn_intermediate_output_inspector.py | Removed obsolete intermediate output inspector (replaced by new debugger) |
| examples/qualcomm/README.md | Added documentation for new util_scripts folder |
| examples/qualcomm/qaihub_scripts/llama/README.md | Fixed spelling: "futher" → "further" |
| backends/qualcomm/quantizer/README.md | Fixed spelling: "utilies" → "utilities", "converage" → "coverage" |
| backends/qualcomm/builders/README.md | Fixed spelling and typos in documentation |
| backends/qualcomm/README.md | Updated code review authors list |
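The table lists cosine similarity, ATOL, and MSE as the metric evaluators used for tensor comparison. As a rough illustration of what such metrics compute, here is a minimal standard-library sketch operating on flattened tensors. The function names are hypothetical and are not taken from `metrics_evaluator.py`; the ATOL variant is assumed here to mean the largest element-wise absolute difference, which may differ from the PR's definition.

```python
import math
from typing import Sequence


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    # Both tensors are assumed to be flattened to the same number of elements.
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("tensors must be non-empty and of equal length")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two flattened tensors (1.0 = same direction)."""
    _check_same_length(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch only: an all-zero tensor
        # matches nothing except another all-zero tensor.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Average of the squared element-wise differences."""
    _check_same_length(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def max_abs_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference (assumed ATOL-style metric)."""
    _check_same_length(a, b)
    return max(abs(x - y) for x, y in zip(a, b))
```

Cosine similarity ignores magnitude and only measures direction, so pairing it with an error-based metric such as MSE can help catch outputs that are correctly shaped but wrongly scaled.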
backends/qualcomm/tests/utils.py
Outdated
| self.assertTrue( | ||
| len(node_tensor_map) == expected_compared_events, |
Copilot
AI
Nov 17, 2025
assertTrue(a == b) cannot provide an informative message. Using assertEqual(a, b) instead will give more informative messages.
| self.assertTrue( | |
| len(node_tensor_map) == expected_compared_events, | |
| self.assertEqual( | |
| len(node_tensor_map), expected_compared_events, |
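The difference between the two assertions can be seen by capturing their failure messages. The sketch below uses only the standard `unittest` module; the helper `failure_message` and the literal values are made up for illustration and do not come from the PR's test code.

```python
import unittest


def failure_message(assertion) -> str:
    """Run an assertion callable and return its failure text ('' if it passes)."""
    try:
        assertion()
    except AssertionError as err:
        return str(err)
    return ""


# A bare TestCase instance is enough to call the assert helpers directly.
case = unittest.TestCase()

actual_count = 2    # stand-in for len(node_tensor_map)
expected_count = 3  # stand-in for expected_compared_events

# assertTrue only sees the already-evaluated boolean, so both operands are lost.
msg_true = failure_message(lambda: case.assertTrue(actual_count == expected_count))

# assertEqual receives both operands and reports them in the failure message.
msg_equal = failure_message(lambda: case.assertEqual(actual_count, expected_count))

print("assertTrue :", msg_true)
print("assertEqual:", msg_equal)
```

With `assertEqual`, a CI log shows how many events were actually compared, which is usually the first thing needed when such a test fails.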
|
Hi @winskuo-quic Thanks so much for this contribution — we really appreciate the Qualcomm team’s work here. One question I had while reviewing the PR: several parts of the implementation seem to re-create functionality that already exists in ExecuTorch’s devtools ( This might simply be a gap in my understanding, so I’d love to learn more about your experience here:
If there are gaps, we’d be very happy to collaborate and strengthen the shared debugging tools so all backends can benefit from a unified workflow. Looking forward to your thoughts! |
Hi @cccclai, |
Hi @billmguo, |
Hi @Gasoonjia, Thanks for reviewing the PR. |
Just to confirm, do you mean moving from |
4bdfe6e to
fe228e9
Compare
|
hi @winskuo-quic: Thanks again for your feedback, and your email reply. I understand this work may have been created quite a while ago in a private context. I’m wondering whether there is any plan to migrate or align the current QNN debugger with the native ExecuTorch devtools — not just reusing If you’re open to it, I’d be happy to discuss potential directions for collaboration on operator-level numerical discrepancy detection and how we can streamline the work going forward. Thanks |
|
Hi @Gasoonjia, I think we are also aiming to reduce as much code redundancy as possible, which should align with your plan. I believe |
Thanks for the quick reply and for sharing your thoughts on our current API. With #15969, we can now create custom metrics for numerical gap detection. I'm more than happy to hear more of your thoughts on the API, and I look forward to future cooperation! |
Thanks for sharing the PR. I think this would be super helpful. Looking forward to future cooperation and to transitioning to this API once the debug_handle is enabled! |
This PR looks fine to me, and thanks for your contribution! I will leave the final stamp to @cccclai. |
e17f559 to
6328f7b
Compare
.ci/scripts/test_model.sh
Outdated
| export LD_LIBRARY_PATH=$QNN_SDK_ROOT/lib/x86_64-linux-clang/ | ||
| export PYTHONPATH=$EXECUTORCH_ROOT/.. | ||
|
|
||
| pip install pydot |
Since we introduce the dependency here, let's also add it in the build.sh
Thanks for the suggestion.
I have removed pip install pydot from .ci/scripts/test_model.sh & .ci/scripts/test_qnn_static_llm.sh, and I have introduced the dependency under build.sh since both ci files will call build.sh.
| from executorch.backends.qualcomm.debugger.qnn_intermediate_debugger import QNNIntermediateDebugger | ||
|
|
||
| qnn_intermediate_debugger = QNNIntermediateDebugger() | ||
| build_executorch_binary( |
can we use to_edge..._qnn() API as it's the newer API?
Thanks for the suggestion.
Actually, build_executorch_binary should still be the more official API for users. Inside build_executorch_binary, we then call to_edge_transform_and_lower_to_qnn. I think it is more straightforward to use build_executorch_binary, since to_edge_transform_and_lower_to_qnn requires users to handle quantization themselves, along with some post-processing after the model is lowered.
We might consider moving these API functions to a new file in the future, making it easier for users to see which APIs are released for use.
cccclai
left a comment
thanks for clarifying. Can you rebase the PR? The CI stopped working last week
dfe9aeb to
724ad4c
Compare
|
It seems I merged some PRs that conflict with this PR. Can you rebase again? |
724ad4c to
b154f22
Compare
Done. Thanks |
Summary
python examples/qualcomm/util_scripts/qnn_intermediate_debugger_demo.py -b build-android -m SM8550 --device $DEVICE --dataset ../imagenet-mini/val/ --dump_intermediate_outputs
An example use case
MobileVit V2 shows a significant accuracy drop in certain QNN versions, while QNN 2.29 has good accuracy. With the help of the accuracy debugger, we pinpointed the node native_group_norm_default_1 in the model. As shown below, in QNN 2.29 this node has a cos_similarity (QNN vs. CPU) of 0.997, while in all other QNN versions it has a cos_similarity of 0, which hints that this group_norm node is likely what causes the accuracy drop.

What's Coming Next?
debug_handle and reuse https://github.com/pytorch/executorch/blob/main/devtools/inspector/_intermediate_output_capturer.py in future instead.
Test plan
python backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleUtilsScript.test_intermediate_debugger -s $DEVICE --model SM8650 --build_folder build-android/ --executorch_root . --image_dataset ../imagenet-mini/val/ --artifact ./e2e_test_debug
python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedUtils.test_qnn_backend_dump_intermediate_outputs_simple_model --model SM8550 --device $DEVICE --build_folder build-android
python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedUtils.test_qnn_backend_dump_intermediate_outputs_topk --model SM8550 --device $DEVICE --build_folder build-android