Qualcomm AI Engine Direct - QNN ExecuTorch Intermediate Output Debugger #15735

winskuo-quic · 2025-11-11T12:22:00Z

Summary

Enables the ExecuTorch QNN Intermediate Tensor Debugger.
Provides an API for users to define their own metrics.
Offers a variety of output formats to visualize the debug results: svg, csv, and raw files.
Adds a README file and a tutorial script to guide users through debugging a model. Example script: python examples/qualcomm/util_scripts/qnn_intermediate_debugger_demo.py -b build-android -m SM8550 --device $DEVICE --dataset ../imagenet-mini/val/ --dump_intermediate_outputs

An example use case

MobileVit V2 shows a significant accuracy drop in certain QNN versions, while QNN 2.29 has good accuracy. With the help of the accuracy debugger, we narrowed the problem down to the node native_group_norm_default_1. As shown below, in QNN 2.29 this node has a cos_similarity (QNN vs. CPU) of 0.997, while in all other QNN versions it has a cos_similarity of 0, which hints that this group_norm node is likely what causes the accuracy drop.
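
To make the cos_similarity numbers concrete, here is a minimal sketch of how a cosine-similarity score between a QNN output and a CPU reference could be computed. It is plain Python over flattened lists and is not the debugger's own implementation; the function name, the sample values, and the choice to return 0.0 for an all-zero tensor are assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two flattened tensors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("tensors must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an all-zero tensor is reported as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


cpu_out = [0.5, -1.25, 2.0, 0.75]     # made-up CPU reference values
qnn_good = [0.52, -1.20, 1.95, 0.80]  # made-up output with small quantization noise
qnn_bad = [0.0, 0.0, 0.0, 0.0]        # made-up degenerate output

print(cosine_similarity(cpu_out, qnn_good))  # close to 1.0
print(cosine_similarity(cpu_out, qnn_bad))   # 0.0
```

A score near 1 means the two outputs point in nearly the same direction, so a node whose score collapses to 0 stands out immediately in a per-layer report.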

What's Coming Next?

Currently, we dump CPU outputs by manually inserting observer nodes. However, ExecuTorch has a built-in method (intermediate_output_capturer) that can dump intermediate outputs for us in the form of a dict{debug_handle : tensor_output}. In the future, we will enable debug_handle and reuse https://github.com/pytorch/executorch/blob/main/devtools/inspector/_intermediate_output_capturer.py instead.
Support graphs with partitions
Support LLM models
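
As an illustration of the dict{debug_handle : tensor_output} format mentioned above, the sketch below compares two such dictionaries, one from the CPU and one from QNN, and reports a score per handle. It does not call intermediate_output_capturer; the dictionaries, the integer handle values, and the compare_by_debug_handle helper are hypothetical, and tensors are represented as flat Python lists.

```python
from typing import Callable, Dict, List

Tensor = List[float]


def max_abs_diff(a: Tensor, b: Tensor) -> float:
    """Largest element-wise absolute difference between two flat tensors."""
    return max(abs(x - y) for x, y in zip(a, b))


def compare_by_debug_handle(
    cpu: Dict[int, Tensor],
    qnn: Dict[int, Tensor],
    metric: Callable[[Tensor, Tensor], float] = max_abs_diff,
) -> Dict[int, float]:
    """Score every debug handle present in both dictionaries (hypothetical helper)."""
    shared = sorted(cpu.keys() & qnn.keys())
    return {handle: metric(cpu[handle], qnn[handle]) for handle in shared}


# Made-up data: handle 3 has no QNN counterpart and is skipped.
cpu_outputs = {1: [1.0, 2.0], 2: [0.5, 0.5], 3: [9.0]}
qnn_outputs = {1: [1.0, 2.25], 2: [0.5, 0.5]}

print(compare_by_debug_handle(cpu_outputs, qnn_outputs))  # {1: 0.25, 2: 0.0}
```

Keying both sides by the same handle removes the need to match nodes by name, which is the main appeal of reusing debug_handle.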

Test plan

E2E example script test
- python backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleUtilsScript.test_intermediate_debugger -s $DEVICE --model SM8650 --build_folder build-android/ --executorch_root . --image_dataset ../imagenet-mini/val/ --artifact ./e2e_test_debug
Simple model test
- python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedUtils.test_qnn_backend_dump_intermediate_outputs_simple_model --model SM8550 --device $DEVICE --build_folder build-android
- python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedUtils.test_qnn_backend_dump_intermediate_outputs_topk --model SM8550 --device $DEVICE --build_folder build-android

pytorch-bot · 2025-11-11T12:22:04Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15735

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b154f22 with merge base a042763:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2025-11-11T12:22:43Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

cccclai · 2025-11-15T20:43:48Z

Seems like a dependency is missing:

ModuleNotFoundError: No module named 'pydot'

Do you want to introduce this dependency in general?

cccclai · 2025-11-16T01:50:22Z

Thank you for enabling this feature and for the detailed documentation! Just a minor comment: can we move examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py to examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py, just to be a bit clearer about the content inside the folder?

billmguo · 2025-11-17T21:58:17Z

@winskuo-quic are you able to try the story LLM and see what its cos similarity is?

Copilot

Pull Request Overview

This PR introduces the QNN ExecuTorch Intermediate Output Debugger, a comprehensive debugging tool for comparing intermediate tensor outputs between QNN and CPU execution to identify accuracy issues in model inference.

Key Changes

New Debugging Framework: Added a complete debugger implementation with support for custom metrics, multiple output formats (SVG, CSV, raw files), and per-layer accuracy comparison
API and Utilities: Introduced QNNIntermediateDebugger class with configurable evaluators and output format options
Enhanced Testing: Updated test infrastructure to validate intermediate output dumping and comparison for various model architectures

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 21 comments.


| File | Description |
| --- | --- |
| backends/qualcomm/debugger/qnn_intermediate_debugger.py | Core debugger implementation for matching and processing QNN vs CPU intermediate tensors |
| backends/qualcomm/debugger/metrics_evaluator.py | Base class and concrete metric evaluators (cosine similarity, ATOL, MSE) for tensor comparison |
| backends/qualcomm/debugger/format_outputs.py | Output formatting utilities for SVG graph visualization, CSV export, and raw tensor dumps |
| backends/qualcomm/debugger/README.md | Comprehensive documentation with tutorial, design flow, and usage examples |
| examples/qualcomm/util_scripts/qnn_intermediate_debugger_demo.py | End-to-end demo script using InceptionV3 to showcase debugger capabilities |
| examples/qualcomm/utils.py | Added debugger integration to `build_executorch_binary()` with spelling fixes |
| backends/qualcomm/tests/utils.py | Enhanced test utilities to validate intermediate output dumping and comparison |
| backends/qualcomm/tests/test_qnn_delegate.py | Added new tests for intermediate debugger and refactored existing tests |
| backends/qualcomm/qnn_preprocess.py | Moved axis order metadata cleanup logic from partitioner to preprocessor |
| backends/qualcomm/partition/qnn_partitioner.py | Removed axis order cleanup (now handled in preprocessor) |
| backends/qualcomm/builders/node_visitor.py | Added QNN tensor name tracking to node metadata for debugger |
| backends/qualcomm/utils/constants.py | Added `QCOM_TENSOR_NAME` constant for intermediate output tracking |
| examples/qualcomm/executor_runner/qnn_executor_runner.cpp | Increased debug buffer size from 20MB to 100MB |
| examples/qualcomm/qnn_intermediate_output_inspector.py | Removed obsolete intermediate output inspector (replaced by new debugger) |
| examples/qualcomm/README.md | Added documentation for new util_scripts folder |
| examples/qualcomm/qaihub_scripts/llama/README.md | Fixed spelling: "futher" → "further" |
| backends/qualcomm/quantizer/README.md | Fixed spelling: "utilies" → "utilities", "converage" → "coverage" |
| backends/qualcomm/builders/README.md | Fixed spelling and typos in documentation |
| backends/qualcomm/README.md | Updated code review authors list |


Copilot · 2025-11-17T23:49:02Z

backends/qualcomm/tests/utils.py

+                self.assertTrue(
+                    len(node_tensor_map) == expected_compared_events,


assertTrue(a == b) cannot provide an informative message. Using assertEqual(a, b) instead will give more informative messages.

Suggested change

-                self.assertTrue(
-                    len(node_tensor_map) == expected_compared_events,
+                self.assertEqual(
+                    len(node_tensor_map), expected_compared_events,
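
The difference in failure messages is easy to reproduce with a small standalone unittest case. The names node_tensor_map and expected_compared_events are borrowed from the snippet above, but the values and the failure_message helper are made up for illustration.

```python
import unittest


def failure_message(use_assert_equal: bool) -> str:
    """Run a deliberately failing check and return the last line of its traceback."""
    node_tensor_map = {"a": 1, "b": 2}  # made-up stand-in for the real map
    expected_compared_events = 3        # made-up expected count

    class Demo(unittest.TestCase):
        def runTest(self):
            if use_assert_equal:
                self.assertEqual(len(node_tensor_map), expected_compared_events)
            else:
                self.assertTrue(len(node_tensor_map) == expected_compared_events)

    result = unittest.TestResult()
    Demo().run(result)
    _, traceback_text = result.failures[0]
    return traceback_text.strip().splitlines()[-1]


print(failure_message(use_assert_equal=False))  # AssertionError: False is not true
print(failure_message(use_assert_equal=True))   # AssertionError: 2 != 3
```

With assertEqual the report shows both the actual and the expected count, so a mismatch can be diagnosed without rerunning the test under a debugger.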

Gasoonjia · 2025-11-20T07:30:59Z

Hi @winskuo-quic Thanks so much for this contribution — we really appreciate the Qualcomm team’s work here.

One question I had while reviewing the PR: several parts of the implementation seem to re-create functionality that already exists in ExecuTorch’s devtools (intermediate_output_capturer, numeric comparators, Inspector.calculate_numeric_gap, debug-handle-based operator matching, etc.), rather than extending the shared workflows.

This might simply be a gap in my understanding, so I’d love to learn more about your experience here:

Were there limitations or missing features in the current Inspector-based workflow that made it difficult to apply to QNN?
Or were there usability concerns that motivated introducing a separate set of APIs?

If there are gaps, we’d be very happy to collaborate and strengthen the shared debugging tools so all backends can benefit from a unified workflow.

Looking forward to your thoughts!

winskuo-quic · 2025-11-24T04:17:43Z

> Seems like a dependency is missing:
> ModuleNotFoundError: No module named 'pydot'
> Do you want to introduce this dependency in general?

Hi @cccclai,
Thanks for the suggestion. Since there are a couple of install_requirements.sh files in the codebase, do you have any suggestion on which one we should put this under?

winskuo-quic · 2025-11-24T04:20:04Z

> @winskuo-quic are you able to try the story LLM and see what its cos similarity is?

Hi @billmguo,
As mentioned in the PR summary and in the Limitations section of backends/qualcomm/debugger/README.md, LLM models are currently unsupported. This is on our TODO list and we will enable it in the future.
Thanks.

winskuo-quic · 2025-11-24T04:22:27Z

> Hi @winskuo-quic Thanks so much for this contribution; we really appreciate the Qualcomm team’s work here.
>
> One question I had while reviewing the PR: several parts of the implementation seem to re-create functionality that already exists in ExecuTorch’s devtools (intermediate_output_capturer, numeric comparators, Inspector.calculate_numeric_gap, debug-handle-based operator matching, etc.), rather than extending the shared workflows.
>
> This might simply be a gap in my understanding, so I’d love to learn more about your experience here:
>
> Were there limitations or missing features in the current Inspector-based workflow that made it difficult to apply to QNN?
>
> Or were there usability concerns that motivated introducing a separate set of APIs?
>
> If there are gaps, we’d be very happy to collaborate and strengthen the shared debugging tools so all backends can benefit from a unified workflow.
>
> Looking forward to your thoughts!

Hi @Gasoonjia,

Thanks for reviewing the PR.
The reason is written under the PR summary.
I will also share more details in the email thread.
Thanks

winskuo-quic · 2025-11-24T05:35:59Z

> Thank you for enabling this feature and for the detailed documentation! Just a minor comment: can we move examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py to examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py, just to be a bit clearer about the content inside the folder?

Just to confirm, do you mean moving from executorch/examples/qualcomm/util_scripts/qnn_intermediate_debugger_demo.py to examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py? Thanks.

Gasoonjia · 2025-11-25T01:21:22Z

hi @winskuo-quic:

Thanks again for your feedback, and your email reply. I understand this work may have been created quite a while ago in a private context.

I’m wondering whether there is any plan to migrate or align the current QNN debugger with the native ExecuTorch devtools — not just reusing intermediate_output_capturer or debug_handle, but more broadly integrating with the full debugging pipeline built around the Inspector.calculate_numeric_gap API for intermediate-output numerical discrepancy detection. Unifying these efforts could help us reduce divergence, avoid reinventing functionality, and centralize future debugging capabilities.

If you’re open to it, I’d be happy to discuss potential directions for collaboration on operator-level numerical discrepancy detection and how we can streamline the work going forward.

Thanks

winskuo-quic · 2025-11-25T01:33:45Z

Hi @Gasoonjia,

I think we are also aiming to reduce code redundancy as much as possible, which should align with your plan. I believe Inspector.calculate_numeric_gap will be helpful when debugging; however, we might still want to keep some of our features, such as the ability to draw the .svg graphs. Also, if I understand correctly, Inspector.calculate_numeric_gap currently only supports MSE, L1, and SNR. It would be awesome if we could have some public APIs for users to define their own metrics for the numeric gap.
We could discuss in more detail later how we could potentially unify and migrate some features, if you think they can be combined.
Thanks
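
For readers following the thread, one way a user-defined metric hook could look is a small name-to-function registry. The sketch below is purely illustrative: register_metric, METRICS, and both metric functions are hypothetical names, the formulas are common textbook definitions, and none of this is the Inspector.calculate_numeric_gap API or the QNN debugger API.

```python
import math
from typing import Callable, Dict, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]
METRICS: Dict[str, Metric] = {}


def register_metric(name: str) -> Callable[[Metric], Metric]:
    """Decorator that adds a metric function to the registry under `name`."""
    def decorator(fn: Metric) -> Metric:
        METRICS[name] = fn
        return fn
    return decorator


@register_metric("mse")
def mse(ref: Sequence[float], out: Sequence[float]) -> float:
    """Mean squared error between reference and output."""
    return sum((r - o) ** 2 for r, o in zip(ref, out)) / len(ref)


@register_metric("snr_db")
def snr_db(ref: Sequence[float], out: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels; assumes a non-zero reference."""
    signal = sum(r * r for r in ref)
    noise = sum((r - o) ** 2 for r, o in zip(ref, out))
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)


ref = [1.0, 2.0, 3.0, 4.0]  # made-up reference tensor
out = [1.0, 2.0, 3.0, 5.0]  # made-up output with one wrong element
print({name: fn(ref, out) for name, fn in METRICS.items()})
```

A registry like this lets a report iterate over every registered metric without the comparison code knowing which metrics exist.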

Gasoonjia · 2025-11-25T01:58:53Z

> Hi @Gasoonjia,
>
> I think we are also aiming to reduce code redundancy as much as possible, which should align with your plan. I believe Inspector.calculate_numeric_gap will be helpful when debugging; however, we might still want to keep some of our features, such as the ability to draw the .svg graphs. Also, if I understand correctly, Inspector.calculate_numeric_gap currently only supports MSE, L1, and SNR. It would be awesome if we could have some public APIs for users to define their own metrics for the numeric gap. We could discuss in more detail later how we could potentially unify and migrate some features, if you think they can be combined. Thanks

Thanks for the prompt reply and for sharing your thoughts on our current API.

With #15969, we can now create custom metrics for numerical gap detection.

I'm more than happy to hear more of your thoughts on the API, and I look forward to future cooperation!

winskuo-quic · 2025-11-25T02:09:35Z

> Thanks for the prompt reply and for sharing your thoughts on our current API.
>
> With #15969, we can now create custom metrics for numerical gap detection.
>
> I'm more than happy to hear more of your thoughts on the API, and I look forward to future cooperation!

Thanks for sharing the PR. I think this would be super helpful. Looking forward to future cooperation and to transitioning to this API once debug_handle is enabled!
In the meantime, since debug_handle is not yet enabled, does this PR look fine to you?
Thanks

Gasoonjia · 2025-11-25T02:16:57Z

> Thanks for sharing the PR. I think this would be super helpful. Looking forward to future cooperation and to transitioning to this API once debug_handle is enabled! In the meantime, since debug_handle is not yet enabled, does this PR look fine to you? Thanks

This PR looks fine to me, and thanks for your contribution! I will leave the final stamp to @cccclai.
I'm looking forward to working with you to contribute to devtools directly in the future.

cccclai · 2025-11-25T06:08:30Z

.ci/scripts/test_model.sh

  export LD_LIBRARY_PATH=$QNN_SDK_ROOT/lib/x86_64-linux-clang/
  export PYTHONPATH=$EXECUTORCH_ROOT/..

+  pip install pydot


Since we introduce the dependency here, let's also add it in the build.sh

winskuo-quic · 2025-11-26

Thanks for the suggestion.
I have removed pip install pydot from .ci/scripts/test_model.sh and .ci/scripts/test_qnn_static_llm.sh, and introduced the dependency in build.sh instead, since both CI files call build.sh.
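
Independently of where the package gets installed, a script that needs pydot can check for it up front and fail with a clearer hint than the ModuleNotFoundError reported earlier in this thread. The sketch below uses only the standard library; the missing_modules helper is a hypothetical name and is not part of the PR.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["pydot"])
if missing:
    print("Missing optional dependencies:", ", ".join(missing))
    print("Install them with: pip install " + " ".join(missing))
```

importlib.util.find_spec only locates the module without importing it, so the check stays cheap even when the dependency is present.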

cccclai · 2025-11-25T06:12:41Z

backends/qualcomm/debugger/README.md

+from executorch.backends.qualcomm.debugger.qnn_intermediate_debugger import QNNIntermediateDebugger
+
+qnn_intermediate_debugger = QNNIntermediateDebugger()
+build_executorch_binary(


Can we use the to_edge..._qnn() API, as it's the newer API?

winskuo-quic · 2025-11-26

Thanks for the suggestion.
Actually, build_executorch_binary should still be the more official API for users. Internally, build_executorch_binary calls to_edge_transform_and_lower_to_qnn. I think build_executorch_binary is more straightforward to use, since to_edge_transform_and_lower_to_qnn requires users to handle quantization themselves, as well as some post-processing after the model is lowered.
We might consider moving these API functions to a new file in the future, making it easier for users to see which released APIs are available.

meta-codesync · 2025-11-26T18:15:12Z

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D87936803.

cccclai

Thanks for clarifying. Can you rebase the PR? The CI stopped working last week.

cccclai · 2025-12-02T05:33:27Z

It seems I merged some PRs that conflict with this PR... can you rebase again?

winskuo-quic · 2025-12-02T08:06:41Z

> It seems I merged some PRs that conflict with this PR... can you rebase again?

Done. Thanks

winskuo-quic requested a review from cccclai as a code owner November 11, 2025 12:22

meta-cla bot added the CLA Signed label Nov 11, 2025

mergennachin requested review from Gasoonjia and Copilot November 17, 2025 23:42

Copilot started reviewing on behalf of mergennachin November 17, 2025 23:43

Copilot finished reviewing on behalf of mergennachin November 17, 2025 23:46

Copilot AI reviewed Nov 17, 2025


winskuo-quic force-pushed the dev1/winskuo/intermediate_tensor_debugger branch from 4bdfe6e to fe228e9 Compare November 24, 2025 06:27

winskuo-quic force-pushed the dev1/winskuo/intermediate_tensor_debugger branch from e17f559 to 6328f7b Compare November 25, 2025 05:28

cccclai reviewed Nov 25, 2025


cccclai approved these changes Dec 2, 2025


winskuo-quic force-pushed the dev1/winskuo/intermediate_tensor_debugger branch from dfe9aeb to 724ad4c Compare December 2, 2025 01:21

Qualcomm AI Engine Direct - Support Intermediate output debugging

5b800e7

winskuo-quic added 4 commits December 2, 2025 13:48

Copliot Code Review

f9bbad7

Add CI dependency

b5e2085

Code Review

4553d02

Lint

b154f22

winskuo-quic force-pushed the dev1/winskuo/intermediate_tensor_debugger branch from 724ad4c to b154f22 Compare December 2, 2025 06:13
