Conversation

@winskuo-quic
Collaborator

Summary

  • Enabled ExecuTorch QNN Intermediate Tensor Debugger.
  • Provides an API for users to define their own metrics.
  • Offers a variety of output formats to visualize the debug results: SVG, CSV, and raw files.
  • Adds a README file and a tutorial script to guide users through debugging a model. Example script: python examples/qualcomm/util_scripts/qnn_intermediate_debugger_demo.py -b build-android -m SM8550 --device $DEVICE --dataset ../imagenet-mini/val/ --dump_intermediate_outputs
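
To make the "define their own metrics" item concrete, here is a minimal, self-contained sketch of what a pluggable metric could look like. The MetricEvaluator base class, its method names, and the MaxAbsError metric are hypothetical illustrations; the actual interface is defined in backends/qualcomm/debugger/metrics_evaluator.py and may differ. Plain Python lists stand in for tensors so the sketch runs without extra dependencies.

```python
import math
from abc import ABC, abstractmethod
from typing import Sequence


class MetricEvaluator(ABC):
    """Hypothetical base class; the real one in metrics_evaluator.py may differ."""

    @abstractmethod
    def metric_name(self) -> str:
        """Label used when reporting the metric."""

    @abstractmethod
    def evaluate(self, qnn_output: Sequence[float], cpu_output: Sequence[float]) -> float:
        """Compare one QNN intermediate output against its CPU counterpart."""


class CosineSimilarity(MetricEvaluator):
    def metric_name(self) -> str:
        return "cos_similarity"

    def evaluate(self, qnn_output, cpu_output):
        dot = sum(q * c for q, c in zip(qnn_output, cpu_output))
        norm_q = math.sqrt(sum(q * q for q in qnn_output))
        norm_c = math.sqrt(sum(c * c for c in cpu_output))
        # Treat an all-zero tensor as "no similarity" instead of dividing by zero.
        if norm_q == 0.0 or norm_c == 0.0:
            return 0.0
        return dot / (norm_q * norm_c)


class MaxAbsError(MetricEvaluator):
    """Example of a user-defined metric (hypothetical, not part of the PR)."""

    def metric_name(self) -> str:
        return "max_abs_error"

    def evaluate(self, qnn_output, cpu_output):
        return max(abs(q - c) for q, c in zip(qnn_output, cpu_output))


if __name__ == "__main__":
    qnn, cpu = [1.0, 2.0, 3.5], [1.0, 2.0, 3.0]
    for metric in (CosineSimilarity(), MaxAbsError()):
        print(metric.metric_name(), metric.evaluate(qnn, cpu))
```

The point of the pattern is that the debugger only needs to call evaluate() on each matched pair of tensors, so adding a metric means adding one small class.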

An example use case

MobileVit V2 shows a significant accuracy drop in certain QNN versions, while QNN 2.29 retains good accuracy. With the help of the accuracy debugger, we narrowed the problem down to the node native_group_norm_default_1 in the model. As shown below, in QNN 2.29 this node has a cos_similarity (QNN vs. CPU) of 0.997, while in all other QNN versions it has a cos_similarity of 0, which hints that this group_norm node is likely the cause of the accuracy drop.
image
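
The triage described above can be sketched as a scan over per-node scores in execution order, stopping at the first node that falls below a threshold. This is only an illustration of the reasoning, not the debugger's implementation: the function name, the 0.9 threshold, and the placeholder nodes are assumptions. Only the two native_group_norm_default_1 values (0.997 and 0) come from the description above.

```python
from typing import List, Optional, Tuple


def first_suspect_node(
    scores: List[Tuple[str, float]], threshold: float = 0.9
) -> Optional[str]:
    """Return the first node, in execution order, whose score is below threshold."""
    for node_name, cos_similarity in scores:
        if cos_similarity < threshold:
            return node_name
    return None


# Placeholder entries are illustrative; only the group_norm scores are from the PR text.
good_run = [
    ("placeholder_node_a", 0.999),
    ("native_group_norm_default_1", 0.997),
    ("placeholder_node_b", 0.995),
]
bad_run = [
    ("placeholder_node_a", 0.999),
    ("native_group_norm_default_1", 0.0),
    ("placeholder_node_b", 0.0),
]

if __name__ == "__main__":
    print(first_suspect_node(good_run))  # None
    print(first_suspect_node(bad_run))   # native_group_norm_default_1
```

Scanning in execution order matters because an error introduced at one node usually propagates to everything downstream, so the earliest failing node is the most useful lead.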

What's Coming Next?

Test plan

  • E2E example script test
    • python backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleUtilsScript.test_intermediate_debugger -s $DEVICE --model SM8650 --build_folder build-android/ --executorch_root . --image_dataset ../imagenet-mini/val/ --artifact ./e2e_test_debug
  • Simple model test
    • python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedUtils.test_qnn_backend_dump_intermediate_outputs_simple_model --model SM8550 --device $DEVICE --build_folder build-android
    • python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedUtils.test_qnn_backend_dump_intermediate_outputs_topk --model SM8550 --device $DEVICE --build_folder build-android

@pytorch-bot

pytorch-bot bot commented Nov 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15735

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b154f22 with merge base a042763 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 11, 2025
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@cccclai
Contributor

cccclai commented Nov 15, 2025

It seems a dependency is missing:

ModuleNotFoundError: No module named 'pydot'

Do you want to introduce this dependency in general?
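
One way to make a missing optional dependency such as pydot fail with an actionable message is to check for it before use. The sketch below is an assumption about how that could be done, not code from the PR; require_module and the install hint are hypothetical names. It relies only on importlib.util.find_spec from the standard library.

```python
import importlib.util


def require_module(name: str, install_hint: str) -> None:
    """Raise a descriptive error if a top-level module cannot be found."""
    if importlib.util.find_spec(name) is None:
        raise ModuleNotFoundError(
            f"No module named '{name}'. Install it with: {install_hint}",
            name=name,
        )


if __name__ == "__main__":
    try:
        # pydot is only needed for the SVG graph output.
        require_module("pydot", "pip install pydot")
        print("pydot is available")
    except ModuleNotFoundError as err:
        print(err)
```

Checking up front keeps the error close to its cause, instead of surfacing it deep inside the export flow.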

@cccclai
Contributor

cccclai commented Nov 16, 2025

Thank you for enabling this feature and for the detailed documentation! Just a minor comment: can we move examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py to examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py, so the content of the folder is a bit clearer?

@billmguo
Contributor

@winskuo-quic are you able to try the story LLM and see what its cos similarity is?

Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces the QNN ExecuTorch Intermediate Output Debugger, a comprehensive debugging tool for comparing intermediate tensor outputs between QNN and CPU execution to identify accuracy issues in model inference.

Key Changes

  • New Debugging Framework: Added a complete debugger implementation with support for custom metrics, multiple output formats (SVG, CSV, raw files), and per-layer accuracy comparison
  • API and Utilities: Introduced QNNIntermediateDebugger class with configurable evaluators and output format options
  • Enhanced Testing: Updated test infrastructure to validate intermediate output dumping and comparison for various model architectures

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 21 comments.

| File | Description |
| --- | --- |
| backends/qualcomm/debugger/qnn_intermediate_debugger.py | Core debugger implementation for matching and processing QNN vs CPU intermediate tensors |
| backends/qualcomm/debugger/metrics_evaluator.py | Base class and concrete metric evaluators (cosine similarity, ATOL, MSE) for tensor comparison |
| backends/qualcomm/debugger/format_outputs.py | Output formatting utilities for SVG graph visualization, CSV export, and raw tensor dumps |
| backends/qualcomm/debugger/README.md | Comprehensive documentation with tutorial, design flow, and usage examples |
| examples/qualcomm/util_scripts/qnn_intermediate_debugger_demo.py | End-to-end demo script using InceptionV3 to showcase debugger capabilities |
| examples/qualcomm/utils.py | Added debugger integration to build_executorch_binary() with spelling fixes |
| backends/qualcomm/tests/utils.py | Enhanced test utilities to validate intermediate output dumping and comparison |
| backends/qualcomm/tests/test_qnn_delegate.py | Added new tests for intermediate debugger and refactored existing tests |
| backends/qualcomm/qnn_preprocess.py | Moved axis order metadata cleanup logic from partitioner to preprocessor |
| backends/qualcomm/partition/qnn_partitioner.py | Removed axis order cleanup (now handled in preprocessor) |
| backends/qualcomm/builders/node_visitor.py | Added QNN tensor name tracking to node metadata for debugger |
| backends/qualcomm/utils/constants.py | Added QCOM_TENSOR_NAME constant for intermediate output tracking |
| examples/qualcomm/executor_runner/qnn_executor_runner.cpp | Increased debug buffer size from 20MB to 100MB |
| examples/qualcomm/qnn_intermediate_output_inspector.py | Removed obsolete intermediate output inspector (replaced by new debugger) |
| examples/qualcomm/README.md | Added documentation for new util_scripts folder |
| examples/qualcomm/qaihub_scripts/llama/README.md | Fixed spelling: "futher" → "further" |
| backends/qualcomm/quantizer/README.md | Fixed spelling: "utilies" → "utilities", "converage" → "coverage" |
| backends/qualcomm/builders/README.md | Fixed spelling and typos in documentation |
| backends/qualcomm/README.md | Updated code review authors list |

Comment on lines 311 to 312
self.assertTrue(
len(node_tensor_map) == expected_compared_events,

Copilot AI Nov 17, 2025

assertTrue(a == b) cannot provide an informative message. Using assertEqual(a, b) instead will give more informative messages.

Suggested change
- self.assertTrue(
-     len(node_tensor_map) == expected_compared_events,
+ self.assertEqual(
+     len(node_tensor_map), expected_compared_events,
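
The difference in failure messages can be reproduced with the standard unittest module. In the sketch below, the dictionary and the expected count are stand-in values rather than data from the PR; the two test methods mirror the original and suggested assertions, and the helper collects the last line of each failure traceback.

```python
import unittest


def failure_messages() -> dict:
    """Run both assertion styles on the same failing comparison and
    return the final traceback line for each test."""

    class CompareAssertions(unittest.TestCase):
        # Stand-in values: 3 matched tensors, 5 expected.
        node_tensor_map = {"a": 1, "b": 2, "c": 3}
        expected_compared_events = 5

        def test_assert_true(self):
            self.assertTrue(
                len(self.node_tensor_map) == self.expected_compared_events
            )

        def test_assert_equal(self):
            self.assertEqual(
                len(self.node_tensor_map), self.expected_compared_events
            )

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CompareAssertions)
    result = unittest.TestResult()
    suite.run(result)
    return {
        test.id().split(".")[-1]: tb.strip().splitlines()[-1]
        for test, tb in result.failures
    }


if __name__ == "__main__":
    for name, message in sorted(failure_messages().items()):
        print(f"{name}: {message}")
    # test_assert_equal: AssertionError: 3 != 5
    # test_assert_true: AssertionError: False is not true
```

With assertEqual, the report shows both values, which is usually enough to see how far off the matched tensor count is without rerunning the test.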

@Gasoonjia
Contributor

Hi @winskuo-quic Thanks so much for this contribution — we really appreciate the Qualcomm team’s work here.

One question I had while reviewing the PR: several parts of the implementation seem to re-create functionality that already exists in ExecuTorch’s devtools (intermediate_output_capturer, numeric comparators, Inspector.calculate_numeric_gap, debug-handle-based operator matching, etc.), rather than extending the shared workflows.

This might simply be a gap in my understanding, so I’d love to learn more about your experience here:

  • Were there limitations or missing features in the current Inspector-based workflow that made it difficult to apply to QNN?
  • Or were there usability concerns that motivated introducing a separate set of APIs?

If there are gaps, we’d be very happy to collaborate and strengthen the shared debugging tools so all backends can benefit from a unified workflow.

Looking forward to your thoughts!

@winskuo-quic
Collaborator Author

It seems a dependency is missing:

ModuleNotFoundError: No module named 'pydot'

Do you want to introduce this dependency in general?

Hi @cccclai,
Thanks for the suggestion. As there are a couple of install_requirements.sh files in the codebase, do you have any suggestions on which install_requirements.sh we should put this under?

@winskuo-quic
Collaborator Author

@winskuo-quic are you able to try the story LLM and see what its cos similarity is?

Hi @billmguo,
As mentioned in the PR summary and in the Limitations section of backends/qualcomm/debugger/README.md, LLM models are currently unsupported. This is on our TODO list and we will enable it in the future.
Thanks.

@winskuo-quic
Collaborator Author

Hi @winskuo-quic Thanks so much for this contribution — we really appreciate the Qualcomm team’s work here.

One question I had while reviewing the PR: several parts of the implementation seem to re-create functionality that already exists in ExecuTorch’s devtools (intermediate_output_capturer, numeric comparators, Inspector.calculate_numeric_gap, debug-handle-based operator matching, etc.), rather than extending the shared workflows.

This might simply be a gap in my understanding, so I’d love to learn more about your experience here:

  • Were there limitations or missing features in the current Inspector-based workflow that made it difficult to apply to QNN?
  • Or were there usability concerns that motivated introducing a separate set of APIs?

If there are gaps, we’d be very happy to collaborate and strengthen the shared debugging tools so all backends can benefit from a unified workflow.

Looking forward to your thoughts!

Hi @Gasoonjia,

Thanks for reviewing the PR.
The reason is written under the PR summary.
I will also share more details in the email thread.
Thanks

@winskuo-quic
Collaborator Author

Thank you for enabling this feature and for the detailed documentation! Just a minor comment: can we move examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py to examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py, so the content of the folder is a bit clearer?

Just to confirm, do you mean moving from executorch/examples/qualcomm/util_scripts/qnn_intermediate_debugger_demo.py to examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py? Thanks.

@winskuo-quic winskuo-quic force-pushed the dev1/winskuo/intermediate_tensor_debugger branch from 4bdfe6e to fe228e9 Compare November 24, 2025 06:27
@Gasoonjia
Contributor

hi @winskuo-quic:

Thanks again for your feedback, and your email reply. I understand this work may have been created quite a while ago in a private context.

I’m wondering whether there is any plan to migrate or align the current QNN debugger with the native ExecuTorch devtools — not just reusing intermediate_output_capturer or debug_handle, but more broadly integrating with the full debugging pipeline built around the Inspector.calculate_numeric_gap API for intermediate-output numerical discrepancy detection. Unifying these efforts could help us reduce divergence, avoid reinventing functionality, and centralize future debugging capabilities.

If you’re open to it, I’d be happy to discuss potential directions for collaboration on operator-level numerical discrepancy detection and how we can streamline the work going forward.

Thanks

@winskuo-quic
Collaborator Author

Hi @Gasoonjia,

I think we are also aiming to reduce code redundancy as much as possible, which should align with your plan. I believe Inspector.calculate_numeric_gap will be helpful when debugging; however, we might still want to keep some of our features, such as the ability to draw the .svg graphs. Also, if I understand correctly, Inspector.calculate_numeric_gap currently only supports MSE, L1, and SNR. It would be awesome if we could have some public APIs for users to define their own metrics for the numeric gap.
We could discuss in more detail in the future how to unify and migrate some features, if you think they can be combined.
Thanks
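
For reference, the three metrics named above and the idea of user-registered metrics can be sketched in a few lines. This is not the Inspector.calculate_numeric_gap API: the registry, register_metric, and numeric_gap are hypothetical names, and the formulas use one common convention (mean-based L1, SNR in decibels against the reference signal) that may differ from the devtools implementation.

```python
import math
from typing import Callable, Dict, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def mse(reference: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((r - a) ** 2 for r, a in zip(reference, actual)) / len(reference)


def l1(reference: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error (one common reading of "L1")."""
    return sum(abs(r - a) for r, a in zip(reference, actual)) / len(reference)


def snr_db(reference: Sequence[float], actual: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB, treating (reference - actual) as noise."""
    noise = mse(reference, actual)
    if noise == 0.0:
        return math.inf
    signal = sum(r * r for r in reference) / len(reference)
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


# Hypothetical registry: built-in metrics plus anything the user adds.
METRICS: Dict[str, Metric] = {"mse": mse, "l1": l1, "snr_db": snr_db}


def register_metric(name: str, fn: Metric) -> None:
    if name in METRICS:
        raise ValueError(f"metric '{name}' is already registered")
    METRICS[name] = fn


def numeric_gap(reference: Sequence[float], actual: Sequence[float]) -> Dict[str, float]:
    """Evaluate every registered metric on one pair of outputs."""
    return {name: fn(reference, actual) for name, fn in METRICS.items()}


if __name__ == "__main__":
    register_metric("max_abs", lambda r, a: max(abs(x - y) for x, y in zip(r, a)))
    print(numeric_gap([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```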

@Gasoonjia
Contributor

Hi @Gasoonjia,

I think we are also aiming to reduce code redundancy as much as possible, which should align with your plan. I believe Inspector.calculate_numeric_gap will be helpful when debugging; however, we might still want to keep some of our features, such as the ability to draw the .svg graphs. Also, if I understand correctly, Inspector.calculate_numeric_gap currently only supports MSE, L1, and SNR. It would be awesome if we could have some public APIs for users to define their own metrics for the numeric gap. We could discuss in more detail in the future how to unify and migrate some features, if you think they can be combined. Thanks

Thanks for the quick reply and for sharing your thoughts on our current API.

With #15969, we can now create custom metrics for numerical gap detection.

I'm more than happy to hear more of your thoughts on the API, and I look forward to future cooperation!

@winskuo-quic
Collaborator Author

Thanks for the quick reply and for sharing your thoughts on our current API.

With #15969, we can now create custom metrics for numerical gap detection.

I'm more than happy to hear more of your thoughts on the API, and I look forward to future cooperation!

Thanks for sharing the PR. I think this will be super helpful. Looking forward to future cooperation and to transitioning to this API once debug_handle is enabled!
In the meantime, since debug_handle is not yet enabled, does this PR look fine to you?
Thanks

@Gasoonjia
Contributor

Thanks for sharing the PR. I think this will be super helpful. Looking forward to future cooperation and to transitioning to this API once debug_handle is enabled! In the meantime, since debug_handle is not yet enabled, does this PR look fine to you? Thanks

This PR looks fine to me, and thanks for your contribution! I will leave the final stamp to @cccclai.
I'm looking forward to working with you to contribute to devtools directly in the future.

@winskuo-quic winskuo-quic force-pushed the dev1/winskuo/intermediate_tensor_debugger branch from e17f559 to 6328f7b Compare November 25, 2025 05:28
export LD_LIBRARY_PATH=$QNN_SDK_ROOT/lib/x86_64-linux-clang/
export PYTHONPATH=$EXECUTORCH_ROOT/..

pip install pydot
Contributor

Since we introduce the dependency here, let's also add it in the build.sh

Collaborator Author

Thanks for the suggestion.
I have removed pip install pydot from .ci/scripts/test_model.sh & .ci/scripts/test_qnn_static_llm.sh, and I have introduced the dependency under build.sh since both ci files will call build.sh.

from executorch.backends.qualcomm.debugger.qnn_intermediate_debugger import QNNIntermediateDebugger

qnn_intermediate_debugger = QNNIntermediateDebugger()
build_executorch_binary(
Contributor

can we use to_edge..._qnn() API as it's the newer API?

Collaborator Author

Thanks for the suggestion.
Actually, build_executorch_binary should still be the more official API for users. Inside build_executorch_binary, we then call to_edge_transform_and_lower_to_qnn. I think it is more straightforward to use build_executorch_binary, since to_edge_transform_and_lower_to_qnn requires users to handle quantization themselves, along with some post-processing after the model is lowered.
We might consider moving these API functions to a new file in the future, making it easier for users to see which released APIs are available.

@meta-codesync

meta-codesync bot commented Nov 26, 2025

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D87936803.

Contributor

@cccclai cccclai left a comment

thanks for clarifying. Can you rebase the PR? The CI stopped working last week

@winskuo-quic winskuo-quic force-pushed the dev1/winskuo/intermediate_tensor_debugger branch from dfe9aeb to 724ad4c Compare December 2, 2025 01:21
@cccclai
Contributor

cccclai commented Dec 2, 2025

It seems I merged some PRs that conflict with this PR... can you rebase again?

@winskuo-quic winskuo-quic force-pushed the dev1/winskuo/intermediate_tensor_debugger branch from 724ad4c to b154f22 Compare December 2, 2025 06:13
@winskuo-quic
Collaborator Author

It seems I merged some PRs that conflict with this PR... can you rebase again?

Done. Thanks

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

4 participants