
Conversation

@mohammed-saalim (Contributor) commented on Feb 4, 2026

Summary

This PR fixes a KeyError in the InsertIOQDQ pass that occurs when quantizing LLMs (such as SmolLM2) for the Qualcomm QNN backend.

Problem

In insert_io_qdq.py, the q_dq_map dictionary was missing entries for dequantize operations. When a node's quantization encoding was already a dequantize operation (e.g., dequantize_per_tensor.default), trying to look it up in the map during the _insert phase caused a KeyError.
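
For illustration, a minimal sketch of the failing lookup (simplified; it assumes the usual executorch exir_ops import and shows only the pre-PR quantize entries of the map):

from executorch.exir.dialects._ops import ops as exir_ops

# Pre-PR q_dq_map: only quantize ops appear as keys.
q_dq_map = {
    exir_ops.edge.quantized_decomposed.quantize_per_tensor.default: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor,
    exir_ops.edge.quantized_decomposed.quantize_per_tensor.tensor: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor,
    exir_ops.edge.quantized_decomposed.quantize_per_channel.default: exir_ops.edge.quantized_decomposed.dequantize_per_channel.default,
}

# A node whose encoding is already a dequantize op has no entry in the map,
# so this lookup (what the _insert phase effectively does) raised a KeyError.
encoding = exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default
dq_op = q_dq_map[encoding]  # KeyError before this PR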

Solution

Extended the q_dq_map to include dequantize-to-self (identity) mappings for:

  • quantized_decomposed.dequantize_per_tensor.default
  • quantized_decomposed.dequantize_per_tensor.tensor
  • quantized_decomposed.dequantize_per_channel.default

This allows the pass to correctly handle nodes that have already been processed into dequantized form (see the sketch just below).
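
A short sketch of the behaviour after the change (same exir_ops import as in the Problem sketch above; per-tensor shown, per-channel is analogous): a dequantize encoding now resolves to itself, so the _insert phase can build the IO dequantize node instead of raising.

# Identity entry added by this PR (written here as an assignment for brevity;
# in the source it is part of the q_dq_map dict literal).
q_dq_map[exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default] = (
    exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default
)

encoding = exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default
assert q_dq_map[encoding] is encoding  # no KeyError; the op maps to itself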

Testing

  • Verified that the modified file parses correctly via Python's ast module.
  • Confirmed that q_dq_map now contains the expected 6 keys (a sketch of such a check follows below).
  • Manual verification on Qualcomm hardware by the maintainers is requested to confirm the fix for the SmolLM2 workflow.
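
A sketch of how such a parse-and-count check can be scripted (hypothetical, not the exact commands used for this PR; the file path is an assumption, and it assumes q_dq_map is assigned as a plain dict literal):

import ast

PASS_FILE = "backends/qualcomm/_passes/insert_io_qdq.py"  # assumed path of the pass

with open(PASS_FILE) as f:
    tree = ast.parse(f.read())  # raises SyntaxError if the file does not parse

for node in ast.walk(tree):
    if (
        isinstance(node, ast.Assign)
        and any(isinstance(t, ast.Name) and t.id == "q_dq_map" for t in node.targets)
        and isinstance(node.value, ast.Dict)
    ):
        print(f"q_dq_map has {len(node.value.keys)} keys")  # expected: 6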
Fixes #16690: Qualcomm Quantization and Lowering for LLM fails

cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin

Extend q_dq_map to include dequantize ops mapping to themselves.
This fixes KeyError when nodes have dequantize encodings (e.g.,
dequantize_per_tensor.default) instead of quantize encodings.

Fixes pytorch#16690
Copilot AI review requested due to automatic review settings February 4, 2026 06:06
@pytorch-bot (bot) commented on Feb 4, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17194

Note: Links to docs will display an error until the docs builds have been completed.

⚠️ 8 Awaiting Approval, 2 Pending

As of commit b3964d7 with merge base aa2f683:

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label Feb 4, 2026
@mohammed-saalim (Contributor, Author) commented:

While changing the quantization recipe (like using an 8-bit KV cache) might change the graph structure, the InsertIOQDQ pass should still be robust enough to handle dequantize operations in the IR without throwing a KeyError. This PR ensures the pass is forward-compatible with models that already have these encodings.

@mohammed-saalim (Contributor, Author) commented:

@pytorchbot label "release notes: none"

@pytorch-bot bot added the release notes: none label Feb 4, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR fixes a KeyError in the InsertIOQDQ pass that occurred when quantizing LLMs (such as SmolLM2) for the Qualcomm QNN backend. The error was caused by missing entries in the q_dq_map dictionary for dequantize operations.

Changes:

  • Extended q_dq_map with identity mappings for dequantize operations to handle nodes that already have dequantize encodings
  • Added three new entries mapping dequantize operations to themselves (per-tensor default, per-tensor tensor, and per-channel default)


@nil-is-all added the partner: qualcomm and module: qnn labels Feb 4, 2026
@haowhsu-quic (Collaborator) commented:

I think the root cause is when falling back an op with quantized parameters, thanks for the contribution.

@mohammed-saalim (Contributor, Author) commented:

@haowhsu-quic Thank you for the review! I appreciate the guidance throughout this contribution.

@mohammed-saalim (Contributor, Author) commented:

@cccclai Is there anything else pending from my side? Thanks!

@cccclai (Contributor) commented on Feb 9, 2026

> @cccclai Is there anything else pending from my side? Thanks!

let me try to merge it, should be good with @haowhsu-quic's review

@mohammed-saalim (Contributor, Author) commented:

Thanks! The failures look like infrastructure issues unrelated to my changes. Happy to help if anything else is needed. @cccclai

Copilot AI review requested due to automatic review settings February 10, 2026 00:46
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.



Comment on lines 33 to +43

     q_dq_map = {
-        # per tensor
+        # per tensor (quantize -> dequantize)
         exir_ops.edge.quantized_decomposed.quantize_per_tensor.default: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor,
         exir_ops.edge.quantized_decomposed.quantize_per_tensor.tensor: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor,
-        # per channel
+        # per tensor (dequantize -> dequantize, for nodes with dequantize encoding)
+        exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default,
+        exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor,
+        # per channel (quantize -> dequantize)
         exir_ops.edge.quantized_decomposed.quantize_per_channel.default: exir_ops.edge.quantized_decomposed.dequantize_per_channel.default,
+        # per channel (dequantize -> dequantize, for nodes with dequantize encoding)
+        exir_ops.edge.quantized_decomposed.dequantize_per_channel.default: exir_ops.edge.quantized_decomposed.dequantize_per_channel.default,
Copilot AI commented on Feb 10, 2026:

Adding dequantize ops as keys in q_dq_map changes _create_node() behavior: it checks if target in self.q_dq_map to decide when to pop QCOM_QUANT_ATTRS and cast meta['val'] to the quantized dtype. After this change, inserted dequantize nodes (e.g. dequantize_per_tensor.tensor / dequantize_per_channel.default) will now satisfy that condition, causing their meta['val'] dtype to be incorrectly cast to the quantized dtype and moving QCOM_QUANT_ATTRS off the original node. The special-case should apply only to quantize ops; consider switching the check to if target in q_ops (or an explicit quantize-op set) so output dequant nodes keep float meta['val'] and don’t steal the original node’s quant metadata.
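
If the maintainers adopt this suggestion, a minimal sketch of the narrower check could look like the following (hypothetical helper name; the real _create_node body is not reproduced here):

from executorch.exir.dialects._ops import ops as exir_ops

# Explicit quantize-op set, so the QCOM_QUANT_ATTRS special case no longer
# matches the dequantize keys that this PR adds to q_dq_map.
q_ops = {
    exir_ops.edge.quantized_decomposed.quantize_per_tensor.default,
    exir_ops.edge.quantized_decomposed.quantize_per_tensor.tensor,
    exir_ops.edge.quantized_decomposed.quantize_per_channel.default,
}

def _is_quantize_target(target) -> bool:
    # Hypothetical helper: _create_node() would consult this instead of
    # `target in self.q_dq_map` when deciding whether to pop QCOM_QUANT_ATTRS
    # and cast meta['val'] to the quantized dtype.
    return target in q_ops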

Comment on lines +37 to +43

+        # per tensor (dequantize -> dequantize, for nodes with dequantize encoding)
+        exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default,
+        exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor,
+        # per channel (quantize -> dequantize)
         exir_ops.edge.quantized_decomposed.quantize_per_channel.default: exir_ops.edge.quantized_decomposed.dequantize_per_channel.default,
+        # per channel (dequantize -> dequantize, for nodes with dequantize encoding)
+        exir_ops.edge.quantized_decomposed.dequantize_per_channel.default: exir_ops.edge.quantized_decomposed.dequantize_per_channel.default,
Copilot AI commented on Feb 10, 2026:

This change fixes a previously crashing edge case (nodes whose QCOM_ENCODING is already a dequantize op), but there doesn’t appear to be any unit coverage for InsertIOQDQ in backends/qualcomm/tests/. Adding a small FX graph test that sets QCOM_QUANT_ATTRS[QCOM_ENCODING] to dequantize_per_tensor.default and asserts the pass inserts the expected output dequant node (and doesn’t alter output meta['val'] dtype) would help prevent regressions.

@mohammed-saalim (Contributor, Author) commented:

Should I work on Copilot's suggestions? @cccclai

@cccclai (Contributor) commented on Feb 10, 2026

I think the change looks good to me, can you rebase to main?

@mohammed-saalim (Contributor, Author) commented:

I just updated the branch, is this ok?

Copilot AI review requested due to automatic review settings February 10, 2026 17:47
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.



Copilot AI review requested due to automatic review settings February 11, 2026 05:14
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.



@cccclai cccclai merged commit 08282a4 into pytorch:main Feb 11, 2026
150 of 155 checks passed
metascroy added a commit that referenced this pull request Feb 11, 2026
Reverts #17194

Many QNN trunk jobs started failing after this PR was merged. Testing a revert to see if that fixes the issue.

Labels

CLA Signed · module: qnn · partner: qualcomm · release notes: none


Development

Successfully merging this pull request may close these issues.

Qualcomm Quantization and Lowering for LLM fails

4 participants