Fix KeyError in InsertIOQDQ pass for LLM quantization #17194
Conversation
Extend q_dq_map to include dequantize ops mapping to themselves. This fixes a KeyError when nodes have dequantize encodings (e.g., dequantize_per_tensor.default) instead of quantize encodings. Fixes pytorch#16690.
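For illustration, a minimal sketch of the lookup that used to fail and its behavior after this change. Reading `q_dq_map` off the `InsertIOQDQ` class and the `_passes` import path are assumptions made for the example, not taken from this PR.

```python
# Sketch only: the import path and the class-attribute access are assumed.
from executorch.backends.qualcomm._passes.insert_io_qdq import InsertIOQDQ
from executorch.exir.dialects._ops import ops as exir_ops

dq_op = exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default

# Before this PR, dq_op was not a key of q_dq_map, so the pass's _insert
# phase raised KeyError for nodes whose encoding is already a dequantize op.
# After this PR, dequantize ops map to themselves (identity).
assert InsertIOQDQ.q_dq_map[dq_op] == dq_op
```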
While changing the quantization recipe (like using an 8-bit KV cache) might change the graph structure, the InsertIOQDQ.py …

@pytorchbot label "release notes: none"
Pull request overview
This PR fixes a KeyError in the InsertIOQDQ pass that occurred when quantizing LLMs (such as SmolLM2) for the Qualcomm QNN backend. The error was caused by missing entries in the q_dq_map dictionary for dequantize operations.
Changes:
- Extended `q_dq_map` with identity mappings for dequantize operations to handle nodes that already have dequantize encodings
- Added three new entries mapping dequantize operations to themselves (per-tensor default, per-tensor tensor, and per-channel default)
I think the root cause is when an op with quantized parameters falls back. Thanks for the contribution.
@haowhsu-quic Thank you for the review! I appreciate the guidance throughout this contribution.
@cccclai Is there anything else pending from my side? Thanks!
Let me try to merge it; should be good with @haowhsu-quic's review.
Thanks! The failures look like infrastructure issues unrelated to my changes. Happy to help if anything else is needed. @cccclai
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
```python
q_dq_map = {
    # per tensor (quantize -> dequantize)
    exir_ops.edge.quantized_decomposed.quantize_per_tensor.default: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor,
    exir_ops.edge.quantized_decomposed.quantize_per_tensor.tensor: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor,
    # per tensor (dequantize -> dequantize, for nodes with dequantize encoding)
    exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default,
    exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor,
    # per channel (quantize -> dequantize)
    exir_ops.edge.quantized_decomposed.quantize_per_channel.default: exir_ops.edge.quantized_decomposed.dequantize_per_channel.default,
    # per channel (dequantize -> dequantize, for nodes with dequantize encoding)
    exir_ops.edge.quantized_decomposed.dequantize_per_channel.default: exir_ops.edge.quantized_decomposed.dequantize_per_channel.default,
}
```
Copilot AI, Feb 10, 2026
Adding dequantize ops as keys in q_dq_map changes _create_node() behavior: it checks if target in self.q_dq_map to decide when to pop QCOM_QUANT_ATTRS and cast meta['val'] to the quantized dtype. After this change, inserted dequantize nodes (e.g. dequantize_per_tensor.tensor / dequantize_per_channel.default) will now satisfy that condition, causing their meta['val'] dtype to be incorrectly cast to the quantized dtype and moving QCOM_QUANT_ATTRS off the original node. The special-case should apply only to quantize ops; consider switching the check to if target in q_ops (or an explicit quantize-op set) so output dequant nodes keep float meta['val'] and don’t steal the original node’s quant metadata.
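A minimal sketch of the suggested direction. The op set and predicate below are illustrative; the structure of `_create_node` is assumed from this review comment rather than copied from the repository.

```python
# Hedged sketch: gate the special case on an explicit set of quantize ops
# instead of membership in q_dq_map, so inserted dequantize nodes never pop
# QCOM_QUANT_ATTRS or get meta["val"] cast to the quantized dtype.
from executorch.exir.dialects._ops import ops as exir_ops

q_ops = {
    exir_ops.edge.quantized_decomposed.quantize_per_tensor.default,
    exir_ops.edge.quantized_decomposed.quantize_per_tensor.tensor,
    exir_ops.edge.quantized_decomposed.quantize_per_channel.default,
}

def takes_quant_attrs(target) -> bool:
    # Replacement for the `target in self.q_dq_map` check described above:
    # true only for quantize ops, never for the new dequantize keys.
    return target in q_ops

# Inside _create_node (structure assumed):
#   if takes_quant_attrs(target):
#       quant_attrs = node.meta.pop(QCOM_QUANT_ATTRS)
#       # ...cast the inserted node's meta["val"] to the quantized dtype...
```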
Copilot AI, Feb 10, 2026
This change fixes a previously crashing edge case (nodes whose QCOM_ENCODING is already a dequantize op), but there doesn’t appear to be any unit coverage for InsertIOQDQ in backends/qualcomm/tests/. Adding a small FX graph test that sets QCOM_QUANT_ATTRS[QCOM_ENCODING] to dequantize_per_tensor.default and asserts the pass inserts the expected output dequant node (and doesn’t alter output meta['val'] dtype) would help prevent regressions.
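A rough shape for such a regression test. The import paths, the `InsertIOQDQ` constructor, and the fields required inside `QCOM_QUANT_ATTRS` are assumptions based on this discussion and would need to be checked against the repository before use.

```python
# Hedged sketch only: import paths, the pass constructor, and the exact
# QCOM_QUANT_ATTRS contents are assumptions, not verified against the repo.
import torch
from executorch.exir import to_edge
from executorch.exir.dialects._ops import ops as exir_ops


class TinyModule(torch.nn.Module):
    def forward(self, x):
        return x + 1


def test_insert_io_qdq_handles_dequantize_encoding():
    from executorch.backends.qualcomm._passes.insert_io_qdq import InsertIOQDQ
    from executorch.backends.qualcomm.utils.constants import (
        QCOM_ENCODING,
        QCOM_QUANT_ATTRS,
    )

    edge = to_edge(torch.export.export(TinyModule(), (torch.randn(2, 2),)))
    program = edge.exported_program()
    gm = program.graph_module

    # Tag the node feeding the graph output with a dequantize encoding,
    # mimicking the LLM graphs that previously triggered the KeyError.
    output_node = list(gm.graph.nodes)[-1]
    producer = output_node.args[0][0]
    producer.meta[QCOM_QUANT_ATTRS] = {
        QCOM_ENCODING: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default,
        # ...plus whatever scale / zero_point / dtype fields the pass expects...
    }

    InsertIOQDQ(program)(gm)  # must not raise KeyError after this PR

    # The pass should have inserted an output dequantize node, and the graph
    # output's meta["val"] should remain a float tensor.
    assert any(
        n.target == exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default
        for n in gm.graph.nodes
    )
```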
Should I work on Copilot's suggestions? @cccclai
I think the change looks good to me. Can you rebase to main?
I just updated the branch, is this OK?
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
This reverts commit 08282a4.
Reverts #17194. Many QNN trunk jobs started failing after this PR was merged. Testing a revert to see if that fixes the issue.
Summary
This PR fixes a `KeyError` in the InsertIOQDQ pass that occurs when quantizing LLMs (such as SmolLM2) for the Qualcomm QNN backend.

Problem

In insert_io_qdq.py, the `q_dq_map` dictionary was missing entries for dequantize operations. When a node's quantization encoding was already a dequantize operation (e.g., `dequantize_per_tensor.default`), trying to look it up in the map during the `_insert` phase caused a `KeyError`.

Solution

Extended the `q_dq_map` to include dequantize-to-self (identity) mappings for:
- `quantized_decomposed.dequantize_per_tensor.default`
- `quantized_decomposed.dequantize_per_tensor.tensor`
- `quantized_decomposed.dequantize_per_channel.default`

This allows the pass to correctly handle nodes that have already been processed into dequantized form.
Testing

Checked via the `ast` module that `q_dq_map` now contains the expected 6 keys; a stand-alone sketch of such a check is shown below.

Fixes #16690 ("Qualcomm Quantization and Lowering for LLM fails")
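A minimal stand-alone version of that check, using only the standard library. The file path is illustrative and depends on the checkout layout.

```python
# Counts the keys of the q_dq_map dict literal in insert_io_qdq.py.
import ast

SRC = "backends/qualcomm/_passes/insert_io_qdq.py"  # illustrative path

with open(SRC) as f:
    tree = ast.parse(f.read())

key_count = None
for node in ast.walk(tree):
    if isinstance(node, ast.Assign) and isinstance(node.value, ast.Dict):
        names = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if "q_dq_map" in names:
            key_count = len(node.value.keys)

print(key_count)  # expected: 6 after this PR (3 quantize + 3 dequantize keys)
```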
cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin