Fix KeyError in InsertIOQDQ pass for LLM quantization #17194
Conversation
Extend q_dq_map to include dequantize ops mapping to themselves. This fixes a KeyError when nodes have dequantize encodings (e.g., dequantize_per_tensor.default) instead of quantize encodings. Fixes pytorch#16690.
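For illustration, a minimal sketch of the lookup that used to fail and its behavior after this change. Reading `q_dq_map` off the `InsertIOQDQ` class and the `_passes` import path are assumptions made for the example, not taken from this PR.

```python
# Sketch only: the import path and the class-attribute access are assumed.
from executorch.backends.qualcomm._passes.insert_io_qdq import InsertIOQDQ
from executorch.exir.dialects._ops import ops as exir_ops

dq_op = exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default

# Before this PR, dq_op was not a key of q_dq_map, so the pass's _insert
# phase raised KeyError for nodes whose encoding is already a dequantize op.
# After this PR, dequantize ops map to themselves (identity).
assert InsertIOQDQ.q_dq_map[dq_op] == dq_op
```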
While changing the quantization recipe (like using an 8-bit KV cache) might change the graph structure, the InsertIOQDQ.py …

@pytorchbot label "release notes: none"
Pull request overview
This PR fixes a KeyError in the InsertIOQDQ pass that occurred when quantizing LLMs (such as SmolLM2) for the Qualcomm QNN backend. The error was caused by missing entries in the q_dq_map dictionary for dequantize operations.
Changes:
- Extended `q_dq_map` with identity mappings for dequantize operations to handle nodes that already have dequantize encodings
- Added three new entries mapping dequantize operations to themselves (per-tensor default, per-tensor tensor, and per-channel default)
I think the root cause is when an op with quantized parameters falls back. Thanks for the contribution.
@haowhsu-quic Thank you for the review! I appreciate the guidance throughout this contribution.
@cccclai Is there anything else pending from my side? Thanks!
Let me try to merge it; should be good with @haowhsu-quic's review.
Thanks! The failures look like infrastructure issues unrelated to my changes. Happy to help if anything else is needed. @cccclai
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
```python
q_dq_map = {
    # per tensor (quantize -> dequantize)
    exir_ops.edge.quantized_decomposed.quantize_per_tensor.default: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor,
    exir_ops.edge.quantized_decomposed.quantize_per_tensor.tensor: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor,
    # per tensor (dequantize -> dequantize, for nodes with dequantize encoding)
    exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default,
    exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor,
    # per channel (quantize -> dequantize)
    exir_ops.edge.quantized_decomposed.quantize_per_channel.default: exir_ops.edge.quantized_decomposed.dequantize_per_channel.default,
    # per channel (dequantize -> dequantize, for nodes with dequantize encoding)
    exir_ops.edge.quantized_decomposed.dequantize_per_channel.default: exir_ops.edge.quantized_decomposed.dequantize_per_channel.default,
}
```
Copilot AI, Feb 10, 2026
Adding dequantize ops as keys in q_dq_map changes _create_node() behavior: it checks if target in self.q_dq_map to decide when to pop QCOM_QUANT_ATTRS and cast meta['val'] to the quantized dtype. After this change, inserted dequantize nodes (e.g. dequantize_per_tensor.tensor / dequantize_per_channel.default) will now satisfy that condition, causing their meta['val'] dtype to be incorrectly cast to the quantized dtype and moving QCOM_QUANT_ATTRS off the original node. The special-case should apply only to quantize ops; consider switching the check to if target in q_ops (or an explicit quantize-op set) so output dequant nodes keep float meta['val'] and don’t steal the original node’s quant metadata.
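A minimal sketch of the suggested direction. The op set and predicate below are illustrative; the structure of `_create_node` is assumed from this review comment rather than copied from the repository.

```python
# Hedged sketch: gate the special case on an explicit set of quantize ops
# instead of membership in q_dq_map, so inserted dequantize nodes never pop
# QCOM_QUANT_ATTRS or get meta["val"] cast to the quantized dtype.
from executorch.exir.dialects._ops import ops as exir_ops

q_ops = {
    exir_ops.edge.quantized_decomposed.quantize_per_tensor.default,
    exir_ops.edge.quantized_decomposed.quantize_per_tensor.tensor,
    exir_ops.edge.quantized_decomposed.quantize_per_channel.default,
}

def takes_quant_attrs(target) -> bool:
    # Replacement for the `target in self.q_dq_map` check described above:
    # true only for quantize ops, never for the new dequantize keys.
    return target in q_ops

# Inside _create_node (structure assumed):
#   if takes_quant_attrs(target):
#       quant_attrs = node.meta.pop(QCOM_QUANT_ATTRS)
#       # ...cast the inserted node's meta["val"] to the quantized dtype...
```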
Copilot AI, Feb 10, 2026
This change fixes a previously crashing edge case (nodes whose QCOM_ENCODING is already a dequantize op), but there doesn’t appear to be any unit coverage for InsertIOQDQ in backends/qualcomm/tests/. Adding a small FX graph test that sets QCOM_QUANT_ATTRS[QCOM_ENCODING] to dequantize_per_tensor.default and asserts the pass inserts the expected output dequant node (and doesn’t alter output meta['val'] dtype) would help prevent regressions.
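A rough shape for such a regression test. The import paths, the `InsertIOQDQ` constructor, and the fields required inside `QCOM_QUANT_ATTRS` are assumptions based on this discussion and would need to be checked against the repository before use.

```python
# Hedged sketch only: import paths, the pass constructor, and the exact
# QCOM_QUANT_ATTRS contents are assumptions, not verified against the repo.
import torch
from executorch.exir import to_edge
from executorch.exir.dialects._ops import ops as exir_ops


class TinyModule(torch.nn.Module):
    def forward(self, x):
        return x + 1


def test_insert_io_qdq_handles_dequantize_encoding():
    from executorch.backends.qualcomm._passes.insert_io_qdq import InsertIOQDQ
    from executorch.backends.qualcomm.utils.constants import (
        QCOM_ENCODING,
        QCOM_QUANT_ATTRS,
    )

    edge = to_edge(torch.export.export(TinyModule(), (torch.randn(2, 2),)))
    program = edge.exported_program()
    gm = program.graph_module

    # Tag the node feeding the graph output with a dequantize encoding,
    # mimicking the LLM graphs that previously triggered the KeyError.
    output_node = list(gm.graph.nodes)[-1]
    producer = output_node.args[0][0]
    producer.meta[QCOM_QUANT_ATTRS] = {
        QCOM_ENCODING: exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default,
        # ...plus whatever scale / zero_point / dtype fields the pass expects...
    }

    InsertIOQDQ(program)(gm)  # must not raise KeyError after this PR

    # The pass should have inserted an output dequantize node, and the graph
    # output's meta["val"] should remain a float tensor.
    assert any(
        n.target == exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default
        for n in gm.graph.nodes
    )
```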
Should I work on Copilot's suggestions? @cccclai
I think the change looks good to me. Can you rebase to main?
I just updated the branch, is this OK?
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
This reverts commit 08282a4.
Reverts #17194. Many QNN trunk jobs started failing after this PR was merged. Testing a revert to see if that fixes the issue.
Summary
This PR fixes a `KeyError` in the InsertIOQDQ pass that occurs when quantizing LLMs (such as SmolLM2) for the Qualcomm QNN backend.

Problem

In insert_io_qdq.py, the `q_dq_map` dictionary was missing entries for dequantize operations. When a node's quantization encoding was already a dequantize operation (e.g., `dequantize_per_tensor.default`), trying to look it up in the map during the `_insert` phase caused a `KeyError`.

Solution

Extended the `q_dq_map` to include dequantize-to-self (identity) mappings for:
- `quantized_decomposed.dequantize_per_tensor.default`
- `quantized_decomposed.dequantize_per_tensor.tensor`
- `quantized_decomposed.dequantize_per_channel.default`

This allows the pass to correctly handle nodes that have already been processed into dequantized form.
Testing

Checked via the `ast` module that `q_dq_map` now contains the expected 6 keys; a stand-alone sketch of such a check is shown below.

Fixes #16690 ("Qualcomm Quantization and Lowering for LLM fails")
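A minimal stand-alone version of that check, using only the standard library. The file path is illustrative and depends on the checkout layout.

```python
# Counts the keys of the q_dq_map dict literal in insert_io_qdq.py.
import ast

SRC = "backends/qualcomm/_passes/insert_io_qdq.py"  # illustrative path

with open(SRC) as f:
    tree = ast.parse(f.read())

key_count = None
for node in ast.walk(tree):
    if isinstance(node, ast.Assign) and isinstance(node.value, ast.Dict):
        names = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if "q_dq_map" in names:
            key_count = len(node.value.keys)

print(key_count)  # expected: 6 after this PR (3 quantize + 3 dequantize keys)
```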
cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin