Fix InsertIOQDQ KeyError for dequantize encodings (#18622)
abhinaykukkadapu wants to merge 1 commit into pytorch:main from
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18622
Note: Links to docs will display an error until the doc builds have been completed.
✅ You can merge normally! (3 unrelated failures) As of commit 7efb4e5 with merge base 8b30cfe:
- FLAKY - The following jobs failed but were likely due to flakiness present on trunk.
- BROKEN TRUNK - The following job failed but was present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 5cc2b41 to b7f5914
@abhinaykukkadapu has imported this pull request. If you are a Meta employee, you can view this in D98977887.
Moving the discussion from the other PR.
Thanks for taking a look. Right, this task is not a 1-to-1 mapping for the exact issue, but there is a comment on the task that refers to the unavailable dq in the map: #17732 (comment). Also, #17194 got reverted in #17385.
Summary:
q_dq_map only contained quantize ops as keys, so when a node with a dequantize encoding (e.g. a pre-quantized LLM parameter) feeds the output node, the lookup crashes with a KeyError.
Add dequantize ops as keys in q_dq_map, mapping them to the correct dequantize target for output boundary insertion (dq.default -> dq.tensor, matching the existing quantize convention).
Since dequantize targets are now keys, _create_node transfers QCOM_QUANT_ATTRS to inserted dequant nodes. To prevent the live iterator from revisiting these nodes, iterate over a snapshot via list(graph_module.graph.nodes).
Fixes pytorch#17732
Differential Revision: D98977887
Pulled By: abhinaykukkadapu
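The mapping change described in the summary can be sketched as follows. The op names here are simplified strings standing in for the real `exir_ops.edge.quantized_decomposed.*` handles (an assumption for illustration; the actual pass keys `q_dq_map` on op objects):

```python
# Hedged sketch of the q_dq_map extension described in the summary.
# String op names are hypothetical stand-ins for the real
# exir_ops.edge.quantized_decomposed.* handles.

# Before: only quantize ops were keys, mapped to their dequantize twin.
q_dq_map = {
    "quantize_per_tensor.default": "dequantize_per_tensor.tensor",
    "quantize_per_channel.default": "dequantize_per_channel.default",
}

# After: dequantize ops are also keys (dq.default -> dq.tensor, matching
# the existing quantize convention), so a node carrying a dequantize
# encoding that feeds the output node no longer raises KeyError.
q_dq_map.update({
    "dequantize_per_tensor.default": "dequantize_per_tensor.tensor",
    "dequantize_per_channel.default": "dequantize_per_channel.default",
})

def output_boundary_target(node_target: str) -> str:
    # This lookup used to crash with KeyError for dequantize encodings.
    return q_dq_map[node_target]
```

With the extra keys, both a quantize-encoded activation and a dequantize-encoded parameter feeding the output resolve to the same dq.tensor insertion target.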
Force-pushed from b7f5914 to ecae50d
@abhinaykukkadapu has exported this pull request. If you are a Meta employee, you can view the originating Diff in D98977887.
Force-pushed from ecae50d to 14d0dbb
```python
# insert dq before output or fold mix_quantization q if applicable
users = list(n.users.keys())
if n.meta.get(QCOM_QUANT_ATTRS) and any(
```
Feels like we could just add a check like `n.target != exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor` for any dequantize nodes that have already been added.
Then we wouldn't need the list approach (which would probably have a smaller memory footprint). But I still wonder if this is really an issue? Looks like the root cause of #17782 is that they tweaked the codebase.
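For context on what the snapshot guards against, here is a toy illustration (a plain Python list standing in for `graph_module.graph.nodes`; the real pass mutates an fx graph, not a list):

```python
# Toy illustration of why the pass iterates over a snapshot:
# inserting nodes while walking a live container can make the
# iterator visit the freshly inserted nodes too.
nodes = ["a", "b", "c"]
visited = []
for n in list(nodes):          # snapshot via list(...)
    visited.append(n)
    nodes.append(n + "_dq")    # simulate inserting a dq node

# Only the original nodes were visited; the inserted "_dq" nodes
# were appended but never revisited by the loop.
assert visited == ["a", "b", "c"]
```

The target-check alternative suggested above would instead skip the inserted nodes when the live iterator reaches them.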
The issue this diff is handling is not just the stale snapshot of the nodes, which is solved by the list. Here is what we want:
- For input, we want to pop the quant attrs and insert a q node.
- For output, we insert a dq node but we don't want to pop the attrs.
For example, a conv target node has q attrs, while a weight node that feeds the output will have dq attrs; we want to insert a dq node for both without popping. I will put up a patch where we can maybe reuse q_ops.
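The input/output asymmetry described above can be sketched like this (the function names and the dict-based node meta are hypothetical simplifications; the real pass works on fx node metadata):

```python
# Hedged sketch of the asymmetry described above. QCOM_QUANT_ATTRS and
# the dict-based meta are simplified stand-ins for the real fx node meta;
# handle_input/handle_output are hypothetical helper names.

QCOM_QUANT_ATTRS = "quant_attrs"

def handle_input(node_meta: dict) -> dict:
    # Inputs: pop the quant attrs and use them for the inserted q node,
    # so the source node no longer carries an encoding afterwards.
    return node_meta.pop(QCOM_QUANT_ATTRS)

def handle_output(node_meta: dict) -> dict:
    # Outputs: read the attrs for the inserted dq node, but leave them
    # in place, so e.g. a conv target (q attrs) or a weight feeding the
    # output (dq attrs) keeps its encoding.
    return node_meta[QCOM_QUANT_ATTRS]
```

Popping on the output path would strip the encoding from nodes such as pre-quantized weights that still need it downstream.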
@haowhsu-quic I'm referring to the comments on this task, which are irrelevant to the task they were made on, but this is the root cause: #17732 (comment)
Force-pushed from 14d0dbb to 7efb4e5
```diff
 )
 meta_val = node.meta["val"]
-if target in self.q_dq_map:
+if target in q_ops:
```
This is to avoid popping the quant attrs on the target node when we are inserting a dq node.