
[ExecuTorch][Llama][ez] Remove unwrap_tensor_subclass from 4w quantization path#18975

Merged
SS-JIA merged 1 commit into main from gh/SS-JIA/521/orig on Apr 17, 2026

Conversation

@pytorchbot (Collaborator)

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #18957 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/521/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/521/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/521/orig
Differential Revision: D101252037
@diff-train-skip-merge

[ExecuTorch][Llama][ez] Remove unwrap_tensor_subclass from 4w quantization path

D100066455 changed the Llama export pipeline to quantize weights in the checkpoint dtype (typically bfloat16) before casting to the computation dtype (fp32). This introduced a regression for Vulkan 4w export: `dequantize_affine` ops produced bfloat16 outputs, which Vulkan doesn't support, causing the graph to be split into multiple partitions. When `sym_constrain_range_for_size` constraint nodes were partitioned into a different delegate than the `_local_scalar_dense` + `slice_copy` ops they constrain, ExportPass re-tracing (in ConvertToLinearPass, SpecPropPass, etc.) would crash with `GuardOnDataDependentSymNode: Could not guard on data-dependent expression u539 < 0`.
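Why a single unsupported output dtype fragments the graph can be seen with a toy greedy partitioner (illustrative only; the real ExecuTorch partitioners operate on FX graphs, not flat op lists, and the names below are hypothetical): consecutive backend-supported ops fuse into one delegate partition, and one bfloat16-producing op splits what would otherwise be a single partition.

```python
# Toy partitioner sketch (not the actual ExecuTorch partitioning code):
# greedily group consecutive backend-supported ops into delegate partitions.
def partition(ops, supports):
    partitions, current = [], []
    for op in ops:
        if supports(op):
            current.append(op)
        else:
            # Unsupported op stays outside the delegate, closing the
            # current partition.
            if current:
                partitions.append(current)
                current = []
    if current:
        partitions.append(current)
    return partitions

# Vulkan-style support check under the assumption that bfloat16 outputs
# are unsupported.
def vulkan_supports(op):
    return op["dtype"] == "float32"

fp32_graph = [{"name": n, "dtype": "float32"}
              for n in ("linear", "dequantize_affine", "add")]
bf16_graph = [{"name": "linear", "dtype": "float32"},
              {"name": "dequantize_affine", "dtype": "bfloat16"},  # regression
              {"name": "add", "dtype": "float32"}]

print(len(partition(fp32_graph, vulkan_supports)))  # -> 1 partition
print(len(partition(bf16_graph, vulkan_supports)))  # -> 2 partitions
```

With every op emitting fp32 the whole graph lands in one partition; the bfloat16 `dequantize_affine` splits it in two, which is the fragmentation that separated the constraint nodes from the ops they constrain.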

The root cause is `unwrap_tensor_subclass()`. This function decomposes `IntxUnpackedToInt8Tensor` into plain tensors via `torch.nn.utils.parametrize`, capturing the subclass's metadata — including its `dtype` attribute (which controls `dequantize_affine`'s output dtype) — as a frozen snapshot in `UnwrapTensorSubclass.rebuild_stack`. A subsequent `model.to(dtype=fp32)` casts the plain tensors but cannot update the frozen metadata, so `dequantize_affine` continues to output bfloat16.
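The freezing behavior can be illustrated with a minimal pure-Python analogy (these classes are stand-ins, not the actual torchao/ExecuTorch code): a wrapper that snapshots the subclass's dtype at construction time keeps reporting the stale value after a later cast, mirroring how `UnwrapTensorSubclass.rebuild_stack` freezes the `IntxUnpackedToInt8Tensor` metadata.

```python
# Pure-Python analogy of the metadata-freezing bug (not the real code).

class QuantizedWeight:
    """Stand-in for IntxUnpackedToInt8Tensor: its dtype attribute controls
    the output dtype of dequantize_affine."""
    def __init__(self, dtype):
        self.dtype = dtype

class FrozenUnwrap:
    """Stand-in for UnwrapTensorSubclass: captures metadata as a snapshot
    at unwrap time, like rebuild_stack."""
    def __init__(self, weight):
        self.frozen_dtype = weight.dtype  # frozen snapshot

    def dequant_output_dtype(self):
        return self.frozen_dtype  # ignores any later cast

weight = QuantizedWeight("bfloat16")
unwrapped = FrozenUnwrap(weight)   # analogous to unwrap_tensor_subclass()

weight.dtype = "float32"           # analogous to model.to(dtype=fp32)

print(unwrapped.dequant_output_dtype())  # -> bfloat16 (frozen snapshot wins)
print(weight.dtype)                      # -> float32 (live attribute updated)
```

The cast reaches the live object but not the snapshot, which is exactly why `dequantize_affine` kept emitting bfloat16 after `model.to(dtype=fp32)`.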

`unwrap_tensor_subclass()` was originally needed as a workaround because `torch.export` did not support tensor subclasses natively. This is no longer the case — the 8da4w path already works without it (using the same `IntxUnpackedToInt8Tensor` subclass), and `torch.export` traces through the subclass correctly. Removing it makes 4w consistent with 8da4w and avoids the metadata-freezing issue entirely.

This change was authored with Claude.

Differential Revision: [D101252037](https://our.internmc.facebook.com/intern/diff/D101252037/)

ghstack-source-id: 368594388
Pull Request resolved: #18957
@pytorchbot pytorchbot requested a review from lucylq as a code owner April 17, 2026 15:25

pytorch-bot Bot commented Apr 17, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18975

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 17, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@SS-JIA SS-JIA merged commit 9576316 into main Apr 17, 2026
159 of 163 checks passed
@SS-JIA SS-JIA deleted the gh/SS-JIA/521/orig branch April 17, 2026 15:34