[ExecuTorch][Llama][ez] Remove unwrap_tensor_subclass from 4w quantization path#18957
Conversation
Merged 4ca6270 into gh/SS-JIA/521/base
Stack from ghstack (oldest at bottom):
D100066455 changed the Llama export pipeline to quantize weights in the checkpoint dtype (typically bfloat16) before casting to the computation dtype (fp32). This introduced a regression for Vulkan 4w export: `dequantize_affine` ops produced bfloat16 outputs, which Vulkan doesn't support, causing the graph to be split into multiple partitions. When `sym_constrain_range_for_size` constraint nodes were partitioned into a different delegate than the `_local_scalar_dense` + `slice_copy` ops they constrain, ExportPass re-tracing (in ConvertToLinearPass, SpecPropPass, etc.) would crash with `GuardOnDataDependentSymNode: Could not guard on data-dependent expression u539 < 0`.

The root cause is `unwrap_tensor_subclass()`. This function decomposes `IntxUnpackedToInt8Tensor` into plain tensors via `torch.nn.utils.parametrize`, capturing the subclass's metadata — including its `dtype` attribute (which controls `dequantize_affine`'s output dtype) — as a frozen snapshot in `UnwrapTensorSubclass.rebuild_stack`. A subsequent `model.to(dtype=fp32)` casts the plain tensors but cannot update the frozen metadata, so `dequantize_affine` continues to output bfloat16.

`unwrap_tensor_subclass()` was originally needed as a workaround because `torch.export` did not support tensor subclasses natively. This is no longer the case — the 8da4w path already works without it (using the same `IntxUnpackedToInt8Tensor` subclass), and `torch.export` traces through the subclass correctly. Removing it makes 4w consistent with 8da4w and avoids the metadata-freezing issue entirely.

This change was authored with Claude.
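The metadata-freezing failure mode can be illustrated with a torch-free toy sketch. All names below (`QuantizedParam`, `unwrap`, `rebuild`) are hypothetical stand-ins, not the torchao/ExecuTorch API; the point is only that a rebuild closure captures `dtype` at unwrap time, so a later cast of the plain data never reaches it:

```python
# Toy illustration (no torch) of the frozen-snapshot problem described above.
# QuantizedParam stands in for a tensor subclass carrying a dtype attribute;
# unwrap() stands in for unwrap_tensor_subclass().

class QuantizedParam:
    """A quantized parameter whose dtype controls the dequantized output dtype."""
    def __init__(self, packed, dtype):
        self.packed = packed
        self.dtype = dtype

    def dequantize(self):
        # The dequantized output takes this object's current dtype.
        return (self.packed, self.dtype)

def unwrap(param):
    """Decompose into plain data plus a rebuild closure.

    The closure captures dtype as a frozen snapshot, analogous to the
    metadata stored in UnwrapTensorSubclass.rebuild_stack.
    """
    frozen_dtype = param.dtype  # snapshot taken here, at unwrap time
    def rebuild(plain_value):
        # Rebuilds with the dtype captured above, not the current one.
        return QuantizedParam(plain_value, frozen_dtype)
    return param.packed, rebuild

p = QuantizedParam(packed=[1, 2, 3], dtype="bfloat16")
plain, rebuild = unwrap(p)

# A later "model.to(dtype=fp32)" casts the plain data, but the rebuild
# closure still holds the bfloat16 snapshot:
rebuilt = rebuild(plain)
assert rebuilt.dequantize()[1] == "bfloat16"  # still bfloat16, not fp32
```

Exporting through the subclass directly (as the 8da4w path does) avoids the snapshot entirely, because `dtype` is read live from the subclass rather than from a closure created at unwrap time.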
Differential Revision: D101252037