Decompose after export in export_llama #15951

Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15951
Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Unrelated Failure
As of commit afc924f with merge base 04f1e4d.

NEW FAILURES - The following jobs have failed:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 3d09dc1 to 10783cb
LGTM! Address any failing CI tests before landing.
Summary: `unwrap_tensor_subclass` was not unwrapping nested LoRA linears. This meant qdata/scale/zero were bundled together in the subclass and only separated when decompositions ran inside to_edge_transform_and_lower. That happens after nodes are tagged, so the scales were never tagged and remained in the PTE file after the rest of the weights were moved to the PTD file.

It's recommended to move away from `unwrap_tensor_subclass` and rely on export + decompositions. This PR runs decompositions after export in export_llama and removes cases of `unwrap_tensor_subclass`. TODO: remove all remaining cases of `unwrap_tensor_subclass` in ET.

Test Plan: Add a check that quantized weights end up in the PTD file (not the PTE file) after quantization. This is a simple check; nested linears seem to be the real issue that decomposing resolves. TODO: add a test for that (probably an e2e test with stories in a subsequent PR).

```
python -m unittest executorch.backends.xnnpack.test.passes.test_propagate_custom_meta_pass
```

Reviewed By: metascroy

Differential Revision: D87826410

Pulled By: lucylq
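For illustration, a minimal sketch of the export-then-decompose flow described above; the model, example inputs, and partitioner wiring are assumptions, not the exact export_llama code:

```python
# Minimal sketch of the export-then-decompose flow, assuming a hypothetical
# quantized LoRA model; this is not the exact export_llama wiring.
import torch
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

model = MyQuantizedLoRAModel().eval()                # hypothetical model
example_inputs = (torch.randint(0, 32000, (1, 8)),)

# Export first: the quantized tensor subclasses (qdata/scale/zero) are still
# bundled inside the subclass at this point.
ep = torch.export.export(model, example_inputs)

# Decompose immediately after export, BEFORE lowering, so qdata/scale/zero
# are already separate nodes by the time to_edge_transform_and_lower tags them.
ep = ep.run_decompositions()

et_program = to_edge_transform_and_lower(
    ep, partitioner=[XnnpackPartitioner()]
).to_executorch()
```

And a hedged sketch of the test-plan idea (quantized weights belong in the PTD file, not the PTE), continuing from the sketch above and using file sizes as a coarse proxy for the actual unittest; the filenames and the `write_tensor_data_to_file` call are assumptions:

```python
import os

# Serialize the program (PTE) and the external tensor data (PTD).
with open("model.pte", "wb") as f:
    f.write(et_program.buffer)
et_program.write_tensor_data_to_file(".")  # assumption: emits model.ptd here

# If the scales were tagged correctly, the weights dominate the PTD file and
# the PTE stays small; a size comparison is only a rough stand-in for the
# real per-tensor check.
pte_size = os.path.getsize("model.pte")
ptd_size = os.path.getsize("model.ptd")    # hypothetical PTD filename
assert ptd_size > pte_size, "expected quantized weights in the PTD file"
```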
Force-pushed from 10783cb to 22a9e78
Force-pushed from 22a9e78 to d9e7c6c
Force-pushed from d9e7c6c to afc924f
### Summary

Use Qwen 0.6B with unsloth (instead of Llama 1B with torchtune) for the LoRA test.

1. Smaller model / quicker test.
2. Eventually removes the dependency on torchtune.
3. Qwen is not gated on HF.

TODO: add quantized test after #15951

```
Expected result prefix:
<|im_start|>user
Calculate 15% of 80?<|im_end|><|im_start|>assistant
To calculate 15% of 80, we can multiply 80 by 0.15.

80 * 0.15 = 12

So, 15% of 80 is 12.

#### 12

The answer is: 12<|im_end|>

Actual result:
<|im_start|>user
Calculate 15% of 80?<|im_end|><|im_start|>assistant
To calculate 15% of 80, we can multiply 80 by 0.15.

80 * 0.15 = 12

So, 15% of 80 is 12.

#### 12

The answer is: 12<|im_end|>
PyTorchObserver {"prompt_tokens":15,"generated_tokens":65,"model_load_start_ms":1765320124550,"model_load_end_ms":1765320127516,"inference_start_ms":1765320152867,"inference_end_ms":1765320178119,"prompt_eval_end_ms":1765320153334,"first_token_ms":1765320153334,"aggregate_sampling_time_ms":19,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
Success
```
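A minimal sketch of what the CI check in the log above amounts to; `run_llama_runner` and the variable names are hypothetical stand-ins for the actual CI script:

```python
# The runner's stdout must start with the expected transcript prefix.
# run_llama_runner is a hypothetical helper returning the runner's stdout.
expected_prefix = (
    "<|im_start|>user\n"
    "Calculate 15% of 80?<|im_end|><|im_start|>assistant\n"
    "To calculate 15% of 80, we can multiply 80 by 0.15."
)

actual = run_llama_runner()
assert actual.startswith(expected_prefix), "unexpected model output"
print("Success")
```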