
Add pruning-aware training in torchao.prototype.pat #3429

Merged
lisjin merged 3 commits into main from lvj/pat on Feb 23, 2026
Conversation

@lisjin (Contributor) commented Dec 3, 2025

Adding our pruning-aware training (PAT) library as a prototype. The original library is under fairinternal/qpat but we would like to surface it in torchao for broader adoption.

The interface is almost identical to torchao.prototype.parq, but we use (group) Lasso instead of piecewise-affine regularization. More details on code organization and usage can be found in the README.
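For context on the (group) Lasso regularizer mentioned above: its proximal map is block soft-thresholding, which shrinks each parameter group's L2 norm and zeroes out groups that fall below the threshold, and that is what drives structured sparsity. Below is a minimal numpy sketch of the standard operator, not torchao's actual implementation; `prox_group_lasso` is an illustrative name:

```python
import numpy as np

def prox_group_lasso(W, lam):
    """Block soft-thresholding (standard group-Lasso proximal map).
    Each row of W is treated as one group: its L2 norm is shrunk by lam,
    and rows whose norm falls below lam are zeroed (pruned)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return W * scale

W = np.array([[3.0, 4.0],   # row norm 5.0 -> shrunk toward zero
              [0.3, 0.4]])  # row norm 0.5 -> pruned entirely
out = prox_group_lasso(W, 1.0)
# -> [[2.4, 3.2], [0.0, 0.0]]
```

Applying this map after each gradient step is the usual proximal-gradient recipe behind group-Lasso pruning.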

@lisjin lisjin requested a review from andrewor14 December 3, 2025 21:54
@pytorch-bot (bot) commented Dec 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3429

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1c753dc with merge base d988122:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Dec 3, 2025
@lisjin added the topic: new feature label on Dec 3, 2025
@lisjin force-pushed the lvj/pat branch 2 times, most recently from ffa338e to 4f78b65, on December 8, 2025 at 14:08
@lisjin (Contributor, Author) commented Dec 8, 2025

@andrewor14 Let me know if anything needs to be cleared up in this diff. I'm hoping to update D88501706 so that it imports from torchao.prototype.pat instead of copying code.

@meta-codesync (bot) commented Dec 8, 2025

@lisjin has imported this pull request. If you are a Meta employee, you can view this in D88638093.

```
    a base optimizer (e.g., SGD or AdamW)
    - update the latent variables for QAT
    Other parameters:
        warmup_steps: int >= 0
```
Contributor:

This is the central API right, can we add an example usage in this docstring?

Contributor (Author):

Good call—I updated the README example to include keyword args like warmup_steps and reg_lambda
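As an illustration of how parameters like `warmup_steps` and `reg_lambda` typically interact in a proximal pruning optimizer, here is a standalone numpy sketch under assumed semantics; this is not the `PruneOptimizer` API, and `train_step` / `prox_l1` are made-up names:

```python
import numpy as np

def prox_l1(w, lam):
    # Soft-thresholding, the L1 (Lasso) proximal map.
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def train_step(w, grad, lr, step, warmup_steps, reg_lambda):
    """One proximal-SGD step: plain SGD during warmup, then SGD followed
    by the Lasso prox. (Hypothetical: the real optimizer wraps a
    torch.optim base optimizer such as SGD or AdamW instead.)"""
    w = w - lr * grad
    if step >= warmup_steps:
        w = prox_l1(w, lr * reg_lambda)
    return w

w = np.array([0.05, 1.0])
# During warmup (step 0 < warmup_steps=1): no shrinkage is applied.
w1 = train_step(w, np.zeros(2), lr=0.1, step=0, warmup_steps=1, reg_lambda=1.0)
# After warmup: the small weight is pruned, the large one is shrunk.
w2 = train_step(w1, np.zeros(2), lr=0.1, step=1, warmup_steps=1, reg_lambda=1.0)
# w1 -> [0.05, 1.0], w2 -> [0.0, 0.9]
```

The warmup phase lets the model settle before regularization starts removing weights, which is the usual motivation for a `warmup_steps`-style knob.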

```python
    return out


class MaskedLayerNorm(nn.LayerNorm):
```
Contributor:

Seems like this is not used anywhere other than in tests. Can we delete this? Am I missing something?

Contributor (Author):

I'm hoping to keep this class since it's important for converting pruned models to their compressed inference-ready forms. This functionality can be added to PAT in the future.
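As a rough illustration of the idea being discussed (normalizing only over the channels that survive pruning), here is a guess at the semantics in numpy; the real `MaskedLayerNorm` subclasses `nn.LayerNorm` and may behave differently:

```python
import numpy as np

def masked_layer_norm(x, mask, eps=1e-5):
    """Layer norm restricted to unpruned channels (assumed semantics).
    Statistics are computed only over channels where mask is True, and
    pruned channels are forced to zero in the output."""
    kept = x[..., mask]
    mu = kept.mean(axis=-1, keepdims=True)
    var = kept.var(axis=-1, keepdims=True)
    out = np.zeros_like(x)
    out[..., mask] = (kept - mu) / np.sqrt(var + eps)
    return out

x = np.array([[1.0, 2.0, 3.0, 100.0]])
mask = np.array([True, True, True, False])  # last channel pruned
y = masked_layer_norm(x, mask)
```

The point of such a layer is that a pruned (zeroed) channel should not distort the mean and variance of the surviving ones, which matters when converting a pruned model to a compressed inference form.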

```python
from .pruneopt import PruneOptimizer


class NMSGDOptimizer(PruneOptimizer):
```
Contributor:

General question: I notice a lot of APIs in this PR that are not used or referenced anywhere. Are these all user-facing APIs? If so can we document them somewhere (e.g. main README) and explain how they're related to the main PruneOptimizer API? If they're not user-facing APIs and they're not used, do we still need them?

Some examples:

  • NMSGDOptimizer
  • ProxNuclearNorm
  • all the groupers like QKSVDGrouper

Contributor (Author):

  • The NMSGDOptimizer was written by a summer intern last year and has shown promising results. Since it's an experimental feature, we don't have unit tests for it yet.
  • ProxNuclearNorm is important for applying low-rank pruning to embeddings. Here's an example config.
  • The other groupers are more experimental. It would be great to keep them around so that we can stay in sync with the original repo, but I can also remove them if you'd like.
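For readers unfamiliar with it, the proximal map of the nuclear norm is singular value thresholding, which is why it induces low rank, e.g. for compressing embeddings as mentioned above. A sketch of the standard construction in numpy, not necessarily how `ProxNuclearNorm` is implemented:

```python
import numpy as np

def prox_nuclear_norm(W, lam):
    """Singular value thresholding (standard nuclear-norm proximal map).
    Each singular value is shrunk by lam; values below lam vanish,
    so W is driven toward low rank."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = np.maximum(s - lam, 0.0)
    return U @ np.diag(s) @ Vt

# A matrix with one dominant and one weak singular direction.
W = np.diag([5.0, 0.5])
out = prox_nuclear_norm(W, 1.0)
# -> diag([4.0, 0.0]): the weak direction is removed, rank drops to 1.
```

After training, the low-rank factors `U @ diag(s)` and `Vt` can be stored instead of the full matrix, which is the compression payoff.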

Contributor:

I see, we can keep them if we document them somewhere. If they're experimental, we can mark them as such in the README. In general, public APIs should have associated documentation somewhere; otherwise users won't be able to find them.

@andrewor14 (Contributor):

Hi @lisjin, looks good overall. My main comment is my confusion about how the APIs are used; the code snippet in the main README only references one or two of them, so it's unclear to me how the rest are related. It would be great if you could clarify this in the documentation.

Separately do you have any initial results? If so, would be great to include these in the README too.

@lisjin force-pushed the lvj/pat branch 3 times, most recently from 73a572c to 71c2270, on February 12, 2026 at 15:58
@lisjin (Contributor, Author) commented Feb 12, 2026

@andrewor14 Thanks for taking the time to review this back in Dec! I found out in January that the team I was collaborating with no longer needed to use PAT in torchao. However, now @Ninja91 and his team are planning to experiment with PAT. Could you please check that my fixes addressed all your comments? I've also added some initial results on unstructured pruning to the README.

```python
{
    "params": weights,
    "group_type": "pat.group.Dim0Grouper",
    "prox_type": "pat.prox.ProxGroupLasso",
}
```
Contributor:

Should these take in actual classes instead of strings of classes? Seems like it'll be more robust

Contributor (Author):

Ah this usage is actually outdated. I updated it a while back to accept strings like "Dim0Grouper" and "ProxGroupLasso" so that there's no dependency on import structure. The README is fixed to reflect this.
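One common way to accept bare strings like "Dim0Grouper" or "ProxGroupLasso" without tying configs to an import structure is a name-to-class registry. A hypothetical sketch of the pattern; torchao's actual lookup mechanism may differ, and the stub classes below stand in for the real ones:

```python
# Stub classes standing in for the real grouper / prox implementations.
class Dim0Grouper: ...
class ProxGroupLasso: ...

# Registry mapping short class names to classes, so configs can carry
# plain strings instead of dotted import paths.
_REGISTRY = {cls.__name__: cls for cls in (Dim0Grouper, ProxGroupLasso)}

def resolve(name: str):
    """Look up a class by its short name, failing loudly on typos."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown class name: {name!r}") from None

grouper_cls = resolve("Dim0Grouper")
```

Compared with passing class objects directly, strings keep serialized configs portable, at the cost of deferring typo detection from import time to lookup time.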


@lisjin (Contributor, Author) commented Feb 13, 2026

@andrewor14 Thanks for the suggestions again. Here's what I've updated in the latest commit:

  • Removed experimental classes like NMSGDOptimizer, QKGrouper, QKSVDGrouper
  • Documented all remaining grouper and proximal mapping classes in a new table of the README
  • Added underscores to non user-facing methods in distributed_utils.py

Let me know if anything's missing—this is very much a research prototype :)

@andrewor14 (Contributor) left a comment:

Looks good, thanks!

@lisjin lisjin enabled auto-merge (squash) February 23, 2026 14:44
@lisjin lisjin merged commit 2a37912 into main Feb 23, 2026
21 of 22 checks passed
@lisjin lisjin deleted the lvj/pat branch February 23, 2026 15:14

Labels: CLA Signed, topic: new feature
