Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3429
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 1c753dc with merge base d988122. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from ffa338e to 4f78b65 (Compare)
@andrewor14 Let me know if anything needs to be cleared up in this diff. I'm hoping to update D88501706 so that it imports from torchao.prototype.pat instead of copying code.
    a base optimizer (e.g., SGD or AdamW)
    - update the latent variables for QAT
    Other parameters:
        warmup_steps: int >= 0
This is the central API, right? Can we add an example usage in this docstring?
Good call. I updated the README example to include keyword args like warmup_steps and reg_lambda.
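For readers following along, here is a minimal framework-free sketch of how a warmup schedule might gate the regularization strength. The helper name and the linear-ramp behavior are assumptions for illustration, not the actual PAT implementation:

```python
def effective_reg_lambda(step: int, warmup_steps: int, reg_lambda: float) -> float:
    """Hypothetical helper: ramp the regularization strength linearly
    from 0 up to reg_lambda over the first `warmup_steps` optimizer steps."""
    if warmup_steps <= 0:
        return reg_lambda
    return reg_lambda * min(1.0, step / warmup_steps)

# Halfway through warmup, the penalty is applied at half strength.
halfway = effective_reg_lambda(step=50, warmup_steps=100, reg_lambda=1e-4)
```

The point of a warmup like this is to let the base optimizer settle before the sparsity-inducing penalty starts pulling weights toward zero.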
        return out


class MaskedLayerNorm(nn.LayerNorm):
Seems like this is not used anywhere other than in tests. Can we delete this? Am I missing something?
I'm hoping to keep this class since it's important for converting pruned models to their compressed inference-ready forms. This functionality can be added to PAT in the future.
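To make the intent concrete, here is a framework-free sketch of the idea behind a masked layer norm: normalize over the feature dimension as usual, then zero out features marked as pruned by a binary mask. The function name and exact semantics are assumptions; the real class subclasses nn.LayerNorm:

```python
import math

def masked_layer_norm(x, mask, eps=1e-5):
    """Sketch: standard layer norm over a 1-D feature vector, followed by
    zeroing the features marked as pruned (mask entries equal to 0)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [((v - mean) * inv_std) * m for v, m in zip(x, mask)]

# Feature at index 2 is pruned, so its output is exactly zero.
out = masked_layer_norm([1.0, 2.0, 3.0, 4.0], mask=[1, 1, 0, 1])
```

Folding the mask into the norm layer is what lets a pruned model be exported in a compressed, inference-ready form without separate mask bookkeeping.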
from .pruneopt import PruneOptimizer


class NMSGDOptimizer(PruneOptimizer):
General question: I notice a lot of APIs in this PR that are not used or referenced anywhere. Are these all user-facing APIs? If so can we document them somewhere (e.g. main README) and explain how they're related to the main PruneOptimizer API? If they're not user-facing APIs and they're not used, do we still need them?
Some examples:
- NMSGDOptimizer
- ProxNuclearNorm
- all the groupers like QKSVDGrouper
- The NMSGDOptimizer was written by a summer intern last year and has shown promising results. Since it's an experimental feature, we don't have unit tests for it yet.
- ProxNuclearNorm is important for applying low-rank pruning to embeddings. Here's an example config.
- The other groupers are more experimental. It would be great to keep them around so that we can stay in sync with the original repo, but I can also remove them if you'd like.
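For context on what these prox classes compute, here is a framework-free sketch of the group-lasso proximal operator (block soft-thresholding), which an API like ProxGroupLasso presumably implements for each parameter group. The function name and signature are illustrative only:

```python
import math

def prox_group_lasso(group, lam):
    """Block soft-thresholding: shrink the whole group toward zero by lam
    in Euclidean norm, zeroing it entirely if its norm is at most lam."""
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= lam:
        return [0.0] * len(group)
    scale = 1.0 - lam / norm
    return [scale * v for v in group]

# A group with norm 5.0 shrunk by lam=1.0 keeps its direction; norm becomes 4.0.
shrunk = prox_group_lasso([3.0, 4.0], lam=1.0)
```

Because the whole group is either shrunk together or zeroed together, this is what produces structured (e.g., whole-channel) sparsity rather than scattered individual zeros.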
I see, we can keep them if we document them somewhere. If they're experimental we can mark them as such in the README. In general public APIs should have associated documentation somewhere, otherwise users won't be able to find them
Hi @lisjin, looks good overall. My main comment is my confusion about how the APIs are used: the code snippet in the main README only references 1 or 2 of these, so it's unclear to me how the rest are related. It would be great if you could clarify this in documentation. Separately, do you have any initial results? If so, it would be great to include these in the README too.
Force-pushed from 73a572c to 71c2270 (Compare)
@andrewor14 Thanks for taking the time to review this back in Dec! I found out in January that the team I was collaborating with no longer needed to use PAT in torchao. However, now @Ninja91 and his team are planning to experiment with PAT. Could you please check that my fixes addressed all your comments? I've also added some initial results on unstructured pruning to the README.
torchao/prototype/pat/README.md (Outdated)
    {
        "params": weights,
        "group_type": "pat.group.Dim0Grouper",
        "prox_type": "pat.prox.ProxGroupLasso",
Should these take in actual classes instead of strings of classes? Seems like it'll be more robust
Ah this usage is actually outdated. I updated it a while back to accept strings like "Dim0Grouper" and "ProxGroupLasso" so that there's no dependency on import structure. The README is fixed to reflect this.
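Under the string-based scheme described above, a param-group config would look something like the following. The bare class-name strings follow the comment above; the placeholder value for "params" and any other details are assumptions for illustration:

```python
# Hypothetical param-group config using class-name strings rather than
# imported classes, so the config has no dependency on import structure.
param_group = {
    "params": "weights",          # placeholder; in practice, the parameter list
    "group_type": "Dim0Grouper",
    "prox_type": "ProxGroupLasso",
}
```

Resolving strings to classes at construction time also makes configs serializable (e.g., to JSON) without pickling class objects.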
@andrewor14 Thanks for the suggestions again. Here's what I've updated in the latest commit:

Let me know if anything's missing; this is very much a research prototype :)
Adding our pruning-aware training (PAT) library as a prototype. The original library is under fairinternal/qpat but we would like to surface it in torchao for broader adoption.
The interface is almost identical to torchao.prototype.parq, but we use (group) Lasso instead of piecewise-affine regularization. More details on code organization and usage can be found in the README.