Cudagraphs support for compiled optimizers #107504
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107504
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 283cb16 with merge base ad17e5e. This comment was automatically generated by Dr. CI and updates every 15 minutes.
cool! A few questions:
- Should grads be marked as static? I don't know that they will be; if they're not, it will just lead to endless recompilation.
- The cleanup stuff is cool; I don't know if it could be made more generic. Also, it wouldn't be sufficient to clean up cudagraph memory usage: we'll need a way to invalidate a particular compilation within cudagraphs. I can submit that PR separately; it's not a blocker for this.
- You might want someone other than me to help with the dynamo review.
I will let @eellison handle the review on this one
@pytorchbot merge

This PR needs to be approved by an authorized maintainer before merge.

@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This removes extra copies introduced by torch.compile when compiling the optimizer. These copies would tank perf and make cudagraphs unusable.
Marks all params/optimizer state as static addresses and registers a finalizer that cleans up the graph attributes when the optimizer goes out of scope.
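The finalizer pattern can be sketched in plain Python (all names here are hypothetical stand-ins, not the actual PR internals):

```python
import weakref


class CompiledOptimizer:
    """Stand-in for an optimizer that caches compiled-graph state."""

    def __init__(self, cache):
        self.cache = cache
        # When this optimizer is garbage-collected, clear the cached graph
        # attributes so the associated memory can be reclaimed.
        weakref.finalize(self, cache.clear)


cache = {"compiled_step": object()}
opt = CompiledOptimizer(cache)
del opt  # CPython refcounting runs the finalizer immediately
print(len(cache))  # 0: the finalizer cleared the cached graph state
```

`weakref.finalize` is used rather than `__del__` so cleanup runs reliably even during interpreter shutdown and does not resurrect the object.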
**Note:** this does not mark grads as static, because doing so would increase memory usage significantly.
There are two cases:
1. Grads are `None` when the optimizer is compiled.
2. Grads are set to `None` in `zero_grad()`.

There is a PR (#107853) in flight to throw an error if `zero_grad` attempts to set static grads to `None`.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @anijain2305
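The `zero_grad()` interaction can be illustrated with a small sketch (`set_to_none` is the real `torch.optim.Optimizer.zero_grad` parameter; the model is hypothetical):

```python
import torch

model = torch.nn.Linear(4, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
model(torch.randn(1, 4)).sum().backward()

# set_to_none=False zeroes the existing grad tensors in place, keeping their
# storage addresses stable -- compatible with grads marked as static.
opt.zero_grad(set_to_none=False)
assert model.weight.grad is not None

# set_to_none=True (the default) frees the grad tensors; the next backward
# allocates new storage, which would invalidate static grad addresses.
opt.zero_grad(set_to_none=True)
assert model.weight.grad is None
```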