
Cudagraphs support for compiled optimizers #107504

Closed · wants to merge 13 commits

Conversation

@mlazos (Contributor) commented Aug 19, 2023

This removes the extra copies introduced by torch.compile when compiling the optimizer. These copies would tank performance and make cudagraphs unusable.

Marks all params/optimizer state as static addresses and registers a finalizer that cleans up the graph attributes when the optimizer goes out of scope.
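
For context, a minimal sketch of the compiled-optimizer pattern this targets (the model, sizes, and training loop below are illustrative assumptions, not from this PR); `mode="reduce-overhead"` is what enables cudagraphs in inductor:

```python
import torch

# Illustrative setup: any CUDA model/optimizer would do.
model = torch.nn.Linear(1024, 1024, device="cuda")
opt = torch.optim.Adam(model.parameters())

@torch.compile(mode="reduce-overhead")  # "reduce-overhead" enables cudagraphs
def opt_step():
    opt.step()

for _ in range(3):
    loss = model(torch.randn(64, 1024, device="cuda")).sum()
    loss.backward()
    opt_step()  # with this PR, captured params/optimizer state get static addresses
    opt.zero_grad()
```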

Note: this does not mark grads as static, because doing so would increase memory usage significantly.

There are two cases:

  1. The upstream graph is cudagraphed - this case works fine out of the box.
  2. The upstream graph is not cudagraphed - here, many copies are introduced from upstream memory (to copy the grads) into cudagraph-owned memory, unless the user explicitly marks the grads as static. Doing so also requires not deallocating the grads in zero_grad() (either the module or optimizer version), by setting them to zero instead of None; see the sketch after this list. A PR (Throw error if setting static grads to None in zero_grad() #107853) is in flight to throw an error if zero_grad attempts to set static grads to None.
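
A hedged sketch of case 2 (the model and loop are illustrative; `mark_static_address` is assumed to be exposed under `torch._dynamo`, and its exact location/signature may differ across versions):

```python
import torch
import torch._dynamo

model = torch.nn.Linear(1024, 1024, device="cuda")
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# Run one backward pass so the .grad tensors exist before pinning them.
model(torch.randn(8, 1024, device="cuda")).sum().backward()
for p in model.parameters():
    # Assumed API: pins the grad buffer to a fixed address so the
    # cudagraphed optimizer can read it without an extra copy.
    torch._dynamo.mark_static_address(p.grad)

@torch.compile(mode="reduce-overhead")
def opt_step():
    opt.step()

for _ in range(3):
    model(torch.randn(8, 1024, device="cuda")).sum().backward()
    opt_step()
    # Zero the grads in place rather than setting them to None, which
    # would free the static buffers the recorded graph depends on.
    opt.zero_grad(set_to_none=False)
```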

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @anijain2305

@mlazos requested a review from eellison August 19, 2023 03:25
@pytorch-bot commented Aug 19, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107504

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 283cb16 with merge base ad17e5e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@eellison (Contributor) left a comment

Cool! A few questions:

  • Should grads be marked as static? I don't know that they will be; if they're not, it will just lead to endless recompilation.
  • The cleanup stuff is cool; I don't know if it could be more generic. Also, it wouldn't be sufficient to clean up cudagraph memory usage: we'll need something that invalidates a particular compilation within cudagraphs. I can submit that PR separately; not a blocker for this.
    - You might want someone other than me to help with the dynamo review.

Two resolved review threads on torch/_dynamo/variables/optimizer.py
@mlazos requested a review from jansel August 22, 2023 04:01
@jansel removed their request for review August 23, 2023 16:37
@jansel (Contributor) commented Aug 23, 2023

I will let @eellison handle the review on this one

@mlazos (Contributor, Author) commented Aug 30, 2023

@pytorchbot merge

@pytorch-bot commented Aug 30, 2023

This PR needs to be approved by an authorized maintainer before merge.

@mlazos requested a review from eellison August 30, 2023 19:42
@mlazos (Contributor, Author) commented Aug 31, 2023

@pytorchbot merge

@pytorchmergebot (Collaborator) commented
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.
