[optim] Adam defaults to fused when CUDA + differentiable=False #90865
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90865
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 5df2847.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from c292b78 to e379ec5.
Force-pushed from e379ec5 to 5df2847.
Sounds good!
@@ -105,9 +105,12 @@ class Adam(Optimizer):
     capturable (bool, optional): whether this instance is safe to capture in a CUDA graph.
         Passing True can impair ungraphed performance, so if you don't intend to
         graph capture this instance, leave it False (default: False)
-    fused (bool, optional): whether fused implementation of optimizer is used.
+    fused (bool, optional): whether the fused implementation (CUDA only) is used.
nit: it is not really CUDA-only, is it? You could use it for CPU, and other devices like XLA/MPS/XPU might benefit from it as well.
Or do you prefer to mention this for now and update it when we get more implementations?
Thanks for the review! I thought it was CUDA-only since fused_adam only has a CUDA dispatch, but yeah, we can update this when there are more implementations.
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/native_functions.yaml#L14471-L14476
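For illustration only (not code from this PR), a minimal sketch of opting into the fused path explicitly; the toy model and sizes are assumptions, and the expectation that non-CUDA parameters get rejected follows from the point above that fused_adam currently only has a CUDA dispatch:

```python
import torch

# Hypothetical toy model; any module whose parameters are CUDA float tensors would do.
model = torch.nn.Linear(128, 64).cuda()

# Explicitly request the fused CUDA implementation of Adam.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, fused=True)

loss = model(torch.randn(32, 128, device="cuda")).sum()
loss.backward()
opt.step()  # a single fused kernel updates all parameters

# With CPU parameters the same request is expected to be rejected
# (at the time of this PR, only a CUDA kernel is registered for fused Adam).
try:
    torch.optim.Adam(torch.nn.Linear(8, 8).parameters(), fused=True)
except RuntimeError as err:
    print("fused with CPU params rejected:", err)
```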
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 additional job has failed, the first few of them are: build. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -f "build ios failure not related"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…#91896) Following up to #90865 and #92048. Pull Request resolved: #91896. Approved by: https://github.com/albanD
…g to fused (#92181) A "fix" following #90865. Realized that fused is not compatible with torch.jit.is_scripting() when looking at a later line. Took the opportunity to make the code cleaner/slightly more performant (with the extends) as well. Pull Request resolved: #92181. Approved by: https://github.com/albanD
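A rough sketch (illustrative names, not the actual PyTorch source) of the gating conditions described in this thread and the follow-ups: prefer the fused path only when every parameter is a CUDA floating-point tensor, differentiable=False, and the code is not being scripted with TorchScript:

```python
import torch

def _should_default_to_fused(params, differentiable: bool) -> bool:
    """Illustrative heuristic; the real decision lives inside torch.optim.Adam."""
    # Fused kernels are not compatible with torch.jit.is_scripting().
    if torch.jit.is_scripting():
        return False
    # The fused path does not support differentiable optimizer steps.
    if differentiable:
        return False
    # Only a CUDA kernel is registered, and it expects floating-point params.
    return all(p.is_cuda and torch.is_floating_point(p) for p in params)
```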
@janeyx99 what would be the expected behavior when
Step 1 in faster default optimizers.
Preliminary benchmarks show gaps in improvement on CUDA for BERT_pytorch and resnet18:
![image](https://user-images.githubusercontent.com/31798555/207707118-14221802-77ce-4ee0-96e3-04638c07924c.png)
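As a rough way to reproduce numbers in the same spirit as the screenshot above, a minimal timing sketch (assumed synthetic model and sizes, not the benchmark behind the plot, which used BERT_pytorch and resnet18) comparing the default and fused Adam steps on CUDA:

```python
import time
import torch

def time_adam(fused: bool, steps: int = 200) -> float:
    # Small synthetic model standing in for the real benchmark workloads.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 10)
    ).cuda()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, fused=fused)
    x = torch.randn(64, 1024, device="cuda")

    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(steps):
        opt.zero_grad(set_to_none=True)
        model(x).sum().backward()
        opt.step()
    torch.cuda.synchronize()
    return time.perf_counter() - start

print("default Adam:", time_adam(fused=False))
print("fused Adam:  ", time_adam(fused=True))
```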