[dtensor] enable foreach operators for adam optimizer #112108
Conversation
This PR enables basic foreach ops in DTensor for the Adam optimizer, to improve performance compared to running the optimizer on plain torch.Tensor. Currently the optimizer does not take the foreach path by default for tensor subclasses; we will either enable it by default in DTensor once all the ops are covered, or enable it earlier when exploring the new FSDP by appending DTensor to the optimizer's allow list. cc @awgu [ghstack-poisoned]
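For context, here is a minimal, illustrative sketch of what opting into the foreach path looks like from user code with DTensor parameters. This snippet is not part of the PR: the mesh setup, layer sizes, and the use of the private `torch.distributed._tensor` namespace are assumptions for illustration, and it is meant to be launched with `torchrun` across the devices in the mesh.

```python
import torch
from torch.distributed._tensor import (
    DeviceMesh,
    Replicate,
    distribute_module,
    distribute_tensor,
)

# Assumes the script is launched with torchrun so the process-group env is set up.
mesh = DeviceMesh("cuda", list(range(torch.cuda.device_count())))

# A small MLP; distribute_module with no partition_fn replicates the parameters
# as DTensors across the mesh.
model = torch.nn.Sequential(*[torch.nn.Linear(256, 256) for _ in range(5)]).cuda()
model = distribute_module(model, mesh)

# foreach=True explicitly requests the multi-tensor Adam path enabled here; with
# the default (foreach=None) the optimizer currently falls back to the
# single-tensor path for tensor subclasses such as DTensor.
optim = torch.optim.Adam(model.parameters(), lr=1e-3, foreach=True)

# Inputs also need to be DTensors, since DTensor ops do not mix with plain tensors.
x = distribute_tensor(torch.randn(32, 256), mesh, [Replicate()])
model(x).sum().backward()
optim.step()
```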
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112108
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 unrelated failures) As of commit e17ac89 with merge base 08dbfec: BROKEN TRUNK — the following jobs failed but were also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
LGTM!
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: Command
Details for Dev Infra team: raised by workflow job.
As titled. cc @XilunWu [ghstack-poisoned]
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@wanchaol What is the technical complexity to further enable fused Adam?
Stack from ghstack (oldest at bottom):
This PR enables basic foreach ops in DTensor for the Adam optimizer, to improve
performance compared to running the optimizer on plain torch.Tensor. Currently the
optimizer does not take the foreach path by default for tensor subclasses; we will
either enable it by default in DTensor once all the ops are covered, or enable it
earlier when exploring the new FSDP by appending DTensor to the optimizer's
allow list.
Some latency measurements, on a 5-layer MLP model:
single-tensor adam: 17ms
![Screenshot 2023-10-29 at 10 48 22 PM](https://github.com/pytorch/pytorch/assets/9443650/8937d786-b863-4318-88c2-12e43180ce8d)
foreach multi-tensor adam: 4ms
![Screenshot 2023-10-29 at 10 50 58 PM](https://github.com/pytorch/pytorch/assets/9443650/de105cc3-8e12-4765-938a-763d8e958194)
so around a 4.25x improvement
cc @awgu
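As a rough illustration of how optimizer-step latency like the numbers above could be compared, here is a hedged timing sketch toggling the foreach flag. The model sizes, batch size, and iteration count are invented, and this plain-tensor version does not reproduce the DTensor setup actually measured in this PR; it only shows the shape of such a measurement.

```python
import time
import torch

def adam_step_latency(foreach, iters=10):
    # 5-layer MLP, mirroring the description above; sizes are illustrative.
    model = torch.nn.Sequential(
        *[torch.nn.Linear(1024, 1024) for _ in range(5)]
    ).cuda()
    optim = torch.optim.Adam(model.parameters(), lr=1e-3, foreach=foreach)
    x = torch.randn(64, 1024, device="cuda")

    # Warm-up step so lazy optimizer state initialization is excluded from the timing.
    model(x).sum().backward()
    optim.step()

    total = 0.0
    for _ in range(iters):
        optim.zero_grad()
        model(x).sum().backward()
        torch.cuda.synchronize()
        start = time.perf_counter()
        optim.step()  # only the optimizer step is timed
        torch.cuda.synchronize()
        total += time.perf_counter() - start
    return total / iters

print("single-tensor adam step:", adam_step_latency(foreach=False))
print("foreach adam step:      ", adam_step_latency(foreach=True))
```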