[Dist] Add fallback reduce_scatter_base, allgather_base APIs to Gloo #112144
Per Ke's suggestion, this adds the fallback reduce_scatter_base and allgather_base APIs directly in ProcessGroupGloo to enable FSDP on CPUs.
Differential Revision: [D50636382](https://our.internmc.facebook.com/intern/diff/D50636382/)
[ghstack-poisoned]
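To make the intended semantics concrete, here is a minimal pure-Python sketch of what "fallback" versions of these two collectives could look like when built on top of plain allreduce and allgather. It simulates all ranks in one process with lists of numbers; the function names (`allreduce_sum`, `reduce_scatter_base_fallback`, `allgather_base_fallback`) are hypothetical, and the allreduce-then-slice strategy is an assumption for illustration, not necessarily what this PR implements in ProcessGroupGloo.

```python
from typing import List

Vector = List[float]


def allreduce_sum(inputs: List[Vector]) -> List[Vector]:
    """Simulated allreduce: every rank receives the elementwise sum."""
    total = [sum(vals) for vals in zip(*inputs)]
    return [list(total) for _ in inputs]


def allgather(inputs: List[Vector]) -> List[List[Vector]]:
    """Simulated allgather: every rank receives a list of all rank inputs."""
    return [[list(x) for x in inputs] for _ in inputs]


def reduce_scatter_base_fallback(inputs: List[Vector]) -> List[Vector]:
    """Hypothetical fallback: allreduce the flat input, keep this rank's chunk.

    inputs[r] is the flat input of rank r; its length must be divisible
    by the number of ranks. Returns the per-rank output chunks.
    """
    world_size = len(inputs)
    numel = len(inputs[0])
    if numel % world_size != 0:
        raise ValueError("input length must be divisible by world size")
    chunk = numel // world_size
    reduced = allreduce_sum(inputs)
    return [
        reduced[rank][rank * chunk:(rank + 1) * chunk]
        for rank in range(world_size)
    ]


def allgather_base_fallback(inputs: List[Vector]) -> List[Vector]:
    """Hypothetical fallback: allgather into a list, then concatenate
    the pieces in rank order into one flat output per rank."""
    gathered = allgather(inputs)
    return [
        [value for piece in per_rank for value in piece]
        for per_rank in gathered
    ]
```

With two simulated ranks, `reduce_scatter_base_fallback([[1, 2, 3, 4], [10, 20, 30, 40]])` leaves rank 0 with the first half of the summed vector and rank 1 with the second half, while `allgather_base_fallback([[1, 2], [3, 4]])` gives every rank the same concatenated vector. An allreduce-based fallback moves more data than a native reduce-scatter, which is the usual trade-off of this kind of shim.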
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112144
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 Unrelated Failures)
As of commit 87c691b with merge base dd24e92: FLAKY - The failing jobs were likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Approved via phabricator.
Should we have some unit tests for these?
lgtm
@@ -1919,6 +1919,38 @@ class AsyncAllgatherCUDAWork : public AsyncAllgatherWork {

} // namespace

c10::intrusive_ptr<Work> ProcessGroupGloo::_reduce_scatter_base(
Nice. It may be worth removing those fallbacks in the functional collective paths, which might also help testing: https://github.com/pytorch/pytorch/blob/main/torch/distributed/_functional_collectives_impl.py#L180
…Is to Gloo" Per Ke's suggestion, adding these APIs in ProcessGroupGloo directly to enable FSDP on CPUs Differential Revision: [D50636382](https://our.internmc.facebook.com/intern/diff/D50636382/) [ghstack-poisoned]
Differential Revision: [D50688958](https://our.internmc.facebook.com/intern/diff/D50688958/) Pull Request resolved: #112145 Approved by: https://github.com/fegin ghstack dependencies: #112144
…ytorch#112144) Per Ke's suggestion, adding these APIs in ProcessGroupGloo directly to enable FSDP on CPUs Differential Revision: [D50636382](https://our.internmc.facebook.com/intern/diff/D50636382/) Pull Request resolved: pytorch#112144 Approved by: https://github.com/wz337, https://github.com/fegin, https://github.com/wanchaol, https://github.com/XilunWu
Differential Revision: [D50688958](https://our.internmc.facebook.com/intern/diff/D50688958/) Pull Request resolved: pytorch#112145 Approved by: https://github.com/fegin ghstack dependencies: pytorch#112144
Stack from ghstack (oldest at bottom):
Per Ke's suggestion, this adds these APIs directly in ProcessGroupGloo to
enable FSDP on CPUs.
Differential Revision: D50636382