[Dist] Add fallback reduce_scatter_base, allgather_base APIs to Gloo #112144

Closed
wants to merge 3 commits into from

rohan-varma
Member

@rohan-varma rohan-varma commented Oct 26, 2023

Stack from ghstack (oldest at bottom):

Per Ke's suggestion, this adds fallback reduce_scatter_base and allgather_base
APIs directly to ProcessGroupGloo, which enables FSDP on CPUs.

Differential Revision: [D50636382](https://our.internmc.facebook.com/intern/diff/D50636382/)
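
For readers unfamiliar with the two collectives, here is a rough sketch of what a fallback could compute. It is not the C++ added in this PR, whose body is not shown in this thread. It assumes the fallback is built on allreduce and allgather, simulates ranks with plain Python lists instead of tensors, and uses hypothetical function names.

```python
from typing import List


def all_reduce_sum(inputs: List[List[int]]) -> List[int]:
    """Elementwise sum across ranks; every rank would see this same result."""
    return [sum(values) for values in zip(*inputs)]


def reduce_scatter_base_fallback(inputs: List[List[int]], rank: int) -> List[int]:
    """Hypothetical fallback: allreduce the flat input, then keep this rank's chunk.

    inputs[r] is the flat input buffer held by rank r (simulated in one process).
    """
    world_size = len(inputs)
    numel = len(inputs[0])
    if numel % world_size != 0:
        raise ValueError("input length must be divisible by world size")
    reduced = all_reduce_sum(inputs)
    chunk = numel // world_size
    return reduced[rank * chunk:(rank + 1) * chunk]


def all_gather_base_fallback(shards: List[List[int]]) -> List[int]:
    """Hypothetical fallback: gather every rank's shard into one flat buffer,
    ordered by rank."""
    flat: List[int] = []
    for shard in shards:
        flat.extend(shard)
    return flat


# Two simulated ranks, four elements each.
rank_inputs = [[1, 2, 3, 4], [10, 20, 30, 40]]
shards = [reduce_scatter_base_fallback(rank_inputs, r) for r in range(2)]
gathered = all_gather_base_fallback(shards)
```

In this simulation each rank ends up with one chunk of the summed buffer, and gathering those chunks rebuilds the full reduced buffer. FSDP relies on the same pair of operations to shard gradients and to reassemble parameters.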

pytorch-bot bot commented Oct 26, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112144

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 87c691b with merge base dd24e92:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Contributor

@wz337 wz337 left a comment

Approved via phabricator.

Contributor

@fegin fegin left a comment

Should we have some unit tests for these?

Contributor

@wanchaol wanchaol left a comment

lgtm

@@ -1919,6 +1919,38 @@ class AsyncAllgatherCUDAWork : public AsyncAllgatherWork {

} // namespace

c10::intrusive_ptr<Work> ProcessGroupGloo::_reduce_scatter_base(
Contributor

Nice, maybe worth removing those fallbacks in the functional collective paths and this might help testing https://github.com/pytorch/pytorch/blob/main/torch/distributed/_functional_collectives_impl.py#L180

@XilunWu XilunWu self-requested a review October 31, 2023 21:20
pytorchmergebot pushed a commit that referenced this pull request Nov 7, 2023
Differential Revision: [D50688958](https://our.internmc.facebook.com/intern/diff/D50688958/)

Pull Request resolved: #112145
Approved by: https://github.com/fegin
ghstack dependencies: #112144
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Nov 7, 2023
@facebook-github-bot facebook-github-bot deleted the gh/rohan-varma/744/head branch November 10, 2023 15:24
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
6 participants