[4/N] [Dispatchable Collectives] Update all_reduce_ with CPU / CUDA implementations #83810

H-Huang · 2022-08-20T22:07:29Z

Stack from ghstack:

[8/N] [Dispatchable Collectives] Update allgather with CPU / CUDA implementations #84423
[7/N] [Dispatchable Collectives] Update reduce with CPU / CUDA implementations #83916
[6/N] [Dispatchable Collectives] Update recv with CPU / CUDA implementations #83876
[5/N] [Dispatchable Collectives] Update send with CPU / CUDA implementations #83859
[4/N] [Dispatchable Collectives] Update all_reduce_ with CPU / CUDA implementations #83810
[3/N] [Dispatchable Collectives] Update broadcast_ with CPU and CUDA implementations #83735

About this PR

Update the all_reduce op to dispatch to CPU and CUDA implementations. Both currently perform the same logic, so this is essentially a no-op.
Update test to validate that a separate device implementation is not supported.
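
A minimal sketch of the idea in the two points above, assuming a toy dispatcher rather than the real PyTorch one. Every name here (`ToyDispatcher`, `DeviceKey`, `ToyTensor`, `allreduceShared`) is hypothetical and is not a PyTorch API; the actual registration in this PR goes through `TORCH_LIBRARY_IMPL`. The sketch registers the same function under both the CPU and CUDA keys, so dispatching changes nothing about behavior, and a tensor on a key with no registration is rejected.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical device keys; Meta stands in for "a device with no kernel".
enum class DeviceKey { CPU, CUDA, Meta };

// Hypothetical tensor: just a device tag plus some values.
struct ToyTensor {
  DeviceKey device;
  std::vector<double> data;
};

using AllreduceFn = std::function<void(ToyTensor&)>;

// Toy dispatcher: maps a device key to the kernel registered for it.
class ToyDispatcher {
 public:
  void registerImpl(DeviceKey key, AllreduceFn fn) {
    table_[key] = std::move(fn);
  }

  // Looks up the kernel for the tensor's device and runs it.
  void call(ToyTensor& t) const {
    auto it = table_.find(t.device);
    if (it == table_.end()) {
      throw std::runtime_error("no allreduce_ kernel registered for device");
    }
    it->second(t);
  }

 private:
  std::map<DeviceKey, AllreduceFn> table_;
};

// Shared logic used by both keys. Assumption for illustration only: two ranks
// hold identical values, so a SUM all-reduce doubles every element.
inline void allreduceShared(ToyTensor& t) {
  for (double& v : t.data) {
    v *= 2.0;
  }
}

// CPU and CUDA point at the same function, so dispatch is a behavioral no-op.
inline ToyDispatcher makeDispatcher() {
  ToyDispatcher d;
  d.registerImpl(DeviceKey::CPU, allreduceShared);
  d.registerImpl(DeviceKey::CUDA, allreduceShared);
  return d;
}
```

Calling `makeDispatcher().call(t)` with a CPU or CUDA `ToyTensor` runs the same shared function, while a `DeviceKey::Meta` tensor throws `std::runtime_error`.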

Context

#86225

Differential Revision: D39506979


facebook-github-bot · 2022-08-20T22:07:35Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/83810
✖️ Python docs build was skipped
✖️ C++ docs build was skipped
❓Need help or want to give feedback on the CI? Visit our office hours

✅ No Failures (22 Pending)

As of commit 29a2486 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI (expand for details).

kwen2501

LGTM!

kwen2501 · 2022-08-31T06:35:16Z

torch/csrc/distributed/c10d/OpsImpl.cpp

+// sparse all_reduce in the Gloo backend
+TORCH_LIBRARY_IMPL(c10d, SparseCPU, m) {
+  m.impl("allreduce_", allreduce_cpu_);
+}


nit: is there actually a SparseCPU or SparseCUDA implementation in Gloo?
If not, do we need to add it here?

allreduce() already contains logic to handle sparse tensors (https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/c10d/ProcessGroupGloo.cpp#L1465-L1467), where it branches to a separate code path. Having each implementation just call allreduce_ therefore keeps the behavior the same. There may be a cleaner way to do this, though, and we should discuss it.
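
To make the reply above concrete, here is a rough sketch under stated assumptions: a toy table keyed on (device, layout) in which every key, sparse or not, points at one function, and that function inspects the layout itself. The names (`ToyTensor`, `backendAllreduce`, `makeTable`, `dispatch`) are hypothetical and the returned strings are placeholders; this is not the Gloo implementation.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

enum class Device { CPU, CUDA };
enum class Layout { Strided, Sparse };

// Hypothetical tensor: only the properties the dispatch decision needs.
struct ToyTensor {
  Device device;
  Layout layout;
};

// Stand-in for a backend allreduce(): it checks the layout internally and
// picks a code path, so callers do not need a separate sparse kernel.
inline std::string backendAllreduce(const ToyTensor& t) {
  return t.layout == Layout::Sparse ? "sparse path" : "dense path";
}

using Key = std::pair<Device, Layout>;
using Kernel = std::string (*)(const ToyTensor&);

// Every key is registered to the same function. The sparse entries exist only
// so that the lookup succeeds for sparse tensors.
inline std::map<Key, Kernel> makeTable() {
  std::map<Key, Kernel> table;
  table[Key{Device::CPU, Layout::Strided}] = &backendAllreduce;
  table[Key{Device::CPU, Layout::Sparse}] = &backendAllreduce;
  table[Key{Device::CUDA, Layout::Strided}] = &backendAllreduce;
  table[Key{Device::CUDA, Layout::Sparse}] = &backendAllreduce;
  return table;
}

// Looks up the kernel for the tensor's (device, layout) pair and runs it.
inline std::string dispatch(const std::map<Key, Kernel>& table,
                            const ToyTensor& t) {
  auto it = table.find(Key{t.device, t.layout});
  if (it == table.end()) {
    throw std::runtime_error("no kernel registered for this key");
  }
  return it->second(t);
}
```

In this sketch, dropping the two sparse entries would make `dispatch` throw for sparse tensors even though `backendAllreduce` can handle them, which is one way to read why the sparse registrations were added.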


pytorch-bot · 2022-09-13T17:23:33Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/83810

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 84a5880:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


H-Huang · 2022-09-14T15:11:29Z

@H-Huang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


…PU / CUDA implementations"

### About this PR
* Update the all_reduce op to dispatch to cpu and cuda implementations. Right now they both perform the same logic so this is essentially a no-op.
* Update test to validate that a separate device implementation is not supported.

### About this stack
In the future we will repurpose ProcessGroup to instead contain a list of Backends (ProcessGroupNCCL/Gloo/UCC) and perform dispatching to them based on tensor type. The CPU and CUDA implementations will be updated to have process group select its CPU and CUDA backends respectively.

Differential Revision: [D39506979](https://our.internmc.facebook.com/intern/diff/D39506979)

[ghstack-poisoned]
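
The stack description above says ProcessGroup will eventually hold a list of backends and dispatch to them by tensor type. A rough sketch of that direction follows, assuming a toy process group that keeps one backend per device type. All class and method names (`ToyProcessGroup`, `ToyBackend`, `ToyGloo`, `ToyNccl`, `setBackend`) are hypothetical, and the mapping of CPU to a Gloo-like backend and CUDA to an NCCL-like backend is only an illustrative assumption.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

enum class Device { CPU, CUDA };

// Hypothetical tensor carrying only its device type.
struct ToyTensor {
  Device device;
};

// Hypothetical backend interface. allreduce() returns the backend's name so
// the caller can see which backend handled the tensor.
class ToyBackend {
 public:
  virtual ~ToyBackend() = default;
  virtual std::string allreduce(ToyTensor& t) = 0;
};

class ToyGloo : public ToyBackend {
 public:
  std::string allreduce(ToyTensor&) override { return "gloo"; }
};

class ToyNccl : public ToyBackend {
 public:
  std::string allreduce(ToyTensor&) override { return "nccl"; }
};

// Toy process group: owns one backend per device type and forwards each
// collective to the backend that matches the tensor's device.
class ToyProcessGroup {
 public:
  void setBackend(Device device, std::shared_ptr<ToyBackend> backend) {
    backends_[device] = std::move(backend);
  }

  std::string allreduce(ToyTensor& t) {
    auto it = backends_.find(t.device);
    if (it == backends_.end()) {
      throw std::runtime_error("no backend configured for this device type");
    }
    return it->second->allreduce(t);
  }

 private:
  std::map<Device, std::shared_ptr<ToyBackend>> backends_;
};
```

With a `ToyGloo` set for `Device::CPU` and a `ToyNccl` set for `Device::CUDA`, the same `allreduce` call reaches a different backend depending only on where the tensor lives.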

H-Huang · 2022-09-28T00:36:40Z

@H-Huang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

H-Huang · 2022-09-28T06:03:31Z

@pytorchbot merge

pytorchmergebot · 2022-09-28T06:05:47Z

@pytorchbot successfully started a merge job. Check the current status here and land check progress here.
The merge job was triggered with the land checks (-l) flag. If you did not specify this flag yourself, you are likely enrolled in the land checks rollout. This means that your change will be merged once all checks on your PR and the land checks have passed (ETA 4 Hours). If you need to coordinate lands between different changes and cannot risk a land race, please add the ciflow/trunk label to your PR and wait for signal to complete, and then land your changes in proper order. Having trunk, pull, and Lint pre-run on a PR will bypass land checks and the ETA should be immediate. If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

…mplementations (#83810)

### About this PR
* Update the all_reduce op to dispatch to cpu and cuda implementations. Right now they both perform the same logic so this is essentially a no-op.
* Update test to validate that a separate device implementation is not supported.

### About this stack
In the future we will repurpose ProcessGroup to instead contain a list of Backends (ProcessGroupNCCL/Gloo/UCC) and perform dispatching to them based on tensor type. The CPU and CUDA implementations will be updated to have process group select its CPU and CUDA backends respectively.

Differential Revision: [D39506979](https://our.internmc.facebook.com/intern/diff/D39506979)
Pull Request resolved: #83810
Approved by: https://github.com/kwen2501


[4/N] [Dispatchable Collectives] Update all_reduce_ with CPU / CUDA i…

f19b77a

…mplementations [ghstack-poisoned]

H-Huang requested review from mrshenli, pritamdamania87, zhaojuanmao, rohan-varma, awgu and mingzhe09088 as code owners August 20, 2022 22:07

facebook-github-bot added the cla signed label Aug 20, 2022

facebook-github-bot added the oncall: distributed label Aug 20, 2022

H-Huang added a commit that referenced this pull request Aug 20, 2022

[4/N] [Dispatchable Collectives] Update all_reduce_ with CPU / CUDA i…

d429d3c

…mplementations ghstack-source-id: be18410d00cd737835b5aeb68ed6ee23902833a1 Pull Request resolved: #83810

H-Huang requested a review from kwen2501 August 20, 2022 22:09

H-Huang added the module: c10d, release notes: distributed (c10d), and topic: new features labels Aug 20, 2022

Update on "[4/N] [Dispatchable Collectives] Update all_reduce_ with C…

feba8dc

…PU / CUDA implementations" [ghstack-poisoned]

H-Huang added a commit that referenced this pull request Aug 22, 2022

[4/N] [Dispatchable Collectives] Update all_reduce_ with CPU / CUDA i…

30084a6

…mplementations ghstack-source-id: a7471b2c8deaabc3113e2fa01493fd80b2cc774e Pull Request resolved: #83810

Update on "[4/N] [Dispatchable Collectives] Update all_reduce_ with C…

87e6990

…PU / CUDA implementations" [ghstack-poisoned]

H-Huang added a commit that referenced this pull request Aug 22, 2022

[4/N] [Dispatchable Collectives] Update all_reduce_ with CPU / CUDA i…

8346f2d

…mplementations ghstack-source-id: 5dd79ca430a5ff5c1530db15b8e2a1cab3dc2715 Pull Request resolved: #83810

H-Huang mentioned this pull request Aug 22, 2022

[5/N] [Dispatchable Collectives] Update send with CPU / CUDA implementations #83859

Closed

Update on "[4/N] [Dispatchable Collectives] Update all_reduce_ with C…

3eb482e

…PU / CUDA implementations" [ghstack-poisoned]

This was referenced Aug 22, 2022

[6/N] [Dispatchable Collectives] Update recv with CPU / CUDA implementations #83876

Closed

[7/N] [Dispatchable Collectives] Update reduce with CPU / CUDA implementations #83916

Closed

kwen2501 approved these changes Aug 31, 2022

H-Huang added 3 commits August 31, 2022 11:42

Update on "[4/N] [Dispatchable Collectives] Update all_reduce_ with C…

0d01c2f

…PU / CUDA implementations" [ghstack-poisoned]

Update on "[4/N] [Dispatchable Collectives] Update all_reduce_ with C…

2553b14

…PU / CUDA implementations" [ghstack-poisoned]

Update on "[4/N] [Dispatchable Collectives] Update all_reduce_ with C…

30a65c1

…PU / CUDA implementations" [ghstack-poisoned]

H-Huang added 2 commits August 31, 2022 15:27

Update on "[4/N] [Dispatchable Collectives] Update all_reduce_ with C…

d07baf1

…PU / CUDA implementations" [ghstack-poisoned]

Update on "[4/N] [Dispatchable Collectives] Update all_reduce_ with C…

4898b1e

…PU / CUDA implementations" [ghstack-poisoned]

H-Huang mentioned this pull request Sep 1, 2022

[8/N] [Dispatchable Collectives] Update allgather with CPU / CUDA implementations #84423

Closed

H-Huang added 2 commits September 7, 2022 12:03

Update on "[4/N] [Dispatchable Collectives] Update all_reduce_ with C…

29a2486

…PU / CUDA implementations" [ghstack-poisoned]

Update on "[4/N] [Dispatchable Collectives] Update all_reduce_ with C…

ba88b62

…PU / CUDA implementations" [ghstack-poisoned]

Update on "[4/N] [Dispatchable Collectives] Update all_reduce_ with C…

5a06b5b

…PU / CUDA implementations" [ghstack-poisoned]

H-Huang mentioned this pull request Sep 14, 2022

[PT_BREAK] #83735 CI job failure due to TestPjRtDistributedDataParallel pytorch/xla#4005

Closed

kumpera reviewed Sep 27, 2022

torch/csrc/distributed/c10d/OpsImpl.cpp (resolved)

pytorchmergebot added the Merged label Sep 28, 2022

pytorchmergebot closed this in 06e0583 Sep 28, 2022

facebook-github-bot deleted the gh/H-Huang/76/head branch October 1, 2022 14:19
