
[10/N] Update barrier with CPU/CUDA implementations #86368

Closed
wants to merge 4 commits

Conversation

@H-Huang (Member) commented Oct 6, 2022

Stack from ghstack:

Changes

  • Updates for the barrier collective
  • NOTE: the current change will not achieve dispatching of barrier, since there is no tensor to read from (see the sketch below)
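For illustration only (not part of this PR): the dispatcher selects a CPU or CUDA kernel from the device of the tensor arguments, so a collective like all_reduce can be routed, while barrier has no tensor to inspect. A minimal sketch, assuming a process group has already been initialized:

```python
import torch
import torch.distributed as dist

# Sketch only: assumes dist.init_process_group(...) was called elsewhere.
t = torch.ones(4)        # CPU tensor; t.device can drive CPU vs. CUDA dispatch
dist.all_reduce(t)       # dispatchable: has a tensor argument to inspect

dist.barrier()           # no tensor argument, so there is nothing to derive
                         # a dispatch key from (the point of the NOTE above)
```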

Context

#86225

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @kwen2501 @awgu

@pytorch-bot (bot) commented Oct 6, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86368

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 51f3b75:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot added the `release notes: distributed (c10d)` label Oct 6, 2022
facebook-github-bot added the `cla signed` and `oncall: distributed` labels Oct 6, 2022
H-Huang added a commit that referenced this pull request Oct 6, 2022
ghstack-source-id: e9d93b37f71e7a3d6a73d234b333e63e1cb00b17
Pull Request resolved: #86368
@H-Huang (Member Author) commented Oct 6, 2022

@H-Huang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -286,6 +302,15 @@ TORCH_LIBRARY_IMPL(c10d, CPU, m) {
TORCH_LIBRARY_IMPL(c10d, CUDA, m) {
m.impl("reduce_scatter_", reduce_scatter_cuda_);
}

TORCH_LIBRARY_IMPL(c10d, CPU, m) {
m.impl("barrier", barrier_cpu);
Contributor:

How do you decide if there should be a trailing underscore?

Member Author:

The convention for PyTorch operators (https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/native_functions.yaml) is that if the tensor is modified in place, the operator name is suffixed with `_`. We don't do this for `barrier` and `send`.
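For illustration, the same convention is visible in the public tensor API; a minimal sketch, unrelated to this PR's code:

```python
import torch

t = torch.zeros(3)

# Out-of-place: returns a new tensor and leaves t unchanged.
u = t.add(1)

# In-place: the trailing underscore signals that t itself is modified.
t.add_(1)

print(torch.equal(t, u))  # True: both now hold [1., 1., 1.]
```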

Contributor:

Thank you Professor Huang!

### Changes
- Updates for the barrier collective

### Context
#86225

Differential Revision: [D40145698](https://our.internmc.facebook.com/intern/diff/D40145698)

[ghstack-poisoned]
@H-Huang (Member Author) commented Oct 7, 2022

@H-Huang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kwen2501 (Contributor) left a comment

It is okay to land the current changes since they lead to the same behavior, but we should note that the current change would not achieve dispatching of barrier.
Also added some minor comments about the test. (Sorry I missed them previously; they can be addressed in a future PR.)

@@ -1488,6 +1490,7 @@ def _test_collectives(self, backend):
(dist.all_reduce,),
Contributor:

Sorry, I may have missed this earlier. Would passing self.rank to reduce and broadcast cause each rank to identify a different root and hang? They should have the same root.

Member Author:

Passing self.rank causes the broadcast operation to be sourced from that rank, but the sender does not hang waiting for other ranks to acknowledge the broadcast it sent. The same logic applies to reduce, which is why I believe it is not hanging.
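A minimal sketch of the call pattern being discussed, using a single-rank gloo group purely so the snippet is self-contained (the real test wiring and world size may differ):

```python
import torch
import torch.distributed as dist

# Single-process group, used here only to make the sketch runnable.
dist.init_process_group(
    "gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
)

rank = dist.get_rank()
tensor = torch.ones(2)

# Each rank passes itself as the root. The sending rank returns once its
# data is handed off; it does not block waiting for acknowledgements from
# peer ranks, which matches the behavior described above.
dist.broadcast(tensor, src=rank)
dist.reduce(tensor, dst=rank)

dist.destroy_process_group()
```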

-            if collective == dist.all_gather:
+            if collective == dist.barrier:
+                collective()
+            elif collective == dist.all_gather:
                 collective([tensor], tensor, *args)
Contributor:

  1. Nit: this if-else block is getting bigger (as a result of wrapping/templating).
     Maybe the test would be easier to read if we just write out each test call in _test_collectives, like:
     dist.barrier()
     dist.all_reduce(tensor)
     ...
  2. Is collective([tensor], tensor, *args) a correct format for all_gather? i.e., it will have a list of only one tensor for the output. Or are we testing the dispatching functionality with WORLD_SIZE=1 here? (If so, the code makes sense.)

Member Author:

  1. Makes sense, the if statement is getting a bit messy; I will change it in future PRs (see the sketch below).
  2. We are just testing dispatching functionality and making sure the operation is callable, so it is the latter case; collective correctness should be covered by other tests.
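For illustration, a sketch of the written-out style suggested above, with placeholder setup (a single-rank gloo group) rather than the actual _test_collectives harness:

```python
import torch
import torch.distributed as dist

# Placeholder setup so the sketch is self-contained; the real test uses
# its own process-group fixtures.
dist.init_process_group(
    "gloo", init_method="tcp://127.0.0.1:29501", rank=0, world_size=1
)

rank = dist.get_rank()
tensor = torch.ones(2)
output = [torch.zeros(2) for _ in range(dist.get_world_size())]

dist.barrier()                      # no tensor argument
dist.all_reduce(tensor)
dist.all_gather(output, tensor)     # one output slot per rank (here: 1)
dist.broadcast(tensor, src=rank)
dist.reduce(tensor, dst=rank)

dist.destroy_process_group()
```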

pytorch-bot added the `ciflow/trunk` label Oct 28, 2022
@H-Huang (Member Author) commented Nov 1, 2022

@pytorchbot merge

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Nov 5, 2022
### Changes
- Updates for the barrier collective
- NOTE: current change will not achieve dispatching of barrier since there is no tensor to read from

### Context
pytorch#86225

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @kwen2501 @awgu
Pull Request resolved: pytorch#86368
Approved by: https://github.com/kwen2501
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
facebook-github-bot deleted the gh/H-Huang/84/head branch June 8, 2023 14:34
Labels: ciflow/trunk, cla signed, Merged, oncall: distributed, release notes: distributed (c10d)
5 participants