
[10/N] Update barrier with CPU/CUDA implementations #86368

Closed
wants to merge 4 commits

Conversation

@H-Huang (Member) commented Oct 6, 2022

Stack from ghstack:

Changes

  • Updates for the barrier collective
  • NOTE: the current change will not achieve dispatching of barrier, since there is no tensor to read from (see the sketch below)
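For illustration only (not part of this PR): the dispatcher selects a CPU or CUDA kernel from the device of the tensor arguments, so a collective like all_reduce can be routed, while barrier has no tensor to inspect. A minimal sketch, assuming a process group has already been initialized:

```python
import torch
import torch.distributed as dist

# Sketch only: assumes dist.init_process_group(...) was called elsewhere.
t = torch.ones(4)        # CPU tensor; t.device can drive CPU vs. CUDA dispatch
dist.all_reduce(t)       # dispatchable: has a tensor argument to inspect

dist.barrier()           # no tensor argument, so there is nothing to derive
                         # a dispatch key from (the point of the NOTE above)
```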

Context

#86225

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @kwen2501 @awgu

@pytorch-bot (bot) commented Oct 6, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86368

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 51f3b75:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot added the `release notes: distributed (c10d)` label Oct 6, 2022
facebook-github-bot added the `cla signed` and `oncall: distributed` labels Oct 6, 2022
H-Huang added a commit that referenced this pull request Oct 6, 2022
ghstack-source-id: e9d93b37f71e7a3d6a73d234b333e63e1cb00b17
Pull Request resolved: #86368
@H-Huang (Member Author) commented Oct 6, 2022

@H-Huang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -286,6 +302,15 @@ TORCH_LIBRARY_IMPL(c10d, CPU, m) {
TORCH_LIBRARY_IMPL(c10d, CUDA, m) {
m.impl("reduce_scatter_", reduce_scatter_cuda_);
}

TORCH_LIBRARY_IMPL(c10d, CPU, m) {
m.impl("barrier", barrier_cpu);
Contributor:

How do you decide if there should be a trailing underscore?

Member Author:

The convention for PyTorch operators (https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/native_functions.yaml) is that if the tensor is modified in place, the operator name is suffixed with `_`. We don't do this for `barrier` and `send`.
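For illustration, the same convention is visible in the public tensor API; a minimal sketch, unrelated to this PR's code:

```python
import torch

t = torch.zeros(3)

# Out-of-place: returns a new tensor and leaves t unchanged.
u = t.add(1)

# In-place: the trailing underscore signals that t itself is modified.
t.add_(1)

print(torch.equal(t, u))  # True: both now hold [1., 1., 1.]
```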

Contributor:

Thank you Professor Huang!

### Changes
- Updates for the barrier collective

### Context
#86225

Differential Revision: [D40145698](https://our.internmc.facebook.com/intern/diff/D40145698)

[ghstack-poisoned]
@H-Huang (Member Author) commented Oct 7, 2022

@H-Huang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kwen2501 (Contributor) left a comment

It is okay to land the current changes since they lead to the same behavior, but we should note that the current change would not achieve dispatching of barrier.
Also added some minor comments about the test. (Sorry I missed them previously; they can be addressed in a future PR.)

@@ -1488,6 +1490,7 @@ def _test_collectives(self, backend):
(dist.all_reduce,),
Contributor:

Sorry, I may have missed this earlier. Would passing self.rank to reduce and broadcast cause each rank to identify a different root and hang? They should have the same root.

Member Author:

Passing self.rank causes the broadcast operation to be sourced from that rank, but the sender does not hang waiting for other ranks to acknowledge the broadcast it sent. The same logic applies to reduce, which is why I believe it is not hanging.
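A minimal sketch of the call pattern being discussed, using a single-rank gloo group purely so the snippet is self-contained (the real test wiring and world size may differ):

```python
import torch
import torch.distributed as dist

# Single-process group, used here only to make the sketch runnable.
dist.init_process_group(
    "gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
)

rank = dist.get_rank()
tensor = torch.ones(2)

# Each rank passes itself as the root. The sending rank returns once its
# data is handed off; it does not block waiting for acknowledgements from
# peer ranks, which matches the behavior described above.
dist.broadcast(tensor, src=rank)
dist.reduce(tensor, dst=rank)

dist.destroy_process_group()
```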

-            if collective == dist.all_gather:
+            if collective == dist.barrier:
+                collective()
+            elif collective == dist.all_gather:
                 collective([tensor], tensor, *args)
Contributor:

  1. Nit: this if-else block is getting bigger (as a result of wrapping/templating).
     Maybe the test would be easier to read if we just write out each test call in _test_collectives, like:
     dist.barrier()
     dist.all_reduce(tensor)
     ...
  2. Is collective([tensor], tensor, *args) a correct format for all_gather? i.e., it will have a list of only one tensor for the output. Or are we testing the dispatching functionality with WORLD_SIZE=1 here? (If so, the code makes sense.)

Member Author:

  1. Makes sense, the if statement is getting a bit messy; I will change it in future PRs (see the sketch below).
  2. We are just testing dispatching functionality and making sure the operation is callable, so it is the latter case; collective correctness should be covered by other tests.
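For illustration, a sketch of the written-out style suggested above, with placeholder setup (a single-rank gloo group) rather than the actual _test_collectives harness:

```python
import torch
import torch.distributed as dist

# Placeholder setup so the sketch is self-contained; the real test uses
# its own process-group fixtures.
dist.init_process_group(
    "gloo", init_method="tcp://127.0.0.1:29501", rank=0, world_size=1
)

rank = dist.get_rank()
tensor = torch.ones(2)
output = [torch.zeros(2) for _ in range(dist.get_world_size())]

dist.barrier()                      # no tensor argument
dist.all_reduce(tensor)
dist.all_gather(output, tensor)     # one output slot per rank (here: 1)
dist.broadcast(tensor, src=rank)
dist.reduce(tensor, dst=rank)

dist.destroy_process_group()
```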

pytorch-bot added the `ciflow/trunk` label Oct 28, 2022
@H-Huang (Member Author) commented Nov 1, 2022

@pytorchbot merge

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Nov 5, 2022
### Changes
- Updates for the barrier collective
- NOTE: current change will not achieve dispatching of barrier since there is no tensor to read from

### Context
pytorch#86225

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @kwen2501 @awgu
Pull Request resolved: pytorch#86368
Approved by: https://github.com/kwen2501
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
facebook-github-bot deleted the gh/H-Huang/84/head branch June 8, 2023 14:34
Labels: ciflow/trunk, cla signed, Merged, oncall: distributed, release notes: distributed (c10d)
5 participants