Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NCCL] create NCCL communicator for send/recv on demand #44922

Closed
wants to merge 5 commits into from

Conversation

mingzhe09088
Copy link
Contributor

@mingzhe09088 mingzhe09088 commented Sep 18, 2020

Stack from ghstack:

For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations.

Differential Revision: D23773726

For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations.

Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/)

[ghstack-poisoned]
mingzhe09088 pushed a commit that referenced this pull request Sep 18, 2020
For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations.

Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/)

ghstack-source-id: 112338493
Pull Request resolved: #44922
@dr-ci
Copy link

dr-ci bot commented Sep 18, 2020

💊 CI failures summary and remediations

As of commit 50d40ea (more details on the Dr. CI page):


Commit 50d40ea was recently pushed. Waiting for builds...


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 18 times.

torch/lib/c10d/ProcessGroupNCCL.hpp Outdated Show resolved Hide resolved
torch/lib/c10d/ProcessGroupNCCL.cpp Outdated Show resolved Hide resolved
torch/lib/c10d/ProcessGroupNCCL.cpp Outdated Show resolved Hide resolved
torch/lib/c10d/ProcessGroupNCCL.cpp Outdated Show resolved Hide resolved
For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations.

Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/)

[ghstack-poisoned]
mingzhe09088 pushed a commit that referenced this pull request Sep 21, 2020
Pull Request resolved: #44922

For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations.
ghstack-source-id: 112511958

Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/)
Copy link
Contributor

@pritamdamania87 pritamdamania87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Although, I just realized that we should add a test to validate what happens when we send to self? For example, does NCCL support sending to rank 0 from rank 0? If so, we should add a test to validate our APIs support this correctly. If NCCL doesn't support this, we should add tests validating that our APIs raise an appropriate error.

@mingzhe09088
Copy link
Contributor Author

NCCL supports sending to self. Let me add a test for that.

For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations.

Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/)

[ghstack-poisoned]
mingzhe09088 pushed a commit that referenced this pull request Sep 24, 2020
Pull Request resolved: #44922

For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations.
ghstack-source-id: 112836264

Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/)
For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations.

Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/)

[ghstack-poisoned]
For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations.

Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/)

[ghstack-poisoned]
mingzhe09088 pushed a commit that referenced this pull request Oct 5, 2020
Pull Request resolved: #44922

For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations.
ghstack-source-id: 113592757

Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/)
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 10d86d1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants