New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NCCL] create NCCL communicator for send/recv on demand #44922
Conversation
For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations. Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/) [ghstack-poisoned]
For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations. Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/) ghstack-source-id: 112338493 Pull Request resolved: #44922
💊 CI failures summary and remediationsAs of commit 50d40ea (more details on the Dr. CI page): Commit 50d40ea was recently pushed. Waiting for builds... This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group. This comment has been revised 18 times. |
For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations. Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/) [ghstack-poisoned]
Pull Request resolved: #44922 For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations. ghstack-source-id: 112511958 Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Although, I just realized that we should add a test to validate what happens when we send to self? For example, does NCCL support sending to rank 0 from rank 0? If so, we should add a test to validate our APIs support this correctly. If NCCL doesn't support this, we should add tests validating that our APIs raise an appropriate error.
NCCL supports sending to self. Let me add a test for that. |
For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations. Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/) [ghstack-poisoned]
Pull Request resolved: #44922 For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations. ghstack-source-id: 112836264 Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/)
For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations. Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/) [ghstack-poisoned]
For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations. Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/) [ghstack-poisoned]
Pull Request resolved: #44922 For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations. ghstack-source-id: 113592757 Differential Revision: [D23773726](https://our.internmc.facebook.com/intern/diff/D23773726/)
This pull request has been merged in 10d86d1. |
Stack from ghstack:
For NCCL send/recv operations, we will create NCCL communicator on demand following the same design as how it's currently done for collective operations.
Differential Revision: D23773726