Add allgather_base as per our discussion re: ProcessGroup interface. #31892
This pull request was exported from Phabricator. Differential Revision: D19290739
Force-pushed from 0e91a55 to 59263ef.
Force-pushed from 59263ef to 45252ca.
Force-pushed from 45252ca to 9e7d7ac.
💊 CircleCI build failures summary and remediations
As of commit 287e6a1: None of the build failures appear to be your fault.
Detailed failure analysis: One may explore the probable reasons each build failed interactively on the Dr. CI website.
🚧 1 upstream failure recognized by patterns: These builds matched patterns, but were probably caused by upstream breakages.
Force-pushed from 9e7d7ac to 086bfb5.
torch/lib/c10d/ProcessGroup.hpp (outdated)
@@ -128,6 +128,15 @@ class ProcessGroup {
      std::vector<at::Tensor>& inputTensors,
      const AllgatherOptions& opts = AllgatherOptions()) = 0;

  // Q: Should we move it to protected or annotate with _ since it can be
  // confusing to use and is intended for internal use only?
We could expose it for power users. I think it's fine to leave it public. Btw, didn't we call it single before?
I think we agreed on _base during our meeting with @mrshenli
Force-pushed from 086bfb5 to d586da2.
Force-pushed from d586da2 to 34d7311.
@mrshenli We need to remove the coalesced variant first as well.
Try to understand the minimum requirements to unblock #28068.
Discussed with @mrshenli: if we just remove allgather_coalesced, it will break the Python API right now. How about we leave it and add comments saying not to touch it: "Deprecated. Do not use it. Do not implement it." Let's do a similar exercise with allreduce_coalesced in a follow-up?
Force-pushed from 34d7311 to 940e74f.
Force-pushed from 940e74f to 113cf3d.
Fixed broken build.
CI failure is benign. Good to go. Thanks, @agolynski!
…ytorch#31892)
Summary:
Pull Request resolved: pytorch#31892
Introduce ProcessGroup::allgather_base. No implementation yet: plan to add it one PG backend at a time in a follow up.
Test Plan: No functional changes, no tests yet.
Differential Revision: D19290739
fbshipit-source-id: ec91720ef4f1f6aa445836d8153324ae769cdfd1
Force-pushed from 113cf3d to d321a4e.
Force-pushed from d321a4e to 287e6a1.
@agolynski has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@agolynski merged this pull request in 74621ca.
…ytorch#31892)
Summary:
Introduce ProcessGroup::allgather_base. No implementation yet: plan to add it one PG backend at a time in a follow up.
Pull Request resolved: pytorch#31892
Test Plan: No functional changes, no tests yet.
Differential Revision: D19290739
Pulled By: agolynski
fbshipit-source-id: c2f4947d2980995724c539de7c6d97618e1ba11a
Summary: Introduce ProcessGroup::allgather_base. No implementation yet; the plan is to add it one ProcessGroup (PG) backend at a time in a follow-up.
Test Plan: No functional changes, no tests yet.
Differential Revision: D19290739
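For readers skimming the thread, here is a rough sketch of the "add the interface first, implement it one backend at a time" idea from the summary. It is not the code from this PR. `ProcessGroupSketch`, `SimulatedBackend`, and `Buffer` are hypothetical stand-ins (the real interface takes `at::Tensor` and `AllgatherOptions`), and the semantics shown (every rank's input copied into one flat output buffer) are an assumption suggested by the earlier "single" naming, not something this thread spells out. The base method here throws by default so that backends can opt in gradually; the actual PR may declare it differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tensor; the real interface uses at::Tensor.
using Buffer = std::vector<float>;

// Hypothetical, simplified interface. This is NOT the real c10d::ProcessGroup.
class ProcessGroupSketch {
 public:
  virtual ~ProcessGroupSketch() = default;

  // Assumed semantics: gather every rank's input into one flat output buffer.
  // The default throws, so each backend can add support in its own follow-up.
  virtual void allgather_base(Buffer& output, const Buffer& input) {
    (void)output;
    (void)input;
    throw std::runtime_error("allgather_base is not implemented by this backend");
  }
};

// Toy backend that simulates the other ranks in memory (illustration only).
class SimulatedBackend : public ProcessGroupSketch {
 public:
  // peerInputs[r] holds what rank r would contribute; the entry at `rank`
  // is ignored because the caller passes this rank's input directly.
  SimulatedBackend(std::size_t rank, std::vector<Buffer> peerInputs)
      : rank_(rank), peerInputs_(std::move(peerInputs)) {
    assert(rank_ < peerInputs_.size());
  }

  void allgather_base(Buffer& output, const Buffer& input) override {
    const std::size_t worldSize = peerInputs_.size();
    const std::size_t chunk = input.size();
    if (output.size() != worldSize * chunk) {
      throw std::invalid_argument("output must hold worldSize * input.size() elements");
    }
    for (std::size_t r = 0; r < worldSize; ++r) {
      const Buffer& src = (r == rank_) ? input : peerInputs_[r];
      if (src.size() != chunk) {
        throw std::invalid_argument("all ranks must contribute equally sized inputs");
      }
      // Rank r's data lands in slot r of the flat output buffer.
      std::copy(src.begin(), src.end(),
                output.begin() + static_cast<std::ptrdiff_t>(r * chunk));
    }
  }

 private:
  std::size_t rank_;
  std::vector<Buffer> peerInputs_;
};
```

A backend that has not been ported yet simply inherits the throwing default, which is one way to land the interface with no functional changes, as the test plan describes.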