Use MPI_Isend/MPI_Irecv to back send/recv #11630
Both the Python tests and the MPI tests in torch/lib/c10d/test pass locally.
Some nits, the rest LGTM
torch/lib/c10d/ProcessGroupMPI.cpp
The isCompleted function is changed to be non-const to accommodate setting internal status on the work object when it completes. Previously it only checked a member field, but for the MPI backend it calls MPI_Test to poll for completion of an asynchronous request.
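As a rough illustration of why polling forces isCompleted to be non-const, here is a minimal sketch. It is not the code from this PR: the class name PolledWork and the poll callable are hypothetical, and the callable merely stands in for a call to MPI_Test so that the example builds without an MPI installation.

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for a work object; not the actual c10d class.
class PolledWork {
 public:
  // `poll` plays the role of MPI_Test in this sketch: it returns true
  // once the underlying asynchronous request has finished.
  explicit PolledWork(std::function<bool()> poll) : poll_(std::move(poll)) {}

  // Non-const on purpose: a successful poll records completion on the
  // object, so later calls can return without polling again.
  bool isCompleted() {
    if (!completed_ && poll_()) {
      completed_ = true;
    }
    return completed_;
  }

 private:
  std::function<bool()> poll_;
  bool completed_ = false;
};
```

In this sketch the cached flag means the request is never polled again after it has completed, which is one plausible reason to store status on the work object.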
7092879 to de50a59
Thanks. Addressed comments.
pietern has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
The flaky failure in the CPU-only distributed tests, where the test should be skipped, is a known problem.
    std::array<char, MPI_MAX_ERROR_STRING> buf;
    int len = buf.size();
    MPI_CHECK(MPI_Error_string(status_.MPI_ERROR, buf.data(), &len));
    return std::runtime_error(std::string(buf.data(), len));