Skip to content

Conversation

pietern
Copy link
Contributor

@pietern pietern commented Sep 13, 2018

The isCompleted function is changed to being non-const to accomodate
setting some internal status on the work object in the case of
completion. Previously, it was only checking a member field, but for the
MPI backend it calls MPI_Test to poll for completion of an asynchronous
request.

@pietern pietern requested a review from teng-li September 13, 2018 06:20
@pietern pietern requested a review from apaszke as a code owner September 13, 2018 06:20
@pietern pietern added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Sep 13, 2018
@pietern
Copy link
Contributor Author

pietern commented Sep 13, 2018

Both the Python tests and the MPI tests in torch/lib/c10d/test pass locally.

Copy link
Contributor

@teng-li teng-li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits, the rest LGTM

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

The isCompleted function is changed to being non-const to accomodate
setting some internal status on the work object in the case of
completion. Previously, it was only checking a member field, but for the
MPI backend it calls MPI_Test to poll for completion of an asynchronous
request.
@pietern pietern force-pushed the c10d-mpi-isend-irecv branch from 7092879 to de50a59 Compare September 13, 2018 07:47
@pietern
Copy link
Contributor Author

pietern commented Sep 13, 2018

Thanks. Addressed comments.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pietern has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@pietern
Copy link
Contributor Author

pietern commented Sep 13, 2018

Flaky test for CPU only distributed tests where it should skip is a KP.

@pietern pietern deleted the c10d-mpi-isend-irecv branch September 13, 2018 23:29
std::array<char, MPI_MAX_ERROR_STRING> buf;
int len = buf.size();
MPI_CHECK(MPI_Error_string(status_.MPI_ERROR, buf.data(), &len));
return std::runtime_error(std::string(buf.data(), len));

This comment was marked as off-topic.

This comment was marked as off-topic.

@ezyang ezyang added the merged label Jun 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

oncall: distributed Add this issue/PR to distributed oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants