Replaces c10d's CUDAEvent with ATen's #13464
device_index_ = stream.device_index();
}

AT_CUDA_CHECK(cudaEventRecord(event_, stream));
cudaStreamWaitEvent(ncclStream.stream(), ncclEvent.getEvent(), 0));
at::cuda::CUDAEvent& ncclEvent = ncclEvents[i];
ncclEvent.record(at::cuda::getCurrentCUDAStream(devices[i].index()));
ncclEvent.block(ncclStream);
Python lint failure is not related to this PR (that file was not even touched). Other failures are real. Diagnosing.
Final commit clarifies device setting requirements for cudaEventQuery and removes excessive device guards. @pietern I think we may now be in a good place?
Final nit.
@teng-li Can you do a pass here as well? Especially the changes in ProcessGroupNCCL.
void block (const CUDAStream& stream) {
// Note: cudaStreamWaitEvent must be called on the same device as the STREAM
// The event has no actual GPU resources associated with it
// Note: cudaEventRecord must be called on the same device as the STREAM
void record(const CUDAStream& stream) {
at::cuda::CUDAGuard device_index_guard(static_cast<int16_t>(stream.device_index()));
@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:
This PR:
- Replaces c10d's CUDAEvent with ATen's, removing the two associated c10d files
- Updates c10d's usage of CUDAEvent to reflect the ATen API
- Updates c10d's usage of streams to reflect the ATen API
- Removes use of historic THCState in the touched c10d files
- (EDIT) Fixes a bug in CUDAEvent.h where events could be recorded on the wrong device. Now adds a device guard for this case.

pietern

Pull Request resolved: pytorch/pytorch#13464
Reviewed By: teng-li
Differential Revision: D12924291
Pulled By: pietern
fbshipit-source-id: b8ebe3e01e53d74e527ad199cca3aa11915c1fc0
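For anyone reading the (EDIT) item without the diff open, here is a rough, CUDA-free sketch of the device guard idea behind the `record` fix. It is not the code in CUDAEvent.h. `g_current_device`, `DeviceGuard`, `Stream`, `Event`, and `recorded_device_after` are all hypothetical stand-ins so the snippet compiles without a GPU. In the real code the guard is `at::cuda::CUDAGuard` and the recorded call is `cudaEventRecord`.

```cpp
#include <cassert>

// Hypothetical stand-in for the CUDA runtime's notion of "current device".
// Real code would go through cudaGetDevice / cudaSetDevice instead.
static int g_current_device = 0;

// RAII guard (assumption: mimics the role of at::cuda::CUDAGuard).
// Switches to `device` on construction and restores the previous
// device on destruction, even on early return.
class DeviceGuard {
 public:
  explicit DeviceGuard(int device) : previous_(g_current_device) {
    g_current_device = device;
  }
  ~DeviceGuard() { g_current_device = previous_; }
  DeviceGuard(const DeviceGuard&) = delete;
  DeviceGuard& operator=(const DeviceGuard&) = delete;

 private:
  int previous_;
};

// Hypothetical stream that only remembers which device it belongs to.
struct Stream {
  int device_index;
};

// Hypothetical event that notes which device was current during record().
struct Event {
  int recorded_on = -1;

  void record(const Stream& stream) {
    // Make the stream's device current for the duration of the call,
    // matching the "same device as the STREAM" note in the diff.
    DeviceGuard guard(stream.device_index);
    recorded_on = g_current_device;  // stands in for cudaEventRecord
  }
};

// Records an event on a stream living on `stream_device` while
// `current` is the caller's current device; returns the device the
// event was recorded on.
int recorded_device_after(int current, int stream_device) {
  g_current_device = current;
  Event event;
  event.record(Stream{stream_device});
  assert(g_current_device == current);  // guard restored the caller's device
  return event.recorded_on;
}
```

Calling `recorded_device_after(0, 1)` records on device 1 even though device 0 is current, and leaves device 0 current afterwards. Without the guard, this sketch would record on device 0, which is the wrong-device case the summary describes.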
@teng-li @pietern