Skip to content

Conversation

mruberry
Copy link
Collaborator

@mruberry mruberry commented Nov 1, 2018

This PR:

  • Replaces c10d's CUDAEvent with ATen's, removing the two associated c10d files
  • Updates c10d's usage of CUDAEvent to reflect the ATen API
  • Updates c10d's usage of streams to reflect the ATen API
  • Removes use of historic THCState in the touched c10d files
  • (EDIT) Fixes a bug in CUDAEvent.h where events could be recorded on the wrong device. Now adds a device guard for this case.

@teng-li @pietern

Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thanks @mruberry!

@teng-li @ezyang Can you check this out as well for double/triple check?

device_index_ = stream.device_index();
}

AT_CUDA_CHECK(cudaEventRecord(event_, stream));

This comment was marked as off-topic.

This comment was marked as off-topic.

cudaStreamWaitEvent(ncclStream.stream(), ncclEvent.getEvent(), 0));
at::cuda::CUDAEvent& ncclEvent = ncclEvents[i];
ncclEvent.record(at::cuda::getCurrentCUDAStream(devices[i].index()));
ncclEvent.block(ncclStream);

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

@pietern pietern added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Nov 1, 2018
@mruberry
Copy link
Collaborator Author

mruberry commented Nov 1, 2018

Python lint failure is not related to this PR (that file was not even touched).

Other failures are real. Diagnosing.

@mruberry
Copy link
Collaborator Author

mruberry commented Nov 2, 2018

Final commit clarifies device setting requirements for cudaEventQuery and removes excessive device guards. @pietern I think we may now be in a good place?

Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final nit.

@teng-li Can you do a pass here as well? Especially the changes in ProcessGroupNCCL.


void block (const CUDAStream& stream) {
// Note: cudaStreamWaitEvent must be called on the same device as the STREAM
// The event has no actual GPU resources associated with it

This comment was marked as off-topic.


// Note: cudaEventRecord must be called on the same device as the STREAM
void record(const CUDAStream& stream) {
at::cuda::CUDAGuard device_index_guard(static_cast<int16_t>(stream.device_index()));

This comment was marked as off-topic.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Nov 6, 2018
Summary:
This PR:

- Replaces c10d's CUDAEvent with ATen's, removing the two associated c10d files
- Updates c10d's usage of CUDAEvent to reflect the ATen API
- Updates c10d's usage of streams to reflect the ATen API
- Removes use of historic THCState in the touched c10d files
- (EDIT) Fixes a bug in CUDAEvent.h where events could be recorded on the wrong device. Now adds a device guard for this case.

The controller you requested could not be found. pietern
Pull Request resolved: pytorch/pytorch#13464

Reviewed By: teng-li

Differential Revision: D12924291

Pulled By: pietern

fbshipit-source-id: b8ebe3e01e53d74e527ad199cca3aa11915c1fc0
@mruberry mruberry deleted the cuda_event_merge branch March 16, 2019 04:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

oncall: distributed Add this issue/PR to distributed oncall triage queue open source

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants