Replaces c10d's CUDAEvent with ATen's #13464

mruberry · 2018-11-01T18:44:58Z

This PR:

- Replaces c10d's CUDAEvent with ATen's, removing the two associated c10d files
- Updates c10d's usage of CUDAEvent to reflect the ATen API
- Updates c10d's usage of streams to reflect the ATen API
- Removes use of the legacy THCState in the touched c10d files
- (EDIT) Fixes a bug in CUDAEvent.h where events could be recorded on the wrong device; a device guard now covers this case.
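The device-guard fix relies on the usual RAII pattern: make the stream's device current for the duration of a call, then restore the previous device on scope exit, even if the call throws. A minimal CPU-only sketch, assuming a simulated current device; `g_current_device`, `get_device`, `set_device`, `SimDeviceGuard` and the two `issue_*` functions are hypothetical stand-ins for `cudaGetDevice`/`cudaSetDevice` and `at::cuda::CUDAGuard`, not ATen code:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the CUDA runtime's "current device".
// Real code would use cudaGetDevice / cudaSetDevice instead.
int g_current_device = 0;

int get_device() { return g_current_device; }
void set_device(int index) { g_current_device = index; }

// Hypothetical RAII guard in the spirit of at::cuda::CUDAGuard: switch to
// `index` on construction and restore the previous device on destruction,
// including during stack unwinding when an exception is thrown.
class SimDeviceGuard {
 public:
  explicit SimDeviceGuard(int index) : original_(get_device()) {
    set_device(index);
  }
  ~SimDeviceGuard() { set_device(original_); }
  SimDeviceGuard(const SimDeviceGuard&) = delete;
  SimDeviceGuard& operator=(const SimDeviceGuard&) = delete;

 private:
  int original_;
};

// Stands in for a call such as cudaEventRecord that must be issued with the
// stream's device current; returns the device that was current during the call.
int issue_on_stream_device(int stream_device) {
  SimDeviceGuard guard(stream_device);
  return get_device();
}

// Same, but the simulated call fails after the device switch.
void issue_and_fail(int stream_device) {
  SimDeviceGuard guard(stream_device);
  throw std::runtime_error("simulated CUDA error");
}
```

Without the guard, the call would run on whatever device happened to be current, which is the class of bug the EDIT item describes.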

@teng-li @pietern


pietern

Great! Thanks @mruberry!

@teng-li @ezyang Can you check this out as well, for a double/triple check?

aten/src/ATen/cuda/CUDAEvent.h

+      device_index_ = stream.device_index();
    }

    AT_CUDA_CHECK(cudaEventRecord(event_, stream));


torch/lib/c10d/ProcessGroupNCCL.cpp

-        cudaStreamWaitEvent(ncclStream.stream(), ncclEvent.getEvent(), 0));
+    at::cuda::CUDAEvent& ncclEvent = ncclEvents[i];
+    ncclEvent.record(at::cuda::getCurrentCUDAStream(devices[i].index()));
+    ncclEvent.block(ncclStream);
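The ProcessGroupNCCL change uses the standard cross-stream ordering idiom: record an event on the producing (current) stream, then make the NCCL stream wait on it, without blocking the host. The sketch below models that idiom on the CPU, assuming each stream is a single worker thread that runs tasks in order; `SimStream`, `SimEvent` and `produce_then_consume` are hypothetical names, and this is an analogy, not how CUDA or ATen implement events.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical model of a stream: one worker thread that runs enqueued tasks
// strictly in submission order, asynchronously with respect to the caller.
class SimStream {
 public:
  SimStream() : worker_([this] { loop(); }) {}
  ~SimStream() {
    { std::lock_guard<std::mutex> lock(mu_); stopping_ = true; }
    cv_.notify_all();
    worker_.join();  // the worker drains remaining tasks, then exits
  }
  void enqueue(std::function<void()> task) {
    { std::lock_guard<std::mutex> lock(mu_); tasks_.push(std::move(task)); }
    cv_.notify_all();
  }
  // Blocks the caller until every previously enqueued task has run.
  void synchronize() {
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> finished = done->get_future();
    enqueue([done] { done->set_value(); });
    finished.wait();
  }

 private:
  void loop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and fully drained
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::thread worker_;  // declared last: starts after the members above exist
};

// Hypothetical one-shot event: record() enqueues a "signal" on one stream and
// block() enqueues a "wait" on another; neither call blocks the host.
// Simplification: record at most once, and before block() (CUDA events can be re-recorded).
class SimEvent {
 public:
  void record(SimStream& stream) {
    auto s = state_;
    stream.enqueue([s] {
      std::lock_guard<std::mutex> lock(s->mu);
      s->signaled = true;
      s->cv.notify_all();
    });
  }
  void block(SimStream& stream) {
    auto s = state_;
    stream.enqueue([s] {
      std::unique_lock<std::mutex> lock(s->mu);
      s->cv.wait(lock, [&s] { return s->signaled; });
    });
  }

 private:
  struct State { std::mutex mu; std::condition_variable cv; bool signaled = false; };
  std::shared_ptr<State> state_ = std::make_shared<State>();
};

// Mirrors the excerpt: record on the "current" stream, make the NCCL stream
// wait on the event, then let the NCCL stream consume the produced value.
int produce_then_consume() {
  int value = 0;
  int observed = -1;
  SimStream current;  // stands in for getCurrentCUDAStream(...)
  SimStream nccl;     // stands in for ncclStream
  SimEvent event;
  current.enqueue([&value] {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    value = 42;  // the "producer" kernel
  });
  event.record(current);
  event.block(nccl);
  nccl.enqueue([&] { observed = value; });  // the "collective"
  nccl.synchronize();
  return observed;
}
```

Even though the producer task is deliberately slow, the consumer task reads the produced value because the wait enqueued by `block()` sits ahead of it on the NCCL stream.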



mruberry · 2018-11-01T19:52:25Z

Python lint failure is not related to this PR (that file was not even touched).

Other failures are real. Diagnosing.


mruberry · 2018-11-02T19:49:24Z

The final commit clarifies the device-setting requirements for cudaEventQuery and removes unnecessary device guards. @pietern I think we may now be in a good place?

pietern

Final nit.

@teng-li Can you do a pass here as well? Especially the changes in ProcessGroupNCCL.

aten/src/ATen/cuda/CUDAEvent.h


-  void block (const CUDAStream& stream) {
+  // Note: cudaStreamWaitEvent must be called on the same device as the STREAM
+  //  The event has no actual GPU resources associated with it


aten/src/ATen/cuda/CUDAEvent.h


+  // Note: cudaEventRecord must be called on the same device as the STREAM
  void record(const CUDAStream& stream) {
+    at::cuda::CUDAGuard device_index_guard(static_cast<int16_t>(stream.device_index()));
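The notes in the two CUDAEvent.h excerpts above say which device must be current for each runtime call: the stream's device, for both cudaEventRecord and cudaStreamWaitEvent. The following sketch checks that rule on the CPU, assuming a simulated current device and a call log; `g_device`, `g_calls`, `SimGuard`, `SimStream` and `SimEvent` are all hypothetical, and the class is only loosely modeled on the excerpts, not a copy of ATen's CUDAEvent.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simulation state: the "current device", plus a log recording
// which device was current when each simulated CUDA runtime call was issued.
int g_device = 0;
std::vector<std::pair<std::string, int>> g_calls;

// Hypothetical guard: make `index` current, restore the old device on exit.
struct SimGuard {
  explicit SimGuard(int index) : saved(g_device) { g_device = index; }
  ~SimGuard() { g_device = saved; }
  int saved;
};

struct SimStream {
  int device_index;
};

// Hypothetical event wrapper: both record() and block() issue their call
// with the STREAM's device current, as the two notes require.
class SimEvent {
 public:
  void record(const SimStream& stream) {
    SimGuard guard(stream.device_index);
    if (!created_) {
      // Lazily "create" the event and remember its device on first record,
      // echoing `device_index_ = stream.device_index();` in the excerpt.
      created_ = true;
      device_index_ = stream.device_index;
      g_calls.emplace_back("cudaEventCreate", g_device);
    }
    g_calls.emplace_back("cudaEventRecord", g_device);
  }
  void block(const SimStream& stream) {
    if (!created_) return;  // never recorded: nothing to wait on in this sketch
    SimGuard guard(stream.device_index);
    g_calls.emplace_back("cudaStreamWaitEvent", g_device);
  }
  int device_index() const { return device_index_; }

 private:
  bool created_ = false;
  int device_index_ = -1;
};
```

For example, with device 2 current, recording on a stream of device 1 and then blocking a stream of device 0 logs the record on device 1 and the wait on device 0, and device 2 is current again after each call.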


facebook-github-bot

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary: This PR:
- Replaces c10d's CUDAEvent with ATen's, removing the two associated c10d files
- Updates c10d's usage of CUDAEvent to reflect the ATen API
- Updates c10d's usage of streams to reflect the ATen API
- Removes use of historic THCState in the touched c10d files
- (EDIT) Fixes a bug in CUDAEvent.h where events could be recorded on the wrong device. Now adds a device guard for this case.

Pull Request resolved: pytorch/pytorch#13464

Reviewed By: teng-li

Differential Revision: D12924291

Pulled By: pietern

fbshipit-source-id: b8ebe3e01e53d74e527ad199cca3aa11915c1fc0

mruberry added 6 commits October 31, 2018 14:01

Removes c10d::CUDAEvent in favor of at::cuda::CUDAEvent

6f37bc3

Merging with master

8e63bc3

Merging with master

80d1f22

Merge branch 'master' of https://github.com/pytorch/pytorch into cuda…

a57a88c

…_event_merge

Final updates

53a2bd4

Cleanup

e420b56

mruberry requested review from apaszke, pietern and teng-li as code owners November 1, 2018 18:44

pietern reviewed Nov 1, 2018


pietern added the oncall: distributed label Nov 1, 2018

mruberry added 3 commits November 1, 2018 16:39

Fixes CUDAEvent bug

32fd56d

Merge branch 'master' of https://github.com/pytorch/pytorch into cuda…

86d086d

…_event_merge

Removes excessive device setting

f17fdef

pietern reviewed Nov 2, 2018


ezyang reviewed Nov 5, 2018


ezyang approved these changes Nov 5, 2018


facebook-github-bot reviewed Nov 5, 2018


facebook-github-bot closed this in a340dce Nov 6, 2018

This was referenced Nov 10, 2018

Deduplicate at::cuda::CUDAEvent and c10d::CUDAEvent #13184

Closed

Use ATen CUDA event/stream wrappers in c10d #11912

Closed

mruberry deleted the cuda_event_merge branch March 16, 2019 04:43

ezyang added the open source and merged labels Jun 24, 2019


4 participants
