
Update on "[Dist profiling] Add test to ensure #48987 is resolved"
#54125 removed CUDA event creation on each device from the profiler, so #48987 should be resolved.

To fully prevent the deadlock, this diff makes two further changes:
1) In LegacyProfiler, pass `record_cuda=false` for the `__stop_profile` mark event: it is not a user op, and recording it was causing a CUDA event to be created on the wrong device (for the reason described in (2) below).
2) Add `torch.cuda.set_device()` in the tests so that calls to `cudaGetDevice` inside the profiler do not return the wrong device; see the sketch after this list.

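A minimal sketch of the test-side fix in (2), assuming a per-process `rank`, an already-initialized process group, and `all_reduce` as the collective under test (the actual test goes through `call_dist_op`):

```python
import torch
import torch.distributed as dist
from torch.autograd import profiler

# Assumes the process group was already initialized by the test harness.
rank = dist.get_rank()

# Pin this process to its own GPU so that cudaGetDevice() calls inside the
# profiler report the device the collective actually runs on.
torch.cuda.set_device(rank)

tensor = torch.ones(10, device=f"cuda:{rank}")
with profiler.profile(use_cuda=True) as prof:
    dist.all_reduce(tensor)
    torch.cuda.synchronize()
print(prof.key_averages().table(sort_by="cuda_time_total"))
```
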
Note that using the profiler with `use_cuda=True` to profile distributed collectives isn't really an intended use case; see the discussion in #52246. The profiling infrastructure has moved toward `torch.profiler` and CUPTI for tracing CUDA kernels, and supporting distributed collectives there will require further discussion with @ilia-cher, but we should still resolve this deadlock. A sketch of the `torch.profiler` approach follows below.
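
For reference, a minimal sketch of the newer CUPTI-backed `torch.profiler` path on a single-GPU workload; whether and how it should cover distributed collectives is exactly the open question referenced above:

```python
import torch
from torch.profiler import profile, ProfilerActivity

x = torch.randn(1024, 1024, device="cuda")

# Trace CPU ops and CUDA kernels (via CUPTI) with the newer torch.profiler API.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    y = x @ x
    torch.cuda.synchronize()

print(prof.key_averages().table(sort_by="cuda_time_total"))
```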

Differential Revision: [D27491711](https://our.internmc.facebook.com/intern/diff/D27491711/)

[ghstack-poisoned]
rohan-varma committed Apr 2, 2021
1 parent 7192357 commit 9d36ce3
Showing 1 changed file with 0 additions and 2 deletions.
torch/testing/_internal/distributed/distributed_test.py
@@ -1513,8 +1513,6 @@ def call_dist_op(
     if is_async:
         for work in works:
             work.wait()
-    t = time.time() - start
-    print(f"took {t} seconds")
 
     def get_event(postfix):
         return [event for event in prof.function_events if event.name.endswith(postfix)]
