Update on "[Dist profiling] Add test to ensure #48987 is resolved"
#54125 removed per-device CUDA event creation from the profiler, so #48987 should be resolved. To fully prevent the deadlock, this diff makes two further changes:

1) In LegacyProfiler, pass `record_cuda=false` for the `__stop_profile` mark event, because it is not a user op and was resulting in a CUDA event being recorded on the wrong device (for the reason below).
2) Add `torch.cuda.set_device()` in the tests to ensure that calls to `cudaGetDevice` in the profiler don't return the wrong device.

Note that using the profiler with `use_cuda=True` to profile distributed collectives isn't really an intended use case; see the discussion in #52246. The profiling infrastructure has moved to primarily encourage the use of torch.profiler and CUPTI to trace CUDA kernels; supporting distributed collectives there will require further discussion with @ilia-cher, but we should still resolve this deadlock.

Differential Revision: [D27491711](https://our.internmc.facebook.com/intern/diff/D27491711/)

[ghstack-poisoned]