Update on "[Dist profiling] Add test to ensure #48987 is resolved"
#54125 removed per-device CUDA event creation from the profiler, so #48987 should be resolved. To fully prevent the deadlock, this diff makes two further changes:

1) In LegacyProfiler, pass `record_cuda=false` for the `__stop_profile` mark event, because it is not a user op and was resulting in a CUDA event being recorded on the wrong device (for the reason below).
2) Add `torch.cuda.set_device()` in the tests to ensure that calls to `cudaGetDevice` in the profiler don't return the wrong device.

Note that using the profiler with `use_cuda=True` to profile distributed collectives isn't really an intended use case; see the discussion in #52246. The profiling infrastructure has moved to primarily encourage the use of torch.profiler and CUPTI to trace CUDA kernels; supporting distributed collectives there will require further discussion with @ilia-cher, but we should still resolve this deadlock.

Differential Revision: [D27491711](https://our.internmc.facebook.com/intern/diff/D27491711/)

[ghstack-poisoned]