
[profiler] Remove usage of onEachDevice from legacy profiler #54125

Closed · wants to merge 6 commits

Conversation

@ilia-cher (Contributor) commented Mar 17, 2021

Stack from ghstack:

Summary:
Fixes #48987

Test Plan:
python setup.py clean
TORCH_CUDA_ARCH_LIST="6.0" USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake 2>&1 | tee ~/output.txt
python test/test_profiler.py -v

python setup.py clean
USE_CUDA=0 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake 2>&1 | tee ~/output.txt
python test/test_profiler.py -v

  • CI

Differential Revision: D27109481

facebook-github-bot (Contributor) commented Mar 17, 2021

💊 CI failures summary and remediations

As of commit 82d4433 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

ilia-cher added a commit that referenced this pull request Mar 17, 2021

A review thread on this hunk from the diff:

    if (it != startEvents.end()) {
      e.setCudaUs(it->second->cudaElapsedUs(e));
    } else {
      TORCH_WARN("Found a pop event without a corresponding push event");
    }
A project member commented:
nit: it might be good to log the event to help debugging. Also, should we setCudaUs to some reasonable value here?

ilia-cher (Contributor, Author) replied:
Will set setCudaUs; also, we don't have names in pop events, so there isn't much to log.
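
For illustration, here is a minimal sketch of the fallback discussed in this thread: warn, then record a zero CUDA duration when no matching push event exists. This is not the merged change; the variables and member functions come from the hunk above, and the zero-microsecond fallback is an assumption about what a "reasonable value" could be.

    // Illustrative sketch only, not the merged code.
    if (it != startEvents.end()) {
      e.setCudaUs(it->second->cudaElapsedUs(e));
    } else {
      // Pop events carry no name, so there is nothing more specific to log.
      TORCH_WARN("Found a pop event without a corresponding push event");
      e.setCudaUs(0); // assumed fallback: report zero CUDA time instead of leaving it unset
    }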

@rohan-varma (Member) left a comment:
LGTM, but the RPC test is failing; I can help take a look at that.

@ilia-cher (Contributor, Author) commented:
Fixed the RPC test issue.

ilia-cher added a commit that referenced this pull request Mar 18, 2021
codecov bot commented Mar 18, 2021

Codecov Report

Merging #54125 (82d4433) into gh/ilia-cher/103/base (2d8795c) will decrease coverage by 0.00%.
The diff coverage is 60.00%.

@@                    Coverage Diff                    @@
##           gh/ilia-cher/103/base   #54125      +/-   ##
=========================================================
- Coverage                  77.47%   77.47%   -0.01%     
=========================================================
  Files                       1889     1889              
  Lines                     184759   184732      -27     
=========================================================
- Hits                      143140   143117      -23     
+ Misses                     41619    41615       -4     

facebook-github-bot commented:
@ilia-cher merged this pull request in 3b1e310.

facebook-github-bot deleted the gh/ilia-cher/103/head branch on March 22, 2021 at 14:16.
rohan-varma added a commit that referenced this pull request Apr 2, 2021
#54125 removed CUDA event creation on each device from the profiler, so #48987 should be resolved (a sketch of the removed per-device pattern is included after this commit message).

To completely prevent the deadlock, this diff makes two further changes:
1) In LegacyProfiler, pass record_cuda=false for the __stop_profile mark event, because it is not a user op and it was resulting in a CUDA event being recorded on the wrong device (for the reason described in point 2).
2) Add torch.cuda.set_device() in tests to ensure that calls to `cudaGetDevice` in the profiler don't return the wrong device.

Note that using the profiler with `use_cuda=True` to profile distributed collectives isn't really an intended use case; see the discussion in #52246. The profiling infrastructure has moved toward torch.profiler and CUPTI for tracing CUDA kernels, and support for distributed collectives there will require further discussion with @ilia-cher, but we should still resolve this deadlock.

Differential Revision: [D27491711](https://our.internmc.facebook.com/intern/diff/D27491711/)

[ghstack-poisoned]
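
For background, the onEachDevice helper that this PR removes made the legacy profiler touch every visible GPU when handling CUDA timing events, which is how profiling could interfere with collectives running on other devices (#48987). The following is a hypothetical sketch of that per-device pattern written against the plain CUDA runtime API; it is not the actual PyTorch source, and the function name is invented for illustration.

    // Hypothetical sketch of the per-device CUDA event pattern removed by this PR.
    #include <cuda_runtime.h>

    void recordEventOnEachDevice() {
      int prevDevice = 0;
      int deviceCount = 0;
      cudaGetDevice(&prevDevice);
      cudaGetDeviceCount(&deviceCount);
      for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);                // switch to every visible device in turn
        cudaEvent_t ev;
        cudaEventCreate(&ev);              // initializes a CUDA context on that device if needed
        cudaEventRecord(ev, /*stream=*/0); // enqueues the event on that device's default stream
        cudaEventDestroy(ev);
      }
      cudaSetDevice(prevDevice);           // restore the caller's device
    }

Roughly, if one of those other devices is busy inside a blocking collective, touching it from the profiler can stall, which is the deadlock scenario the commit message above refers to.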
rohan-varma added a commit that referenced this pull request Apr 2, 2021
rohan-varma added a commit that referenced this pull request Apr 9, 2021
rohan-varma added a commit that referenced this pull request Apr 9, 2021
ilia-cher changed the title from "Remove usage of onEachDevice from legacy profiler" to "[profiler] Remove usage of onEachDevice from legacy profiler" on May 25, 2021.
wubai pushed a commit to wubai/pytorch that referenced this pull request Aug 7, 2023
wubai pushed a commit to wubai/pytorch that referenced this pull request Aug 8, 2023
Labels: cla signed, Merged, oncall: distributed
3 participants