[profiler] Remove usage of onEachDevice from legacy profiler #54125
Conversation
Summary: Fixes #48987

Test Plan:
python setup.py clean
TORCH_CUDA_ARCH_LIST="6.0" USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake 2>&1 | tee ~/output.txt
python test/test_profiler.py -v
python setup.py clean
USE_CUDA=0 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake 2>&1 | tee ~/output.txt
python test/test_profiler.py -v
+ CI

[ghstack-poisoned]
💊 CI failures summary and remediations: as of commit 82d4433 (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 (This comment was automatically generated by Dr. CI.)
if (it != startEvents.end()) {
  e.setCudaUs(it->second->cudaElapsedUs(e));
} else {
  TORCH_WARN("Found a pop event without a corresponding push event");
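The matching logic under review can be sketched in plain Python. This is an illustrative model only, not the real profiler code: `assign_cuda_times`, the event dicts, and the `cuda_ts_us` field are made-up stand-ins for `startEvents`, the profiler's event objects, and `cudaElapsedUs`, and the zero fallback reflects the review nit rather than confirmed behavior.

```python
import warnings

def assign_cuda_times(events):
    """events: list of dicts with keys 'kind' ('push' or 'pop'),
    'id' (correlates a pop with its push), and 'cuda_ts_us' (a timestamp).

    Each pop is matched to its push by 'id'; an unmatched pop triggers a
    warning and falls back to an elapsed time of 0.
    """
    start_events = {}
    for e in events:
        if e["kind"] == "push":
            start_events[e["id"]] = e
        else:
            start = start_events.pop(e["id"], None)
            if start is not None:
                # analogue of e.setCudaUs(it->second->cudaElapsedUs(e))
                e["cuda_us"] = e["cuda_ts_us"] - start["cuda_ts_us"]
            else:
                warnings.warn(
                    "Found a pop event without a corresponding push event")
                e["cuda_us"] = 0  # fallback default, per the review nit
    return events
```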
nit: might be good to log the event to help with debugging. Also, should we setCudaUs to some reasonable value here?
will set setCudaUs to a default value; also, we don't have names in pop events
LGTM, but the RPC test is failing; I can help take a look at that.
fixed the RPC test issue
Codecov Report
@@ Coverage Diff @@
## gh/ilia-cher/103/base #54125 +/- ##
=========================================================
- Coverage 77.47% 77.47% -0.01%
=========================================================
Files 1889 1889
Lines 184759 184732 -27
=========================================================
- Hits 143140 143117 -23
+ Misses 41619 41615 -4
@ilia-cher merged this pull request in 3b1e310.
#54125 removed CUDA event creation on each device from the profiler, so #48987 should be resolved. To completely prevent the deadlock, we also make two further changes in this diff:
1) In LegacyProfiler, pass record_cuda=false for the __stop_profile mark event, because it is not a user op and was resulting in a CUDA event being recorded on the wrong device (due to the reason below).
2) Add torch.cuda.set_device() in tests to ensure that calls to `cudaGetDevice` in the profiler don't return the wrong device.
Note that using the profiler with `use_cuda=True` to profile distributed collectives isn't really an intended use case; see the discussion in #52246. The profiling infrastructure has moved to primarily encourage the use of torch.profiler and CUPTI to trace CUDA kernels; support for distributed collectives there will require further discussion with @ilia-cher, but we should still resolve this deadlock.
Differential Revision: [D27491711](https://our.internmc.facebook.com/intern/diff/D27491711/)
[ghstack-poisoned]
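The wrong-device scenario described above can be modeled in plain Python. Everything here is an illustrative stand-in, not the real PyTorch or CUDA API: `_current_device`, `set_device`, and `record_event` merely mimic the per-thread current-device state, torch.cuda.set_device, and the profiler's decision of where to record an event.

```python
# Illustrative model only: these names are hypothetical stand-ins.
_current_device = 0  # mimics CUDA's default current device for a thread

def set_device(device):
    """Stand-in for torch.cuda.set_device(): changes the current device."""
    global _current_device
    _current_device = device

def record_event(record_cuda=True):
    """Stand-in for the profiler recording a mark event.

    With record_cuda=False (as for __stop_profile, per change 1) no CUDA
    event is recorded at all. Otherwise the event lands on whatever the
    *current* device is, which is the wrong device if the test never
    called set_device (hence change 2).
    """
    if not record_cuda:
        return None
    return {"device": _current_device}
```

In this model, a rank whose collectives run on device 1 but which never calls `set_device(1)` would record its events on device 0, illustrating why the tests add the explicit set_device call.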
Summary:
Pull Request resolved: pytorch#54125
Fixes pytorch#48987

Test Plan:
python setup.py clean
TORCH_CUDA_ARCH_LIST="6.0" USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake 2>&1 | tee ~/output.txt
python test/test_profiler.py -v
python setup.py clean
USE_CUDA=0 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake 2>&1 | tee ~/output.txt
python test/test_profiler.py -v
+ CI

Reviewed By: rohan-varma
Differential Revision: D27109481
Pulled By: ilia-cher
fbshipit-source-id: 3fba8bc55deafeed1ab4680b311e927f40eaf99c
Stack from ghstack:
Summary:
Fixes #48987
Test Plan:
python setup.py clean
TORCH_CUDA_ARCH_LIST="6.0" USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake 2>&1 | tee ~/output.txt
python test/test_profiler.py -v
python setup.py clean
USE_CUDA=0 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake 2>&1 | tee ~/output.txt
python test/test_profiler.py -v
Differential Revision: D27109481