
[profiler] Remove usage of onEachDevice from legacy profiler #54125

Closed · wants to merge 6 commits

Conversation

@ilia-cher (Contributor) commented Mar 17, 2021

Stack from ghstack:

Summary:
Fixes #48987

Test Plan:
python setup.py clean
TORCH_CUDA_ARCH_LIST="6.0" USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake 2>&1 | tee ~/output.txt
python test/test_profiler.py -v

python setup.py clean
USE_CUDA=0 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake 2>&1 | tee ~/output.txt
python test/test_profiler.py -v

  • CI

Differential Revision: D27109481

facebook-github-bot (Contributor) commented Mar 17, 2021

💊 CI failures summary and remediations

As of commit 82d4433 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

ilia-cher added a commit that referenced this pull request Mar 17, 2021

A review thread on this hunk from the diff:

    if (it != startEvents.end()) {
      e.setCudaUs(it->second->cudaElapsedUs(e));
    } else {
      TORCH_WARN("Found a pop event without a corresponding push event");
    }
A project member commented:
nit: it might be good to log the event to help debugging. Also, should we setCudaUs to some reasonable value here?

ilia-cher (Contributor, Author) replied:
Will set setCudaUs; also, we don't have names in pop events, so there isn't much to log.
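
For illustration, here is a minimal sketch of the fallback discussed in this thread: warn, then record a zero CUDA duration when no matching push event exists. This is not the merged change; the variables and member functions come from the hunk above, and the zero-microsecond fallback is an assumption about what a "reasonable value" could be.

    // Illustrative sketch only, not the merged code.
    if (it != startEvents.end()) {
      e.setCudaUs(it->second->cudaElapsedUs(e));
    } else {
      // Pop events carry no name, so there is nothing more specific to log.
      TORCH_WARN("Found a pop event without a corresponding push event");
      e.setCudaUs(0); // assumed fallback: report zero CUDA time instead of leaving it unset
    }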

@rohan-varma (Member) left a comment:
LGTM, but the RPC test is failing; I can help take a look at that.

@ilia-cher (Contributor, Author) commented:
Fixed the RPC test issue.

ilia-cher added a commit that referenced this pull request Mar 18, 2021
codecov bot commented Mar 18, 2021

Codecov Report

Merging #54125 (82d4433) into gh/ilia-cher/103/base (2d8795c) will decrease coverage by 0.00%.
The diff coverage is 60.00%.

@@                    Coverage Diff                    @@
##           gh/ilia-cher/103/base   #54125      +/-   ##
=========================================================
- Coverage                  77.47%   77.47%   -0.01%     
=========================================================
  Files                       1889     1889              
  Lines                     184759   184732      -27     
=========================================================
- Hits                      143140   143117      -23     
+ Misses                     41619    41615       -4     

facebook-github-bot commented:
@ilia-cher merged this pull request in 3b1e310.

facebook-github-bot deleted the gh/ilia-cher/103/head branch on March 22, 2021 at 14:16.
rohan-varma added a commit that referenced this pull request Apr 2, 2021
#54125 removed CUDA event creation on each device from the profiler, so #48987 should be resolved (a sketch of the removed per-device pattern is included after this commit message).

To completely prevent the deadlock, this diff makes two further changes:
1) In LegacyProfiler, pass record_cuda=false for the __stop_profile mark event, because it is not a user op and it was resulting in a CUDA event being recorded on the wrong device (for the reason described in point 2).
2) Add torch.cuda.set_device() in tests to ensure that calls to `cudaGetDevice` in the profiler don't return the wrong device.

Note that using the profiler with `use_cuda=True` to profile distributed collectives isn't really an intended use case; see the discussion in #52246. The profiling infrastructure has moved toward torch.profiler and CUPTI for tracing CUDA kernels, and support for distributed collectives there will require further discussion with @ilia-cher, but we should still resolve this deadlock.

Differential Revision: [D27491711](https://our.internmc.facebook.com/intern/diff/D27491711/)

[ghstack-poisoned]
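
For background, the onEachDevice helper that this PR removes made the legacy profiler touch every visible GPU when handling CUDA timing events, which is how profiling could interfere with collectives running on other devices (#48987). The following is a hypothetical sketch of that per-device pattern written against the plain CUDA runtime API; it is not the actual PyTorch source, and the function name is invented for illustration.

    // Hypothetical sketch of the per-device CUDA event pattern removed by this PR.
    #include <cuda_runtime.h>

    void recordEventOnEachDevice() {
      int prevDevice = 0;
      int deviceCount = 0;
      cudaGetDevice(&prevDevice);
      cudaGetDeviceCount(&deviceCount);
      for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);                // switch to every visible device in turn
        cudaEvent_t ev;
        cudaEventCreate(&ev);              // initializes a CUDA context on that device if needed
        cudaEventRecord(ev, /*stream=*/0); // enqueues the event on that device's default stream
        cudaEventDestroy(ev);
      }
      cudaSetDevice(prevDevice);           // restore the caller's device
    }

Roughly, if one of those other devices is busy inside a blocking collective, touching it from the profiler can stall, which is the deadlock scenario the commit message above refers to.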
rohan-varma added a commit that referenced this pull request Apr 2, 2021
rohan-varma added a commit that referenced this pull request Apr 9, 2021
rohan-varma added a commit that referenced this pull request Apr 9, 2021
ilia-cher changed the title from "Remove usage of onEachDevice from legacy profiler" to "[profiler] Remove usage of onEachDevice from legacy profiler" on May 25, 2021.
wubai pushed a commit to wubai/pytorch that referenced this pull request Aug 7, 2023
wubai pushed a commit to wubai/pytorch that referenced this pull request Aug 8, 2023
Labels: cla signed, Merged, oncall: distributed
3 participants