
Conversation

@walterddr (Contributor) commented Jan 21, 2021

This is a follow-up on #49869.

Previously, CUDA early termination only happened for generic test classes that extend from DeviceTypeTestBase. JIT test cases, which extend from common_utils.TestCase, could not benefit from the early termination.

This change moves the early termination logic into the common_utils.TestCase class:

  • All tests extending common_utils.TestCase now terminate early if a CUDA assert occurs.
  • TestCases that extend common_device_type.DeviceTypeTestBase still only call torch.cuda.synchronize() when a RuntimeError is thrown.
  • TestCases that extend common_utils.TestCase always synchronize CUDA after each test, regardless of whether the test uses the GPU, as long as torch.cuda.is_initialized() returns true (see the sketch after this list).
  • This behavior is disabled in common_distributed.py.
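
For illustration, a minimal sketch of what the check added to common_utils.TestCase amounts to (the hook, the flag name, and the exact early-termination mechanism here are simplified assumptions, not the literal implementation):

```python
import unittest

import torch


class TestCase(unittest.TestCase):
    # Illustrative flag: subclasses that must not synchronize (e.g. the
    # distributed tests) could flip this off. The name is hypothetical.
    _check_cuda_assert_on_teardown = True

    def tearDown(self):
        # Only synchronize when a CUDA context already exists, so CPU-only
        # runs and tests that never touch the GPU are unaffected.
        if self._check_cuda_assert_on_teardown and torch.cuda.is_initialized():
            try:
                torch.cuda.synchronize()
            except RuntimeError:
                # A device-side assert poisons the CUDA context; running the
                # remaining CUDA tests in this process is pointless, so the
                # real logic stops the test run early at this point.
                raise
```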

@facebook-github-bot added the cla signed and oncall: distributed labels on Jan 21, 2021
@facebook-github-bot (Contributor) commented Jan 21, 2021

💊 CI failures summary and remediations

As of commit ecaec43 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

@walterddr force-pushed the early_terminate_cuda_jit branch 2 times, most recently from f9fa8cd to cc7205b on January 21, 2021 22:02
codecov bot commented Jan 22, 2021

Codecov Report

Merging #50914 (5ac57b4) into master (d5a2429) will increase coverage by 0.13%.
The diff coverage is 82.35%.

@@            Coverage Diff             @@
##           master   #50914      +/-   ##
==========================================
+ Coverage   80.77%   80.90%   +0.13%     
==========================================
  Files        1952     1924      -28     
  Lines      213967   210016    -3951     
==========================================
- Hits       172827   169921    -2906     
+ Misses      41140    40095    -1045     

@walterddr marked this pull request as ready for review on January 22, 2021 15:33
@facebook-github-bot (Contributor) left a comment

@walterddr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@walterddr (Contributor, Author) commented Jan 22, 2021

This change shouldn't affect the behavior of device-generic test cases, so

PYTORCH_TEST_WITH_SLOW=1 python test/test_testing.py -k test_cuda_assert_should_stop -v

should still pass.

@walterddr force-pushed the early_terminate_cuda_jit branch 3 times, most recently from a749d9e to d9275ac on January 22, 2021 22:13
@walterddr marked this pull request as draft on January 23, 2021 15:02
@walterddr force-pushed the early_terminate_cuda_jit branch from f673a9e to 612da61 on January 24, 2021 00:43
@mruberry (Collaborator)

> For TestCases that extend common_utils.TestCase, regardless of whether a test case uses the GPU, CUDA will always be synchronized as long as torch.cuda.is_available() returns true.

How much does this increase test time?

@mruberry (Collaborator) left a comment

Overall looks like a smart generalization of the previous approach. I have a question and a few small inline comments about the draft.

@walterddr (Contributor, Author)

Thanks @mruberry for the suggestion. I will modify the comments.

I converted it back to a draft because I am still trying to avoid blindly running torch.cuda.synchronize() for every test. I think a Python descriptor might be a good way to dynamically determine when to run the CUDA sync. If there's a better solution, please comment and let me know :-)
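
As a rough illustration of the descriptor idea (all names here are hypothetical; this is not code from the PR):

```python
import unittest

import torch


class CudaSyncNeeded:
    """Hypothetical non-data descriptor: the decision to synchronize is made
    lazily, at attribute-access time, so tests that never touch CUDA pay
    nothing for it."""

    def __get__(self, instance, owner):
        return torch.cuda.is_available() and torch.cuda.is_initialized()


class ExampleTestCase(unittest.TestCase):
    _sync_cuda_on_teardown = CudaSyncNeeded()

    def tearDown(self):
        # Evaluates the descriptor at teardown time for this instance.
        if self._sync_cuda_on_teardown:
            torch.cuda.synchronize()
```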

@mruberry (Collaborator)

> Thanks @mruberry for the suggestion. I will modify the comments.
>
> I converted it back to a draft because I am still trying to avoid blindly running torch.cuda.synchronize() for every test. I think a Python descriptor might be a good way to dynamically determine when to run the CUDA sync. If there's a better solution, please comment and let me know :-)

I'm not sure there is a good solution short of converting the test suite to use the device generic test framework properly and inheriting the previous fix.

@walterddr (Contributor, Author) commented Jan 25, 2021

I see. In that case I will do one final round of profiling to check the overhead in test time, and if it looks good I will enable this first and then figure out how to make it more generic.
Since we are mostly on the device-generic test framework now, the only exceptions I saw are JIT and distributed, which both have their own common_*.py utilities. It might be easier to alter JITCommonTestCase and MultiProcessTestCase to do a smarter CUDA sync.

@ngimel (Collaborator) commented Jan 26, 2021

You can use torch.cuda.is_initialized() instead of torch.cuda.is_available(), hopefully that won't always be true.
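
For example, a minimal sketch of the difference between the two guards:

```python
import torch

# is_available() is True on any machine with a usable GPU, even if the test
# process never touched CUDA; is_initialized() only becomes True once a CUDA
# context has actually been created, so it is the cheaper, more precise guard.
if torch.cuda.is_initialized():
    torch.cuda.synchronize()
```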

@walterddr force-pushed the early_terminate_cuda_jit branch from 5cda284 to 6bc9dc8 on January 26, 2021 17:26
@facebook-github-bot (Contributor) left a comment

@walterddr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@walterddr marked this pull request as ready for review on January 28, 2021 21:50
@walterddr changed the title from "Early terminate CUDA on common TestCases as well" to "Early terminate CUDA on common_utils TestCases" on Jan 28, 2021
@mruberry (Collaborator)

How much of a perf impact does this have on test builds with CUDA?

@walterddr (Contributor, Author)

Based on a quick eyeball of CircleCI, I would say < 2% on CI jobs compared with master.

@mruberry (Collaborator)

> Based on a quick eyeball of CircleCI, I would say < 2% on CI jobs compared with master.

OK. The fix looks correct to me. Whether it's worth the perf impact is for @malfet to decide, though.

A Contributor commented:

Hmm, an additional torch.cuda.synchronize() after every test would be quite expensive.
Can you measure the slowdown?
Also, have you checked whether exposing cudaGetLastError to the Python runtime would achieve the same thing but be much faster?
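
For reference, a hypothetical sketch of what querying cudaGetLastError directly could look like, done here with ctypes (the library name, and whether this catches an asynchronous device assert without a sync, are assumptions; this is not an API torch exposes):

```python
import ctypes

# Hypothetical: load the CUDA runtime directly. The exact library name
# (e.g. libcudart.so.11.0) depends on the installed CUDA toolkit.
libcudart = ctypes.CDLL("libcudart.so")


def cuda_last_error() -> int:
    # cudaGetLastError() returns (and clears) the last error recorded by the
    # CUDA runtime as an integer error code; 0 means cudaSuccess.
    return libcudart.cudaGetLastError()


# Caveat: kernel launches are asynchronous, so a device-side assert may only
# be reported after the failing kernel has actually completed; without a
# synchronize this check can race with the kernel.
if cuda_last_error() != 0:
    print("CUDA device-side error detected, terminating test run early")
```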

A Collaborator commented:

torch.cuda.synchronize, if nothing is running on the GPU, takes ~1 us, pretty much like any other PyTorch operation called by the test. CUDA tests are expected to do their own synchronization before this call anyway, in order to check their results.

The PR author commented:

Yeah, I actually ran some Scuba queries to measure the slowdown (this PR against the base master commit). The result is not significant on the cudnn7 CI test jobs, and compared against master commits around that time it is actually sometimes faster.

It would be handy to have @samestep's test-time reporting tool here.

@walterddr (Contributor, Author) commented Feb 3, 2021

Ping on this -- it looks like the latest master failure can be fixed by this PR: https://app.circleci.com/pipelines/github/pytorch/pytorch/268772/workflows/0d84bcf6-8228-4e94-825e-8420270b8409/jobs/10635515/tests#failed-test-0 (test_optim.py does not use the device-generic test case class).

I will try out Sam's #50171 and report the test time increase here

@walterddr (Contributor, Author) commented Feb 8, 2021

Rebased on Sam's reporting diff and the result is promising (#51876).
I guess it is consistent with #49023 (comment):

> • All CUDA tests should already be designed to have an implicit synchronization at the end, when CUDA tensors are copied to the host to be compared with CPU tensors or printed.

The test jobs running on GPU machines showed minimal impact (<2%):
https://app.circleci.com/pipelines/github/pytorch/pytorch/270960/workflows/79c6c620-7789-4fcd-b25f-9c22b3f636d7/jobs/10759657 (-20.47s)
https://app.circleci.com/pipelines/github/pytorch/pytorch/270960/workflows/79c6c620-7789-4fcd-b25f-9c22b3f636d7/jobs/10759659 (+3.63s)
https://app.circleci.com/pipelines/github/pytorch/pytorch/270960/workflows/79c6c620-7789-4fcd-b25f-9c22b3f636d7/jobs/10759660 (+20.97s)
https://app.circleci.com/pipelines/github/pytorch/pytorch/270960/workflows/79c6c620-7789-4fcd-b25f-9c22b3f636d7/jobs/10759661 (+7.18s)
https://app.circleci.com/pipelines/github/pytorch/pytorch/270960/workflows/79c6c620-7789-4fcd-b25f-9c22b3f636d7/jobs/10759662 (+0.12s)
https://app.circleci.com/pipelines/github/pytorch/pytorch/270960/workflows/79c6c620-7789-4fcd-b25f-9c22b3f636d7/jobs/10759663 (+10.35s)

Differences in the following are relatively large (2%-10%), but these jobs also have relatively large variance, and their hosts don't actually have a GPU, so the differences are most likely not related to this PR.
https://app.circleci.com/pipelines/github/pytorch/pytorch/270960/workflows/79c6c620-7789-4fcd-b25f-9c22b3f636d7/jobs/10759656 (-109.77s)
https://app.circleci.com/pipelines/github/pytorch/pytorch/270960/workflows/79c6c620-7789-4fcd-b25f-9c22b3f636d7/jobs/10759658 (+584.90s)
https://app.circleci.com/pipelines/github/pytorch/pytorch/270960/workflows/79c6c620-7789-4fcd-b25f-9c22b3f636d7/jobs/10759664 (+533.23s)
https://app.circleci.com/pipelines/github/pytorch/pytorch/270960/workflows/79c6c620-7789-4fcd-b25f-9c22b3f636d7/jobs/10759665 (+225.92s)

@facebook-github-bot (Contributor) left a comment

@walterddr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@walterddr force-pushed the early_terminate_cuda_jit branch from 5ac57b4 to fa7836d on February 8, 2021 23:10
@facebook-github-bot (Contributor) left a comment

@walterddr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Rong Rong and others added 5 commits February 9, 2021 10:58
@walterddr force-pushed the early_terminate_cuda_jit branch from fa7836d to ecaec43 on February 9, 2021 18:59
@facebook-github-bot (Contributor) left a comment

@walterddr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

This pull request has been reverted by 9f1f563.

@walterddr (Contributor, Author)

> This change shouldn't affect the behavior of device-generic test cases, so
>
> PYTORCH_TEST_WITH_SLOW=1 python test/test_testing.py -k test_cuda_assert_should_stop -v
>
> should still pass.

The culprit test that failed was actually named test_cuda_assert_should_not_stop.

@facebook-github-bot (Contributor)

@walterddr merged this pull request in c1b7ca8.

facebook-github-bot pushed a commit that referenced this pull request Feb 12, 2021
Summary:
Take 2 of #50914
This change moves the early termination logic into common_utils.TestCase class.

Pull Request resolved: #52126

Test Plan: CI with ci-all tag

Reviewed By: malfet

Differential Revision: D26391762

Pulled By: walterddr

fbshipit-source-id: a149ecc47ccda7f2795e107fb95915506ae060b4
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
Summary:
This is a follow-up on pytorch#49869.

Previously, CUDA early termination only happened for generic test classes that extend from `DeviceTypeTestBase`. JIT test cases, which extend from common_utils.TestCase, could not benefit from the early termination.

This change moves the early termination logic into the common_utils.TestCase class.
- All tests extending common_utils.TestCase now terminate early if a CUDA assert occurs.
- TestCases that extend common_device_type.DeviceTypeTestBase still only call torch.cuda.synchronize() when a RuntimeError is thrown.
- TestCases that extend common_utils.TestCase always synchronize CUDA after each test, regardless of whether the test uses the GPU, as long as `torch.cuda.is_initialized()` returns true.
- This behavior is disabled in common_distributed.py.

Pull Request resolved: pytorch#50914

Reviewed By: malfet

Differential Revision: D26019289

Pulled By: walterddr

fbshipit-source-id: ddc7c1c0d00db4d073a6c8bc5b7733637a7e77d1
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
Summary:
Take 2 of pytorch#50914
This change moves the early termination logic into common_utils.TestCase class.

Pull Request resolved: pytorch#52126

Test Plan: CI with ci-all tag

Reviewed By: malfet

Differential Revision: D26391762

Pulled By: walterddr

fbshipit-source-id: a149ecc47ccda7f2795e107fb95915506ae060b4

Labels

cla signed, Merged, oncall: distributed, Reverted
