[compiled autograd] fix flaky tests due to torch.cuda.memory_allocated() != 0 #133733

xmfan · 2024-08-16T22:48:24Z

Stack from ghstack (oldest at bottom):

-> [compiled autograd] fix flaky tests due to torch.cuda.memory_allocated() != 0 #133733

FIXES #123949 #124376
torch.cuda.memory_allocated() returns the amount of memory allocated in the current process, so a nonzero value means that another test didn't properly clean up after itself. I'm keeping the memory check and isolating these tests in a subprocess, since we don't have a good way to test the activation refcount directly.

e.g. https://github.com/pytorch/pytorch/runs/28838386083

_______________ TestCompiledAutograd.test_free_activation_memory _______________
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/test/inductor/test_compiled_autograd.py", line 1892, in test_free_activation_memory
    self.assertTrue(torch.cuda.memory_allocated() == 0)
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true
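
The subprocess isolation described above can be sketched roughly as follows. This is a minimal illustration, not the change made in this PR: `CHILD_SCRIPT` and `run_isolated` are hypothetical names, and the child script falls back to a no-op when torch or CUDA is unavailable. The idea is that a fresh interpreter starts with no leftover allocations, so the `torch.cuda.memory_allocated() == 0` check no longer depends on what earlier tests in the parent process left behind.

```python
import subprocess
import sys
import textwrap

# Hypothetical child script: it runs in a brand-new interpreter, so allocator
# state left behind by earlier tests in the parent process cannot leak in.
CHILD_SCRIPT = textwrap.dedent(
    """
    try:
        import torch
    except ImportError:
        torch = None  # assumption: skip the GPU check when torch is missing

    if torch is not None and torch.cuda.is_available():
        # In a fresh process nothing has been allocated on the GPU yet.
        assert torch.cuda.memory_allocated() == 0
        # ... the test body (run the model, free activations) would go here,
        # followed by the same memory check ...
    print("ok")
    """
)


def run_isolated(script: str) -> subprocess.CompletedProcess:
    """Run `script` in a separate Python process and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        timeout=300,
    )


result = run_isolated(CHILD_SCRIPT)
```

A nonzero `result.returncode` would indicate that the assertion inside the child failed; `result.stderr` then carries the child's traceback for debugging.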

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

pytorch-bot · 2024-08-16T22:48:28Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133733

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Unrelated Failures

As of commit 8848099 with merge base e7b870c:

NEW FAILURE - The following job has failed:

pull / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, amz2023.linux.2xlarge) (gh)
Process completed with exit code 2.

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable) (gh) (similar failure)
dynamo/test_view.py::ViewTests::test_view_to_2d
trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable) (gh) (similar failure)
dynamo/test_repros.py::ReproTests::test_addr_alpha_beta_out
trunk / macos-py3-arm64 / test (default, 3, 3, macos-m1-stable) (gh) (similar failure)
inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_argmax_argmin3_dynamic_shapes_cpu

This comment was automatically generated by Dr. CI and updates every 15 minutes.

xmfan · 2024-08-18T03:20:28Z

@pytorchbot -i merge

pytorch-bot · 2024-08-18T03:20:31Z

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: unrecognized arguments: -i

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.

xmfan · 2024-08-18T03:20:38Z

@pytorchbot merge -i

pytorchmergebot · 2024-08-18T03:22:35Z

Merge started

Your change will be merged while ignoring the following 1 checks: pull / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, amz2023.linux.2xlarge)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

[compiled autograd] fix flaky tests due to torch.cuda.memory_allocated() != 0 [ghstack-poisoned]

8848099

pytorch-bot bot added the module: inductor and topic: not user facing (topic category) labels Aug 16, 2024

xmfan added a commit that referenced this pull request Aug 16, 2024

[compiled autograd] fix flaky tests due to torch.cuda.memory_allocated() != 0

0d2221a

ghstack-source-id: e135563 Pull Request resolved: #133733

xmfan requested review from eellison and jansel August 16, 2024 22:52

jansel approved these changes Aug 17, 2024

View reviewed changes

xmfan added module: dynamo release notes: dynamo and removed module: dynamo labels Aug 18, 2024

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 18, 2024

pytorchmergebot added the merging label Aug 18, 2024

pytorchmergebot added the Merged label Aug 18, 2024

pytorchmergebot closed this in d717df2 Aug 18, 2024

pytorchmergebot removed the merging label Aug 18, 2024

github-actions bot deleted the gh/xmfan/80/head branch September 23, 2024 02:07

3 participants