make autocast cache global instead of thread-local #86492

Closed
wants to merge 4 commits into from

Conversation

@vkuzo vkuzo (Contributor) commented Oct 7, 2022

Stack from ghstack (oldest at bottom):

Summary:

There is a memory leak because `torch.clear_autocast_cache()` clears
the autocast cache from the main thread, but autograd can write to
this cache from a background thread, so whatever autograd writes
will leak.

After some offline discussion we decided that a global cache is a
practical way to deal with this, and that the performance impact of the
lock should be negligible.
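
As a rough, illustrative way to sanity-check that claim locally, one could time repeated ops under autocast, since every cast under autocast consults the (now global) cache. The sketch below assumes a CUDA device, and `bench_autocast_mm` is just a throwaway helper name, not an existing utility:

```
# Illustrative micro-benchmark sketch (not part of the PR); assumes a CUDA
# device, and bench_autocast_mm is a throwaway helper name.
import time
import torch

def bench_autocast_mm(iters=200, n=2048):
    w = torch.randn(n, n, device="cuda", requires_grad=True)
    x = torch.randn(n, n, device="cuda")
    with torch.autocast("cuda"):
        for _ in range(10):  # warmup
            torch.mm(x, w)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            # each call looks up the cached fp16 copy of `w`, which is the
            # code path that now goes through the global-cache lock
            torch.mm(x, w)
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

print(f"avg time per autocast mm: {bench_autocast_mm() * 1e6:.1f} us")
```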

Test Plan:

I don't have a local repro of the original issue; I need to look into how to
get one.

A toy example
(https://gist.github.com/vkuzo/0d6318fe7f7cb1c505e370cd5c1a643b)
performs cache clearing as expected on the forward and backward passes.
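
The gist itself is not reproduced here; a minimal sketch of that shape, assuming a CUDA device is available, could look like this:

```
# Minimal sketch in the spirit of the linked gist (not the gist's actual
# contents); assumes a CUDA device.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(32, 1024, device="cuda")

for step in range(3):
    with torch.autocast("cuda"):
        # the fp16 copy of model.weight is stored in the autocast cache
        loss = model(x).float().sum()
        # backward is launched while autocast is active, so casts performed
        # on autograd's worker thread land in the (now global) cache too
        loss.backward()
    # exiting the outermost autocast context clears the cache; it can also
    # be cleared explicitly
    torch.clear_autocast_cache()
    model.zero_grad(set_to_none=True)
    torch.cuda.synchronize()
    print(step, torch.cuda.memory_allocated())
```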

local testing:
```
python test/test_cuda.py -k autocast
python test/test_autocast.py
```

Reviewers:

Subscribers:

Tasks:

Tags:

@pytorch-bot pytorch-bot bot commented Oct 7, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86492

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit 8352e85:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo added a commit that referenced this pull request Oct 7, 2022
ghstack-source-id: c40421a01c4c07aaaccd0055e594ea9db7f74d30
Pull Request resolved: #86492
@vkuzo vkuzo changed the title from "[wip]: make autocast cache global" to "make autocast cache global instead of thread-local" on Oct 24, 2022
vkuzo added a commit that referenced this pull request Oct 24, 2022
ghstack-source-id: fbe1c5397a0b5b61f41541bb5a69d405e6fb1d40
Pull Request resolved: #86492
@vkuzo vkuzo requested a review from ezyang October 24, 2022 22:27
@vkuzo vkuzo added the topic: bug fixes label Oct 24, 2022
@ezyang ezyang (Contributor) commented Oct 24, 2022

Cc @eellison

Test?

@ezyang ezyang (Contributor) commented Oct 24, 2022

I will approve after auditing all uses of the variable on desktop.

@ezyang ezyang (Contributor) left a comment

needs test

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Oct 25, 2022
@vkuzo vkuzo (Contributor, Author) commented Oct 25, 2022

re: testing -

  • any tips on how to get a local repro of the original memory leak? I couldn't figure out how to do it yet.
  • without a local repro, I guess we could expose a binding to query the number of elements in the cache and test that the number is reasonable - would that be valuable enough to justify the binding? IMO probably not, but I'm flexible - what do you think? (A sketch of what such a test could look like follows below.)
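
For illustration, a test built on such a binding might look roughly like the sketch below; `torch._C._autocast_cache_size()` is a hypothetical name for the proposed binding and does not exist in the codebase:

```
# Sketch only: torch._C._autocast_cache_size() is a *hypothetical* binding
# standing in for the proposed introspection API; assumes a CUDA device.
import torch

def test_autocast_cache_is_populated_and_cleared():
    model = torch.nn.Linear(64, 64).cuda()
    x = torch.randn(8, 64, device="cuda")
    with torch.autocast("cuda"):
        loss = model(x).float().sum()
        loss.backward()
        # the cast of model.weight (and, with a global cache, anything the
        # autograd worker thread cached during backward) is visible here
        assert torch._C._autocast_cache_size() > 0
        torch.clear_autocast_cache()
        assert torch._C._autocast_cache_size() == 0
```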

@ezyang ezyang (Contributor) commented Oct 25, 2022

@eellison did you have a repro handy?

@eellison (Contributor) commented:

The test in #86059, combined with removing `torch.autograd.set_multithreading_enabled` here, should repro it: https://github.com/pytorch/pytorch/blob/master/test/test_ops.py#L2006

vkuzo added a commit that referenced this pull request Oct 25, 2022
ghstack-source-id: 4ab8e33f24991918ee4bb00607bebc8accd972ec
Pull Request resolved: #86492
@eellison (Contributor) commented:

That should repro the cache failure - but sorry, we actually need `torch.autograd.set_multithreading_enabled` in that test for other reasons. If you just run that same test without fake tensors, that would give you a test.

vkuzo added a commit that referenced this pull request Oct 28, 2022
ghstack-source-id: 718c3b14ed345bd3862762446010dd4f39f57ffa
Pull Request resolved: #86492
@ezyang ezyang (Contributor) commented Oct 28, 2022

@eellison fwiw I still think you should use tls

@vkuzo vkuzo (Contributor, Author) commented Oct 28, 2022

I still don't have a local repro, so I resorted to just verifying that the cache is in fact global. Let me know if someone has a better test idea.

@eellison (Contributor) commented:

> fwiw I still think you should use tls

Yes, I agree - just somewhat low priority at the moment.

@github-actions bot commented:

This PR needs a label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

For more information, see https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@vkuzo vkuzo added the release notes: autograd label Oct 28, 2022
@vkuzo vkuzo (Contributor, Author) commented Oct 31, 2022

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: The following mandatory check(s) failed (Rule superuser):

Dig deeper by viewing the failures on hud

Details for Dev Infra team · Raised by workflow job

@vkuzo vkuzo (Contributor, Author) commented Oct 31, 2022

@pytorchbot help

@pytorch-bot pytorch-bot bot commented Oct 31, 2022

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'help' (choose from 'merge', 'revert', 'rebase', 'label')

usage: @pytorchbot [-h] {merge,revert,rebase,label} ...

Try @pytorchbot --help for more info.

@vkuzo vkuzo (Contributor, Author) commented Oct 31, 2022

@pytorchbot --help

@pytorch-bot pytorch-bot bot commented Oct 31, 2022

PyTorchBot Help

usage: @pytorchbot [-h] {merge,revert,rebase,label} ...

In order to invoke the bot on your PR, include a line that starts with
@pytorchbot anywhere in a comment. That line will form the command; no
multi-line commands are allowed. 

Example:
    Some extra context, blah blah, wow this PR looks awesome

    @pytorchbot merge

optional arguments:
  -h, --help            Show this help message and exit.

command:
  {merge,revert,rebase,label}
    merge               Merge a PR
    revert              Revert a PR
    rebase              Rebase a PR
    label               Add label to a PR

Merge

usage: @pytorchbot merge [-g | -f MESSAGE | -l] [-r [{viable/strict,master}]]

Merge an accepted PR, subject to the rules in .github/merge_rules.json.
By default, this will wait for all required checks (lint, pull) to succeed before merging.

optional arguments:
  -g, --green           Merge when all status checks running on the PR pass. To add status checks, use labels like `ciflow/trunk`.
  -f MESSAGE, --force MESSAGE
                        Merge without checking anything. This requires a reason for auditing purposes, for example:
                        @pytorchbot merge -f 'Minor update to fix lint. Expecting all PR tests to pass'
  -l, --land-checks     [Deprecated - your PR instead now gets the `ciflow/trunk` label on approval] Merge with land time checks. This will create a new branch with your changes rebased on viable/strict and run a majority of trunk tests _before_ landing to increase trunk reliability and decrease risk of revert. The tests added are: pull, Lint and trunk. Note that periodic is excluded.
  -r [{viable/strict,master}], --rebase [{viable/strict,master}]
                        Rebase the PR to re run checks before merging.  Accepts viable/strict or master as branch options and will default to viable/strict if not specified.

Revert

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Revert a merged PR. This requires that you are a Meta employee.

Example:
  @pytorchbot revert -m="This is breaking tests on trunk. hud.pytorch.org/" -c=nosignal

optional arguments:
  -m MESSAGE, --message MESSAGE
                        The reason you are reverting, will be put in the commit message. Must be longer than 3 words.
  -c {nosignal,ignoredsignal,landrace,weird,ghfirst}, --classification {nosignal,ignoredsignal,landrace,weird,ghfirst}
                        A machine-friendly classification of the revert reason.

Rebase

usage: @pytorchbot rebase [-s | -b BRANCH]

Rebase a PR. Rebasing defaults to the stable viable/strict branch of pytorch.
You, along with any member of the pytorch organization, can rebase your PR.

optional arguments:
  -s, --stable          [DEPRECATED] Rebase onto viable/strict
  -b BRANCH, --branch BRANCH
                        Branch you would like to rebase to

Label

usage: @pytorchbot label labels [labels ...]

Adds label to a PR

positional arguments:
  labels  Labels to add to given Pull Request

@vkuzo vkuzo (Contributor, Author) commented Oct 31, 2022

@pytorchbot merge -f "added labels to fix label check"

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Nov 5, 2022
Pull Request resolved: pytorch#86492
Approved by: https://github.com/ezyang
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Pull Request resolved: pytorch#86492
Approved by: https://github.com/ezyang
@facebook-github-bot facebook-github-bot deleted the gh/vkuzo/521/head branch June 8, 2023 18:58
Labels: ciflow/trunk, Merged, release notes: autograd, topic: bug fixes