make autocast cache global instead of thread-local #86492

Closed
wants to merge 4 commits into from

Conversation

@vkuzo vkuzo (Contributor) commented Oct 7, 2022

Stack from ghstack (oldest at bottom):

Summary:

There is a memory leak because `torch.clear_autocast_cache()` clears
the autocast cache from the main thread, but autograd can write to
this cache from a background thread, so whatever autograd writes
will leak.

After some offline discussion we decided that a global cache is a
practical way to deal with this, and that the performance impact of the
lock should be negligible.
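
As a rough, illustrative way to sanity-check that claim locally, one could time repeated ops under autocast, since every cast under autocast consults the (now global) cache. The sketch below assumes a CUDA device, and `bench_autocast_mm` is just a throwaway helper name, not an existing utility:

```
# Illustrative micro-benchmark sketch (not part of the PR); assumes a CUDA
# device, and bench_autocast_mm is a throwaway helper name.
import time
import torch

def bench_autocast_mm(iters=200, n=2048):
    w = torch.randn(n, n, device="cuda", requires_grad=True)
    x = torch.randn(n, n, device="cuda")
    with torch.autocast("cuda"):
        for _ in range(10):  # warmup
            torch.mm(x, w)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            # each call looks up the cached fp16 copy of `w`, which is the
            # code path that now goes through the global-cache lock
            torch.mm(x, w)
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

print(f"avg time per autocast mm: {bench_autocast_mm() * 1e6:.1f} us")
```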

Test Plan:

I don't have a local repro of the original issue; I need to look into how to
get one.

A toy example
(https://gist.github.com/vkuzo/0d6318fe7f7cb1c505e370cd5c1a643b)
performs cache clearing as expected on the forward and backward passes.
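
The gist itself is not reproduced here; a minimal sketch of that shape, assuming a CUDA device is available, could look like this:

```
# Minimal sketch in the spirit of the linked gist (not the gist's actual
# contents); assumes a CUDA device.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(32, 1024, device="cuda")

for step in range(3):
    with torch.autocast("cuda"):
        # the fp16 copy of model.weight is stored in the autocast cache
        loss = model(x).float().sum()
        # backward is launched while autocast is active, so casts performed
        # on autograd's worker thread land in the (now global) cache too
        loss.backward()
    # exiting the outermost autocast context clears the cache; it can also
    # be cleared explicitly
    torch.clear_autocast_cache()
    model.zero_grad(set_to_none=True)
    torch.cuda.synchronize()
    print(step, torch.cuda.memory_allocated())
```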

local testing:
```
python test/test_cuda.py -k autocast
python test/test_autocast.py
```

Reviewers:

Subscribers:

Tasks:

Tags:

@pytorch-bot pytorch-bot bot commented Oct 7, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86492

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit 8352e85:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo added a commit that referenced this pull request Oct 7, 2022
ghstack-source-id: c40421a01c4c07aaaccd0055e594ea9db7f74d30
Pull Request resolved: #86492
@vkuzo vkuzo changed the title from "[wip]: make autocast cache global" to "make autocast cache global instead of thread-local" on Oct 24, 2022
vkuzo added a commit that referenced this pull request Oct 24, 2022
ghstack-source-id: fbe1c5397a0b5b61f41541bb5a69d405e6fb1d40
Pull Request resolved: #86492
@vkuzo vkuzo requested a review from ezyang October 24, 2022 22:27
@vkuzo vkuzo added the topic: bug fixes label Oct 24, 2022
@ezyang ezyang (Contributor) commented Oct 24, 2022

Cc @eellison

Test?

@ezyang ezyang (Contributor) commented Oct 24, 2022

I will approve after auditing all uses of the variable on desktop.

@ezyang ezyang (Contributor) left a comment

needs test

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Oct 25, 2022
@vkuzo vkuzo (Contributor, Author) commented Oct 25, 2022

re: testing -

  • any tips on how to get a local repro of the original memory leak? I couldn't figure out how to do it yet.
  • without a local repro, I guess we could expose a binding to query the number of elements in the cache and test that the number is reasonable - would that be valuable enough to justify the binding? IMO probably not, but I'm flexible - what do you think? (A sketch of what such a test could look like follows below.)
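
For illustration, a test built on such a binding might look roughly like the sketch below; `torch._C._autocast_cache_size()` is a hypothetical name for the proposed binding and does not exist in the codebase:

```
# Sketch only: torch._C._autocast_cache_size() is a *hypothetical* binding
# standing in for the proposed introspection API; assumes a CUDA device.
import torch

def test_autocast_cache_is_populated_and_cleared():
    model = torch.nn.Linear(64, 64).cuda()
    x = torch.randn(8, 64, device="cuda")
    with torch.autocast("cuda"):
        loss = model(x).float().sum()
        loss.backward()
        # the cast of model.weight (and, with a global cache, anything the
        # autograd worker thread cached during backward) is visible here
        assert torch._C._autocast_cache_size() > 0
        torch.clear_autocast_cache()
        assert torch._C._autocast_cache_size() == 0
```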

@ezyang ezyang (Contributor) commented Oct 25, 2022

@eellison did you have a repro handy?

@eellison (Contributor) commented:

The test in #86059, combined with removing `torch.autograd.set_multithreading_enabled` here, should repro it: https://github.com/pytorch/pytorch/blob/master/test/test_ops.py#L2006

vkuzo added a commit that referenced this pull request Oct 25, 2022
ghstack-source-id: 4ab8e33f24991918ee4bb00607bebc8accd972ec
Pull Request resolved: #86492
@eellison (Contributor) commented:

That should repro the cache failure - but sorry, we actually need `torch.autograd.set_multithreading_enabled` in that test for other reasons. If you just run that same test without fake tensors, that would give you a test.

vkuzo added a commit that referenced this pull request Oct 28, 2022
ghstack-source-id: 718c3b14ed345bd3862762446010dd4f39f57ffa
Pull Request resolved: #86492
@ezyang ezyang (Contributor) commented Oct 28, 2022

@eellison fwiw I still think you should use tls

@vkuzo vkuzo (Contributor, Author) commented Oct 28, 2022

I still don't have a local repro, so I resorted to just verifying that the cache is in fact global. Let me know if someone has a better test idea.

@eellison (Contributor) commented:

> fwiw I still think you should use tls

Yes, I agree - just somewhat low priority at the moment.

@github-actions bot commented:

This PR needs a label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

For more information, see https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@vkuzo vkuzo added the release notes: autograd label Oct 28, 2022
@vkuzo vkuzo (Contributor, Author) commented Oct 31, 2022

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: The following mandatory check(s) failed (Rule superuser):

Dig deeper by viewing the failures on hud

Details for Dev Infra team · Raised by workflow job

@vkuzo vkuzo (Contributor, Author) commented Oct 31, 2022

@pytorchbot help

@pytorch-bot pytorch-bot bot commented Oct 31, 2022

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'help' (choose from 'merge', 'revert', 'rebase', 'label')

usage: @pytorchbot [-h] {merge,revert,rebase,label} ...

Try @pytorchbot --help for more info.

@vkuzo vkuzo (Contributor, Author) commented Oct 31, 2022

@pytorchbot --help

@pytorch-bot pytorch-bot bot commented Oct 31, 2022

PyTorchBot Help

usage: @pytorchbot [-h] {merge,revert,rebase,label} ...

In order to invoke the bot on your PR, include a line that starts with
@pytorchbot anywhere in a comment. That line will form the command; no
multi-line commands are allowed. 

Example:
    Some extra context, blah blah, wow this PR looks awesome

    @pytorchbot merge

optional arguments:
  -h, --help            Show this help message and exit.

command:
  {merge,revert,rebase,label}
    merge               Merge a PR
    revert              Revert a PR
    rebase              Rebase a PR
    label               Add label to a PR

Merge

usage: @pytorchbot merge [-g | -f MESSAGE | -l] [-r [{viable/strict,master}]]

Merge an accepted PR, subject to the rules in .github/merge_rules.json.
By default, this will wait for all required checks (lint, pull) to succeed before merging.

optional arguments:
  -g, --green           Merge when all status checks running on the PR pass. To add status checks, use labels like `ciflow/trunk`.
  -f MESSAGE, --force MESSAGE
                        Merge without checking anything. This requires a reason for auditing purposes, for example:
                        @pytorchbot merge -f 'Minor update to fix lint. Expecting all PR tests to pass'
  -l, --land-checks     [Deprecated - your PR instead now gets the `ciflow/trunk` label on approval] Merge with land time checks. This will create a new branch with your changes rebased on viable/strict and run a majority of trunk tests _before_ landing to increase trunk reliability and decrease risk of revert. The tests added are: pull, Lint and trunk. Note that periodic is excluded.
  -r [{viable/strict,master}], --rebase [{viable/strict,master}]
                        Rebase the PR to re run checks before merging.  Accepts viable/strict or master as branch options and will default to viable/strict if not specified.

Revert

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Revert a merged PR. This requires that you are a Meta employee.

Example:
  @pytorchbot revert -m="This is breaking tests on trunk. hud.pytorch.org/" -c=nosignal

optional arguments:
  -m MESSAGE, --message MESSAGE
                        The reason you are reverting, will be put in the commit message. Must be longer than 3 words.
  -c {nosignal,ignoredsignal,landrace,weird,ghfirst}, --classification {nosignal,ignoredsignal,landrace,weird,ghfirst}
                        A machine-friendly classification of the revert reason.

Rebase

usage: @pytorchbot rebase [-s | -b BRANCH]

Rebase a PR. Rebasing defaults to the stable viable/strict branch of pytorch.
You, along with any member of the pytorch organization, can rebase your PR.

optional arguments:
  -s, --stable          [DEPRECATED] Rebase onto viable/strict
  -b BRANCH, --branch BRANCH
                        Branch you would like to rebase to

Label

usage: @pytorchbot label labels [labels ...]

Adds label to a PR

positional arguments:
  labels  Labels to add to given Pull Request

@vkuzo vkuzo (Contributor, Author) commented Oct 31, 2022

@pytorchbot merge -f "added labels to fix label check"

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Nov 5, 2022
Pull Request resolved: pytorch#86492
Approved by: https://github.com/ezyang
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Pull Request resolved: pytorch#86492
Approved by: https://github.com/ezyang
@facebook-github-bot facebook-github-bot deleted the gh/vkuzo/521/head branch June 8, 2023 18:58
Labels: ciflow/trunk, Merged, release notes: autograd, topic: bug fixes