Fix memory leak in _LRScheduler.step() #85602
The change sounds OK, but the CI failures are real, right?
Thanks for your comments. I will fix the CI.
/easycla
As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details. This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.
@albanD There are some updates to try to fix the tests. Could you please take another look?
Thanks!
test/test_optim.py (outdated)
    import gc
    gc.collect()
    run()
    garbage = gc.collect()
You don't want the collect here, I think?
The whole point here is not to create ref cycles, right?
Yes, if there are any ref cycles created in run(), the second gc.collect() would return non-zero.
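To illustrate the check described above, here is a minimal sketch using only the standard library. `Node` and `run_with_cycle` are hypothetical stand-ins, not code from this PR. It relies on `gc.collect()` returning the number of unreachable objects it found, which covers any unreachable object in the process, not only those created by the function under test.

```python
import gc


class Node:
    """Hypothetical stand-in for any object that can hold a reference."""

    def __init__(self):
        self.other = None


def run_with_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a and b now keep each other alive


gc.disable()  # keep the automatic collector from interfering with the count
try:
    gc.collect()  # clear out garbage that existed before the call
    run_with_cycle()
    found = gc.collect()  # number of unreachable objects found by this pass
finally:
    gc.enable()  # restore

print(found)  # non-zero, because the cycle could only be freed by the gc
```

If `run_with_cycle` created no cycle, reference counting would free both objects on return and the second `gc.collect()` would have nothing from that function to report.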
My bad, I was still under the idea of using weakref and didn't read the update carefully. Sounds good!
@albanD The CI needs to be triggered by a maintainer since I am a first-time contributor. Could you take a look?
Looks good, thanks.
@pytorchbot merge -g
@pytorchbot successfully started a merge job. Check the current status here.
@pytorchbot revert -m "newly added test is flaky" -c nosignal
For more information on the flaky test, see #86413
@pytorchbot successfully started a revert job. Check the current status here.
@KinglittleQ your PR has been successfully reverted.
This reverts commit eb32330. Reverted #85602 on behalf of https://github.com/albanD due to newly added test is flaky
@KinglittleQ My best guess about the flakiness here is that if any other object happens to be ready for collection when you run gc.collect(), the test fails even though we didn't create a cycle in the function. Could you update the test to use weakref again, so that we make sure that while the gc is disabled, the weakref remains alive after the function, and that after running the gc manually the weakref is gone?
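The weakref mechanics described in this comment can be sketched as follows. `Holder` is a hypothetical class with a deliberate self-reference, not the scheduler from this PR; the sketch only shows how a weak reference behaves for an object caught in a cycle.

```python
import gc
import weakref


class Holder:
    """Hypothetical object that deliberately references itself."""

    def __init__(self):
        self.me = self  # self-reference cycle


def run():
    obj = Holder()
    return weakref.ref(obj)  # a weak reference does not keep obj alive


gc.disable()
try:
    ref = run()
    # Reference counting alone cannot free an object in a cycle,
    # so the weakref is still live while the gc is disabled.
    alive_before = ref() is not None
    gc.collect()  # a manual collection still works while the gc is disabled
    alive_after = ref() is not None
finally:
    gc.enable()  # restore

print(alive_before, alive_after)
```

Because the check follows one specific object through its weakref, unrelated garbage elsewhere in the process does not affect the result.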
…Q/pytorch into fix-memoryleak-in-lrscheduler
@albanD The test has been updated now.
    gc.disable()
    ref = run()
    assert ref() is None
    gc.enable()  # restore
The scheduler object should be freed after run() if there are no ref cycles.
This test ensures that the weakref is dead even while the gc is disabled (the gc would only be needed to collect objects that are part of ref cycles).
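A self-contained sketch of this test pattern is below. `FakeScheduler` is a hypothetical stand-in with no cycles, not `torch.optim.lr_scheduler._LRScheduler`, and the sketch assumes CPython, where reference counting frees an object as soon as its last strong reference goes away.

```python
import gc
import weakref


class FakeScheduler:
    """Hypothetical cycle-free stand-in for a scheduler object."""

    def __init__(self):
        self.last_lr = [0.1]

    def step(self):
        # Plain data update; nothing stored here refers back to self.
        self.last_lr = [lr * 0.9 for lr in self.last_lr]


def run():
    scheduler = FakeScheduler()
    scheduler.step()
    return weakref.ref(scheduler)  # the only strong reference dies on return


gc.disable()
try:
    ref = run()
    freed = ref() is None  # True only if no cycle kept the object alive
finally:
    gc.enable()  # restore

print(freed)
```

Wrapping the check in `try`/`finally` is a design choice of this sketch: it re-enables the gc even if the assertion in a real test fails.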
Perfect! Thanks!
@pytorchbot merge -g
Merge started
Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Summary: Fixes #85410
This diff removed the cyclic references in `_LRScheduler.step()`.
Pull Request resolved: #85602
Approved by: https://github.com/albanD
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/b3fdb02fb25508d9c61d70b594f8a7fac3b2a365
Reviewed By: seemethere
Differential Revision: D40197030
Pulled By: seemethere
fbshipit-source-id: ed4be542b6f19bc1030d0ae74b40994a055cff29
Fixes #85410
This diff removed the cyclic references in _LRScheduler.step().