-
Notifications
You must be signed in to change notification settings - Fork 22.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MPS] Fix memory leak in copy_from_mps_ #114197
Conversation
By always calling `[destBuffer release]` when it was allocated. Leak was introduced by #84928 Add regression test. Before the change: ``` % python ../test/test_mps.py -v -k test_copy_cast_no_leak --repeat 10 test_copy_cast_no_leak (__main__.TestMemoryLeak) ... FAIL ====================================================================== FAIL: test_copy_cast_no_leak (__main__.TestMemoryLeak) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 2554, in wrapper method(*args, **kwargs) File "/Users/nshulga/git/pytorch/pytorch/build/../test/test_mps.py", line 1064, in test_copy_cast_no_leak self.assertTrue(driver_before == driver_after, f"Detected {driver_after-driver_before} bytes leak of GPU memory") AssertionError: False is not true : Detected 65536 bytes leak of GPU memory To execute this test, run the following from the base repo dir: python test/test_mps.py -k test_copy_cast_no_leak This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 ---------------------------------------------------------------------- Ran 1 test in 1.102s FAILED (failures=1) ``` After: ``` % python ../test/test_mps.py -k test_copy_cast_no_leak --repeat 10 . ---------------------------------------------------------------------- Ran 1 test in 0.819s OK . ---------------------------------------------------------------------- Ran 1 test in 0.001s OK . ---------------------------------------------------------------------- Ran 1 test in 0.002s OK ... ``` Fixes #114096
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114197
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure)As of commit ba5a4b8 with merge base e4ec554 (): FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
@pytorchbot merge -f "MPS tests are green" |
Merge startedYour change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
This has milestone 2.1.2 but doesn't look like it made the 2.1.2 release? |
I double checked and couldn't find this on https://github.com/orgs/pytorch/projects/40, so I think we indeed missed it for 2.1.2. Fortunately, it will be in 2.2.0 which will be available mid Jan cc @atalman |
@atalman let's debug later how it happen. Have we run the scrip to validate that all milestoned issues/PRs made it into the branch? |
@malfet I looked into this issue, auto-add items to the project was wrongly setup query was something like:
changed it to for 2.2.0:
Will have to make sure to add running github analyze script before generating final RC : https://github.com/pytorch/pytorch/blob/main/RELEASE.md#cutting-a-release-branch-preparations |
@atalman I've meant, do we run https://github.com/pytorch/builder/blob/main/analytics/github_analyze.py to detect commits present in milestone, but missing in the branch? It shows that two PRs are missing from the branch/release:
|
…116476) Formalize process of building and testing final rc. To avoid having missing PRs in the release, similar to this: #114197 Pull Request resolved: #116476 Approved by: https://github.com/huydhn
By always calling
[destBuffer release]
before leaving the scope in which it was allocated.Leak was introduced by #84928
Add regression test.
Before the change:
After:
Fixes #114096