-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Fix MacOS MP hang in Python-3.12+ #155698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155698
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 Cancelled Job, 1 Unrelated FailureAs of commit 8476c10 with merge base 380e30a ( CANCELLED JOB - The following job was cancelled. Please retry:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
@pytorchbot help |
❌ 🤖 pytorchbot command failed:
Try |
@pytorchbot merge -f "Let's see what will happen" |
Merge startedYour change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
@pytorchbot revert -m "It causes weird flaky failures in MPS and do not upload usage logs anymore" -c weird |
@pytorchbot successfully started a revert job. Check the current status here. |
This reverts commit 2b9d638. Reverted #155698 on behalf of https://github.com/malfet due to It causes weird flaky failures in MPS and do not upload usage logs anymore ([comment](#155698 (comment)))
@malfet your PR has been successfully reverted. |
Instead of `setup-miniconda` - Remove `CONDA_RUN` macro... - Hack the search path in `macos-test.sh` to put both python and python3 aliases first in the path (not sure what other action are messing with path environment variable) - Skip `TestMultiprocessing.test_fs_sharing` as even though it completes, it hangs on the shutdown both in CI and in all local setups I have - Skip `TestCppExtensionOpenRgistration.test_base_device_registration` as it hangs on the shutdown as well Pull Request resolved: pytorch#155698 Approved by: https://github.com/atalman ghstack dependencies: pytorch#155476, pytorch#155493, pytorch#155601, pytorch#155515, pytorch#155697
This reverts commit 2b9d638. Reverted pytorch#155698 on behalf of https://github.com/malfet due to It causes weird flaky failures in MPS and do not upload usage logs anymore ([comment](pytorch#155698 (comment)))
setup-python
from for Mac tests
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 1 jobs have failed, first few of them are: trunk / linux-jammy-cuda12.8-py3.10-gcc11-no-ops / build Details for Dev Infra teamRaised by workflow job |
@pytorchbot merge -i |
Merge startedYour change will be merged while ignoring the following 2 checks: pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu, unstable), trunk / linux-jammy-cuda12.8-py3.10-gcc11-no-ops / build Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
|
||
_RT.__del__ = _noop # type: ignore[attr-defined] | ||
|
||
atexit.register(_leak_RT_at_exit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why patch on exit and not straight away?
This deleter might end up being called before yours?
Stack from ghstack (oldest at bottom):
By leaking resource_tracker destructor (introduced by python/cpython#88887 ) at exit, as at this point handle to child process might no longer be valid
Also, switch CI from using
setup-miniconda
tosetup-python
as an integration test for the fix as all data loader tests will hang otherwiseCONDA_RUN
macro...macos-test.sh
to put both python and python3 aliases first in the path (not sure what other action are messing with path environment variable)Fixes #153050
cc @albanD