
@malfet (Contributor) commented Jun 11, 2025

Stack from ghstack (oldest at bottom):

Fix the hang by leaking the `resource_tracker` destructor (introduced by python/cpython#88887) at exit, since at that point the handle to the child process might no longer be valid.

Also, switch CI from `setup-miniconda` to `setup-python` as an integration test for the fix, since all data loader tests would hang otherwise.

  • Remove `CONDA_RUN` macro...
  • Hack the search path in `macos-test.sh` to put both `python` and `python3` aliases first in the path (not sure which other actions are messing with the `PATH` environment variable)

Fixes #153050

cc @albanD

[ghstack-poisoned]

pytorch-bot bot commented Jun 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155698

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Cancelled Job, 1 Unrelated Failure

As of commit 8476c10 with merge base 380e30a:

CANCELLED JOB - The following job was cancelled. Please retry:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the `release notes: releng` label Jun 11, 2025
@malfet added the `topic: not user facing` and `ciflow/trunk` labels Jun 11, 2025
@malfet (Contributor, Author) commented Jun 11, 2025

@pytorchbot help


pytorch-bot bot commented Jun 11, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'help' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'cherry-pick', 'close')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.

[ghstack-poisoned]
malfet added a commit that referenced this pull request Jun 12, 2025
Instead of `setup-miniconda`

ghstack-source-id: 5c9d78e
Pull Request resolved: #155698
@malfet (Contributor, Author) commented Jun 12, 2025

@pytorchbot merge -f "Let's see what will happen"

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@malfet (Contributor, Author) commented Jun 12, 2025

@pytorchbot revert -m "It causes weird flaky failures in MPS and do not upload usage logs anymore" -c weird

@pytorchmergebot (Collaborator)

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Jun 12, 2025
This reverts commit 2b9d638.

Reverted #155698 on behalf of https://github.com/malfet due to It causes weird flaky failures in MPS and do not upload usage logs anymore ([comment](#155698 (comment)))
@pytorchmergebot (Collaborator)

@malfet your PR has been successfully reverted.

@pytorchmergebot added the `Reverted` and `ci-no-td` labels Jun 12, 2025
thatgeeman pushed a commit to thatgeeman/pytorch-docathon that referenced this pull request Jun 15, 2025
Instead of `setup-miniconda`
- Remove `CONDA_RUN` macro...
- Hack the search path in `macos-test.sh` to put both `python` and `python3` aliases first in the path (not sure which other actions are messing with the `PATH` environment variable)
- Skip `TestMultiprocessing.test_fs_sharing` because, even though it completes, it hangs on shutdown both in CI and in all local setups I have
- Skip `TestCppExtensionOpenRegistration.test_base_device_registration` as it also hangs on shutdown
Pull Request resolved: pytorch#155698
Approved by: https://github.com/atalman
ghstack dependencies: pytorch#155476, pytorch#155493, pytorch#155601, pytorch#155515, pytorch#155697
thatgeeman pushed a commit to thatgeeman/pytorch-docathon that referenced this pull request Jun 15, 2025
This reverts commit 2b9d638.

Reverted pytorch#155698 on behalf of https://github.com/malfet due to It causes weird flaky failures in MPS and do not upload usage logs anymore ([comment](pytorch#155698 (comment)))
[ghstack-poisoned]
[ghstack-poisoned]
@malfet added the `topic: bug fixes` and `module: python frontend` labels and removed the `topic: not user facing` label Jun 24, 2025
[ghstack-poisoned]
malfet added a commit that referenced this pull request Jun 24, 2025
Instead of `setup-miniconda`

ghstack-source-id: 15bc9dd
Pull Request resolved: #155698
@malfet changed the title from "[CI] Use setup-python from for Mac tests" to "Fix MacOS MP hang in Python-3.12+" Jun 24, 2025
@malfet (Contributor, Author) commented Jun 24, 2025

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-cuda12.8-py3.10-gcc11-no-ops / build

Details for Dev Infra team Raised by workflow job

@malfet (Contributor, Author) commented Jun 24, 2025

@pytorchbot merge -i

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged while ignoring the following 2 checks: pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu, unstable), trunk / linux-jammy-cuda12.8-py3.10-gcc11-no-ops / build

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here


_RT.__del__ = _noop # type: ignore[attr-defined]

atexit.register(_leak_RT_at_exit)
Collaborator


Why patch on exit and not straight away?
This deleter might end up being called before yours?
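The following self-contained illustration bears on the concern raised above. It uses a hypothetical `Tracker` class rather than the real resource tracker. Reassigning `__del__` on a class also affects instances created before the patch, but it cannot undo a destructor that has already run. The sketch relies on CPython's reference counting, which finalizes an object as soon as its last reference is dropped:

```python
def _skip_cleanup(self) -> None:
    """Hypothetical replacement destructor that intentionally does nothing."""


def demo() -> list[str]:
    calls: list[str] = []

    class Tracker:
        """Hypothetical stand-in for a class whose destructor performs cleanup."""

        def __del__(self) -> None:
            calls.append("cleanup")

    early = Tracker()
    del early  # destructor runs before the patch: "cleanup" is recorded

    survivor = Tracker()  # created before the patch is applied
    Tracker.__del__ = _skip_cleanup  # type: ignore[method-assign]
    del survivor  # the class-level lookup now finds the no-op: nothing recorded

    return calls


print(demo())  # ['cleanup']
```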

@github-actions bot deleted the gh/malfet/394/head branch July 25, 2025 02:20
Labels:
  • ci-no-td (Do not run TD on this PR)
  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • Merged
  • module: python frontend (For issues relating to PyTorch's Python frontend)
  • release notes: releng (release notes category)
  • Reverted
  • topic: bug fixes (topic category)

4 participants