nWEIdia (Collaborator) commented Apr 10, 2025

pytest -v test/distributed/test_c10d_ucc.py -k test_save_load
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.12.3, pytest-8.1.1, pluggy-1.5.0 -- /usr/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/opt/pytorch/pytorch/.hypothesis/examples'))
rootdir: /opt/pytorch/pytorch
configfile: pytest.ini
plugins: anyio-4.9.0, hypothesis-6.130.13, flakefinder-1.1.0, rerunfailures-15.0, xdist-3.6.1, xdoctest-1.0.2, typeguard-4.3.0
collected 63 items / 62 deselected / 1 selected
Running 1 items in this shard

test/distributed/test_c10d_ucc.py::DistributedDataParallelTest::test_save_load_checkpoint PASSED [65.2581s] [100%]

================================================================================== 1 passed, 62 deselected in 68.78s (0:01:08)

@ptrblck @eqy @tinglvv @atalman @malfet

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k

pytorch-bot bot commented Apr 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150979

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 29ab761 with merge base 10c51b1:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the "oncall: distributed" and "topic: not user facing" labels Apr 10, 2025
eqy (Collaborator) left a comment

any idea which commit caused this?

eqy added the "ciflow/trunk" label Apr 10, 2025
nWEIdia (Collaborator, Author) commented Apr 10, 2025

any idea which commit caused this?

So far we only know the range: (6c54963, ad847da].

Should we bisect it?
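
Bisecting a half-open range like this amounts to a binary search for the first commit at which the test passes. The sketch below is only illustrative and rests on assumptions not stated in this thread: that the commits after 6c54963 up to and including ad847da are available as a list ordered oldest to newest, that the test fails before the change and passes from it onward, and that `test_passes` is a hypothetical callable that checks out a commit and runs the pytest command above. The function name `first_passing_commit` is likewise made up for the example.

```python
from typing import Callable, Sequence


def first_passing_commit(
    commits: Sequence[str],
    test_passes: Callable[[str], bool],
) -> str:
    """Return the earliest commit in `commits` for which `test_passes` is True.

    Assumes `commits` is ordered oldest to newest, that the last commit
    passes, and that once the test passes it keeps passing.
    """
    if not commits:
        raise ValueError("commit range is empty")
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is assumed to pass
    while lo < hi:
        mid = (lo + hi) // 2
        if test_passes(commits[mid]):
            hi = mid        # first passing commit is at mid or earlier
        else:
            lo = mid + 1    # first passing commit is after mid
    return commits[lo]
```

With N commits in the range this needs roughly log2(N) test runs, which matters when a single run takes about a minute as in the log above. In practice `git bisect` automates the same search.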

nWEIdia (Collaborator, Author) commented May 20, 2025

@pytorchbot rebase -b main

pytorchmergebot (Collaborator) commented

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

pytorchmergebot (Collaborator) commented

Successfully rebased nWEIdia-patch-1 onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout nWEIdia-patch-1 && git pull --rebase)

nWEIdia (Collaborator, Author) commented May 25, 2025

@pytorchbot rebase -b main

pytorchmergebot (Collaborator) commented

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

pytorchmergebot (Collaborator) commented

Successfully rebased nWEIdia-patch-1 onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout nWEIdia-patch-1 && git pull --rebase)

nWEIdia (Collaborator, Author) commented Jun 2, 2025

@pytorchbot merge

pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

pytorchmergebot pushed a commit that referenced this pull request Jun 2, 2025
… it now succeeds (#150979)

Pull Request resolved: #150979
Approved by: https://github.com/eqy
iupaikov-amd pushed a commit to ROCm/pytorch that referenced this pull request Jun 4, 2025
… it now succeeds (pytorch#150979)

Pull Request resolved: pytorch#150979
Approved by: https://github.com/eqy
github-actions bot deleted the nWEIdia-patch-1 branch July 4, 2025 02:19
Labels

ciflow/trunk, Merged, oncall: distributed, open source, topic: not user facing

4 participants