Contributor

@fduwjj fduwjj commented Sep 24, 2024

Summary: Sometimes, when the worker and role are the same, users want to skip the TCPStore access in `_assign_worker_ranks` and the barrier in rendezvous.
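
For readers unfamiliar with the idea, here is a minimal sketch of why the store exchange can be skipped in this case. It is not the code from this PR, which is not shown in this thread. It assumes every agent runs the same number of workers under a single role, so each agent can derive its workers' global ranks from its own group rank alone. The names `RankAssignment` and `assign_ranks_locally` are hypothetical and do not exist in torchelastic.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RankAssignment:
    """Hypothetical container for illustration; not a torchelastic type."""

    world_size: int
    global_ranks: List[int]


def assign_ranks_locally(
    group_rank: int, group_world_size: int, local_world_size: int
) -> RankAssignment:
    """Derive global ranks without exchanging anything through a shared store.

    Assumption: all agents have one role and the same local_world_size, so the
    layout of ranks is fully determined by each agent's group rank.
    """
    if not 0 <= group_rank < group_world_size:
        raise ValueError("group_rank must be in [0, group_world_size)")
    if local_world_size <= 0:
        raise ValueError("local_world_size must be positive")

    # Each agent owns one contiguous block of local_world_size ranks.
    base = group_rank * local_world_size
    return RankAssignment(
        world_size=group_world_size * local_world_size,
        global_ranks=list(range(base, base + local_world_size)),
    )


if __name__ == "__main__":
    # Agent 1 of 3, with 4 workers per agent.
    print(assign_ranks_locally(group_rank=1, group_world_size=3, local_world_size=4))
```

When agents run different roles or different worker counts, this shortcut no longer holds, because an agent cannot know the other agents' layouts without some form of exchange.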

Test Plan: unit test

Differential Revision: D63351662

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @wz337 @wconstab @d4l3k @c-p-i-o

@pytorch-bot pytorch-bot bot added the "oncall: distributed" and "release notes: distributed (torchelastic)" labels Sep 24, 2024

pytorch-bot bot commented Sep 24, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136579

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures, 1 Unrelated Failure

As of commit eba30de with merge base f0a9254:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D63351662

Member

@d4l3k d4l3k left a comment

LGTM

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Sep 25, 2024
fduwjj added a commit to fduwjj/pytorch that referenced this pull request Sep 25, 2024
…6579)

Summary:
Pull Request resolved: pytorch#136579

Sometimes, when the worker and role are the same, users want to skip TCPStore in `_assign_worker_ranks` and barrier in rendezvous

Test Plan: unit test

Reviewed By: d4l3k

Differential Revision: D63351662
@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

fduwjj added a commit that referenced this pull request Sep 27, 2024
As the title says, this relands #136579, which broke some OSS CI.

Differential Revision: [D63542918](https://our.internmc.facebook.com/intern/diff/D63542918/)

[ghstack-poisoned]
@fduwjj
Contributor Author

fduwjj commented Sep 27, 2024

reland in #136865

@fduwjj fduwjj closed this Sep 27, 2024
fduwjj added a commit that referenced this pull request Sep 27, 2024
… assign"

As the title says, this relands #136579, which broke some OSS CI.

Differential Revision: [D63542918](https://our.internmc.facebook.com/intern/diff/D63542918/)

cc XilunWu H-Huang awgu kwen2501 wanchaol fegin wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
fduwjj added a commit that referenced this pull request Sep 27, 2024
Pull Request resolved: #136865

As the title says, this relands #136579, which broke some OSS CI.
ghstack-source-id: 245131066

Differential Revision: [D63542918](https://our.internmc.facebook.com/intern/diff/D63542918/)
fduwjj added a commit that referenced this pull request Sep 27, 2024
…tore get in host assign"

As the title says, this relands #136579, which broke some OSS CI.

Differential Revision: [D63542918](https://our.internmc.facebook.com/intern/diff/D63542918/)

cc XilunWu H-Huang awgu kwen2501 wanchaol fegin wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Sep 27, 2024
…36865)

As the title says, this relands #136579, which broke some OSS CI.

Differential Revision: [D63542918](https://our.internmc.facebook.com/intern/diff/D63542918/)

Pull Request resolved: #136865
Approved by: https://github.com/atalman

Labels

ciflow/trunk, fb-exported, oncall: distributed, release notes: distributed (torchelastic)
