Fix torch.distributed.run init connect timeout by comparing host with the current IP list
#90221
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90221
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 20be3c6
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D41373962
LGTM w/ unit tests from internal
319c179 to bbcca7a
…th the current IP list (pytorch#90221)

Summary:
Pull Request resolved: pytorch#90221
Pull Request: pytorch#79388

Fix torch.distributed.run init connect timeout by comparing `host` with the current IP list.

Test Plan:
```
> buck2 test mode/dev-nosan //caffe2/test/distributed/elastic/rendezvous:utils_test -- --exact 'caffe2/test/distributed/elastic/rendezvous:utils_test - test_matches_machine_hostname_returns_true_if_ip_address_match_between_hosts (utils_test.UtilsTest)'
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. 0 builds failed
```
Unit tests

Reviewed By: d4l3k
Differential Revision: D41373962
fbshipit-source-id: 00e6c102ed920d8b91a9978b5fe7d3ed37b62584
…th the current IP list (pytorch#90221)

Summary:
Pull Request resolved: pytorch#90221
Pull Request: pytorch#79388

Fix torch.distributed.run init connect timeout by comparing `host` with the current IP list.

Test Plan:
```
> buck2 test mode/dev-nosan //caffe2/test/distributed/elastic/rendezvous:utils_test -- --exact 'caffe2/test/distributed/elastic/rendezvous:utils_test - test_matches_machine_hostname_returns_true_if_ip_address_match_between_hosts (utils_test.UtilsTest)'
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. 0 builds failed
```
Unit tests

Reviewed By: d4l3k
Differential Revision: D41373962
fbshipit-source-id: 7f138e2ef74b057f70271d32b605710bc5d287f6
bbcca7a to 20be3c6
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Summary:
Pull Request: #79388
Fix torch.distributed.run init connect timeout by comparing `host` with the current IP list.

Test Plan: unit tests
Differential Revision: D41373962
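
To illustrate the idea in the summary, here is a minimal sketch of comparing a `host` value against the IP addresses of the current machine. It is not the code from this PR: the function names `local_ip_addresses` and `host_is_local` are hypothetical, and the sketch assumes that resolving the local hostname with `socket.getaddrinfo` is an acceptable way to build the "current IP list".

```python
import ipaddress
import socket


def local_ip_addresses():
    """Return loopback plus every address the local hostname resolves to.

    Hypothetical helper for illustration; not the PyTorch implementation.
    """
    addrs = {"127.0.0.1", "::1"}
    try:
        infos = socket.getaddrinfo(socket.gethostname(), None)
    except OSError:
        # The local hostname may not be resolvable; fall back to loopback only.
        infos = []
    for info in infos:
        # info[4] is the sockaddr tuple; its first element is the address string.
        addrs.add(info[4][0])
    return addrs


def host_is_local(host):
    """Return True if `host` names this machine or resolves to one of its IPs."""
    if host in ("localhost", socket.gethostname()):
        return True

    local = local_ip_addresses()

    # Case 1: `host` is a literal IP address, so compare it directly.
    try:
        literal = ipaddress.ip_address(host)
    except ValueError:
        literal = None
    if literal is not None:
        return str(literal) in local

    # Case 2: `host` is a name, so resolve it and compare each resulting address.
    try:
        infos = socket.getaddrinfo(host, None)
    except OSError:
        return False
    return any(info[4][0] in local for info in infos)
```

With a check like this, a node whose rendezvous endpoint is given as an IP address or an alias, rather than the exact machine hostname, could still recognize itself as the host instead of waiting to connect to a peer until the timeout expires.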