Pass in smaller timeout into init_process_group for distributed_test #47896
Per title Differential Revision: [D24943323](https://our.internmc.facebook.com/intern/diff/D24943323/) [ghstack-poisoned]
💊 CI failures summary and remediations: As of commit df13f4c (more details on the Dr. CI page): Looks good so far! There are no failures yet. This comment was automatically generated by Dr. CI.
…buted_test" Per title Differential Revision: [D24943323](https://our.internmc.facebook.com/intern/diff/D24943323/) [ghstack-poisoned]
Pull Request resolved: #47896 Per title ghstack-source-id: 116607077 Differential Revision: [D24943323](https://our.internmc.facebook.com/intern/diff/D24943323/)
Looks like the fmt library build error is coming from the other PR in the stack, but this one looks good!
…buted_test" Closes #47892. Since we have a 100s timeout on the entire test, we should have a smaller timeout than the default 30 min for the process group used for the test. This diff sets the timeout to 60s. For example, this is useful when running tests with NCCL_BLOCKING_WAIT so that we get the op timed out error instead of the test itself timing out. Differential Revision: [D24943323](https://our.internmc.facebook.com/intern/diff/D24943323/) [ghstack-poisoned]
Pull Request resolved: #47896 Per title ghstack-source-id: 116710141 Differential Revision: [D24943323](https://our.internmc.facebook.com/intern/diff/D24943323/)
This pull request has been merged in f824854.
Stack from ghstack:
Closes #47892. Since the entire test has a 100s timeout, the process group used for the test should have a timeout smaller than the 30 min default.
This diff sets the timeout to 60s. This is useful, for example, when running tests with NCCL_BLOCKING_WAIT, so that we get the op's timed-out error instead of the test itself timing out.
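For reviewers who want to see the shape of the change, here is a minimal sketch of passing a shorter timeout to `init_process_group`. It is not the actual diff: the constant names and both helper functions are hypothetical, and only the 100s, 60s, and 30 min values come from the description above. The `timeout` keyword of `torch.distributed.init_process_group` takes a `datetime.timedelta`.

```python
from datetime import timedelta

# Values taken from the PR description; the names are hypothetical.
TEST_TIMEOUT = timedelta(seconds=100)          # timeout on the entire test
PROCESS_GROUP_TIMEOUT = timedelta(seconds=60)  # timeout this diff passes in
DEFAULT_PG_TIMEOUT = timedelta(minutes=30)     # default cited in the description


def process_group_kwargs(backend, init_method, rank, world_size):
    """Build keyword arguments for torch.distributed.init_process_group."""
    # The process group has to give up before the test harness does,
    # otherwise the harness timeout hides the collective's own error.
    assert PROCESS_GROUP_TIMEOUT < TEST_TIMEOUT < DEFAULT_PG_TIMEOUT
    return dict(
        backend=backend,
        init_method=init_method,
        rank=rank,
        world_size=world_size,
        timeout=PROCESS_GROUP_TIMEOUT,
    )


def init_test_process_group(backend, init_method, rank, world_size):
    """Hypothetical wrapper: initialize the process group with the short timeout."""
    # Imported lazily so process_group_kwargs stays usable without PyTorch.
    import torch.distributed as dist

    dist.init_process_group(
        **process_group_kwargs(backend, init_method, rank, world_size)
    )
```

With a 60s process group timeout inside a 100s test timeout, a hung collective under NCCL_BLOCKING_WAIT should raise its own timeout error about 40s before the test harness would have killed the test.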
Differential Revision: D24943323