Increase timeout for ProcessGroupGlooTest #85474
We see spurious timeout failures in `test_allreduce_coalesced_basics`, but only when the whole test suite is run with `python run_test.py --verbose -i distributed/test_c10d_gloo`. Increasing the timeout to 50 s should give enough headroom to avoid this. For comparison, the default value of `_timeout` is 30 minutes.
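As a hedged illustration of what such a timeout controls, here is a minimal sketch that uses the public `torch.distributed` API rather than the test suite's own `_timeout` option. It assumes PyTorch with the Gloo backend is installed; the helper `allreduce_with_timeout` and the constant `TEST_TIMEOUT` are hypothetical names, and the single-process `HashStore` setup is chosen only to keep the example self-contained.

```python
from datetime import timedelta

import torch
import torch.distributed as dist

# Hypothetical constant: mirrors the 50 s value discussed in this PR.
TEST_TIMEOUT = timedelta(seconds=50)


def allreduce_with_timeout(values, timeout=TEST_TIMEOUT):
    """Hypothetical helper: run a single-rank Gloo all_reduce with an explicit timeout."""
    dist.init_process_group(
        backend="gloo",
        store=dist.HashStore(),  # in-memory store, enough for world_size=1
        rank=0,
        world_size=1,
        timeout=timeout,  # bound on blocking operations; the default is 30 minutes
    )
    try:
        tensor = torch.tensor(values, dtype=torch.float32)
        dist.all_reduce(tensor)  # SUM over a single rank leaves the values unchanged
        return tensor.tolist()
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    print(allreduce_with_timeout([1.0, 2.0]))
```

With one rank the all-reduce completes immediately; in a multi-rank run, the `timeout` argument plays a similar role to the `_timeout` option above, turning a stalled collective into an error once the limit is exceeded instead of hanging indefinitely.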
thanks!
@pytorchbot merge -g
@pytorchbot successfully started a merge job. Check the current status here.
Originally reported in EasyBuild at easybuilders/easybuild-easyconfigs#15137 (comment); patch proposed by @casparvl.
Pull Request resolved: #85474
Approved by: https://github.com/rohan-varma