[Gloo] Support work-level timeouts in ProcessGroupGloo #40948
Add work-level timeouts to ProcessGroupGloo. This uses the timeout support in the `waitSend` and `waitRecv` functions of Gloo's `unbound_buffer` construct. Differential Revision: [D22173763](https://our.internmc.facebook.com/intern/diff/D22173763/) [ghstack-poisoned]
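A minimal sketch of the idea, assuming a simplified stand-in for Gloo's `unbound_buffer`. The work object carries a default timeout, but a caller can pass a per-call (work-level) timeout to `wait()`, which is forwarded to the buffer's blocking wait. `FakeUnboundBuffer`, `RecvWork`, `kUseDefaultTimeout`, and the throw-on-timeout behavior are hypothetical names and choices for illustration; they are not the actual ProcessGroupGloo or Gloo API.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <stdexcept>

// Hypothetical sentinel: "no work-level timeout given, use the default".
constexpr std::chrono::milliseconds kUseDefaultTimeout{-1};

// Hypothetical stand-in for Gloo's unbound_buffer: tracks whether a
// pending recv has completed and lets callers wait with a deadline.
class FakeUnboundBuffer {
 public:
  // Called by the (simulated) transport when the data has arrived.
  void markRecvComplete() {
    std::lock_guard<std::mutex> lock(mu_);
    recvDone_ = true;
    cv_.notify_all();
  }

  // Blocks until the recv completes or the timeout elapses.
  // Throwing on timeout is an assumption made for this sketch.
  void waitRecv(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    if (!cv_.wait_for(lock, timeout, [this] { return recvDone_; })) {
      throw std::runtime_error("Timed out waiting for recv");
    }
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool recvDone_ = false;
};

// Hypothetical work object: remembers a default (process-group-wide)
// timeout, but each wait() call may override it with a work-level timeout.
class RecvWork {
 public:
  RecvWork(FakeUnboundBuffer& buf, std::chrono::milliseconds defaultTimeout)
      : buf_(buf), defaultTimeout_(defaultTimeout) {
    // Sanity check: the fallback timeout must be non-negative.
    assert(defaultTimeout_.count() >= 0);
  }

  void wait(std::chrono::milliseconds timeout = kUseDefaultTimeout) {
    // Use the work-level timeout if one was given, else the default.
    auto effective =
        (timeout == kUseDefaultTimeout) ? defaultTimeout_ : timeout;
    buf_.waitRecv(effective);
  }

 private:
  FakeUnboundBuffer& buf_;
  std::chrono::milliseconds defaultTimeout_;
};
```

In this sketch the default timeout plays the role of the process-group-wide timeout, while the argument to `wait()` is the per-work override; a call such as `work.wait(std::chrono::milliseconds(10))` fails fast if nothing has arrived, without changing the group default.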
CI failures summary (Dr. CI), as of commit 0c14c90: 3 new failures recognized by patterns. These failures do not appear to be due to upstream breakages. First failure (1/3): `pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test`, step "Run tests".
Pull Request resolved: #40948 Add work-level timeouts to ProcessGroupGloo. This uses the timeout support in `waitSend` and `waitRecv` functions from Gloo's `unbound_buffer` construct. ghstack-source-id: 107095290 Differential Revision: [D22173763](https://our.internmc.facebook.com/intern/diff/D22173763/)
Pull Request resolved: #40948 Add work-level timeouts to ProcessGroupGloo. This uses the timeout support in `waitSend` and `waitRecv` functions from Gloo's `unbound_buffer` construct. ghstack-source-id: 107187566 Differential Revision: [D22173763](https://our.internmc.facebook.com/intern/diff/D22173763/)
Shall we add a test for this?
Added tests in the next PR in this stack (#41265).
This pull request has been merged in b979129.
Stack from ghstack:
Add work-level timeouts to ProcessGroupGloo. This uses the timeout support in the `waitSend` and `waitRecv` functions of Gloo's `unbound_buffer` construct.

Differential Revision: D22173763