[c10d] Increment sequence numbers on collectives. #55718
Increments sequence numbers whenever ProcessGroupGloo::enqueue or ProcessGroupNCCL::collective runs; each is the common call that every collective in its backend makes. The next step is to log these sequence numbers alongside other collective information in debug mode and to integrate them with the process group wrapper. Differential Revision: [D27690690](https://our.internmc.facebook.com/intern/diff/D27690690/) [ghstack-poisoned]
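A minimal sketch of the idea, assuming a simplified, single-process, hypothetical process group written in pure Python: every collective funnels through one shared entry point (standing in for ProcessGroupGloo::enqueue or ProcessGroupNCCL::collective), and that entry point bumps a per-group counter. `SketchProcessGroup`, `_run_collective`, and `sequence_number` are illustrative names, not the c10d API, and the debug log line is only a placeholder for the logging the PR describes as future work.

```python
import logging
import threading
from typing import Callable, List

logger = logging.getLogger(__name__)


class SketchProcessGroup:
    """Hypothetical, single-process stand-in for a c10d process group.

    Not the real ProcessGroupGloo/ProcessGroupNCCL API: it only models the
    idea that every collective passes through one shared entry point.
    """

    def __init__(self, rank: int, world_size: int) -> None:
        self.rank = rank
        self.world_size = world_size
        self._seq = 0
        self._lock = threading.Lock()  # collectives may be issued from several threads

    @property
    def sequence_number(self) -> int:
        with self._lock:
            return self._seq

    def _run_collective(self, name: str, fn: Callable[[], List[float]]) -> List[float]:
        # Shared path, analogous to ProcessGroupGloo::enqueue or
        # ProcessGroupNCCL::collective: bump the counter once per collective.
        with self._lock:
            self._seq += 1
            seq = self._seq
        # Placeholder for the debug-mode logging the PR lists as a next step.
        logger.debug("rank %d: %s seq=%d", self.rank, name, seq)
        return fn()

    def allreduce(self, tensor: List[float]) -> List[float]:
        # Placeholder: returns a copy; a real allreduce would reduce across ranks.
        return self._run_collective("allreduce", lambda: list(tensor))

    def broadcast(self, tensor: List[float]) -> List[float]:
        # Placeholder: returns a copy; a real broadcast would copy from a root rank.
        return self._run_collective("broadcast", lambda: list(tensor))


pg = SketchProcessGroup(rank=0, world_size=2)
pg.allreduce([1.0, 2.0])
pg.broadcast([3.0])
print(pg.sequence_number)  # 2
```

Because every rank in a group is expected to issue the same collectives in the same order, comparing these counters across ranks is one plausible way to spot a rank that skipped or added a collective, which appears to be the debuggability goal of the follow-up work.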
💊 CI failures summary and remediations: as of commit 5a61cfb (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build (1/1), step "Build".
test/distributed/test_c10d.py (outdated)

    if dist.get_world_size(process_group) > 2:
        # Test when certain ranks don't call collectives
        if dist.get_rank(process_group) not in [0, 1]:
It may be better to also add a non-contiguous rank, such as 3, to the list [0, 1].
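The reviewer's point can be illustrated with a small, hypothetical simulation in plain Python. It assumes, as the test's comment suggests, that all ranks run one common collective and that only ranks outside the listed set then issue extra collectives; `simulate_sequence_numbers` is an illustrative helper, not part of the test file.

```python
from typing import Dict, Iterable


def simulate_sequence_numbers(
    world_size: int, skip_ranks: Iterable[int], extra_calls: int
) -> Dict[int, int]:
    """Hypothetical model: every rank runs one collective, then only ranks
    outside `skip_ranks` run `extra_calls` more. Returns rank -> sequence number."""
    skip = set(skip_ranks)
    seq = {rank: 0 for rank in range(world_size)}
    for rank in seq:
        seq[rank] += 1  # collective issued by every rank
        if rank not in skip:
            seq[rank] += extra_calls  # collectives issued only by some ranks
    return seq


# Contiguous set, as in the test under review.
print(simulate_sequence_numbers(4, [0, 1], extra_calls=2))  # {0: 1, 1: 1, 2: 3, 3: 3}
# Non-contiguous set, as the reviewer suggests.
print(simulate_sequence_numbers(4, [0, 1, 3], extra_calls=2))  # {0: 1, 1: 1, 2: 3, 3: 1}
```

One plausible motivation for the suggestion: a check that implicitly assumes a contiguous prefix of ranks (for example `rank < 2`) would pass with [0, 1] but fail with [0, 1, 3]. Note that a set containing rank 3 only changes anything when the world size is at least 4.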
test/distributed/test_c10d.py (outdated)

    @skip_if_lt_x_gpu(4)
    @requires_nccl()
    def test_sequence_num_incremented_nccl_subgroup(self):
Nit: Could you create a helper function to avoid the boilerplate here? This test is the same as test_sequence_num_incremented_gloo_subgroup except for a single argument.
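The suggested refactor can be sketched with the standard `unittest` module: both public tests delegate to one private helper that takes the differing argument. The review does not say which argument differs, so this sketch assumes it is the backend name. `FakeSubgroup` and `_test_sequence_num_incremented_subgroup` are hypothetical, and the fake replaces the real multi-process Gloo/NCCL setup.

```python
import unittest


class FakeSubgroup:
    """Hypothetical stand-in for a Gloo or NCCL subgroup that counts collectives."""

    def __init__(self, backend: str) -> None:
        self.backend = backend
        self.seq = 0

    def allreduce(self) -> None:
        self.seq += 1  # every collective advances the sequence number


class SequenceNumSubgroupTest(unittest.TestCase):
    def _test_sequence_num_incremented_subgroup(self, backend: str) -> None:
        # Shared body: only `backend` differs between the two public tests.
        pg = FakeSubgroup(backend)
        self.assertEqual(pg.seq, 0)
        for expected in range(1, 6):
            pg.allreduce()
            self.assertEqual(pg.seq, expected)

    def test_sequence_num_incremented_gloo_subgroup(self) -> None:
        self._test_sequence_num_incremented_subgroup("gloo")

    def test_sequence_num_incremented_nccl_subgroup(self) -> None:
        # The real NCCL test also carries @skip_if_lt_x_gpu(4) and @requires_nccl().
        self._test_sequence_num_incremented_subgroup("nccl")
```

Saved as, for example, `test_seq_sketch.py`, it runs with `python -m unittest test_seq_sketch`. Keeping the skip decorators on the public NCCL test rather than on the helper leaves the GPU and NCCL requirements attached only to the test that needs them.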
Thanks for improving the debuggability!
This pull request has been merged in 7ff1990.
Summary:
Pull Request resolved: pytorch#55718

Increments sequence numbers when ProcessGroupGloo::enqueue or ProcessGroupNCCL::collective is run, which is a common call all collectives make. The next step will be to log these along with other collective info in debug mode as well as integrating them with the process group wrapper.

ghstack-source-id: 127215077

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D27690690

fbshipit-source-id: cb284b7c760763b7c0f814a41f06656fabf806d6