Use allgatherv for sparse allreduce #22226

Closed
pietern opened this issue Jun 25, 2019 · 1 comment
Assignees
zhaojuanmao

Labels
enhancement (Not as big of a feature, but technically not a bug; should be easy to fix)
oncall: distributed (Add this issue/PR to the distributed oncall triage queue)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


pietern commented Jun 25, 2019

The current sparse allreduce in ProcessGroupGloo pads the indices and values tensors to the maximum length across all processes and then performs a regular allgather (because the padded tensors have equal size across processes). Instead, we can use allgatherv and avoid the padding trick, once #22225 is merged. This is mostly a win for memory usage if there is severe size imbalance between processes. The runtime likely won't change much, due to the nature of the underlying allgather implementation (it takes N steps, where each step takes an amount of time proportional to the size of the largest contribution). A sketch of the padding trick is below.
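To make the padding trick concrete, here is a minimal sketch in terms of the public torch.distributed Python API. The real code path is C++ inside ProcessGroupGloo; the function name `allgather_with_padding`, the zero padding, and the assumption of 1-D tensors are illustrative, not the actual implementation:

```python
# Illustrative sketch of the padding trick (assumed helper, not the
# ProcessGroupGloo implementation): pad every process's contribution to
# the global maximum length so a regular allgather (which requires
# equal-size inputs) can be used.
import torch
import torch.distributed as dist

def allgather_with_padding(values: torch.Tensor) -> list:
    world_size = dist.get_world_size()

    # Find the maximum contribution length across all processes.
    length = torch.tensor([values.numel()], dtype=torch.long)
    max_length = length.clone()
    dist.all_reduce(max_length, op=dist.ReduceOp.MAX)

    # Pad the local (1-D) contribution up to the maximum length. This
    # padding is the wasted memory that allgatherv would avoid.
    padded = torch.zeros(int(max_length.item()), dtype=values.dtype)
    padded[: values.numel()] = values

    # Exchange the now equal-size buffers, plus the true lengths.
    buffers = [torch.empty_like(padded) for _ in range(world_size)]
    lengths = [torch.empty_like(length) for _ in range(world_size)]
    dist.all_gather(buffers, padded)
    dist.all_gather(lengths, length)

    # Trim each buffer back to its sender's true length.
    return [buf[: int(n.item())] for buf, n in zip(buffers, lengths)]
```

Every peer allocates max_length elements per rank regardless of the actual sizes; that over-allocation is exactly what allgatherv removes.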

pietern added the oncall: distributed, triaged, and enhancement labels on Jun 25, 2019

pietern commented Jun 27, 2019

cc @zhaojuanmao

zhaojuanmao self-assigned this on Jul 1, 2019
zhaojuanmao added a commit to zhaojuanmao/pytorch that referenced this issue Sep 18, 2019
Summary:
Per pytorch#22226: the current sparse allreduce in ProcessGroupGloo pads the indices and values tensors to the maximum length across all processes and then performs a regular allgather (because they'll have equal size across processes). Instead, we can use allgatherv. This is mostly a win for memory usage if there is severe size imbalance between processes.

Closes pytorch#22226
Pull Request resolved: pytorch#23917

Test Plan:
buck run mode/dev-nosan caffe2/test:c10d -- test_c10d.ProcessGroupGlooTest.test_sparse_allreduce_basics

buck run mode/dev-nosan caffe2/test:c10d -- test_c10d.ProcessGroupGlooTest.test_sparse_allreduce_basics_cuda

buck run mode/dev-nosan caffe2/test:c10d -- test_c10d.ProcessGroupGlooTest.test_sparse_allreduce_checks

Differential Revision: D16664985

Pulled By: zhaojuanmao

fbshipit-source-id: a9a139da2b64617d2bb7f0b12f272e920140e5d1
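
The commit above replaces the padding path with allgatherv. As a contrast to the sketch earlier in this thread, here is a hedged emulation of allgatherv semantics: every process receives each peer's contribution at its exact size, with no padding. torch.distributed exposes no allgatherv in Python, so per-rank broadcasts stand in for it here, and the helper name `allgatherv_emulated` is an assumption:

```python
# Illustrative emulation of allgatherv semantics (assumed helper, not the
# Gloo API): exchange exact lengths first, then move each rank's
# contribution at its true size. No buffer is padded to the global maximum.
import torch
import torch.distributed as dist

def allgatherv_emulated(values: torch.Tensor) -> list:
    world_size = dist.get_world_size()
    rank = dist.get_rank()

    # Exchange the per-rank lengths; these are equal-size scalars, so a
    # plain allgather works.
    length = torch.tensor([values.numel()], dtype=torch.long)
    lengths = [torch.empty_like(length) for _ in range(world_size)]
    dist.all_gather(lengths, length)

    # Receive each rank's contribution into a buffer of exactly its size.
    out = []
    for src in range(world_size):
        n = int(lengths[src].item())
        buf = values if src == rank else torch.empty(n, dtype=values.dtype)
        dist.broadcast(buf, src=src)
        out.append(buf)
    return out
```

Under severe size imbalance, each receive buffer here is proportional to the sender's actual contribution rather than to the largest one, which is the memory win described in the issue.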