Use allgatherv for sparse allreduce #22226

Closed
pietern opened this issue Jun 25, 2019 · 1 comment
Assignees
zhaojuanmao

Labels
enhancement (Not as big of a feature, but technically not a bug; should be easy to fix)
oncall: distributed (Add this issue/PR to the distributed oncall triage queue)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


pietern commented Jun 25, 2019

The current sparse allreduce in ProcessGroupGloo pads the indices and values tensors to the maximum length across all processes and then performs a regular allgather (because the padded tensors have equal size across processes). Instead, we can use allgatherv and avoid the padding trick, once #22225 is merged. This is mostly a win for memory usage if there is severe size imbalance between processes. The runtime likely won't change much, due to the nature of the underlying allgather implementation (it takes N steps, where each step takes an amount of time proportional to the size of the largest contribution). A sketch of the padding trick is below.
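To make the padding trick concrete, here is a minimal sketch in terms of the public torch.distributed Python API. The real code path is C++ inside ProcessGroupGloo; the function name `allgather_with_padding`, the zero padding, and the assumption of 1-D tensors are illustrative, not the actual implementation:

```python
# Illustrative sketch of the padding trick (assumed helper, not the
# ProcessGroupGloo implementation): pad every process's contribution to
# the global maximum length so a regular allgather (which requires
# equal-size inputs) can be used.
import torch
import torch.distributed as dist

def allgather_with_padding(values: torch.Tensor) -> list:
    world_size = dist.get_world_size()

    # Find the maximum contribution length across all processes.
    length = torch.tensor([values.numel()], dtype=torch.long)
    max_length = length.clone()
    dist.all_reduce(max_length, op=dist.ReduceOp.MAX)

    # Pad the local (1-D) contribution up to the maximum length. This
    # padding is the wasted memory that allgatherv would avoid.
    padded = torch.zeros(int(max_length.item()), dtype=values.dtype)
    padded[: values.numel()] = values

    # Exchange the now equal-size buffers, plus the true lengths.
    buffers = [torch.empty_like(padded) for _ in range(world_size)]
    lengths = [torch.empty_like(length) for _ in range(world_size)]
    dist.all_gather(buffers, padded)
    dist.all_gather(lengths, length)

    # Trim each buffer back to its sender's true length.
    return [buf[: int(n.item())] for buf, n in zip(buffers, lengths)]
```

Every peer allocates max_length elements per rank regardless of the actual sizes; that over-allocation is exactly what allgatherv removes.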

pietern added the oncall: distributed, triaged, and enhancement labels on Jun 25, 2019

pietern commented Jun 27, 2019

cc @zhaojuanmao

zhaojuanmao self-assigned this on Jul 1, 2019
zhaojuanmao added a commit to zhaojuanmao/pytorch that referenced this issue Sep 18, 2019
Summary:
Per pytorch#22226: the current sparse allreduce in ProcessGroupGloo pads the indices and values tensors to the maximum length across all processes and then performs a regular allgather (because they'll have equal size across processes). Instead, we can use allgatherv. This is mostly a win for memory usage if there is severe size imbalance between processes.

Closes pytorch#22226
Pull Request resolved: pytorch#23917

Test Plan:
buck run mode/dev-nosan caffe2/test:c10d -- test_c10d.ProcessGroupGlooTest.test_sparse_allreduce_basics

buck run mode/dev-nosan caffe2/test:c10d -- test_c10d.ProcessGroupGlooTest.test_sparse_allreduce_basics_cuda

buck run mode/dev-nosan caffe2/test:c10d -- test_c10d.ProcessGroupGlooTest.test_sparse_allreduce_checks

Differential Revision: D16664985

Pulled By: zhaojuanmao

fbshipit-source-id: a9a139da2b64617d2bb7f0b12f272e920140e5d1
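
The commit above replaces the padding path with allgatherv. As a contrast to the sketch earlier in this thread, here is a hedged emulation of allgatherv semantics: every process receives each peer's contribution at its exact size, with no padding. torch.distributed exposes no allgatherv in Python, so per-rank broadcasts stand in for it here, and the helper name `allgatherv_emulated` is an assumption:

```python
# Illustrative emulation of allgatherv semantics (assumed helper, not the
# Gloo API): exchange exact lengths first, then move each rank's
# contribution at its true size. No buffer is padded to the global maximum.
import torch
import torch.distributed as dist

def allgatherv_emulated(values: torch.Tensor) -> list:
    world_size = dist.get_world_size()
    rank = dist.get_rank()

    # Exchange the per-rank lengths; these are equal-size scalars, so a
    # plain allgather works.
    length = torch.tensor([values.numel()], dtype=torch.long)
    lengths = [torch.empty_like(length) for _ in range(world_size)]
    dist.all_gather(lengths, length)

    # Receive each rank's contribution into a buffer of exactly its size.
    out = []
    for src in range(world_size):
        n = int(lengths[src].item())
        buf = values if src == rank else torch.empty(n, dtype=values.dtype)
        dist.broadcast(buf, src=src)
        out.append(buf)
    return out
```

Under severe size imbalance, each receive buffer here is proportional to the sender's actual contribution rather than to the largest one, which is the memory win described in the issue.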