Use allgatherv for sparse allreduce #22226
Labels: enhancement, oncall: distributed, triaged
The current sparse allreduce in `ProcessGroupGloo` pads the indices and values tensors to the maximum length across all processes and then performs a regular allgather (because they'll then have equal size across processes). Instead, we can use allgatherv and avoid the padding trick once #22225 is merged. This is mostly a win for memory usage if there is severe size imbalance between processes. The runtime likely won't change much, due to the nature of the underlying allgather implementation (it takes N steps, where each step takes an amount of time proportional to the size of the largest contribution).
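
To make the memory argument concrete, here is a minimal pure-Python sketch that simulates both strategies on plain lists. It does not call Gloo or `torch.distributed`; the function names `padded_allgather` and `allgatherv` and the per-rank contributions are hypothetical and chosen only for illustration.

```python
def padded_allgather(contribs):
    """Simulate the padding approach: every rank pads to the max length."""
    lengths = [len(c) for c in contribs]
    max_len = max(lengths)
    # Each rank pads its contribution with zeros up to max_len.
    padded = [c + [0] * (max_len - len(c)) for c in contribs]
    # After a regular allgather, each rank holds world_size * max_len elements.
    gathered = [x for p in padded for x in p]
    # Strip the padding again using the (separately exchanged) true lengths.
    result = []
    for rank, n in enumerate(lengths):
        start = rank * max_len
        result.extend(gathered[start:start + n])
    return result, len(gathered)


def allgatherv(contribs):
    """Simulate allgatherv: contributions are concatenated at their true sizes."""
    gathered = [x for c in contribs for x in c]
    return gathered, len(gathered)


# Hypothetical, heavily imbalanced contributions from 4 ranks.
contribs = [[1, 2, 3, 4, 5, 6, 7, 8], [9], [10, 11], []]

padded_result, padded_size = padded_allgather(contribs)
v_result, v_size = allgatherv(contribs)

# Both strategies recover the same data; only the buffer size differs.
assert padded_result == v_result
print(padded_size, v_size)  # 32 11
```

In this example the padded receive buffer holds 4 * 8 = 32 elements, while the variable-length gather needs only the 11 elements that were actually contributed. The gap grows with the imbalance between the largest contribution and the rest.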