
Support all_reduce a list of same-device tensors  #21640

@mrshenli

Description


distributed_c10d.py provides APIs to reduce a single tensor per process, or multiple tensors per process where each tensor must reside on a different device. It would be useful (e.g., for model averaging) to also support reducing a list of tensors per process where all of them reside on the same device. The implementation should apply bucketing under the hood for better performance: instead of issuing one reduction per tensor, the tensors are flattened into a single contiguous buffer, reduced in one collective call, and then split back into the original shapes.
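A minimal, framework-free sketch of the bucketing idea (the names `flatten`, `unflatten`, and `all_reduce_list` are hypothetical illustrations, not the proposed API; "tensors" are modeled as flat Python lists of floats, and the collective sum across processes is simulated locally):

```python
def flatten(tensors):
    """Concatenate a list of same-device "tensors" (here: flat Python
    lists of floats) into one contiguous buffer, recording lengths so
    the original list structure can be restored later."""
    buffer = [x for t in tensors for x in t]
    lengths = [len(t) for t in tensors]
    return buffer, lengths


def unflatten(buffer, lengths):
    """Split a flat buffer back into the original per-tensor chunks."""
    out, offset = [], 0
    for n in lengths:
        out.append(buffer[offset:offset + n])
        offset += n
    return out


def all_reduce_list(per_process_tensor_lists):
    """Simulated bucketed all_reduce: each process flattens its tensor
    list into one bucket, a single element-wise sum stands in for the
    one collective call, and the result is unflattened again."""
    flat = [flatten(ts) for ts in per_process_tensor_lists]
    lengths = flat[0][1]  # all processes contribute identically shaped lists
    reduced = [sum(vals) for vals in zip(*(buf for buf, _ in flat))]
    return unflatten(reduced, lengths)


# Two simulated processes, each holding two same-device tensors.
result = all_reduce_list([
    [[1.0, 2.0], [3.0]],
    [[10.0, 20.0], [30.0]],
])
# result == [[11.0, 22.0], [33.0]]
```

The point of the single flattened buffer is that one large collective call amortizes per-call launch and communication latency that many small per-tensor reductions would otherwise pay repeatedly.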


    Labels

    feature — A request for a proper, new feature.
    oncall: distributed — Add this issue/PR to distributed oncall triage queue.
    todo — Not as important as medium or high priority tasks, but we will work on these.
    triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module.
