
Support all_reduce a list of same-device tensors  #21640

@mrshenli

Description


distributed_c10d.py provides APIs to reduce a single tensor per process, or multiple tensors per process where each tensor must reside on a different device. It would be useful (e.g., for model averaging) to also support reducing a list of tensors per process where all of them reside on the same device. The implementation should apply bucketing under the hood for better performance: instead of issuing one reduction per tensor, the tensors are flattened into a single contiguous buffer, reduced in one collective call, and then split back into the original shapes.
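A minimal, framework-free sketch of the bucketing idea (the names `flatten`, `unflatten`, and `all_reduce_list` are hypothetical illustrations, not the proposed API; "tensors" are modeled as flat Python lists of floats, and the collective sum across processes is simulated locally):

```python
def flatten(tensors):
    """Concatenate a list of same-device "tensors" (here: flat Python
    lists of floats) into one contiguous buffer, recording lengths so
    the original list structure can be restored later."""
    buffer = [x for t in tensors for x in t]
    lengths = [len(t) for t in tensors]
    return buffer, lengths


def unflatten(buffer, lengths):
    """Split a flat buffer back into the original per-tensor chunks."""
    out, offset = [], 0
    for n in lengths:
        out.append(buffer[offset:offset + n])
        offset += n
    return out


def all_reduce_list(per_process_tensor_lists):
    """Simulated bucketed all_reduce: each process flattens its tensor
    list into one bucket, a single element-wise sum stands in for the
    one collective call, and the result is unflattened again."""
    flat = [flatten(ts) for ts in per_process_tensor_lists]
    lengths = flat[0][1]  # all processes contribute identically shaped lists
    reduced = [sum(vals) for vals in zip(*(buf for buf, _ in flat))]
    return unflatten(reduced, lengths)


# Two simulated processes, each holding two same-device tensors.
result = all_reduce_list([
    [[1.0, 2.0], [3.0]],
    [[10.0, 20.0], [30.0]],
])
# result == [[11.0, 22.0], [33.0]]
```

The point of the single flattened buffer is that one large collective call amortizes per-call launch and communication latency that many small per-tensor reductions would otherwise pay repeatedly.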


    Labels

    feature — A request for a proper, new feature.
    oncall: distributed — Add this issue/PR to distributed oncall triage queue.
    todo — Not as important as medium or high priority tasks, but we will work on these.
    triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module.
