
Increase default bucket_cap_mb value from 25MB to a more optimal value #118421

@atalman

Description


🚀 The feature, motivation and pitch

Context: #117748

DDP uses all-reduce collectives in the backward pass to synchronize gradients, and by default the gradients are grouped into buckets of 25MB, set via bucket_cap_mb. Documentation about this can be found here: https://github.com/pytorch/pytorch/blob/main/benchmarks/distributed/ddp/README.md?plain=1#L160
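To illustrate what bucket_cap_mb controls, here is a simplified sketch of size-based bucketing. Each bucket corresponds to one all-reduce. The helper assign_buckets and the 200 x 4 MiB model are hypothetical and are not PyTorch APIs or measurements. The sketch visits gradients in reverse order and closes a bucket once its total reaches the cap. Real DDP additionally groups by dtype/device, uses a smaller first bucket, and may rebuild buckets after the first iteration.

```python
from typing import List

MiB = 1024 * 1024  # DDP converts bucket_cap_mb to bytes using 1024 * 1024


def assign_buckets(grad_bytes: List[int], bucket_cap_mb: float = 25.0) -> List[List[int]]:
    """Hypothetical helper: group gradient indices into size-capped buckets.

    Gradients are visited in reverse order (roughly the order in which they
    become ready during backward). A bucket is closed as soon as its total
    size reaches the cap, so one large gradient can exceed the cap by itself.
    """
    cap_bytes = int(bucket_cap_mb * MiB)
    buckets: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for idx in reversed(range(len(grad_bytes))):
        current.append(idx)
        current_bytes += grad_bytes[idx]
        if current_bytes >= cap_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
    if current:  # flush the last, partially filled bucket
        buckets.append(current)
    return buckets


if __name__ == "__main__":
    # Hypothetical model: 200 fp32 gradients of 4 MiB each (800 MiB total).
    sizes = [4 * MiB] * 200
    print(len(assign_buckets(sizes, 25)))   # 29 buckets -> 29 all-reduce calls
    print(len(assign_buckets(sizes, 100)))  # 8 buckets -> 8 all-reduce calls
```

In training code, the cap is passed to the wrapper, e.g. `DistributedDataParallel(model, device_ids=[rank], bucket_cap_mb=100)`. The value 100 is only an example, not a proposed default. Larger buckets mean fewer, larger collectives with less per-call overhead. The trade-off is that communication starts later, which reduces overlap with the remaining backward computation.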

The default 25MB bucket size is very small, and most users have to increase it. Hence this feature request: find and set a more optimal default value for general usage.

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @ptrblck @malfet @roywei @chrisG

Alternatives

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    oncall: distributed (Add this issue/PR to distributed oncall triage queue)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions