Increase default bucket_cap_mb value from 25MB to a more optimal value #118421
Labels
module: distributed
triaged
🚀 The feature, motivation and pitch
Context: #117748
All-reduce comms are used in DDP's backward pass and by default the bucket size is set to 25MB via bucket_cap_mb. Documentation about this can be found here: https://github.com/pytorch/pytorch/blob/main/benchmarks/distributed/ddp/README.md?plain=1#L160
The default 25MB bucket size is very small, and most users would have to increase it. Hence this feature request: find and set a more optimal default value for general usage.
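To give a rough sense of why the cap matters, here is a minimal sketch that estimates how many gradient buckets (and so roughly how many all-reduce calls per backward pass) a model would produce for a given `bucket_cap_mb`. It is a simplified greedy model, not DDP's actual bucketing logic, which among other things uses a smaller first bucket. The function name `estimate_bucket_count` and the example model (100 fp32 tensors of 16 MiB each) are hypothetical and chosen only for illustration.

```python
MIB = 1024 * 1024  # bytes per MiB; the cap is assumed to be converted this way


def estimate_bucket_count(param_numels, bucket_cap_mb=25, bytes_per_element=4):
    """Greedily pack gradients into buckets and return the bucket count.

    Simplified model (an assumption, not DDP's real algorithm): tensors are
    packed in order, a bucket is closed when the next tensor would exceed
    the cap, and a tensor larger than the cap gets a bucket to itself.
    """
    cap_bytes = bucket_cap_mb * MIB
    buckets = 0
    current = 0  # bytes already placed in the open bucket
    for numel in param_numels:
        size = numel * bytes_per_element
        if current > 0 and current + size > cap_bytes:
            buckets += 1  # close the open bucket before adding this tensor
            current = 0
        current += size
    if current > 0:
        buckets += 1  # count the last, partially filled bucket
    return buckets


# Hypothetical model: 100 fp32 parameter tensors of 4Mi elements (16 MiB each).
example_numels = [4 * 1024 * 1024] * 100

buckets_at_25 = estimate_bucket_count(example_numels, bucket_cap_mb=25)
buckets_at_100 = estimate_bucket_count(example_numels, bucket_cap_mb=100)
print(buckets_at_25, buckets_at_100)
```

Under these assumptions the 25MB cap fits only one tensor per bucket, while a 100MB cap fits six, so the number of all-reduce calls drops substantially. Until the default changes, users can override it by passing `bucket_cap_mb` to `torch.nn.parallel.DistributedDataParallel`; the best value depends on the model and interconnect and should be measured rather than assumed.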
cc @ptrblck @malfet @roywei @wconstab @chrisG
Alternatives
No response
Additional context
No response