Increase segment sizes in coll/han and coll/adapt #11360

Increase the segment sizes for bcast, reduce, and allreduce to 512k. On modern machines, larger segments appear to be more efficient because they reduce the overhead of segmenting: fewer messages and a better chance of saturating the network.
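
To illustrate the "fewer messages" point, here is a minimal sketch of the arithmetic. It assumes a message is simply split into fixed-size segments (ceiling division) and ignores datatype alignment and pipelining. The `segment_count` helper and the 16 MiB message size are hypothetical and not taken from coll/han or coll/adapt.

```python
def segment_count(message_bytes: int, segment_bytes: int) -> int:
    """Number of fixed-size segments needed to cover a message.

    Assumption: plain ceiling division, not the actual coll/han or
    coll/adapt segmentation logic.
    """
    if segment_bytes <= 0:
        raise ValueError("segment size must be positive")
    return -(-message_bytes // segment_bytes)  # ceiling division


if __name__ == "__main__":
    message = 16 * 1024 * 1024  # hypothetical 16 MiB message
    for seg in (64 * 1024, 512 * 1024):
        print(f"{seg // 1024}k segments: {segment_count(message, seg)}")
```

Under these assumptions, moving from 64k to 512k cuts the number of segments per message by a factor of 8, so any fixed per-segment cost is paid 8 times less often.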
Example of the effect of increased segment sizes on Hawk (64-core AMD EPYC Rome, ConnectX-6):
Reduce with 64k (current segment size)

Reduce with 512k (new segment size)

Note the lower latency on the right side of the plots. The change in segment size yields an improvement of about 10x for han over tuned. There is no data for han over sm because sm crashes at this segment size.