Add fused buffer scaling and unpack/pack kernels on GPU. #2973

romerojosh · 2021-06-10T23:07:41Z

Checklist before submitting

Did you read the contributor guide?
Did you update the docs?
Did you write any tests to validate this change?
Did you update the CHANGELOG, if this change affects users?

Description

For a Horovod allreduce operation using NCCL, the batched fusion buffer pack, prescaling, postscaling, and batched fusion buffer unpack are each launched as a separate CUDA kernel. This PR introduces a fused batched memcpy and scaling CUDA kernel that performs the pack (or unpack) together with the scaling in a single kernel launch. This fusion reduces the number of kernels Horovod launches and improves performance by removing the extra reads and writes of GPU buffer memory that occur between the existing separate pack and scaling kernels.
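To illustrate the idea only, here is a minimal CPU-side sketch in plain Python, not the CUDA kernel from this PR. The function names (`pack_then_scale`, `fused_pack_scale`, `fused_unpack_scale`) and the list-based "fusion buffer" are hypothetical and chosen for illustration. The unfused version touches the buffer twice (copy, then scale). The fused version reads each element once and writes it once, which is the memory traffic saving described above.

```python
from typing import List, Sequence


def pack_then_scale(tensors: Sequence[Sequence[float]], factor: float) -> List[float]:
    """Unfused baseline: one pass to pack, then a second pass to scale."""
    fusion_buffer: List[float] = []
    for t in tensors:
        # Pass 1: copy every tensor into the contiguous buffer.
        fusion_buffer.extend(t)
    for i in range(len(fusion_buffer)):
        # Pass 2: re-read and re-write every buffer element to scale it.
        fusion_buffer[i] *= factor
    return fusion_buffer


def fused_pack_scale(tensors: Sequence[Sequence[float]], factor: float) -> List[float]:
    """Fused pack + prescale: each element is read once, scaled, written once."""
    return [x * factor for t in tensors for x in t]


def fused_unpack_scale(
    fusion_buffer: Sequence[float], sizes: Sequence[int], factor: float
) -> List[List[float]]:
    """Fused unpack + postscale: split the buffer back into tensors while scaling."""
    out: List[List[float]] = []
    offset = 0
    for n in sizes:
        out.append([x * factor for x in fusion_buffer[offset:offset + n]])
        offset += n
    return out
```

Both pack variants produce the same buffer; only the number of passes over memory differs. On a GPU each pass in the unfused version would correspond to a separate kernel launch.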

github-actions · 2021-06-11T00:54:57Z

Unit Test Results

    783 files  -  37       783 suites  -  37     6h 9m 15s ⏱️  - 7m 31s
    601 tests  ±   0       566 ✔️  ±   0        35 💤  ±   0      0 ❌  ±0
 16 311 runs   - 730    12 293 ✔️  - 490     4 018 💤  - 240      0 ❌  ±0

Results for commit 0bbc3a2. ± Comparison against base commit 52fffed.

♻️ This comment has been updated with latest results.

Signed-off-by: Josh Romero <joshr@nvidia.com>

tgaddair

LGTM!

Add fused buffer scaling and unpack/pack kernels on GPU.

0bbc3a2

romerojosh force-pushed the fuse_pack_scale branch from 044d214 to 0bbc3a2 on June 11, 2021 16:12

romerojosh requested a review from tgaddair June 11, 2021 18:27

tgaddair approved these changes Jun 27, 2021

tgaddair merged commit b77f89b into horovod:master Jun 27, 2021

maxhgerlach mentioned this pull request Jul 28, 2022

implement 2D torus allreduce using NCCL #3608

Merged
