
Add fused buffer scaling and unpack/pack kernels on GPU. #2973

Merged: 1 commit merged into horovod:master on Jun 27, 2021

Conversation

romerojosh (Collaborator)

Checklist before submitting

  • Did you read the contributor guide?
  • Did you update the docs?
  • Did you write any tests to validate this change?
  • Did you update the CHANGELOG, if this change affects users?

Description

For a Horovod allreduce operation using NCCL, the batched fusion buffer pack, prescaling, postscaling, and batched fusion buffer unpack are each individual CUDA kernel launches. This PR introduces a fused batched memcpy-and-scaling CUDA kernel that performs an (un)pack and scaling in a single kernel launch. This fusion reduces the number of kernels Horovod launches and improves performance by removing the extra reads/writes of GPU buffer memory that occur between the existing separate pack and scaling kernels.
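To illustrate the idea (this is a hypothetical sketch, not Horovod's actual kernel; the `UnpackEntry` struct, kernel name, and launch shape are all assumptions), a fused batched unpack + scale can copy each tensor's slice out of the fusion buffer and apply the scale factor in the same pass, so every element is read and written exactly once:

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical descriptor for one tensor's slice in the fusion buffer.
struct UnpackEntry {
  const float* src;   // offset into the packed fusion buffer
  float* dst;         // destination tensor buffer
  size_t num_elems;   // element count for this tensor
};

// One launch handles every tensor in the batch: blockIdx.y selects an
// entry, and a grid-stride loop over blockIdx.x covers its elements.
// Scaling is fused into the copy, so there is no intermediate pass
// that would re-read and re-write the buffer in global memory.
__global__ void fused_unpack_scale(const UnpackEntry* entries,
                                   int num_entries, float scale) {
  for (int e = blockIdx.y; e < num_entries; e += gridDim.y) {
    const UnpackEntry ent = entries[e];
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         i < ent.num_elems;
         i += (size_t)gridDim.x * blockDim.x) {
      ent.dst[i] = ent.src[i] * scale;  // single read, single write
    }
  }
}

// Example launch: a 2D grid so y indexes entries and x strides elements.
// void launch(const UnpackEntry* d_entries, int n, float scale) {
//   dim3 grid(32, n < 65535 ? n : 65535);
//   fused_unpack_scale<<<grid, 256>>>(d_entries, n, scale);
// }
```

The unfused path would instead launch a batched memcpy kernel followed by a scaling kernel, costing an extra full read and write of each destination buffer; the fused version removes that round trip through global memory.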

github-actions bot commented Jun 11, 2021

Unit Test Results

  • 783 files (−37), 783 suites (−37), 6h 9m 15s ⏱️ (−7m 31s)
  • 601 tests (±0): 566 ✔️ passed (±0), 35 💤 skipped (±0), 0 ❌ failed (±0)
  • 16 311 runs (−730): 12 293 ✔️ passed (−490), 4 018 💤 skipped (−240), 0 ❌ failed (±0)

Results for commit 0bbc3a2. ± Comparison against base commit 52fffed.


Signed-off-by: Josh Romero <joshr@nvidia.com>
tgaddair (Collaborator) left a comment:
LGTM!

@tgaddair tgaddair merged commit b77f89b into horovod:master Jun 27, 2021

2 participants