Fix averaging with the CUDA half2 implementation for one-element half buffers. #2375

Merged
tgaddair merged 2 commits into horovod:master on Oct 14, 2020

romerojosh
Collaborator

#1949 introduced a bug in the half2 path of ScaleBufferImpl for CUDA: when the buffer holds a single half element, the division is skipped because the number of thread blocks computed for half2 processing is rounded incorrectly. This PR fixes that case so the scaling is applied correctly.
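
For readers unfamiliar with this class of bug, here is a minimal host-side sketch of how a rounding mistake in the launch arithmetic can skip a one-element buffer. It is not the Horovod code: `kThreadsPerBlock`, `BlocksTruncating`, `BlocksRoundingUp`, and `ScaleBuffer` are hypothetical names, `float` stands in for `__half`, and plain loops stand in for the CUDA grid. It assumes the kernel processes elements in pairs and lets thread 0 handle an odd trailing element.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical block size; the value used in Horovod may differ.
constexpr int64_t kThreadsPerBlock = 512;

// Buggy variant (assumed shape of the mistake): truncating division gives
// zero pairs for a one-element buffer, so zero blocks are launched.
int64_t BlocksTruncating(int64_t num_elements) {
  int64_t num_pairs = num_elements / 2;  // 1 / 2 == 0
  return (num_pairs + kThreadsPerBlock - 1) / kThreadsPerBlock;
}

// Fixed variant: round the pair count up so an odd trailing element
// still results in at least one block.
int64_t BlocksRoundingUp(int64_t num_elements) {
  int64_t num_pairs = (num_elements + 1) / 2;
  return (num_pairs + kThreadsPerBlock - 1) / kThreadsPerBlock;
}

// Host-side stand-in for a pairwise scaling kernel. Each simulated thread
// walks the pairs with a grid-stride loop; thread 0 also scales the odd
// trailing element, if there is one.
void ScaleBuffer(std::vector<float>& buf, float scale, int64_t blocks) {
  const int64_t n = static_cast<int64_t>(buf.size());
  const int64_t total_threads = blocks * kThreadsPerBlock;
  for (int64_t tid = 0; tid < total_threads; ++tid) {
    for (int64_t p = tid; p < n / 2; p += total_threads) {
      buf[2 * p] *= scale;
      buf[2 * p + 1] *= scale;
    }
    if (tid == 0 && n % 2 == 1) {
      buf[n - 1] *= scale;
    }
  }
}
```

With a one-element buffer, `BlocksTruncating(1)` returns 0, so no simulated thread runs and the value is left unscaled; `BlocksRoundingUp(1)` returns 1, and thread 0 scales the single element.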

Signed-off-by: Josh Romero <joshr@nvidia.com>
@github-actions

Unit Test Results

521 files -60    521 suites -60    4h 23m 46s ⏱️ -5m 56s
507 tests ±0    481 ✔️ ±0    26 💤 ±0    0 ✖️ ±0
9 842 runs -1 224    7 853 ✔️ -916    1 989 💤 -308    0 ✖️ ±0

Results for commit 7b5d441. ± Comparison against base commit 9aaeb06.

tgaddair merged commit 263b66e into horovod:master on Oct 14, 2020