New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for additional reduction operations for allreduce (min, max, product). #3660
Conversation
Unit Test Results (with flaky tests) 1 201 files - 3 1 201 suites - 3 11h 32m 32s ⏱️ - 10m 6s Results for commit 1129880. ± Comparison against base commit 25ed803. ♻️ This comment has been updated with latest results. |
42edd02
to
65d51c9
Compare
…ax, product). Signed-off-by: Josh Romero <joshr@nvidia.com>
Signed-off-by: Josh Romero <joshr@nvidia.com>
Signed-off-by: Josh Romero <joshr@nvidia.com>
Signed-off-by: Josh Romero <joshr@nvidia.com>
Signed-off-by: Josh Romero <joshr@nvidia.com>
Signed-off-by: Josh Romero <joshr@nvidia.com>
65d51c9
to
9c26bf8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this PR a lot, nice work @romerojosh! Not only does it add useful functionality, multiple code paths are cleaner now thanks to effectively using the ReduceOp
abstraction.
I've left some minor comments.
@@ -490,7 +490,7 @@ void DoHorovodOperationCudaOnCPU(void*, void* on_complete_ptr, void* param) { | |||
enqueue_result = EnqueueTensorAllreduces( | |||
hvd_contexts, hvd_cpu_buffers, hvd_cpu_buffers, ready_event_lists, | |||
ops_param->op_names, device, callbacks, | |||
(average) ? ReduceOp::AVERAGE : ReduceOp::SUM, prescale_factor, | |||
reduce_op, prescale_factor, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great that the extra layers of converting between average and reduce_op arguments in MXNet could be removed now!
Signed-off-by: Josh Romero <joshr@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Checklist before submitting
Description
Currently, Horovod only supports allreduce operations using sum or average reduction operations. This PR implements an expanded set of supported reduce operations (min, max, product) for allreduce.
This PR includes changes from #3646 so should not be merged until that one lands.