Added multiple reducers support for team-level parallel reduce #5727
Retest this please.
Force-pushed from 0d99fef to 742ea19.
Force-pushed from 66b0a3a to ba8427a.
TeamThreadRange, ThreadVectorRange, TeamVectorRange
…Target tests list
Co-authored-by: Daniel Arndt <arndtd@ornl.gov>
Co-authored-by: Damien L-G <dalg24+github@gmail.com>
Force-pushed from ba8427a to af212c0.
Looks OK to me.
```cpp
EXPECT_EQ(n, hostView(0));
EXPECT_EQ((n * (n + 1) / 2), hostView(1));
EXPECT_EQ(n * n * (n + 1) / 2, hostView(2));
EXPECT_EQ(n * n, hostView(3));
```
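The four expected values above are closed forms of sums over `i` in `[0, n)`. The standalone C++ sketch below uses one hypothetical choice of per-element contributions (`1`, `i + 1`, `n * (i + 1)`, and `n`) whose sums give exactly those closed forms, and accumulates all four in a single loop, the sequential analogue of one reduction call with four reducers. The contributions and the names `FourSums`, `combined_sums`, and `check_closed_forms` are assumptions for illustration; the actual unit test may reduce different quantities.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical accumulators; the comments give each sum's closed form.
struct FourSums {
  std::int64_t count = 0;       // sum of 1         -> n
  std::int64_t triangular = 0;  // sum of (i + 1)   -> n(n+1)/2
  std::int64_t scaled = 0;      // sum of n*(i + 1) -> n*n(n+1)/2
  std::int64_t square = 0;      // sum of n         -> n*n
};

// One pass over [0, n) updating all four accumulators at once.
FourSums combined_sums(std::int64_t n) {
  FourSums r;
  for (std::int64_t i = 0; i < n; ++i) {
    r.count += 1;
    r.triangular += i + 1;
    r.scaled += n * (i + 1);
    r.square += n;
  }
  return r;
}

// Mirrors the four EXPECT_EQ checks; n == 0 yields all zeros.
void check_closed_forms(std::int64_t n) {
  const FourSums r = combined_sums(n);
  assert(r.count == n);
  assert(r.triangular == n * (n + 1) / 2);
  assert(r.scaled == n * n * (n + 1) / 2);
  assert(r.square == n * n);
}
```

Since `n * (n + 1)` is always even, the integer divisions in the closed forms are exact.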
Why combine 4 reductions rather than 3 or 2?
No particular reason. I just wanted to test more than two reducers.
Please explicitly compare against the reduction identity when the range is empty.
Looks good other than that.
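Comparing against the identity matters because reducers do not share one: a sum starts at 0, a product at 1, a minimum at the largest representable value, and a maximum at the lowest. The standalone C++ sketch below is a sequential stand-in, not a team-level reduction; the names `Results`, `reduce_sum_prod_min_max`, and `check_empty_range` are hypothetical. It seeds each accumulator with its identity, so an empty range returns the identities unchanged. In Kokkos the identities themselves are available through `Kokkos::reduction_identity<T>`; plain constants are used here so the sketch compiles on its own.

```cpp
#include <cassert>
#include <limits>

// Plain-C++ identities for four reductions over int.
constexpr int kSumIdentity = 0;
constexpr int kProdIdentity = 1;
constexpr int kMinIdentity = std::numeric_limits<int>::max();
constexpr int kMaxIdentity = std::numeric_limits<int>::lowest();

struct Results {
  int sum;
  int prod;
  int min;
  int max;
};

// Sequential stand-in for one combined reduction over [begin, end).
// Every accumulator starts at its identity, so begin == end returns
// the identities. The product multiplies (i + 1) and overflows int for
// long ranges, so keep ranges short.
Results reduce_sum_prod_min_max(int begin, int end) {
  Results r{kSumIdentity, kProdIdentity, kMinIdentity, kMaxIdentity};
  for (int i = begin; i < end; ++i) {
    r.sum += i;
    r.prod *= i + 1;
    if (i < r.min) r.min = i;
    if (i > r.max) r.max = i;
  }
  return r;
}

// The check requested in review: on an empty range, compare each result
// against its own identity instead of assuming every result is 0.
void check_empty_range(int at) {
  const Results r = reduce_sum_prod_min_max(at, at);
  assert(r.sum == kSumIdentity);
  assert(r.prod == kProdIdentity);
  assert(r.min == kMinIdentity);
  assert(r.max == kMaxIdentity);
}
```

An expectation of `0` for every result would pass for the sum but silently miss a wrong identity for the product, minimum, or maximum.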
I plan on ignoring
in the CUDA-11.6-NVCC build. I am still waiting for at least one of the HIP builds to pass before I merge.
Retest this please.
…s#5727)

* Added interfaces and unit-tests for combined reducers supports for TeamThreadRange, ThreadVectorRange, TeamVectorRange
* fixed warnings from unit tests
* Fixed errors from CI tests
* Added n=0 test cases
* fixed warnings from unit tests
* Removed team combined reducers unit-test file from OpenACC and OpenMPTarget tests list
* quick syntax fix
* Adjusted unit test skip conditions for openmptarget and openacc
* Put in a macro guard to check if KOKKOS_ENABLE_CUDA_LAMBDA is defined
* Addressing comments from reviews
* Update core/src/impl/Kokkos_Combined_Reducer.hpp

  Co-authored-by: Daniel Arndt <arndtd@ornl.gov>
* Update core/unit_test/TestTeamCombinedReducers.hpp

  Co-authored-by: Damien L-G <dalg24+github@gmail.com>
* Converted write_one_value_back_on_device to a static function
* Clang-formatted
* git rebasing
* Removed unnecessary fences from parallel_reduce_impl
* Adjusted unit tests based on feedbacks
* Adjusted expect_eq values in the unit tests
* Removed a few ternary conditions

---------

Co-authored-by: Daniel Arndt <arndtd@ornl.gov>
Co-authored-by: Damien L-G <dalg24+github@gmail.com>
Resolves #4510
Added an interface to allow parallel_reduce with TeamThreadRange, ThreadVectorRange or TeamVectorRange policy to be called with more than one reducer.
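Conceptually, a combined reduction runs one loop and keeps one accumulator per reducer, each seeded with that reducer's identity and updated with that reducer's join. The C++17 sketch below illustrates the idea sequentially. `SumReducer`, `MaxReducer`, `join_all`, and `combined_reduce` are hypothetical names, not the Kokkos API and not the implementation added in this PR, which runs inside `TeamThreadRange`, `ThreadVectorRange`, and `TeamVectorRange`. As a simplifying assumption, the functor here returns a tuple of contributions rather than updating accumulators in place.

```cpp
#include <cstddef>
#include <limits>
#include <tuple>
#include <utility>

// Hypothetical reducer types: each exposes its value type, an identity,
// and a join that folds one contribution into the accumulator.
template <class T>
struct SumReducer {
  using value_type = T;
  T identity() const { return T(0); }
  void join(T& acc, const T& v) const { acc += v; }
};

template <class T>
struct MaxReducer {
  using value_type = T;
  T identity() const { return std::numeric_limits<T>::lowest(); }
  void join(T& acc, const T& v) const {
    if (v > acc) acc = v;
  }
};

// Joins the I-th contribution into the I-th accumulator with the I-th reducer.
template <class Reds, class Acc, class Vals, std::size_t... I>
void join_all(const Reds& reds, Acc& acc, const Vals& vals,
              std::index_sequence<I...>) {
  (std::get<I>(reds).join(std::get<I>(acc), std::get<I>(vals)), ...);
}

// One pass over [0, n): f(i) returns a tuple with one contribution per
// reducer. Accumulators start at the identities, so n == 0 returns them.
template <class F, class... Reds>
std::tuple<typename Reds::value_type...> combined_reduce(std::size_t n, F f,
                                                         Reds... reds) {
  const std::tuple<Reds...> reducers{reds...};
  std::tuple<typename Reds::value_type...> acc{reds.identity()...};
  for (std::size_t i = 0; i < n; ++i) {
    join_all(reducers, acc, f(i), std::index_sequence_for<Reds...>{});
  }
  return acc;
}
```

For example, `combined_reduce(4, [](std::size_t i) { return std::make_tuple(long(i), int(i * i)); }, SumReducer<long>{}, MaxReducer<int>{})` yields a tuple holding 6 and 9, and the same call with `n == 0` returns `0` and `std::numeric_limits<int>::lowest()`, the two identities.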