-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenMPTarget: Adding Implementation for nested reducers #3845
OpenMPTarget: Adding Implementation for nested reducers #3845
Conversation
…ion, kokkos-reducers.
Then he should also co-author the commits here (https://docs.github.com/en/github/committing-changes-to-your-project/creating-a-commit-with-multiple-authors). 🙂 |
You should enable the corresponding tests in this pull request. |
Yes working on the unit tests right now. Will update them soon. |
The indentation is off. 😉 |
…the corresponding unit tests.
#pragma omp barrier | ||
|
||
if constexpr (std::is_arithmetic<ValueType>::value) { | ||
#pragma omp for reduction(+ : tmp_scratch[:1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is wrong: it does + for the inter thread reduction, and then join within a thread.
…e. Wrapped the commented code in ifdefs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is way too much new code duplication in this pull request for me to be comfortable saying I could maintain it. Please consolidate.
template <class FunctorType, class ReducerType, class PointerType, | ||
class ValueType, class... PolicyArgs> | ||
struct ParallelReduceSpecialize<FunctorType, TeamPolicyInternal<PolicyArgs...>, | ||
ReducerType, PointerType, ValueType, 0, 1> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* functor has join = */ 0,
/* use reducer type = */ 1
Also, why are these int
s instead of bool
s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should still address this.
|
||
#pragma omp declare reduction( \ | ||
custom:ValueType \ | ||
: OpenMPTargetReducerWrapper <ReducerType>::join(omp_out, omp_in)) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
practically everything except for this pragma here is copied from the 0, 0
case above, but the only way to figure that out is to run diff on the code segments. This is way too much code duplication for me to feel comfortable approving this. It needs to be restructured so that the repeated lines aren't in two places; otherwise we can't maintain this.
inline static | ||
typename std::enable_if<!std::is_same<TagType, void>::value>::type | ||
execute_impl(const FunctorType& f, const PolicyType& p, | ||
PointerType result_ptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This differs by one line from the 50+ lines of code in the non-tag version. Please refactor this to remove the copy-pasted code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue is addressed . The check for tag is pushed inside the function and the duplicated function is deleted.
@@ -1286,8 +1286,16 @@ struct TestTeamBroadcast< | |||
using policy_type_f = | |||
Kokkos::TeamPolicy<ScheduleType, ExecSpace, BroadcastTag>; | |||
|
|||
// FIXME_OPENMPTARGET temporary restriction for team size to be at least 32 | |||
#ifdef KOKKOS_ENABLE_OPENMPTARGET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since execution spaces are unconditionally declared now, we shouldn't need this ifdef
any more (right)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we haven't merged #3838.
Unfortunately he cannot be associated with any PRs due to conflicts in his work policy. |
I was thinking of consolidating the duplicated code in a separate PR once this gets merged. It has been an issue since before this PR. But if there is a strong opinion on doing it in this PR itself, I have no issues. |
If other reviewers think that's fine, then I'm okay with it. I just think it's easier to avoid producing duplicate code in the first place, though. At the very least, mark the duplicated lines with comments like "begin copy-paste X" and "end copy-paste from X" so that we can remove it later. |
core/unit_test/TestTeamBasic.hpp
Outdated
@@ -66,7 +66,8 @@ TEST(TEST_CATEGORY, team_for) { | |||
} | |||
|
|||
// FIXME_OPENMPTARGET wrong results | |||
#ifndef KOKKOS_ENABLE_OPENMPTARGET | |||
// FIXME_SYCL team reduction and broadcast not yet implemented | |||
#if !defined(KOKKOS_ENABLE_SYCL) && !defined(KOKKOS_ENABLE_OPENMPTARGET) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks like an involuntary change. @masterleinad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this works for SYCL
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove the ifdef
around SYCL
} | ||
#endif // KOKKOS_IMPL_HIERARCHICAL_REDUCERS_WORKAROUND |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would undef KOKKOS_IMPL_HIERARCHICAL_REDUCERS_WORKAROUND
right there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping
value = *local_value; | ||
#endif*/ | ||
KOKKOS_INLINE_FUNCTION void team_broadcast(ValueType& value, | ||
const int& thread_id) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to quit passing arithmetic types by const reference.
@@ -51,6 +51,11 @@ | |||
#include <Kokkos_Atomic.hpp> | |||
#include "Kokkos_OpenMPTarget_Abort.hpp" | |||
|
|||
// FIXME_OPENMPTARGET - Using this macro to implement a workaround for | |||
// hierarchical reducers. It avoids hitting the code path which we wanted to | |||
// write but dosen't work. undef'ed at the end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// write but dosen't work. undef'ed at the end. | |
// write but doesn't work. undef'ed at the end. |
using type = typename if_c<sizeof(ValueType) < TEAM_REDUCE_SIZE, ValueType, | ||
void>::type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you use std::conditional_t
here?
using type = typename if_c<sizeof(ValueType) < TEAM_REDUCE_SIZE, ValueType, | ||
void>::type; | ||
type* team_scratch = | ||
(type*)((char*)m_glb_scratch + TEAM_REDUCE_SIZE * omp_get_team_num()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C-style casts for pointers 🙁
typename std::enable_if<!Kokkos::is_reducer_type<ValueType>::value>::type | ||
parallel_reduce( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you implement this one in terms of the reducer
implementation using Kokkos::Sum
?
|
||
#if !defined(KOKKOS_IMPL_HIERARCHICAL_REDUCERS_WORKAROUND) | ||
// For some reason the actual version we wanted to write doesn't work | ||
// and crashes. We should trye this with every new compiler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// and crashes. We should trye this with every new compiler | |
// and crashes. We should try this with every new compiler |
static_assert(sizeof(ValueType) <= | ||
Impl::OpenMPTargetExecTeamMember::TEAM_REDUCE_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a FIXME?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be implemented in terms of the reducer variant using Kokkos::Sum
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a FIXME?
No this is not a FIXME, essentially we are restricting the nested array-reduction size to be <= 32 elements per team.
The reduction uses a global scratch, so 32 elements with 16 bytes per element is essentially 512 bytes per team. Anything more than this might be detrimental for the performance.
Can't this be implemented in terms of the reducer variant using
Kokkos::Sum
?
We were trying to rely on OpenMP constructs as much as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I am misunderstanding but does that mean that you can't do reductions for large types (and that is never intended to work)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to say its better to not perform reductions larger than 32 elements as long as we have the current implementation strategy where we allocate 16 bytes per element in the scratch memory. But maybe @crtrott can answer this better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May question solely is if we should put a FIXME there or not. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right. We should put a FIXME there. Will do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I forgot that we have already added a FIXME_OPENMPTARGET for this issue in one of the variants of the parallel_reduce
functions. I will repeat that everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This static_assert
seems to be duplicated now looking at lines 1573-1577.
… with c++ style. Added FIXME messages to the restriction of 32 elements per team in the array reduction of parallel_reduce.
static_assert(sizeof(ValueType) <= | ||
Impl::OpenMPTargetExecTeamMember::TEAM_REDUCE_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one seems duplicated looking at line 1614-1618.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You forgot to undef KOKKOS_IMPL_HIERARCHICAL_REDUCERS_WORKAROUND
core/unit_test/TestTeamBasic.hpp
Outdated
#ifdef KOKKOS_ENABLE_OPENMPTARGET | ||
if constexpr (!std::is_same<TEST_EXECSPACE, | ||
Kokkos::Experimental::OpenMPTarget>::value) | ||
#endif | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ |
Same comment below
} | ||
#endif // KOKKOS_IMPL_HIERARCHICAL_REDUCERS_WORKAROUND |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping
static_assert(sizeof(ValueType) <= | ||
Impl::OpenMPTargetExecTeamMember::TEAM_REDUCE_SIZE); | ||
|
||
ValueType* TeamThread_scratch = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: prefer team_thread_scratch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted it to match TeamThreadRangeBoundariesStruct
in the function definition. But I can change it if its about being consistent in the way we name variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not blocking
const Lambda& lambda, ValueType& result) { | ||
ValueType* tmp_scratch = | ||
KOKKOS_INLINE_FUNCTION | ||
typename std::enable_if<!Kokkos::is_reducer_type<ValueType>::value>::type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typename std::enable_if<!Kokkos::is_reducer_type<ValueType>::value>::type | |
std::enable_if_t<!Kokkos::is_reducer_type<ValueType>::value> |
// This is the variant we actually wanted to write | ||
template <typename iType, class Lambda, typename ReducerType> | ||
KOKKOS_INLINE_FUNCTION | ||
typename std::enable_if<Kokkos::is_reducer_type<ReducerType>::value>::type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typename std::enable_if<Kokkos::is_reducer_type<ReducerType>::value>::type | |
std::enable_if_t<Kokkos::is_reducer_type<ReducerType>::value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK to me.
Changes requested were partially addressed and will be fully addressed in follow-up PR
The PR contains the following updates in the OpenMPTarget backend:
join
Major contributor to the PR - Patrick Steinbrecher (@psteinbrecher)