TeamPolicy with reducers with valuetypes without += broken on CUDA #2410

crtrott · 2019-10-01T15:51:17Z

I root-caused the issue: our TeamPolicy::team_size_max function takes only the functor and the parallel reduce tag, not the reducer. Hence, during the "figure out how many registers this kernel will use" step, it tries to instantiate the kernel as if it were a sum reduction.
Technically that is wrong for any reducer other than sum. For every reducer with a native value type (like double) it happens to work, though the register count determination might be slightly off; if so, it would result in a runtime dispatch error.
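To illustrate why instantiating the kernel as a sum reduction fails to compile, here is a minimal standalone sketch that does not use Kokkos. ToyMinMaxValue, has_plus_assign, sum_join and minmax_join are hypothetical names made up for this illustration; they only mimic the shape of the problem, not the actual Kokkos internals.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a MinMax-style value type: two fields, no operator+=.
struct ToyMinMaxValue {
  double min_val;
  double max_val;
};

// C++11-compatible detection of "T supports +=".
template <class...>
struct make_void {
  using type = void;
};

template <class T, class = void>
struct has_plus_assign : std::false_type {};

template <class T>
struct has_plus_assign<
    T, typename make_void<decltype(std::declval<T&>() +=
                                   std::declval<const T&>())>::type>
    : std::true_type {};

// What a sum-style join does; instantiating it requires operator+= on T.
template <class T>
void sum_join(T& dest, const T& src) {
  dest += src;
}

// What a min/max-style join does instead; it needs only comparisons.
inline void minmax_join(ToyMinMaxValue& dest, const ToyMinMaxValue& src) {
  if (src.min_val < dest.min_val) dest.min_val = src.min_val;
  if (src.max_val > dest.max_val) dest.max_val = src.max_val;
}

static_assert(has_plus_assign<double>::value, "double can be summed");
static_assert(!has_plus_assign<ToyMinMaxValue>::value,
              "sum_join<ToyMinMaxValue> would not compile");
```

In this toy model, any code path that instantiates sum_join with the struct type is a compile error, even if that path only exists to query kernel properties and the join is never executed.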

Here is a reproducer:

#include <Kokkos_Core.hpp>
#include <cmath>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    using ExecSpace = Kokkos::DefaultExecutionSpace;
    // MinMax reduces into a struct value type rather than a plain double.
    using ReducerType = Kokkos::MinMax<double, ExecSpace>;
    using ReducerValueType = typename ReducerType::value_type;
    using DynamicScheduleType = Kokkos::Schedule<Kokkos::Dynamic>;
    using TeamPolicyType = Kokkos::TeamPolicy<ExecSpace, DynamicScheduleType>;
    using TeamHandleType = typename TeamPolicyType::member_type;

    static constexpr int num_teams = 1000;
    static constexpr int num_threads = 1000;  // not used below
    ReducerValueType val;
    ReducerType reducer(val);
    // Kokkos::AUTO leaves the team size choice to Kokkos.
    // The empty lambda body is enough to trigger the problem.
    Kokkos::parallel_reduce(
      TeamPolicyType(num_teams, Kokkos::AUTO),
      KOKKOS_LAMBDA(const TeamHandleType& team, ReducerValueType& teamVal) {
      }, reducer);
  }
  Kokkos::finalize();
}

To fix this: the (non-deprecated) max and recommended team size functions need an option to take in a reducer too. There are two options:

1. add an overload team_size_max(FunctorType,ParallelReduceTag,Reducer)
2. make ParallelReduceTag templated on ReducerType

I am thinking option 1 is better.
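A rough standalone sketch of what option 1 could look like, using a toy policy instead of Kokkos. Everything here is hypothetical: ToyTeamPolicy, the scratch budget, the hardware limit and the size formula are invented for illustration, the argument order simply follows the overload as written above, and the actual Kokkos signature may differ.

```cpp
#include <cassert>
#include <cstddef>

struct ParallelReduceTag {};  // toy stand-in for the tag of the same name

struct ToyFunctor {};  // placeholder for the user functor

struct ToyMinMaxValue {
  double min_val;
  double max_val;
};

// Toy reducers that only expose their value_type.
struct ToySumReducer {
  using value_type = double;
};
struct ToyMinMaxReducer {
  using value_type = ToyMinMaxValue;
};

struct ToyTeamPolicy {
  // Hypothetical limits, chosen only so the two overloads give different answers.
  static constexpr std::size_t scratch_bytes = 4096;
  static constexpr std::size_t hw_max = 1024;

  // One reduction slot per thread must fit into the scratch budget.
  static int size_for(std::size_t value_bytes) {
    const std::size_t fit = scratch_bytes / value_bytes;
    return static_cast<int>(fit < hw_max ? fit : hw_max);
  }

  // Existing style: no reducer is passed, so this toy assumes a sum over double.
  template <class Functor>
  int team_size_max(const Functor&, ParallelReduceTag) const {
    return size_for(sizeof(double));
  }

  // Option 1: an extra overload that sees the reducer and uses its value_type.
  template <class Functor, class Reducer>
  int team_size_max(const Functor&, ParallelReduceTag, const Reducer&) const {
    return size_for(sizeof(typename Reducer::value_type));
  }
};
```

The point of the overload is that the size query can use the reducer's value_type rather than guessing, so nothing is ever instantiated as a sum for a type that cannot be summed.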

What needs to happen:

- add a test which exposes this problem, i.e. test ALL reducers with TeamPolicy too
- add that overload
- use it inside the ParallelReduce class to determine max and recommended team size where appropriate

crtrott · 2019-10-23T18:28:10Z

I think there is a band-aid solution where we just hard-code something like 128 instead of calling team_size_max.

Tests are missing!

crtrott · 2019-10-23T22:02:41Z

OK, I put in a PR with a proper fix for CUDA. The new function overloads are not implemented for the other backends yet, but those can simply drop the reducer and call the overloads without reducers.
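The fallback described for the other backends could look roughly like the following standalone sketch. ToyHostTeamPolicy and its fixed limit are hypothetical, the argument order follows the overload as written in the issue description, and none of this is taken from the PR itself.

```cpp
#include <cassert>

struct ParallelReduceTag {};  // toy stand-in for the tag of the same name

struct ToyFunctor {};  // placeholder for the user functor

struct ToyReducer {
  using value_type = double;
};

// Hypothetical backend whose team size limit does not depend on the reducer.
struct ToyHostTeamPolicy {
  int hw_limit = 64;  // made-up limit for illustration

  // Pre-existing overload without a reducer.
  template <class Functor>
  int team_size_max(const Functor&, ParallelReduceTag) const {
    return hw_limit;
  }

  // New overload: accept the reducer for interface parity, then drop it
  // and forward to the overload without a reducer.
  template <class Functor, class Reducer>
  int team_size_max(const Functor& f, ParallelReduceTag tag,
                    const Reducer&) const {
    return team_size_max(f, tag);
  }
};
```

With forwarding like this, calling code can always pass the reducer, and only backends that need it have to look at it.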

srajama1 · 2019-11-01T12:45:12Z

@crtrott @DavidPoliakoff This needs to be patched to Trilinos.

masterleinad · 2019-11-01T12:53:10Z

@srajama1 See #2499; that pull request will be merged eventually.

srajama1 · 2019-11-01T21:00:51Z

@masterleinad: Thanks!

Fix issue #2410 max_team_size not compiling for reducers with scalar types without +=

crtrott added the "Bug" and "Blocks Promotion" labels Oct 1, 2019

crtrott added this to the 3.0 Release milestone Oct 1, 2019

crtrott added this to To do in Milestone: Release 3.0 via automation Oct 1, 2019

DavidPoliakoff self-assigned this Oct 1, 2019

crtrott added a commit to crtrott/kokkos that referenced this issue Oct 23, 2019

Fix issue kokkos#2410.

2c6f076

Tests are missing!

crtrott mentioned this issue Oct 23, 2019

Fix issue #2410 max_team_size not compiling for reducers with scalar types without += #2499

Merged

dalg24 added a commit to dalg24/kokkos that referenced this issue Oct 24, 2019

fixup! Fix issue kokkos#2410.

367c626

masterleinad moved this from To do to In progress in Milestone: Release 3.0 Oct 25, 2019

sayerhs mentioned this issue Oct 31, 2019

Kokkos: Compilation error when using Kokkos::MinMax with team/thread parallel_reduce trilinos/Trilinos#6000

Closed

dalg24 added a commit that referenced this issue Nov 12, 2019

Merge pull request #2499 from crtrott/issue-2410

4508fc9

Fix issue #2410 max_team_size not compiling for reducers with scalar types without +=

crtrott added the "bug - fix pushed to develop branch" label and removed the "Blocks Promotion" label Nov 13, 2019

crtrott moved this from In progress to Done in Milestone: Release 3.0 Nov 13, 2019

ndellingwood closed this as completed Dec 20, 2019

ndellingwood mentioned this issue Jan 28, 2020

Kokkos + KokkosKernels Promotion To 2.9.99 trilinos/Trilinos#6671

Merged
