Harmonize Custom Reductions over nesting levels #802

crtrott · 2017-05-17T02:31:21Z

We currently have a number of different ways of doing custom reductions, and they need some unifying.

The four main approaches are:

1. join/init/final on the functor
   - works only on the outer level
2. Kokkos::Experimental::Max/Min/...
   - works only on the outer level
   - lives in core/Kokkos_Parallel_Reduce.hpp
   - does not support array reductions (e.g. finding the maxima of N vectors at the same time)
3. Kokkos::Max/Min/...
   - works only on inner levels
   - lives in core/impl/Kokkos_Reducer.hpp
   - does support array reductions
   - less total code through more abstraction compared to the Kokkos::Experimental variants
4. join lambda
   - works only on inner levels
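To make approach 1 concrete, here is a minimal serial sketch of the functor-based style: the functor carries its own init, join and final, and a stand-in driver plays the role of the outer-level parallel_reduce. `MaxAbsFunctor` and `serial_reduce` are hypothetical names, not Kokkos API, and the real Kokkos functor signatures differ in details; this only illustrates the shape of the interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical functor in the style of approach 1: the reduction logic
// (init/join/final) lives on the functor itself.
struct MaxAbsFunctor {
  using value_type = double;
  const std::vector<double>* data;

  // Body: fold element i into the thread-local value.
  void operator()(std::size_t i, value_type& update) const {
    double v = (*data)[i] < 0 ? -(*data)[i] : (*data)[i];
    if (v > update) update = v;
  }
  // Identity for a max of absolute values.
  void init(value_type& update) const { update = 0.0; }
  // Combine two partial results.
  void join(value_type& dst, const value_type& src) const {
    if (src > dst) dst = src;
  }
  // Post-process the final result (nothing to do here).
  void final(value_type&) const {}
};

// Stand-in for an outer-level parallel_reduce: splits [0, n) into `chunks`
// partial reductions, then joins them, mimicking what a backend would do.
template <class Functor>
typename Functor::value_type serial_reduce(std::size_t n, std::size_t chunks,
                                           const Functor& f) {
  typename Functor::value_type result;
  f.init(result);
  for (std::size_t c = 0; c < chunks; ++c) {
    typename Functor::value_type partial;
    f.init(partial);
    for (std::size_t i = c * n / chunks; i < (c + 1) * n / chunks; ++i)
      f(i, partial);
    f.join(result, partial);
  }
  f.final(result);
  return result;
}
```

Because init and join are tied to one functor type, this style does not carry over naturally to nested (team/vector) levels, which is the gap the thread is about.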

crtrott · 2017-05-17T02:36:17Z

I fixed up the Kokkos::Experimental::Max one so that it also works for inner levels, using pretty much the infrastructure already in place to support Kokkos::Max (with some ifdefs for the difference between taking pointers vs. taking references in join and init, etc.).

I tried this code:

Kokkos::parallel_reduce(Kokkos::TeamPolicy<>(N/1024,32),
  KOKKOS_LAMBDA( const Kokkos::TeamPolicy<>::member_type& team, Scalar& lmax) {
    Scalar team_max;
    for(int rr = 0; rr<R; rr++) {
      int i = team.league_rank();
      Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,32),
        [&] (const int& j, Scalar& thread_max) {
          Scalar t_max;
          Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,32),
            [&] (const int& k, Scalar& max_) {
              // accumulate into this level's value (max_), not the outer lmax
              if(a((i*32 + j)*32 + k)>max_) max_ = a((i*32 + j)*32 + k);
            },Kokkos::Experimental::Max<Scalar>(t_max));
          if(t_max>thread_max) thread_max = t_max;
        },Kokkos::Experimental::Max<Scalar>(team_max));
    }
    if(team_max>lmax) lmax = team_max;
  },Kokkos::Experimental::Max<Scalar>(max));

On KNL with N = 1000000 and R = 10000 this takes 5.7s with 256 threads using Kokkos::Max and 4.4s with Kokkos::Experimental::Max for the inner level.
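A serial sketch of the same three-level structure, using one hypothetical `MaxReducer` type at every level, shows what a "common interface over all levels" buys: each level gets its own accumulator, initialized to the identity, and writes its result into the reducer's bound reference. Note that the innermost body folds into its own accumulator (`max_`), not into an outer-level one. `MaxReducer` and `nested_reduce` are illustrative stand-ins, not the Kokkos API; there is no actual parallelism here.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical reducer: binds a result reference and knows its identity
// and join. Loosely modeled on the reducer idea, not the Kokkos API.
template <class T>
struct MaxReducer {
  T& result;
  static T identity() { return std::numeric_limits<T>::lowest(); }
  static void join(T& dst, const T& src) { if (src > dst) dst = src; }
};

// Serial stand-in for parallel_reduce at any nesting level: the body gets a
// level-local accumulator, and the final value is written to the reducer.
template <class Body, class Reducer>
void nested_reduce(int n, Body body, Reducer r) {
  auto acc = Reducer::identity();
  for (int i = 0; i < n; ++i) body(i, acc);
  r.result = acc;
}

// Three-level max over a flat array laid out as [league][team][vector].
double three_level_max(const std::vector<double>& a, int league, int team, int vec) {
  double global_max;
  nested_reduce(league, [&](int i, double& lmax) {
    double team_max;
    nested_reduce(team, [&](int j, double& thread_max) {
      double t_max;
      nested_reduce(vec, [&](int k, double& max_) {
        double v = a[(i * team + j) * vec + k];
        if (v > max_) max_ = v;  // innermost accumulator only
      }, MaxReducer<double>{t_max});
      if (t_max > thread_max) thread_max = t_max;
    }, MaxReducer<double>{team_max});
    if (team_max > lmax) lmax = team_max;
  }, MaxReducer<double>{global_max});
  return global_max;
}
```

Using `lowest()` as the identity matters for all-negative inputs; an identity of 0 would silently return the wrong answer there.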

ibaned · 2017-05-17T18:57:19Z

Need to document new requirements for custom reductions, then change the design.

crtrott · 2017-05-17T19:03:59Z

Requirements:

- only static size for custom reductions right now
- get rid of the lambda version; custom reductions go only through reducers
- should be fast
- common interface over all levels
- outer level: allow a memory space argument to enable asynchronous reductions
- need to support FAD types (may want to specialize)
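The "static size" requirement and the array reductions mentioned earlier (maxima of N vectors at once) fit together if N is a compile-time parameter of the reducer. A minimal serial sketch, assuming a hypothetical `ArrayMax` reducer over `std::array` (not a Kokkos type):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical array reducer: the max of each of N columns in one pass.
// N is a compile-time constant, matching "only static size" above.
template <class T, std::size_t N>
struct ArrayMax {
  using value_type = std::array<T, N>;
  static value_type identity() {
    value_type v;
    v.fill(std::numeric_limits<T>::lowest());
    return v;
  }
  static void join(value_type& dst, const value_type& src) {
    for (std::size_t c = 0; c < N; ++c)
      if (src[c] > dst[c]) dst[c] = src[c];
  }
};

// Rows are stored contiguously: row r occupies data[r*N .. r*N+N-1].
template <class T, std::size_t N>
std::array<T, N> column_max(const std::vector<T>& data) {
  using R = ArrayMax<T, N>;
  typename R::value_type result = R::identity();
  for (std::size_t r = 0; r < data.size() / N; ++r) {
    typename R::value_type row;
    for (std::size_t c = 0; c < N; ++c) row[c] = data[r * N + c];
    R::join(result, row);  // each row acts as one partial result
  }
  return result;
}
```

Fixing N at compile time keeps the value type trivially copyable and lets a backend allocate scratch space for partial results without any runtime size query.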

hcedwar · 2017-05-17T20:05:56Z

Common non-summation custom reductions (e.g., product, min, max, and, or) require initialization of thread-local temporary values to an identity that is appropriate for that reduction operator. Some of these identity values are defined in std::numeric_limits<T> where T is a built-in numeric type (this is not portable to CUDA).
A reduction identity value satisfies:

   x = reduce( x , identity );
   x = reduce( identity , x );

for all possible values of x.
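The identity property can be checked directly on sample values. This host-only sketch (the `is_identity_for` helper and the op structs are hypothetical) also shows a common trap: for floating-point types, `std::numeric_limits<T>::min()` is the smallest positive normal value, so the identity for max is `lowest()`, not `min()`.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Checks x == reduce(x, identity) and x == reduce(identity, x) for one x.
template <class T, class Op>
bool is_identity_for(Op op, T identity, T x) {
  return op(x, identity) == x && op(identity, x) == x;
}

struct MaxOp  { template <class T> T operator()(T a, T b) const { return std::max(a, b); } };
struct MinOp  { template <class T> T operator()(T a, T b) const { return std::min(a, b); } };
struct SumOp  { template <class T> T operator()(T a, T b) const { return a + b; } };
struct ProdOp { template <class T> T operator()(T a, T b) const { return a * b; } };
```

Since std::numeric_limits is not usable in CUDA device code (as noted above), a device-friendly trait has to supply these constants itself.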
Custom reductions with custom scalar types require a portable traits mechanism to obtain the identity value. This traits interface should be minimal and targeted to reduction operators.
For example,

namespace Kokkos {
  template<class T>
  struct reduction_identity {
    constexpr static T sum();   // 0
    constexpr static T prod();  // 1
    constexpr static T max();   // lowest representable value
    constexpr static T min();   // highest representable value
    constexpr static T bor();   // 0, only for integer types
    constexpr static T band();  // ~0 (all bits set), only for integer types
  };
}
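A minimal sketch of how such a trait could be specialized for built-in types and then consumed by a generic reduction. It lives in a hypothetical `sketch` namespace so it does not collide with Kokkos, and it uses the `<cfloat>`/`<climits>` macros because they are plain constants usable in constexpr functions; the NumericTraits implementation committed for this issue may differ.

```cpp
#include <cassert>
#include <cfloat>
#include <climits>
#include <vector>

namespace sketch {  // hypothetical namespace, not Kokkos

// Primary template left undefined: an unsupported type is a compile error.
template <class T> struct reduction_identity;

template <> struct reduction_identity<int> {
  static constexpr int sum()  { return 0; }
  static constexpr int prod() { return 1; }
  static constexpr int max()  { return INT_MIN; }  // lowest value
  static constexpr int min()  { return INT_MAX; }  // highest value
  static constexpr int bor()  { return 0; }
  static constexpr int band() { return ~0; }       // all bits set
};

template <> struct reduction_identity<double> {
  static constexpr double sum()  { return 0.0; }
  static constexpr double prod() { return 1.0; }
  static constexpr double max()  { return -DBL_MAX; }  // not DBL_MIN
  static constexpr double min()  { return DBL_MAX; }
};

// Generic reductions that need nothing from T beyond the operator and the trait.
template <class T>
T max_reduce(const std::vector<T>& v) {
  T acc = reduction_identity<T>::max();
  for (const T& x : v) if (x > acc) acc = x;
  return acc;
}

template <class T>
T band_reduce(const std::vector<T>& v) {
  T acc = reduction_identity<T>::band();
  for (const T& x : v) acc &= x;
  return acc;
}

}  // namespace sketch
```

A user type only has to provide a specialization of the trait (plus the operator) to work with every generic reducer built this way.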


dholladay00 · 2017-05-22T15:36:44Z

I currently have custom types for reductions, and they no longer work. I need to sum up a vector and also take a weighted sum of that vector, which can be done with a single reduction that returns 2 doubles. I had worked with @crtrott to create the types, and they previously worked.

Is it now possible to do that with something like a Kokkos::Array<double, 2> as the reduction type?

If not, here is my attempt at implementing the struct above:

#include <cfloat>  // DBL_MAX

namespace Kokkos {
  template<class T>
  struct reduction_identity;

  template<>
  struct reduction_identity<sum_2_numbers> {
    KOKKOS_FORCEINLINE_FUNCTION constexpr static sum_2_numbers sum()
    {
      return sum_2_numbers(0., 0.);
    }
    KOKKOS_FORCEINLINE_FUNCTION constexpr static sum_2_numbers prod()
    {
      return sum_2_numbers(1.0, 1.0);
    }
    KOKKOS_FORCEINLINE_FUNCTION constexpr static sum_2_numbers max()
    {
      // lowest value: DBL_MIN is the smallest *positive* double
      return sum_2_numbers(-DBL_MAX, -DBL_MAX);
    }
    KOKKOS_FORCEINLINE_FUNCTION constexpr static sum_2_numbers min()
    {
      return sum_2_numbers(DBL_MAX, DBL_MAX);
    }
  };
}

However, I don't know much about constexpr functions, and I run into this error: `a constexpr function cannot have a nonliteral return type "sum_2_numbers"`.

I only use the sum operator, so maybe I only need to include the specialization for sum. Even so, I get that error in the sum operator.

I see two potential solutions:

- Replace custom types with arrays (this would be ideal)
- Figure out how to make these specializations work for custom types with some C++ wizardry

As always, help is greatly appreciated, @crtrott @hcedwar, and I would be happy to track this in a separate issue if that would be helpful.

dholladay00 · 2017-05-22T19:20:14Z

Fix in place: add constexpr to the default constructor.

Current fix:

struct sum_2_numbers {
  double sum_one;
  double sum_two;

  KOKKOS_INLINE_FUNCTION
  constexpr sum_2_numbers()
    : sum_one(0), sum_two(0) { }

  ...

};

I still think the ability to use something like Kokkos::Array<double, 2> would be ideal, but the above fix gets me up and running again with the new unified reduction stuff.
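Putting the fix together with the use case (a plain sum and a weighted sum in one pass), here is a standalone sketch. The constexpr constructors are what make `sum_2_numbers` a literal type, which is exactly what the "nonliteral return type" error complained about. The two-argument constructor, `operator+=`, the `sketch` namespace and `sum_and_weighted_sum` are assumptions for illustration; in Kokkos the specialization would go in `namespace Kokkos` and the functions would carry `KOKKOS_INLINE_FUNCTION`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two-value reduction result, made a literal type via constexpr constructors.
struct sum_2_numbers {
  double sum_one;
  double sum_two;
  constexpr sum_2_numbers() : sum_one(0), sum_two(0) {}
  constexpr sum_2_numbers(double a, double b) : sum_one(a), sum_two(b) {}
  sum_2_numbers& operator+=(const sum_2_numbers& o) {
    sum_one += o.sum_one;
    sum_two += o.sum_two;
    return *this;
  }
};

namespace sketch {  // hypothetical stand-in for Kokkos::reduction_identity
template <class T> struct reduction_identity;
template <> struct reduction_identity<sum_2_numbers> {
  // Only legal because sum_2_numbers is a literal type.
  static constexpr sum_2_numbers sum() { return sum_2_numbers(); }
};
}  // namespace sketch

// Plain sum of x and weighted sum of x (weights w) in a single pass.
sum_2_numbers sum_and_weighted_sum(const std::vector<double>& x,
                                   const std::vector<double>& w) {
  sum_2_numbers acc = sketch::reduction_identity<sum_2_numbers>::sum();
  for (std::size_t i = 0; i < x.size(); ++i)
    acc += sum_2_numbers(x[i], w[i] * x[i]);
  return acc;
}
```

Only the sum identity is specialized here, since that is the only operator this use case needs.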

crtrott added the Enhancement label (Improve existing capability; will potentially require voting) May 17, 2017

crtrott self-assigned this May 17, 2017

ibaned added the Blocks Promotion label (Overview issue for release-blocking bugs) May 17, 2017

hcedwar added this to InProgress in Custom Reductions May 17, 2017

crtrott added a commit to crtrott/kokkos that referenced this issue May 18, 2017

CustomReducers: adding NumericTraits files with reduction_identity

4c9478f

see kokkos#802

crtrott added the InDevelop label and removed the Blocks Promotion label May 18, 2017

crtrott mentioned this issue May 18, 2017

Custom reduction and nested parallelism #796

Closed

crtrott moved this from InProgress to Done in Custom Reductions May 18, 2017

crtrott mentioned this issue May 18, 2017

ThreadVectorRange Customized Reduction Bug #739

Closed

crtrott closed this as completed May 27, 2017
