Harmonize Custom Reductions over nesting levels #802

Closed
crtrott opened this issue May 17, 2017 · 6 comments

@crtrott
Member

crtrott commented May 17, 2017

We currently have several different ways of doing custom reductions, and they need to be unified.

The four main things are:

  • join/init/final on the functor
    • works only on the outer level
  • Kokkos::Experimental::Max/Min/...
    • works only on the outer level
    • lives in core/Kokkos_Parallel_Reduce.hpp
    • does not support array reductions (e.g., finding the maxima of N vectors at the same time)
  • Kokkos::Max/Min/...
    • works only on inner levels
    • lives in core/impl/Kokkos_Reducer.hpp
    • does support array reductions
    • needs less code overall than the Kokkos::Experimental variants, thanks to more abstraction
  • join lambda
    • works only on inner levels
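For readers unfamiliar with the first style in the list above, here is a minimal serial sketch of what "join/init/final on the functor" means. The member signatures are simplified for illustration, and `serial_reduce` is a hypothetical stand-in for a parallel dispatch, not a Kokkos function.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Functor in the init/join/final style; signatures simplified for illustration.
struct MaxFunctor {
  using value_type = double;
  const std::vector<double>* data;

  // Per-element contribution to a partial result.
  void operator()(int i, value_type& update) const {
    if ((*data)[i] > update) update = (*data)[i];
  }
  // Sets a partial result to the identity of max.
  void init(value_type& update) const {
    update = std::numeric_limits<double>::lowest();
  }
  // Combines two partial results.
  void join(value_type& dst, const value_type& src) const {
    if (src > dst) dst = src;
  }
  // Optional post-processing of the final value; a no-op here.
  void final(value_type&) const {}
};

// Hypothetical serial driver: strides over [0, n) in `chunks` interleaved
// pieces to imitate threads that each own a partial result.
// Assumes chunks > 0.
template <class Functor>
typename Functor::value_type serial_reduce(int n, int chunks, const Functor& f) {
  typename Functor::value_type total;
  f.init(total);
  for (int c = 0; c < chunks; ++c) {
    typename Functor::value_type partial;
    f.init(partial);
    for (int i = c; i < n; i += chunks) f(i, partial);
    f.join(total, partial);
  }
  f.final(total);
  return total;
}
```

The point of the sketch is only the division of labor: `init` supplies the identity, `operator()` accumulates, `join` merges partial results, and `final` runs once at the end.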
@crtrott
Member Author

crtrott commented May 17, 2017

I fixed up Kokkos::Experimental::Max so that it also works on inner levels, reusing most of the infrastructure already in place for Kokkos::Max (with some ifdefs to handle the difference between join, init, etc. taking pointers versus references).

I tried this code:

Kokkos::parallel_reduce(Kokkos::TeamPolicy<>(N/1024,32), 
  KOKKOS_LAMBDA( const Kokkos::TeamPolicy<>::member_type& team, Scalar& lmax) {
    Scalar team_max;
    for(int rr = 0; rr<R; rr++) {
      int i = team.league_rank();
      Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,32), 
        [&] (const int& j, Scalar& thread_max) {
          Scalar t_max;
          Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,32), 
            [&] (const int& k, Scalar& max_) {
              if(a((i*32 + j)*32 + k)>max_) max_ = a((i*32 + j)*32 + k);
          },Kokkos::Experimental::Max<Scalar>(t_max));
          if(t_max>thread_max) thread_max = t_max;
      },Kokkos::Experimental::Max<Scalar>(team_max));
    }
    if(team_max>lmax) lmax = team_max;
},Kokkos::Experimental::Max<Scalar>(max));

On KNL with N = 1000000 and R = 10000 this takes 5.7s with 256 threads using Kokkos::Max and 4.4s with Kokkos::Experimental::Max for the inner level.

@crtrott crtrott added the Enhancement Improve existing capability; will potentially require voting label May 17, 2017
@crtrott crtrott self-assigned this May 17, 2017
@ibaned ibaned added the Blocks Promotion Overview issue for release-blocking bugs label May 17, 2017
@ibaned
Contributor

ibaned commented May 17, 2017

Need to document new requirements for custom reductions, then change the design.

@crtrott
Member Author

crtrott commented May 17, 2017

Requirements:

  • only statically sized custom reductions for now
  • drop the lambda version; custom reductions go through reducers only
  • should be fast
  • common interface across all nesting levels
  • the outer level accepts a memory space argument to enable asynchronous reductions
  • need to support FAD types (may want to specialize)

@hcedwar hcedwar added this to InProgress in Custom Reductions May 17, 2017
@hcedwar
Contributor

hcedwar commented May 17, 2017

Common non-summation custom reductions (e.g., product, min, max, and, or) require initialization of thread-local temporary values to an identity that is appropriate for that reduction operator. Some of these identity values are defined in std::numeric_limits<T> where T is a built-in numeric type (this is not portable to CUDA).
A reduction identity value satisfies:

   x = reduce( x , identity );
   x = reduce( identity , x );

for all possible values of x.
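As a quick illustration of the identity property above, the following host-only sketch checks candidate identities for sum, product, max and min on doubles. The helper names `is_identity_for` and `check_double_identities` are made up for this example; it uses std::numeric_limits, so it deliberately ignores the CUDA portability concern.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Returns true if `identity` leaves `x` unchanged on both sides of `reduce`.
template <class T, class Op>
bool is_identity_for(T x, T identity, Op reduce) {
  return reduce(x, identity) == x && reduce(identity, x) == x;
}

// Checks the usual identities for sum, product, max and min on one value.
inline bool check_double_identities(double x) {
  const double lowest = std::numeric_limits<double>::lowest();
  const double highest = std::numeric_limits<double>::max();
  return is_identity_for(x, 0.0, [](double a, double b) { return a + b; }) &&
         is_identity_for(x, 1.0, [](double a, double b) { return a * b; }) &&
         is_identity_for(x, lowest,
                         [](double a, double b) { return std::max(a, b); }) &&
         is_identity_for(x, highest,
                         [](double a, double b) { return std::min(a, b); });
}
```

Note that the identity for max is the lowest representable value, not std::numeric_limits<double>::min(), which for floating-point types is the smallest positive normal value and therefore fails the check for any negative x.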
Custom reductions with custom scalar types require a portable traits mechanism to obtain the identity value. This traits interface should be minimal and targeted to reduction operators.
For example,

namespace Kokkos {
template <class T>
struct reduction_identity {
  constexpr static T sum();   // 0
  constexpr static T prod();  // 1
  constexpr static T max();   // lowest value of T
  constexpr static T min();   // largest value of T
  constexpr static T bor();   // 0, integer types only
  constexpr static T band();  // ~0 (all bits set), integer types only
};
}
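To make the proposal concrete, here is a host-only sketch of how such a trait could be filled in for built-in types and consumed by a reduction. `reduction_identity_sketch` and `serial_max` are hypothetical names for this example, and the bodies rely on std::numeric_limits, which the comment above notes is not portable to CUDA; this is not the Kokkos implementation.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for the proposed trait, for built-in numeric types.
template <class T>
struct reduction_identity_sketch {
  static constexpr T sum() { return T(0); }
  static constexpr T prod() { return T(1); }
  static constexpr T max() { return std::numeric_limits<T>::lowest(); }
  static constexpr T min() { return std::numeric_limits<T>::max(); }
  static constexpr T bor() { return T(0); }    // integer types only
  static constexpr T band() { return ~T(0); }  // integer types only
};

// Serial max reduction that starts from the trait's identity rather than
// from data[0], so an empty range yields the identity.
template <class T>
T serial_max(const T* data, int n) {
  T result = reduction_identity_sketch<T>::max();
  for (int i = 0; i < n; ++i) {
    if (data[i] > result) result = data[i];
  }
  return result;
}
```

Because member functions of a class template are only instantiated when used, the sketch still works for double even though `band()` would not compile for it.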

crtrott added a commit to crtrott/kokkos that referenced this issue May 18, 2017
@crtrott crtrott added InDevelop and removed Blocks Promotion Overview issue for release-blocking bugs labels May 18, 2017
@crtrott crtrott moved this from InProgress to Done in Custom Reductions May 18, 2017
@dholladay00

dholladay00 commented May 22, 2017

I have custom types for reductions, and they no longer work. I need to sum a vector and also compute a weighted sum of that vector; both can be done in a single reduction that returns two doubles. I had worked with @crtrott to create the type, and it worked previously.

Is it now possible to do that with something like a Kokkos::Array<double, 2> as the reduction type?

If not: I tried implementing the struct above:

#include <cfloat>  // DBL_MAX

namespace Kokkos {
  template<class T>
  struct reduction_identity;

  // Identity values for reductions over sum_2_numbers.
  template<>
  struct reduction_identity<sum_2_numbers> {
    KOKKOS_FORCEINLINE_FUNCTION constexpr static sum_2_numbers sum()
    {
      return sum_2_numbers(0., 0.);
    }
    KOKKOS_FORCEINLINE_FUNCTION constexpr static sum_2_numbers prod()
    {
      return sum_2_numbers(1., 1.);
    }
    KOKKOS_FORCEINLINE_FUNCTION constexpr static sum_2_numbers max()
    {
      // Lowest double; DBL_MIN is the smallest positive value, not the lowest.
      return sum_2_numbers(-DBL_MAX, -DBL_MAX);
    }
    KOKKOS_FORCEINLINE_FUNCTION constexpr static sum_2_numbers min()
    {
      return sum_2_numbers(DBL_MAX, DBL_MAX);
    }
  };
}

However, I don't know much about constexpr functions, and I run into this error: a constexpr function cannot have a nonliteral return type "sum_2_numbers".

I only use the sum operation, so maybe I only need to provide sum() in the specialization. Even so, I get that error for sum().

I see 2 potential solutions:

  1. Replace custom types with arrays (this would be ideal)
  2. Figure out how to make these specializations work for custom types with c++ wizardry.

As always, help is greatly appreciated @crtrott @hcedwar and I would be happy to track this in a separate issue if that would be helpful.

@dholladay00

dholladay00 commented May 22, 2017

Fix in place: add constexpr to the default constructor.

Current fix:

struct sum_2_numbers {
  double sum_one;
  double sum_two;

  KOKKOS_INLINE_FUNCTION
  constexpr sum_2_numbers()
  : sum_one(0), sum_two(0) { }

  ...

};

I still think the ability to use something like Kokkos::Array<double, 2> would be ideal, but the above fix gets me up and running again with the new unified reduction stuff.
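For anyone hitting the same error, here is a self-contained, host-only sketch of why the constexpr constructor matters, under C++11/14 rules. The two-argument constructor, `operator+=`, `sum_identity` and `sum_and_weighted_sum` are assumptions made for this example (the original struct elides its remaining members), and the Kokkos macros are left out so it compiles without Kokkos.

```cpp
#include <cassert>

// Stand-in for the reduction type discussed above; members beyond the
// default constructor are assumed for illustration.
struct sum_2_numbers {
  double sum_one;
  double sum_two;

  constexpr sum_2_numbers() : sum_one(0), sum_two(0) {}
  constexpr sum_2_numbers(double a, double b) : sum_one(a), sum_two(b) {}

  sum_2_numbers& operator+=(const sum_2_numbers& rhs) {
    sum_one += rhs.sum_one;
    sum_two += rhs.sum_two;
    return *this;
  }
};

// A constexpr function may return sum_2_numbers only if it is a literal
// type, which here requires at least one constexpr constructor.
constexpr sum_2_numbers sum_identity() { return sum_2_numbers(); }
static_assert(sum_identity().sum_one == 0.0 && sum_identity().sum_two == 0.0,
              "the sum identity must be zero in both fields");

// Serial stand-in for the reduction: plain sum and weighted sum in one pass.
inline sum_2_numbers sum_and_weighted_sum(const double* x, const double* w,
                                          int n) {
  sum_2_numbers acc = sum_identity();
  for (int i = 0; i < n; ++i) acc += sum_2_numbers(x[i], w[i] * x[i]);
  return acc;
}
```

Removing constexpr from both constructors makes the type non-literal, which reproduces the "nonliteral return type" diagnostic on `sum_identity()` with compilers that follow the C++11/14 rules.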

@crtrott crtrott closed this as completed May 27, 2017