
New kernel reduction interface #1522

Open · mdavis36 opened this issue Jul 25, 2023 · 3 comments

mdavis36 (Contributor) commented Jul 25, 2023

The new reduction interface should integrate with RAJA::kernel through the current kernel_param interface. Reduce arguments will be passed in, and the appropriate lambda arguments can be generated in a similar way to how they are generated in the forall interface:
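For reference, the forall version of the new reduction interface looks roughly like this (a sketch assuming the current RAJA::expt forall API; data_t, a, and N are placeholders):

#include "RAJA/RAJA.hpp"

data_t sum = 0;

// The Reduce argument is passed alongside the segment, and RAJA
// generates a matching lambda argument: a reference to a thread-local
// value that is combined into sum when the forall completes.
RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0, N),
  RAJA::expt::Reduce<RAJA::operators::plus>(&sum),
  [=](int i, data_t& local_sum) {
    local_sum += a[i];
  });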

Kernel's statement::Lambda allows arguments to be populated implicitly or explicitly, depending on how the statement::Lambda type is defined. In the implicit case, every lambda must be written with all arguments required by the elements of the kernel_param tuple, regardless of whether they are used in the lambda body:

data_t worksum = 0;

using EXEC_POL_I = RAJA::KernelPolicy<
  RAJA::statement::ForICount<1, RAJA::statement::Param<0>, RAJA::loop_exec,
    RAJA::statement::ForICount<0, RAJA::statement::Param<1>, RAJA::loop_exec,
      RAJA::statement::Lambda<0>
    >
  >,
  RAJA::statement::ForICount<0, RAJA::statement::Param<1>, RAJA::loop_exec,
    RAJA::statement::ForICount<1, RAJA::statement::Param<0>, RAJA::loop_exec,
      RAJA::statement::Lambda<1>
    >
  >
>;


RAJA::kernel_param<EXEC_POL_I>(
  RAJA::make_tuple(col_range, row_range),  // iteration-space segments (assumed ranges)
  RAJA::make_tuple((int)0,
                   (int)0,
                   Tile_Array,
                   RAJA::expt::Reduce<RAJA::operators::plus>(&worksum)),

    [=](int col, int row, int tx, int ty, TILE_MEM &Tile_Array, data_t& m_worksum)
    { ... }, // This lambda does reduction work.

    [=](int col, int row, int tx, int ty, TILE_MEM &Tile_Array, data_t& m_worksum)
    { ... } // This lambda does NOT do reduction work.
  );
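
To make "does reduction work" concrete, a hypothetical body for the first lambda might accumulate into the thread-local value (the Tile_Array indexing here is illustrative only):

[=](int col, int row, int tx, int ty, TILE_MEM &Tile_Array, data_t& m_worksum)
{
  // Hypothetical: accumulate into the thread-local value; RAJA combines
  // the per-thread partial values into worksum once the kernel completes.
  m_worksum += Tile_Array(ty, tx);
}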

RAJA::kernel also allows arguments to be defined explicitly within the statement::Lambda type:

data_t worksum = 0;

using EXEC_POL = RAJA::KernelPolicy<
  RAJA::statement::For<1, RAJA::loop_exec,
    RAJA::statement::For<0, RAJA::loop_exec,
      RAJA::statement::Lambda<0, RAJA::Segs<0>, RAJA::Segs<1>,
                                 RAJA::Offsets<0>, RAJA::Offsets<1>,
                                 RAJA::Params<0>, RAJA::Params<1> >
    >
  >,
  RAJA::statement::For<0, RAJA::loop_exec,
    RAJA::statement::For<1, RAJA::loop_exec,
      RAJA::statement::Lambda<1, RAJA::Segs<0, 1>, RAJA::Offsets<0, 1>,
                                 RAJA::Params<0> >
    >
  >
>;

RAJA::kernel_param<EXEC_POL>(
  RAJA::make_tuple(col_range, row_range),  // iteration-space segments (assumed ranges)
  RAJA::make_tuple(Tile_Array,
                   RAJA::expt::Reduce<RAJA::operators::plus>(&worksum)),

  [=](int col, int row, int tx, int ty, TILE_MEM &Tile_Array, data_t& m_worksum) {
    ...
  },

  [=](int col, int row, int tx, int ty, TILE_MEM &Tile_Array) {
    ...
  }
);
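
As a reading aid (not additional API), here is how the explicit specifiers above map to lambda parameters:

// Lambda<0, Segs<0>, Segs<1>, Offsets<0>, Offsets<1>, Params<0>, Params<1>>
//   Segs<0>    -> int col              (index from segment 0)
//   Segs<1>    -> int row              (index from segment 1)
//   Offsets<0> -> int tx               (loop offset for segment 0)
//   Offsets<1> -> int ty               (loop offset for segment 1)
//   Params<0>  -> TILE_MEM& Tile_Array (param tuple element 0)
//   Params<1>  -> data_t& m_worksum    (thread-local reduction value)
//
// Lambda<1, Segs<0, 1>, Offsets<0, 1>, Params<0>> omits Params<1>, so
// the second lambda takes no reduction argument, unlike the implicit
// case, where every lambda must accept it.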
rchen20 (Member) commented Aug 21, 2023

Hey @mdavis36, in the implicit lambda case, are there typos where data_t m_red ought to be data_t& worksum? If so, does this imply that we need to pass the reduction data to each lambda, regardless of whether that lambda actually performs a reduction?

mdavis36 (Contributor, Author) commented

@rchen20 I've updated the example above. The lambda argument itself is m_worksum; the target for the final reduction result is worksum. These should be different: m_worksum is the thread-local value used before the actual reduction work is done later.

rhornung67 added this to the FY24 Development milestone Sep 14, 2023
rcarson3 (Member) commented Nov 22, 2023

@mdavis36 if I'm reading the above correctly, would this essentially collapse all the various different reduction types (e.g. `RAJA::ReduceSum<RAJA::seq_reduce, int>`, `RAJA::ReduceSum<RAJA::omp_reduce_ordered, int>`, `RAJA::ReduceSum<RAJA::cuda_reduce, int>`, etc.) down to one single type? So you would only need one data type for all your different execution policies?
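For contrast, a sketch of what that collapse would look like under the new forall interface (assuming the RAJA::expt API; EXEC_POL, a, and N are placeholders):

// Old interface: the reducer type encodes a backend-specific policy.
RAJA::ReduceSum<RAJA::cuda_reduce, int> cuda_sum(0);

// New interface: one plain data type for every backend; only the
// execution policy template argument changes.
int sum = 0;
RAJA::forall<EXEC_POL>(RAJA::RangeSegment(0, N),
  RAJA::expt::Reduce<RAJA::operators::plus>(&sum),
  [=] RAJA_HOST_DEVICE (int i, int& local_sum) {
    local_sum += a[i];
  });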

If so, I'd just like to say that I'd very much be for such a feature, as the forall loops of mine that contain those operations are the only ones I can't abstract away to a single forall abstraction using something like the RAJA::expt::dynamic_forall feature for all the execution policies I support in my libraries/apps (CPU, OpenMP, CUDA, HIP, etc.).

Unfortunately, things like std::variant and std::visit are still not supported on the device, at least to my current knowledge, which would otherwise have allowed a simple-ish solution to the above.
