Internal way for using ThreadVectorRange without TeamHandle #574

crtrott · 2016-12-09T00:53:46Z

For low-level libraries (such as KokkosKernels or Stokhos) it can be useful to be able to use ThreadVectorRange without the TeamHandle. We don't like that much, because it's dangerous to lose the information that you might run code in parallel when you don't expect it, but I understand the design point and it looks valid to me. What I am doing is giving a non-public API way to do this. Basically, you can directly create the meta object for the parallel_for (ThreadVectorRange looks like a non-templated class but is actually a function call which returns the implementation execution policy, which lives in Impl), and we provide a constructor which does not require the TeamHandle.
Instead of:

Kokkos::parallel_for(Kokkos::ThreadVectorRange(team_handle,N), [&] (const int& i) {
   ...
});

do:

Kokkos::parallel_for(Kokkos::Impl::ThreadVectorRangeBoundariesStruct(N), [&] (const int& i) {
   ...
});
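
To illustrate the mechanism described above, here is a minimal sketch in plain C++ that does not use Kokkos at all. Every name in the `toy` namespace (`TeamHandle`, `VectorRangeBounds`, the serial `parallel_for`) is a hypothetical stand-in, not the actual Kokkos implementation. It only shows how a function that looks like a class name can return an implementation-level bounds object, and how an extra constructor lets you build that object without a handle.

```cpp
#include <cassert>
#include <cstddef>

namespace toy {

// Hypothetical stand-in for a team handle; not the Kokkos type.
struct TeamHandle {
  int league_rank;
  int team_rank;
};

namespace Impl {
// Plays the role of the "meta object": it only records the loop bounds.
template <typename Index>
struct VectorRangeBounds {
  Index begin;
  Index end;

  // Usual path: constructed from a team handle plus an iteration count.
  VectorRangeBounds(const TeamHandle&, Index count) : begin(0), end(count) {
    assert(count >= 0);
  }

  // Low-level path: no team handle required.
  explicit VectorRangeBounds(Index count) : begin(0), end(count) {
    assert(count >= 0);
  }
};
}  // namespace Impl

// Reads like a class at the call site, but is a function that returns
// the implementation-level bounds object.
template <typename Index>
Impl::VectorRangeBounds<Index> ThreadVectorRange(const TeamHandle& team,
                                                 Index count) {
  return Impl::VectorRangeBounds<Index>(team, count);
}

// Serial stand-in for the nested parallel_for over a vector range.
template <typename Index, typename Functor>
void parallel_for(const Impl::VectorRangeBounds<Index>& range,
                  const Functor& f) {
  for (Index i = range.begin; i < range.end; ++i) f(i);
}

}  // namespace toy

// Sums 0..n-1 using the handle-based factory function.
inline int sum_with_handle(int n) {
  toy::TeamHandle team{0, 0};
  int sum = 0;
  toy::parallel_for(toy::ThreadVectorRange(team, n),
                    [&](const int& i) { sum += i; });
  return sum;
}

// Sums 0..n-1 by constructing the bounds object directly, without a handle.
inline int sum_without_handle(int n) {
  int sum = 0;
  toy::parallel_for(toy::Impl::VectorRangeBounds<int>(n),
                    [&](const int& i) { sum += i; });
  return sum;
}
```

Both helper functions iterate over the same range; the only difference is whether a handle was needed to build the range object.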

crtrott · 2016-12-09T00:55:15Z

Btw: this would work correctly both inside of a Team and a Range kernel.

Addresses issue #574

kostrzewa · 2023-07-13T16:41:39Z

@crtrott This was removed in 0126dcb, although I found it very useful for having a very compact notation (via macros) to perform thread-level loops:

typedef Kokkos::TeamPolicy<>::member_type team_handle;
[...]
#define FORALLSITES_BEGIN(extent, l1_blocksize, vlen, idx) \
  assert( ( (extent) % (l1_blocksize) == 0) ); \
  Kokkos::parallel_for( \
    Kokkos::TeamPolicy<>((extent)/(l1_blocksize), (l1_blocksize), vlen), \
    KOKKOS_LAMBDA(const team_handle& team) { \
      Kokkos::parallel_for( \
        Kokkos::TeamThreadRange(team, (l1_blocksize)), \
          [&](const size_t & tidx){ \
            const size_t idx = team.league_rank() * team.team_size() \
                               + tidx;

#define FORALLSITES_END }); });

and vector level loops:

#define VEC_LOOP_BEGIN(vlen, v) \
  Kokkos::parallel_for(\
    Kokkos::Impl::ThreadVectorRangeBoundariesStruct<int,team_handle>(vlen), \
    [&](const int & v){

#define VEC_LOOP_END });

the latter over my own vectorized types. I then have expression templates and arithmetic operator overloads on these vectorized types, such that I never have to explicitly write vector-level loops and still have basic linalg running essentially at the roofline limit. On CPUs, these are also testable on their own without an enclosing TeamThreadRange and without using a view.
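
As a rough illustration of that idea, here is a minimal sketch in plain C++ without Kokkos. The names `Pack`, `Expr`, `Sum`, `Prod` and `vec_loop` are hypothetical and are not taken from my actual code; `vec_loop` is a serial stand-in for what VEC_LOOP_BEGIN / VEC_LOOP_END wrap. The point is only that the vector-level loop lives inside the assignment operator, so user code never writes it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical serial stand-in for a vector-level loop.
template <typename Body>
inline void vec_loop(std::size_t vlen, const Body& body) {
  for (std::size_t v = 0; v < vlen; ++v) body(v);
}

// CRTP base so that operators only match expression types.
template <typename E>
struct Expr {
  const E& self() const { return static_cast<const E&>(*this); }
};

// Fixed-length packed type holding VLen lanes of T.
template <typename T, std::size_t VLen>
struct Pack : Expr<Pack<T, VLen>> {
  using value_type = T;
  T lane[VLen];

  Pack() : lane{} {}
  explicit Pack(T value) {
    vec_loop(VLen, [&](std::size_t v) { lane[v] = value; });
  }

  T operator[](std::size_t v) const { return lane[v]; }

  // The only place where the vector-level loop is written out:
  // the whole expression is evaluated lane by lane in one pass.
  template <typename E>
  Pack& operator=(const Expr<E>& e) {
    const E& expr = e.self();
    vec_loop(VLen, [&](std::size_t v) { lane[v] = expr[v]; });
    return *this;
  }
};

// Lazy element-wise sum node.
template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
  using value_type = typename L::value_type;
  const L& lhs;
  const R& rhs;
  Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
  value_type operator[](std::size_t v) const { return lhs[v] + rhs[v]; }
};

// Lazy element-wise product node.
template <typename L, typename R>
struct Prod : Expr<Prod<L, R>> {
  using value_type = typename L::value_type;
  const L& lhs;
  const R& rhs;
  Prod(const L& l, const R& r) : lhs(l), rhs(r) {}
  value_type operator[](std::size_t v) const { return lhs[v] * rhs[v]; }
};

template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
  return Sum<L, R>(l.self(), r.self());
}

template <typename L, typename R>
Prod<L, R> operator*(const Expr<L>& l, const Expr<R>& r) {
  return Prod<L, R>(l.self(), r.self());
}

// Computes a * x + y on 4-lane packs and returns one lane of the result.
inline double axpy_lane(double a, double x, double y, std::size_t lane_index) {
  Pack<double, 4> pa(a), px(x), py(y), out;
  out = pa * px + py;  // no explicit vector loop at the call site
  return out[lane_index];
}
```

Because the expression nodes hold references, an expression should be assigned to a `Pack` in the same statement in which it is built, as in `axpy_lane`.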

At the time, this seemed like a good way to use ThreadVectorRange for SIMD on CPUs and for coalesced access on GPUs (with an appropriately long vector length) at the same time, at the cost, of course, of having to pack the data into vectors.

Maybe there's a good reason that these constructors for Kokkos::Impl::ThreadVectorRangeBoundariesStruct have been removed and perhaps I should look into the simd type instead to achieve basically the same thing.

Would be very grateful to hear your thoughts on this. Thank you!

crtrott added the Enhancement (Improve existing capability; will potentially require voting) label Dec 9, 2016

crtrott added this to the END 2016 milestone Dec 9, 2016

crtrott self-assigned this Dec 9, 2016

crtrott added a commit that referenced this issue Dec 12, 2016

Core add low level way of doing vector ranges without team_handle.

d664f0b

Addresses issue #574

crtrott added the InDevelop label Dec 12, 2016

crtrott closed this as completed Dec 16, 2016
