Internal way for using ThreadVectorRange without TeamHandle #574

Closed
crtrott opened this issue Dec 9, 2016 · 2 comments
Labels
Enhancement Improve existing capability; will potentially require voting

@crtrott
Member

crtrott commented Dec 9, 2016

For low-level libraries (such as KokkosKernels or Stokhos) it can be useful to be able to use ThreadVectorRange without the TeamHandle. We don't like that much, because it's dangerous to lose the information that code might run in parallel when you don't expect it, but I understand the design point and it looks valid to me. What I am doing is providing a non-public API way to do this. ThreadVectorRange looks like a non-templated class, but it is actually a function call that returns the implementation execution policy, which lives in Impl. You can create that meta object for the parallel_for directly, and we provide a constructor for it that does not require the TeamHandle.
Instead of:

```cpp
Kokkos::parallel_for(Kokkos::ThreadVectorRange(team_handle,N), [&] (const int& i) {
   // ...
});
```

do:

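```cpp
// team_handle: the team member type, e.g. Kokkos::TeamPolicy<>::member_type
Kokkos::parallel_for(Kokkos::Impl::ThreadVectorRangeBoundariesStruct<int,team_handle>(N), [&] (const int& i) {
   // ...
});
```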
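
To make the design point concrete, here is a minimal serial sketch of the pattern described above. It uses hypothetical names (`sketch::VectorRangeBoundaries`, `sketch::thread_vector_range`, `sketch::TeamHandle`), not the real Kokkos types: the public-looking range is a function that returns an Impl-style policy object, and that object has an extra constructor that needs no team handle. The loop runs serially here; the sketch only shows how `parallel_for` can be selected by the policy type.

```cpp
#include <cassert>
#include <vector>

// Hypothetical names throughout: this mimics the pattern, not Kokkos itself.
namespace sketch {

struct TeamHandle {};  // stand-in for a team member handle

// Stand-in for the Impl policy object: it only stores the loop bounds.
template <typename iType>
struct VectorRangeBoundaries {
  iType start;
  iType end;
  // The extra constructor that does not require a team handle.
  explicit VectorRangeBoundaries(iType count) : start(0), end(count) {}
};

// The public-looking range "type" is really a function returning the policy.
template <typename iType>
VectorRangeBoundaries<iType> thread_vector_range(const TeamHandle&, iType count) {
  return VectorRangeBoundaries<iType>(count);
}

// parallel_for overload chosen by the policy type. This serial loop is what a
// plain CPU build might do; a vector backend would spread i over vector lanes.
template <typename iType, class Functor>
void parallel_for(const VectorRangeBoundaries<iType>& range, const Functor& f) {
  for (iType i = range.start; i < range.end; ++i) f(i);
}

// Fills out[i] = i + 1, using both spellings of the loop.
inline std::vector<int> demo(int n) {
  std::vector<int> out(n, 0);
  TeamHandle team{};
  // With a team handle, through the factory function:
  parallel_for(thread_vector_range(team, n), [&](const int& i) { out[i] += i; });
  // Without a team handle, constructing the policy object directly:
  parallel_for(VectorRangeBoundaries<int>(n), [&](const int& i) { out[i] += 1; });
  return out;
}

}  // namespace sketch
```

Because the policy object carries only bounds, the second call compiles whether or not a team handle is in scope. That is also where the concern above comes from: nothing at the call site says the loop may run in parallel.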
crtrott added the Enhancement label Dec 9, 2016
crtrott added this to the END 2016 milestone Dec 9, 2016
crtrott self-assigned this Dec 9, 2016
@crtrott
Member Author

crtrott commented Dec 9, 2016

Btw: this would work correctly both inside of a Team and a Range kernel.

@kostrzewa

@crtrott This was removed in 0126dcb, although I found it very useful for a compact notation (via macros) for thread-level loops:

```cpp
typedef Kokkos::TeamPolicy<>::member_type team_handle;
// [...]
#define FORALLSITES_BEGIN(extent, l1_blocksize, vlen, idx) \
  assert( ( (extent) % (l1_blocksize) == 0) ); \
  Kokkos::parallel_for( \
    Kokkos::TeamPolicy<>((extent)/(l1_blocksize), (l1_blocksize), vlen), \
    KOKKOS_LAMBDA(const team_handle& team) { \
      Kokkos::parallel_for( \
        Kokkos::TeamThreadRange(team, (l1_blocksize)), \
          [&](const size_t & tidx){ \
            const size_t idx = team.league_rank() * team.team_size() \
                               + tidx;

#define FORALLSITES_END }); });
```
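
The global site index in `FORALLSITES_BEGIN` is `league_rank * team_size + tidx`. The serial sketch below (a hypothetical helper, not Kokkos code) replays that mapping and counts how often each site is visited. It assumes `team.team_size()` returns the requested `l1_blocksize`, which is what makes every site in `[0, extent)` appear exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: serial emulation of the index mapping in
// FORALLSITES_BEGIN. league_rank runs over extent / l1_blocksize teams,
// and tidx over the l1_blocksize entries of the TeamThreadRange.
inline std::vector<int> count_site_visits(std::size_t extent, std::size_t l1_blocksize) {
  assert(extent % l1_blocksize == 0);  // same precondition as the macro
  const std::size_t league_size = extent / l1_blocksize;
  // Assumption: the backend granted exactly the requested team size.
  const std::size_t team_size = l1_blocksize;
  std::vector<int> visits(extent, 0);
  for (std::size_t league_rank = 0; league_rank < league_size; ++league_rank) {
    for (std::size_t tidx = 0; tidx < l1_blocksize; ++tidx) {
      const std::size_t idx = league_rank * team_size + tidx;
      ++visits[idx];
    }
  }
  return visits;
}
```

If the team size ever differed from `l1_blocksize`, the formula would skip or revisit sites, so writing `team.league_rank() * (l1_blocksize) + tidx` in the macro would be a slightly more robust choice.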

and vector-level loops:

```cpp
#define VEC_LOOP_BEGIN(vlen, v) \
  Kokkos::parallel_for(\
    Kokkos::Impl::ThreadVectorRangeBoundariesStruct<int,team_handle>(vlen), \
    [&](const int & v){

#define VEC_LOOP_END });
```

The latter iterate over my own vectorized types. I then have expression templates and arithmetic operator overloads on these vectorized types, so that I never have to write vector-level loops explicitly and still get basic linear algebra running essentially at the roofline limit. On CPUs, these types are also testable on their own, without an enclosing TeamThreadRange and without using a View.
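
A minimal sketch of hiding the vector-level loop inside arithmetic operators on a vectorized type. It uses plain eager operator overloads rather than the expression templates mentioned above, a hypothetical `VecType` struct, and a serial `SKETCH_VEC_LOOP` macro standing in for `VEC_LOOP_BEGIN`/`VEC_LOOP_END`; none of these names come from the original code.

```cpp
#include <array>
#include <cassert>

// Hypothetical fixed-length "vectorized" scalar: one value per vector lane.
template <typename T, int VLEN>
struct VecType {
  std::array<T, VLEN> lane;
};

// Serial stand-in for VEC_LOOP_BEGIN/VEC_LOOP_END; in the macros above the
// body runs inside Kokkos::parallel_for over a vector-range policy instead.
#define SKETCH_VEC_LOOP(vlen, v) for (int v = 0; v < (vlen); ++v)

// The lane loop lives inside the operators, so user code never writes it.
template <typename T, int VLEN>
VecType<T, VLEN> operator+(const VecType<T, VLEN>& a, const VecType<T, VLEN>& b) {
  VecType<T, VLEN> r;
  SKETCH_VEC_LOOP(VLEN, v) { r.lane[v] = a.lane[v] + b.lane[v]; }
  return r;
}

template <typename T, int VLEN>
VecType<T, VLEN> operator*(const T& s, const VecType<T, VLEN>& a) {
  VecType<T, VLEN> r;
  SKETCH_VEC_LOOP(VLEN, v) { r.lane[v] = s * a.lane[v]; }
  return r;
}

// axpy on packed data: reads like scalar code, but runs over all lanes.
template <typename T, int VLEN>
VecType<T, VLEN> axpy(const T& alpha, const VecType<T, VLEN>& x, const VecType<T, VLEN>& y) {
  return alpha * x + y;
}
```

With expression templates, `alpha * x + y` would instead build a lightweight expression object and evaluate it in a single lane loop, avoiding the temporary that `alpha * x` creates here.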

At the time, this seemed like a good way to use ThreadVectorRange both for SIMD on CPUs and for coalesced access on GPUs (with an appropriately long vector length), at the cost, of course, of having to pack the data into vectors.
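
The packing cost mentioned above could look like the following array-of-structs to array-of-structs-of-arrays (AoSoA) conversion. This is an assumed layout for illustration, and the `pack_aosoa` helper is hypothetical, not necessarily what the original code does. Inside each block of `vlen` sites, the `vlen` lane values of one component are contiguous, so neighboring vector lanes read neighboring addresses, which suits both SIMD loads on CPUs and coalesced accesses on GPUs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical packing helper: AoS -> AoSoA.
// Input: nsites sites with ncomp components each, stored site-major (AoS).
// Output: blocks of vlen sites; inside a block, the vlen lane values of each
// component are contiguous, so lane v of a vector loop touches base + v.
inline std::vector<double> pack_aosoa(const std::vector<double>& aos, std::size_t nsites,
                                      std::size_t ncomp, std::size_t vlen) {
  assert(aos.size() == nsites * ncomp);
  assert(nsites % vlen == 0);  // assumes no padding of a ragged last block
  std::vector<double> packed(aos.size());
  const std::size_t nblocks = nsites / vlen;
  for (std::size_t b = 0; b < nblocks; ++b)
    for (std::size_t c = 0; c < ncomp; ++c)
      for (std::size_t v = 0; v < vlen; ++v)
        packed[(b * ncomp + c) * vlen + v] = aos[(b * vlen + v) * ncomp + c];
  return packed;
}
```

For example, four sites with two components each, with site `s`, component `c` stored as `10*s + c`, pack with `vlen = 2` into `{0, 10, 1, 11, 20, 30, 21, 31}`.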

Maybe there's a good reason why these constructors for Kokkos::Impl::ThreadVectorRangeBoundariesStruct were removed, and perhaps I should look into the simd type instead to achieve essentially the same thing.

Would be very grateful to hear your thoughts on this. Thank you!
