parallel_for with TeamPolicy::team_size_recommended with launch bounds not working -- reported by Daniel Holladay #1283

Closed
dholladay00 opened this issue Dec 14, 2017 · 6 comments
Labels
Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos)

Comments

@dholladay00

On the CUDA backend, I need to use launch bounds to get rid of compile-time errors about the register count. After this change, when the team_policy is constructed as:

typedef Kokkos::TeamPolicy<schedule, Kokkos::LaunchBounds<64,4> > team_policy;
const team_policy policy(ns, team_policy::team_size_recommended(lambda));

The parallel_for with this team_policy does not execute. It does execute with this policy:

typedef Kokkos::TeamPolicy<schedule, Kokkos::LaunchBounds<64,4> > team_policy;
const team_policy policy(ns, Kokkos::AUTO);
@crtrott crtrott added the Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) label Dec 15, 2017
@crtrott crtrott added this to the 2018 February milestone Dec 15, 2017
@crtrott
Member

crtrott commented Dec 15, 2017

Clearly a bug: team_size_recommended does not take LaunchBounds into account in its check.

@dholladay00
Author

I am seeing similar behavior with team_size_max, so could you look into that function as well?

@crtrott
Member

crtrott commented Dec 22, 2017

Yeah.

@crtrott crtrott changed the title parallel_for with TeamPolicy::team_size_recommended with launch bounds not working parallel_for with TeamPolicy::team_size_recommended with launch bounds not working -- reported by Daniel Holladay Dec 22, 2017
@ibaned ibaned assigned ibaned and crtrott and unassigned ibaned Jan 31, 2018
@crtrott
Member

crtrott commented Feb 24, 2018

Turns out this is slightly harder to fix than I thought. In particular, the interface actually doesn't allow you to do this for functors with more than one operator. I have to think about this a bit more and will delay it to the next promotion.

@crtrott crtrott modified the milestones: 2018 February, 2018 April Feb 24, 2018
@crtrott crtrott modified the milestones: 2018 April, 2018 June Apr 18, 2018
@crtrott
Member

crtrott commented Jul 9, 2018

OK, here is an idea, interface-wise:

team_policy.team_size_max(functor,parallel_for,TagType());
team_policy.team_size_recommended(functor,parallel_reduce);

For this we would need new overloads of parallel_for, parallel_reduce, and parallel_scan, something like:

Impl::ParallelForTag parallel_for() { return Impl::ParallelForTag(); }

@crtrott
Member

crtrott commented Aug 13, 2018

OK, that is now in develop. We ended up using the tag types directly, i.e.:

team_policy.team_size_max(functor,Kokkos::ParallelForTag());
team_policy.team_size_recommended(functor,Kokkos::ParallelReduceTag());

Note that these are now member functions, and they take everything into account, including scratch memory, etc.

@crtrott crtrott closed this as completed Nov 5, 2018