TeamPolicy with reducers with valuetypes without += broken on CUDA #2410
crtrott added the Bug and Blocks Promotion labels on Oct 1, 2019
I think there is a band-aid solution where we just hard-code something like 128 instead of calling team_size_max.
OK, I put in a PR with a proper fix for CUDA. The new function overloads are not implemented for the other backends yet, but those can simply drop the reducer and call the overloads without reducers.
dalg24 added a commit to dalg24/kokkos that referenced this issue on Oct 24, 2019
@crtrott @DavidPoliakoff This needs to be patched into Trilinos.
@masterleinad: Thanks!
dalg24 added a commit that referenced this issue on Nov 12, 2019:
Fix issue #2410 max_team_size not compiling for reducers with scalar types without +=
crtrott added the "bug - fix pushed to develop branch" label and removed the Blocks Promotion label on Nov 13, 2019
Reported here: trilinos/Trilinos#6000
I root-caused the issue to our TeamPolicy::team_size_max function not taking a reducer in addition to the functor and the parallel reduce tag. Hence, during the "figure out how many registers this kernel will use" step, it tries to instantiate the kernel as if it were a sum reduction. Technically, that makes it wrong for any reducer other than sum. For every reducer that has a native value type (like double) it happens to work, though the register count determination might be slightly off; if so, it would result in a runtime dispatch error.
Here is a reproducer:
To fix this, the (non-deprecated) max and recommended team size functions need an option to take in a reducer too. There are two options:
1. team_size_max(FunctorType, ParallelReduceTag, Reducer)
2. ParallelReduceTag templated on ReducerType
I am thinking option 1 is better.
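To make option 1 concrete, here is a rough standalone sketch of the overload shape. It does not use Kokkos at all: ParallelReduceTag, MinMax, MinMaxReducer, SumReducer, instantiate_and_query and both team_size_max overloads are hypothetical stand-ins written for illustration, and the returned 128 is just a placeholder, not a real occupancy result. The point is only that the two-argument overload has to assume a sum (and so needs +=), while the three-argument overload instantiates with the reducer's own join.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for illustration; these are NOT the Kokkos types.
struct ParallelReduceTag {};

// A value type that deliberately has no operator+=.
struct MinMax {
  double min_val;
  double max_val;
};

// A reducer that knows how to combine two MinMax values without +=.
struct MinMaxReducer {
  using value_type = MinMax;
  static void join(value_type& dst, const value_type& src) {
    if (src.min_val < dst.min_val) dst.min_val = src.min_val;
    if (src.max_val > dst.max_val) dst.max_val = src.max_val;
  }
};

// What the query falls back to when no reducer is passed: a plain sum.
template <class T>
struct SumReducer {
  using value_type = T;
  static void join(T& dst, const T& src) { dst += src; }
};

// Compile-time check for whether `T += T` is well-formed.
template <class T>
auto test_plus_assign(int)
    -> decltype(std::declval<T&>() += std::declval<const T&>(),
                std::true_type{});
template <class T>
std::false_type test_plus_assign(...);
template <class T>
using has_plus_assign = decltype(test_plus_assign<T>(0));

// Stand-in for "instantiate the kernel and ask how many resources it needs".
// Calling join() forces it to compile for the reducer's value type.
template <class Functor, class Reducer>
int instantiate_and_query(const Functor&, const Reducer&) {
  typename Reducer::value_type a{}, b{};
  Reducer::join(a, b);
  return 128;  // placeholder value, not a real register/occupancy query
}

// Existing shape: no reducer, so a sum reduction is assumed.
// Instantiating this for a value type without += fails to compile.
template <class Functor>
int team_size_max(const Functor& f, ParallelReduceTag) {
  return instantiate_and_query(f, SumReducer<typename Functor::value_type>{});
}

// Option 1: take the reducer and instantiate with its join().
template <class Functor, class Reducer>
int team_size_max(const Functor& f, ParallelReduceTag, const Reducer& r) {
  return instantiate_and_query(f, r);
}

// Two toy functors that only declare their value type.
struct SumFunctor { using value_type = double; };
struct MinMaxFunctor { using value_type = MinMax; };

static_assert(has_plus_assign<double>::value, "double supports +=");
static_assert(!has_plus_assign<MinMax>::value, "MinMax has no +=");

// Usage: the reducer-aware overload compiles for MinMax.
inline int query_minmax_team_size() {
  return team_size_max(MinMaxFunctor{}, ParallelReduceTag{}, MinMaxReducer{});
}

// Usage: the old overload still works for native value types.
inline int query_sum_team_size() {
  return team_size_max(SumFunctor{}, ParallelReduceTag{});
}
```

In this sketch, calling team_size_max(MinMaxFunctor{}, ParallelReduceTag{}) without the reducer would fail to compile inside SumReducer::join, which mirrors the failure described above.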
What needs to happen: