TeamThreadRange loop count on device is passed by reference to host static constexpr #1733
Comments
OK. I think the problem is that the loop count variable of TeamThreadRange on CUDA is passed by reference, which causes an ODR-use of the static variable. When I passed it by value instead, I got the expected output. Here's what the changes are to the Kokkos code:
Would such a change to Kokkos make sense?
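To illustrate the by-reference versus by-value distinction described above, here is a minimal host-only sketch in plain C++. It is not the Kokkos patch itself: `count_by_reference`, `count_by_value`, `Config`, and `host_demo` are hypothetical names invented for this illustration. It only shows which call odr-uses the static constexpr member.

```cpp
#include <cassert>

// Hypothetical stand-ins for a loop-bound helper; these are NOT Kokkos functions.
// Taking the count by const reference binds a reference to the argument,
// which odr-uses a variable passed to it.
inline int count_by_reference(const int& count) { return count; }

// Taking the count by value only reads the constant's value, so a
// constexpr variable passed to it is not odr-used.
inline int count_by_value(int count) { return count; }

struct Config {
  // Before C++17 this in-class declaration is not a definition. If nTrials
  // is odr-used, an out-of-class definition is required as well.
  static constexpr int nTrials = 8;
};

// Out-of-class definition, needed in C++11/14 because of the by-reference
// call below (redundant, but still permitted, from C++17 on).
constexpr int Config::nTrials;

int host_demo() {
  int a = count_by_value(Config::nTrials);      // value read only: no odr-use
  int b = count_by_reference(Config::nTrials);  // reference binding: odr-use
  assert(a == b);
  return a + b;
}
```

On the host both calls return the same value once the definition exists. The point of the sketch is that only the by-reference call needs the variable to have an address, which is the property that appears to go wrong when the reference is formed in device code.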
This is more obvious with nvcc, which directly complains about the ODR use:
I would like to report this as a bug in Kokkos, if that makes sense :)
Changed the title a bit to make it clearer. @ibaned
One question: if you change it to not have the |
Removing the reference works as expected.
Actually, which |
OK, this needs fixing. Generally we are going to reduce the use of |
#2022 should fix this.
Hi, I've got this weird behavior when running a TeamThreadRange parallel_for:
This only outputs:
The same behavior is observed if nTrials and nReplica are file-scope constexpr:
or if they are declared inside main with static storage:
Only when they are declared and defined inside main without static storage:
will I get the expected output:
This only happens on GPU. The CPU version runs fine. All these examples are compiled using clang 6.0.1 and cuda 9.0.
The weird thing is that this seems to only affect the TeamThreadRange loop but not the team policy loop.
Does anyone have an explanation for this? And how can I work around it in my original example, where I need to use template parameters of a class to specify the loop count for TeamThreadRange?
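One possible workaround, based on the observation above that non-static locals give the expected output, is to copy the constant into a local variable before handing it to the by-reference API. The sketch below is plain host C++ and an assumption, not tested against Kokkos or CUDA: `for_range` is a hypothetical stand-in for an API that takes its count by const reference, and `Simulation` is an invented class name.

```cpp
#include <cassert>

// Hypothetical stand-in for an API that takes its loop count by const
// reference; this is NOT a Kokkos function.
template <class Body>
void for_range(const int& count, Body body) {
  for (int i = 0; i < count; ++i) body(i);
}

template <int NTrials, int NReplica>
struct Simulation {
  static constexpr int nTrials = NTrials;
  static constexpr int nReplica = NReplica;

  // Returns the total number of inner-loop iterations executed.
  int run() const {
    int visited = 0;
    for (int r = 0; r < nReplica; ++r) {
      // Copy the constant into a local with automatic storage. This only
      // reads the value of nTrials, so the static member is not odr-used;
      // the reference below binds to the local copy instead.
      const int trials = nTrials;
      for_range(trials, [&](int) { ++visited; });
    }
    return visited;
  }
};
```

Passing the template parameter `NTrials` directly should behave similarly, since a non-type template parameter of type int is a prvalue and the reference then binds to a temporary rather than to a static variable.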