wrong results for a parallel_reduce with CUDA8 / Maxwell50 #352
On the platform with a Maxwell50 GPU, I used cuda-memcheck with the racecheck tool and got issues coming from the macro BLOCK_REDUCE_STEP (the intra-warp reduction). These issues are not present when running the simple reduction code on Kepler30 hardware. Just for checking, I also rebuilt Kokkos with arch Kepler30 and ran it on the actual Maxwell50 hardware, and the problem is still there.
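For illustration, here is a host-side C++ model of the strided, tree-style step pattern that an intra-warp reduction such as BLOCK_REDUCE_STEP performs (the function name and layout are my own sketch, not Kokkos code). On the device, at each step lane i reads the value that lane i+stride wrote in the previous step; if successive steps are not properly synchronized, that read-after-write dependency is exactly the kind of hazard racecheck reports, and its behavior may differ across architectures:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Host-side model of a warp-wide tree reduction over 32 lanes.
// Each step halves the stride; lane i consumes lane (i + stride)'s
// result from the previous step. On the GPU these steps run across
// hardware lanes, and the inter-step read-after-write dependency is
// what a race checker flags when synchronization is not guaranteed.
int warp_tree_reduce(std::vector<int> lanes) {
    assert(lanes.size() == 32);  // one value per warp lane
    for (int stride = 16; stride > 0; stride /= 2) {
        for (int i = 0; i < stride; ++i) {
            lanes[i] += lanes[i + stride];  // reads prior step's write
        }
    }
    return lanes[0];  // lane 0 holds the warp-wide sum
}
```

Serially the dependency is trivially satisfied, so this model always produces the correct sum; the point is only to show where the cross-lane reads and writes occur.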
This is related to issue #196. I delayed looking into this due to other higher-priority issues, but will get back to it soon.
OK, I think I might have identified the issue (which might be a bug in CUDA); see issue #398.
Hi Christian, thank you for looking into this.
Great. What I am going to do is mark this issue as resolved, but I will keep the related Pascal issue open in order to track a real fix that doesn't hurt performance. (That said, the current fix only hurts performance for large scalar values, i.e. larger than 64 bits; below that it should be a wash.)
I have two systems:
- On the old GPU, the unit test cuda.reduce passes, as does example/tutorial/02_simple_reduce.
- On the newer GPU (sm_50 / K2200), the unit test cuda.reduce passes, but 02_simple_reduce gives wrong results.
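For reference, here is a host-side serial sketch of the reduction I believe the tutorial performs (from memory, 02_simple_reduce sums i*i over the iteration range and checks the result against the closed-form expression; the actual example may differ in detail). A serial reference like this is handy for checking what the device result should be:

```cpp
#include <cassert>
#include <cstdint>

// Serial reference for a sum-of-squares reduction over [0, n),
// i.e. the kind of computation the simple-reduce tutorial performs.
std::int64_t sum_of_squares(std::int64_t n) {
    std::int64_t sum = 0;
    for (std::int64_t i = 0; i < n; ++i) {
        sum += i * i;
    }
    return sum;
}

// Closed form of sum_{i=0}^{n-1} i^2, usable as an independent check.
std::int64_t sum_of_squares_closed_form(std::int64_t n) {
    return (n - 1) * n * (2 * n - 1) / 6;
}
```

Comparing the parallel result against both the loop and the closed form makes it easy to tell a genuinely wrong reduction from, say, a printing or transfer problem.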
I tried printing from inside the reduce kernel: the printed values are OK, but as soon as the kernel has finished, the final reduction result is wrong, as if the result in GPU memory were correct but not transferred back to host memory (?).
I checked and rechecked the CUDA arch flag to make sure I didn't mess up the build flags.
Am I possibly doing something wrong here?