Kokkos shared memory on Cuda uses a lot of registers #31
Comments
Ryan is working now to see whether commenting out the printf that I added to scratch space allocation makes it use fewer registers. The one reason I added that printf was because some of the Kokkos examples were failing to allocate shared memory in the Kokkos::Serial case. That was making the examples fail without any obvious reason why. It would make sense to rewrite those examples so that they check whether scratch allocation returned a NULL pointer, and fail out safely in that case. |
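A minimal sketch of the kind of check suggested above, assuming a toy bump allocator in place of the real scratch space. `ToyScratch` and `fill_scratch` are hypothetical names used only for illustration; they are not the Kokkos classes, and the sketch does no alignment handling.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for a scratch allocator; NOT the Kokkos class.
// It hands out bytes from a caller-provided buffer and does no alignment fixing.
class ToyScratch {
public:
  ToyScratch(char* begin, std::size_t capacity)
    : next_(begin), end_(begin + capacity) {}

  // Returns nullptr when the request does not fit in the remaining space.
  void* get_shmem(std::size_t bytes) {
    if (bytes > static_cast<std::size_t>(end_ - next_)) return nullptr;
    void* p = next_;
    next_ += bytes;
    return p;
  }

private:
  char* next_;
  char* end_;
};

// Example body that checks the allocation instead of assuming it succeeded.
// Returns true on success, false if the scratch request could not be met.
bool fill_scratch(ToyScratch& scratch, std::size_t n) {
  double* buf = static_cast<double*>(scratch.get_shmem(n * sizeof(double)));
  if (buf == nullptr) {
    std::fprintf(stderr, "scratch allocation of %zu doubles failed\n", n);
    return false;  // fail out safely; the caller decides how to abort
  }
  for (std::size_t i = 0; i < n; ++i) buf[i] = 0.0;
  return true;
}
```

With the check in the example itself, a failed allocation is reported by the caller rather than by a printf inside the allocator.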
Shared memory should work in Serial. If it doesn't, file a bug. |
Ryan says: "With the printf string removed ( |
Hi Christian -- my issue with the examples was more that they don't check whether scratch allocations succeeded (returned non-NULL). I put the |
Wow, wouldn't have expected that. (In reply to Mark Hoemmen: Ryan says: "With the printf string removed (printf("");), it drops to 23 registers. With the printf call removed completely, it drops to 5 registers.") |
How about generating a pull request that leaves the printf statement in but guards it with the macro KOKKOS_HAVE_DEBUG? |
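Such a guard could look roughly like the sketch below. Only the macro name KOKKOS_HAVE_DEBUG comes from this thread; `toy_get_shmem` and its signature are hypothetical and do not reproduce the code in Kokkos_ScratchSpace.hpp.

```cpp
#include <cstddef>
#include <cstdio>

// Uncomment (or pass -DKOKKOS_HAVE_DEBUG) to compile the diagnostic in.
// #define KOKKOS_HAVE_DEBUG

// Hypothetical bump allocation over [next, end); not the actual Kokkos code.
inline void* toy_get_shmem(char*& next, char* end, std::size_t bytes) {
  if (bytes > static_cast<std::size_t>(end - next)) {
#ifdef KOKKOS_HAVE_DEBUG
    // Only debug builds contain the printf call and its format string.
    std::printf("toy_get_shmem: request of %zu bytes exceeds remaining scratch\n",
                bytes);
#endif
    return nullptr;  // callers are still expected to check for nullptr
  }
  void* p = next;
  next += bytes;
  return p;
}
```

In a non-debug build the preprocessor removes the printf entirely, so the compiled function should correspond to the "printf call removed completely" case Ryan measured; whether the register count matches exactly would need to be confirmed with ptxas.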
Sure, will do! |
btw there is only one place in the whole Kokkos package that uses |
Ah, hm. I wanted to rename it to KOKKOS_ENABLE_DEBUG anyway. KOKKOS_ENABLE_DEBUG should probably be used more, for example when checking whether lengths given to View allocators are negative, etc. |
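As an illustration of the length check mentioned above, here is a sketch under stated assumptions: `toy_checked_length` is a hypothetical helper, not part of Kokkos, and the macro is defined inside the snippet only so the check is active when it is compiled on its own. A real build would set it through the build system.

```cpp
#include <cstddef>
#include <stdexcept>

// Defined here only to make the snippet self-contained; normally a build option.
#define KOKKOS_ENABLE_DEBUG

// Hypothetical helper: validates a signed length before it is used as a size.
// Without KOKKOS_ENABLE_DEBUG the check is compiled out and costs nothing.
inline std::size_t toy_checked_length(long n) {
#ifdef KOKKOS_ENABLE_DEBUG
  if (n < 0) {
    throw std::invalid_argument("negative length passed to allocator");
  }
#endif
  return static_cast<std::size_t>(n);
}
```

Throwing is a host-side choice made for this sketch; device code would need a different failure path.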
Merged into develop. |
Our intern Ryan Eberhardt has been experimenting with shared memory on CUDA. He found that accessing shared memory through Kokkos uses many more registers than accessing it directly. I suspect that this has to do with the error checking and printf error message that I added to Kokkos::ScratchMemorySpace (see kokkos/core/src/Kokkos_ScratchSpace.hpp) a while back.
The first example code uses Kokkos to access shared memory. CUDA says:
ptxas info : Used 30 registers, 336 bytes cmem[0]
The second example code does NOT use shared memory, but still uses Kokkos. CUDA says:
ptxas info : Used 4 registers, 368 bytes cmem[0]
The third example code uses raw CUDA -- no Kokkos -- to access shared memory. CUDA says:
ptxas info : Used 2 registers, 32 bytes cmem[0]
Ryan verified that this actually uses shared memory (the compiler doesn't optimize it away).