Add Hopper support #5538
kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
Lines 218 to 232 in 61d7db5
switch (compute_capability) {
  case 30:
  case 32:
  case 35: return 16;
  case 37: return 80;
  case 50:
  case 53:
  case 60:
  case 62: return 64;
  case 52:
  case 61: return 96;
  case 70:
  case 80:
  case 86: return 8;
  case 75: return 32;
Added the shared-memory config and the printconfig entry. Confirmed in the tuning guide that Hopper can also be configured with 8 kB of shared memory.
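As a rough illustration of the change described above, the switch could gain a Hopper entry along these lines. This is only a sketch: the function name `example_shmem_kb`, the `-1` fallback, the kB interpretation of the return values, and the placement of `case 90` next to the Volta/Ampere cases are assumptions based on the comment, not code taken from the PR.

```cpp
// Hypothetical stand-in for the switch in Kokkos_Cuda_BlockSize_Deduction.hpp.
// The name, the -1 fallback, and the unit (kB) are assumptions for illustration.
inline int example_shmem_kb(int compute_capability) {
  switch (compute_capability) {
    case 30:
    case 32:
    case 35: return 16;
    case 37: return 80;
    case 50:
    case 53:
    case 60:
    case 62: return 64;
    case 52:
    case 61: return 96;
    case 70:
    case 80:
    case 86:
    case 90: return 8;  // assumed: Hopper (9.0) grouped with the 8 kB cases
    case 75: return 32;
    default: return -1;  // assumed fallback for capabilities not listed
  }
}
```

With this sketch, `example_shmem_kb(90)` would return the same value as the existing Volta and Ampere entries, which matches the 8 kB figure mentioned in the comment.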
What about cmake/compile_tests/cuda_compute_capability.cc?
Fixed. I also checked every use of ARCH_AMPERE and ARCH_VOLTA in our code base and made the appropriate adjustments.
7edac6b to 27393a0
Fixed the logic mistake in the half-precision handling.
Addresses issue #5524