Fix Atomic{Min,Max} for Kepler30 #3780
janciesko commented Feb 4, 2021 (edited)
- Maintains support for Kepler (compute capability 3.0).
- Uses fallbacks for the atomic min and max operations.
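For readers unfamiliar with this kind of fallback, here is a minimal host-side sketch of how an atomic max can be emulated with a compare-and-swap loop when no native instruction is available. It uses `std::atomic` rather than CUDA intrinsics, and `fetch_max_cas` is a hypothetical name chosen for illustration; it is not the code added by this PR.

```cpp
#include <atomic>

// Hypothetical helper (not the Kokkos implementation): emulate an atomic
// fetch-max with a compare-and-swap loop.
template <typename T>
T fetch_max_cas(std::atomic<T>& dest, T val) {
  T observed = dest.load(std::memory_order_relaxed);
  // Stop as soon as the stored value is already >= val, or once our CAS wins.
  while (observed < val &&
         !dest.compare_exchange_weak(observed, val,
                                     std::memory_order_relaxed)) {
    // A failed compare_exchange_weak refreshes `observed` with the current
    // value, so the loop condition re-checks against up-to-date data.
  }
  return observed;  // the value that was stored before this call took effect
}
```

The function returns the previous value, matching the usual "fetch then op" convention; a "max then fetch" variant would return the larger of `observed` and `val` instead.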
core/src/Cuda/Kokkos_Cuda_View.hpp
Outdated
@@ -139,7 +139,7 @@ struct CudaLDGFetch {

 template <typename iType>
 KOKKOS_INLINE_FUNCTION ValueType operator[](const iType& i) const {
-#ifdef __CUDA_ARCH__
+#if defined(__CUDA_ARCH__) && !defined(KOKKOS_ARCH_KEPLER30)
What about KOKKOS_ARCH_KEPLER32?
KOKKOS_ARCH_KEPLER32 works as is; no changes needed.
You forgot Kepler32
core/src/Cuda/Kokkos_Cuda_View.hpp
Outdated
@@ -139,7 +139,7 @@ struct CudaLDGFetch {

 template <typename iType>
 KOKKOS_INLINE_FUNCTION ValueType operator[](const iType& i) const {
-#ifdef __CUDA_ARCH__
+#if defined(__CUDA_ARCH__) && !defined(KOKKOS_ARCH_KEPLER30)
Suggested change:
-#if defined(__CUDA_ARCH__) && !defined(KOKKOS_ARCH_KEPLER30)
+#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 350)
320
The documentation says "only supported by devices of compute capability 3.5 and higher"
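To make the guard under discussion concrete, here is a small sketch of the pattern, assuming the guarded body uses the CUDA `__ldg` read-only load intrinsic (which the struct name `CudaLDGFetch` suggests, but which is not shown in the hunk). `load_readonly` and `SKETCH_FN` are hypothetical names, and the plain-load fallback is an illustration, not necessarily what Kokkos does.

```cpp
// Expand to CUDA function qualifiers only when compiled by a CUDA compiler,
// so the same sketch also builds as ordinary host C++.
#if defined(__CUDACC__)
#define SKETCH_FN __host__ __device__
#else
#define SKETCH_FN
#endif

// Hypothetical helper: use the read-only data cache load only where the
// device supports it, and fall back to a plain load everywhere else.
template <typename T>
SKETCH_FN inline T load_readonly(const T* ptr) {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 350)
  // Device compilation pass for compute capability 3.5 or higher.
  // Note: __ldg is only overloaded for certain built-in types.
  return __ldg(ptr);
#else
  // Host pass, or a device pass for an older architecture.
  return *ptr;
#endif
}
```

Testing `__CUDA_ARCH__` directly ties the guard to the capability the documentation names, instead of listing individual architecture macros such as KOKKOS_ARCH_KEPLER30 one by one.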
@@ -101,6 +101,52 @@ inline __host__ unsigned long long int atomic_fetch_max(

 #endif

+#if defined(KOKKOS_ARCH_KEPLER30)
Suggested change:
-#if defined(KOKKOS_ARCH_KEPLER30)
+#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 350)
same as above
+dest, val);
+}
+
+#else //(!KOKKOS_ARCH_KEPLER30)
Suggested change:
-#else //(!KOKKOS_ARCH_KEPLER30)
+#else // supported by devices of compute capability 3.5 and higher
@@ -178,6 +226,52 @@ inline __host__ unsigned long long int atomic_max_fetch(
 }
 #endif

+#if defined(KOKKOS_ARCH_KEPLER30)
Suggested change:
-#if defined(KOKKOS_ARCH_KEPLER30)
+#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 350)
+dest, val);
+}
+
+#else //(!KOKKOS_ARCH_KEPLER30)
Suggested change:
-#else //(!KOKKOS_ARCH_KEPLER30)
+#else // supported by devices of compute capability 3.5 and higher
OK, I checked the 9.0 documentation: it looks like it's just the 64-bit integer atomics that are not supported before compute capability 3.5; the 32-bit ones (min/max/or/xor, etc.) seem to be fine.
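Read literally, that observation means the fallback is only needed for one combination of operand width and architecture. A tiny compile-time sketch of that rule, assuming the reading above is right (`needs_minmax_fallback` is a hypothetical name, and the architecture is passed as a `__CUDA_ARCH__`-style integer such as 300 or 350):

```cpp
#include <cstddef>

// Hypothetical predicate: does an integer atomic min/max on an operand of
// `bytes` width need a software fallback on the given architecture?
// Assumption taken from the comment above: only 64-bit operands are
// unsupported before compute capability 3.5.
constexpr bool needs_minmax_fallback(int cuda_arch, std::size_t bytes) {
  return bytes == 8 && cuda_arch < 350;
}

// Kepler 3.0 with a 64-bit operand: fallback needed.
static_assert(needs_minmax_fallback(300, sizeof(unsigned long long)),
              "64-bit min/max on 3.0 should use the fallback");
// Kepler 3.0 with a 32-bit operand: native atomics are assumed to be fine.
static_assert(!needs_minmax_fallback(300, sizeof(unsigned int)),
              "32-bit min/max on 3.0 should not need the fallback");
```

Under this rule the 32-bit overloads would not need a guarded fallback at all, and compute capability 3.2 would be treated the same as 3.0 for the 64-bit overloads.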
43872e1 to b968927