Fix Atomic{Min,Max} for Kepler30 #3780

Merged: 1 commit, Feb 11, 2021

Conversation

janciesko
Contributor

@janciesko janciesko commented Feb 4, 2021

  • Maintains support for Kepler (Compute Capability 3.0).
  • Uses fallbacks for the atomic min and max operations.

@@ -139,7 +139,7 @@ struct CudaLDGFetch {

template <typename iType>
KOKKOS_INLINE_FUNCTION ValueType operator[](const iType& i) const {
-#ifdef __CUDA_ARCH__
+#if defined(__CUDA_ARCH__) && !defined(KOKKOS_ARCH_KEPLER30)
Contributor

What about KOKKOS_ARCH_KEPLER32?

Contributor Author

KOKKOS_ARCH_KEPLER32 works as is; no changes needed.

Member

@dalg24 dalg24 left a comment

You forgot Kepler32

@@ -139,7 +139,7 @@ struct CudaLDGFetch {

template <typename iType>
KOKKOS_INLINE_FUNCTION ValueType operator[](const iType& i) const {
-#ifdef __CUDA_ARCH__
+#if defined(__CUDA_ARCH__) && !defined(KOKKOS_ARCH_KEPLER30)
Member

Suggested change
-#if defined(__CUDA_ARCH__) && !defined(KOKKOS_ARCH_KEPLER30)
+#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 350)

Contributor Author

320

Member

The documentation says "only supported by devices of compute capability 3.5 and higher"
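
A minimal sketch of the kind of guard being discussed, assuming the load in question is CUDA's `__ldg` read-only-cache intrinsic (the struct is named `CudaLDGFetch`, but the hunk does not show the body). `SKETCH_FUNCTION` and `load_read_only` are hypothetical names for illustration, not Kokkos identifiers. The block also compiles as plain C++, in which case only the fallback branch is used.

```cpp
#include <cassert>

// Hypothetical macro: expands to CUDA qualifiers only when compiled by nvcc.
#if defined(__CUDACC__)
#define SKETCH_FUNCTION __host__ __device__
#else
#define SKETCH_FUNCTION
#endif

// Hypothetical helper, not the Kokkos implementation.
// Note: __ldg is only overloaded for built-in scalar and vector types.
template <typename T, typename I>
SKETCH_FUNCTION inline T load_read_only(const T* ptr, I i) {
  assert(ptr != nullptr);
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 350)
  // Device pass for compute capability 3.5 and higher:
  // load through the read-only data cache.
  return __ldg(&ptr[i]);
#else
  // Host pass, or device pass for older architectures: plain load.
  return ptr[i];
#endif
}
```

Testing `__CUDA_ARCH__` directly, rather than a per-architecture macro such as `KOKKOS_ARCH_KEPLER30`, covers every architecture below the threshold without listing each one.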

@@ -101,6 +101,52 @@ inline __host__ unsigned long long int atomic_fetch_max(

#endif

#if defined(KOKKOS_ARCH_KEPLER30)
Member

Suggested change
-#if defined(KOKKOS_ARCH_KEPLER30)
+#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 350)

Contributor Author

same as above

dest, val);
}

#else //(!KOKKOS_ARCH_KEPLER30)
Member

Suggested change
-#else //(!KOKKOS_ARCH_KEPLER30)
+#else // supported by devices of compute capability 3.5 and higher

@@ -178,6 +226,52 @@ inline __host__ unsigned long long int atomic_max_fetch(
}
#endif

#if defined(KOKKOS_ARCH_KEPLER30)
Member

Suggested change
-#if defined(KOKKOS_ARCH_KEPLER30)
+#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 350)

dest, val);
}

#else //(!KOKKOS_ARCH_KEPLER30)
Member

Suggested change
-#else //(!KOKKOS_ARCH_KEPLER30)
+#else // supported by devices of compute capability 3.5 and higher

@crtrott
Member

crtrott commented Feb 5, 2021

OK, I checked the 9.0 documentation: it looks like only the 64-bit integer atomics are unsupported before compute capability 3.5; the 32-bit ones (min, max, or, xor, etc.) seem to be fine.
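
For illustration, a fallback for a missing 64-bit atomic min or max is commonly built from a compare-and-swap retry loop. The sketch below is a host-side C++ analogue using `std::atomic`, not the code merged in this PR (the fallback bodies are not shown in the hunks above). The names `fetch_min_via_cas` and `fetch_max_via_cas` are hypothetical. On a CUDA device the same loop would presumably use `atomicCAS` on `unsigned long long`, which is available on compute capability 3.0.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: emulates an atomic fetch-min with a CAS retry loop.
// Returns the value stored in *dest before the operation.
inline unsigned long long fetch_min_via_cas(
    std::atomic<unsigned long long>* dest, unsigned long long val) {
  assert(dest != nullptr);
  unsigned long long observed = dest->load();
  // Stop when the stored value is already <= val, or when our CAS
  // installs val. On failure, compare_exchange_weak refreshes `observed`
  // with the current stored value, so the comparison is re-evaluated.
  while (val < observed && !dest->compare_exchange_weak(observed, val)) {
  }
  return observed;
}

// Hypothetical helper: the same pattern with the comparison reversed.
inline unsigned long long fetch_max_via_cas(
    std::atomic<unsigned long long>* dest, unsigned long long val) {
  assert(dest != nullptr);
  unsigned long long observed = dest->load();
  while (val > observed && !dest->compare_exchange_weak(observed, val)) {
  }
  return observed;
}
```

The loop skips the write entirely when the stored value already satisfies the min or max condition, so uncontended calls that change nothing cost a single load.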


5 participants