Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove KOKKOS_IMPL_CUDA_* macros #4138

Merged
merged 4 commits into from
Jul 21, 2021

Conversation

masterleinad
Copy link
Contributor

Remove most KOKKOS_IMPL_CUDA_* macros. These are internal (and short) and only really have one definition anyway.

@masterleinad
Copy link
Contributor Author

Retest this please.

@masterleinad masterleinad changed the title Remove KOKKOS_IMPL_CUDA_SYNCWARP_NEEDS_MASK Remove KOKKOS_IMPL_CUDA_* macros Jul 6, 2021
Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one mistake. Otherwise looks ok. But I want another careful review. Screwing this up, like that one mistake, would mostly manifest as super rare race conditions....

.jenkins Outdated Show resolved Hide resolved
core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp Outdated Show resolved Hide resolved
@masterleinad
Copy link
Contributor Author

Retest this please.

@masterleinad masterleinad added this to In progress in Developer: Daniel Arndt Jul 7, 2021
@masterleinad masterleinad moved this from In progress to Awaiting Feedback in Developer: Daniel Arndt Jul 7, 2021
@masterleinad masterleinad requested a review from crtrott July 7, 2021 20:23
@masterleinad masterleinad removed the [WIP] label Jul 8, 2021
@masterleinad masterleinad marked this pull request as ready for review July 8, 2021 14:05
@crtrott crtrott added this to In progress in Kokkos Release 3.5 Jul 14, 2021
@crtrott crtrott moved this from In progress to Awaiting Feedback in Kokkos Release 3.5 Jul 14, 2021
@crtrott crtrott requested a review from nliber July 14, 2021 18:54
@dalg24
Copy link
Member

dalg24 commented Jul 20, 2021

Please elaborate what changes are proposed and why you suggested them. (I don't have an issue with them, just with the way you present them)

@masterleinad
Copy link
Contributor Author

Please elaborate what changes are proposed and why you suggested them. (I don't have an issue with them, just with the way you present them)

Since requiring CUDA version >9.0, these macros only have one definition and there is no expectation that we need to provide alternatives for other versions. Thus, this pull request is solely about code cleanup removing branches that will never be executed, and avoiding macros that hide the actual code being used. The specific macros being considered are:

  • KOKKOS_IMPL_CUDA_ACTIVEMASK
  • KOKKOS_IMPL_CUDA_SYNCWARP
  • KOKKOS_IMPL_CUDA_SYNCWARP_MASK
  • KOKKOS_IMPL_CUDA_BALLOT
  • KOKKOS_IMPL_CUDA_BALLOT_MASK
  • KOKKOS_IMPL_CUDA_SHFL
  • KOKKOS_IMPL_CUDA_SHFL_MASK
  • KOKKOS_IMPL_CUDA_SHFL_UP
  • KOKKOS_IMPL_CUDA_SHFL_DOWN
  • KOKKOS_IMPL_CUDA_SHFL_DOWN_MASK

The other macros in core/src/Cuda/Kokkos_Cuda_Version_9_8_Compatibility.hpp seem useful. The name of the file might be a misnomer now, though.

Additionally, some tests in core/unit_test/TestViewMapping_subview.hpp are reenabled for CUDA >10.0. The intent before was to disable them for CUDA 9 without taking newer versions into account.

@crtrott crtrott merged commit b758384 into kokkos:develop Jul 21, 2021
Kokkos Release 3.5 automation moved this from Awaiting Feedback to Done Jul 21, 2021
@dalg24
Copy link
Member

dalg24 commented Jul 21, 2021

The other macros in core/src/Cuda/Kokkos_Cuda_Version_9_8_Compatibility.hpp seem useful.

Actually we can probably get rid of everything.

KOKKOS_IMPL_CUDA_MAX_SHFL_SIZEOF is not actually used anywhere.

// TODO DSH shouldn't this be KOKKOS_IMPL_CUDA_MAX_SHFL_SIZEOF instead of
// sizeof(int)? (Need benchmarks to decide which is faster)
using shuffle_as_t = int;
enum : int { N = sizeof(Scalar) / sizeof(shuffle_as_t) };

KOKKOS_IMPL_CUDA_SYNCWARP_OR_RETURN is exclusively used in core/src/Cuda/Kokkos_Cuda_Task.hpp and all occurrences are actually commented except

#if defined(KOKKOS_ENABLE_DEBUG)
KOKKOS_IMPL_CUDA_SYNCWARP_OR_RETURN("TaskQueue CUDA task_ptr");
#endif
.
At the very least we could move the macro definition closer to its point of use.

@ndellingwood
Copy link
Contributor

ndellingwood commented Jul 22, 2021

@masterleinad @dalg24 The KOKKOS_IMPL_CUDA_BALLOT macro was used in Trilinos by Sacado, merge of this PR caused nightly Trilinos builds to fail. Is the correct work around to see about adding macro to Sacado?

#if defined(__CUDA_ARCH__)
#define KOKKOS_IMPL_CUDA_BALLOT(x) __ballot_sync(__activemask(), x)
#else
define KOKKOS_IMPL_CUDA_BALLOT(x) 0
#endif

Edit: This was a build with cuda/9.2 on Kepler architecture

@masterleinad
Copy link
Contributor Author

Yes, KOKKOS_IMPL_CUDA_BALLOT(x) should be replaced by __ballot_sync(__activemask(), x).

ndellingwood added a commit to trilinos/Trilinos that referenced this pull request Jul 22, 2021
ndellingwood added a commit to trilinos/Trilinos that referenced this pull request Jul 22, 2021
ndellingwood added a commit to trilinos/Trilinos that referenced this pull request Oct 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants