team_broadcast of bool failed on CUDA backend #1908

Closed
Char-Aznable opened this issue Nov 16, 2018 · 4 comments
Labels
Enhancement Improve existing capability; will potentially require voting

Comments

@Char-Aznable
Contributor

Consider this:

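  Kokkos::parallel_for(Kokkos::TeamPolicy(nTeams,teamSize,nVectors),KOKKOS_LAMBDA(const auto& team) {
    // One flag per team thread (teamSize is assumed to be a compile-time constant).
    bool b[teamSize];
    const auto iThread = team.team_rank();
    b[iThread] = iThread % 2;
    // Broadcast each thread's flag in turn; this is the call that fails to compile for bool.
    for(std::size_t i = 0; i < teamSize; ++i) {
      team.team_broadcast(b[i],i);
    }
  });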

clang-7 + cuda 9.2 gives the compilation error:

In file included from /home/aznb/mycodes/kokkos/core/src/Kokkos_Core.hpp:74:
In file included from /home/aznb/mycodes/kokkos/core/src/Kokkos_Cuda.hpp:307:
/home/aznb/mycodes/kokkos/core/src/Cuda/Kokkos_Cuda_Team.hpp:177:9: error: no matching function for call to 'cuda_shfl'
        cuda_shfl( val, tmp, blockDim.x * thread_id, blockDim.x * blockDim.y );
        ^~~~~~~~~
/home/aznb/mycodes/testKokkos/testTeamBroadCast.cpp:40:12: note: in instantiation of function template specialization 'Kokkos::Impl::CudaTeamMember::team_broadcast<bool>' requested here
      team.team_broadcast(b[i],i);
           ^
/home/aznb/mycodes/testKokkos/testTeamBroadCast.cpp:35:69), Kokkos::TeamPolicy<>, Kokkos::Cuda>::exec_team<void>' requested here
      this-> template exec_team< WorkTag >(
                      ^
/home/aznb/mycodes/testKokkos/testTeamBroadCast.cpp:35:69), Kokkos::TeamPolicy<>, Kokkos::Cuda>::operator()' requested here
  driver();
  ^
/home/aznb/mycodes/testKokkos/testTeamBroadCast.cpp:35:69), Kokkos::TeamPolicy<>, Kokkos::Cuda> >' requested here
              cuda_parallel_launch_local_memory<DriverType>,
              ^
/home/aznb/mycodes/testKokkos/testTeamBroadCast.cpp:35:69), Kokkos::TeamPolicy<>, Kokkos::Cuda>, Kokkos::LaunchBounds<0, 0>, false>::get_block_size' requested here
  return CudaGetOptBlockSize<DriverType,LaunchBounds,(CudaTraits::ConstantMemoryUseThreshold < sizeof(DriverType))>::get_block_size(f,vector_length,shmem_extra_block,shmem_extra_thread);
                                                                                                                     ^
/home/aznb/mycodes/testKokkos/testTeamBroadCast.cpp:35:69), Kokkos::TeamPolicy<>, Kokkos::Cuda>, Kokkos::LaunchBounds<0, 0> >' requested here
        Kokkos::Impl::cuda_get_opt_block_size< ParallelFor, LaunchBounds >( arg_functor , arg_policy.vector_length(), arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) / arg_policy.vector_length() )
                      ^
/home/aznb/mycodes/testKokkos/testTeamBroadCast.cpp:35:69), Kokkos::TeamPolicy<>, Kokkos::Cuda>::ParallelFor' requested here
    Impl::ParallelFor< FunctorType , ExecPolicy > closure( functor , policy );
                                                  ^
/home/aznb/mycodes/testKokkos/testTeamBroadCast.cpp:35:69)>' requested here
  Kokkos::parallel_for(Kokkos::TeamPolicy(nTeams,teamSize,nVectors),KOKKOS_LAMBDA(const auto& team) {
          ^
/home/aznb/mycodes/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp:70:6: note: candidate template ignored: requirement 'sizeof(int) == sizeof(bool)' was not satisfied [with T = bool]
void cuda_shfl( T & out , T const & in , int lane ,
     ^
/home/aznb/mycodes/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp:95:6: note: candidate template ignored: requirement '(sizeof(int) < sizeof(bool)) && (0 == (sizeof(bool) % sizeof(int)))' was not satisfied [with T = bool]
void cuda_shfl( T & out , T const & in , int lane ,
     ^
1 error generated when compiling for sm_61.
make[2]: *** [CMakeFiles/testTeamBroadCast.dir/build.make:63: CMakeFiles/testTeamBroadCast.dir/testTeamBroadCast.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:73: CMakeFiles/testTeamBroadCast.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
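
The two "candidate template ignored" notes above are the core of the failure: one overload requires sizeof(T) == sizeof(int), the other requires sizeof(T) to be a larger multiple of sizeof(int), and bool satisfies neither. A minimal standalone check of that reasoning, assuming those two requirements are the only constraints on the cuda_shfl overloads (the function names below are hypothetical and are not part of Kokkos):

```cpp
// Mirrors the first requirement quoted in the compiler notes:
// T is exactly as wide as one int.
template <class T>
constexpr bool fits_one_word() {
  return sizeof(int) == sizeof(T);
}

// Mirrors the second requirement quoted in the compiler notes:
// T is wider than int and its size is a multiple of sizeof(int).
template <class T>
constexpr bool fits_many_words() {
  return (sizeof(int) < sizeof(T)) && (0 == (sizeof(T) % sizeof(int)));
}

// Hypothetical summary predicate: true if at least one of the two
// constrained overloads would accept T.
template <class T>
constexpr bool has_shfl_candidate() {
  return fits_one_word<T>() || fits_many_words<T>();
}

// On common platforms (1-byte bool, 4-byte int, 8-byte double) bool is
// rejected by both constraints while int and double are accepted.
static_assert(!has_shfl_candidate<bool>(), "bool matches neither overload");
static_assert(has_shfl_candidate<int>(), "int matches the first overload");
static_assert(has_shfl_candidate<double>(), "double matches the second overload");
```

Any type narrower than int (char, short, and so on) would be rejected in the same way.
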
@crtrott
Member

crtrott commented Nov 19, 2018

Argh, yeah, bool is super problematic right now because its size is not a multiple of 4 bytes. Fixing this is a very substantial implementation effort (i.e., we would need a whole different implementation of reduction trees, etc.). So far, most people simply use a 32-bit int as their bool equivalent. Let me see what we can do, or whether this is important enough.
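
The 32-bit int workaround could look roughly like the sketch below. Both broadcast_flag and HostTeamStub are hypothetical names introduced only for illustration: the stub stands in for a Kokkos team member on the host so that the snippet compiles without Kokkos. In real device code the helper would presumably also need KOKKOS_INLINE_FUNCTION, and the team handle would be the one passed to the lambda.

```cpp
#include <cstdint>

// Hypothetical host-side stand-in for a Kokkos team member. It only mimics
// the shape of team_broadcast(value, thread_id) and pretends that the source
// thread holds the value 1.
struct HostTeamStub {
  void team_broadcast(std::int32_t& value, int /*thread_id*/) const {
    value = 1;
  }
};

// Hypothetical helper: widen the bool to a 32-bit int, broadcast the int
// (a type the existing overloads accept), then narrow the result back.
template <class TeamMember>
bool broadcast_flag(const TeamMember& team, bool local_flag, int source_rank) {
  std::int32_t word = local_flag ? 1 : 0;
  team.team_broadcast(word, source_rank);
  return word != 0;
}
```

Inside the original loop this would replace team.team_broadcast(b[i],i) with something like b[i] = broadcast_flag(team, b[i], i), so the bool never reaches the shuffle code directly.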

@mhoemmen
Contributor

(Tpetra avoids bool in reductions specifically for this reason.)

@ndellingwood ndellingwood added the Enhancement Improve existing capability; will potentially require voting label Nov 28, 2018
@ndellingwood ndellingwood added this to the Backlog milestone Nov 28, 2018
@dhollman

We need to provide a specialization that handles types with size less than int. It shouldn't be too hard to do.
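
One way such a specialization might be shaped is sketched below. This is only an illustration of the idea of padding a narrow type into a full int before shuffling; it is not taken from Kokkos and makes no claim about how the eventual fix was implemented. shfl_sketch and shuffle_word are hypothetical names, and shuffle_word is a host-side stand-in for a 32-bit warp shuffle that simply returns its input.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical host-side stand-in for a 32-bit warp shuffle. On the device
// this role would be played by a real shuffle; here it returns the word as is.
inline int shuffle_word(int word, int /*lane*/) { return word; }

// Existing case: T is exactly one int wide, so it can be shuffled directly.
template <class T>
typename std::enable_if<sizeof(T) == sizeof(int)>::type
shfl_sketch(T& out, const T& in, int lane) {
  static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
  int word;
  std::memcpy(&word, &in, sizeof(int));
  word = shuffle_word(word, lane);
  std::memcpy(&out, &word, sizeof(int));
}

// Missing case: T is narrower than int. Copy its bytes into a zeroed int,
// shuffle the int, then copy the same bytes back out.
template <class T>
typename std::enable_if<(sizeof(T) < sizeof(int))>::type
shfl_sketch(T& out, const T& in, int lane) {
  static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
  int word = 0;
  std::memcpy(&word, &in, sizeof(T));
  word = shuffle_word(word, lane);
  std::memcpy(&out, &word, sizeof(T));
}
```

With the second overload present, a call with T = bool resolves instead of failing overload resolution. The cost is that each narrow value occupies a full 32-bit shuffle.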

@dalg24
Member

dalg24 commented Mar 4, 2020

I will try to get #2631 merged in time for the 3.1 release.

@crtrott crtrott moved this from In progress to Done in Milestone: Release 3.1 Mar 12, 2020
@crtrott crtrott closed this as completed Apr 14, 2020