Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cuda build with clang 10 has errors with the atomic unit tests #3237

Closed
calewis opened this issue Jul 28, 2020 · 6 comments
Closed

Cuda build with clang 10 has errors with the atomic unit tests #3237

calewis opened this issue Jul 28, 2020 · 6 comments
Assignees
Labels
Compiler Issue An issue that Kokkos cannot / should not fix; Kokkos must communicate to relevant vendor InDevelop Enhancement, fix, etc. has been merged into the develop branch;

Comments

@calewis
Copy link

calewis commented Jul 28, 2020

With clang 10 and newer unit test cuda 1 fails to build:

Errors:

In file included from kokkos-main/core/unit_test/cuda/TestCuda_AtomicOperations_unsignedlongint.cpp:46:
In file included from kokkos-main/core/unit_test/TestAtomicOperations_unsignedlongint.hpp:45:
kokkos-main/core/unit_test/TestAtomicOperations.hpp:668:40: error: reference to __host__ function 'atomic_fetch_or' in __host__ __device__ function
  void operator()(int) const { Kokkos::atomic_fetch_or(&data(), (T)i1); }
                                       ^
kokkos-main/core/src/Cuda/Kokkos_Cuda_Parallel.hpp:458:5: note: in instantiation of member function 'TestAtomicOperations::OrFunctor<unsigned long, Kokkos::Cuda>::operator()' requested here
    m_functor(i);
    ^
kokkos-main/core/src/Cuda/Kokkos_Cuda_Parallel.hpp:480:22: note: in instantiation of function template specialization 'Kokkos::Impl::ParallelFor<TestAtomicOperations::OrFunctor<unsigned long, Kokkos::Cuda>, Kokkos::RangePolicy<Kokkos::Cuda>, Kokkos::Cuda>::exec_range<void>' requested here
      this->template exec_range<WorkTag>(iwork);
                     ^
kokkos-main/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp:124:3: note: in instantiation of member function 'Kokkos::Impl::ParallelFor<TestAtomicOperations::OrFunctor<unsigned long, Kokkos::Cuda>, Kokkos::RangePolicy<Kokkos::Cuda>, Kokkos::Cuda>::operator()' requested here
  driver();
  ^
kokkos-main/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp:479:18: note: in instantiation of function template specialization 'Kokkos::Impl::cuda_parallel_launch_local_memory<Kokkos::Impl::ParallelFor<TestAtomicOperations::OrFunctor<unsigned long, Kokkos::Cuda>, Kokkos::RangePolicy<Kokkos::Cuda>, Kokkos::Cuda> >' requested here
          &attr, cuda_parallel_launch_local_memory<DriverType>));
                 ^
kokkos-main/core/src/Cuda/Kokkos_Cuda_Parallel.hpp:489:43: note: in instantiation of member function 'Kokkos::Impl::CudaParallelLaunch<Kokkos::Impl::ParallelFor<TestAtomicOperations::OrFunctor<unsigned long, Kokkos::Cuda>, Kokkos::RangePolicy<Kokkos::Cuda>, Kokkos::Cuda>, Kokkos::LaunchBounds<0, 0>, Kokkos::Impl::Experimental::CudaLaunchMechanism::LocalMemory>::get_cuda_func_attributes' requested here
                           LaunchBounds>::get_cuda_func_attributes();
                                          ^
kokkos-main/core/src/Kokkos_Parallel.hpp:204:11: note: in instantiation of member function 'Kokkos::Impl::ParallelFor<TestAtomicOperations::OrFunctor<unsigned long, Kokkos::Cuda>, Kokkos::RangePolicy<Kokkos::Cuda>, Kokkos::Cuda>::execute' requested here
  closure.execute();
          ^
kokkos-main/core/unit_test/TestAtomicOperations.hpp:686:11: note: in instantiation of function template specialization 'Kokkos::parallel_for<TestAtomicOperations::OrFunctor<unsigned long, Kokkos::Cuda> >' requested here
  Kokkos::parallel_for(1, f);
          ^
kokkos-main/core/unit_test/TestAtomicOperations.hpp:710:17: note: in instantiation of function template specialization 'TestAtomicOperations::OrAtomic<unsigned long, Kokkos::Cuda>' requested here
  T res       = OrAtomic<T, DeviceType>(i0, i1);
                ^
kokkos-main/core/unit_test/TestAtomicOperations.hpp:951:20: note: in instantiation of function template specialization 'TestAtomicOperations::OrAtomicTest<unsigned long, Kokkos::Cuda>' requested here
    case 7: return OrAtomicTest<T, DeviceType>((T)i0, (T)i1);
                   ^
kokkos-main/core/unit_test/TestAtomicOperations_unsignedlongint.hpp:52:40: note: in instantiation of function template specialization 'TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long, Kokkos::Cuda>' requested here
    ASSERT_TRUE((TestAtomicOperations::AtomicOperationsTestIntegralType<
                                       ^
kokkos-main/core/src/Cuda/Kokkos_Cuda_Parallel.hpp:458:5: note: called by 'exec_range<void>'
    m_functor(i);
    ^
kokkos-main/core/src/Cuda/Kokkos_Cuda_Parallel.hpp:480:22: note: called by 'operator()'
      this->template exec_range<WorkTag>(iwork);
                     ^
kokkos-main/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp:124:3: note: called by 'cuda_parallel_launch_local_memory<Kokkos::Impl::ParallelFor<TestAtomicOperations::OrFunctor<unsigned long, Kokkos::Cuda>, Kokkos::RangePolicy<Kokkos::Cuda>, Kokkos::Cuda> >'
  driver();
  ^
kokkos-main/core/src/impl/Kokkos_Atomic_Fetch_Or.hpp:110:26: note: 'atomic_fetch_or' declared here
inline unsigned long int atomic_fetch_or(volatile unsigned long int* const dest,
                         ^

This has been replicated on other systems so I don't think it is arch dependent, but I am on sm61 and cuda 10.1, in case it matters.

@calewis calewis added the Blocks Promotion Overview issue for release-blocking bugs label Jul 28, 2020
@calewis
Copy link
Author

calewis commented Jul 29, 2020

I believe the issue is that there is no explicit unsigned long int* overload for the device. Clang10 complains that it can't use the host version of the function. Clang9 finds the following one:

template <typename T>
KOKKOS_INLINE_FUNCTION T atomic_fetch_or(volatile T* const dest, const T val) {
  asm("/* How did we get to here? */"); /* Added so I could track this down */
  return Impl::atomic_fetch_oper(Impl::OrOper<T, const T>(), dest, val);
}

in Atomic_Generic.hpp. I don't know why clang10 doesn't find this one, or if this is the intended one in this case.
@crtrott @dhollman

@calewis
Copy link
Author

calewis commented Jul 30, 2020

I believe this is a compiler bug in clang.

Here is my pseudo code of how clang filters the overload set for host and device functions:

auto matchDeviceType(caller, callee){
   if(isHostDevice(callee)){ // HostDevice is always a match
      return HostDevice;
   }
   if(isDevice(caller) && isDevice(callee)){
     return SameSide;
   }
   if((isDevice(caller) && isHost(callee)) || (isHost(caller) && isDevice(callee)){
     return WrongSide;
   }
}

void filterOverloadSet(candidtates, caller){
   anyGoodCalls = false;
   for(auto &Cand : candidtates){
     if(matchDeviceType(Cand, caller) == SameSide){ // Problem is here hostdevice is also good. 
       anyGoodCalls = true;
     }
   }   if(anyGoodCalls){ // Since this is false in our case the host func never gets removed 
     eraseAllWrongSide(candidates); 
   }
}

Where they don't remove WrongSideed candidates if a __host__ __device__ version instead of an exact type match was found (real code here: https://github.com/llvm/llvm-project/blob/8224c5047e9cef2db4b0e31427cdf90a2568a341/clang/lib/Sema/SemaOverload.cpp#L9854).

When I add Sema::CFP_HOSTDEVICE to help set ContainsSameSideCandidate our unit test compiles again (I haven't run it yet).

@calewis
Copy link
Author

calewis commented Aug 3, 2020

Hopefully the following gets seen soon. https://bugs.llvm.org/show_bug.cgi?id=46922 I have emailed the author of some of the code I changed.

@calewis calewis self-assigned this Aug 5, 2020
@calewis
Copy link
Author

calewis commented Aug 5, 2020

I don't think there is an easy way to work around this, we can work around it for our atomics, but the following fails using c++11

#include <initializer_list>

__host__ __device__ int element(std::initializer_list<int> il) {
  int result = 0;
  for(auto i : il){
    result += i;
  }
  return result;
}

Because initializer_list<int>::begin() is a __host__ function. So I am not sure how to mitigate clang-cuda issues for clangs greater than or equal to 10.

@ndellingwood
Copy link
Contributor

I think this was resolved in Kokkos by PR #3259, marking as InDevelop

@ndellingwood ndellingwood added Compiler Issue An issue that Kokkos cannot / should not fix; Kokkos must communicate to relevant vendor InDevelop and removed Blocks Promotion Overview issue for release-blocking bugs labels Aug 19, 2020
@calewis
Copy link
Author

calewis commented Aug 19, 2020

@ndellingwood we worked around it, but the underlying issue is still present. We can probably close this though.

@calewis calewis closed this as completed Aug 19, 2020
@ajpowelsnl ajpowelsnl added the InDevelop Enhancement, fix, etc. has been merged into the develop branch; label Jul 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compiler Issue An issue that Kokkos cannot / should not fix; Kokkos must communicate to relevant vendor InDevelop Enhancement, fix, etc. has been merged into the develop branch;
Projects
None yet
Development

No branches or pull requests

3 participants