Clang compilation of CUDA backend on Windows #3345

Merged
merged 13 commits into from
Nov 10, 2020

Conversation

j8asic
Contributor

@j8asic j8asic commented Sep 6, 2020

As clang-cl does not enable CUDA compilation (the /clang:--cuda*** options do not work), the code changes here enable clang++ to compile Kokkos. So one can simply use CMake with ninja and build the project. Most of the errors were due to clashing preprocessor defines (see also #3344).
Changes:

  • remove addition of -x cu for Clang, as it already uses -x cuda
  • avoid Windows atomics, use the GNU atomics that come with Clang
  • use __thread instead of __declspec(thread), see here
  • extend _atomic_compare_exchange_strong to accept sizeof(T) == 16 (needed when using a pair of int64 as arg)
  • include Windows.h for GetCurrentProcessId in Kokkos_Core.cpp
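
A minimal sketch of two of the bullets above (thread-local storage and GNU atomics), assuming a compiler that is either plain MSVC or a GNU-compatible Clang/GCC. This is not Kokkos code: `EXAMPLE_THREAD_LOCAL`, `example_thread_counter` and `example_cas64` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Plain MSVC keeps its own keyword and intrinsics; Clang (also on Windows)
// and GCC take the GNU spelling and the GNU __atomic builtins.
#if defined(_MSC_VER) && !defined(__clang__)
#include <intrin.h>
#define EXAMPLE_THREAD_LOCAL __declspec(thread)
#else
#define EXAMPLE_THREAD_LOCAL __thread
#endif

// One counter per thread; each thread starts from zero.
EXAMPLE_THREAD_LOCAL int example_thread_counter = 0;

// Returns true if *dest held `expected` and was replaced by `desired`.
inline bool example_cas64(std::int64_t* dest, std::int64_t expected,
                          std::int64_t desired) {
  assert(dest != nullptr);
#if defined(_MSC_VER) && !defined(__clang__)
  // The intrinsic returns the previous value stored at dest.
  return _InterlockedCompareExchange64(
             reinterpret_cast<volatile __int64*>(dest), desired, expected) ==
         expected;
#else
  return __atomic_compare_exchange_n(dest, &expected, desired, false,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
}
```

The point of the split is that only one family of atomics is ever visible in a translation unit, which is what avoids the declaration clashes described above.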

@dalg24-jenkins
Collaborator

Can one of the admins verify this patch?

@@ -55,6 +55,8 @@
 #include <cerrno>
 #ifndef _WIN32
 #include <unistd.h>
+#else
+#include <Windows.h>
Member

Is this needed because msvc includes it by default but msvc-cl doesn't?

Contributor Author

Needed because clang-cl also includes it, but pure clang++ doesn't.
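
For illustration, a hedged sketch of the pattern this hunk enables: selecting the process-id call per platform once the right header is included explicitly. `example_current_process_id` is a hypothetical helper, not the function in Kokkos_Core.cpp.

```cpp
#include <cassert>

#ifdef _WIN32
#include <Windows.h>  // declares GetCurrentProcessId
#else
#include <unistd.h>   // declares getpid
#endif

// Returns the id of the calling process as a plain long.
inline long example_current_process_id() {
#ifdef _WIN32
  const long pid = static_cast<long>(GetCurrentProcessId());
#else
  const long pid = static_cast<long>(getpid());
#endif
  assert(pid >= 0);  // sanity check for this sketch only
  return pid;
}
```

Including the header explicitly means the code no longer depends on which driver (clang-cl or clang++) happens to pull it in implicitly.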

@@ -381,7 +381,7 @@ KOKKOS_INTERNAL_INLINE_DEVICE_IF_CUDA_ARCH bool _atomic_compare_exchange_strong(
 T* dest, T compare, T val, MemoryOrderSuccess, MemoryOrderFailure,
 typename std::enable_if<
 (sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 ||
-sizeof(T) == 8) &&
+sizeof(T) == 8 || sizeof(T) == 16) &&
Member

@crtrott crtrott Sep 7, 2020

Does this work also with msvc?

Contributor Author

Wow, it seems it doesn't. This was my attempt at a clean fix for #3344.
This is the msvc (clang-cl) error:

2>C:\Code\kokkos\core\src\impl/Kokkos_Atomic_Generic.hpp(210,16): error : no matching function for call to 'atomic_compare_exchange'
2>C:\Code\kokkos\core\src\impl/Kokkos_Atomic_Generic.hpp(556,16): message : in instantiation of function template specialization 'Kokkos::Impl::atomic_fetch_oper<Kokkos::Impl::AddOper<long long, const long long>, long long>' requested here
2>C:\Code\kokkos\core\src\impl/Kokkos_Atomic_Increment.hpp(132,11): message : in instantiation of function template specialization 'Kokkos::atomic_fetch_add<long long>' requested here
2>C:\Code\kokkos\core\src/impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp(154,12): message : candidate function not viable: no known conversion from 'unsigned long long *' to 'volatile int *const' for 1st argument
2>C:\Code\kokkos\core\src/impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp(162,13): message : candidate function not viable: no known conversion from 'unsigned long long *' to 'volatile long *const' for 1st argument
2>C:\Code\kokkos\core\src/impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp(174,21): message : candidate function not viable: no known conversion from 'unsigned long long *' to 'volatile unsigned int *const' for 1st argument
2>C:\Code\kokkos\core\src/impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp(180,22): message : candidate function not viable: no known conversion from 'unsigned long long *' to 'volatile unsigned long *const' for 1st argument
2>C:\Code\kokkos\core\src/impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp(189,10): message : candidate template ignored: requirement 'sizeof(unsigned long long) == sizeof(int)' was not satisfied [with T = unsigned long long]
2>C:\Code\kokkos\core\src/impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp(208,10): message : candidate template ignored: requirement 'sizeof(unsigned long long) != sizeof(int) && sizeof(unsigned long long) == sizeof(long)' was not satisfied [with T = unsigned long long]
2>C:\Code\kokkos\core\src/impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp(253,10): message : candidate template ignored: requirement '(sizeof(unsigned long long) != 4) && (sizeof(unsigned long long) != 8)' was not satisfied [with T = unsigned long long]
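
One way to read the candidate list above: on 64-bit Windows `long` is 4 bytes, so a condition of the form `sizeof(T) == sizeof(long)` cannot match an 8-byte type, while on Linux (where `long` is 8 bytes) it does. The sketch below contrasts that condition with a width-based one; both function names are hypothetical and only mirror the requirement text quoted in the log.

```cpp
#include <cstddef>

// The selection condition quoted in the log: true for 8-byte types only on
// platforms where long is 8 bytes.
template <class T>
constexpr bool example_matches_long_overload() {
  return sizeof(T) != sizeof(int) && sizeof(T) == sizeof(long);
}

// A width-based condition gives the same answer on every data model.
template <class T, std::size_t Bytes>
constexpr bool example_has_width() {
  return sizeof(T) == Bytes;
}

// Holds on mainstream platforms with 8-bit bytes, Windows and Linux alike.
static_assert(example_has_width<unsigned long long, 8>(),
              "unsigned long long is expected to be 8 bytes here");
```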

Contributor Author

If you're okay with this - I'll implement a define to distinguish between clang++ and clang-cl, because msvc gets activated when _MSC_VER is found.

#if defined(_MSC_VER) && !defined(KOKKOS_COMPILER_INTEL)
#define KOKKOS_COMPILER_MSVC _MSC_VER
#endif
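
The reason a separate define is needed: Clang targeting the MSVC ABI also defines `_MSC_VER`, so the preprocessor alone cannot tell the clang-cl driver from clang++. A hedged sketch of the idea follows; `EXAMPLE_BUILD_USES_CLANG_CL` is a hypothetical macro that the build system would have to pass, and it is not the name used in this PR.

```cpp
// Classifies the compiler at compile time. Only __clang__ and _MSC_VER are
// real predefined macros; EXAMPLE_BUILD_USES_CLANG_CL is an assumption and
// would come from the build system (e.g. a -D flag set when the driver is
// clang-cl).
constexpr const char* example_compiler_family() {
#if defined(__clang__) && defined(_MSC_VER) && \
    defined(EXAMPLE_BUILD_USES_CLANG_CL)
  return "clang-cl";
#elif defined(__clang__) && defined(_MSC_VER)
  return "clang++ (MSVC-compatible target)";
#elif defined(__clang__)
  return "clang++";
#elif defined(_MSC_VER)
  return "msvc";
#else
  return "other";
#endif
}
```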

Contributor Author

Tried to fix this using c9535e4, see below.

@crtrott
Member

crtrott commented Sep 7, 2020

OK to test.

@crtrott
Member

crtrott commented Sep 7, 2020

This looks generally good. I will test on my machine too on windows. Thanks for this!

@masterleinad
Contributor

You will also need to fix the indentation using clang-format 8 or the patch

diff --git a/core/src/impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp b/core/src/impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp
index c4685df..a652c24 100644
--- a/core/src/impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp
+++ b/core/src/impl/Kokkos_Atomic_Compare_Exchange_Strong.hpp
@@ -380,8 +380,8 @@ template <class T, class MemoryOrderSuccess, class MemoryOrderFailure>
 KOKKOS_INTERNAL_INLINE_DEVICE_IF_CUDA_ARCH bool _atomic_compare_exchange_strong(
     T* dest, T compare, T val, MemoryOrderSuccess, MemoryOrderFailure,
     typename std::enable_if<
-        (sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 ||
-         sizeof(T) == 8 || sizeof(T) == 16) &&
+        (sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 || sizeof(T) == 8 ||
+         sizeof(T) == 16) &&
             std::is_same<
                 typename MemoryOrderSuccess::memory_order,
                 typename std::remove_cv<MemoryOrderSuccess>::type>::value &&
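
For readers unfamiliar with the constraint being reformatted here, a small sketch of how a `sizeof`-based `std::enable_if` admits a 16-byte type such as a pair of int64 values. The names below are hypothetical, and the sketch performs no atomic operation; it only shows which overload the size predicate selects.

```cpp
#include <cstdint>
#include <type_traits>

// Size predicate in the same form as the patched condition.
template <class T>
struct example_cas_size_ok
    : std::integral_constant<bool, sizeof(T) == 1 || sizeof(T) == 2 ||
                                       sizeof(T) == 4 || sizeof(T) == 8 ||
                                       sizeof(T) == 16> {};

// Chosen when the size is one of the supported widths.
template <class T>
constexpr typename std::enable_if<example_cas_size_ok<T>::value, bool>::type
example_uses_sized_path(const T&) {
  return true;
}

// Fallback for every other size.
template <class T>
constexpr typename std::enable_if<!example_cas_size_ok<T>::value, bool>::type
example_uses_sized_path(const T&) {
  return false;
}

struct ExampleInt64Pair {
  std::int64_t first;
  std::int64_t second;
};
struct ExampleThreeBytes {
  char data[3];
};

static_assert(sizeof(ExampleInt64Pair) == 16, "two int64 values, no padding");
```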

@crtrott
Member

crtrott commented Sep 8, 2020

We can apply the format patch if you like, and it's clang 8.0.0 (yeah, it is that specific for clang-format ...).

@j8asic
Contributor Author

j8asic commented Sep 8, 2020

We can apply the format patch if you like, and it's clang 8.0.0 (yeah, it is that specific for clang-format ...).

Sure, please do.

This looks generally good. I will test on my machine too on windows. Thanks for this!

I was in doubt about (and forgot to push) a change to include the --cuda-path flag, which I'll do asap. Which is the more appropriate variable to use for this: CUDAToolkit_BIN_DIR or Kokkos_CUDA_DIR?

@j8asic
Contributor Author

j8asic commented Sep 8, 2020

As the new commit summary says, it solves the preprocessor clashes between Windows, CUDA and GNU atomics. Btw, KOKKOS_COMPILER_CLANG_MSVC is added to CMake (you mentioned in my last pull request that it should be added). This solves #3344, but #3346 is still there (since the const dim3 grid code mixes unsigned and signed types in a min call).
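
To illustrate the kind of error meant by mixing signed and unsigned arguments to min: `std::min` deduces a single type `T` from both arguments, so an `unsigned` paired with an `int` does not compile. A minimal sketch with a hypothetical helper (not the Kokkos grid code), assuming the signed limit is never negative:

```cpp
#include <algorithm>
#include <cassert>

// Clamps a requested block count to a signed limit.
inline unsigned example_clamp_blocks(unsigned requested, int limit) {
  assert(limit >= 0);  // assumption of this sketch: the limit is non-negative
  // std::min(requested, limit) would fail to compile, because T would be
  // deduced as both unsigned and int. Converting one side resolves it.
  return std::min(requested, static_cast<unsigned>(limit));
}
```

Spelling the template argument explicitly, as in `std::min<unsigned>(requested, limit)`, is an equivalent alternative.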

@dalg24
Member

dalg24 commented Sep 9, 2020

Please look at dalg24:j8asic_windows_clang_cuda

I applied clang-format on the two commits and edited the author name from "adminfesb <adminfesb@JEFE.local>" -> "j8asic <j8asic@gmail.com>" in the first commit.

I tried to push directly to your branch but the permissions changed and it was rejected.

@j8asic
Contributor Author

j8asic commented Sep 9, 2020

@dalg24 I have no idea what happened (I haven't used Windows in a long time)...
Ok, I've just set up clang-format, and in future I'll format the code myself...

@dalg24
Member

dalg24 commented Sep 9, 2020

@dalg24 I have no idea what happened (I haven't used Windows in a long time)...
Ok, I've just set up clang-format, and in future I'll format the code myself...

Either grant me permission to push to your branch and I will fix it or force push my branch to yours.

@dalg24
Member

dalg24 commented Sep 10, 2020

Retest this please

@dalg24
Member

dalg24 commented Sep 16, 2020

Retest this please

Member

@crtrott crtrott left a comment

Some issues; see comments.

@@ -25,6 +25,16 @@ IF (KOKKOS_ENABLE_PTHREAD)
SET(KOKKOS_ENABLE_THREADS ON)
ENDIF()

# as CMAKE_CXX_SIMULATE_ID does not work, detect clang-cl to avoid clang++,
# cl, and clang-cl clashes, requires CMake >= 3.15
Member

This doesn't work: our minimum required CMake version is lower than that.

Contributor Author

I had this in mind. For a lower-version cmake, it will default to clang-cl. Users that specifically use clang++ need CMake >= 3.15.

#link omp library from LLVM lib dir
IF(KOKKOS_COMPILER_CLANG_MSVC)
#for clang-cl expression /openmp yields an error, so directly add the specific Clang flag
SET(ClangOpenMPFlag /clang:-fopenmp=libomp)
Member

This got added in the Travis build now, and thus fails.

Contributor Author

Ok I see the error, will change it. Thanks.

adminfesb and others added 9 commits October 27, 2020 14:50
* add --cuda-path clang option
* add KOKKOS_COMPILER_CLANG_MSVC if using clang-cl
* Clang CUDA compilation needs GNU atomics activated
* fixed clashing of Windows and GNU atomic functions declarations
* add --cuda-path clang option
* add KOKKOS_COMPILER_CLANG_MSVC if using clang-cl
* Clang CUDA compilation needs GNU atomics activated
* fixed clashing of Windows and GNU atomic functions declarations
@crtrott
Member

crtrott commented Oct 28, 2020

I fixed the CUDA path issue in CMake on a branch that is also rebased on top of #3532 (my branch is crtrott/windows-clang-with-cuda). But it turns out that, compared to #3532, this broke the nvcc Windows build.

Member

@crtrott crtrott left a comment

This breaks the existing nvcc build on windows. Question: do you want me to force push a rebase to this branch, or shall I open a new PR with the rebase of this branch on #3532 and then we continue there?

@crtrott
Member

crtrott commented Oct 28, 2020

I think I figured it out. https://github.com/crtrott/kokkos/tree/windows-clang-with-cuda

@crtrott
Member

crtrott commented Oct 28, 2020

Also @j8asic can you check my branch and see whether your build still works as expected? All our builds now work (nvcc on linux, clang for cuda on linux, nvcc in VS, msvc in VS, clang-cl in VS).

@crtrott
Member

crtrott commented Oct 29, 2020

I force pushed my rebase onto this.

@j8asic
Contributor Author

j8asic commented Oct 29, 2020

Checked out https://github.com/crtrott/kokkos/tree/windows-clang-with-cuda

  • First error: generally under Windows:
CMake Error at C:/Code/kokkos-crtrott/build/KokkosConfig.cmake:55 (INCLUDE):
  INCLUDE could not find load file: C:/Code/kokkos-crtrott/build/KokkosTargets.cmake

I usually comment out the include line to continue...

  • Second error, SET_TARGET_PROPERTIES in KokkosConfig.cmake sets linking dirs without quotes, which due to spaces fails (CUDA is by default installed in Program Files). INTERFACE_INCLUDE_DIRECTORIES and INTERFACE_LINK_LIBRARIES should be with quotes.

  • Otherwise, GNU clang++ compiles nicely.

  • The sad thing is that I get runtime errors on reductions (illegal memory access) and an error when using constant-space kernels (cudaErrorInvalidDeviceFunction). Local-space (lightweight hint) with parallel_for works. I'll open an issue for this and try to see why this is happening.

  • In CMake when setting nvcc as the compiler, I get the following output:

Setting default Kokkos CXX standard to 14
The CXX compiler identification is MSVC 19.27.29112.0
Detecting CXX compiler ABI info
Detecting CXX compiler ABI info - failed
Check for working CXX compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/bin/nvcc.exe
Check for working CXX compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/bin/nvcc.exe - broken
CMake Error at C:/Program Files/CMake/share/cmake-3.19/Modules/CMakeTestCXXCompiler.cmake:59 (message):
  The C++ compiler

    "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/bin/nvcc.exe"
  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: C:/Code/kokkos-crtrott/build/CMakeFiles/CMakeTmp
    
    Run Build Command(s):C:/Code/ninja.exe cmTC_972e6 && [1/2] Building CXX object CMakeFiles\cmTC_972e6.dir\testCXXCompiler.cxx.obj

    FAILED: CMakeFiles/cmTC_972e6.dir/testCXXCompiler.cxx.obj 

    C:\PROGRA~1\NVIDIA~2\CUDA\v10.1\bin\nvcc.exe  /nologo /TP   /DWIN32 /D_WINDOWS /W3 /GR /EHsc  /MDd /Zi /Ob0 /Od /RTC1 /showIncludes /FoCMakeFiles\cmTC_972e6.dir\testCXXCompiler.cxx.obj /FdCMakeFiles\cmTC_972e6.dir\ /FS -c testCXXCompiler.cxx

    nvcc fatal   : Don't know what to do with 'C:/FS'
    ninja: build stopped: subcommand failed.

Don't know why nvcc is detected as MSVC...

@crtrott
Member

crtrott commented Nov 2, 2020

Hi, what do you mean by "general in Windows"? The AppVeyor test on Windows did pass, so at least a simple MSVC build seems to work. Regarding NVCC: you need some fancy setup. See here: #3063 (comment). Note that you have to update all the paths according to your install, and those paths change every time you update Visual Studio ...

@j8asic
Contributor Author

j8asic commented Nov 2, 2020

My ninja setup is the same as yours. I had this nvcc problem, like another user (googling brought me to the Kokkos issues). The culprit in my case was PATH containing both the CUDA dir and the MSVC compiler dir; CMake then detects nvcc as cl, I don't know why. In any case, I removed the MSVC path and it works. So nvcc + cl works for me as well; I also ran some tests to confirm.

The second point is that KokkosTargets.cmake only shows up when installing the built lib. It's an inconvenience, as I might want to use the CMake build dir as the Kokkos dir directly, but okay...

KokkosConfig.cmake still gives errors as it should use quotes when setting Windows paths (lines 42, 43, 49, 50). KokkosConfigCommon.cmake line 105 as well.

Finally, clang++ works as confirmed above, but I'm frustrated that I get these runtime errors with functors in const-space memory and reductions... As said, I'm going to look into it...

Do you want me to fix CMake errors and push it to my branch?

@crtrott
Member

crtrott commented Nov 3, 2020

Yeah try pushing the fixes to cmake and we can recheck if everything works.

@crtrott
Member

crtrott commented Nov 3, 2020

Note: the travis failure is probably an unrelated timeout.

@crtrott
Member

crtrott commented Nov 5, 2020

@j8asic if you still want this in for the next release we need to do this soon. We are starting release process next week.

@crtrott
Member

crtrott commented Nov 6, 2020

Thanks @j8asic . I will check this again with our configs, and if it all passes we can have that in the next release

@j8asic
Contributor Author

j8asic commented Nov 6, 2020

@crtrott you're welcome. It works, but for example I'm not sure what happens when TPL_INCLUDES is a list.
KOKKOS_APPEND_CONFIG_LINE("INTERFACE_INCLUDE_DIRECTORIES \"${TPL_INCLUDES}\"")
Because then each item should be quoted, and KOKKOS_APPEND_CONFIG_LINE is not friendly about the following:
KOKKOS_APPEND_CONFIG_LINE(INTERFACE_INCLUDE_DIRECTORIES "${TPL_INCLUDES}")

Btw I'm not fixed to a release, and am completely demotivated, since I wasted a month trying to deploy on Windows and lost users:

  • nvcc 11 + msvc cannot handle relevant c++17 code (some compiler bugs in deep_copy and variadic args unrolling) :(
  • clang with cuda has runtime issue due to who knows what :(

@crtrott
Member

crtrott commented Nov 10, 2020

oh man that is a bummer. These compiler issues are really a drain on all of us.

@crtrott
Member

crtrott commented Nov 10, 2020

Passed with MSVC, NVCC+MSVC and MSVC-Clang on my machine.

Contributor

@masterleinad masterleinad left a comment

Looks reasonable.

@crtrott crtrott merged commit 29fb7fb into kokkos:develop Nov 10, 2020
@masterleinad masterleinad mentioned this pull request Feb 28, 2023

6 participants