Windows CUDA support #3018

crtrott · 2020-05-07T20:02:06Z

These changes allow a CUDA build on Windows to succeed (the tests don't pass yet, though).

I believe the source file changes are largely fine: some warning fixes, a workaround for std::min not working on Windows (MSVC has a macro min), and some variadic template inheritance changes.

Some of the CMake changes are fine, but at least two are questionable:

CUDATPL: when adding cuda as a library, it ended up as -lcuda.lib on the command line, which made nvcc look for cuda.lib.lib …
Since nvcc_wrapper doesn't work, I had to add -x cu somehow, but it can't be on link lines, so CMAKE_CXX_FLAGS didn't work. I hence added it as a compile option, which doesn't forward though …

Here is my JSON CMake setup. Note some nastiness here, where I explicitly set -ccbin, CMAKE_LINKER and CMAKE_AR ...:

    {
      "name": "Cuda-Release",
      "generator": "Ninja",
      "configurationType": "RelWithDebInfo",
      "buildRoot": "${projectDir}\\out\\build\\${name}",
      "installRoot": "${projectDir}\\out\\install\\${name}",
      "cmakeCommandArgs": "-DCMAKE_C_COMPILER=nvcc -DCMAKE_CXX_COMPILER=nvcc -DCMAKE_CXX_FLAGS=\"-ccbin \\\"C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Tools\\MSVC\\14.25.28610\\bin\\HostX64\\x64\\\"\" -DCMAKE_C_FLAGS=\"-arch=sm_70 -ccbin \\\"C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Tools\\MSVC\\14.25.28610\\bin\\HostX64\\x64\\\" -I\\\"C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.2\\include\\\" \" -DCMAKE_LINKER=\"C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/Llvm/bin/lld-link.exe\" -DCMAKE_AR=\"C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/Llvm/bin/llvm-ar.exe\"",
      "buildCommandArgs": "-v",
      "ctestCommandArgs": "",
      "variables": [
        {
          "name": "Kokkos_ARCH_SNB",
          "value": "False",
          "type": "BOOL"
        },
        {
          "name": "Kokkos_ENABLE_LIBDL",
          "value": "False",
          "type": "BOOL"
        },
        {
          "name": "Kokkos_ENABLE_PROFILING",
          "value": "False",
          "type": "BOOL"
        },
        {
          "name": "Kokkos_ENABLE_TESTS",
          "value": "True",
          "type": "BOOL"
        },
        {
          "name": "Kokkos_ARCH_VOLTA70",
          "value": "True",
          "type": "BOOL"
        },
        {
          "name": "Kokkos_ENABLE_CUDA",
          "value": "True",
          "type": "BOOL"
        }
      ],
      "inheritEnvironments": []
    }

dalg24 · 2020-05-07T20:48:08Z

cmake/Modules/FindTPLCUDA.cmake

-   )
+   IF(WIN32)
+     KOKKOS_CREATE_IMPORTED_TPL(CUDA INTERFACE
+       LINK_LIBRARIES kokkoscore


I don't get that one

I haven't had time to do a more extensive review yet, but I really think we need to discuss using FindCUDA or FindCUDAToolkit. I don't see any reason for us to re-engineer something built into CMake, particularly for Windows.

See my comment above. This is not intended to be committed, I just didn't have a good solution right away.

jrmadsen

Using #define NOMINMAX before including the windows header will eliminate the need to remove std:: from all the min and max calls.

crtrott · 2020-05-11T19:59:07Z

> Using #define NOMINMAX before including the windows header will eliminate the need to remove std:: from all the min and max calls.

But then I might break downstream code that relies on the Windows behavior and includes Kokkos?
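For reference, a third option sometimes used in this situation is to parenthesize the function name at the call site. This is not what the PR does; it is a minimal sketch under the assumption that the conflict comes from a function-like `min` macro. The macro below is a portable stand-in for the one the Windows headers define when `NOMINMAX` is not set, and `smaller_of` is a hypothetical helper name.

```cpp
#include <algorithm>

// Stand-in for the function-like macro that the Windows headers define
// unless NOMINMAX is set (simulated here so the sketch builds anywhere).
#define min(a, b) (((a) < (b)) ? (a) : (b))

// Hypothetical helper. A function-like macro is only expanded when its name
// is directly followed by '('. In (std::min)(a, b) the name is followed by
// ')', so the real std::min is called even though the macro is defined.
inline int smaller_of(int a, int b) {
  return (std::min)(a, b);
}

// Clean up so the stand-in macro does not leak into later code.
#undef min
```

Calling `smaller_of(3, 7)` goes through `std::min` rather than the macro, and the approach does not force `NOMINMAX` on downstream code that includes the header.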

crtrott · 2020-05-13T23:13:44Z

This includes #3028

crtrott · 2020-05-15T03:11:55Z

I think this is good to go.

jjwilke

Looks good. Do we want to squash, though? Looks like a lot of small intermediate commits.

dalg24 · 2020-05-15T16:13:59Z

containers/src/Kokkos_UnorderedMap.hpp

@@ -66,7 +66,7 @@

 namespace Kokkos {

-enum { UnorderedMapInvalidIndex = ~0u };
+enum : unsigned { UnorderedMapInvalidIndex = ~0u };


Did you consider making this a static constexpr member variable?

Yes, but I ran into trouble.

dalg24 · 2020-05-15T16:17:27Z

core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp

@@ -534,7 +534,8 @@ struct CudaReductionsFunctor<FunctorType, ArgTag, false, true> {
    __syncthreads();
    unsigned int num_teams_done = 0;
    if (threadIdx.x + threadIdx.y == 0) {
-      num_teams_done = Kokkos::atomic_fetch_add(global_flags, 1) + 1;
+      num_teams_done =
+          Kokkos::atomic_fetch_add(global_flags, (unsigned int)1) + 1;


Did it not compile without the cast? Also, if the cast is really necessary, it would probably be better to deduce the type pointed to by global_flags.

It did compile, but it gave tons of warnings (hundreds) in the Windows build.
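The suggestion to deduce the type from the pointer could look roughly like the sketch below. `fetch_add_sketch` is a hypothetical, non-atomic stand-in and does not reproduce the signature of `Kokkos::atomic_fetch_add`; `count_done` is likewise an illustrative name.

```cpp
#include <type_traits>

// Hypothetical stand-in for an atomic fetch-add: returns the old value and
// adds `val` to *dest. It is NOT atomic and is not the Kokkos API; it only
// lets the sketch compile on its own.
template <class T>
T fetch_add_sketch(T* dest, T val) {
  T old = *dest;
  *dest = old + val;
  return old;
}

// Deduce the increment's type from the pointer instead of spelling out
// `unsigned int`, so the literal always matches the pointed-to type.
template <class FlagPtr>
auto count_done(FlagPtr flags) -> typename std::remove_pointer<FlagPtr>::type {
  using flag_type = typename std::remove_pointer<FlagPtr>::type;
  return fetch_add_sketch(flags, flag_type(1)) + flag_type(1);
}
```

With an `unsigned int*` argument, `flag_type` is `unsigned int`, so the call has the same effect as the explicit cast in the diff but keeps working if the flag type changes.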

dalg24 · 2020-05-15T16:19:40Z

core/src/Kokkos_Atomic.hpp

 #include "impl/Kokkos_Atomic_Generic.hpp"

-#ifndef _WIN32
+//#ifndef _WIN32


Why did you comment this out instead of removing it?

Leftover, because I wasn't sure this worked.

dhollman

Most of my comments are requests for documentation or are stylistic. But mostly LGTM.

In general, though, when we make these sorts of changes, where we change something that should work fine according to the standard but doesn't because of a bug in a specific compiler:

- We should do everything we can to avoid forking the code on a preprocessor macro. I have a more detailed argument about this in one of my comments, but basically, I think a default stance of "let's fork the code because we needed a change for this compiler and we don't want to hurt compilation times or complicate things on other compilers" is dangerous and severely hurts maintainability. It's basically us asking future maintainers of the code to edit things in two places because we were too lazy to test whether the updated solution worked on all compilers we support. I understand wanting to make the smallest change possible to get things working, but that sort of mentality can also build technical debt pretty rapidly.
- These changes should be documented with which compiler they are a workaround for, what issue they address, how they address the issue, and perhaps a code snippet of the way we did it before that didn't work. This will make it much easier in the future to understand why we wrote things the way we did (especially when we're doing things like writing new backends based on old ones, as I'm sure @dalg24 and friends can attest to), will keep someone from changing things back (or at least give them an idea of what to check before making such changes), and will give us an understanding of why we do things a certain way (and potentially when we can stop doing them that way in the future, if we ever decide we want to). I think we're well past the point where we can just make arcane and minor changes to the minutiae of C++ usage in Kokkos without documenting the reason for the change. So much of the technical debt in Kokkos comes from us not doing this before.
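As an illustration of the documentation request, a workaround comment might be laid out like the sketch below. The comment format, the `sketch` namespace, and the name `InvalidIndex` are assumptions for illustration only; the stated issue paraphrases this PR's own commit message about enums on Windows.

```cpp
#include <type_traits>

namespace sketch {

// WORKAROUND (MSVC, Windows CUDA build; see PR #3018)
// Issue:  per the PR's commit message, enums "were made int instead of what
//         the type of the assigned value is", which produced warnings.
// Fix:    state the underlying type explicitly, on all compilers, so the
//         code is not forked on a preprocessor macro.
// Before: enum { InvalidIndex = ~0u };
enum : unsigned { InvalidIndex = ~0u };

// The underlying type is now fixed by the declaration, not the compiler.
static_assert(
    std::is_same<std::underlying_type<decltype(InvalidIndex)>::type,
                 unsigned>::value,
    "underlying type is fixed to unsigned");

}  // namespace sketch
```

The "Before" line gives a future maintainer something concrete to test against if they ever want to revert the workaround.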

dhollman · 2020-05-15T15:46:45Z

containers/src/Kokkos_DualView.hpp

+  t_dev d_view;
+  t_host h_view;


This is fine, but I'm not sure I understand this. t_modified_flags and t_modified_flag should have the same size and alignment as t_dev and t_host in most cases I can think of, so I don't know how this would change things. Maybe elaborating on the issue in the comment might help?

t_modified_flags is not the same size as t_dev. t_modified_flags only ever has static extents, while t_dev and t_host have whatever the user requested.

dhollman · 2020-05-15T15:51:34Z

containers/src/Kokkos_UnorderedMap.hpp

@@ -66,7 +66,7 @@

 namespace Kokkos {

-enum { UnorderedMapInvalidIndex = ~0u };
+enum : unsigned { UnorderedMapInvalidIndex = ~0u };


Okay seriously, do we know of any compilers that support underlying types for anonymous enums and not constexpr variables? IIRC underlying enum types was a late-implemented C++11 feature in many cases. This just seems a little ridiculous at this point. Without a good reason not to, I would strongly prefer we change this to:

Suggested change

-enum : unsigned { UnorderedMapInvalidIndex = ~0u };
+constexpr unsigned UnorderedMapInvalidIndex = ~0u;

@dalg24 thoughts? 😉

I tried this and it failed horribly ...

Actually, my failure came from in-class enums, which need C++17 in order to be inline initialized. So we could change this. But I would rather have a separate PR which systematically goes through all enums, potentially replacing in-class ones dependent on C++17 (ifdefing).

dhollman · 2020-05-15T15:53:05Z

containers/src/Kokkos_UnorderedMap.hpp

@@ -264,7 +264,7 @@ class UnorderedMap {
  //@}

 private:
-  enum { invalid_index = ~static_cast<size_type>(0) };
+  enum : size_type { invalid_index = ~static_cast<size_type>(0) };


Suggested change

-enum : size_type { invalid_index = ~static_cast<size_type>(0) };
+static constexpr auto invalid_index = ~static_cast<size_type>(0);

This requires C++17.
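One plausible reading of the C++17 remark (the thread does not say which failure was actually seen) is the odr-use rule for static data members. The in-class initializer itself is accepted since C++11, but before C++17 any odr-use also needs a namespace-scope definition, which is awkward in a header-only class. A minimal sketch, with `MapSketch` and `is_invalid` as hypothetical names:

```cpp
#include <cstddef>

struct MapSketch {  // hypothetical stand-in for a class like UnorderedMap
  using size_type = std::size_t;
  // The in-class initializer is accepted since C++11 ...
  static constexpr size_type invalid_index = ~static_cast<size_type>(0);
};

// ... but before C++17 an odr-use (for example binding the member to a
// const reference) also needs this namespace-scope definition. From C++17
// on, static constexpr members are implicitly inline and this line is
// redundant (and deprecated).
constexpr MapSketch::size_type MapSketch::invalid_index;

// Takes its argument by reference, so passing invalid_index odr-uses it.
inline bool is_invalid(const MapSketch::size_type& i) {
  return i == MapSketch::invalid_index;
}
```

An enumerator has no such requirement because it is not an object, which is one reason the in-class `enum` form stays convenient in pre-C++17 code.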

dhollman · 2020-05-15T15:53:43Z

containers/unit_tests/TestErrorReporter.hpp

@@ -55,6 +55,7 @@
 #endif

 namespace Test {
+using namespace std;


Even in the tests, please put this at function scope.
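A small sketch of what function scope means here. `TestSketch` and `greet` are hypothetical names, not code from the test suite.

```cpp
#include <string>

namespace TestSketch {  // hypothetical namespace, not the real Test namespace

inline std::string greet() {
  // The directive is visible only inside this function body, so it cannot
  // change name lookup for other code in the namespace or for files that
  // include this header.
  using namespace std;
  string s = "ok";
  return s;
}

}  // namespace TestSketch
```

Outside `greet`, names such as `string` still have to be qualified with `std::`.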

dhollman · 2020-05-15T16:53:36Z

core/unit_test/default/TestDefaultDeviceDevelop.cpp

+#include <default/TestDefaultDeviceType_Category.hpp>
+
+namespace Test {
+
+TEST(defaultdevicetype, development_test) {}
+
+}  // namespace Test


I'd really prefer a more formal CMake option that's something like KOKKOS_ENABLE_SEPARATE_TESTS rather than this ad-hoc workflow solution of copy-pasting failing tests into a file that corresponds to a single target. If what you wanted was for each test file to be a single target, there should just be a way to do that, but this feels like the wrong way to go about solving the problem (though I'm glad that, now that you have a workflow that matches mine a little more closely, you finally acknowledge that it's a problem :-D). @dalg24 and @jjwilke thoughts?

I actually also want to use this to simply test code fast. Setting up new projects in Visual Studio which actually work (in particular for CUDA) is a horrendous pain. This is a super fast way of giving me something to build where I can stick code.

dhollman · 2020-05-15T16:56:28Z

core/unit_test/incremental/Test04_ParallelFor_RangePolicy.hpp

+using value_type = double;
+int num_elements = 10;


Yet another example of why aligning equals signs is problematic. Removing a line causes the diff to incorrectly show modifications to multiple lines (because git can't tell that this is just a whitespace change, since the whitespace comes in the middle of the line). Just a nitpick; don't mind me 🙄

dhollman · 2020-05-15T16:57:08Z

core/unit_test/incremental/Test04_ParallelFor_RangePolicy.hpp


-  ParallelForFunctor(value_type *data) : _data(data) {}
+  ParallelForFunctor(value_type *data, const value_type value)


const on a parameter type here?
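For context on this question: top-level const on a by-value parameter is not part of the function's type, so it only affects the body of the definition. A minimal sketch, using `double` in place of `value_type` and a hypothetical `FunctorSketch` that mirrors the reviewed constructor:

```cpp
#include <type_traits>

// Top-level const on a by-value parameter is dropped from the function
// type: both spellings below name the same type.
static_assert(
    std::is_same<void(double*, double), void(double*, const double)>::value,
    "top-level const is not part of the function type");

// Hypothetical functor mirroring the reviewed constructor. The const only
// prevents the constructor body from modifying its local copy of `value`.
struct FunctorSketch {
  double* _data;
  double _value;
  FunctorSketch(double* data, const double value)
      : _data(data), _value(value) {}
  void operator()(const int i) const { _data[i] = (i + 1) * _value; }
};
```

Callers see no difference between the two spellings, which is why the qualifier is usually treated as a style choice.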

dhollman · 2020-05-15T16:58:35Z

core/unit_test/incremental/Test04_ParallelFor_RangePolicy.hpp


  KOKKOS_INLINE_FUNCTION
-  void operator()(const int i) const { _data[i] = (i + 1) * value; }
+  void operator()(const int i) const { _data[i] = (i + 1) * _value; }


Couldn't you have just changed value to be constexpr? Not a big deal; it just seems like that would probably have been the fix.

Nope, that didn't work. It was the first thing I tried ...

- A number of warnings are fixed due to differently signed enums.
- Defaulted functions don't seem to work in some cases (ViewMapping).
- And some nasty thing with variadic template inheritance: needed to specify template aliases to avoid compiler confusion.
- Some atomics include changes.

crtrott · 2020-05-16T00:23:19Z

I pushed a rebase.

dhollman

Aside from the things I need to change, LGTM

dhollman

LGTM

codecov-commenter · 2020-05-19T01:25:01Z

Codecov Report

Merging #3018 into develop will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@           Coverage Diff            @@
##           develop   #3018    +/-   ##
========================================
  Coverage     82.5%   82.6%            
========================================
  Files          122     122            
  Lines         7954    8093   +139     
========================================
+ Hits          6568    6690   +122     
- Misses        1386    1403    +17

| Flag | Coverage Δ | |
| --- | --- | --- |
| #clang | `81.4% <100.0%> (+<0.1%)` | ⬆️ |
| #gcc | `82.9% <100.0%> (+0.1%)` | ⬆️ |

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| containers/src/Kokkos_UnorderedMap.hpp | `97.6% <ø> (ø)` | |
| core/src/impl/Kokkos_FunctorAdapter.hpp | `100.0% <ø> (ø)` | |
| core/src/impl/Kokkos_ViewLayoutTiled.hpp | `91.9% <ø> (ø)` | |
| core/src/impl/Kokkos_ViewMapping.hpp | `90.7% <ø> (-1.9%)` | ⬇️ |
| containers/src/Kokkos_DualView.hpp | `77.0% <100.0%> (ø)` | |
| core/src/impl/Kokkos_ViewCtor.hpp | `100.0% <100.0%> (ø)` | |
| core/src/impl/Kokkos_Atomic_View.hpp | `87.5% <0.0%> (-12.5%)` | ⬇️ |
| core/src/Kokkos_NumericTraits.hpp | `94.7% <0.0%> (-5.3%)` | ⬇️ |
| core/src/Kokkos_CopyViews.hpp | `41.8% <0.0%> (-0.5%)` | ⬇️ |
| ... and 4 more | | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4737705...c372282.

crtrott added the [WIP] label May 7, 2020

dalg24 reviewed May 7, 2020

jrmadsen reviewed May 8, 2020

crtrott removed the [WIP] label May 13, 2020

jjwilke suggested changes May 15, 2020

jjwilke mentioned this pull request May 15, 2020

Use CMake-provided FindCUDAToolkit to configure CUDA #3028

Closed

jjwilke approved these changes May 15, 2020

dalg24 reviewed May 15, 2020

dalg24 reviewed May 15, 2020

dhollman suggested changes May 15, 2020

crtrott force-pushed the windows-cuda branch from 5b8a783 to d744f86 on May 16, 2020 00:05

crtrott and others added 11 commits May 15, 2020 17:10

CMAKE changes to enable CUDA on Windows

eb4a7d7

FIx more cmake stuff for Windows CUDA

7a106ab

WINDOWS CUDA SUpport: fix typo Fix MSVC build again.

Force nvcc_wrapper and link libraries to obey CUDA_ROOT

7989d0f

Fix more CUDA Windows stuff

ceab8c6

Add a new Development test target, it has a reproducer for kokkos#3031.

cc50304

Use FindCUDAToolkit for CUDA libraries

0fda7bb

Windows: Workaround for detection of init/join/final on functors for …

593c6dd

…MSVC + fix for DualView alignment

Apply formating

6936dac

Fix warnings on Windows: largely enums were made int instead of what the type of the assigned value is, so need to be more eplicit. revert a setting of enums. Fix some missing parenthesis and formatting

Fix construction order in DualView

38c41f1

Fix HIP build and formatting

b90efce

Move the add of -x cu in the right place. Fix typo in CUDA TPL discovery. Addressing review comments.

crtrott force-pushed the windows-cuda branch from ceaca3b to b90efce on May 16, 2020 00:22

crtrott mentioned this pull request May 16, 2020

Printf place holders #3043

Open

dhollman reviewed May 18, 2020

minor changes to address some of my comments in the pull request

c372282

dhollman approved these changes May 18, 2020

crtrott merged commit 4956077 into kokkos:develop May 19, 2020

crtrott deleted the windows-cuda branch May 19, 2020 03:06

ndellingwood mentioned this pull request May 20, 2020

Nightly test failures - OpenMP builds on Power8 with xl/16.1.1 #3054

Closed

BinWang0213 mentioned this pull request May 23, 2020

Can not build on Windows with CUDA #3062

Closed

masterleinad mentioned this pull request Feb 28, 2023

Windows Support #1533

Closed

masterleinad mentioned this pull request Jun 6, 2024

Fix using CUDAToolkit for CMake 3.28.4 and higher #7062

Open
