Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows CUDA support #3018

Merged
merged 12 commits into from
May 19, 2020
Merged

Windows CUDA support #3018

merged 12 commits into from
May 19, 2020

Conversation

crtrott
Copy link
Member

@crtrott crtrott commented May 7, 2020

These changes allow a CUDA build on Windows to succeed (the tests don't pass yet though).

The source file changes are largely fine I believe. Some warnings fixes, std::min on windows not working (msvc has a macro min) and some variadic template inheritance stuff.

Some of the CMake changes are fine, but there are two questionable changes at least:

  • CUDATPL: when adding cuda as a library it ended up as -lcuda.lib on the command line which made nvcc look for cuda.lib.lib …
  • Since nvcc_wrapper doesn't work I had to add -x cu somehow, but it can't be on link lines so CMAKE_CXX_FLAGS didn't work. I hence added it as compile option, which doesn't forward though …

Here is my json cmake setup, note some nastiness here where I explicitly say -ccbin, CMAKE_LINKER and CMAKE_AR ...:

    {
      "name": "Cuda-Release",
      "generator": "Ninja",
      "configurationType": "RelWithDebInfo",
      "buildRoot": "${projectDir}\\out\\build\\${name}",
      "installRoot": "${projectDir}\\out\\install\\${name}",
      "cmakeCommandArgs": "-DCMAKE_C_COMPILER=nvcc -DCMAKE_CXX_COMPILER=nvcc -DCMAKE_CXX_FLAGS=\"-ccbin \\\"C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Tools\\MSVC\\14.25.28610\\bin\\HostX64\\x64\\\"\" -DCMAKE_C_FLAGS=\"-arch=sm_70 -ccbin \\\"C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Tools\\MSVC\\14.25.28610\\bin\\HostX64\\x64\\\" -I\\\"C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.2\\include\\\" \" -DCMAKE_LINKER=\"C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/Llvm/bin/lld-link.exe\" -DCMAKE_AR=\"C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/Llvm/bin/llvm-ar.exe\"",
      "buildCommandArgs": "-v",
      "ctestCommandArgs": "",
      "variables": [
        {
          "name": "Kokkos_ARCH_SNB",
          "value": "False",
          "type": "BOOL"
        },
        {
          "name": "Kokkos_ENABLE_LIBDL",
          "value": "False",
          "type": "BOOL"
        },
        {
          "name": "Kokkos_ENABLE_PROFILING",
          "value": "False",
          "type": "BOOL"
        },
        {
          "name": "Kokkos_ENABLE_TESTS",
          "value": "True",
          "type": "BOOL"
        },
        {
          "name": "Kokkos_ARCH_VOLTA70",
          "value": "True",
          "type": "BOOL"
        },
        {
          "name": "Kokkos_ENABLE_CUDA",
          "value": "True",
          "type": "BOOL"
        }
      ],
      "inheritEnvironments": []
    }

@crtrott crtrott added the [WIP] label May 7, 2020
)
IF(WIN32)
KOKKOS_CREATE_IMPORTED_TPL(CUDA INTERFACE
LINK_LIBRARIES kokkoscore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get that one

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had a time to do a more extensive review yet - but I really think we need to discuss using FindCUDA or FindCUDAToolkit. I don't see any reason for us to re-engineer something built into CMake - particularly for Windows.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above. This is not intended to be committed, I just didn't have a good solution right away.

cmake/kokkos_compiler_id.cmake Outdated Show resolved Hide resolved
containers/performance_tests/CMakeLists.txt Outdated Show resolved Hide resolved
core/unit_test/CMakeLists.txt Outdated Show resolved Hide resolved
Copy link
Contributor

@jrmadsen jrmadsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using #define NOMINMAX before including the windows header will eliminate the need to remove std:: from all the min and max calls.

@crtrott
Copy link
Member Author

crtrott commented May 11, 2020

Using #define NOMINMAX before including the windows header will eliminate the need to remove std:: from all the min and max calls.

But then I might break downstream code which relies on windows behavior and includes Kokkos?

@crtrott crtrott removed the [WIP] label May 13, 2020
@crtrott
Copy link
Member Author

crtrott commented May 13, 2020

This includes #3028

@crtrott
Copy link
Member Author

crtrott commented May 15, 2020

I think this is good to go.

cmake/Modules/FindTPLCUDA.cmake Outdated Show resolved Hide resolved
Copy link

@jjwilke jjwilke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Do we want to squash, though? Looks like a lot of small intermediate commits.

.gitignore Show resolved Hide resolved
@@ -66,7 +66,7 @@

namespace Kokkos {

enum { UnorderedMapInvalidIndex = ~0u };
enum : unsigned { UnorderedMapInvalidIndex = ~0u };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider making this a static constexpr member variable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but I ran into trouble.

@@ -534,7 +534,8 @@ struct CudaReductionsFunctor<FunctorType, ArgTag, false, true> {
__syncthreads();
unsigned int num_teams_done = 0;
if (threadIdx.x + threadIdx.y == 0) {
num_teams_done = Kokkos::atomic_fetch_add(global_flags, 1) + 1;
num_teams_done =
Kokkos::atomic_fetch_add(global_flags, (unsigned int)1) + 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did it not compile w/o casting? Also probably deduce the type pointed to by global_flags if it is really necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it did compile but gave tons of warnings in the windows build (hundreds).

#include "impl/Kokkos_Atomic_Generic.hpp"

#ifndef _WIN32
//#ifndef _WIN32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you comment instead of removing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover because I wasn't sure this worked.

Copy link

@dhollman dhollman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most changes are requests for documentation or stylistic. But mostly LGTM.

In general, though, when we make these sorts of changes where we change something that should work fine according to the standard but doesn't because of a bug in a specific compiler:

  • we should do everything we can to avoid forking the code on a preprocessor macro. I have a more detailed argument about this in one of my comments, but basically, I think a default stance of "let's fork the code because we needed a change for this compiler and we don't want to hurt compilation times or complicate things on other compilers" is dangerous and severely hurts maintainability. It's basically us asking future maintainers of the code to edit things in two places because we were too lazy to test whether the updated solution worked on all compilers we support. I understand wanting to make the smallest change possible to get things working, but that sort of mentality can also build technical debt pretty rapidly.
  • these changes should be documented with what compiler it is a workaround for, what issue it addresses, how it addresses the issue, and perhaps a code snippet of the way we did it before that didn't work. This will make it much easier in the future to understand why we wrote things the way we did (especially when we're doing things like writing new backends based on old ones, as I'm sure @dalg24 and friends can attest to), will keep someone from changing things back (or at least give them an idea of what to check before making such changes), and give us an understanding of why we do things a certain way (and potentially when we can stop doing them that way in the future, if we ever decide we want to). I think we're well past the point where we can just make arcane and minor changes to the minutia of C++ usage in Kokkos without documenting the reason for the change. So much of the technical debt in Kokkos comes from us not doing this before.

Comment on lines +194 to +195
t_dev d_view;
t_host h_view;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but I'm not sure I understand this. t_modified_flags and t_modified_flag should have the same size and alignment as t_dev and t_host in most cases I can think of, so I don't know how this would change things. Maybe elaborating on the issue in the comment might help?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t_modified_flags is not the same size as t_dev. t_modified_flags has only ever static extents, while t_dev and t_host have whatever the user requested.

@@ -66,7 +66,7 @@

namespace Kokkos {

enum { UnorderedMapInvalidIndex = ~0u };
enum : unsigned { UnorderedMapInvalidIndex = ~0u };

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay seriously, do we know of any compilers that support underlying types for anonymous enums and not constexpr variables? IIRC underlying enum types was a late-implemented C++11 feature in many cases. This just seems a little ridiculous at this point. Without a good reason not to, I would strongly prefer we change this to:

Suggested change
enum : unsigned { UnorderedMapInvalidIndex = ~0u };
constexpr unsigned UnorderedMapInvalidIndex = ~0u;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dalg24 thoughts? 😉

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this and it failed horribly ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually my failure came from in-class enums which need c++17 in order to be inline initialized. So we could change this. But I rather have a separate PR which systematically goes through all enums, potentially replacing in-class ones dependent on C++17 (ifdefing).

@@ -264,7 +264,7 @@ class UnorderedMap {
//@}

private:
enum { invalid_index = ~static_cast<size_type>(0) };
enum : size_type { invalid_index = ~static_cast<size_type>(0) };

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
enum : size_type { invalid_index = ~static_cast<size_type>(0) };
static constexpr auto invalid_index = ~static_cast<size_type>(0);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this requires C++17

@@ -55,6 +55,7 @@
#endif

namespace Test {
using namespace std;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even in the tests, please put this at function scope.

containers/unit_tests/TestErrorReporter.hpp Show resolved Hide resolved
Comment on lines +50 to +56
#include <default/TestDefaultDeviceType_Category.hpp>

namespace Test {

TEST(defaultdevicetype, development_test) {}

} // namespace Test

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really prefer a more formal CMake option that's something like KOKKOS_ENABLE_SEPARATE_TESTS rather than this ad-hoc work flow solution of copy-pasting failing tests into a file that corresponds to a single target. If what you wanted was for each test file to be a single target, there should just be a way to do that, but this feels like the wrong way to go about solving the problem (though I'm glad that now that you have a workflow that matches mine a little more closely, you finally acknowledge that it's a problem :-D). @dalg24 and @jjwilke thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually also want to use this to simply test code fast. Setting up new projects in Visual studio which actually work (in particular for CUDA) is a horrendous pain. This is a super fast way of giving me something to build where I can stick code.

Comment on lines +55 to +56
using value_type = double;
int num_elements = 10;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet another example of why aligning equals signs is problematic. Removing a line causes the diff to incorrectly show modifications to multiple lines (Because git can't tell that this is just a whitespace change since the whitespace comes in the middle of the line). Just a nit pick; don't mind me 🙄


ParallelForFunctor(value_type *data) : _data(data) {}
ParallelForFunctor(value_type *data, const value_type value)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const on a parameter type here?


KOKKOS_INLINE_FUNCTION
void operator()(const int i) const { _data[i] = (i + 1) * value; }
void operator()(const int i) const { _data[i] = (i + 1) * _value; }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you have just changed value to be constexpr? Not a big deal; just seems like that was the fix probably.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope didn't work. Was the first thing I tried ...

crtrott and others added 11 commits May 15, 2020 17:10
- A number of warnings are fixed due to differently signed enums.
- Defaulted functions don't seem to work in some cases (ViewMapping)
- And some nasty thing with variadic template inheritance
  - needed to specify template aliases to avoid compiler confusion
- Some atomics include changes
WINDOWS CUDA SUpport: fix typo

Fix MSVC build again.
Fix warnings on Windows: largely enums were made int instead of what the type of the assigned value is, so need to be more eplicit.

revert a setting of enums.

Fix some missing parenthesis and formatting
Move the add of -x cu in the right place.

Fix typo in CUDA TPL discovery.

Addressing review comments.
@crtrott
Copy link
Member Author

crtrott commented May 16, 2020

I pushed a rebase.

@crtrott crtrott mentioned this pull request May 16, 2020
Copy link

@dhollman dhollman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the things I need to change, LGTM

core/src/impl/Kokkos_ViewCtor.hpp Show resolved Hide resolved
Copy link

@dhollman dhollman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codecov-commenter
Copy link

codecov-commenter commented May 19, 2020

Codecov Report

Merging #3018 into develop will increase coverage by 0.0%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop   #3018    +/-   ##
========================================
  Coverage     82.5%   82.6%            
========================================
  Files          122     122            
  Lines         7954    8093   +139     
========================================
+ Hits          6568    6690   +122     
- Misses        1386    1403    +17     
Flag Coverage Δ
#clang 81.4% <100.0%> (+<0.1%) ⬆️
#gcc 82.9% <100.0%> (+0.1%) ⬆️
Impacted Files Coverage Δ
containers/src/Kokkos_UnorderedMap.hpp 97.6% <ø> (ø)
core/src/impl/Kokkos_FunctorAdapter.hpp 100.0% <ø> (ø)
core/src/impl/Kokkos_ViewLayoutTiled.hpp 91.9% <ø> (ø)
core/src/impl/Kokkos_ViewMapping.hpp 90.7% <ø> (-1.9%) ⬇️
containers/src/Kokkos_DualView.hpp 77.0% <100.0%> (ø)
core/src/impl/Kokkos_ViewCtor.hpp 100.0% <100.0%> (ø)
core/src/impl/Kokkos_Atomic_View.hpp 87.5% <0.0%> (-12.5%) ⬇️
core/src/Kokkos_NumericTraits.hpp 94.7% <0.0%> (-5.3%) ⬇️
core/src/Kokkos_CopyViews.hpp 41.8% <0.0%> (-0.5%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4737705...c372282. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants