-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows CUDA support #3018
Windows CUDA support #3018
Conversation
cmake/Modules/FindTPLCUDA.cmake
Outdated
) | ||
IF(WIN32) | ||
KOKKOS_CREATE_IMPORTED_TPL(CUDA INTERFACE | ||
LINK_LIBRARIES kokkoscore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get that one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't had a time to do a more extensive review yet - but I really think we need to discuss using FindCUDA or FindCUDAToolkit. I don't see any reason for us to re-engineer something built into CMake - particularly for Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment above. This is not intended to be committed, I just didn't have a good solution right away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using #define NOMINMAX
before including the windows header will eliminate the need to remove std::
from all the min and max calls.
But then I might break downstream code which relies on windows behavior and includes Kokkos? |
This includes #3028 |
I think this is good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Do we want to squash, though? Looks like a lot of small intermediate commits.
@@ -66,7 +66,7 @@ | |||
|
|||
namespace Kokkos { | |||
|
|||
enum { UnorderedMapInvalidIndex = ~0u }; | |||
enum : unsigned { UnorderedMapInvalidIndex = ~0u }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider making this a static constexpr member variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes but I ran into trouble.
@@ -534,7 +534,8 @@ struct CudaReductionsFunctor<FunctorType, ArgTag, false, true> { | |||
__syncthreads(); | |||
unsigned int num_teams_done = 0; | |||
if (threadIdx.x + threadIdx.y == 0) { | |||
num_teams_done = Kokkos::atomic_fetch_add(global_flags, 1) + 1; | |||
num_teams_done = | |||
Kokkos::atomic_fetch_add(global_flags, (unsigned int)1) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did it not compile w/o casting? Also probably deduce the type pointed to by global_flags
if it is really necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it did compile but gave tons of warnings in the windows build (hundreds).
core/src/Kokkos_Atomic.hpp
Outdated
#include "impl/Kokkos_Atomic_Generic.hpp" | ||
|
||
#ifndef _WIN32 | ||
//#ifndef _WIN32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you comment instead of removing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover because I wasn't sure this worked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most changes are requests for documentation or stylistic. But mostly LGTM.
In general, though, when we make these sorts of changes where we change something that should work fine according to the standard but doesn't because of a bug in a specific compiler:
- we should do everything we can to avoid forking the code on a preprocessor macro. I have a more detailed argument about this in one of my comments, but basically, I think a default stance of "let's fork the code because we needed a change for this compiler and we don't want to hurt compilation times or complicate things on other compilers" is dangerous and severely hurts maintainability. It's basically us asking future maintainers of the code to edit things in two places because we were too lazy to test whether the updated solution worked on all compilers we support. I understand wanting to make the smallest change possible to get things working, but that sort of mentality can also build technical debt pretty rapidly.
- these changes should be documented with what compiler it is a workaround for, what issue it addresses, how it addresses the issue, and perhaps a code snippet of the way we did it before that didn't work. This will make it much easier in the future to understand why we wrote things the way we did (especially when we're doing things like writing new backends based on old ones, as I'm sure @dalg24 and friends can attest to), will keep someone from changing things back (or at least give them an idea of what to check before making such changes), and give us an understanding of why we do things a certain way (and potentially when we can stop doing them that way in the future, if we ever decide we want to). I think we're well past the point where we can just make arcane and minor changes to the minutia of C++ usage in Kokkos without documenting the reason for the change. So much of the technical debt in Kokkos comes from us not doing this before.
t_dev d_view; | ||
t_host h_view; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but I'm not sure I understand this. t_modified_flags
and t_modified_flag
should have the same size and alignment as t_dev
and t_host
in most cases I can think of, so I don't know how this would change things. Maybe elaborating on the issue in the comment might help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t_modified_flags is not the same size as t_dev. t_modified_flags has only ever static extents, while t_dev and t_host have whatever the user requested.
@@ -66,7 +66,7 @@ | |||
|
|||
namespace Kokkos { | |||
|
|||
enum { UnorderedMapInvalidIndex = ~0u }; | |||
enum : unsigned { UnorderedMapInvalidIndex = ~0u }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay seriously, do we know of any compilers that support underlying types for anonymous enums and not constexpr
variables? IIRC underlying enum types was a late-implemented C++11 feature in many cases. This just seems a little ridiculous at this point. Without a good reason not to, I would strongly prefer we change this to:
enum : unsigned { UnorderedMapInvalidIndex = ~0u }; | |
constexpr unsigned UnorderedMapInvalidIndex = ~0u; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dalg24 thoughts? 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this and it failed horribly ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually my failure came from in-class enums which need c++17 in order to be inline initialized. So we could change this. But I rather have a separate PR which systematically goes through all enums, potentially replacing in-class ones dependent on C++17 (ifdefing).
@@ -264,7 +264,7 @@ class UnorderedMap { | |||
//@} | |||
|
|||
private: | |||
enum { invalid_index = ~static_cast<size_type>(0) }; | |||
enum : size_type { invalid_index = ~static_cast<size_type>(0) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enum : size_type { invalid_index = ~static_cast<size_type>(0) }; | |
static constexpr auto invalid_index = ~static_cast<size_type>(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this requires C++17
@@ -55,6 +55,7 @@ | |||
#endif | |||
|
|||
namespace Test { | |||
using namespace std; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even in the tests, please put this at function scope.
#include <default/TestDefaultDeviceType_Category.hpp> | ||
|
||
namespace Test { | ||
|
||
TEST(defaultdevicetype, development_test) {} | ||
|
||
} // namespace Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really prefer a more formal CMake option that's something like KOKKOS_ENABLE_SEPARATE_TESTS
rather than this ad-hoc work flow solution of copy-pasting failing tests into a file that corresponds to a single target. If what you wanted was for each test file to be a single target, there should just be a way to do that, but this feels like the wrong way to go about solving the problem (though I'm glad that now that you have a workflow that matches mine a little more closely, you finally acknowledge that it's a problem :-D). @dalg24 and @jjwilke thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually also want to use this to simply test code fast. Setting up new projects in Visual studio which actually work (in particular for CUDA) is a horrendous pain. This is a super fast way of giving me something to build where I can stick code.
using value_type = double; | ||
int num_elements = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet another example of why aligning equals signs is problematic. Removing a line causes the diff to incorrectly show modifications to multiple lines (Because git can't tell that this is just a whitespace change since the whitespace comes in the middle of the line). Just a nit pick; don't mind me 🙄
|
||
ParallelForFunctor(value_type *data) : _data(data) {} | ||
ParallelForFunctor(value_type *data, const value_type value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
on a parameter type here?
|
||
KOKKOS_INLINE_FUNCTION | ||
void operator()(const int i) const { _data[i] = (i + 1) * value; } | ||
void operator()(const int i) const { _data[i] = (i + 1) * _value; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't you have just changed value
to be constexpr
? Not a big deal; just seems like that was the fix probably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope didn't work. Was the first thing I tried ...
core/unit_test/incremental/Test06_ParallelFor_MDRangePolicy.hpp
Outdated
Show resolved
Hide resolved
- A number of warnings are fixed due to differently signed enums. - Defaulted functions don't seem to work in some cases (ViewMapping) - And some nasty thing with variadic template inheritance - needed to specify template aliases to avoid compiler confusion - Some atomics include changes
WINDOWS CUDA SUpport: fix typo Fix MSVC build again.
…MSVC + fix for DualView alignment
Fix warnings on Windows: largely enums were made int instead of what the type of the assigned value is, so need to be more eplicit. revert a setting of enums. Fix some missing parenthesis and formatting
Move the add of -x cu in the right place. Fix typo in CUDA TPL discovery. Addressing review comments.
I pushed a rebase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from the things I need to change, LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Codecov Report
@@ Coverage Diff @@
## develop #3018 +/- ##
========================================
Coverage 82.5% 82.6%
========================================
Files 122 122
Lines 7954 8093 +139
========================================
+ Hits 6568 6690 +122
- Misses 1386 1403 +17
Continue to review full report at Codecov.
|
These changes allow a CUDA build on Windows to succeed (the tests don't pass yet though).
The source file changes are largely fine I believe. Some warnings fixes, std::min on windows not working (msvc has a macro min) and some variadic template inheritance stuff.
Some of the CMake changes are fine, but there are two questionable changes at least:
Here is my json cmake setup, note some nastiness here where I explicitly say -ccbin, CMAKE_LINKER and CMAKE_AR ...: