-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract and use get_gpu() (kokkos/kokkos#3040, trilinos/Trilinos#6840) #3048
Conversation
This fixes the tests KokkosCore_UnitTest_DefaultInit_<x>_MPI_1 that fail when running with ctest GPU allocation feature that fail becuase they don't run on GPU device 0 (see kokkos#3040). Note: Since get_ctest_gpu() returns 0 if CTest has not provided anything, it is safe to always call it. This new function get_gpu() should really be unit tested on its own.
Can one of the admins verify this patch? |
Codecov Report
@@ Coverage Diff @@
## master #3048 +/- ##
========================================
- Coverage 80.1% 79.9% -0.3%
========================================
Files 121 121
Lines 7764 7785 +21
========================================
+ Hits 6220 6221 +1
- Misses 1544 1564 +20
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this can be fixed in a simpler way.
Also, it would be good to target the develop
branch first.
@@ -273,7 +281,7 @@ void check_correct_initialization(const Kokkos::InitArguments& argstruct) { | |||
|
|||
int expected_device = argstruct.device_id; | |||
if (argstruct.device_id < 0) { | |||
expected_device = 0; | |||
expected_device = Kokkos::Impl::get_gpu(argstruct); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just do
expected_device = Kokkos::Impl::get_gpu(argstruct); | |
expected_device = Kokkos::Cuda().cuda_device(); |
instead? I think we can ask the execution space for the device id it was initialized with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will give that a try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switching to:
expected_device = Kokkos::Cuda().cuda_device();
seems to work as well. The updated commit is 0e51c8d
Even though not needed, I would argue that you should keep the refactoring to pull out the Kokkos::Impl::get_gpu()
. That function could be unit tested in isolation much better and make this more solid and robust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function would only be used in this context to initialize the device (id) properly and this id is available via Kokkos::Cuda().cuda_device()
. Hence, I don't see a reason to test the former rather than the latter.
I don't feel strongly about pulling it out, though. This looks good enough for me.
The above coverage report does not make sense. How can the refactoring I did decrease code coverage? And why is it comparing to 'master' instead of 'develop' (the target if this PR)? |
…s#3040, trilinos/Trilinos#6840) This seems to also fix the KokkosCore_UnitTest_DefaultInit_<x>_MPI_1 tests that where failing when running with ctest GPU allocation feature that fail because they don't run on GPU device 0 (see kokkos#3040).
OK to test |
#endif | ||
#if defined(KOKKOS_ENABLE_CUDA) || defined(KOKKOS_ENABLE_ROCM) || \ | ||
defined(KOKKOS_ENABLE_HIP) | ||
int get_gpu(const InitArguments& args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you define it outside the anonymous namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just followed the example of get_ctest_gpu() that was also in that file:
kokkos/core/src/impl/Kokkos_Core.cpp
Line 80 in 0e51c8d
int get_ctest_gpu(const char* local_rank_str) {
Also, implementation functions like this should be independently unit tested. Having this in Kokkos::Impl instead of an anonymous namespace would allow that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right that's why get_ctest_gpu
is outside and we test it in core/unit_test/TestCTestDevice.cpp
using a forward declaration.
But get_gpu
does not have dedicated tests and can (and IMO should) be within that anonymous namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is fine to move git_gpu() into an anonymous namespace for now. When someone gets around to unit testing it, it can be moved back to the Kokkos::Impl namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not test this last change. I guess we will test the Kokkos PR testing :-)
@dalg24, does this trigger the Jenkins tester? |
Yes |
OK to test |
This fixes the tests KokkosCore_UnitTest_DefaultInit__MPI_1 that fail when
running with ctest GPU allocation feature that fail becuase they don't run on
GPU device 0 (see #3040).
Note: Since get_ctest_gpu() returns 0 if CTest has not provided anything, it
is safe to always call it.
This new function get_gpu() should really be unit tested on its own.