Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cuda: Fix nvcc warnings #7021

Merged
merged 12 commits into from
Jun 12, 2024
Merged

Conversation

masterleinad
Copy link
Contributor

Fixes #6991. Currently, we only turn warnings into errors. We should incorporate all the fixes for the warnings here as well, though.

@dalg24
Copy link
Member

dalg24 commented May 22, 2024

Retest this please

@dalg24 dalg24 added Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) Failure - Nightly Nightly Build Failure Backend - CUDA Continuous Integration labels May 22, 2024
.jenkins Outdated Show resolved Hide resolved
.jenkins Outdated Show resolved Hide resolved
.jenkins Outdated Show resolved Hide resolved
.jenkins Outdated Show resolved Hide resolved
@masterleinad masterleinad force-pushed the use_werror_for_cuda branch 3 times, most recently from 47b41bd to 9c09b07 Compare May 29, 2024 15:09
@@ -345,7 +345,7 @@ pipeline {
sh '''rm -rf build && mkdir -p build && cd build && \
../gnu_generate_makefile.bash \
--with-options=compiler_warnings \
--cxxflags="-Werror -Werror all-warnings" \
--cxxflags="-Werror -Werror all-warnings -Xcudafe --diag_suppress=20208" \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to move pragma push into the Complex.hpp to avoid seeing the warning. At this point, it seems to be better to just disable the warning for this build in the compiler flags.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does all_warnings accompany the second -Werror? I'm really asking for clarification on why -Werror is needed twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-Werror is for the host compiler and -Werror all-warnings is for the device compiler.

@@ -186,7 +186,7 @@ class Cuda {
///
/// This matches the __CUDA_ARCH__ specification.
KOKKOS_DEPRECATED static size_type device_arch() {
const cudaDeviceProp& cudaProp = Cuda().cuda_device_prop();
const cudaDeviceProp cudaProp = Cuda().cuda_device_prop();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What warning was raised here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dangling reference

@@ -4910,7 +4910,7 @@ class NeverThrown {
class GTEST_TEST_CLASS_NAME_(test_suite_name, test_name) \
: public parent_class { \
public: \
GTEST_TEST_CLASS_NAME_(test_suite_name, test_name)() = default; \
GTEST_TEST_CLASS_NAME_(test_suite_name, test_name)() { (void)test_info_; }\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please reference issues or rejected prs on googletest that show that there is no interest on resolving this upstream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

google/googletest#4104 (comment) mentions that nvcc is an unsupported compiler (in a slightly different context).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be a problem if we later want to update gtest ?

What do you think of pre-installing all Kokkos dependencies in the CI so that we use the flags we want solely on Kokkos ? We could then compile gtest with the host compiler ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be a problem if we later want to update gtest ?

We will notice if this is still a problem or not.

What do you think of pre-installing all Kokkos dependencies in the CI so that we use the flags we want solely on Kokkos ? We could then compile gtest with the host compiler ?

This is not about installing gtest but rather using it. We compiler warnings when we define unit tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not about installing gtest but rather using it. We compiler warnings when we define unit tests.

I had in mind that using an installed gtest would treat gtest include header path as system path thus removing the warnings ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could try that (in a separate pull request); it's certainly a bigger change than changing this one line.

@masterleinad
Copy link
Contributor Author

Retest this please.

@Rombur
Copy link
Member

Rombur commented Jun 3, 2024

Retest this please

@masterleinad
Copy link
Contributor Author

The only errors in Cuda builds are

var/jenkins/workspace/Kokkos_PR-7021/core/unit_test/TestTeamMDRangePolicyCTAD.cpp(76): error #177-D: function "<unnamed>::TestTeamThreadMDRangeCTAD::test_ctad_inside_parallel_for" was declared but never referenced
/var/jenkins/workspace/Kokkos_PR-7021/core/unit_test/TestTeamMDRangePolicyCTAD.cpp(136): error #177-D: function "<unnamed>::TestTeamVectorMDRangeCTAD::test_ctad_inside_parallel_for" was declared but never referenced
/var/jenkins/workspace/Kokkos_PR-7021/core/unit_test/TestTeamMDRangePolicyCTAD.cpp(196): error #177-D: function "<unnamed>::TestThreadVectorMDRangeCTAD::test_ctad_inside_parallel_for" was declared but never referenced

fixed by #7049.

@masterleinad masterleinad marked this pull request as ready for review June 3, 2024 19:14
@masterleinad
Copy link
Contributor Author

Rebased after #7049 was merged.

Comment on lines +24 to +35
// Suppress "'long double' is treated as 'double' in device code"
#ifdef KOKKOS_COMPILER_NVCC
#ifdef __NVCC_DIAG_PRAGMA_SUPPORT__
#pragma nv_diagnostic push
#pragma nv_diag_suppress 20208
#else
#ifdef __CUDA_ARCH__
#pragma diagnostic push
#pragma diag_suppress 20208
#endif
#endif
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that is not the right approach for that test.
We already have all sorts of guards for long double in these, I feel like we should consider reworking them like we did for the math function (I don;t remember the details but I see that it is not part of this PR meaning we somehow resolved it without suppressing the diagnostic altogether)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just disable all tests for long double when we compile with Cuda support (which is what we are doing for other configurations) but these seem to work fine after suppressing the warning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the long double tests on CUDA already effectively switched off, in that they are run as double in CUDA device code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, everything that uses TestNumericTraits at least, see

#if (defined(KOKKOS_COMPILER_NVCC) && defined(KOKKOS_ENABLE_CUDA)) || \
defined(KOKKOS_ENABLE_SYCL) || defined(KOKKOS_ENABLE_OPENMPTARGET)
template <class Tag>
struct TestNumericTraits<
#if defined(KOKKOS_ENABLE_CUDA)
Kokkos::Cuda,
#elif defined(KOKKOS_ENABLE_SYCL)
Kokkos::Experimental::SYCL,
#else
Kokkos::Experimental::OpenMPTarget,
#endif
long double, Tag> {
template <class T>
using trait = typename Tag::template trait<T>;
TestNumericTraits() {
(void)take_address_of(trait<long double>::value);
// Do nothing on the device.
// According to the doc
// https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#constexpr-variables
// the traits member constant value cannot be directly used in device code.
}
};
#endif
.

@masterleinad
Copy link
Contributor Author

Only 'HIP-ROCm-5.6-C++20' is failing.

Copy link
Collaborator

@aprokop aprokop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me.

Copy link
Contributor

@ajpowelsnl ajpowelsnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fixes look fine, however, questions remain about more (potentially) complicated fixes for warnings associated with methods in TestStdAlgorithmsCommon.hpp that appear in CUDA-11.2.x compilations -- do we want to fix warnings that will likely disappear with CUDA updates?

@@ -345,7 +345,7 @@ pipeline {
sh '''rm -rf build && mkdir -p build && cd build && \
../gnu_generate_makefile.bash \
--with-options=compiler_warnings \
--cxxflags="-Werror -Werror all-warnings" \
--cxxflags="-Werror -Werror all-warnings -Xcudafe --diag_suppress=20208" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does all_warnings accompany the second -Werror? I'm really asking for clarification on why -Werror is needed twice.

Comment on lines +24 to +35
// Suppress "'long double' is treated as 'double' in device code"
#ifdef KOKKOS_COMPILER_NVCC
#ifdef __NVCC_DIAG_PRAGMA_SUPPORT__
#pragma nv_diagnostic push
#pragma nv_diag_suppress 20208
#else
#ifdef __CUDA_ARCH__
#pragma diagnostic push
#pragma diag_suppress 20208
#endif
#endif
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the long double tests on CUDA already effectively switched off, in that they are run as double in CUDA device code?

@masterleinad
Copy link
Contributor Author

These fixes look fine, however, questions remain about more (potentially) complicated fixes for warnings associated with methods in TestStdAlgorithmsCommon.hpp that appear in CUDA-11.2.x compilations -- do we want to fix warnings that will likely disappear with CUDA updates?

The changes here are sufficient for the configurations in the CI. We can discuss other configurations in the original issue or a follow-up. nvcc 11.0 in particular will raise quite a bit more warnings and it's not clear to me if that's worth putting a lot of effort in (which might make code less readable).

@ajpowelsnl
Copy link
Contributor

These fixes look fine, however, questions remain about more (potentially) complicated fixes for warnings associated with methods in TestStdAlgorithmsCommon.hpp that appear in CUDA-11.2.x compilations -- do we want to fix warnings that will likely disappear with CUDA updates?

The changes here are sufficient for the configurations in the CI. We can discuss other configurations in the original issue or a follow-up. nvcc 11.0 in particular will raise quite a bit more warnings and it's not clear to me if that's worth putting a lot of effort in (which might make code less readable).

OK, sounds good. I'll see if there are any more "low hanging fruit" fixes to do in the issue.

@crtrott crtrott merged commit 660136f into kokkos:develop Jun 12, 2024
28 of 29 checks passed
@ndellingwood
Copy link
Contributor

Nice fixes :)
I added a changelog entry for 4.4 under General Enhancements, but please move elsewhere if deemed more appropriate (Bug Fixes?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Backend - CUDA Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) Continuous Integration Failure - Nightly Nightly Build Failure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Several warnings in cuda builds
8 participants