Cuda: Fix nvcc warnings #7021

masterleinad · 2024-05-21T14:49:37Z

Fixes #6991. Currently, we only turn warnings into errors. We should incorporate all the fixes for the warnings here as well, though.

dalg24 · 2024-05-22T12:14:56Z

Retest this please

masterleinad · 2024-05-31T19:21:04Z

.jenkins

@@ -345,7 +345,7 @@ pipeline {
                        sh '''rm -rf build && mkdir -p build && cd build && \
                              ../gnu_generate_makefile.bash \
                                --with-options=compiler_warnings \
-                                --cxxflags="-Werror -Werror all-warnings" \
+                                --cxxflags="-Werror -Werror all-warnings -Xcudafe --diag_suppress=20208" \


I needed to move the pragma push into Complex.hpp to avoid seeing the warning. At this point, it seems better to just disable the warning for this build in the compiler flags.

Does all_warnings accompany the second -Werror? I'm really asking for clarification on why -Werror is needed twice.

-Werror is for the host compiler and -Werror all-warnings is for the device compiler.

dalg24 · 2024-05-31T19:45:19Z

core/src/Cuda/Kokkos_Cuda.hpp

@@ -186,7 +186,7 @@ class Cuda {
  ///
  /// This matches the __CUDA_ARCH__ specification.
  KOKKOS_DEPRECATED static size_type device_arch() {
-    const cudaDeviceProp& cudaProp = Cuda().cuda_device_prop();
+    const cudaDeviceProp cudaProp = Cuda().cuda_device_prop();


What warning was raised here?

dangling reference
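
For readers unfamiliar with this diagnostic, here is a minimal sketch of the pattern, assuming a hypothetical `Device` type whose accessor returns a reference to state stored inside the object. The names `Device`, `Props`, and `arch_of_temporary` are illustrative and not part of Kokkos. Binding a reference to the result of a call on a temporary is what compilers flag; whether the reference in `device_arch()` would really dangle depends on where `cuda_device_prop()` keeps its data, which this thread does not show. Taking a copy, as the diff does, sidesteps the question.

```cpp
// Hypothetical stand-in for a class whose accessor returns a reference.
struct Device {
  struct Props {
    int cc_major = 8;
    int cc_minor = 0;
  };

  // Returns a reference to a member of *this* object.
  const Props& props() const { return m_props; }

 private:
  Props m_props;
};

inline int arch_of_temporary() {
  // const Device::Props& p = Device().props();
  //   ^ would dangle in this sketch: the temporary Device (and its m_props)
  //     is destroyed at the end of the full expression.
  const Device::Props p = Device().props();  // copy made while the temporary is alive
  return p.cc_major * 100 + p.cc_minor * 10;
}
```

The copy costs one struct copy per call, which is usually negligible for a query function like this.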

dalg24 · 2024-05-31T19:47:36Z

tpls/gtest/gtest/gtest.h

@@ -4910,7 +4910,7 @@ class NeverThrown {
  class GTEST_TEST_CLASS_NAME_(test_suite_name, test_name)                    \
      : public parent_class {                                                 \
   public:                                                                    \
-    GTEST_TEST_CLASS_NAME_(test_suite_name, test_name)() = default;           \
+    GTEST_TEST_CLASS_NAME_(test_suite_name, test_name)() { (void)test_info_; }\


Could you please reference issues or rejected PRs on googletest that show that there is no interest in resolving this upstream?

google/googletest#4104 (comment) mentions that nvcc is an unsupported compiler (in a slightly different context).

Could it be a problem if we later want to update gtest?

What do you think of pre-installing all Kokkos dependencies in the CI so that we use the flags we want solely on Kokkos? We could then compile gtest with the host compiler?

Could it be a problem if we later want to update gtest?

We will notice if this is still a problem or not.

What do you think of pre-installing all Kokkos dependencies in the CI so that we use the flags we want solely on Kokkos? We could then compile gtest with the host compiler?

This is not about installing gtest but rather using it. We get compiler warnings when we define unit tests.

This is not about installing gtest but rather using it. We get compiler warnings when we define unit tests.

I had in mind that using an installed gtest would treat the gtest include path as a system path, thus removing the warnings?

We could try that (in a separate pull request); it's certainly a bigger change than changing this one line.
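
As a sketch of why the one-line gtest change can silence a warning: naming a member inside a `(void)` cast counts as a reference to it and has no other effect. The class below is a hypothetical stand-in, not gtest code, and the thread does not quote the exact diagnostic, so the link to an unused-member warning is an assumption.

```cpp
// Hypothetical stand-in for the class generated by a test macro.
struct Info {};

class GeneratedTest {
 public:
  // Before: GeneratedTest() = default;  (info_ is then never named in the class)
  // After: the cast names info_ and discards the value.
  GeneratedTest() { (void)info_; }

  int body() const { return 1; }  // placeholder for the test body

 private:
  static const Info* const info_;  // set once, at registration time
};

// Out-of-class definition of the static member.
const Info* const GeneratedTest::info_ = nullptr;
```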

masterleinad · 2024-05-31T21:12:07Z

Retest this please.

Rombur · 2024-06-03T13:49:42Z

Retest this please

masterleinad · 2024-06-03T19:14:19Z

The only errors in Cuda builds are

/var/jenkins/workspace/Kokkos_PR-7021/core/unit_test/TestTeamMDRangePolicyCTAD.cpp(76): error #177-D: function "<unnamed>::TestTeamThreadMDRangeCTAD::test_ctad_inside_parallel_for" was declared but never referenced
/var/jenkins/workspace/Kokkos_PR-7021/core/unit_test/TestTeamMDRangePolicyCTAD.cpp(136): error #177-D: function "<unnamed>::TestTeamVectorMDRangeCTAD::test_ctad_inside_parallel_for" was declared but never referenced
/var/jenkins/workspace/Kokkos_PR-7021/core/unit_test/TestTeamMDRangePolicyCTAD.cpp(196): error #177-D: function "<unnamed>::TestThreadVectorMDRangeCTAD::test_ctad_inside_parallel_for" was declared but never referenced

fixed by #7049.
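
For context, error #177-D in the log above reports a function in an unnamed namespace that is never referenced in the translation unit. A minimal sketch of that situation and one standard way to mark an entity as intentionally unused, the C++17 `[[maybe_unused]]` attribute, follows. The function and macro names are hypothetical, and this is not necessarily the change made in #7049.

```cpp
namespace {

// Called only in some configurations. Without the attribute, a build in which
// no caller is compiled could report this function as declared but never
// referenced.
[[maybe_unused]] int scale_by_two(int x) { return 2 * x; }

}  // namespace

// EXAMPLE_ENABLE_CALLER is a hypothetical configuration switch.
#ifdef EXAMPLE_ENABLE_CALLER
int use_helper() { return scale_by_two(21); }
#endif
```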

masterleinad · 2024-06-04T12:54:59Z

Rebased after #7049 was merged.

dalg24 · 2024-06-04T13:28:01Z

core/unit_test/TestNumericTraits.hpp

+// Suppress "'long double' is treated as 'double' in device code"
+#ifdef KOKKOS_COMPILER_NVCC
+#ifdef __NVCC_DIAG_PRAGMA_SUPPORT__
+#pragma nv_diagnostic push
+#pragma nv_diag_suppress 20208
+#else
+#ifdef __CUDA_ARCH__
+#pragma diagnostic push
+#pragma diag_suppress 20208
+#endif
+#endif
+#endif


I feel that is not the right approach for that test.
We already have all sorts of guards for long double in these. I feel like we should consider reworking them as we did for the math functions. (I don't remember the details, but I see that it is not part of this PR, meaning we somehow resolved it without suppressing the diagnostic altogether.)

We could just disable all tests for long double when we compile with Cuda support (which is what we are doing for other configurations) but these seem to work fine after suppressing the warning.

Are the long double tests on CUDA already effectively switched off, in that they are run as double in CUDA device code?

Yes, everything that uses TestNumericTraits at least, see

kokkos/core/unit_test/TestNumericTraits.hpp

Lines 182 to 204 in 63f0520

 #if (defined(KOKKOS_COMPILER_NVCC) && defined(KOKKOS_ENABLE_CUDA)) || \
     defined(KOKKOS_ENABLE_SYCL) || defined(KOKKOS_ENABLE_OPENMPTARGET)
 template <class Tag>
 struct TestNumericTraits<
 #if defined(KOKKOS_ENABLE_CUDA)
     Kokkos::Cuda,
 #elif defined(KOKKOS_ENABLE_SYCL)
     Kokkos::Experimental::SYCL,
 #else
     Kokkos::Experimental::OpenMPTarget,
 #endif
     long double, Tag> {
   template <class T>
   using trait = typename Tag::template trait<T>;
   TestNumericTraits() {
     (void)take_address_of(trait<long double>::value);
     // Do nothing on the device.
     // According to the doc
     // https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#constexpr-variables
     // the traits member constant value cannot be directly used in device code.
   }
 };
 #endif

.
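
The diff in this thread shows only the push-and-suppress half of the pragma block. A minimal sketch of the full pattern follows, assuming a matching pop is placed after the code that needs the suppression. The pop placement, the use of `__NVCC__` in place of `KOKKOS_COMPILER_NVCC` (to keep the sketch free of Kokkos headers), and the helper `long_double_digits` are assumptions, not taken from the PR. On a host-only compiler the guards remove every pragma.

```cpp
#include <limits>

// Suppress "'long double' is treated as 'double' in device code" (20208).
#ifdef __NVCC__
#ifdef __NVCC_DIAG_PRAGMA_SUPPORT__
#pragma nv_diagnostic push
#pragma nv_diag_suppress 20208
#else
#ifdef __CUDA_ARCH__
#pragma diagnostic push
#pragma diag_suppress 20208
#endif
#endif
#endif

// Code that mentions long double goes between the push and the pop.
inline int long_double_digits() {
  return std::numeric_limits<long double>::digits;
}

// Assumed counterpart: restore the previous diagnostic state.
#ifdef __NVCC__
#ifdef __NVCC_DIAG_PRAGMA_SUPPORT__
#pragma nv_diagnostic pop
#else
#ifdef __CUDA_ARCH__
#pragma diagnostic pop
#endif
#endif
#endif
```

Scoping the suppression this way keeps the diagnostic active for the rest of the translation unit, in contrast to the `-Xcudafe --diag_suppress=20208` flag, which applies to the whole build.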

masterleinad · 2024-06-04T15:15:48Z

Only 'HIP-ROCm-5.6-C++20' is failing.

aprokop

Looks fine to me.

ajpowelsnl

These fixes look fine, however, questions remain about more (potentially) complicated fixes for warnings associated with methods in TestStdAlgorithmsCommon.hpp that appear in CUDA-11.2.x compilations -- do we want to fix warnings that will likely disappear with CUDA updates?

ajpowelsnl · 2024-06-11T17:54:42Z

.jenkins

@@ -345,7 +345,7 @@ pipeline {
                        sh '''rm -rf build && mkdir -p build && cd build && \
                              ../gnu_generate_makefile.bash \
                                --with-options=compiler_warnings \
-                                --cxxflags="-Werror -Werror all-warnings" \
+                                --cxxflags="-Werror -Werror all-warnings -Xcudafe --diag_suppress=20208" \


Does all_warnings accompany the second -Werror? I'm really asking for clarification on why -Werror is needed twice.

ajpowelsnl · 2024-06-11T18:36:54Z

core/unit_test/TestNumericTraits.hpp

+// Suppress "'long double' is treated as 'double' in device code"
+#ifdef KOKKOS_COMPILER_NVCC
+#ifdef __NVCC_DIAG_PRAGMA_SUPPORT__
+#pragma nv_diagnostic push
+#pragma nv_diag_suppress 20208
+#else
+#ifdef __CUDA_ARCH__
+#pragma diagnostic push
+#pragma diag_suppress 20208
+#endif
+#endif
+#endif


Are the long double tests on CUDA already effectively switched off, in that they are run as double in CUDA device code?

masterleinad · 2024-06-11T18:49:55Z

These fixes look fine, however, questions remain about more (potentially) complicated fixes for warnings associated with methods in TestStdAlgorithmsCommon.hpp that appear in CUDA-11.2.x compilations -- do we want to fix warnings that will likely disappear with CUDA updates?

The changes here are sufficient for the configurations in the CI. We can discuss other configurations in the original issue or a follow-up. nvcc 11.0 in particular will raise quite a bit more warnings and it's not clear to me if that's worth putting a lot of effort in (which might make code less readable).

ajpowelsnl · 2024-06-11T22:48:56Z

These fixes look fine, however, questions remain about more (potentially) complicated fixes for warnings associated with methods in TestStdAlgorithmsCommon.hpp that appear in CUDA-11.2.x compilations -- do we want to fix warnings that will likely disappear with CUDA updates?

The changes here are sufficient for the configurations in the CI. We can discuss other configurations in the original issue or a follow-up. nvcc 11.0 in particular will raise quite a bit more warnings and it's not clear to me if that's worth putting a lot of effort in (which might make code less readable).

OK, sounds good. I'll see if there are any more "low hanging fruit" fixes to do in the issue.

ndellingwood · 2024-06-12T18:09:59Z

Nice fixes :)
I added a changelog entry for 4.4 under General Enhancements, but please move elsewhere if deemed more appropriate (Bug Fixes?)

dalg24 added the Bug, Failure - Nightly, Backend - CUDA, and Continuous Integration labels May 22, 2024

masterleinad commented May 22, 2024

dalg24 reviewed May 22, 2024

masterleinad force-pushed the use_werror_for_cuda branch 3 times, most recently from 47b41bd to 9c09b07 Compare May 29, 2024 15:09

masterleinad commented May 31, 2024

dalg24 reviewed May 31, 2024

crtrott approved these changes Jun 1, 2024

masterleinad force-pushed the use_werror_for_cuda branch from 8e60eb6 to bc4a6dd Compare June 1, 2024 12:06

masterleinad mentioned this pull request Jun 3, 2024

Nightly test failure, intel/19.0.5 icpc: TestTeamMDRangePolicyCTAD.cpp error #177: function "<unnamed>::TestTeamThreadMDRangeCTAD::test_ctad_inside_parallel_for" #7049

Closed

masterleinad marked this pull request as ready for review June 3, 2024 19:14

masterleinad added 11 commits June 4, 2024 08:50

Cuda: Fix nvcc warnings

e899550

Fix quotation marks in CXX flags

726a8f2

Fix kokkos_swap

1876867

Fix array size

d0d99bd

Fix gtest

13447c4

Fix .jenkins whitespce

5906cba

Only use -Werror all-warnings with explicit nvcc_wrapper

9d1842e

Fix dangling reference

0e88744

Suppress 'long double' is treated as 'double' in device code

2a15c75

Use -Xcudafe --diag_suppress=20208 for 11.6 build; nothing else seems to help

5b0d945

Try moving pragma suppress to tests

1625ec2

Use -Xcudafe --diag_suppress=20208 in Makefile build

2c3fd02

masterleinad force-pushed the use_werror_for_cuda branch from bc4a6dd to 2c3fd02 Compare June 4, 2024 12:50

dalg24 reviewed Jun 4, 2024

aprokop mentioned this pull request Jun 6, 2024

nvcc warnings are not caught in testing arborx/ArborX#272

Open

aprokop approved these changes Jun 7, 2024

ajpowelsnl reviewed Jun 11, 2024

crtrott merged commit 660136f into kokkos:develop Jun 12, 2024
28 of 29 checks passed

ndellingwood mentioned this pull request Jun 12, 2024

CHANGELOG for 4.4 #6914

Open



