Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix atomic operations bug for Min and Max #6435

Merged
merged 7 commits into from
Oct 6, 2023

Conversation

crtrott
Copy link
Member

@crtrott crtrott commented Sep 12, 2023

This will fix atomic operations reported on slack, where atomic_min/max would give the wrong answer for a mix of negative/positive values. That was tracked down to a bug in the PTX for the atomic operations inside Desul, and not caught on our side due to test deficiencies.

Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about tpls/desul/include/desul/atomics/cuda/cuda_cc7_asm_atomic_op.inc_{forceglobal,generic}?

Also please confirm that we don't need to update the atomic_fetch_op headers

Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a corresponding desul pull request.

@crtrott
Copy link
Member Author

crtrott commented Sep 14, 2023

desul/desul#109

@crtrott
Copy link
Member Author

crtrott commented Sep 14, 2023

Updated atomic_fetch ops too.

@masterleinad masterleinad dismissed their stale review September 21, 2023 12:08

The desul pull reqiuest has been merged.

@masterleinad
Copy link
Contributor

masterleinad commented Sep 21, 2023

There are still CI failures in the 32-bit build:

 RUN      ] openmp.atomic_operations_complexdouble
atomic_div failed with type N6Kokkos7complexIdEE
atomic_fetch_div failed with type N6Kokkos7complexIdEE
atomic_div_fetch failed with type N6Kokkos7complexIdEE
atomic_div_fetch did not return updated value with type N6Kokkos7complexIdEE
/__w/kokkos/kokkos/core/unit_test/TestAtomicOperations_complexdouble.hpp:40: Failure
Value of: (update != 0 ? atomic_op_test<DivAtomicTest, T, Kokkos::OpenMP>( old_val, update) : true)
  Actual: false
Expected: true
[  FAILED  ] openmp.atomic_operations_complexdouble (0 ms)
[ RUN      ] openmp.atomic_operations_complexfloat
atomic_div failed with type N6Kokkos7complexIfEE
atomic_fetch_div failed with type N6Kokkos7complexIfEE
atomic_div_fetch failed with type N6Kokkos7complexIfEE
atomic_div_fetch did not return updated value with type N6Kokkos7complexIfEE
/__w/kokkos/kokkos/core/unit_test/TestAtomicOperations_complexfloat.hpp:35: Failure
Value of: (update != 0 ? atomic_op_test<DivAtomicTest, T, Kokkos::OpenMP>( old_val, update) : true)
  Actual: false
Expected: true
[  FAILED  ] openmp.atomic_operations_complexfloat (0 ms)
[ RUN      ] openmp.atomic_operations_double
atomic_div failed with type d
atomic_fetch_div failed with type d
atomic_div_fetch failed with type d
atomic_div_fetch did not return updated value with type d
/__w/kokkos/kokkos/core/unit_test/TestAtomicOperations_double.hpp:25: Failure
Value of: (TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::OpenMP>(i, end - i + start, t))
  Actual: false
Expected: true
[  FAILED  ] openmp.atomic_operations_double (0 ms)
[ RUN      ] openmp.atomic_operations_float
atomic_div failed with type d
atomic_fetch_div failed with type d
atomic_div_fetch failed with type d
atomic_div_fetch did not return updated value with type d
/__w/kokkos/kokkos/core/unit_test/TestAtomicOperations_float.hpp:25: Failure
Value of: (TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::OpenMP>(i, end - i + start, t))
  Actual: false
Expected: true
[  FAILED  ] openmp.atomic_operations_float (0 ms)

We could ignore that failure by disabling the test for 32bit, adding a FIXME, and move on, of course. There are other tests failing for Kokkos::complex<double> in the 32-bit build.

Comment on lines 41 to 42
// disable division test for 32bit where we have accuracy issues with
// division atomics still compile it though
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// disable division test for 32bit where we have accuracy issues with
// division atomics still compile it though
// FIXME_32BIT disable division test for 32bit where we have accuracy issues with
// division atomics, still compile it though

old_val, update)
: true));

// disable division test for 32bit where we have accuracy issues with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// disable division test for 32bit where we have accuracy issues with
// FIXME_32BIT disable division test for 32bit where we have accuracy issues with

@@ -22,8 +22,11 @@ TEST(TEST_CATEGORY, atomic_operations_double) {
const int end = 11;
for (int i = start; i < end; ++i) {
for (int t = 0; t < 8; t++)
ASSERT_TRUE((TestAtomicOperations::AtomicOperationsTestNonIntegralType<
double, TEST_EXECSPACE>(i, end - i + start, t)));
// disable division test for 32bit where we have accuracy issues with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// disable division test for 32bit where we have accuracy issues with
// FIXME_32BIT disable division test for 32bit where we have accuracy issues with

@@ -22,8 +22,11 @@ TEST(TEST_CATEGORY, atomic_operations_float) {
const int end = 11;
for (int i = start; i < end; ++i) {
for (int t = 0; t < 8; t++)
ASSERT_TRUE((TestAtomicOperations::AtomicOperationsTestNonIntegralType<
double, TEST_EXECSPACE>(i, end - i + start, t)));
// disable division test for 32bit where we have accuracy issues with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// disable division test for 32bit where we have accuracy issues with
// FIXME_32BIT disable division test for 32bit where we have accuracy issues with

@masterleinad
Copy link
Contributor

Fine with me if you add some FIXME_32BIT comments.


// disable division test for 32bit where we have accuracy issues with
// division atomics still compile it though
if (sizeof(void*) == 8)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kokkos/core/unit_test/TestAtomicOperations_complexdouble.hpp:43:8: error: suggest explicit braces to avoid ambiguous 'else' [-Werror=dangling-else]

     if (sizeof(void*) == 8)

        ^

@crtrott crtrott added the Blocks Promotion Overview issue for release-blocking bugs label Oct 4, 2023
Copy link
Contributor

@ldh4 ldh4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly looked at changes in TestAtomicOperations files. Those look fine to me.

@@ -205,4 +222,4 @@ __DESUL_IMPL_CUDA_ASM_ATOMIC_FETCH_BIN_OP()
#undef __DESUL_IMPL_CUDA_ASM_ATOMIC_FETCH_INC
#undef __DESUL_IMPL_CUDA_ASM_ATOMIC_FETCH_DEC
#undef __DESUL_IMPL_CUDA_ASM_ATOMIC_FETCH_AND

#undef __DESUL_IMPL_CUDA_ASM_ATOMIC_FETCH_BIN_OP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this added? Was it just missing before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah

@crtrott crtrott merged commit 8181d70 into kokkos:develop Oct 6, 2023
27 of 28 checks passed
@crtrott crtrott deleted the fix-atomics branch October 6, 2023 14:44
@masterleinad masterleinad mentioned this pull request Oct 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocks Promotion Overview issue for release-blocking bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants