Skip to content

Commit

Permalink
OpenACC: add atomics support (kokkos#6446)
Browse files Browse the repository at this point in the history
* Initial OpenACC atomic construct implementation.

* Partially fixed bugs in the OpenACC atomic implementations.

* First working version, where general atomic implementations work only on a sequential host

* Update Unit Test CMake

* Update as suggested by code review:
- Remove const_cast()
- Change Kokkos::abort() with printf()
- Add FIXME_OPENACC comment.

* Disable unsupporte OpenACC atomic tests.
(OpenACC C/C++ does not support atomic max/min/mod operations)
Disable TestOpenACC_BitManipulationBuiltins for OpenACC due to errors.

* Apply ClangFormat

* Disable unsupported unit tests when by old NVHPC compilers (V22.5 or older).

* Apply ClangFormat

* Update tpls/desul/include/desul/atomics/Fetch_Op_OpenACC.hpp

Co-authored-by: Damien L-G <dalg24+github@gmail.com>

* Restore unit tests that were disabled for old NVHPC compilers (V22.5 or older)

* Update unit test CMakeLists.txt to include unit tests enabled by this
PR.

* Change the minimum version of the NVHPC compiler from 22.3 to 22.9 for the OpenACC backend.

* Changed the way to guard unsupported atomic tests for the OpenACC
backend.

* Remove unnecessary guarding on unsupported atomic tests for the OpenACC
backend.

* Minor updates according to the code review.

* Changed the supported-type-checking code from macro to alias template as
suggested by the code review.

* Undo changing the minimim required version of NVHPC.

* Apply suggestions from code review

Co-authored-by: Damien L-G <dalg24+github@gmail.com>

* Change the KOKKOS_COMPILER_NVHPC macro to __NVCOMPILER

* Rename a variable's name from `tmp` to `old in atomic_fetch_*()
functions.

* Change `ptr[0]` to `*ptr` as suggested by the code review.

* Add FIXME comments in `device_atomic_thread_fence()`.

* Moved definitions into the desul::Impl namespace as suggested by the
code review.

* Clean up the OpenACC atomic implementations.
Re-enable atomic max/min tests for OpenACC.

* Fix a typo (sid => std)

* Minor bug fix on OpenACC

* Update .jenkins to the latest.

* Test this please

* Try again

* Fix typo

* Deal with conflicts

* Disable complex float atomic unit tests for OpenACC backend

* Sync with PR opened on the desul side

* DO NOT MERGE disable all CI but OpenACC

* - Disable atomic-fetch-shift tests for NVHPC OpenACC compiler, which fail
due to compiler bugs, which are reported to NVIDIA.
- Change the values of start and end variables in
TestAtomicOperations_double.hpp and TestAtomicOperations_float.hpp so
that atomic-division tests calculate trivial divisions. (In the original
tests, NVHPC compiler failed since device results are slightly different
from the host results due to precision mismatch.)

* Add atomic_op_test)rel() to TestAtomicOperations.hpp, which compares the
host and device atomic operations using a relative error.

* Revert "DO NOT MERGE disable all CI but OpenACC"

This reverts commit 18132bf.

* [desul_atomics] Fixup Kokkos -> DESUL in error message with OpenACC

---------

Co-authored-by: Seyong Lee <lees2@ornl.gov>
  • Loading branch information
dalg24 and seyonglee committed Nov 9, 2023
1 parent fb73a73 commit 97a90d5
Show file tree
Hide file tree
Showing 17 changed files with 855 additions and 20 deletions.
6 changes: 6 additions & 0 deletions Makefile.kokkos
Original file line number Diff line number Diff line change
Expand Up @@ -1440,6 +1440,12 @@ ifeq ($(KOKKOS_INTERNAL_USE_OPENMPTARGET), 1)
else
tmp := $(call desul_append_header,"/* $H""undef DESUL_ATOMICS_ENABLE_OPENMP */")
endif

ifeq ($(KOKKOS_INTERNAL_USE_OPENACC), 1)
tmp := $(call desul_append_header,"$H""define DESUL_ATOMICS_ENABLE_OPENACC")
else
tmp := $(call desul_append_header,"/* $H""undef DESUL_ATOMICS_ENABLE_OPENACC */")
endif
tmp := $(call desul_append_header, "")
tmp := $(call desul_append_header, "$H""endif")

Expand Down
3 changes: 3 additions & 0 deletions core/src/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,9 @@ IF (NOT desul_FOUND)
IF(KOKKOS_ENABLE_OPENMPTARGET)
SET(DESUL_ATOMICS_ENABLE_OPENMP ON) # not a typo Kokkos OpenMPTarget -> Desul OpenMP
ENDIF()
IF(KOKKOS_ENABLE_OPENACC)
SET(DESUL_ATOMICS_ENABLE_OPENACC ON)
ENDIF()
CONFIGURE_FILE(
${CMAKE_CURRENT_SOURCE_DIR}/../../tpls/desul/Config.hpp.cmake.in
${CMAKE_CURRENT_BINARY_DIR}/desul/atomics/Config.hpp
Expand Down
2 changes: 1 addition & 1 deletion core/src/Kokkos_Macros.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -87,7 +87,7 @@
#if !defined(KOKKOS_ENABLE_THREADS) && !defined(KOKKOS_ENABLE_CUDA) && \
!defined(KOKKOS_ENABLE_OPENMP) && !defined(KOKKOS_ENABLE_HPX) && \
!defined(KOKKOS_ENABLE_OPENMPTARGET) && !defined(KOKKOS_ENABLE_HIP) && \
!defined(KOKKOS_ENABLE_SYCL)
!defined(KOKKOS_ENABLE_SYCL) && !defined(KOKKOS_ENABLE_OPENACC)
#define KOKKOS_INTERNAL_NOT_PARALLEL
#endif

Expand Down
19 changes: 1 addition & 18 deletions core/unit_test/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@ SET(KOKKOS_THREADS_NAME Threads)
IF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
SET(KOKKOS_OPENACC_FEATURE_LEVEL 9)
ELSE()
SET(KOKKOS_OPENACC_FEATURE_LEVEL 16)
SET(KOKKOS_OPENACC_FEATURE_LEVEL 17)
ENDIF()

SET(KOKKOS_OPENACC_NAME Experimental::OpenACC)
Expand Down Expand Up @@ -524,17 +524,7 @@ IF(KOKKOS_ENABLE_OPENACC AND KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC)
list(REMOVE_ITEM OpenACC_SOURCES
${CMAKE_CURRENT_SOURCE_DIR}/default/TestDefaultDeviceType_a1.cpp
${CMAKE_CURRENT_SOURCE_DIR}/default/TestDefaultDeviceType_b1.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_double.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_float.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_int.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_longint.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_longlongint.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_shared.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_unsignedint.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_unsignedlongint.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_Atomics.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicViews.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_BlockSizeDeduction.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_DeepCopyAlignment.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_HostSharedPtr.cpp
Expand All @@ -551,17 +541,10 @@ IF(KOKKOS_ENABLE_OPENACC AND KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC)
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_Reducers_d.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_Reductions.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_Reductions_DeviceView.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_SubView_b.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_SubView_c02.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_SubView_c03.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_SubView_c05.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_SubView_c08.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_SubView_c11.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_TeamBasic.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_TeamScratch.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_TeamTeamSize.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_UniqueToken.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_ViewMapping_b.cpp
${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_ViewResize.cpp
)
endif()
Expand Down
74 changes: 74 additions & 0 deletions core/unit_test/TestAtomicOperations.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -368,6 +368,63 @@ bool atomic_op_test(T old_val, T update) {
return result == 0;
}

template <class T>
constexpr T relative_error_threshold = T(1.0e-15);

template <class Op, class T, class ExecSpace>
bool atomic_op_test_rel(T old_val, T update) {
Kokkos::View<T[3], ExecSpace> op_data("op_data");
Kokkos::deep_copy(op_data, old_val);
int result = 0;
Kokkos::parallel_reduce(
Kokkos::RangePolicy<ExecSpace>(0, 1),
KOKKOS_LAMBDA(int, int& local_result) {
auto fetch_result =
Op::atomic_op(&op_data(0), &op_data(1), &op_data(2), update);
T expected_val = Op::op(old_val, update);
Kokkos::memory_fence();
if (expected_val == T(0)) {
if (fabs(op_data(0)) > relative_error_threshold<T>) local_result += 1;
if (fabs(op_data(1)) > relative_error_threshold<T>) local_result += 2;
if (fabs(op_data(2)) > relative_error_threshold<T>) local_result += 4;
if (fetch_result.first != old_val) local_result += 8;
if (fabs(fetch_result.second) > relative_error_threshold<T>)
local_result += 16;
} else {
if (fabs((op_data(0) - expected_val) / expected_val) >
relative_error_threshold<T>)
local_result += 1;
if (fabs((op_data(1) - expected_val) / expected_val) >
relative_error_threshold<T>)
local_result += 2;
if (fabs((op_data(2) - expected_val) / expected_val) >
relative_error_threshold<T>)
local_result += 4;
if (fetch_result.first != old_val) local_result += 8;
if (fabs((fetch_result.second - expected_val) / expected_val) >
relative_error_threshold<T>)
local_result += 16;
}
},
result);
if ((result & 1) != 0)
printf("atomic_%s failed with type %s\n", Op::name(), typeid(T).name());
if ((result & 2) != 0)
printf("atomic_fetch_%s failed with type %s\n", Op::name(),
typeid(T).name());
if ((result & 4) != 0)
printf("atomic_%s_fetch failed with type %s\n", Op::name(),
typeid(T).name());
if ((result & 8) != 0)
printf("atomic_fetch_%s did not return old value with type %s\n",
Op::name(), typeid(T).name());
if ((result & 16) != 0)
printf("atomic_%s_fetch did not return updated value with type %s\n",
Op::name(), typeid(T).name());

return result == 0;
}

//---------------------------------------------------
//--------------atomic_test_control------------------
//---------------------------------------------------
Expand Down Expand Up @@ -395,6 +452,12 @@ bool AtomicOperationsTestIntegralType(int old_val_in, int update_in, int test) {
case 9: return atomic_op_test<XorAtomicTest, T, ExecSpace>(old_val, update);
case 10:
return atomic_op_test<NandAtomicTest, T, ExecSpace>(old_val, update);
#if defined(KOKKOS_ENABLE_OPENACC) && defined(KOKKOS_COMPILER_NVHPC)
// FIXME_NVHPC: atomic-fetch-shift operation fails due to NVHPC OpenACC
// compiler bugs, which are reported to NVIDIA.
case 11: return true;
case 12: return true;
#else
case 11:
return update_in >= 0 ? atomic_op_test<LShiftAtomicTest, T, ExecSpace>(
old_val, update)
Expand All @@ -403,6 +466,7 @@ bool AtomicOperationsTestIntegralType(int old_val_in, int update_in, int test) {
return update_in >= 0 ? atomic_op_test<RShiftAtomicTest, T, ExecSpace>(
old_val, update)
: true;
#endif
case 13:
return atomic_op_test<IncAtomicTest, T, ExecSpace>(old_val, update);
case 14:
Expand Down Expand Up @@ -440,10 +504,20 @@ bool AtomicOperationsTestNonIntegralType(int old_val_in, int update_in,
case 2: return atomic_op_test<MaxAtomicTest, T, ExecSpace>(old_val, update);
case 3: return atomic_op_test<MinAtomicTest, T, ExecSpace>(old_val, update);
case 4: return atomic_op_test<MulAtomicTest, T, ExecSpace>(old_val, update);
#if defined(KOKKOS_ENABLE_OPENACC) && defined(KOKKOS_COMPILER_NVHPC)
// NVHPC may use different internal precisions for the device and host
// atomic operations. Therefore, relative errors are used to compare the
// host results and device results.
case 5:
return update != 0 ? atomic_op_test_rel<DivAtomicTest, T, ExecSpace>(
old_val, update)
: true;
#else
case 5:
return update != 0
? atomic_op_test<DivAtomicTest, T, ExecSpace>(old_val, update)
: true;
#endif
case 6:
return atomic_op_test<LoadStoreAtomicTest, T, ExecSpace>(old_val, update);
}
Expand Down
4 changes: 3 additions & 1 deletion core/unit_test/TestAtomics.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -498,7 +498,9 @@ TEST(TEST_CATEGORY, atomics) {
ASSERT_TRUE((TestAtomic::Loop<float, TEST_EXECSPACE>(100, 2)));
ASSERT_TRUE((TestAtomic::Loop<float, TEST_EXECSPACE>(100, 3)));

#ifndef KOKKOS_ENABLE_OPENMPTARGET
// FIXME_OPENMPTARGET
// FIXME_OPENACC: atomic operations on composite types are not supported.
#if !defined(KOKKOS_ENABLE_OPENMPTARGET) && !defined(KOKKOS_ENABLE_OPENACC)
ASSERT_TRUE((TestAtomic::Loop<Kokkos::complex<float>, TEST_EXECSPACE>(1, 1)));
ASSERT_TRUE((TestAtomic::Loop<Kokkos::complex<float>, TEST_EXECSPACE>(1, 2)));
ASSERT_TRUE((TestAtomic::Loop<Kokkos::complex<float>, TEST_EXECSPACE>(1, 3)));
Expand Down
1 change: 1 addition & 0 deletions tpls/desul/Config.hpp.cmake.in
Original file line number Diff line number Diff line change
Expand Up @@ -16,5 +16,6 @@ SPDX-License-Identifier: (BSD-3-Clause)
#cmakedefine DESUL_ATOMICS_ENABLE_SYCL
#cmakedefine DESUL_ATOMICS_ENABLE_SYCL_SEPARABLE_COMPILATION
#cmakedefine DESUL_ATOMICS_ENABLE_OPENMP
#cmakedefine DESUL_ATOMICS_ENABLE_OPENACC

#endif
3 changes: 3 additions & 0 deletions tpls/desul/include/desul/atomics/Compare_Exchange.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,9 @@ SPDX-License-Identifier: (BSD-3-Clause)
#ifdef DESUL_HAVE_OPENMP_ATOMICS
#include <desul/atomics/Compare_Exchange_OpenMP.hpp>
#endif
#ifdef DESUL_HAVE_OPENACC_ATOMICS
#include <desul/atomics/Compare_Exchange_OpenACC.hpp>
#endif
#ifdef DESUL_HAVE_SYCL_ATOMICS
#include <desul/atomics/Compare_Exchange_SYCL.hpp>
#endif
Expand Down
149 changes: 149 additions & 0 deletions tpls/desul/include/desul/atomics/Compare_Exchange_OpenACC.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,149 @@
/*
Copyright (c) 2019, Lawrence Livermore National Security, LLC
and DESUL project contributors. See the COPYRIGHT file for details.
Source: https://github.com/desul/desul
SPDX-License-Identifier: (BSD-3-Clause)
*/

#ifndef DESUL_ATOMICS_COMPARE_EXCHANGE_OPENACC_HPP_
#define DESUL_ATOMICS_COMPARE_EXCHANGE_OPENACC_HPP_

#include <openacc.h>

#include <desul/atomics/Common.hpp>
#include <desul/atomics/Thread_Fence_OpenACC.hpp>
#include <type_traits>

namespace desul {
namespace Impl {

#ifdef __NVCOMPILER

#pragma acc routine seq
template <class T, class MemoryOrder, class MemoryScope>
T device_atomic_exchange(T* dest, T value, MemoryOrder, MemoryScope /*scope*/) {
if constexpr (std::is_arithmetic_v<T> && ((sizeof(T) == 4) || (sizeof(T) == 8))) {
T return_val;
#pragma acc atomic capture
{
return_val = *dest;
*dest = value;
}
return return_val;
} else {
// FIXME_OPENACC
printf(
"DESUL error in device_atomic_exchange(): Not supported atomic operation in "
"the OpenACC backend\n");
// Acquire a lock for the address
// while (!lock_address_openacc((void*)dest, scope)) {
// }
// device_atomic_thread_fence(MemoryOrderAcquire(), scope);
T return_val = *dest;
*dest = value;
// device_atomic_thread_fence(MemoryOrderRelease(), scope);
// unlock_address_openacc((void*)dest, scope);
return return_val;
}
}

#pragma acc routine seq
template <class T, class MemoryOrder, class MemoryScope>
T device_atomic_compare_exchange(
T* dest, T compare, T value, MemoryOrder, MemoryScope scope) {
// Floating point types treated separetely to work around compiler errors
// "parse invalid cast opcode for cast from 'i32' to 'float'".
// Also not just "forwarding" arguments to atomicCAS because it does not have an
// overload that takes int64_t
if constexpr (std::is_integral_v<T> && ((sizeof(T) == 4) || (sizeof(T) == 8))) {
static_assert(sizeof(unsigned int) == 4);
static_assert(sizeof(unsigned long long int) == 8);
using cas_t =
std::conditional_t<(sizeof(T) == 4), unsigned int, unsigned long long int>;
cas_t return_val = atomicCAS(reinterpret_cast<cas_t*>(dest),
reinterpret_cast<cas_t&>(compare),
reinterpret_cast<cas_t&>(value));
return reinterpret_cast<T&>(return_val);
#ifdef DESUL_CUDA_ARCH_IS_PRE_PASCAL
} else if constexpr (std::is_same_v<T, float>) {
#else
} else if constexpr (std::is_same_v<T, float> || std::is_same_v<T, double>) {
#endif
return atomicCAS(dest, compare, value);
} else {
// FIXME_OPENACC
printf(
"DESUL error in device_atomic_compare_exchange(): Not supported atomic "
"operation in the OpenACC backend\n");
T current_val = *dest;
// Acquire a lock for the address
// while (!lock_address_openacc((void*)dest, scope)) {
//}
// device_atomic_thread_fence(MemoryOrderAcquire(), scope);
if (current_val == compare) {
*dest = value;
// device_atomic_thread_fence(MemoryOrderRelease(), scope);
}
// unlock_address_openacc((void*)dest, scope);
return current_val;
}
}

#else // not NVHPC

#pragma acc routine seq
template <class T, class MemoryOrder, class MemoryScope>
T device_atomic_exchange(T* dest, T value, MemoryOrder, MemoryScope) {
if constexpr (std::is_arithmetic_v<T>) {
T return_val;
#pragma acc atomic capture
{
return_val = *dest;
*dest = value;
}
return return_val;
} else {
// FIXME_OPENACC
printf(
"DESUL error in device_atomic_exchange(): Not supported atomic operation in "
"the OpenACC backend\n");
// Acquire a lock for the address
// while (!lock_address_openacc((void*)dest, scope)) {
// }
// device_atomic_thread_fence(MemoryOrderAcquire(), scope);
T return_val = *dest;
*dest = value;
// device_atomic_thread_fence(MemoryOrderRelease(), scope);
// unlock_address_openacc((void*)dest, scope);
return return_val;
}
}

#pragma acc routine seq
template <class T, class MemoryOrder, class MemoryScope>
T device_atomic_compare_exchange(
T* dest, T compare, T value, MemoryOrder, MemoryScope scope) {
// FIXME_OPENACC
printf(
"DESUL error in device_atomic_compare_exchange(): Not supported atomic operation "
"in the OpenACC backend\n");
T current_val = *dest;
// Acquire a lock for the address
// while (!lock_address_openacc((void*)dest, scope)) {
//}
// device_atomic_thread_fence(MemoryOrderAcquire(), scope);
if (current_val == compare) {
*dest = value;
// device_atomic_thread_fence(MemoryOrderRelease(), scope);
}
// unlock_address_openacc((void*)dest, scope);
return current_val;
}

#endif

} // namespace Impl
} // namespace desul

#endif
3 changes: 3 additions & 0 deletions tpls/desul/include/desul/atomics/Fetch_Op.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,9 @@ SPDX-License-Identifier: (BSD-3-Clause)
#ifdef DESUL_HAVE_OPENMP_ATOMICS
#include <desul/atomics/Fetch_Op_OpenMP.hpp>
#endif
#ifdef DESUL_HAVE_OPENACC_ATOMICS
#include <desul/atomics/Fetch_Op_OpenACC.hpp>
#endif
#ifdef DESUL_HAVE_SYCL_ATOMICS
#include <desul/atomics/Fetch_Op_SYCL.hpp>
#endif
Expand Down

0 comments on commit 97a90d5

Please sign in to comment.