forked from kokkos/kokkos
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
OpenACC: add atomics support (kokkos#6446)
* Initial OpenACC atomic construct implementation. * Partially fixed bugs in the OpenACC atomic implementations. * First working version, where general atomic implementations work only on a sequential host * Update Unit Test CMake * Update as suggested by code review: - Remove const_cast() - Change Kokkos::abort() with printf() - Add FIXME_OPENACC comment. * Disable unsupporte OpenACC atomic tests. (OpenACC C/C++ does not support atomic max/min/mod operations) Disable TestOpenACC_BitManipulationBuiltins for OpenACC due to errors. * Apply ClangFormat * Disable unsupported unit tests when by old NVHPC compilers (V22.5 or older). * Apply ClangFormat * Update tpls/desul/include/desul/atomics/Fetch_Op_OpenACC.hpp Co-authored-by: Damien L-G <dalg24+github@gmail.com> * Restore unit tests that were disabled for old NVHPC compilers (V22.5 or older) * Update unit test CMakeLists.txt to include unit tests enabled by this PR. * Change the minimum version of the NVHPC compiler from 22.3 to 22.9 for the OpenACC backend. * Changed the way to guard unsupported atomic tests for the OpenACC backend. * Remove unnecessary guarding on unsupported atomic tests for the OpenACC backend. * Minor updates according to the code review. * Changed the supported-type-checking code from macro to alias template as suggested by the code review. * Undo changing the minimim required version of NVHPC. * Apply suggestions from code review Co-authored-by: Damien L-G <dalg24+github@gmail.com> * Change the KOKKOS_COMPILER_NVHPC macro to __NVCOMPILER * Rename a variable's name from `tmp` to `old in atomic_fetch_*() functions. * Change `ptr[0]` to `*ptr` as suggested by the code review. * Add FIXME comments in `device_atomic_thread_fence()`. * Moved definitions into the desul::Impl namespace as suggested by the code review. * Clean up the OpenACC atomic implementations. Re-enable atomic max/min tests for OpenACC. * Fix a typo (sid => std) * Minor bug fix on OpenACC * Update .jenkins to the latest. * Test this please * Try again * Fix typo * Deal with conflicts * Disable complex float atomic unit tests for OpenACC backend * Sync with PR opened on the desul side * DO NOT MERGE disable all CI but OpenACC * - Disable atomic-fetch-shift tests for NVHPC OpenACC compiler, which fail due to compiler bugs, which are reported to NVIDIA. - Change the values of start and end variables in TestAtomicOperations_double.hpp and TestAtomicOperations_float.hpp so that atomic-division tests calculate trivial divisions. (In the original tests, NVHPC compiler failed since device results are slightly different from the host results due to precision mismatch.) * Add atomic_op_test)rel() to TestAtomicOperations.hpp, which compares the host and device atomic operations using a relative error. * Revert "DO NOT MERGE disable all CI but OpenACC" This reverts commit 18132bf. * [desul_atomics] Fixup Kokkos -> DESUL in error message with OpenACC --------- Co-authored-by: Seyong Lee <lees2@ornl.gov>
- Loading branch information
Showing
17 changed files
with
855 additions
and
20 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
149 changes: 149 additions & 0 deletions
149
tpls/desul/include/desul/atomics/Compare_Exchange_OpenACC.hpp
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,149 @@ | ||
/* | ||
Copyright (c) 2019, Lawrence Livermore National Security, LLC | ||
and DESUL project contributors. See the COPYRIGHT file for details. | ||
Source: https://github.com/desul/desul | ||
SPDX-License-Identifier: (BSD-3-Clause) | ||
*/ | ||
|
||
#ifndef DESUL_ATOMICS_COMPARE_EXCHANGE_OPENACC_HPP_ | ||
#define DESUL_ATOMICS_COMPARE_EXCHANGE_OPENACC_HPP_ | ||
|
||
#include <openacc.h> | ||
|
||
#include <desul/atomics/Common.hpp> | ||
#include <desul/atomics/Thread_Fence_OpenACC.hpp> | ||
#include <type_traits> | ||
|
||
namespace desul { | ||
namespace Impl { | ||
|
||
#ifdef __NVCOMPILER | ||
|
||
#pragma acc routine seq | ||
template <class T, class MemoryOrder, class MemoryScope> | ||
T device_atomic_exchange(T* dest, T value, MemoryOrder, MemoryScope /*scope*/) { | ||
if constexpr (std::is_arithmetic_v<T> && ((sizeof(T) == 4) || (sizeof(T) == 8))) { | ||
T return_val; | ||
#pragma acc atomic capture | ||
{ | ||
return_val = *dest; | ||
*dest = value; | ||
} | ||
return return_val; | ||
} else { | ||
// FIXME_OPENACC | ||
printf( | ||
"DESUL error in device_atomic_exchange(): Not supported atomic operation in " | ||
"the OpenACC backend\n"); | ||
// Acquire a lock for the address | ||
// while (!lock_address_openacc((void*)dest, scope)) { | ||
// } | ||
// device_atomic_thread_fence(MemoryOrderAcquire(), scope); | ||
T return_val = *dest; | ||
*dest = value; | ||
// device_atomic_thread_fence(MemoryOrderRelease(), scope); | ||
// unlock_address_openacc((void*)dest, scope); | ||
return return_val; | ||
} | ||
} | ||
|
||
#pragma acc routine seq | ||
template <class T, class MemoryOrder, class MemoryScope> | ||
T device_atomic_compare_exchange( | ||
T* dest, T compare, T value, MemoryOrder, MemoryScope scope) { | ||
// Floating point types treated separetely to work around compiler errors | ||
// "parse invalid cast opcode for cast from 'i32' to 'float'". | ||
// Also not just "forwarding" arguments to atomicCAS because it does not have an | ||
// overload that takes int64_t | ||
if constexpr (std::is_integral_v<T> && ((sizeof(T) == 4) || (sizeof(T) == 8))) { | ||
static_assert(sizeof(unsigned int) == 4); | ||
static_assert(sizeof(unsigned long long int) == 8); | ||
using cas_t = | ||
std::conditional_t<(sizeof(T) == 4), unsigned int, unsigned long long int>; | ||
cas_t return_val = atomicCAS(reinterpret_cast<cas_t*>(dest), | ||
reinterpret_cast<cas_t&>(compare), | ||
reinterpret_cast<cas_t&>(value)); | ||
return reinterpret_cast<T&>(return_val); | ||
#ifdef DESUL_CUDA_ARCH_IS_PRE_PASCAL | ||
} else if constexpr (std::is_same_v<T, float>) { | ||
#else | ||
} else if constexpr (std::is_same_v<T, float> || std::is_same_v<T, double>) { | ||
#endif | ||
return atomicCAS(dest, compare, value); | ||
} else { | ||
// FIXME_OPENACC | ||
printf( | ||
"DESUL error in device_atomic_compare_exchange(): Not supported atomic " | ||
"operation in the OpenACC backend\n"); | ||
T current_val = *dest; | ||
// Acquire a lock for the address | ||
// while (!lock_address_openacc((void*)dest, scope)) { | ||
//} | ||
// device_atomic_thread_fence(MemoryOrderAcquire(), scope); | ||
if (current_val == compare) { | ||
*dest = value; | ||
// device_atomic_thread_fence(MemoryOrderRelease(), scope); | ||
} | ||
// unlock_address_openacc((void*)dest, scope); | ||
return current_val; | ||
} | ||
} | ||
|
||
#else // not NVHPC | ||
|
||
#pragma acc routine seq | ||
template <class T, class MemoryOrder, class MemoryScope> | ||
T device_atomic_exchange(T* dest, T value, MemoryOrder, MemoryScope) { | ||
if constexpr (std::is_arithmetic_v<T>) { | ||
T return_val; | ||
#pragma acc atomic capture | ||
{ | ||
return_val = *dest; | ||
*dest = value; | ||
} | ||
return return_val; | ||
} else { | ||
// FIXME_OPENACC | ||
printf( | ||
"DESUL error in device_atomic_exchange(): Not supported atomic operation in " | ||
"the OpenACC backend\n"); | ||
// Acquire a lock for the address | ||
// while (!lock_address_openacc((void*)dest, scope)) { | ||
// } | ||
// device_atomic_thread_fence(MemoryOrderAcquire(), scope); | ||
T return_val = *dest; | ||
*dest = value; | ||
// device_atomic_thread_fence(MemoryOrderRelease(), scope); | ||
// unlock_address_openacc((void*)dest, scope); | ||
return return_val; | ||
} | ||
} | ||
|
||
#pragma acc routine seq | ||
template <class T, class MemoryOrder, class MemoryScope> | ||
T device_atomic_compare_exchange( | ||
T* dest, T compare, T value, MemoryOrder, MemoryScope scope) { | ||
// FIXME_OPENACC | ||
printf( | ||
"DESUL error in device_atomic_compare_exchange(): Not supported atomic operation " | ||
"in the OpenACC backend\n"); | ||
T current_val = *dest; | ||
// Acquire a lock for the address | ||
// while (!lock_address_openacc((void*)dest, scope)) { | ||
//} | ||
// device_atomic_thread_fence(MemoryOrderAcquire(), scope); | ||
if (current_val == compare) { | ||
*dest = value; | ||
// device_atomic_thread_fence(MemoryOrderRelease(), scope); | ||
} | ||
// unlock_address_openacc((void*)dest, scope); | ||
return current_val; | ||
} | ||
|
||
#endif | ||
|
||
} // namespace Impl | ||
} // namespace desul | ||
|
||
#endif |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.