OpenACC CMake changes for Clacc (kokkos#6250)
* Update cmake configurations and OpenACC code so that the LLVM-Clacc compiler can compile the OpenACC backend.

* Disable problematic unit tests that cause clang-linker-wrapper to hang
when targeting AMD GPUs.

* Update unit_test/CMakeLists.txt to remove supported unit tests from the
OpenACC/Clacc removal list.
Remove NVHPC-specific changes from cmake/kokkos_enable_devices.cmake
Remove incomplete changes to Makefile.kokkos.

* Disable unit tests that cause Clacc to hang for a long time when building on AMD GPUs.

* Apply suggestions from code review

Co-authored-by: Daniel Arndt <arndtd@ornl.gov>

* Minor update according to the code review.

* Revert the changes related to KOKKOS_OPENACC_WITHOUT_GPU, which will be
handled in a separate PR.

* Fix an error in algorithms/CMakeLists.txt

* Update core/unit_test/CMakeLists.txt

Co-authored-by: Damien L-G <dalg24+github@gmail.com>

* Update the comment in Kokkos_OpenACC_ParallelScan_Range.hpp as requested by the code review.

* Update comments in core/unit_test/CMakeLists.txt as suggested by the
code review.
Re-enabled TestCompilerMacros.cpp for the OpenACC backend compilers
(NVHPC and Clacc)

* Delete outdated comment and code in core/unit_test/CMakeLists.txt

* Merge OpenACC parallel_scan(range) implementations into a single one using
macros.

* Simplify the parallel_scan(range) implementation further using macros.

* Remove `-lm` option from Clang when targeting AMD GPUs with OpenACC

* Undo removing space in line 1219 of core/unit_test/CMakeLists.txt

* Prefix macros with `KOKKOS_IMPL_ACC`
Remove ELEMENT_VALUES_SIZE

* Remove `TestCompilerMacros.cpp` from the removal list.
Use `num_elements` in the definition of
`KOKKOS_IMPL_ACC_ELEMENT_VALUES_CLAUSE`.

---------

Co-authored-by: Daniel Arndt <arndtd@ornl.gov>
Co-authored-by: Damien L-G <dalg24+github@gmail.com>
Co-authored-by: Damien L-G <dalg24@gmail.com>
4 people committed Jul 20, 2023
1 parent ba79dc4 commit 8452f8d
Showing 7 changed files with 190 additions and 38 deletions.
algorithms/CMakeLists.txt (2 changes: 1 addition & 1 deletion)
@@ -2,6 +2,6 @@ IF (NOT Kokkos_INSTALL_TESTING)
ADD_SUBDIRECTORY(src)
ENDIF()
# FIXME_OPENACC: temporarily disabled due to unimplemented features
-IF(NOT ((KOKKOS_ENABLE_OPENMPTARGET OR KOKKOS_ENABLE_OPENACC) AND KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC))
+IF(NOT ((KOKKOS_ENABLE_OPENMPTARGET AND KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC) OR KOKKOS_ENABLE_OPENACC))
KOKKOS_ADD_TEST_DIRECTORIES(unit_tests)
ENDIF()
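The rewritten condition changes which configurations skip the algorithms unit tests: the OpenMPTarget exclusion still applies only to NVHPC, while OpenACC now skips them for every compiler, Clacc included. The C++ sketch below models the old and new CMake expressions as plain boolean functions; the function names are hypothetical and exist only to tabulate the logic.

```cpp
#include <cassert>

// Hypothetical model of the guard around KOKKOS_ADD_TEST_DIRECTORIES(unit_tests)
// in algorithms/CMakeLists.txt. Returns true when the directory is added.

// Old expression: skip when (OpenMPTarget OR OpenACC) AND the compiler is NVHPC.
bool algorithms_tests_enabled_before(bool openmptarget, bool openacc,
                                     bool nvhpc) {
  return !((openmptarget || openacc) && nvhpc);
}

// New expression: skip for OpenMPTarget with NVHPC, or for OpenACC with any
// compiler.
bool algorithms_tests_enabled_after(bool openmptarget, bool openacc,
                                    bool nvhpc) {
  return !((openmptarget && nvhpc) || openacc);
}
```

For example, an OpenACC build with Clang (not NVHPC) added `unit_tests` under the old expression and no longer does, while OpenMPTarget with a non-NVHPC compiler is unaffected.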
cmake/kokkos_arch.cmake (14 changes: 11 additions & 3 deletions)
@@ -89,7 +89,7 @@ KOKKOS_ARCH_OPTION(AMPERE86 GPU "NVIDIA Ampere generation CC 8.6" "KOKK
KOKKOS_ARCH_OPTION(ADA89 GPU "NVIDIA Ada generation CC 8.9" "KOKKOS_SHOW_CUDA_ARCHS")
KOKKOS_ARCH_OPTION(HOPPER90 GPU "NVIDIA Hopper generation CC 9.0" "KOKKOS_SHOW_CUDA_ARCHS")

-IF(Kokkos_ENABLE_HIP OR Kokkos_ENABLE_OPENMPTARGET)
+IF(Kokkos_ENABLE_HIP OR Kokkos_ENABLE_OPENMPTARGET OR Kokkos_ENABLE_OPENACC)
SET(KOKKOS_SHOW_HIP_ARCHS ON)
ENDIF()

@@ -631,8 +631,8 @@ FUNCTION(CHECK_AMDGPU_ARCH ARCH FLAG)
MESSAGE(FATAL_ERROR "Multiple GPU architectures given! Already have ${AMDGPU_ARCH_ALREADY_SPECIFIED}, but trying to add ${ARCH}. If you are re-running CMake, try clearing the cache and running again.")
ENDIF()
SET(AMDGPU_ARCH_ALREADY_SPECIFIED ${ARCH} PARENT_SCOPE)
-IF (NOT KOKKOS_ENABLE_HIP AND NOT KOKKOS_ENABLE_OPENMPTARGET)
-MESSAGE(WARNING "Given AMD GPU architecture ${ARCH}, but Kokkos_ENABLE_HIP and Kokkos_ENABLE_OPENMPTARGET are OFF. Option will be ignored.")
+IF (NOT KOKKOS_ENABLE_HIP AND NOT KOKKOS_ENABLE_OPENMPTARGET AND NOT KOKKOS_ENABLE_OPENACC)
+MESSAGE(WARNING "Given AMD GPU architecture ${ARCH}, but Kokkos_ENABLE_HIP, Kokkos_ENABLE_OPENACC, and Kokkos_ENABLE_OPENMPTARGET are OFF. Option will be ignored.")
UNSET(KOKKOS_ARCH_${ARCH} PARENT_SCOPE)
ELSE()
IF(KOKKOS_ENABLE_HIP)
@@ -744,9 +744,17 @@ ENDIF()

IF (KOKKOS_ENABLE_OPENACC)
IF(KOKKOS_CUDA_ARCH_FLAG)
+SET(CLANG_CUDA_ARCH ${KOKKOS_CUDA_ARCH_FLAG})
STRING(REPLACE "sm_" "cc" NVHPC_CUDA_ARCH ${KOKKOS_CUDA_ARCH_FLAG})
COMPILER_SPECIFIC_FLAGS(
NVHPC -acc -gpu=${NVHPC_CUDA_ARCH}
+Clang -Xopenmp-target=nvptx64-nvidia-cuda -march=${CLANG_CUDA_ARCH}
+-fopenmp-targets=nvptx64-nvidia-cuda
)
+ELSEIF(KOKKOS_AMDGPU_ARCH_FLAG)
+COMPILER_SPECIFIC_FLAGS(
+Clang -Xopenmp-target=amdgcn-amd-amdhsa -march=${KOKKOS_AMDGPU_ARCH_FLAG}
+-fopenmp-targets=amdgcn-amd-amdhsa
+)
ELSE()
COMPILER_SPECIFIC_FLAGS(
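In the OpenACC branch above, NVHPC receives the CUDA architecture rewritten from the `sm_` form to the `cc` form, Clang (Clacc) forwards the same flag unchanged through `-Xopenmp-target=... -march=...`, and AMD targets now get the `amdgcn-amd-amdhsa` triple. The C++ sketch below models that mapping; the function names are hypothetical, and example flag values such as `sm_80` and `gfx906` are assumptions, since the actual values of `KOKKOS_CUDA_ARCH_FLAG` and `KOKKOS_AMDGPU_ARCH_FLAG` are set elsewhere in kokkos_arch.cmake.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical C++ model of the OpenACC branch of cmake/kokkos_arch.cmake.

// NVHPC: mirrors STRING(REPLACE "sm_" "cc" NVHPC_CUDA_ARCH ${KOKKOS_CUDA_ARCH_FLAG})
// followed by -gpu=${NVHPC_CUDA_ARCH} (passed alongside -acc).
std::string nvhpc_gpu_flag(const std::string& cuda_arch_flag) {
  std::string arch = cuda_arch_flag;
  const std::string from = "sm_";
  const std::string to = "cc";
  for (std::size_t pos = arch.find(from); pos != std::string::npos;
       pos = arch.find(from, pos + to.size())) {
    arch.replace(pos, from.size(), to);
  }
  return "-gpu=" + arch;
}

// Clang (Clacc) targeting NVIDIA: the CUDA arch flag is forwarded unchanged.
std::vector<std::string> clang_nvidia_flags(const std::string& cuda_arch_flag) {
  return {"-Xopenmp-target=nvptx64-nvidia-cuda", "-march=" + cuda_arch_flag,
          "-fopenmp-targets=nvptx64-nvidia-cuda"};
}

// Clang (Clacc) targeting AMD: the AMDGPU arch flag is forwarded unchanged.
std::vector<std::string> clang_amd_flags(const std::string& amdgpu_arch_flag) {
  return {"-Xopenmp-target=amdgcn-amd-amdhsa", "-march=" + amdgpu_arch_flag,
          "-fopenmp-targets=amdgcn-amd-amdhsa"};
}
```

With these assumptions, `sm_80` becomes `-gpu=cc80` for NVHPC and `-march=sm_80` for Clang.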
cmake/kokkos_enable_devices.cmake (10 changes: 10 additions & 0 deletions)
@@ -41,6 +41,16 @@ ENDIF()
KOKKOS_DEVICE_OPTION(OPENMP ${OMP_DEFAULT} HOST "Whether to build OpenMP backend")

KOKKOS_DEVICE_OPTION(OPENACC OFF DEVICE "Whether to build the OpenACC backend")
+IF (KOKKOS_ENABLE_OPENACC)
+COMPILER_SPECIFIC_FLAGS(
+Clang -fopenacc -fopenacc-fake-async-wait
+-Wno-openacc-and-cxx -Wno-openmp-mapping -Wno-unknown-cuda-version
+-Wno-pass-failed
+)
+COMPILER_SPECIFIC_DEFS(
+Clang KOKKOS_WORKAROUND_OPENMPTARGET_CLANG
+)
+ENDIF()

KOKKOS_DEVICE_OPTION(OPENMPTARGET OFF DEVICE "Whether to build the OpenMP target backend")
IF (KOKKOS_ENABLE_OPENMPTARGET)
core/perf_test/CMakeLists.txt (3 changes: 3 additions & 0 deletions)
@@ -3,6 +3,9 @@
IF ((KOKKOS_ENABLE_OPENMPTARGET OR KOKKOS_ENABLE_OPENACC) AND KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC)
RETURN()
ENDIF()
+IF (KOKKOS_ENABLE_OPENACC AND KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
+RETURN()
+ENDIF()

# all PerformanceTest_* executables are part of regular tests
# TODO: finish converting these into benchmarks (in progress)
core/src/OpenACC/Kokkos_OpenACC_ParallelScan_Range.hpp (94 changes: 62 additions & 32 deletions)
@@ -19,8 +19,25 @@

#include <OpenACC/Kokkos_OpenACC.hpp>
#include <OpenACC/Kokkos_OpenACC_FunctorAdapter.hpp>
+#include <OpenACC/Kokkos_OpenACC_Macros.hpp>
#include <Kokkos_Parallel.hpp>

+// Clacc uses an alternative implementation to work around not-yet-implemented
+// OpenACC features: Clacc does not fully support private clauses for
+// gang-private variables, and the alternative implementation allocates
+// the gang-private arrays on GPU global memory using array expansion,
+// instead of using the private clause.
+/* clang-format off */
+#ifdef KOKKOS_COMPILER_CLANG
+#define KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(THREADID) \
+element_values[team_id * 2 * chunk_size + THREADID]
+#define KOKKOS_IMPL_ACC_ELEMENT_VALUES_CLAUSE create(element_values [0:num_elements])
+#else
+#define KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(THREADID) element_values[THREADID]
+#define KOKKOS_IMPL_ACC_ELEMENT_VALUES_CLAUSE private(element_values [0:num_elements])
+#endif
+/* clang-format on */
+
namespace Kokkos::Impl {

template <class Functor, class GivenValueType, class... Traits>
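The comment above describes array expansion: instead of a `private` array per gang, Clacc gets one array with a `2 * chunk_size` slice per gang, and every access is offset by `team_id`. The serial C++ sketch below mimics the Clang-path `KOKKOS_IMPL_ACC_ACCESS_ELEMENTS` indexing and the resulting `num_elements`; `ExpandedScratch` and the `num_elements` function are hypothetical names, and `double` stands in for `ValueType`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial sketch of array expansion (hypothetical names; double stands in for
// ValueType). One buffer holds a 2 * chunk_size slice per gang, and an access
// for gang team_id is offset by team_id * 2 * chunk_size, as in the Clang
// definition of KOKKOS_IMPL_ACC_ACCESS_ELEMENTS.
struct ExpandedScratch {
  std::size_t chunk_size;
  std::vector<double> element_values;

  ExpandedScratch(std::size_t n_chunks, std::size_t chunk)
      : chunk_size(chunk), element_values(n_chunks * 2 * chunk) {}

  // element_values[team_id * 2 * chunk_size + slot]
  double& access(std::size_t team_id, std::size_t slot) {
    assert(slot < 2 * chunk_size);  // stay inside this gang's slice
    return element_values[team_id * 2 * chunk_size + slot];
  }
};

// num_elements as computed in the diff: the whole expanded array on the Clang
// path, one gang's 2 * chunk_size array otherwise.
std::size_t num_elements(std::size_t n_chunks, std::size_t chunk_size,
                         bool clang_array_expansion) {
  return clang_array_expansion ? n_chunks * 2 * chunk_size : 2 * chunk_size;
}
```

For N = 1000 and chunk_size = 128, n_chunks = (1000 + 127) / 128 = 8, so the Clang path allocates 8 * 2 * 128 = 2048 values in global memory, whereas the `private` path declares 256 per gang. The trade-off is more global memory traffic in exchange for not relying on gang-private arrays.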
@@ -69,23 +86,28 @@ class ParallelScanOpenACCBase {
functor(m_functor);
const IndexType N = end - begin;
const IndexType n_chunks = (N + chunk_size - 1) / chunk_size;
+#ifdef KOKKOS_COMPILER_CLANG
+int const num_elements = n_chunks * 2 * chunk_size;
+#else
+int const num_elements = 2 * chunk_size;
+#endif
Kokkos::View<ValueType*, Kokkos::Experimental::OpenACCSpace> chunk_values(
"Kokkos::OpenACCParallelScan::chunk_values", n_chunks);
Kokkos::View<ValueType*, Kokkos::Experimental::OpenACCSpace> offset_values(
"Kokkos::OpenACCParallelScan::offset_values", n_chunks);
Kokkos::View<ValueType, Kokkos::Experimental::OpenACCSpace> m_result_total(
"Kokkos::OpenACCParallelScan::m_result_total");
std::unique_ptr<ValueType[]> element_values_owner(
-new ValueType[2 * chunk_size]);
+new ValueType[num_elements]);
ValueType* element_values = element_values_owner.get();
typename Analysis::Reducer final_reducer(m_functor);

#pragma acc enter data copyin(functor, final_reducer) \
copyin(chunk_values, offset_values) async(async_arg)

-#pragma acc parallel loop gang vector_length(chunk_size) private( \
-element_values [0:2 * chunk_size]) \
-present(functor, chunk_values, final_reducer) async(async_arg)
+/* clang-format off */
+KOKKOS_IMPL_ACC_PRAGMA(parallel loop gang vector_length(chunk_size) KOKKOS_IMPL_ACC_ELEMENT_VALUES_CLAUSE present(functor, chunk_values, final_reducer) async(async_arg))
+/* clang-format on */
for (IndexType team_id = 0; team_id < n_chunks; ++team_id) {
IndexType current_step = 0;
IndexType next_step = 1;
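A macro expansion cannot produce a `#pragma` line, so a directive assembled from macros such as `KOKKOS_IMPL_ACC_ELEMENT_VALUES_CLAUSE` has to go through the C++11 `_Pragma` operator. `KOKKOS_IMPL_ACC_PRAGMA` is defined in Kokkos_OpenACC_Macros.hpp, which this diff does not show; the sketch below assumes it follows the usual two-level pattern, where the outer macro lets the clause macro expand before the inner macro stringizes the result. All `DEMO_*` names and `demo_directive` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical macros illustrating the two-level pattern; the real
// KOKKOS_IMPL_ACC_PRAGMA is defined in Kokkos_OpenACC_Macros.hpp.

// Inner level: '#' stringizes its argument exactly as received.
#define DEMO_STRINGIZE(x) #x
// Outer level: x is macro-expanded here (it is not an operand of '#'),
// so clause macros inside it are replaced before stringizing.
#define DEMO_ACC_STRING(x) DEMO_STRINGIZE(acc x)

// The pragma-emitting form uses the same two levels with _Pragma.
#define DEMO_ACC_PRAGMA_HELPER(x) _Pragma(#x)
#define DEMO_ACC_PRAGMA(x) DEMO_ACC_PRAGMA_HELPER(acc x)

// Stand-in for the Clang-path KOKKOS_IMPL_ACC_ELEMENT_VALUES_CLAUSE.
#define DEMO_ELEMENT_VALUES_CLAUSE create(element_values[0:num_elements])

// Returns the directive text that DEMO_ACC_PRAGMA would hand to the compiler.
const char* demo_directive() {
  return DEMO_ACC_STRING(parallel loop gang DEMO_ELEMENT_VALUES_CLAUSE);
}
```

`demo_directive()` returns text that begins with `acc parallel loop gang` and contains the expanded `create(...)` clause rather than the macro name. Keeping each directive on a single line between `/* clang-format off */` and `/* clang-format on */`, as the diff does, stops the formatter from breaking the macro argument.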
@@ -97,34 +119,37 @@ class ParallelScanOpenACCBase {
ValueType update;
final_reducer.init(&update);
if ((idx > 0) && (idx < N)) functor(idx - 1, update, false);
-element_values[thread_id] = update;
+KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(thread_id) = update;
}
for (IndexType step_size = 1; step_size < chunk_size; step_size *= 2) {
#pragma acc loop vector
for (IndexType thread_id = 0; thread_id < chunk_size; ++thread_id) {
if (thread_id < step_size) {
-element_values[next_step * chunk_size + thread_id] =
-element_values[current_step * chunk_size + thread_id];
+KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(next_step * chunk_size +
+thread_id) =
+KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(current_step * chunk_size +
+thread_id);
} else {
-ValueType localValue =
-element_values[current_step * chunk_size + thread_id];
-final_reducer.join(&localValue,
-&element_values[current_step * chunk_size +
-thread_id - step_size]);
-element_values[next_step * chunk_size + thread_id] = localValue;
+ValueType localValue = KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(
+current_step * chunk_size + thread_id);
+final_reducer.join(&localValue, &KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(
+current_step * chunk_size +
+thread_id - step_size));
+KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(next_step * chunk_size +
+thread_id) = localValue;
}
}
temp = current_step;
current_step = next_step;
next_step = temp;
}
-chunk_values(team_id) =
-element_values[current_step * chunk_size + chunk_size - 1];
+chunk_values(team_id) = KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(
+current_step * chunk_size + chunk_size - 1);
}

ValueType tempValue;
-#pragma acc serial loop present(chunk_values, offset_values, final_reducer) \
-async(async_arg)
+#pragma acc parallel loop seq num_gangs(1) num_workers(1) vector_length(1) \
+present(chunk_values, offset_values, final_reducer) async(async_arg)
for (IndexType team_id = 0; team_id < n_chunks; ++team_id) {
if (team_id == 0) {
final_reducer.init(&offset_values(0));
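Each gang in the first kernel runs a Hillis-Steele scan over its chunk, using the two halves of its `2 * chunk_size` scratch space as ping-pong buffers selected by `current_step` and `next_step`. The serial C++ emulation below assumes a sum reducer (addition for `join`, 0 for `init`) and `idx = team_id * chunk_size + thread_id`, as in the last kernel of this file; `chunk_scan` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Serial emulation of one gang of the first scan kernel (hypothetical
// chunk_scan), assuming a sum reducer: join is +, init is 0.
// Returns the chunk_size values left in the current half after the scan.
std::vector<long> chunk_scan(const std::vector<long>& input,
                             std::size_t team_id, std::size_t chunk_size) {
  const std::size_t n = input.size();
  std::vector<long> element_values(2 * chunk_size);  // both ping-pong halves

  // Load: slot t receives the contribution of index idx - 1 (identity for
  // idx == 0 or idx >= n), i.e. the values are shifted by one position.
  for (std::size_t t = 0; t < chunk_size; ++t) {
    const std::size_t idx = team_id * chunk_size + t;
    element_values[t] = (idx > 0 && idx < n) ? input[idx - 1] : 0;
  }

  std::size_t current_step = 0;
  std::size_t next_step = 1;
  for (std::size_t step_size = 1; step_size < chunk_size; step_size *= 2) {
    // Each iteration of t models one vector lane: it reads only the current
    // half and writes only the next half, so lanes never race.
    for (std::size_t t = 0; t < chunk_size; ++t) {
      long value = element_values[current_step * chunk_size + t];
      if (t >= step_size) {
        value += element_values[current_step * chunk_size + t - step_size];
      }
      element_values[next_step * chunk_size + t] = value;
    }
    std::swap(current_step, next_step);
  }
  const auto first = element_values.begin() +
                     static_cast<std::ptrdiff_t>(current_step * chunk_size);
  return std::vector<long>(first,
                           first + static_cast<std::ptrdiff_t>(chunk_size));
}
```

With input {1, 2, ..., 8} and chunk_size = 4, chunk 0 scans the shifted values {0, 1, 2, 3} to {0, 1, 3, 6}, and chunk 1 scans {4, 5, 6, 7} to {4, 9, 15, 22}. The last entry of each is what the kernel stores in `chunk_values(team_id)`. Because every lane reads only the current half and writes only the other half, no lane overwrites a value another lane still needs in the same step.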
@@ -135,10 +160,9 @@ class ParallelScanOpenACCBase {
}
}

-#pragma acc parallel loop gang vector_length(chunk_size) private( \
-element_values [0:2 * chunk_size]) \
-present(functor, offset_values, final_reducer) copyin(m_result_total) \
-async(async_arg)
+/* clang-format off */
+KOKKOS_IMPL_ACC_PRAGMA(parallel loop gang vector_length(chunk_size) KOKKOS_IMPL_ACC_ELEMENT_VALUES_CLAUSE present(functor, offset_values, final_reducer) copyin(m_result_total) async(async_arg))
+/* clang-format on */
for (IndexType team_id = 0; team_id < n_chunks; ++team_id) {
IndexType current_step = 0;
IndexType next_step = 1;
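Between the two gang kernels, a single thread turns the per-chunk totals into per-chunk offsets. The diff now requests that thread with `parallel loop seq num_gangs(1) num_workers(1) vector_length(1)` instead of a `serial` construct; both forms run on a single gang with one worker and a vector length of one. The loop body is only partly visible here, so the C++ sketch below assumes the offsets form an exclusive scan of `chunk_values` under a sum reducer; `chunk_offsets` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-threaded offset computation (hypothetical chunk_offsets), assuming a
// sum reducer and that offsets are an exclusive scan of the chunk totals:
//   offset(0) = init, offset(t) = join(offset(t - 1), chunk_values(t - 1)).
std::vector<long> chunk_offsets(const std::vector<long>& chunk_values) {
  std::vector<long> offset_values(chunk_values.size());
  for (std::size_t team_id = 0; team_id < chunk_values.size(); ++team_id) {
    if (team_id == 0) {
      offset_values[0] = 0;  // final_reducer.init for a sum
    } else {
      offset_values[team_id] =
          offset_values[team_id - 1] + chunk_values[team_id - 1];
    }
  }
  return offset_values;
}
```

For chunk totals {6, 22, 38} this yields offsets {0, 6, 28}.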
@@ -153,21 +177,24 @@ class ParallelScanOpenACCBase {
final_reducer.join(&update, &offset_values(team_id));
}
if ((idx > 0) && (idx < N)) functor(idx - 1, update, false);
-element_values[thread_id] = update;
+KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(thread_id) = update;
}
for (IndexType step_size = 1; step_size < chunk_size; step_size *= 2) {
#pragma acc loop vector
for (IndexType thread_id = 0; thread_id < chunk_size; ++thread_id) {
if (thread_id < step_size) {
-element_values[next_step * chunk_size + thread_id] =
-element_values[current_step * chunk_size + thread_id];
+KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(next_step * chunk_size +
+thread_id) =
+KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(current_step * chunk_size +
+thread_id);
} else {
-ValueType localValue =
-element_values[current_step * chunk_size + thread_id];
-final_reducer.join(&localValue,
-&element_values[current_step * chunk_size +
-thread_id - step_size]);
-element_values[next_step * chunk_size + thread_id] = localValue;
+ValueType localValue = KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(
+current_step * chunk_size + thread_id);
+final_reducer.join(&localValue, &KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(
+current_step * chunk_size +
+thread_id - step_size));
+KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(next_step * chunk_size +
+thread_id) = localValue;
}
}
temp = current_step;
@@ -178,8 +205,8 @@ class ParallelScanOpenACCBase {
for (IndexType thread_id = 0; thread_id < chunk_size; ++thread_id) {
const IndexType local_offset = team_id * chunk_size;
const IndexType idx = local_offset + thread_id;
-ValueType update =
-element_values[current_step * chunk_size + thread_id];
+ValueType update = KOKKOS_IMPL_ACC_ACCESS_ELEMENTS(
+current_step * chunk_size + thread_id);
if (idx < N) functor(idx, update, true);
if (idx == N - 1) {
if (m_result_ptr_device_accessible) {
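The final kernel repeats the in-chunk scan, but slot 0 is first seeded with the chunk's offset, so after the scan each slot holds the global exclusive prefix, and `functor(idx, update, true)` then folds in element `idx`. The C++ sketch below checks that arithmetic for one chunk under a sum reducer. `std::partial_sum` is swapped in for the ping-pong scan, seeding only slot 0 is an assumption about the partly visible code, and `final_pass_chunk` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical final_pass_chunk: one gang of the last kernel under a sum
// reducer. Slot 0 is seeded with the chunk offset (assumed), the shifted
// values are scanned (std::partial_sum stands in for the ping-pong scan),
// and functor(idx, update, true) is modeled by adding input[idx].
std::vector<long> final_pass_chunk(const std::vector<long>& input,
                                   std::size_t team_id, std::size_t chunk_size,
                                   long offset) {
  const std::size_t n = input.size();
  std::vector<long> slots(chunk_size);
  for (std::size_t t = 0; t < chunk_size; ++t) {
    const std::size_t idx = team_id * chunk_size + t;
    long update = (t == 0) ? offset : 0;
    if (idx > 0 && idx < n) update += input[idx - 1];
    slots[t] = update;
  }
  std::partial_sum(slots.begin(), slots.end(), slots.begin());

  // After the scan, slots[t] is the exclusive prefix for idx; the final
  // functor call makes it inclusive. Indices past the end are skipped.
  std::vector<long> inclusive;
  for (std::size_t t = 0; t < chunk_size; ++t) {
    const std::size_t idx = team_id * chunk_size + t;
    if (idx < n) inclusive.push_back(slots[t] + input[idx]);
  }
  return inclusive;
}
```

With input {1, ..., 8}, chunk_size = 4, and an offset of 6 for chunk 1 (the total 0 + 1 + 2 + 3 of chunk 0's shifted window), the second chunk yields {15, 21, 28, 36}, matching the inclusive prefix sums. The value at idx == N - 1 is the overall total, which the `idx == N - 1` branch handles.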
@@ -285,4 +312,7 @@ class Kokkos::Impl::ParallelScanWithTotal<
}
};

+#undef KOKKOS_IMPL_ACC_ACCESS_ELEMENTS
+#undef KOKKOS_IMPL_ACC_ELEMENT_VALUES_CLAUSE
+
#endif
core/src/OpenACC/Kokkos_OpenACC_Traits.hpp (3 changes: 3 additions & 0 deletions)
@@ -25,6 +25,9 @@ struct OpenACC_Traits {
#if defined(KOKKOS_IMPL_ARCH_NVIDIA_GPU)
static constexpr acc_device_t dev_type = acc_device_nvidia;
static constexpr bool may_fallback_to_host = false;
+#elif defined(KOKKOS_ARCH_VEGA) || defined(KOKKOS_ARCH_NAVI)
+static constexpr acc_device_t dev_type = acc_device_radeon;
+static constexpr bool may_fallback_to_host = false;
#else
static constexpr acc_device_t dev_type = acc_device_not_host;
static constexpr bool may_fallback_to_host = true;
core/unit_test/CMakeLists.txt (102 changes: 100 additions & 2 deletions)
@@ -58,7 +58,13 @@ SET(KOKKOS_SYCL_FEATURE_LEVEL 999)
SET(KOKKOS_SYCL_NAME Experimental::SYCL)
SET(KOKKOS_THREADS_FEATURE_LEVEL 999)
SET(KOKKOS_THREADS_NAME Threads)
-SET(KOKKOS_OPENACC_FEATURE_LEVEL 16)
+# FIXME_OPENACC - The Clang compiler only compiles the first 9 incremental tests for the OpenACC backend.
+IF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
+SET(KOKKOS_OPENACC_FEATURE_LEVEL 9)
+ELSE()
+SET(KOKKOS_OPENACC_FEATURE_LEVEL 16)
+ENDIF()
+
SET(KOKKOS_OPENACC_NAME Experimental::OpenACC)


@@ -523,6 +529,86 @@ IF(KOKKOS_ENABLE_OPENACC AND KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC)
)
endif()

+# FIXME_OPENACC - Comment non-passing tests with the Clang compiler
+IF(KOKKOS_ENABLE_OPENACC AND KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
+list(REMOVE_ITEM OpenACC_SOURCES
+${CMAKE_CURRENT_SOURCE_DIR}/default/TestDefaultDeviceType_a1.cpp
+${CMAKE_CURRENT_SOURCE_DIR}/default/TestDefaultDeviceType_b1.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_double.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_float.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_int.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_longint.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_longlongint.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_shared.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_unsignedint.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicOperations_unsignedlongint.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_Atomics.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_AtomicViews.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_BlockSizeDeduction.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_DeepCopyAlignment.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_HostSharedPtrAccessOnDevice.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_MathematicalFunctions1.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_MathematicalFunctions2.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_MDRange_c.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_MDRange_f.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_NumericTraits.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_RangePolicy.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_RangePolicyRequire.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_Reducers_a.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_Reducers_d.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_Reductions.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_Reductions_DeviceView.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_TeamBasic.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_TeamScratch.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_TeamTeamSize.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_UniqueToken.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_ViewMapping_b.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_ViewResize.cpp
+# This test is not removed above for OpenACC+NVHPC but all its TEST
+# functions are not compiled for the case of KOKKOS_COMPILER_NVHPC.
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_ViewCtorDimMatch.cpp
+# These tests are not removed above for OpenACC+NVHPC.
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_Abort.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_Complex.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_ExecutionSpace.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_ExecSpacePartitioning.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_Init.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_MathematicalConstants.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_MathematicalSpecialFunctions.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_MinMaxClamp.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_ViewLayoutStrideAssignment.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_ViewMapping_a.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_ViewMemoryAccessViolation.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_WithoutInitializing.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_ViewAPI_d.cpp
+)
+# When tested on a system with AMD MI60 GPU and ROCm V5.4.0, these cause
+# clang-linker-wrapper to hang for a long time while building the unit tests.
+# In some cases, including them caused the build not to complete after an hour,
+# but excluding them permitted the build to finish in 1.5 mins or less.
+IF(KOKKOS_AMDGPU_ARCH)
+list(REMOVE_ITEM OpenACC_SOURCES
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_BitManipulationBuiltins.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_MathematicalFunctions3.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_ParallelScanRangePolicy.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_SubView_c04.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_SubView_c05.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_SubView_c06.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_SubView_c07.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_SubView_c08.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_SubView_c09.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_SubView_c10.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_SubView_c11.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_SubView_c12.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_ViewAPI_b.cpp
+${CMAKE_CURRENT_BINARY_DIR}/openacc/TestOpenACC_ViewAPI_c.cpp
+)
+endif()
+# Fails serial.atomics_tpetra_max_abs when we test with Clacc.
+list(REMOVE_ITEM Serial_SOURCES1
+${CMAKE_CURRENT_BINARY_DIR}/serial/TestSerial_Atomics.cpp)
+endif()
+
if(Kokkos_ENABLE_SERIAL)
KOKKOS_ADD_EXECUTABLE_AND_TEST(
CoreUnitTest_Serial1
@@ -829,7 +915,6 @@ endif()
# FIXME_OPENMPTARGET, FIXME_OPENACC - Comment non-passing tests with the NVIDIA HPC compiler nvc++
if ((KOKKOS_ENABLE_OPENMPTARGET OR KOKKOS_ENABLE_OPENACC) AND KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC)
LIST(REMOVE_ITEM DEFAULT_DEVICE_SOURCES
-TestCompilerMacros.cpp
default/TestDefaultDeviceType_a1.cpp
default/TestDefaultDeviceType_b1.cpp
default/TestDefaultDeviceType_c1.cpp
@@ -845,6 +930,19 @@ if ((KOKKOS_ENABLE_OPENMPTARGET OR KOKKOS_ENABLE_OPENACC) AND KOKKOS_CXX_COMPILE
)
endif()

+# FIXME_OPENACC - Comment non-passing tests with the Clang compiler
+if (KOKKOS_ENABLE_OPENACC AND KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
+SET(DEFAULT_DEVICE_SOURCES
+TestCompilerMacros.cpp
+UnitTestMainInit.cpp
+TestInitializationSettings.cpp
+TestParseCmdLineArgsAndEnvVars.cpp
+default/TestDefaultDeviceType_d.cpp
+default/TestDefaultDeviceTypeResize.cpp
+default/TestDefaultDeviceTypeViewAPI.cpp
+)
+endif()
+
KOKKOS_ADD_EXECUTABLE_AND_TEST(
CoreUnitTest_Default
SOURCES ${DEFAULT_DEVICE_SOURCES}