ADIOS2 GPU-aware breaks with Kokkos 4.0 #3566

anagainaru · 2023-03-29T12:41:24Z

Describe the bug
When building against Kokkos 4.0, GPU applications using ADIOS2 fail with runtime errors.

To Reproduce

Build Kokkos 4.0 with the following flags

    -D CMAKE_BUILD_TYPE=RelWithDebInfo
    -D CMAKE_CXX_COMPILER=/path/to/kokkos/bin/nvcc_wrapper
    -D Kokkos_ENABLE_CUDA=ON
    -D Kokkos_ENABLE_CUDA_LAMBDA=ON
    -D CMAKE_CXX_STANDARD=17
    -D CMAKE_CXX_EXTENSIONS=OFF
    -D CMAKE_POSITION_INDEPENDENT_CODE=TRUE
    -D BUILD_SHARED_LIBS=ON

    -D Kokkos_ARCH_AMPERE80=ON     # if building on Perlmutter
    -D Kokkos_ARCH_VOLTA70=ON      # if building on Summit
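
Since the architecture flag is the only part of the Kokkos configuration that differs between the two machines, it can be picked with a small helper. This is only a sketch: the function name `kokkos_arch_flag` and the lowercase machine names are illustrative assumptions, not part of Kokkos or ADIOS2.

```shell
#!/usr/bin/env bash
# Map a machine name to the Kokkos architecture flag listed above.
# kokkos_arch_flag and the machine-name spelling are hypothetical.
kokkos_arch_flag() {
  case "$1" in
    perlmutter) echo "-DKokkos_ARCH_AMPERE80=ON" ;;  # NVIDIA A100 nodes
    summit)     echo "-DKokkos_ARCH_VOLTA70=ON" ;;   # NVIDIA V100 nodes
    *)          echo "unknown machine: $1" >&2; return 1 ;;
  esac
}
```

The result can then be appended to the rest of the flags, for example `cmake <common flags> $(kokkos_arch_flag summit)`.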

Build ADIOS2 with Kokkos 4.0

    -DADIOS2_USE_Kokkos=ON 
    -DCMAKE_CXX_COMPILER=g++ 
    -DCMAKE_C_COMPILER=gcc 
    -DCMAKE_CUDA_FLAGS="-fPIC" 
    -DKokkos_ROOT=/path/to/kokkos/install

Run the CUDA example

$ ./bin/CudaBPWriteRead_cuda
Using engine BP5
terminate called after throwing an instance of 'std::runtime_error'
  what():  Kokkos::Impl::ParallelReduce< Cuda > requested too much L0 scratch memory
Aborted

Expected behavior

$ ./bin/CudaBPWriteRead_cuda
Using engine BP5
Simualation step 0 : 6000 elements: 0
Simualation step 1 : 6000 elements: 10
Simualation step 2 : 6000 elements: 20
Simualation step 3 : 6000 elements: 30
Simualation step 4 : 6000 elements: 40
Simualation step 5 : 6000 elements: 50
Simualation step 6 : 6000 elements: 60
Simualation step 7 : 6000 elements: 70
Simualation step 8 : 6000 elements: 80
Simualation step 9 : 6000 elements: 90

Desktop (please complete the following information):
Tested on Perlmutter

gcc/11.2.0
cudatoolkit/11.7
cmake/3.20.4

Tested on Summit

module load gcc/10.2
module load cuda/11.5
module load cmake/3.23

Additional context

Removing from source/adios2/CMakeLists.txt the following lines:

set_property(SOURCE helper/adiosKokkos.cpp PROPERTY LANGUAGE CUDA)
set_property(SOURCE helper/adiosKokkos.cpp APPEND PROPERTY COMPILE_FLAGS "--extended-lambda")

and building ADIOS2 with -D CMAKE_CXX_COMPILER=/path/to/kokkos/bin/nvcc_wrapper makes the code run correctly on Perlmutter and Summit.
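
For anyone who wants to try the same workaround without editing the file by hand, a rough sketch of the line removal is below. The helper name `drop_cuda_props` is made up for illustration, and it assumes the two lines quoted above are the only `set_property` calls for `helper/adiosKokkos.cpp`. It only prints the filtered file and never modifies it in place.

```shell
#!/usr/bin/env bash
# Print a CMakeLists.txt with every set_property(SOURCE helper/adiosKokkos.cpp ...)
# line removed. drop_cuda_props is a hypothetical helper, not an ADIOS2 tool.
drop_cuda_props() {
  # -F matches the pattern as a fixed string, -v keeps the non-matching lines
  grep -v -F 'set_property(SOURCE helper/adiosKokkos.cpp' "$1"
}
```

Something like `drop_cuda_props source/adios2/CMakeLists.txt > CMakeLists.filtered` lets you inspect the result before replacing the original, after which ADIOS2 is configured with nvcc_wrapper as the C++ compiler as described above.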


vicentebolea · 2023-05-02T18:01:57Z

@anagainaru using the v2.9.0 tag, I could build and run the example correctly using Kokkos 4.0.

The difference was in how I configured ADIOS2. I used the following flags:

Kokkos_DIR=`pwd`/Kokkos_cuda cmake -GNinja -S ADIOS2 -B ADIOS2_build \
    -DADIOS2_USE_Kokkos=ON \
    -DKokkos_COMPILE_LAUNCHER=`pwd`/Kokkos_cuda/bin/kokkos_launch_compiler \
    -DKokkos_NVCC_WRAPPER=`pwd`/Kokkos_cuda/bin/nvcc_wrapper \
    -DCMAKE_CUDA_ARCHITECTURES=70 \
    -DCMAKE_CXX_COMPILER=g++ \
    -DCMAKE_C_COMPILER=gcc

anagainaru · 2023-05-03T13:01:15Z

Thanks @vicentebolea! I will take a look at this by the end of the week, and if everything looks good we can lift the restriction on Kokkos 3.7 and update the documentation.

anagainaru · 2023-05-03T21:36:51Z

I tested this on Summit and I don't see any difference even if I use the new flags:

./bin/CudaBPWriteRead_cuda
Using engine BP5
terminate called after throwing an instance of 'std::runtime_error'
  what():  Kokkos::Impl::ParallelReduce< Cuda > requested too much L0 scratch memory
Aborted (core dumped)

How did you build Kokkos?

vicentebolea · 2023-05-03T21:38:40Z

There have been changes (updates) to the default modules on Summit. Did you encounter that error after a fresh install?

anagainaru · 2023-05-03T22:10:54Z

I imported the same modules as before (only ninja was loaded without specifying the version). Which modules did you use?

module load gcc/10.2
module load cuda/11.5
module load cmake/3.23
module load ninja

I used the master branch of Kokkos, so that is something else I can check (I will use the 4.0 release).

vicentebolea · 2023-05-03T22:14:28Z

I used Kokkos 4.0.00 (not the newer 4.0.01).
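
To double-check which Kokkos version an install actually is, the version integer can be decoded by hand. The sketch below assumes the encoding major*10000 + minor*100 + patch (so 40001 would be 4.0.01); the function name `decode_kokkos_version` is made up for illustration.

```shell
#!/usr/bin/env bash
# Decode a Kokkos version integer into major.minor.patch.
# Assumes the encoding major*10000 + minor*100 + patch.
decode_kokkos_version() {
  local v=$1
  printf '%d.%d.%02d\n' $((v / 10000)) $((v / 100 % 100)) $((v % 100))
}
```

The input would typically come from the `KOKKOS_VERSION` define, assuming the installed `KokkosCore_config.h` under `/path/to/kokkos/install/include` provides it.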

vicentebolea · 2023-05-03T22:25:12Z

Modules are:

   1) lsf-tools/2.0
   2) hsi/5.0.2.p5
   3) darshan-runtime/3.4.0-lite
   4) xalt/1.2.1
   5) DefApps
   6) git/2.36.1
   7) git-lfs/2.11.0
   8) tmux/3.2a
   9) gcc/10.2.0
  10) spectrum-mpi/10.4.0.3-20210112
  11) ninja/1.10.2
  12) nsight-compute/2021.2.1
  13) nsight-systems/2021.3.1.54
  14) cuda/11.5.2
  15) cmake/3.23.2

Kokkos options are:

//Whether to build CUDA backend
Kokkos_ENABLE_CUDA:BOOL=ON

//Whether to activate experimental relaxed constexpr functions
Kokkos_ENABLE_CUDA_CONSTEXPR:BOOL=OFF

//Whether to activate experimental lambda features
Kokkos_ENABLE_CUDA_LAMBDA:BOOL=ON

//Whether to use CUDA LDG intrinsics
Kokkos_ENABLE_CUDA_LDG_INTRINSIC:BOOL=OFF

//Whether to enable relocatable device code (RDC) for CUDA
Kokkos_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE:BOOL=OFF

//Whether to use unified memory (UM) for CUDA by default
Kokkos_ENABLE_CUDA_UVM:BOOL=OFF
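
To compare these options against another Kokkos build, the same entries can be pulled out of each build directory's CMakeCache.txt. This is a minimal sketch; `kokkos_cuda_options` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# List the Kokkos_ENABLE_CUDA* entries from a CMakeCache.txt, without the
# //comment lines, sorted so that two caches can be diffed.
kokkos_cuda_options() {
  grep -E '^Kokkos_ENABLE_CUDA[A-Z_]*:' "$1" | LC_ALL=C sort
}
```

In bash, `diff <(kokkos_cuda_options buildA/CMakeCache.txt) <(kokkos_cuda_options buildB/CMakeCache.txt)` would then show any option that differs (the `buildA` and `buildB` directory names are placeholders).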

vicentebolea · 2023-05-03T22:32:54Z

cmake -S kokkos-4.0.00/ -B Kokkos_build_cuda2/ \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DKokkos_ENABLE_SERIAL=ON \
    -DKokkos_ARCH_POWER9=ON \
    -DKokkos_ENABLE_CUDA=ON \
    -DKokkos_ENABLE_CUDA_LAMBDA=ON \
    -DKokkos_ARCH_VOLTA70=ON \
    -DCMAKE_CXX_STANDARD=17 \
    -DCMAKE_CXX_EXTENSIONS=OFF \
    -DCMAKE_POSITION_INDEPENDENT_CODE=TRUE

anagainaru assigned vicentebolea Mar 29, 2023

vicentebolea added the area: build Build issues label May 12, 2023

vicentebolea mentioned this issue May 16, 2023

ci: use nvcc_wrapper in adiosKokkos #3623

Merged

vicentebolea closed this as completed in #3623 May 17, 2023
