ADIOS2 GPU-aware breaks with Kokkos 4.0 #3566

Closed
anagainaru opened this issue Mar 29, 2023 · 8 comments · Fixed by #3623
Labels
area: build Build issues

Comments

@anagainaru
Contributor

Describe the bug
When ADIOS2 is built against Kokkos 4.0, GPU applications using ADIOS2 fail with runtime errors.

To Reproduce

  1. Build Kokkos 4.0 with the following flags
    -D CMAKE_BUILD_TYPE=RelWithDebInfo
    -D CMAKE_CXX_COMPILER=/path/to/kokkos/bin/nvcc_wrapper
    -D Kokkos_ENABLE_CUDA=ON
    -D Kokkos_ENABLE_CUDA_LAMBDA=ON
    -D CMAKE_CXX_STANDARD=17
    -D CMAKE_CXX_EXTENSIONS=OFF
    -D CMAKE_POSITION_INDEPENDENT_CODE=TRUE
    -D BUILD_SHARED_LIBS=ON

    -D Kokkos_ARCH_AMPERE80=ON     # if building on Perlmutter
    -D Kokkos_ARCH_VOLTA70=ON      # if building on Summit
  2. Build ADIOS2 against Kokkos 4.0 with the following flags
    -DADIOS2_USE_Kokkos=ON
    -DCMAKE_CXX_COMPILER=g++
    -DCMAKE_C_COMPILER=gcc
    -DCMAKE_CUDA_FLAGS="-fPIC"
    -DKokkos_ROOT=/path/to/kokkos/install
  3. Run the CUDA example
$ ./bin/CudaBPWriteRead_cuda
Using engine BP5
terminate called after throwing an instance of 'std::runtime_error'
  what():  Kokkos::Impl::ParallelReduce< Cuda > requested too much L0 scratch memory
Aborted

Expected behavior

$ ./bin/CudaBPWriteRead_cuda
Using engine BP5
Simualation step 0 : 6000 elements: 0
Simualation step 1 : 6000 elements: 10
Simualation step 2 : 6000 elements: 20
Simualation step 3 : 6000 elements: 30
Simualation step 4 : 6000 elements: 40
Simualation step 5 : 6000 elements: 50
Simualation step 6 : 6000 elements: 60
Simualation step 7 : 6000 elements: 70
Simualation step 8 : 6000 elements: 80
Simualation step 9 : 6000 elements: 90

Desktop
Tested on Perlmutter

gcc/11.2.0
cudatoolkit/11.7
cmake/3.20.4

Tested on Summit

module load gcc/10.2
module load cuda/11.5
module load cmake/3.23

Additional context

Removing the following lines from source/adios2/CMakeLists.txt:

set_property(SOURCE helper/adiosKokkos.cpp PROPERTY LANGUAGE CUDA)
set_property(SOURCE helper/adiosKokkos.cpp APPEND PROPERTY COMPILE_FLAGS "--extended-lambda")

and building ADIOS2 with -D CMAKE_CXX_COMPILER=/path/to/kokkos/bin/nvcc_wrapper makes the code run correctly on both Perlmutter and Summit.
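A minimal sketch of that workaround as shell commands. It assumes the ADIOS2 sources are checked out in ./ADIOS2, that GNU sed is available (for in-place editing with -i), and that no other lines in the file start with the same set_property(SOURCE helper/adiosKokkos.cpp prefix; the /path/to/kokkos paths are the placeholders from the report, and the build directory name ADIOS2_build is illustrative.

```shell
# Sketch of the workaround above; paths are placeholders, adjust to your setup.

# 1. Delete the two set_property(SOURCE helper/adiosKokkos.cpp ...) lines, which
#    mark adiosKokkos.cpp as a CUDA source and add --extended-lambda (GNU sed assumed).
sed -i '/set_property(SOURCE helper\/adiosKokkos.cpp/d' ADIOS2/source/adios2/CMakeLists.txt

# 2. Configure ADIOS2 with Kokkos' nvcc_wrapper as the C++ compiler.
cmake -S ADIOS2 -B ADIOS2_build \
  -DADIOS2_USE_Kokkos=ON \
  -DCMAKE_CXX_COMPILER=/path/to/kokkos/bin/nvcc_wrapper \
  -DKokkos_ROOT=/path/to/kokkos/install

# 3. Build.
cmake --build ADIOS2_build
```

With these lines gone, adiosKokkos.cpp is compiled as ordinary C++ through nvcc_wrapper, the same way Kokkos itself was built in step 1.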

@vicentebolea
Collaborator

@anagainaru Using the v2.9.0 tag, I was able to build and run the example correctly with Kokkos 4.0.

The difference was in how I configured ADIOS2. I used the following flags:

Kokkos_DIR=`pwd`/Kokkos_cuda cmake -GNinja -S ADIOS2 -B ADIOS2_build \
  -DADIOS2_USE_Kokkos=ON \
  -DKokkos_COMPILE_LAUNCHER=`pwd`/Kokkos_cuda/bin/kokkos_launch_compiler \
  -DKokkos_NVCC_WRAPPER=`pwd`/Kokkos_cuda/bin/nvcc_wrapper \
  -DCMAKE_CUDA_ARCHITECTURES=70 \
  -DCMAKE_CXX_COMPILER=g++ \
  -DCMAKE_C_COMPILER=gcc

@anagainaru
Contributor Author

Thanks @vicentebolea! I will look at this by the end of the week, and if everything looks good we can lift the Kokkos 3.7 restriction and update the documentation.

@anagainaru
Contributor Author

I tested this on Summit and see no difference, even with the new flags:

./bin/CudaBPWriteRead_cuda
Using engine BP5
terminate called after throwing an instance of 'std::runtime_error'
  what():  Kokkos::Impl::ParallelReduce< Cuda > requested too much L0 scratch memory
Aborted (core dumped)

How did you build Kokkos?

@vicentebolea
Collaborator

There have been changes (updates) to the default modules on Summit. Did you encounter that error after a fresh install?

@anagainaru
Contributor Author

I loaded the same modules as before (only ninja was loaded without specifying a version). Which modules did you use?

module load gcc/10.2
module load cuda/11.5
module load cmake/3.23
module load ninja

I used the Kokkos master branch, so that is something else I can check (I will try the 4.0 release).

@vicentebolea
Collaborator

I used Kokkos 4.0.00 (not the newly released 4.0.01).

@vicentebolea
Collaborator

Modules are:

  1) lsf-tools/2.0
  2) hsi/5.0.2.p5
  3) darshan-runtime/3.4.0-lite
  4) xalt/1.2.1
  5) DefApps
  6) git/2.36.1
  7) git-lfs/2.11.0
  8) tmux/3.2a
  9) gcc/10.2.0
  10) spectrum-mpi/10.4.0.3-20210112
  11) ninja/1.10.2
  12) nsight-compute/2021.2.1
  13) nsight-systems/2021.3.1.54
  14) cuda/11.5.2
  15) cmake/3.23.2

Kokkos options are:

//Whether to build CUDA backend
Kokkos_ENABLE_CUDA:BOOL=ON

//Whether to activate experimental relaxed constexpr functions
Kokkos_ENABLE_CUDA_CONSTEXPR:BOOL=OFF

//Whether to activate experimental lambda features
Kokkos_ENABLE_CUDA_LAMBDA:BOOL=ON

//Whether to use CUDA LDG intrinsics
Kokkos_ENABLE_CUDA_LDG_INTRINSIC:BOOL=OFF

//Whether to enable relocatable device code (RDC) for CUDA
Kokkos_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE:BOOL=OFF

//Whether to use unified memory (UM) for CUDA by default
Kokkos_ENABLE_CUDA_UVM:BOOL=OFF

@vicentebolea
Collaborator

cmake -S kokkos-4.0.00/ -B Kokkos_build_cuda2/ \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DKokkos_ENABLE_SERIAL=ON \
  -DKokkos_ARCH_POWER9=ON \
  -DKokkos_ENABLE_CUDA=ON \
  -DKokkos_ENABLE_CUDA_LAMBDA=ON \
  -DKokkos_ARCH_VOLTA70=ON \
  -DCMAKE_CXX_STANDARD=17 \
  -DCMAKE_CXX_EXTENSIONS=OFF \
  -DCMAKE_POSITION_INDEPENDENT_CODE=TRUE

2 participants