[3.7.01] Fix initialization of Cuda lock arrays #5622

Merged

masterleinad
Contributor

@masterleinad commented Nov 7, 2022

Cherry-picking #5619 to fix #5596

…os_cuda_lock"

This reverts commit ee50e87, reversing
changes made to 023c3aa.
@masterleinad added the "Blocks Promotion" label (overview issue for release-blocking bugs) Nov 7, 2022
@masterleinad
Contributor Author

diff --git a/lib/kokkos/tpls/desul/include/desul/atomics/Lock_Array_Cuda.hpp b/lib/kokkos/tpls/desul/include/desul/atomics/Lock_Array_Cuda.hpp
index 2166fa3cb7..cc4d5a317b 100644
--- a/lib/kokkos/tpls/desul/include/desul/atomics/Lock_Array_Cuda.hpp
+++ b/lib/kokkos/tpls/desul/include/desul/atomics/Lock_Array_Cuda.hpp
@@ -137,7 +137,6 @@ namespace Impl {
 namespace {
 static int lock_array_copied = 0;
 inline int eliminate_warning_for_lock_array() { return lock_array_copied; }
-}  // namespace
 
 #ifdef __CUDACC_RDC__
 inline
@@ -156,7 +155,7 @@ static
   }
   lock_array_copied = 1;
 }
-
+}
 }  // namespace Impl
 }  // namespace des

still gives me

> ../build/lmp -in in.lj -k on g 1 -sf kk -pk kokkos
LAMMPS (3 Nov 2022)
KOKKOS mode is enabled (src/KOKKOS/kokkos.cpp:106)
  will use up to 1 GPU(s) per node
terminate called after throwing an instance of 'std::runtime_error'
  what():  Desul::Error: init_lock_arrays_cuda: post init kernel error(cudaErrorIllegalAddress): an illegal memory access was encountered
Aborted (core dumped)
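
For readers following the diff above: it moves the closing brace of the unnamed namespace so that the function ending in `lock_array_copied = 1;` sits in the same unnamed namespace as the flag it sets. Below is a minimal host-only sketch of that layout. It assumes nothing about the real desul code beyond what the diff shows: `demo`, `copy_lock_arrays_once`, and `copy_count` are hypothetical names, and the CUDA copy is replaced by a counter.

```cpp
// Host-only sketch; no CUDA required. Names other than lock_array_copied and
// eliminate_warning_for_lock_array are hypothetical.
namespace demo {
namespace Impl {
namespace {
// Internal linkage: each translation unit that includes this code gets its
// own copy of the flag.
static int lock_array_copied = 0;
inline int eliminate_warning_for_lock_array() { return lock_array_copied; }

// Stand-in for the real host-to-device copy (assumption: counts invocations).
static int copy_count = 0;

// Inside the unnamed namespace, this function also has internal linkage, so
// each translation unit pairs its own function with its own flag.
inline void copy_lock_arrays_once() {
  if (lock_array_copied == 0) {
    ++copy_count;  // the real code would copy lock-array pointers here
  }
  lock_array_copied = 1;
}
}  // namespace
}  // namespace Impl
}  // namespace demo
```

Calling `copy_lock_arrays_once()` twice performs the stand-in copy only once. As reported above, this change alone did not remove the `cudaErrorIllegalAddress` failure in the LAMMPS run.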

@ndellingwood
Contributor

@dalg24 @crtrott @ajpowelsnl is this PR still intended to go into the 3.7.01 patch and blocker for release?

@ajpowelsnl
Contributor

> @dalg24 @crtrott @ajpowelsnl is this PR still intended to go into the 3.7.01 patch and blocker for release?

My understanding is that it's still up for discussion -- I don't know if the problem is well understood yet. @masterleinad might be able to share additional insight.

@ndellingwood
Contributor

@ajpowelsnl thanks for the update. Once this is resolved and no longer a blocker for 3.7.01, we can move forward with prepping the release PRs and the Trilinos snapshot.

@masterleinad marked this pull request as ready for review November 29, 2022 22:33
@ajpowelsnl
Contributor

To confirm, this PR fixes #5596, and possibly also #5269? Also, this fix did not require reverting #4682, correct? Final question -- does this topic need further discussion in the meeting today?

@masterleinad
Contributor Author

Reproducer using LAMMPS:

git clone https://github.com/lammps/lammps.git
cd lammps
mkdir build
cd build
cmake -C ../cmake/presets/basic.cmake -C ../cmake/presets/kokkos-cuda.cmake -D BUILD_SHARED_LIBS=on -DPKG_KOKKOS=ON -DCMAKE_BUILD_TYPE=DEBUG -DKokkos_ARCH_VOLTA70=ON -DKokkos_ARCH_PASCAL60=OFF ../cmake/
make -j16
cd ../bench
../build/lmp -in in.lj -k on g 1 -sf kk -pk kokkos

@dalg24 merged commit f6a08d5 into kokkos:release-candidate-3.7.01 Nov 30, 2022
@dalg24 mentioned this pull request Dec 2, 2022
Labels
Blocks Promotion (overview issue for release-blocking bugs), Patch Release