segmentation error for using the cuda backend #2

Open
wangyf opened this issue Apr 28, 2022 · 1 comment
wangyf commented Apr 28, 2022

Hi, I am trying to learn and use your MPI+Kokkos code. It works well with the MPI+OpenMP backend in Kokkos, but fails with MPI+CUDA. Here is what I got. Do you have a sense of what's wrong with the CUDA backend? Thanks.

I'm MPI task #3 (out of 4) pinned to GPU #0
(out of 4) pinned to GPU #0
We are about to start simulation with the following characteristics
Global resolution : 256 x 256 x 1
Local resolution : 128 x 128 x 1
MPI Cartesian topology : 2x2x1
[c196-012:10061:0:10061] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x2ab54d4c3e80)
[c196-012:10062:0:10062] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x2ac7db4c3e80)
[c196-012:10063:0:10063] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x2b64db4c3e80)
[c196-012:10064:0:10064] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x2b131d4c3e80)
==== backtrace (tid: 10064) ====
0 0x000000000004cb95 ucs_debug_print_backtrace() ???:0
1 0x000000000089f648 I_MPI_memcpy_movsb() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/i_mpi_memcpy_sse.h:11
2 0x000000000089f648 bdw_memcpy_write() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_memcpy.h:146
3 0x000000000089bce9 write_to_cell() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_memcpy.h:326
4 0x000000000089bce9 send_cell() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_send.h:890
5 0x00000000008959a4 MPIDI_POSIX_eager_send() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_send.h:1540
6 0x0000000000755399 MPIDI_POSIX_eager_send() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/posix_eager_impl.h:37
7 0x0000000000755399 MPIDI_POSIX_am_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/../posix/posix_am.h:220
8 0x0000000000755399 MPIDI_SHM_am_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/shm_am.h:49
9 0x0000000000755399 MPIDIG_isend_impl() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/generic/mpidig_send.h:116
10 0x000000000075870e MPIDIG_am_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/generic/mpidig_send.h:172
11 0x000000000075870e MPIDIG_mpi_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/generic/mpidig_send.h:233
12 0x000000000075870e MPIDI_POSIX_mpi_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/../posix/posix_send.h:59
13 0x000000000075870e MPIDI_SHM_mpi_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/shm_p2p.h:187
14 0x000000000075870e MPIDI_isend_unsafe() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_send.h:314
15 0x000000000075870e MPIDI_isend_safe() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_send.h:609
16 0x000000000075870e MPID_Isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_send.h:828
17 0x000000000075870e PMPI_Sendrecv() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpi/pt2pt/sendrecv.c:181
18 0x00000000004c453a hydroSimu::MpiComm::sendrecv() ???:0
19 0x0000000000491e9b euler_kokkos::SolverBase::transfert_boundaries_2d() ???:0
20 0x00000000004a1877 euler_kokkos::SolverBase::make_boundaries_mpi() ???:0
21 0x000000000044d786 euler_kokkos::muscl::SolverHydroMuscl<2>::make_boundaries() ???:0
22 0x0000000000445223 euler_kokkos::muscl::SolverHydroMuscl<2>::SolverHydroMuscl() ???:0
23 0x0000000000445d05 euler_kokkos::muscl::SolverHydroMuscl<2>::create() ???:0
24 0x00000000004152e8 euler_kokkos::SolverFactory::create() ???:0
25 0x00000000004116e6 main() ???:0
26 0x0000000000022555 __libc_start_main() ???:0
27 0x0000000000414fec _start() ???:0

=================================
==== backtrace (tid: 10063) ====
0 0x000000000004cb95 ucs_debug_print_backtrace() ???:0
1 0x000000000089f648 I_MPI_memcpy_movsb() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/i_mpi_memcpy_sse.h:11
2 0x000000000089f648 bdw_memcpy_write() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_memcpy.h:146
3 0x000000000089bce9 write_to_cell() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_memcpy.h:326
4 0x000000000089bce9 send_cell() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_send.h:890
5 0x00000000008959a4 MPIDI_POSIX_eager_send() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_send.h:1540
6 0x0000000000755399 MPIDI_POSIX_eager_send() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/posix_eager_impl.h:37
7 0x0000000000755399 MPIDI_POSIX_am_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/../posix/posix_am.h:220
8 0x0000000000755399 MPIDI_SHM_am_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/shm_am.h:49
9 0x0000000000755399 MPIDIG_isend_impl() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/generic/mpidig_send.h:116
10 0x000000000075870e MPIDIG_am_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/generic/mpidig_send.h:172
11 0x000000000075870e MPIDIG_mpi_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/generic/mpidig_send.h:233
12 0x000000000075870e MPIDI_POSIX_mpi_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/../posix/posix_send.h:59
13 0x000000000075870e MPIDI_SHM_mpi_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/shm_p2p.h:187
14 0x000000000075870e MPIDI_isend_unsafe() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_send.h:314
15 0x000000000075870e MPIDI_isend_safe() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_send.h:609
16 0x000000000075870e MPID_Isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_send.h:828
17 0x000000000075870e PMPI_Sendrecv() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpi/pt2pt/sendrecv.c:181
18 0x00000000004c453a hydroSimu::MpiComm::sendrecv() ???:0
19 0x0000000000491e9b euler_kokkos::SolverBase::transfert_boundaries_2d() ???:0
20 0x00000000004a1877 euler_kokkos::SolverBase::make_boundaries_mpi() ???:0
21 0x000000000044d786 euler_kokkos::muscl::SolverHydroMuscl<2>::make_boundaries() ???:0
22 0x0000000000445223 euler_kokkos::muscl::SolverHydroMuscl<2>::SolverHydroMuscl() ???:0
23 0x0000000000445d05 euler_kokkos::muscl::SolverHydroMuscl<2>::create() ???:0
24 0x00000000004152e8 euler_kokkos::SolverFactory::create() ???:0
25 0x00000000004116e6 main() ???:0
26 0x0000000000022555 __libc_start_main() ???:0
27 0x0000000000414fec _start() ???:0

pkestene (Owner) commented
Hi @wangyf,
I guess you are using Intel MPI.
Is there another MPI implementation you could try on your system?

From the log it is hard to tell what's wrong, but can you tell me whether your interconnect is PSM2 / Omni-Path, by any chance?
On such a system, Kokkos::initialize (which internally calls cudaSetDevice) must be called before MPI_Init for proper initialization.
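
For illustration, a minimal sketch of that initialization order (this is not the project's actual main; note that the per-rank GPU choice at this point cannot come from MPI_Comm_rank, since MPI is not initialized yet, so it would have to come from the launcher, e.g. a local-rank environment variable, which is an assumption here):

```cpp
// Sketch only: initialize Kokkos (and therefore CUDA) before MPI_Init.
#include <Kokkos_Core.hpp>
#include <mpi.h>
#include <cstdio>

int main(int argc, char* argv[])
{
  // Kokkos::initialize internally calls cudaSetDevice, so the CUDA
  // context exists before the MPI shared-memory transport starts up.
  // The MPI rank is not known yet, so any per-rank device selection
  // must rely on launcher-provided information, not MPI_Comm_rank.
  Kokkos::initialize(argc, argv);

  MPI_Init(&argc, &argv);

  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  std::printf("MPI rank %d initialized after Kokkos/CUDA\n", rank);

  // ... build and run the solver here ...

  // Finalize in reverse order of initialization.
  MPI_Finalize();
  Kokkos::finalize();
  return 0;
}
```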
