Hi, I am trying to learn and use your MPI+Kokkos code. It worked for MPI+OpenMP in Kokkos but failed for MPI+CUDA. Here is what I got. Do you have any idea what's wrong with the CUDA backend? Thanks
I'm MPI task #3 (out of 4) pinned to GPU #0
(out of 4) pinned to GPU #0
We are about to start simulation with the following characteristics
Global resolution : 256 x 256 x 1
Local resolution : 128 x 128 x 1
MPI Cartesian topology : 2x2x1
[c196-012:10061:0:10061] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x2ab54d4c3e80)
[c196-012:10062:0:10062] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x2ac7db4c3e80)
[c196-012:10063:0:10063] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x2b64db4c3e80)
[c196-012:10064:0:10064] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x2b131d4c3e80)
==== backtrace (tid: 10064) ====
0 0x000000000004cb95 ucs_debug_print_backtrace() ???:0
1 0x000000000089f648 I_MPI_memcpy_movsb() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/i_mpi_memcpy_sse.h:11
2 0x000000000089f648 bdw_memcpy_write() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_memcpy.h:146
3 0x000000000089bce9 write_to_cell() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_memcpy.h:326
4 0x000000000089bce9 send_cell() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_send.h:890
5 0x00000000008959a4 MPIDI_POSIX_eager_send() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_send.h:1540
6 0x0000000000755399 MPIDI_POSIX_eager_send() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/posix_eager_impl.h:37
7 0x0000000000755399 MPIDI_POSIX_am_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/../posix/posix_am.h:220
8 0x0000000000755399 MPIDI_SHM_am_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/shm_am.h:49
9 0x0000000000755399 MPIDIG_isend_impl() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/generic/mpidig_send.h:116
10 0x000000000075870e MPIDIG_am_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/generic/mpidig_send.h:172
11 0x000000000075870e MPIDIG_mpi_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/generic/mpidig_send.h:233
12 0x000000000075870e MPIDI_POSIX_mpi_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/../posix/posix_send.h:59
13 0x000000000075870e MPIDI_SHM_mpi_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/shm_p2p.h:187
14 0x000000000075870e MPIDI_isend_unsafe() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_send.h:314
15 0x000000000075870e MPIDI_isend_safe() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_send.h:609
16 0x000000000075870e MPID_Isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_send.h:828
17 0x000000000075870e PMPI_Sendrecv() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpi/pt2pt/sendrecv.c:181
18 0x00000000004c453a hydroSimu::MpiComm::sendrecv() ???:0
19 0x0000000000491e9b euler_kokkos::SolverBase::transfert_boundaries_2d() ???:0
20 0x00000000004a1877 euler_kokkos::SolverBase::make_boundaries_mpi() ???:0
21 0x000000000044d786 euler_kokkos::muscl::SolverHydroMuscl<2>::make_boundaries() ???:0
22 0x0000000000445223 euler_kokkos::muscl::SolverHydroMuscl<2>::SolverHydroMuscl() ???:0
23 0x0000000000445d05 euler_kokkos::muscl::SolverHydroMuscl<2>::create() ???:0
24 0x00000000004152e8 euler_kokkos::SolverFactory::create() ???:0
25 0x00000000004116e6 main() ???:0
26 0x0000000000022555 __libc_start_main() ???:0
27 0x0000000000414fec _start() ???:0
Hi @wangyf,
I guess you are using Intel MPI.
Is there another MPI implementation you could try on your system?
From the log, it's hard to tell what's wrong, but can you tell whether your interconnect is PSM2 / Omni-Path by any chance?
On such systems, Kokkos::initialize (which internally calls cudaSetDevice) must be called before MPI_Init for proper initialization.