Description
Thank you for taking the time to submit an issue!
Background information
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
Open MPI: 4.0.1
Open MPI repo revision: v4.0.1
Open MPI release date: Mar 26, 2019
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Built from https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.1.tar.gz
portals4 was built from a git clone.
Configured with ../configure --prefix=`pwd`/../_install --with-portals4=/home/pt2/portals4/_install
Please describe the system on which you are running
- Operating system/version: Debian 8
- Computer hardware: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
- Network type: Ethernet
Details of the problem
When I launch an MPI program, it crashes immediately in MPI_Init.
The stack trace varies slightly between runs, but it always looks something like this:
[ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7fe7978538d0]
[ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7fe7974ce067]
[ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fe7974cf448]
[ 3] /home/pt2/portals4/_build/../_install/lib/libportals.so.4(+0x12497)[0x7fe786575497]
[ 4] /home/pt2/portals4/_build/../_install/lib/libportals.so.4(PtlMEUnlink+0x2d)[0x7fe78656f5dd]
[ 5] /home/pt2/openmpi-4.0.1/_build/../_install/lib/openmpi/mca_mtl_portals4.so(ompi_mtl_portals4_flowctl_fini+0x13)[0x7fe7867934e3]
[ 6] /home/pt2/openmpi-4.0.1/_build/../_install/lib/openmpi/mca_mtl_portals4.so(ompi_mtl_portals4_finalize+0x18)[0x7fe78678ba28]
[ 7] /home/pt2/openmpi-4.0.1/_install/lib/libmpi.so.40(mca_pml_base_select+0x376)[0x7fe797b1ed46]
[ 8] /home/pt2/openmpi-4.0.1/_install/lib/libmpi.so.40(ompi_mpi_init+0x692)[0x7fe797aab792]
[ 9] /home/pt2/openmpi-4.0.1/_install/lib/libmpi.so.40(MPI_Init+0x5b)[0x7fe797ad96ab]
[10] ./a.out[0x400858]
[11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fe7974bab45]
[12] ./a.out[0x400769]
*** End of error message ***
Interestingly, this happens even if I don't select portals4 as the BTL (the trace above goes through the portals4 MTL, mca_mtl_portals4.so). For example:
`which mpirun` --report-bindings -mca btl tcp,self -n 2 -H bold-node012,bold-node013 ./a.out
If I compile without portals4, this same command line launches the hello world program correctly.
If I run ./a.out without mpirun, it also crashes. gdb gives a more detailed stack trace:
#0 0x00007ffff7537067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff7538448 in __GI_abort () at abort.c:89
#2 0x00007fffe6fb94d3 in to_obj (type=POOL_ME, handle=0) at ../../../src/ib/ptl_obj.c:497
#3 0x00007fffe6fb09c0 in to_me (handle=0, me_p=0x7fffffffe8e0) at ../../../src/ib/ptl_me.h:78
#4 0x00007fffe6fb10f1 in PtlMEUnlink (me_handle=0) at ../../../src/ib/ptl_me.c:423
#5 0x00007fffe48de4e3 in ompi_mtl_portals4_flowctl_fini () from /home/pt2/openmpi-4.0.1/_build/../_install/lib/openmpi/mca_mtl_portals4.so
#6 0x00007fffe48d6a28 in ompi_mtl_portals4_finalize () from /home/pt2/openmpi-4.0.1/_build/../_install/lib/openmpi/mca_mtl_portals4.so
#7 0x00007ffff7b87d46 in mca_pml_base_select () from /home/pt2/openmpi-4.0.1/_build/../_install/lib/libmpi.so.40
#8 0x00007ffff7b14792 in ompi_mpi_init () from /home/pt2/openmpi-4.0.1/_build/../_install/lib/libmpi.so.40
#9 0x00007ffff7b426ab in PMPI_Init () from /home/pt2/openmpi-4.0.1/_build/../_install/lib/libmpi.so.40
#10 0x0000000000400858 in main (argc=1, argv=0x7fffffffec38) at hello.c:5
I noticed that portals4 sometimes fails a test when I run make check (maybe this issue), so it could be a portals4 problem.
I also tried configuring with --enable-btl-portals4-flow-control, but it does not seem to make a difference.