Description
I rebased onto the head of master from a version from October and found that some of the multithreaded applications I use deadlock when the program is initialized with MPI_THREAD_MULTIPLE and more than one thread is communicating. (A single communicating thread is fine.)
I bisected, and the problem seems to start with commit 84f63d0 from @hjelmn. I dug a little deeper and found that the problem might be in opal_lifo_pop_atomic(). Below is the stack from a typical multithreaded ping-ping injection-rate test. It looks like the item_free field is always 1, even though this is the only thread trying to do the pop.
#0 opal_lifo_pop_atomic (lifo=0x7ffff7dda780 <mca_pml_base_recv_requests>) at ../../../../opal/class/opal_lifo.h:247
247 if (opal_atomic_swap_32((volatile int32_t *) &item->item_free, 1)) {
(gdb) bt
#0 opal_lifo_pop_atomic (lifo=0x7ffff7dda780 <mca_pml_base_recv_requests>) at ../../../../opal/class/opal_lifo.h:247
#1 0x00007fffe2f37002 in opal_free_list_get_mt (flist=0x7ffff7dda780 <mca_pml_base_recv_requests>) at ../../../../opal/class/opal_free_list.h:193
#2 0x00007fffe2f370ec in opal_free_list_get (flist=0x7ffff7dda780 <mca_pml_base_recv_requests>) at ../../../../opal/class/opal_free_list.h:222
#3 0x00007fffe2f38537 in mca_pml_ob1_recv (addr=0x7fffd00008c0, count=1, datatype=0x603360 <ompi_mpi_byte>, src=1, tag=2, comm=0x603160 <ompi_mpi_comm_world>, status=0x7fffdfa04dc0) at pml_ob1_irecv.c:121
#4 0x00007ffff7ae5e84 in PMPI_Recv (buf=0x7fffd00008c0, count=1, type=0x603360 <ompi_mpi_byte>, source=1, tag=2, comm=0x603160 <ompi_mpi_comm_world>, status=0x7fffdfa04dc0) at precv.c:79
#5 0x000000000040150a in thread_work (info=0x902fd0) at pairwise.c:193
#6 0x00007ffff7815e25 in start_thread () from /usr/lib64/libpthread.so.0
#7 0x00007ffff754334d in clone () from /usr/lib64/libc.so.6
Another application that has the same problem is GRID. The threaded-stencil benchmark deadlocks, but thread-test from ompi-tests seems to run without problems.
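The symptom in the backtrace (a pop that keeps seeing item_free == 1) can be modeled with a small standalone sketch. This is not the Open MPI code: it uses C11 atomics instead of the opal_atomic_* calls, and sketch_item_t, sketch_lifo_t, sketch_lifo_push(), sketch_lifo_pop(), clear_flag and max_tries are all hypothetical names made up for illustration. It only shows that a swap-based guard of this shape can never succeed if an item goes back on the list with its flag still set, which is one possible explanation for the hang, not a confirmed diagnosis.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical item type; only the field name item_free comes from the trace. */
typedef struct sketch_item {
    struct sketch_item *next;
    _Atomic int32_t item_free;   /* 0 = unclaimed, 1 = claimed by a popper */
} sketch_item_t;

/* Hypothetical LIFO with a single atomic head pointer. */
typedef struct {
    _Atomic(sketch_item_t *) head;
} sketch_lifo_t;

/* Push an item. clear_flag == false models a return path that forgets to
 * reset item_free (an assumption made to reproduce the symptom). */
static void sketch_lifo_push(sketch_lifo_t *lifo, sketch_item_t *item, bool clear_flag)
{
    assert(lifo != NULL && item != NULL);
    if (clear_flag) {
        atomic_store(&item->item_free, 0);
    }
    sketch_item_t *old = atomic_load(&lifo->head);
    do {
        item->next = old;
    } while (!atomic_compare_exchange_weak(&lifo->head, &old, item));
}

/* Pop with the same guard shape as line 247 in the trace: swap 1 into
 * item_free and retry if the old value was already 1. The real loop is
 * unbounded; max_tries exists only so the sketch terminates. */
static sketch_item_t *sketch_lifo_pop(sketch_lifo_t *lifo, int max_tries, int *tries_used)
{
    sketch_item_t *result = NULL;
    int tries = 0;

    while (tries < max_tries) {
        sketch_item_t *item = atomic_load(&lifo->head);
        if (item == NULL) {
            break;                        /* list is empty */
        }
        tries++;
        if (atomic_exchange(&item->item_free, 1)) {
            continue;                     /* flag already set: retry */
        }
        sketch_item_t *expected = item;
        if (atomic_compare_exchange_strong(&lifo->head, &expected, item->next)) {
            result = item;                /* popped; flag stays 1 while in use */
            break;
        }
        atomic_store(&item->item_free, 0); /* lost the race: release the claim */
    }
    if (tries_used != NULL) {
        *tries_used = tries;
    }
    return result;
}
```

With a single thread, pushing an item with clear_flag set to true and popping it succeeds on the first try. Pushing the same item back with clear_flag set to false makes every later pop attempt fail the swap, so an unbounded version of the loop would spin forever even though no other thread is involved.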
OS: Red Hat Scientific Linux release 7.3
Any suggestions?