Conversation

hjelmn (Member) commented Aug 24, 2015

No description provided.

hjelmn (Member, Author) commented Aug 24, 2015

This is the backtrace that hinted we may actually have a use for recursive locks.

#0  0x00002b885ee5a625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00002b885ee5be05 in abort () at abort.c:92
#2  0x00002b885fba15ac in opal_mutex_lock (m=0xba1d40) at ../../../../../opal/threads/mutex_unix.h:93
#3  0x00002b885fba2fc8 in mca_mpool_grdma_release_memory (mpool=0xbaa280, base=0x2b8869e0d000, size=995328) at ../../../../../opal/mca/mpool/grdma/mpool_grdma_module.c:437
#4  0x00002b885fb9fbc0 in mca_mpool_base_mem_cb (base=0x2b8869e0d000, size=995328, cbdata=0x0, from_alloc=true) at ../../../../opal/mca/mpool/base/mpool_base_mem_cb.c:71
#5  0x00002b885fa9b5dc in opal_mem_hooks_release_hook (buf=0x2b8869e0d000, length=995328, from_alloc=true) at ../../opal/memoryhooks/memory.c:131
#6  0x00002b885fb9794c in opal_memory_linux_free_ptmalloc2_munmap (start=0x2b8869e0d000, length=995328, from_alloc=1) at ../../../../../opal/mca/memory/linux/memory_linux_munmap.c:71
#7  0x00002b885fb9821e in new_heap (size=135168, top_pad=131072) at ../../../../../opal/mca/memory/linux/arena.c:555
#8  0x00002b885fb987fc in opal_memory_ptmalloc2_int_new_arena (size=336) at ../../../../../opal/mca/memory/linux/arena.c:752
#9  0x00002b885fb98746 in arena_get2 (a_tsd=0x2b885fe41680, size=336) at ../../../../../opal/mca/memory/linux/arena.c:717
#10 0x00002b885fb9b0ba in opal_memory_ptmalloc2_malloc (bytes=336) at ../../../../../opal/mca/memory/linux/malloc.c:3432
#11 0x00002b885fb99f15 in opal_memory_linux_malloc_hook (sz=336, caller=0x2b885fae8da6) at ../../../../../opal/mca/memory/linux/hooks.c:690
#12 0x00002b885fb11928 in btl_openib_malloc_hook (sz=336, caller=0x2b885fae8da6) at ../../../../../opal/mca/btl/openib/btl_openib_component.c:168
#13 0x00002b885fae8da6 in opal_malloc (size=336, file=0x2b885fbfda70 "../../../../../opal/class/opal_object.h", line=468) at ../../../opal/util/malloc.c:101
#14 0x00002b885fbb9549 in opal_obj_new (cls=0x2b885fe397e0) at ../../../../../opal/class/opal_object.h:468
#15 0x00002b885fbb93e9 in opal_obj_new_debug (type=0x2b885fe397e0, file=0x2b885fbfdc70 "../../../../../opal/mca/rcache/vma/rcache_vma_tree.c", line=114) at ../../../../../opal/class/opal_object.h:245
#16 0x00002b885fbb9f39 in mca_rcache_vma_new (vma_rcache=0xba1d00, start=47864862511104, end=47864870903807) at ../../../../../opal/mca/rcache/vma/rcache_vma_tree.c:114
#17 0x00002b885fbbae2c in mca_rcache_vma_tree_insert (vma_rcache=0xba1d00, reg=0xbb8f00, limit=0) at ../../../../../opal/mca/rcache/vma/rcache_vma_tree.c:407
#18 0x00002b885fbb90cb in mca_rcache_vma_insert (rcache=0xba1d00, reg=0xbb8f00, limit=0) at ../../../../../opal/mca/rcache/vma/rcache_vma.c:119
#19 0x00002b885fba2981 in mca_mpool_grdma_register (mpool=0xbaa280, addr=0x2b8868209040, size=8388608, flags=0, reg=0x2b8867e05b80) at ../../../../../opal/mca/mpool/grdma/mpool_grdma_module.c:295
#20 0x00002b885fb0fbd5 in mca_btl_openib_register_mem (btl=0xbaa670, endpoint=0xc3d060, base=0x2b8868209040, size=8388608, flags=2) at ../../../../../opal/mca/btl/openib/btl_openib.c:1718
#21 0x00002b885e8bf574 in mca_pml_ob1_rdma_btls (bml_endpoint=0xc3ddb0, base=0x2b8868209040 "", size=8388608, rdma_btls=0xbb5340) at ../../../../../ompi/mca/pml/ob1/pml_ob1_rdma.c:71
#22 0x00002b885e8bc919 in mca_pml_ob1_send_request_start_btl (sendreq=0xbb5000, bml_btl=0xc3df10) at ../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.h:425
#23 0x00002b885e8bca90 in mca_pml_ob1_send_request_start_seq (sendreq=0xbb5000, endpoint=0xc3ddb0, seqn=2) at ../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.h:471
#24 0x00002b885e8bd9e3 in mca_pml_ob1_send (buf=0x2b8868209040, count=8388608, datatype=0x601bc0, dst=1, tag=0, sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x6015c0) at ../../../../../ompi/mca/pml/ob1/pml_ob1_isend.c:246
#25 0x00002b885e750cc4 in PMPI_Send (buf=0x2b8868209040, count=8388608, type=0x601bc0, dest=1, tag=0, comm=0x6015c0) at psend.c:78
#26 0x0000000000400bff in runfunc ()
#27 0x00002b885ec129d1 in start_thread (arg=0x2b8867e06700) at pthread_create.c:301
#28 0x00002b885ef108fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
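
For context, frames #0-#2 show the debug-build check in opal_mutex_lock firing: the thread that already holds the rcache lock re-enters it through the malloc hook. A minimal standalone sketch (plain pthreads, not OMPI code; names illustrative) of why that re-entry aborts rather than blocks:

```c
/* Standalone sketch: a thread re-locking a non-recursive mutex it
 * already holds.  With an error-checking mutex the second lock fails
 * with EDEADLK instead of hanging, which is the condition the debug
 * build turns into the abort() seen in frames #0-#2 above. */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&lock, &attr);

    pthread_mutex_lock(&lock);                  /* e.g. held around the rcache */
    if (EDEADLK == pthread_mutex_lock(&lock)) { /* re-entry via the malloc hook */
        fprintf(stderr, "recursive acquisition detected\n");
        abort();
    }
    return 0;
}
```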

hjelmn (Member, Author) commented Aug 24, 2015

A better fix would be to not use the malloc hooks for internal allocations. That may take some time to implement, so this fix is being submitted as a stop-gap until it is ready.

hjelmn (Member, Author) commented Aug 24, 2015

@bosilca Thoughts?

hjelmn (Member, Author) commented Aug 24, 2015

I should also note that the second patch in this PR does indeed fix the hang with the openib btl and thread multiple.

hjelmn added the bug label Aug 24, 2015
hjelmn added a commit to hjelmn/ompi that referenced this pull request Aug 24, 2015
There were several issues preventing the openib btl from running in
thread multiple mode:

 - Missing locks in UDCM when generating a loopback endpoint. Fixed in
   open-mpi/ompi@8205d79.

 - Incorrect sequence numbers generated in debug mode. This did not
   prevent the openib btl from running but instead produced incorrect
   error messages in debug builds.

 - Recursive locking of the rcache lock caused by the malloc
   hooks. This is fixed by open-mpi#827

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
hjelmn mentioned this pull request Aug 24, 2015
hjelmn added this to the Open MPI v2.0.0 milestone Aug 24, 2015
bosilca (Member) commented Aug 24, 2015

We now have a unique, but valid, use case for recursive mutexes. This justifies their addition to the OMPI code base.

However, I wonder if the current lock/unlock in the mpool is as fine-grained as we would expect. Why is the mpool using the rcache lock to protect its own internal initialization?

hjelmn (Member, Author) commented Aug 24, 2015

@bosilca The lock protects the rcache. One function (mca_mpool_grdma_register) is updating the rcache, and further down the stack the other (mca_mpool_grdma_release_memory) is reading from it. At some point (before I joined the project) it was decided that the rcache code would not handle its own locking. Instead, the caller is expected to hold the rcache lock if it is using the rcache in a multi-threaded way, which is the case here.
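
A toy sketch of that caller-locks convention (hypothetical names, not the real OMPI structures): the cache does no locking of its own, so a caller that re-enters through the malloc hooks while holding the lock deadlocks itself.

```c
#include <pthread.h>

/* Toy cache following the caller-locks convention described above. */
typedef struct {
    pthread_mutex_t lock;  /* owned by the cache, but taken by callers */
    /* ... tree of registrations ... */
} toy_rcache_t;

/* No internal locking: assumes the caller already holds cache->lock. */
static void toy_rcache_insert(toy_rcache_t *cache, void *reg)
{
    (void) cache; (void) reg;  /* ... insert into the tree ... */
}

static void caller_register(toy_rcache_t *cache, void *reg)
{
    pthread_mutex_lock(&cache->lock);    /* as mca_mpool_grdma_register does */
    toy_rcache_insert(cache, reg);       /* if this allocates, the malloc    */
    pthread_mutex_unlock(&cache->lock);  /* hook may relock: self-deadlock   */
}
```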

rolfv commented Aug 25, 2015

It might be more correct to put an #if OPAL_ENABLE_DEBUG guard around the three debug variables: m_lock_debug, m_lock_file, and m_lock_line.
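
The guard would look something like this (struct name and layout abridged and illustrative, following the fields named above):

```c
struct opal_recursive_mutex_t {
    opal_object_t super;
    pthread_mutex_t m_lock_pthread;
#if OPAL_ENABLE_DEBUG
    /* Debug-only bookkeeping; compiled out of optimized builds. */
    int m_lock_debug;
    const char *m_lock_file;
    int m_lock_line;
#endif
};
```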

hjelmn (Member, Author) commented Aug 25, 2015

@rolfv Ah, yes. Will fix that.

bosilca (Member) commented Aug 25, 2015

@hjelmn that mutex protects more than just the rcache, as many of the mpool operations are also hidden behind it. If there was a decision to share the rcache mutex, then this was begging for the current issue to happen, as the mpool can indeed recurse in both single-threaded and multi-threaded cases (because allocating memory might trigger a call that releases some of the old mmapped memory arenas back to the OS).

Your patch (adding recursive mutexes) is correct as long as we assume that the libc munmap is called from the same thread as the malloc.

hjelmn force-pushed the recursive_locks branch 2 times, most recently from 895a5c6 to 5999fdf on August 25, 2015 23:12
hjelmn (Member, Author) commented Aug 25, 2015

@rolfv Fixed the optimized build.

hjelmn (Member, Author) commented Aug 26, 2015

@bosilca I will merge this fix and PR it to v2.x. When I have a chance I will evaluate the mpool/rcache locking and see what can be done to remove the cause of the recursive locking.

hjelmn added 2 commits August 26, 2015 10:01
This new class is the same as the opal_mutex_t class but has a
different constructor. This constructor adds the recursive flag to the
mutex attributes for the lock. This class can be used where there may
be re-entry into the lock from within the same thread.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
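
A minimal sketch (plain pthreads, without the OMPI class machinery) of what the constructor described in this commit message boils down to:

```c
#include <pthread.h>

static void recursive_mutex_construct(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    /* The one difference from a plain mutex constructor:
     * PTHREAD_MUTEX_RECURSIVE lets the owning thread re-acquire the
     * lock, keeping an acquisition count instead of deadlocking. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
}
```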
There is currently a path through the grdma mpool and vma rcache that
leads to deadlock. It happens during the rcache insert. Before the
insert the rcache mutex is locked. During the call a new vma item is
allocated and then inserted into the rcache tree. The allocation
currently goes through the malloc hooks, which may (and do) call back
into the mpool if the ptmalloc heap needs to be reallocated. This
callback tries to lock the rcache mutex, which leads to the
deadlock. This has been observed with multi-threaded tests and the
openib btl.

This change may lead to some minor slowdown in the rcache vma when
threading is enabled. This will only affect larger message paths in
some of the btls.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
bosilca (Member) commented Aug 26, 2015

@hjelmn the only viable solution is to avoid allocating any memory in the code protected by the mpool (in fact rcache) lock. Looking at the code, it seems difficult to avoid doing so if we leave the rcache_new (and therefore the OBJ_NEW) deep inside the rcache_insert. However, if we split the rcache usage in two, allocation and insertion, then in the multi-threaded case we would need 1) special protection inside the rcache for the manipulation of rcache objects and 2) special protection to avoid inserting the same element twice (with a sequence of find + new + insert from multiple threads).
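
A hypothetical sketch of that split (names illustrative, not the real rcache API): the allocation happens with no lock held, so the malloc hooks cannot re-enter the lock, and the re-check under the lock handles the find + new + insert race.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct vma_item { uintptr_t start, end; } vma_item_t;

/* Assumed to exist for the sketch; not the real rcache interface. */
extern pthread_mutex_t rcache_lock;
extern vma_item_t *vma_tree_find(uintptr_t start, uintptr_t end);
extern void vma_tree_insert(vma_item_t *item);

void rcache_insert_split(uintptr_t start, uintptr_t end)
{
    /* 1. Allocate with no lock held: a malloc-hook callback into the
     *    mpool can now take the rcache lock without deadlocking. */
    vma_item_t *item = malloc(sizeof(*item));
    if (NULL == item) {
        return;
    }
    item->start = start;
    item->end   = end;

    /* 2. The critical section is short and allocation-free; the
     *    re-check prevents two racing threads from inserting the
     *    same range twice. */
    pthread_mutex_lock(&rcache_lock);
    if (NULL == vma_tree_find(start, end)) {
        vma_tree_insert(item);
        item = NULL;            /* the tree owns it now */
    }
    pthread_mutex_unlock(&rcache_lock);

    free(item);                 /* no-op if we won the race */
}
```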

hjelmn (Member, Author) commented Aug 31, 2015

@bosilca Thanks, I will take a look at implementing a solution next month.

hjelmn added a commit that referenced this pull request Aug 31, 2015
Add support for recursive locks (revisited)
hjelmn merged commit 2aab6ad into open-mpi:master Aug 31, 2015
hjelmn added a commit to hjelmn/ompi-release that referenced this pull request Sep 1, 2015
There were several issues preventing the openib btl from running in
thread multiple mode:

 - Missing locks in UDCM when generating a loopback endpoint. Fixed in
   open-mpi/ompi@8205d79.

 - Incorrect sequence numbers generated in debug mode. This did not
   prevent the openib btl from running but instead produced incorrect
   error messages in debug builds. Fixed in
   open-mpi/ompi@c101385.

 - Recursive locking of the rcache lock caused by the malloc
   hooks. This is fixed by open-mpi/ompi#827

master commit open-mpi/ompi@64e4419

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
hjelmn added a commit to hjelmn/ompi that referenced this pull request Sep 8, 2015
bosilca pushed a commit to bosilca/ompi that referenced this pull request Sep 15, 2015
ggouaillardet pushed a commit to ggouaillardet/ompi-release that referenced this pull request Sep 28, 2015
hjelmn deleted the recursive_locks branch August 3, 2016 18:17
jsquyres pushed a commit to jsquyres/ompi that referenced this pull request Aug 23, 2016