opal_memory_ptmalloc2_free() hanging when invalid address provided #1312

Closed
matcabral opened this issue Jan 19, 2016 · 2 comments
Comments

@matcabral
Contributor

While testing the PSM2 library, I found that a deadlock is triggered when free() is called with an invalid address. This is with OMPI 1.10.1. See the stack trace below. It looks like opal_memory_ptmalloc2_free() locks a mutex before calling opal_memory_ptmalloc2_int_free(). A segfault then occurs inside opal_memory_ptmalloc2_int_free(), and the signal handler (hfi_sighdlr) calls exit(). The library destructors run by exit() (_dl_fini() calling opal_class_finalize()) call opal_memory_ptmalloc2_free() again, and that second call waits forever on the mutex still held by the first call.

#0  0x00002afb2bc7e99d in nanosleep () from /lib64/libpthread.so.0
#1  0x00002afb2c569005 in opal_memory_ptmalloc2_free ()
   from /usr/mpi/gcc/openmpi-1.10.0-hfi/lib64/libopen-pal.so.13
#2  0x00002afb2c4ee87f in opal_class_finalize ()
   from /usr/mpi/gcc/openmpi-1.10.0-hfi/lib64/libopen-pal.so.13
#3  0x00002afb2b782b5a in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#4  0x00002afb2bec4e49 in __run_exit_handlers () from /lib64/libc.so.6
#5  0x00002afb2bec4e95 in exit () from /lib64/libc.so.6
#6  0x00002afb320fc2ca in hfi_sighdlr (sig=11, p1=<optimized out>, ucv=<optimized out>)
    at opa_debug.c:190
#7  <signal handler called>
#8  0x00002afb2c568b2a in opal_memory_ptmalloc2_int_free ()
   from /usr/mpi/gcc/openmpi-1.10.0-hfi/lib64/libopen-pal.so.13
#9  0x00002afb2c569053 in opal_memory_ptmalloc2_free ()
   from /usr/mpi/gcc/openmpi-1.10.0-hfi/lib64/libopen-pal.so.13
#10 0x00002afb320edadc in ips_free_epaddr (epaddr=0x2afb34755540) at ips_proto_connect.c:634
#11 ips_proto_disconnect (proto=proto@entry=0x1aeb180, force=force@entry=0, numep=numep@entry=5160,
    array_of_epaddr=array_of_epaddr@entry=0x2afb3ea8a290,
    array_of_epaddr_mask=array_of_epaddr_mask@entry=0x2afb3ea80130,
    array_of_errors=array_of_errors@entry=0x2afb3ea851e0, timeout_in=timeout_in@entry=52000000000)
    at ips_proto_connect.c:1439
#12 0x00002afb320e6e84 in ips_proto_fini (proto=proto@entry=0x1aeb180, force=0, timeout_in=52000000000)
    at ips_proto.c:641
#13 0x00002afb320e001f in ips_ptl_fini (ptl=0x1aeb040, force=<optimized out>, timeout_in=<optimized out>)
    at ptl.c:433
#14 0x00002afb320d36dd in __psm2_ep_close (ep=0x1aeac80, mode=0, timeout_in=52000000000) at psm_ep.c:1107
#15 0x00002afb31ec0b17 in ompi_mtl_psm2_finalize ()
   from /usr/mpi/gcc/openmpi-1.10.0-hfi/lib64/openmpi/mca_mtl_psm2.so
#16 0x00002afb2b9dcd12 in ompi_mpi_finalize ()
   from /usr/mpi/gcc/openmpi-1.10.0-hfi/lib64/libmpi.so.12
#17 0x0000000000400d42 in main ()
@matcabral changed the title from "opal_memory_ptmalloc2_int_free() hanging when invalid address provided" to "opal_memory_ptmalloc2_free() hanging when invalid address provided" on Jan 19, 2016
@rhc54 added this to the v1.10.2 milestone on Jan 20, 2016
@jsquyres
Member

@matcabral I think you should be using _exit() in your hfi_sighdlr() (instead of exit()). That might well resolve your issue...?

@matcabral
Contributor Author

Yes, thanks!
By definition, _exit() should solve the hang, and I haven't reproduced the hang so far. In any case, this is not an OMPI issue, so I'm closing this issue.

4 participants