Deadlock between usrsctp_conninput and usrsctp_close #711

Closed
JonathanLennox opened this issue Apr 22, 2024 · 1 comment

@JonathanLennox

In a stress test of usrsctp (the same test that was attached to #709), I saw a deadlock between usrsctp_close and usrsctp_conninput. Looking at the code, I suspect this could happen in the kernel implementation as well.

The issue is a lock-order inversion. sctp_common_input_processing acquires stcb->tcb_mtx (in sctp_findassociation_addr) and then, through the call stack sctp_process_data -> sctp_process_a_data_chunk -> sctp_add_to_readq, tries to acquire inp->inp_mtx. Meanwhile, sctp_close acquires inp->inp_mtx and then, in sctp_inpcb_free, tries to acquire stcb->tcb_mtx.

(Note: the line numbers shown in the crash are from #710, but nothing in that PR should have affected this deadlock.)

Excerpted gdb info:

(gdb) info threads
  Id   Target Id                                           Frame 
* 1    Thread 0xffffa8923020 (LWP 2157812) "crash_repro"   futex_wait (private=0, expected=2, futex_word=0xffff8c015c70)
    at ../sysdeps/nptl/futex-internal.h:146
...
  13   Thread 0xffffa253f120 (LWP 2165461) "crash_repro"   futex_wait (private=0, expected=2, futex_word=0xffff8c02e4e8)
    at ../sysdeps/nptl/futex-internal.h:146
(gdb) bt
#0  futex_wait (private=0, expected=2, futex_word=0xffff8c015c70) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0xffff8c015c70, private=private@entry=0) at ./nptl/lowlevellock.c:49
#2  0x0000ffffa868070c in lll_mutex_lock_optimized (mutex=0xffff8c015c70) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=mutex@entry=0xffff8c015c70) at ./nptl/pthread_mutex_lock.c:93
#4  0x0000ffffa88b0aa4 in sctp_inpcb_free (inp=inp@entry=0xffff8c02e140, immediate=immediate@entry=1, from=from@entry=1)
    at ../../usrsctplib/netinet/sctp_pcb.c:4083
#5  0x0000ffffa88b854c in sctp_close (so=so@entry=0xffff8c02b1e0) at ../../usrsctplib/netinet/sctp_usrreq.c:891
#6  0x0000ffffa8863a7c in sofree (so=0xffff8c02b1e0) at ../../usrsctplib/user_socket.c:287
#7  0x0000ffffa8867aa8 in usrsctp_close (so=<optimized out>) at ../../usrsctplib/user_socket.c:2005
#8  0x0000aaaab6c020c8 in close_socket (o=0xaaaada80e3e0) at crash_repro.c:164
#9  run_test (close_ns=close_ns@entry=198272357) at crash_repro.c:245
#10 0x0000aaaab6c014a8 in main () at crash_repro.c:284
#4  0x0000ffffa88b0aa4 in sctp_inpcb_free (inp=inp@entry=0xffff8c02e140, immediate=immediate@entry=1, from=from@entry=1)
    at ../../usrsctplib/netinet/sctp_pcb.c:4083
4083			SCTP_TCB_LOCK(stcb);
(gdb) p stcb->tcb_mtx
$1 = {__data = {__lock = 2, __count = 0, __owner = 2165461, __nusers = 1, __kind = 2, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\002\000\000\000\000\000\000\000\325\n!\000\001\000\000\000\002", '\000' <repeats 30 times>, __align = 2}
(gdb) thread 13
[Switching to thread 13 (Thread 0xffffa253f120 (LWP 2165461))]
#0  futex_wait (private=0, expected=2, futex_word=0xffff8c02e4e8) at ../sysdeps/nptl/futex-internal.h:146
146	../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) bt
#0  futex_wait (private=0, expected=2, futex_word=0xffff8c02e4e8) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0xffff8c02e4e8, private=private@entry=0) at ./nptl/lowlevellock.c:49
#2  0x0000ffffa868070c in lll_mutex_lock_optimized (mutex=0xffff8c02e4e8) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=mutex@entry=0xffff8c02e4e8) at ./nptl/pthread_mutex_lock.c:93
#4  0x0000ffffa88df50c in sctp_add_to_readq (inp=0xffff8c02e140, stcb=stcb@entry=0xffff8c015450, control=0xffff9801cd70, sb=0xffff8c02b298, 
    end=end@entry=1, inp_read_lock_held=inp_read_lock_held@entry=0, so_locked=so_locked@entry=0) at ../../usrsctplib/netinet/sctputil.c:5383
#5  0x0000ffffa887b63c in sctp_process_a_data_chunk (chk_type=<optimized out>, last_chunk=1, break_flag=<synthetic pointer>, abort_flag=0xffffa253e488, 
    high_tsn=0xffffa253e670, net=0xffff8c02dec0, chk_length=<optimized out>, offset=<optimized out>, m=0xffffa253e7f0, asoc=0xffff8c0154a8, 
    stcb=0xffff8c015450) at ../../usrsctplib/netinet/sctp_indata.c:2154
#6  sctp_process_data (mm=mm@entry=0xffffa253e7f0, iphlen=iphlen@entry=0, offset=offset@entry=0xffffa253e66c, length=length@entry=48, inp=0xffff8c02e140, 
    stcb=stcb@entry=0xffff8c015450, net=0xffff8c02dec0, high_tsn=high_tsn@entry=0xffffa253e670) at ../../usrsctplib/netinet/sctp_indata.c:2806
#7  0x0000ffffa888b1c8 in sctp_common_input_processing (mm=mm@entry=0xffffa253e7f0, iphlen=iphlen@entry=0, offset=<optimized out>, offset@entry=12, 
    length=length@entry=48, src=src@entry=0xffffa253e7f8, dst=dst@entry=0xffffa253e808, sh=0xffff9801a4e0, ch=0xffff9801a4ec, compute_crc=1 '\001', 
    ecn_bits=ecn_bits@entry=0 '\000', vrf_id=vrf_id@entry=0, port=port@entry=0) at ../../usrsctplib/netinet/sctp_input.c:6095
#8  0x0000ffffa8869860 in usrsctp_conninput (addr=<optimized out>, buffer=0xaaaada80e950, length=48, ecn_bits=ecn_bits@entry=0 '\000')
    at ../../usrsctplib/user_socket.c:3321
#9  0x0000aaaab6c01ad4 in input_packet_data (arg=0xffff8c02a2b0) at crash_repro.c:373
#10 0x0000ffffa867d5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442
#11 0x0000ffffa86e5edc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79
#4  0x0000ffffa88df50c in sctp_add_to_readq (inp=0xffff8c02e140, stcb=stcb@entry=0xffff8c015450, control=0xffff9801cd70, sb=0xffff8c02b298, 
    end=end@entry=1, inp_read_lock_held=inp_read_lock_held@entry=0, so_locked=so_locked@entry=0) at ../../usrsctplib/netinet/sctputil.c:5383
5383			SCTP_INP_READ_LOCK(inp);
(gdb) p inp->inp_mtx
$2 = {__data = {__lock = 1, __count = 0, __owner = 2157812, __nusers = 1, __kind = 2, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\001\000\000\000\000\000\000\000\364\354 \000\001\000\000\000\002", '\000' <repeats 30 times>, __align = 1}
(gdb) bt
@JonathanLennox

Unfortunately, it looks like this was in fact a consequence of #710: unexpectedly, the library appears to depend on the socket's reference count not going to zero during sctp_common_input_processing.
