-
Notifications
You must be signed in to change notification settings - Fork 279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deadlock in asconf timer event #653
Conversation
Everytime a new IP address appears as source or destination address: when receiving INIT, INIT ACK, ASCONF or binding to a new address; such address is appended to the addr_wq list and a SCTP_TIMER_TYPE_ADDR_WQ is started. The SCTP_TIMER_TYPE_ADDR_WQ timer iterates over all IPs from addr_wq, over all endpoints and over all associations in these endpoints and matches changes of IP address list with network structures in associations. List of all endpoints is protected by the global mutex: ipi_ep_mtx (A). List addr_wq is protected by the global mutex: wq_addr_mtx (B). Everytime a timer event is run a proper mutex protecting structure that the event concerns is locked for the time of handling an event. All timer events except for SCTP_TIMER_TYPE_ADDR_WQ are related to an endpoint or an association, so the proper endpoint or association locks are used. However in case of SCTP_TIMER_TYPE_ADDR_WQ a global wq_addr_mtx is used. It protects the addr_wq that event handler iterates over. So the timer thread running the ADDR_WQ event locked wq_addr_mtx (B) and inside the handler locked also ipi_ep_mtx (A) for iterating over endpoints. Meanwhile any other thread processing INIT, INIT ACK, ASCONF or binding to a new address locked ipi_ep_mtx (A) for making changes in global endpoint structures and during that changes locked wq_addr_mtx (B) for adding new IPs to the wq_addr queue. This caused a typical deadlock, where threads used locks in following order: * timer thread: B, A; * other threads: A, B. This patch fixes the issue by reducing the scope of wq_addr_mtx (B) during processing of SCTP_TIMER_TYPE_ADDR_WQ event only to places which uses addr_wq structure. This way lock wq_addr_mtx (B) is not used when ipi_ep_mtx (A) is required. Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
I don't think this is done when receiving an INIT, INIT ACK, or ASCONF chunk. But when a local address is added to / removed from an end point or to the stack.
Can you point me to a code path / call sequence, where the sequence A, B happens?
|
The deadlock doesn't appear in the current version of usrsctp. It happened on former version with: The external thread calling usrsctp_bind was:
At the same time the timer thread was processing SCTP_TIMER_TYPE_ADDR_WQ timer event by sctp_timeout_handler() and was:
but the PR does not longer apply as in the current version of usrsctp the sctp_inpcb_bind() function does not call sctp_add_addr_to_vrf() anymore. |
Thanks for double checking and letting me know. |
Everytime a new IP address appears as source or destination address:
when receiving INIT, INIT ACK, ASCONF or binding to a new address;
such address is appended to the addr_wq list and a SCTP_TIMER_TYPE_ADDR_WQ
is started.
The SCTP_TIMER_TYPE_ADDR_WQ timer iterates over all IPs from addr_wq,
over all endpoints and over all associations in these endpoints
and matches changes of IP address list with network structures in associations.
List of all endpoints is protected by the global mutex: ipi_ep_mtx (A).
List addr_wq is protected by the global mutex: wq_addr_mtx (B).
Everytime a timer event is run a proper mutex protecting structure
that the event concerns is locked for the time of handling an event.
All timer events except for SCTP_TIMER_TYPE_ADDR_WQ are related to an endpoint
or an association, so the proper endpoint or association locks are used.
However in case of SCTP_TIMER_TYPE_ADDR_WQ a global wq_addr_mtx is used.
It protects the addr_wq that event handler iterates over. So the timer thread
running the ADDR_WQ event locked wq_addr_mtx (B) and inside the handler
locked also ipi_ep_mtx (A) for iterating over endpoints.
Meanwhile any other thread processing INIT, INIT ACK, ASCONF
or binding to a new address locked ipi_ep_mtx (A) for making changes
in global endpoint structures and during that changes locked wq_addr_mtx (B)
for adding new IPs to the wq_addr queue.
This caused a typical deadlock, where threads used locks in following order:
This patch fixes the issue by reducing the scope of wq_addr_mtx (B)
during processing of SCTP_TIMER_TYPE_ADDR_WQ event only to places which
uses addr_wq structure. This way lock wq_addr_mtx (B) is not used
when ipi_ep_mtx (A) is required.
Signed-off-by: Lukasz Wojciechowski l.wojciechow@partner.samsung.com