Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash when endpoint has multiple worker threads and SIP TCP transport is disconnected during incoming call handling #1902

Closed
pjsipbot opened this issue Dec 28, 2015 · 0 comments

Comments

@pjsipbot
Copy link
Collaborator

2015-12-28 02:51:08: @nanangizz created the issue on trac ticket 1902

Scenario:

  1. Run pjsua with multiple worker threads, e.g: with --thread-cnt=2 param.
  2. Receive a call via TCP transport and disconnect the transport immediately (can be simulated by adding code to shutdown the transport in the incoming call callback pjsua_call_on_incoming()).
  3. The thread handling incoming call may crash at any point when accessing the dialog (usually still inside pjsua_call_on_incoming() function), as the dialog is already destroyed prematurely by other working thread after transport is disconnected.

Sample call stack (crash when trying to use dialog's pool while dialog is already destroyed):

pj_pool_alloc_from_block(pj_pool_block * block, unsigned __int64 size) Line 50	C
pj_pool_alloc(pj_pool_t * pool, unsigned __int64 size) Line 60	C
pj_strdup(pj_pool_t * pool, pj_str_t * dst, const pj_str_t * src) Line 40	C
pjsip_tpmgr_find_local_addr2(pjsip_tpmgr * tpmgr, pj_pool_t * pool, pjsip_tpmgr_fla2_param * prm) Line 1510	C
pjsua_acc_get_uac_addr(int acc_id, pj_pool_t * pool, const pj_str_t * dst_uri, pjsip_host_port * addr, pjsip_transport_type_e * p_tp_type, int * secure, const void * * p_tp) Line 3118	C
pjsua_call_on_incoming(pjsip_rx_data * rdata) Line 1389	C
...

During fixing this issue, we found a couple of quite major issues:

  1. Dialog being created by pjsip_dlg_create_uas() may be destroyed by other thread before the function returns, so application has no chance to lock the dialog. It may happen with this scenario:
    1. Initial INVITE is coming, pjsip_dlg_create_uas() is being executed to create UAS dialog.
    2. pjsip_dlg_create_uas() created transaction for the incoming message and registered the just created dialog.
    3. Transport disconnected event received by other thread, it triggers transaction to move its state to terminated.
    4. The just created dialog receives on_tsx_state() and get destroyed!

This is a quite fatal bug, so we decided to deprecate pjsip_dlg_create_uas() immediately and introduced pjsip_dlg_create_uas_and_inc_lock(). Application not affected by this bug, e.g: app with a single worker thread, can continue using pjsip_dlg_create_uas() by setting DEPRECATED_FOR_TICKET_1902 to zero in config_site.h.

  1. In case of transport error on multi worker threads application, there is a possibility of a transaction user receiving two on_tsx_state() callbacks concurrently, one for terminated state notification and another for destroyed state notification, this is due to delayed terminated state notification when transport error occurs (the delay was introduced to avoid deadlock, see Deadlock and crash problem in transaction related to transport #1646). For dialog, this can be harmful as one of pjsip_dlg_on_tsx_state() may destroy the dialog, while another pjsip_dlg_on_tsx_state() is waiting for dialog lock.

This ticket fixes this issue by serializing the transaction state notifications.

Thanks Itay Bianco for the report.


2016-02-05 04:18:33: @nanangizz edited the issue description


2016-02-05 04:31:27: @nanangizz commented

In r5241, fixed:

  • Crash when endpoint has multiple worker threads and SIP TCP transport is disconnected during incoming call handling.
  • Deprecated pjsip_dlg_create_uas(), replaced by pjsip_dlg_create_uas_and_inc_lock().
  • Serialized transaction state notifications (terminated and destroyed) in case of transport error.

2016-02-19 10:47:54: @nanangizz changed status from new to closed


2016-02-19 10:47:54: @nanangizz set resolution to fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants