Segfault in t_reply of Kamailio v5.2.2 rev. 9bc44f #2055

Closed
mshary opened this issue Sep 3, 2019 · 8 comments

mshary commented Sep 3, 2019

Description

Kamailio v5.2.2 rev. 9bc44f randomly crashes about once a week. The problem seems to be related to the TM and ASYNC modules, as seen in the backtrace of the core dump below.

Troubleshooting

None. We have a core dump file available for analysis.

Reproduction

Randomly, roughly once every 5 to 7 days.

Debugging Data

Program terminated with signal 11, Segmentation fault.
#0  0xb4ecdb33 in faked_env (t=0x7ad779f8, msg=0x0, is_async_env=1) at t_reply.c:813
813     t_reply.c: No such file or directory.

(gdb) info locals
__FUNCTION__ = "faked_env"

(gdb) list
808     in t_reply.c

(gdb) bt full
#0  0xb4ecdb33 in faked_env (t=0x7ad779f8, msg=0x0, is_async_env=1) at t_reply.c:813
        __FUNCTION__ = "faked_env"
#1  0xb4eedba2 in t_continue_helper (hash_index=48512, label=393930873, rtact=0xb5ba6184, cbname=0x0, cbparam=0x0) at t_suspend.c:337
        t = 0x7ad779f8
        backup_T = 0xffffffff
        backup_T_branch = -1
        faked_req = 0x7afdf1bc
        brpl = 0x0
        erpl = 0x8055fd2
        faked_req_len = 3176
        cancel_data = {cancel_bitmap = 3076137104, reason = {cause = 26896, u = {text = {s = 0xb51cc7e0 "\254\026\003", len = 134606304}, e2e_cancel = 0xb51cc7e0,
              packed_hdrs = {s = 0xb51cc7e0 "\254\026\003", len = 134606304}}}}
        branch = 0
        uac = 0x7ad77b64
        ret = 0
        cb_type = 2
        msg_status = -1256442488
        last_uac_status = -1245318100
        reply_status = 137942287
        do_put_on_wait = 134606304
        hdr = 0xb55987d8
        prev = 0x0
        tmp = 0x0
        route_type_bk = 1
        keng = 0x0
        evname = {s = 0xb4f31305 "tm:continue", len = 11}
        __FUNCTION__ = "t_continue_helper"
#2  0xb4ef0a2d in t_continue (hash_index=48512, label=393930873, route=0xb5ba6184) at t_suspend.c:583
No locals.
#3  0xb43ceac3 in async_exec_task (param=0x7a94bf54) at async_sleep.c:233
        atp = 0x7a94bf54
        keng = 0x0
        cbname = {s = 0x0, len = 0}
        evname = {s = 0xb43d1a94 "async:task-exec", len = 15}
        __FUNCTION__ = "async_exec_task"
#4  0x080894c6 in async_task_run (idx=4) at core/async_task.c:282
        ptask = 0x7a94bf4c
        received = 4
        __FUNCTION__ = "async_task_run"
#5  0x08087ca1 in async_task_child_init (rank=0) at core/async_task.c:185
        pid = 0
        i = 3
        __FUNCTION__ = "async_task_child_init"
#6  0x08201d26 in init_child (rank=0) at core/sr_module.c:871
        ret = 62
#7  0x0806b3d9 in main_loop () at main.c:1737
        i = 4
        pid = 5982
        si = 0x0
        si_desc = "udp receiver child=3 sock=85.118.42.43:5080\000\000\002a\267гn\267p\037\000\000\240\263n\267\364\237n\267\240\263n\267`\025\316\b\220W\371\277\315\002a\267\220\000\000\000`\025\316\b\364\237n\267\025\000\000\000\060X\371\277\355\ng\267\060/\316\b0/\316\b@q\245\265\000@\000\000@\220n\267\000\000\000"
        nrprocs = 4
        woneinit = 1
        __FUNCTION__ = "main_loop"
#8  0x08071e84 in main (argc=13, argv=0xbff95be4) at main.c:2696
        cfg_stream = 0x8bd3008
        c = -1
        r = 0
        tmp = 0xbff96822 ""
        tmp_len = -1218858976
        port = 2209
        proto = 1
        options = 0x83779a8 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"
        ret = -1
        seed = 1217010649
        rfd = 4
        debug_save = 0
        debug_flag = 0
        dont_fork_cnt = 0
        n_lst = 0xbff95be4
        p = 0x805e474 "[\201\303\060\037@"
        st = {st_dev = 14, __pad1 = 0, st_ino = 10873, st_mode = 16832, st_nlink = 2, st_uid = 0, st_gid = 0, st_rdev = 0, __pad2 = 0, st_size = 120,
          st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1559687937, tv_nsec = 2808493}, st_mtim = {tv_sec = 1566372756, tv_nsec = 587274417}, st_ctim = {
            tv_sec = 1567450779, tv_nsec = 139516276}, __unused4 = 0, __unused5 = 0}
        __FUNCTION__ = "main"

Log Messages

Sep  3 03:02:03 webrtc-as1 kernel: [7785063.653846] kamailio[5987]: segfault at 318 ip b4ecdb33 sp bff94f90 error 6 in tm.so[b4e15000+12a000]

SIP Traffic

Not available

Possible Solutions

N/A

Additional Information

  • Kamailio Version - output of kamailio -v
version: kamailio 5.2.2 (i386/linux) 9bc44f
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144 MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 9bc44f -dirty
compiled on 11:33:22 Apr 29 2019 with gcc 4.7.2
  • Operating System:
Linux webrtc-as1 3.16.0-0.bpo.4-686-pae #1 SMP Debian 3.16.36-1+deb8u2~bpo70+1 (2016-10-19) i686 GNU/Linux

miconda commented Sep 3, 2019

Can you get the output of the following gdb commands:

frame 0
p *t


mshary commented Sep 3, 2019

Here it is,

(gdb) frame 0
#0  0xb4ecdb33 in faked_env (t=0x7ad779f8, msg=0x0, is_async_env=1) at t_reply.c:813
813     in t_reply.c
(gdb) p *t
$1 = {next_c = 0x0, prev_c = 0x0, hash_index = 0, label = 0, flags = 0, nr_of_outgoings = 0, fcount = 0, ref_count = {val = 0}, from = {s = 0x0, len = 0}, callid = {
    s = 0x0, len = 0}, cseq_n = {s = 0x0, len = 0}, to = {s = 0x0, len = 0}, method = {s = 0x0, len = 0}, tmcb_hl = {first = 0x0, reg_types = 0}, wait_timer = {
    next = 0x0, prev = 0x0, expire = 0, initial_timeout = 0, data = 0x0, f = 0, flags = 0, slow_idx = 0}, wait_start = 0, uas = {request = 0x0, end_request = 0x0,
    response = {rbtype = 0, flags = 0, t_active = 0, branch = 0, buffer_len = 0, buffer = 0x0, my_T = 0x0, timer = {next = 0x0, prev = 0x0, expire = 0,
        initial_timeout = 0, data = 0x0, f = 0, flags = 0, slow_idx = 0}, dst = {send_sock = 0x0, to = {s = {sa_family = 0, sa_data = '\000' <repeats 13 times>},
          sin = {sin_family = 0, sin_port = 0, sin_addr = {s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {sin6_family = 0, sin6_port = 0,
            sin6_flowinfo = 0, sin6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}},
            sin6_scope_id = 0}}, id = 0, proto = 0 '\000', send_flags = {f = 0, blst_imask = 0}}, retr_expire = 0, fr_expire = 0}, local_totag = {s = 0x0, len = 0},
    cancel_reas = 0x0, status = 0}, uac = 0x0, async_backup = {backup_route = 0, backup_branch = 0, blind_uac = 0, ruri_new = 0}, fwded_totags = 0x0,
  uri_avps_from = 0x0, uri_avps_to = 0x0, user_avps_from = 0x0, user_avps_to = 0x0, domain_avps_from = 0x0, domain_avps_to = 0x0, xavps_list = 0x0, reply_mutex = {
    val = 0}, reply_locker_pid = {val = 0}, reply_rec_lock_level = 0, fr_timeout = 0, fr_inv_timeout = 0, rt_t1_timeout_ms = 0, rt_t2_timeout_ms = 0, end_of_life = 0,
  relayed_reply_branch = 0, on_failure = 0, on_branch_failure = 0, on_reply = 0, on_branch = 0, on_branch_delayed = 0,
  md5 = 0x7ad77b44 "92164e4777075c44c4f5de28a2b6f050\377\377\377\377"}


miconda commented Sep 4, 2019

It seems to be a removed transaction structure. Can the async operation take a long time before it resumes the transaction processing? What kind of operations are done in this async execution?


mshary commented Sep 4, 2019

The async operation is basically related to the chat history setup on our production server. It fetches chat sessions and relevant info from the backend (Redis + MySQL or Elasticsearch) using mod_perl.

According to our load test results, the whole operation takes roughly 500 to 800 ms (under 1 second) in normal conditions, but may take up to 5000 ms (5 seconds) in case of a backend service (MySQL or ES) failure.

The server logs show that the transaction in question was started at 03:02:00 CET on Sept. 3, 2019 and Kamailio crashed at 03:02:03 CET; there was a backend failure at the time (Redis was unreachable).
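
For context, the resume path in the backtrace corresponds to this general pattern (a minimal sketch; the route name and the use of async_task_route() from the async module are my assumptions, since our actual routing script is not shown here):

    # minimal sketch of the suspend/resume pattern implied by the backtrace;
    # route name and the async_task_route() entry point are my assumptions
    # (tm, app_perl, etc. are loaded elsewhere)
    async_workers=4     # core workers running async_task_run() (frame #4)
    loadmodule "async.so"

    request_route {
        if ($rm == "MESSAGE") {
            # suspends the transaction; an async worker resumes it later
            # through t_continue() (frame #2 of the backtrace)
            async_task_route("CHAT_HISTORY");
        }
    }

    route[CHAT_HISTORY] {
        # runs in the async worker; can block for several seconds when the
        # backend (Redis/MySQL/ES) is failing, as it was before the crash
        perl_exec("get_chat_sessions", $fU);
    }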


miconda commented Sep 4, 2019

Do you still have the SIP traffic for that transaction (requests/replies received/sent by Kamailio)? If yes, you can send me the pcap and I will try to match it with the code execution.


mshary commented Sep 13, 2019

Hi,

Sorry for the late reply, I was travelling and didn't get a chance to look at this issue.

Anyway, no, we do not have a SIP trace. I only have syslogs, which indicate that the Redis connectivity failed right before the crash.

Sep  3 03:02:03 webrtc-service kamailio[5987]: ERROR: app_perl [kamailioxs.xs:1041]: XS_Kamailio__Message_log(): perl error: Could not connect to Redis server at 172.16.x.x:6379: Connection timed out at /usr/share/perl5/Redis.pm line 420#012#011Redis::__build_sock('Redis=HASH(0x93a1948)') called at /usr/share/perl5/Redis.pm line 407#012#011eval {...} called at /usr/share/perl5/Redis.pm line 407#012#011Redis::__connect('Redis=HASH(0x93a1948)') called at /usr/share/perl5/Redis.pm line 65#012#011Redis::new('Redis', 'server', '172.16.x.x:6379', 'reconnect', 2, 'every', 100, 'encoding', undef, ...) called at /usr/local/kamailio/etc/kamailio/chat_history.pl line 2272#012#011main::get_chat_sessions('Kamailio::Message=SCALAR(0x8cddb64)', 502100012130) called at /usr/local/kamailio/etc/kamailio/chat_history.pl line 0#012#011eval{...} called at /usr/local/kamailio/etc/kamailio/chat_history.pl line 0#012#011...propagated at /usr/share/perl5/Redis.pm line 410.#012
Sep  3 03:02:03 webrtc-service kernel: [7785063.653846] kamailio[5987]: segfault at 318 ip b4ecdb33 sp bff94f90 error 6 in tm.so[b4e15000+12a000]


mshary commented Sep 13, 2019

Here is the code fragment that was executing in the async process when the crash happened:

            xlog("L_NOTICE", "[$pr:$fU@$si:$sp]: Found '$avp(rows)' db records for SIP SCI command '$avp(sip_sci_cmd)' \n");
	    append_to_reply("X-Message-Count: $avp(rows)\r\n");
            $avp(reason) = "Accepted: sending " + $avp(rows) + " messages";
            send_reply("201", "$avp(reason)");
...
            perl_exec("get_chat_sessions", $fU);
            xlog("L_INFO", "[$pr:$fU@$si:$sp]: Updated Chat History Summary: $avp(new_body) \n");

            #### self message with json body ####
            $uac_req(all)=$null;
            $uac_req(method)="MESSAGE";
            $uac_req(furi)=$fu;
            $uac_req(turi)=$fu;
            #$uac_req(evroute)=1;
            $uac_req(callid)=$uuid(r);
            $uac_req(hdrs)="Content-Type: application/json\r\nDate: " + $TF + "\r\n";
            $uac_req(body)=$avp(new_body);

            $uac_req(ruri)=$avp(sender_ruri);
            $uac_req(ouri)=$avp(sender_duri);
            $uac_req(sock)=$avp(sender_sock);

            uac_req_send();

Note that the variable $avp(new_body) is set by the Perl script. One of my developers has a theory that, since Redis failed to connect, this variable was never set, and the crash happened when UAC tried to set NULL as the JSON body of the SIP MESSAGE request. However, the core dump attached to this ticket does not confirm that.

This execution resulted in the following syslog messages:

Sep  3 02:59:56 webrtc-service kamailio[5987]: NOTICE: <script>: [wss:xxxxxxxx@x.x.x.x:52025]: Found '26' db records for SIP SCI command 'chat_summary'
Sep  3 03:02:03 webrtc-service kamailio[5987]: ERROR: app_perl [kamailioxs.xs:1041]: XS_Kamailio__Message_log(): perl error: Could not connect to Redis server at 172.16.x.x:6379: Connection timed out at /usr/share/perl5/Redis.pm line 420#012#011Redis::__build_sock('Redis=HASH(0x93a1948)') called at /usr/share/perl5/Redis.pm line 407#012#011eval {...} called at /usr/share/perl5/Redis.pm line 407#012#011Redis::__connect('Redis=HASH(0x93a1948)') called at /usr/share/perl5/Redis.pm line 65#012#011Redis::new('Redis', 'server', '172.16.x.x:6379', 'reconnect', 2, 'every', 100, 'encoding', undef, ...) called at /usr/local/kamailio/etc/kamailio/chat_history.pl line 2272#012#011main::get_chat_sessions('Kamailio::Message=SCALAR(0x8cddb64)', 502100012130) called at /usr/local/kamailio/etc/kamailio/chat_history.pl line 0#012#011eval{...} called at /usr/local/kamailio/etc/kamailio/chat_history.pl line 0#012#011...propagated at /usr/share/perl5/Redis.pm line 410.#012
Sep  3 03:02:03 webrtc-service kernel: [7785063.653846] kamailio[5987]: segfault at 318 ip b4ec
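
If that NULL-body theory were right, a defensive check before uac_req_send() would rule it out (a hypothetical guard, not present in our config):

    # hypothetical guard, not in the original config: skip the UAC request
    # when the Perl script never populated the body (e.g. on a Redis failure)
    if ($avp(new_body) == $null) {
        xlog("L_ERR", "[$pr:$fU@$si:$sp]: chat history body missing, skipping MESSAGE \n");
        exit;
    }
    $uac_req(body) = $avp(new_body);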

Let me know if you have any further queries.

miconda added a commit that referenced this issue Oct 9, 2019
- transaction is expired in that moment, pending its destroy process
- GH #2055

miconda commented Oct 9, 2019

My expectation is that the crash happened because the transaction was resumed more or less at the moment it was about to be destroyed. I pushed a patch to prevent that from happening; I will backport it. If you still get the crash with it, reopen.
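
Conceptually, the guard is of this shape (a simplified sketch of the intent, not the literal diff; cell_still_matches() is a hypothetical helper for illustration, see the referenced commit for the actual change):

    /* simplified sketch, NOT the literal patch (see commit 52afc7b for the
     * real change); it lives inside the tm module, where h_table.h defines
     * struct cell. In the core dump, t_continue() resolved hash_index=48512
     * and label=393930873, yet the cell printed in frame #0 has hash_index=0
     * and label=0, i.e. it was already reset while pending destruction. */
    static int cell_still_matches(struct cell *t,
            unsigned int hash_index, unsigned int label)
    {
        return t != NULL && t->hash_index == hash_index && t->label == label;
    }

    /* in t_continue_helper() (t_suspend.c), before faking the environment: */
    if (!cell_still_matches(t, hash_index, label)) {
        LM_WARN("transaction [%u:%u] expired while suspended - not resuming\n",
                hash_index, label);
        return -1;
    }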

miconda closed this as completed Oct 9, 2019
miconda added a commit that referenced this issue Oct 9, 2019
- transaction is expired in that moment, pending its destroy process
- GH #2055

(cherry picked from commit 52afc7b)
miconda added a commit that referenced this issue Oct 9, 2019
- transaction is expired in that moment, pending its destroy process
- GH #2055

(cherry picked from commit 52afc7b)
(cherry picked from commit c44bd17)
miconda added a commit that referenced this issue Jan 28, 2020
- transaction is expired in that moment, pending its destroy process
- GH #2055

(cherry picked from commit 52afc7b)
(cherry picked from commit c44bd17)
(cherry picked from commit f36d656)