Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault when processing select variable via kemi #1829

Closed
towe75 opened this issue Feb 1, 2019 · 13 comments
Closed

Segfault when processing select variable via kemi #1829

towe75 opened this issue Feb 1, 2019 · 13 comments

Comments

@towe75
Copy link
Contributor

towe75 commented Feb 1, 2019

Hi,

we have crashes on some of our kamailio hosts:

/usr/sbin/kamailio[1833]: CRITICAL: <core> [core/pass_fd.c:277]: receive_fd(): EOF on 9
/usr/sbin/kamailio[1820]: ALERT: <core> [main.c:738]: handle_sigs(): child process 1831 exited by a signal 6
/usr/sbin/kamailio[1820]: ALERT: <core> [main.c:741]: handle_sigs(): core was not generated
/usr/sbin/kamailio[1820]: INFO: <core> [main.c:764]: handle_sigs(): terminating due to SIGCHLD

Problem

Kamailio segfaults when processing tcp replies.

Config file and use case

We're running a somewhat complex Kemi/python configuration. It routes internal udp traffic from Asterisk to PSTN providers with various protocols (udp/tcp/tls).

I was able to isolate the problem on a test rig to a simple test setup. It happens if kamailio uses a outgoing tcp socket, udp is not affected.
The segfault occurs always after approx. 12-13 register/auth/ok cycles (8 packets, each). No other messages or calls are involved at all. I can reproduce the case now with a short register timeout within 4-5 minutes.
Registers are not directly processed in kamailio (no usrloc at al).

A downgrade to 5.1.6 shows a similar problem.

Kamailio version / os

Prebuilt 5.2.1 from deb.kamailio.org in a Ubuntu 16.04 lxc container.

kamailio -V

version: kamailio 5.2.1 (x86_64/linux) 

flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144 MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown 
compiled with gcc 5.3.1

gdb debug info

Core dumps where analyzed with the source and dbg packages from deb.kamailio.org.

(gdb) list

251             clear_branches();
252
253             if(unlikely(ksr_route_locks_set!=NULL && msg->callid && msg->callid->body.s
254                             && msg->callid->body.len >0)) {
255                     cidlockidx = get_hash1_raw(msg->callid->body.s, msg->callid->body.len);
256                     cidlockidx = cidlockidx % ksr_route_locks_set->size;
257                     cidlockset = 1;
258             }
259
260             if(msg->first_line.type == SIP_REQUEST) {

(gdb) bt full

#0  0x0000000000596b86 in receive_msg (
    buf=0x2969590 "SIP/2.0 401 Unauthorized\r\nVia: SIP/2.0/TCP 81.27.163.30:5062;branch=z9hG4bK4473.59e6c5c0a45c67b063e075a8f9f8eccb.0\r\nVia: SIP/2.0/UDP 100.72.253.100:5060;rport=5060;received=100.72.253.100;
branch=z9hG4"..., len=573, rcv_info=0x7f256b187e08) at core/receive.c:256
        msg = 0x7f2577648d90
        ctx = {rec_lev = -878175776, run_flags = 32767, last_retcode = 6442811, jmp_env = {{__jmpbuf = {8108147, 134217728, 8388608, 6, 140736610180220, 139798687284720, 1, 70364449210377}, 
              __mask_was_saved = 878175761, __saved_mask = {__val = {139798687284848, 140736610179776, 139798687284720, 42949672963, 139798687286050, 8388608, 139798687286053, 140736610179792, 6445147, 0, 0, 
                  255, 0, 140736610180220, 139798687284720, 2314885530818453536}}}}}
        bctx = 0x0
        ret = 1
        stats_on = 0
        tvb = {tv_sec = 140736610180220, tv_usec = 139798687285480}
        tve = {tv_sec = 139798687284720, tv_usec = 38654722047}
        tz = {tz_minuteswest = 2014554944, tz_dsttime = 1}
        diff = 0
        inb = {
          s = 0x2969590 "SIP/2.0 401 Unauthorized\r\nVia: SIP/2.0/TCP 81.27.163.30:5062;branch=z9hG4bK4473.59e6c5c0a45c67b063e075a8f9f8eccb.0\r\nVia: SIP/2.0/UDP 100.72.253.100:5060;rport=5060;received=100.72.25
3.100;branch=z9hG4"..., len = 573}
        netinfo = {data = {s = 0x7bb873 "INFO", len = 134217728}, rcv = 0x800000, dst = 0x6}
        keng = 0x0
        evp = {data = 0x7fffcba81900, rcv = 0x7f256b187e08, dst = 0x0}
        cidlockidx = 2555451508
        cidlockset = 0
        errsipmsg = 0
        __func__ = "receive_msg"
#1  0x000000000062cb6c in receive_tcp_msg (
    tcpbuf=0x7f256b1880e8 "SIP/2.0 401 Unauthorized\r\nVia: SIP/2.0/TCP 81.27.163.30:5062;branch=z9hG4bK4473.59e6c5c0a45c67b063e075a8f9f8eccb.0\r\nVia: SIP/2.0/UDP 100.72.253.100:5060;rport=5060;received=100.72.
253.100;branch=z9hG4"..., len=573, rcv_info=0x7f256b187e08, con=0x7f256b187df0) at core/tcp_read.c:1448
        buf = 0x2969590 "SIP/2.0 401 Unauthorized\r\nVia: SIP/2.0/TCP 81.27.163.30:5062;branch=z9hG4bK4473.59e6c5c0a45c67b063e075a8f9f8eccb.0\r\nVia: SIP/2.0/UDP 100.72.253.100:5060;rport=5060;received=100.72.25
3.100;branch=z9hG4"...
        bsize = 65535
        blen = 65535
        __func__ = "receive_tcp_msg"
#2  0x000000000062ec28 in tcp_read_req (con=0x7f256b187df0, bytes_read=0x7fffcba81c78, read_flags=0x7fffcba81c7c) at core/tcp_read.c:1631
        bytes = 573
        total_bytes = 573
        resp = 1
        size = 24
        req = 0x7f256b187e70
        dst = {send_sock = 0x0, to = {s = {sa_family = 61216, sa_data = "\255\000\000\000\000\000\270\034\250\313\377\177\000"}, sin = {sin_family = 61216, sin_port = 173, sin_addr = {s_addr = 0}, 
              sin_zero = "\270\034\250\313\377\177\000"}, sin6 = {sin6_family = 61216, sin6_port = 173, sin6_flowinfo = 0, sin6_addr = {__in6_u = {
                  __u6_addr8 = "\270\034\250\313\377\177\000\000\b\000\000\000\000\000\000", __u6_addr16 = {7352, 52136, 32767, 0, 8, 0, 0, 0}, __u6_addr32 = {3416792248, 32767, 8, 0}}}, sin6_scope_id = 20}}, 
          id = 0, proto = 1 '\001', send_flags = {f = 0, blst_imask = 1}}
        c = 57 '9'
        ret = -878175312
        __func__ = "tcp_read_req"
#3  0x00000000006318c1 in handle_io (fm=0x7f2577636468, events=1, idx=-1) at core/tcp_read.c:1804
        ret = 8
        n = 8
        read_flags = 1
        con = 0x7f256b187df0
        s = 9
        resp = 140736610180352
        t = 32549
        __func__ = "handle_io"
#4  0x000000000062017f in io_wait_loop_epoll (h=0xadef20 <io_w>, t=2, repeat=0) at core/io_wait.h:1065
        n = 1
        r = 0
        fm = 0x7f2577636468
        revents = 1
        __func__ = "io_wait_loop_epoll"
#5  0x000000000063375a in tcp_receive_loop (unix_sock=25) at core/tcp_read.c:1974
        __func__ = "tcp_receive_loop"
#6  0x000000000051a6ed in tcp_init_children () at core/tcp_main.c:4853
        r = 0
        i = 1
        reader_fd_1 = 25
        pid = 0
        si_desc = "tcp receiver (generic)\000\000\r\216U\000\000\000\000\000\bqbw%\177\000\000\320x`w\001", '\000' <repeats 11 times>, "M\307]\000\000\000\000\000Q\243|\000\000\000\000\000\250\271aw%\177\000\000`Fx\000\000\000\000\000\000\000\000\000N\031\000\000s\270{\000\000\000\000\000\320_\347w%\177\000\000s\270{\000\000\000\000\000\000\000\000\b\000\000\000"
        si = 0x0
        __func__ = "tcp_init_children"
#7  0x000000000042619b in main_loop () at main.c:1735
        i = 2
        pid = 6477
        si = 0x0
        si_desc = "udp receiver child=1 sock=81.27.163.30:5060 (81.27.163.30:5060)\000\004\000\000\000\000\000\000\000\000\260\354j%\177\000\000\000\000\000\000\000\000\000\000\250\347\027k%\177\000\000p \250\313\377\177\000\000\034,M\000\000\000\000\000", '\377' <repeats 12 times>, "\n\000\000"
        nrprocs = 2
        woneinit = 1
        __func__ = "main_loop"
#8  0x000000000042c9a5 in main (argc=15, argv=0x7fffcba82318) at main.c:2675
        cfg_stream = 0x27eb010
        c = -1
        r = 0
        tmp = 0x7fffcba83eb8 ""
        tmp_len = 1
        port = 0
        proto = 2021155176
        options = 0x76c218 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"
        ret = -1
        seed = 856022880
        rfd = 4
        debug_save = 0
        debug_flag = 0
        dont_fork_cnt = 0
        n_lst = 0x7f2578786718
        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
        st = {st_dev = 93, st_ino = 98, st_nlink = 2, st_mode = 16832, st_uid = 378, st_gid = 378, __pad0 = 0, st_rdev = 0, st_size = 60, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1549008770, 
            tv_nsec = 369207448}, st_mtim = {tv_sec = 1549009064, tv_nsec = 813217177}, st_ctim = {tv_sec = 1549009064, tv_nsec = 813217177}, __glibc_reserved = {0, 0, 0}}
        __func__ = "main"

Let me know if you need more info, i can fully reproduce the problem.

Cheers,

Thomas

@miconda
Copy link
Member

miconda commented Feb 1, 2019

Did you set route_locks_size global parameter? If yes, what is the value?

Can you provide from gdb the output for:

p ksr_route_locks_set
p *ksr_route_locks_set

@towe75
Copy link
Contributor Author

towe75 commented Feb 1, 2019

No, this parameter is not set, it should have the default value.

@towe75 towe75 changed the title Segault when processing replies from a outgoing tcp connection Segfault when processing replies from a outgoing tcp connection Feb 1, 2019
@towe75
Copy link
Contributor Author

towe75 commented Feb 1, 2019

@miconda not sure if it's useful but i tried to start kamailio with the mentioned route_locks_size but could not get it up (tried with values 1,2,8,16,20).

Starting Kamailio (OpenSER) - the Open Source SIP Server...
Feb 01 09:47:57 ifens224 kamailio[11130]: INFO: pv [pv_shv.c:60]: shvar_init_locks(): locks array size 16
Feb 01 09:47:57 ifens224 kamailio[11130]: ERROR: <core> [core/receive.c:82]: ksr_route_locks_set_init(): failed to allocate route locks set

@miconda
Copy link
Member

miconda commented Feb 4, 2019

You haven't provided the output of the gdb commands I listed in my previous comment, I need to see what's there to be able to troubleshoot further.

@towe75
Copy link
Contributor Author

towe75 commented Feb 4, 2019

Sorry, i overlooked it.

Output is:

(gdb) p ksr_route_locks_set
$1 = (rec_lock_set_t *) 0x7c00000100

(gdb) p *ksr_route_locks_set
Cannot access memory at address 0x7c00000100

@miconda
Copy link
Member

miconda commented Feb 4, 2019

That should be NULL (0x0) by default, so somewhere is an invalid write/buffer overflow.

Can you just start kamailio with your config, but do not send any sip traffic to it -- just attach with gdb to one of the processes running there and do the same commands as above to see if the overwrite happens at startup or at runtime?

@towe75
Copy link
Contributor Author

towe75 commented Feb 4, 2019

Ok, i attached to a random child and i get the expected output:

(gdb) p ksr_route_locks_set
$1 = (rec_lock_set_t *) 0x0

(gdb) p *ksr_route_locks_set
Cannot access memory at address 0x0

Strange. Can i check for some specific included module or is there some other global config that might affect this setting?

@towe75
Copy link
Contributor Author

towe75 commented Feb 11, 2019

@miconda, i was able to isolate the problem to a single line of python script.
I did so by writing two sipp scripts to simulate our asterisk and the isp and subsequently removing logic from the kamailio script.

Segfault happens when i use this selector within ksr_reply_route:

KSR.pv.get("$sel(@hf_value.Via)")

a simpler KSR.pv.get("$sel(via)") works without crashes. I could surely ommit the "hf_value" part to workaround the problem but it looks like some KEMI/python binding problem.
Using e.g. "hf_value.From" does not result in a crash as well. I tried replies with multiple Via lines and also multiple Via's in one header line and could not find a difference in the behavior.

As stated above, the segfault happens after so and so many packets (typically 14-20) but the pace of the messages seems to play a role as well.

Hope it helps!

Cheers, Thomas

@miconda
Copy link
Member

miconda commented Feb 13, 2019

Trying to understand how I can reproduce it in order to troubleshoot ...

So, I have to use KSR.pv.get("$sel(@hf_value.Via)") inside ksr_reply_route() of a Python (app_python) script? The it crashes after a while of handling traffic (14-20 packets -- requests and replies, or only replies)?

At the first thought, should not be a KEMI export issue, but a select variable issue. From what is strictly related to KEMI code, either KSR.pv.get("$sel(via)") or KSR.pv.get("$sel(@hf_value.Via)") have the sam code, but then it ends up in executed code from textopsx module to get the $sel(@hf_value.Via), while I think the $sel(via) is implemented in the core. The select variables were implemented by SER developers when Kamailio was a separate project (during 2005-2008) and we got the code with the merge in 2008-2009. So, overall I am not that familiar with that code, but it has to be fixed, so being able to reproduce will help a lot.

@towe75
Copy link
Contributor Author

towe75 commented Feb 13, 2019

Daniel, i will shrink the script as much as possible and then i can surely provide a full test case incl. sipp scenarios. Give me a few days, though.
By now i can say that this is the only place where i used a "@hf_value" selector. Maybe the segfault occurs also if one uses it in a request instead of a reply, this would simplify the test case greatly (e.g. can then maybe use sipsak).

@towe75
Copy link
Contributor Author

towe75 commented Feb 15, 2019

I prepared some files in a zip: https://drive.google.com/open?id=1qVap6BhbGpJwHL-Gf3aUCBOSSbekscs5 including some instructions on how to reproduce. Kamailio config is down to 15 lines (5 modules only) and the python script is also bare bones with about 30 lines.

It boils down to:

  1. setup kamailio and sipp
  2. run kamailio
  3. start the "isp" sipp in one shell
  4. start the "asterisk" sipp in another shell
    (more details to be found in the zip)

I did a few more checks, also. One initial assumption about the problem was wrong.
It is reproducible with a single udp socket as well, not related to tcp like initially stated.
Also i tried to reproduce it with a native kamailio config file but could not reproduce it, this script is also included in the zip file.

Good luck hunting it down :-)

@miconda miconda changed the title Segfault when processing replies from a outgoing tcp connection Segfault when processing select variable via kemi Feb 26, 2019
miconda added a commit that referenced this issue Feb 26, 2019
@miconda
Copy link
Member

miconda commented Feb 27, 2019

Thanks for the instructions to reproduce the issue, it helped to identify the problem and fix it. Select required some function to be executed in order to initialize some of its internal buffers and it didn't happen when executing kemi scripts. I pushed fixes to master branch, I am going to backport to stable branches these days.

If you want to test with master branch right now, let me know the results in your side.

@miconda miconda closed this as completed Feb 27, 2019
@towe75
Copy link
Contributor Author

towe75 commented Feb 28, 2019

@miconda great! Glad i could help.
I will try it out in a few days. Cu at Kamailio World

Cheers

miconda added a commit that referenced this issue Mar 6, 2019
- reset the select static buffer
- GH #1829

(cherry picked from commit 4868694)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants