Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TM crash when relaying reply, in context of unreachable rtpengine #3297

Open
abalashov opened this issue Dec 8, 2022 · 0 comments
Open

TM crash when relaying reply, in context of unreachable rtpengine #3297

abalashov opened this issue Dec 8, 2022 · 0 comments
Labels

Comments

@abalashov
Copy link
Contributor

When relaying this 200 OK repeatedly with calls to rtpengine_answer() failing because all RTPEngines in a set are unreachable...

Dec  8 06:24:38 gw kamailio[4823]: ERROR: rtpengine [rtpengine.c:2960]: select_rtpp_node(): rtpengine failed to select new for calllen=36 callid=8279a4ea-151e-48de-a23c-5840104126de
SIP/2.0 200 OK
v:SIP/2.0/UDP 10.150.20.20;branch=z9hG4bK03e.e4da444e627bb575ef4f46151729a1fa.0,SIP/2.0/UDP 10.151.20.108;received=10.151.20.108;rport=5060;branch=z9hG4bKKQyrBKpp04X0K
Record-Route:<sip:10.150.20.20;lr;ftag=emHm2apy95jFe;rtp_relay=1;rtp_group=0>
f:"14045551212"<sip:14045551212@10.151.20.108>;tag=emHm2apy95jFe
t:<sip:16782001111@internal.evaristesys.com>;tag=tHQt0rDcF7UHK
i:8279a4ea-151e-48de-a23c-5840104126de
CSeq:60686471 INVITE
m:<sip:16782001111@10.160.10.55:5060;transport=udp>
User-Agent:VoiceAppServer
Allow:INVITE,ACK,BYE,CANCEL,OPTIONS,MESSAGE,INFO,UPDATE,REFER,NOTIFY
k:path,replaces
u:talk,hold,conference,refer
c:application/sdp
Content-Disposition:session
l:257

v=0
o=DM 1670428117 1670428118 IN IP4 10.160.10.55
s=DM
c=IN IP4 10.160.10.55
t=0 0
m=audio 52418 RTP/AVP 0 101
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:20
a=rtcp:52419 IN IP4 10.160.10.55

Repeat INVITEs eventually result in a crash. This backtrace is taken from a different server but identical scenario:

#0  t_should_relay_response (Trans=0x7f4aee56ba80, new_code=200, branch=0, should_store=0x7ffd8b8bbba0, should_relay=0x7ffd8b8bbba4, cancel_data=0x7ffd8b8bbe40,
    reply=0x7f4afe40df70) at t_reply.c:1285
1285				&& !(inv_through && Trans->uac[branch].last_received<300)) {
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.192.el6.x86_64 jansson-2.11-1.el6.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-57.el6.x86_64 libcom_err-1.41.12-22.el6.x86_64 libev-4.03-3.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 libunistring-0.9.3-5.el6.x86_64 openssl-1.0.1e-48.el6_8.3.x86_64 sqlite-3.6.20-1.el6_7.2.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) where
#0  t_should_relay_response (Trans=0x7f4aee56ba80, new_code=200, branch=0, should_store=0x7ffd8b8bbba0, should_relay=0x7ffd8b8bbba4, cancel_data=0x7ffd8b8bbe40,
    reply=0x7f4afe40df70) at t_reply.c:1285
#1  0x00007f4afd73ab7d in relay_reply (t=0x7f4aee56ba80, p_msg=0x7f4afe40df70, branch=0, msg_status=200, cancel_data=0x7ffd8b8bbe40, do_put_on_wait=1)
    at t_reply.c:1821
#2  0x00007f4afd740c38 in reply_received (p_msg=0x7f4afe40df70) at t_reply.c:2558
#3  0x00000000004b5d6a in do_forward_reply (msg=0x7f4afe40df70, mode=0) at core/forward.c:747
#4  0x00000000004b78c9 in forward_reply (msg=0x7f4afe40df70) at core/forward.c:852
#5  0x000000000054982b in receive_msg (
    buf=0xb27700 "SIP/2.0 200 OK\r\nv:SIP/2.0/UDP xxx;branch=z9hG4bK9915.dcfe6d03cb21e6c94a6705adb095c566.0,SIP/2.0/UDP xxx;received=xxx;rport=5060;branch=z9hG4bKZmZp0NZv3FvQB\r\nRecord-Rout"..., len=984, rcv_info=0x7ffd8b8bc6c0) at core/receive.c:434
#6  0x00000000006658ca in udp_rcv_loop () at core/udp_server.c:541
#7  0x000000000042500f in main_loop () at main.c:1655
#8  0x000000000042c5bb in main (argc=11, argv=0x7ffd8b8bcc88) at main.c:2696

Looks like in the crash frame, Trans->uac is NULL:

(gdb) frame 0
#0  t_should_relay_response (Trans=0x7f4aee56ba80, new_code=200, branch=0, should_store=0x7ffd8b8bbba0, should_relay=0x7ffd8b8bbba4, cancel_data=0x7ffd8b8bbe40,
    reply=0x7f4afe40df70) at t_reply.c:1285
1285				&& !(inv_through && Trans->uac[branch].last_received<300)) {
(gdb) print Trans->uac
$1 = (struct ua_client *) 0x0

I have seen this crash from time to time, though it is infrequent. It always seems to occur when there are problems reaching RTPEngine in a timely fashion.

This is on Kamailio 5.2.4:de9a03, which I know has long fallen off the back of the maintenance train. However, line 1386 of t_reply.c in master:HEAD shows that the code is no different now:

https://github.com/kamailio/kamailio/blob/master/src/modules/tm/t_reply.c#L1386

Of course, it's possible that something upstream has changed so as to prevent this state. Otherwise, I suppose a null pointer safety check for Trans->uac is recommended. I'd be happy to submit a patch, just wasn't sure if the feeling was more that this should be resolved higher up the call stack.

@abalashov abalashov added the bug label Dec 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant