stuck channels on FreeSWITCH 1.10.10 (possibly transcoding is the cause) #2264

gabada · 2023-10-05T01:29:13Z

Describe the bug
FreeSWITCH channels are getting stuck. When you restart FreeSWITCH (or run fsctl crash), the CDR that gets written says INCOMPATIBLE_DESTINATION.

When I set my ITSP to only offer PCMU everything works as expected.

When you do a uuid_kill on the channel it says OK but doesn't actually kill it (remove it from the database). You need to restart FreeSWITCH to get it removed.
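
One way to confirm that a channel survived uuid_kill is to ask FreeSWITCH whether the UUID still exists afterwards. This is a minimal sketch, not a tested diagnostic: it assumes fs_cli is on the PATH and that the uuid_exists API command replies with true or false. The function names still_exists and check_after_kill are made up for illustration, and the 2-second wait is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: interpret the reply of "uuid_exists <uuid>".
# Succeeds (exit 0) when the reply says the channel is still present.
still_exists() {
    local reply
    # Strip whitespace and newlines that fs_cli may append to the reply
    reply=$(printf '%s' "$1" | tr -d '[:space:]')
    [ "$reply" = "true" ]
}

# Hypothetical wrapper: try to kill a channel, wait, then check if it is gone.
# Assumes fs_cli can reach the local FreeSWITCH instance.
check_after_kill() {
    local uuid=$1
    fs_cli -x "uuid_kill $uuid"
    sleep 2   # arbitrary grace period for normal teardown
    if still_exists "$(fs_cli -x "uuid_exists $uuid")"; then
        echo "channel $uuid is still present after uuid_kill"
        return 1
    fi
    echo "channel $uuid is gone"
}
```

Calling check_after_kill with a channel UUID returns non-zero when the channel is still reported after the kill attempt, which matches the symptom described here.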

To Reproduce
Steps to reproduce the behavior:

Make a bunch of inbound calls to FreeSWITCH 1.10.10 offering codecs other than PCMU.
Not every call is affected, but eventually you will start seeing calls that get stuck.

Expected behavior
FreeSWITCH should handle the transcoding and the calls should release normally when they are completed.

Package version or git hash

FreeSWITCH Version 1.10.10-release-24-4cb05e7f4a~64bit (-release-24-4cb05e7f4a 64bit)

Trace logs
The attached freeswitch.log is from a call that is still stuck.
freeswitch.log

backtrace from core file

backtrace.log

markjcrane · 2023-10-06T21:33:16Z

I've seen this on multiple servers as well. Using PCMU only seems to prevent calls from getting stuck.

andywolk · 2023-10-06T22:36:38Z

Not all debugging symbols were loaded. Please re-do the backtrace.

kaaelhaa · 2023-10-12T07:06:04Z

We are also seeing this. We cannot replicate in our test environment, but it happens in production.

It does not seem to be related to call volume; calls just randomly get "stuck", and it looks like the thread processing the call in FreeSWITCH is hanging.

kaaelhaa · 2023-10-12T07:06:46Z

Downgrading to FreeSWITCH 1.10.8 fixes the issue for us, although that is not ideal.

fetristan · 2023-10-12T13:18:56Z

Hi,
We have the same problem here: FreeSWITCH receives the BYE but does not answer it, and the line randomly gets stuck.
We are forced to uuid_kill the line.

andywolk · 2023-10-12T14:59:53Z

@kaaelhaa FreeSWITCH 1.10.8 because 1.10.9 is also affected?

kaaelhaa · 2023-10-12T15:03:08Z

> @kaaelhaa FreeSWITCH 1.10.8 because 1.10.9 is also affected?

@andywolk I don't know if 1.10.9 is affected. We skipped 1.10.9 in our production environment so haven't tested.

andywolk · 2023-10-12T15:04:32Z

@kaaelhaa was libsofia downgraded as well or just FreeSWITCH?

kaaelhaa · 2023-10-12T15:10:05Z

> @kaaelhaa was libsofia downgraded as well or just FreeSWITCH?

@andywolk Downgraded by installing the Debian packages for 1.10.8, so I assume that also means libsofia was downgraded?

andywolk · 2023-10-12T15:11:53Z

@kaaelhaa What is libsofia version on the system with FS 1.10.8 right now?

gabada · 2023-10-12T15:15:35Z

I added the symbols yesterday. Waiting for some stuck calls.

andywolk · 2023-10-12T15:22:46Z

@gabada If the dbgsym files are installed and you still have the core dump file, you can regenerate the backtrace.

gabada · 2023-10-12T16:49:44Z

@andywolk Here is the newly generated backtrace. It still has some ?? in it; not sure why.
I installed freeswitch-all-dbg and libfreeswitch1-dbg. So far there are no stuck calls, but I don't have many calls going through this server right now.
backtrace.log

Also BKW asked me to run deadlock.py on it and there are no deadlocks

andywolk · 2023-10-12T17:03:26Z

@gabada Because there are no libsofia dbg symbols: apt-get install libsofia-sip-ua0-dbgsym

Also when doing a backtrace please do

thread apply all bt
info threads
thread apply all bt full
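
The three gdb commands above can also be run non-interactively so that the output lands in a file ready to attach. A minimal sketch, assuming gdb's standard -batch and -ex options; the function name collect_backtrace, the GDB override variable, and the example paths are illustrative assumptions, not something from this thread.

```shell
#!/bin/bash
# Hypothetical helper: run the requested gdb commands in batch mode.
# $1 = path to the freeswitch binary, $2 = path to the core file.
# The GDB variable lets you point at a different gdb executable.
collect_backtrace() {
    local binary=$1 core=$2
    "${GDB:-gdb}" -batch \
        -ex "thread apply all bt" \
        -ex "info threads" \
        -ex "thread apply all bt full" \
        "$binary" "$core"
}

# Example usage (both paths are assumptions; adjust to your install):
# collect_backtrace /usr/bin/freeswitch /tmp/core.12345 > backtrace.log 2>&1
```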

gabada · 2023-10-12T17:06:12Z

I installed the libsofia debug symbols.
Still seeing this after bt:

#0  0x00007fdfab91d7bc in __GI___select (nfds=nfds@entry=0, readfds=readfds@entry=0x0, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffe52debcb0) at ../sysdeps/unix/sysv/linux/select.c:69
#1  0x00007fdfabdb03e2 in fspr_sleep (t=<optimized out>) at time/unix/time.c:246
#2  0x00007fdfaba9f3f2 in switch_core_runtime_loop (bg=<optimized out>) at src/switch_core.c:1201
#3  0x0000558c99a7bf09 in ?? ()
#4  0x00007fdfab8461ca in __libc_start_call_main (main=main@entry=0x558c99a7b490, argc=argc@entry=7, argv=argv@entry=0x7ffe52df0a18) at ../sysdeps/nptl/libc_start_call_main.h:58
#5  0x00007fdfab846285 in __libc_start_main_impl (main=0x558c99a7b490, argc=7, argv=0x7ffe52df0a18, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe52df0a08) at ../csu/libc-start.c:360
#6  0x0000558c99a7d161 in ?? ()

andywolk · 2023-10-12T17:08:44Z

Oh, never mind the dbg symbols for sofia; they are there. The other ?? are at least from Lua. We don't currently need those.

Please regenerate the backtrace using the gdb commands I mentioned.

gabada · 2023-10-12T17:12:32Z

Here is a new backtrace.

backtrace.log

andywolk · 2023-10-12T17:18:22Z

Now, with your help, let's check whether there are deadlocks.
You will need to download a Python script from GitHub, configure gdb, open the dump, and execute the deadlock command.
Here are the steps (libc6-dbg is a requirement for the script):

apt-get install libc6-dbg wget
wget -O /deadlock.py https://raw.githubusercontent.com/facebook/folly/master/folly/experimental/gdb/deadlock.py
echo source -v /deadlock.py > /root/.gdbinit

The path /root/.gdbinit assumes user root; it may be different in your case.

Then open gdb and do

deadlock

And see if it finds anything

gabada · 2023-10-12T17:46:15Z

No deadlock detected. Do you have debug symbols installed?

kaaelhaa · 2023-10-12T20:01:05Z

> @kaaelhaa What is libsofia version on the system with FS 1.10.8 right now?

@andywolk this is the Sofia version on 1.10.8:

$ dpkg -l | grep sofia
ii  freeswitch-mod-sofia              1.10.8~release~20~3510866140~buster-1~buster+1 amd64        mod_sofia for FreeSWITCH
ii  libsofia-sip-ua0                  1.13.16-125~dfc7095f4c~buster                  amd64        Sofia-SIP library runtime

And 1.10.10 for comparison:

$ dpkg -l | grep sofia
ii  freeswitch-mod-sofia              1.10.10~release~24~4cb05e7f4a~buster-1~buster+1 amd64        mod_sofia for FreeSWITCH
ii  libsofia-sip-ua0                  1.13.16-125~dfc7095f4c~buster                   amd64        Sofia-SIP library runtime

So the same version of libsofia across the two systems.

andywolk · 2023-10-12T20:03:02Z

apt-cache policy libsofia-sip-ua0

gabada · 2023-10-12T20:11:21Z

@andywolk
I currently have a stuck channel. I did gcore $(pidof freeswitch) and just ran the deadlock script on it, but there is no deadlock.
I am going to make a backtrace now and upload it shortly.

gabada · 2023-10-12T20:16:40Z

@andywolk
Here is the new backtrace. The instance is still up with the stuck call so if you need anything else or even want to do a screenshare session please let me know.
backtrace.log

kaaelhaa · 2023-10-12T20:17:28Z

> apt-cache policy libsofia-sip-ua0

On the 1.10.8 host:

$ apt-cache policy libsofia-sip-ua0
libsofia-sip-ua0:
  Installed: 1.13.16-125~dfc7095f4c~buster
  Candidate: 1.13.16-125~dfc7095f4c~buster
  Version table:
 *** 1.13.16-125~dfc7095f4c~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
        100 /var/lib/dpkg/status
     1.13.16-124~dfc7095f4c~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
     1.13.15-123~2366f9cf40~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
     1.13.15-122~a3b83039d4~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
     1.12.11+20110422.1-2.1+deb10u4 500
        500 http://security.debian.org/debian-security buster/updates/main amd64 Packages
     1.12.11+20110422.1-2.1+b1 500
        500 http://cdn-aws.deb.debian.org/debian buster/main amd64 Packages

1.10.10 for comparison:

$ apt-cache policy libsofia-sip-ua0
libsofia-sip-ua0:
  Installed: 1.13.16-125~dfc7095f4c~buster
  Candidate: 1.13.16-125~dfc7095f4c~buster
  Version table:
 *** 1.13.16-125~dfc7095f4c~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
        100 /var/lib/dpkg/status
     1.13.16-124~dfc7095f4c~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
     1.13.15-123~2366f9cf40~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
     1.13.15-122~a3b83039d4~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
     1.12.11+20110422.1-2.1+deb10u4 500
        500 http://security.debian.org/debian-security buster/updates/main amd64 Packages
     1.12.11+20110422.1-2.1+b1 500
        500 http://cdn-aws.deb.debian.org/debian buster/main amd64 Packages
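
To compare hosts without reading the whole version table, the Installed line can be pulled out of the apt-cache policy output on each machine. A small sketch; installed_version is a made-up helper name, and it assumes the output format shown above.

```shell
#!/bin/bash
# Hypothetical helper: read `apt-cache policy <pkg>` output on stdin
# and print only the installed version.
installed_version() {
    awk '$1 == "Installed:" { print $2; exit }'
}

# Example usage on each host:
# apt-cache policy libsofia-sip-ua0 | installed_version
```

Running it on both hosts and comparing the two strings gives the same conclusion as above: the libsofia version is identical across the 1.10.8 and 1.10.10 systems.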

andywolk · 2023-10-12T20:21:14Z

Thank you. We see a pattern in both backtraces. Will analyze further.

prashan-abey · 2023-10-23T22:43:22Z

Hi @andywolk, is there any progress on the investigation?

We are experiencing the same issue after upgrading to FreeSWITCH 1.10.10.

flaviogrossi · 2023-10-30T17:42:04Z

@andywolk if it helps, I think I might be experiencing the same issue, and it seems that for the relevant calls, this log is printed in the logs, but not this one.

Meaning that the session lock can never be acquired or released.

We're currently investigating the issue, will post updates in case any further details come up.

andywolk · 2023-10-30T18:27:24Z

We are working on a solution. There is another issue filed where we can see similar things #2290

xadhoom · 2023-11-03T10:27:03Z

For those who need the latest FreeSWITCH (like us, because of OpenSSL 3.0.x support), reverting the commit cited in #2290 temporarily fixes the issue (we have it in production with the revert and all is running fine).

robertoscarpone · 2023-12-06T07:32:59Z

We are encountering this problem too, more and more frequently. Besides reverting the commit mentioned in #2290, is there any news on fixing this issue?

gabada · 2024-01-09T14:44:52Z

Was this resolved in FreeSWITCH 1.10.11?
Thanks!

greenbea · 2024-01-28T00:47:28Z

No, 1.10.11 still has this bug.

gabada · 2024-01-28T00:52:21Z

> No, 1.10.11 still has this bug.

Thanks for letting me know.

shaunjstokes · 2024-03-18T07:14:04Z

We also have this problem on 1.10.11; it is not an issue on 1.10.9.

pinc444 · 2024-04-16T00:19:12Z

I also have this issue. I created a script that runs uuid_kill on old channels.
Here is the code of the script:

#!/bin/bash
# Run uuid_kill on every channel older than a configurable age.
# Paths match a source install under /usr/local/freeswitch.
fs_cli=/usr/local/freeswitch/bin/fs_cli
core_db=/usr/local/freeswitch/db/core.db

# Current epoch as reported by FreeSWITCH
now=$("$fs_cli" -x "strepoch")

# Here you set how old a call must be before it is killed
hours=3
minutes=0
seconds=0

# Total age threshold in seconds
max_age=$(( hours * 3600 + minutes * 60 + seconds ))

# Every channel created before this epoch gets killed
cutoff=$(( now - max_age ))

# Read uuid, epoch and timestamp of all old calls from the database
# (sqlite3 separates columns with '|' by default) and uuid_kill each one
i=0
while IFS='|' read -r uuid epoch created; do
    echo "the uuid #:$i is:$uuid with epoch: $epoch $created"
    # Redirect stdin so fs_cli cannot consume the query results
    rsp=$("$fs_cli" -x "uuid_kill $uuid" < /dev/null)
    echo "the response is:$rsp"
    i=$(( i + 1 ))
done < <(sqlite3 "$core_db" "select uuid, created_epoch, created from channels where created_epoch < $cutoff;")

gabada added the "bug" label Oct 5, 2023

mgruberman mentioned this issue Nov 15, 2023

mod_event_socket can have dead listeners with overflowing event queues that fill the log with CRITs #2308

Open
