Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stuck channels on FreeSWITCH 1.10.10 (possibly transcoding is the cause) #2264

Open
gabada opened this issue Oct 5, 2023 · 35 comments
Open
Labels
bug Something isn't working

Comments

@gabada
Copy link

gabada commented Oct 5, 2023

Describe the bug
FreeSWITCH channels are getting stuck. When you restart/fsctl crash freeswitch the CDR that's written says INCOMPATIBLE_DESTINATION.

When I set my ITSP to only offer PCMU everything works as expected.

When you do a uuid_kill on the channel it says OK but doesn't actually kill it (remove it from the database). You need to restart Freeswitch to get it removed.

To Reproduce
Steps to reproduce the behavior:

  1. Make a bunch of inbound calls to Freeswitch 1.10.10 offering codes other than PCMU
  2. It is not every call but eventually you will start seeing calls that got stuck.

Expected behavior
FreeSWITCH should handle the transcoding and the calls should release normally when they are completed.

Package version or git hash

  • FreeSWITCH Version 1.10.10-release-24-4cb05e7f4a~64bit (-release-24-4cb05e7f4a 64bit)

Trace logs
freeswitch.log is a call that is still stuck.
freeswitch.log

backtrace from core file

backtrace.log

@gabada gabada added the bug Something isn't working label Oct 5, 2023
@markjcrane
Copy link
Contributor

I've seen this on multiple servers as well. If using PCMU only it seems to solve the calls that get stuck.

@andywolk
Copy link
Contributor

andywolk commented Oct 6, 2023

Not all debugging symbols were loaded. Please re-do the backtrace.

@kaaelhaa
Copy link

We are also seeing this. We cannot replicate in our test environment, but it happens in production.

It does not seem to be related to call volume, just randomly calls get "stuck" and it looks like the thread processing the call in FreeSWITCH is hanging.

@kaaelhaa
Copy link

Downgrading to FreeSWITCH 1.10.8 fixes the issue for us, although not ideal.

@fetristan
Copy link

fetristan commented Oct 12, 2023

Hi,
We have the same problem there, the freeswitch receive the BYE but don't answer to it and stuck the line randomly.
We are forced to uuid_kill the line.
image

@andywolk
Copy link
Contributor

@kaaelhaa FreeSWITCH 1.10.8 because 1.10.9 is also affected?

@kaaelhaa
Copy link

@kaaelhaa FreeSWITCH 1.10.8 because 1.10.9 is also affected?

@andywolk I don't know if 1.10.9 is affected. We skipped 1.10.9 in our production environment so haven't tested.

@andywolk
Copy link
Contributor

@kaaelhaa was libsofia downgraded as well or just FreeSWITCH?

@kaaelhaa
Copy link

@kaaelhaa was libsofia downgraded as well or just FreeSWITCH?

@andywolk Downgraded by installing the Debian packages for 1.10.8 so I assume that also means libsofia was downloaded?

@andywolk
Copy link
Contributor

@kaaelhaa What is libsofia version on the system with FS 1.10.8 right now?

@gabada
Copy link
Author

gabada commented Oct 12, 2023

I added the symbols yesterday. waiting for some stuck calls.

@andywolk
Copy link
Contributor

andywolk commented Oct 12, 2023

@gabada If dbgsym files are installed and if you still have the core dump file you can re generate the backtrace.

@gabada
Copy link
Author

gabada commented Oct 12, 2023

@andywolk Here is the newly generated backtrace it still has some ?? in it. not sure why.
I installed freeswitch-all-dbg libfreeswitch1-dbg so far no stuck calls but I dont have many calls going through this server right now.
backtrace.log

Also BKW asked me to run deadlock.py on it and there are no deadlocks

@andywolk
Copy link
Contributor

@gabada Because there are no libsofia dbg symbols: apt-get install libsofia-sip-ua0-dbgsym

Also when doing a backtrace please do

thread apply all bt
info threads
thread apply all bt full

@gabada
Copy link
Author

gabada commented Oct 12, 2023

I installed libsofia
Still seeing this after bt

#0  0x00007fdfab91d7bc in __GI___select (nfds=nfds@entry=0, readfds=readfds@entry=0x0, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffe52debcb0) at ../sysdeps/unix/sysv/linux/select.c:69
#1  0x00007fdfabdb03e2 in fspr_sleep (t=<optimized out>) at time/unix/time.c:246
#2  0x00007fdfaba9f3f2 in switch_core_runtime_loop (bg=<optimized out>) at src/switch_core.c:1201
#3  0x0000558c99a7bf09 in ?? ()
#4  0x00007fdfab8461ca in __libc_start_call_main (main=main@entry=0x558c99a7b490, argc=argc@entry=7, argv=argv@entry=0x7ffe52df0a18) at ../sysdeps/nptl/libc_start_call_main.h:58
#5  0x00007fdfab846285 in __libc_start_main_impl (main=0x558c99a7b490, argc=7, argv=0x7ffe52df0a18, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe52df0a08) at ../csu/libc-start.c:360
#6  0x0000558c99a7d161 in ?? ()

@andywolk
Copy link
Contributor

Oh never mind dbg for sofia. it is there. other ?? are at least from lua. We don't currently need those.

Please re-generate using gdb commands I mentioned.

@gabada
Copy link
Author

gabada commented Oct 12, 2023

Here is a new backtrace.

backtrace.log

@andywolk
Copy link
Contributor

andywolk commented Oct 12, 2023

Now with your help let's check if there are deadlocks.
You will need to download a py script from github, configure gdb, open the dump and execute the deadlock command.
Here are the steps:
libc6-dbg is a requirement for the script

apt-get install libc6-dbg wget
wget -O /deadlock.py https://raw.githubusercontent.com/facebook/folly/master/folly/experimental/gdb/deadlock.py
echo source -v /deadlock.py > /root/.gdbinit

/root/.gdbinit user root may be different in your case.

Then open gdb and do

deadlock

And see if it finds anything

@gabada
Copy link
Author

gabada commented Oct 12, 2023

No deadlock detected. Do you have debug symbols installed?

@kaaelhaa
Copy link

@kaaelhaa What is libsofia version on the system with FS 1.10.8 right now?

@andywolk this is the Sofia version on 1.10.8:

$ dpkg -l | grep sofia
ii  freeswitch-mod-sofia              1.10.8~release~20~3510866140~buster-1~buster+1 amd64        mod_sofia for FreeSWITCH
ii  libsofia-sip-ua0                  1.13.16-125~dfc7095f4c~buster                  amd64        Sofia-SIP library runtime

And 1.10.10 for comparison:

$ dpkg -l | grep sofia
ii  freeswitch-mod-sofia              1.10.10~release~24~4cb05e7f4a~buster-1~buster+1 amd64        mod_sofia for FreeSWITCH
ii  libsofia-sip-ua0                  1.13.16-125~dfc7095f4c~buster                   amd64        Sofia-SIP library runtime

So the same version of libsofia across the two systems.

@andywolk
Copy link
Contributor

apt-cache policy libsofia-sip-ua0

@gabada
Copy link
Author

gabada commented Oct 12, 2023

@andywolk
I currently have a stuck channel. I did gcore $(pidof freeswitch) I just ran the deadlock script on it but there is no deadlock.
I am going to make a backtrace now and upload it shortly.

@gabada
Copy link
Author

gabada commented Oct 12, 2023

@andywolk
Here is the new backtrace. The instance is still up with the stuck call so if you need anything else or even want to do a screenshare session please let me know.
backtrace.log

@kaaelhaa
Copy link

apt-cache policy libsofia-sip-ua0

On the 1.10.8 host:

$ apt-cache policy libsofia-sip-ua0
libsofia-sip-ua0:
  Installed: 1.13.16-125~dfc7095f4c~buster
  Candidate: 1.13.16-125~dfc7095f4c~buster
  Version table:
 *** 1.13.16-125~dfc7095f4c~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
        100 /var/lib/dpkg/status
     1.13.16-124~dfc7095f4c~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
     1.13.15-123~2366f9cf40~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
     1.13.15-122~a3b83039d4~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
     1.12.11+20110422.1-2.1+deb10u4 500
        500 http://security.debian.org/debian-security buster/updates/main amd64 Packages
     1.12.11+20110422.1-2.1+b1 500
        500 http://cdn-aws.deb.debian.org/debian buster/main amd64 Packages

1.10.10 for comparison:

$ apt-cache policy libsofia-sip-ua0
libsofia-sip-ua0:
  Installed: 1.13.16-125~dfc7095f4c~buster
  Candidate: 1.13.16-125~dfc7095f4c~buster
  Version table:
 *** 1.13.16-125~dfc7095f4c~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
        100 /var/lib/dpkg/status
     1.13.16-124~dfc7095f4c~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
     1.13.15-123~2366f9cf40~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
     1.13.15-122~a3b83039d4~buster 500
        500 https://freeswitch.signalwire.com/repo/deb/debian-release buster/main amd64 Packages
     1.12.11+20110422.1-2.1+deb10u4 500
        500 http://security.debian.org/debian-security buster/updates/main amd64 Packages
     1.12.11+20110422.1-2.1+b1 500
        500 http://cdn-aws.deb.debian.org/debian buster/main amd64 Packages

@andywolk
Copy link
Contributor

Thank you. We see a pattern in both backtraces. Will analyze further.

@prashan-abey
Copy link

Hi @andywolk , is there any progress on the investigation.

We are experiencing the same issue after FreeSWITCH 1.10.10 upgrade.

@flaviogrossi
Copy link

@andywolk if it helps, I think I might be experiencing the same issue, and it seems that for the relevant calls, this log is printed in the logs, but not this one.

Meaning that the session lock can never be acquired or released.

We're currently investigating the issue, will post updates in case any further details come up.

@andywolk
Copy link
Contributor

We are working on a solution. There is another issue filed where we can see similar things #2290

@xadhoom
Copy link
Contributor

xadhoom commented Nov 3, 2023

For those that needs the latest freeswitch (like us, because of openssl 3.0.x support), reverting the commit cited in #2290 temporarly fixes the issue (we have it in production with the revert and all is running fine).

@robertoscarpone
Copy link

We are encountering this problem too, more and more frequently. Besides reverting the commit mentioned in #2290, is there any news on fixing this issue?

@gabada
Copy link
Author

gabada commented Jan 9, 2024

Was this resolved in FreeSWITCH 1.10.11?
Thanks!

@greenbea
Copy link
Contributor

No, 1.10.11 still has this bug.

@gabada
Copy link
Author

gabada commented Jan 28, 2024

No, 1.10.11 still has this bug.

Thanks for letting me know.

@shaunjstokes
Copy link
Contributor

We also have this problem on 1.10.11, not an issue on 1.10.9.

@pinc444
Copy link

pinc444 commented Apr 16, 2024

I also have this issue I created a script that do uuid kill on old channels
here is the code of this script

#!/bin/bash
now1=$(/usr/local/freeswitch/bin/fs_cli  -x "strepoch")
#here you set how old calls you want to kill
hour=3
minute=0
seconds=0

#here we calculate the total number of seconds from hour+minutes+seconds
allSec=0
allSec=$(($hour*60*60))
allSec=$(($allSec+$minute*60))
allSec=$(($allSec+$seconds))

#here we set the epoch we want to kill everything before it
timeLine=$(($now1-$allSec))

#here we are collecting from the datebase the uuid's and the epoch of all old calls
uuids=($(sqlite3 /usr/local/freeswitch/db/core.db  "select uuid from channels where created_epoch < $timeLine;"))
time1=($(sqlite3 /usr/local/freeswitch/db/core.db  "select created_epoch from channels where created_epoch < $timeLine;"))
time2=($(sqlite3 /usr/local/freeswitch/db/core.db  "select created from channels where  created_epoch < $timeLine;"))
#for uuid in "${uuids[@]}"; do

#here we loop over and do uuid_kill to all old calls
tmi=0
for ((i=0;i<${#uuids[@]};i++))
do
	 echo "the uuid #:$i is:${uuids[$i]} with epoch: ${time1[$i]}  ${time2[$tmi]} ${time2[$tmi+1]}"
   rsp=$(/usr/local/freeswitch/bin/fs_cli  -x "uuid_kill ${uuids[$i]}")
   echo "the response is:$rsp"
   tmi=$tmi+2
done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests