[core] fix session deadlock (#2290) #2300

cdevelop · 2023-11-05T02:42:48Z

cdevelop · 2023-11-05T02:46:14Z

@seven1240 Teacher Du, I have explained this issue in detail.

signalwire-ci · 2023-11-05T02:53:11Z

Unit-tests failed: https://public-artifacts.signalwire.cloud/drone/signalwire/freeswitch/1571/artifacts.html

seven1240 · 2023-11-05T03:13:49Z

@andywolk cdevelop has a system that leaves 3-10 deadlocked channels per day, which this patch could fix.

Reviewing the patch, the read_frame parts look fine, as the message will be processed on the next read. But I'm worried about the write parts. What if this happens on a sendonly channel that never reads, or in an app that only writes and never reads (I can't think of such an app off the top of my head, though)? Then the message would never be processed.

I think it would be best to find the root cause of the deadlock rather than work around it.
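The concern about the write path can be illustrated with a toy model. This is only a sketch under a stated assumption: that a deferred message is picked up on the read path and nowhere else. `ToyChannel` and its methods are hypothetical names for illustration, not FreeSWITCH APIs, and the actual patch may behave differently.

```python
from collections import deque


class ToyChannel:
    """Hypothetical stand-in for a media session; not a FreeSWITCH type."""

    def __init__(self):
        self.pending = deque()  # messages deferred instead of handled inline
        self.handled = []       # messages that were eventually processed

    def queue_message(self, msg):
        # Defer the message rather than processing it under a lock.
        self.pending.append(msg)

    def _drain(self):
        # Process every deferred message in arrival order.
        while self.pending:
            self.handled.append(self.pending.popleft())

    def read_frame(self):
        # Assumption: the read path is the only place deferred messages drain.
        self._drain()
        return b"frame"

    def write_frame(self, frame):
        # Assumption: the write path defers but never drains.
        return len(frame)


# A channel that only ever writes never processes its deferred message.
sendonly = ToyChannel()
sendonly.queue_message("codec-change")
for _ in range(3):
    sendonly.write_frame(b"audio")

# A channel that also reads picks the message up on its next read.
duplex = ToyChannel()
duplex.queue_message("codec-change")
duplex.write_frame(b"audio")
duplex.read_frame()
```

In this model the write-only channel keeps the message queued indefinitely, which is the scenario the question above asks about.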

bferreirq · 2024-02-06T10:05:23Z

Hello, any news about this commit?

KerryRJ · 2024-02-28T08:28:34Z

I have an installation where 30-40 inbound and outbound calls are getting deadlocked on a daily basis, and the call center agent connected to the deadlocked call also gets stuck, until a restart or the extension leg of the call is hung up.

#2387 and #2390 did not resolve the deadlocks nor the stuck call center agent.

This fix has stopped all deadlocks and subsequently no stuck call center agents.

televoicepl · 2024-03-12T21:29:36Z

> I have an installation where 30-40 inbound and outbound calls are getting deadlocked on a daily basis, and the call center agent connected to the deadlocked call also gets stuck, until a restart or the extension leg of the call is hung up.
>
> #2387 and #2390 did not resolve the deadlocks nor the stuck call center agent.
>
> This fix has stopped all deadlocks and subsequently no stuck call center agents.

Same problem on my instance.
1.10.11 and 1.10.10 are affected; 1.10.9 is OK.
Random calls go dead and only a core restart helps.

shaunjstokes · 2024-04-04T06:13:46Z

This fix has fixed all deadlocks and stuck calls in our environment.

#2387 and #2390 did not fix the issue.

bfroemel · 2024-04-16T08:52:26Z

I can also report that this patch fixes a similar issue with stuck channels and stuck call center agents in my installation.

boteman · 2024-04-28T20:11:01Z

Why not make it pass the Unit Tests so that it can be merged? Many people seem to be experiencing this problem.

televoicepl · 2024-04-29T22:04:02Z

@jakubkarolczyk can you check it?

gregoriusus · 2024-05-07T20:51:02Z

Can I bump this thread? This issue is a real problem and a lot of people have it.

boteman · 2024-05-08T17:04:44Z

Yesterday on Office Hours I asked about this and the related tickets that exhibit a similar problem. BKW said that it is a much deeper problem than the proposed patch solves and that they are refactoring the code to solve everything at once. So it is a bigger effort than we thought, with no ETR.

zooptwopointone · 2024-05-21T05:34:40Z

Before I open another ticket like this, I am curious whether the issue I am seeing could be related to this one. I am getting a lot of stuck calls in the following scenario.
An inbound call runs a bg_api command to originate another call, and both calls then join a conference. The call originated via bg_api gets stuck if it receives a re-invite. In all of the stuck calls I have seen, the re-invite comes immediately after the answer, when maybe 1-4 RTP packets have been transmitted. In some cases the re-invite suggested another codec, and in others it changed the media IP, but in all cases both call legs get stuck. Specifically, they still show up when you execute "show calls". What I have found is that the first inbound leg ends and shows false for uuid_exists, while the outbound leg shows true. I can run uuid_kill on the outbound leg, but it does not go away; after the kill it results in "no such channel", yet "show calls" still displays the legs.

This seems to have started, or gotten much worse, after upgrading from 1.10.7, where we were getting stuck calls but at a low rate: say 10 per week versus 100 per day now. Getting a carrier to change some behaviour reduced the re-invites and does seem to have directly reduced the number of dead calls.

Let me know if I should submit a separate report for this; I can provide more details there.

hhadzem · 2024-07-23T09:06:11Z

Is there any progress/update related to this issue?

zrd740 · 2024-09-06T09:18:05Z

I upgraded several servers from FreeSWITCH 1.10.8 to 1.10.12. This error showed up, and it is creating dozens of stuck calls per day along with numerous client complaints. 1.10.8 does not have the issue. Is there any update on this? It seems to be a problem that has been going on for a long time. FusionPBX support said the solution is to apply this patch and compile from source, and many users have reported that this resolves the problem. I wanted to use the SignalWire packages on Debian, but they do not work for us because of this issue. If we can't get a solution, I'll need to start compiling from source.

zrd740 · 2024-09-07T14:24:31Z

We've resolved the issue by migrating away from prebuilt packages and compiling FS from source with this patch, and it works perfectly. However, I'm disappointed that after almost a year, this hasn't been included in the release despite its impact on many users.

boteman · 2024-09-09T14:50:47Z

My understanding from what the core developers have said is that this ticket and at least 3 others stem from a combination of problems deep down in the core that require extensive refactoring of the source code to correct. They found some long-standing bugs in the process and aim to correct all of them in this effort. That is why this patch works for some but not for others. I'm as anxious as you for a permanent fix.

[core] fix session deadlock (signalwire#2290)

d4fd792

cdevelop mentioned this pull request Nov 5, 2023

* [Core] Fix possible deadlock in switch_core_media_set_codec() This commit results in a deadlock #2290

Open

seven1240 requested a review from andywolk November 5, 2023 03:08

shaunjstokes mentioned this pull request Apr 4, 2024

Freeswitch leaves stale calls #2372

Open

televoicepl mentioned this pull request Apr 29, 2024

Stuck calls #2439

Open
