-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[1.x] Videoroom use-after-free when unsubscribing and closing a room #3242
Comments
Ah, it looks like a cleanup code stepping on its own feet: we're looping on |
If only I could replicate the problem, though... I'm at attempt 900 with your tool and still nothing 🤣 |
Oh no. On my machine I get the error on the 1st attempt. Let me check again. |
I stopped the tool at about 2000 attempts, as I don't think it's going to die on me at this point 🙂 |
I'm not sure what to say, I tried in a different environment and it's still consistently crashing on the first attempt. If you've cloned the repository before, I've pushed some changes which you may not have pulled? It also only reliably crashes with asan, otherwise it just keeps going. Otherwise, it might be the library versions being used? I'll see if I can make docker reproduction case. |
No, there's no further commits if I do a |
I think this depends on the environment. I can reliably reproduce on nix and Archlinux but not on Debian 12. I'll see if using the same lib versions as Debian avoid the issue. |
I've yet to narrow it down to the exact library, but downgrading libraries does make it non-reproducible. Bad (Reproduces Crash)gcc-12.3.0 glib-2.76.3 glibc-2.37-8 jansson-2.14 libconfig-1.7.3 libnice-0.1.21 libsrtp-2.5.0 libwebsockets-4.1.6 openssl-3.0.9 zlib-1.2.13 Good (No Crash)gcc-11.3.0 glib-2.74.3 glibc-2.35-224 jansson-2.14 libconfig-1.7.3 libnice-0.1.18 libsrtp-2.4.2 libwebsockets-4.1.6 openssl-3.0.9 zlib-1.2.13 My guess is it might be glib as most of the lib versions under "Bad" match that of Debian 12. I'll dig a bit more. |
I can confirm that downgrading glib to 2.74 seems to fix the issue for us (at least with the provided test case). So I guess there's something in glib 2.76 that breaks things? |
I'm on 2.74 too (2.74.7 on Fedora). That said, my guess is that part I mentioned is where things break, so I'll come back to you with a quick patch you can try to see if it improves things. |
@thatsmydoing can you check if this tiny patch helps?
All it does is moving the |
That might be it. With the patch, even on glib 2.76, I can't reproduce it with my test case anymore. |
That's good news! You may want to ensure you don't see any other regression though. If not, I can push the fix upstream. |
I'll be trying this out on production today. I'll let you know around next week if we see any more crashes. |
With the patch and glib 2.76, we unfortunately had a crash elsewhere this time janus-gateway/src/plugins/janus_videoroom.c Line 8582 in e1c7704
janus-gateway/src/plugins/janus_videoroom.c Line 8445 in e1c7704
|
Please share dumps as you did last time. |
I couldn't reproduce it locally, so I only have a gdb backtrace from coredumps. The first logs are running on thatsmydoing@155c456, this is branched off 6d2dd1b and includes your patch as well as an unrelated patch that shouldn't affect this. Logs[7825799502076340] Updating existing session [7540571102874557] Updating existing session [3620882643052495] WebRTC resources freed; 0x7f3ab82b2820 0x7f3ac805a410 [3620882643052495] Handle and related resources freed; 0x7f3ab82b2820 0x7f3ac805a410 [7825799502076340] Negotiation update, checking what changed... [7540571102874557] Negotiation update, checking what changed... [4877804757315329] Updating existing session [WARN] [8270445183568114] Didn't receive video (substream #2) for more than 1 second(s)... [4877804757315329] Negotiation update, checking what changed... [WARN] [4287692750950437] Didn't receive video (substream #1) for more than 1 second(s)... [WARN] [3252118253804503] Didn't receive video (substream #1) for more than 1 second(s)... [libwebsockets][NOTICE] [2023/07/20 08:41:09:2406] N: __lws_lc_tag: ++ [wsisrv|3613|adopted] (39) [libwebsockets][NOTICE] [2023/07/20 08:41:09:2407] N: __lws_lc_untag: -- [wsisrv|3613|adopted] (38) 105μs [WARN] No packet received on substream 1 for a while, falling back to 0 [WARN] [8270445183568114] Didn't receive video (substream #2) for more than 1 second(s)... [WARN] [8270445183568114] Didn't receive audio for more than 1 second(s)... [WARN] No packet received on substream 1 for a while, falling back to 0 Detaching handle from JANUS VideoRoom plugin; 0x7f3ab8098760 0x7f3ab81293e0 0x7f3ab8098760 0x7f3ab831cdf0 [janus.plugin.videoroom-0x7f3ab81293e0] No WebRTC media anymore; 0x7f3ab8098760 0x7f3ab831cdf0 [6967737102373463] Updating existing session [8270445183568114] WebRTC resources freed; 0x7f3ab8098760 0x7f3ab80a99e0 Detaching handle from JANUS VideoRoom plugin; 0x7f3ab81ca960 0x7f3ab8369a40 0x7f3ab81ca960 0x7f3ab81ef150 [janus.plugin.videoroom-0x7f3ab8003630] No WebRTC media anymore; 0x7f3ab82131e0 0x7f3ab8305800 [janus.plugin.videoroom-0x7f3ab8369a40] No WebRTC media anymore; 0x7f3ab81ca960 0x7f3ab81ef150 Detaching handle from JANUS VideoRoom plugin; 0x7f3ab82131e0 0x7f3ab8003630 0x7f3ab82131e0 0x7f3ab8305800 [8270445183568114] Handle and related resources freed; 0x7f3ab8098760 0x7f3ab80a99e0 GDB Backtrace#0 janus_videoroom_hangup_media_internal (session_data=session_data@entry=0x7f3ab8305800) at plugins/janus_videoroom.c:8620 #1 0x00007f3acd0cf9a0 in janus_videoroom_hangup_media (handle=) at plugins/janus_videoroom.c:8483 #2 0x0000000000448790 in janus_ice_outgoing_traffic_handle () #3 0x000000000044b701 in janus_ice_outgoing_traffic_dispatch () #4 0x00007f3ad03d2d7d in g_main_context_dispatch () from /nix/store/fr53rm643lg0rp4qmgzzpwsz5xxdqwk0-glib-2.76.3/lib/libglib-2.0.so.0 #5 0x00007f3ad03d3028 in g_main_context_iterate.constprop () from /nix/store/fr53rm643lg0rp4qmgzzpwsz5xxdqwk0-glib-2.76.3/lib/libglib-2.0.so.0 #6 0x00007f3ad03d32df in g_main_loop_run () from /nix/store/fr53rm643lg0rp4qmgzzpwsz5xxdqwk0-glib-2.76.3/lib/libglib-2.0.so.0 #7 0x000000000043d81c in janus_ice_handle_thread () #8 0x00007f3ad03ff64d in g_thread_proxy () from /nix/store/fr53rm643lg0rp4qmgzzpwsz5xxdqwk0-glib-2.76.3/lib/libglib-2.0.so.0 #9 0x00007f3acfbb6dd4 in start_thread () from /nix/store/3n58xw4373jp0ljirf06d8077j15pc4j-glibc-2.37-8/lib/libc.so.6 #10 0x00007f3acfc389b0 in clone3 () from /nix/store/3n58xw4373jp0ljirf06d8077j15pc4j-glibc-2.37-8/lib/libc.so.6 While debugging, I saw a similar bug to the one fixed by your patch and fixed it. Unfortunately, it was unrelated to the crash and didn't fix it. Here are the logs running against thatsmydoing@2d1b88a. Logs[3530032410032467] Updating existing session [WARN] [121364506572113] Didn't receive video (substream #1) for more than 1 second(s)... [WARN] [121364506572113] Didn't receive video (substream #2) for more than 1 second(s)... [3530032410032467] Negotiation update, checking what changed... [2740559208127164] Negotiation update, checking what changed... [WARN] [4890055471783842] Didn't receive video (substream #1) for more than 1 second(s)... [WARN] [121364506572113] Didn't receive video (substream #1) for more than 1 second(s)... [WARN] [121364506572113] Didn't receive video for more than 1 second(s)... [WARN] [5426595611094146] Didn't receive video (substream #1) for more than 1 second(s)... [libwebsockets][NOTICE] [2023/07/23 11:01:48:8014] N: __lws_lc_tag: ++ [wsisrv|29cc|adopted] (38) [libwebsockets][NOTICE] [2023/07/23 11:01:48:8015] N: __lws_lc_untag: -- [wsisrv|29cc|adopted] (37) 91μs [WARN] [121364506572113] Didn't receive video (substream #1) for more than 1 second(s)... [libwebsockets][NOTICE] [2023/07/23 11:01:58:8017] N: __lws_lc_tag: ++ [wsisrv|29cd|adopted] (38) [libwebsockets][NOTICE] [2023/07/23 11:01:58:8018] N: __lws_lc_untag: -- [wsisrv|29cd|adopted] (37) 68μs [WARN] [6318269714873560] Didn't receive video for more than 1 second(s)... [WARN] [6318269714873560] Didn't receive video (substream #1) for more than 1 second(s)... [WARN] [6318269714873560] Unknown SSRC, dropping packet (SSRC 966292906)... [WARN] [121364506572113] Didn't receive video (substream #1) for more than 1 second(s)... [libwebsockets][NOTICE] [2023/07/23 11:02:08:8017] N: __lws_lc_tag: ++ [wsisrv|29ce|adopted] (38) [libwebsockets][NOTICE] [2023/07/23 11:02:08:8017] N: __lws_lc_untag: -- [wsisrv|29ce|adopted] (37) 92μs [WARN] [121364506572113] Unknown SSRC, dropping packet (SSRC 3329985358)... [WARN] [121364506572113] Unknown SSRC, dropping packet (SSRC 3329985358)... [WARN] [121364506572113] Unknown SSRC, dropping packet (SSRC 3329985358)... [libwebsockets][NOTICE] [2023/07/23 11:02:18:8015] N: __lws_lc_tag: ++ [wsisrv|29cf|adopted] (38) [libwebsockets][NOTICE] [2023/07/23 11:02:18:8016] N: __lws_lc_untag: -- [wsisrv|29cf|adopted] (37) 83μs [WARN] [121364506572113] Didn't receive video (substream #1) for more than 1 second(s)... [WARN] [6318269714873560] Didn't receive video for more than 1 second(s)... [WARN] [6318269714873560] Didn't receive video (substream #1) for more than 1 second(s)... [libwebsockets][NOTICE] [2023/07/23 11:02:28:8014] N: __lws_lc_tag: ++ [wsisrv|29d0|adopted] (38) [libwebsockets][NOTICE] [2023/07/23 11:02:28:8015] N: __lws_lc_untag: -- [wsisrv|29d0|adopted] (37) 76μs [WARN] [121364506572113] Unknown SSRC, dropping packet (SSRC 3329985358)... [libwebsockets][NOTICE] [2023/07/23 11:02:38:8013] N: __lws_lc_tag: ++ [wsisrv|29d1|adopted] (38) [libwebsockets][NOTICE] [2023/07/23 11:02:38:8014] N: __lws_lc_untag: -- [wsisrv|29d1|adopted] (37) 80μs [WARN] [121364506572113] Unknown SSRC, dropping packet (SSRC 3329985358)... [janus.plugin.videoroom-0x7f7590335020] No WebRTC media anymore; 0x7f7590052320 0x7f75901da380 [3658334612854888] Updating existing session [4278699205407511] WebRTC resources freed; 0x7f7590052320 0x7f759040dd90 Detaching handle from JANUS VideoRoom plugin; 0x7f7590052320 0x7f7590335020 0x7f7590052320 0x7f75901da380 [4278699205407511] Handle and related resources freed; 0x7f7590052320 0x7f759040dd90 Detaching handle from JANUS VideoRoom plugin; 0x7f7590372010 0x7f7590002330 0x7f7590372010 0x7f759006a530 Detaching handle from JANUS VideoRoom plugin; 0x7f759002f890 0x7f759002dcb0 0x7f759002f890 0x7f759003fff0 [janus.plugin.videoroom-0x7f7590002330] No WebRTC media anymore; 0x7f7590372010 0x7f759006a530 [janus.plugin.videoroom-0x7f759002dcb0] No WebRTC media anymore; 0x7f759002f890 0x7f759003fff0 GDB Backtrace#0 0x00007f75a4e9c538 in janus_videoroom_hangup_media_internal (session_data=session_data@entry=0x7f759003fff0) at plugins/janus_videoroom.c:8621 #1 0x00007f75a4e9d9a0 in janus_videoroom_hangup_media (handle=) at plugins/janus_videoroom.c:8483 #2 0x0000000000448790 in janus_ice_outgoing_traffic_handle () #3 0x000000000044b701 in janus_ice_outgoing_traffic_dispatch () #4 0x00007f75a81a0d7d in g_main_context_dispatch () from /nix/store/fr53rm643lg0rp4qmgzzpwsz5xxdqwk0-glib-2.76.3/lib/libglib-2.0.so.0 #5 0x00007f75a81a1028 in g_main_context_iterate.constprop () from /nix/store/fr53rm643lg0rp4qmgzzpwsz5xxdqwk0-glib-2.76.3/lib/libglib-2.0.so.0 #6 0x00007f75a81a12df in g_main_loop_run () from /nix/store/fr53rm643lg0rp4qmgzzpwsz5xxdqwk0-glib-2.76.3/lib/libglib-2.0.so.0 #7 0x000000000043d81c in janus_ice_handle_thread () #8 0x00007f75a81cd64d in g_thread_proxy () from /nix/store/fr53rm643lg0rp4qmgzzpwsz5xxdqwk0-glib-2.76.3/lib/libglib-2.0.so.0 #9 0x00007f75a7984dd4 in start_thread () from /nix/store/3n58xw4373jp0ljirf06d8077j15pc4j-glibc-2.37-8/lib/libc.so.6 #10 0x00007f75a7a069b0 in clone3 () from /nix/store/3n58xw4373jp0ljirf06d8077j15pc4j-glibc-2.37-8/lib/libc.so.6 |
A libasan trace would be better. |
Here's one https://gist.github.com/thatsmydoing/05750ff32b8e5b0f3d3fdbd6f813d1ad running against thatsmydoing@2d1b88a I do have another one that has a slightly different error and line https://gist.github.com/thatsmydoing/8a577c6c26005cde8ef70b6e0e181c09 |
Looks like a race condition, that I think we correctly identified now. Unfortunately it's a also a bit tricky to address and solve. I'll let you know when there's a patch you can test. |
@thatsmydoing please test the PR above. It already includes the other small change I pasted a few weeks ago, so you' won't need to reapply that yourself again. |
@thatsmydoing ping 🙂 |
Ah sorry, I have it running now and it seems good so far. I'll update you next week after having let it run for a bit more. I think the PR also needs Patchdiff --git a/src/plugins/janus_videoroom.c b/src/plugins/janus_videoroom.c index cdf42837..fb0ba279 100644 --- a/src/plugins/janus_videoroom.c +++ b/src/plugins/janus_videoroom.c @@ -8633,8 +8633,8 @@ static void janus_videoroom_hangup_media_internal(gpointer session_data) { } list = list->next; } + temp = temp->next; janus_videoroom_subscriber_stream_remove(s, NULL, TRUE); - temp = temp->next; } /* Free streams */ g_list_free(subscriber->streams); |
No it's there already. |
Ah you mean a different patch than mine I guess? |
Yeah, it's a similar fix to the first patch, just in a different area. |
Found another one too, pushed. |
I've not seen any crashes since I used the PR patchset so I think it's good. |
Ack, thanks for the feedback! I'll merge then. |
What version of Janus is this happening on?
6d2dd1b44d16d51c086f58f5ffc59c87a5e9b0c7
Have you tested a more recent version of Janus too?
Yes.
fe993140577ff91ec626eb7b5f03a2898809674f
Was this working before?
I'm not sure, it also crashes on 1.0.4. In practice, it doesn't crash that often. Maybe once a day in our usage.
Is there a gdb or libasan trace of the issue?
Reproduced locally against
fe993140577ff91ec626eb7b5f03a2898809674f
, https://gist.github.com/thatsmydoing/c8e82a531f3431cb8f066e4b54cb6decAdditional context
I can reproduce it pretty reliably with https://github.com/thatsmydoing/janus-gateway-2775/
The text was updated successfully, but these errors were encountered: