Skip to content
This repository has been archived by the owner on Jan 27, 2024. It is now read-only.

SIGSEGV in timer_alarm #582

Open
bodqhrohro opened this issue Feb 14, 2022 · 9 comments
Open

SIGSEGV in timer_alarm #582

bodqhrohro opened this issue Feb 14, 2022 · 9 comments

Comments

@bodqhrohro
Copy link

Thread 1 "pidgin" received signal SIGSEGV, Segmentation fault.
0x00007fffdb3b4b60 in download_retry_alarm (TLS=0x55555718cc00, arg=0x555558670100) at queries.c:3253
3253	queries.c: Нет такого файла или каталога.
@(gdb) bt
#0  0x00007fffdb3b4b60 in download_retry_alarm (TLS=0x55555718cc00, arg=0x555558670100) at queries.c:3253
#1  0x00007fffdb39ad78 in timer_alarm (arg=<error reading variable: value has been optimized out>) at tgp-timers.c:35
#2  0x00007ffff72f6744 in g_timeout_dispatch (source=0x5555577ab220, callback=<optimized out>, user_data=<optimized out>)
    at ../../../glib/gmain.c:4933
#3  0x00007ffff72f5be4 in g_main_dispatch (context=0x5555556c8b10) at ../../../glib/gmain.c:3381
#4  g_main_context_dispatch (context=0x5555556c8b10) at ../../../glib/gmain.c:4099
#5  0x00007ffff72f5f88 in g_main_context_iterate (context=0x5555556c8b10, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at ../../../glib/gmain.c:4175
#6  0x00007ffff72f6273 in g_main_loop_run (loop=0x555556295170) at ../../../glib/gmain.c:4373
#7  0x00007ffff79d8b2a in gtk_main () at /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
#8  0x0000555555591d70 in main (argc=<optimized out>, argv=<optimized out>) at ././pidgin/gtkmain.c:947
@bodqhrohro
Copy link
Author

==1643036== 1 errors in context 1 of 9:
==1643036== Invalid read of size 8
==1643036==    at 0x11C1AB75: download_retry_alarm (queries.c:3257)
==1643036==    by 0x11C00D77: timer_alarm (tgp-timers.c:35)
==1643036==    by 0x5490743: g_timeout_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==1643036==    by 0x548FBE3: g_main_context_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==1643036==    by 0x548FF87: g_main_context_iterate.constprop.0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==1643036==    by 0x549003E: g_main_context_iteration (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==1643036==    by 0x4C48D90: gtk_main_iteration (in /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0.2400.33)
==1643036==    by 0xBA3D654: conversation_created_cb (plugin.c:91)
==1643036==    by 0x574D81B: purple_signal_emit_vargs (in /usr/lib/libpurple.so.0.14.8)
==1643036==    by 0x574D9AD: purple_signal_emit (in /usr/lib/libpurple.so.0.14.8)
==1643036==    by 0x57166DB: purple_conversation_new (in /usr/lib/libpurple.so.0.14.8)
==1643036==    by 0x574C86B: serv_got_joined_chat (in /usr/lib/libpurple.so.0.14.8)
==1643036==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
==1643036== 
==1643036== 
==1643036== 1 errors in context 2 of 9:
==1643036== Invalid read of size 8
==1643036==    at 0x11C1AB69: download_retry_alarm (queries.c:3257)
==1643036==    by 0x11C00D77: timer_alarm (tgp-timers.c:35)
==1643036==    by 0x5490743: g_timeout_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==1643036==    by 0x548FBE3: g_main_context_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==1643036==    by 0x548FF87: g_main_context_iterate.constprop.0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==1643036==    by 0x549003E: g_main_context_iteration (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==1643036==    by 0x4C48D90: gtk_main_iteration (in /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0.2400.33)
==1643036==    by 0xBA3D654: conversation_created_cb (plugin.c:91)
==1643036==    by 0x574D81B: purple_signal_emit_vargs (in /usr/lib/libpurple.so.0.14.8)
==1643036==    by 0x574D9AD: purple_signal_emit (in /usr/lib/libpurple.so.0.14.8)
==1643036==    by 0x57166DB: purple_conversation_new (in /usr/lib/libpurple.so.0.14.8)
==1643036==    by 0x574C86B: serv_got_joined_chat (in /usr/lib/libpurple.so.0.14.8)
==1643036==  Address 0x19cb7668 is 9,000 bytes inside an unallocated block of size 9,120 in arena "client"
==1643036== 
==1643036== 
==1643036== 1 errors in context 3 of 9:
==1643036== Invalid read of size 4
==1643036==    at 0x11C1AB31: download_retry_alarm (queries.c:3253)
==1643036==    by 0x11C00D77: timer_alarm (tgp-timers.c:35)
==1643036==    by 0x5490743: g_timeout_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==1643036==    by 0x548FBE3: g_main_context_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==1643036==    by 0x548FF87: g_main_context_iterate.constprop.0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==1643036==    by 0x549003E: g_main_context_iteration (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==1643036==    by 0x4C48D90: gtk_main_iteration (in /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0.2400.33)
==1643036==    by 0xBA3D654: conversation_created_cb (plugin.c:91)
==1643036==    by 0x574D81B: purple_signal_emit_vargs (in /usr/lib/libpurple.so.0.14.8)
==1643036==    by 0x574D9AD: purple_signal_emit (in /usr/lib/libpurple.so.0.14.8)
==1643036==    by 0x57166DB: purple_conversation_new (in /usr/lib/libpurple.so.0.14.8)
==1643036==    by 0x574C86B: serv_got_joined_chat (in /usr/lib/libpurple.so.0.14.8)
==1643036==  Address 0x19cb6fc0 is 7,296 bytes inside an unallocated block of size 9,120 in arena "client"

@bodqhrohro
Copy link
Author

@himselfv, could you take a look please? Seems like this is related to changes in majn/tgl@64dccea, but I'm not sure yet where ri->ev can be prematurely freed.

@himselfv
Copy link
Collaborator

himselfv commented Apr 14, 2022

@bodqhrohro
It sort of looks like everything else is broken. At 3253, only ri->D-> is dereferenced, and ri is asserted above, so it must be ri->D. At 3257, it's

TLS->timer_methods->remove (ri->ev);

But ev itself is not dereferenced here, and we're not yet in the remove call, so it's either ri-> (can't be, asserted), or TLS->timer_methods->, and timer_methods->remove is exactly $10, so I guess it's that.

Any details on when the crash happened? Logs?

@bodqhrohro
Copy link
Author

There are no debug logs related to telegram-purple right before in this case, only some usual XMPP/GDK chitchatting. Happens when Pidgin is starting and telegram-purple fetches the messages it missed, possibly on reconnects too.

@himselfv
Copy link
Collaborator

So apparently, to understand what's happening I had to look at Issue 580 which details the setup and gives more logs.

A connected instance of Pidgin is frozen, then network is disconnected/reconnected, forcing Pidgin to quickly disconnect/reconnect on unfreeze. The disconnection destroys the TLS* context, but some delayed download requests linger as timer_alarms for up to 10 seconds, and then fire with the broken TLS* pointer.

If this is the reason, it should not happen on initial connect.

==287030== Invalid read of size 1
==287030==    at 0x4841CE2: __strlen_sse2 (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==287030==    by 0x59E25E7: __vfprintf_internal (vfprintf-internal.c:1647)
==287030==    by 0x59F3145: __vsnprintf_internal (vsnprintf.c:114)
==287030==    by 0x11C50CE6: vsnprintf (stdio2.h:80)
==287030==    by 0x11C50CE6: tgl_snprintf (tools.c:113)
==287030==    by 0x11C2E957: download_next_part (queries.c:3323)
==287030==    by 0x11C2EB9C: download_retry_alarm (queries.c:3261)
==287030==    by 0x11C14D77: timer_alarm (tgp-timers.c:35)
==287030==    by 0x54A4743: g_timeout_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==287030==    by 0x54A3BE3: g_main_context_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==287030==    by 0x54A3F87: g_main_context_iterate.constprop.0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==287030==    by 0x54A403E: g_main_context_iteration (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==287030==    by 0x4C48D90: gtk_main_iteration (in /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0.2400.33)
==287030==  Address 0x1f813160 is 0 bytes inside a block of size 65 free'd
==287030==    at 0x483F9AB: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==287030==    by 0x11C46D79: tgl_free_str (tools.h:111)
==287030==    by 0x11C46D79: tgl_free_all (structures.c:2607)
==287030==    by 0x11C1AAAF: connection_data_free (tgp-structs.c:135)
==287030==    by 0x5725D69: _purple_connection_destroy (in /usr/lib/libpurple.so.0.14.8)
==287030==    by 0x5711E6D: purple_account_disconnect (in /usr/lib/libpurple.so.0.14.8)
==287030==    by 0x5724F15: purple_connection_disconnect_cb (in /usr/lib/libpurple.so.0.14.8)
==287030==    by 0x54A4743: g_timeout_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==287030==    by 0x54A3BE3: g_main_context_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==287030==    by 0x54A3F87: g_main_context_iterate.constprop.0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==287030==    by 0x54A403E: g_main_context_iteration (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==287030==    by 0x4C48D90: gtk_main_iteration (in /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0.2400.33)
==287030==    by 0xBA51654: conversation_created_cb (plugin.c:91)
==287030==  Block was alloc'd at
==287030==    at 0x483E77F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==287030==    by 0x11C5065F: tgl_alloc_release (tools.c:204)
==287030==    by 0x11C50ED5: tgl_strdup (tools.c:217)
==287030==    by 0x11C4C131: tgl_set_download_directory (tgl.c:54)
==287030==    by 0x11C1888F: tgprpl_login (telegram-purple.c:548)
==287030==    by 0x165968: do_signon (in /usr/bin/pidgin)
==287030==    by 0x54A4743: g_timeout_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==287030==    by 0x54A3BE3: g_main_context_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==287030==    by 0x54A3F87: g_main_context_iterate.constprop.0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==287030==    by 0x54A403E: g_main_context_iteration (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7000.2)
==287030==    by 0x4C48D90: gtk_main_iteration (in /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0.2400.33)
==287030==    by 0xBA51654: conversation_created_cb (plugin.c:91)

To fix this, I guess download_retry code should keep a list of all outstanding download_retry timers and cancel all of them on TLS* destruction (tgl_free_all).

Or maybe TLS* should just cancel ALL timers created via TLS->timer_methods, on its destruction? I mean, all of them receive the TLS* pointer, so none can be safely called after it's destroyed. Summoning @EionRobb, if they even care about this project anymore.

@bodqhrohro
Copy link
Author

Or maybe TLS* should just cancel ALL timers created via TLS->timer_methods, on its destruction?

Prpls written by @EionRobb themself do, as a rule of thumb. That's what made me sure that corruptions happen in some other place before the timers there are fired, BTW. I had polluted numerous bug trackers for years because of this before finding the reason (time to close all of that, probably).

@EionRobb
Copy link
Contributor

Yay! Glad you found it :)

@bodqhrohro so the fix is to move to tdlib-purple? :)

@bodqhrohro
Copy link
Author

@bodqhrohro so the fix is to move to tdlib-purple? :)

Nope, I told already that I just rolled back to the tgl revision before the patch. It wasn't working ideally anyway, as I still was getting lots of non-downloaded pictures.

@bodqhrohro
Copy link
Author

Another use-after-free, seems like uncleared callbacks are at guilt there too:
pidgin-vg14.txt.gz

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants