Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault in normal usage #819

Closed
coypoop opened this Issue Jan 18, 2018 · 15 comments

Comments

Projects
None yet
3 participants
@coypoop
Copy link
Contributor

commented Jan 18, 2018

This is with irssi 1.1.0.
I'm not sure what is the cause. I wasn't doing anything unusual at the time.

(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x000000000043549f in sig_print_starting ()
#2 0x0000000000488daf in signal_emit_real ()
#3 0x0000000000489319 in signal_emit_id ()
#4 0x000000000044e675 in printformat_module_dest_charargs ()
#5 0x000000000044e6ec in printformat_module_dest_args ()
#6 0x000000000044e85c in printformat_module ()
#7 0x0000000000488daf in signal_emit_real ()
#8 0x000000000048924f in signal_emit ()
#9 0x0000000000483a91 in server_start_connect ()
#10 0x000000000045fb37 in irc_server_connect ()
#11 0x0000000000482e32 in server_connect ()
#12 0x0000000000484de2 in server_reconnect_timeout ()
#13 0x0000779a07e4571f in g_timeout_dispatch () from /usr/pkg/lib/libglib-2.0.so.0
#14 0x0000779a07e44e1f in g_main_context_dispatch () from /usr/pkg/lib/libglib-2.0.so.0
#15 0x0000779a07e45137 in g_main_context_iterate.isra () from /usr/pkg/lib/libglib-2.0.so.0
#16 0x0000779a07e451f3 in g_main_context_iteration () from /usr/pkg/lib/libglib-2.0.so.0
#17 0x0000000000497a5c in main ()

@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Jan 18, 2018

some general questions

  • how did you install it?
  • does it give some details if you ask for bt full?
  • what scripts are you using?
  • the crash seems to be related to connecting to a server, do you remember what you typed before the crash?
  • did it crash suddenly or after you entered some command?
  • what os?
@coypoop

This comment has been minimized.

Copy link
Contributor Author

commented Jan 18, 2018

  • Via a package manager (pkgsrc), with only a single irrelevant patch.
  • Nothing in bt full, sorry
  • No scripts
  • Not intentionally, but I see that there was a status message
  • Suddenly
  • netbsd

The surrounding code in frame #1 is:

│0x435458 <sig_print_starting> mov 0x2aef7a(%rip),%eax # 0x6e43d8 <printing_splits>
│0x43545e <sig_print_starting+6> test %eax,%eax
│0x435460 <sig_print_starting+8> je 0x435463 <sig_print_starting+11>
│0x435462 <sig_print_starting+10> retq
│0x435463 <sig_print_starting+11> push %rbx
│0x435464 <sig_print_starting+12> mov %rdi,%rbx
│0x435467 <sig_print_starting+15> mov 0x8(%rdi),%rdi
│0x43546b <sig_print_starting+19> mov $0x49b364,%edx
│0x435470 <sig_print_starting+24> xor %esi,%esi
│0x435472 <sig_print_starting+26> callq 0x47e498 <module_check_cast>
│0x435477 <sig_print_starting+31> mov $0x49b7b9,%edx
│0x43547c <sig_print_starting+36> mov $0x4,%esi
│0x435481 <sig_print_starting+41> mov %rax,%rdi
│0x435484 <sig_print_starting+44> callq 0x477883 <chat_protocol_check_cast>
│0x435489 <sig_print_starting+49> test %rax,%rax
│0x43548c <sig_print_starting+52> je 0x4354a3 <sig_print_starting+75>
│0x43548e <sig_print_starting+54> mov 0x8(%rbx),%rax
│0x435492 <sig_print_starting+58> mov 0x18(%rbx),%rsi
│0x435496 <sig_print_starting+62> mov %rax,%rdi
│0x435499 <sig_print_starting+65> callq *0xd8(%rax)
---│0x43549f <sig_print_starting+71> test %eax,%eax
│0x4354a1 <sig_print_starting+73> jne 0x4354a5 <sig_print_starting+77>
│0x4354a3 <sig_print_starting+75> pop %rbx
│0x4354a4 <sig_print_starting+76> retq
│0x4354a5 <sig_print_starting+77> mov 0x8(%rbx),%rdi

@coypoop

This comment has been minimized.

Copy link
Contributor Author

commented Jan 18, 2018

There's a non-zero chance that it's a problem with how I am running a bleeding edge kernel having its own bugs (something else crashed inexplicably :(). If it sounds like a completely impossible crash, feel free to close the bug.

@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Jan 18, 2018

maybe a bug related with netsplit printing..

@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Jan 18, 2018

would you consider running irssi with debug info compiled so that it might be easier to track down next time it crashes on you?

@coypoop

This comment has been minimized.

Copy link
Contributor Author

commented Jan 18, 2018

yes, I will.

@dequis

This comment has been minimized.

Copy link
Member

commented Jan 20, 2018

sigh #812 (comment)

Okay so everything needed to understand this issue was already here

#12 0x0000000000484de2 in server_reconnect_timeout ()

Reconnect happens, there's no server.

#6 0x000000000044e85c in printformat_module ()

Someone prints

#1 0x000000000043549f in sig_print_starting ()

That's the one in fe-netsplit.c because the asm says printing_splits

│0x435472 <sig_print_starting+26> callq 0x47e498 <module_check_cast>
...
│0x435484 <sig_print_starting+44> callq 0x477883 <chat_protocol_check_cast>

These are PROTO_CHECK_CAST from IS_IRC_SERVER. They return true on null.

│0x43548e <sig_print_starting+54> mov 0x8(%rbx),%rax
│0x435492 <sig_print_starting+58> mov 0x18(%rbx),%rsi
│0x435496 <sig_print_starting+62> mov %rax,%rdi
│0x435499 <sig_print_starting+65> callq *0xd8(%rax)

This is server_ischannel(dest->server, dest->target) which is actually a macro for (server)->ischannel(server, channel).

So... yeah.

@dequis

This comment has been minimized.

Copy link
Member

commented Jan 22, 2018

From #824:

@ailin-nemui:

IS_IRC_SERVER() should return false on null. not convinced yet (but maybe better than nothing)

me:

#define IS_IRC_SERVER(server) \
(IRC_SERVER(server) ? TRUE : FALSE)

#define IRC_SERVER(server) \
PROTO_CHECK_CAST(SERVER(server), IRC_SERVER_REC, chat_type, "IRC")

#define PROTO_CHECK_CAST(object, cast, type_field, id) \
((cast *) chat_protocol_check_cast(object, \
offsetof(cast, type_field), id))

void *chat_protocol_check_cast(void *object, int type_pos, const char *id)
{
return object == NULL ||
chat_protocol_lookup(id) !=
G_STRUCT_MEMBER(int, object, type_pos) ? NULL : object;
}

That object == NULL. Kind of a core thing to the whole object model so I really wouldn't risk changing it.

@ailin-nemui:

it says ( object == NULL || type is wrong ) ? NULL : object, thus it should be false on null. or am I making a fool of meself?

As I said in #824 (comment), this last comment is correct, but mostly relevant to this ticket, so I copied this part of the conversation here.

So IS_IRC_SERVER isn't supposed to return true on null, and part of my analysis here is wrong. Will post another comment with corrections.

@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Jan 22, 2018

the original crash is plausible under this scenario: the server got disconnected and "server disconnected" emitted, thus the clean-up code already ran and free'd the isupport hash table needed for "is_channel". is the server object itself already freed? I don't know, and I would consider it a bug if there is a freed server recorded in dest (however it might be inevitable on some code paths). one thing that might change/improve the situation would be #803 but that's not going to happen in 1.0 or 1.1. As far as I can tell, the original code may be sound for two reasons, either because is_channel wasn't used or because there are no freed servers in the global servers list

@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Jan 25, 2018

@coypoop is it possible for you to print the contents of the server object in gdb?

@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Jan 25, 2018

@dequis in theory there is a "server disconnected" handler in src/irc/core/netsplit.c that is supposed to destroy the netsplits. does anyone have a reliable way to test this crash?

@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Jan 25, 2018

I prepared #832 based on this analysis

@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Jan 25, 2018

we may or may not have a problem if "server disconnected" handlers cause a print somehow before the netjoins are cleaned up but after the object has already been torn down. I'm not sure if that can happen (on first glance it doesn't seem like it should happen...)

@coypoop

This comment has been minimized.

Copy link
Contributor Author

commented Jan 29, 2018

sorry, I accidentally deleted the core dump and wasn't able to reproduce the issue

@dequis

This comment has been minimized.

Copy link
Member

commented Jan 29, 2018

Oh oops sorry I thought I posted this already, I was looking into this stuff the last weekend but hard disk shenanigans happened, that was a "small" distraction.

To repro:

  1. Connect to a bitlbee
  2. Identify and connect one account
  3. ac off
  4. /reconnect

This is slightly different from the original crash which had a reconnect timeout, but, close enough

Thread 1 received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(rr) bt
#0  0x0000000000000000 in ?? ()
#1  0x00005597bbd6b420 in sig_print_starting (dest=0x7fff0f49cd40) at fe-netsplit.c:258
#2  0x00005597bbdcd646 in signal_emit_real (rec=rec@entry=<Signal 303>, params=params@entry=1, va=va@entry=0x7fff0f49cac0, first_hook=<optimized out>) at signals.c:242
#3  0x00005597bbdcdc75 in signal_emit_id (signal_id=<optimized out>, params=params@entry=1) at signals.c:304
#4  0x00005597bbd885f0 in printformat_module_dest_charargs (module=0x5597bbde29d5 "fe-common/core", dest=0x7fff0f49cd40, formatnum=41, arglist=0x7fff0f49cbf0) at printtext.c:71
#5  0x00005597bbd88694 in printformat_module_dest_args (module=module@entry=0x5597bbde29d5 "fe-common/core", dest=dest@entry=0x7fff0f49cd40, formatnum=formatnum@entry=41, va=va@entry=0x7fff0f49cd90) at printtext.c:58
#6  0x00005597bbd88895 in printformat_module_args (va=0x7fff0f49cd90, formatnum=41, level=<optimized out>, target=<optimized out>, server=<optimized out>, module=0x5597bbde29d5 "fe-common/core") at printtext.c:101
#7  printformat_module (module=0x5597bbde29d5 "fe-common/core", server=<optimized out>, target=<optimized out>, level=<optimized out>, formatnum=41) at printtext.c:110
#8  0x00005597bbdcd646 in signal_emit_real (rec=rec@entry=<Signal 339>, params=params@entry=1, va=va@entry=0x7fff0f49cf00, first_hook=<optimized out>) at signals.c:242
#9  0x00005597bbdcdb9d in signal_emit (signal=signal@entry=0x5597bbde68be "server looking", params=params@entry=1) at signals.c:286
#10 0x00005597bbdc7564 in server_start_connect (server=server@entry=0x5597bce18f20) at servers.c:431
#11 0x00005597bbd9ceee in irc_server_connect (server=0x5597bce18f20) at irc-servers.c:346
#12 0x00005597bbdc6768 in server_connect (conn=conn@entry=0x5597bce18970) at servers.c:347
#13 0x00005597bbdc8796 in cmd_reconnect (data=<optimized out>, server=<optimized out>) at servers-reconnect.c:439
#14 0x00005597bbdcd646 in signal_emit_real (rec=rec@entry=<Signal 46>, params=params@entry=3, va=va@entry=0x7fff0f49d110, first_hook=<optimized out>) at signals.c:242
#15 0x00005597bbdcdb9d in signal_emit (signal=signal@entry=0x5597bce16170 "command reconnect", params=params@entry=3) at signals.c:286
#16 0x00005597bbdb77ca in parse_command (item=0x5597bce10ac0, server=0x5597bce0caf0, expand_aliases=<optimized out>, command=0x5597bce174e1 "reconnect ") at commands.c:907
#17 event_command (line=0x5597bce174e1 "reconnect ", server=0x5597bce0caf0, item=0x5597bce10ac0) at commands.c:953
#18 0x00005597bbdcd646 in signal_emit_real (rec=rec@entry=<Signal 6>, params=params@entry=3, va=va@entry=0x7fff0f49d2d0, first_hook=<optimized out>) at signals.c:242
#19 0x00005597bbdcdb9d in signal_emit (signal=signal@entry=0x5597bbdde0af "send command", params=params@entry=3) at signals.c:286
#20 0x00005597bbd4ba20 in key_send_line () at gui-readline.c:493
#21 0x00005597bbdcd646 in signal_emit_real (rec=rec@entry=<Signal 603>, params=params@entry=3, va=va@entry=0x7fff0f49d460, first_hook=<optimized out>) at signals.c:242
#22 0x00005597bbdcdb9d in signal_emit (signal=signal@entry=0x5597bce17cf0 "key send_line", params=params@entry=3) at signals.c:286
#23 0x00005597bbd86f99 in sig_multi (data=<optimized out>, gui_data=0x0) at keyboard.c:717
#24 0x00005597bbdcd646 in signal_emit_real (rec=rec@entry=<Signal 298>, params=params@entry=3, va=va@entry=0x7fff0f49d620, first_hook=<optimized out>) at signals.c:242
#25 0x00005597bbdcdb9d in signal_emit (signal=signal@entry=0x5597bce15e40 "key multi", params=params@entry=3) at signals.c:286
#26 0x00005597bbd85f27 in key_emit_signal (key=<optimized out>, key=<optimized out>, keyboard=<optimized out>) at keyboard.c:585
#27 0x00005597bbd87992 in key_pressed (keyboard=0x5597bcd15d80, key=key@entry=0x7fff0f49d790 "^M") at keyboard.c:672
#28 0x00005597bbd4b12c in sig_gui_key_pressed (keyp=0xd) at gui-readline.c:448
#29 0x00005597bbdcd646 in signal_emit_real (rec=rec@entry=<Signal 629>, params=params@entry=1, va=va@entry=0x7fff0f49d850, first_hook=<optimized out>) at signals.c:242
#30 0x00005597bbdcdb9d in signal_emit (signal=signal@entry=0x5597bbdde1a5 "gui key pressed", params=params@entry=1) at signals.c:286
#31 0x00005597bbd4c9e9 in sig_input () at gui-readline.c:852
#32 0x00005597bbdbed7d in irssi_io_invoke (source=<optimized out>, condition=<optimized out>, data=<optimized out>) at misc.c:51
#33 0x00007f78c6c19c15 in g_main_dispatch (context=0x5597bcc7e040) at gmain.c:3142
#34 g_main_context_dispatch (context=context@entry=0x5597bcc7e040) at gmain.c:3795
#35 0x00007f78c6c19fe0 in g_main_context_iterate (context=context@entry=0x5597bcc7e040, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3868
#36 0x00007f78c6c1a06c in g_main_context_iteration (context=0x5597bcc7e040, may_block=1) at gmain.c:3929
#37 0x00005597bbd47a4f in main (argc=<optimized out>, argv=<optimized out>) at irssi.c:331

The situation at the moment of the crash:

(rr) up
#1  0x00005597bbd6b420 in sig_print_starting (dest=0x7fff0f49cd40) at fe-netsplit.c:258
258             if (!server_ischannel(dest->server, dest->target))
(rr) p dest->server
$1 = (SERVER_REC *) 0x5597bce18f20
(rr) p *dest->server
$2 = {type = 732, chat_type = 66, refcount = 1, connrec = 0x5597bce18970,
  connect_time = 0, real_connect_time = 0, tag = 0x5597bce0f6b0 "localhost",
  nick = 0x5597bce0f670 "dx|fb-dolores", connected = 0, disconnected = 0,
  connection_lost = 0, session_reconnect = 0, no_reconnect = 0, handle = 0x0,
  readtag = 0, connect_pipe = {0x5597bce0fb00, 0x5597bcc8d580}, connect_tag = 32,
  connect_pid = 29323, rawlog = 0x5597bce18b90, module_data = 0x5597bce10d80,
  version = 0x0, away_reason = 0x0, last_invite = 0x0, server_operator = 0,
  usermode_away = 0, banned = 0, dns_error = 0, lag_sent = {tv_sec = 0, tv_usec = 0},
  lag_last_check = 0, lag = 0, channels = 0x0, queries = 0x0, channels_join = 0x0,
  isnickflag = 0x0, ischannel = 0x0, get_nick_flags = 0x0, send_message = 0x0,
  split_message = 0x0, channel_find_func = 0x0, query_find_func = 0x0, mask_match_func =
    0x0, nick_match_msg = 0x0}
(rr) p dest->server->ischannel
$3 = (int (*)(SERVER_REC *, const char *)) 0x0

So yeah, dest->server isn't null, that was irrelevant.

Going back to see how the null in ischannel got there:

(rr) p &dest->server->ischannel
$4 = (int (**)(SERVER_REC *, const char *)) 0x5597bce18ff8
(rr) watch -l *0x5597bce18ff8
Hardware watchpoint 1: -location *0x5597bce18ff8
(rr) run
[...]
(rr) c
[...]
Thread 1 hit Hardware watchpoint 1: -location *0x5597bce18ff8

Old value = 45056
New value = 0
0x00007f78c605bded in __memset_avx2_erms () from /usr/lib/libc.so.6
(rr) bt 10
#0  0x00007f78c605bded in __memset_avx2_erms () from /usr/lib/libc.so.6
#1  0x00007f78c5f8786b in calloc () from /usr/lib/libc.so.6
#2  0x00007f78c6c1f3c1 in g_malloc0 (n_bytes=n_bytes@entry=4896) at gmem.c:124
#3  0x00005597bbd9cd2a in irc_server_init_connect (conn=0x5597bce18970) at irc-servers.c:307
#4  0x00005597bbdc675f in server_connect (conn=conn@entry=0x5597bce18970) at servers.c:346
#5  0x00005597bbdc8796 in cmd_reconnect (data=<optimized out>, server=<optimized out>)
    at servers-reconnect.c:439
#6  0x00005597bbdcd646 in signal_emit_real (rec=rec@entry=<Signal 46>,
    params=params@entry=3, va=va@entry=0x7fff0f49d110, first_hook=<optimized out>)
    at signals.c:242
#7  0x00005597bbdcdb9d in signal_emit (
    signal=signal@entry=0x5597bce16170 "command reconnect", params=params@entry=3)
    at signals.c:286
#8  0x00005597bbdb77ca in parse_command (item=0x5597bce10ac0, server=0x5597bce0caf0,
    expand_aliases=<optimized out>, command=0x5597bce174e1 "reconnect ") at commands.c:907
#9  event_command (line=0x5597bce174e1 "reconnect ", server=0x5597bce0caf0,
    item=0x5597bce10ac0) at commands.c:953
(More stack frames follow...)

(rr) c
Continuing.

Thread 1 received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()

So this null got written on irc_server_init_connect, its initial allocation, and continuing goes straight to the crash. This isn't #803

Also worth noting, two g_log happen before the crash, same backtrace:

Thread 1 hit Breakpoint 1, g_log (log_domain=log_domain@entry=0x0, log_level=log_level@entry=G_LOG_LEVEL_CRITICAL, format=format@entry=0x7f78c6c6b9aa "%s: assertion '%s' failed") at gmessages.c:1399
1399    {
(rr) bt 20
#0  g_log (log_domain=log_domain@entry=0x0, log_level=log_level@entry=G_LOG_LEVEL_CRITICAL, format=format@entry=0x7f78c6c6b9aa "%s: assertion '%s' failed") at gmessages.c:1399
#1  0x00007f78c6c210d9 in g_return_if_fail_warning (log_domain=log_domain@entry=0x0, pretty_function=pretty_function@entry=0x5597bbded8a0 <__func__.13110> "ischannel_func", expression=expression@entry=0x5597bbdeceab "data != NULL")
    at gmessages.c:2702
#2  0x00005597bbd9c965 in ischannel_func (server=<optimized out>, data=<optimized out>) at irc-servers.c:81
#3  0x00005597bbd6b420 in sig_print_starting (dest=0x7fff0f49ca80) at fe-netsplit.c:258
#4  0x00005597bbdcd646 in signal_emit_real (rec=rec@entry=<Signal 303>, params=params@entry=1, va=va@entry=0x7fff0f49c800, first_hook=<optimized out>) at signals.c:242
#5  0x00005597bbdcdc75 in signal_emit_id (signal_id=<optimized out>, params=params@entry=1) at signals.c:304
#6  0x00005597bbd885f0 in printformat_module_dest_charargs (module=0x5597bbde29d5 "fe-common/core", dest=0x7fff0f49ca80, formatnum=49, arglist=0x7fff0f49c930) at printtext.c:71
#7  0x00005597bbd88694 in printformat_module_dest_args (module=module@entry=0x5597bbde29d5 "fe-common/core", dest=dest@entry=0x7fff0f49ca80, formatnum=formatnum@entry=49, va=va@entry=0x7fff0f49cad0) at printtext.c:58
#8  0x00005597bbd88895 in printformat_module_args (va=0x7fff0f49cad0, formatnum=49, level=262144, target=0x0, server=0x5597bbde29d5, module=0x5597bbde29d5 "fe-common/core") at printtext.c:101
#9  printformat_module (module=module@entry=0x5597bbde29d5 "fe-common/core", server=<optimized out>, target=target@entry=0x0, level=level@entry=262144, formatnum=formatnum@entry=49) at printtext.c:110
#10 0x00005597bbd7efc3 in sig_server_quit (server=<optimized out>, msg=<optimized out>) at fe-server.c:408
#11 0x00005597bbdcd646 in signal_emit_real (rec=rec@entry=<Signal 73>, params=params@entry=2, va=va@entry=0x7fff0f49cc60, first_hook=<optimized out>) at signals.c:242
#12 0x00005597bbdcdb9d in signal_emit (signal=signal@entry=0x5597bbde68f5 "server quit", params=params@entry=2) at signals.c:286
#13 0x00005597bbdd0966 in cmd_disconnect (data=<optimized out>, server=0x5597bce0caf0) at chat-commands.c:304
#14 0x00005597bbdcd646 in signal_emit_real (rec=<Signal 47>, params=<optimized out>, va=va@entry=0x7fff0f49ce00, first_hook=<optimized out>) at signals.c:242
#15 0x00005597bbdcdddf in signal_continue (params=<optimized out>) at signals.c:328
#16 0x00005597bbdcd646 in signal_emit_real (rec=rec@entry=<Signal 47>, params=params@entry=2, va=va@entry=0x7fff0f49cf60, first_hook=<optimized out>) at signals.c:242
#17 0x00005597bbdcdb9d in signal_emit (signal=signal@entry=0x5597bbde506a "command disconnect", params=params@entry=2) at signals.c:286
#18 0x00005597bbdc877d in cmd_reconnect (data=<optimized out>, server=0x5597bce0caf0) at servers-reconnect.c:403
#19 0x00005597bbdcd646 in signal_emit_real (rec=rec@entry=<Signal 46>, params=params@entry=3, va=va@entry=0x7fff0f49d110, first_hook=<optimized out>) at signals.c:242
(More stack frames follow...)

Seems to be non-null server but null target passed to ischannel_func:

(rr) reverse-finish
Run back to call of #0  0x00005597bbd9c960 in ischannel_func (server=<optimized out>, data=<optimized out>) at irc-servers.c:81
0x00005597bbd6b41a in sig_print_starting (dest=0x7fff0f49ca80) at fe-netsplit.c:258
258             if (!server_ischannel(dest->server, dest->target))
(rr) s
ischannel_func (server=0x5597bce0caf0, data=0x0) at irc-servers.c:81
81              g_return_val_if_fail(data != NULL, FALSE);

These warnings are still present with #832.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.