Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows that are renamed get left to die in the connection.windows table #13

Closed
jwise opened this issue Oct 5, 2011 · 6 comments
Closed
Assignees
Milestone

Comments

@jwise
Copy link
Owner

jwise commented Oct 5, 2011

The following sequence will cause a crash:

  • User signs on as SomeUser.
  • User renames to someuser.
  • User disconnects.
  • User connects as someuser.
  • User renames to SomeUser.
  • User goes away.

After the rename, the user will still be in naim.connections[].windows as "SomeUser", and on the disconnect, they will not be removed from n.c.w. When they connect, the new buddywin_t will be added as "someuser" to n.c.w, but when they rename back to SomeUser, their buddywin's light userdata will be pointing to a dead buddywin_t.

Simply changing delwin() to do a case insensitive remove is not a solution, because on IRC, a user can change their nick to be something else entirely. Proposed solutions are to add a "changewin" function to the C->Lua API, or to change the .windows structure to key on the light userdata itself, and have the lookups be against get_winname; I'm leaning towards the latter.

The original segv for this is as follows:

Running naim 0.12.0-2011-09-10-0413 (development snapshot) for 3d 3:31m.
Segmentation Fault; partial symbolic backtrace:
0: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'naim_faulthandler+0x26 [0x8082fbe]
1: /lib/libc.so.1'__sighndlr+0x15 [0xfee7f495]
2: /lib/libc.so.1'call_user_handler+0x2a2 [0xfee7235e]
3: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'nw_getcol+0xf [0x808746b]
4: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'vhwprintf+0x21 [0x8082061]
5: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'hwprintf+0x18 [0x80820f8]
6: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'window_echof+0x8b [0x807386f]
7: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'_nlua_buddywin_t_echo+0x46 [0x809e1ba]
8: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'luaD_precall+0x216 [0x80aa47a]
9: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'luaV_execute+0xe3a [0x80a95e2]
10: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'luaD_call+0x7b [0x80aa7bf]
11: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'f_call+0x16 [0x80a6b16]
12: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'luaD_rawrunprotected+0x58 [0x80a9d30]
13: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'luaD_pcall+0x3d [0x80aaa6d]
14: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'lua_pcall+0x56 [0x80a6b72]
15: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'_call_hook+0x9c [0x80b0054]
16: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'_nlua_hook+0x72 [0x80b046e]
17: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'main+0x75c [0x8083820]
18: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'_start+0x83 [0x8064077]

@ghost ghost assigned jwise Oct 5, 2011
@jwise
Copy link
Owner Author

jwise commented Oct 13, 2011

And another one:

Running naim 0.12.0-2011-09-10-0413 (development snapshot) for 7d 20:45m.
Segmentation Fault; partial symbolic backtrace:
0: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'naim_faulthandler+0x26 [0x8082fbe]
1: /lib/libc.so.1'__sighndlr+0x15 [0xfee7f495]
2: /lib/libc.so.1'call_user_handler+0x2a2 [0xfee7235e]
3: /lib/libc.so.1'_fprintf+0x18 [0xfee440b0]
4: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'vhhprintf+0x27c [0x8081ffc]
5: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'vhwprintf+0x46 [0x8082086]
6: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'hwprintf+0x18 [0x80820f8]
7: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'window_echof+0x8b [0x807386f]
8: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'_nlua_buddywin_t_echo+0x46 [0x809e1ba]
9: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'luaD_precall+0x216 [0x80aa47a]
10: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'luaV_execute+0xe3a [0x80a95e2]
11: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'luaD_call+0x7b [0x80aa7bf]
12: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'f_call+0x16 [0x80a6b16]
13: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'luaD_rawrunprotected+0x58 [0x80a9d30]
14: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'luaD_pcall+0x3d [0x80aaa6d]
15: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'lua_pcall+0x56 [0x80a6b72]
16: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'_call_hook+0x9c [0x80b0054]
17: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'_nlua_hook+0x72 [0x80b046e]
18: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'main+0x75c [0x8083820]
19: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'_start+0x83 [0x8064077]

@jwise
Copy link
Owner Author

jwise commented Oct 14, 2011

And another one:

Assertion failed: strlen(h->addch.buf) == h->addch.len, file hwprintf.c, line 49

Running naim 0.12.0-2011-09-10-0413 (development snapshot) for 21:33m.
Abort; partial symbolic backtrace:
0: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'naim_faulthandler+0x26 [0x8082fbe]
1: /lib/libc.so.1'__sighndlr+0x15 [0xfee7f495]
2: /lib/libc.so.1'call_user_handler+0x2a2 [0xfee7235e]
3: /lib/libc.so.1'_lwp_kill+0x15 [0xfee843d5]
4: /lib/libc.so.1'raise+0x22 [0xfee17faa]
5: /lib/libc.so.1'abort+0x74 [0xfedef91c]
6: /lib/libc.so.1'__assert+0x82 [0xfedefd2a]
7: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'h_init+0x65 [0x80807c9]
8: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'vhwprintf+0x21 [0x8082061]
9: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'hwprintf+0x18 [0x80820f8]
10: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'window_echof+0x8b [0x807386f]
11: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'_nlua_buddywin_t_echo+0x46 [0x809e1ba]
12: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'luaD_precall+0x216 [0x80aa47a]
13: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'luaV_execute+0xe3a [0x80a95e2]
14: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'luaD_call+0x7b [0x80aa7bf]
15: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'f_call+0x16 [0x80a6b16]
16: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'luaD_rawrunprotected+0x58 [0x80a9d30]
17: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'luaD_pcall+0x3d [0x80aaa6d]
18: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'lua_pcall+0x56 [0x80a6b72]
19: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'_call_hook+0x9c [0x80b0054]
20: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'_nlua_hook+0x72 [0x80b046e]
21: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'main+0x75c [0x8083820]
22: /storage/home/joshua/naim-0.12.0-2011-09-10-0413/src/naim'_start+0x83 [0x8064077]

@jwise
Copy link
Owner Author

jwise commented Oct 15, 2011

I spy, with my little eye, something beginning with "U". I think it's a use-after-free, with an object left around with Lua references to it after the actual bwin was dead. Here's some telling evidence:

Program received signal SIGSEGV, Segmentation fault.
0x0808746b in nw_getcol (win=0x4040d00) at win.c:536
536 return(getcurx(&(win->_win->win)));
(gdb) bt
#0 0x0808746b in nw_getcol (win=0x4040d00) at win.c:536
#1 0x08080777 in h_init (h=0x8044ad0, win=0xffffffff) at hwprintf.c:45
#2 0x08082061 in vhwprintf (win=0xffffffff, _pair=3, format=0x80c490a "%s", msg=0x8044fec "\020P\004\b\020P\004\b@") at hwprintf.c:686
#3 0x080820f8 in hwprintf (win=0x812cc70, _pair=3, format=0x80c490a "%s") at hwprintf.c:715
#4 0x0807386f in window_echof (bwin=0x812cc70, format=0x80c49fb "%s\n") at echof.c:46
#5 0x0809e1ba in _nlua_buddywin_t_echo (L=0x80fef70) at conn.c:229
#6 0x080aa47a in luaD_precall (L=0x80fef70, func=0x85edff0, nresults=0) at lua511/ldo.c:319
#7 0x080a95e2 in luaV_execute (L=0x80fef70, nexeccalls=4) at lua511/lvm.c:587
#8 0x080aa7bf in luaD_call (L=0x80fef70, func=0x85ede28, nResults=-1) at lua511/ldo.c:377
#9 0x080a6b16 in f_call (L=0x80fef70, ud=0x4040d00) at lua511/lapi.c:796
#10 0x080a9d30 in luaD_rawrunprotected (L=0x80fef70, f=0x80a6b00 <f_call>, ud=0x8047310) at lua511/ldo.c:116
#11 0x080aaa6d in luaD_pcall (L=0x80fef70, func=0x80a6b00 <f_call>, u=0x8047310, old_top=135262064, ef=67374336) at lua511/ldo.c:461
#12 0x080a6b72 in lua_pcall (L=0x80fef70, nargs=3, nresults=-1, errfunc=-1) at lua511/lapi.c:817
#13 0x080b0054 in _call_hook (L=0x80fef70, npreargs=0, nrets=1, signature=0x80c67c4 "fff", msg=0x80fd134 "pï\017\b\001") at hooks.c:74
#14 0x080b046e in _nlua_hook (userdata=0x1e, signature=0x80c67c4 "fff") at hooks.c:176
#15 0x08083820 in main (argc=1, args=0x8047740) at main.c:452
(gdb) print win
$1 = (win_t *) 0x4040d00
(gdb) print *win
Cannot access memory at address 0x4040d00
(gdb) up 5
#5 0x0809e1ba in _nlua_buddywin_t_echo (L=0x80fef70) at conn.c:229
229 window_echof(bwin, "%s\n", str);
(gdb) print bwin
$2 = (buddywin_t *) 0x812cc60
(gdb) print *bwin
$3 = {winname = 0xd <Address 0xd out of bounds>, blurb = 0x2f001d <Address 0x2f001d out of bounds>, status = 0x2050001 <Address 0x2050001 out of bounds>, waiting = 1 '\001', keepafterso = 0 '\0', nwin = {
_win = 0x4040d00, logfile = 0xe977934e, logfilelines = 436470272, height = 659101184, dirty = 1 '\001', small = 0 '\0'}, pouncear = 0x65682074, pouncec = 1914725746, informed = 1952999273,
closetime = 2003791392, viewtime = 1.2102836149954664e-312, e = {buddy = 0x89c9f08, chat = 0x89c9f08, transfer = 0x89c9f08}, et = 135403488, next = 0x0, conn = 0x5}

@jwise
Copy link
Owner Author

jwise commented Oct 15, 2011

I have two running hypotheses. I suspect that if a user AWAYs and then DELWINs (i.e., is autoremoved from the winlist) in the same postselect cycle, then there may be a use-after-free condition, but I can't see how that could happen right now. I'm also seeing some pointer strangeness -- in particular:
(gdb) print (const char *)((TString *)(((Table *)0x89cca70).node->i_key.nk.value.gc)+1)
$57 = 0x81104b8 "handle"
(gdb) print ((Table *)0x89cca70).node->i_val.value.p
$58 = (void *) 0x812cc60

but I need to dive into the Lua code tomorrow to see why I'm getting 0x...70 in the backtrace and 0x...60 in the Lua internal memory dump.

@jwise
Copy link
Owner Author

jwise commented Oct 16, 2011

I have ruled out both hypotheses as root causes.

The first hypothesis has been ruled out because the user in question is still online, and the window is still open.

The second hypothesis has been ruled out as optimization -- namely, bwin->nwin is the only value used from then on, and:

(gdb) up
#5 0x0809e1ba in _nlua_buddywin_t_echo (L=0x80fef70) at conn.c:229
229 window_echof(bwin, "%s\n", str);
(gdb) print &(bwin->nwin)
$102 = (win_t *) 0x812cc70

My new hypothesis has to do with the fact that screen names are not canonicalized on insert, but always compared on lookup. When a user changes the case of their name, signs off, and signs back on, then it might be possible that they have two entries in the Lua list of windows, which can result in a use-after-free case as the wrong entry is looked up in the user list.

@jwise jwise closed this as completed in f53cac2 Oct 16, 2011
@jwise
Copy link
Owner Author

jwise commented Oct 16, 2011

I went with the "store by handle" solution. Should also evaluate doing that more generally. About five hours of debugging and tracking came down to 19 insertions(+), 12 deletions(-).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant