SIGSEGV in mosh-server on debian sid #727
Comments
I get similar results on my Debian sid host (package version 1.2.5-1.1). Here is the entire subprocess output from "strace -f mosh-server": [pid 3026] close(6 <unfinished ...>
Can you try to get a backtrace with gdb? See the instructions on the Debian wiki.
Not sure where to go from here, if anyone has an idea.
$ gdb mosh-server
GNU gdb (Debian 7.10-1+b1) 7.10
MOSH CONNECT 60001 DahOJJfGNv9ZR2Rz5pQrMw
mosh-server (mosh 1.2.5) [build mosh 1.2.5]
[mosh-server detached, pid = 19701]
The crash seems to be happening in the background process forked by |
Wait,
(Unless the |
Unfortunately, mosh-server is not staying around long enough, it would seem.
[mosh-server detached, pid = 23302]
I used the instructions on the Debian wiki to recompile mosh with debugging symbols. It appears that the process is crashing too quickly to attach a separate gdb:
$ gdb -p $(mosh-server |& sed -ne 's/.*pid = \([0-9]*\).*/\1/p')
I will keep trying to get a proper backtrace.
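For anyone repeating the attach attempt, the PID extraction step can be checked on its own. This is only a sketch: `extract_mosh_pid` is a hypothetical helper name, and the sample input is the banner line quoted earlier in this thread rather than live mosh-server output.

```shell
# Hypothetical helper: print the PID from mosh-server's "detached" banner line.
# Reads the banner text on stdin and prints only the digits after "pid = ".
extract_mosh_pid() {
    sed -n 's/.*pid = \([0-9][0-9]*\).*/\1/p'
}

# Sample banner copied from the thread; a real run would pipe mosh-server's
# output (including stderr) into the helper instead.
printf '%s\n' 'mosh-server (mosh 1.2.5) [build mosh 1.2.5]' \
              '[mosh-server detached, pid = 23302]' | extract_mosh_pid
```

Even with a correct pattern, attaching this way can still lose the race if the detached process has already crashed, which is what the comment above describes.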
Based on debugging forks, I was able to get a backtrace. Below is the outline, for reproduction. Look at mosh-segv.txt for the full output.
$ gdb \
-ex "set follow-fork-mode child" \
-ex "set detach-on-fork off" \
-ex "set pagination 0" \
-ex "run" \
/usr/bin/mosh-server
[ snip ]
(gdb) info inferiors
[ snip ]
(gdb) inferior 2
(gdb) continue
Program received signal SIGSEGV, Segmentation fault.
[ snip ]
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff697db1b in ?? () from /usr/lib/x86_64-linux-gnu/libutempter.so.0
#2 0x00007ffff697dc82 in utempter_add_record () from /usr/lib/x86_64-linux-gnu/libutempter.so.0
#3 0x000055555555dfb8 in run_server (with_motd=true, verbose=false, colors=0, command_argv=0x7fffffffd940, command_path="/bin/bash", desired_port=<optimized out>, desired_ip=<optimized out>) at mosh-server.cc:498
#4 main (argc=<optimized out>, argv=<optimized out>) at mosh-server.cc:322
Boy, this makes it seem like utempter might be broken on Debian sid. We should try to nail down a non-mosh-specific test case and see if we can reproduce it.
I rebuilt libutempter with debug symbols and get one more line in the backtrace:
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff697dbf1 in execute_helper (master_fd=5, argv=0x7fffffffd730) at iface.c:110
#2 0x00007ffff697dd1c in utempter_add_record (master_fd=5, hostname=0x7fffffffe0d0 "mosh [30089]") at iface.c:146
#3 0x000055555555dfb8 in run_server (with_motd=true, verbose=false, colors=0, command_argv=0x7fffffffd890, command_path="/bin/bash", desired_port=<optimized out>, desired_ip=<optimized out>) at mosh-server.cc:498
#4 main (argc=<optimized out>, argv=<optimized out>) at mosh-server.cc:322
If I edit libutempter and place a "return" at line 93 of iface.c (to return early from execute_helper), then mosh works for me. The source I edited matched the upstream git repo (this code hasn't changed in many years). |
I've managed to reproduce this in an EC2 instance running sid. The point of failure is an uninitialized trampoline for fork() in libutempter0: the GOT entry is 0, and the crash is from our process trying to jump there. Tracing it with GDB's So this doesn't appear to be a problem with mosh or utempter, but I now can't debug this further either, at least not tonight. I'm also trying to reproduce it in an instance running Arch, but setting that up is going...exceedingly slowly for some reason.
How to trace this (this is from memory and may not be quite right):
This should help explain what's going on: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=817929#35
I'm experiencing this too. I tried older versions of mosh and got the same result, so it's definitely not because something changed in mosh, but in one of its dependencies.
Thanks. Replicating the STC in that PR gets the same behavior I'm seeing in gdb, so it's a very useful clue. Working on it.
Please see my analysis on the glibc mailing list [1]. On my Debian system, to prevent mosh-server from crashing, you must ensure that it is not linked with "-lpthread"; "-pthread" is OK instead.
[1] https://sourceware.org/ml/libc-help/2016-03/msg00004.html
protobuf uses an obsolete automake pthreads detection macro, which results in "pkgconfig --libs protobuf" returning "-lprotobuf -pthread -lpthread" on Linux. Remove the unnecessary and dangerous -lpthread in that case. Fixes mobile-shell#727, mosh-server crash in libutempter on Debian Sid.
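The idea described in that commit message can be illustrated with a small filter. This is a minimal sketch and not the actual mosh patch: `filter_pthread_flags` is a hypothetical name, and the input is the flag string quoted above. It drops `-lpthread` only when `-pthread` is already present and otherwise leaves the flags untouched.

```shell
# Hypothetical helper (not the real mosh change): remove a redundant
# -lpthread from a list of linker flags when -pthread is also present.
filter_pthread_flags() {
    # If -pthread is absent, pass the flags through unchanged.
    case " $* " in
        *" -pthread "*) ;;
        *) printf '%s\n' "$*"; return 0 ;;
    esac
    out=
    for flag in "$@"; do
        # Skip only the explicit library form; keep everything else in order.
        [ "$flag" = "-lpthread" ] && continue
        out="${out:+$out }$flag"
    done
    printf '%s\n' "$out"
}

# Flag string as quoted in the commit message above.
filter_pthread_flags -lprotobuf -pthread -lpthread
```

Restricting the filter to the case where `-pthread` is present matches the narrow scope argued for in the thread: flags on platforms that only use `-lpthread` are left alone.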
There's plenty of little bugs evenly spattered here, and many ways to fix this:
Fixing any one of these little issues would solve Mosh's problem. I think the problem is only maybe 10% our fault, but we're left holding a bunch of pieces. The fix in my PR is perhaps slightly less correct than handling CFLAGS better for link commands, but it's also focused on the specific problem and less likely to cause problems on random other platforms. Bikeshed color opinions (and more substantive comments) will be gratefully listened to. I'll also open an issue on protobuf, which I think is primarily to blame here.
This is fixed for mosh master, but not yet for Debian. Working on that.
This is believed fixed (worked-around) for Debian now; mosh-1.2.5-2 is available for unstable and experimental. Thanks, @keithw!
Discussion on this issue at mobile-shell/mosh#727 and https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=817929#35
Hi,
I guess I need a little help to provide useful information on this, but I have a problem with the mosh-server process not starting up properly on Debian sid. I first noticed the problem yesterday (9.3.) morning when trying to log in to my server.
Starting the process manually on the affected system yields:
$ mosh-server
MOSH CONNECT 60001
mosh-server (mosh 1.2.5) [build mosh 1.2.5]
Copyright 2012 Keith Winstein mosh-devel@mit.edu
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
[mosh-server detached, pid = 3321]
$
which looks ok. But there is no mosh-server process running and a message in the kernel log:
[ 4097.845386] mosh-server[3321]: segfault at 0 ip (null) sp 00007fff59cd0fa8 error 14 in mosh-server[55715b883000+59000]
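As a reading aid for that kernel message: on x86 the number after "error" is the page-fault error code, printed in hexadecimal. The sketch below decodes it under the assumption of the standard x86 bit layout (bit 0 = protection violation, bit 1 = write, bit 2 = user mode, bit 4 = instruction fetch); `decode_pf_error` is a hypothetical helper name.

```shell
# Hypothetical helper: decode an x86 page-fault error code given as hex digits,
# as it appears after "error" in the kernel's segfault log line.
decode_pf_error() {
    code=$((0x$1))
    # Bit 0: clear means the page was not present, set means a protection fault.
    if [ $((code & 1)) -ne 0 ]; then cause="protection-violation"; else cause="page-not-present"; fi
    # Bit 4: instruction fetch; otherwise bit 1 distinguishes write from read.
    if [ $((code & 16)) -ne 0 ]; then access="instruction-fetch"
    elif [ $((code & 2)) -ne 0 ]; then access="write"
    else access="read"; fi
    # Bit 2: set when the fault happened in user mode.
    if [ $((code & 4)) -ne 0 ]; then mode="user-mode"; else mode="kernel-mode"; fi
    printf '%s %s %s\n' "$cause" "$access" "$mode"
}

# Error code from the log line above.
decode_pf_error 14
```

For the value 14 this reports a user-mode instruction fetch from a page that is not present, which is consistent with "segfault at 0 ip (null)": the process tried to execute code at address 0.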
last lines of strace -f mosh-server:
....
ioctl(5, TIOCSPTLCK, [0]) = 0
ioctl(5, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(5, TIOCGPTN, [1]) = 0
stat("/dev/pts/1", {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
open("/dev/pts/1", O_RDWR|O_NOCTTY) = 6
ioctl(6, TIOCSWINSZ, {ws_row=58, ws_col=272, ws_xpixel=0, ws_ypixel=0}) = 0
clone(strace: Process 3333 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fc0096e2a10) = 3333
[pid 3333] set_robust_list(0x7fc0096e2a20, 24) = 0
[pid 3332] close(6) = 0
[pid 3332] rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7fc008d8bd30}, {SIG_DFL, [], 0}, 8) = 0
[pid 3332] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0} ---
[pid 3333] close(5) = 0
[pid 3333] setsid() = 3333
[pid 3333] ioctl(6, TIOCSCTTY, 0) = 0
[pid 3333] dup2(6, 0) = 0
[pid 3333] dup2(6, 1) = 1
[pid 3333] dup2(6, 2) = 2
[pid 3333] close(6) = 0
[pid 3333] rt_sigaction(SIGHUP, {SIG_DFL, ~[RTMIN RT_1], SA_RESTORER, 0x7fc008d8bd30}, NULL, 8) = 0
[pid 3333] rt_sigaction(SIGPIPE, {SIG_DFL, ~[RTMIN RT_1], SA_RESTORER, 0x7fc008d8bd30}, NULL, 8) = 0
[pid 3333] close(3) = 0
[pid 3333] --- SIGHUP {si_signo=SIGHUP, si_code=SI_KERNEL} ---
[pid 3332] +++ killed by SIGSEGV +++
+++ killed by SIGHUP +++
Debian Package Version: 1.2.5-1.1
I tried building Version 1.2.5 and the current git master (e7303e0) both yielding at least the same SIGSEGV.
Further library versions can be provided on request since I am not sure which libraries are involved.
I am not very good with gdb, but if someone tells me what to do, I can try to provide more information.