SIGSEGV in mosh-server on debian sid #727

Closed
gibbon-joel opened this Issue Mar 10, 2016 · 18 comments

8 participants

@gibbon-joel

Hi,

I guess I need a little help providing useful information on this, but I have a problem with the mosh-server process not starting up properly on Debian sid. I first noticed the problem yesterday morning (9.3.) when trying to log in to my server.

Starting the process manually on the affected system yields:

$ mosh-server

MOSH CONNECT 60001

mosh-server (mosh 1.2.5) [build mosh 1.2.5]
Copyright 2012 Keith Winstein mosh-devel@mit.edu
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

[mosh-server detached, pid = 3321]
$

which looks OK, but afterwards no mosh-server process is running and the kernel log contains this message:
[ 4097.845386] mosh-server[3321]: segfault at 0 ip (null) sp 00007fff59cd0fa8 error 14 in mosh-server[55715b883000+59000]
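
For anyone decoding the kernel line above, here is a minimal Python sketch. It assumes the x86 Linux log format shown, and it assumes the `error` field is the page-fault error code printed in hexadecimal (bit 0x10 = instruction fetch, bit 0x4 = user mode). The name `parse_segfault` is made up for illustration.

```python
import re

# Matches the "segfault at ..." kernel log line quoted in this report.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>\(null\)|[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>[0-9a-f]+)"
)

def parse_segfault(line):
    """Return the fields of a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = m.group("ip")
    error = int(m.group("error"), 16)  # assumption: field is hexadecimal
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "ip": 0 if ip == "(null)" else int(ip, 16),
        "user_mode": bool(error & 0x4),
        "instruction_fetch": bool(error & 0x10),
    }

line = ("[ 4097.845386] mosh-server[3321]: segfault at 0 ip (null) "
        "sp 00007fff59cd0fa8 error 14 in mosh-server[55715b883000+59000]")
info = parse_segfault(line)
print(info["pid"], info["addr"], info["instruction_fetch"])
```

Under those assumptions, a faulting address of 0 combined with an instruction fetch would suggest the process tried to execute code at address 0 (for example through a null function pointer), rather than read or write through a null data pointer.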

last lines of strace -f mosh-server:
....
ioctl(5, TIOCSPTLCK, [0]) = 0
ioctl(5, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(5, TIOCGPTN, [1]) = 0
stat("/dev/pts/1", {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
open("/dev/pts/1", O_RDWR|O_NOCTTY) = 6
ioctl(6, TIOCSWINSZ, {ws_row=58, ws_col=272, ws_xpixel=0, ws_ypixel=0}) = 0
clone(strace: Process 3333 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fc0096e2a10) = 3333
[pid 3333] set_robust_list(0x7fc0096e2a20, 24) = 0
[pid 3332] close(6) = 0
[pid 3332] rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7fc008d8bd30}, {SIG_DFL, [], 0}, 8) = 0
[pid 3332] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0} ---
[pid 3333] close(5) = 0
[pid 3333] setsid() = 3333
[pid 3333] ioctl(6, TIOCSCTTY, 0) = 0
[pid 3333] dup2(6, 0) = 0
[pid 3333] dup2(6, 1) = 1
[pid 3333] dup2(6, 2) = 2
[pid 3333] close(6) = 0
[pid 3333] rt_sigaction(SIGHUP, {SIG_DFL, ~[RTMIN RT_1], SA_RESTORER, 0x7fc008d8bd30}, NULL, 8) = 0
[pid 3333] rt_sigaction(SIGPIPE, {SIG_DFL, ~[RTMIN RT_1], SA_RESTORER, 0x7fc008d8bd30}, NULL, 8) = 0
[pid 3333] close(3) = 0
[pid 3333] --- SIGHUP {si_signo=SIGHUP, si_code=SI_KERNEL} ---
[pid 3332] +++ killed by SIGSEGV +++
+++ killed by SIGHUP +++
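
To see at a glance which traced process died from which signal, the `+++ killed by ... +++` lines can be picked out of the strace output. This is a small sketch that assumes the `strace -f` line format shown above; `killed_processes` is a hypothetical helper name.

```python
import re

# "[pid N] +++ killed by SIGX +++", where the "[pid N]" prefix may be absent.
KILLED_RE = re.compile(r"^(?:\[pid\s+(\d+)\]\s+)?\+\+\+ killed by (SIG[A-Z0-9]+)")

def killed_processes(lines):
    """Return (pid or None, signal name) for every 'killed by' line."""
    result = []
    for line in lines:
        m = KILLED_RE.match(line.strip())
        if m:
            pid = int(m.group(1)) if m.group(1) else None
            result.append((pid, m.group(2)))
    return result

sample = [
    "[pid 3333] close(3) = 0",
    "[pid 3333] --- SIGHUP {si_signo=SIGHUP, si_code=SI_KERNEL} ---",
    "[pid 3332] +++ killed by SIGSEGV +++",
    "+++ killed by SIGHUP +++",
]
print(killed_processes(sample))
```

For the trace above this reports pid 3332 as the process killed by SIGSEGV, with one further process (printed without a pid prefix) killed by SIGHUP.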

Debian Package Version: 1.2.5-1.1
I also tried building version 1.2.5 and the current git master (e7303e0) myself; both yield the same SIGSEGV.

  • ii libncurses5:am 6.0+20160213 amd64
  • ii libprotobuf-c1 1.2.1-1 amd64
  • ii libprotobuf9v5 2.6.1-1.3 amd64
  • ii libssl1.0.0:am 1.0.2d-1 amd64
  • ii libssl1.0.2:am 1.0.2g-1 amd64
  • ii libutempter0:a 1.1.6-3 amd64
  • ii protobuf-compi 2.6.1-1.3 amd64
  • ii zlib1g:amd64 1:1.2.8.dfsg amd64

Further library versions can be provided on request since I am not sure which libraries are involved.

I am not very good with gdb, but if someone tells me what to do, I can try to provide more information.

@marksuter

I get similar results on my Debian sid host (package version 1.2.5-1.1). Here is the entire subprocess output from "strace -f mosh-server":

[pid 3026] close(6 <unfinished ...>
[pid 3027] set_robust_list(0x7faf831e4a20, 24 <unfinished ...>
[pid 3026] <... close resumed> ) = 0
[pid 3027] <... set_robust_list resumed> ) = 0
[pid 3026] rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7faf8287ad30}, {SIG_DFL, [], 0}, 8) = 0
[pid 3026] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0} ---
[pid 3027] close(5) = 0
[pid 3027] setsid() = 3027
[pid 3027] ioctl(6, TIOCSCTTY, 0) = 0
[pid 3027] dup2(6, 0) = 0
[pid 3027] dup2(6, 1) = 1
[pid 3027] dup2(6, 2) = 2
[pid 3027] close(6) = 0
[pid 3026] +++ killed by SIGSEGV +++

@andersk
Member
andersk commented Mar 11, 2016

Can you try to get a backtrace with gdb? See the instructions on the Debian wiki.

@nostamp
nostamp commented Mar 11, 2016

Not sure where to go from here if anyone has an idea.

gdb mosh-server

GNU gdb (Debian 7.10-1+b1) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from mosh-server...done.
(gdb) set pagination 0
(gdb) run
Starting program: /usr/bin/mosh-server
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

MOSH CONNECT 60001 DahOJJfGNv9ZR2Rz5pQrMw

mosh-server (mosh 1.2.5) [build mosh 1.2.5]
Copyright 2012 Keith Winstein mosh-devel@mit.edu
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

[mosh-server detached, pid = 19701]
[Inferior 1 (process 19697) exited normally]
(gdb) bt
No stack.
(gdb)

@andersk
Member
andersk commented Mar 11, 2016

The crash seems to be happening in the background process forked by mosh-server, so do
set follow-fork-mode child
before run.

@andersk
Member
andersk commented Mar 11, 2016

Wait, follow-fork-mode won’t work because the background process forks other irrelevant background processes. Do this instead:

$ mosh-server
…
[mosh-server detached, pid = 408]
$ gdb -p 408  # copy the pid from above
…
(gdb) continue

(Unless the mosh-server process crashes immediately and doesn’t stay around long enough for this to work?)

@nostamp
nostamp commented Mar 12, 2016

Unfortunately mosh-server is not staying around long enough it would seem.

[mosh-server detached, pid = 23302]
$ gdb -p 23302
GNU gdb (Debian 7.10-1+b1) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 23302
ptrace: No such process.
(gdb) continue
The program is not being run.
(gdb)

@marksuter

I used the instructions on the Debian wiki to recompile mosh with debugging symbols. It appears that the process crashes too quickly for a separate gdb to attach:

$ gdb -p $(mosh-server |& sed -ne 's/.*pid = \([0-9]*\).*/\1/p')
GNU gdb (Debian 7.10-1+b1) 7.10
[ snip ]
Attaching to process 19538
ptrace: No such process.
(gdb) quit

I will keep trying to get a proper backtrace.
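
For reference, the pid extraction done by the sed expression can also be sketched in Python. This covers only the parsing step. It assumes the `[mosh-server detached, pid = N]` line format shown in this thread, and `extract_detached_pid` is a hypothetical helper name.

```python
import re

def extract_detached_pid(output):
    """Return the pid from mosh-server's 'detached' line, or None if absent."""
    m = re.search(r"\[mosh-server detached, pid = (\d+)\]", output)
    return int(m.group(1)) if m else None

sample_output = (
    "MOSH CONNECT 60001 DahOJJfGNv9ZR2Rz5pQrMw\n"
    "\n"
    "[mosh-server detached, pid = 23302]\n"
)
pid = extract_detached_pid(sample_output)
print(["gdb", "-p", str(pid)])
```

As the attempts above show, a pid obtained this way is of little use here, because the detached process is already gone by the time gdb tries to attach.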

@marksuter

Based on debugging forks, I was able to get a backtrace. Below is the outline, for reproduction. Look at mosh-segv.txt for the full output.

$ gdb \
  -ex "set follow-fork-mode child" \
  -ex "set detach-on-fork off" \
  -ex "set pagination 0" \
  -ex "run" \
  /usr/bin/mosh-server
[ snip ]
(gdb) info inferiors
[ snip ]
(gdb) inferior 2
(gdb) continue
Program received signal SIGSEGV, Segmentation fault.
[ snip ]
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff697db1b in ?? () from /usr/lib/x86_64-linux-gnu/libutempter.so.0
#2  0x00007ffff697dc82 in utempter_add_record () from /usr/lib/x86_64-linux-gnu/libutempter.so.0
#3  0x000055555555dfb8 in run_server (with_motd=true, verbose=false, colors=0, command_argv=0x7fffffffd940, command_path="/bin/bash", desired_port=<optimized out>, desired_ip=<optimized out>) at mosh-server.cc:498
#4  main (argc=<optimized out>, argv=<optimized out>) at mosh-server.cc:322
@keithw
Member
keithw commented Mar 12, 2016

Boy, this makes it seem like utempter might be broken on Debian sid. We should try to nail down a non-mosh-specific test case and see if we can reproduce it.

@marksuter

I rebuilt libutempter with debug symbols and get one more line in the backtrace:

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff697dbf1 in execute_helper (master_fd=5, argv=0x7fffffffd730) at iface.c:110
#2  0x00007ffff697dd1c in utempter_add_record (master_fd=5, hostname=0x7fffffffe0d0 "mosh [30089]") at iface.c:146
#3  0x000055555555dfb8 in run_server (with_motd=true, verbose=false, colors=0, command_argv=0x7fffffffd890, command_path="/bin/bash", desired_port=<optimized out>, desired_ip=<optimized out>) at mosh-server.cc:498
#4  main (argc=<optimized out>, argv=<optimized out>) at mosh-server.cc:322
@marksuter

If I edit libutempter and place a "return" at line 93 of iface.c (to return early from execute_helper), then mosh works for me. The source I edited matched the upstream git repo (this code hasn't changed in many years).

@cgull
Member
cgull commented Mar 16, 2016

I've managed to reproduce this in an EC2 instance running sid. The point of failure is an uninitialized trampoline for fork() in libutempter0: the GOT entry is 0, and the crash comes from our process trying to jump there. Tracing it with GDB's hbreak doesn't show anything touching the GOT entry for fork(). Compiling mosh with -static results in a working mosh-server executable. Alas, I've also managed to destroy that instance trying to apt-get dist-upgrade it.

So this doesn't appear to be a problem with mosh or utempter, but I now can't debug this further either, at least not tonight.

I'm also trying to reproduce it in an instance running Arch, but setting that up is going...exceedingly slowly for some reason.

How to trace this (this is from memory and may not be quite right):

gdb src/frontend/mosh-server
b fork
set follow-fork-mode child
r new
c
b utempter_add_record
set follow-fork-mode parent
c
record full
stepi
<repeat until crash>
reverse-stepi
@ghostbar

This should help on what's going on:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=817929#35

I'm experiencing this too. I tried older versions of mosh and got the same result, so it's definitely not caused by a change in mosh but by a change in one of its dependencies.

@cgull
Member
cgull commented Mar 17, 2016

Thanks. Replicating the STC in that PR gets the same behavior I'm seeing in gdb, so it's a very useful clue. Working on it.

@kreijack

Please see my analysis on the glibc mailing list [1]. On my Debian system, to prevent mosh-server from crashing, you must ensure that it is not linked with "-lpthread"; "-pthread" is OK.
I don't know how to achieve this, because it seems that "-lpthread" is brought in by the protobuf library (I assume via the pkg-config/autoconf/automake tools).
In my post on the glibc mailing list I included a test that highlights the problem; the test is independent of mosh and libutempter.
BR
G.Baroncelli

[1] https://sourceware.org/ml/libc-help/2016-03/msg00004.html

@cgull cgull added a commit to cgull/mosh that referenced this issue Mar 19, 2016
@cgull @cgull cgull + cgull configure.ac: Do s/-pthreads -lpthreads/-pthreads/ for protobuf
protobuf uses an obsolete automake pthreads detection macro,
which results in "pkgconfig --libs protobuf" returning
"-lprotobuf -pthread -lpthread" on Linux.  Remove
the unnecessary and dangerous -lpthread in that case.

Fixes #727, mosh-server crash in libutempter on Debian Sid.
a47917b
@cgull
Member
cgull commented Mar 19, 2016

There's plenty of little bugs evenly spattered here, and many ways to fix this:

  • Root cause, I think, is that pkg-config --libs protobuf produces -lprotobuf -pthread -lpthread; -lpthread is unnecessary and deprecated with gcc.
  • src/frontend/Makefile.am only applies protobuf_LIBS to the link command for mosh-server; I believe it should perhaps also apply protobuf_CFLAGS, but evidence for this is a bit thin.
  • gcc or binutils or rtld or...something has a recent change in behavior that causes it to be more sensitive to finding -lpthread injected in an odd place in the library search list. If you hold your mouth just so and look in a certain way, you can almost see this as some kind of issue causing binaries linked with full RELRO to search for symbols differently. But I think ultimately we're feeding the wrong command to gcc. I haven't investigated why this breaks; I think making our build more correct is the better solution.

Fixing any one of these little issues would solve Mosh's problem. I think the problem is only maybe 10% our fault, but we're left holding a bunch of pieces. The fix in my PR is perhaps slightly less correct than handling CFLAGS better for link commands, but it's also focused on the specific problem and less likely to cause problems on random other platforms. Bikeshed color opinions (and more substantive comments) will be gratefully listened to.

I'll also open an issue on protobuf, which I think is primarily to blame here.
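
To make the intended effect of the workaround concrete, here is a minimal Python sketch of the flag rewrite: when `-pthread` is already present, the separate `-lpthread` is dropped. This is only an illustration of the idea, not the code from the commit (which changes configure.ac); `drop_redundant_lpthread` is a made-up name.

```python
def drop_redundant_lpthread(libs):
    """Remove -lpthread from a linker flag string when -pthread is also given."""
    tokens = libs.split()
    if "-pthread" in tokens:
        # -pthread already pulls in the thread library, so the explicit
        # -lpthread is unnecessary.
        tokens = [t for t in tokens if t != "-lpthread"]
    return " ".join(tokens)

# The flags reported above for "pkg-config --libs protobuf" on Linux.
print(drop_redundant_lpthread("-lprotobuf -pthread -lpthread"))
```

Flag strings that do not contain `-pthread` are left untouched, which keeps the rewrite narrow, in the same spirit as the focused fix described above.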

@cgull cgull closed this in #733 Mar 24, 2016
@cgull
Member
cgull commented Mar 24, 2016

This is fixed for mosh master, but not yet for Debian. Working on that.

@cgull cgull reopened this Mar 24, 2016
@cgull
Member
cgull commented Mar 31, 2016

This is believed fixed (worked-around) for Debian now; mosh-1.2.5-2 is available for unstable and experimental. Thanks, @keithw!

@cgull cgull closed this Mar 31, 2016
@fadenb fadenb added a commit to mayflower/nixpkgs that referenced this issue Apr 4, 2016
@fadenb fadenb mosh: fix segfaulting issue 9205ca4
@fadenb fadenb added a commit to mayflower/nixpkgs that referenced this issue Apr 4, 2016
@fadenb fadenb mosh: fix segfaulting issue 399d2bc
@mdorman mdorman added a commit to mdorman/nixpkgs that referenced this issue Apr 5, 2016
@fadenb @mdorman fadenb + mdorman mosh: fix segfaulting issue a8b3b01
@peterhoeg peterhoeg added a commit to peterhoeg/nixpkgs that referenced this issue Apr 6, 2016
@fadenb @peterhoeg fadenb + peterhoeg mosh: fix segfaulting issue 52d0ce3