Give better error message when GHC/Setup script segfaults #767

Closed
bos opened this Issue May 24, 2012 · 18 comments

Contributor

bos commented May 24, 2012

(Imported from Trac #777, reported by philips on 2010-12-12)

I am getting ExitFailure 11 when trying to install hledger on Debian and openSUSE. I reported this to the hledger author, Simon, and he said he is experiencing this on his machines too and for other packages, not just hledger.

Log file attached. Summary below:

ledger:~/ $ cabal install -v3 hledger
Lots of debug output (see attached)...
hledger-0.13 failed during the configure step. The exception was:
ExitFailure 11
ledger:~/ $ cabal --version
cabal-install version 0.8.2
using version 1.8.0.2 of the Cabal library
ledger:~/ $ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.12.1
Contributor

bos commented May 24, 2012

(Imported comment by philips on 2010-12-12)

Log output from cabal install hledger session

Contributor

bos commented May 24, 2012

(Imported comment by simonmic on 2010-12-12)

I first saw this a few weeks back; since then it has become more frequent, and I now see it on several machines (Mac and Linux) and with various packages (hledger and pandoc among them). Once a package starts giving this error, "cabal configure" also terminates with a non-zero exit status and no apparent error message in the -v3 output. E.g.:

simon@joyful:/repos$ cabal unpack pandoc
Unpacking to pandoc-1.6/
simon@joyful:/repos$ cd pandoc-1.6/
simon@joyful:/repos/pandoc-1.6$ cabal configure && echo OK
Resolving dependencies...
[1 of 1] Compiling Main ( Setup.hs, dist/setup/Main.o )
Linking ./dist/setup/setup ...
Configuring pandoc-1.6...
simon@joyful:/repos/pandoc-1.6$

cabal configure -v3 and ghc-pkg list output are attached. I'm currently using ghc 6.12.3, have seen it also with 6.12.1.

Contributor

bos commented May 24, 2012

(Imported comment by simonmic on 2010-12-12)

Simon's configure -v3 and ghc-pkg list output

Contributor

bos commented May 24, 2012

(Imported comment by @dcoutts on 2010-12-12)

Note that exit code 11 often means it terminated with signal 11, which is a segmentation fault.

Contributor

bos commented May 24, 2012

(Imported comment by @batterseapower on 2010-12-12)

I'm getting this on OS X as well.

Contributor

bos commented May 24, 2012

(Imported comment by @batterseapower on 2010-12-20)

OK, more information.

If I run the HEAD hledger "./Setup configure" (with a compiled Setup.lhs, GHC 6.12.3):

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xfffffffc
0x00274f52 in scavenge_mutable_list ()
(gdb) bt
#0  0x00274f52 in scavenge_mutable_list ()
#1  0x00275223 in scavenge_capability_mut_lists ()
I get more information if I link with -debug and then run with -Ds:
$ gdb ./Setup
GNU gdb 6.3.50-20050815 (Apple version gdb-1472) (Wed Jul 21 10:53:12 UTC 2010)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries ... done
(gdb) run configure +RTS -Ds
Starting program: /Users/mbolingbroke/Programming/Checkouts/hledger/hledger/Setup configure +RTS -Ds
Reading symbols for shared libraries ++. done
new task (taskCount: 1)
task exiting
new task (taskCount: 1)
cap 0: created thread 1
cap 0: thread 1 appended to run queue
new bound thread (1)
cap 0: schedule()
cap 0: running thread 1 (ThreadRunGHC)
cap 0: thread 1 stopped (stack overflow)
increasing stack size from 240 words to 1008.
cap 0: running thread 1 (ThreadRunGHC)
cap 0: thread 1 stopped (heap overflow)
all threads:
threads on capability 0:
    thread    1 @ 0x117d000 is not blocked (TSO_DIRTY)
other threads:
cap 0: finished GC
cap 0: running thread 1 (ThreadRunGHC)
cap 0: thread 1 stopped (heap overflow)
all threads:
threads on capability 0:
    thread    1 @ 0x117d000 is not blocked (TSO_DIRTY)
other threads:
cap 0: finished GC
cap 0: running thread 1 (ThreadRunGHC)
cap 0: thread 1 stopped (suspended while making a foreign call)
cap 0: running thread 1 (ThreadRunGHC)
Configuring hledger-0.13...
cap 0: created thread 2
cap 0: thread 2 appended to run queue
cap 0: created thread 3
cap 0: thread 3 appended to run queue
cap 0: thread 1 stopped (blocked)
    thread    1 @ 0x117d000 is blocked on an MVar @ 0x11eb7a8 (TSO_DIRTY)
cap 0: running thread 2 (ThreadRunGHC)
cap 0: thread 2 stopped (blocked)
    thread    2 @ 0x11ec000 is blocked on read from fd 5 (TSO_DIRTY)
scheduler: checking for threads blocked on I/O
cap 0: running thread 3 (ThreadRunGHC)
cap 0: thread 3 stopped (blocked)
    thread    3 @ 0x11ec400 is blocked on read from fd 7 (TSO_DIRTY)
scheduler: checking for threads blocked on I/O (waiting)
Waking up blocked thread 2
cap 0: running thread 2 (ThreadRunGHC)
cap 0: thread 2 stopped (yielding)
cap 0: thread 2 appended to run queue
scheduler: checking for threads blocked on I/O
cap 0: running thread 2 (ThreadRunGHC)
cap 0: thread 2 stopped (blocked)
    thread    2 @ 0x11ec000 is blocked on read from fd 5 (TSO_DIRTY)
scheduler: checking for threads blocked on I/O (waiting)
Waking up blocked thread 3
cap 0: running thread 3 (ThreadRunGHC)
cap 0: thread 1 appended to run queue
cap 0: waking up thread 1 on cap 0
cap 0: thread 3 stopped (finished)
scheduler: checking for threads blocked on I/O
Waking up blocked thread 2
cap 0: running thread 2 (ThreadRunGHC)
cap 0: thread 2 stopped (finished)
cap 0: running thread 1 (ThreadRunGHC)
cap 0: thread 1 stopped (suspended while making a foreign call)
cap 0: running thread 1 (ThreadRunGHC)
cap 0: thread 1 stopped (yielding)
cap 0: thread 1 appended to run queue
cap 0: running thread 1 (ThreadRunGHC)
cap 0: created thread 4
cap 0: thread 4 appended to run queue
cap 0: created thread 5
cap 0: thread 5 appended to run queue
cap 0: thread 1 stopped (blocked)
    thread    1 @ 0x117d000 is blocked on an MVar @ 0x11f1fa0 (TSO_DIRTY)
cap 0: running thread 4 (ThreadRunGHC)
cap 0: thread 4 stopped (yielding)
cap 0: thread 4 appended to run queue
cap 0: running thread 5 (ThreadRunGHC)
cap 0: thread 5 stopped (blocked)
    thread    5 @ 0x11ecc00 is blocked on read from fd 7 (TSO_DIRTY)
scheduler: checking for threads blocked on I/O
cap 0: running thread 4 (ThreadRunGHC)
cap 0: thread 4 stopped (blocked)
    thread    4 @ 0x11ec800 is blocked on read from fd 5 (TSO_DIRTY)
scheduler: checking for threads blocked on I/O (waiting)
Waking up blocked thread 4
cap 0: running thread 4 (ThreadRunGHC)
cap 0: thread 4 stopped (blocked)
    thread    4 @ 0x11ec800 is blocked on read from fd 5 (TSO_DIRTY)
scheduler: checking for threads blocked on I/O (waiting)
Waking up blocked thread 5
Waking up blocked thread 4
cap 0: running thread 4 (ThreadRunGHC)
cap 0: thread 1 appended to run queue
cap 0: waking up thread 1 on cap 0
cap 0: thread 4 stopped (finished)
cap 0: running thread 5 (ThreadRunGHC)
cap 0: thread 5 stopped (finished)
cap 0: running thread 1 (ThreadRunGHC)
cap 0: thread 1 stopped (suspended while making a foreign call)
cap 0: running thread 1 (ThreadRunGHC)
cap 0: thread 1 stopped (yielding)
cap 0: thread 1 appended to run queue
cap 0: running thread 1 (ThreadRunGHC)
cap 0: created thread 6
cap 0: thread 6 appended to run queue
cap 0: created thread 7
cap 0: thread 7 appended to run queue
cap 0: thread 1 stopped (blocked)
    thread    1 @ 0x117d000 is blocked on an MVar @ 0x11f77a8 (TSO_DIRTY)
cap 0: running thread 6 (ThreadRunGHC)
cap 0: thread 6 stopped (blocked)
    thread    6 @ 0x11f8000 is blocked on read from fd 5 (TSO_DIRTY)
scheduler: checking for threads blocked on I/O
cap 0: running thread 7 (ThreadRunGHC)
cap 0: thread 7 stopped (blocked)
    thread    7 @ 0x11f8400 is blocked on read from fd 7 (TSO_DIRTY)
scheduler: checking for threads blocked on I/O (waiting)
Waking up blocked thread 6
cap 0: running thread 6 (ThreadRunGHC)
cap 0: thread 6 stopped (yielding)
cap 0: thread 6 appended to run queue
scheduler: checking for threads blocked on I/O
cap 0: running thread 6 (ThreadRunGHC)
cap 0: thread 6 stopped (heap overflow)
all threads:
threads on capability 0:
    thread    6 @ 0x11f8000 is not blocked (TSO_DIRTY)
other threads:
    thread    7 @ 0x11f8400 is blocked on read from fd 7 (TSO_DIRTY)
    thread    1 @ 0x117d000 is blocked on an MVar @ 0x11f77a8 (TSO_DIRTY)
Setup: internal error: ASSERTION FAILED: file rts/sm/Evac.c, line 373
    (GHC version 6.12.3 for i386_apple_darwin)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Program received signal SIGABRT, Aborted.
0x90ce1176 in __kill ()
With an assert failure like that, it could be anything...
Contributor

bos commented May 24, 2012

(Imported comment by @dcoutts on 2010-12-20)

We need a Process API that gives us access to the info about whether a process terminated normally or via a signal. If we knew that we could present a better error message about child processes failing.

As for why it failed exactly, that is indeed anyone's guess.
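
To illustrate the kind of information such a Process API would need to expose, here is a minimal sketch using `System.Posix.Process` from the `unix` package. It assumes a POSIX system and unix >= 2.7, where the `Terminated` constructor carries a core-dump flag. The helper names `runAndWait` and `explain` are made up for this example and are not part of Cabal.

```haskell
import System.Exit (ExitCode (..))
import System.Posix.Process
  ( ProcessStatus (..), executeFile, forkProcess, getProcessStatus )
import System.Posix.Signals (sigSEGV)

-- | Run a shell command in a forked child and block until it finishes.
-- (Hypothetical helper; Cabal does not run Setup this way.)
runAndWait :: String -> IO (Maybe ProcessStatus)
runAndWait cmd = do
  pid <- forkProcess (executeFile "sh" True ["-c", cmd] Nothing)
  getProcessStatus True False pid

-- | Turn the raw wait status into a message that distinguishes a normal
-- exit from termination by a signal.
explain :: Maybe ProcessStatus -> String
explain Nothing = "no status available"
explain (Just (Exited ExitSuccess)) = "exited normally"
explain (Just (Exited (ExitFailure n))) = "exited with code " ++ show n
explain (Just (Terminated sig _coreDumped))
  | sig == sigSEGV = "terminated by SIGSEGV (segmentation fault)"
  | otherwise = "terminated by signal " ++ show sig
explain (Just (Stopped sig)) = "stopped by signal " ++ show sig

main :: IO ()
main = do
  -- A child that exits normally with code 11.
  runAndWait "exit 11" >>= putStrLn . explain
  -- A child that kills itself with SIGSEGV.
  runAndWait "kill -s SEGV $$" >>= putStrLn . explain
```

The point of the sketch is that the wait status keeps "exited with code 11" and "killed by signal 11" apart, whereas a bare `ExitFailure 11` does not.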

Contributor

bos commented May 24, 2012

(Imported comment by sepposade on 2011-03-15)

Also seeing this with hledger and texmath (dependency of pandoc) on Mac OS 10.6.7:

$ cabal install texmath
Resolving dependencies...
[1 of 1] Compiling Main             (elided...)
Linking /var/folders/-f/-fdnjLBuGf0bDt4aVkHCvE+++TI/-Tmp-/texmath-0.5.0.14544/texmath-0.5.0.1/dist/setup/setup ...
ld: warning: could not create compact unwind for .LFB3: non-standard register 5 being saved in prolog
Configuring texmath-0.5.0.1...
cabal: Error: some packages failed to install:
texmath-0.5.0.1 failed during the configure step. The exception was:
ExitFailure 11
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.12.3
Contributor

bos commented May 24, 2012

(Imported comment by @kosmikus on 2011-03-23)

Does this one still cause any problems with current versions of GHC and Cabal?

Contributor

bos commented May 24, 2012

(Imported comment by mightybyte on 2012-03-03)

I just started seeing this ExitFailure 11 issue when building a proprietary project with a large number of dependencies on MacOS. I'll post more details if I can distill smaller self-contained ways of reproducing the problem.

Collaborator

Blaisorblade commented Aug 17, 2016

FWIW, ExitFailure 11 still causes spurious failed builds on Travis, often solved by restarting.

ezyang changed the title from cabal install {hledger, others} ExitFailure 11 to Give better error message when GHC segfaults Aug 17, 2016

ezyang changed the title from Give better error message when GHC segfaults to Give better error message when GHC/Setup script segfaults Aug 17, 2016

Contributor

ezyang commented Aug 17, 2016

Hey @Blaisorblade, the most reasonable resolution I can think of for this ticket would be to improve the message when the Setup script or GHC segfaults. Otherwise, there's not much Cabal can do. Would that be sufficient to close this?

Contributor

ezyang commented Aug 17, 2016

I made a simple test case with a segfaulting Setup script:

ezyang@sabre:~/Dev/cabal-tmp/cabal-install/tests/IntegrationTests/custom/segfault$ cabal configure
Resolving dependencies...
[1 of 1] Compiling Main             ( dist/setup/setup.hs, dist/setup/Main.o )
Linking ./dist/setup/setup ...
Segmentation fault (core dumped)
ezyang@sabre:~/Dev/cabal-tmp/cabal-install/tests/IntegrationTests/custom/segfault$ cabal install
Resolving dependencies...
cabal: Entering directory '.'
cabal: Leaving directory '.'
Failed to install plain-0.1.0.0
cabal: Error: some packages failed to install:
plain-0.1.0.0-KouPJgq5LZvF2QoUa1N272 failed during the configure step. The
exception was:
ExitFailure (-11)
Collaborator

Blaisorblade commented Aug 18, 2016

That thing would be good.

The other question is "why compiling correct code sometimes segfaults/produces segfaulting binaries"*. Not

*Beyond Travis spurious failures, I can witness that incremental compilation produces crashing binaries and that's sometimes fixed by clean recompilation. Can't make a useful bug report though.

Contributor

ezyang commented Aug 18, 2016

Related: https://ghc.haskell.org/trac/ghc/ticket/7229

Also #971. Should be possible to get out the signal info from the process exit info.
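
As a rough sketch of what "getting the signal info out of the exit info" could look like: both `ExitFailure 11` and `ExitFailure (-11)` appear in this thread, and as far as I know newer versions of the `process` library document a negative code on POSIX as "terminated by that signal". The function below decodes an `ExitCode` on that assumption. `describeExit` and `signalName` are hypothetical names, not the code that was merged, and the signal numbers (6 and 11) are the usual Linux/macOS values, assumed here rather than looked up at runtime.

```haskell
import System.Exit (ExitCode (..))

-- | Names for the two signals that show up in this thread.
-- The numbers are the common Linux/macOS values (an assumption).
signalName :: Int -> String
signalName 6  = "SIGABRT"
signalName 11 = "SIGSEGV"
signalName n  = "signal " ++ show n

-- | Best-effort, heuristic description of an exit code.
describeExit :: ExitCode -> String
describeExit ExitSuccess = "exited successfully"
describeExit (ExitFailure n)
  -- Negative code: the child was terminated by signal (-n).
  | n < 0 = "killed by " ++ signalName (negate n)
  -- 128 + signal number is what a shell reports for a signalled child.
  | n > 128 && n <= 192 =
      "exit code " ++ show n ++ ", which a shell reports for "
        ++ signalName (n - 128)
  -- Ambiguous: could be a genuine exit(11) or an unsigned signal number.
  | n == 11 = "exit code 11, possibly SIGSEGV reported without a sign"
  | otherwise = "exit code " ++ show n

main :: IO ()
main =
  mapM_
    (putStrLn . describeExit)
    [ExitFailure 11, ExitFailure (-11), ExitFailure 139, ExitFailure 1]
```

Only the negative case is unambiguous; the other two branches are guesses and should be worded as such in any user-facing message.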

Contributor

ezyang commented Aug 22, 2016

@Blaisorblade You don't have any Travis logs of things failing in this way, do you? Could it be an instance of https://ghc.haskell.org/trac/ghc/ticket/10161 (seems unlikely; this only happens if it's ABI compatible)? Do you ever get a core dump in these cases?

@ezyang ezyang added a commit to ezyang/cabal that referenced this issue Aug 22, 2016

@ezyang ezyang Give an explicit message when SIGSEGV happens.
Fixes #767.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
44b57e5

@ezyang ezyang added a commit to ezyang/cabal that referenced this issue Aug 22, 2016

@ezyang ezyang Give an explicit message when SIGSEGV happens.
Fixes #767.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
64b6ec0
Collaborator

Blaisorblade commented Aug 23, 2016

@ezyang I'll try sending those logs your way when I get them again, instead of triggering a rebuild and destroying them (I knew that was bad).

Could it be an instance of https://ghc.haskell.org/trac/ghc/ticket/10161 (seems unlikely; this only happens if it's ABI compatible)?

Sounds unlikely... my assumption is about this happening with ABI incompatibilities. Though maybe you can exploit that bug to get a SIGSEGV with a (correct) unsafeCoerce?

The only recompilation problem I could debug is https://ghc.haskell.org/trac/ghc/ticket/12180. (For extra fun, I'm both Blaisorblade and pggiarrusso on Trac).

Do you ever get a core dump in these cases?

I never had those enabled. I'll try to do that if this happens again.

Collaborator

Blaisorblade commented Aug 23, 2016 edited

Actually, https://ghc.haskell.org/trac/ghc/ticket/10296 affects 7.8 and 7.10. Something like that could better explain the segfaults on Travis (since the builds are mostly clean modulo cache).
The recompilation issues are probably a different problem.

EDIT: here are at least symptoms, though not a log, of intermittent Travis failure while building Agda: agda/agda#2013.

@ezyang ezyang added a commit to ezyang/cabal that referenced this issue Aug 24, 2016

@ezyang ezyang Give an explicit message when SIGSEGV happens.
Fixes #767.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
b600a2b

@ezyang ezyang added a commit to ezyang/cabal that referenced this issue Aug 24, 2016

@ezyang ezyang Give an explicit message when SIGSEGV happens.
Fixes #767.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
4e3fc54

@ezyang ezyang added a commit to ezyang/cabal that referenced this issue Aug 24, 2016

@ezyang ezyang Give an explicit message when SIGSEGV happens.
Fixes #767.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
b45df92

ezyang closed this in #3712 Aug 25, 2016
