`cabal test` freezes but `cabal test --show-details=streaming` works #1810

jstolarek · 2014-04-23T08:24:57Z

I'm running the testsuite for singletons package. When I run cabal test the testsuite freezes, but if I add --show-details=streaming it works perfectly and finishes after ~30seconds. This happens with cabal 1.20 and GHC 7.8.2.

The text was updated successfully, but these errors were encountered:

tibbe · 2014-04-23T13:22:48Z

@ttuegel do you mind taking a look?

ttuegel · 2014-04-23T13:35:52Z

@jstolarek Which operating system are you testing on?

jstolarek · 2014-04-23T13:40:15Z

This is 64-bit Linux

tibbe · 2014-04-24T09:34:20Z

@jstolarek Could you strace it to get an idea of where it's frozen?

jstolarek · 2014-04-24T09:41:46Z

Sure. Log is about 1.5MB - where should I upload it?

tibbe · 2014-04-24T09:44:03Z

Could you just paste the last 10 lines?

On Thu, Apr 24, 2014 at 11:41 AM, Jan Stolarek notifications@github.comwrote:

Sure. Log is about 1.5MB - where should I upload it?

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/1810#issuecomment-41261417
.

jstolarek · 2014-04-24T09:47:18Z

Sure. Last 2K lines or so look identical:

wait4(26241, 0x7fffe3a2c4cc, 0, NULL)   = ? ERESTARTSYS (To be restarted)
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0x1a)                      = 61
wait4(26241, 0x7fffe3a2c4cc, 0, NULL)   = ? ERESTARTSYS (To be restarted)
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0x1a)                      = 61
wait4(26241, 0x7fffe3a2c4cc, 0, NULL)   = ? ERESTARTSYS (To be restarted)
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0x1a)                      = 61
wait4(26241, 0x7fffe3a2c4cc, 0, NULL)   = ? ERESTARTSYS (To be restarted)
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
write(9, "\377\0\0\0\0\0\0\0", 8)       = 8
rt_sigreturn(0x2)                       = 61
wait4(26241,  <unfinished ...>

This is interrupted several times with

write(9, "\377\0\0\0\0\0\0\0", 8)       = 8
rt_sigreturn(0x2)                       = 61

And before all that start there's:

rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7fc530a1c2d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [QUIT], [], 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7fc530a1c2d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
pipe([14, 15])                          = 0
rt_sigprocmask(SIG_BLOCK, [], [], 8)    = 0
timer_settime(0x1, 0, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
vfork()                                 = 26241
close(15)                               = 0
fcntl(14, F_SETFD, FD_CLOEXEC)          = 0
read(14, "", 4)                         = 0
close(14)                               = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
timer_settime(0x1, 0, {it_interval={0, 10000000}, it_value={0, 10000000}}, NULL) = 0
wait4(26241, 0x7fffe3a2c4cc, 0, NULL)   = ? ERESTARTSYS (To be restarted)
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0x1a)                      = 61

Hope that helps.

nikita-volkov · 2014-04-25T21:59:36Z

I'm experiencing the same issue here with Cabal 1.20. Here is a failed Travis CI build. Here is a source tree, which reproduces the issue.

tibbe · 2014-04-27T20:34:50Z

@ttuegel have you been able to reproduce the issue or do you need help with that?

ttuegel · 2014-04-27T22:22:40Z

@tibbe I have reproduced @jstolarek's problem, but I haven't checked if it's the same as @nikita-volkov's. The freeze happens because rawSystemIOWithEnv (which we use to invoke the test executable) never finishes. Based on CPU usage, I think the tests are running, but deadlocking at the end. For example, the singletons tests run for 13s on my machine after last output; then CPU usage drops off. I cannot reproduce the problem with cabal-install's test suite, either. I would say it's a bug in how some test frameworks are doing output, but the test framework can't even tell what the --show-details= setting is. Thoughts?

tibbe · 2014-04-28T16:14:06Z

So if this didn't happen in 1.18, where we didn't have streaming, then we must either be invoking the process in the same way, not waiting for it to finish in the same way, or not passing it handles that in the same state. Some printf style debugging might tell us.

coreyoconnor · 2014-05-02T19:11:35Z

This issue is present running the vty test suite under cabal 1.20. Using cabal 1.18 the issue is not present. Using "--show-details=streaming" does not appear to make a difference.

I attached gdb to the "stuck" process and acquired this backtrace:

(gdb) thread apply all bt

Thread 1 (Thread 0x7fe0b5f8f700 (LWP 9862)):
#0  0x00007fe0b488c263 in __select_nocancel () from /nix/store/pdskwizjw8ar31hql2wjnnx6g0s6xc50-glibc-2.19/lib/libc.so.6
#1  0x0000000000a502ed in awaitEvent ()
#2  0x0000000000a433b5 in scheduleWaitThread ()
#3  0x0000000000a488e7 in hs_main ()
#4  0x000000000040739f in main ()

Which is interesting: There is only a single thread and the thread appears to be just waiting on an event that never occurs.

Not all vty tests freeze. Out of ~14 test suites 4 freeze. These are a subset of those that use quickcheck. If the use of quickcheck is removed then the tests do not freeze.

I have started putting together a minimal repro case from the vty test suite. For the impatient:

branch: https://github.com/coreyoconnor/vty/tree/cabal-issue-1810
test code: https://github.com/coreyoconnor/vty/blob/cabal-issue-1810/test/VerifyOutput.hs
- "verify" just wraps a quickcheck test see https://github.com/coreyoconnor/vty/blob/cabal-issue-1810/test/Verify.hs for implementation.

I'll be pulling this code into a separate project when I get a chance.

tibbe · 2014-05-02T20:59:10Z

I wonder if this is I/O manager related.

coreyoconnor · 2014-05-02T22:02:54Z

Here is a (almost) minimal project which reproduces the issue: https://github.com/coreyoconnor/cabal-issue-1810

$ cabal test with-quickcheck
Preprocessing library cabalIssue1810-0.1.0.0...
[1 of 1] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
In-place registering cabalIssue1810-0.1.0.0...
Preprocessing test suite 'with-quickcheck' for cabalIssue1810-0.1.0.0...
[1 of 2] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
In-place registering with-quickcheck-0.1.0.0...
[1 of 1] Compiling Main             ( dist/build/with-quickcheckStub/with-quickcheckStub-tmp/with-quickcheckStub.hs, dist/build/with-quickcheckStub/with-quickcheckStub-tmp/Main.o )
Linking dist/build/with-quickcheckStub/with-quickcheckStub ...
Running 1 test suites...
Test suite with-quickcheck: RUNNING...
^C^C

$ cabal test without-quickcheck
Preprocessing library cabalIssue1810-0.1.0.0...
[1 of 1] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
In-place registering cabalIssue1810-0.1.0.0...
Preprocessing test suite 'without-quickcheck' for cabalIssue1810-0.1.0.0...
[1 of 2] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
In-place registering without-quickcheck-0.1.0.0...
[1 of 1] Compiling Main             ( dist/build/without-quickcheckStub/without-quickcheckStub-tmp/without-quickcheckStub.hs, dist/build/without-quickcheckStub/without-quickcheckStub-tmp/Main.o )
Linking dist/build/without-quickcheckStub/without-quickcheckStub ...
Running 1 test suites...
Test suite without-quickcheck: RUNNING...
Test suite without-quickcheck: PASS
Test suite logged to: dist/test/cabalIssue1810-0.1.0.0-without-quickcheck.log
1 of 1 test suites (20 of 20 test cases) passed.

I am using 64bit linux and ghc 7.8.2.

travitch · 2014-05-08T17:11:45Z

Just ran into this. I think the problem is with how cabal install is saving output (stdout and stderr) from tests. When streaming is disabled, 'tee' dumps the process output into a pipe, but that pipe isn't read until after the process finishes. This means the pipe fills up and then the process just goes to sleep, waiting for the pipe to drain.

If you enable streaming, one of the branches in tee can keep reading from the pipe because it is always mirroring output to stdout.

ttuegel · 2014-05-08T17:15:34Z

@travitch I suspect you're right. We can always stream the log handle--the one that is saved now--to a temporary file and read it back when needed (that's what the log code did before, anyway).

ttuegel · 2014-05-09T02:25:58Z

Actually, tee isn't closing handles when it's done, even when you tell it to. That probably has something to do with it.

ttuegel · 2014-05-09T22:50:09Z

I take back my last comment. Of course tee doesn't close handles if the child process never finishes writing to them. I've tried streaming the log directly to file to test the "pipe doesn't drain" hypothesis, and it doesn't make any difference.

ttuegel · 2014-05-10T11:31:13Z

@coreyoconnor I just noticed that your example uses detailed-0.9 type test suites. That's very important because the test suite types are handled differently. It's also a surprise, because I didn't think QuickCheck supported being invoked in that way.

@jstolarek Are you also using detailed-0.9 test suites, rather than exitcode-stdio-1.0?

jstolarek · 2014-05-10T14:00:27Z

@ttuegel No, I'm only using exitcode-stdio-1.0.

What's interesting is that this bug keeps disappearing and reappearing. For example with the latest commit of singletons everything is fine. Note also that this is 100% reproducible, meaning that if it fails then it fails always and if it works then it also works always. There doesn't seem to be any random runtime factor involved.

coreyoconnor · 2014-05-10T15:47:25Z

@ttuegel Interesting! I was unaware that QuickCheck was not supported under detailed-0.9. vty has been using detailed + QuickCheck for quite some time without issue. I won't mind switching to exitcode-stdio via test-framework if detailed+QuickCheck is really never supposed to work.

ttuegel · 2014-05-10T18:58:24Z

@coreyoconnor To support detailed-0.9 tests, each test library (QuickCheck, HUnit, etc.) is supposed to provide values of type Distribution.TestSuite.Test. I see that you've written your own adapter. That's fine; it should work, but our goal is that you wouldn't need to. Short version: what you are experiencing is definitely a bug, but it's a different one. Would you mind filing a separate bug with your example?

ttuegel · 2014-05-10T18:59:25Z

@jstolarek Does it ever fail if all the tests pass? I have only been able to reproduce with test suites with at least one failing test.

jstolarek · 2014-05-11T04:49:43Z

@ttuegel I don't know, I haven't paid attention to this. At the moment I can only verify that it can work correctly with some tests failing.

Issue haskell#1810. Some test suites would freeze if invoked with `--show-details=always` instead of `--show-details=streaming` because output would build up in the pipe without being cleared. This corrects the issue by forcing the length of the output string in another thread.

ttuegel · 2014-05-29T15:39:03Z

@snoyberg Could you check cabal --version? The fix for this bug made it into 1.20.0.2, but you need cabal-install-1.20.0.2 to also be built with Cabal-1.20.0.2.

snoyberg · 2014-05-29T15:46:28Z

+ cabal --version
cabal-install version 1.20.0.2
using version 1.20.0.0 of the Cabal library

If that's the case, wouldn't it be better to have a lower bound on the Cabal dependency in cabal-install?

snoyberg · 2014-05-29T15:47:40Z

Actually, doing a sanity check right now... Cabal-1.20.0.2 isn't even available on Hackage.

ttuegel · 2014-05-29T15:50:40Z

@tibbe Can there be a Cabal-1.20.0.2 release to correspond to the recent cabal-install release? The fix for this bug was included in the 1.20 branch, but all the test runner code is in Cabal.

snoyberg · 2014-05-29T15:52:03Z

I would also recommend including a cabal-install-1.20.0.3 release that includes a lower bound on Cabal to ensure the bug fix gets included.

snoyberg · 2014-05-29T16:00:06Z

In case it's useful for you to know, I'm building a new cabal-install binary on the Stackage build server based on the code at the HEAD of the 1.20 branch. I should know in the next 24 hours or so if that solves the problem for all of Stackage.

tibbe · 2014-05-29T18:00:06Z

I've put out a Cabal-1.20.0.1 with the missing patch. Sorry for the mess-up. I'll send out request for binaries so we can upload cabal-install binaries built against Cabal-1.20.0.1 and cabal-install-1.20.0.2.

snoyberg · 2014-05-30T01:58:31Z

Thanks @tibbe. Does this in any way affect the backported patches for cabal-install 1.18? If so, that should probably be brought up for the HP release.

tibbe · 2014-05-30T07:42:13Z

1.18 is unaffected as this is a bug fix for a problem introduced in 1.20 when streaming test output was added.

Please run using "cabal test --show-details=streaming", there's a known issue about this that was fixed in the latest version of cabal: haskell/cabal#1810

snoyberg · 2014-06-01T03:29:24Z

I have a feeling something is still broken here. I kicked off a Stackage build on Friday, and on Sunday morning, the test suite is stalled running the following four tests:

FenwickTree-0.1.1
HCodecs-0.5
HTF-0.11.3.4
HTTP-4000.2.17

I'll try to provide more information, but I can tell you that removing --show-details=streaming caused Stackage to stall once again.

This reverts commit 189bb1d.

ttuegel · 2014-06-03T14:56:18Z

I cannot reproduce the test suite stalling in those four packages. The test suites finish for me with or without --show-details=streaming. I have tested the latest Git version and cabal-install-1.20.0.2/Cabal-1.20.0.1 on x86_64 Linux. Can you provide any more information about your test setup?

snoyberg · 2014-06-05T05:28:41Z

I can only reproduce this behavior when run from within Stackage, so it may be some kind of locking going on instead, I can't be certain. In any event, --show-details=streaming does work for me, which is now turned on in the Stackage code base.

The system I'm running this on in an Amazon EC2 small, running Ubuntu 14.04 64-bit.

jstolarek · 2014-09-24T12:29:14Z

Sorry guys, but this ticket needs to be re-opened. Testsuite for singletons package stalls again. Again, adding --show-details=streaming fixes the problem (and so does running the compiler binary).

I'm using GHC 7.8.3, Cabal 1.20.0.0, cabal-install 1.20.0.3 and latest singletons from the repo.

tibbe · 2014-09-24T12:32:47Z

Is cabal unpack singletons && cd singletons-X.Y.Z.P && cabal test enough to reproduce? @ttuegel could you please take a look?

jstolarek · 2014-09-24T12:55:02Z

Steps to reproduce:

git clone https://github.com/goldfirere/singletons
cd singletons
git submodule init
git submodule update
cd th-desugar
cabal install
cd ..
cabal test

Note that we are currently suffering from a serious memory leak so compiling Data.Singletons.Prelude.List module will take Very Long Time.

ttuegel · 2014-09-24T15:27:41Z

You said you are using Cabal-1.20.0.0; the fix for this problem was in one of the point releases. I cannot reproduce with cabal-install-1.20.0.3 and Cabal-1.20.0.2.

jstolarek · 2014-09-24T18:22:56Z

@ttuegel you're right, upgrading to Cabal-1.20.0.2 solves the problem. I assumed that if I bootstrap cabal-install it installs the latest Cabal version (now I know it doesn't).

ttuegel · 2014-09-24T20:16:34Z

I think it should, but it looks like the bootstrap script didn't get updated with the point releases.

riaqn · 2016-03-20T01:15:43Z

I can confirm this problem.
cabal version:

cabal-install version 1.22.9.0
using version 1.22.8.0 of the Cabal library

ghc version: The Glorious Glasgow Haskell Compilation System, version 7.10.3

strace cabal test, with or without --show-details=streaming

wait4(20805, 0x7ffc180191bc, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=0, ptr=0}} ---
rt_sigreturn({mask=[]})                 = 61
wait4(20805, 0x7ffc180191bc, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=0, ptr=0}} ---
rt_sigreturn({mask=[]})                 = 61
wait4(20805, 0x7ffc180191bc, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=0, ptr=0}} ---
rt_sigreturn({mask=[]})                 = 61
wait4(20805, 0x7ffc180191bc, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=0, ptr=0}} ---
write(9, "\377\0\0\0\0\0\0\0", 8)       = 8
rt_sigreturn({mask=[]})                 = 61
wait4(20805,

I 'm using detailed-0.9.

mkelc · 2016-03-23T07:37:46Z

Had (Have) the same problem. Switching to stack using stack (>>stack test<<) works. Though this does not solve the issue it's still a workaround..

tibbe added this to the cabal-install-1.20.0.1 milestone Apr 23, 2014

tibbe assigned ttuegel Apr 23, 2014

jstolarek mentioned this issue Apr 24, 2014

Testsuite freezes + failures goldfirere/singletons#27

Closed

1 task

nikita-volkov mentioned this issue Apr 25, 2014

The forced buffering of output is a pain skogsbaer/HTF#36

Closed

ttuegel mentioned this issue May 20, 2014

detailed-0.9 test suites (always?) freeze #1881

Closed

ttuegel reopened this May 29, 2014

snoyberg added a commit to commercialhaskell/stackage that referenced this issue May 29, 2014

Get rid of temporary testing workarounds haskell/cabal#1810

189bb1d

tibbe closed this as completed May 30, 2014

rodrigosetti added a commit to rodrigosetti/shellcheck that referenced this issue May 31, 2014

Test Suite in Cabal (cabal test)

0a9ed91

Please run using "cabal test --show-details=streaming", there's a known issue about this that was fixed in the latest version of cabal: haskell/cabal#1810

rodrigosetti mentioned this issue May 31, 2014

Phase out Makefile in favor of Cabal koalaman/shellcheck#165

Merged

snoyberg added a commit to commercialhaskell/stackage that referenced this issue Jun 2, 2014

Revert "Get rid of temporary testing workarounds haskell/cabal#1810"

200d0db

This reverts commit 189bb1d.

tibbe reopened this Sep 24, 2014

23Skidoo closed this as completed Sep 24, 2014

snoyberg mentioned this issue Jun 2, 2015

Stop using Cabal's test (yay!) commercialhaskell/stack#167

Closed

fotanus mentioned this issue Oct 18, 2016

Suite passes when its are runned separated, but hang when run together? hspec/hspec#287

Closed

cabal test freezes but cabal test --show-details=streaming works #1810

cabal test freezes but cabal test --show-details=streaming works #1810

Comments

jstolarek commented Apr 23, 2014

tibbe commented Apr 23, 2014

ttuegel commented Apr 23, 2014

jstolarek commented Apr 23, 2014

tibbe commented Apr 24, 2014

jstolarek commented Apr 24, 2014

tibbe commented Apr 24, 2014

jstolarek commented Apr 24, 2014

nikita-volkov commented Apr 25, 2014

tibbe commented Apr 27, 2014

ttuegel commented Apr 27, 2014

tibbe commented Apr 28, 2014

coreyoconnor commented May 2, 2014

tibbe commented May 2, 2014

coreyoconnor commented May 2, 2014

travitch commented May 8, 2014

ttuegel commented May 8, 2014

ttuegel commented May 9, 2014

ttuegel commented May 9, 2014

ttuegel commented May 10, 2014

jstolarek commented May 10, 2014

coreyoconnor commented May 10, 2014

ttuegel commented May 10, 2014

ttuegel commented May 10, 2014

jstolarek commented May 11, 2014

ttuegel commented May 29, 2014

snoyberg commented May 29, 2014

snoyberg commented May 29, 2014

ttuegel commented May 29, 2014

snoyberg commented May 29, 2014

snoyberg commented May 29, 2014

tibbe commented May 29, 2014

snoyberg commented May 30, 2014

tibbe commented May 30, 2014

snoyberg commented Jun 1, 2014

ttuegel commented Jun 3, 2014

snoyberg commented Jun 5, 2014

jstolarek commented Sep 24, 2014

tibbe commented Sep 24, 2014

jstolarek commented Sep 24, 2014

ttuegel commented Sep 24, 2014

jstolarek commented Sep 24, 2014

ttuegel commented Sep 24, 2014

riaqn commented Mar 20, 2016

mkelc commented Mar 23, 2016

`cabal test` freezes but `cabal test --show-details=streaming` works #1810

`cabal test` freezes but `cabal test --show-details=streaming` works #1810