Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cabal test freezes but cabal test --show-details=streaming works #1810

Closed
jstolarek opened this issue Apr 23, 2014 · 46 comments
Closed

cabal test freezes but cabal test --show-details=streaming works #1810

jstolarek opened this issue Apr 23, 2014 · 46 comments
Assignees

Comments

@jstolarek
Copy link

I'm running the testsuite for singletons package. When I run cabal test the testsuite freezes, but if I add --show-details=streaming it works perfectly and finishes after ~30seconds. This happens with cabal 1.20 and GHC 7.8.2.

@tibbe tibbe added this to the cabal-install-1.20.0.1 milestone Apr 23, 2014
@tibbe
Copy link
Member

tibbe commented Apr 23, 2014

@ttuegel do you mind taking a look?

@ttuegel
Copy link
Member

ttuegel commented Apr 23, 2014

@jstolarek Which operating system are you testing on?

@jstolarek
Copy link
Author

This is 64-bit Linux

@tibbe
Copy link
Member

tibbe commented Apr 24, 2014

@jstolarek Could you strace it to get an idea of where it's frozen?

@jstolarek
Copy link
Author

Sure. Log is about 1.5MB - where should I upload it?

@tibbe
Copy link
Member

tibbe commented Apr 24, 2014

Could you just paste the last 10 lines?

On Thu, Apr 24, 2014 at 11:41 AM, Jan Stolarek notifications@github.comwrote:

Sure. Log is about 1.5MB - where should I upload it?


Reply to this email directly or view it on GitHubhttps://github.com//issues/1810#issuecomment-41261417
.

@jstolarek
Copy link
Author

Sure. Last 2K lines or so look identical:

wait4(26241, 0x7fffe3a2c4cc, 0, NULL)   = ? ERESTARTSYS (To be restarted)
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0x1a)                      = 61
wait4(26241, 0x7fffe3a2c4cc, 0, NULL)   = ? ERESTARTSYS (To be restarted)
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0x1a)                      = 61
wait4(26241, 0x7fffe3a2c4cc, 0, NULL)   = ? ERESTARTSYS (To be restarted)
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0x1a)                      = 61
wait4(26241, 0x7fffe3a2c4cc, 0, NULL)   = ? ERESTARTSYS (To be restarted)
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
write(9, "\377\0\0\0\0\0\0\0", 8)       = 8
rt_sigreturn(0x2)                       = 61
wait4(26241,  <unfinished ...>

This is interrupted several times with

write(9, "\377\0\0\0\0\0\0\0", 8)       = 8
rt_sigreturn(0x2)                       = 61

And before all that start there's:

rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7fc530a1c2d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [QUIT], [], 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7fc530a1c2d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
pipe([14, 15])                          = 0
rt_sigprocmask(SIG_BLOCK, [], [], 8)    = 0
timer_settime(0x1, 0, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
vfork()                                 = 26241
close(15)                               = 0
fcntl(14, F_SETFD, FD_CLOEXEC)          = 0
read(14, "", 4)                         = 0
close(14)                               = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
timer_settime(0x1, 0, {it_interval={0, 10000000}, it_value={0, 10000000}}, NULL) = 0
wait4(26241, 0x7fffe3a2c4cc, 0, NULL)   = ? ERESTARTSYS (To be restarted)
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0x1a)                      = 61

Hope that helps.

@nikita-volkov
Copy link

I'm experiencing the same issue here with Cabal 1.20. Here is a failed Travis CI build. Here is a source tree, which reproduces the issue.

@tibbe
Copy link
Member

tibbe commented Apr 27, 2014

@ttuegel have you been able to reproduce the issue or do you need help with that?

@ttuegel
Copy link
Member

ttuegel commented Apr 27, 2014

@tibbe I have reproduced @jstolarek's problem, but I haven't checked if it's the same as @nikita-volkov's. The freeze happens because rawSystemIOWithEnv (which we use to invoke the test executable) never finishes. Based on CPU usage, I think the tests are running, but deadlocking at the end. For example, the singletons tests run for 13s on my machine after last output; then CPU usage drops off. I cannot reproduce the problem with cabal-install's test suite, either. I would say it's a bug in how some test frameworks are doing output, but the test framework can't even tell what the --show-details= setting is. Thoughts?

@tibbe
Copy link
Member

tibbe commented Apr 28, 2014

So if this didn't happen in 1.18, where we didn't have streaming, then we must either be invoking the process in the same way, not waiting for it to finish in the same way, or not passing it handles that in the same state. Some printf style debugging might tell us.

@coreyoconnor
Copy link

This issue is present running the vty test suite under cabal 1.20. Using cabal 1.18 the issue is not present. Using "--show-details=streaming" does not appear to make a difference.

I attached gdb to the "stuck" process and acquired this backtrace:

(gdb) thread apply all bt

Thread 1 (Thread 0x7fe0b5f8f700 (LWP 9862)):
#0  0x00007fe0b488c263 in __select_nocancel () from /nix/store/pdskwizjw8ar31hql2wjnnx6g0s6xc50-glibc-2.19/lib/libc.so.6
#1  0x0000000000a502ed in awaitEvent ()
#2  0x0000000000a433b5 in scheduleWaitThread ()
#3  0x0000000000a488e7 in hs_main ()
#4  0x000000000040739f in main ()

Which is interesting: There is only a single thread and the thread appears to be just waiting on an event that never occurs.

Not all vty tests freeze. Out of ~14 test suites 4 freeze. These are a subset of those that use quickcheck. If the use of quickcheck is removed then the tests do not freeze.

I have started putting together a minimal repro case from the vty test suite. For the impatient:

I'll be pulling this code into a separate project when I get a chance.

@tibbe
Copy link
Member

tibbe commented May 2, 2014

I wonder if this is I/O manager related.

@coreyoconnor
Copy link

Here is a (almost) minimal project which reproduces the issue: https://github.com/coreyoconnor/cabal-issue-1810

$ cabal test with-quickcheck
Preprocessing library cabalIssue1810-0.1.0.0...
[1 of 1] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
In-place registering cabalIssue1810-0.1.0.0...
Preprocessing test suite 'with-quickcheck' for cabalIssue1810-0.1.0.0...
[1 of 2] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
In-place registering with-quickcheck-0.1.0.0...
[1 of 1] Compiling Main             ( dist/build/with-quickcheckStub/with-quickcheckStub-tmp/with-quickcheckStub.hs, dist/build/with-quickcheckStub/with-quickcheckStub-tmp/Main.o )
Linking dist/build/with-quickcheckStub/with-quickcheckStub ...
Running 1 test suites...
Test suite with-quickcheck: RUNNING...
^C^C

$ cabal test without-quickcheck
Preprocessing library cabalIssue1810-0.1.0.0...
[1 of 1] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
In-place registering cabalIssue1810-0.1.0.0...
Preprocessing test suite 'without-quickcheck' for cabalIssue1810-0.1.0.0...
[1 of 2] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
In-place registering without-quickcheck-0.1.0.0...
[1 of 1] Compiling Main             ( dist/build/without-quickcheckStub/without-quickcheckStub-tmp/without-quickcheckStub.hs, dist/build/without-quickcheckStub/without-quickcheckStub-tmp/Main.o )
Linking dist/build/without-quickcheckStub/without-quickcheckStub ...
Running 1 test suites...
Test suite without-quickcheck: RUNNING...
Test suite without-quickcheck: PASS
Test suite logged to: dist/test/cabalIssue1810-0.1.0.0-without-quickcheck.log
1 of 1 test suites (20 of 20 test cases) passed.

I am using 64bit linux and ghc 7.8.2.

@travitch
Copy link

travitch commented May 8, 2014

Just ran into this. I think the problem is with how cabal install is saving output (stdout and stderr) from tests. When streaming is disabled, 'tee' dumps the process output into a pipe, but that pipe isn't read until after the process finishes. This means the pipe fills up and then the process just goes to sleep, waiting for the pipe to drain.

If you enable streaming, one of the branches in tee can keep reading from the pipe because it is always mirroring output to stdout.

@ttuegel
Copy link
Member

ttuegel commented May 8, 2014

@travitch I suspect you're right. We can always stream the log handle--the one that is saved now--to a temporary file and read it back when needed (that's what the log code did before, anyway).

@ttuegel
Copy link
Member

ttuegel commented May 9, 2014

Actually, tee isn't closing handles when it's done, even when you tell it to. That probably has something to do with it.

@ttuegel
Copy link
Member

ttuegel commented May 9, 2014

I take back my last comment. Of course tee doesn't close handles if the child process never finishes writing to them. I've tried streaming the log directly to file to test the "pipe doesn't drain" hypothesis, and it doesn't make any difference.

@ttuegel
Copy link
Member

ttuegel commented May 10, 2014

@coreyoconnor I just noticed that your example uses detailed-0.9 type test suites. That's very important because the test suite types are handled differently. It's also a surprise, because I didn't think QuickCheck supported being invoked in that way.

@jstolarek Are you also using detailed-0.9 test suites, rather than exitcode-stdio-1.0?

@jstolarek
Copy link
Author

@ttuegel No, I'm only using exitcode-stdio-1.0.

What's interesting is that this bug keeps disappearing and reappearing. For example with the latest commit of singletons everything is fine. Note also that this is 100% reproducible, meaning that if it fails then it fails always and if it works then it also works always. There doesn't seem to be any random runtime factor involved.

@coreyoconnor
Copy link

@ttuegel Interesting! I was unaware that QuickCheck was not supported under detailed-0.9. vty has been using detailed + QuickCheck for quite some time without issue. I won't mind switching to exitcode-stdio via test-framework if detailed+QuickCheck is really never supposed to work.

@ttuegel
Copy link
Member

ttuegel commented May 10, 2014

@coreyoconnor To support detailed-0.9 tests, each test library (QuickCheck, HUnit, etc.) is supposed to provide values of type Distribution.TestSuite.Test. I see that you've written your own adapter. That's fine; it should work, but our goal is that you wouldn't need to. Short version: what you are experiencing is definitely a bug, but it's a different one. Would you mind filing a separate bug with your example?

@ttuegel
Copy link
Member

ttuegel commented May 10, 2014

@jstolarek Does it ever fail if all the tests pass? I have only been able to reproduce with test suites with at least one failing test.

@jstolarek
Copy link
Author

@ttuegel I don't know, I haven't paid attention to this. At the moment I can only verify that it can work correctly with some tests failing.

ttuegel added a commit to ttuegel/cabal that referenced this issue May 20, 2014
Issue haskell#1810. Some test suites would freeze if invoked with
`--show-details=always` instead of `--show-details=streaming` because
output would build up in the pipe without being cleared. This corrects
the issue by forcing the length of the output string in another thread.
@ttuegel ttuegel reopened this May 29, 2014
@ttuegel
Copy link
Member

ttuegel commented May 29, 2014

@snoyberg Could you check cabal --version? The fix for this bug made it into 1.20.0.2, but you need cabal-install-1.20.0.2 to also be built with Cabal-1.20.0.2.

@snoyberg
Copy link
Collaborator

+ cabal --version
cabal-install version 1.20.0.2
using version 1.20.0.0 of the Cabal library 

If that's the case, wouldn't it be better to have a lower bound on the Cabal dependency in cabal-install?

@snoyberg
Copy link
Collaborator

Actually, doing a sanity check right now... Cabal-1.20.0.2 isn't even available on Hackage.

@ttuegel
Copy link
Member

ttuegel commented May 29, 2014

@tibbe Can there be a Cabal-1.20.0.2 release to correspond to the recent cabal-install release? The fix for this bug was included in the 1.20 branch, but all the test runner code is in Cabal.

@snoyberg
Copy link
Collaborator

I would also recommend including a cabal-install-1.20.0.3 release that includes a lower bound on Cabal to ensure the bug fix gets included.

snoyberg added a commit to commercialhaskell/stackage that referenced this issue May 29, 2014
@snoyberg
Copy link
Collaborator

In case it's useful for you to know, I'm building a new cabal-install binary on the Stackage build server based on the code at the HEAD of the 1.20 branch. I should know in the next 24 hours or so if that solves the problem for all of Stackage.

@tibbe
Copy link
Member

tibbe commented May 29, 2014

I've put out a Cabal-1.20.0.1 with the missing patch. Sorry for the mess-up. I'll send out request for binaries so we can upload cabal-install binaries built against Cabal-1.20.0.1 and cabal-install-1.20.0.2.

@snoyberg
Copy link
Collaborator

Thanks @tibbe. Does this in any way affect the backported patches for cabal-install 1.18? If so, that should probably be brought up for the HP release.

@tibbe
Copy link
Member

tibbe commented May 30, 2014

1.18 is unaffected as this is a bug fix for a problem introduced in 1.20 when streaming test output was added.

@tibbe tibbe closed this as completed May 30, 2014
rodrigosetti added a commit to rodrigosetti/shellcheck that referenced this issue May 31, 2014
Please run using "cabal test --show-details=streaming", there's a known
issue about this that was fixed in the latest version of cabal:
haskell/cabal#1810
@snoyberg
Copy link
Collaborator

snoyberg commented Jun 1, 2014

I have a feeling something is still broken here. I kicked off a Stackage build on Friday, and on Sunday morning, the test suite is stalled running the following four tests:

  • FenwickTree-0.1.1
  • HCodecs-0.5
  • HTF-0.11.3.4
  • HTTP-4000.2.17

I'll try to provide more information, but I can tell you that removing --show-details=streaming caused Stackage to stall once again.

snoyberg added a commit to commercialhaskell/stackage that referenced this issue Jun 2, 2014
@ttuegel
Copy link
Member

ttuegel commented Jun 3, 2014

I cannot reproduce the test suite stalling in those four packages. The test suites finish for me with or without --show-details=streaming. I have tested the latest Git version and cabal-install-1.20.0.2/Cabal-1.20.0.1 on x86_64 Linux. Can you provide any more information about your test setup?

@snoyberg
Copy link
Collaborator

snoyberg commented Jun 5, 2014

I can only reproduce this behavior when run from within Stackage, so it may be some kind of locking going on instead, I can't be certain. In any event, --show-details=streaming does work for me, which is now turned on in the Stackage code base.

The system I'm running this on in an Amazon EC2 small, running Ubuntu 14.04 64-bit.

@jstolarek
Copy link
Author

Sorry guys, but this ticket needs to be re-opened. Testsuite for singletons package stalls again. Again, adding --show-details=streaming fixes the problem (and so does running the compiler binary).

I'm using GHC 7.8.3, Cabal 1.20.0.0, cabal-install 1.20.0.3 and latest singletons from the repo.

@tibbe
Copy link
Member

tibbe commented Sep 24, 2014

Is cabal unpack singletons && cd singletons-X.Y.Z.P && cabal test enough to reproduce? @ttuegel could you please take a look?

@tibbe tibbe reopened this Sep 24, 2014
@jstolarek
Copy link
Author

Steps to reproduce:

git clone https://github.com/goldfirere/singletons
cd singletons
git submodule init
git submodule update
cd th-desugar
cabal install
cd ..
cabal test

Note that we are currently suffering from a serious memory leak so compiling Data.Singletons.Prelude.List module will take Very Long Time.

@ttuegel
Copy link
Member

ttuegel commented Sep 24, 2014

You said you are using Cabal-1.20.0.0; the fix for this problem was in one of the point releases. I cannot reproduce with cabal-install-1.20.0.3 and Cabal-1.20.0.2.

@jstolarek
Copy link
Author

@ttuegel you're right, upgrading to Cabal-1.20.0.2 solves the problem. I assumed that if I bootstrap cabal-install it installs the latest Cabal version (now I know it doesn't).

@ttuegel
Copy link
Member

ttuegel commented Sep 24, 2014

I think it should, but it looks like the bootstrap script didn't get updated with the point releases.

@riaqn
Copy link

riaqn commented Mar 20, 2016

I can confirm this problem.
cabal version:

cabal-install version 1.22.9.0
using version 1.22.8.0 of the Cabal library 

ghc version: The Glorious Glasgow Haskell Compilation System, version 7.10.3

strace cabal test, with or without --show-details=streaming

wait4(20805, 0x7ffc180191bc, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=0, ptr=0}} ---
rt_sigreturn({mask=[]})                 = 61
wait4(20805, 0x7ffc180191bc, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=0, ptr=0}} ---
rt_sigreturn({mask=[]})                 = 61
wait4(20805, 0x7ffc180191bc, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=0, ptr=0}} ---
rt_sigreturn({mask=[]})                 = 61
wait4(20805, 0x7ffc180191bc, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=0, ptr=0}} ---
write(9, "\377\0\0\0\0\0\0\0", 8)       = 8
rt_sigreturn({mask=[]})                 = 61
wait4(20805, 

I 'm using detailed-0.9.

@mkelc
Copy link

mkelc commented Mar 23, 2016

Had (Have) the same problem. Switching to stack using stack (>>stack test<<) works. Though this does not solve the issue it's still a workaround..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants