Setup.hs must be compiled with -threaded #2398

Open
orlitzky opened this Issue Jan 30, 2015 · 11 comments


orlitzky commented Jan 30, 2015

I ran into Gentoo bug #537500 with the test suite for ShellCheck. I asked Haskell-Cafe about it, and Thomas Tuegel told me to report it here. So here you go!

I was able to reproduce the issue with the latest Cabal master branch.

Contributor

trofi commented Jan 30, 2015

Slightly more info, from a -debug runtime run with +RTS -Ds:

Test suite test-shellcheck: RUNNING...
cap 0: created thread 4
cap 0: thread 1 stopped (yielding)
cap 0: running thread 4 (ThreadRunGHC)
cap 0: thread 4 stopped (blocked on a read operation)
        thread    4 @ 0x7ffff5c27390 is blocked on read from fd 3 (TSO_DIRTY)
scheduler: checking for threads blocked on I/O
cap 0: running thread 1 (ThreadRunGHC)
cap 0: thread 1 stopped (suspended while making a foreign call)
cap 0: running thread 1 (ThreadRunGHC)
'dist/build/test-shellcheck/test-shellcheck'
cap 0: thread 1 stopped (suspended while making a foreign call)
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7bcbe9c in __libc_waitpid (pid=30078, stat_loc=0x7fffffff9cbc, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31
31      ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
(gdb) i th
  Id   Target Id         Frame 
* 1    Thread 0x7ffff7fda740 (LWP 30072) "setup" 0x00007ffff7bcbe9c in __libc_waitpid (pid=30078, stat_loc=0x7fffffff9cbc, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31
(gdb) bt
#0  0x00007ffff7bcbe9c in __libc_waitpid (pid=30078, stat_loc=0x7fffffff9cbc, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31
#1  0x00000000009d4c74 in waitForProcess ()
#2  0x00000000009d3718 in s54b_info ()
#3  0x0000000000000000 in ?? ()

waitpid() is stuck and does not get unblocked by SIGVTALRM (the signals stop happening, as ./setup does nothing).

Member

ttuegel commented Feb 4, 2015

I strongly suspect this is yet another lazy IO bug. At this point, that deserves its own label.

Collaborator

bennofs commented Feb 13, 2015

I think the cause of this problem is that rawSystemIOWithEnv uses waitForProcess, which, being an FFI call, blocks all other Haskell threads when the non-threaded runtime is used. The runTest function in Distribution/Simple/Test/ExeV10.hs, however, depends on threads to read the pipe that is passed to the process for its output. If the pipe fills up (because the read thread never runs), the test suite will block.
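To make the mechanism above concrete, here is a minimal sketch of the pipe-and-wait pattern. It is not Cabal's actual runTest code; runCaptured is a hypothetical helper, and the sh, yes and head commands used in main are assumed to be available on a POSIX system. The sketch drains the pipe to end-of-file before calling waitForProcess, an ordering that should not depend on the threaded runtime.

```haskell
import Control.Exception (evaluate)
import System.Exit (ExitCode (..))
import System.IO (hGetContents)
import System.Process
  ( CreateProcess (std_out), StdStream (CreatePipe)
  , createProcess, proc, waitForProcess )

-- | Hypothetical helper: run a command and capture its stdout via a pipe.
runCaptured :: FilePath -> [String] -> IO (ExitCode, String)
runCaptured cmd args = do
  (_, Just hOut, _, ph) <- createProcess (proc cmd args) { std_out = CreatePipe }
  -- Read the pipe to end-of-file first. If waitForProcess were called
  -- before this point under the non-threaded runtime, the blocking
  -- foreign call would also stall any Haskell thread forked to read
  -- hOut; once the child filled the pipe, neither side could progress.
  out <- hGetContents hOut
  _ <- evaluate (length out)  -- force the lazy read to completion
  code <- waitForProcess ph   -- the child has closed its stdout by now
  return (code, out)

main :: IO ()
main = do
  -- Produce more output than a typical pipe buffer holds.
  (code, out) <- runCaptured "sh" ["-c", "yes | head -n 100000"]
  putStrLn ("exit: " ++ show code ++ ", characters read: " ++ show (length out))
```

The trade-off is that nothing is printed until the child finishes, which is why real-time streaming combined with a log file needs a concurrent reader.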

Member

ttuegel commented Apr 25, 2015

This is a different flavor of #2489. I am testing the recommended fix for that now. I'll apply it here once everything checks out.

trofi added a commit to gentoo-haskell/gentoo-haskell that referenced this issue Jul 20, 2015

haskell-cabal.eclass: workaround cabal lockup bug in non-threaded runtime

Gentoo-bug: https://bugs.gentoo.org/show_bug.cgi?id=537500
Cabal-bug: haskell/cabal#2398
Signed-off-by: Sergei Trofimovich <siarheit@google.com>

Contributor

nomeata commented Aug 15, 2015

Is there a better work-around than building with -threaded, which is not available on all architectures?

Member

ttuegel commented Aug 15, 2015

@nomeata No, the Setup.hs executable is designed to only be compiled with -threaded. If that option is not available on all architectures, I will need to remove the --show-details=streaming option from the test command. I guess we can keep that option in cabal-install, which already requires -threaded.
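As a side note, a program can check at run time whether it was linked against the threaded runtime. The sketch below uses rtsSupportsBoundThreads from Control.Concurrent, which is True only for executables linked with -threaded. Nothing in this thread says Cabal performs such a check; warnIfNotThreaded and its message are hypothetical.

```haskell
import Control.Concurrent (rtsSupportsBoundThreads)
import Control.Monad (unless)
import System.IO (hPutStrLn, stderr)

-- | Hypothetical guard: warn when the executable was not linked with
-- the threaded runtime, where streaming test output may hang.
warnIfNotThreaded :: IO ()
warnIfNotThreaded =
  unless rtsSupportsBoundThreads $
    hPutStrLn stderr
      "warning: built without -threaded; streaming test output may hang"

main :: IO ()
main = warnIfNotThreaded
```

A Setup executable could call such a guard before running test suites, so that a hang is at least preceded by a hint about its likely cause.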

ttuegel changed the title from "Test suite can hang when Setup.hs compiled without -threaded" to "Setup.hs must be compiled with -threaded" Aug 15, 2015

Contributor

nomeata commented Aug 15, 2015

It might make sense to keep Setup.hs simple and reliable, e.g. for automatic builders, and do fancy stuff in cabal-install.

If it is related to --show-details=streaming, maybe --show-details=always could be implemented so that it works without threading?

Member

ttuegel commented Aug 15, 2015

Yes, that's how it was originally implemented. In fact, if Cabal's log files aren't needed, we can even just pipe the test output directly to stdout. I suspect this is the case for most automatic builders, which will have their own logging facilities. Threading is only required to both keep logs and print real-time test output.

Contributor

nomeata commented Sep 12, 2015

I see a few test suite failures on various packages, and due to this bug I’m not sure whether to blame cabal or the actual package at hand. I’d be grateful for a patch, especially if it is simple and can be backported to GHC-7.10.

Contributor

nomeata commented Nov 5, 2015

I’m confused now. I tried to add a test case to Cabal to reproduce the problem, but I’m failing to do so. Moreover, it seems that there, Setup is compiled without -threaded (see compileSetup in PackageTester.hs).

Maybe the runtime works fine without -threaded, but on some architectures, it does not manage to switch threads?

Contributor

nomeata commented Nov 5, 2015

Ok, I could reproduce it now, with large enough output. Preparing a pull request with the updated output.

I suggest taking this less ambitious but more reliable route:

  • --show-details=never: Write output directly to the log, i.e. pass the handle to the log file to the test program.
  • --show-details=always: Write output directly to the log, i.e. pass the handle to the log file to the test program. Afterwards read that file and print it.
  • --show-details=failure: Write output directly to the log, i.e. pass the handle to the log file to the test program. Afterwards read that file and print it, if there was a failure.
  • --show-details=streaming: Write output directly to stdout. Do not write a log file.

The last bit is a slight loss of functionality, but avoids any threading or pipe foo in Cabal, and furthermore fixes #2911 en passant.
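The four modes proposed above could be sketched roughly as follows. This is an illustration of the proposal, not the code that was merged into Cabal; ShowDetails, runTestExe and the log file name are hypothetical, and main assumes a POSIX sh. The child writes either straight to the log file or straight to the inherited stdout, so the parent needs neither pipes nor reader threads.

```haskell
import Control.Monad (when)
import System.Exit (ExitCode (..))
import System.IO (IOMode (WriteMode), withFile)
import System.Process
  ( CreateProcess (std_err, std_out), StdStream (UseHandle)
  , createProcess, proc, waitForProcess )

-- | Hypothetical type mirroring the --show-details values.
data ShowDetails = Never | Failure | Always | Streaming
  deriving (Eq, Show)

-- | Run a test executable without any pipe between parent and child.
runTestExe :: ShowDetails -> FilePath -> FilePath -> [String] -> IO ExitCode
runTestExe Streaming _ exe args = do
  -- The child inherits stdout/stderr; no log file is written.
  (_, _, _, ph) <- createProcess (proc exe args)
  waitForProcess ph
runTestExe details logFile exe args = do
  -- The child writes directly to the log file handle.
  code <- withFile logFile WriteMode $ \hLog -> do
    (_, _, _, ph) <- createProcess (proc exe args)
      { std_out = UseHandle hLog, std_err = UseHandle hLog }
    waitForProcess ph
  -- Afterwards, replay the log if the mode asks for it.
  let shouldPrint =
        details == Always || (details == Failure && code /= ExitSuccess)
  when shouldPrint (readFile logFile >>= putStr)
  return code

main :: IO ()
main = do
  code <- runTestExe Failure "example-test.log" "sh" ["-c", "echo hello; exit 1"]
  print code
```

Because waitForProcess is the only blocking call and nothing has to be drained while it runs, this shape should behave the same with or without -threaded.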

nomeata added a commit to nomeata/cabal that referenced this issue Nov 5, 2015

PackageTests/TestSuiteTests/ExeV10: Have large output
The test case is changed to print a rather large amount of text, large
enough to fill a buffer. This way, the buffer draining functionality in
the test runner is tested, and it reveals haskell#2398.

This makes the test suite currently fail, of course, but the bug was
there before.

nomeata added a commit to nomeata/cabal that referenced this issue Nov 5, 2015

test: New mode --show-details=direct
This mode implements haskell#2911, and allows connecting the test runner
directly to stdout/stdin. This is more reliable in the absence of
threading, i.e. a workaround for haskell#2398.

I make the test suite use this, so that it passes again, despite
printing lots of stuff. Once haskell#2398 is fixed properly, the test suite
should probably be extended to test all the various --show-details
modes.

nomeata added a commit to nomeata/cabal that referenced this issue Nov 7, 2015

test: New mode --show-details=direct
This mode implements haskell#2911, and allows connecting the test runner
directly to stdout/stdin. This is more reliable in the absence of
threading, i.e. a workaround for haskell#2398.

I make the test suite use this, so that it passes again, despite
printing lots of stuff. Once haskell#2398 is fixed properly, the test suite
should probably be extended to test all the various --show-details
modes.

23Skidoo modified the milestones: Cabal 1.24, Cabal 1.26 Feb 21, 2016

ezyang modified the milestone: Cabal 2.0 Sep 6, 2016
