Errors from setup-exe-cache during parallel build #1076

Closed
rrnewton opened this Issue · 32 comments

3 participants

@rrnewton

Since I started using cabal-0.16 and GHC-7.6.1 (Mac OS 10.8) I've started getting a lot of errors like this:

/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1:
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1: cannot execute binary file
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.4.2:
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.4.2: cannot execute binary file

I tried blowing away the cache, but that doesn't solve it. However, if I remove the "-j" to turn off parallel builds, I don't run into the problem and my build succeeds.

@23Skidoo
Collaborator

Are these files created and are they being marked executable?

@23Skidoo
Collaborator

I would love to help, but I can't reproduce this on my machine.

@tibbe
Owner

I'm going to make a bug-fix release of Cabal and cabal-install in a few days, so it would be nice to get a fix for this in there.

@rrnewton

They are marked executable. This is weird but it looks like the file is just for the wrong OS! It is creating files that "file" claims are GNU/Linux executables:

$ file /Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1 
  /Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped
$ ls -l /Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1 
-rwxr-xr-x  1 rrnewton  staff  8144967 Oct 22 09:46 /Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1

I didn't even know GHC cross compiling was working ;-).

@rrnewton

By the way this is cropping up for me while working on Accelerate:

https://github.com/AccelerateHS/accelerate.

After installing the main accelerate module, if I install the following package from a submodule I get the error:

cabal install accelerate-backend-kit/ -j 

... even though that one-package build has no actual parallelism.

@tibbe
Owner

... even though that one-package build has no actual parallelism.

-j always behaves the same, regardless of whether there's actual parallelism. It's implemented as a different driver for cabal install.

@rrnewton

Sure. Though I just wanted to point out that this was not something resulting from genuine parallelism (like a race condition).

Also, it could have just turned itself off if the installation plan includes a single package, I suppose. I assume there would be no cost or benefit either way...

@23Skidoo
Collaborator

This is weird but it looks like the file is just for the wrong OS!

Weird indeed. The setup executable (for build-type Simple) is just the following program:

import Distribution.Simple

main = defaultMain

Which is compiled by calling ghc --make (compileSetupExecutable in Distribution.Client.SetupWrapper). You can see the exact command that is used by purging the setup executable cache directory and running cabal install -v2 -j PACKAGE. For example, here's how it looks on my machine:

$ cabal install -j2 -v2 --reinstall ansi-terminal
[...]
ghc --make /tmp/ansi-terminal-0.5.5-4239/ansi-terminal-0.5.5/dist/setup/setup.hs -o /tmp/ansi-terminal-0.5.5-4239/ansi-terminal-0.5.5/dist/setup/setup -odir /tmp/ansi-terminal-0.5.5-4239/ansi-terminal-0.5.5/dist/setup -hidir /tmp/ansi-terminal-0.5.5-4239/ansi-terminal-0.5.5/dist/setup -i -i/tmp/ansi-terminal-0.5.5-4239/ansi-terminal-0.5.5 -package Cabal-1.14.0
creating /home/cabal-test/.cabal/setup-exe-cache
Installing executable
[...]

What command is used to compile the setup executable on your machine and what happens if you try to compile it yourself?
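For readers who want to reproduce that invocation by hand, the argument list can be sketched as a small pure function. This is only an illustration modelled on the command line quoted above; `setupGhcArgs` is a hypothetical helper, not the code in Distribution.Client.SetupWrapper, and the real function may pass additional flags.

```haskell
-- Hypothetical helper: rebuilds the ghc argument list seen in the -v2
-- output above. Not the actual cabal-install implementation.
setupGhcArgs
  :: FilePath  -- ^ package directory, e.g. "."
  -> FilePath  -- ^ setup working directory, e.g. "./dist/setup"
  -> String    -- ^ Cabal library version, e.g. "1.14.0"
  -> [String]
setupGhcArgs pkgDir setupDir cabalVer =
  [ "--make", setupDir ++ "/setup.hs"   -- the generated setup script
  , "-o", setupDir ++ "/setup"          -- output executable
  , "-odir", setupDir                   -- object files
  , "-hidir", setupDir                  -- interface files
  , "-i", "-i" ++ pkgDir                -- reset, then set, the import path
  , "-package", "Cabal-" ++ cabalVer    -- pin the Cabal library version
  ]
```

Loaded into GHCi, `setupGhcArgs "." "./dist/setup" "1.14.0"` yields the arguments one would pass to `ghc`. Note that the `-package` flag is where a stale Cabal version would surface as a compile failure.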

@rrnewton

It works fine to do this:

ghc --make Setup.hs

But cabal install -j in the same dir fails, even if I delete the setup-exe-cache directory immediately before running it.

The problem seems to be getting bad or stale information in ./dist/. For example, I noticed that with a stale setup-conf cabal would try to call ghc with "-package Cabal-1.14.0", which I had since replaced with Cabal-1.16.0. Further, in the output appended at the end of this file, AFTER the ghc invocation fails, cabal tries to call a cached version in setup-exe-cache (which doesn't exist anymore -- seems like it needs an existence check before executing).

The answer is to wipe out ./dist/ directories and I'm left wishing that cabal clean could accept multiple directories like cabal install:

cabal clean A/ B/ C/

I suppose it wouldn't make sense to have cabal catch certain errors by doing one full retry (wipe out the ./dist/ dir). The other ultimate source of the problems I've been seeing, in addition to out of date setup-config files (hanging around in little noticed corners of submodules of submodules), is synchronization.

This is my bad. I use unison to synchronize working copies between a mac laptop and linux servers. I ignore most dist/ dirs but missed this one. Anyway that's how cabal got linux executables, which it then happily copied from ./dist/ into the setup-exe-cache.

However, I can't believe I'm the only person that uses unison, or Dropbox, or NFS for working directories... I think more sanity checks by cabal would not hurt. How about calling a given setup executable with "--help" or something to make sure it is executable before admitting it into the cache?
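The sanity check proposed here could look roughly like the following. It is a minimal sketch, not cabal-install code: `setupExeLooksRunnable` is a hypothetical name, and it assumes the setup executable accepts `--help` and exits successfully when it can run at all.

```haskell
import Control.Exception (IOException, try)
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- Hypothetical check: run the candidate setup executable with --help and
-- report whether it started and exited cleanly. A missing file or a binary
-- for the wrong platform ends up in the False branch, either through an
-- IOException or through a non-zero exit code.
setupExeLooksRunnable :: FilePath -> IO Bool
setupExeLooksRunnable exe = do
  result <- try (readProcessWithExitCode exe ["--help"] "")
              :: IO (Either IOException (ExitCode, String, String))
  return $ case result of
    Right (ExitSuccess, _, _) -> True
    _                         -> False
```

A caller would copy the executable into the cache only when this returns `True`. The cost is one extra process launch per cache update.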

Appendix:
What follows is the "-v3" output for a "cabal install -j" that fails:

searching for ghc in path.
found ghc at /usr/local/bin/ghc
("/usr/local/bin/ghc",["--numeric-version"])
/usr/local/bin/ghc is version 7.6.1
looking for tool "ghc-pkg" near compiler in /usr/local/bin
found ghc-pkg in /usr/local/bin/ghc-pkg
("/usr/local/bin/ghc-pkg",["--version"])
/usr/local/bin/ghc-pkg is version 7.6.1
("/usr/local/bin/ghc",["--supported-languages"])
("/usr/local/bin/ghc",["--info"])
Reading installed packages...
("/usr/local/bin/ghc-pkg",["dump","--global","-v0"])
("/usr/local/bin/ghc-pkg",["dump","--user","-v0"])
("/usr/local/bin/ghc",["--print-libdir"])
Reading available packages...
Choosing modular solver.
Resolving dependencies...
[__0] trying: accelerate-backend-kit-0.13.0.0
[__1] rejecting: base-3.0.3.2, 3.0.3.1 (global constraint requires installed instance)
[__1] trying: base-4.6.0.0/installed-689...
[__2] trying: rts-1.0/installedbuil...
[__3] trying: integer-gmp-0.5.0.0/installed-b00...
[__4] trying: ghc-prim-0.3.0.0/installed-4fa...
[__5] trying: accelerate-backend-kit-0.13.0.0:!test
[__6] trying: split-0.2.1.1/installed-99f...
[__7] trying: HUnit-1.2.5.1/installed-d2f...
[__8] trying: test-framework-hunit-0.2.7/installed-c64...
[__9] trying: extensible-exceptions-0.1.1.4/installed-9e3...
[_10] trying: test-framework-0.6.1/installed-8b1...
[_11] trying: xml-1.3.12/installed-488...
[_12] trying: text-0.11.2.3/installed-194...
[_13] trying: bytestring-0.10.0.0/installed-7c0...
[_14] trying: time-1.4.0.1/installed-338...
[_15] trying: regex-posix-0.95.2/installed-04d...
[_16] trying: regex-base-0.93.2/installed-c2f...
[_17] trying: random-1.0.1.1/installed-679...
[_18] trying: old-locale-1.0.0.5/installed-b00...
[_19] trying: hostname-1.0/installed-2c6...
[_20] trying: ansi-wl-pprint-0.6.4/installed-765...
[_21] trying: ansi-terminal-0.5.5/installed-9d8...
[_22] trying: unix-2.6.0.0/installed-b3b...
[_23] trying: deepseq-1.3.0.1/installed-1cc...
[_24] trying: vector-0.10.0.1/installed-078...
[_25] trying: primitive-0.5.0.1/installed-9c4...
[_26] trying: GenericPretty-1.2.0/installed-f12...
[_27] trying: ghc-7.6.1/installed-4e9...
[_28] trying: template-haskell-2.8.0.0/installed-9d6...
[_29] trying: process-1.1.0.2/installed-b88...
[_30] trying: hpc-0.6.0.0/installed-448...
[_31] trying: hoopl-3.9.0.0/installed-d69...
[_32] trying: filepath-1.3.0.1/installed-2c4...
[_33] trying: directory-1.2.0.0/installed-22b...
[_34] trying: bin-package-db-0.0.0.0/installed-364...
[_35] trying: binary-0.5.1.1/installed-e6d...
[_36] trying: Cabal-1.16.0/installed-4a0...
[_37] trying: old-time-1.1.0.1/installed-769...
[_38] trying: accelerate-0.13.0.0/installed-273...
[_39] trying: hashtables-1.0.1.8/installed-03e...
[_40] trying: hashable-1.1.2.5/installed-bbe...
[_41] trying: containers-0.5.0.0/installed-e49...
[_42] trying: pretty-1.1.1.0/installed-60e...
[_43] trying: mtl-2.1.2/installed-25d...
[_44] trying: transformers-0.3.0.0/installed-1bb...
[_45] trying: array-0.4.0.1/installed-cbe...
[_46] done
Warning: The following packages are likely to be broken by the reinstalls:
classsupport-b629-0.1
accelerate-harlan-0.1
Continuing even though the plan contains dangerous reinstalls.
Ready to install accelerate-backend-kit-0.13.0.0
Configuring accelerate-backend-kit-0.13.0.0...
Using external setup method with build-type Simple
creating dist/setup
Using Cabal library version 1.14.0
Using ./dist/setup/setup.hs as setup script.
Setup executable not found in the cache.
Setup script is out of date, compiling...
("/usr/local/bin/ghc",["-v","--make","./dist/setup/setup.hs","-o","./dist/setup/setup","-odir","./dist/setup","-hidir","./dist/setup","-i","-i.","-package","Cabal-1.14.0"])
Waiting for install task to finish...
/usr/local/bin/ghc returned ExitFailure 1 with error message:
Glasgow Haskell Compiler, Version 7.6.1, stage 2 booted by GHC version 7.4.2
Using binary package database:
/usr/local/lib/ghc-7.6.1/package.conf.d/package.cache
Using binary package database:
/Users/rrnewton/.ghc/x86_64-darwin-7.6.1/package.conf.d/package.cache
*** Deleting temp files:
Deleting:
*** Deleting temp dirs:
Deleting:
<command line>: cannot satisfy -package Cabal-1.14.0
(use -v for more information)
Failed to install accelerate-backend-kit-0.13.0.0
Last 10 lines of the build log (
/Users/rrnewton/.cabal/logs/accelerate-backend-kit-0.13.0.0.log ):
Installing library in
/Users/rrnewton/.cabal/lib/accelerate-backend-kit-0.13.0.0/ghc-7.4.2
Registering accelerate-backend-kit-0.13.0.0...
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1:
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1: cannot
execute binary file
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1:
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1: cannot
execute binary file
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1:
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1: cannot
execute binary file
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1:
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1: cannot
execute binary file
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1:
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1: cannot
execute binary file
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1:
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1: cannot
execute binary file
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1:
/Users/rrnewton/.cabal/setup-exe-cache/setup-Cabal-1.14.0-ghc-7.6.1: cannot
execute binary file
cabal: Error: some packages failed to install:
accelerate-backend-kit-0.13.0.0 failed during the configure step. The
exception was:
user error (Glasgow Haskell Compiler, Version 7.6.1, stage 2 booted by GHC
version 7.4.2
Using binary package database:
/usr/local/lib/ghc-7.6.1/package.conf.d/package.cache
Using binary package database:
/Users/rrnewton/.ghc/x86_64-darwin-7.6.1/package.conf.d/package.cache
*** Deleting temp files:
Deleting:
*** Deleting temp dirs:
Deleting:
<command line>: cannot satisfy -package Cabal-1.14.0
(use -v for more information)
)
@23Skidoo
Collaborator

Further, in the output appended at the end of this file, AFTER the ghc invocation fails, cabal tries to call a cached version in setup-exe-cache (which doesn't exist anymore -- seems like it needs an existence check before executing).

I don't see that in the log. It fails immediately after trying to compile Setup.hs with ghc --make because ghc cannot satisfy -package Cabal-1.14.0. Since -j uses the external setup method, the Cabal version is cached in dist/setup/setup.version - I think that the right way to fix this is to compare the saved Cabal version with the one that cabal-install was compiled with and update "setup.version" if they don't match.

This is my bad. I use unison to synchronize working copies between a mac laptop and linux servers. I ignore most dist/ dirs but missed this one. Anyway that's how cabal got linux executables, which it then happily copied from ./dist/ into the setup-exe-cache.

I suspected something like that.

However, I can't believe I'm the only person that uses unison, or Dropbox, or NFS for working directories... I think more sanity checks by cabal would not hurt. How about calling a given setup executable with "--help" or something to make sure it is executable before admitting it into the cache?

We can force recompilation of the setup exe before moving it to setup cache, but I'm not sure that if I fix this there will not be something else that breaks if you replace the dist directory with the one from another computer.

@tibbe
Owner

I don't see that in the log. It fails immediately after trying to compile Setup.hs with ghc --make because ghc cannot satisfy -package Cabal-1.14.0. Since -j uses the external setup method, the Cabal version is cached in dist/setup/setup.version - I think that the right way to fix this is to compare the saved Cabal version with the one that cabal-install was compiled with and update "setup.version" if they don't match.

Could you elaborate a bit on this please. This is an area of cabal I'm not very familiar with. Why are we caching the cabal version?

We can force recompilation of the setup exe before moving it to setup cache, but I'm not sure that if I fix this there will not be something else that breaks if you replace the dist directory with the one from another computer.

I don't think we want to do that. We might want to have some light heuristic for detecting if something is wrong, but supporting compiles where the dist directory contains files from another computer is not worth the effort methinks.
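One possible shape for such a light heuristic is to look at the first four bytes of the executable, which is essentially what `file` did earlier in this thread. The sketch below is an assumption about how that might be done, not something cabal implements; `BinaryKind`, `classifyMagic` and `classifyFile` are hypothetical names.

```haskell
import qualified Data.ByteString as B
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Coarse classification of an executable by its magic number.
data BinaryKind = Elf | MachO | OtherKind
  deriving (Eq, Show)

-- Inspect the first four bytes only.
classifyMagic :: B.ByteString -> BinaryKind
classifyMagic bytes
  | magic == B.pack [0x7F, 0x45, 0x4C, 0x46] = Elf   -- "\DELELF"
  | magic `elem` map B.pack machOMagics       = MachO
  | otherwise                                 = OtherKind
  where
    magic = B.take 4 bytes
    machOMagics =
      [ [0xCF, 0xFA, 0xED, 0xFE]  -- Mach-O 64-bit, little-endian
      , [0xCE, 0xFA, 0xED, 0xFE]  -- Mach-O 32-bit, little-endian
      , [0xCA, 0xFE, 0xBA, 0xBE]  -- universal binary (also Java class files)
      ]

-- Read just enough of the file to classify it.
classifyFile :: FilePath -> IO BinaryKind
classifyFile path = withBinaryFile path ReadMode $ \h ->
  classifyMagic <$> B.hGet h 4
```

On a Mac, a cached setup executable classified as `Elf` would be a strong hint that the dist directory came from another machine.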

@rrnewton
@23Skidoo
Collaborator

Could you elaborate a bit on this please. This is an area of cabal I'm not very familiar with. Why are we caching the cabal version?

See cabalLibVersionToUse in Distribution.Client.SetupWrapper. I think that this is done to avoid calling installedCabalVersion each time.

I'm not sure I follow -- what's the meaning of the "cannot execute binary file" lines? Shouldn't it at least say "file does not exist" instead?

If something fails, install -j prints the last 10 lines of the build log:

Last 10 lines of the build log (
/Users/rrnewton/.cabal/logs/accelerate-backend-kit-0.13.0.0.log ):

These lines are left in the build log from the previous compilation attempts.
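A toy model shows why stale lines can show up in that summary. It assumes only what is described here (the log is appended to across attempts and the last few lines are printed); the log contents and the names `lastLines`, `appendedLog` and `overwrittenLog` are made up for illustration.

```haskell
-- Take the last n lines of a log, as the failure summary does.
lastLines :: Int -> String -> [String]
lastLines n = reverse . take n . reverse . lines

-- Made-up log contents for two build attempts.
oldAttempt, newAttempt :: String
oldAttempt = unlines ["old attempt: error A", "old attempt: error B"]
newAttempt = unlines ["new attempt: error C"]

-- Appending keeps output from earlier attempts in the file.
appendedLog :: String
appendedLog = oldAttempt ++ newAttempt

-- Overwriting keeps only the latest attempt.
overwrittenLog :: String
overwrittenLog = newAttempt
```

With the appended log, `lastLines 10 appendedLog` still contains both "old attempt" lines, whereas `lastLines 10 overwrittenLog` contains only the new one.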

I guess the broader question here is what can cabal do to be defensive against abuse. For example, I don't even know what happens if someone attempts multiple cabals concurrently -- does it grab a lock and protect itself?

AFAIK, it does not.
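For illustration only, one simple way a tool could guard a shared directory against concurrent runs is a lock directory, relying on directory creation failing when the directory already exists. This is a sketch of the general technique under that assumption, not a description of anything cabal does; `withDirLock` is a hypothetical name.

```haskell
import Control.Exception (IOException, finally, try)
import System.Directory (createDirectory, removeDirectory)

-- Hypothetical lock: try to create lockDir; if that succeeds, run the
-- action and remove the directory afterwards. If the directory already
-- exists (or cannot be created), return Nothing without running the action.
withDirLock :: FilePath -> IO a -> IO (Maybe a)
withDirLock lockDir action = do
  acquired <- try (createDirectory lockDir) :: IO (Either IOException ())
  case acquired of
    Left _   -> return Nothing
    Right () -> Just <$> (action `finally` removeDirectory lockDir)
```

A process that is killed while holding the lock leaves the directory behind, so a real implementation would need some way to detect and clear stale locks.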

@tibbe
Owner

These lines are left in the build log from the previous compilation attempts.

This sounds wrong. The build log should only contain output from the latest attempt (i.e. we should overwrite it with each build).

@23Skidoo
Collaborator

The build log should only contain output from the latest attempt (i.e. we should overwrite it with each build).

Yes, it's a bit confusing. This was the old behaviour of the --build-log option that we didn't change.

@tibbe
Owner

Yes, it's a bit confusing. This was the old behaviour of the --build-log option that we didn't change.

Filed issue #1081 so we don't forget to fix that.

@rrnewton
@23Skidoo
Collaborator

On the specific proposal -- what's the downside of doing a fresh Setup binary before caching?

I don't think this will add much overhead.

@tibbe
Owner

I don't think this will add much overhead.

Are we talking about one per cabal install -j invocation? What if I have -j as the default (e.g. in ~/.cabal/config) and thus the parallel build system will be used even if I install a single package without dependencies? Having a cached setup would save me some time (in particular linking time).

@23Skidoo
Collaborator

Are we talking about one per cabal install -j invocation?

If there's a setup executable in the cache, it is used right away. So in the common case there will be no difference.
In the uncommon case when you have no setup exe in the cache, but have one in dist/setup (e.g. if you do cabal build && cabal install -j) there will be one extra compilation.
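The decision described here can be written down as a small sketch. The file name pattern follows the cache paths shown at the top of this issue (`setup-Cabal-1.14.0-ghc-7.6.1`); everything else, including the names `cachedSetupName`, `SetupPlan` and `planSetup`, is a hypothetical simplification rather than the real SetupWrapper logic.

```haskell
import Data.Version (Version, makeVersion, showVersion)

-- Cache file name as seen in the paths quoted in this issue.
cachedSetupName :: Version -> Version -> FilePath
cachedSetupName cabalVer ghcVer =
  "setup-Cabal-" ++ showVersion cabalVer ++ "-ghc-" ++ showVersion ghcVer

-- What to do for a given package build.
data SetupPlan
  = UseCached FilePath           -- common case: run the cached executable
  | RecompileThenCache FilePath  -- cache miss: compile fresh, then store
  deriving (Eq, Show)

-- The Bool says whether the cache already holds a matching executable.
planSetup :: FilePath -> Version -> Version -> Bool -> SetupPlan
planSetup cacheDir cabalVer ghcVer inCache
  | inCache   = UseCached path
  | otherwise = RecompileThenCache path
  where
    path = cacheDir ++ "/" ++ cachedSetupName cabalVer ghcVer

-- Example: the name for Cabal 1.14.0 and GHC 7.6.1.
exampleName :: FilePath
exampleName = cachedSetupName (makeVersion [1, 14, 0]) (makeVersion [7, 6, 1])
```

The names visible in this thread carry the Cabal and GHC versions only, so nothing in the name itself distinguishes a Linux executable from a Mac one.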

@tibbe
Owner

I see. In that case, do what will be least surprising to the user.

@23Skidoo
Collaborator

cabal build && cabal install -j

Actually, cabal build will usually use the internal setup method, so there will be no difference even in this case.

@23Skidoo 23Skidoo referenced this issue from a commit in 23Skidoo/cabal
@23Skidoo 23Skidoo Force a recompile when updating the setup exe cache.
See the discussion in #1076.
dd4c118
@23Skidoo
Collaborator

OK, do we also want to force a call to installedCabalVersion when the cached version does not match the version cabal-install was compiled with? Usually there is only one Cabal version installed.

@23Skidoo 23Skidoo referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@tibbe
Owner

OK, do we also want to force a call to installedCabalVersion when the cached version does not match the version cabal-install was compiled with? Usually there is only one Cabal version installed.

I don't know. Someone that understands the issue better than me needs to think it through. :)

@23Skidoo
Collaborator

I don't know. Someone that understands the issue better than me needs to think it through. :)

I implemented that in #1083. Basically, if the Cabal lib version that cabal-install was compiled with can be used for compiling the setup script and the cached version is different, it will double-check that the cached version is correct (by calling installedCabalVersion).

Without this change, upgrading ghc & cabal-install and then running cabal install -j . gives an error.
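The rule described in this comment can be sketched as follows. This is a reading of the prose above, not the code from #1083: `pickCabalLibVersion` is a hypothetical name, and the `IO Version` argument stands in for the call to installedCabalVersion.

```haskell
import Data.Version (Version, makeVersion)

-- Hypothetical version of the rule: trust the cached version unless the
-- version cabal-install was compiled with is usable and differs from it,
-- in which case fall back to the (slower) lookup of installed versions.
pickCabalLibVersion
  :: Version        -- ^ Cabal version cabal-install was compiled with
  -> Bool           -- ^ can that version be used for this setup script?
  -> Maybe Version  -- ^ version cached in dist/setup/setup.version, if any
  -> IO Version     -- ^ stand-in for installedCabalVersion
  -> IO Version
pickCabalLibVersion compiledWith usable cached lookupInstalled =
  case cached of
    Just v | not usable || v == compiledWith -> return v
    _                                        -> lookupInstalled

-- Example: setup.version still says 1.14.0 after upgrading to 1.16.0.
exampleAfterUpgrade :: IO Version
exampleAfterUpgrade =
  pickCabalLibVersion
    (makeVersion [1, 16, 0])
    True
    (Just (makeVersion [1, 14, 0]))
    (return (makeVersion [1, 16, 0]))
```

In the upgrade scenario the stale 1.14.0 entry is not trusted, so the lookup runs and the result is 1.16.0. When the cached and compiled-in versions agree, the lookup is skipped.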

@23Skidoo 23Skidoo referenced this issue from a commit in 23Skidoo/cabal
@23Skidoo 23Skidoo Try harder to pick the correct version in cabalLibVersionToUse.
See the discussion in #1076.
a17491d
@23Skidoo
Collaborator

Someone that understands the issue better than me needs to think it through. :)

I think that the only case when this change can cause unnecessary calls to installedCabalVersion is when you're using, say, cabal-install-1.17, but have only Cabal-1.16 installed. Even in this case, if dist/setup/setup.version doesn't exist, the call to installedCabalVersion will be necessary anyway.

@tibbe
Owner

I've merged a17491d. Anything else we want to do here?

@23Skidoo
Collaborator

I've merged a17491d.

I'm sorry, but I think that I've found another unintended consequence of this change. cabal-install has an option --cabal-lib-version, which is used mainly for testing. Before this patch, it was possible to just say cabal configure --cabal-lib-version=1.14, and that version would be used for all subsequent cabal actions. My change makes --cabal-lib-version obligatory for all cabal actions, not just the first. So --cabal-lib-version is probably the main reason we're caching the Cabal version.

Unfortunately, this caching leads to confusing behaviour in Ryan's use case (update GHC & Cabal, cabal install -j my-package-1/ my-package-2/ ... - NB: without -j the internal setup method is used and all is well). I'm not quite sure what is the correct solution here.

Anything else we want to do here?

No, I don't think so.

@23Skidoo
Collaborator

Before this patch, it was possible to just say cabal configure --cabal-lib-version=1.14, and that version would be used for all subsequent cabal actions.

... although that seems to be the case only for build-type: Custom, otherwise the internal setup method will be used, which uses the Cabal lib version cabal-install was compiled with. So setup.version is used just so that we know which Cabal version the setup executable was compiled with and don't rebuild it unnecessarily. OK, I'm now convinced that my change was correct.

@23Skidoo
Collaborator

@tibbe Can you close this ticket?

@tibbe
Owner

I'll close this for now. If we have some more concrete ideas of how to add better error checking, please file individual tickets for those.

@tibbe tibbe closed this
@23Skidoo
Collaborator

I've reverted a17491d in b45d0c2. See the commit message for explanation.
