build multiple packages in parallel #440

Closed
bos opened this Issue · 14 comments

4 participants

@bos
Owner

(Imported from Trac #447, reported by @dcoutts on 2009-01-10)

The latest version of the Gentoo Portage tool is rather slick. It can do parallel builds and it displays a nice summary on the command line, e.g.:

# emerge -uD system -j --load-average=4.5
Calculating dependencies... done!
>>> Verifying ebuild manifests
>>> Starting parallel fetch
>>> Emerging (1 of 14) dev-libs/expat-2.0.1-r1
>>> Emerging (2 of 14) sys-devel/autoconf-wrapper-6
>>> Emerging (3 of 14) sys-kernel/linux-headers-2.6.27-r2
>>> Installing sys-devel/autoconf-wrapper-6
>>> Jobs: 0 of 14 complete, 1 running Load avg: 2.99, 1.59, 0.67
Note how they solve the problem of how to display what is going on when there are multiple builds happening. The answer is not to display it at all! This would have to go hand-in-hand with logging all builds so that we can still diagnose failures.

Note the final line, which is updated in place to show the number of jobs currently running, the number completed, and so on. It also shows the load average. The job scheduler has two parameters: a maximum number of jobs (or unlimited) and a maximum load average. It launches new jobs only while the load average is below the given maximum. That allows it to interact reasonably well with builds that use make -j internally. In the example above I set the load average limit to just slightly more than the number of CPUs I've got.
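The launch rule described above can be sketched as a small pure function. This is only an illustration of the two-parameter idea: the names `SchedulerLimits`, `maxJobs`, `maxLoadAvg` and `mayLaunch` are hypothetical and are not taken from Portage or cabal-install.

```haskell
-- Hypothetical names; not the API of Portage or cabal-install.
data SchedulerLimits = SchedulerLimits
  { maxJobs    :: Maybe Int     -- Nothing means "unlimited"
  , maxLoadAvg :: Maybe Double  -- Nothing means "ignore the load average"
  }

-- | Decide whether another job may be started right now, given the
-- number of jobs already running and the current load average.
mayLaunch :: SchedulerLimits -> Int -> Double -> Bool
mayLaunch limits running loadAvg = underJobCap && underLoadCap
  where
    underJobCap  = maybe True (running <) (maxJobs limits)
    underLoadCap = maybe True (loadAvg <) (maxLoadAvg limits)
```

With limits corresponding to the example invocation (no job cap, load average limit 4.5), `mayLaunch (SchedulerLimits Nothing (Just 4.5)) 1 2.99` allows another job, while a load average above 4.5 would hold new jobs back.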

It looks to me like it serialises some steps, such as installing, since saturating the disk with multiple parallel installs is generally of no benefit and can even be slower. Downloads also seem to be serialised, again because there is probably little benefit to making multiple connections to the same server.

Anyway, the point is, cabal-install ought to be able to do all this. Some bits we can do now. We already have a graph representation of the install plan and we recalculate when a package fails to install.
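To make the idea concrete, here is a minimal sketch of an install plan as a dependency graph, with a query for packages that are ready to build and a recalculation step for failures. It is not cabal-install's actual install plan type; `Plan`, `Status`, `ready`, `markInstalled` and `markFailed` are hypothetical names, and the sketch assumes the `containers` package.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Pkg = String

data Status = Pending | Installed | Failed
  deriving (Eq, Show)

-- | Each package maps to its direct dependencies and its current status.
type Plan = Map.Map Pkg (Set.Set Pkg, Status)

-- | Pending packages whose dependencies are all installed.
-- These are the candidates for building in parallel.
ready :: Plan -> [Pkg]
ready plan =
  [ p | (p, (deps, Pending)) <- Map.toList plan
      , all installed (Set.toList deps) ]
  where
    installed d = fmap snd (Map.lookup d plan) == Just Installed

markInstalled :: Pkg -> Plan -> Plan
markInstalled = Map.adjust (\(deps, _) -> (deps, Installed))

-- | Recalculate after a failure: the package and everything that
-- transitively depends on it become Failed.
markFailed :: Pkg -> Plan -> Plan
markFailed p plan
  | fmap snd (Map.lookup p plan) == Just Failed = plan
  | otherwise = foldr markFailed plan' dependents
  where
    plan'      = Map.adjust (\(deps, _) -> (deps, Failed)) p plan
    dependents = [ q | (q, (deps, _)) <- Map.toList plan, p `Set.member` deps ]
```

A parallel executor would repeatedly take packages from `ready`, build them concurrently, and feed each result back through `markInstalled` or `markFailed`.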

We will need an improved download API, probably involving sending requests off to a dedicated download thread (which would serialise them).
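One possible shape for such an API is sketched below, using only `base`. A single worker thread reads requests from a channel, so downloads happen one at a time no matter how many build threads ask for them. `Downloader`, `startDownloader` and `download` are hypothetical names, and `fetch` stands in for whatever real download action would be used.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forever)

-- | A request pairs a URL with a slot for the reply.
data Request r = Request String (MVar r)

newtype Downloader r = Downloader (Chan (Request r))

-- | Start the single worker thread. 'fetch' is a hypothetical stand-in
-- for the real download action.
startDownloader :: (String -> IO r) -> IO (Downloader r)
startDownloader fetch = do
  queue <- newChan
  _ <- forkIO . forever $ do
    Request url reply <- readChan queue
    fetch url >>= putMVar reply
  pure (Downloader queue)

-- | Called from any build thread; blocks until the worker thread has
-- handled this request.
download :: Downloader r -> String -> IO r
download (Downloader queue) url = do
  reply <- newEmptyMVar
  writeChan queue (Request url reply)
  takeMVar reply
```

The sketch leaves out error handling: if `fetch` throws, the worker thread dies and callers block forever, so a real version would need to catch exceptions and return them through the reply slot.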

@bos
Owner

(Imported comment by SamAnklesaria on 2009-01-10)

Partial, hypothetical implementation, lacking suppressed output and command-line flags

@bos
Owner

(Imported comment by refold on 2011-03-29)

Relevant mailing list thread: http://thread.gmane.org/gmane.comp.lang.haskell.cabal.devel/7473

@bos
Owner

(Imported comment by refold on 2011-06-10)

Current status (for those interested): Building multiple packages in parallel has been implemented, but the patches have not yet been merged into the mainline; I'm now working on parallelising 'cabal build'.

@bos
Owner

(Imported comment by refold on 2011-10-16)

Implementation

@bos
Owner

(Imported comment by refold on 2011-11-05)

Attached are my patches that parallelise cabal-install's 'install' command.

Sorry for sending them as a single large bundle - ideally I would like
to split the patch series, but darcs send makes it hard by ignoring
depended-upon patches. Additionally, it's hard to destructively edit
history in Darcs, so instead of obliterating two unnecessary patches
(changes to README and cabal-install.cabal) I undid those changes with
a "merge" patch.

The patch series logically consists of three parts (in chronological order):

1) From the first patch up to the "Parallelise the install command" patch

Implements the basic parallel framework as described here. Changes
are a bit more pervasive than expected because of Cabal's internal
assumption that the current working directory is the same as the directory of the
package currently being built.

2) From the end of the previous part up to the "Implement output
serialisation (client bits)." patch

Implements output serialisation - since we don't want the console
output to be garbled, all printing should be done from a single
thread. This is done by changing all code called from
D.C.I.executeInstallPlan to use callbacks instead of standard output
functions (debug/info/...).

3) Bugfixes and polishing (remaining patches)

During this stage I concentrated on testing and fixing bugs and
didn't add any new functionality.
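The output serialisation idea from part 2 can be sketched roughly as follows. This is not the code from the patches: `Logger` and `withSerialisedOutput` are hypothetical names, and the sketch only shows the general pattern of passing a callback to build code while one thread does all the printing.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- | Callback handed to build code in place of direct calls to
-- output functions.
type Logger = String -> IO ()

-- | Run an action with a Logger whose messages are all emitted by one
-- thread through 'sink' (for example putStrLn). Returns once every
-- queued message has been passed to 'sink'.
withSerialisedOutput :: (String -> IO ()) -> (Logger -> IO a) -> IO a
withSerialisedOutput sink body = do
  queue <- newChan
  done  <- newEmptyMVar
  let loop = do
        msg <- readChan queue
        case msg of
          Just line -> sink line >> loop
          Nothing   -> putMVar done ()
  _ <- forkIO loop
  result <- body (writeChan queue . Just)
  writeChan queue Nothing   -- tell the printer thread to stop
  takeMVar done             -- wait until everything has been printed
  pure result
```

Worker threads started inside `body` can all share the same `Logger`; since only the printer thread calls `sink`, lines from different builds cannot interleave mid-line. Exception safety is omitted here for brevity.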

My patches are also available in a separate Darcs repository.

@bos
Owner

(Imported comment by refold on 2011-11-05)

I've updated my parallel patches (see attachment). Patches apply cleanly to the current mainline. The parallel code path now always uses the external setup method (via Setup.hs), so the required changes to the Cabal lib are minimised. There are still some traces of output serialisation, though.

Some numbers:

$ time cabal install -j 1 alex happy
real 1m19.236s
user 1m1.330s
sys 0m10.510s
$ time cabal install -j 4 alex happy
real 0m52.106s
user 1m10.680s
sys 0m15.030s
$ time cabal install -j 1 yesod
real 19m14.913s
user 15m59.420s
sys 1m25.650s
$ time cabal install -j 4 yesod
real 14m8.599s
user 21m36.530s
sys 4m5.650s
I also tested the Nov 2011 version of the code (tries to use the internal setup method, requires pervasive changes to Cabal lib):

$ time cabal install -j 4 alex happy
real 0m45.503s
user 1m4.040s
sys 0m10.100s
$ time cabal install -j 4 yesod
real 10m41.840s
user 17m6.560s
sys 1m33.040s
Compiling and linking all these Setup.hs files does add some noticeable overhead.
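For reference, the wall-clock speedups implied by the "real" times above can be worked out with a couple of helper functions (the names `seconds` and `speedup` are just for illustration). Relative to the -j 1 runs, the external setup method comes out at roughly 1.5x for alex and happy and 1.4x for yesod, while the internal setup method gives roughly 1.7x and 1.8x.

```haskell
-- | Wall-clock time in seconds from the minutes and seconds that
-- `time` prints.
seconds :: Double -> Double -> Double
seconds m s = 60 * m + s

-- | Speedup of a run relative to the -j 1 baseline.
speedup :: Double -> Double -> Double
speedup baseline run = baseline / run

-- "real" times quoted in this comment.
alexHappyJ1, alexHappyJ4External, alexHappyJ4Internal :: Double
alexHappyJ1         = seconds 1 19.236
alexHappyJ4External = seconds 0 52.106
alexHappyJ4Internal = seconds 0 45.503

yesodJ1, yesodJ4External, yesodJ4Internal :: Double
yesodJ1         = seconds 19 14.913
yesodJ4External = seconds 14 8.599
yesodJ4Internal = seconds 10 41.840
```

For example, `speedup yesodJ1 yesodJ4External` gives the yesod figure for the external setup method.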

If these patches get accepted, I'll start working on improving the UI.

@bos
Owner

(Imported comment by refold on 2012-04-02)

Parallel patches were moved to GitHub:

git clone git://github.com/23Skidoo/cabal.git cabal-parallel-install
cd cabal-parallel-install
git checkout parallel-install
@thielema

Is it also planned to build profiling, shared and static libs in parallel?

@23Skidoo
Collaborator

@thielema These are currently not built in parallel; I'll look at it after the patches are merged.

@tibbe
Owner

This is now mostly done. Great work @23Skidoo! What remains is to reduce the output to a much more condensed form (as shown in the ticket description) and to log each package's build output to a file that can be printed on build failure.

@23Skidoo
Collaborator

Now that the patches implementing build logging and better output are in, I think we should close this issue. Improvements to the parallel code (dynamic status indicator, parallel building of shared/profiling/... versions, module-level parallelism) should be dealt with as separate tickets.

@tibbe
Owner

@23Skidoo Fine by me. Could you please open a new ticket for the final UI improvements?

@23Skidoo
Collaborator

@tibbe Done (#975, #976).

@23Skidoo
Collaborator

@tibbe Can you close this ticket?

@tibbe tibbe closed this