Parallelise cabal build over modules #976

Open
23Skidoo opened this Issue Jul 13, 2012 · 71 comments

@23Skidoo
Member

23Skidoo commented Jul 13, 2012

Updated summary by @ezyang. Previously, this ticket talked about all sorts of parallelism at many levels. Component-level parallelism was already addressed in #2623 (fixed by per-component builds), so all that remains is per-module parallelism. This is substantially more difficult, because right now we build by invoking ghc --make; achieving module parallelism would require teaching Cabal how to build using ghc -c. But this too has a hazard: if you don't have enough cores/have a serial dependency graph, ghc -c will be slower, because GHC spends more time reloading interface files. In #976 (comment) @dcoutts describes how to overcome this problem.

There are several phases to the problem:

  1. First, build the GHC build server and parallelism infrastructure. This can be done completely independently of Cabal: imagine a program with a command line identical to GHC's, but internally implemented by spinning up multiple GHC processes and farming out the compilation. You can tell this was worthwhile when you get better scaling than GHC's built-in -j and a traditional -c setup.

  2. Next, we need to teach Cabal/cabal-install how to take advantage of this functionality. If you implemented your driver program with exactly the same command-line flags as GHC, then this is as simple as passing -w $your_parallel_ghc_impl. However, there is a problem with doing it this way: cabal-install will attempt to spin up N parallel package/component builds, each of which will in turn try to spin up M GHC build servers; this is bad; you want the total number of GHC build servers to equal the number of cores. So you will need to set up some sort of signalling mechanism to keep too many build servers from running at once, OR have cabal new-build orchestrate the entire build down to the module level so it can plan parallelism (but you would probably have to rearchitect according to #4174 before you can do this.)


Now that the package-level parallel install has been implemented (see #440), the next logical step is to extend cabal build with support for building multiple modules, components and/or build variants (static/shared/profiling) in parallel. This functionality should also be integrated with cabal install in such a way that we don't over- or underutilise the available cores.

A prototype implementation of a parallel cabal build is already available as a standalone tool. It works by first extracting a module dependency graph with 'ghc -M' and then running multiple 'ghc -c' processes in parallel.
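The scheduling step this prototype performs can be sketched roughly as follows (a hypothetical illustration, not the tool's actual code): given the module dependency graph extracted from 'ghc -M', compute successive "waves" of modules whose dependencies are already built, so that every module in a wave can be handed to a parallel 'ghc -c' invocation.

```haskell
-- Hypothetical sketch of wave-based scheduling over a module dependency
-- graph (as one might parse from `ghc -M` output). Each wave contains
-- exactly the modules whose dependencies all sit in earlier waves;
-- dependencies outside the graph (e.g. package modules) count as built.
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Graph = M.Map String [String]  -- module -> modules it imports

buildWaves :: Graph -> [[String]]
buildWaves g = go (M.keysSet g) S.empty
  where
    go todo done
      | S.null todo = []
      | null ready  = error "dependency cycle in module graph"
      | otherwise   = ready : go (todo S.\\ S.fromList ready)
                                 (S.union done (S.fromList ready))
      where
        -- a module is ready once all its imports are done or external
        ready = [ m | m <- S.toList todo
                    , all (\d -> S.member d done || M.notMember d g)
                          (M.findWithDefault [] m g) ]
```

For example, a graph where B and C both import A, and D imports B and C, yields the waves [A], [B, C], [D]: B and C can be compiled in parallel.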

Since the parallel install code uses the external setup method exclusively, integrating parallel cabal build with parallel install will require using IPC. A single coordinating cabal install -j N process will spawn a number of setup.exe build --semaphore=/path/to/semaphore children, and each child will be building at most N modules simultaneously. An added benefit of this approach is that nothing special will have to be done to support custom setup scripts.
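The job-limiting side of that design can be approximated in-process like this (a sketch only: QSem stands in for the OS-level semaphore that separate setup.exe processes would actually share, and the function names are made up):

```haskell
-- In-process sketch of the proposed job limiting: a semaphore initialised
-- to N caps how many compile jobs run at once, no matter how many jobs
-- are queued. The real design would use an OS-level named semaphore
-- shared between `cabal install -j N` and its setup.exe children.
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem
import Control.Exception (bracket_)
import Control.Monad (forM_)

-- Acquire a slot, run the job, release the slot (even on exceptions).
withSlot :: QSem -> IO a -> IO a
withSlot sem = bracket_ (waitQSem sem) (signalQSem sem)

-- Run the given jobs with at most n in flight; block until all finish.
runJobs :: Int -> [IO ()] -> IO ()
runJobs n jobs = do
  sem  <- newQSem n
  done <- newEmptyMVar
  forM_ jobs $ \job -> forkIO (withSlot sem job >> putMVar done ())
  forM_ jobs $ \_ -> takeMVar done
```

In the actual design each "job" would be one 'ghc -c' invocation (or one request to a build server), and the semaphore would live in the filesystem so unrelated processes can honour the same global limit.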

An important issue is that compiling with ghc -c is slow compared to ghc --make because the interface files are not cached. One way to fix this is to implement a "build server" mode for GHC. Instead of repeatedly running ghc -c, each build process will spawn at most N persistent ghcs and distribute the modules between them. Evan Laforge has done some work in this direction.

Other issues:

  • Building internal components in parallel requires knowing their dependency graph (this is being implemented as part of integrating cabal repl patches).
  • Generating documentation in parallel may be only safe for build-type: Simple.
@bos


Contributor

bos commented Jul 16, 2012

This will be a huge win if it can make effective use of all cores. I've had quite a few multi-minute builds of individual packages, where the newly added per-package parallelism only helps with dependencies during the very first build, but not at all during ongoing development.

@23Skidoo


Member

23Skidoo commented Jul 16, 2012

@bos The main obstacle here is reloading of interface files, which slows down the parallel compilation considerably compared to ghc --make. See e.g. Neil Mitchell's Shake paper, where he found that "building the same project with ghc --make takes 7.69 seconds, compared to Shake with 11.83 seconds on one processor and 7.41 seconds on four processors." So far, the most promising approach seems to be implementing a "compile server" mode for GHC.

@23Skidoo

Member

23Skidoo commented Jul 16, 2012

An e-mail from @dcoutts that describes the "compile server" idea in more detail:

So here's an idea I've been mulling over recently...

For IDEs and build tools, we want a ghc api interface where we have very
explicit control over the environment in which new modules are compiled.
We want to be in full control, not using --make, and not using any
search paths etc. We know exactly where each .hi and .o file for all
dependent modules are. We should be able to build up an environment of
module name to (interface, object code) by starting from empty, adding
packages and individual module (.hi, .o) files.

Now that'd give us an api a lot like the current command line interface
of ghc -c single shot mode, except that we would be able to specify .hi
files on the command line rather than having ghc find them by searching.

But once we have that api, it'll be useful for IDEs, and useful for a
ghc server. This should give us the performance advantages of ghc --make
but still give us the control and flexibility of single shot mode. I'll
come to parallel builds in a moment.

The way it'd work is you start the server with some initial environment
(e.g. the packages) and you tell it to compile a module, then you can
tell it to extend its environment e.g. with the module you just compiled
and use the extended environment to compile more modules. So clearly you
could do the same thing as ghc --make does but with the dependency
manager being external to ghc.

Now for parallelism. Suppose we have two cores. We launch two ghc server
processes with the same initial package environment. We start compiling
two independent modules. Now we load the .hi files into *both* ghc
server processes to compile more modules. (In practice we don't load
them into each server when they become available, rather we do it on
demand when we see the module we need to compile needs the module
imports in question based on our module dep graph).

So, a short analysis of the number of times that .hi files are loaded:

In the current ghc --make mode, each .hi file is loaded once. So let's
say M modules. In the current ghc -c mode, for M modules we're loading
at most M * M/2 modules (right?) because in a chain of M modules we have
to load all previous .hi files for each ghc -c invocation.

In the hypothetical ghc server mode, with N servers, the worst case is
something like M * N module loads. Also, the N is parallelised. So the
single threaded performance is the same as --make. If you use 8 cores,
the overhead is 8 times higher in total, but distributed across 8 cores
so the wall clock time is no worse.

Actually, it's probably more sensible to look not at the cost of loading
the .hi files for M modules, but for P packages which is likely the
dominant cost. Again, it's P cost for the --make mode, and M * P for the
ghc -c mode, but N * P for the server mode. So this means it might not
be necessary to do the whole-package .hi file optimisation since the
cost is dramatically reduced.

So overall then, there's two parts to the work in ghc: extend the ghc
api to give IDEs and build managers this precise control over the
environment, then extend the main ghc command line interface to use the
new ghc api feature by providing a --server mode. It'd accept inputs on
stdin or something. It only needs very minimal commands: extend the
environment with a .hi .o pair and compile a .hs file. You can assume
that packages and other initial environment things are specified on the
--server command line.

Finally if there's time, add support for this mode into cabal, but that
might be too much (since that needs a dependency based build manager).

I'll also admit an ulterior motive for this feature, in addition to use
in cabal, which is that I'm working on Visual Studio integration and so
I've been thinking about what IDEs need in terms of the ghc api and I
think very explicit control of the environment is the way to go.
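The interface-load counts in the email above can be put into a small worked model (a back-of-envelope sketch, not a measurement; M*(M-1)/2 is the exact count for a single chain, matching the email's approximate M * M/2):

```haskell
-- Back-of-envelope model of .hi loads for M modules in one dependency
-- chain, per the analysis above:
--   ghc --make : each interface is loaded once                -> M
--   ghc -c     : compiling module k reloads k-1 interfaces    -> M*(M-1)/2
--   server mode: worst case, every .hi into each of N servers -> M * N
loadsMake :: Int -> Int
loadsMake m = m

loadsSingleShot :: Int -> Int
loadsSingleShot m = m * (m - 1) `div` 2

loadsServer :: Int -> Int -> Int
loadsServer m n = m * n
```

For M = 100 modules that gives 100 loads for --make, 4950 for repeated ghc -c, and 800 for eight servers: the server overhead grows with the core count, not quadratically with the module count, which is the email's point.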
@tibbe


Owner

tibbe commented Jul 17, 2012

Even though using ghc -c leads to a slowdown on one core, having it as an option (for people with more cores) in the meantime seems worthwhile to me.

@bos


Contributor

bos commented Jul 18, 2012

@tibbe, I thought the point was that ghc -c doesn't break even until 4 cores. Mind you, Neil was surely testing on Windows, where the OS and filesystem could be reasonably expected to hurt performance quite severely.

@tibbe


Owner

tibbe commented Jul 18, 2012

@bos I've heard the number 2 tossed around as well, but we should test and see. Doing parallelism at the module level should also expose many more opportunities for parallelism. The current parallel build system suffers quite a bit from lack of that (since there are lots of linear chains of package dependencies.)

@nh2


Member

nh2 commented Jul 31, 2012

What about profiling builds? Due to the structure of the compilation (exactly the same things as in a normal compilation are built), I'd guess they might easily be run in parallel, and we might get almost a ~2x time saving.

@23Skidoo


Member

23Skidoo commented Jul 31, 2012

@nh2 Parallel cabal build will make this possible.

@ghost ghost assigned 23Skidoo Nov 24, 2012

@nh2


Member

nh2 commented May 17, 2013

I am currently working on this. I got good results with ghc-parmake for compiling large libraries and am now making executables build in parallel.

@23Skidoo


Member

23Skidoo commented May 17, 2013

@nh2 Cool! BTW, I proposed this as a GSoC project for this summer. Maybe we can work together if my project gets accepted?

@23Skidoo


Member

23Skidoo commented May 17, 2013

@nh2

I got good results with ghc-parmake for compiling large libraries

I'm interested in the details. How large was the speedup? On how many cores? In my testing, the difference was negligible.

@nh2


Member

nh2 commented May 18, 2013

How large was the speedup? On how many cores?

The project I'm working on has a library with ~400 modules and 40 executables. I'm using an i7-2600K with 4 real (8 virtual) cores. For building the library only, I get:

* cabal build:                                              4:50 mins
* cabal build --with-ghc=ghc-parmake --ghc-options="-j 2":  4:20 mins 
* cabal build --with-ghc=ghc-parmake --ghc-options="-j 4":  3:00 mins 
* cabal build --with-ghc=ghc-parmake --ghc-options="-j 8":  2:45 mins

I had to make minimal changes to ghc-parmake to get this to work, and thus got a 2x speedup almost for free :)

As you can see, the speed-up is not as big as we can probably expect from ghc --make itself being parallel or from your --server: due to the caching, those should be a good bit faster, and I hope your project gets accepted. I'd be glad to help a bit if I can, but while I'm OK with hacking around on cabal, I've never touched GHC.

Building the executables in parallel is independent from all this and will also probably be a small change.

@23Skidoo


Member

23Skidoo commented May 18, 2013

* cabal build:                                              4:50 mins
* cabal build --with-ghc=ghc-parmake --ghc-options="-j 2":  4:20 mins 
* cabal build --with-ghc=ghc-parmake --ghc-options="-j 4":  3:00 mins 
* cabal build --with-ghc=ghc-parmake --ghc-options="-j 8":  2:45 mins

Nice to hear that it can give a noticeable speedup on large projects. I should try testing it some more.

Building the executables in parallel is independent from all this and will also probably be a small change.

Maybe, if you don't integrate build -j and install -j; then you won't need to implement the IPC design sketched above.

@nh2


Member

nh2 commented May 20, 2013

@23Skidoo I made a prototype at https://github.com/nh2/cabal/compare/build-executables-in-parallel. It would be nice if you could take a look.

  • I haven't rebased on the latest master yet. Once the other points are sorted out, I'll do that and send a proper pull request (I will probably rewrite my history on that branch as we go towards that).
  • The copying of Semaphore and JobControl from cabal-install is not so nice. Is that the way to go nevertheless or should they be moved to some Internal package in Cabal? Update: We are discussing that here.
  • I still have to sort out that pressing Ctrl-C kills everything nicely and to get failure exit codes right.
  • It looks like I can't use macros (need MIN_VERSION_base) in the Cabal package - is that correct? The way I work around it is very ugly (just using the deprecated old functions in Exception, creating warnings).
  • We probably want to make parallel jobs a config setting as well, or use the same number as the existing --jobs.

Feedback appreciated.

@nh2


Member

nh2 commented May 23, 2013

I have updated my branch to fix some minor bugs in my code. I can now build my project with cabal build --with-ghc=ghc-parmake --ghc-options="-j 8" -j8 to get both parallel library compilation and parallel executable building.

The questions above still remain.

@23Skidoo


Member

23Skidoo commented May 23, 2013

@nh2 Thanks, I'll take a look.

@23Skidoo


Member

23Skidoo commented May 25, 2013

@nh2

The copying of Semaphore and JobControl from cabal-install is not so nice. Is that the way to go nevertheless or should they be moved to some Internal package in Cabal?

Can't you just export them from Cabal and remove the copies in cabal-install?

It looks like I can't use macros (need MIN_VERSION_base) in the Cabal package - is that correct?

Yes, this doesn't work because of bootstrapping. You can do this, however:

#if !defined(VERSION_base)
-- we're bootstrapping, do something that works everywhere
#else

#if MIN_VERSION_base(...)
...
#else
...
#endif

#endif

Or maybe we should add a configure script.

@nh2


Member

nh2 commented May 26, 2013

Yes, this doesn't work because of bootstrapping. You can do this, however

Good idea, but when we do the something that works everywhere, we will still get the warnings, this time only in one of the two phases.

Or maybe we should add a configure script.

If that would be enough to find out the version of base, that sounds like the better solution. I don't know what the reliable way to find that out is, though.

@23Skidoo

Member

23Skidoo commented May 26, 2013

I have another idea - since Cabal only supports building on GHC nowadays, you can use

#if __GLASGOW_HASKELL__ < 700
-- Code that uses block
#else 
-- Code that uses mask
#endif
@23Skidoo


Member

23Skidoo commented May 26, 2013

@nh2

We probably want to make parallel jobs a config setting as well, or use the same number as the existing --jobs.

We can make cabal build read the jobs config file setting, but it shouldn't be used when the package is built during the execution of an install plan (since there's no way to limit the number of parallel build jobs from cabal install ATM).

@nh2


Member

nh2 commented May 27, 2013

__GLASGOW_HASKELL__

Nice, pushed that.

@nh2


Member

nh2 commented May 27, 2013

I haven't rebased on the latest master yet

Just rebased that.

@23Skidoo


Member

23Skidoo commented May 28, 2013

My GSoC 2013 project proposal has been accepted.

@nh2


Member

nh2 commented May 28, 2013

Awesome! Let's give this build system another integer factor speedup! :)

@nh2


Member

nh2 commented May 28, 2013

We can make cabal build read the jobs config file setting, but it shouldn't be used when the package is built during the execution of an install plan (since there's no way to limit the number of parallel build jobs from cabal install ATM).

Do you mean that when we use install -j and build -j, we get more than n (e.g. n*n) jobs because the two are not coordinated?

@23Skidoo


Member

23Skidoo commented May 28, 2013

Do you mean that when we use install -j and build -j, we get more than n (e.g. n*n) jobs because the two are not coordinated?

Yes. The plan is to use an OS-level semaphore for this, as outlined above.

@tibbe


Owner

tibbe commented Aug 7, 2013

That's what I meant, sounds good. We should use this semaphore here. This way we get parallel profiling lib building for free with install -j.

@23Skidoo


Member

23Skidoo commented Aug 7, 2013

@tibbe

That's what I meant, sounds good. We should use this semaphore here. This way we get parallel profiling lib building for free with install -j.

Yes, that's the plan.

@nh2


Member

nh2 commented Oct 12, 2013

@23Skidoo I made a prototype at https://github.com/nh2/cabal/compare/build-executables-in-parallel. It would be nice if you could take a look.

I made a pull request for this #1540, rebased on current master. It's much easier to not lose track of things when they are in pull request form.

@23Skidoo Please tell me if you made recent changes that I should make use of there.

Owner

tibbe commented Mar 5, 2014

@23Skidoo I believe this is done now, right? Or are you still waiting to submit your PR?

Member

23Skidoo commented Mar 5, 2014

I need to rework #1572; @dcoutts doesn't want to merge it in its current state. I hope to get it into 1.20.

ttuegel added this to the cabal-install-1.24 milestone Apr 23, 2015

Collaborator

angerman commented Apr 30, 2017

@23Skidoo is that also true for new-build?
I'm curious, as cabal new-build cabal-install --allow-newer seems not to build in parallel.

Member

23Skidoo commented Apr 30, 2017

I think it should. If you run it with -v3, it'll show you the ghc invocation string. You can also try new-build -j.

Member

23Skidoo commented Apr 30, 2017

On second thought, since new-build also installs dependencies, it probably suffers from the same problem as install -j and doesn't use ghc --make -j.

Collaborator

angerman commented Apr 30, 2017

I didn't manage to go through the massive amount of output -v3 produces. However, looking at the processes, I see a single ghc process running, where I'd like to see 4 or preferably even 8. Therefore I believe cabal new-build -j does not parallelize :-(

Member

23Skidoo commented Apr 30, 2017

Yep, see my second comment. We should open a ticket for making install/new-build -j use ghc --make -j. I have some initial code on this branch: https://github.com/23Skidoo/cabal/commits/num-linker-jobs

Collaborator

angerman commented Apr 30, 2017

@23Skidoo I'm perfectly fine with the idea of opening a second ticket which details this, and closing this one. Please do! I just wanted to note down how this behaved on my system.

Member

nh2 commented Apr 30, 2017

When doing so, just don't forget that ghc --make -j is horribly inefficient by default and needs RTS flags (as described at http://trofi.github.io/posts/193-scaling-ghc-make.html) to be of any use (otherwise it will be slower than non-j builds).

Collaborator

angerman commented Apr 30, 2017

@nh2 so presumably we want

ghc --make -j +RTS -A256M -qb0 -RTS

@trofi do you agree? Will this eventually be in GHC?
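One hedged way to experiment with these flags before any of this lands in Cabal itself: forward them to ghc through a cabal.project fragment. The stanza below is a sketch only; `package *` applies the options to every local package in newer cabal versions, and the RTS flags require a ghc built with the threaded runtime.

```
-- cabal.project fragment (assumed syntax): forward the tuning flags
-- to every package's ghc invocation.
package *
  ghc-options: -j +RTS -A256M -qb0 -RTS
```

Whether this actually helps depends on the module graph; with a mostly serial dependency chain the -j has nothing to parallelise.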

Contributor

ezyang commented Apr 30, 2017

So, based on this discussion, there seem to be two distinct issues:

  1. My updated summary, which is all about a fancy build-server design to overcome the scaling problems of ghc --make -j
  2. A much simpler fix, which is to make it possible to pass -j to ghc when building things in parallel.

(2) should not be hard to implement, although it will be harder to demonstrate that it actually speeds things up (because ghc -j scales pretty badly). Perhaps we should make a separate ticket for it, though I don't think we should close this one.

Collaborator

angerman commented May 1, 2017

If someone outlines how to do this properly, I might take a look at doing so. My main quibble is that "GHC is slow" is a known theme, and if we aggravate this with cabal, we are potentially leaving build performance on the table.

Contributor

ezyang commented May 1, 2017

@angerman Well, which strategy did you want to do?

Collaborator

angerman commented May 1, 2017

@ezyang what's the optimal solution we can hope for here? Turn cabal into using shake (#4174) and then see how much of the additional build server we still need?

@23Skidoo, if I understand correctly, might already have part of (2) done?

Contributor

ezyang commented May 1, 2017

As I understand it, here is the optimal solution:

  1. Implement a GHC build server, with a GHC Shake driver that can submit build requests to the build server
  2. Rewrite Cabal in Shake, deferring to the GHC Shake driver for actually building Haskell code
  3. Rewrite cabal-install in Shake, deferring to the Cabal Shake driver for actually building a package/component

In the end you have a single Shake build system which knows about parallelism down to the individual Haskell source file, and can do maximal parallelism.

...I don't expect this to be implemented any time soon.

Collaborator

angerman commented May 1, 2017

I believe those three items can be built in separate steps, with increasing productivity from each additional layer?

Could cabal produce a cmake (or make) file instead and feed that into a build system?
I believe that's what ghc-cabal does to some extent?

Contributor

ezyang commented May 1, 2017

> I believe those three items can be built in separate steps with increasing productivity with each additional layer?

Yes.

> Could cabal produce a cmake (or make) file instead and feed that into build system?
> I believe that's what ghc-cabal does to some extent?

Well, it depends on what you mean by make. ghc -M knows how to create a Makefile for building Haskell. Cabal itself, in principle, knows about dependencies between components in a package, but this information isn't reified anywhere in the code today. Plus, a large amount of the processing is done by Haskell code in process, so your Makefile wouldn't have anything to call.

Collaborator

angerman commented May 1, 2017

@ezyang
Great! So one could start with the GHC build server, for example, and get somewhere.

What I'm wondering is whether cabal, which has the notion of packages, targets and corresponding modules/files (and flags), could generate a Makefile (or a CMakeLists.txt, assuming there was some plumbing for Haskell) that exposed those targets.

And at least in the case of CMake, that could be used to generate ninja files, which in turn could be built using ninja as a build system, or even shake (which, as far as I understand, can read ninja files).

This brings up another question: I believe there is some ghc --make powered by shake floating around somewhere; would investing in getting that into ghc proper help us with the -j scaling?

Contributor

ezyang commented May 1, 2017

> What I'm wondering is if cabal, which has the notion of packages, targets and corresponding Modules/Files (and flags), could generate a Makefile (or CMakeLists.txt, assuming there was some plumbing for Haskell) that exposed those targets.

The big problem is that many operations which need to be done while building can't be characterized as just "run this and that command." There's a lot of Haskell code that gets run during a build, that needs to get run, and is not exposed as a command in any way. Shake has a similar problem: it's more expressive than ninja, so you can't take a Shake build system and turn it into ninja.

> I believe there is some ghc --make powered by shake floating around somewhere; would investing in getting that into ghc proper help us with the -j scaling?

There are two. You have https://github.com/ndmitchell/ghc-make, which is implemented by calling ghc -c (you get parallelism, but it is slower than --make sequentially), and https://github.com/ezyang/ghc-shake, which uses the GHC API and cannot be parallelized out of process.

Neither of these can be put into GHC, because they depend on Shake and GHC does not want to take Shake on as a boot library at this time.

Member

nh2 commented May 1, 2017

I am a bit confused by the mention of ghc -M. Is that even an option?

To my knowledge, ghc -M cannot reliably compile all Haskell projects in the same way that ghc --make can.

Neither ghc-make nor ghc-parmake can by itself provide reliable compilation based on ghc -M, as a Makefile is not expressive enough to do the up-to-date checks that ghc --make does.

Is this discussion still about how to combine package-level -j and module-level -j, with the difficulty being that passing -j4 to both would create 16 threads in the worst case and thus create overhead?

If yes, implementing a ghc build server (while useful) to solve this problem sounds way out of scope for getting a basic solution going in the short term.

Here's a commonly used alternative: make supports two flags, -j and -l. The -l flag (--load-average)

> Specifies that no new jobs (commands) should be started if there are other jobs running and the load average is at least load (a floating-point number). With no argument, removes a previous load limit.

Many build systems that call make recursively use this functionality to solve the same problem we have here; they use make -j4 -l4 to ensure that globally no more than roughly 4 CPU cores are fully used.

How about an -l-style flag for GHC? Then cabal could always call ghc -j or ghc -j4 while also compiling multiple packages itself. This would even be maximally efficient, because cabal-level -j is more efficient than ghc-level -j, and a make -l style approach would always result in full cabal-level parallelisation when parallelisation is possible at both levels.
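As a minimal sketch, the admission check an -l-style flag could add to ghc's --make scheduler might look like the following. All names here are hypothetical, not existing GHC API; the "always allow at least one job" clause mirrors GNU make's behaviour, which never lets -l starve a build completely.

```haskell
-- Hypothetical admission check for a ghc -j<n> -l<cap> scheduler:
-- a new module compilation starts only while this process is under its
-- own -j limit AND (it has no job running yet, or the machine-wide load
-- average is still below the -l cap).
admitJob :: Double  -- ^ load cap (the -l value)
         -> Double  -- ^ current 1-minute load average
         -> Int     -- ^ modules this ghc process is already compiling
         -> Int     -- ^ this process's own -j limit
         -> Bool
admitJob cap load running jLimit =
  running < jLimit && (running == 0 || load < cap)

main :: IO ()
main = print [ admitJob 4 10 0 4  -- idle process may always start one job
             , admitJob 4 10 2 4  -- load 10 >= cap 4, so no new job
             , admitJob 4 2 2 4   -- load below cap and under -j limit
             , admitJob 4 2 4 4 ] -- at its own -j limit regardless of load
```

The check would run each time a module finishes or is about to start, so the cost is one load-average read per scheduling decision.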

Collaborator

Blaisorblade commented May 1, 2017

@angerman

> The big problem is that many operations which need to be done while building can't be characterized as just "run this and that command."

There are also two big reasons why build steps aren't worth exposing:

  • If your (c)make-based build has a GHC process per source file, each GHC process will spend time reloading interfaces written by other files, as mentioned.
  • Worse, if Foo.hs changes but the new Foo.hi has the same content, you don't want to rebuild Bar.hs. At least with Make, that's extra logic to add, replacing the basic building block of Makefiles.

@nh2

Member

nh2 commented May 1, 2017

I forgot to mention in #976 (comment):

The approach I described is exactly how nix avoids the same problem we have here: nix itself can build multiple packages (say, C projects) at once, and each package itself can be compiled with make -j.

Collaborator

Blaisorblade commented May 1, 2017

@nh2 Didn't know about -l; sounds very interesting, and it could be easier to implement (at least on *nix)!
But one still risks starting too many long-lived tasks at once, while the load average is still low. Still, it might be worth studying its performance in practice. Is there any benchmark showing speedups in the Nix scenario? It's not the exact same load, but it's already implemented :-)

Member

nh2 commented May 1, 2017

@Blaisorblade The tasks should not be long-lived (at the module level); when implementing -l, ghc should check the load every time a module finishes or wants to start. Most operating systems cope very well when the number of tasks is higher than the number of processors; much better than ghc does, indeed. The only time it is a problem is when you are very memory constrained, e.g. when even temporarily compiling 16 modules (over 4 processes) immediately makes you run out of memory. If you have such a system, you would most likely not want full -j anyway; if you do, there are other mitigations, such as waiting a short while before starting the next job in order to get an accurate load number.

In any case, the non-Haskell world seems to have done pretty well with -l (although in general it also doesn't have compilers that take GBs of memory; with the exception of C++, maybe).

Contributor

ezyang commented May 1, 2017

> None of ghc-make, ghc-parmake can by themselves provide reliable compilation based on ghc -M, as a Makefile is not expressive enough to do the up-to-date checks that ghc --make does.

I'll remark that ghc-shake, by virtue of using the GHC API, has recompilation checking as precise as ghc --make's.

-l seems like a reasonable idea, assuming that you don't mind this new feature not working unless you're using GHC 8.4 ;)

Collaborator

Blaisorblade commented May 1, 2017

Operationally, it's enough if -l is convincing enough for somebody to volunteer implementing it (sadly, not me) ;-) I'm still 👍. https://www.preney.ca/paul/archives/341 also recommends it for Gentoo...

Anyway, while we're trying to anticipate how good it's going to be:

> The tasks should not be long-lived (at the module level); when implementing -l, ghc should check the load every time a module finishes or wants to start.

I meant that some modules take longer to build...

> The only time it is a problem is when you are very memory constrained

You also had IO, but with SSDs we can probably ignore that. Based on e.g. http://stackoverflow.com/a/17749621/53974 you're essentially right: build times increase only a tiny bit for higher -j.

OTOH, I'd guess we are memory constrained when you rebuild ~50 packages (say, a fresh Agda build).

> In any case, the non-Haskell world seems to have done pretty well with -l (although in general it also doesn't have compilers that take GBs of memory; with the exception of C++, maybe).

IIRC C++ also has files taking longer to build, so maybe C++ builds could be a better benchmark than C, if somebody looks for results/tries it out...

Member

nh2 commented May 1, 2017

> OTOH, I'd guess we are memory constrained when you rebuild ~50 packages (say, a fresh Agda build).

@Blaisorblade I don't get it; does the number of packages matter?

Just to be explicit, the way I imagine it to work is this:

  • Cabal sees you have 4 cores.
  • Cabal starts building 4 packages in parallel.
  • So Cabal invokes ghc -j4 -l4, 4 times.
  • Each of the ghcs checks the current load before/after every build step (module).
  • Thus in the worst case, you are initially compiling 16 modules at the same time.
  • As soon as some ghc completes building its first module, it'll see that the load is > 4 and spawn no replacement for the completed module.
  • As a result, the number of modules compiled globally quickly converges from 16 to 4; as soon as the 17th overall module starts compilation, we can be quite sure that there are only 4 modules building in total.

This seems to be the case no matter whether you have 4 packages or 50.
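The convergence described above can be sketched as a toy simulation (numbers and names are illustrative only, not from any implementation): each step is one module finishing somewhere, after which a replacement starts only while the global job count is below the cap.

```haskell
-- Toy model of the convergence claim: start with 16 modules compiling
-- (4 processes x -j4); on each finish event the global count drops by
-- one, and a replacement is admitted only while the count is below the
-- load cap. The count descends from 16 and then stays at the cap.
converge :: Int -> Int -> [Int]
converge cap jobs = tail (iterate next jobs)
  where
    next n = let m = n - 1
             in if m < cap then m + 1 else m

main :: IO ()
main = print (take 16 (converge 4 16))
-- [15,14,13,12,11,10,9,8,7,6,5,4,4,4,4,4]
```

This ignores load-average measurement lag, which is what the following comments discuss.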

Collaborator

Blaisorblade commented May 1, 2017

You're right, 50 packages shouldn't affect the number of processes. Need more coffee.
Regarding load average: on Linux, uptime gives the one-minute average, so you'd have some delay (1/n minutes, where n is your example). Or do you have some other stat in mind?
http://stackoverflow.com/a/21621256/53974

Member

nh2 commented May 1, 2017

It seems that GNU make indeed uses the 1-minute load average:

http://git.savannah.gnu.org/cgit/make.git/tree/getloadavg.c?h=4.2.1#n491

/* Put the 1 minute, 5 minute and 15 minute load averages
   into the first NELEM elements of LOADAVG.
   Return the number written (never more than 3, but may be less than NELEM),
   or -1 if an error occurred.  */

int
getloadavg (double loadavg[], int nelem)
{

Used here: http://git.savannah.gnu.org/cgit/make.git/tree/job.c?h=4.2.1#n1945

However, since Linux's load average is an exponentially weighted average rather than a plain average, this might still be a non-issue. For example, stress-ng --cpu 16 on my 4-core machine pushes the 1-minute load average shown in htop up within about 3 seconds.

@Blaisorblade

This comment has been minimized.

Show comment Hide comment
@Blaisorblade

Blaisorblade May 2, 2017

Collaborator

However, since Linux's load average is an exponentially weighted average instead of a real average, this might still be a non-issue.

Good point, didn't know that. 👍 Googling leads to http://juliano.info/en/Blog:Memory_Leak/Understanding_the_Linux_load_average confirming and elaborating.

To sum up: what you propose is closer to -j16 -l 4 than to -j4 -l 4. You gave reasons to believe it should work, including prior art from Nix. We've debated potential costs, and this still seems a very promising idea. Who'd volunteer to champion this in GHC?

@angerman

angerman May 2, 2017

Collaborator

To sum up: what you propose is closer to -j16 -l 4 than to -j4 -l 4. You gave reasons to believe it should work, including prior art from Nix. We've debated potential costs, and this still seems a very promising idea. Who'd volunteer to champion this in GHC?

Unless someone else feels like it, I can give this a shot. Seems like it shouldn't be too hard. But who knows.

@angerman

angerman May 2, 2017

Collaborator

Looking at that getloadavg function in make: does anyone know of a library that provides this information in a form sufficient for GHC's use case? I'd rather not copy those 500 lines of code if there is a simpler way.

@Blaisorblade

Blaisorblade May 2, 2017

Collaborator
  1. This code is probably what you want (found through Hayoo). You might need to copy-paste it eventually
    http://hackage.haskell.org/package/loadavg-0.1/docs/System-Posix-LoadAvg.html

    For bonus points, it uses getloadavg (see man 3 getloadavg), which has the same API on Linux and OS X (and comes from BSD 4.3 so should work on other BSDs).

  2. Those 500 lines in Make exist to support more obscure platforms (half of which aren't supported by GHC anyway), and don't appear to cover OS X or Windows in any case.
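
For reference, a minimal FFI binding along the lines of that loadavg package might look like this. This is a sketch against getloadavg(3); the Haskell module and function names are mine, not the package's:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module LoadAvg (getLoadAvg) where

import Foreign.C.Types (CDouble (..), CInt (..))
import Foreign.Marshal.Array (allocaArray, peekArray)
import Foreign.Ptr (Ptr)

-- getloadavg(3): writes up to nelem load averages (1, 5 and 15
-- minutes) into the array and returns how many it wrote, or -1 on
-- error. Declared in stdlib.h on glibc and the BSDs/OS X.
foreign import ccall unsafe "stdlib.h getloadavg"
  c_getloadavg :: Ptr CDouble -> CInt -> IO CInt

-- | The 1-, 5- and 15-minute load averages (as many as the OS
-- provides), or Nothing on error.
getLoadAvg :: IO (Maybe [Double])
getLoadAvg = allocaArray 3 $ \p -> do
  n <- c_getloadavg p 3
  if n < 0
    then pure Nothing
    else Just . map realToFrac <$> peekArray (fromIntegral n) p
```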

@angerman

angerman May 3, 2017

Collaborator

At @ezyang's suggestion, I tried invoking ghc with -j.

For the record, timings from simply forwarding -j to ghc when building cabal-install:

cabal new-build cabal-install --allow-newer  780.90s user 12.66s system 98% cpu 13:27.41 total
cabal new-build cabal-install --allow-newer -j  784.95s user 14.14s system 98% cpu 13:31.16 total
cabal new-build cabal-install --allow-newer -j --ghc-options=-j  1216.44s user 265.02s system 359% cpu 6:52.49 total

We can even do a bit better, by providing the RTS options @trofi suggested:

cabal new-build cabal-install --allow-newer -j --ghc-options="-j +RTS -A256M -qb0 -RTS"  972.75s user 22.43s system 281% cpu 5:53.41 total

For another package (aeson-lens), the improvement is smaller:

cabal new-build . --allow-newer  671.26s user 23.99s system 160% cpu 7:11.88 total
cabal new-build . --allow-newer -j --ghc-options="-j +RTS -A256M -qb0 -RTS"  735.81s user 185.04s system 273% cpu 5:36.41 total
@angerman

angerman May 3, 2017

Collaborator

After giving this some thought and looking into GHC, I wonder if we want cabal -l rather than ghc -l. That would match make, where -l is handled by the top-level build driver; we don't usually pass -l down to the compilers it invokes.

@hvr suggested -j <n>:<m> for cabal, where <n> is the package-level parallelism and <m> is the GHC-level parallelism, passed on as ghc -j <m>.

Thus, I'd suggest the following:

  • Add -l to cabal
  • Allow -j <n>:<m>.

There is still the question of how to handle a bare -j <n> if we allow it: would it mean -j <n>:<n>, -j <n>:1, or -j 1:<n>?
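
A sketch of what parsing the proposed syntax could look like (the names are hypothetical, and the bare -j <n> case below arbitrarily picks the <n>:1 reading, which is exactly the open question above):

```haskell
import Data.Char (isDigit)

-- Hypothetical representation of the proposed -j <n>:<m> flag:
-- <n> package-level jobs, <m> forwarded to ghc -j.
data Jobs = Jobs { pkgJobs :: Int, ghcJobs :: Int }
  deriving (Eq, Show)

parseJobs :: String -> Maybe Jobs
parseJobs s = case break (== ':') s of
  (n, ':' : m) -> Jobs <$> readNat n <*> readNat m
  (n, "")      -> (`Jobs` 1) <$> readNat n  -- assumes bare <n> means <n>:1
  _            -> Nothing
  where
    readNat t
      | not (null t) && all isDigit t = Just (read t)
      | otherwise                     = Nothing
```

For example, parseJobs "4:2" yields Just (Jobs 4 2), i.e. four package builds each invoking ghc -j2.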

@nh2

nh2 May 3, 2017

Member

After giving this some thought and looking into GHC, I wonder if we want cabal -l rather than ghc -l. That would match make, where -l is handled by the top-level build driver; we don't usually pass -l down to the compilers it invokes.

@angerman I don't understand this. Sure, you wouldn't pass -l to gcc. But we wouldn't be passing -l to ghc-the-compiler either; we'd pass it to ghc --make, the parallelising build system.

The lowest-level component that spawns threads must have the -l option; in our case that's ghc --make.

Of course Cabal can expose a user-facing -l option in addition, but without ghc supporting it, it can't work.

@angerman

angerman May 4, 2017

Collaborator

After discussing this with @ezyang in #hackage for a bit, I don't think I'm going to try to wedge -l into GHC. My understanding is that GHC starts with <n> capabilities and then distributes work among them, where the work is basically a thread per module and the threads form a dependency graph. If we wanted to check the load average before starting a new thread (or dynamically shrink/grow the number of capabilities), I don't see a trivial solution; and I don't intend to layer much more complexity onto GHC, as this would be a stop-gap measure until we have ghcd, as @ezyang suggested. I'd argue that time is better spent on ghcd than on trying to wedge -l into GHC.

If we only checked against -l once at the beginning, when setting up the capabilities, that would be identical to adding -l to cabal as far as I can see. If that renders no benefit, as @nh2 points out, we might just have to accept that -l is not an easy option with the way ghc --make is designed.

I'm open to alternative ideas!

@nh2

nh2 May 4, 2017

Member

If there are n threads (e.g. forkIO) set up at the beginning, the -l concept should still work.

There will be some mechanism in the build system that determines when the next module can start to build.

I imagine the load check should simply go into that place (as opposed to growing or shrinking the number of threads; the threads would still exist, but would be blocked).
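
A sketch of that idea: the n workers stay alive, but each one waits for the load to drop below the -l threshold before claiming the next ready module. Here the load source is passed in as a parameter, standing in for a hypothetical getloadavg(3) wrapper:

```haskell
import Control.Concurrent (threadDelay)

-- Block until the 1-minute load average drops below the -l threshold.
-- The load source is any IO action returning the current load (Nothing
-- on error); if it fails we proceed rather than deadlock the build.
waitForLoad :: IO (Maybe Double) -> Double -> IO ()
waitForLoad readLoad maxLoad = go
  where
    go = do
      mLoad <- readLoad
      case mLoad of
        Just l | l >= maxLoad -> threadDelay 100000 >> go  -- poll every 0.1s
        _                     -> pure ()
```

Each worker would call waitForLoad just before compiling its next module, so the threads remain but simply block while the machine is overloaded.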

@treeowl

treeowl Dec 19, 2017

Member

Is the speed at which GHC can read interface files a bottleneck? How hard might it be to fix that?

@nh2

nh2 Dec 22, 2017

Member

Is the speed at which GHC can read interface files a bottleneck?

@treeowl I'm not sure if that is known.

I've argued elsewhere that GHC, for the various parts of its build pipeline, should record CPU and wall time and be able to produce a report (e.g. "N seconds CPU/wall were spent reading and decoding interface files"). That way we could more easily pinpoint bottlenecks. Right now GHC does this kind of time counting and reporting only for optimiser phases, not for any of the more basic machinery.
