Add GNU make jobserver client support #1139

Open
stefanb2 opened this issue Apr 27, 2016 · 44 comments · May be fixed by #1140

Comments

@stefanb2
Contributor

commented Apr 27, 2016

As long as ninja is the only build execution tool, the current ninja -jN implementation works fine.

But when you try to convert parts of an existing recursive GNU make based SW build system to ninja, then you have the following situation:

  • top-level GNU Make (with -jX, acts as job server)
  • M instances of GNU make (with -j, act as job server clients)
  • N instances of ninja (don't know anything about job server)

Simply calling "ninja -jY" isn't enough, because the ninja instances will then try to run up to Y*N jobs on top of the X jobs from the GNU make instances, overloading the build host. Relying on -lZ to fix this is sub-optimal, because the load average sometimes lags behind the actual situation on the build host.

It would be nice if GNU make jobserver client support could be added to Ninja. Then the N ninja instances would cooperate with the M GNU make instances and on the build host only X jobs would be executed at one time.
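
To put rough numbers on the overload described above, here is a small Python sketch. The function name and the sample values for X, N and Y are made up for illustration, and the result is only an upper bound that assumes every instance has enough work to use its full job limit at the same time.

```python
def worst_case_jobs(x, n, y, ninja_uses_jobserver):
    """Upper bound on jobs running at once on the build host.

    x: -jX given to the top-level GNU make (size of the jobserver pool)
    n: number of ninja instances started by the build
    y: -jY given to each ninja instance
    """
    if ninja_uses_jobserver:
        # Every job, from make or ninja, needs a token from the same pool.
        return x
    # The make instances share X tokens; each ninja adds up to Y jobs of its own.
    return x + n * y


print(worst_case_jobs(8, 4, 8, ninja_uses_jobserver=False))  # 40
print(worst_case_jobs(8, 4, 8, ninja_uses_jobserver=True))   # 8
```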

stefanb2 added a commit to stefanb2/ninja that referenced this issue Apr 27, 2016
- add new TokenPool interface
- GNU make implementation for TokenPool parses and verifies the magic
  information from the MAKEFLAGS environment variable
- RealCommandRunner tries to acquire TokenPool
  * if no token pool is available then there is no change in behaviour
- When a token pool is available then RealCommandRunner behaviour
  changes as follows
  * CanRunMore() only returns true if TokenPool::Acquire() returns true
  * StartCommand() calls TokenPool::Reserve()
  * WaitForCommand() calls TokenPool::Release()

Documentation for GNU make jobserver

  http://make.mad-scientist.net/papers/jobserver-implementation/

Fixes ninja-build#1139
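
For readers unfamiliar with the protocol, the following is a minimal Python sketch of what a jobserver client does on POSIX, based on the documentation linked in the commit message. It is not the code from the patch: the class name is hypothetical, the method names merely mirror Acquire/Reserve/Release from the commit message, and the bookkeeping is an assumption. Only the pipe form of the MAKEFLAGS option is handled (--jobserver-fds=R,W, spelled --jobserver-auth=R,W since GNU make 4.2). The demo uses a private pipe, so switching it to non-blocking mode is harmless here; a real client shares the pipe with other processes and has to be more careful.

```python
import os
import re


def parse_jobserver_fds(makeflags):
    """Return (read_fd, write_fd) from a MAKEFLAGS string, or None if absent."""
    m = re.search(r"--jobserver-(?:fds|auth)=(\d+),(\d+)", makeflags)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


class PipeTokenPool:
    """Hypothetical client-side token pool on top of a jobserver pipe."""

    def __init__(self, read_fd, write_fd):
        self.read_fd = read_fd
        self.write_fd = write_fd
        self.available = 1  # the implicit token every client starts with
        self.used = 0       # tokens backing currently running commands

    def acquire(self):
        """Return True if a token is available for the next command."""
        if self.available > 0:
            return True
        try:
            token = os.read(self.read_fd, 1)  # one byte == one job slot
        except BlockingIOError:
            return False                      # pool is empty right now
        if not token:
            return False                      # write end closed
        self.available += 1
        return True

    def reserve(self):
        """Bind an available token to a command that is being started."""
        self.available -= 1
        self.used += 1

    def release(self):
        """A command finished: keep the implicit token, hand back the rest."""
        self.used -= 1
        self.available += 1
        while self.available > 1:
            os.write(self.write_fd, b"+")
            self.available -= 1


def demo():
    r, w = os.pipe()
    os.set_blocking(r, False)  # private pipe, see the note above
    os.write(w, b"++")         # like make -j3: two tokens in the pipe, one implicit
    pool = PipeTokenPool(r, w)
    started = 0
    while pool.acquire():      # roughly CanRunMore()
        pool.reserve()         # roughly StartCommand()
        started += 1
    for _ in range(started):
        pool.release()         # roughly WaitForCommand()
    returned = os.read(r, 16)  # tokens handed back to the pool
    os.close(r)
    os.close(w)
    return started, returned


print(parse_jobserver_fds("-j --jobserver-auth=3,4"))  # (3, 4)
print(demo())                                          # (3, b'++')
```

With no matching option in MAKEFLAGS the parser returns None, which corresponds to the "no token pool available, no change in behaviour" case in the commit message.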
@stefanb2

Contributor Author

commented Apr 27, 2016

I have tested this implementation over the last few weeks in two different recursive GNU make based build systems that originally had M+1 GNU make instances:

  • use case A: top-level GNU make, 1 ninja instance, M-1 GNU make instances
  • use case B: top-level GNU make, N ninja instances, M-N GNU make instances

FYI: google/kati was used to convert the existing single-makefile GNU make parts to Ninja build files.

@nico

Collaborator

commented Apr 27, 2016

Thanks for the patch!

We've discussed this on the mailing list a few times (e.g. here https://groups.google.com/forum/#!searchin/ninja-build/jobserver/ninja-build/PUlsr7-jpI0/Ga19TOg1c14J). Ninja works best if it knows about the whole build. Now that kati exists, one can convert those to ninja files and munge them up to have a single build manifest (that's Android's transition strategy from Make to Ninja -- they use kati to get everything converted to Ninja files, and then they're incrementally converting directories to use something-not-make -- and then kati produces parts of their Ninja files and the new thing produces parts of the ninja files.)

Is your use case that you have recursive makefiles?

@stefanb2

Contributor Author

commented Apr 27, 2016

I could have guessed that this has been discussed before, because I'm surely not the first person facing such a situation.

Here are my reasons for requesting this:

  1. recursion: kati currently can't translate recursive GNU make based build systems, like Linux kernel kbuild. IMHO that is a major effort and unfortunately I can't wait for kati to provide this, hence such sub-component builds will have to stay with GNU make for the time being.
  2. missing features: kati currently can't translate fully modularized GNU make based build systems, i.e. where each component is built in isolation and in a separate build directory, so that all build.ninja files could be merged into a single one. While IMHO not as major an issue as (1), it is much simpler to replace the lowest-level $(MAKE) recipe with a kati/ninja recipe. Parsing + merging might also introduce unnecessary build delay (it remains to be seen what would happen in real life).
  3. technical barriers: e.g. sub-component builds that run behind a "chroot firewall". Even if everything moves to Ninja, you would still need 1 (main) + N (one for each chroot) ninja instances that need to cooperate. Ninja doesn't offer anything like that.
  4. too simple workarounds: AOSP makeparallel + kati/ninja runs all $(MAKE) instances hard-coded with "make -j4" with no cooperation between any of the GNU make instances. That is only acceptable if you have no, only a few, or only small $(MAKE) invocations from the build.ninja file.
  5. organizational barriers: even if it might be possible to use kati/ninja to convert an existing GNU make based sub-part of the system, you might not be allowed to do so. Such sub-component builds need to stay with GNU make.
  6. You ask: why not split the build up and run them as separate builds? Go to (5)...

IMHO my patch provides a good solution, considering

  • how small the required changes to ninja are,
  • that the default behaviour is completely unchanged, and
  • that this will make life easier for many other ninja users who face the same issues
@ghost

commented May 23, 2016

wow +1

stefanb2 added a commit to stefanb2/ninja that referenced this issue May 23, 2016
stefanb2 added a commit to stefanb2/ninja that referenced this issue May 25, 2016
stefanb2 added a commit to stefanb2/ninja that referenced this issue May 25, 2016
stefanb2 added a commit to stefanb2/ninja that referenced this issue May 26, 2016
stefanb2 added a commit to stefanb2/ninja that referenced this issue May 27, 2016
stefanb2 added a commit to stefanb2/ninja that referenced this issue May 27, 2016
stefanb2 added a commit to stefanb2/ninja that referenced this issue May 28, 2016
stefanb2 added a commit to stefanb2/ninja that referenced this issue May 30, 2016
@maximuska

Contributor

commented Aug 7, 2016

Another possible reason for having jobserver in ninja seems to be LTO support in gcc. -flto=jobserver tells gcc to use GNU make's job server mode to determine the number of parallel jobs. The alternative is to spawn a fixed number of jobs with e.g., -flto=16.

@fabio-porcedda

commented Mar 10, 2017

I would also like to have this feature merged. I simply cannot convert all projects to ninja-build, because I'm not allowed to do that.

@stefanb2 Thanks a lot for your work

@dublet

commented Apr 12, 2017

Can I just add my voice to the list of people who would like this to be merged? At my company we also use a nested build system, and this patch makes ninja behave very nicely indeed. We're not in a position to make ninja build everything yet.

@glandium

commented May 26, 2017

Please note that from a quick glance at the commit on @stefanb2's branch, I expect it doesn't work on Windows, where Make uses a different setup.

@stefanb2

Contributor Author

commented May 26, 2017

@glandium correct, in the Windows build a no-op token pool implementation is included. But I fail to see why this would be a relevant reason for rejecting this pull request.

That said, I'm pretty sure that it would be possible to provide an update that implements the token protocol used by Windows GNU make 4.x. Probably tokenpool-gnu-make.cc could be refactored into system agnostic and UNIX-dependent bits.

stefanb2 added a commit to stefanb2/ninja that referenced this issue Nov 7, 2017
@nox

commented Nov 11, 2017

This would be really useful too when invoking ninja as part of another build tool, such as cargo.

@comicfans

commented Nov 12, 2017

This would be very useful for super-project builds. In our large code base, differing compiler/environment configurations mean we cannot include all projects in one single ninja build, so we have 1 top-level build and N sub-projects built by ninja, and this setup triggers the Y*N problem.

@xqms

commented Dec 6, 2017

+1 - this is highly interesting for parallel builds with catkin_tools (https://catkin-tools.readthedocs.io/en/latest/). A catkin_tools workspace consists of separate CMake projects which are built in isolation. To control the CPU consumption of parallel make runs, catkin_tools contains a GNU Make jobserver implementation.
In this way, the make jobserver is starting to become a standard "protocol" for controlling resource consumption of parallel builds.

Note that in the catkin_tools scenario, it is not easy to merge the individual build.ninja files into a hierarchy of subninja files, because

  • Targets/individual rules will clash - would need CMake changes to keep them apart.
  • We would need some way of encoding inter-package dependencies (build this subninja before that).
  • catkin_tools needs to perform additional installation steps after a package has been built.
  • Also, catkin_tools provides many nice features which would be defeated by a merged build (package-level monitoring, build output grouped by packages, ...).
@yann-morin-1998

commented Jan 6, 2018

@nico I would like to add my voice in favour of GNU make job-server support in ninja.

Meta-buildsystems like OpenEmbedded (Yocto), OpenWRT, Buildroot and a lot of others,
are tasked with generating systems by building a lot of various packages from various sources,
all using various buildsystems. I'll mostly use Buildroot as an example, as I'm very familiar with
it, but the following is in principle applicable to all the buildsystems as well.

Such build systems will typically have this sequence per package they build:

  1. download sources of a package
  2. extract the sources
  3. configure the package
  4. build the package
  5. install it in a staging location

And they will repeat that sequence for each and all packages that are needed to build the
target system:

  1. build busybox
  2. build coreutils
  3. build foo
  4. build bar
  5. etc...

Once all packages have been built and installed in the staging location, a system image
(e.g. a bootloader + Linux kernel + root filesystem) is generated from that
staging location. That system image can then be directly flashed onto a device.

Now, that was the quick overview.

Since a system can be made of a lot of packages, we want to build as many packages in
parallel as possible (respecting the dependency chain, of course). But then for each package, we also want
to take advantage of parallel compilation, in case no other package is being built at the same
time.

So, if we have an 8-core machine, we would want to build up to 8 jobs in parallel, which means
we have to distribute those jobs to the various packages that need to be built at some point in
time, so that we maximize the number of jobs, but do not overshoot the 8-CPU limit.

For example, if 8 ninja-based packages are built in parallel and they do not share a job-server,
they will each be building 8 jobs, which is a total of 64 parallel jobs. On the other hand, limiting
the ninja builds to a single job will be a waste of time when only a single package is built at some
point in time (e.g. because the other ones have already finished building, or because the
dependency chain needs that one package before continuing).
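
The cost of the "single job per ninja" workaround can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the function name, the three packages with 8, 8 and 64 jobs, and the simplification that every job takes exactly one tick. A per-package limit equal to the core count, combined with the global cap, stands in for builds that share one job-server; the model does not capture the slowdown caused by overloading the host.

```python
def makespan(job_counts, cores, per_package_limit):
    """Ticks needed to finish all packages when every job takes one tick.

    per_package_limit caps the parallelism of each package;
    the total number of jobs per tick never exceeds `cores`.
    """
    remaining = list(job_counts)
    ticks = 0
    while any(remaining):
        budget = cores  # job slots left in this tick
        for i, left in enumerate(remaining):
            run = min(left, per_package_limit, budget)
            remaining[i] -= run
            budget -= run
        ticks += 1
    return ticks


packages = [8, 8, 64]  # made-up job counts for three packages
print(makespan(packages, cores=8, per_package_limit=1))  # 64
print(makespan(packages, cores=8, per_package_limit=8))  # 10
```

In the first run the large package is stuck at one job per tick long after the two small ones have finished, which is the waste described above; in the second run all 8 slots stay busy until the end.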

And as has already been explained in previous posts in this thread, not every package is based
on ninja, and not every package is even conceivably switchable to ninja. And even if every package
were using ninja, we can't simply aggregate all the ninja definitions to have a super-build, because
everything would end up clashing with everything else... So we still need to be able to cooperate with
the rest of the world, especially when that rest of the world has been established for decades now... ;-)

Thanks for reading so far! :-)

nashif added a commit to nashif/zephyr that referenced this issue Mar 6, 2018
This reverts commit 0e6689d.

Parallel builds are broken due to a mix of Make/Ninja and the job server
not being operational.

See ninja-build/ninja#1139

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
@ihnorton

commented Mar 14, 2018

+1. We also face this issue of Y*N ninjas while using CMake ExternalProject functionality.

nashif added a commit to zephyrproject-rtos/zephyr that referenced this issue Mar 20, 2018
GiulianoFranchetto added a commit to GiulianoFranchetto/zephyr that referenced this issue Mar 26, 2018
@nox

commented Oct 15, 2018

@nico For our use case in Rust land, we have Cargo invoke ninja in a build script when building some crate. There is literally no one who wants to make ninja the top-level build tool there, which is why we need job server support.

@wouterklouwen-youview

commented Oct 15, 2018

In our case we are building over 100 open source packages, including autoconf and automake. It seems unlikely they'll be converted to build with ninja.

@noseglasses

commented Dec 10, 2018

I am facing the same problems that all the others already stated. Without job server client support, Ninja is simply not an option for the large project my company is working on (heterogeneous - CMake, autotools, etc. - with many third party subprojects). That's pretty sad. It would be so cool to benefit from Ninja's lightning speed in re-builds.

In our case, the missing job server client capability is all that needs to be added to Ninja. Wrapped by a dummy GNU make process that simply supplies the job server, Ninja could serve as the actual top level build system, thus allowing for much faster rebuilds. Of course it would be even nicer if Ninja were able to act as job server itself.

@myfreeweb

commented Dec 13, 2018

++ In the LDC D compiler, ninja test causes a ninja → ctest → ninja chain which hits this problem

@jcfr

commented Dec 13, 2018

> Without job server client support, Ninja is simply not an option for the large project my company is working on (heterogeneous - CMake, autotools, etc. - with many third party subprojects). That's pretty sad.

Consider using the binary at https://github.com/Kitware/ninja/releases/tag/v1.8.2.g81279.kitware.dyndep-1.jobserver-1

or you could pip install ninja, which also installs the version with jobserver support.

@nox

commented Dec 14, 2018

@nico How can we make you change your mind about including this to Ninja?

stefanb2 added a commit to stefanb2/ninja that referenced this issue Dec 14, 2018
bradking pushed a commit to bradking/ninja that referenced this issue Jan 31, 2019
@simonfxr

commented Feb 1, 2019

Any progress? Some way of coordinating concurrent ninja builds would be really useful, e.g. when using multiple different configurations generated by CMake. I have a project where I have 16 different configurations for testing possible combinations of build flags. Currently I use xargs -P to build in parallel. It works, but it's ugly and not cross-platform.

@avikivity

commented Feb 1, 2019

I have a similar use case with multiple cmake configurations that could use a jobserver.

@jcfr

commented Feb 1, 2019

@bonzini

commented Feb 8, 2019

My usecase is a bit different, as I have a test driver that runs hundreds of tests in parallel using the jobserver; for that I actually would need a jobserver implemented in Ninja itself, but the client is a prerequisite and the work needed to implement the server is trivial compared to the client.

@avikivity

commented Feb 8, 2019

@bonzini you could run ninja from a one-line makefile (I sort of have the same plans in order to build debug and release in parallel)

@stefanb2

Contributor Author

commented Feb 8, 2019

FYI: I do have a proposal for a jobserver implementation in Ninja but of course it doesn't make sense to submit a PR until the current one has been merged.

@bonzini

commented Feb 8, 2019

@avikivity yeah I have a Makefile that's way more than one line, since I'm only slowly converting from Make to meson/ninja—which is what brought me to this issue. But I'd like to get rid of it sooner or later, of course.

@avikivity

commented Feb 8, 2019

Looks like it will be later rather than sooner :(

@nox

commented Feb 8, 2019

@nico Ping.

@nox

commented Mar 18, 2019

@jhasse Ping, given you have the last commit on master.

@jhasse

Collaborator

commented Mar 18, 2019

@nox Is there anything you want me to comment on?

See #1140 for a possible implementation.

@nox

commented Mar 18, 2019

I guess I want an update on that PR, given there have been code changes since your last comment, which was in December 2018.

bradking added a commit to bradking/ninja that referenced this issue Apr 22, 2019
@dothebart

commented Jul 19, 2019

Since all the real arguments have been made and the PRs have been open for more than a year, I'd say this doesn't shed a good light on ninja-build as a progressive project that follows user demand and has a clear discussion philosophy about what's good or bad for the way ahead.

Please consider getting this fixed.

@avikivity

commented Aug 12, 2019

Note that the compiler can benefit from jobserver support: gcc -flto will run as many jobs in parallel as the jobserver will allow it. Without it, one must either overcommit the build host, or underutilize its resources.

     You can also specify '-flto=jobserver' to use GNU make's job server
     mode to determine the number of parallel jobs.  This is useful when
     the Makefile calling GCC is already executing in parallel.  You
     must prepend a '+' to the command recipe in the parent Makefile for
     this to work.  This option likely only works if 'MAKE' is GNU make.
@jhasse

Collaborator

commented Aug 12, 2019

@dothebart The PR is still being worked on. Not sure what you want us to "fix".

@avikivity This sounds awesome! LTO is one of the worst memory killers and we should keep -flto=jobserver in mind and test if it works with Ninja's potential jobserver support.

edit: Just noticed that this was brought up in #1139 (comment) already :)

@avikivity

commented Aug 14, 2019

And I see that I upvoted that comment long ago :)
