Add GNU make jobserver client support #1139

Open
stefanb2 opened this issue Apr 27, 2016 · 92 comments · May be fixed by #1140, #2260 or #2263

@stefanb2
Contributor

As long as ninja is the only build execution tool, the current ninja -jN implementation works fine.

But when you try to convert parts of an existing recursive GNU make based SW build system to ninja, then you have the following situation:

  • top-level GNU Make (with -jX, acts as job server)
  • M instances of GNU make (with -j, act as job server clients)
  • N instances of ninja (don't know anything about job server)

Simply calling `ninja -jY` isn't enough, because the ninja instances will then try to run up to Y*N jobs on top of the X jobs from the GNU make instances, overloading the build host. Relying on -lZ to fix this is sub-optimal, because the load average often lags behind the actual situation on the build host.

It would be nice if GNU make jobserver client support could be added to Ninja. Then the N ninja instances would cooperate with the M GNU make instances and on the build host only X jobs would be executed at one time.
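
To make the overload concrete, here is a small sketch of the arithmetic in Python. The function name and the sample numbers (X=8, N=4, Y=8) are illustrative assumptions, not figures from a real build.

```python
def worst_case_jobs(x_make_jobs: int, n_ninja: int, y_ninja_jobs: int,
                    shared_jobserver: bool) -> int:
    """Upper bound on jobs running at the same time on the build host."""
    if shared_jobserver:
        # Every ninja instance draws from the same pool as the make instances,
        # so the top-level -jX is the global limit.
        return x_make_jobs
    # Without cooperation each ninja instance schedules up to Y jobs on its
    # own, on top of the X jobs handed out by the top-level make.
    return x_make_jobs + n_ninja * y_ninja_jobs


print(worst_case_jobs(8, 4, 8, shared_jobserver=False))  # 40
print(worst_case_jobs(8, 4, 8, shared_jobserver=True))   # 8
```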

stefanb2 pushed a commit to stefanb2/ninja that referenced this issue Apr 27, 2016
- add new TokenPool interface
- GNU make implementation for TokenPool parses and verifies the magic
  information from the MAKEFLAGS environment variable
- RealCommandRunner tries to acquire TokenPool
  * if no token pool is available then there is no change in behaviour
- When a token pool is available then RealCommandRunner behaviour
  changes as follows
  * CanRunMore() only returns true if TokenPool::Acquire() returns true
  * StartCommand() calls TokenPool::Reserve()
  * WaitForCommand() calls TokenPool::Release()

Documentation for GNU make jobserver

  http://make.mad-scientist.net/papers/jobserver-implementation/

Fixes ninja-build#1139
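
As a rough illustration of what "parses the magic information from MAKEFLAGS" can look like on POSIX, here is a minimal Python sketch. It is not the code from the patch. It assumes the three spellings GNU make has used over time (`--jobserver-fds=R,W`, `--jobserver-auth=R,W` and `--jobserver-auth=fifo:PATH`), takes the last occurrence when several are present, and treats negative descriptors as "no usable jobserver". The function name is hypothetical, and a real client would additionally verify that the descriptors are actually open.

```python
import re
from typing import Optional, Tuple, Union

# Either a (read_fd, write_fd) pair or the path of a named pipe.
JobserverAuth = Union[Tuple[int, int], str]


def parse_jobserver_auth(makeflags: str) -> Optional[JobserverAuth]:
    """Return the jobserver location advertised in MAKEFLAGS, or None."""
    result: Optional[JobserverAuth] = None
    for word in makeflags.split():
        match = re.fullmatch(r"--jobserver-(?:auth|fds)=(.+)", word)
        if not match:
            continue
        value = match.group(1)
        if value.startswith("fifo:"):
            result = value[len("fifo:"):]
            continue
        fds = re.fullmatch(r"(-?\d+),(-?\d+)", value)
        if fds and int(fds.group(1)) >= 0 and int(fds.group(2)) >= 0:
            result = (int(fds.group(1)), int(fds.group(2)))
        else:
            # Unknown format or invalid descriptors: assume no jobserver.
            result = None
    return result


print(parse_jobserver_auth("-j8 --jobserver-auth=3,4"))  # (3, 4)
```

When the function returns None, the caller would keep its normal -jN behaviour, which matches the "no change in behaviour" case in the commit message.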
@stefanb2 stefanb2 linked a pull request Apr 27, 2016 that will close this issue
@stefanb2
Contributor Author

I have tested this implementation over the last few weeks in two different recursive GNU make based build systems that originally had M+1 GNU make instances:

  • use case A: top-level GNU make, 1 ninja instance, M-1 GNU make instances
  • use case B: top-level GNU make, N ninja instances, M-N GNU make instances

FYI: google/kati was used to convert existing single makefile GNU make parts to Ninja build file.

@nico
Collaborator

nico commented Apr 27, 2016

Thanks for the patch!

We've discussed this on the mailing list a few times (e.g. here https://groups.google.com/forum/#!searchin/ninja-build/jobserver/ninja-build/PUlsr7-jpI0/Ga19TOg1c14J). Ninja works best if it knows about the whole build. Now that kati exists, one can convert those to ninja files and munge them up to have a single build manifest (that's Android's transition strategy from Make to Ninja -- they use kati to get everything converted to Ninja files, and then they're incrementally converting directories to use something-not-make -- and then kati produces parts of their Ninja files and the new thing produces parts of the ninja files.)

Is your use case that you have recursive makefiles?

@stefanb2
Contributor Author

I could have guessed that this has been discussed before, because I'm surely not the first person facing such a situation.

Here are my reasons for requesting this:

  1. recursion: kati currently can't translate recursive GNU make based build systems, like Linux kernel kbuild. IMHO that would be a major effort, and unfortunately I can't wait for kati to provide it, hence such sub-component builds will have to stay with GNU make for the time being.
  2. missing features: kati currently can't translate fully modularized GNU make based build systems, i.e. where each component is built in isolation and in a separate build directory, so that all build.ninja files could be merged into a single one. While IMHO this is not as major an issue as (1), it is much simpler to replace the lowest-level $(MAKE) recipe with a kati/ninja recipe. Parsing + merging might also introduce unnecessary build delay (it remains to be seen what would happen in real life).
  3. technical barriers: e.g. sub-component builds that run behind a "chroot firewall". Even if everything moves to Ninja, you would still need 1 (main) + N (one for each chroot) ninja instances that need to cooperate. Ninja doesn't offer anything like that.
  4. too simple workarounds: AOSP makeparallel + kati/ninja runs all $(MAKE) instances hard-coded with "make -j4", with no cooperation between any of the GNU make instances. That is only acceptable if the build.ninja file contains no, only a few, or only small $(MAKE) invocations.
  5. organizational barriers: even if it might be possible to use kati/ninja to convert an existing GNU make based sub-part of the system, you might not be allowed to do so. Such sub-component builds need to stay with GNU make.
  6. You ask: why not split the build up and run them as separate builds? Goto (5)...

IMHO my patch provides a good solution, considering

  • how small the required changes to ninja are,
  • that the default behaviour is completely unchanged, and
  • that this will make life easier for many other ninja users who face the same issues

@ghost

ghost commented May 23, 2016

wow +1

stefanb2 pushed a commit to stefanb2/ninja that referenced this issue May 23, 2016
stefanb2 pushed a commit to stefanb2/ninja that referenced this issue May 25, 2016
stefanb2 pushed a commit to stefanb2/ninja that referenced this issue May 25, 2016
stefanb2 pushed a commit to stefanb2/ninja that referenced this issue May 26, 2016
stefanb2 pushed a commit to stefanb2/ninja that referenced this issue May 27, 2016
stefanb2 pushed a commit to stefanb2/ninja that referenced this issue May 27, 2016
stefanb2 pushed a commit to stefanb2/ninja that referenced this issue May 28, 2016
stefanb2 pushed a commit to stefanb2/ninja that referenced this issue May 30, 2016
@maximuska
Contributor

Another possible reason for having jobserver in ninja seems to be LTO support in gcc. -flto=jobserver tells gcc to use GNU make's job server mode to determine the number of parallel jobs. The alternative is to spawn a fixed number of jobs with e.g., -flto=16.

@fabio-porcedda

I would like to have this feature merged. I simply cannot convert all projects to ninja-build because I'm not allowed to do that.

@stefanb2 Thanks a lot for your work

@dublet

dublet commented Apr 12, 2017

Can I just add my voice to the list of people who would like this to be merged? At my company we also use a nested build system, and with this patch it makes ninja behave very nicely indeed. We're not in the position to make ninja build everything yet.

@glandium

Please note that from a quick glance at the commit on @stefanb2's branch, I expect it doesn't work on Windows, where Make uses a different setup.

@stefanb2
Contributor Author

@glandium correct, in the Windows build a no-op token pool implementation is included. But I fail to see why this would be a relevant reason for rejecting this pull request.

That said, I'm pretty sure that it would be possible to provide an update that implements the token protocol used by Windows GNU make 4.x. Probably tokenpool-gnu-make.cc could be refactored into system agnostic and UNIX-dependent bits.

stefanb2 pushed a commit to stefanb2/ninja that referenced this issue Nov 7, 2017
@nox

nox commented Nov 11, 2017

This would be really useful too when invoking ninja as part of another build tool, such as cargo.

@comicfans

This should be very useful for super-project builds. In our large code base, due to different compiler/environment configurations, we cannot include all projects in one single ninja build, so we have 1 top-level build and N sub-projects built by ninja, and this configuration triggers the Y*N problem.

@xqms

xqms commented Dec 6, 2017

+1 - this is highly interesting for parallel builds with catkin_tools (https://catkin-tools.readthedocs.io/en/latest/). A catkin_tools workspace consists of separate CMake projects which are built in isolation. To control the CPU consumption of parallel make runs, catkin_tools contains a GNU Make jobserver implementation.
In this way, the make jobserver is starting to become a standard "protocol" for controlling resource consumption of parallel builds.

Note that in the catkin_tools scenario, it is not easy to merge the individual build.ninja files into a hierarchy of subninja files, because

  • Targets/individual rules will clash - would need CMake changes to keep them apart.
  • We would need some way of encoding inter-package dependencies (build this subninja before that).
  • catkin_tools needs to perform additional installation steps after a package has been built.
  • Also, catkin_tools provides many nice features which would be defeated by a merged build (package-level monitoring, build output grouped by packages, ...).

@yann-morin-1998

@nico I would like to add my voice in favor of GNU make job-server support in ninja.

Meta-buildsystems like OpenEmbedded (Yocto), OpenWRT, Buildroot and a lot of others,
are tasked with generating systems by building a lot of various packages from various sources,
all using various buildsystems. I'll mostly use Buildroot as an example, as I'm very familiar with
it, but the following is in principle applicable to all the buildsystems as well.

Such build systems will typically have this sequence per package they build:

  1. download sources of a package
  2. extract the sources
  3. configure the package
  4. build the package
  5. install it in a staging location

And they will repeat that sequence for each and all packages that are needed to build the
target system:

  1. build busybox
  2. build coreutils
  3. build foo
  4. build bar
  5. etc...

Once all packages have been built and installed in the staging location, a system image
(e.g. a bootloader + Linux Kernel + root filesystem for example) is generated from that
staging location. That system image can then be directly flashed onto a device.

Now, that was the quick overview.

Since a system can be made of a lot of packages, we want to build as many packages in
parallel as possible (respecting the dependency chain, of course). But then for each package, we also want
to take advantage of parallel compilation, in case no other package is being built at the same
time.

So, if we have a 8-core machine, we would want to build up to 8 jobs in parallel, which means
we have to distribute those jobs to the various packages that need to be built at some point in
time, so that we maximize the number of jobs, but do not overshoot the 8-CPU limit.

For example, if 8 ninja-based packages are built in parallel and they do not share a job-server,
they will each be building 8 jobs, which is a total of 64 parallel jobs. On the other hand, limiting
the ninja builds to a single job will be a waste of time when only a single package is built at some
point in time (e.g. because the other ones have already finished building, or because the
dependency chain needs that one package before continuing).
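
The shared-limit idea in this example can be sketched with an ordinary pipe used as a token pool, which is the basic mechanism behind the jobserver. The Python sketch below is a simplification under stated assumptions: threads stand in for the per-package build tools, the function name is made up, and all 8 slots are placed in the pipe, whereas GNU make gives each client one implicit slot and keeps only the remaining ones in the pipe.

```python
import os
import threading
import time


def run_with_token_pipe(total_slots: int, n_packages: int,
                        jobs_per_package: int) -> int:
    """Run n_packages * jobs_per_package fake jobs; return peak concurrency."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"+" * total_slots)  # one byte per available job slot
    lock = threading.Lock()
    running = 0
    peak = 0

    def job() -> None:
        nonlocal running, peak
        token = os.read(read_fd, 1)  # blocks until a slot is free
        try:
            with lock:
                running += 1
                peak = max(peak, running)
            time.sleep(0.01)  # stand-in for a compiler invocation
            with lock:
                running -= 1
        finally:
            os.write(write_fd, token)  # always hand the token back

    threads = [threading.Thread(target=job)
               for _ in range(n_packages * jobs_per_package)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    os.close(read_fd)
    os.close(write_fd)
    return peak


# 8 packages that each want 8 jobs, sharing 8 slots: never more than 8 at once.
print(run_with_token_pipe(total_slots=8, n_packages=8, jobs_per_package=8))
```

When only one package is left, its jobs can still pick up all free tokens, which is the case that a fixed per-package limit of one job handles badly.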

And as has been already explained in previous posts in this thread, not every package is based
on ninja, and not every package is even conceivably switchable to ninja. And even if every package
were using ninja, we can't simply aggregate all the ninja definitions to have a super-build, because
everything would end up clashing with everything else... So we still need to be able to cooperate with
the rest of the world, especially when that rest of the world has been established for decades now... ;-)

Thanks for reading so far! :-)

nashif added a commit to nashif/zephyr that referenced this issue Mar 6, 2018
This reverts commit 0e6689d.

Parallel builds are broken due to a mix of Make/Ninja and the job server
not being operational.

See ninja-build/ninja#1139

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
@ihnorton

+1. We also face this issue of Y*N ninjas while using CMake ExternalProject functionality.

nashif added a commit to zephyrproject-rtos/zephyr that referenced this issue Mar 20, 2018
@avikivity

Please consider merging this, it's helpful for build systems that have to recurse into other build systems, and for LTO links.

@mattgodbolt

Seconded; our LTO builds suffer from either overcommitting CPU resources or underutilizing them, as they don't play nicely with the overarching ninja setup.

@mathstuf
Contributor

mathstuf commented Feb 5, 2024

Note that this is only about making ninja take into account running under make. ninja is not setting up a jobserver to communicate with a make or any other tool running under it. It also doesn't (AFAIK) communicate with any commands ninja runs that may want to also participate (e.g., a build rule in build.ninja won't be able to tell a sub-make command about the job server either)

@mattgodbolt

mattgodbolt commented Feb 6, 2024

Got it! Thanks...I got myself confused: I'm after job server support which I think is #1139 😊
edit: or maybe not! Maybe that's not actually filed anywhere: my use case is ninja being able to run the linker with lto options that limit the number of CPUs it uses in the same way as ninja itself limits things.

@avikivity

I think it would work by running ninja under make, so make would be the jobserver for ninja and anything it spawns.

@mathstuf
Contributor

mathstuf commented Feb 6, 2024

> I think it would work by running ninja under make, so make would be the jobserver for ninja and anything it spawns.

Doesn't ninja need to coordinate to keep the right files open and environment intact for its rules to communicate with the job server?

@eli-schwartz

> Doesn't ninja need to coordinate to keep the right files open and environment intact for its rules to communicate with the job server?

No, the new "fifo" jobserver explicitly allows GNU Make to act as the coordinator for your entire process tree, regardless of whether or not any individual process in the process tree supports it, as long as a recursive descendant knows how to communicate via the fifo.

This is a benefit over the classic anonymous pipe for two reasons:

  • if ninja does NOT support the jobserver, gcc -flto=jobserver can still coordinate with the jobserver when run by ninja
  • if ninja DOES support the jobserver, it only needs to act as a client and ask for jobs, it doesn't need to act as a server for the jobs it acquired and pass them on to gcc -flto=jobserver.
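
A minimal Python sketch of why the named pipe helps: any process that knows the path can open it on its own, with no inherited file descriptors involved. The helper names and the FIFO path are hypothetical. The sketch assumes Linux, where opening a FIFO with O_RDWR does not block (POSIX leaves that case unspecified), and it uses non-blocking reads so that an empty pool shows up as "no token" instead of a hang.

```python
import os
import tempfile
from typing import List, Optional


def open_fifo(path: str) -> int:
    # O_RDWR avoids blocking in open() until a peer shows up (Linux behaviour);
    # O_NONBLOCK turns "pool is empty" into BlockingIOError instead of a wait.
    return os.open(path, os.O_RDWR | os.O_NONBLOCK)


def try_acquire(fd: int) -> Optional[bytes]:
    """Take one token from the pool, or return None if none is available."""
    try:
        token = os.read(fd, 1)
    except BlockingIOError:
        return None
    return token or None


def release(fd: int, token: bytes) -> None:
    """Give a previously acquired token back to the pool."""
    os.write(fd, token)


def fifo_demo() -> List[Optional[bytes]]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobserver.fifo")
        os.mkfifo(path)
        server = open_fifo(path)   # plays the role of the top-level make
        os.write(server, b"++")    # two job slots
        client = open_fifo(path)   # independent open by some descendant
        first = try_acquire(client)
        second = try_acquire(client)
        third = try_acquire(client)    # pool is empty now
        release(client, first)
        fourth = try_acquire(client)   # the released token is available again
        os.close(client)
        os.close(server)
        return [first, second, third, fourth]


print(fifo_demo())  # [b'+', b'+', None, b'+']
```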

@mathstuf
Contributor

mathstuf commented Feb 6, 2024

Ah, neat, thanks. The fifo mechanism sounds much better then.

@avikivity

Let's meet again, same place, next year.

@xim

xim commented Feb 7, 2024

> Let's meet again, same place, next year.

Count me in

@degasus

degasus commented Feb 28, 2024

> The PR is still being worked on. Not sure what you want us to "fix".

@jhasse After reading all of the comments here, I'm under the impression that pull request #1140 has been finished for more than half a decade and is just rebased every year.

@stefanb2 Please correct me if I'm wrong, but it seems like you are waiting for a decision for either

  • ninja does not want to have any jobserver client support, and this issue and these three PRs shall be closed
  • ninja does want to have jobserver support, but not the GNU Make jobserver protocol. So we shall look for alternatives
  • ninja does want to have GNU Make jobserver client support, but you don't like the implementation in #1140. So what shall be modified?
  • ninja does want to have GNU Make jobserver client support, and there is nothing wrong in #1140. So how many more years shall this wait?

@digit-google
Contributor

I do not know @jhasse's exact point of view on the topic, but I can see several issues with the PR:

  • There is no regression test suite for what is a major change to Ninja's behavior. While there are unit-tests that verify some parts of the implementation, a real regression test suite that can be run on CI would verify that the Ninja binary works as expected, either as a client, a server, or both at the same time. This requires writing new Python tests under misc/ that simulate a jobserver-enabled build with multiple scenarios.

  • The code is hard to understand and maintain. In particular the way POSIX signals are used is scary and brittle. It will very likely lead to flaky and unexpected failures under heavy loads and non-conventional runtime environments (think containers or qemu user emulation). The Win32 part writes directly to the completion port of SubprocessSet. At a minimum, all signal-twiddling and completion-related code should be part of subprocess-posix.cc or subprocess-win32.cc, which would provide a sane API for the token pool implementation.

Ideally, Ninja would implement an asynchronous loop API that allows waiting for several I/O conditions and timers concurrently in the main thread and acting upon them, and the SubprocessSet and TokenPool classes would both be users of it, but that's probably for another PR.

  • Minor: I recommend reworking the commits in the PR to ensure that each one of them is final, correct, individually testable, and updates both configure.py and CMakeLists.txt at the same time.
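
The single event loop suggested above, where the main thread waits on subprocess output and jobserver tokens together, can be sketched with Python's standard `selectors` module. This only illustrates the idea and is not how Ninja or the PR is structured. The function names and the two pipes ("token" and "child") are stand-ins, and waiting on pipes this way assumes a POSIX system.

```python
import os
import selectors
from typing import Dict, List


def wait_ready(sources: Dict[str, int], timeout: float) -> List[str]:
    """Wait up to `timeout` seconds; return names of the readable sources."""
    selector = selectors.DefaultSelector()
    try:
        for name, fd in sources.items():
            selector.register(fd, selectors.EVENT_READ, data=name)
        return sorted(key.data for key, _ in selector.select(timeout))
    finally:
        selector.close()


def event_loop_demo() -> List[List[str]]:
    token_r, token_w = os.pipe()  # stand-in for the jobserver pipe
    child_r, child_w = os.pipe()  # stand-in for a subprocess's output pipe
    sources = {"token": token_r, "child": child_r}
    try:
        idle = wait_ready(sources, 0.01)        # nothing to do: timer expires
        os.write(token_w, b"+")
        token_only = wait_ready(sources, 0.01)  # a job slot became available
        os.write(child_w, b"done\n")
        both = wait_ready(sources, 0.01)        # token still unread, child ready
        return [idle, token_only, both]
    finally:
        for fd in (token_r, token_w, child_r, child_w):
            os.close(fd)


print(event_loop_demo())  # [[], ['token'], ['child', 'token']]
```

With one wait call covering both kinds of descriptor, no signal handler is needed to interrupt a blocking wait when a token arrives.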

@eli-schwartz

@digit-google it would be productive to make review comments directly on the PR.

Preferably any time in the past 8 years, but no time like the present! :)

Note that regression testing can be somewhat accounted for by the fact that an extremely large number of people have been running the patchset in production for years now.

@digit-google
Contributor

I agree, but I was responding to @degasus who was asking in this thread what could be changed in the PR. However, I'll add similar comments there too.

It is great that the current patchset has been working well, and I encourage putting actual metrics, like number of users, build performance improvement times, in the actual PR description and final patchset.

However, unit and regression testing is about ensuring that future Ninja changes do not break its behavior unexpectedly. Given the complexity of the feature and the fact that it changes how Ninja interacts with its runtime environment, unit-testing is not enough. But that's just my humble opinion.

@eli-schwartz

You can say the same thing about all the existing functionality ninja has.

My opinion is that it isn't fair or reasonable to ask this PR to be a special exception, but it would be fair iff someone wrote an end-to-end testing suite, then asked the jobserver PR to include jobserver coverage in it.

@digit-google
Contributor

Frankly, that PR would be fine to me, even without a full regression test suite, if it didn't spread tricky signal-handling code in what looks like unrelated parts of the source tree. This is a hackish design that is bound to be a maintenance nightmare for anyone that accepts that in their git repository. I assume that's why @jhasse, who has very very limited bandwidth to maintain Ninja, has not felt confident in accepting it.

And for full disclosure, I am not an official Ninja maintainer in any way, but I maintain my own Ninja fork for the Fuchsia project in order to support a number of important additional features.

While I do plan to implement jobserver support there too, this will not be based on this PR for exactly this reason.

@stefanb2
Contributor Author

stefanb2 commented Mar 2, 2024

> @stefanb2 Please correct me if I'm wrong, but it seems like you are waiting for a decision for either

The short answer: I'm not waiting for anything.

The long answer:

This contribution is a side product of the migration of the internal code base at my former workplace to Android N. The Android N build system introduced the kati-ninja combo, which had a severe negative impact on build performance. This was not acceptable for the company, so I looked into adding jobserver client support to ninja. This turned out to be rather simple and the build performance problem was solved. As the resulting changes were already paid for, I requested permission to contribute them upstream.

IMHO there is nothing for me to do. Either

  • the project makes a decision about the contribution, or
  • my former employer requests me to withdraw it.

@segevfiner

Kitware (CMake's authors) also maintain https://github.com/Kitware/ninja which is a fork/build with this PR, and the ninja you can install from PyPI https://pypi.org/project/ninja/, is actually this fork.

@jcfr

jcfr commented Mar 21, 2024

> Kitware (CMake's authors) also maintain https://github.com/Kitware/ninja

Ditto. We have been using our fork as both (1) a staging area for features in review and (2) the version built and distributed [1] on PyPI.

For context, the distribution of both cmake and ninja on PyPI was motivated by the need to support the scikit-build [2] initiative.

Footnotes

  1. https://pypi.org/project/ninja/

  2. https://github.com/scikit-build/scikit-build-core

edtanous added a commit to edtanous/openbmc that referenced this issue May 8, 2024
Ninja has a PR for adding make jobserver support [1] that has been a widely
debated PR for many... many years.  Given that many people have forked to
incorporate this PR, and it claims to solve a problem we have (OOM on gcc
processes) it seems like it would be worthwhile using a well maintained fork
instead of the main project.

This is not a one way door.  If we find that the project goes
unmaintained, doesn't build, or otherwise has problems, we can always go
back to using mainline.

Of the forks that have pulled this in, there are:
The Fuchsia project [2]
Their targets seem more specific and less generic, although their
improvements seem more extensive.

Kitware [3]
Maintains a fork of ninja

Docker [4]

[1] ninja-build/ninja#1139
[2] https://fuchsia.googlesource.com/third_party/github.com/ninja-build/ninja/+/refs/heads/main/README.fuchsia
[3] https://github.com/Kitware/ninja
[4] https://github.com/dockbuild/ninja-jobserver

'''
EXTRA_OEMESON_COMPILE:append = " \
--ninja-args='--tokenpool-master=fifo' \
"

PARALLEL_MAKE = "-j 20"
BB_NUMBER_THREADS = "20"
'''

Signed-off-by: Ed Tanous <ed@tanous.net>
@mortie

mortie commented Jun 3, 2024

What's the current status on this? I'm interested in it from the meta build system perspective, where many different projects written in different languages and using different build systems are all compiled in a coordinated manner. Without make jobserver client support in ninja, meta build systems are forced to make one of the following terrible trade-offs:

  • Build the projects sequentially, relying on the individual build systems' concurrency support. This is bad, since significant parts of from-scratch build times are single-threaded (especially the configure at the beginning and the linking at the end). AFAIK, this is what Buildroot does by default.
  • Build the projects concurrently, but limit each individual project to use one core. This works well if there are many small projects, but terrible if there are one or two projects which are significantly bigger than the others. You do not want to build Chromium with only one core.
  • Build the projects concurrently and let each project use many cores. This is probably the fastest if you have enough RAM, but if you have one of those 32-thread systems, it means you're running 32*32=1024 compiler processes at the same time in the worst case. This requires an immense amount of RAM. This is what Bitbake does by default.

If Ninja and other build systems supported the jobserver protocol, there would be another option:

  • Build the projects concurrently, and let each project use multiple cores, but run one central job server which limits the total concurrency across all projects.

To my knowledge, Ninja is the only real hold-out to make this a practical possibility. GNU Make and Rust's Cargo already support being jobserver clients.

@eli-schwartz

The current status is that after @stefanb2's PR died the death of eternally pending review, @hundeboll reimplemented it two weeks ago in #2450 and it has been approved and scheduled for inclusion in ninja 1.13.0 (but the merge button hasn't been hit).

No jobserver master support, only client support, but this is probably not a worry for you.

It would be nice if the new PR had linked to the issue as well but it is what it is.
