-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GNU make jobserver client support #1139
Comments
- add new TokenPool interface - GNU make implementation for TokenPool parses and verifies the magic information from the MAKEFLAGS environment variable - RealCommandRunner tries to acquire TokenPool * if no token pool is available then there is no change in behaviour - When a token pool is available then RealCommandRunner behaviour changes as follows * CanRunMore() only returns true if TokenPool::Acquire() returns true * StartCommand() calls TokenPool::Reserve() * WaitForCommand() calls TokenPool::Release() Documentation for GNU make jobserver http://make.mad-scientist.net/papers/jobserver-implementation/ Fixes ninja-build#1139
I have tested this implementation over the last few weeks in two different recursive GNU make based build systems that originally had M+1 GNU make instances:
FYI: google/kati was used to convert existing single makefile GNU make parts to Ninja build file. |
Thanks for the patch! We've discussed this on the mailing list a few times (e.g. here https://groups.google.com/forum/#!searchin/ninja-build/jobserver/ninja-build/PUlsr7-jpI0/Ga19TOg1c14J). Ninja works best if it knows about the whole build. Now that kati exists, one can convert those to ninja files and munge them up to have a single build manifest (that's Android's transition strategy from Make to Ninja -- they use kati to get everything converted to Ninja files, and then they're incrementally converting directories to use something-not-make -- and then kati produces parts of their Ninja files and the new thing produces parts of the ninja files.) Is your use case that you have recursive makefiles? |
I could have guessed that this has been discussed before, because I'm surely not the first person facing such a situation. Here are my reasons for requesting this:
IMHO my patch provides a good solution, considering
|
wow +1 |
- add new TokenPool interface - GNU make implementation for TokenPool parses and verifies the magic information from the MAKEFLAGS environment variable - RealCommandRunner tries to acquire TokenPool * if no token pool is available then there is no change in behaviour - When a token pool is available then RealCommandRunner behaviour changes as follows * CanRunMore() only returns true if TokenPool::Acquire() returns true * StartCommand() calls TokenPool::Reserve() * WaitForCommand() calls TokenPool::Release() Documentation for GNU make jobserver http://make.mad-scientist.net/papers/jobserver-implementation/ Fixes ninja-build#1139
- add new TokenPool interface - GNU make implementation for TokenPool parses and verifies the magic information from the MAKEFLAGS environment variable - RealCommandRunner tries to acquire TokenPool * if no token pool is available then there is no change in behaviour - When a token pool is available then RealCommandRunner behaviour changes as follows * CanRunMore() only returns true if TokenPool::Acquire() returns true * StartCommand() calls TokenPool::Reserve() * WaitForCommand() calls TokenPool::Release() Documentation for GNU make jobserver http://make.mad-scientist.net/papers/jobserver-implementation/ Fixes ninja-build#1139
- add new TokenPool interface - GNU make implementation for TokenPool parses and verifies the magic information from the MAKEFLAGS environment variable - RealCommandRunner tries to acquire TokenPool * if no token pool is available then there is no change in behaviour - When a token pool is available then RealCommandRunner behaviour changes as follows * CanRunMore() only returns true if TokenPool::Acquire() returns true * StartCommand() calls TokenPool::Reserve() * WaitForCommand() calls TokenPool::Release() Documentation for GNU make jobserver http://make.mad-scientist.net/papers/jobserver-implementation/ Fixes ninja-build#1139
- add new TokenPool interface - GNU make implementation for TokenPool parses and verifies the magic information from the MAKEFLAGS environment variable - RealCommandRunner tries to acquire TokenPool * if no token pool is available then there is no change in behaviour - When a token pool is available then RealCommandRunner behaviour changes as follows * CanRunMore() only returns true if TokenPool::Acquire() returns true * StartCommand() calls TokenPool::Reserve() * WaitForCommand() calls TokenPool::Release() Documentation for GNU make jobserver http://make.mad-scientist.net/papers/jobserver-implementation/ Fixes ninja-build#1139
- add new TokenPool interface - GNU make implementation for TokenPool parses and verifies the magic information from the MAKEFLAGS environment variable - RealCommandRunner tries to acquire TokenPool * if no token pool is available then there is no change in behaviour - When a token pool is available then RealCommandRunner behaviour changes as follows * CanRunMore() only returns true if TokenPool::Acquire() returns true * StartCommand() calls TokenPool::Reserve() * WaitForCommand() calls TokenPool::Release() Documentation for GNU make jobserver http://make.mad-scientist.net/papers/jobserver-implementation/ Fixes ninja-build#1139
- add new TokenPool interface - GNU make implementation for TokenPool parses and verifies the magic information from the MAKEFLAGS environment variable - RealCommandRunner tries to acquire TokenPool * if no token pool is available then there is no change in behaviour - When a token pool is available then RealCommandRunner behaviour changes as follows * CanRunMore() only returns true if TokenPool::Acquire() returns true * StartCommand() calls TokenPool::Reserve() * WaitForCommand() calls TokenPool::Release() Documentation for GNU make jobserver http://make.mad-scientist.net/papers/jobserver-implementation/ Fixes ninja-build#1139
- add new TokenPool interface - GNU make implementation for TokenPool parses and verifies the magic information from the MAKEFLAGS environment variable - RealCommandRunner tries to acquire TokenPool * if no token pool is available then there is no change in behaviour - When a token pool is available then RealCommandRunner behaviour changes as follows * CanRunMore() only returns true if TokenPool::Acquire() returns true * StartCommand() calls TokenPool::Reserve() * WaitForCommand() calls TokenPool::Release() Documentation for GNU make jobserver http://make.mad-scientist.net/papers/jobserver-implementation/ Fixes ninja-build#1139
- add new TokenPool interface - GNU make implementation for TokenPool parses and verifies the magic information from the MAKEFLAGS environment variable - RealCommandRunner tries to acquire TokenPool * if no token pool is available then there is no change in behaviour - When a token pool is available then RealCommandRunner behaviour changes as follows * CanRunMore() only returns true if TokenPool::Acquire() returns true * StartCommand() calls TokenPool::Reserve() * WaitForCommand() calls TokenPool::Release() Documentation for GNU make jobserver http://make.mad-scientist.net/papers/jobserver-implementation/ Fixes ninja-build#1139
Another possible reason for having jobserver in ninja seems to be LTO support in gcc. -flto=jobserver tells gcc to use GNU make's job server mode to determine the number of parallel jobs. The alternative is to spawn a fixed number of jobs with e.g., -flto=16. |
I would like too have this feature merged, i simply cannot convert all projects to ninja-build because i'm not allowed to do that. @stefanb2 Thanks a lot for your work |
Can I just add my voice to the list of people who would like this to be merged? At my company we also use a nested build system, and with this patch it makes ninja behave very nicely indeed. We're not in the position to make ninja build everything yet. |
Please note that from a quick glance at the commit on @stefanb2's branch, I expect it doesn't work on Windows, where Make uses a different setup. |
@glandium correct, in the Windows build a no-op token pool implementation is included. But I fail to see why this would be a relevant reason for rejecting this pull request. That said, I'm pretty sure that it would be possible to provide an update that implements the token protocol used by Windows GNU make 4.x. Probably |
- add new TokenPool interface - GNU make implementation for TokenPool parses and verifies the magic information from the MAKEFLAGS environment variable - RealCommandRunner tries to acquire TokenPool * if no token pool is available then there is no change in behaviour - When a token pool is available then RealCommandRunner behaviour changes as follows * CanRunMore() only returns true if TokenPool::Acquire() returns true * StartCommand() calls TokenPool::Reserve() * WaitForCommand() calls TokenPool::Release() Documentation for GNU make jobserver http://make.mad-scientist.net/papers/jobserver-implementation/ Fixes ninja-build#1139
This would be really useful too when invoking ninja as part of another build tool, such as cargo. |
This should be very useful for super-project build, in our large code base, due to different compiler/environment config, we can not include all projects in one single ninja build, so we have 1 top-level and N sub-projects built by ninja , this config trigger Y*N problem. |
+1 - this is highly interesting for parallel builds with Note that in the catkin_tools scenario, it is not easy to merge the individual build.ninja files into a hierarchy of subninja files, because
|
@nico I would like to add my voice to having support for GNu make job-server support in ninja. Meta-buildsystems like OpenEmbedded (Yocto), OpenWRT, Buildroot and a lot of others, Such build systems will typically have this sequence per package they build:
And they will repeat that sequence for each and all packages that are needed to build the
Once all packages have been built and installed in the staging location, a system image Now, that was the quick overview. Since a system can be made of a lot of packages, we want to build as many packages in So, if we have a 8-core machine, we would want to build up to 8 jobs in parallel, which means For example, if 8 ninja-based packages are built in parallel and they do not share a job-server, And as has been already explained in previous posts in this thread, not every package is based Thanks for reading so far! :-) |
This reverts commit 0e6689d. Parallel builds are broken due to a mix of Make/Ninja and the job server not being operational. See ninja-build/ninja#1139 Signed-off-by: Anas Nashif <anas.nashif@intel.com>
+1. We also face this issue of Y*N ninjas while using CMake |
This reverts commit 0e6689d. Parallel builds are broken due to a mix of Make/Ninja and the job server not being operational. See ninja-build/ninja#1139 Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Please consider merging this, it's helpful for build systems that have to recurse into other build systems, and for LTO links. |
Seconded; our LTO builds suffer from either overcommitting CPU resources, or under utilizing as they don't play nicely with the overarching |
Note that this is only about making |
Got it! Thanks...I got myself confused: I'm after job server support which I think is #1139 😊 |
I think it would work by running ninja under make, so make would be the jobserver for ninja and anything it spawns. |
Doesn't |
No, the new "fifo" jobserver explicitly allows GNU Make to act as the coordinator for your entire process tree, regardless of whether or not any individual process in the process tree supports it, as long as a recursive descendant knows how to communicate via the fifo. This is a benefit over the classic anonymous pipe for two reasons:
|
Ah, neat, thanks. The |
Let's meet again, same place, next year. |
|
@jhasse After reading all of the comments here, I'm under the impression that the pull request #1140 is finished for more than half a decade, and just rebased every year. @stefanb2 Please correct me if I'm wrong, but it seems like you are waiting for a decision for either
|
I do not know @jhasse exact point of view on the topic, but I can see several issues with the PR:
Ideally, Ninja would implement an asynchronous loop API that would allow to wait for several i/o conditions and timers concurrently in the main thread, and act upon it, and SubprocessSet and TokenPool classes would be all users of it, but that's probably for another PR.
|
@digit-google it would be productive to make review comments directly on the PR. Preferably any time in the past 8 years, but no time like the present! :) Note that regression testing can be somewhat accounted for by the fact that an extremely large number of people have been running the patchset in production for years now. |
I agree, but I was responding to @degasus who was asking in this thread what could be changed in the PR. However, I'll add similar comments there too. It is great that the current patchset has been working well, and I encourage putting actual metrics, like number of users, build performance improvement times, in the actual PR description and final patchset. However, unit and regression testing is about ensuring that future Ninja changes do not break its behavior unexpectedly. Given the complexity of the feature and the fact that is changes how Ninja interacts with its runtime environment, unit-testing is not enough. But that's just my humble opinion. |
You can say the same thing about all the existing functionality ninja has. My opinion is that it isn't fair or reasonable to ask this PR to be a special exception, but it would be fair iff someone wrote an end-to-end testing suite, then asked the jobserver PR to include jobserver coverage in it. |
Frankly, that PR would be fine to me, even without a full regression test suite, if it didn't spread tricky signal-handling code in what looks like unrelated parts of the source tree. This is a hackish design that is bound to be a maintenance nightmare for anyone that accepts that in their git repository. I assume that's why @jhasse, who has very very limited bandwidth to maintain Ninja, has not felt confident in accepting it. And for full disclosure, I am not an official Ninja maintainer in any way, but I maintain my own Ninja fork for the Fuchsia project in order to support a number of important additional features. While I do plan to implement jobserver support there to, this will not be based on this PR for exactly this reason. |
The short answer: I'm not waiting for anything. The long answer: This contribution is a side product of the migration of the internal code base at my former workplace to Android N. Android N build system introduced the kati-ninja-combo, which had severe negative impacts on build performance. These were not acceptable for the company, so I looked into adding jobserver client support to ninja. This turned out to be rather simple and the build performance problem was solved. As the resulting changes were already paid for, I requested for permission to contribute them upstream. IMHO there is nothing for me to do. Either
|
Kitware (CMake's authors) also maintain https://github.com/Kitware/ninja which is a fork/build with this PR, and the ninja you can install from PyPI https://pypi.org/project/ninja/, is actually this fork. |
Ditto. We have been using our fork as both (1) a staging area for features in review and (2) the version built and distributed1 on PyPI. For context, the distribution of both Footnotes |
Ninja has a PR for adding make jobserver support [1] that has been a widely debated PR for many... many years. Given that many people have forked to incorporate this PR, and it claims to solve a problem we have (OOM on gcc processes) it seems like it would be worthwhile using a well maintained fork instead of the main project. This is not a one way door. If we find that the project goes unmaintained, doesn't build, or otherwise has problems, we can always go back to using mainline. Of the forks that have pulled this in, there are: The Fuscia project [2] Their targets seem more specific and less generic, although their improvements seem more extensive. Kitware [3] Maintains a fork of ninja Docker [4] [1] ninja-build/ninja#1139 [2] https://fuchsia.googlesource.com/third_party/github.com/ninja-build/ninja/+/refs/heads/main/README.fuchsia [3] https://github.com/Kitware/ninja [4] https://github.com/dockbuild/ninja-jobserver ''' EXTRA_OEMESON_COMPILE:append = " \ --ninja-args='--tokenpool-master=fifo' \ " PARALLEL_MAKE = "-j 20" BB_NUMBER_THREADS = "20" ''' Signed-off-by: Ed Tanous <ed@tanous.net>
What's the current status on this? I'm interested in it from the meta build system perspective, where many different projects written in different languages and using different build systems are all compiled in a coordinated manner. Without make jobserver client support in ninja, meta build systems are forced to make one of the following terrible trade-offs:
If Ninja and other build systems supported the jobserver protocol, there would be another option:
To my knowledge, Ninja is the only real hold-out to make this a practical possibility. GNU Make and Rust's Cargo already support being jobserver clients. |
The current status is that after @stefanb2's PR died the death of eternally pending review, @hundeboll reimplemented it two weeks ago in #2450 and it has been approved and scheduled for inclusion in ninja 1.13.0 (but the merge button hasn't been hit). No jobserver master support, only client support, but this is probably not a worry for you. It would be nice if the new PR had linked to the issue as well but it is what it is. |
As long as ninja is the only build execution tool, the current
ninja -jN
implementation works fine.But when you try to convert parts of an existing recursive GNU make based SW build system to ninja, then you have the following situation:
Simply calling `ninja -jY' isn't enough, because then the ninja instances will try to run Y*N jobs, plus the X jobs from the GNU make instances, causing the build host to overload. Relying on -lZ to fix this issue is sub-optimal, because load average is sometimes too slow to reflect the actual situation on the build host.
It would be nice if GNU make jobserver client support could be added to Ninja. Then the N ninja instances would cooperate with the M GNU make instances and on the build host only X jobs would be executed at one time.
The text was updated successfully, but these errors were encountered: