Warn and/or deprecate depfile rules that generate their own edges #1303

Open · taktoa opened this issue Jul 17, 2017 · 53 comments

Comments

taktoa commented Jul 17, 2017

Note: the text below is slightly modified from a series of messages I wrote on the ninja-build IRC channel. Since no one replied to them, I figure that it'd be better to put them in a more permanent place.

I am currently developing a tool that generates a Nix package (derivation) from a Ninja file and a source path, thus allowing reproducible/incremental/distributed builds of anything that can emit a build.ninja. Unfortunately, I've run into what seems to be a dire problem, one that is a direct result of what I see as a misfeature in Ninja.

Essentially, the fact that it is possible to use a depfile rule to generate dependencies of that self-same rule is a big problem. This wouldn't be so bad if it weren't for the fact that the depfile documentation basically recommends this style (assuming I am not misunderstanding the semantics of GCC):

rule cc
  depfile = $out.d
  command = gcc -MMD -MF $out.d [other gcc flags here]

CMake generates rules like this, and I suspect many of the other Ninja generators I'd like to accept input from do as well, since it's the "obvious" way to get perfect dependencies in a C/C++ project. Admittedly there is a benefit in performance by having one rule instead of two.

The reason this basically makes my project impossible is that you can't make a clean phase separation between rules-that-generate-build-edges and rules-that-are-expensive. You end up having to build all the depfiles twice: once with all the files available, and once sandboxed with the correct dependencies, which destroys incrementalism. Note that, AFAICT, adopting the style I am suggesting does have correctness benefits: as it stands, it is possible for builds to be nondeterministic if you forget to add enough order-only dependencies.

tl;dr: I'm suggesting that instead of doing gcc -MMD -MF $out.d [other flags], you split it into two rules: one that only outputs the dependency data, and one that actually compiles the file. ninja could also disallow (or discourage) people from writing these self-dependent nondeterministic rules.
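
For concreteness, here is a minimal sketch of the split style described above, assuming GCC-style flags; the rule and file names are only illustrative:

# Cheap rule: computes only the dependency information (rerun freely).
rule ccdeps
  command = gcc -MM -MF $out [other gcc flags here] $in

# Expensive rule: the actual compile, consuming a depfile produced by a separate edge.
rule cc
  depfile = $dep
  command = gcc -c $in -o $out [other gcc flags here]

build foo.o.d: ccdeps foo.c
build foo.o: cc foo.c | foo.o.d
  dep = foo.o.d

Here the dependency scan and the compile are separate build edges, so a tool consuming the Ninja graph can treat the scan as a cheap preprocessing step and keep the compile sandboxed and incremental.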

If I'm wrong about this being an issue, please let me know; I'd be ecstatic.

zlolik commented Jul 18, 2017

@taktoa, I do not clearly understand why you need a separate depfile phase. But I want to add that some compilers cannot generate a depfile while compiling, so I had to write a rule with a separate call to $CXX -ppd=$out.d (like gcc -MMD -MF $out.d):

rule compile_c6x
  command = cmd.exe /c @"$rspfile"
  depfile = $out.d
  rspfile = $out.bat
  rspfile_content = @echo off && cd $pos && "$CXX" $CXX_FLAGS ${opts} -ppd=$out.d -c $in && "$CXX" $CXX_FLAGS ${opts} -c $in

taktoa commented Jul 18, 2017

Basically, the only way to get something similar to the depfile feature in Nix is through a feature called "import from derivation" (IFD). This essentially allows you to include values in a Nix expression that were computed by building something (normally, all the information in a Nix expression is static, but this allows phased computation of a Nix expression). So my plan for converting depfile to Nix was that rules including it would turn into Nix derivations that compute depfile.nix files, which would then be IFDed into the build graph. Note that in the Nix derivation I'm computing from a Ninja build graph, each build rule is sandboxed in an environment where it only has its inputs available. This is mandatory to make the build incremental with Nix. This means that the build will fail unless we have perfect dependencies.

So if the depfile rules have correct dependencies, i.e.: depfile rule depends on foo.c and only computes a Makefile, and then a separate rule depends on foo.c and that Makefile and computes foo.o, then there is no problem. But if a depfile rule will fail unless it has all the files in scope (because it also happens to be building foo.o), then I have to assume that all depfile rules need access to every file in the source, which means that every time any file is changed those rules will need to be rebuilt, which likely means a rebuild of everything.

EDIT: if you are familiar with Bazel, Nix has a similar basic design, except that you cannot run nix-build inside a Nix derivation, which is why IFD exists.

evmar commented Jul 19, 2017

Ninja is an intentional compromise between correctness and performance, and I'm afraid this behavior might be by design. One of our earliest clients was the WebKit build, where a given build step could process a set of IDL files and generate an unpredictable set of header files (the file names were derived from macros within the IDL files which contained conditional logic too) that would then affect downstream compilations. In Ninja this is modeled as the header-generator writing out a stamp file, and downstream dependencies only depend on the stamp file because they don't know which headers they'll get.

Note that this isn't to say you aren't allowed to fully specify dependencies! If you want, you can make every build rule list out all its inputs:
build foo.o: cc foo.cc | stdio.h /usr/include/bits/whatever.h /usr/include/a/thousand/more/files.h
but rather that Ninja must support situations where listing all the dependencies isn't possible.

Note that by the rules of C, adding a header file anywhere can have far-reaching consequences: if you have an include "foo.h" somewhere and you stick a different foo.h somewhere else in the include path, the set of files the compilation reads can change. Because of this I am not sure that sandboxing does what you want for developing C code. Say I have some source file that has some includes and I build it and its depfile, then use sandboxing to ensure that it only accesses those files. Then suppose I change that file -- say, add a #define that affects which headers it involves. The sandbox is now pinning the file set to the wrong files, and updating the sandbox file list to the correct one requires running the compiler again unsandboxed and letting it explore your file system.

The whole purpose of a depfile is to write down the inputs that were used to generate an output that couldn't be known ahead of running a command on that input. For your purposes it sounds like what you really want is to disallow depfiles entirely, because you always want file lists fully specified. That seems fine enough -- do your pass over all the source to determine these things (effectively compile the entire application once, writing down which files it used to do so) and then generate an appropriate build.ninja file (which I guess you'd use to build it again later?). But if you're asking for CMake-generated ninja files (or those generated by other tools) to not use depfiles, I think it's not possible.

(To be clear, I wish all the above weren't so -- it would be nice if C compilation were more regular and predictable. But Ninja is constrained by what real code does. The whole depfile feature is a bit of a concession to C wackiness.)

taktoa commented Jul 19, 2017

Well you can still use depfile in such a way that allows incrementalism. Of course, the part where you run gcc -M will not be incremental; it will be run every time you build, but the important thing is that the expensive steps, where you actually compile, are incremental. The current state of affairs is that the cheap dependency computation is mixed with the expensive build. Note that if the generated depfile ends up being the same, the corresponding Nix derivation (after importing) will have the same hash, so it will be a cache hit. Of course, when you modify foo.h, that modifies one of the leaves of the dependency graph (these are "fixed-output derivations" in Nix parlance), so Nix will end up rebuilding the transitive closure of foo.h. In short:

  1. A Nix derivation is a complete description of how to build something in a sandbox, including environment variables, dependencies on other derivations, etc. Of course, we still need to sometimes do impure things in a Nix build, like downloading a file, and in those cases we are required to specify the hash of the output (any observable nondeterminism invalidates the assumption made by normal derivations that the hash of how to build something fully determines the hash of the output). These impure derivations are called fixed-output derivations.
  2. We rerun all depfile rules every time we build to compute correct dependencies. This is done at Nix evaluation time, via the import-from-derivation feature.
  3. A cache hit for a derivation build depends on its derivation hash matching something already existing in the Nix store, so the sandbox issues you are mentioning don't apply.
  4. If running all the depfile rules takes exactly as long as actually building the program, then there will be no incremental speedup.

Ericson2314 commented Jul 20, 2017

@evmar As a straw man, one could do all preprocessing every time anything changes, and then be perfectly incremental from there on out. Preprocessing isn't free, but it would probably still be a good deal. (This requires content-addressed caching, not timestamp-based caching.)

taktoa commented Jul 20, 2017

Yes, that is precisely what I was getting at in my last comment. We should encourage people to use depfile in a way that makes it possible to treat it as a cheap preprocessing step, but instead it is typically used as part of the same rules that actually do the build, so preprocessing becomes the same as building.

Ericson2314 commented Jul 21, 2017

@evmar

Then suppose I change that file -- say, add a #define that affects which headers it involves. The sandbox is now pinning the file set to the wrong files, and updating the sandbox file list to the correct one requires running the compiler again unsandboxed and letting it explore your file system.

The key is not "unsandboxed" altogether, but just a wider sandbox. Let not the perfect (non-shitty preprocessing) be the enemy of the good (re-running the cheap command, pseudo-preprocessing to make the deps file, to allow incrementally ratcheting the sandbox tighter).

ilyapopov commented Jul 21, 2017

Personally, I consider this feature a big competitive advantage of Ninja. I will be saddened if it changes. I believe this change would make life harder for a lot of people for the benefit of one (supposedly small) use case.

As a solution, you may split these rules yourself. Also, I recommend you have a look at the build2 build system. The author is trying to solve a slightly different problem (distributed builds), but it seems similar enough (you need to transfer a complete set of inputs from a control node to a worker node and then transfer all the results back). IIRC, he went the route of separating the preprocessing step and the compilation step (and discovered a lot of caveats there). Their task, however, is somewhat easier, since they have complete control over the build system rather than converting from another one. You may also take some inspiration from how ccache works.

ilyapopov commented Jul 21, 2017

And, by the way, you seem to be Haskell guys. Did you have a look at the Shake build system? It can already read and execute ninja files, so you don't need to redo that work yourself.

evmar commented Jul 22, 2017

I think we mostly understand each other at this point, so now I'm not sure what you want out of this bug. It sounds like it's not sufficient for your goals to change how Ninja itself behaves, because you need builds to be broken into additional steps and have them pass varying flags to the compiler (which also means identifying which commands are compilations) and it can't work on Windows etc.

Bringing up Shake is a great idea! Shake is super cool and could be a nice way to approach this problem -- you could use it to deserialize the build rules, add in your adjusted steps, and then execute the modified build. I think your other alternative is to try to figure out whether you can generate alternative build files based on your needs, but it's not clear to me where your input files are coming from.

taktoa commented Jul 22, 2017

@evmar

Yes, I'm aware of Shake, and would have loved to use it if not for the fact that Shake is far too powerful for this purpose; I would have to be able to turn arbitrary Haskell lambdas into Nix expressions for this to work! The issue is essentially that Shake is monadic, whereas I need an applicative build system.

I have a few proposals:

  1. Create a warning during Ninja execution that occurs whenever a build edge brought in by depfile touches the node that caused its generation.
  2. Change the example in the Ninja manual to use the split style, so people in the future don't use this feature unless they really need to.
  3. If we establish that the performance impact of splitting these rules is too much, we should investigate ways to mitigate that impact. For example, if Ninja optionally depended on libclang, we could do the preprocessing without even a single fork. I find it likely that we could be competitive with the current solution through such methods.

@ilyapopov can you elaborate more (or link to a source) on the issues the build2 people ran into with separating preprocessing and the actual build?

ilyapopov commented Jul 22, 2017

@taktoa

Create a warning during Ninja execution that occurs whenever a build edge brought in by depfile touches the node that caused its generation.

This is how CMake (which is arguably the most popular C/C++ build system nowadays) generates ninja files. Do you propose to generate a warning for every file in half of all C/C++ projects?

can you elaborate more (or link to a source) on the issues the build2 people ran into with separating preprocessing and the actual build?

From what I remember:

  • C grammar is such that the result of separate preprocessing/compilation may be different from doing it in one go.
  • Some error/warning messages may become imprecise

See the discussion on reddit here and on the clang mailing list starting from here.

taktoa commented Jul 22, 2017

Of course, we can also have a flag that disables the warning. Think of it as a default "lint" option. As it stands, people who generate Ninja probably don't even think about the issue. I am probably also going to talk to the CMake folks about an option that allows the 2-rule generation mode as well as the 1-rule mode.

You realize that with regard to preprocessing, I am not suggesting literally running the C preprocessor separately from the C compiler. I am merely suggesting that we run the C dependency computation step (e.g. gcc -M) separately from the C compiler. This will of course be exactly as expensive as preprocessing, but it will not affect the compile in any observable way. Nothing in the C language, as far as I know, can create #include directives after preprocessing, so the file dependency graph we get from this will be exactly as complete as the one we get normally.

ilyapopov commented Jul 22, 2017

You realize that with regard to preprocessing, I am not suggesting literally running the C preprocessor separately from the C compiler.

Yes I do. You asked what build2 devs encountered, I replied.

taktoa commented Jul 28, 2017

I raised another issue on CMake regarding adding the aforementioned option: https://gitlab.kitware.com/cmake/cmake/issues/17114

jimon commented Jul 28, 2017

Maybe instead of shoehorning the Nix use case into Ninja, we can try to parse a given build.ninja file and create two files from it (aka the split style): one to figure out all dependencies, and another, depfile-free, just to build stuff. It's also possible to reject "invalid" dependency-generation rules (those that trigger recursive graph re-evaluation), or at least warn about them, at this step. I have some experience writing a build.ninja parser in Python, and it should be fairly simple in any given language.

The main feature of Ninja for me is "I changed a file in a 10k-source-file project, BUILD IT FAST", and I don't want to give up this feature just for the cases when I need reproducible builds, because these are two different use cases.

taktoa commented Jul 28, 2017

Yes, I've considered that approach, but the entire point of using Ninja for this purpose is that I could be language-agnostic while avoiding the pitfalls of, for example, Makefile. I don't want to have to implement a flag parser for every conceivable compiler out there, instead of simply getting the flags right to begin with.

In most cases, we're generating Ninja anyway, so it's really not that hard to add options to each Ninja generator such that they can generate a reproducible file.

Regarding parsing Ninja, I've already handled this in my fairly comprehensive language-ninja library.

Also, I'm not convinced that the slowdown would be as drastic as you claim. In fact, you could specify the dependency file build as an order-only dependency, and then also generate the dependency file during the rule that builds the object file, so the only real slowdown would be the fact that the number of build edges could double (in the worst case); there would be no slowdown due to building things that you wouldn't normally rebuild. Order-only dependencies just get converted to normal dependencies in ninja2nix, so there's no correctness loss here.
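
A sketch of that arrangement, assuming GCC-style flags (names illustrative): the depfile gets its own cheap edge, and the compile edge order-only-depends on it while still regenerating it itself, so a refreshed depfile never forces an object rebuild on its own:

# Cheap edge that produces the depfile by itself.
rule ccdeps
  command = gcc -MM -MF $out [other gcc flags here] $in

# Compile edge: order-only dependency on the depfile edge, and it also
# rewrites the depfile as a side effect, exactly as in the single-rule style.
rule cc
  depfile = $out.d
  command = gcc -MMD -MF $out.d -c $in -o $out [other gcc flags here]

build foo.o.d: ccdeps foo.c
build foo.o: cc foo.c || foo.o.d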

taktoa commented Jul 28, 2017

Fundamentally this issue is about documentation. I want people to be aware of the tradeoff they are making when they write a Ninja generator this way. Also, if someone adds an option for generating split rules to their Ninja generator, they actually get a tangible benefit, even if they give zero shits about Nix: they can now run their build through ninja2nix, and if it fails to build, they can be pretty sure that they failed to specify their Ninja dependencies correctly!

taktoa changed the title from "Warn/deprecate depfile rules that generate their own edges" to "Warn and/or deprecate depfile rules that generate their own edges" on Jul 28, 2017
taktoa commented Jul 28, 2017

So, to clarify all of the above, all I'm interested in vis-à-vis the Ninja project itself is that, if I make a pull request adding the following changes, it would be accepted:

  1. Add a warning to Ninja that detects cases in which the right-hand side of depfile = ... is not also a dependency of the corresponding build rule (see the sketch after this list). This warning can be disabled by adding a special top-level variable to your Ninja file, if you're certain that this is the behavior you want. If this is too controversial, we could make it "opt-in", though I think that defeats most of the purpose.
  2. Add some documentation about this issue.
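
For illustration, the kind of edge the proposed warning would flag is the current single-pass style, where the file named by depfile = is written by the same edge that consumes it rather than being produced by a declared dependency; the opt-out variable shown is purely hypothetical:

# Would trigger the proposed warning: this edge both writes and reads foo.o.d.
rule cc
  depfile = $out.d
  command = gcc -MMD -MF $out.d -c $in -o $out

build foo.o: cc foo.c

# Hypothetical top-level opt-out from point 1 above, for builds that cannot split the rule:
# allow_self_depfile = 1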

I really don't think these goals are as drastic or unconventional as some people in this thread are making them out to be. I have no intention of "shoehorning the Nix use case to Ninja", nor am I removing a "competitive advantage of Ninja". No features would be impacted.

Another way of looking at this issue is that I want to encourage people to write Ninja files that have a correct dependency graph even when Ninja is only run once.

mathstuf commented Jul 31, 2017

How would this behave with Fortran support (#1265), where outputs need to be computed as well, not just inputs?

mathstuf commented Jul 31, 2017

Also, what about tools that do not support separate depfile/compilation steps? Are they just doomed to generate warnings forever? Not all depfile rules are guaranteed to be C/C++ compilers…

taktoa commented Jul 31, 2017

@mathstuf

I addressed that in my last comment.

This warning can be disabled by adding a special top-level variable to your Ninja file, if you're certain that this is the behavior you want.

mathstuf commented Jul 31, 2017

Why are we warning by default for behavior that cannot be avoided? To me, it's similar to a C++ compiler complaining that it doesn't like your function names even though you have to adhere to some external spec about how they work…

taktoa commented Jul 31, 2017

It's avoidable in most cases, so we should encourage people to avoid it. If you can't avoid it, it's a one-line fix. I don't see what's unreasonable about this.

mathstuf commented Jul 31, 2017

Maybe the flag could be set on a rule, saying "I know my depfiles are 'wrong', but we can't do it the Right Way, so ignore me", rather than being global?

taktoa commented Jul 31, 2017

Sure! Greater granularity would be useful as well.

mathstuf commented Aug 1, 2017

How do you envision this working with generated source files where a tool is built, run, and then the outputs of it are finally written?

taktoa commented Aug 1, 2017

@mathstuf Can you give a specific example of a dependency graph that you are concerned about? I suspect there is no issue.

bendlas commented Oct 3, 2019

I had independently been thinking about this approach while working with the (humongous) Chromium build in Nix. It is already crushing our build servers, and with Chromium devs getting rid of their jumbo flag, it's bound to get even worse; we would also like to have variants at less than the full cost. Glad to find somebody already talking about it.

How do you envision this working with generated source files where a tool is built, run, and then the outputs of it are finally written?

I think we could use overlayfs to cheaply capture and recombine all the "incidental" outputs of a rule. Not sure if this answers the question exactly.

How would this behave with Fortran support (#1265), where outputs need to be computed as well, not just inputs?

It makes sense to classify ninja rules into two categories:

1. rules where we have a cheap way to find exact dependency info ahead of time, e.g. gcc on C/C++
2. rules we don't know about, which might also use the full power of ninja

Of course, (exact) category-1 rules could only depend on other category-1 rules. Category-2 rules would need a full reconstruction of all their inputs (including the full source tree) to work reliably. Fortunately, we might be able to break out large swaths of category-1 leaves from any build tree and reasonably expect those to make up a large chunk of build time.

I think the focus of this ticket should be on anything preventing us from realizing this separation. Does this make sense?

evmar commented Oct 3, 2019

I think there might be some misunderstanding of my comments above about how Ninja is designed around depfiles.

The fundamental design constraint of how Ninja builds means that if A must be built after B -- for example, if B generates a header file that A uses -- then the input .ninja file must have that dependency already written out, before the source of A and B are examined or any depfiles exist. This means that depfiles are by definition not necessary to build.

They are only necessary to correctly rebuild after changing a file in an already-completed build. For this reason, I don't think there's much value in trying to compute depfiles in some sort of separate pass. For your purposes (end-to-end builds?) you might be able to get away with just removing use of depfiles entirely.

Another way of saying this is that the only reason the depfile feature exists is exactly to implement the above semantics. If you want some other semantics, and in particular if you have a pass that "computes all the deps of everything exactly", then I think you can remove depfiles entirely from your build.ninja file and instead just write down all the build edges you have using the ordinary ninja syntax for dependencies.
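
For example, once a prior pass has computed the exact header set, the depfile machinery can be dropped and the edges written statically with ordinary Ninja syntax (a sketch; file names illustrative):

rule cc
  command = gcc -c $in -o $out

# All discovered headers listed as implicit inputs; no depfile needed.
build foo.o: cc foo.c | foo.h bar.h /usr/include/stdio.h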

Ericson2314 commented Oct 3, 2019

@evmar

I'm not sure how to not come off a bit judgemental with this; apologies in advance.

So, from the perspective of the Nix people in this thread, the idea that "builds" vs "rebuilds" work differently is... nuts. We have build caches shared between hundreds of computers that last indefinitely, and so we must be extremely cautious about sandboxing and cache invalidation; namely, we only cache "pure functions" which depend precisely on their inputs. Any rule that works differently for builds and rebuilds is emphatically not a pure function, and we don't know how to cache it safely.

It may be quite different from what Ninja does today, but I am sure it is possible to take advantage of the depfile stuff in a pure / stateless manner. The trick is to think of "dynamic dependencies": rules that produce rules. There is precedent for this: https://shakebuild.com/ does it (though not in a way I consider adequate, for reasons that are probably not relevant to discuss), https://github.com/apple/swift-llbuild (https://llbuild.readthedocs.io/en/latest/) takes some inspiration from Shake and I believe has it on the roadmap, https://build2.org/faq.xhtml has some hardcoded logic for preprocessing and dependency generation that is similar to what we'd like to end up with (but not hardcoded into ninja in a layer-violating way!), and finally, as an alternative to the "import from derivation" that works in Nix today, I wrote NixOS/rfcs#40.

What I'd advocate for Ninja is (surprise!) similar to that proposal I linked.

The fundamental design constraint of how Ninja builds means that if A must be built after B -- for example, if B generates a header file that A uses -- then the input .ninja file must have that dependency already written out, before the source of A and B are examined or any depfiles exist. This means that depfiles are by definition not necessary to build.

Right, so this makes total sense and is a good design, since Ninja currently is really about static dependencies. To support dynamic dependencies you need "rules that produce rules". There should be stub rules which depend on the rule-producing rules, so Ninja knows what dynamic rules are needed and which should be in scope (modifying a single global rules db also ruins determinism, so one should instead keep a stack of dynamically created rules).

Does this make sense? I am happy to sketch out a fuller design or whatever if you are interested. Dynamic dependencies are extremely useful beyond all the use cases mentioned here. There's a bunch of consternation at the big corps about how to include Rust crates.io and other outside-ecosystem dependencies, for example, which is fundamentally a dynamic dependency problem. I could go on and on; I think the benefits of showing the world how to get this right are huge.

evmar commented Oct 3, 2019

I totally get that Nix has goals around correctness, statelessness, etc. But I think those goals are in conflict with Ninja's main goal, which is to make whatever correctness compromises are necessary in order to be fast (in the environment where its users are working, which is usually not Nix). In particular for C++ we didn't want to pay to parse all the code twice (once to extract dependencies, and then again as part of compilation). The build2 FAQ you linked (where it talks about why Ninja is faster) is exactly right.

There are lots of build systems that make very different points in this design space! Even the tools that generate Ninja files make different decisions here: when extracting header dependencies, in principle your app that imports stdio.h should have a dependency edge on /usr/include/stdio.h and its dependencies, and whether those "system-level" dependencies get included in the depfile or not is a gcc flag outside of Ninja's control. And beyond that you should also depend on the .so files that are fed into gcc, and so on, and I think no gcc flag will give you that.

I think in particular Bazel does a lot of work to manage the above and may even get it right (hermeticity is so hard!). I also think Shake is super interesting and I love it whenever I read Neil's comments. But I also think that the place I chose to lie in the design space is exactly why Ninja is chosen over Bazel and Shake in many projects.

I think calling values other than the ones you have "nuts" is something to apologize for, but mostly because it is bad engineering. Surely you can recognize that all of engineering is tradeoffs -- for any design (hermeticity! content hashing!) a good engineer evaluates what it gains against what it costs. The tradeoffs we made were very intentional: sacrifice X so that we get Y. In a different project I might care about X more, and in such a project I wouldn't use Ninja.

This is repeating my comment above, but I suspect tackling this at the level of trying to parse Ninja files or changing Ninja semantics is doing it at the wrong level. The whole point of Ninja is that your higher-level build system gets to choose its semantics, and so if you desire different semantics you might as well build that thing and plug it into those systems. Someone rewrote all of Ninja from scratch in C in a few thousand lines; there's really not a lot there.

In any case, I am sorry to ramble on this thread. I have handed Ninja on to new maintainers and they can take it in whichever direction they choose. My hope is that the above at least helps you better understand where I was coming from.

Ericson2314 commented Oct 3, 2019

I totally get that there are compromises, but I don't think speed is what's being given up. (It would be simplicity; dynamic dependencies are more work to implement.)

Remember, nothing compels the user to use the new rules. Nor is it Ninja's fault that GCC cannot generate dependencies and write down intermediate files so it doesn't need to repeat work. (The rule-producing rules could have multiple outputs.) Sandboxing does indeed slow things down, and requires non-portable OS-specific code, and I don't advocate Ninja do that. I'm happy that Nix and Bazel and whatever else bear that burden. [I carefully put forth my designs such that Nix might in fact implement all of Ninja, including this, too.]

This is repeating my comment above, but I suspect tackling this at the level of trying to parse Ninja files or changing Ninja semantics is doing it at the wrong level. The whole point of Ninja is that your higher-level build system gets to choose its semantics, and so if you desire different semantics you might as well build that thing and plug it in to those systems. Someone rewrote all of Ninja from scratch in C in a few thousand lines, there's really not a lot there.

Right, so as I see it the core of Ninja (not the depfile stuff) is good in that it supports a wide variety of implementation strategies. And indeed, the Ninja file format is now a lingua franca of sorts between various build systems. Only the depfile stuff runs afoul of these stateless purity goals. Frankly, it's hard to define what those semantics are, to the point that different implementations might well give different results!

No one is saying to remove them outright all of a sudden, taking away that choice. But providing an alternative (rules producing rules) gives the higher-level build system more choice. There's tons of disagreement and confusion about how dynamic dependencies actually work, and Ninja, given its prominence as the definer of the format that other tools use, has the power to mediate the conversation.

jonesmz commented Oct 3, 2019

In particular for C++ we didn't want to pay to parse all the code twice (once to extract dependencies, and then again as part of compilation).

Unless I'm misinformed, support for C++20 modules requires that all C++ code in a project be parsed to map modules to source files, before compilation.

Ericson2314 commented Oct 3, 2019

Right, the dynamic dependencies situation is especially delicate when build systems assume dumb compilers and compilers assume dumb build systems. C++20 modules require some agreement on dynamic dependencies (or hand duplication), IIUC, so it's a very good time to try to get the ecosystem unstuck by creating a new feature that other build systems will emulate and compilers will take advantage of.

jonesmz commented Oct 3, 2019

I was just involved in a discussion on reddit.com/r/cpp about Ninja, and modules, and dynamic dependencies yesterday.

As a frequent user of Ninja, I prefer completeness and correctness by default.

If I want speed over completeness / correctness, then I, as a user, should be required to provide an explicit flag indicating that.

Ericson2314 commented Oct 3, 2019

reddit thread link?

jonesmz commented Oct 3, 2019

reddit thread link?

I suppose that I should point out that the majority of text in the discussion came from my keyboard. shrug, just one of those things.

https://www.reddit.com/r/cpp/comments/da5ttn/a_look_into_building_c_modules_with_a_scanner/

Unless I'm misinformed, support for C++20 modules requires that all C++ code in a project be parsed to map modules to source files, before compilation.

I haven't thought about this very much, so I might be mistaken, or other people might have already pointed this out:

One could theoretically do a "trial compilation" of all of the source files before a module-mapping database is created, and have the compiler spit out any module information regardless of whether the compilation of that cpp file fails during the "trial" compilation, and then later run a "real" compilation (if the trial compilation did not succeed for that translation unit).

This would allow projects that don't use modules at all to achieve the same speed that Ninja has now.

But projects with modules need to know the module mapping strictly prior to compilation finishing.

Ericson2314 commented Oct 3, 2019

Oh! Crawling that thread I found #1521. Looks like what I want (at first glance) has been implemented!

mathstuf commented Oct 3, 2019

Oh! Crawling that thread I found #1521. Looks like what I want (at first glance) has been implemented!

Eh. I'm not so sure about that in practice. The scan rules for C++ modules have depfile = associated with them so that the scanner knows when it needs to rescan the sources, because a header might add a #define SOME_IMPORTANT_DEFINE that makes a different module be imported. It just kicks the can down the road a bit. But scanning is much faster than compilation, so keeping those as non-hermetic rules is not so bad.
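
For reference, the scan-plus-dyndep arrangement being discussed looks roughly like this (a sketch based on the dyndep documentation; the scanner command and the file names are stand-ins):

# Scanner edge: emits a dyndep file describing module outputs/inputs, and uses
# a depfile itself because #include still determines which files it reads.
rule scan
  command = cxx-scanner $in -o $out -MF $out.d
  depfile = $out.d

rule cxx
  command = g++ -c $in -o $out

build foo.dd: scan foo.cxx

# The compile edge learns extra module dependencies from foo.dd at build time.
build foo.o: cxx foo.cxx || foo.dd
  dyndep = foo.dd

# foo.dd itself would contain something like:
#   ninja_dyndep_version = 1
#   build foo.o | foo.pcm: dyndep | bar.pcm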

Ericson2314 commented Oct 3, 2019

@mathstuf What exactly are you disagreeing with? That I found what I wanted? That I would always use dyndeps instead of depfile?

Ninja doesn't sandbox at all, so I wouldn't consider one of those more "hermetic" than the other. Rather, dyndep makes initial builds and rebuilds work the same way. I really like that. Precisely because scanning is so much faster, I am willing to wait for it before that initial build.

bendlas commented Oct 3, 2019

I suspect tackling this at the level of trying to parse Ninja files or changing Ninja semantics is doing it at the wrong level.

I tend to agree, if only because ninja downstreams are unlikely to accept a fundamental restriction on their dynamic dependencies. However, anything we can do to make more of the dynamic dependency graph discoverable could benefit every ninja user, especially if we have a way to distinguish "complete" rules.

Maybe it's as simple as having a mechanism for ninja users to say how to generate their depfiles and brushing up ninja query.

And beyond that you should also depend on the .so files that are fed into gcc, and so on, and I think no gcc flag will give you that.

This is what Nix already gives you. Our problem, really, when trying to split up a build into multiple derivations, is how to get the most minimal set (or even just a reliable superset short of the full source package) of inputs to our recipes, and hence minimal rebuilds. Even with the work on Fortran and C++ modules, it may be unreasonable to expect ninja to completely solve this goal, but maybe it could grow to better support it.

Even if we end up sidestepping ninja for this, I think its place as a build "assembly language" is richly deserved. ninja query is an absolute gem and I'd love to see it become even more useful.

mathstuf commented Oct 4, 2019

That I found what I wanted?

Yes, because the rules which scan need depfile = themselves, since they don't know all their input file paths. Basically, #include still exists even with modules.

Ericson2314 commented Oct 4, 2019

I would use dyndep for that too. There are ways, though they would benefit from a different implementation of the C preprocessor.

Ericson2314 commented Oct 4, 2019

@bendlas Did you see #1303 (comment)? Ninja now has something I like better. We could turn that into IFD, but it would probably be even better to turn it into NixOS/rfcs#40, which is a bit more principled than the "nix-build within a derivation" recursive Nix.

bendlas commented Oct 4, 2019

I did see that comment, and I understand that in some parallel universe where Chromium had already migrated their depfiles to dyndeps, this might solve my problem at hand as well.

However, it's unreasonable to expect everybody to drop what they are doing in order to migrate their depfiles to dyndeps, so that we could wait for them to solve our problem. In fact, I find even the hint of that idea preposterous.

Still, thanks for the pointer, I wasn't aware of rfc#40.

So, seeing as ninja already provides a migration path to a brighter future, I think we should take this conversation away from the ninja tracker, so as not to impose unnecessary notifications on the -- certainly busy -- maintainers here. Maybe we can come back after we have convinced all the major users of ninja to migrate.

Ericson2314 commented Oct 4, 2019

@bendlas If you have enough cycles to spare, it might make sense to modify Clang to do incremental preprocessing. Basically, whenever an #include is encountered, be able to serialize the state and spit out the wanted file, and resume from the serialized state and provided file. Once that is done, I think CMake, Meson, Chromium's GN, etc. could be enticed to try using that feature. I'll open an LLVM issue and refer to this thread.

mathstuf commented Oct 4, 2019

Basically, whenever an #include is encountered, be able to serialize the state and spit out the wanted file, and resume from the serialized state and provided file.

I don't know if that works. Could you first write up a mock example using awk and other such simple tools to show the flow and an example build.ninja structure? I suspect this won't work because, given a source file, how is a generator supposed to write out the rules to do this incremental preprocessing? Does each source file have… 100 associated rules that chain into each other? What if a file has 101 includes, then? dyndep only fixes dynamic dependencies. Dynamic rules (i.e., adding a new rule at build-execution time) are not supported in ninja at all.

jonesmz commented Oct 4, 2019

Dynamic rules (i.e., adding a new rule at build-execution time) are not supported in ninja at all.

That doesn't seem like a bad thing to add to Ninja

mathstuf commented Oct 4, 2019

AFAIK, that is a fundamental redesign of the code that exists in this repository today. Ninja currently has a "plan" and "execute" phase that are completely separate. Once "plan" is done, the graph can only be pruned of nodes (dyndep allowed it to add edges). Even something relatively simple like #760 isn't possible without a new design, so adding nodes later seems even more prone to implementation problems.

jonesmz commented Oct 4, 2019

That's all fair.

I think there are lots of things that could be done to improve Ninja to allow complex things to be managed more elegantly.

Ericson2314 commented Oct 7, 2019

I'm not sure "rules" are actually more expressive than "dependencies" (cf. compiling an idea to many rules vs. interpreting it with a single universal rule).

Ericson2314 commented Jan 29, 2020

https://www.usenix.org/system/files/atc19-fouladi.pdf was recently published, and does preprocessing in the way I described.
