Evaluate build paths in the context of the build's bindings #94

Colecf · 2023-12-29T01:58:36Z

This fixes an incompatibility with ninja.

I've also refactored Env to make it clearer and remove the TODO(#83), but I actually think we could do signifigant further cleanup. Only rules should require EvalStrings, global variables and build bindings can be evaluated as soon as they're read, although maybe that would change with subninjas. Also, rules currently parse their variables as EvalString, but I think that could be changed to EvalString<&'text str> if we hold onto the byte buffers of all the included files until the parsing is done.

Fixes #91 and #39.

evmar · 2023-12-30T11:38:47Z

Wow, I'm impressed you got all this figured out! I had some ideas here and was about to start tinkering and then I realized I probably should put some more testing in place before I broke something I had done in the past...

I checked in some testing support for benchmarking changes to parsing, see
https://github.com/evmar/n2/blob/main/doc/development.md#benchmarking

It looks like for the LLVM CMake build this patch regresses parse perf by ~60%:

parse benches/build.ninja
                        time:   [219.09 ms 219.36 ms 219.64 ms]
                        change: [+61.143% +61.381% +61.614%] (p = 0.00 < 0.05)

Eyeballing the patch, I'm not sure if it's fundamental to just getting this correct, or if it's something simpler like the stop_at param or needing to avoid allocating some Vecs while parsing.

(Definitely have the feeling that I wish I had thought variables through better when making Ninja, blergh. It was a very long time ago though...)

evmar · 2023-12-30T11:54:42Z

src/eval.rs

+    /// evalulate turns the EvalString into a regular String, looking up the
+    /// values of variable references in the provided Envs. It will look up
+    /// its variables in the earliest Env that has them, and then those lookups
+    /// will be recursively expanded starting from the env after the one that


I believe this isn't right.

The Ninja docs aren't too great but the intent is expansion only ever happens once:
https://ninja-build.org/manual.html#_variable_expansion

I think that is mostly an implementation detail. I mostly made this change so that it was easy to turn an EvalString into the proper string under a variety of different circumstances. (rule bindings vs build bindings) If we want, we can expand out variables at any point in the process.

However I don't think doing early expansion will really help that much. Consider this build.ninja:

root_out = out build ${local_out}/foo: ${local_out}/bar local_out = ${root_out}/local

Here, we have to expand 2 paths. Even if we had preemptively flattened root_out into local_out, we still need to copy the out/local path fragment into two separate strings. So flattening it doesn't save us any string copying time, only some scope traversal time, but since there are a maximum of 4 scopes currently in n2 (implicit, rule, build, global, though this will increase when subninja support is added), I don't think that matters.

That is an interesting argument regarding scope traversal time!

I think the most nested case of lookup (excluding subninja) is a file

var = a rule print command = echo "var:'$var' out:'$out'" build c$var: print var = b$var

which prints
var:'ba' out:'cba'

evmar · 2023-12-30T12:02:13Z

src/eval.rs

-        // TODO(#83): this is wrong in that it doesn't include envs.
-        // This can occur when you have e.g.
-        //   rule foo
-        //     bar = $baz


In Ninja it turns out you can't declare arbitrary vars in a rule block so this example isn't legal.

I tried a construction like

rule print command = echo "varref is $varref" depfile = abc build out: print varref = $depfile

And Ninja expanded varref to the empty string.

...which I think means this worry was misplaced (?)

I don't really understand this comment, (where is x used?) but just from the signature you can tell it was wrong, (You can't convert an EvalString to a String without any additional input, or else you shouldn't have used an EvalString in the first place) so I reworked it so Envs return EvalStrings.

Colecf · 2023-12-30T22:21:40Z

Thanks for the comments, and adding the benchmarking tests. I'll look into the performance regressions. I think the easiest way to make it faster would be to expand the build bindings as soon as they're parsed, but one reason I didn't do that was because the android multithreaded parser will parse chunks of the manifest in parallel. So we could parse a build definition before all the global variables above it are parsed. So if we want to support that eventually, we'd need to leave the ability to have the build's paths and bindings unevaluated.

Colecf · 2023-12-31T03:06:45Z

For my benchmarking I've just been using the "synthetic" build file, I haven't set up an LLVM build.

First I started with some straight optimizations:

Gets rid of the slice.contains(), you were right, it is slow. (or at least, letting the match know about the characters we care about is faster)
Use with_capacity for the evaluated vecs
Reuse a common buffer in read_eval, and then clone it afterwards

With these changes it goes from about 90% slower than HEAD~ to about 20% slower, on the "parse synthetic build.ninja" test. I think this can probably mostly be explained due to the fact that I've moved the evaluation of the build's variable bindings from the loader code into read_build, and that benchmark doesn't include the loader.

I then realized, if we do multithreading, it's essentially always wrong to do any evaluating inside the parser, as the parser is what will be run in the separate threads. So I changed the parser to always return EvalString<&'text str>, and the evaluation happens in the loader. This kindof cheats the benchmark, and now the benchmark says parsing builds is about 15% faster than HEAD~.

I added a benchmark that runs the loader instead of just the parser. With this benchmark, this cl is only 3% slower than before. (And even that might be noise, I've seen 3% differences when running the same benchmark twice on the same code)

So this change ended up turning into a lot of preparation for multithreading, and not just the path evaluation change. Sorry about that.

Colecf · 2024-01-03T19:15:40Z

Actually, this still has some issues. Rule bindings do need to be recursive at least.

Regular ninja also has this validation.

Colecf · 2024-01-03T22:01:12Z

Well, currently n2 doesn't have recursive rule bindings either, so I guess this is ok for now, and we can address that issue in a followup.

I rebased this pr on top of #96, and removed the rule_bindings_arent_recursive test.

I'm making some changes that affect the performance of the parser and the loader, and would like to benchmark them together. Unfortunately this required making a bunch of things public (#[cfg(test)] doesn't seem to apply for benchmarks), which isn't great.

evmar · 2024-01-04T21:12:14Z

I checked in some LLVM build files for your convenience, see tests/snapshot/README.md.

evmar · 2024-01-04T21:20:46Z

I wrote down what I believe the rules are for variable scope here, based on a bit of fiddling with running files through Ninja

https://github.com/evmar/n2/blob/main/doc/design_notes.md#variable-scope

I'm not sure if that's helpful, but I think it at least helped me to try to think it through.

This fixes an incompatibility with ninja. I've also moved a bunch of variable evaluations out of the parser and into the loader, in preparation for parsing the build file in multiple threads, and then only doing the evaluations after all the chunks of the file have been parsed. Fixes evmar#91 and evmar#39.

Colecf · 2024-01-05T00:34:04Z

I checked in some LLVM build files for your convenience, see tests/snapshot/README.md.

Thanks! I hacked together a benchmark test to use them (locally), and discovered this PR was ~15% slower when loading the llvm-cmake files. I found a fix for this by duplicating the inner loop of read_eval into two different versions, one for paths and one not. After that change, loading the llvm-cmake files is ~2% faster than before this PR on my machine. Unfortunately, this change makes a large block of duplicated code, but I couldn't figure out a better way to do it. :/

I wrote down what I believe the rules are for variable scope

I see you wrote "Lookup for rule variables can refer to $in/$out, build scope, then toplevel". This isn't entirely correct, it should be "$in/$out, other rule variables recursively (with cycle detection), build scope, then toplevel". However currently n2 both before and after this PR doesn't have that behavior. I plan to add it in a follow-up after this PR is merged.

This recursive rule binding is useful for depfiles:

rule my_rule:
  command = my_tool --depfile ${depfile} ${in} > ${out}
  depfile = ${out}.d

build out: my_rule main.c

I found this pattern in the android codebase.

Also it might be worth specifying that while build bindings can refer to toplevel bindings, they can only refer to toplevel bindings that were defined before the build.

evmar · 2024-01-07T06:11:04Z

other rule variables recursively (with cycle detection)

Oh yikes, this seems to have happened after my time working on Ninja. It's probably worth figuring out if there is a simpler specific thing we could do here that is less work.

Colecf · 2024-01-07T06:14:11Z

I'm not opposed to not supporting recursive rule bindings in n2, and instead fixing android to not rely on it, if you'd prefer that. This is probably easier to fix with fewer occurrences than build paths taking into account build bindings.

evmar · 2024-01-07T22:47:05Z

My hope is there's some balance between bug-for-bug Ninja compatibility and breaking totally from it.

For example, maybe this could be the rule: in any scope, bindings written earlier can be used in lexically later statements. I think that's easy to understand, it's already how toplevel scope works, and it defines away circularity problems in the other scopes. (Selfishly, that also makes evaluation easier.)

evmar · 2024-01-08T17:33:21Z

I took another pass over my attempt at writing down what the current rules are, which mostly meant removing some of the details (which as you correctly pointed out, I had wrong)
https://github.com/evmar/n2/blob/main/doc/design_notes.md#variable-scope

Colecf · 2024-01-16T18:56:03Z

@evmar Have you had a chance to look at this change?

evmar · 2024-01-18T19:03:47Z

(Sorry for the delay, ice storm here means I've been on childcare all week.)

My overall take is I appreciate the intent and I think the tests are great. I don't love the approach and have an ill-formed idea about a different approach but I can revisit that at some point in the future if I care to.

evmar · 2024-01-18T19:14:44Z

My vague ideas (sorry on five hours of sleep) are something like:

give Env something like get(parents: &[dyn Env]) -> Cow<str>, this lets us evaluate toplevel bindings immediately to strings
when parsing a build statement, accumulate ins/outs in local vars of Vec<EvalString>, but then resolve them to paths via some Loader-like thing and return resolved paths in the Build struct
when parsing a build statement, when we loop as in read_scoped_vars, evaluate each binding to strings while they are parsed
we could do similar inline evaluation when parsing a rule, something like read_scoped_vars(|var, evalstring| match var { "depfile" => rule.depfile = evalstring } (this would also let us error on unexpected vars here)

Sorry that is so vague, I do not expect you to do it, mostly writing it for my own memory!

Colecf · 2024-01-18T19:30:53Z

Thanks!

I think most of your points about doing evaluation earlier are incompatible with multithreaded parsing. You can't evaluate the toplevel bindings as you read them, because there could be variable references to variables from an earlier chunk of the file that's being parsed in a different thread. Multithreading will likely blow any savings we would get from earlier evaluation out of the water as well.

I'll work on a multithreading PR, so you can have a frame of reference for what can be evaluated earlier and what can't.

evmar · 2024-01-19T06:58:22Z

Naively it feels like the generator knows a lot more about the file structure than n2 does, and it could do something around splitting files and using subninja to control scopes such that you don't need to do so much speculation about the file's contents... (?) E.g. even splitting a text file with multiple threads means trying to seek around for newlines, I would think.

Colecf · 2024-01-19T09:27:08Z

splitting a text file with multiple threads means trying to seek around for newlines, I would think.

Yeah, the android fork of ninja essentially looks for a non-escaped newline immediately followed by an identifier.

Having the generator split the file is an interesting idea. But wouldn't it still require deferred evaluation of global variables? If you used includes, you'd have to wait defer expansion of everything after the include, and the contents of the included file. With subninja it's better, but still requires deferring evaluation of the subninja's variables until the parent file has evaluated up until the subninja command.

And it's harder to split a ninja file using subninja because you can't share global variables across them. You wouldn't want a "linked list" of subninjas, where there was 1 subninja statement per file, at the end of every file, because then you wouldn't be able to start parsing that until you fully parsed the file before it, essentially making it a serial again. A better way would be to have 1 top level file that just contained include/subninja statements for all the broken-up files. But that only really works with include, because the subninja files would be unable to share variables. You'd have to have a pretty sophisticated generator to ensure all your variables were available in the correct scopes if you were to use subninja.

It also puts more work on generator authors. Doubly so for android as we'd have to support the splitting in both soong and kati.

evmar · 2024-01-19T16:25:31Z

I think I still don't have a good mental picture of how these work, they are like hundreds of mbs of interdependent globals? 😬

evmar · 2024-01-19T16:30:01Z

The two big samples I have either don't use them at all:

llvm-cmake$ egrep '^\w+ =' build.ninja CMakeFiles/rules.ninja
build.ninja:ninja_required_version = 1.5
build.ninja:CONFIGURATION = Release
build.ninja:cmake_ninja_workdir = /Users/evmar/projects/llvm-project-16.0.0.src/build/

Or in GN's case, they have a subninja per module that each set some at the top:

llvm-gn$ find . -name '*.ninja' | wc -l
     824

where the largest files are still pretty small:

llvm-gn$ find . -name '*.ninja' | xargs wc -l | sort -n | tail -5
     484 ./stage2_unix/obj/compiler-rt/lib/builtins/builtins.ninja
     673 ./obj/llvm/lib/CodeGen/CodeGen.ninja
    2872 ./build.ninja
    4226 ./toolchain.ninja
   35930 total

Colecf · 2024-01-20T04:55:32Z

In android's case, we have a lot of variables that are spread throughout one ninja file, and they commonly contain references to earlier variables:

$ egrep '^[a-zA-Z0-9._-]+ =' out/soong/build.sdk_phone64_x86_64.ninja | wc -l
249685
$ egrep '^[a-zA-Z0-9._-]+ =' out/soong/build.sdk_phone64_x86_64.ninja | wc -c
96253004 (96 megabytes, I don't think we have any variable assignments that span multiple lines)
$ # For a sense of how inter-dependent they are:
$ egrep '^[a-zA-Z0-9._-]+ =.*\$\{' out/soong/build.sdk_phone64_x86_64.ninja | wc -l
47953

(the \w regex doesn't work because the grand majority of our variables include a period)

However, I haven't done a deep dive into seeing how difficult it would be to make soong/kati output separate ninja files. I'll take a look into that, maybe there's a pattern to the inter-dependedness that can be exploited.

I'll open a separate issue for multithreading as well.

evmar · 2024-01-21T00:38:01Z

If you had one of these .ninja files handy that doesn't contain anything confidential I'd love to add it to https://github.com/evmar/n2/tree/main/tests/snapshot

evmar reviewed Dec 30, 2023

View reviewed changes

Colecf force-pushed the deferred_variable_expansion branch 2 times, most recently from 9dffdcb to c13f859 Compare December 31, 2023 02:58

Colecf force-pushed the deferred_variable_expansion branch from c13f859 to 8e61026 Compare January 2, 2024 02:13

Enforce only valid rule variables are used

abcd3bb

Regular ninja also has this validation.

Colecf force-pushed the deferred_variable_expansion branch from 8e61026 to fd10e19 Compare January 3, 2024 21:59

Colecf force-pushed the deferred_variable_expansion branch from fd10e19 to 9831750 Compare January 4, 2024 17:59

Colecf force-pushed the deferred_variable_expansion branch from 9831750 to d510d1d Compare January 5, 2024 00:21

evmar force-pushed the main branch 2 times, most recently from 8cafbda to d43e8ca Compare January 8, 2024 17:38

evmar merged commit 909ac60 into evmar:main Jan 18, 2024
2 checks passed

Colecf deleted the deferred_variable_expansion branch January 18, 2024 22:51

Colecf mentioned this pull request Jan 19, 2024

Path expansion in builds doesn't take take build variables in context #39

Open

Colecf mentioned this pull request Jan 20, 2024

Add multithreaded parsing #108

Open

Colecf mentioned this pull request Feb 29, 2024

Subninja support and multithreaded loading #112

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Evaluate build paths in the context of the build's bindings #94

Evaluate build paths in the context of the build's bindings #94

Colecf commented Dec 29, 2023

evmar commented Dec 30, 2023 •

edited

evmar Dec 30, 2023

Colecf Dec 31, 2023

evmar Jan 4, 2024

evmar Dec 30, 2023

evmar Dec 30, 2023

Colecf Dec 31, 2023

Colecf commented Dec 30, 2023

Colecf commented Dec 31, 2023 •

edited

Colecf commented Jan 3, 2024

Colecf commented Jan 3, 2024

evmar commented Jan 4, 2024

evmar commented Jan 4, 2024

Colecf commented Jan 5, 2024 •

edited

evmar commented Jan 7, 2024

Colecf commented Jan 7, 2024

evmar commented Jan 7, 2024

evmar commented Jan 8, 2024

Colecf commented Jan 16, 2024

evmar commented Jan 18, 2024

evmar commented Jan 18, 2024

Colecf commented Jan 18, 2024 •

edited

evmar commented Jan 19, 2024

Colecf commented Jan 19, 2024

evmar commented Jan 19, 2024

evmar commented Jan 19, 2024

Colecf commented Jan 20, 2024

evmar commented Jan 21, 2024

Evaluate build paths in the context of the build's bindings #94

Evaluate build paths in the context of the build's bindings #94

Conversation

Colecf commented Dec 29, 2023

evmar commented Dec 30, 2023 • edited

evmar Dec 30, 2023

Choose a reason for hiding this comment

Colecf Dec 31, 2023

Choose a reason for hiding this comment

evmar Jan 4, 2024

Choose a reason for hiding this comment

evmar Dec 30, 2023

Choose a reason for hiding this comment

evmar Dec 30, 2023

Choose a reason for hiding this comment

Colecf Dec 31, 2023

Choose a reason for hiding this comment

Colecf commented Dec 30, 2023

Colecf commented Dec 31, 2023 • edited

Colecf commented Jan 3, 2024

Colecf commented Jan 3, 2024

evmar commented Jan 4, 2024

evmar commented Jan 4, 2024

Colecf commented Jan 5, 2024 • edited

evmar commented Jan 7, 2024

Colecf commented Jan 7, 2024

evmar commented Jan 7, 2024

evmar commented Jan 8, 2024

Colecf commented Jan 16, 2024

evmar commented Jan 18, 2024

evmar commented Jan 18, 2024

Colecf commented Jan 18, 2024 • edited

evmar commented Jan 19, 2024

Colecf commented Jan 19, 2024

evmar commented Jan 19, 2024

evmar commented Jan 19, 2024

Colecf commented Jan 20, 2024

evmar commented Jan 21, 2024

evmar commented Dec 30, 2023 •

edited

Colecf commented Dec 31, 2023 •

edited

Colecf commented Jan 5, 2024 •

edited

Colecf commented Jan 18, 2024 •

edited