
Move metadata generation before analysis #64112

Closed
wants to merge 6 commits

Conversation

@nnethercote
Contributor

commented Sep 3, 2019

The goal of this commit is to improve the effectiveness of pipelining, because metadata generation must complete before dependent crates can begin compiling.

r? @ghost

nnethercote added 6 commits Aug 29, 2019
Merge `Compiler::{ongoing_codegen,link}`.
They each have two call sites, and the sequencing is almost identical in
both cases.
Remove `lower_to_hir()` call from `prepare_output()`.
It's a false dependency. The result isn't used and there are no
relevant side-effects.
Move call site of `dep_graph_future()`.
`Compiler::register_plugins()` calls `passes::register_plugins()`, which
calls `Compiler::dep_graph_future()`. This is the only way in which a
`passes` function calls a `Compiler` function.

This commit moves the `dep_graph_future()` call out of
`passes::register_plugins()` and into `Compiler::register_plugins()`,
which is a more sensible spot for it. This will delay the loading of the
dep graph slightly -- from the middle of plugin registration to the end
of plugin registration -- but plugin registration is fast enough
(especially compared to expansion) that the impact should be negligible.
Remove `Compiler::compile()`.
`Compiler::compile()` is different from all the other `Compiler` methods
because it lacks a `Queries` entry. It only has one call site, which is
in a test that doesn't need its specific characteristics.

This patch removes `Compiler::compile()` and replaces the call to it in
the test with a call to `Compiler::codegen_and_link()`, which is similar
enough for the test's purposes.
[DRAFT] Move metadata generation before analysis.
The goal of this commit is to improve the effectiveness of pipelining,
because metadata generation must complete before dependent crates can
begin compiling.

Unfortunately it's not much of a win. The biggest speed-up I have seen
is 1.07x, vs 1.79x for the original pipelining implementation. And it
slightly slows down the common case when an error occurs during
analysis, because metadata generation now precedes analysis.

Currently three tests fail with the change, due to bounds violations.
These tests are currently `git rm`'d, so that subsequent tests can run.
@nnethercote

Contributor Author

commented Sep 3, 2019

Unfortunately it's not much of a win. The biggest speed-up I have seen is 1.07x, vs 1.79x for the original pipelining implementation. And it slightly slows down the common case when an error occurs during analysis, because metadata generation now precedes analysis.

Currently three tests fail with the change, due to bounds violations. These tests are currently `git rm`'d, so that subsequent tests can run.

The first five commits above are from #64016, which is awaiting review.

Here are timings for the multi-crate rustc-perf benchmarks that currently compile on my Linux box. For each configuration there are three measurements:

  • no pipelining;
  • existing pipelining;
  • new, more aggressive pipelining.
syn
- dbg: 3.01 s -> 2.51 s (1.19x faster) -> 2.51 s (1.00x faster)
- opt: 5.86 s -> 3.26 s (1.79x faster) -> 3.03 s (1.07x faster)

regex
- dbg: 3.59 s -> 3.24 s (1.10x faster) -> 3.24 s (1.00x faster)
- opt: 6.37 s -> 4.58 s (1.39x faster) -> 4.42 s (1.03x faster)

piston-image
- dbg: 7.75 s -> 6.99 s (1.10x faster) -> 6.92 s (1.00x faster)
- opt: 14.45 s -> 10.50 s (1.37x faster) -> 10.27 s (1.02x faster)

webrender/webrender
- dbg: 36.94 s -> 32.83 s (1.12x faster) -> 31.35 s (1.04x faster)
- opt: 57.79 s -> 43.89 s (1.31x faster) -> 41.95 s (1.04x faster)

ripgrep
- dbg: 11.18 s -> 10.47 s (1.06x faster) -> 10.58 s (0.98x faster)
- opt: 23.02 s -> 19.55 s (1.17x faster) -> 19.56 s (0.99x faster)

cranelift-codegen/cranelift-codegen
- dbg: 17.66 s -> 17.31 s (1.02x faster) -> 16.61 s (1.04x faster)
- opt: 23.46 s -> 19.97 s (1.17x faster) -> 19.45 s (1.02x faster)

futures
- dbg: 1.18 s -> 1.15 s (1.02x faster) -> 1.24 s (0.93x faster)
- opt: 1.25 s -> 1.06 s (1.17x faster) -> 1.06 s (1.00x faster)

wg-grammar
- dbg: 9.28 s -> 8.87 s (1.04x faster) -> 8.85 s (1.00x faster)
- opt: 12.03 s -> 10.98 s (1.09x faster) -> 10.70 s (1.02x faster)

webrender_api/webrender_api
- dbg: 29.45 s -> 28.94 s (1.01x faster) -> 28.26 s (1.02x faster)
- opt: 44.75 s -> 43.01 s (1.04x faster) -> 41.63 s (1.03x faster)

clap-rs
- dbg: 11.64 s -> 11.52 s (1.01x faster) -> 11.54 s (0.99x faster)
- opt: 25.05 s -> 24.75 s (1.01x faster) -> 24.84 s (0.99x faster)

html5ever
- dbg: 6.13 s -> 6.08 s (1.00x faster) -> 6.11 s (0.99x faster)
- opt: 8.05 s -> 7.95 s (1.01x faster) -> 8.02 s (0.99x faster)

encoding
- dbg: 1.60 s -> 1.54 s (1.03x faster) -> 1.58 s (0.97x faster)
- opt: 1.80 s -> 1.78 s (1.00x faster) -> 1.76 s (1.01x faster)

ctfe-stress-2
- dbg: 11.89 s -> 12.10 s (0.98x faster) -> 12.06 s (1.00x faster)
- opt: 9.08 s -> 9.06 s (1.00x faster) -> 8.97 s (1.01x faster)

coercions
- dbg: 1.46 s -> 1.42 s (1.02x faster) -> 1.42 s (0.99x faster)
- opt: 1.08 s -> 1.09 s (0.99x faster) -> 1.08 s (1.00x faster)

cargo
- dbg: 33.75 s -> 33.57 s (1.00x faster) -> 33.61 s (0.99x faster)
- opt: 53.38 s -> 53.48 s (0.99x faster) -> 53.36 s (1.00x faster)

You can see that the additional improvements from the new, aggressive pipelining are paltry compared to the improvements from the existing pipelining.

Because of all this, my current inclination is to abandon the effort, but I'm posting it here in case somebody has an idea to improve it.

@nnethercote

Contributor Author

commented Sep 3, 2019

The small improvements make sense; analysis takes a lot less time than code generation, so the amount of additional overlap is relatively small.
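This can be made concrete with a toy timing model. The durations below are hypothetical illustrative values, not measurements, and the `build_times` helper is just for exposition:

```rust
// Toy model of pipelined crate compilation. All durations are
// hypothetical illustrative values in seconds, not measurements.
// Returns (no pipelining, existing pipelining, this PR's scheme).
fn build_times(analysis: f64, codegen: f64, downstream: f64) -> (f64, f64, f64) {
    // No pipelining: the dependent crate waits for the whole upstream compile.
    let sequential = analysis + codegen + downstream;
    // Existing pipelining: metadata is ready after analysis, so the
    // dependent crate overlaps with upstream codegen only.
    let pipelined = analysis + f64::max(codegen, downstream);
    // This PR: metadata comes first, so the dependent crate can also
    // overlap with upstream analysis. The extra saving is bounded by
    // the analysis time, which is small relative to codegen.
    let aggressive = f64::max(analysis + codegen, downstream);
    (sequential, pipelined, aggressive)
}

fn main() {
    let (seq, pipe, aggr) = build_times(2.0, 8.0, 11.0);
    println!("sequential: {seq}s, pipelined: {pipe}s, aggressive: {aggr}s");
    assert!(aggr <= pipe && pipe <= seq);
}
```

The extra win over existing pipelining is capped at the analysis time, which matches the modest ratios in the table above.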

@Centril

Member

commented Sep 3, 2019

My reading of the diff is that metadata generation still happens after lowering and more importantly after cfg stripping/macro expansion. Can you confirm?

@nnethercote

Contributor Author

commented Sep 3, 2019

My reading of the diff is that metadata generation still happens after lowering and more importantly after cfg stripping/macro expansion. Can you confirm?

That is correct. I couldn't push it any earlier because do_metadata() requires a TyCtxt, which is produced by Compiler::global_ctxt(), which relies on the results of Compiler::lower_to_hir(), which relies on the results of Compiler::expansion().
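For illustration, the compute-once, on-demand chaining that makes this ordering unavoidable can be sketched with `OnceCell`. The `Compiler` type and `String` payloads below are stand-ins for exposition, not `rustc_interface`'s real types:

```rust
use std::cell::OnceCell;

// Minimal stand-in for the rustc_interface query pattern: each stage
// is computed at most once, on demand, and forces its predecessor.
// The String payloads are placeholders, not real compiler data.
#[derive(Default)]
struct Compiler {
    expansion: OnceCell<String>,
    hir: OnceCell<String>,
    global_ctxt: OnceCell<String>,
}

impl Compiler {
    fn expansion(&self) -> &String {
        self.expansion.get_or_init(|| "expanded AST".to_string())
    }
    fn lower_to_hir(&self) -> &String {
        let ast = self.expansion(); // forces expansion first
        self.hir.get_or_init(|| format!("HIR from {ast}"))
    }
    fn global_ctxt(&self) -> &String {
        let hir = self.lower_to_hir(); // forces lowering first
        self.global_ctxt.get_or_init(|| format!("TyCtxt over {hir}"))
    }
}

fn main() {
    let c = Compiler::default();
    // Asking for the last stage forces the whole chain, which is why
    // metadata generation (needing a TyCtxt) cannot run before expansion.
    println!("{}", c.global_ctxt());
}
```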

@Centril

Member

commented Sep 3, 2019

Thanks.

The numbers for syn still look enticing. cc @dtolnay

@est31

Contributor

commented Sep 3, 2019

it slightly slows down the common case when an error occurs during analysis, because metadata generation now precedes analysis.

What about doing them in parallel? Of course, it might still be slightly slower on hardware with not enough cores available, but at least there won't be any regression on sufficiently "good" hardware.
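A minimal sketch of that idea with plain `std::thread`. The work functions are stand-ins, not rustc's actual APIs; real code would need the parallel compiler's synchronization, since both sides share a `TyCtxt`:

```rust
use std::thread;

// Stand-in work functions: in rustc these would be metadata encoding
// and the analysis passes. Both bodies here are placeholders.
fn generate_metadata() -> Vec<u8> {
    vec![0xde, 0xad, 0xbe, 0xef] // placeholder metadata bytes
}

fn run_analysis() -> Result<(), String> {
    Ok(()) // pretend analysis succeeded
}

fn main() {
    // Kick off metadata generation on a worker thread while analysis
    // runs on the current thread.
    let metadata_thread = thread::spawn(generate_metadata);
    let analysis_result = run_analysis();

    // Wall-clock cost becomes max(analysis, metadata) rather than
    // their sum; an analysis error still pays for metadata generation,
    // but is no longer delayed by running it strictly first.
    let metadata = metadata_thread.join().expect("metadata thread panicked");
    if analysis_result.is_ok() {
        println!("emitted {} bytes of metadata", metadata.len());
    }
}
```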

@michaelwoerister

Contributor

commented Sep 4, 2019

Doesn't metadata generation need data that is produced only very late in the process (like optimized MIR and, potentially, the list of exported monomorphizations)? The query system should take care of producing this data on demand, but with what's expected to be contained in RLIBs at the moment, it seems hard to effectively move metadata generation towards the front.

@nnethercote

Contributor Author

commented Sep 4, 2019

@alexcrichton said on Zulip:

rustc has tons of caches internally
so the time previously spent in analysis may just be shifted to metadata generation now
(maybe)
and the analysis phase, which happens later, just hits all the caches and doesn't actually do anything

This may be true. Understanding the ordering of things in rustc is surprisingly hard. I did spend enough time looking at (and refactoring!) librustc_interface/queries.rs and librustc_interface/passes.rs that I understand Compiler quite well, and I am confident that I got the high-level ordering right. (And diagnostic println! statements confirmed this.) It is still possible that there are things going on at a different level that I'm not aware of.

I am still inclined to abandon this effort. I tried what I set out to try, got a result that makes sense (very slight speedups, but much smaller than the original pipelining because analysis is much quicker than codegen), and suggestions for further work are more speculative. If somebody else wants to experiment with it further, please do so.
