Building with LTO should skip "compilation" #43212

Closed
glandium opened this issue Jul 13, 2017 · 8 comments
Labels
A-codegen Area: Code generation C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@glandium
Contributor

glandium commented Jul 13, 2017

See #43211 for some horrifying timings from doing LTO builds in Firefox.

Please correct me where I'm wrong, but here is my understanding of the situation wrt building with LTO (and more or less confirmed by @alexcrichton and @mbrubeck on irc):

  • Cargo starts building all dependencies
  • For each dependency, the rust compiler creates an rlib
  • The rlib contains compiled code for the crate, as well as metadata about the crate.
  • When linking the main crate with LTO, the rust compiler uses the metadata from the dependee's rlibs, and compiles based on that and the code in the current crate. As I understand it, at this point, all the code that was compiled and put in those rlibs is not used.

In simplified and C/C++ terms, this is my understanding of what's happening:

  • Let's say we have a test.c that is built in a libtest library, and linked with foo.c into a foo binary.
  • The libtest library is generated with:
    • gcc -o test.o -c test.c -O3 (that's the compiled code part)
    • gcc -o test.lto.o -c test.c -O3 -flto (that's the metadata used for LTO)
    • gcc-ar cr libtest.a test.lto.o test.o
  • The code for the main binary is generated with:
    • gcc -o foo.lto.o -c foo.c -O3 -flto
    • (maybe rust even compiles the code here too? like gcc -o foo.o -c foo.c -O3)
    • gcc -flto -o foo foo.lto.o libtest.a

In the above, if libtest.a contained only test.lto.o, the foo binary would still link fine, because the compiled code is not used. Which means we've spent time generating test.o for nothing.

Now, consider a crate like geckoservo: at about 3 KLOC, you wouldn't expect it to take as long to build as it does (well over a minute). @mbrubeck suggested that compiling the crate inlines a bunch of stuff, which is probably what is happening. But that seems like completely wasted time, considering it will all have to be done again when linking the entire project.

FWIW, the -Ztime-passes output with the latest 1.20 nightly, for geckoservo, looks like:

time: 0.011; rss: 32MB  parsing
time: 0.000; rss: 32MB  recursion limit
time: 0.000; rss: 32MB  crate injection
time: 0.000; rss: 32MB  plugin loading
time: 0.000; rss: 32MB  plugin registration
time: 0.243; rss: 134MB expansion
time: 0.000; rss: 134MB maybe building test harness
time: 0.000; rss: 134MB maybe creating a macro crate
time: 0.000; rss: 134MB checking for inline asm in case the target doesn't support it
time: 0.001; rss: 134MB early lint checks
time: 0.000; rss: 134MB AST validation
time: 0.015; rss: 137MB name resolution
time: 0.001; rss: 137MB complete gated feature checking
time: 0.005; rss: 140MB lowering ast -> hir
time: 0.001; rss: 138MB indexing hir
time: 0.000; rss: 138MB attribute checking
time: 0.000; rss: 135MB language item collection
time: 0.001; rss: 135MB lifetime resolution
time: 0.000; rss: 135MB looking for entry point
time: 0.000; rss: 135MB looking for plugin registrar
time: 0.000; rss: 135MB loop checking
time: 0.000; rss: 135MB static item recursion checking
time: 0.016; rss: 136MB compute_incremental_hashes_map
time: 0.000; rss: 136MB load_dep_graph
time: 0.000; rss: 136MB stability index
time: 0.002; rss: 136MB stability checking
time: 0.004; rss: 137MB type collecting
time: 0.000; rss: 137MB impl wf inference
time: 0.000; rss: 137MB coherence checking
time: 0.000; rss: 137MB variance testing
time: 0.009; rss: 138MB wf checking
time: 0.009; rss: 140MB item-types checking
time: 0.366; rss: 185MB item-bodies checking
time: 0.024; rss: 185MB const checking
time: 0.002; rss: 186MB privacy checking
time: 0.001; rss: 186MB intrinsic checking
time: 0.000; rss: 186MB effect checking
time: 0.005; rss: 186MB match checking
time: 0.001; rss: 186MB liveness checking
time: 0.076; rss: 193MB borrow checking
time: 0.000; rss: 193MB reachability checking
time: 0.001; rss: 193MB death checking
time: 0.000; rss: 193MB unused lib feature checking
time: 0.011; rss: 193MB lint checking
time: 0.000; rss: 193MB resolving dependency formats
  time: 0.009; rss: 194MB       write metadata
  time: 0.569; rss: 279MB       translation item collection
  time: 0.041; rss: 298MB       codegen unit partitioning
  time: 0.022; rss: 748MB       internalize symbols
time: 6.012; rss: 748MB translation
time: 0.000; rss: 748MB assert dep graph
time: 0.000; rss: 748MB serialize dep graph
  time: 4.810; rss: 712MB       llvm function passes [0]
  time: 79.068; rss: 958MB      llvm module passes [0]
  time: 21.767; rss: 929MB      codegen passes [0]
  time: 0.001; rss: 929MB       codegen passes [0]
time: 107.035; rss: 929MB       LLVM passes
time: 0.000; rss: 929MB serialize work products

i.e. most of the time is in the llvm module passes and codegen passes.
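
To make that reading of the log easier to reproduce, here is a minimal Rust sketch that parses lines in the `time: …; rss: …MB name` shape shown above and reports how much of the wall time a given pass accounts for. The names `Pass`, `parse_line`, `parse_log` and `share` are made up for this illustration and are not part of rustc. It also assumes that indented lines are sub-passes already included in the next unindented line, so only unindented lines are summed for the total. The sample is a handful of lines copied from the output above.

```rust
/// One parsed line of `-Ztime-passes` output (hypothetical helper type).
#[derive(Debug)]
struct Pass {
    seconds: f64,
    rss_mb: u64,
    name: String,
    /// True if the line was indented, i.e. treated here as a sub-pass.
    nested: bool,
}

/// Parse a single `time: <secs>; rss: <n>MB <name>` line; returns None otherwise.
fn parse_line(line: &str) -> Option<Pass> {
    let nested = line.starts_with(char::is_whitespace);
    let rest = line.trim_start().strip_prefix("time:")?;
    let (secs, rest) = rest.split_once(';')?;
    let rest = rest.trim_start().strip_prefix("rss:")?;
    let (rss, name) = rest.trim_start().split_once("MB")?;
    Some(Pass {
        seconds: secs.trim().parse().ok()?,
        rss_mb: rss.trim().parse().ok()?,
        name: name.trim().to_string(),
        nested,
    })
}

/// Parse every recognizable line of a log, skipping anything else.
fn parse_log(log: &str) -> Vec<Pass> {
    log.lines().filter_map(parse_line).collect()
}

/// Fraction of the total (sum of unindented lines) spent in passes whose
/// name starts with `prefix`. Assumption: nested times are already counted
/// in their unindented parent line.
fn share(passes: &[Pass], prefix: &str) -> f64 {
    let total: f64 = passes.iter().filter(|p| !p.nested).map(|p| p.seconds).sum();
    let part: f64 = passes
        .iter()
        .filter(|p| p.name.starts_with(prefix))
        .map(|p| p.seconds)
        .sum();
    part / total
}

// A few lines copied from the geckoservo output above.
const SAMPLE: &str = "\
time: 0.243; rss: 134MB expansion
time: 0.366; rss: 185MB item-bodies checking
time: 6.012; rss: 748MB translation
  time: 4.810; rss: 712MB       llvm function passes [0]
  time: 79.068; rss: 958MB      llvm module passes [0]
  time: 21.767; rss: 929MB      codegen passes [0]
time: 107.035; rss: 929MB       LLVM passes
";

fn main() {
    let passes = parse_log(SAMPLE);
    for prefix in ["llvm module passes", "codegen passes"] {
        println!("{}: {:.1}% of total", prefix, 100.0 * share(&passes, prefix));
    }
}
```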

Cc: @froydnj @rillian

@alexcrichton alexcrichton added A-codegen Area: Code generation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 13, 2017
@alexcrichton
Member

cc @michaelwoerister

@michaelwoerister
Member

Yes, this is one of the cases where MIR-only RLIBs would help.

@michaelwoerister
Member

I'll think some more about this next week. Maybe there's something we can do already with the compiler's current capabilities.

@michaelwoerister
Member

  • When linking the main crate with LTO, the rust compiler uses the metadata from the dependee's rlibs, and compiles based on that and the code in the current crate. As I understand it, at this point, all the code that was compiled and put in those rlibs is not used.

So this is not entirely true. The rlib's metadata also contains optimized LLVM bitcode and when building the main crate, this bitcode is linked into the main LLVM module for that crate. After all bitcode from all rlib dependencies is linked together into one humongous module, we let LLVM run another set of optimizations on it. This is what the LTO linker plugin would normally do.

Consequently, the optimization passes for intermediate rlibs are not really "lost", only the codegen passes, which are not cheap but also generally not as expensive as the LLVM passes before.

There is a different way of going about code generation and optimization for Rust code though. We call it MIR-only RLIBs and in this model we would generate neither LLVM IR nor machine code for RLIBs. Only when building an actual binary would the compiler instantiate the things from RLIB dependencies.

Compiling RLIBs would be massively sped up in this model, however, building the binary would also be that much slower. So it's not entirely clear that this is a win in overall build times, especially if one is mostly working on leaf crates and doesn't have to rebuild intermediate RLIBs often.

However, it's still possible that this is a win because right now we are not sharing instantiations of generic functions between crates, potentially asking the compiler to optimize the exact same code over and over again. In the MIR-only RLIB model, there would always only be one instance per leaf crate. Also, there's more room for dead code elimination because at the moment, when building an RLIB, the compiler has to assume that it will later be linked into a Rust Dylib which would export/instantiate a lot more things than an executable, staticlib, or cdylib.
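
The counting argument about duplicated generic instantiations can be sketched as a toy model. This is not how rustc tracks monomorphizations; `Crate`, `Instance`, `per_crate_count` and `deferred_count` are hypothetical names, and the sample crates are invented. Each crate is reduced to the set of (generic function, type argument) pairs it asks for. In the current model each crate optimizes its own copies, so the work is the sum over crates. In the MIR-only model instantiation is deferred to the final artifact, so duplicates collapse.

```rust
use std::collections::HashSet;

/// A monomorphization request: (generic function, concrete type argument).
type Instance = (&'static str, &'static str);

/// Toy stand-in for a crate: just the instances its code asks for.
struct Crate {
    name: &'static str,
    instances: Vec<Instance>,
}

/// Current model: every crate instantiates and optimizes its own copies.
fn per_crate_count(crates: &[Crate]) -> usize {
    crates
        .iter()
        .map(|c| c.instances.iter().collect::<HashSet<_>>().len())
        .sum()
}

/// MIR-only model: instantiation happens once, when the final artifact is built.
fn deferred_count(crates: &[Crate]) -> usize {
    crates
        .iter()
        .flat_map(|c| c.instances.iter())
        .collect::<HashSet<_>>()
        .len()
}

/// Invented dependency graph, for illustration only.
fn sample_crates() -> Vec<Crate> {
    vec![
        Crate { name: "a", instances: vec![("Vec::push", "u32"), ("HashMap::insert", "String")] },
        Crate { name: "b", instances: vec![("Vec::push", "u32"), ("Vec::push", "u8")] },
        Crate { name: "c", instances: vec![("Vec::push", "u32")] },
    ]
}

fn main() {
    let crates = sample_crates();
    for c in &crates {
        println!("crate {} requests {} instance(s)", c.name, c.instances.len());
    }
    println!("instantiated per crate: {}", per_crate_count(&crates));
    println!("instantiated once at the end: {}", deferred_count(&crates));
}
```

The gap between the two numbers is the potential saving; whether it outweighs the slower final build depends on how much the crates in a real dependency graph actually overlap.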

@glandium When building with LTO enabled, are you mostly interested in reducing the build times for Rust developers or rather scenarios where Rust code is largely unmodified between builds?

@michaelwoerister
Member

I'm debating whether it makes sense to have kind of a "soft launch" for MIR-only RLIBs:

  • We modify Cargo to pass -C lto also when building RLIBs (which I think it doesn't at the moment).
  • When an RLIB is built with -C lto it becomes a MIR-only RLIB: No trans, no LLVM, no linking.
  • When a binary is built, the compiler takes care of instantiating all exported functions and statics from any MIR-only RLIBs in the dependency graph. It would be able to work with any mix of regular and MIR-only RLIBs, whether LTO is enabled or not.

One would be able to opt into the new model in a backwards compatible way.
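
As a rough illustration of the proposal, the work done per crate could be tabulated like this. It is only a sketch of the idea as described in this comment, not compiler code: `CrateType`, `Stage` and `stages` are hypothetical names, and the stage list is a deliberate simplification (for example, it treats "emit MIR" as a separate step and ignores dylibs, staticlibs and cdylibs).

```rust
/// Kind of artifact being built (simplified; hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum CrateType {
    Rlib,
    Binary,
}

/// Coarse compilation stages, named after the terms used in this thread.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stage {
    Metadata,
    Mir,
    Trans,
    Llvm,
    Link,
}

/// Which stages run for a crate under the proposed "soft launch".
fn stages(kind: CrateType, lto: bool) -> Vec<Stage> {
    use Stage::*;
    match (kind, lto) {
        // Proposed: an RLIB built with -C lto is MIR-only (no trans, no LLVM).
        (CrateType::Rlib, true) => vec![Metadata, Mir],
        // A regular RLIB is still translated and optimized.
        (CrateType::Rlib, false) => vec![Metadata, Mir, Trans, Llvm],
        // The binary does the full pipeline, including instantiating
        // whatever its MIR-only dependencies left undone.
        (CrateType::Binary, _) => vec![Metadata, Mir, Trans, Llvm, Link],
    }
}

fn main() {
    let cases = [
        (CrateType::Rlib, false),
        (CrateType::Rlib, true),
        (CrateType::Binary, true),
    ];
    for (kind, lto) in cases {
        println!("{:?} (lto = {}): {:?}", kind, lto, stages(kind, lto));
    }
}
```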

What do you think, @alexcrichton, @brson?

@alexcrichton
Member

Sounds reasonable to me!

@glandium
Contributor Author

When building with LTO enabled, are you mostly interested in reducing the build times for Rust developers or rather scenarios where Rust code is largely unmodified between builds?

I guess both, but it's hard to have it both ways.

@Mark-Simulacrum Mark-Simulacrum added C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. labels Jul 28, 2017
@nnethercote
Contributor

With recent changes, #70458 and #71528 in particular, rlibs created during LTO builds now only contain metadata and LLVM bitcode, but no object code. Previously they did contain object code. IIUC, that addresses the original complaint of this issue.
