rustbuild: Compile rustc with ThinLTO #45400

alexcrichton · 2017-10-20T03:05:20Z

This commit enables ThinLTO for the compiler as well as multiple codegen units.
This is intended to get the benefits of parallel codegen while also avoiding
any major loss of perf. Finally this commit is also intended as further testing
for #45320 and shaking out bugs.

alexcrichton · 2017-10-20T03:05:53Z

src/bootstrap/bin/rustc.rs

@@ -175,6 +175,9 @@ fn main() {
        if let Ok(s) = env::var("RUSTC_CODEGEN_UNITS") {
            cmd.arg("-C").arg(format!("codegen-units={}", s));
        }
+        if stage != "0" && env::var("RUSTC_THINLTO").is_ok() {


Note that this skips stage0 as the current stage0 doesn't have ThinLTO (but that should be coming soon) but despite that the ThinLTO implementation on beta is probably too buggy to use.

alexcrichton · 2017-10-20T03:06:33Z

r? @Mark-Simulacrum

cc @michaelwoerister
cc #44675

Mark-Simulacrum · 2017-10-20T03:46:41Z

src/bootstrap/builder.rs

@@ -618,6 +618,10 @@ impl<'a> Builder<'a> {
        // FIXME: cargo bench does not accept `--release`
        if self.config.rust_optimize && cmd != "bench" {
            cargo.arg("--release");
+
+            if mode != Mode::Libstd {
+                cargo.env("RUSTC_THINLTO", "1");


Why do we not want this for std?

This will (afaict) override the codegen-units setting in config.toml unconditionally, and isn't gated either, both of which seem somewhat odd. I think I'm okay with the latter (we expect ThinLTO to work reliably and not lose performance) but I'd prefer that we then remove codegen-units from the configuration.

OTOH, if we are disabling this for std due to performance concerns, then I feel like it should be behind a toggle.

At the moment I'm just trying to be conservative, my plan is to, after this lands, enable this for libstd. I just suspect that'll have more bugs/regressions associated with it (for whatever reason) and wanted to get the rustc wins ASAP

Mark-Simulacrum · 2017-10-20T03:47:53Z

Once Travis is fixed, I'd also like to run a perf try on this to make sure it's not a regression in performance (don't want to eat compiler dev cycles for no reason).

kennytm · 2017-10-20T06:18:27Z

src/bootstrap/bin/rustc.rs

@@ -175,6 +175,9 @@ fn main() {
        if let Ok(s) = env::var("RUSTC_CODEGEN_UNITS") {
            cmd.arg("-C").arg(format!("codegen-units={}", s));
        }
+        if stage != "0" && env::var("RUSTC_THINLTO").is_ok() {
+            cmd.arg("-Ccodegen-units=16").arg("-Zthinlto");


Could we respect the rust.codegen-units settings in config.toml here instead of hard-coding "16"?

This is all intended to be pretty temporary, and for now I'd prefer to just keep this easy to remove later as I don't think anyone's really configuring codegen units anyway.

@alexcrichton Ok, please add a FIXME.

(I did set the CGU to 3 😆)

michaelwoerister · 2017-10-20T09:50:08Z

Awesome :)

alexcrichton · 2017-10-20T14:26:31Z

@bors: try

@Mark-Simulacrum should be fixed now I think

bors · 2017-10-20T14:26:41Z

⌛ Trying commit 115d187eaed09cff8a60dacf5694d237f9b8b15f with merge 4306d764ad37fe6df6c145c2790541f26e6134aa...

bors · 2017-10-20T15:55:22Z

☀️ Test successful - status-travis
State: approved= try=True

Mark-Simulacrum · 2017-10-20T16:00:17Z

Looks good to me. I'll go ahead and run perf once I can and if that doesn't show any major regressions r+. Excited to improve rustc compile times!

Mark-Simulacrum · 2017-10-20T17:46:34Z

Hm, so this looks like it regresses compiler perf by quite a bit, so I don't think we'd be ready to do this quite yet: http://perf.rust-lang.org/compare.html?start=95272a07f1fe3a82497225a4cd0d67080b8ffebf&end=4306d764ad37fe6df6c145c2790541f26e6134aa&stat=instructions%3Au. Do you think we can do something within this PR to improve that? Otherwise this seems like it'd be a net loss to compiler devs.

alexcrichton · 2017-10-20T18:42:29Z

Thanks for collecting the data @Mark-Simulacrum!

I do sort of disagree with the conclusion I think though. The best thing to look at is the timing comparison rather than the instruction comparison (aka instructions mostly just show where else to look). Here we still see a "wall of red" but everything is quite modest. It looks like most incremental benchmarks fared particulary bad, but again losing at most hundreds of milliseconds.

I fear that if we accept zero runtime regressions that we will never make this change. I consider turning ThinLTO and multiple CGUs on for rustc to be an important step #45320, and if we never do this we'll never actually close #45320 as finished. The data on #45320 shows massive wins across the ecosystem. I think it's inevitable that enabling ThinLTO and multiple CGUs is going to come at a slight performance loss. It's simply architecturally different and less optimized than today's setup.

On the other hand, however, I'd like to consider the other extreme for a moment. Let's say that we had LTO enabled unconditionally for all Rust crates at 1.0.0. Clearly the compile times are sub-par and have easy wins (turn LTO off). If you disable LTO, however, you're very likely to regress on almost all benchmarks slightly. Despite this I don't think we'd ever consider enabling LTO by default!

What we are doing, however, is effectively enabling "whole crate LTO" by default today. This compilation model is mostly unprecedented in C/C++ and is often why Rust code performs so well despite not paying much attention to it. That we blocked on enabling mulitple CGUs in release mode on ThinLTO was surprising enough to me originally, but if we don't enable multiple CGUs at this juncture I fear that we never will. We should of course be honest in that ThinLTO and multiple CGUs will come at a performance regression, but the "regression" here, in my opinion, is very much in the "lost in the noise" territory. For example all of these 1-5% regressions could indicating timing differences or immediately get swallowed up by other improvements. Furthermore we continue to have an escape hatch to get localized equivalences to today's codegen, #[inline].

I don't think a policy of "zero regressions tolerated" is too helpful in this situation, and in this situation I think not merging this could be actively harmful to compile times in rustc in the long run actually. Here we see some modest regressions which may or may not improve as ThinLTO is refined upstream over time. If this actually affects code in practice there's an escape hatch through #[inline].

With my own personal opinions out of the way, I did indeed dig more into the differences between these two compilers. I chose the syntex_syntax benchmark as it showed the largest cpu-clock regression (~10%), and this benchmark is specifically for a building the syntex_syntax crate in incremental mode where the incremental cache is 100% full and has 100% cache hits (no source changes).

I was indeed able to reproduce the regression, where compile times increased from ~13s to ~14s. Taking a look at the profile data from before this change and then after this change I noticed a few things:

This benchmark is very heavy on HashMap as it's a huge crate with 100% cache hits in incremental, which means it's basically entirely bottlenecked on the performance there
Notably the before profile highest function is try_mark_green, a large incremental-related function. Afterewards the highest function (a number of them actually), are HashMap::get and related functionality.
This seems to imply that with one CGU the HashMap::get function was inlined, but ThinLTO decided to not inline it, however.
The assembly generated for HashMap::get is not the biggest thing in the world, but it's also not exactly small.

So my conclusion here is that the performance difference mostly likely does indeed just come down to inlining decisions. I'm not sure why ThinLTO didn't inline this function whereas one CGU did indeed inline it.

I personally think we should move forward regardless with this patch no matter what. I think multiple CGUs has the potential for a huge speedup of rustc compile times (trans/LLVM is an absolutely huge portion of bootstrap, parallelizing anything would be a huge boon).

Despite this, though, I'm curious to evaluate the impact of #[inline] to attempt to recover any changes from multiple CGUs. I've pushed a commit up with a few more #[inline] annotations throughout HashMap (on the hottest functions here) and we can rerun performance data to see what's up.

alexcrichton · 2017-10-20T18:48:05Z

@bors: try

bors · 2017-10-20T18:48:14Z

⌛ Trying commit 2997c03c831acbbb94c2943fd70e1e33c89e0aaa with merge 57a4ae1e8edd91e10f5d7865423b81f79e22adb6...

This commit enables ThinLTO for the compiler as well as multiple codegen units. This is intended to get the benefits of parallel codegen while also avoiding any major loss of perf. Finally this commit is also intended as further testing for rust-lang#45320 and shaking out bugs.

alexcrichton · 2017-10-20T18:49:55Z

@bors: try

bors · 2017-10-20T18:50:04Z

⌛ Trying commit d01427f with merge 11de9ca...

rustbuild: Compile rustc with ThinLTO This commit enables ThinLTO for the compiler as well as multiple codegen units. This is intended to get the benefits of parallel codegen while also avoiding any major loss of perf. Finally this commit is also intended as further testing for #45320 and shaking out bugs.

bors · 2017-10-20T20:13:39Z

☀️ Test successful - status-travis
State: approved= try=True

alexcrichton · 2017-10-20T20:52:54Z

Ok the new comparison I believe is less worse than the old comparison, and sure enough the before/after of the #[inline] annotations shows improvements across the board.

Mark-Simulacrum · 2017-10-20T22:50:21Z

I'm happy with your explanation, thanks! Seems like this overall should be an improvement in compile times for the rustc itself, and only a marginal loss for the ecosystem, so I'm happy with that.
@bors r+

bors · 2017-10-20T22:50:22Z

📌 Commit d01427f has been approved by Mark-Simulacrum

petrochenkov · 2017-10-20T23:08:40Z

improvement in compile times for the rustc itself, and only a marginal loss for the ecosystem

I guess it's okay for development builds, but it looks like this affects compilers released on stable channel as well.
There are no reasons to pessimize stable compilers in favor of faster bootstrap if this bootstrap happens only once in 6 weeks.

Mark-Simulacrum · 2017-10-20T23:22:37Z

We can disable the ThinLTO on dist builds -- but then again, I think the argument @alexcrichton makes is that we don't enable LTO either (which would net us perf wins, I'd imagine).

I'm okay with disabling ThinLTO on dist builds, but I do think it should be by default enabled for developers working on rustc, since we expect it to improve compile times of rustc. In the long run, I expect (perhaps wrongly) that ThinLTO + codgen-units will eventually be equivalent to a non-parallel optimizer, and perhaps even slightly better.

Since next beta will dist without this though I think we're okay landing this for now and we can iterate: I don't think it's worth not landing this, but if you or others disagree, feel free to r-.

petrochenkov · 2017-10-20T23:45:02Z

we don't enable LTO either

For stable releases it would be nice to enable it as well.
I don't remember why exactly this wasn't done earlier (LTO bugs? too much memory? or maybe it was in old infra?), @alexcrichton probably knows better.

bors · 2017-10-22T00:35:11Z

⌛ Testing commit d01427f with merge 1042190...

rustbuild: Compile rustc with ThinLTO This commit enables ThinLTO for the compiler as well as multiple codegen units. This is intended to get the benefits of parallel codegen while also avoiding any major loss of perf. Finally this commit is also intended as further testing for #45320 and shaking out bugs.

bors · 2017-10-22T03:13:12Z

☀️ Test successful - status-appveyor, status-travis
Approved by: Mark-Simulacrum
Pushing 1042190 to master...

michaelwoerister · 2017-10-23T08:29:47Z

So here are some thoughts from me:

It is really surprising how small the performance regression is. The trade-off here is between bors merge time and very rigorously testing ThinLTO on the one hand and having nightly builds with slightly worse runtime performance on the other hand. To me the choice is pretty clear.
I agree that we should build stable and beta releases with just one CGU and whatever other performance optimization we can do. I think the only reason we don't use regular LTO for stable releases is that rustc is built out of Rust dylibs which are incompatible with LTO.
@alexcrichton, @Mark-Simulacrum, do we have numbers on how this influences build times in CI?

Overall I'm in favor of keeping this enabled for nightly builds.

alexcrichton · 2017-10-23T15:52:04Z

@petrochenkov sorry didn't get a chance to comment on this before it merged, will continue on #45444

alexcrichton commented Oct 20, 2017

View reviewed changes

rust-highfive assigned Mark-Simulacrum Oct 20, 2017

Mark-Simulacrum reviewed Oct 20, 2017

View reviewed changes

kennytm added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Oct 20, 2017

kennytm reviewed Oct 20, 2017

View reviewed changes

alexcrichton force-pushed the bootstrap-thinlto branch from b00770b to 115d187 Compare October 20, 2017 14:25

alexcrichton mentioned this pull request Oct 20, 2017

Tracking issue for enabling multiple CGUs in release mode by default #45320

Closed

11 tasks

alexcrichton force-pushed the bootstrap-thinlto branch from 115d187 to 2997c03 Compare October 20, 2017 18:47

alexcrichton force-pushed the bootstrap-thinlto branch from 2997c03 to 8327f1c Compare October 20, 2017 18:48

alexcrichton added 2 commits October 20, 2017 11:49

[test] Add some #[inline] to HashMap

d01427f

alexcrichton force-pushed the bootstrap-thinlto branch from 8327f1c to d01427f Compare October 20, 2017 18:49

kennytm added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 21, 2017

bors merged commit d01427f into rust-lang:master Oct 22, 2017

petrochenkov mentioned this pull request Oct 22, 2017

rustbuild: Disable multiple CGUs in stable release builds #45444

Closed

alexcrichton deleted the bootstrap-thinlto branch October 23, 2017 15:51

alexcrichton mentioned this pull request Oct 25, 2017

rustbuild: Build libstd with ThinLTO #45529

Closed

alexcrichton mentioned this pull request Nov 4, 2017

Disable backtraces on OSX #45760

Closed

alexcrichton mentioned this pull request Jul 27, 2018

Make Vec derefing inlinable #52704

Closed

rustbuild: Compile rustc with ThinLTO #45400

rustbuild: Compile rustc with ThinLTO #45400

Uh oh!

Conversation

alexcrichton commented Oct 20, 2017

Uh oh!

alexcrichton Oct 20, 2017

Choose a reason for hiding this comment

Uh oh!

alexcrichton commented Oct 20, 2017

Uh oh!

Mark-Simulacrum Oct 20, 2017

Choose a reason for hiding this comment

Uh oh!

alexcrichton Oct 20, 2017

Choose a reason for hiding this comment

Uh oh!

Mark-Simulacrum commented Oct 20, 2017

Uh oh!

kennytm Oct 20, 2017

Choose a reason for hiding this comment

Uh oh!

alexcrichton Oct 20, 2017

Choose a reason for hiding this comment

Uh oh!

kennytm Oct 20, 2017

Choose a reason for hiding this comment

Uh oh!

michaelwoerister commented Oct 20, 2017

Uh oh!

alexcrichton commented Oct 20, 2017

Uh oh!

bors commented Oct 20, 2017

Uh oh!

bors commented Oct 20, 2017

Uh oh!

Mark-Simulacrum commented Oct 20, 2017

Uh oh!

Mark-Simulacrum commented Oct 20, 2017

Uh oh!

alexcrichton commented Oct 20, 2017

Uh oh!

alexcrichton commented Oct 20, 2017

Uh oh!

bors commented Oct 20, 2017

Uh oh!

alexcrichton commented Oct 20, 2017

Uh oh!

bors commented Oct 20, 2017

Uh oh!

bors commented Oct 20, 2017

Uh oh!

alexcrichton commented Oct 20, 2017

Uh oh!

Mark-Simulacrum commented Oct 20, 2017

Uh oh!

bors commented Oct 20, 2017

Uh oh!

petrochenkov commented Oct 20, 2017

Uh oh!

Mark-Simulacrum commented Oct 20, 2017

Uh oh!

petrochenkov commented Oct 20, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bors commented Oct 22, 2017

Uh oh!

bors commented Oct 22, 2017

Uh oh!

michaelwoerister commented Oct 23, 2017

Uh oh!

alexcrichton commented Oct 23, 2017

Uh oh!

Uh oh!

petrochenkov commented Oct 20, 2017 •

edited

Loading