Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign uprustc: Implement ThinLTO #44841
Conversation
rust-highfive
assigned
arielb1
Sep 25, 2017
This comment has been minimized.
This comment has been minimized.
|
r? @arielb1 (rust_highfive has picked a reviewer for you, use r? to override) |
This comment has been minimized.
This comment has been minimized.
|
Note that the first commit here is from #44783, so shouldn't need extra review, the thin support should all be in the second. Example output of the modified |
rust-highfive
assigned
michaelwoerister
and unassigned
arielb1
Sep 25, 2017
alexcrichton
reviewed
Sep 25, 2017
| if codegen_unit_name.contains(NUMBERED_CODEGEN_UNIT_MARKER) { | ||
| // If we use the numbered naming scheme for modules, we don't want | ||
| // the files to look like <crate-name><extra>.<crate-name>.<index>.<ext> | ||
| // but simply <crate-name><extra>.<index>.<ext> |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Sep 25, 2017
Author
Member
@michaelwoerister mind helping me understand what was going on here? For ThinLTO we need to make sure that all the objects and such have unique names, which this comment seems to indicate we will have achieved. (although in practice I didn't see crate name hashes and such in those names)
With this name munging left in though I found that lots of objects were overwriting one another by accident, because I think the backend of shuffling files around was "getting weird". I couldn't find a downside to removing this logic, though, so I was curious if you knew what this was originally added for?
This comment has been minimized.
This comment has been minimized.
michaelwoerister
Sep 25, 2017
Contributor
This logic looks like a measure for keeping filenames/paths short. We could probably just get rid of the "numbered" codegen unit naming scheme (incremental does not use that). That was only introduced to stay close to the then existing behavior. That would have the downside of codegen units having somewhat misleading names but they should be unique.
This comment has been minimized.
This comment has been minimized.
|
That's a nice surprise
|
This comment has been minimized.
This comment has been minimized.
Oops an excellent question! I had 8 cores.
AFAIK no, I realized when doing all this that we should just turn this on by default. I was gonna do that in a separate PR orthogonal to this though.
Hm sort of! I think it's a lot better than it looks though. For example, here's a comparison between 1 and 16 codegen units, without ThinLTO enabled (of the regex benchmark suite, threshold 5% regression at least). And then here's the same comparison when those 16 CGUs are compiled with ThinLTO. Notably nearly every benchmark regesses (sometimes by 10x) with just vanilla codegen units, whereas with ThinLTO the worst regression is 18ns/iter -> 27ns/iter, and ThinLTO even improved the performance in one case! In other words, my conclusion is that runtime performance is "basically the same" modulo compiler wizardry details. My guess is that any possible performance loss could be easily fixed by Overally though I definitely wouldn't classify it as overall a significant performance loss, only a very minor performance loss in some esoteric situations, at best. From what I've seen it does everything you'd expect it to do across the board. Then again, that's why this is unstable to start out with :) |
This comment has been minimized.
This comment has been minimized.
|
|
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Sep 25, 2017
alexcrichton
referenced this pull request
Sep 25, 2017
Merged
rustc: Default 32 codegen units at O0 #44853
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Sep 25, 2017
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Sep 25, 2017
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Sep 25, 2017
This comment has been minimized.
This comment has been minimized.
|
Amazing!! On the subject of performance, note that ThinLTO has historically been missing some optimizations of classical LTO (just due to them not being implemented). One class of such optimizations specifically called out on LLVM's "open projects" page is "propagating more global informations across the program". I assume upstream has made progress on this since LLVM 4.0, so maybe we'll see improved performance just by updating to LLVM 5.0? |
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Sep 26, 2017
This comment has been minimized.
This comment has been minimized.
|
@alexcrichton I've just ran some build time benchmarks on my project that uses a lot of popular rust libraries and codegen (diesel, hyper, serde, tokio, futures, reqwest) on my Intel Core i5 laptop (skylake 2c/4t) and got these results:
Cargo profile:
rustc 1.22.0-nightly (17f56c5 2017-09-21) As expected, the best results is for codegen units of number of system threads, and 32 is way to much for an average machine. Did you concider an option to select number of codegen units depending on number of cpus, with Thank you for working on compile times! |
This comment has been minimized.
This comment has been minimized.
|
@mersinvald very interesting! Did you mean to comment on #44853 though? If so, maybe we can continue over there? |
This comment has been minimized.
This comment has been minimized.
|
@alexcrichton ok, sorry :) |
arielb1
added
the
S-waiting-on-author
label
Sep 26, 2017
alexcrichton
force-pushed the
alexcrichton:thinlto
branch
3 times, most recently
from
d8ae4f8
to
d3b0c49
Sep 26, 2017
This comment has been minimized.
This comment has been minimized.
|
@alexcrichton FYI if you were trying to convince us that you weren't a robot, this and #44853 back-to-back aren't helping. ;) |
bors
added a commit
that referenced
this pull request
Sep 29, 2017
This comment has been minimized.
This comment has been minimized.
|
|
alexcrichton
force-pushed the
alexcrichton:thinlto
branch
from
d3b0c49
to
51e37aa
Sep 30, 2017
alexcrichton
added
S-waiting-on-review
and removed
S-waiting-on-author
labels
Sep 30, 2017
This comment has been minimized.
This comment has been minimized.
|
Rebased and should be ready for review! |
This comment has been minimized.
This comment has been minimized.
bors
added a commit
that referenced
this pull request
Oct 6, 2017
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
|
Multiple
|
alexcrichton
force-pushed the
alexcrichton:thinlto
branch
from
5f66c99
to
1cb0c99
Oct 6, 2017
This comment has been minimized.
This comment has been minimized.
|
@bors: r=michaelwoerister |
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
bors
added a commit
that referenced
this pull request
Oct 7, 2017
This comment has been minimized.
This comment has been minimized.
|
|
kennytm
reviewed
Oct 7, 2017
| let foo = foo as usize as *const u8; | ||
| let bar = bar::bar as usize as *const u8; | ||
|
|
||
| assert_eq!(*foo, *bar); |
This comment has been minimized.
This comment has been minimized.
kennytm
Oct 7, 2017
Member
asm.js failed on this line...
[01:25:14] failures:
[01:25:14]
[01:25:14] ---- [run-pass] run-pass/thin-lto-inlines.rs stdout ----
[01:25:14]
[01:25:14] error: test run failed!
[01:25:14] status: exit code: 101
[01:25:14] command: "/emsdk-portable/node/4.1.1_64bit/bin/node" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-pass/thin-lto-inlines.stage2-asmjs-unknown-emscripten.js"
[01:25:14] stdout:
[01:25:14] ------------------------------------------
[01:25:14] 3 3
[01:25:14]
[01:25:14] ------------------------------------------
[01:25:14] stderr:
[01:25:14] ------------------------------------------
[01:25:14] thread 'main' panicked at 'assertion failed: `(left == right)`
[01:25:14] left: `115`,
[01:25:14] right: `99`', /checkout/src/test/run-pass/thin-lto-inlines.rs:36:8
[01:25:14] note: Run with `RUST_BACKTRACE=1` for a backtrace.
[01:25:14]
[01:25:14] ------------------------------------------
[01:25:14]
[01:25:14] thread '[run-pass] run-pass/thin-lto-inlines.rs' panicked at 'explicit panic', /checkout/src/tools/compiletest/src/runtest.rs:2433:8
[01:25:14] note: Run with `RUST_BACKTRACE=1` for a backtrace.
alexcrichton
force-pushed the
alexcrichton:thinlto
branch
from
1cb0c99
to
4ca1b19
Oct 7, 2017
This comment has been minimized.
This comment has been minimized.
|
@bors: r=michaelwoerister |
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
bors
added a commit
that referenced
this pull request
Oct 7, 2017
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
|
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
bors
added a commit
that referenced
this pull request
Oct 7, 2017
This comment has been minimized.
This comment has been minimized.
|
|
bors
merged commit 4ca1b19
into
rust-lang:master
Oct 8, 2017
alexcrichton
deleted the
alexcrichton:thinlto
branch
Oct 8, 2017
petrochenkov
referenced this pull request
Oct 8, 2017
Closed
Some run-pass tests fail after recent LTO/CGU changes #45103
This comment has been minimized.
This comment has been minimized.
|
|
alexcrichton commentedSep 25, 2017
This commit is an implementation of LLVM's ThinLTO for consumption in rustc
itself. Currently today LTO works by merging all relevant LLVM modules into one
and then running optimization passes. "Thin" LTO operates differently by having
more sharded work and allowing parallelism opportunities between optimizing
codegen units. Further down the road Thin LTO also allows incremental LTO
which should enable even faster release builds without compromising on the
performance we have today.
This commit uses a
-Z thinltoflag to gate whether ThinLTO is enabled. It thenalso implements two forms of ThinLTO:
In one mode we'll only perform ThinLTO over the codegen units produced in a
single compilation. That is, we won't load upstream rlibs, but we'll instead
just perform ThinLTO amongst all codegen units produced by the compiler for
the local crate. This is intended to emulate a desired end point where we have
codegen units turned on by default for all crates and ThinLTO allows us to do
this without performance loss.
In anther mode, like full LTO today, we'll optimize all upstream dependencies
in "thin" mode. Unlike today, however, this LTO step is fully parallelized so
should finish much more quickly.
There's a good bit of comments about what the implementation is doing and where
it came from, but the tl;dr; is that currently most of the support here is
copied from upstream LLVM. This code duplication is done for a number of
reasons:
avoid overloading machines.
integrates with our own incremental strategy, but this is yet to be
determined.
having it tailored to fit our needs for the time being.
TargetMachinecreation, where all our options we used today aren't necessarily supported by
upstream LLVM yet.
My hope is that we can get some experience with this copy/paste in tree and then
eventually upstream some work to LLVM itself to avoid the duplication while
still ensuring our needs are met. Otherwise I fear that maintaining these
bindings may be quite costly over the years with LLVM updates!