Enable ThinLTO with incremental compilation. #53673
Conversation
@bors try
This comment has been minimized.
This comment has been minimized.
bors added a commit that referenced this pull request on Aug 24, 2018
michaelwoerister changed the title from "Incr thinlto 2000" to "Enable ThinLTO with incremental compilation." on Aug 24, 2018
michaelwoerister force-pushed the michaelwoerister:incr-thinlto-2000 branch from ee14d4a to 1d0f3fd on Aug 24, 2018
@bors try
bors added the S-waiting-on-bors and S-waiting-on-author labels and removed the S-waiting-on-bors label on Aug 24, 2018
kennytm added the S-waiting-on-review label and removed the S-waiting-on-author label on Aug 24, 2018
bors added a commit that referenced this pull request on Aug 24, 2018
alexcrichton reviewed on Aug 24, 2018
```rust
Err(e) => {
    let msg = format!("Error while trying to load ThinLTO import data \
                       for incremental compilation: {}", e);
    sess.fatal(&msg)
```
alexcrichton (Member) commented on Aug 24, 2018:
Should this perhaps fall back on returning a new ThinLTOImports instance? It seems like if a previous compiler is ctrl-c'd at just the right time, it could poison future compilers to return this error message.
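A minimal, self-contained sketch of the fallback being suggested (the struct, file format, and function names here are illustrative stand-ins, not rustc's actual API): on any I/O error, return an empty import map instead of a fatal error, so a partially written file from an interrupted session only costs cache reuse, never correctness.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::Path;

// Hypothetical stand-in for rustc's ThinLTOImports: one record per line,
// "<cgu-name> <imported-cgu> <imported-cgu> ...".
#[derive(Default, Debug, PartialEq)]
pub struct ThinLtoImports {
    pub imports: HashMap<String, Vec<String>>,
}

impl ThinLtoImports {
    // On any I/O error (e.g. a file truncated by a ctrl-c'd previous
    // session), fall back to an empty map instead of aborting: the worst
    // case is losing some cache reuse, not a miscompilation.
    pub fn load_or_empty(path: &Path) -> ThinLtoImports {
        let data = match fs::read_to_string(path) {
            Ok(data) => data,
            Err(_) => return ThinLtoImports::default(),
        };
        let mut imports = HashMap::new();
        for line in data.lines() {
            let mut parts = line.split_whitespace();
            if let Some(cgu) = parts.next() {
                imports.insert(cgu.to_string(), parts.map(|s| s.to_string()).collect());
            }
        }
        ThinLtoImports { imports }
    }
}

fn main() {
    // A missing file yields an empty map rather than a fatal error.
    let m = ThinLtoImports::load_or_empty(Path::new("/no/such/dir/imports.txt"));
    assert!(m.imports.is_empty());
    println!("fallback ok");
}
```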
```
@@ -983,6 +1006,9 @@ pub fn start_async_codegen(tcx: TyCtxt,
        allocator_config.emit_bc_compressed = true;
    }

    modules_config.emit_pre_thin_lto_bc =
        need_pre_thin_lto_bitcode_for_incr_comp(sess);
```
alexcrichton (Member) commented on Aug 24, 2018:
Should this be swapped with save_temps above so -C save-temps always emits it?
michaelwoerister (Author, Contributor) commented on Aug 29, 2018:
foo.pre-thin-lto.bc should actually be exactly the same as foo.thin-lto-input.bc, so I didn't really make an effort to add it to the save-temps output. Do you think it's worth the trouble?
michaelwoerister (Author, Contributor) commented on Aug 31, 2018:
I see what you mean now. Yes, it should be swapped so we don't overwrite the value.
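A tiny sketch of why the assignment order matters (field and parameter names here are hypothetical, not rustc's actual config fields): if the incremental-compilation default is assigned after the save-temps override, it clobbers the user's -C save-temps request; computing it first lets save-temps always win.

```rust
// Hypothetical, simplified module config to illustrate ordering.
#[derive(Default)]
struct ModulesConfig {
    emit_pre_thin_lto_bc: bool,
}

fn configure(save_temps: bool, needed_for_incr_comp: bool) -> ModulesConfig {
    let mut cfg = ModulesConfig::default();
    // First: the incremental-compilation requirement ...
    cfg.emit_pre_thin_lto_bc = needed_for_incr_comp;
    // ... then the -C save-temps override, so it is never overwritten.
    if save_temps {
        cfg.emit_pre_thin_lto_bc = true;
    }
    cfg
}

fn main() {
    assert!(configure(true, false).emit_pre_thin_lto_bc);  // save-temps wins
    assert!(configure(false, true).emit_pre_thin_lto_bc);  // incr. comp. needs it
    assert!(!configure(false, false).emit_pre_thin_lto_bc);
    println!("ordering ok");
}
```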
```rust
        execute_copy_from_cache_work_item(cgcx, work_item, timeline)
    }
    work_item @ WorkItem::LTO(_) => {
        execute_lto_work_item(cgcx, work_item, timeline)
```
alexcrichton (Member) commented on Aug 24, 2018:
It looks like each of these methods takes the bound work_item but quickly unwraps it. Could they instead be bound in this match and the value passed in here?
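A sketch of the suggestion (enum variants and helper names are illustrative, not rustc's actual signatures): bind the payload in the match arm and pass it to the helper directly, instead of handing over the whole WorkItem and unwrapping it again inside.

```rust
// Simplified stand-in for the codegen work-item enum.
enum WorkItem {
    Optimize(String),
    CopyFromCache(String),
    Lto(String),
}

fn execute_lto_work_item(module: &str) -> String {
    // No second `match` needed here; the caller already unwrapped it.
    format!("lto({})", module)
}

fn execute(work_item: WorkItem) -> String {
    match work_item {
        WorkItem::Optimize(m) => format!("optimize({})", m),
        WorkItem::CopyFromCache(m) => format!("copy({})", m),
        // The binding `m` is moved out of the enum right here.
        WorkItem::Lto(m) => execute_lto_work_item(&m),
    }
}

fn main() {
    assert_eq!(execute(WorkItem::Lto("cgu0".into())), "lto(cgu0)");
    println!("match binding ok");
}
```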
```rust
    len: module.data().len(),
});
serialized.push(module);
module_names.push(name);
```
alexcrichton (Member) commented on Aug 24, 2018:
This looks the same as the loop above, so could chain be used to process both in one go? The modules_to_optimize local variable looks like it can be hoisted above the loop too, perhaps?
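A sketch of the `chain` suggestion (struct and variable names are illustrative): the cached and freshly codegen'd module lists get processed by one loop body instead of two duplicated loops.

```rust
// Simplified stand-in for a serialized ThinLTO module.
struct SerializedModule {
    name: String,
    len: usize,
}

fn collect(
    cached: Vec<SerializedModule>,
    fresh: Vec<SerializedModule>,
) -> (Vec<usize>, Vec<String>) {
    let mut lens = Vec::new();
    let mut module_names = Vec::new();
    // chain() yields all cached modules first, then the fresh ones,
    // so one loop body handles both sources.
    for module in cached.into_iter().chain(fresh) {
        lens.push(module.len);
        module_names.push(module.name);
    }
    (lens, module_names)
}

fn main() {
    let cached = vec![SerializedModule { name: "a".into(), len: 1 }];
    let fresh = vec![SerializedModule { name: "b".into(), len: 2 }];
    let (lens, names) = collect(cached, fresh);
    assert_eq!(lens, vec![1, 2]);
    assert_eq!(names, vec!["a".to_string(), "b".to_string()]);
    println!("chain ok");
}
```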
```rust
// the cache instead of having been recompiled...
let current_imports = ThinLTOImports::from_thin_lto_data(data);

// ... so we load this additional information from the previous
```
alexcrichton (Member) commented on Aug 24, 2018:
I'm not sure I'm following what's going on here. Aren't all CGUs loaded into the ThinLTOData instance? Is this perhaps an older comment?
AFAIK when we redo ThinLTO we have to unconditionally load the ThinLTO buffer for all CGUs coming in as input, so I can't quite figure out where some would be missing, but you can likely enlighten me!
```rust
pub fn save_to_file(&self, path: &Path) -> io::Result<()> {
    use std::io::Write;
    let file = File::create(path)?;
    let mut writer = io::BufWriter::new(file);
```
alexcrichton (Member) commented on Aug 24, 2018:
For this and load_from_file below, I'd imagine that these maps are pretty small (on the order of CGUs, not symbols), which probably means that we can reasonably hold the entire contents of this serialized file in memory. In that case it's probably much faster to read/write the file in one go (and do all other operations in memory).
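A sketch of the whole-file approach (the line-based format and function names here are illustrative, not rustc's actual serialization): since the map is on the order of CGUs, serialize into one in-memory String and write it with a single call, and read it back the same way.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Save "<cgu> <import> <import>..." records, one per line, in one write.
fn save_to_file(entries: &[(String, Vec<String>)], path: &Path) -> io::Result<()> {
    let mut data = String::new();
    for (cgu, imports) in entries {
        data.push_str(cgu);
        for import in imports {
            data.push(' ');
            data.push_str(import);
        }
        data.push('\n');
    }
    fs::write(path, data) // one write of the whole serialized buffer
}

fn load_from_file(path: &Path) -> io::Result<Vec<(String, Vec<String>)>> {
    let data = fs::read_to_string(path)?; // one read, then parse in memory
    Ok(data
        .lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            let cgu = parts.next()?.to_string();
            Some((cgu, parts.map(|s| s.to_string()).collect()))
        })
        .collect())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("thinlto-imports-sketch.txt");
    let entries = vec![("cgu_a".to_string(), vec!["cgu_b".to_string()])];
    save_to_file(&entries, &path)?;
    assert_eq!(load_from_file(&path)?, entries);
    println!("roundtrip ok");
    Ok(())
}
```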
```rust
No,
PreThinLto,
PostThinLto,
PostThinLtoButImportedFrom,
```
alexcrichton (Member) commented on Aug 24, 2018:
I'm slightly confused by this enum variant, but I think this is the same confusion that I had before, perhaps?
If any CGU is either not reusable or pre-thin-lto, I think that means that all CGUs need to be loaded for the ThinLTO data collection stage.
In thinking about this as well, I think this function below may not work in general? I think we can only determine post-thin-lto CGU reuse after the ThinLTO data is created, right? Put another way, I think the possible reuse states are:
- Everything is green, nothing has changed.
- All modified CGUs need to be re-codegen'd.
- Afterwards, ThinLTOData is created, using the cached ThinLTO buffers for unmodified CGUs and freshly created buffers for re-codegen'd CGUs.
- Now there's a graph of CGU to CGUs-imported, as well as whether each CGU is red/green (green for cached, red for just codegen'd).
- Any red CGU is re-thin-LTO'd.
- Any green CGU which imports from a red CGU is re-thin-LTO'd.
Here, before we create the ThinLTOData, I don't think we can determine that a green CGU only imports from other green CGUs? LLVM seems like it could do fancy things such as:
- Let's have three CGUs: A, B, and C.
- A/B are green and C is red.
- Previously, A imported from B and not C.
- Afterwards, though, A ends up importing from both B and C (for whatever reason).
I think that this classification below would mean that A is "as green as can be" but it actually needs to be re-thin-LTO'd?
I may also just be lost with the classification of names here...
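The reuse rule sketched in this comment can be expressed as a small classification over the current session's import graph, applied after the ThinLTO index is built (function and variable names here are illustrative, not rustc's): a CGU must be re-thin-LTO'd if it is red (just re-codegen'd) or if it imports from any red CGU; only fully green CGUs can reuse cached post-ThinLTO artifacts.

```rust
use std::collections::{HashMap, HashSet};

// Decide whether `cgu` needs another ThinLTO pass, given the import
// edges from the *current* session's ThinLTO index and the set of
// "red" CGUs that were re-codegen'd this session.
fn needs_re_thin_lto(
    imports: &HashMap<&str, Vec<&str>>, // cgu -> cgus it imports from
    red: &HashSet<&str>,
    cgu: &str,
) -> bool {
    red.contains(cgu)
        || imports
            .get(cgu)
            .map_or(false, |deps| deps.iter().any(|dep| red.contains(dep)))
}

fn main() {
    // The A/B/C example: A now imports B and (newly) C; C changed, so red.
    let mut imports = HashMap::new();
    imports.insert("A", vec!["B", "C"]);
    let red: HashSet<&str> = ["C"].iter().copied().collect();

    assert!(needs_re_thin_lto(&imports, &red, "C")); // red itself
    assert!(needs_re_thin_lto(&imports, &red, "A")); // imports a red CGU
    assert!(!needs_re_thin_lto(&imports, &red, "B")); // fully green
    println!("reuse classification ok");
}
```

Because the edges come from the freshly built index rather than the previous session's cached import data, the "A newly imports C" case is classified correctly.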
```rust
    });
    true
}
CguReUsable::PostThinLtoButImportedFrom => {
```
alexcrichton (Member) commented on Aug 24, 2018:
I suppose, to elaborate on my comment above, the way I expected this to work, these two latter states wouldn't be possible. It seems like we don't really need to handle the case that literally nothing changed, as it's not so important. In that case we can assume something changed, which means that everything will either be codegen'd or sent as a pre-thin-lto module to the backend. After the synchronization point we'd then make another decision about CGU reuse and such.
@rust-timer build 937c465
rust-timer commented on Aug 24, 2018:
Success: Queued 937c465 with parent 57e13ba, comparison URL.
Yes, I can see one case where your A/B/C example would be handled sub-optimally: A references functions in both B and C, but in session 1 ThinLTO classifies no exported functions in C (and called from A) as potentially inlinable. Therefore the import data will show no edge from A to C. Then, in session 2, C is changed and now some function there has become small enough to be eligible for inlining. The algorithm in the PR would re-translate C (because it changed) but it would take the cached version of A, since it has no edge to C. It would therefore not be able to inline functions from C into A, although that might be possible now. There are a couple of factors that somewhat lessen the negative effect:
That being said, deferring the classification until after the index is built would solve the problem reliably (and is probably more in line with how the linker plugin works). Unless I'm overlooking something, it shouldn't be too hard to implement it this way, fortunately.
TimNN added the S-waiting-on-author label and removed the S-waiting-on-review label on Aug 28, 2018
OK, so the perf results (which don't contain the proposed changes yet but should be kind of valid anyway) look better than last time:
Some cases profit a lot from incr. comp. (e.g.
michaelwoerister force-pushed the michaelwoerister:incr-thinlto-2000 branch from 1d0f3fd to 8fdf3e6 on Aug 31, 2018
@alexcrichton, I just pushed a commit that implements the algorithm as suggested by you. The code actually got simpler.
@bors try
This pull request and the master branch diverged in a way that cannot be automatically merged. Please rebase on top of the latest master branch, and let the reviewer approve again.
You may also read Git Rebasing to Resolve Conflicts by Drew Blessing for a short tutorial. Please avoid the "Resolve conflicts" button on GitHub.
michaelwoerister added some commits on Aug 17, 2018
Thanks for the review, @alexcrichton. I'll resolve the nits after the weekend. After this lands, we should start another "Help us test ..." thread on irlo to see what runtime performance really looks like. These days we can also do a try build of Firefox and see how it performs in the benchmarks.
michaelwoerister added some commits on Sep 3, 2018
michaelwoerister force-pushed the michaelwoerister:incr-thinlto-2000 branch from 737f1ef to 21d05f6 on Sep 3, 2018
I think all nits should be addressed now. I added some
@bors r=alexcrichton
bors added the S-waiting-on-bors label and removed the S-waiting-on-author label on Sep 3, 2018
bors added a commit that referenced this pull request on Sep 3, 2018
bors merged commit 21d05f6 into rust-lang:master on Sep 3, 2018
What happened to the perf? Plenty of stuff turned very red. Is this expected?
Indeed, this had a calamitous effect on compile times for incremental opt builds, and I don't understand how this was deemed acceptable prior to landing. I think it should be backed out ASAP.
Yes, the effects on compile times were expected. Let me explain what's going on here: This PR enables a new combination of compiler settings (ThinLTO + incremental compilation) that we've wanted to have for years and that, as per the existing rules, is now selected as the default when doing optimized, incremental builds. The old behavior (optimized, incremental builds without the additional ThinLTO pass) is still available via a compiler flag.
Without ThinLTO, incremental opt builds produce much slower code. In many cases benchmarks performed 2-3 times worse because of reduced IPO opportunities. If that code is fast enough for your needs, great, but there was no way we could make incremental compilation the default for optimized builds in Cargo. With ThinLTO enabled this might change. Once this is part of a nightly compiler, we'll test what runtime performance of code produced this way looks like; if it's close enough to non-incremental builds, we can make incr. comp. the default for opt builds in Cargo, giving compile time reductions of 50-85% for small changes!
Note that Cargo still defaults to non-incremental compilation for opt builds, so none of this will be visible to end users yet.
Huh, ok.
If you look at the perf results for just this PR, there are no improvements. (The few green entries are almost certainly noise, belonging to benchmarks that have high variance.)
Yeah, I was wondering why I hadn't seen those improvements in the try builds before. But it looks like
@michaelwoerister hm, oh, I also just realized: this didn't actually add any tests? Would it be possible to add a few incremental + optimized tests to exercise these code paths? (I don't think we can really test that it works without disassembly and brittle tests), but we can at least try to run it through the wringer!
The existing incremental tests will actually test some of this when
I'll think about how to test this some more. Maybe expand
Oh, nevermind then, carry on! So long as something broke when implementing this, it sounds like it's being exercised, which is all I would look for :)
eddyb reviewed on Sep 8, 2018
```
@@ -1622,6 +1626,11 @@ extern "C" {
        Data: &ThinLTOData,
        Module: &Module,
    ) -> bool;
    pub fn LLVMRustGetThinLTOModuleImports(
        Data: *const ThinLTOData,
```
I do have a few ideas for 2 or 3 tests. I'll make a PR this week, if I get to it. It requires exposing the different caching levels to the test framework. That's a good idea anyway, but it's not totally trivial because of that.
michaelwoerister commented on Aug 24, 2018 (edited)
This is an updated version of #52309. This PR allows rustc to use (local) ThinLTO and incremental compilation at the same time. In theory this should allow for getting compile-time improvements for small changes while keeping the runtime performance of the generated code roughly the same as when compiling non-incrementally.
The difference to #52309 is that this version also caches the pre-LTO version of the LLVM bitcode. This allows for another layer of caching: