Enable ThinLTO with incremental compilation. #53673

Merged
merged 7 commits into rust-lang:master from michaelwoerister:incr-thinlto-2000 Sep 3, 2018


@michaelwoerister
Contributor

michaelwoerister commented Aug 24, 2018

This is an updated version of #52309. This PR allows `rustc` to use (local) ThinLTO and incremental compilation at the same time. In theory, this should yield compile-time improvements for small changes while keeping the runtime performance of the generated code roughly the same as when compiling non-incrementally.

The difference from #52309 is that this version also caches the pre-LTO version of the LLVM bitcode. This allows for an additional layer of caching:

  1. If the module itself has changed, we have to re-codegen and re-optimize.
  2. If the module itself has not changed, but a module it imported from during ThinLTO has, we don't need to re-codegen and don't need to re-run the first optimization phase. Only the second (i.e. ThinLTO) optimization phase is re-run.
  3. If neither the module itself nor any of its imports have changed, we can reuse the final, post-ThinLTO version of the module. (We might still have to load its pre-ThinLTO version so that it's available for other modules to import from.)
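The three cases above can be sketched as a small decision function. This is a hypothetical illustration, not rustc's implementation: `CacheDecision` and `decide` are made-up names, "changed" is reduced to a set of module names, and the ThinLTO import edges are assumed to be known up front. It also leaves out the detail from case 3 that a reused module's pre-ThinLTO bitcode may still have to be loaded so others can import from it.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical outcome of the cache check for one module (illustrative names).
#[derive(Debug, PartialEq)]
pub enum CacheDecision {
    /// Case 1: the module changed, so re-codegen and re-run both optimization phases.
    Recompile,
    /// Case 2: reuse the cached pre-ThinLTO bitcode and re-run only the ThinLTO phase.
    RerunThinLto,
    /// Case 3: reuse the cached post-ThinLTO module as-is.
    ReusePostThinLto,
}

/// `changed`: modules whose input changed in this session.
/// `imports`: for each module, the modules it imported from during ThinLTO.
pub fn decide(
    module: &str,
    changed: &HashSet<&str>,
    imports: &HashMap<&str, Vec<&str>>,
) -> CacheDecision {
    if changed.contains(module) {
        return CacheDecision::Recompile;
    }
    // Did anything this module imports from change?
    let imported_changed = imports
        .get(module)
        .map_or(false, |deps| deps.iter().any(|d| changed.contains(d)));
    if imported_changed {
        CacheDecision::RerunThinLto
    } else {
        CacheDecision::ReusePostThinLto
    }
}
```

Note that the sketch trusts whatever import edges it is given; whether those edges are accurate for the current session is a separate question.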
@michaelwoerister

Contributor Author

michaelwoerister commented Aug 24, 2018

@bors try

@bors

Contributor

bors commented Aug 24, 2018

⌛️ Trying commit ee14d4a with merge 2d7e52f...

bors added a commit that referenced this pull request Aug 24, 2018
Incr thinlto 2000

@michaelwoerister michaelwoerister changed the title Incr thinlto 2000 Enable ThinLTO with incremental compilation. Aug 24, 2018
@rust-highfive

Collaborator

rust-highfive commented Aug 24, 2018

The job x86_64-gnu-llvm-5.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.


[00:04:40] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:04:40] tidy error: /checkout/src/librustc_codegen_llvm/back/lto.rs: missing trailing newline
[00:04:40] tidy error: /checkout/src/librustc_codegen_llvm/back/write.rs:2485: line longer than 100 chars
[00:04:42] Dependencies not on the whitelist:
[00:04:42] * memmap 
[00:04:42] some tidy checks failed
[00:04:42] 
[00:04:42] 
[00:04:42] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:04:42] 
[00:04:42] 
[00:04:42] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:04:42] Build completed unsuccessfully in 0:00:50
[00:04:42] Build completed unsuccessfully in 0:00:50
[00:04:42] make: *** [tidy] Error 1
[00:04:42] Makefile:79: recipe for target 'tidy' failed

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:2f141b3c
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
---
travis_time:end:05342ea2:start=1535123775571971018,finish=1535123775580007110,duration=8036092
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:1744e32a
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:07b14e58
travis_time:start:07b14e58
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:0212adc5
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@michaelwoerister michaelwoerister force-pushed the michaelwoerister:incr-thinlto-2000 branch from ee14d4a to 1d0f3fd Aug 24, 2018
@kennytm

Member

kennytm commented Aug 24, 2018

@bors try

@bors

Contributor

bors commented Aug 24, 2018

⌛️ Trying commit 1d0f3fd with merge 937c465...

bors added a commit that referenced this pull request Aug 24, 2018
Enable ThinLTO with incremental compilation.

@rust-highfive

Collaborator

rust-highfive commented Aug 24, 2018

The job x86_64-gnu-llvm-5.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[00:05:23] * memmap 
[00:05:23] some tidy checks failed
[00:05:23] 
[00:05:23] 
[00:05:23] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:05:23] 
[00:05:23] 
[00:05:23] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:05:23] Build completed unsuccessfully in 0:00:50
[00:05:23] Build completed unsuccessfully in 0:00:50
[00:05:23] make: *** [tidy] Error 1
[00:05:23] Makefile:79: recipe for target 'tidy' failed

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:00a20dae
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
---
travis_time:end:3453066c:start=1535127707846023606,finish=1535127707855087437,duration=9063831
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:2db993c0
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:09be6192
travis_time:start:09be6192
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:01198ad8
$ dmesg | grep -i kill


Err(e) => {
    let msg = format!("Error while trying to load ThinLTO import data \
                       for incremental compilation: {}", e);
    sess.fatal(&msg)

@alexcrichton

alexcrichton Aug 24, 2018

Member

Should this perhaps fall back to returning a new `ThinLTOImports` instance? It seems like if a previous compiler is Ctrl-C'd at just the right time, it could poison future compilers into returning this error message.

@michaelwoerister

michaelwoerister Aug 29, 2018

Author Contributor

Yes, good catch.
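A sketch of the suggested fallback: if the previous session's import data cannot be read or parsed (for example because an earlier compiler process was interrupted mid-write), start from an empty map instead of aborting. `ThinLtoImports`, `load_or_default`, and the one-line-per-module text format are hypothetical stand-ins, not the PR's actual types or file format.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for the import map: module name -> modules it imports from.
#[derive(Debug, Default, PartialEq)]
pub struct ThinLtoImports {
    pub imports: HashMap<String, Vec<String>>,
}

impl ThinLtoImports {
    /// Parses one line per module: the importing module followed by its imports,
    /// separated by whitespace. An empty line is treated as corrupt data.
    fn parse(text: &str) -> io::Result<Self> {
        let mut imports: HashMap<String, Vec<String>> = HashMap::new();
        for line in text.lines() {
            let mut names = line.split_whitespace().map(String::from);
            let module = names.next().ok_or_else(|| {
                io::Error::new(io::ErrorKind::InvalidData, "empty line in import data")
            })?;
            imports.insert(module, names.collect());
        }
        Ok(ThinLtoImports { imports })
    }

    /// Loads the previous session's data, falling back to an empty map on any
    /// I/O or parse error instead of raising a fatal error.
    pub fn load_or_default(path: &Path) -> Self {
        fs::read_to_string(path)
            .and_then(|text| Self::parse(&text))
            .unwrap_or_else(|e| {
                eprintln!("ignoring unusable ThinLTO import data: {}", e);
                ThinLtoImports::default()
            })
    }
}
```

Whether an empty map is a safe default depends on how the caller uses it: missing data should make the compiler redo more work, never less.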

@@ -983,6 +1006,9 @@ pub fn start_async_codegen(tcx: TyCtxt,
    allocator_config.emit_bc_compressed = true;
}

modules_config.emit_pre_thin_lto_bc =
    need_pre_thin_lto_bitcode_for_incr_comp(sess);

@alexcrichton

alexcrichton Aug 24, 2018

Member

Should this be swapped with `save_temps` above so that `-C save-temps` always emits it?

@michaelwoerister

michaelwoerister Aug 29, 2018

Author Contributor

foo.pre-thin-lto.bc should actually be exactly the same as foo.thin-lto-input.bc, so I didn't really make an effort to add it to the save-temps output. Do you think it's worth the trouble?

@michaelwoerister

michaelwoerister Aug 31, 2018

Author Contributor

I see what you mean now. Yes, it should be swapped so we don't overwrite the value.

    execute_copy_from_cache_work_item(cgcx, work_item, timeline)
}
work_item @ WorkItem::LTO(_) => {
    execute_lto_work_item(cgcx, work_item, timeline)

@alexcrichton

alexcrichton Aug 24, 2018

Member

It looks like each of these methods takes the bound `work_item` but quickly unwraps it. Could the inner values instead be bound in this match and passed in here?

@alexcrichton

alexcrichton Aug 31, 2018

Member

(ping on this comment)
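To illustrate the suggestion: instead of matching with `work_item @ WorkItem::LTO(_)` and unwrapping the item again inside the callee, each match arm can bind the payload and pass only that. `WorkItem` here is a simplified, hypothetical enum with `String` payloads; apart from `WorkItem::LTO`, `execute_copy_from_cache_work_item` and `execute_lto_work_item`, the names are made up, and the real functions take additional arguments.

```rust
/// Simplified, hypothetical work-item enum; the payloads stand in for real module types.
pub enum WorkItem {
    Optimize(String),
    CopyPostLtoArtifacts(String),
    LTO(String),
}

// Each helper receives exactly the data it needs, so none of them has to
// re-match (and potentially panic on) the enum.
fn execute_optimize_work_item(module: String) -> String {
    format!("optimize {}", module)
}

fn execute_copy_from_cache_work_item(module: String) -> String {
    format!("copy {} from cache", module)
}

fn execute_lto_work_item(module: String) -> String {
    format!("thin-lto {}", module)
}

pub fn execute_work_item(work_item: WorkItem) -> String {
    // Bind the payload in the pattern rather than binding the whole item with `@`.
    match work_item {
        WorkItem::Optimize(module) => execute_optimize_work_item(module),
        WorkItem::CopyPostLtoArtifacts(module) => execute_copy_from_cache_work_item(module),
        WorkItem::LTO(module) => execute_lto_work_item(module),
    }
}
```

The benefit is that the type system, rather than a runtime unwrap, guarantees each helper only ever sees its own variant's data.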

    len: module.data().len(),
});
serialized.push(module);
module_names.push(name);

@alexcrichton

alexcrichton Aug 24, 2018

Member

This looks the same as the loop above, so could `chain` be used to process both in one go? It also looks like the `modules_to_optimize` local variable could perhaps be hoisted above the loop?
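A sketch of the `chain` suggestion: when two loops over different sources (say, freshly built modules and cached ones) do the same per-item work, `Iterator::chain` lets a single loop handle both. `SerializedModule` and `collect_modules` are hypothetical simplifications loosely modeled on the excerpt above, not the PR's types.

```rust
/// Hypothetical, simplified module: a name plus its serialized bitcode.
pub struct SerializedModule {
    pub name: String,
    pub data: Vec<u8>,
}

/// Processes freshly built and cached modules in one pass.
/// Returns (module names, total serialized bytes) to show both inputs were visited.
pub fn collect_modules(
    fresh: Vec<SerializedModule>,
    cached: Vec<SerializedModule>,
) -> (Vec<String>, usize) {
    let mut module_names = Vec::new();
    let mut total_len = 0;
    // `chain` yields every item of `fresh`, then every item of `cached`.
    for module in fresh.into_iter().chain(cached) {
        total_len += module.data.len();
        module_names.push(module.name);
    }
    (module_names, total_len)
}
```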

// the cache instead of having been recompiled...
let current_imports = ThinLTOImports::from_thin_lto_data(data);

// ... so we load this additional information from the previous

@alexcrichton

alexcrichton Aug 24, 2018

Member

I'm not sure I'm following what's going on here. Aren't all CGUs loaded into the ThinLTOData instance? Is this perhaps an older comment?

AFAIK, when we redo ThinLTO we have to unconditionally load the ThinLTO buffer for all CGUs coming in as input, so I can't quite figure out where some would be missing, but you can likely enlighten me!

pub fn save_to_file(&self, path: &Path) -> io::Result<()> {
    use std::io::Write;
    let file = File::create(path)?;
    let mut writer = io::BufWriter::new(file);

@alexcrichton

alexcrichton Aug 24, 2018

Member

For this and `load_from_file` below, I'd imagine that these maps are pretty small (on the order of CGUs, not symbols), which probably means that we can reasonably hold the entire contents of this serialized file in memory. In that case it's probably much faster to read/write the file in one go (and do all other operations in memory).
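A sketch of the suggestion: serialize the whole (small) map into a `String` in memory and write it with a single `std::fs::write`, then read it back with a single `std::fs::read_to_string` before parsing. The `Imports` alias, the line-based format, and these free-function versions of `save_to_file`/`load_from_file` are hypothetical simplifications of the PR's methods.

```rust
use std::collections::BTreeMap;
use std::fmt::Write as _;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical import map: module -> modules it imports from.
/// A BTreeMap keeps the serialized output deterministic.
pub type Imports = BTreeMap<String, Vec<String>>;

/// Builds the whole file in memory, then writes it with one call.
pub fn save_to_file(imports: &Imports, path: &Path) -> io::Result<()> {
    let mut buf = String::new();
    for (module, deps) in imports {
        // One line per module: the module name, then its imports, space-separated.
        writeln!(buf, "{} {}", module, deps.join(" ")).unwrap();
    }
    fs::write(path, buf)
}

/// Reads the whole file at once, then parses it entirely in memory.
pub fn load_from_file(path: &Path) -> io::Result<Imports> {
    let text = fs::read_to_string(path)?;
    let mut imports = Imports::new();
    for line in text.lines() {
        let mut names = line.split_whitespace().map(String::from);
        if let Some(module) = names.next() {
            imports.insert(module, names.collect());
        }
    }
    Ok(imports)
}
```

Writing to a `String` cannot fail, which is why the `writeln!` result is simply unwrapped; all real I/O errors surface from the single `fs::write` or `fs::read_to_string` call.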

    No,
    PreThinLto,
    PostThinLto,
    PostThinLtoButImportedFrom,

@alexcrichton

alexcrichton Aug 24, 2018

Member

I'm slightly confused by this enum variant, but I think this is the same confusion that I had before perhaps?

If any CGU is classified as either `No` (not reusable) or `PreThinLto`, I think that means that all CGUs need to be loaded for the ThinLTO data collection stage.

Thinking about this more, I think the function below may not work in general? I think we can only determine post-ThinLTO CGU reuse after the ThinLTO data is created, right? Put another way, I think the possible reuse states are:

  • Everything is green, nothing has changed.
  • All modified CGUs need to be re-codegen'd
    • Afterwards, ThinLTOData is created, using the cached ThinLTO buffers for unmodified CGUs and freshly created buffers for re-codegen'd CGUs.
    • Now there's a graph of CGU to CGUs-imported, as well as whether each CGU is red/green (green for cached, red for just codegen'd)
    • Any red CGU is re-thin-LTO'd.
    • Any green CGU which imports from a red CGU is re-thin-LTO'd
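The re-ThinLTO rule in the list above can be sketched as follows, assuming the import graph comes from the *current* ThinLTO index (built after re-codegen) and that "red" means freshly codegen'd. The function name and string-keyed maps are hypothetical; this illustrates the proposed rule rather than the code that eventually landed.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Returns the CGUs that must go through the ThinLTO optimization phase again.
/// `red`: CGUs that were just re-codegen'd.
/// `imports`: current import graph, CGU -> CGUs it imports from.
/// Every other CGU can reuse its cached post-ThinLTO object.
pub fn cgus_to_rerun_thin_lto(
    all_cgus: &[&str],
    red: &HashSet<&str>,
    imports: &HashMap<&str, Vec<&str>>,
) -> BTreeSet<String> {
    all_cgus
        .iter()
        .filter(|cgu| {
            // Any red CGU is re-thin-LTO'd ...
            red.contains(*cgu)
                // ... as is any green CGU that imports from a red CGU.
                || imports
                    .get(*cgu)
                    .map_or(false, |deps| deps.iter().any(|d| red.contains(d)))
        })
        .map(|cgu| cgu.to_string())
        .collect()
}
```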

Here, before we create the ThinLTOData, I don't think we can determine that a green CGU only imports from other green CGUs? LLVM seems like it could do fancy things such as:

  • Let's have three CGUs, A, B, and C.
  • A/B are green and C is red
  • Previously, A imported from B and not C
  • Afterwards, though, A ends up importing from both B and C (for whatever reason)

I think that this classification below would mean that A is "as green as can be" but it actually needs to be re-thin-LTO'd?

I may also just be lost with the classification of names here...
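The A/B/C scenario can be reproduced with a tiny sketch: deciding reuse from the previous session's import edges marks A as reusable, while the current edges show that A now also imports from the changed C. The helper `needs_rerun` and the hard-coded edge lists are hypothetical illustrations, not rustc code.

```rust
use std::collections::HashSet;

/// True if `cgu` changed itself, or imports from a changed CGU according to `edges`
/// (each edge is `(importer, imported_from)`).
pub fn needs_rerun(cgu: &str, edges: &[(&str, &str)], changed: &HashSet<&str>) -> bool {
    changed.contains(cgu)
        || edges
            .iter()
            .any(|&(from, to)| from == cgu && changed.contains(to))
}
```

With `changed = {C}`, `previous = [(A, B)]` and `current = [(A, B), (A, C)]`, the check returns `false` for A against the previous edges but `true` against the current ones, which is why the decision has to wait until the current index exists.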

    });
    true
}
CguReUsable::PostThinLtoButImportedFrom => {

@alexcrichton

alexcrichton Aug 24, 2018

Member

I suppose, to elaborate on my comment above: the way I expected this to work, these latter two states wouldn't be possible. It seems like we don't really need to handle the case where literally nothing changed, as it's not so important. In that case we can assume something changed, which means that everything will either be codegen'd or sent as a pre-thin-lto module to the backend. After the synchronization point we'd then make another decision about CGU reuse and such.

@bors

Contributor

bors commented Aug 24, 2018

☀️ Test successful - status-travis
State: approved= try=True

@alexcrichton

Member

alexcrichton commented Aug 24, 2018

@rust-timer


rust-timer commented Aug 24, 2018

Success: Queued 937c465 with parent 57e13ba, comparison URL.

@michaelwoerister

Contributor Author

michaelwoerister commented Aug 24, 2018

Yes, I can see one case where your A/B/C example would be handled sub-optimally: A references functions in both B and C, but in session 1 ThinLTO classifies none of the functions exported from C (and called from A) as potentially inlinable. Therefore the import data will show no edge from A to C. Then, in session 2, C is changed and now some function there has become small enough to be eligible for inlining. The algorithm in the PR would re-translate C (because it changed), but it would take the cached version of A since A has no edge to C. It would therefore not be able to inline functions from C into A, although that might be possible now.

There are a couple of factors that somewhat lessen the negative effect:

  1. If there is any potentially inlinable function in C that is called from A then an edge from A to C will exist and all functions in C will be covered because A will be included in the LTO phase.
  2. If there is a function in C that is explicitly marked with #[inline] then changing that will invalidate A completely (as well as C) and both will be part of the LTO phase again.

That being said, deferring the classification until after the index is built would solve the problem reliably (and is probably more in line with how the linker-plugin works). Unless I'm overlooking something, it fortunately shouldn't be too hard to implement it this way :) Good catch!

@michaelwoerister

Contributor Author

michaelwoerister commented Aug 29, 2018

OK, so the perf results (which don't contain the proposed changes yet but should be kind of valid anyway) look better than last time:

Some cases profit a lot from incr. comp. (e.g. webrender, encoding, crates.io). The patched incremental: println case compiles 4-6 times faster than a non-incremental build. Many of the other cases are still twice as fast. If the runtime performance of the generated code is acceptable, that would be pretty good! Some crates though just hate incr. comp. it seems (yes, I'm looking at you style-servo).

@michaelwoerister michaelwoerister force-pushed the michaelwoerister:incr-thinlto-2000 branch from 1d0f3fd to 8fdf3e6 Aug 31, 2018
@michaelwoerister

Contributor Author

michaelwoerister commented Aug 31, 2018

@alexcrichton, I just pushed a commit that implements the algorithm as suggested by you. The code actually got simpler :) Let's see how it performs.

@bors try

@bors

Contributor

bors commented Aug 31, 2018

🔒 Merge conflict

This pull request and the master branch diverged in a way that cannot be automatically merged. Please rebase on top of the latest master branch, and let the reviewer approve again.

How do I rebase?

Assuming self is your fork and upstream is this repository, you can resolve the conflict following these steps:

  1. git checkout incr-thinlto-2000 (switch to your branch)
  2. git fetch upstream master (retrieve the latest master)
  3. git rebase upstream/master -p (rebase on top of it)
  4. Follow the on-screen instruction to resolve conflicts (check git status if you got lost).
  5. git push self incr-thinlto-2000 --force-with-lease (update this PR)

You may also read Git Rebasing to Resolve Conflicts by Drew Blessing for a short tutorial.

Please avoid the "Resolve conflicts" button on GitHub. It uses git merge instead of git rebase which makes the PR commit history more difficult to read.

Sometimes step 4 will complete without asking for resolution. This is usually due to difference between how Cargo.lock conflict is handled during merge and rebase. This is normal, and you should still perform step 5 to update this PR.

Error message
warning: Cannot merge binary files: src/Cargo.lock (HEAD vs. heads/homu-tmp)
Auto-merging src/librustc/session/mod.rs
Auto-merging src/librustc/session/config.rs
Auto-merging src/Cargo.lock
CONFLICT (content): Merge conflict in src/Cargo.lock
Automatic merge failed; fix conflicts and then commit the result.

@michaelwoerister michaelwoerister force-pushed the michaelwoerister:incr-thinlto-2000 branch from 737f1ef to 21d05f6 Sep 3, 2018
@michaelwoerister

Contributor Author

michaelwoerister commented Sep 3, 2018

I think all nits should be addressed now. I added some info!() output that shows which CGUs are loaded from cache and which are re-compiled.

@bors r=alexcrichton

@bors

Contributor

bors commented Sep 3, 2018

📌 Commit 21d05f6 has been approved by alexcrichton

@bors

Contributor

bors commented Sep 3, 2018

⌛️ Testing commit 21d05f6 with merge ee73f80...

bors added a commit that referenced this pull request Sep 3, 2018
…hton

Enable ThinLTO with incremental compilation.

@bors

Contributor

bors commented Sep 3, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing ee73f80 to master...

@bors bors merged commit 21d05f6 into rust-lang:master Sep 3, 2018
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
homu Test successful
@alexcrichton

Member

alexcrichton commented Sep 3, 2018

😲

@ljedrz

Contributor

ljedrz commented Sep 3, 2018

What happened to the perf? Plenty of stuff turned very red, is this expected?

@nnethercote

Contributor

nnethercote commented Sep 4, 2018

Indeed, this had a calamitous effect on compile times for incremental opt builds, and I don't understand how this was deemed acceptable prior to landing. I think it should be backed out ASAP.

sentry-cli-opt
        avg: 254.9%     min: -0.0%      max: 972.3%
cargo-opt
        avg: 180.3%     min: 0.1%       max: 662.0%
syn-opt
        avg: 174.1%?    min: -2.4%?     max: 646.3%?
regex-opt
        avg: 108.3%     min: 0.1%       max: 326.0%
clap-rs-opt
        avg: 87.4%      min: 0.1%       max: 313.8%
regression-31157-opt
        avg: 78.0%      min: -1.2%      max: 262.6%
crates.io-opt
        avg: 71.9%      min: -0.2%      max: 243.2%
tokio-webpush-simple-opt
        avg: 65.9%      min: -0.1%      max: 156.1%
webrender-opt
        avg: 53.7%      min: 0.0%       max: 152.2%
hyper-opt
        avg: 40.4%      min: -0.0%      max: 119.7%
piston-image-opt
        avg: 19.5%      min: 0.0%       max: 50.0%
issue-46449-opt
        avg: 21.5%      min: -0.1%      max: 42.0%
ripgrep-opt
        avg: 14.6%      min: 0.1%       max: 39.5%
encoding-opt
        avg: 8.3%       min: 0.0%       max: 23.6%
inflate-opt
        avg: 9.1%?      min: 0.4%?      max: 18.2%?
html5ever-opt
        avg: 5.6%       min: -0.0%      max: 16.9% 
deeply-nested-opt
        avg: 5.7%       min: -0.1%      max: 16.6%
futures-opt
        avg: 2.4%       min: -0.1%      max: 8.9%
ucd-opt
        avg: 1.4%       min: -0.1%      max: 3.7%
keccak-opt
        avg: 1.1%       min: -0.1%      max: 3.2%
helloworld-opt
        avg: 1.2%       min: 0.2%       max: 2.6%
@michaelwoerister

Contributor Author

michaelwoerister commented Sep 4, 2018

Yes, the effects on compile times were expected. Let me explain what's going on here: This PR enables a new combination of compiler settings (ThinLTO + incremental compilation) that we've wanted for years and that, as per the existing rules, is now selected as the default when doing optimized, incremental builds. The old behavior (optimized, incremental builds without the additional ThinLTO pass) is still available when compiling with -Clto=no, and its performance should not be affected by the changes in here. In fact, if you look at the non-incremental opt benchmarks, performance has gone up quite a bit in some cases (regex -1.5%, cargo -3.2%, crates.io -4.1%).

Without ThinLTO, incremental opt builds produce much slower code. In many cases benchmarks performed 2-3 times worse because of reduced interprocedural optimization (IPO) opportunities. If that code is fast enough for your needs, great, but there was no way we could make incremental compilation the default for optimized builds in Cargo. With ThinLTO enabled, this might change. Once this is part of a nightly compiler, we'll test how code produced this way performs at runtime; if it's close enough to non-incremental builds, we can make incr. comp. the default for opt builds in Cargo, giving compile-time reductions of 50-85% for small changes!

Note that Cargo still defaults to non-incremental compilation for opt builds, so none of this will be visible to end users yet.

@nnethercote

Contributor

nnethercote commented Sep 4, 2018

Huh, ok.

> In fact, if you look at the non-incremental opt benchmarks, performance has gone up quite a bit in some cases (regex -1.5%, cargo -3.2%, crates.io -4.1%).

If you look at the perf results for just this PR, there are no improvements. (The few green entries are almost certainly noise, belonging to benchmarks that have high variance.)

@michaelwoerister

Contributor Author

michaelwoerister commented Sep 4, 2018

> If you look at the perf results for just this PR, there are no improvements.

Yeah, I was wondering why I hadn't seen those improvements in the try builds before :) That makes more sense anyway.

But it looks like script-servo-opt and style-servo-opt failed...

@alexcrichton

Member

alexcrichton commented Sep 5, 2018

@michaelwoerister hm, oh, I also just realized this didn't actually add any tests? Would it be possible to add a few incremental + optimized tests to exercise these code paths? (I don't think we can really test that it works without disassembly and brittle tests), but we can at least try to put it through the wringer!

@michaelwoerister

Contributor Author

michaelwoerister commented Sep 6, 2018

The existing incremental tests will actually test some of this when optimize-tests = true is set in config.toml. All tests are then compiled with ThinLTO, so we at least see if we get strange linker errors. I actually had to track down a few of them.

I'll think about how to test this some more. Maybe expand codegen tests a little so they properly deal with multiple CGUs? Or a run-make test if everything else fails...

@alexcrichton

Member

alexcrichton commented Sep 6, 2018

Oh, never mind then, carry on! So long as something broke while implementing this, it sounds like it's being exercised, which is all I would look for :)

@@ -1622,6 +1626,11 @@ extern "C" {
        Data: &ThinLTOData,
        Module: &Module,
    ) -> bool;
    pub fn LLVMRustGetThinLTOModuleImports(
        Data: *const ThinLTOData,

@eddyb

eddyb Sep 8, 2018

Member

This should be &ThinLTOData.
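Why a reference works in this position: a Rust reference to a sized type is passed exactly like a non-null raw pointer, so it is allowed in `extern "C"` signatures while also documenting in the type that the pointer is never null and stays valid for the call. A minimal, self-contained sketch using the common zero-sized opaque-struct pattern; because the real function lives in rustc's C++ wrapper, the example defines a hypothetical Rust `extern "C"` function (`thin_lto_data_is_valid`) instead of linking against LLVM.

```rust
/// Opaque handle in the style of rustc's LLVM bindings: zero-sized, and across
/// a real FFI boundary only ever handled behind a pointer or reference.
#[repr(C)]
pub struct ThinLTOData {
    _private: [u8; 0],
}

// Hypothetical stand-in for a foreign function. Taking `&ThinLTOData` instead of
// `*const ThinLTOData` encodes "non-null and valid for the duration of the call"
// in the signature, while keeping the same C calling convention as a pointer.
pub extern "C" fn thin_lto_data_is_valid(data: &ThinLTOData) -> bool {
    // A reference coerces to a raw pointer, which can never be null here.
    let ptr: *const ThinLTOData = data;
    !ptr.is_null()
}
```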

@michaelwoerister

Contributor Author

michaelwoerister commented Sep 10, 2018

I do have a few ideas for 2 or 3 tests. I'll make a PR this week, if I get to it. It requires exposing the different caching levels to the test framework. That's a good idea anyway but it's not totally trivial because of that.

pietroalbini added a commit to pietroalbini/rust that referenced this pull request Sep 25, 2018
…nerics-for-incr-comp, r=alexcrichton

incr.comp.: Don't automatically enable -Zshare-generics for incr. comp. builds.

So far the compiler would automatically enable sharing of monomorphizations for incremental builds. That was OK because without (Thin)LTO this could have very little impact on the runtime performance of the generated code. However, since rust-lang#53673, ThinLTO and incr. comp. can be combined, so the trade-off is not as clear anymore.

This PR removes the automatic tie between the two options. Whether monomorphizations are shared between crates or not now _only_ depends on the optimization level.

r? @alexcrichton
@michaelwoerister michaelwoerister mentioned this pull request Oct 22, 2018
20 of 32 tasks complete
Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this pull request Dec 19, 2019
…t-lto-imports, r=michaelwoerister

save LTO import info and check it when trying to reuse build products

Fix rust-lang#59535

Previous runs of LTO optimization on the previous incremental build can import larger portions of the dependence graph into a codegen unit than the current compilation run is choosing to import. We need to take that into account when we choose to reuse PostLTO-optimization object files from previous compiler invocations.

This PR accomplishes that by serializing the LTO import information on each incremental build. We load up the previous LTO import data as well as the current LTO import data. Then as we decide whether to reuse previous PostLTO objects or redo LTO optimization, we check whether the LTO import data matches. After we finish with this decision process for every object, we write the LTO import data back to disk.

----

What is the scenario where comparing against past LTO import information is necessary?

I've tried to capture it in the comments in the regression test, but here's yet another attempt from me to summarize the situation:

 1. Consider a call-graph like `[A] -> [B -> D] <- [C]` (where the letters are functions and the modules are enclosed in `[]`)
 2. In our specific instance, the earlier compilations were inlining the call to `B` into `A`; thus `A` ended up with an external reference to the symbol `D` in its object code, to be resolved at subsequent link time. The LTO import information provided by LLVM for those runs reflected that information: it explicitly says that during those runs, the `B` definition and the `D` declaration were imported into `[A]`.
 3. The change between incremental builds was that the call `D <- C` was removed.
 4. That change, coupled with other decisions within `rustc`, made the compiler decide to make `D` an internal symbol (since it was no longer accessed from other codegen units, this makes sense locally). And then the definition of `D` was inlined into `B` and `D` itself was eliminated entirely.
 5. The current LTO import information reported that `B` alone is imported into `[A]` for the *current compilation*. So when the Rust compiler surveyed the dependence graph, it determined that nothing `[A]` imports changed since the last build (and `[A]` itself has not changed either), so it chose to reuse the object code generated during the previous compilation.
 6. But that previous object code has an unresolved reference to `D`, and that causes a link time failure!
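A sketch of the check described earlier in this PR text, applied to the scenario above: a cached post-LTO object is reused only if neither the module nor anything it imports changed *and* its previous import set equals its current one. The `ImportMap` alias and `can_reuse_post_lto` are hypothetical names; the real PR's data structures and bookkeeping (including writing the current data back to disk) are not reproduced here.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Module name -> set of items ThinLTO imported into it.
pub type ImportMap = HashMap<String, BTreeSet<String>>;

/// Decides whether `module`'s post-LTO object from the previous session can be reused.
pub fn can_reuse_post_lto(
    module: &str,
    changed: &HashSet<String>,
    previous: &ImportMap,
    current: &ImportMap,
) -> bool {
    let (prev, curr) = match (previous.get(module), current.get(module)) {
        (Some(p), Some(c)) => (p, c),
        // No record from one of the sessions: be conservative and redo LTO.
        _ => return false,
    };
    // The cached object was built against `prev`; it is only valid if LTO
    // imports exactly the same things now.
    if prev != curr {
        return false;
    }
    // Neither the module itself nor anything it imports may have changed.
    !changed.contains(module) && curr.iter().all(|m| !changed.contains(m))
}
```

In the scenario, `[A]` previously imported `{B, D}` but now imports only `{B}`, so the sets differ and A's stale object (with its dangling reference to `D`) is not reused, even though nothing A currently imports has changed.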

----

The interesting thing is that it's quite hard to actually observe the above scenario arising, which is probably why no one has noticed this bug in the year or so since incremental LTO support landed (PR rust-lang#53673).

I've literally spent days trying to observe the bug on my local machine, but haven't managed to find the magic combination of factors to get LLVM and `rustc` to do just the right set of the inlining and `internal`-reclassification choices that cause this particular problem to arise.

----

Also, I have tried to be careful not to inject new bugs with this PR. Specifically, I was/am worried that we could get into a scenario where overwriting the current LTO import data with past LTO import data would cause us to "forget" a current import. ~~To guard against this, the PR as currently written always asserts, at overwrite time, that the past LTO import-set is a *superset* of the current LTO import-set. This way, the overwriting process should always be safe to run.~~
 * The previous note was written based on the first version of this PR. It has since been revised to use a simpler strategy, where we never attempt to merge the past LTO import information into the current one. We just *compare* them, and act accordingly.
 * Also, as you can see from the comments on the PR itself, I was quite right to be worried about forgetting past imports; that scenario was observable via a trivial transformation of the regression test I had devised.
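Why overwriting was risky while comparing is safe can be shown with a second toy model using hypothetical sets: the past build imported `B` and `D`, and the current build imports `B` plus a new item `E` (an assumption for illustration, not taken from the regression test). `Imports`, `overwrite`, `forgotten`, and `must_redo_lto` are hypothetical names, not rustc APIs.

```rust
use std::collections::BTreeSet;

// Hypothetical model: the set of items LTO imported into one module.
type Imports = BTreeSet<&'static str>;

// Sketch of the first-draft idea: replace the current import set with the past one.
fn overwrite(past: &Imports, _current: &Imports) -> Imports {
    past.clone()
}

// Current imports that the overwritten set no longer mentions ("forgotten" imports).
fn forgotten(past: &Imports, current: &Imports) -> Imports {
    let overwritten = overwrite(past, current);
    current.difference(&overwritten).copied().collect()
}

// Revised strategy: never merge, only compare; any difference forces a fresh LTO run.
fn must_redo_lto(past: &Imports, current: &Imports) -> bool {
    past != current
}

fn main() {
    let past: Imports = BTreeSet::from(["B", "D"]);
    let current: Imports = BTreeSet::from(["B", "E"]);

    // The superset guard from the first draft would fire for this input...
    assert!(!past.is_superset(&current));
    // ...because overwriting would drop the current import E.
    assert_eq!(forgotten(&past, &current), BTreeSet::from(["E"]));
    // Comparing instead simply triggers a fresh LTO run for this module.
    assert!(must_redo_lto(&past, &current));
}
```

Comparing is conservative: a difference in either direction forces re-optimization, so no import can be forgotten, at the cost of occasionally redoing work that might have been reusable.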
bors added a commit that referenced this pull request Dec 20, 2019
…ts, r=mw

save LTO import info and check it when trying to reuse build products

Fix #59535
