New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[experiment] Benchmark incremental ThinLTO'd compiler. #56678

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
8 participants
@michaelwoerister
Copy link
Contributor

michaelwoerister commented Dec 10, 2018

I figure we can hijack perf.rlo to get some numbers on the runtime performance of code compiled with incremental ThinLTO. To this end, we compile the compiler incrementally and then measure how much slower it is a building stuff.

@michaelwoerister

This comment has been minimized.

Copy link
Contributor

michaelwoerister commented Dec 10, 2018

@bors try

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Dec 10, 2018

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Dec 10, 2018

⌛️ Trying commit 208b9a7 with merge 5caf635...

bors added a commit that referenced this pull request Dec 10, 2018

Auto merge of #56678 - michaelwoerister:incr-thinlto-bench, r=<try>
[experiment] Benchmark incremental ThinLTO'd compiler.

I figure we can hijack perf.rlo to get some numbers on the runtime performance of code compiled with incremental ThinLTO. To this end, we compile the compiler incrementally and then measure how much slower it is a building stuff.
@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Dec 10, 2018

The job dist-x86_64-linux of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_fold:end:services

travis_fold:start:git.checkout
travis_time:start:004206c0
$ git clone --depth=2 --branch=try https://github.com/rust-lang/rust.git rust-lang/rust
---
[00:00:48] 
[00:00:48] Total download size: 4.9 M
[00:00:48] Downloading Packages:
[00:00:51] --------------------------------------------------------------------------------
[00:00:51] Total                                           1.8 MB/s | 4.9 MB     00:02     
[00:00:51] warning: rpmts_HdrFromFdno: Header V3 DSA signature: NOKEY, key ID e8562897
[00:00:51] Importing GPG key 0xE8562897 "CentOS-5 Key (CentOS 5 Official Signing Key) <centos-5-key@centos.org>" from /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-5
[00:00:51] Running Transaction Test
[00:00:51] Finished Transaction Test
[00:00:51] Transaction Test Succeeded
[00:00:51] Running Transaction
---
[00:03:07] + hide_output make install
[00:03:07] + set +x
[00:03:25] shared.sh: line 11:   351 Terminated              bash -c "while true; do sleep 30; echo \$(date) - building ...; done"
[00:03:25] + cd ..
[00:03:25] + rm -rf openssl-1.0.2k
[00:03:25] ./build-openssl.sh: line 25:  4114 Terminated              bash -c "while true; do sleep 30; echo \$(date) - building ...; done"  (wd: /tmp/openssl-1.0.2k)
[00:03:25] + ln -nsf /etc/pki/tls/cert.pem /rustroot/ssl/
[00:03:26] Removing intermediate container e0a6287fa86f
[00:03:26] Step 14/41 : COPY dist-x86_64-linux/build-curl.sh /tmp/
[00:03:26]  ---> f2f711068964
[00:03:26] Step 15/41 : RUN ./build-curl.sh
[00:03:26] Step 15/41 : RUN ./build-curl.sh
[00:03:26]  ---> Running in 24cce2d607f7
[00:03:26] + source shared.sh
[00:03:26] + VERSION=7.51.0
[00:03:26] + curl http://cool.haxx.se/download/curl-7.51.0.tar.bz2
[00:03:28]   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
[00:03:28]                                  Dload  Upload   Total   Spent    Left  Speed
[00:03:29] 
  0 2509k    0 14215    0     0   9439      0  0:04:32  0:00:01  0:04:31  9439
  0 2509k    0 14215    0     0   9439      0  0:04:32  0:00:01  0:04:31  9439
  2 2509k    2 59559    0     0  33949      0  0:01:15  0:00:01  0:01:14  177k
100 2509k  100 2509k    0     0   939k      0  0:00:02  0:00:02 --:--:-- 2140k
[00:03:29] + mkdir curl-build
[00:03:29] + cd curl-build
[00:03:29] + hide_output ../curl-7.51.0/configure --prefix=/rustroot --with-ssl=/rustroot --disable-sspi --disable-gopher --disable-smtp --disable-smb --disable-imap --disable-pop3 --disable-tftp --disable-telnet --disable-manual --disable-dict --disable-rtsp --disable-ldaps --disable-ldap
[00:03:49] + hide_output make -j10
[00:03:49] + set +x
[00:04:02] shared.sh: line 11:    12 Terminated              bash -c "while true; do sleep 30; echo \$(date) - building ...; done"
[00:04:02] + hide_output make install
---
 98 82.1M   98 80.9M    0     0  1004k      0  0:01:23  0:01:22  0:00:01 1101k
 99 82.1M   99 82.0M    0     0  1006k      0  0:01:23  0:01:23 --:--:-- 1079k
100 82.1M  100 82.1M    0     0  1006k      0  0:01:23  0:01:23 --:--:-- 1181k
[00:08:56] + cd gcc-4.8.5
[00:08:56] + sed -i 's|ftp://gcc\.gnu\.org/|http://gcc.gnu.org/|g' ./contrib/download_prerequisites
[00:08:56] --2018-12-10 15:30:50--  http://gcc.gnu.org/pub/gcc/infrastructure/mpfr-2.4.2.tar.bz2
[00:08:56] Resolving gcc.gnu.org... 209.132.180.131
[00:08:56] Connecting to gcc.gnu.org|209.132.180.131|:80... connected.
[00:08:56] HTTP request sent, awaiting response... 200 OK
---
 29 27.0M   29 8128k    0     0  6610k      0  0:00:04  0:00:01  0:00:03 6762k
 61 27.0M   61 16.4M    0     0  7572k      0  0:00:03  0:00:02  0:00:01 7665k
 89 27.0M   89 24.0M    0     0  7636k      0  0:00:03  0:00:03 --:--:-- 7701k
100 27.0M  100 27.0M    0     0  7873k      0  0:00:03  0:00:03 --:--:-- 7937k
[00:42:44] + cd llvm-7.0.0.src
[00:42:44] + mkdir -p tools/clang
[00:42:44] + curl https://releases.llvm.org/7.0.0/cfe-7.0.0.src.tar.xz
[00:42:44] + xz -d
[00:42:44] + tar xf - -C tools/clang --strip-components=1
[00:42:44]                                  Dload  Upload   Total   Spent    Left  Speed
[00:42:46] 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
 28 11.9M   28 3486k    0     0  4992k      0  0:00:02 --:--:--  0:00:02 5001k
---
[00:42:46]                                  Dload  Upload   Total   Spent    Left  Speed
[00:42:46] 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  894k  100  894k    0     0  1926k      0 --:--:-- --:--:-- --:--:-- 1935k
[00:42:46] + mkdir ../clang-build
[00:42:46] + cd ../clang-build
[00:42:46] + INC=/rustroot/include
[00:42:46] + INC=/rustroot/include:/rustroot/lib/gcc/x86_64-unknown-linux-gnu/4.8.5/include-fixed
[00:42:46] + INC=/rustroot/include:/rustroot/lib/gcc/x86_64-unknown-linux-gnu/4.8.5/include-fixed:/usr/include
[00:42:46] + hide_output cmake ../llvm-7.0.0.src -DCMAKE_C_COMPILER=/rustroot/bin/gcc -DCMAKE_CXX_COMPILER=/rustroot/bin/g++ -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/rustroot -DLLVM_TARGETS_TO_BUILD=X86 -DC_INCLUDE_DIRS=/rustroot/include:/rustroot/lib/gcc/x86_64-unknown-linux-gnu/4.8.5/include-fixed:/usr/include
[00:43:16] Mon Dec 10 16:05:11 UTC 2018 - building ...
[00:43:18] + hide_output make -j10
[00:43:18] + set +x
[00:43:48] Mon Dec 10 16:05:42 UTC 2018 - building ...
---
[01:41:18] + set +x
[01:41:19] + hide_output make INSTALL_HDR_PATH=dest headers_install
[01:41:19] + set +x
[01:41:22] shared.sh: line 11:    10 Terminated              bash -c "while true; do sleep 30; echo \$(date) - building ...; done"
[01:41:22] + find dest/include '(' -name .install -o -name ..install.cmd ')' -delete
[01:41:22] + yes
[01:41:22] + yes
[01:41:22] + cp -fr dest/include/asm dest/include/asm-generic dest/include/drm dest/include/linux dest/include/mtd dest/include/rdma dest/include/scsi dest/include/sound dest/include/video dest/include/xen /usr/include
[01:41:22] + rm -rf linux-3.2.84
[01:41:23]  ---> f7766546ed27
[01:41:23] Removing intermediate container beae4c2a93c2
[01:41:23] Step 31/41 : COPY dist-x86_64-linux/build-perl.sh /tmp/
[01:41:23] Step 31/41 : COPY dist-x86_64-linux/build-perl.sh /tmp/
[01:41:23]  ---> d6259b67dfa1
[01:41:23] Step 32/41 : RUN ./build-perl.sh
[01:41:23]  ---> Running in fb6f61c82495
[01:41:24] + source shared.sh
[01:41:24] + curl https://www.cpan.org/src/5.0/perl-5.28.0.tar.gz
[01:41:24]   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
[01:41:24]                                  Dload  Upload   Total   Spent    Left  Speed
[01:41:25] 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Dec 10, 2018

💔 Test failed - status-travis

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Dec 10, 2018

@bors: retry

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Dec 10, 2018

⌛️ Trying commit 208b9a7 with merge 8f33a3d...

bors added a commit that referenced this pull request Dec 10, 2018

Auto merge of #56678 - michaelwoerister:incr-thinlto-bench, r=<try>
[experiment] Benchmark incremental ThinLTO'd compiler.

I figure we can hijack perf.rlo to get some numbers on the runtime performance of code compiled with incremental ThinLTO. To this end, we compile the compiler incrementally and then measure how much slower it is a building stuff.
@bors

This comment has been minimized.

Copy link
Contributor

bors commented Dec 10, 2018

💔 Test failed - status-travis

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Dec 10, 2018

Hm I'm not entirely sure what's going on.... I couldn't easily tell if that's a normal timeout or a spurious timeout

@michaelwoerister

This comment has been minimized.

Copy link
Contributor

michaelwoerister commented Dec 11, 2018

@bors retry

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Dec 11, 2018

⌛️ Trying commit 208b9a7 with merge 26f96e5...

bors added a commit that referenced this pull request Dec 11, 2018

Auto merge of #56678 - michaelwoerister:incr-thinlto-bench, r=<try>
[experiment] Benchmark incremental ThinLTO'd compiler.

I figure we can hijack perf.rlo to get some numbers on the runtime performance of code compiled with incremental ThinLTO. To this end, we compile the compiler incrementally and then measure how much slower it is a building stuff.
@bors

This comment has been minimized.

Copy link
Contributor

bors commented Dec 11, 2018

☀️ Test successful - status-travis
State: approved= try=True

@michaelwoerister

This comment has been minimized.

Copy link
Contributor

michaelwoerister commented Dec 11, 2018

@rust-timer

This comment has been minimized.

Copy link

rust-timer commented Dec 11, 2018

Success: Queued 26f96e5 with parent 3a31213, comparison URL.

@rust-timer

This comment has been minimized.

Copy link

rust-timer commented Dec 11, 2018

Finished benchmarking try commit 26f96e5

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Dec 11, 2018

All things considered that's actually a really impressive benchmark. The top 4 regressions are probably just one or two missing #[inline] annotations

@michaelwoerister

This comment has been minimized.

Copy link
Contributor

michaelwoerister commented Dec 11, 2018

Yes, the medium sized crates are hardly affected. Pretty cool!

@michaelwoerister

This comment has been minimized.

Copy link
Contributor

michaelwoerister commented Dec 12, 2018

OK, so this gives us one good, real-world test case where incremental+ThinLTO gives roughly the same result as regular ThinLTO as far as the runtime performance of the generated code goes. We'll need more examples (@nnethercote mentioned that he might take a look at Firefox as another big, real-world test case) but if things keep looking the way they are looking now, we should think about making incremental compilation the default for optimized builds too because the speedups to be had here are substantial.

The following table compares compile times with and without incremental compilation for optimized builds (numbers taken from a random, recent perf.rlo run)

non-incr incr-base incr-small-change one-time cost speed up
piston-image 7.82s 8.18s 1.45s +5% 5.4x
webrender 20.64s 23.34s 4.63s +13% 4.5x
ripgrep 8.04s 8.5s 1.82s +6% 4.4x
encoding 1.5s 1.73s 0.43s +15% 3.5x
futures 0.95s 1.16s 0.31s +22% 3.1x
regex 4.63s 5.08s 2.02s +10% 2.3x
clap-rs 25.37s 26.47s 13.32s +4% 1.9x
sentry-cli 19.1s 21.06s 11.43s +10% 1.7x
cargo 38.95s 43.63s 24.5s +12% 1.6x
tokio-webpush-simple 4.21s 4.77s 3.09s +13% 1.4x
helloworld 0.15s 0.16s 0.15s +7% 1.0x

Initial compilation still, as expected, is slower with incr. comp. but subsequent compilation sessions can be much faster! We should really try to get this into the hands of end-users somehow. I mean, how often do we get a chance to make compilation up to five times faster? :D

If we had a debug-opt Cargo profile, this would be a no-brainer. Since we only have release and debug, it's a bit more complicated. Actually released code should still be compiled non-incrementally, with one codegen unit, and possibly with whole-crate-graph LTO enabled.

cc @rust-lang/core @rust-lang/cargo @rust-lang/wg-compiler-performance

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Dec 12, 2018

Those indeed are some compelling numbers, and I could get behind making incremental release mode the default! I think we could even go further in release mode and turn on "infinite codegen units" by default as well, as that wouldn't have the one-time cost of incremental but would benefit compiling all crates.io crates in release mode which aren't compiled incrementally (and may help eat the one time cost of the local project being incremental!)

I would personally be in favor of pursuing an aggressive strategy of "simply turn on incremental by default in release mode". There's always going to be some configuration which eeks out a few percentage points of performance here and there (LTO, PGO, one CGU, implementing a mode in rustc that just reruns all LLVM passes, etc, etc). What I think this is very convincingly showing is that there's no major regressions with incrementally compiled code in release mode and it's still very competitive.

In terms of next steps I think we'd probably want to do a call on internals like we initially did for ThinLTO, basically having a very short set of instructions for how to test and ask folks to post various numbers for various projects. I suspect CARGO_INCREMENTAL=1 cargo build --release is enough to test this, but I haven't tested!

@michaelwoerister

This comment has been minimized.

Copy link
Contributor

michaelwoerister commented Dec 12, 2018

In terms of next steps I think we'd probably want to do a call on internals like we initially did for ThinLTO, basically having a very short set of instructions for how to test and ask folks to post various numbers for various projects.

Heh, you mean like this one? :)
Maybe the instructions were too complicated.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Dec 12, 2018

Ha yes I mean exactly that one!

We may want to try messaging yeah with an easy-to-run thing at the top, and then perhaps also message simultaneously say that we're considering turning this on by default so it's important to test!

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

Mark-Simulacrum commented Dec 12, 2018

FWIW, I've been running with CARGO_INCREMENTAL=1 on all my smaller Rust projects and have never noticed any slowdowns either -- including across all the Advent of Code benchmarks that I've run (so tight-loop code).

I think we may be able to just go ahead and do this without any call to action, and see if people come complaining. I believe the Cargo team is working long-term on a reworking of profiles which would help with a true "release" build here -- though at least in my experience incremental release builds today are more than fast enough.

@Eh2406

This comment has been minimized.

Copy link
Contributor

Eh2406 commented Dec 12, 2018

As I said in internals, we have lolbench now. We have a way to accurately measure the change on performance of the compiled output over a large number benchmarks from real world code. (I don't have access to a Computer I can dedicate to running this experiment or I would do it myself.) We don't intentionally make breaking changes without asking crater. We can do the same for the performance of compiled output!

@michaelwoerister

This comment has been minimized.

Copy link
Contributor

michaelwoerister commented Dec 12, 2018

A simple way to test this via lolbench would be to make incr. comp. the default. We are early in the cycle, after all, and can revert if performance takes too much of a hit. I don't know how much work a local lolbench run would be but it sounded like it wouldn't work out of the box.

@Aaronepower

This comment has been minimized.

Copy link
Contributor

Aaronepower commented Jan 11, 2019

Triage; @michaelwoerister Hello, have you been able to get back to this PR?

@michaelwoerister

This comment has been minimized.

Copy link
Contributor

michaelwoerister commented Jan 14, 2019

This PR was just an experiment and we got some good numbers out of it. Once beta has branched off, we could look into making incr. comp. the default for release builds temporarily in order to get some numbers via lolbench.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment