Turn quadratic time on number of impl blocks into linear time #78317

est31 · 2020-10-24T08:54:50Z

Previously, if you had a lot of inherent impl blocks on a type like:

struct Foo;

impl Foo { fn foo_1() {} }
// ...
impl Foo { fn foo_100_000() {} }

The compiler would be very slow at processing it, because
an internal algorithm would run in O(n^2), where n is the number
of impl blocks. Now, we add a new algorithm that allocates but
is faster asymptotically.

Comparing rustc nightly with a local build of rustc as of this PR (results in seconds):

N	real time before	real time after
4_000	0.57	0.46
8_000	1.31	0.84
16_000	3.56	1.69
32_000	10.60	3.73

I've tuned up the numbers to make the effect larger than the startup noise of rustc, but the asymptotic difference should hold for smaller n as well.

Note: current state of the PR omits error messages if there are other errors present already. For now, I'm mainly interested in a perf run to study whether this issue is present at all. Please queue one for this PR. Thanks!

rust-highfive · 2020-10-24T08:54:53Z

r? @matthewjasper

(rust_highfive has picked a reviewer for you, use r? to override)

bugadani

Just out of curiousity, how common is it to have 20+ impl blocks in real world code?

bugadani · 2020-10-24T09:22:32Z

compiler/rustc_typeck/src/coherence/inherent_impls_overlap.rs

+                // faster asymptotic runtime.
+                if impls.len() < 20 {
+                    for (i, &impl1_def_id) in impls.iter().enumerate() {
+                        for &impl2_def_id in &impls[(i + 1)..] {


I guess this is how it is and there is no reason to change, but wouldn't cloning the iterator for the inner loop be nicer? There's no need to complicate the iteration when we only need to look ahead, I think.
Am I missing something?

IDK for me as someone coming from C/C++, iterator magic is more complicated than doing things with indices.

I also have a background in C and I hate working with indices when I can avoid them, they are too much a footgun :)

est31 · 2020-10-24T09:41:52Z

Just out of curiousity, how common is it to have 20+ impl blocks in real world code?

Probably quite rare, however note that impl blocks might be distributed across multiple files. IIRC servo has something like this. I might misremember though. There might be no such case in the perf's testsuite at all. I just stumbled onto this code and it felt bad. Anyways, the swap I add above should have an impact for smaller number of impl blocks already. Maybe two perf runs are needed? IDK.

bugadani · 2020-10-24T09:43:57Z

I guess rustc has a ton of impl TyCtxt (I counter 21 + a macro).

matthewjasper · 2020-10-24T10:10:15Z

@bors try @rust-timer queue

rust-timer · 2020-10-24T10:10:16Z

Awaiting bors try build completion

bors · 2020-10-24T10:10:25Z

⌛ Trying commit c17fe5be24fb07fd8f122c5cfd5cb1726d685f35 with merge 925eba0a75e6e25806ed3a4a654610329bd00254...

est31 · 2020-10-24T10:29:05Z

I guess rustc has a ton of impl TyCtxt (I counter 21 + a macro).

A single location... won't be much of an impact I guess especially considering the fact that for 20 the effect is still low, at least compared to what the rest of the compiler is doing.

Also note that I pulled the 20 out of my nose. It's been chosen on gut feelings alone. If there is an impact, it might be improved. If there's none, maybe it was chosen wrongly.

bors · 2020-10-24T10:55:17Z

☀️ Try build successful - checks-actions
Build commit: 925eba0a75e6e25806ed3a4a654610329bd00254 (925eba0a75e6e25806ed3a4a654610329bd00254)

rust-timer · 2020-10-24T10:55:19Z

Queued 925eba0a75e6e25806ed3a4a654610329bd00254 with parent 7bade6e, future comparison URL.

jonas-schievink · 2020-10-24T11:55:30Z

cc #69009

est31 · 2020-10-24T12:30:44Z

@jonas-schievink nice, someone had the same bright idea as me! I'd attribute the regression in that PR to cases where the brute-force search is better. The most extreme instance of this is probably two impl blocks, a short one followed by a very long one. If the order is right, the current O(n^2) search runs once over one of the short impl block, while looking up into the larger block. The linear algorithm of your PR would put both impl blocks into the hashmap and cause a lot of bureaucracy. In fact, a single long impl block would favour the O(n^2) search even better as.

This gives me lots of ideas to improve the algorithm...

The perf run of your PR shows (next to the regressions) a 2% improvement on syn-opt which is IMO a great achievement. Ten 2% improvements constitute a 19% improvement, 30 such improvements cut the time in half. Very encouraging!

rust-timer · 2020-10-24T13:06:48Z

Finished benchmarking try commit (925eba0a75e6e25806ed3a4a654610329bd00254): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

compiler/rustc_typeck/src/coherence/inherent_impls_overlap.rs

est31 · 2020-10-24T13:13:14Z

Up to 3.1% improvements for incr-unchanged of packed-simd-check... very nice.

Mark-Simulacrum · 2020-10-24T13:40:10Z

I would not look at incremental benchmarks on PRs like this - they're not likely to be super meaningful as an expression of improvement (or regression) because they are skipping parts of the build, and unless we think this optimization permits skipping more (seems unlikely), the percent win there is just larger because the base is smaller.

It looks like this is about a 1% win in practice instruction counts wise, but appears to be a regression on wall times (e.g. rustc middle regressed roughly 2 seconds, it looks like).

I think we are least don't have solid evidence this is a win on the code examples tested by perf. I'm not sure if perhaps switching algorithms when we detect n > 30 or something like that makes sense?

jonas-schievink · 2020-10-24T13:44:45Z

Maybe test the impact on a crate like https://github.com/stm32-rs/stm32-rs-nightlies/tree/master/stm32f4? I usually use those when working on the overlap checks.

est31 · 2020-10-24T13:46:23Z

Oh right it's red down there... a bit unfortunate :/.

I think I should split up the PR as a next step. It's doing two separate optimizations, better measure them separately.

In any case, the improvement for packed-simd is real, as the crate_inherent_impls_overlap_check query used to take 1.96% of the build, now takes 0.1%.

@Mark-Simulacrum Do you know whether there is a way to show the query times for rustc-middle and other red crates?

Mark-Simulacrum · 2020-10-24T14:51:40Z

We don't currently record self profile data for rustc compilation.

est31 · 2020-10-24T14:57:32Z

FYI if you're wondering, NodeRef of the alloc crate has 26 inherent impl blocks.

est31 · 2020-10-24T15:05:56Z

Maybe test the impact on a crate like https://github.com/stm32-rs/stm32-rs-nightlies/tree/master/stm32f4? I usually use those when working on the overlap checks.

Wow that crate has approximately a thousand inherent impl blocks per device that you can enable, and in total there are 11 features... If you enable them all you get 14043 impl blocks...

So it's definitely a good real case stress test for overlap checking. It also has a ton of name overlap, making the namespace checks very relevant. I think it's a good benchmark, thanks for suggesting it!

est31 · 2020-10-25T10:11:53Z

Note that the title of this PR as well as of #69009 can be misunderstood, as the entire impl overlap check in the worst case still runs in O(n^2). The worst case happens in the instance of stm32f4 where you have a) tons of impl blocks on the same generic struct but b) parameterized with different types so that they don't overlap. However, c) the impl blocks all contain the same function names, so the early check (that both my and @jonas-schievink 's PRs turn into linear time) registers an overlap, and gives the pair of impl blocks to the code that checks for a type wise overlap of the impl blocks. That function takes two impl blocks and its code is a bit more involved than the heuristics in the file changed by our PRs...

That being said, the earlier code has been changed from quadratic to linear time, and often there is no overlap. There's probably some way to speed up things for stm32f4 in particular by adding more heuristics to keep time down, but there will likely always be ways to construct quadratic time instances... need to look more into things...

est31 · 2020-12-07T02:10:35Z

The PR is ready for review!

I have done some smaller fixes that made it a bit slower than the version I ran the benchmarks on, e.g. I added sorting to make the output not depend on hash set iterator ordering. I don't want to run all benchmarks yet again (as they are currently quite manual), but this is a table about a subset of the benchmarks:

features	components	cargo +local-bare check	cargo +local check
stm32f401,stm32f405	2335,968	30.915s	30.495s
stm32f401,stm32f405,stm32f407,stm32f410, stm32f411	5424,2182	1m33.891s	1m28.858s
stm32f401,stm32f405,stm32f407,stm32f410, stm32f411,stm32f412,stm32f413,stm32f427	9404,3765	3m31.394s	3m18.193s

Also updated the table of the PR description.

r? @matthewjasper

est31 · 2020-12-07T22:37:43Z

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

est31 · 2020-12-17T16:01:25Z

@matthewjasper friendly ping!

jyn514 · 2020-12-18T15:44:37Z

Just out of curiousity, how common is it to have 20+ impl blocks in real world code?

I would expect this to be the main reason rustc is slow on diesel and stm32: rust-lang/rustc-perf#807 (comment)

est31 · 2020-12-18T16:58:21Z

@jyn514 this PR addresses the crate_inherent_impls_overlap_check pass. In your linked comment I couldn't see that pass appear at all, so I assume that it won't improve diesel's compile time.

matthewjasper · 2020-12-20T18:35:47Z

@bors r+

bors · 2020-12-20T18:35:48Z

📌 Commit 73a7d93 has been approved by matthewjasper

bors · 2020-12-20T19:33:03Z

⌛ Testing commit 73a7d93 with merge 3ce7ca6d86c4fd9982a4f2effc438bbfcb203ab2...

rust-log-analyzer · 2020-12-20T19:39:18Z

The job dist-s390x-linux failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

[TIMING] StdLink { compiler: Compiler { stage: 0, host: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } }, target_compiler: Compiler { stage: 0, host: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } }, target: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } } -- 0.000
[TIMING] Std { target: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None }, compiler: Compiler { stage: 0, host: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } } } -- 88.479
Building LLVM for x86_64-unknown-linux-gnu
running: "cmake" "/checkout/src/llvm-project/llvm" "-G" "Ninja" "-DLLVM_ENABLE_ASSERTIONS=OFF" "-DLLVM_TARGETS_TO_BUILD=AArch64;ARM;Hexagon;MSP430;Mips;NVPTX;PowerPC;RISCV;Sparc;SystemZ;WebAssembly;X86" "-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=AVR" "-DLLVM_INCLUDE_EXAMPLES=OFF" "-DLLVM_INCLUDE_TESTS=OFF" "-DLLVM_INCLUDE_DOCS=OFF" "-DLLVM_INCLUDE_BENCHMARKS=OFF" "-DLLVM_ENABLE_TERMINFO=OFF" "-DLLVM_ENABLE_LIBEDIT=OFF" "-DLLVM_ENABLE_BINDINGS=OFF" "-DLLVM_ENABLE_Z3_SOLVER=OFF" "-DLLVM_PARALLEL_COMPILE_JOBS=16" "-DLLVM_TARGET_ARCH=x86_64" "-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-linux-gnu" "-DLLVM_ENABLE_ZLIB=ON" "-DLLVM_ENABLE_LIBXML2=OFF" "-DLLVM_VERSION_SUFFIX=-rust-1.50.0-nightly" "-DCMAKE_INSTALL_MESSAGE=LAZY" "-DCMAKE_C_COMPILER_LAUNCHER=sccache" "-DCMAKE_CXX_COMPILER_LAUNCHER=sccache" "-DCMAKE_C_COMPILER=cc" "-DCMAKE_CXX_COMPILER=c++" "-DCMAKE_ASM_COMPILER=cc" "-DCMAKE_C_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_CXX_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64 -static-libstdc++" "-DCMAKE_INSTALL_PREFIX=/checkout/obj/build/x86_64-unknown-linux-gnu/llvm" "-DCMAKE_ASM_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_BUILD_TYPE=Release"
CMake Error: The source directory "/checkout/src/llvm-project/llvm" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.
command did not execute successfully, got: exit code: 1


build script failed, must exit now', /cargo/registry/src/github.com-1ecc6299db9ec823/cmake-0.1.44/src/lib.rs:885:5
 finished in 0.011 seconds
failed to run: /checkout/obj/build/bootstrap/debug/bootstrap dist --host s390x-unknown-linux-gnu --target s390x-unknown-linux-gnu
Build completed unsuccessfully in 0:01:29

bors · 2020-12-20T19:40:59Z

💔 Test failed - checks-actions

jyn514 · 2020-12-20T19:47:57Z

Building LLVM for x86_64-unknown-linux-gnu
running: "cmake" "/checkout/src/llvm-project/llvm" "-G" "Ninja" "-DLLVM_ENABLE_ASSERTIONS=OFF" "-DLLVM_TARGETS_TO_BUILD=AArch64;ARM;Hexagon;MSP430;Mips;NVPTX;PowerPC;RISCV;Sparc;SystemZ;WebAssembly;X86" "-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=AVR" "-DLLVM_INCLUDE_EXAMPLES=OFF" "-DLLVM_INCLUDE_TESTS=OFF" "-DLLVM_INCLUDE_DOCS=OFF" "-DLLVM_INCLUDE_BENCHMARKS=OFF" "-DLLVM_ENABLE_TERMINFO=OFF" "-DLLVM_ENABLE_LIBEDIT=OFF" "-DLLVM_ENABLE_BINDINGS=OFF" "-DLLVM_ENABLE_Z3_SOLVER=OFF" "-DLLVM_PARALLEL_COMPILE_JOBS=16" "-DLLVM_TARGET_ARCH=x86_64" "-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-linux-gnu" "-DLLVM_ENABLE_ZLIB=ON" "-DLLVM_ENABLE_LIBXML2=OFF" "-DLLVM_VERSION_SUFFIX=-rust-1.50.0-nightly" "-DCMAKE_INSTALL_MESSAGE=LAZY" "-DCMAKE_C_COMPILER_LAUNCHER=sccache" "-DCMAKE_CXX_COMPILER_LAUNCHER=sccache" "-DCMAKE_C_COMPILER=cc" "-DCMAKE_CXX_COMPILER=c++" "-DCMAKE_ASM_COMPILER=cc" "-DCMAKE_C_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_CXX_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64 -static-libstdc++" "-DCMAKE_INSTALL_PREFIX=/checkout/obj/build/x86_64-unknown-linux-gnu/llvm" "-DCMAKE_ASM_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_BUILD_TYPE=Release"
CMake Error: The source directory "/checkout/src/llvm-project/llvm" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.
command did not execute successfully, got: exit code: 1

cc @rust-lang/infra, I've seen this on a few different PRs now. It seems to be spurious, retrying will usually fix it.

@bors retry

bors · 2020-12-20T19:54:20Z

⌛ Testing commit 73a7d93 with merge c609b2e...

bors · 2020-12-20T22:51:55Z

☀️ Test successful - checks-actions
Approved by: matthewjasper
Pushing c609b2e to master...

pietroalbini · 2020-12-21T12:46:06Z

I've seen this on a few different PRs now. It seems to be spurious, retrying will usually fix it.

That seems to be a spurious network failure while downloading LLVM:

2020-12-20T19:34:57.8161875Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:34:57.8199484Z + [[ 1 -lt 5 ]]
2020-12-20T19:34:57.8199934Z + sleep 1
2020-12-20T19:34:58.8224385Z + (( n++ ))
2020-12-20T19:34:58.8225299Z Command failed. Attempt 2/5:
2020-12-20T19:34:58.8227627Z + echo 'Command failed. Attempt 2/5:'
2020-12-20T19:34:58.8228520Z + true
2020-12-20T19:34:58.8232046Z + sh -c 'rm -f download-src-llvm-project.tar.gz &&         curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:03.1438299Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:03.1464331Z + [[ 2 -lt 5 ]]
2020-12-20T19:35:03.1464857Z + sleep 2
2020-12-20T19:35:05.1479280Z + (( n++ ))
2020-12-20T19:35:05.1481078Z + echo 'Command failed. Attempt 3/5:'
2020-12-20T19:35:05.1481971Z + true
2020-12-20T19:35:05.1482774Z Command failed. Attempt 3/5:
2020-12-20T19:35:05.1489474Z + sh -c 'rm -f download-src-llvm-project.tar.gz &&         curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:14.8627824Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:14.8652110Z + [[ 3 -lt 5 ]]
2020-12-20T19:35:14.8652926Z + sleep 3
2020-12-20T19:35:17.8666398Z + (( n++ ))
2020-12-20T19:35:17.8666991Z Command failed. Attempt 4/5:
2020-12-20T19:35:17.8668227Z + echo 'Command failed. Attempt 4/5:'
2020-12-20T19:35:17.8668735Z + true
2020-12-20T19:35:17.8670734Z + sh -c 'rm -f download-src-llvm-project.tar.gz &&         curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:21.0742337Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:21.0771810Z + [[ 4 -lt 5 ]]
2020-12-20T19:35:21.0772636Z + sleep 4
2020-12-20T19:35:25.0784632Z + (( n++ ))
2020-12-20T19:35:25.0785167Z Command failed. Attempt 5/5:
2020-12-20T19:35:25.0786668Z + echo 'Command failed. Attempt 5/5:'
2020-12-20T19:35:25.0787170Z + true
2020-12-20T19:35:25.0789147Z + sh -c 'rm -f download-src-llvm-project.tar.gz &&         curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:31.2565957Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:31.2597156Z + [[ 5 -lt 5 ]]
2020-12-20T19:35:31.2598171Z The command has failed after 5 attempts.

rust-highfive assigned matthewjasper Oct 24, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 24, 2020

bugadani reviewed Oct 24, 2020

View reviewed changes

jonas-schievink reviewed Oct 24, 2020

View reviewed changes

compiler/rustc_typeck/src/coherence/inherent_impls_overlap.rs Outdated Show resolved Hide resolved

est31 force-pushed the linear_in_impl_count branch from c17fe5b to 4ff10e8 Compare October 24, 2020 13:53

This was referenced Oct 24, 2020

TyCtxt: generate single impl block with slice_interners macro #78318

Merged

Iterate over the smaller list #78323

Merged

jyn514 added I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 24, 2020

est31 force-pushed the linear_in_impl_count branch from ee55a93 to 73a7d93 Compare December 7, 2020 01:01

est31 marked this pull request as ready for review December 7, 2020 01:49

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 7, 2020

est31 mentioned this pull request Dec 15, 2020

Add stm327xx-hal benchmark rust-lang/rustc-perf#802

Merged

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 20, 2020

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Dec 20, 2020

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 20, 2020

bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 20, 2020

bors merged commit c609b2e into rust-lang:master Dec 20, 2020

rustbot added this to the 1.50.0 milestone Dec 20, 2020

tgnottingham mentioned this pull request Jan 22, 2021

Non linear runtime increasment for expand_crate pass while compiling diesel with 128-column-tables #81262

Open

Turn quadratic time on number of impl blocks into linear time #78317

Turn quadratic time on number of impl blocks into linear time #78317

Conversation

est31 commented Oct 24, 2020 • edited Loading

rust-highfive commented Oct 24, 2020

bugadani left a comment

Choose a reason for hiding this comment

bugadani Oct 24, 2020

Choose a reason for hiding this comment

est31 Oct 24, 2020

Choose a reason for hiding this comment

bugadani Oct 24, 2020

Choose a reason for hiding this comment

est31 commented Oct 24, 2020

bugadani commented Oct 24, 2020

matthewjasper commented Oct 24, 2020

rust-timer commented Oct 24, 2020

bors commented Oct 24, 2020

est31 commented Oct 24, 2020

bors commented Oct 24, 2020

rust-timer commented Oct 24, 2020

jonas-schievink commented Oct 24, 2020

est31 commented Oct 24, 2020

rust-timer commented Oct 24, 2020

est31 commented Oct 24, 2020

Mark-Simulacrum commented Oct 24, 2020

jonas-schievink commented Oct 24, 2020

est31 commented Oct 24, 2020 • edited Loading

Mark-Simulacrum commented Oct 24, 2020

est31 commented Oct 24, 2020

est31 commented Oct 24, 2020

est31 commented Oct 25, 2020

est31 commented Dec 7, 2020

est31 commented Dec 7, 2020

est31 commented Dec 17, 2020

jyn514 commented Dec 18, 2020

est31 commented Dec 18, 2020

matthewjasper commented Dec 20, 2020

bors commented Dec 20, 2020

bors commented Dec 20, 2020

rust-log-analyzer commented Dec 20, 2020

bors commented Dec 20, 2020

jyn514 commented Dec 20, 2020

bors commented Dec 20, 2020

bors commented Dec 20, 2020

pietroalbini commented Dec 21, 2020

est31 commented Oct 24, 2020 •

edited

Loading

est31 commented Oct 24, 2020 •

edited

Loading