Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Turn quadratic time on number of impl blocks into linear time #78317

Merged
merged 3 commits into from
Dec 20, 2020

Conversation

est31
Copy link
Member

@est31 est31 commented Oct 24, 2020

Previously, if you had a lot of inherent impl blocks on a type like:

struct Foo;

impl Foo { fn foo_1() {} }
// ...
impl Foo { fn foo_100_000() {} }

The compiler would be very slow at processing it, because
an internal algorithm would run in O(n^2), where n is the number
of impl blocks. Now, we add a new algorithm that allocates but
is faster asymptotically.

Comparing rustc nightly with a local build of rustc as of this PR (results in seconds):

N real time before real time after
4_000 0.57 0.46
8_000 1.31 0.84
16_000 3.56 1.69
32_000 10.60 3.73

I've tuned up the numbers to make the effect larger than the startup noise of rustc, but the asymptotic difference should hold for smaller n as well.

Note: current state of the PR omits error messages if there are other errors present already. For now, I'm mainly interested in a perf run to study whether this issue is present at all. Please queue one for this PR. Thanks!

@rust-highfive
Copy link
Collaborator

r? @matthewjasper

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 24, 2020
Copy link
Contributor

@bugadani bugadani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiousity, how common is it to have 20+ impl blocks in real world code?

// faster asymptotic runtime.
if impls.len() < 20 {
for (i, &impl1_def_id) in impls.iter().enumerate() {
for &impl2_def_id in &impls[(i + 1)..] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is how it is and there is no reason to change, but wouldn't cloning the iterator for the inner loop be nicer? There's no need to complicate the iteration when we only need to look ahead, I think.
Am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK for me as someone coming from C/C++, iterator magic is more complicated than doing things with indices.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have a background in C and I hate working with indices when I can avoid them, they are too much a footgun :)

@est31
Copy link
Member Author

est31 commented Oct 24, 2020

Just out of curiousity, how common is it to have 20+ impl blocks in real world code?

Probably quite rare, however note that impl blocks might be distributed across multiple files. IIRC servo has something like this. I might misremember though. There might be no such case in the perf's testsuite at all. I just stumbled onto this code and it felt bad. Anyways, the swap I add above should have an impact for smaller number of impl blocks already. Maybe two perf runs are needed? IDK.

@bugadani
Copy link
Contributor

I guess rustc has a ton of impl TyCtxt (I counter 21 + a macro).

@matthewjasper
Copy link
Contributor

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Oct 24, 2020

⌛ Trying commit c17fe5be24fb07fd8f122c5cfd5cb1726d685f35 with merge 925eba0a75e6e25806ed3a4a654610329bd00254...

@est31
Copy link
Member Author

est31 commented Oct 24, 2020

I guess rustc has a ton of impl TyCtxt (I counter 21 + a macro).

A single location... won't be much of an impact I guess especially considering the fact that for 20 the effect is still low, at least compared to what the rest of the compiler is doing.

Also note that I pulled the 20 out of my nose. It's been chosen on gut feelings alone. If there is an impact, it might be improved. If there's none, maybe it was chosen wrongly.

@bors
Copy link
Contributor

bors commented Oct 24, 2020

☀️ Try build successful - checks-actions
Build commit: 925eba0a75e6e25806ed3a4a654610329bd00254 (925eba0a75e6e25806ed3a4a654610329bd00254)

@rust-timer
Copy link
Collaborator

Queued 925eba0a75e6e25806ed3a4a654610329bd00254 with parent 7bade6e, future comparison URL.

@jonas-schievink
Copy link
Contributor

cc #69009

@est31
Copy link
Member Author

est31 commented Oct 24, 2020

@jonas-schievink nice, someone had the same bright idea as me! I'd attribute the regression in that PR to cases where the brute-force search is better. The most extreme instance of this is probably two impl blocks, a short one followed by a very long one. If the order is right, the current O(n^2) search runs once over one of the short impl block, while looking up into the larger block. The linear algorithm of your PR would put both impl blocks into the hashmap and cause a lot of bureaucracy. In fact, a single long impl block would favour the O(n^2) search even better as.

This gives me lots of ideas to improve the algorithm...

The perf run of your PR shows (next to the regressions) a 2% improvement on syn-opt which is IMO a great achievement. Ten 2% improvements constitute a 19% improvement, 30 such improvements cut the time in half. Very encouraging!

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (925eba0a75e6e25806ed3a4a654610329bd00254): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@est31
Copy link
Member Author

est31 commented Oct 24, 2020

Up to 3.1% improvements for incr-unchanged of packed-simd-check... very nice.

@Mark-Simulacrum
Copy link
Member

I would not look at incremental benchmarks on PRs like this - they're not likely to be super meaningful as an expression of improvement (or regression) because they are skipping parts of the build, and unless we think this optimization permits skipping more (seems unlikely), the percent win there is just larger because the base is smaller.

It looks like this is about a 1% win in practice instruction counts wise, but appears to be a regression on wall times (e.g. rustc middle regressed roughly 2 seconds, it looks like).

I think we are least don't have solid evidence this is a win on the code examples tested by perf. I'm not sure if perhaps switching algorithms when we detect n > 30 or something like that makes sense?

@jonas-schievink
Copy link
Contributor

Maybe test the impact on a crate like https://github.com/stm32-rs/stm32-rs-nightlies/tree/master/stm32f4? I usually use those when working on the overlap checks.

@est31
Copy link
Member Author

est31 commented Oct 24, 2020

Oh right it's red down there... a bit unfortunate :/.

I think I should split up the PR as a next step. It's doing two separate optimizations, better measure them separately.

In any case, the improvement for packed-simd is real, as the crate_inherent_impls_overlap_check query used to take 1.96% of the build, now takes 0.1%.

@Mark-Simulacrum Do you know whether there is a way to show the query times for rustc-middle and other red crates?

@Mark-Simulacrum
Copy link
Member

We don't currently record self profile data for rustc compilation.

@est31
Copy link
Member Author

est31 commented Oct 24, 2020

FYI if you're wondering, NodeRef of the alloc crate has 26 inherent impl blocks.

@est31
Copy link
Member Author

est31 commented Oct 24, 2020

Maybe test the impact on a crate like https://github.com/stm32-rs/stm32-rs-nightlies/tree/master/stm32f4? I usually use those when working on the overlap checks.

Wow that crate has approximately a thousand inherent impl blocks per device that you can enable, and in total there are 11 features... If you enable them all you get 14043 impl blocks...

So it's definitely a good real case stress test for overlap checking. It also has a ton of name overlap, making the namespace checks very relevant. I think it's a good benchmark, thanks for suggesting it!

@jyn514 jyn514 added I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 24, 2020
@est31
Copy link
Member Author

est31 commented Oct 25, 2020

Note that the title of this PR as well as of #69009 can be misunderstood, as the entire impl overlap check in the worst case still runs in O(n^2). The worst case happens in the instance of stm32f4 where you have a) tons of impl blocks on the same generic struct but b) parameterized with different types so that they don't overlap. However, c) the impl blocks all contain the same function names, so the early check (that both my and @jonas-schievink 's PRs turn into linear time) registers an overlap, and gives the pair of impl blocks to the code that checks for a type wise overlap of the impl blocks. That function takes two impl blocks and its code is a bit more involved than the heuristics in the file changed by our PRs...

That being said, the earlier code has been changed from quadratic to linear time, and often there is no overlap. There's probably some way to speed up things for stm32f4 in particular by adding more heuristics to keep time down, but there will likely always be ways to construct quadratic time instances... need to look more into things...

@est31 est31 marked this pull request as ready for review December 7, 2020 01:49
@est31
Copy link
Member Author

est31 commented Dec 7, 2020

The PR is ready for review!

I have done some smaller fixes that made it a bit slower than the version I ran the benchmarks on, e.g. I added sorting to make the output not depend on hash set iterator ordering. I don't want to run all benchmarks yet again (as they are currently quite manual), but this is a table about a subset of the benchmarks:

features components cargo +local-bare check cargo +local check
stm32f401,stm32f405 2335,968 30.915s 30.495s
stm32f401,stm32f405,stm32f407,stm32f410,
stm32f411
5424,2182 1m33.891s 1m28.858s
stm32f401,stm32f405,stm32f407,stm32f410,
stm32f411,stm32f412,stm32f413,stm32f427
9404,3765 3m31.394s 3m18.193s

Also updated the table of the PR description.

r? @matthewjasper

@est31
Copy link
Member Author

est31 commented Dec 7, 2020

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 7, 2020
@est31
Copy link
Member Author

est31 commented Dec 17, 2020

@matthewjasper friendly ping!

@jyn514
Copy link
Member

jyn514 commented Dec 18, 2020

Just out of curiousity, how common is it to have 20+ impl blocks in real world code?

I would expect this to be the main reason rustc is slow on diesel and stm32: rust-lang/rustc-perf#807 (comment)

@est31
Copy link
Member Author

est31 commented Dec 18, 2020

@jyn514 this PR addresses the crate_inherent_impls_overlap_check pass. In your linked comment I couldn't see that pass appear at all, so I assume that it won't improve diesel's compile time.

@matthewjasper
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Dec 20, 2020

📌 Commit 73a7d93 has been approved by matthewjasper

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 20, 2020
@bors
Copy link
Contributor

bors commented Dec 20, 2020

⌛ Testing commit 73a7d93 with merge 3ce7ca6d86c4fd9982a4f2effc438bbfcb203ab2...

@rust-log-analyzer
Copy link
Collaborator

The job dist-s390x-linux failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
[TIMING] StdLink { compiler: Compiler { stage: 0, host: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } }, target_compiler: Compiler { stage: 0, host: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } }, target: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } } -- 0.000
[TIMING] Std { target: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None }, compiler: Compiler { stage: 0, host: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } } } -- 88.479
Building LLVM for x86_64-unknown-linux-gnu
running: "cmake" "/checkout/src/llvm-project/llvm" "-G" "Ninja" "-DLLVM_ENABLE_ASSERTIONS=OFF" "-DLLVM_TARGETS_TO_BUILD=AArch64;ARM;Hexagon;MSP430;Mips;NVPTX;PowerPC;RISCV;Sparc;SystemZ;WebAssembly;X86" "-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=AVR" "-DLLVM_INCLUDE_EXAMPLES=OFF" "-DLLVM_INCLUDE_TESTS=OFF" "-DLLVM_INCLUDE_DOCS=OFF" "-DLLVM_INCLUDE_BENCHMARKS=OFF" "-DLLVM_ENABLE_TERMINFO=OFF" "-DLLVM_ENABLE_LIBEDIT=OFF" "-DLLVM_ENABLE_BINDINGS=OFF" "-DLLVM_ENABLE_Z3_SOLVER=OFF" "-DLLVM_PARALLEL_COMPILE_JOBS=16" "-DLLVM_TARGET_ARCH=x86_64" "-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-linux-gnu" "-DLLVM_ENABLE_ZLIB=ON" "-DLLVM_ENABLE_LIBXML2=OFF" "-DLLVM_VERSION_SUFFIX=-rust-1.50.0-nightly" "-DCMAKE_INSTALL_MESSAGE=LAZY" "-DCMAKE_C_COMPILER_LAUNCHER=sccache" "-DCMAKE_CXX_COMPILER_LAUNCHER=sccache" "-DCMAKE_C_COMPILER=cc" "-DCMAKE_CXX_COMPILER=c++" "-DCMAKE_ASM_COMPILER=cc" "-DCMAKE_C_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_CXX_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64 -static-libstdc++" "-DCMAKE_INSTALL_PREFIX=/checkout/obj/build/x86_64-unknown-linux-gnu/llvm" "-DCMAKE_ASM_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_BUILD_TYPE=Release"
CMake Error: The source directory "/checkout/src/llvm-project/llvm" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.
command did not execute successfully, got: exit code: 1


build script failed, must exit now', /cargo/registry/src/github.com-1ecc6299db9ec823/cmake-0.1.44/src/lib.rs:885:5
 finished in 0.011 seconds
failed to run: /checkout/obj/build/bootstrap/debug/bootstrap dist --host s390x-unknown-linux-gnu --target s390x-unknown-linux-gnu
Build completed unsuccessfully in 0:01:29

@bors
Copy link
Contributor

bors commented Dec 20, 2020

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Dec 20, 2020
@jyn514
Copy link
Member

jyn514 commented Dec 20, 2020

Building LLVM for x86_64-unknown-linux-gnu
running: "cmake" "/checkout/src/llvm-project/llvm" "-G" "Ninja" "-DLLVM_ENABLE_ASSERTIONS=OFF" "-DLLVM_TARGETS_TO_BUILD=AArch64;ARM;Hexagon;MSP430;Mips;NVPTX;PowerPC;RISCV;Sparc;SystemZ;WebAssembly;X86" "-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=AVR" "-DLLVM_INCLUDE_EXAMPLES=OFF" "-DLLVM_INCLUDE_TESTS=OFF" "-DLLVM_INCLUDE_DOCS=OFF" "-DLLVM_INCLUDE_BENCHMARKS=OFF" "-DLLVM_ENABLE_TERMINFO=OFF" "-DLLVM_ENABLE_LIBEDIT=OFF" "-DLLVM_ENABLE_BINDINGS=OFF" "-DLLVM_ENABLE_Z3_SOLVER=OFF" "-DLLVM_PARALLEL_COMPILE_JOBS=16" "-DLLVM_TARGET_ARCH=x86_64" "-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-linux-gnu" "-DLLVM_ENABLE_ZLIB=ON" "-DLLVM_ENABLE_LIBXML2=OFF" "-DLLVM_VERSION_SUFFIX=-rust-1.50.0-nightly" "-DCMAKE_INSTALL_MESSAGE=LAZY" "-DCMAKE_C_COMPILER_LAUNCHER=sccache" "-DCMAKE_CXX_COMPILER_LAUNCHER=sccache" "-DCMAKE_C_COMPILER=cc" "-DCMAKE_CXX_COMPILER=c++" "-DCMAKE_ASM_COMPILER=cc" "-DCMAKE_C_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_CXX_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64 -static-libstdc++" "-DCMAKE_INSTALL_PREFIX=/checkout/obj/build/x86_64-unknown-linux-gnu/llvm" "-DCMAKE_ASM_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_BUILD_TYPE=Release"
CMake Error: The source directory "/checkout/src/llvm-project/llvm" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.
command did not execute successfully, got: exit code: 1

cc @rust-lang/infra, I've seen this on a few different PRs now. It seems to be spurious, retrying will usually fix it.

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 20, 2020
@bors
Copy link
Contributor

bors commented Dec 20, 2020

⌛ Testing commit 73a7d93 with merge c609b2e...

@bors
Copy link
Contributor

bors commented Dec 20, 2020

☀️ Test successful - checks-actions
Approved by: matthewjasper
Pushing c609b2e to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 20, 2020
@bors bors merged commit c609b2e into rust-lang:master Dec 20, 2020
@rustbot rustbot added this to the 1.50.0 milestone Dec 20, 2020
@pietroalbini
Copy link
Member

I've seen this on a few different PRs now. It seems to be spurious, retrying will usually fix it.

That seems to be a spurious network failure while downloading LLVM:

2020-12-20T19:34:57.8161875Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:34:57.8199484Z + [[ 1 -lt 5 ]]
2020-12-20T19:34:57.8199934Z + sleep 1
2020-12-20T19:34:58.8224385Z + (( n++ ))
2020-12-20T19:34:58.8225299Z Command failed. Attempt 2/5:
2020-12-20T19:34:58.8227627Z + echo 'Command failed. Attempt 2/5:'
2020-12-20T19:34:58.8228520Z + true
2020-12-20T19:34:58.8232046Z + sh -c 'rm -f download-src-llvm-project.tar.gz &&         curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:03.1438299Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:03.1464331Z + [[ 2 -lt 5 ]]
2020-12-20T19:35:03.1464857Z + sleep 2
2020-12-20T19:35:05.1479280Z + (( n++ ))
2020-12-20T19:35:05.1481078Z + echo 'Command failed. Attempt 3/5:'
2020-12-20T19:35:05.1481971Z + true
2020-12-20T19:35:05.1482774Z Command failed. Attempt 3/5:
2020-12-20T19:35:05.1489474Z + sh -c 'rm -f download-src-llvm-project.tar.gz &&         curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:14.8627824Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:14.8652110Z + [[ 3 -lt 5 ]]
2020-12-20T19:35:14.8652926Z + sleep 3
2020-12-20T19:35:17.8666398Z + (( n++ ))
2020-12-20T19:35:17.8666991Z Command failed. Attempt 4/5:
2020-12-20T19:35:17.8668227Z + echo 'Command failed. Attempt 4/5:'
2020-12-20T19:35:17.8668735Z + true
2020-12-20T19:35:17.8670734Z + sh -c 'rm -f download-src-llvm-project.tar.gz &&         curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:21.0742337Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:21.0771810Z + [[ 4 -lt 5 ]]
2020-12-20T19:35:21.0772636Z + sleep 4
2020-12-20T19:35:25.0784632Z + (( n++ ))
2020-12-20T19:35:25.0785167Z Command failed. Attempt 5/5:
2020-12-20T19:35:25.0786668Z + echo 'Command failed. Attempt 5/5:'
2020-12-20T19:35:25.0787170Z + true
2020-12-20T19:35:25.0789147Z + sh -c 'rm -f download-src-llvm-project.tar.gz &&         curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:31.2565957Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:31.2597156Z + [[ 5 -lt 5 ]]
2020-12-20T19:35:31.2598171Z The command has failed after 5 attempts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-compiletime Issue: Problems and improvements with respect to compile times. merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet