Make `RawVec::grow` mostly non-generic. #72013

nnethercote · 2020-05-08T13:09:09Z

cargo-llvm-lines shows that, in various benchmarks, RawVec::grow is
instantiated tens or hundreds of times and accounts for 1-8% of the
lines of generated LLVM IR.

This commit moves most of RawVec::grow into a separate function that
isn't parameterized by T, which means it doesn't need to be
instantiated many times. This reduces compile time significantly.
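The general shape of the change can be sketched with a toy type. In this sketch the only code that depends on `T` is the `Layout` computation in a thin generic method, and the actual allocate/reallocate work lives in a function with no type parameter, so it is compiled once. The names `ToyRawVec`, `grow_to` and `finish_grow` are hypothetical illustrations, not the PR's exact code, and the sketch assumes `T` is not zero-sized.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, realloc, Layout};
use std::ptr::NonNull;

/// Hypothetical toy stand-in for `RawVec<T>` (no allocator parameter).
struct ToyRawVec<T> {
    ptr: NonNull<T>,
    cap: usize,
}

/// Non-generic slow path: one copy is compiled no matter how many `T`s use it.
/// Assumes `new_layout` has non-zero size and the same alignment as any old layout.
#[inline(never)]
fn finish_grow(new_layout: Layout, current: Option<(NonNull<u8>, Layout)>) -> NonNull<u8> {
    let ptr = unsafe {
        match current {
            Some((old_ptr, old_layout)) => realloc(old_ptr.as_ptr(), old_layout, new_layout.size()),
            None => alloc(new_layout),
        }
    };
    NonNull::new(ptr).unwrap_or_else(|| handle_alloc_error(new_layout))
}

impl<T> ToyRawVec<T> {
    fn new() -> Self {
        // Zero-sized types would need special handling; keep the sketch short.
        assert!(std::mem::size_of::<T>() != 0);
        ToyRawVec { ptr: NonNull::dangling(), cap: 0 }
    }

    /// The currently allocated block, if any, as raw bytes plus its layout.
    fn current_memory(&self) -> Option<(NonNull<u8>, Layout)> {
        if self.cap == 0 {
            None
        } else {
            let layout = Layout::array::<T>(self.cap).unwrap();
            Some((self.ptr.cast(), layout))
        }
    }

    /// Thin generic wrapper: only the `T`-dependent layout math is here.
    fn grow_to(&mut self, new_cap: usize) {
        assert!(new_cap > self.cap);
        let new_layout = Layout::array::<T>(new_cap).expect("capacity overflow");
        let ptr = finish_grow(new_layout, self.current_memory());
        self.ptr = ptr.cast();
        self.cap = new_cap;
    }
}

impl<T> Drop for ToyRawVec<T> {
    fn drop(&mut self) {
        if let Some((ptr, layout)) = self.current_memory() {
            unsafe { dealloc(ptr.as_ptr(), layout) }
        }
    }
}
```

Each new `T` adds only a small `grow_to` instantiation; the comparatively large allocation logic is shared.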

r? @ghost

nnethercote · 2020-05-08T13:19:45Z

Here is some sample cargo-llvm-lines output for the webrender benchmark, showing just the lines relevant to RawVec::grow:

  Lines        Copies         Function name
  -----        ------         -------------
1012499 (100%)  26143 (100%)  (TOTAL)
  56700 (5.6%)    108 (0.4%)  alloc::raw_vec::RawVec<T,A>::grow
   9864 (1.0%)    137 (0.5%)  core::alloc::layout::Layout::array
   8208 (0.8%)    432 (1.7%)  alloc::raw_vec::RawVec<T,A>::grow::{{closure}}
   7128 (0.7%)    108 (0.4%)  alloc::raw_vec::RawVec<T,A>::current_memory

The current patch gets rid of most of that.
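Dividing the Lines column by the Copies column gives the average LLVM IR cost of a single instantiation. The small sketch below does that arithmetic for the four webrender rows above; the `Row` struct and function names are just for illustration.

```rust
/// One row of `cargo llvm-lines` output.
struct Row {
    name: &'static str,
    lines: u64,  // total LLVM IR lines attributed to this function
    copies: u64, // number of monomorphized instantiations
}

/// Average LLVM IR lines per instantiation (integer division).
fn lines_per_copy(row: &Row) -> u64 {
    row.lines / row.copies
}

/// Rows copied from the webrender table above.
fn webrender_rows() -> [Row; 4] {
    [
        Row { name: "alloc::raw_vec::RawVec<T,A>::grow", lines: 56700, copies: 108 },
        Row { name: "core::alloc::layout::Layout::array", lines: 9864, copies: 137 },
        Row { name: "alloc::raw_vec::RawVec<T,A>::grow::{{closure}}", lines: 8208, copies: 432 },
        Row { name: "alloc::raw_vec::RawVec<T,A>::current_memory", lines: 7128, copies: 108 },
    ]
}
```

All four divisions are exact: 525, 72, 19 and 66 lines per copy respectively, so each `RawVec::grow` instantiation alone is roughly 525 lines of IR.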

My local perf results are mostly good, with instruction-count reductions of up to 9%, but there are a few small regressions. I'm not quite sure where the regressions are coming from; I will investigate that more on Monday.

The code is in draft form, and needs cleaning up before being properly reviewed. @Amanieu may be interested.

nnethercote · 2020-05-08T13:20:16Z

@bors try @rust-timer queue

rust-timer · 2020-05-08T13:20:17Z

Awaiting bors try build completion

bors · 2020-05-08T13:20:27Z

⌛ Trying commit 6d2926b6381d178d7d59b6112949c3c6a2e96568 with merge d4d11c4e38c5b4fe42c2d2c10124bc45ba1fbcc8...

src/liballoc/raw_vec.rs

Mark-Simulacrum · 2020-05-08T13:36:12Z

cc @davidtwco @nikomatsakis -- this seems relevant for the polymorphization efforts, shows at least one potential big win

nikomatsakis · 2020-05-08T13:43:17Z

Yes, it does!

bors · 2020-05-08T17:35:27Z

💥 Test timed out

nnethercote · 2020-05-08T22:15:15Z

@bors try

bors · 2020-05-08T22:15:25Z

⌛ Trying commit 6d2926b6381d178d7d59b6112949c3c6a2e96568 with merge cee9586dc574582deea517fa2d0eaeeb882167e3...

bors · 2020-05-09T01:10:50Z

☀️ Try build successful - checks-actions, checks-azure
Build commit: cee9586dc574582deea517fa2d0eaeeb882167e3 (cee9586dc574582deea517fa2d0eaeeb882167e3)

rust-timer · 2020-05-09T01:10:51Z

Queued cee9586dc574582deea517fa2d0eaeeb882167e3 with parent 7b80539, future comparison URL.

rust-timer · 2020-05-09T10:31:35Z

Finished benchmarking try commit cee9586dc574582deea517fa2d0eaeeb882167e3, comparison URL.

nnethercote · 2020-05-09T22:30:36Z

The perf results are all over the place. They're easier to understand if you focus on two subsets.

debug-full: This shows the potential. Lots of wins, the best being 6.6%. (I saw a best win of 9.1% locally, but that was without the recent wins from LLVM bitcode removal, which would overlap somewhat with these wins.)
check-full: This shows mostly regressions, of up to 1.5%. The current code has presumably slowed RawVec::grow down somewhat.

Hopefully I can fix the slowdowns without too much trouble. I will investigate that tomorrow.

nnethercote · 2020-05-11T02:08:49Z

I think the slowdowns are caused by worse code being generated in some cases due to the Layout computations now being dynamic rather than static. For example, there is a codegen test codegen/vec-iter-collect-len.rs that contains the expression [1, 2, 3].iter().collect::<Vec<_>>().len(). It's supposed to boil down to 3 in the generated LLVM IR, but in at least some of my working versions of this PR I've seen the test fail because the optimization is inhibited and it generates lots of LLVM IR.

I'm taking a slightly different tack now, trying to keep those Layout computations within RawVec::grow so that the computations involving mem::{size,align}_of::<T>() are static.
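A sketch of the static-vs-dynamic distinction, assuming nothing beyond `std::alloc::Layout`: in `layout_static::<T>` the element size and alignment are compile-time constants after monomorphization, whereas `layout_dynamic` receives them as runtime arguments, so the optimizer only sees constants if the call happens to be inlined. Both produce the same `Layout`; the difference is only in what LLVM can fold. The function names are hypothetical.

```rust
use std::alloc::Layout;
use std::mem;

/// Generic: size/align come from `T`, so they are constants in each instantiation.
fn layout_static<T>(cap: usize) -> Option<Layout> {
    Layout::array::<T>(cap).ok()
}

/// Non-generic: size/align are ordinary runtime values.
fn layout_dynamic(elem_size: usize, align: usize, cap: usize) -> Option<Layout> {
    let size = elem_size.checked_mul(cap)?; // overflow -> None
    Layout::from_size_align(size, align).ok()
}

/// What a generic caller of the non-generic helper would pass in.
fn layout_via_dynamic<T>(cap: usize) -> Option<Layout> {
    layout_dynamic(mem::size_of::<T>(), mem::align_of::<T>(), cap)
}
```

Keeping the `Layout` computation on the generic side (as the reworked approach describes) means the non-generic part receives an already-computed `Layout`, rather than having to redo size/align arithmetic on values it cannot see as constants.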

nnethercote · 2020-05-11T02:24:11Z

BTW, for the attached patches, I've seen reductions in the number of lines of LLVM IR generated as high as 15% (for syn). I was previously undercounting because some additional instantiations outside of alloc are also avoided, esp. Result::map_err.

nnethercote · 2020-05-11T10:19:35Z

I've reworked the code significantly, giving wins that are slightly smaller than before, but avoiding the vast majority of the losses.

There is scope for pushing harder on moving stuff out of RawVec, but when I try it the losses start to come back to some extent. I'm not sure why; possibly once RawVec::grow_* get small enough, inlining decisions start to change.

nnethercote · 2020-05-11T10:19:48Z

@bors try @rust-timer queue

rust-timer · 2020-05-11T10:19:49Z

Awaiting bors try build completion

bors · 2020-05-11T10:19:58Z

⌛ Trying commit 77aa42ca0e3b632fee16bcd3fe23ab33ce5c066b with merge 78ecf2ce2428bc1c359a284c6dc8bc33246879ac...

bors · 2020-05-11T13:29:03Z

☀️ Try build successful - checks-actions, checks-azure
Build commit: 78ecf2ce2428bc1c359a284c6dc8bc33246879ac (78ecf2ce2428bc1c359a284c6dc8bc33246879ac)

rust-timer · 2020-05-11T13:29:05Z

Queued 78ecf2ce2428bc1c359a284c6dc8bc33246879ac with parent aeb4738, future comparison URL.

nnethercote · 2020-05-11T22:32:15Z

Perf results are looking pretty good. Debug builds have some wins of up to 5.7%. Opt builds have a few wins, up to 1.7%. Check builds are mostly very slightly regressed, typically by 0.2%.

I will fiddle with this some more today, see if I can make it any better.


bors · 2020-05-12T10:44:10Z

📌 Commit 68b7503 has been approved by Amanieu

nnethercote · 2020-05-12T10:52:16Z

@bors rollup=never

Because it affects perf.

Dylan-DPC-zz · 2020-05-13T13:01:23Z

@bors p=1

bors · 2020-05-13T14:30:05Z

⌛ Testing commit 68b7503 with merge 75e1463...

bors · 2020-05-13T17:58:45Z

☀️ Test successful - checks-actions, checks-azure
Approved by: Amanieu
Pushing 75e1463 to master...


nnethercote · 2020-05-19T04:39:39Z

The final perf improvements are here.

@Amanieu

Tiny Vecs are dumb.

Currently, if you repeatedly push to an empty vector, the capacity growth sequence is 0, 1, 2, 4, 8, 16, etc. This commit changes the relevant code (the "amortized" growth strategy) to skip 1 and 2, instead using 0, 4, 8, 16, etc. (You can still get a capacity of 1 or 2 using the "exact" growth strategy, e.g. via `reserve_exact()`.)

This idea (along with the phrase "tiny Vecs are dumb") comes from the "doubling" growth strategy that was removed from `RawVec` in rust-lang#72013. That strategy was barely ever used -- only when a `VecDeque` was grown, oddly enough -- which is why it was removed in rust-lang#72013. (Fun fact: until just a few days ago, I thought the "doubling" strategy was used for the repeated push case. In other words, this commit makes `Vec`s behave the way I always thought they behaved.)

This change reduces the number of allocations done by rustc itself by 10% or more. It speeds up rustc, and will also speed up any other Rust program that uses `Vec`s a lot.

In theory, the change could increase memory usage, but in practice it doesn't. It would be an unusual program where very small `Vec`s having a capacity of 4 rather than 1 or 2 would make a difference. You'd need a *lot* of very small `Vec`s, and/or some very small `Vec`s with very large elements.

r? @Amanieu
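A small model of the two amortized growth rules described above, assuming growth happens only when `len == cap` and that the new minimum non-zero capacity is a fixed 4 (the commit message's number; the real standard library may choose a different minimum depending on element size). `old_amortized`, `new_amortized` and `capacity_sequence` are hypothetical names for this sketch.

```rust
use std::cmp::max;

/// Assumed minimum capacity for the first non-zero allocation.
const MIN_NON_ZERO_CAP: usize = 4;

/// Old rule: double the capacity, or use the required capacity if larger.
fn old_amortized(cap: usize, required: usize) -> usize {
    max(cap.saturating_mul(2), required)
}

/// New rule: same as the old one, but never allocate fewer than MIN_NON_ZERO_CAP.
fn new_amortized(cap: usize, required: usize) -> usize {
    max(MIN_NON_ZERO_CAP, old_amortized(cap, required))
}

/// Capacities observed while pushing `n` elements one at a time to an empty buffer.
fn capacity_sequence(n: usize, grow: fn(usize, usize) -> usize) -> Vec<usize> {
    let (mut cap, mut len) = (0, 0);
    let mut seq = vec![cap];
    for _ in 0..n {
        if len == cap {
            cap = grow(cap, len + 1);
            seq.push(cap);
        }
        len += 1;
    }
    seq
}
```

Pushing 16 elements gives `[0, 1, 2, 4, 8, 16]` under the old rule (5 allocations) and `[0, 4, 8, 16]` under the new one (3 allocations).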

tamird · 2020-05-23T16:06:50Z

src/liballoc/collections/vec_deque.rs

-    #[inline]
-    fn grow_if_necessary(&mut self) {
+    #[inline(never)]
+    fn grow(&mut self) {
        if self.is_full() {


this check is duplicated now, isn't it?

LionsAd · 2020-06-24

Agreed. A better change might have been to keep the inline function but have it call a never-inlined grow_always instead. That should lead to the same code generation in the end.
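The suggested split can be sketched on a toy buffer standing in for `VecDeque`: a cheap `#[inline]` check at each call site, and a `#[inline(never)]` slow path that grows unconditionally, so the fullness test is not repeated. `Buffer` and its growth rule are hypothetical; `grow_if_necessary` and `grow_always` reuse the names from the diff and the comment above.

```rust
/// Toy buffer: `buf.len()` plays the role of capacity, `len` the number of used slots.
struct Buffer {
    buf: Vec<u32>,
    len: usize,
}

impl Buffer {
    fn is_full(&self) -> bool {
        self.len == self.buf.len()
    }

    /// Fast path: a single comparison, worth inlining everywhere.
    #[inline]
    fn grow_if_necessary(&mut self) {
        if self.is_full() {
            self.grow_always();
        }
    }

    /// Slow path: kept out of line and does not re-check `is_full`.
    #[inline(never)]
    fn grow_always(&mut self) {
        let new_cap = std::cmp::max(4, self.buf.len() * 2);
        self.buf.resize(new_cap, 0);
    }

    fn push(&mut self, x: u32) {
        self.grow_if_necessary();
        self.buf[self.len] = x;
        self.len += 1;
    }
}
```

Callers get only the inlined comparison; the growth code exists once, out of line.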

SimonSapin · 2020-06-03T09:46:30Z

In servo/servo#26713 (comment), grow_amortized (introduced in this PR) is still near the top of the cargo llvm-lines output for two of Servo’s largest crates. Dividing by the "copies" column shows that each monomorphization takes 165 lines of IR.

I really don’t have a good sense of scale: does that sound like a lot, given the code comment’s “we want it to be as small as possible”?

nnethercote · 2020-06-03T10:13:13Z

I have tried shrinking grow_amortized some more, but it's hard to go much further without hurting runtime performance. I will keep trying, though.

SimonSapin · 2020-06-03T11:14:31Z

I understand it’s not easy, and I’m sure this PR has already improved things. I was wondering how reasonable 165 lines sounds, but maybe the easiest approach would be to look at those lines and see what they do.

Amanieu reviewed May 8, 2020

src/liballoc/raw_vec.rs (outdated, resolved)

bors added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 8, 2020

nnethercote mentioned this pull request May 8, 2020

Add totals and percentages for lines/copies. dtolnay/cargo-llvm-lines#15

Merged

nnethercote force-pushed the make-RawVec-grow-mostly-non-generic branch from 6d2926b to 77aa42c on May 11, 2020 10:16

Remove RawVec::double_in_place.

a3cc435

It's unused.

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 12, 2020

bors added the merged-by-bors This PR was explicitly merged by bors. label May 13, 2020

bors merged commit 75e1463 into rust-lang:master May 13, 2020

bors mentioned this pull request May 13, 2020

Add generic parameter for an allocator to collection types and box-like structures. #71873

Closed

nnethercote deleted the make-RawVec-grow-mostly-non-generic branch May 13, 2020 22:47

nnethercote mentioned this pull request May 15, 2020

Tiny Vecs are dumb. #72227

Merged

nnethercote mentioned this pull request May 15, 2020

Provide a tool to see what user code is causing rustc to use lots of time rust-lang/measureme#51

Open

tamird reviewed May 23, 2020


nnethercote mentioned this pull request May 26, 2020

Build time regression from the Compound type ron-rs/ron#239

Open

panstromek mentioned this pull request Jun 12, 2020

rust-analyzer is slow to compile rust-lang/rust-analyzer#1987

Closed

andjo403 mentioned this pull request Sep 22, 2020

use constants to generate less llvm-ir for raw_vec functions #77068

Closed

nnethercote mentioned this pull request Dec 13, 2021

Make Vec::push as non-generic as possible #91848

Closed
