Speed up `SparseBitMatrix` use in `RegionValues`. #52250

nnethercote · 2018-07-11T10:33:11Z

In practice, these matrices range from 10% to 90%+ full once they are
filled in, so the dense representation is better.
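To make "dense representation" concrete, here is a minimal sketch of a flat bit matrix in the spirit of `BitMatrix`. It uses one allocation, and both dimensions must be known up front. The type name `DenseBitMatrix`, the 128-bit word size, and the method names are assumptions for illustration, not rustc's actual code.

```rust
/// Hypothetical flat dense bit matrix: one contiguous Vec of 128-bit words,
/// `words_per_row` words per row. Both dimensions are fixed at construction.
pub struct DenseBitMatrix {
    columns: usize,
    words_per_row: usize,
    words: Vec<u128>,
}

impl DenseBitMatrix {
    pub fn new(rows: usize, columns: usize) -> Self {
        // Round each row up to a whole number of 128-bit words (assumed word size).
        let words_per_row = (columns + 127) / 128;
        DenseBitMatrix { columns, words_per_row, words: vec![0; rows * words_per_row] }
    }

    /// Returns the word index and bit mask for (row, column).
    fn locate(&self, row: usize, column: usize) -> (usize, u128) {
        assert!(column < self.columns);
        let index = row * self.words_per_row + column / 128;
        (index, 1u128 << (column % 128))
    }

    /// Sets the bit; returns `true` if it was previously clear.
    pub fn add(&mut self, row: usize, column: usize) -> bool {
        let (i, mask) = self.locate(row, column);
        let old = self.words[i];
        self.words[i] = old | mask;
        old & mask == 0
    }

    pub fn contains(&self, row: usize, column: usize) -> bool {
        let (i, mask) = self.locate(row, column);
        self.words[i] & mask != 0
    }

    /// Number of set bits in the whole matrix.
    pub fn count(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }
}
```

Each row costs `ceil(columns / 128)` words no matter how many bits are set. That cost pays off when rows end up 10% to 90% full, but it wastes memory on nearly empty rows.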

This reduces the runtime of NLL check builds of inflate by 32%, and of
several other benchmarks by 1-5%.

It also increases the max-rss of clap-rs by 30% and of a couple of others by
up to 6%, while decreasing the max-rss of coercions by 15%. I think the
speed-ups justify the max-rss increases.

r? @nikomatsakis

nnethercote · 2018-07-11T10:35:28Z

Here are the instruction count improvements exceeding 1%:

| Benchmark | avg | min | max |
|---|---|---|---|
| inflate-check | -32.4% | -32.4% | -32.4% |
| style-servo-check | -4.5% | -4.5% | -4.5% |
| clap-rs-check | -2.3% | -2.3% | -2.3% |
| coercions-check | -2.3% | -2.3% | -2.3% |
| sentry-cli-check | -1.5% | -1.5% | -1.5% |
| webrender-check | -1.5% | -1.5% | -1.5% |
| cargo-check | -1.5% | -1.5% | -1.5% |
| encoding-check | -1.5% | -1.5% | -1.5% |
| ripgrep-check | -1.3% | -1.3% | -1.3% |
| regex-check | -1.2% | -1.2% | -1.2% |

Here are the max-rss changes exceeding 1%:

| Benchmark | avg | min | max |
|---|---|---|---|
| clap-rs-check | 29.5% | 29.5% | 29.5% |
| coercions-check | -14.8% | -14.8% | -14.8% |
| inflate-check | 5.7% | 5.7% | 5.7% |
| regression-31157-check | 1.6% | 1.6% | 1.6% |
| syn-check | 1.0% | 1.0% | 1.0% |
| helloworld-check | -1.0% | -1.0% | -1.0% |
| regex-check | 1.0% | 1.0% | 1.0% |

nnethercote · 2018-07-11T10:39:38Z

Here are some measurements of how full the BitMatrix instances get for inflate:

(  1)       24 (46.2%, 46.2%): after: 2 x 4 = 8; 6 used (75%)
(  2)        4 ( 7.7%, 53.8%): after: 2 x 7 = 14; 12 used (85.71%)
(  3)        2 ( 3.8%, 57.7%): after: 4 x 5 = 20; 12 used (60%)
(  4)        1 ( 1.9%, 59.6%): after: 120 x 835 = 100200; 22099 used (22.05%)
(  5)        1 ( 1.9%, 61.5%): after: 16 x 29 = 464; 250 used (53.87%)
(  6)        1 ( 1.9%, 63.5%): after: 2 x 37 = 74; 72 used (97.29%)
(  7)        1 ( 1.9%, 65.4%): after: 87 x 463 = 40281; 4158 used (10.32%)
(  8)        1 ( 1.9%, 67.3%): after: 4 x 8 = 32; 24 used (75%)
(  9)        1 ( 1.9%, 69.2%): after: 40 x 52 = 2080; 865 used (41.58%)
( 10)        1 ( 1.9%, 71.2%): after: 42 x 454 = 19068; 1902 used (9.97%)
( 11)        1 ( 1.9%, 73.1%): after: 18 x 51 = 918; 522 used (56.86%)
( 12)        1 ( 1.9%, 75.0%): after: 12 x 58 = 696; 503 used (72.27%)
( 13)        1 ( 1.9%, 76.9%): after: 132 x 506 = 66792; 15589 used (23.33%)
( 14)        1 ( 1.9%, 78.8%): after: 18 x 9 = 162; 96 used (59.25%)
( 15)        1 ( 1.9%, 80.8%): after: 12 x 38 = 456; 319 used (69.95%)
( 16)        1 ( 1.9%, 82.7%): after: 2 x 15 = 30; 28 used (93.33%)
( 17)        1 ( 1.9%, 84.6%): after: 4912 x 40782 = 200321184; 39886050 used (19.91%)
( 18)        1 ( 1.9%, 86.5%): after: 2 x 23 = 46; 44 used (95.65%)
( 19)        1 ( 1.9%, 88.5%): after: 101 x 501 = 50601; 5036 used (9.95%)
( 20)        1 ( 1.9%, 90.4%): after: 17 x 133 = 2261; 1063 used (47.01%)
( 21)        1 ( 1.9%, 92.3%): after: 24 x 172 = 4128; 450 used (10.90%)
( 22)        1 ( 1.9%, 94.2%): after: 52 x 202 = 10504; 5022 used (47.81%)
( 23)        1 ( 1.9%, 96.2%): after: 81 x 338 = 27378; 14638 used (53.46%)
( 24)        1 ( 1.9%, 98.1%): after: 18 x 90 = 1620; 1052 used (64.93%)
( 25)        1 ( 1.9%,100.0%): after: 11 x 15 = 165; 130 used (78.78%)

Note the very large matrix at (17), which dominates.

style-servo is broadly similar, though it has a number of larger ones, instead of being dominated by a single large one.
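The "used" percentages above are simply set bits divided by rows × columns. Below is a small helper that reproduces that arithmetic, checked against entry (17). The struct and method names are hypothetical.

```rust
/// One line of the measurement above: a `rows x columns` matrix with `used` bits set.
/// (Hypothetical helper type, not part of rustc.)
pub struct MatrixStats {
    pub rows: u64,
    pub columns: u64,
    pub used: u64,
}

impl MatrixStats {
    /// Total number of cells (bits) in the matrix.
    pub fn cells(&self) -> u64 {
        self.rows * self.columns
    }

    /// Percentage of cells that are set.
    pub fn fill_percent(&self) -> f64 {
        100.0 * self.used as f64 / self.cells() as f64
    }
}
```

For entry (17), `4912 x 40782` gives 200321184 cells, and 39886050 of them set is about 19.91%.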

nnethercote · 2018-07-11T12:50:06Z

I just measured html5ever with NLL as well. It reduces its instruction count by 35%, and its max-rss by 10%.

nnethercote · 2018-07-11T12:58:30Z

The max-rss increase for clap-rs is because of one very large BitMatrix:

  after: 25897 x 24965 = 646518605; 94492819 used (14.61%)

This matrix takes 77.5 MiB, and that cost is doubled because it is cloned here:

`rust/src/librustc_mir/borrow_check/nll/region_infer/mod.rs`, line 421 in a178cba:

`let mut inferred_values = self.liveness_constraints.clone();`
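As a rough check of the 77.5 MiB figure, consider how a dense matrix is sized. It costs `rows × ceil(columns / word_bits)` words whether or not any bits are set. The sketch below makes the word size a parameter. Using 128-bit words is an assumption here about the `BitVector` of the time. Under that assumption the clap-rs matrix comes to about 77.45 MiB, and the clone doubles it.

```rust
/// Bytes needed for a dense `rows x columns` bit matrix whose rows are each
/// padded to a whole number of `word_bits`-bit words (word size is an assumption).
pub fn dense_matrix_bytes(rows: u64, columns: u64, word_bits: u64) -> u64 {
    let words_per_row = (columns + word_bits - 1) / word_bits;
    rows * words_per_row * (word_bits / 8)
}

/// Converts bytes to mebibytes (2^20 bytes).
pub fn to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 20) as f64
}
```

With 64-bit words the same matrix would be about 77.25 MiB. Either way, the clone adds roughly another 77 MiB to peak memory.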

I tried getting rid of that clone, which would greatly reduce the max-rss increase, by transferring ownership of the BitMatrix from `self.liveness_constraints` to `self.inferred_values` (which required making `liveness_constraints` an `Option<RegionValues>`). But that caused test failures: it looks like `liveness_constraints` is used to produce error messages after `inferred_values` is created.
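The attempted fix amounts to the `Option::take` pattern: move the matrix out rather than cloning it. The sketch below uses hypothetical stand-in types, not rustc's real `RegionValues` or inference context. It shows both the saving and why the change broke things: anything that reads `liveness_constraints` afterwards finds `None`.

```rust
/// Hypothetical stand-in for `RegionValues`; only the ownership pattern matters here.
#[derive(Debug, PartialEq)]
pub struct RegionValues {
    pub bits: Vec<u128>,
}

/// Hypothetical stand-in for the region inference context.
pub struct RegionInferenceContext {
    /// `Some` until inference takes ownership of the matrix.
    pub liveness_constraints: Option<RegionValues>,
    pub inferred_values: Option<RegionValues>,
}

impl RegionInferenceContext {
    /// Moves the liveness matrix into `inferred_values` instead of cloning it,
    /// so only one copy is alive at a time.
    pub fn start_inference(&mut self) {
        self.inferred_values = self.liveness_constraints.take();
    }

    /// Later consumers (e.g. error reporting) can no longer see the original values.
    pub fn liveness_for_diagnostics(&self) -> Option<&RegionValues> {
        self.liveness_constraints.as_ref()
    }
}
```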

Anyway, even if the clone remains, some benchmarks take more memory but some take less, so it's basically a wash on that front, and the speed improvements are large enough to make this compelling.

nnethercote · 2018-07-12T02:21:49Z

I was able to speed up inflate and clap-rs a bit more by further optimizing `BitVector::merge`.
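The optimized `BitVector::merge` itself isn't shown here, so the following is only a sketch of the usual shape of a dense union. It ORs each source word into the destination and records whether anything changed, which is what a fixed-point propagation loop needs to know. The function name and the 128-bit word size are assumptions.

```rust
/// Hypothetical dense union: ORs `src` into `dst` word by word and reports
/// whether `dst` changed. Both slices must have the same length (same column count).
pub fn merge_words(dst: &mut [u128], src: &[u128]) -> bool {
    assert_eq!(dst.len(), src.len());
    let mut changed = false;
    for (d, &s) in dst.iter_mut().zip(src) {
        let new = *d | s;
        // Accumulate the flag instead of returning early, so every word is merged.
        changed |= new != *d;
        *d = new;
    }
    changed
}
```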

nikomatsakis · 2018-07-12T06:21:54Z

Hmm, this change will interact poorly with @davidtwco's changes in #52190, because I think that in that context we don't know the number of region variables when we allocate the RegionValues.

We might want a kind of hybrid -- maybe we want to modify DenseMatrix to use an IndexVec<BitSet> instead of one big allocation?

(The family of bitset types also needs a bit of cleanup... this change though might allow us to remove the "buf vs slice" distinction which would simplify things.)

nikomatsakis · 2018-07-12T07:16:23Z

@nnethercote note that the final values will probably also be affected by rebasing over #51987, which sort of modifies that clone. (The clone is removed, but a variant of it remains.)

We could probably free the liveness matrix at some point, though it wouldn't affect peak memory usage. It would potentially require a bit of work on the diagnostic side.

nikomatsakis · 2018-07-12T07:37:04Z

Hmm, #51987 also reduces inflate-check's running time dramatically (by 43%). I would not, however, expect these two to "multiply"; rather, I suspect the benefits of this PR may be subsumed by #51987, since it dramatically reduces the number of sparse matrix merges that we do.

nikomatsakis · 2018-07-12T07:40:42Z

Probably worth testing, in any case.

nnethercote · 2018-07-12T08:57:04Z

Yes, the benefit here is entirely from making matrix merges faster.

I guess I'll wait until #51987 and #52190 play out and see if this PR still makes sense. This PR has a large effect for a small change; hopefully those two other PRs have an effect as big or bigger.

nnethercote · 2018-07-13T00:52:42Z

I just got "try" privileges, so I'm going to test them in this PR.

@bors try

bors · 2018-07-13T00:52:43Z

@nnethercote: 🔑 Insufficient privileges: not in try users

bors · 2018-07-13T15:49:56Z

☔ The latest upstream changes (presumably #51987) made this pull request unmergeable. Please resolve the merge conflicts.

nikomatsakis · 2018-07-17T18:59:17Z

OK, so #51987 has landed -- @nnethercote do you have thoughts on whether it makes sense to continue with this PR?

nikomatsakis · 2018-07-18T18:43:56Z

Hmm, those results look great! One concern though: in the branch I'm working on, I'm growing the number of elements in the matrix on the fly, which I guess wouldn't be compatible with this change. I'm thinking about how to solve this -- one way might be to split up the matrices into pieces. So for example we could store one matrix for points and then a separate matrix for regions.

nnethercote · 2018-07-18T21:45:15Z

> I'm growing the number of elements in the matrix on the fly, which I guess wouldn't be compatible with this change.

If the number of rows is growing, it should be fine as is. If the number of columns is growing, then that's different... it should be possible to make the number of columns in SparseBitMatrix extensible, though the added flexibility will likely shrink the size of the wins here.
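When each row has its own word vector, growing the column count means appending zeroed words to every row. The cost is a pass over all rows plus any reallocation that triggers. That is presumably the kind of overhead that could shrink the wins. Below is a hypothetical sketch with illustrative type and method names and an assumed 128-bit word.

```rust
/// Assumed word size for this sketch.
const WORD_BITS: usize = 128;

/// Hypothetical matrix whose rows are independent word vectors, so the
/// column count can grow after construction.
pub struct GrowableBitMatrix {
    columns: usize,
    rows: Vec<Vec<u128>>,
}

fn words_for(columns: usize) -> usize {
    (columns + WORD_BITS - 1) / WORD_BITS
}

impl GrowableBitMatrix {
    pub fn new(rows: usize, columns: usize) -> Self {
        GrowableBitMatrix { columns, rows: vec![vec![0; words_for(columns)]; rows] }
    }

    /// Extends every row with zeroed words; existing bits keep their positions.
    pub fn grow_columns(&mut self, new_columns: usize) {
        assert!(new_columns >= self.columns);
        let words = words_for(new_columns);
        for row in &mut self.rows {
            row.resize(words, 0);
        }
        self.columns = new_columns;
    }

    /// Sets the bit; returns `true` if it was previously clear.
    pub fn add(&mut self, row: usize, column: usize) -> bool {
        assert!(column < self.columns);
        let word = &mut self.rows[row][column / WORD_BITS];
        let mask = 1u128 << (column % WORD_BITS);
        let was_clear = *word & mask == 0;
        *word |= mask;
        was_clear
    }

    pub fn contains(&self, row: usize, column: usize) -> bool {
        column < self.columns
            && self.rows[row][column / WORD_BITS] & (1u128 << (column % WORD_BITS)) != 0
    }
}
```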

nikomatsakis · 2018-07-19T19:04:13Z

It's the number of columns that changes, yes, but I suspect I may be able to finesse it by breaking things into two matrices -- one for "points" and one for "regions" -- and allocating them at separate times. (In other words, we'd wait to allocate the region matrix until we know its proper size.)

To that end, I think we should probably land this PR, and I can try to rebase over it.

nikomatsakis · 2018-07-19T20:57:08Z

@bors r+

bors · 2018-07-19T20:57:09Z

📌 Commit 9bfd1c17620a88e8d24f3bcd7710976522c202ee has been approved by nikomatsakis

bors · 2018-07-20T01:42:44Z

⌛ Testing commit 9bfd1c17620a88e8d24f3bcd7710976522c202ee with merge 4ebbaa1809e39d08dfc02e86b3c59612e61a7eed...

bors · 2018-07-20T02:52:14Z

💔 Test failed - status-appveyor

Using a `BTreeMap` to represent rows in the bit matrix is really slow. This patch changes things so that each row is represented by a `BitVector`. This is a less sparse representation, but a much faster one. As a result, `SparseBitSet` and `SparseChunk` can be removed.

Other minor changes in this patch:

- It renames `BitVector::insert()` as `merge()`, which matches the terminology in the other classes in bitvec.rs.
- It removes `SparseBitMatrix::is_subset()`, which is unused.
- It reinstates `RegionValueElements::num_elements()`, which rust-lang#52190 had removed.
- It removes a low-value `debug!` call in `SparseBitMatrix::add()`.
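Below is a minimal sketch of the row-per-bit-vector layout described in the commit message, assuming 128-bit words. `RowBitMatrix`, `ensure_row`, and the exact signatures are illustrative and not taken from bitvec.rs. Rows are allocated lazily, so the number of rows need not be known up front, but each allocated row is dense.

```rust
/// Assumed word size for this sketch.
const WORD_BITS: usize = 128;

/// Hypothetical sketch: each row is a dense bit vector (`Vec<u128>`) instead of a
/// `BTreeMap` of chunks. A row's bits are only allocated when the row is first written.
pub struct RowBitMatrix {
    columns: usize,
    rows: Vec<Option<Vec<u128>>>,
}

impl RowBitMatrix {
    pub fn new(columns: usize) -> Self {
        RowBitMatrix { columns, rows: Vec::new() }
    }

    /// Grows the row table if needed and allocates this row's bits on first use.
    fn ensure_row(&mut self, row: usize) -> &mut Vec<u128> {
        if self.rows.len() <= row {
            self.rows.resize(row + 1, None);
        }
        let words = (self.columns + WORD_BITS - 1) / WORD_BITS;
        self.rows[row].get_or_insert_with(|| vec![0; words])
    }

    /// Sets bit `column` in `row`; returns `true` if it was newly set.
    pub fn add(&mut self, row: usize, column: usize) -> bool {
        assert!(column < self.columns);
        let mask = 1u128 << (column % WORD_BITS);
        let word = &mut self.ensure_row(row)[column / WORD_BITS];
        let was_clear = *word & mask == 0;
        *word |= mask;
        was_clear
    }

    pub fn contains(&self, row: usize, column: usize) -> bool {
        match self.rows.get(row) {
            Some(Some(bits)) => bits[column / WORD_BITS] & (1u128 << (column % WORD_BITS)) != 0,
            _ => false, // An unallocated row is empty.
        }
    }

    /// Unions row `read` into row `write`; returns `true` if `write` changed.
    pub fn merge(&mut self, read: usize, write: usize) -> bool {
        if read == write {
            return false; // Merging a row into itself never changes it.
        }
        // Temporarily move the source row out so `write` can be borrowed mutably.
        let src = match self.rows.get_mut(read) {
            Some(Some(bits)) => std::mem::take(bits),
            _ => return false, // An unallocated row is empty.
        };
        let dst = self.ensure_row(write);
        let mut changed = false;
        for (d, s) in dst.iter_mut().zip(&src) {
            let new = *d | *s;
            changed |= new != *d;
            *d = new;
        }
        self.rows[read] = Some(src); // Put the source row back.
        changed
    }
}
```

Row merges are where the benchmark wins come from. With dense rows, `merge` is a straight word-wise OR, presumably much cheaper than walking B-tree entries.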

nnethercote · 2018-07-20T06:58:31Z

@bors retry

nnethercote · 2018-07-20T22:14:31Z

Apparently I have "try" permissions but not "retry" permissions. @nikomatsakis, can you reapprove this? Thanks.

kennytm · 2018-07-20T22:17:51Z

@nnethercote retry works only if you haven't pushed anything new. Pushing a new commit does require re-r+.

nikomatsakis · 2018-07-21T11:31:46Z

@bors delegate=nnethercote

bors · 2018-07-21T11:31:46Z

✌️ @nnethercote can now approve this pull request

nikomatsakis · 2018-07-21T11:31:49Z

@bors r+

bors · 2018-07-21T11:31:50Z

📌 Commit 798209e has been approved by nikomatsakis

bors · 2018-07-22T02:44:04Z

⌛ Testing commit 798209e with merge a57d5d7...

bors · 2018-07-22T04:46:42Z

☀️ Test successful - status-appveyor, status-travis
Approved by: nikomatsakis
Pushing a57d5d7 to master...

rust-highfive assigned nikomatsakis Jul 11, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 11, 2018

nnethercote force-pushed the no-SparseBitMatrix branch from 6556f64 to 63abbed July 11, 2018 12:51

nnethercote mentioned this pull request Jul 11, 2018

html5ever in the rustc-perf repository is memory-intensive #52028

Closed

nnethercote force-pushed the no-SparseBitMatrix branch from 63abbed to 7786a74 July 11, 2018 23:01

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 13, 2018

kennytm added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 13, 2018

kennytm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 14, 2018

nnethercote changed the title ~~Use BitMatrix instead of SparseBitMatrix in RegionValues.~~ Speed up SparseBitMatrix use in RegionValues. Jul 18, 2018

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 19, 2018

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jul 20, 2018

nnethercote force-pushed the no-SparseBitMatrix branch from 9bfd1c1 to 798209e July 20, 2018 06:57

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 20, 2018

bors merged commit 798209e into rust-lang:master Jul 22, 2018

bors mentioned this pull request Jul 22, 2018

introduce universes to NLL type check #52488

Merged

nnethercote deleted the no-SparseBitMatrix branch July 22, 2018 07:44

pnkfelix mentioned this pull request Oct 9, 2018

error: internal compiler error: Accessing (*_310) with the kind Write(Move) shouldn't be possible #54597

Closed
