Speed up `SparseBitMatrix` use in `RegionValues`. #52250

Merged
merged 1 commit into from Jul 22, 2018

7 participants
@nnethercote
Contributor

nnethercote commented Jul 11, 2018

In practice, these matrices range from 10% to 90%+ full once they are
filled in, so the dense representation is better.

This reduces the runtime of Check Nll builds of inflate by 32%, and
several other benchmarks by 1--5%.

It also increases max-rss of clap-rs by 30% and a couple of others by
up to 5%, while decreasing max-rss of coercions by 14%. I think the
speed-ups justify the max-rss increases.

r? @nikomatsakis

@nnethercote

Contributor

nnethercote commented Jul 11, 2018

Here are the instruction count improvements exceeding 1%:

inflate-check
        avg: -32.4%     min: -32.4%     max: -32.4%
style-servo-check
        avg: -4.5%     min: -4.5%     max: -4.5%
clap-rs-check
        avg: -2.3%      min: -2.3%      max: -2.3%
coercions-check
        avg: -2.3%     min: -2.3%     max: -2.3%
sentry-cli-check
        avg: -1.5%      min: -1.5%      max: -1.5%
webrender-check
        avg: -1.5%      min: -1.5%      max: -1.5%
cargo-check
        avg: -1.5%      min: -1.5%      max: -1.5%
encoding-check
        avg: -1.5%      min: -1.5%      max: -1.5%
ripgrep-check
        avg: -1.3%      min: -1.3%      max: -1.3%
regex-check
        avg: -1.2%      min: -1.2%      max: -1.2%

Here are the max-rss changes exceeding 1%:

clap-rs-check
        avg: 29.5%      min: 29.5%      max: 29.5%
coercions-check
        avg: -14.8%    min: -14.8%    max: -14.8%
inflate-check
        avg: 5.7%       min: 5.7%       max: 5.7%
regression-31157-check
        avg: 1.6%       min: 1.6%       max: 1.6%
syn-check
        avg: 1.0%       min: 1.0%       max: 1.0%
helloworld-check
        avg: -1.0%      min: -1.0%      max: -1.0%
regex-check
        avg: 1.0%       min: 1.0%       max: 1.0%
@nnethercote

Contributor

nnethercote commented Jul 11, 2018

Here are some measurements of how full the BitMatrix instances get, for inflate:

(  1)       24 (46.2%, 46.2%): after: 2 x 4 = 8; 6 used (75%)
(  2)        4 ( 7.7%, 53.8%): after: 2 x 7 = 14; 12 used (85.71%)
(  3)        2 ( 3.8%, 57.7%): after: 4 x 5 = 20; 12 used (60%)
(  4)        1 ( 1.9%, 59.6%): after: 120 x 835 = 100200; 22099 used (22.05%)
(  5)        1 ( 1.9%, 61.5%): after: 16 x 29 = 464; 250 used (53.87%)
(  6)        1 ( 1.9%, 63.5%): after: 2 x 37 = 74; 72 used (97.29%)
(  7)        1 ( 1.9%, 65.4%): after: 87 x 463 = 40281; 4158 used (10.32%)
(  8)        1 ( 1.9%, 67.3%): after: 4 x 8 = 32; 24 used (75%)
(  9)        1 ( 1.9%, 69.2%): after: 40 x 52 = 2080; 865 used (41.58%)
( 10)        1 ( 1.9%, 71.2%): after: 42 x 454 = 19068; 1902 used (9.97%)
( 11)        1 ( 1.9%, 73.1%): after: 18 x 51 = 918; 522 used (56.86%)
( 12)        1 ( 1.9%, 75.0%): after: 12 x 58 = 696; 503 used (72.27%)
( 13)        1 ( 1.9%, 76.9%): after: 132 x 506 = 66792; 15589 used (23.33%)
( 14)        1 ( 1.9%, 78.8%): after: 18 x 9 = 162; 96 used (59.25%)
( 15)        1 ( 1.9%, 80.8%): after: 12 x 38 = 456; 319 used (69.95%)
( 16)        1 ( 1.9%, 82.7%): after: 2 x 15 = 30; 28 used (93.33%)
( 17)        1 ( 1.9%, 84.6%): after: 4912 x 40782 = 200321184; 39886050 used (19.91%)
( 18)        1 ( 1.9%, 86.5%): after: 2 x 23 = 46; 44 used (95.65%)
( 19)        1 ( 1.9%, 88.5%): after: 101 x 501 = 50601; 5036 used (9.95%)
( 20)        1 ( 1.9%, 90.4%): after: 17 x 133 = 2261; 1063 used (47.01%)
( 21)        1 ( 1.9%, 92.3%): after: 24 x 172 = 4128; 450 used (10.90%)
( 22)        1 ( 1.9%, 94.2%): after: 52 x 202 = 10504; 5022 used (47.81%)
( 23)        1 ( 1.9%, 96.2%): after: 81 x 338 = 27378; 14638 used (53.46%)
( 24)        1 ( 1.9%, 98.1%): after: 18 x 90 = 1620; 1052 used (64.93%)
( 25)        1 ( 1.9%,100.0%): after: 11 x 15 = 165; 130 used (78.78%)

Note the very large one for (17) which dominates.

style-servo is broadly similar, though it has a number of larger ones, instead of being dominated by a single large one.
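The "used" percentages in the list above are the number of set bits divided by rows times columns. A minimal sketch of that arithmetic follows; the function name `fill_percent` is hypothetical and not part of the PR.

```rust
/// Percentage of bits set in a `rows x cols` bit matrix.
/// Returns `None` for an empty matrix (or on overflow), where the
/// ratio is undefined.
fn fill_percent(rows: u64, cols: u64, used: u64) -> Option<f64> {
    let total = rows.checked_mul(cols)?;
    if total == 0 {
        return None;
    }
    Some(100.0 * used as f64 / total as f64)
}
```

For entry (17), `fill_percent(4912, 40782, 39886050)` gives roughly 19.91, matching the figure reported above.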

@nnethercote

Contributor

nnethercote commented Jul 11, 2018

I just measured html5ever with NLL as well. It reduces its instruction count by 35%, and its max-rss by 10%.

@nnethercote

Contributor

nnethercote commented Jul 11, 2018

The max-rss increase for clap-rs is because of one very large BitMatrix:

  after: 25897 x 24965 = 646518605; 94492819 used (14.61%)

This is 77.5 MiB, and it gets doubled because it gets cloned here:

let mut inferred_values = self.liveness_constraints.clone();

I tried getting rid of that clone -- which would greatly reduce the max-rss increase -- by transferring ownership of the BitMatrix from self.liveness_constraints to self.inferred_values (which required making liveness_constraints an Option<RegionValues>), but it caused test failures. It looks like liveness_constraints is used for error message production after inferred_values is created.

Anyway, even if the clone remains, some benchmarks take more memory but some take less, so it's basically a wash on that front, and the speed improvements are large enough to make this compelling.
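The 77.5 MiB figure above can be roughly reproduced from the matrix dimensions. The sketch below assumes each row is padded up to a whole number of fixed-width words; the word width is an assumption, since the thread does not state it, and the function names are hypothetical.

```rust
/// Bytes needed for a dense `rows x cols` bit matrix in which every row is
/// padded up to a whole number of `word_bits`-wide words.
/// `word_bits` is an assumed parameter; it must be a multiple of 8.
fn dense_matrix_bytes(rows: u64, cols: u64, word_bits: u64) -> u64 {
    let words_per_row = (cols + word_bits - 1) / word_bits; // ceiling division
    rows * words_per_row * (word_bits / 8)
}

/// Converts a byte count to mebibytes.
fn to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}
```

With 128-bit words, `dense_matrix_bytes(25897, 24965, 128)` is 81,212,992 bytes, or about 77.45 MiB; with 64-bit words it is about 77.25 MiB. Either way, cloning the matrix once puts roughly 155 MiB in memory at the same time.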

@nnethercote

Contributor

nnethercote commented Jul 12, 2018

I was able to speed up inflate and clap-rs a bit more by optimizing BitVector::merge some more.
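The thread does not show what the optimization to BitVector::merge was. For context only, a word-at-a-time merge that reports whether anything changed could look like the sketch below; the name `merge_words` and the `u64` word type are assumptions, not the compiler's actual code.

```rust
/// ORs every word of `src` into `dst`.
/// Returns true if at least one bit in `dst` changed.
fn merge_words(dst: &mut [u64], src: &[u64]) -> bool {
    assert_eq!(dst.len(), src.len(), "bit vectors must have equal length");
    let mut changed = false;
    for (d, s) in dst.iter_mut().zip(src) {
        let new = *d | *s;
        // Track changes without branching per word.
        changed |= new != *d;
        *d = new;
    }
    changed
}
```

The "changed" result matters for fixed-point computations: the caller can stop propagating once a merge no longer adds any bits.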

@nikomatsakis

Contributor

nikomatsakis commented Jul 12, 2018

Hmm, this change will interact poorly with @davidtwco's changes in #52190, because I think that in that context we don't know the number of region variables when we allocate the RegionValues.

We might want a kind of hybrid -- maybe we want to modify DenseMatrix to use an IndexVec<BitSet> instead of one big allocation?

(The family of bitset types also needs a bit of cleanup... this change though might allow us to remove the "buf vs slice" distinction which would simplify things.)

@nikomatsakis

Contributor

nikomatsakis commented Jul 12, 2018

@nnethercote note that the final values will probably be affected also by rebasing over #51987, which .. modifies that clone sort of. (The clone is removed, but a variant of it remains.)

We could probably free the liveness matrix at some point, though it wouldn't affect peak memory usage. It would potentially require a bit of work on the diagnostic side.

@nikomatsakis

Contributor

nikomatsakis commented Jul 12, 2018

Hmm, #51987 also reduces inflate-check's running time dramatically (by 43%). I would not however expect these two to "multiply" -- rather I suspect the benefits of this PR may be subsumed by #51987, since it reduces dramatically the number of sparse matrix merges that we do.

@nikomatsakis

Contributor

nikomatsakis commented Jul 12, 2018

Probably worth testing, in any case.

@nnethercote

Contributor

nnethercote commented Jul 12, 2018

Yes, the benefit here is entirely from making matrix merges faster.

I guess I'll wait until #51987 and #52190 play out and see if this PR still makes sense. This PR has a large effect for a small change; hopefully those two other PRs have as big an effect or bigger.

@nnethercote

Contributor

nnethercote commented Jul 13, 2018

I just got "try" privileges, so I'm going to test them in this PR.

@bors try

@bors

Contributor

bors commented Jul 13, 2018

@nnethercote: 🔑 Insufficient privileges: not in try users

@kennytm

Member

kennytm commented Jul 13, 2018

@bors try

@bors

Contributor

bors commented Jul 13, 2018

⌛️ Trying commit f678d1b with merge 1e03182...

bors added a commit that referenced this pull request Jul 13, 2018

Auto merge of #52250 - nnethercote:no-SparseBitMatrix, r=<try>
Use `BitMatrix` instead of `SparseBitMatrix` in `RegionValues`.

@bors

Contributor

bors commented Jul 13, 2018

💔 Test failed - status-travis

@bors

Contributor

bors commented Jul 13, 2018

☔️ The latest upstream changes (presumably #51987) made this pull request unmergeable. Please resolve the merge conflicts.

@nikomatsakis

Contributor

nikomatsakis commented Jul 17, 2018

OK, so #51987 has landed -- @nnethercote do you have thoughts on whether it makes sense to continue with this PR?

@nnethercote nnethercote changed the title from Use `BitMatrix` instead of `SparseBitMatrix` in `RegionValues`. to Speed up `SparseBitMatrix` use in `RegionValues`. Jul 18, 2018

@nnethercote

Contributor

nnethercote commented Jul 18, 2018

Instruction count changes:

html5ever-check
        avg: -33.1%     min: -33.1%     max: -33.1%
clap-rs-check
        avg: -1.7%      min: -1.7%      max: -1.7%
webrender-check
        avg: -1.2%      min: -1.2%      max: -1.2%
sentry-cli-check
        avg: -1.2%      min: -1.2%      max: -1.2%
cargo-check
        avg: -1.2%      min: -1.2%      max: -1.2%
style-servo-check
        avg: -1.1%?     min: -1.1%?     max: -1.1%?
encoding-check
        avg: -1.1%      min: -1.1%      max: -1.1%
regex-check
        avg: -0.9%      min: -0.9%      max: -0.9%
ripgrep-check
        avg: -0.8%      min: -0.8%      max: -0.8%
inflate-check
        avg: -0.6%      min: -0.6%      max: -0.6%
piston-image-check
        avg: -0.6%      min: -0.6%      max: -0.6%
syn-check
        avg: -0.5%      min: -0.5%      max: -0.5%

max-rss changes:

clap-rs-check
        avg: 36.3%      min: 36.3%      max: 36.3%
        nll-check       310,236.00      422,924.00      36.3%
html5ever-check
        avg: -22.2%     min: -22.2%     max: -22.2%
        nll-check       1,934,008.00    1,504,568.00    -22.2%
coercions-check
        avg: -7.5%?     min: -7.5%?     max: -7.5%?
issue-46449-check
        avg: 5.1%       min: 5.1%       max: 5.1%
unify-linearly-check
        avg: 4.2%       min: 4.2%       max: 4.2%
regression-31157-check
        avg: 3.4%       min: 3.4%       max: 3.4%
inflate-check
        avg: 3.3%       min: 3.3%       max: 3.3%
futures-check
        avg: 3.3%       min: 3.3%       max: 3.3%
regex-check
        avg: 2.9%       min: 2.9%       max: 2.9%
deeply-nested-check
        avg: 2.7%       min: 2.7%       max: 2.7%
tokio-webpush-simple-check
        avg: 2.6%       min: 2.6%       max: 2.6%
syn-check
        avg: 2.5%       min: 2.5%       max: 2.5%
encoding-check
        avg: 2.1%       min: 2.1%       max: 2.1%
ripgrep-check
        avg: 2.0%       min: 2.0%       max: 2.0%
helloworld-check
        avg: 1.9%       min: 1.9%       max: 1.9%
unused-warnings-check
        avg: 1.9%       min: 1.9%       max: 1.9%
deep-vector-check
        avg: 1.8%       min: 1.8%       max: 1.8%
piston-image-check
        avg: 1.6%       min: 1.6%       max: 1.6%
serde-check
        avg: 1.5%       min: 1.5%       max: 1.5%
sentry-cli-check
        avg: 1.1%       min: 1.1%       max: 1.1%
webrender-check
        avg: 1.0%       min: 1.0%       max: 1.0%

Relatively speaking, it's a big regression for clap-rs, but note that the improvement for html5ever is much bigger in absolute terms. A bunch of others are moderately worse, too.

@nnethercote

Contributor

nnethercote commented Jul 18, 2018

(Note that I messed up the previous comment on my first attempt, and have now corrected the numbers.)

@kennytm

Member

kennytm commented Jul 18, 2018

Perf is ready. The numbers are similar to #52250 (comment).

@nikomatsakis

Contributor

nikomatsakis commented Jul 18, 2018

Hmm, those results look great! One concern though: in the branch I'm working on, I'm growing the number of elements in the matrix on the fly, which I guess wouldn't be compatible with this change. I'm thinking about how to solve this -- one way might be to split up the matrices into pieces. So for example we could store one matrix for points and then a separate matrix for regions.

@nnethercote

Contributor

nnethercote commented Jul 18, 2018

> I'm growing the number of elements in the matrix on the fly, which I guess wouldn't be compatible with this change.

If the number of rows is growing, it should be fine as is. If the number of columns is growing, then that's different... it should be possible to make the number of columns in SparseBitMatrix extensible, though the added flexibility will likely shrink the size of the wins here.

@nikomatsakis

Contributor

nikomatsakis commented Jul 19, 2018

It's the number of columns that changes, yes, but I suspect I may be able to finesse it by breaking things into two matrices -- one for "points" and one for "regions" -- and allocating them at separate times. (In other words, we'd wait to allocate the region matrix until we know its proper size.)

To that end, I think we should probably land this PR, and I can try to rebase over it.

@nikomatsakis

Contributor

nikomatsakis commented Jul 19, 2018

@bors r+

@bors

Contributor

bors commented Jul 19, 2018

📌 Commit 9bfd1c1 has been approved by nikomatsakis

@bors

Contributor

bors commented Jul 20, 2018

⌛️ Testing commit 9bfd1c1 with merge 4ebbaa1...

bors added a commit that referenced this pull request Jul 20, 2018

Auto merge of #52250 - nnethercote:no-SparseBitMatrix, r=nikomatsakis
Speed up `SparseBitMatrix` use in `RegionValues`.

@bors

Contributor

bors commented Jul 20, 2018

💔 Test failed - status-appveyor

Speed up `SparseBitMatrix`.
Using a `BTreeMap` to represent rows in the bit matrix is really slow.
This patch changes things so that each row is represented by a
`BitVector`. This is a less sparse representation, but a much faster
one.

As a result, `SparseBitSet` and `SparseChunk` can be removed.

Other minor changes in this patch.

- It renames `BitVector::insert()` as `merge()`, which matches the
  terminology in the other classes in bitvec.rs.

- It removes `SparseBitMatrix::is_subset()`, which is unused.

- It reinstates `RegionValueElements::num_elements()`, which #52190 had
  removed.

- It removes a low-value `debug!` call in `SparseBitMatrix::add()`.
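
The commit message above describes representing each row of the matrix as a bit vector. The following is only an illustrative sketch of that idea, not the code from the patch: the type `RowBitMatrix`, its methods, the lazily allocated rows, and the `u64` word type are all assumptions made for the example.

```rust
/// Hypothetical sketch: a bit matrix with a fixed number of columns whose
/// rows are dense bit vectors, each allocated only when first written.
struct RowBitMatrix {
    num_columns: usize,
    words_per_row: usize,
    rows: Vec<Option<Vec<u64>>>,
}

impl RowBitMatrix {
    fn new(num_columns: usize) -> Self {
        RowBitMatrix {
            num_columns,
            words_per_row: (num_columns + 63) / 64,
            rows: Vec::new(),
        }
    }

    /// Grows the row list on demand and returns the (possibly new) row.
    fn ensure_row(&mut self, row: usize) -> &mut Vec<u64> {
        if self.rows.len() <= row {
            self.rows.resize(row + 1, None);
        }
        let words = self.words_per_row;
        self.rows[row].get_or_insert_with(|| vec![0; words])
    }

    /// Sets bit (row, column); returns true if it was not already set.
    fn add(&mut self, row: usize, column: usize) -> bool {
        assert!(column < self.num_columns);
        let (word, mask) = (column / 64, 1u64 << (column % 64));
        let r = self.ensure_row(row);
        let was_unset = r[word] & mask == 0;
        r[word] |= mask;
        was_unset
    }

    fn contains(&self, row: usize, column: usize) -> bool {
        let (word, mask) = (column / 64, 1u64 << (column % 64));
        match self.rows.get(row) {
            Some(Some(r)) => column < self.num_columns && r[word] & mask != 0,
            _ => false,
        }
    }

    /// ORs row `read` into row `write`; returns true if `write` changed.
    fn merge(&mut self, read: usize, write: usize) -> bool {
        if read == write {
            return false;
        }
        // Temporarily move the source row out so both rows can be borrowed.
        let src = match self.rows.get_mut(read).and_then(|r| r.take()) {
            Some(src) => src,
            None => return false, // nothing is set in `read`
        };
        let dst = self.ensure_row(write);
        let mut changed = false;
        for (d, s) in dst.iter_mut().zip(&src) {
            let new = *d | *s;
            changed |= new != *d;
            *d = new;
        }
        self.rows[read] = Some(src);
        changed
    }
}
```

In this layout a row merge is a straight pass over machine words, with no tree traversal. New rows can be added at any time, but the column count is fixed when the matrix is created, which is the constraint discussed earlier in the thread.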
@nnethercote

Contributor

nnethercote commented Jul 20, 2018

@bors retry

@nnethercote

Contributor

nnethercote commented Jul 20, 2018

Apparently I have "try" permissions but not "retry" permissions. @nikomatsakis, can you reapprove this? Thanks.

@kennytm

Member

kennytm commented Jul 20, 2018

@nnethercote retry works only if you haven't pushed anything new. Pushing a new commit does require re-r+.

@nikomatsakis

Contributor

nikomatsakis commented Jul 21, 2018

@bors delegate=nnethercote

@bors

Contributor

bors commented Jul 21, 2018

✌️ @nnethercote can now approve this pull request

@nikomatsakis

Contributor

nikomatsakis commented Jul 21, 2018

@bors r+

@bors

Contributor

bors commented Jul 21, 2018

📌 Commit 798209e has been approved by nikomatsakis

@bors

Contributor

bors commented Jul 22, 2018

⌛️ Testing commit 798209e with merge a57d5d7...

bors added a commit that referenced this pull request Jul 22, 2018

Auto merge of #52250 - nnethercote:no-SparseBitMatrix, r=nikomatsakis
Speed up `SparseBitMatrix` use in `RegionValues`.

@bors

Contributor

bors commented Jul 22, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: nikomatsakis
Pushing a57d5d7 to master...

@bors bors merged commit 798209e into rust-lang:master Jul 22, 2018

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
homu: Test successful

@nnethercote nnethercote deleted the nnethercote:no-SparseBitMatrix branch Jul 22, 2018
