Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upGitHub is where the world builds software
Millions of developers and companies build, ship, and maintain their software on GitHub — the largest and most advanced development platform in the world.
Speed up `SparseBitMatrix` use in `RegionValues`. #52250
Conversation
|
Here are the instruction count improvements exceeding 1%:
Here are the max-rss changes exceeding 1%:
|
|
Here are some measurement of how full the BitMatrix instances get, for
Note the very large one for (17) which dominates.
|
|
I just measured |
6556f64
to
63abbed
|
The max-rss increase for
This is 77.5 MiB, and it gets doubled because it gets cloned here: I tried getting rid of that clone -- which would greatly reduce the max-rss increase -- by transferring ownership of the Anyway, even if the clone remains, some benchmarks take more memory but some take less, so it's basically a wash on that front, and the speed improvements are large enough to make this compelling. |
63abbed
to
7786a74
|
I was able to speed up |
|
Hmm, this change will interact poorly with @davidtwco's changes in #52190, because I think that in that context we don't know the number of region variables when we allocate the We might want a kind of hybrid -- maybe we want to modify (The family of bitset types also needs a bit of cleanup... this change though might allow us to remove the "buf vs slice" distinction which would simplify things.) |
|
@nnethercote note that the final values will probably be affected also by rebasing over #51987, which .. modifies that clone sort of. (The clone is removed, but a variant of it remains.) We could probably free the liveness matrix at some point, though it wouldn't affect peak memory usage. It would potentially require a bit of work on the diagnostic side. |
|
Probably worth testing, in any case. |
|
I just got "try" privileges, so I'm doing to test them in this PR. @bors try |
|
@nnethercote: |
This comment has been hidden.
This comment has been hidden.
This comment has been hidden.
This comment has been hidden.
Use `BitMatrix` instead of `SparseBitMatrix` in `RegionValues`. In practice, these matrices range from 10% to 90%+ full once they are filled in, so the dense representation is better. This reduces the runtime of Check Nll builds of `inflate` by 32%, and several other benchmarks by 1--5%. It also increases max-rss of `clap-rs` by 30% and a couple of others by up to 5%, while decreasing max-rss of `coercions` by 14%. I think the speed-ups justify the max-rss increases. r? @nikomatsakis
This comment has been hidden.
This comment has been hidden.
|
|
|
OK, so #51987 has landed -- @nnethercote do you have thoughts on whether it makes sense to continue with this PR? |
|
Hmm, those results look great! One concern though: in the branch I'm working on, I'm growing the number of elements in the matrix on the fly, which I guess wouldn't be compatible with this change. I'm thinking about how to solve this -- one way might be to split up the matrices into pieces. So for example we could store one matrix for points and then a separate matrix for regions. |
If the number of rows is growing, it should be fine as is. If the number of columns is growing, then that's different... it should be possible to make the number of columns in |
|
It's the number of columns that changes, yes, but I suspect I may be able to finesse it by breaking things into two matrices -- one for "points" and one for "regions" -- and allocating them at separate times. (In other words, we'd wait to allocate the region matrix until we know its proper size.) To that end, I think we should probably land this PR, and I can try to rebase over it. |
|
@bors r+ |
|
|
Speed up `SparseBitMatrix` use in `RegionValues`. In practice, these matrices range from 10% to 90%+ full once they are filled in, so the dense representation is better. This reduces the runtime of Check Nll builds of `inflate` by 32%, and several other benchmarks by 1--5%. It also increases max-rss of `clap-rs` by 30% and a couple of others by up to 5%, while decreasing max-rss of `coercions` by 14%. I think the speed-ups justify the max-rss increases. r? @nikomatsakis
|
|
Using a `BTreeMap` to represent rows in the bit matrix is really slow. This patch changes things so that each row is represented by a `BitVector`. This is a less sparse representation, but a much faster one. As a result, `SparseBitSet` and `SparseChunk` can be removed. Other minor changes in this patch. - It renames `BitVector::insert()` as `merge()`, which matches the terminology in the other classes in bitvec.rs. - It removes `SparseBitMatrix::is_subset()`, which is unused. - It reinstates `RegionValueElements::num_elements()`, which #52190 had removed. - It removes a low-value `debug!` call in `SparseBitMatrix::add()`.
9bfd1c1
to
798209e
|
@bors retry |
|
Apparently I have "try" permissions but not "retry" permissions. @nikomatsakis, can you reapprove this? Thanks. |
|
@nnethercote retry works only if you haven't pushed anything new. Pushing a new commit does require re-r+. |
|
@bors delegate=nnethercote |
|
|
|
@bors r+ |
|
|
Speed up `SparseBitMatrix` use in `RegionValues`. In practice, these matrices range from 10% to 90%+ full once they are filled in, so the dense representation is better. This reduces the runtime of Check Nll builds of `inflate` by 32%, and several other benchmarks by 1--5%. It also increases max-rss of `clap-rs` by 30% and a couple of others by up to 5%, while decreasing max-rss of `coercions` by 14%. I think the speed-ups justify the max-rss increases. r? @nikomatsakis
|
|
In practice, these matrices range from 10% to 90%+ full once they are
filled in, so the dense representation is better.
This reduces the runtime of Check Nll builds of
inflateby 32%, andseveral other benchmarks by 1--5%.
It also increases max-rss of
clap-rsby 30% and a couple of others byup to 5%, while decreasing max-rss of
coercionsby 14%. I think thespeed-ups justify the max-rss increases.
r? @nikomatsakis