Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upGitHub is where the world builds software
Millions of developers and companies build, ship, and maintain their software on GitHub — the largest and most advanced development platform in the world.
Speed up NLL with HybridIdxSetBuf. #53383
Conversation
|
(rust_highfive has picked a reviewer for you, use r? to override) |
|
Instruction count results for NLL-check builds:
I don't know why Some notable max-rss results:
I'd love to hear suggestions on how to improve the code under the |
|
@bors try |
Introduce and use HybridIdxSetBuf. It's a sparse-when-small but dense-when-large index set that is very efficient for sets that (a) have few elements, (b) have large universe_size values, and (c) are cleared frequently. Which makes it perfect for the `gen_set` and `kill_set` sets used by the new borrow checker. This patch reduces `tuple-stress`'s NLL-check time by 40%, and up to 12% for several other benchmarks. And it halves the max-rss for `keccak`, and has smaller wins for `inflate` and `clap-rs`.
|
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
|
|
| HybridIdxSetBuf::Sparse(sparse, _) => self.subtract_sparse(sparse), | ||
| HybridIdxSetBuf::Dense(dense, _) => self.subtract(dense), | ||
| } | ||
| } |
Mark-Simulacrum
Aug 15, 2018
Member
I'm wondering if it's a good idea to have a trait instead of different methods?
I'm wondering if it's a good idea to have a trait instead of different methods?
nnethercote
Aug 15, 2018
Author
Contributor
What would the trait look like? I'm having trouble imagining it.
What would the trait look like? I'm having trouble imagining it.
nikomatsakis
Aug 16, 2018
Contributor
A trait would read nicer, for sure, but we could leave it to follow-up. I imagine it would be something like:
trait SubtractFrom<T> {
fn subtract_from(&self, target: &mut IdxSetBuf<T>) -> bool;
}
and then you would have
impl<T> IdxSetBuf<T> {
fn subtract(&mut self, other: &impl SubtractFrom<T>) -> bool {
other.subtract_from(self)
}
}
and then
impl<T> SubtractFrom<T> for IdxSetBuf<T> { .. }
impl<T> SubtractFrom<T> for HybridIdxSetBuf<T> { .. }
impl<T> SubtractFrom<T> for SparseIdxSetBuf<T> { .. }
A trait would read nicer, for sure, but we could leave it to follow-up. I imagine it would be something like:
trait SubtractFrom<T> {
fn subtract_from(&self, target: &mut IdxSetBuf<T>) -> bool;
}and then you would have
impl<T> IdxSetBuf<T> {
fn subtract(&mut self, other: &impl SubtractFrom<T>) -> bool {
other.subtract_from(self)
}
}and then
impl<T> SubtractFrom<T> for IdxSetBuf<T> { .. }
impl<T> SubtractFrom<T> for HybridIdxSetBuf<T> { .. }
impl<T> SubtractFrom<T> for SparseIdxSetBuf<T> { .. }
Mark-Simulacrum
Aug 16, 2018
Member
I think we can leave as-is for now and maybe clean it up a little later. I'm not actually sure that the trait would be a win for readability anyway.
I think we can leave as-is for now and maybe clean it up a little later. I'm not actually sure that the trait would be a win for readability anyway.
| } | ||
| } | ||
|
|
||
| pub struct SparseIter<'a, T: Idx> { |
Mark-Simulacrum
Aug 15, 2018
Member
We should probably instead use impl Trait and pass through the various iterator traits; DoubleEnded, TrustedLen, etc. Otherwise we're losing performance here...
We should at least be able to implement size_hint, count, and nth.
We should probably instead use impl Trait and pass through the various iterator traits; DoubleEnded, TrustedLen, etc. Otherwise we're losing performance here...
We should at least be able to implement size_hint, count, and nth.
nnethercote
Aug 15, 2018
Author
Contributor
Can you point at some existing code that does similar things? I'm not sure what this would look like. Thanks.
Can you point at some existing code that does similar things? I'm not sure what this would look like. Thanks.
nnethercote
Aug 16, 2018
Author
Contributor
Oh, FWIW, I just copied the code in the existing Iter for IdxSetBuf. So if there is a genuine win to be had here -- and profiling doesn't indicate so, at least for the moment -- it should be changed as well. Perhaps that could also be a follow-up.
Oh, FWIW, I just copied the code in the existing Iter for IdxSetBuf. So if there is a genuine win to be had here -- and profiling doesn't indicate so, at least for the moment -- it should be changed as well. Perhaps that could also be a follow-up.
| mem::replace(self, HybridIdxSetBuf::Dense(dense, universe_size)); | ||
| changed | ||
| } | ||
| _ => panic!("impossible"), |
Mark-Simulacrum
Aug 15, 2018
Member
This should be an unreachable! I think.
This should be an unreachable! I think.
nnethercote
Aug 16, 2018
Author
Contributor
Ok, but ideally someone would point out a way to rewrite the code so this isn't necessary :)
Ok, but ideally someone would point out a way to rewrite the code so this isn't necessary :)
ljedrz
Aug 16, 2018
Contributor
if let HybridIdxSetBuf::Sparse(sparse, universe_size) = mem::replace(self, dummy)? It doesn't require an else branch.
if let HybridIdxSetBuf::Sparse(sparse, universe_size) = mem::replace(self, dummy)? It doesn't require an else branch.
nnethercote
Aug 16, 2018
Author
Contributor
True, but I find that misleading -- it's strange to have an if let that cannot fail. The match makes the cannot-fail-ness more obvious.
True, but I find that misleading -- it's strange to have an if let that cannot fail. The match makes the cannot-fail-ness more obvious.
| } | ||
|
|
||
| /// Removes `elem` from the set `self`. | ||
| pub fn remove(&mut self, elem: &T) -> bool { |
Mark-Simulacrum
Aug 15, 2018
Member
No reason to take by reference here since I'd is Copy
No reason to take by reference here since I'd is Copy
nnethercote
Aug 16, 2018
Author
Contributor
I just copied the existing method in IdxSetBuf, so I will leave this alone. Converting all such methods in this file could be done as a follow-up.
I just copied the existing method in IdxSetBuf, so I will leave this alone. Converting all such methods in this file could be done as a follow-up.
|
@rust-timer build 496278c |
|
Success: Queued 496278c with parent 81cfaad, comparison URL. |
|
@rust-timer build 496278c |
|
Success: Queued 496278c with parent 81cfaad, comparison URL. |
1 similar comment
|
Success: Queued 496278c with parent 81cfaad, comparison URL. |
Here is the full Cachegrind diff showing the difference between the two versions:
That is an amazingly small diff. The slowdown is entirely due to more hash table operations underneath @RalfJung, do you have any idea what might be going on? |
|
Perf results are in, and they're even better than what I saw locally. Notable instruction count results (look at the
Things to note:
Notable max-rss results:
These are similar to my local measurements. There are also these max-rss results:
These are not |
|
The specific ICE is this:
That doesn't mean much to me, alas. |
|
https://github.com/rust-lang/rust/blob/master/src/libcore/iter/mod.rs#L513 is a sample of the forwarding needed. I wish there was a better way, but I'm unaware of it. |
|
@nnethercote I can take a look at the ICE, though prob not for a day or two |
|
@nnethercote also, those perf stats look great! |
|
I worked out the ICE by incrementally reverting parts of the patch. The problem is in |
`HybridIdxSetBuf` is a sparse-when-small but dense-when-large index set that is very efficient for sets that (a) have few elements, (b) have large `universe_size` values, and (c) are cleared frequently. Which makes it perfect for the `gen_set` and `kill_set` sets used by the new borrow checker. This patch reduces the execution time of the five slowest NLL benchmarks by 55%, 21%, 16%, 10% and 9%. It also reduces the max-rss of three benchmarks by 53%, 33%, and 9%.
00e7f62
to
5745597
|
New version fixes the test failures. I also did another perf run locally, and |
|
@bors try |
Introduce and use HybridIdxSetBuf. It's a sparse-when-small but dense-when-large index set that is very efficient for sets that (a) have few elements, (b) have large universe_size values, and (c) are cleared frequently. Which makes it perfect for the `gen_set` and `kill_set` sets used by the new borrow checker. This patch reduces `tuple-stress`'s NLL-check time by 40%, and up to 12% for several other benchmarks. And it halves the max-rss for `keccak`, and has smaller wins for `inflate` and `clap-rs`.
|
|
|
@bors retry |
Speed up NLL with HybridIdxSetBuf. It's a sparse-when-small but dense-when-large index set that is very efficient for sets that (a) have few elements, (b) have large universe_size values, and (c) are cleared frequently. Which makes it perfect for the `gen_set` and `kill_set` sets used by the new borrow checker. This patch reduces `tuple-stress`'s NLL-check time by 40%, and up to 12% for several other benchmarks. And it halves the max-rss for `keccak`, and has smaller wins for `inflate` and `clap-rs`.
|
|
|
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
|
@bors retry |
Speed up NLL with HybridIdxSetBuf. It's a sparse-when-small but dense-when-large index set that is very efficient for sets that (a) have few elements, (b) have large universe_size values, and (c) are cleared frequently. Which makes it perfect for the `gen_set` and `kill_set` sets used by the new borrow checker. This patch reduces `tuple-stress`'s NLL-check time by 40%, and up to 12% for several other benchmarks. And it halves the max-rss for `keccak`, and has smaller wins for `inflate` and `clap-rs`.
|
|
|
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
|
@bors retry |
Speed up NLL with HybridIdxSetBuf. It's a sparse-when-small but dense-when-large index set that is very efficient for sets that (a) have few elements, (b) have large universe_size values, and (c) are cleared frequently. Which makes it perfect for the `gen_set` and `kill_set` sets used by the new borrow checker. This patch reduces `tuple-stress`'s NLL-check time by 40%, and up to 12% for several other benchmarks. And it halves the max-rss for `keccak`, and has smaller wins for `inflate` and `clap-rs`.
|
|
|
The top 5 on the NLL dashboard now looks like this:
And |
|
@Mark-Simulacrum: hmm, any idea why ripgrep is showing up above sentry-cli even though its ratio is lower? A few of the others lower down the list are out of order too. |
|
No, that shouldn't be happening. Could you file an issue? |
It's a sparse-when-small but dense-when-large index set that is very
efficient for sets that (a) have few elements, (b) have large
universe_size values, and (c) are cleared frequently. Which makes it
perfect for the
gen_setandkill_setsets used by the new borrowchecker.
This patch reduces
tuple-stress's NLL-check time by 40%, and up to 12%for several other benchmarks. And it halves the max-rss for
keccak,and has smaller wins for
inflateandclap-rs.