Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve worst-case performance of BTreeSet intersection v3 #59186

Merged

Conversation

ssomers
Copy link
Contributor

@ssomers ssomers commented Mar 14, 2019

Variation of #59078 with Intersection remaining a struct

r? @scottmcm

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 14, 2019
@KodrAus
Copy link
Contributor

KodrAus commented Mar 19, 2019

r? @KodrAus

@rust-highfive rust-highfive assigned KodrAus and unassigned scottmcm Mar 19, 2019
@KodrAus
Copy link
Contributor

KodrAus commented Mar 19, 2019

Hi @ssomers 👋

Would you prefer to close #58577 and #59078 and look at this PR instead?

I haven't had a chance to look at what's changed since the changeset I approved, could you summarize the differences? It looks like this one does keep the enum for the strategy private which I think is an improvement on the v2 implementation.

@ssomers
Copy link
Contributor Author

ssomers commented Mar 19, 2019

Sure. The difference with the approved state is:

  • struct Intersection has both fields replaced by the enum, even though both implementations share the same iterator. I think it makes the code more readable and it simplifies changing/adding algorithms that don't need that field.
  • The gut of both implementations is exposed (but hidden, again) as unstable feature, to allow the benchmark to compare them.
  • Thus new benchmarks contrast the performance of both implementations with the actual implementation chosen by the size rule.
  • The size rule favors the new implementation much more than before, but still less than looks optimal (on my system, for this benchmark, with these kind of elements). And if one of the sets is empty, the other set's iterator isn't even constructed.
  • Sets are always ordered by size, because it simplifies code and because size_hint already had to choose the small set.
  • The classic implementation no longer relies on Peekable (which greatly helps for union and difference, but not here).

PS and yes, I prefer to close the other two PRs.

} else {
(other, self)
};
if a_set.len() > b_set.len() / 16 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth leaving a comment here about what this branch is for and why we use the constant 16 to decide which strategy to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment left, but it seems you have to find it yourself.

}
}
}
}

fn size_hint(&self) -> (usize, Option<usize>) {
(0, Some(min(self.a.len(), self.b.len())))
let max_size = match &self.inner {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would actually be min_size, right?

Copy link
Contributor Author

@ssomers ssomers Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had I taken the time to look up the doc of size_hint, I would have called it upper_bound. It's the "min" of the input sets, and "max" of the size_hint, so I'm not a fan of min_size either. How about min_len?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

min_len is ok with me 👍

b_iter: Iter<'a, T>,
},
Search {
a_iter: Iter<'a, T>, // for size_hint, should be the smaller of the sets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the implementation depends on a_iter being smaller I think we should choose more descriptive names here for a and b.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly, I hate the name a because it's intermingled with the same lifetime identifier, but didn't dare to change them. I'll go for small and large in the Search case, small and other in the Stitch case, unless you object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good!

@@ -1,5 +1,6 @@
#![feature(repr_simd)]
#![feature(test)]
#![feature(benches_btree_set)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before merging I think we should remove this feature and make the methods private (or just inline them). It's nice to get an idea of the performance characteristics of each strategy, but once we understand those I think we can move forward with just the general benchmarks.

Copy link
Contributor Author

@ssomers ssomers Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we only understand the performance characteristics:

  • on 1 platform and 1 machine (I could build it on Linux on another machine though, but nothing modern),
  • for one generation of the BTreeSet implementation. I see significant improvement from stable to nightly in my BTreeSet macro-benchmarks, tens of percents, with the same intersection implementation.

What alternatives are there for the feature litter?

  • Moving the benchmarks closer to the lib code is terrible: every tweak or comment requires over an hour to build.
  • Duplicating the lib code near/in the benchmark file is doable. You should notice if it's not up to date anymore when you check that the actual performance matches that of either implementation.
  • Or similarly, duplicating the lib code and benchmarks in some separate repository (like I already did in https://github.com/ssomers/Bron-Kerbosch/blob/master/rust/bron_kerbosch/src/util.rs).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ssomers I think duplicating the code in a separate repository is the best way to go here 👍 I believe we can run benchmarks in CI here (I don't remember the command off the top of my head) so we can worry about this later.

@ssomers
Copy link
Contributor Author

ssomers commented Mar 21, 2019

I made the benchmarks more fine-grained with set sizes and plotted the results (still only one machine). I think this highly suggests that the ideal strategy rule is much more complicated (not just logarithmic with size), and if that rule doesn't spend the performance it gains on evaluating itself, it could be greatly off on a system with different word size, cache sizes and architecture.

But it also confirms that factor 16 seems quite reasonable. To prevent the <30% performance hit for intersection of particularly crafted large sets, it would have to be 19. From the viewpoint of random sets, factor 16 should be lowered instead.

But if I commit all 146 benchmarks I have now, it's simply annoying for someone casually checking performance in general. So I'm becoming convinced that it's better to move this case study over to a separate repository, and leave in this PR only the final rule, and the few benchmarks already merged in earlier (or less), and thus nothing exposed as public unstable. Checking the rust source code, there are other references to github repositories besides rust-lang so I can probably just create one myself.

PS the <30% performance hit is actually closer to 15%, apparently thanks to no longer using Peekable

ssomers added a commit to ssomers/rust_bench_btreeset_intersection that referenced this pull request Mar 21, 2019
Copy link
Contributor Author

@ssomers ssomers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the new comment (cause it seems rather difficult to find in the previous comment)

Oh well, this seems worse and can't be deleted.

@ssomers
Copy link
Contributor Author

ssomers commented Mar 24, 2019

I configured Travis on the separate repository and the readings on the linux build on that (virtual) machine are quite similar (with much less stability, as one would expect). The ideal factor is < 1 higher accross the range of sizes.

@KodrAus
Copy link
Contributor

KodrAus commented Mar 25, 2019

Thanks for all your investigation work @ssomers!

Just for good measure, let's add another test case to liballoc::tests::btree::set::test_intersection that checks intersection of a very large set with a very small one so we can be sure it'll be using the other strategy. Something like:

check_intersection(&[11, 5000, 1, 3, 77, 8924, 103],
    &(0..1000).collect::<Vec<_>>(),
    &[11, 1, 3, 77, 103]);

@ssomers
Copy link
Contributor Author

ssomers commented Mar 26, 2019

I thought the test cases in the first PR were still there and they didn't even include non-subset samples. Rest assured, the proptest in the separate repository covered everything.

So are we there now? Nope, I just realized that is_subset/superset is really quite similar. I'll keep that out of this PR, but it might mean that the 16 becomes a named constant.

PS change of plan: it wasn't that much work, that revealed the comments in intersection weren't accurate, and the benchmarks clunky, so I committed it here anyway

@ssomers
Copy link
Contributor Author

ssomers commented Mar 27, 2019

I cooked up a similar change to the implementation of set difference. It only needs half of the peekables, and it benefits from the same performance boost as with intersection if the right hand set is huge. I cooked up similar code resulting in:

before:

test btree::set::difference_random_100_vs_100            ... bench:         914 ns/iter (+/- 27)
test btree::set::difference_random_100_vs_10k            ... bench:      45,641 ns/iter (+/- 605)
test btree::set::difference_random_10k_vs_100            ... bench:      63,561 ns/iter (+/- 512)
test btree::set::difference_random_10k_vs_10k            ... bench:     205,747 ns/iter (+/- 5,792)
test btree::set::difference_staggered_100_vs_100         ... bench:         989 ns/iter (+/- 19)
test btree::set::difference_staggered_100_vs_10k         ... bench:      35,919 ns/iter (+/- 418)
test btree::set::difference_staggered_10k_vs_10k         ... bench:      93,716 ns/iter (+/- 1,358)

after:

test btree::set::difference_random_100_vs_100            ... bench:         829 ns/iter (+/- 14)
test btree::set::difference_random_100_vs_10k            ... bench:       2,871 ns/iter (+/- 112)
test btree::set::difference_random_10k_vs_100            ... bench:      60,580 ns/iter (+/- 675)
test btree::set::difference_random_10k_vs_10k            ... bench:     188,386 ns/iter (+/- 1,727)
test btree::set::difference_staggered_100_vs_100         ... bench:         863 ns/iter (+/- 21)
test btree::set::difference_staggered_100_vs_10k         ... bench:       2,673 ns/iter (+/- 70)
test btree::set::difference_staggered_10k_vs_10k         ... bench:      83,875 ns/iter (+/- 2,977)

Do you want me to commit it here or later? (or not at all...)

@KodrAus
Copy link
Contributor

KodrAus commented Mar 27, 2019

This is looking great! Thanks for giving these methods some TLC @ssomers.

Do you want me to commit it here or later? (or not at all...)

Yeh I think we can roll difference into this PR as well while we're working on these set operations. I'm happy with the implementation of intersection now.

@ssomers
Copy link
Contributor Author

ssomers commented Mar 28, 2019

I meant "push" instead of "commit", but github lists the commits according to time committed locally anyway. Confusing...

Anyways, it's all here now, and nothing changed to the implementation of intersection.

@ssomers
Copy link
Contributor Author

ssomers commented Mar 29, 2019

For future reference: if you wonder why we have to tediously implement clone for Difference and for Intersection, while BTreeSet itself gets away with an easy derive(Clone), it's because of #26925. BTreeSet doesn't have clone unless T has it, and that makes all the sense in the world. It doesn't make much sense for Difference and Intersection, because they hand out references to T.

@KodrAus
Copy link
Contributor

KodrAus commented Mar 29, 2019

Alrighty, this PR introduces some additional complexity to BTreeSet, but is much more efficient when performing set operations between a large and small set, which I think is a reasonable case. There's no change in the public API. So let's merge this one in!

@bors r+

@bors
Copy link
Contributor

bors commented Mar 29, 2019

📌 Commit bb7bf9b8ea66d72a7e29c6c7a37ddaa8924ef62f has been approved by KodrAus

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 29, 2019
@KodrAus
Copy link
Contributor

KodrAus commented Mar 29, 2019

@bors r-

Sorry, @ssomers do you mind squashing down some of that git history so we don't have a merge commit in there? I didn't see that one before.

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 29, 2019
@ssomers
Copy link
Contributor Author

ssomers commented Mar 29, 2019

I don't mind, but it's going to take a while to figure out how.

@ssomers ssomers force-pushed the btreeset_intersection_revisited_again branch from bb7bf9b to f5fee8f Compare March 29, 2019 11:20
@KodrAus
Copy link
Contributor

KodrAus commented Mar 29, 2019

@bors r+

@bors
Copy link
Contributor

bors commented Mar 29, 2019

📌 Commit f5fee8f has been approved by KodrAus

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 29, 2019
@bors
Copy link
Contributor

bors commented Mar 30, 2019

⌛ Testing commit f5fee8f with merge e0e27d75ee5c9618f834391bc06355d848dfc2d7...

@bors
Copy link
Contributor

bors commented Mar 30, 2019

💔 Test failed - checks-travis

@rust-highfive
Copy link
Collaborator

The job dist-x86_64-netbsd of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
[01:52:59] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-unknown-netbsd" "-Zdual-proc-macros" "-j" "4" "--release" "--locked" "--color" "always" "--manifest-path" "/checkout/src/tools/miri/Cargo.toml" "--features" "rustc-workspace-hack/all-static" "--message-format" "json"
[01:52:59] expected success, got: exit code: 101
[01:52:59] [TIMING] ToolBuild { compiler: Compiler { stage: 2, host: "x86_64-unknown-linux-gnu" }, target: "x86_64-unknown-netbsd", tool: "miri", path: "src/tools/miri", mode: ToolRustc, is_optional_tool: true, source_type: Submodule, extra_features: [] } -- 18.310
[01:52:59] Unable to build miri, skipping dist
No output has been received in the last 30m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 30, 2019
@Centril Centril added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 30, 2019
Centril added a commit to Centril/rust that referenced this pull request Apr 2, 2019
…ited_again, r=KodrAus

improve worst-case performance of BTreeSet intersection v3

Variation of [rust-lang#59078](rust-lang#59078) with `Intersection` remaining a struct

r? @scottmcm
Centril added a commit to Centril/rust that referenced this pull request Apr 3, 2019
…ited_again, r=KodrAus

improve worst-case performance of BTreeSet intersection v3

Variation of [rust-lang#59078](rust-lang#59078) with `Intersection` remaining a struct

r? @scottmcm
bors added a commit that referenced this pull request Apr 3, 2019
Rollup of 4 pull requests

Successful merges:

 - #55448 (Add 'partition_at_index/_by/_by_key' for slices.)
 - #59186 (improve worst-case performance of BTreeSet intersection v3)
 - #59514 (Remove adt_def from projections and downcasts in MIR)
 - #59630 (Shrink `mir::Statement`.)

Failed merges:

r? @ghost
@bors bors merged commit f5fee8f into rust-lang:master Apr 3, 2019
Centril added a commit to Centril/rust that referenced this pull request Apr 5, 2019
improve worst-case performance of HashSet.is_subset

One more simple optimization opportunity for HashSet that was applied in BTreeSet in rust-lang#59186 (and wasn't in rust-lang#57043). Already covered by the existing unit test.

r? @KodrAus
@ssomers ssomers deleted the btreeset_intersection_revisited_again branch April 6, 2019 08:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants