improve worst-case performance of BTreeSet intersection v3 #59186

ssomers · 2019-03-14T16:46:34Z

Variation of #59078 with Intersection remaining a struct

KodrAus · 2019-03-19T03:14:58Z

KodrAus · 2019-03-19T03:17:58Z

Would you prefer to close #58577 and #59078 and look at this PR instead?

I haven't had a chance to look at what's changed since the changeset I approved, could you summarize the differences? It looks like this one does keep the enum for the strategy private which I think is an improvement on the v2 implementation.

ssomers · 2019-03-19T11:10:40Z

Sure. The difference with the approved state is:

struct Intersection has both fields replaced by the enum, even though both implementations share the same iterator. I think it makes the code more readable and it simplifies changing/adding algorithms that don't need that field.
The guts of both implementations are exposed (but hidden, again) as an unstable feature, to allow the benchmark to compare them.
Thus new benchmarks contrast the performance of both implementations with the actual implementation chosen by the size rule.
The size rule favors the new implementation much more than before, but still less than looks optimal (on my system, for this benchmark, with this kind of element). And if one of the sets is empty, the other set's iterator isn't even constructed.
Sets are always ordered by size, because it simplifies code and because size_hint already had to choose the small set.
The classic implementation no longer relies on Peekable (which greatly helps for union and difference, but not here).
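
For readers following along, here is a minimal sketch of the idea summarized in the list above: a private enum holds the chosen strategy, the sets are ordered by size, and the `len() / 16` rule from the diff picks between the two. This is not the PR's actual code; the names `Strategy`, `choose` and `SIZE_FACTOR` are hypothetical and only loosely follow the `Stitch`/`Search` variants visible in the review.

```rust
use std::cmp::Ordering;
use std::collections::btree_set::Iter;
use std::collections::BTreeSet;

/// Hypothetical stand-in for the private strategy enum discussed in the PR.
enum Strategy<'a, T> {
    /// Walk both sets in lockstep (the classic approach).
    Stitch { a: Iter<'a, T>, b: Iter<'a, T> },
    /// Walk the small set and look each element up in the large set.
    Search { small_iter: Iter<'a, T>, large_set: &'a BTreeSet<T> },
}

/// Assumed name for the factor 16 that appears in the diff.
const SIZE_FACTOR: usize = 16;

/// Orders the sets by size, then applies the size rule.
fn choose<'a, T: Ord>(x: &'a BTreeSet<T>, y: &'a BTreeSet<T>) -> Strategy<'a, T> {
    let (small, large) = if x.len() <= y.len() { (x, y) } else { (y, x) };
    if small.len() > large.len() / SIZE_FACTOR {
        Strategy::Stitch { a: small.iter(), b: large.iter() }
    } else {
        Strategy::Search { small_iter: small.iter(), large_set: large }
    }
}

impl<'a, T: Ord> Iterator for Strategy<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        match self {
            Strategy::Stitch { a, b } => {
                // Advance whichever side is behind until both agree.
                let mut a_next = a.next()?;
                let mut b_next = b.next()?;
                loop {
                    match a_next.cmp(b_next) {
                        Ordering::Less => a_next = a.next()?,
                        Ordering::Greater => b_next = b.next()?,
                        Ordering::Equal => return Some(a_next),
                    }
                }
            }
            Strategy::Search { small_iter, large_set } => {
                // One tree lookup per element of the small set.
                small_iter.find(|x| large_set.contains(*x))
            }
        }
    }
}
```

With a 3-element set against a 1000-element set the sketch picks `Search`; with two sets of similar size it picks `Stitch`.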

PS and yes, I prefer to close the other two PRs.

KodrAus · 2019-03-20T22:21:14Z

src/liballoc/collections/btree/set.rs

+        } else {
+            (other, self)
+        };
+        if a_set.len() > b_set.len() / 16 {


Might be worth leaving a comment here about what this branch is for and why we use the constant 16 to decide which strategy to use.

Comment left, but it seems you have to find it yourself.

KodrAus · 2019-03-20T22:25:12Z

src/liballoc/collections/btree/set.rs

                }
            }
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
-        (0, Some(min(self.a.len(), self.b.len())))
+        let max_size = match &self.inner {


This would actually be min_size, right?

Had I taken the time to look up the doc of size_hint, I would have called it upper_bound. It's the "min" of the input sets, and "max" of the size_hint, so I'm not a fan of min_size either. How about min_len?

min_len is ok with me 👍
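
To illustrate the naming discussion: the value in question is the smaller of the two input lengths, and it serves as the upper bound of `size_hint`. The sketch below only observes this through the public `BTreeSet::intersection` API; the helper name `hint_and_count` is hypothetical, and the exact lower bound is an implementation detail that the checks deliberately do not pin down.

```rust
use std::collections::BTreeSet;

/// Returns the `size_hint` of the intersection of `a` and `b`,
/// together with the number of elements it actually yields.
fn hint_and_count(a: &BTreeSet<u32>, b: &BTreeSet<u32>) -> ((usize, Option<usize>), usize) {
    let it = a.intersection(b);
    let hint = it.size_hint();
    (hint, it.count())
}
```

For any two sets, the upper bound should not exceed `min(a.len(), b.len())`, and the real count should lie between the two bounds.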

KodrAus · 2019-03-20T22:28:58Z

src/liballoc/collections/btree/set.rs

+        b_iter: Iter<'a, T>,
+    },
+    Search {
+        a_iter: Iter<'a, T>, // for size_hint, should be the smaller of the sets


Since the implementation depends on a_iter being smaller I think we should choose more descriptive names here for a and b.

Frankly, I hate the name a because it's intermingled with the same lifetime identifier, but didn't dare to change them. I'll go for small and large in the Search case, small and other in the Stitch case, unless you object.

That sounds good!

KodrAus · 2019-03-20T22:30:04Z

src/liballoc/benches/lib.rs

@@ -1,5 +1,6 @@
 #![feature(repr_simd)]
 #![feature(test)]
+#![feature(benches_btree_set)]


Before merging I think we should remove this feature and make the methods private (or just inline them). It's nice to get an idea of the performance characteristics of each strategy, but once we understand those I think we can move forward with just the general benchmarks.

But we only understand the performance characteristics:

on 1 platform and 1 machine (I could build it on Linux on another machine though, but nothing modern),

for one generation of the BTreeSet implementation. I see significant improvement from stable to nightly in my BTreeSet macro-benchmarks, tens of percents, with the same intersection implementation.

What alternatives are there for the feature litter?

Moving the benchmarks closer to the lib code is terrible: every tweak or comment requires over an hour to build.

Duplicating the lib code near/in the benchmark file is doable. You should notice if it's not up to date anymore when you check that the actual performance matches that of either implementation.

Or similarly, duplicating the lib code and benchmarks in some separate repository (like I already did in https://github.com/ssomers/Bron-Kerbosch/blob/master/rust/bron_kerbosch/src/util.rs).

@ssomers I think duplicating the code in a separate repository is the best way to go here 👍 I believe we can run benchmarks in CI here (I don't remember the command off the top of my head) so we can worry about this later.

ssomers · 2019-03-21T11:57:34Z

I made the benchmarks more fine-grained with set sizes and plotted the results (still only one machine). I think this strongly suggests that the ideal strategy rule is much more complicated (not just logarithmic with size), and if that rule doesn't spend the performance it gains on evaluating itself, it could be greatly off on a system with different word size, cache sizes and architecture.

But it also confirms that factor 16 seems quite reasonable. To prevent the <30% performance hit for intersection of particularly crafted large sets, it would have to be 19. From the viewpoint of random sets, factor 16 should be lowered instead.

But if I commit all 146 benchmarks I have now, it's simply annoying for someone casually checking performance in general. So I'm becoming convinced that it's better to move this case study over to a separate repository, and leave in this PR only the final rule, and the few benchmarks already merged in earlier (or less), and thus nothing exposed as public unstable. Checking the rust source code, there are other references to github repositories besides rust-lang so I can probably just create one myself.

PS the <30% performance hit is actually closer to 15%, apparently thanks to no longer using Peekable

ssomers

Here's the new comment (because it seems rather difficult to find in the previous comment)

Oh well, this seems worse and can't be deleted.

ssomers · 2019-03-24T16:22:23Z

I configured Travis on the separate repository and the readings on the Linux build on that (virtual) machine are quite similar (with much less stability, as one would expect). The ideal factor is < 1 higher across the range of sizes.

KodrAus · 2019-03-25T23:41:25Z

Thanks for all your investigation work @ssomers!

Just for good measure, let's add another test case to liballoc::tests::btree::set::test_intersection that checks intersection of a very large set with a very small one so we can be sure it'll be using the other strategy. Something like:

check_intersection(&[11, 5000, 1, 3, 77, 8924, 103],
    &(0..1000).collect::<Vec<_>>(),
    &[11, 1, 3, 77, 103]);
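
A self-contained sketch of how such a check could be written outside the rust-lang test suite. The helper below is hypothetical and only shares its name with the one in `liballoc`; it sorts the expected values first, since a `BTreeSet` iterates in ascending order, and it checks both argument orders so that the size rule is exercised from either side.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper, loosely modelled on the `check_intersection` call above.
/// Panics if the intersection of `a` and `b` differs from `expected`.
fn check_intersection(a: &[i32], b: &[i32], expected: &[i32]) {
    let set_a: BTreeSet<i32> = a.iter().copied().collect();
    let set_b: BTreeSet<i32> = b.iter().copied().collect();

    // BTreeSet iterates in ascending order, so compare against sorted input.
    let mut want = expected.to_vec();
    want.sort();

    let got: Vec<i32> = set_a.intersection(&set_b).copied().collect();
    assert_eq!(got, want);

    // Intersection is symmetric; swapping the operands must not matter.
    let got_swapped: Vec<i32> = set_b.intersection(&set_a).copied().collect();
    assert_eq!(got_swapped, want);
}
```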

ssomers · 2019-03-26T10:07:26Z

I thought the test cases in the first PR were still there and they didn't even include non-subset samples. Rest assured, the proptest in the separate repository covered everything.

So are we there now? Nope, I just realized that is_subset/superset is really quite similar. I'll keep that out of this PR, but it might mean that the 16 becomes a named constant.

PS change of plan: it wasn't that much work, and it revealed that the comments in intersection weren't accurate and the benchmarks were clunky, so I committed it here anyway
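
A rough sketch of why `is_subset` resembles intersection: the same size rule can decide between looking up each element and walking both sets in step. This is an illustration under assumptions, not the code that was committed; `is_subset_sketch` and `SIZE_FACTOR` are hypothetical names.

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;

/// Assumed name for the factor 16, should it become a named constant.
const SIZE_FACTOR: usize = 16;

/// Returns true if every element of `small` is also in `large`.
fn is_subset_sketch<T: Ord>(small: &BTreeSet<T>, large: &BTreeSet<T>) -> bool {
    if small.len() > large.len() {
        return false; // a bigger set can never be a subset
    }
    if small.len() <= large.len() / SIZE_FACTOR {
        // Search: one lookup in `large` per element of `small`.
        small.iter().all(|x| large.contains(x))
    } else {
        // Stitch: walk both sets in ascending order.
        let mut large_iter = large.iter();
        for x in small {
            loop {
                match large_iter.next() {
                    None => return false, // `large` ran out before finding `x`
                    Some(y) => match y.cmp(x) {
                        Ordering::Less => continue, // keep scanning `large`
                        Ordering::Equal => break,   // found `x`, go to the next one
                        Ordering::Greater => return false, // `x` is missing
                    },
                }
            }
        }
        true
    }
}
```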

ssomers · 2019-03-27T12:27:36Z

I cooked up a similar change to the implementation of set difference. It only needs half of the peekables, and it benefits from the same performance boost as intersection if the right-hand set is huge, resulting in:

before:

test btree::set::difference_random_100_vs_100            ... bench:         914 ns/iter (+/- 27)
test btree::set::difference_random_100_vs_10k            ... bench:      45,641 ns/iter (+/- 605)
test btree::set::difference_random_10k_vs_100            ... bench:      63,561 ns/iter (+/- 512)
test btree::set::difference_random_10k_vs_10k            ... bench:     205,747 ns/iter (+/- 5,792)
test btree::set::difference_staggered_100_vs_100         ... bench:         989 ns/iter (+/- 19)
test btree::set::difference_staggered_100_vs_10k         ... bench:      35,919 ns/iter (+/- 418)
test btree::set::difference_staggered_10k_vs_10k         ... bench:      93,716 ns/iter (+/- 1,358)

after:

test btree::set::difference_random_100_vs_100            ... bench:         829 ns/iter (+/- 14)
test btree::set::difference_random_100_vs_10k            ... bench:       2,871 ns/iter (+/- 112)
test btree::set::difference_random_10k_vs_100            ... bench:      60,580 ns/iter (+/- 675)
test btree::set::difference_random_10k_vs_10k            ... bench:     188,386 ns/iter (+/- 1,727)
test btree::set::difference_staggered_100_vs_100         ... bench:         863 ns/iter (+/- 21)
test btree::set::difference_staggered_100_vs_10k         ... bench:       2,673 ns/iter (+/- 70)
test btree::set::difference_staggered_10k_vs_10k         ... bench:      83,875 ns/iter (+/- 2,977)

Do you want me to commit it here or later? (or not at all...)
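
To make the difference idea concrete, here is a minimal sketch under assumptions (it collects into a `Vec` rather than being a lazy iterator, and `difference_sketch` and `SIZE_FACTOR` are hypothetical names). Only the right-hand iterator needs to be peekable, and when the right-hand set is much larger, each left-hand element is simply looked up. That matches the benchmarks above, where the `100_vs_10k` cases improve the most.

```rust
use std::collections::BTreeSet;

/// Assumed to be the same factor 16 used for intersection.
const SIZE_FACTOR: usize = 16;

/// Elements of `this` that are not in `other`, in ascending order.
fn difference_sketch<'a, T: Ord>(this: &'a BTreeSet<T>, other: &BTreeSet<T>) -> Vec<&'a T> {
    if this.len() <= other.len() / SIZE_FACTOR {
        // Search: `other` is huge, so look up each element of `this`.
        this.iter().filter(|x| !other.contains(*x)).collect()
    } else {
        // Stitch: only the `other` side needs peeking.
        let mut other_iter = other.iter().peekable();
        let mut out = Vec::new();
        for x in this {
            // Skip elements of `other` that are smaller than `x`.
            while other_iter.peek().map_or(false, |y| *y < x) {
                other_iter.next();
            }
            // Keep `x` unless the next element of `other` equals it.
            if other_iter.peek().map_or(true, |y| *y != x) {
                out.push(x);
            }
        }
        out
    }
}
```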

KodrAus · 2019-03-27T22:32:55Z

This is looking great! Thanks for giving these methods some TLC @ssomers.

Do you want me to commit it here or later? (or not at all...)

Yeh I think we can roll difference into this PR as well while we're working on these set operations. I'm happy with the implementation of intersection now.

ssomers · 2019-03-28T11:33:54Z

I meant "push" instead of "commit", but github lists the commits according to time committed locally anyway. Confusing...

Anyway, it's all here now, and nothing changed in the implementation of intersection.

ssomers · 2019-03-29T10:09:10Z

For future reference: if you wonder why we have to tediously implement clone for Difference and for Intersection, while BTreeSet itself gets away with an easy derive(Clone), it's because of #26925. BTreeSet doesn't have clone unless T has it, and that makes all the sense in the world. It doesn't make much sense for Difference and Intersection, because they hand out references to T.
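
A small sketch of the point about `derive(Clone)`: deriving would add a `T: Clone` bound even though the iterator only holds references, whereas a manual impl can omit that bound. The types `Refs` and `NotClone` are hypothetical, and a slice iterator stands in for the B-tree iterator.

```rust
use std::slice::Iter;

/// Iterator adaptor that only hands out references to `T`.
/// `#[derive(Clone)]` here would generate `impl<'a, T: Clone> Clone`,
/// although cloning never needs to clone a `T`.
struct Refs<'a, T> {
    iter: Iter<'a, T>,
}

// Manual impl without a `T: Clone` bound.
impl<'a, T> Clone for Refs<'a, T> {
    fn clone(&self) -> Self {
        Refs { iter: self.iter.clone() }
    }
}

impl<'a, T> Iterator for Refs<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        self.iter.next()
    }
}

/// A type that deliberately does not implement `Clone`.
struct NotClone(u8);
```

With the manual impl, a `Refs<NotClone>` can still be cloned, for example to remember a position and resume from it later.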

KodrAus · 2019-03-29T10:46:51Z

Alrighty, this PR introduces some additional complexity to BTreeSet, but is much more efficient when performing set operations between a large and small set, which I think is a reasonable case. There's no change in the public API. So let's merge this one in!

@bors r+

bors · 2019-03-29T10:46:52Z

📌 Commit bb7bf9b8ea66d72a7e29c6c7a37ddaa8924ef62f has been approved by KodrAus

KodrAus · 2019-03-29T10:47:56Z

@bors r-

Sorry, @ssomers do you mind squashing down some of that git history so we don't have a merge commit in there? I didn't see that one before.

ssomers · 2019-03-29T11:09:45Z

I don't mind, but it's going to take a while to figure out how.

KodrAus · 2019-03-29T22:55:00Z

@bors r+

bors · 2019-03-29T22:55:01Z

📌 Commit f5fee8f has been approved by KodrAus

bors · 2019-03-30T06:06:09Z

⌛ Testing commit f5fee8f with merge e0e27d75ee5c9618f834391bc06355d848dfc2d7...

bors · 2019-03-30T08:32:07Z

💔 Test failed - checks-travis

rust-highfive · 2019-03-30T08:32:08Z

The job dist-x86_64-netbsd of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[01:52:59] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-unknown-netbsd" "-Zdual-proc-macros" "-j" "4" "--release" "--locked" "--color" "always" "--manifest-path" "/checkout/src/tools/miri/Cargo.toml" "--features" "rustc-workspace-hack/all-static" "--message-format" "json"
[01:52:59] expected success, got: exit code: 101
[01:52:59] [TIMING] ToolBuild { compiler: Compiler { stage: 2, host: "x86_64-unknown-linux-gnu" }, target: "x86_64-unknown-netbsd", tool: "miri", path: "src/tools/miri", mode: ToolRustc, is_optional_tool: true, source_type: Submodule, extra_features: [] } -- 18.310
[01:52:59] Unable to build miri, skipping dist
No output has been received in the last 30m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@scottmcm

…ited_again, r=KodrAus improve worst-case performance of BTreeSet intersection v3 Variation of [rust-lang#59078](rust-lang#59078) with `Intersection` remaining a struct r? @scottmcm

@ghost

Rollup of 4 pull requests Successful merges: - #55448 (Add 'partition_at_index/_by/_by_key' for slices.) - #59186 (improve worst-case performance of BTreeSet intersection v3) - #59514 (Remove adt_def from projections and downcasts in MIR) - #59630 (Shrink `mir::Statement`.) Failed merges: r? @ghost

@KodrAus

improve worst-case performance of HashSet.is_subset One more simple optimization opportunity for HashSet that was applied in BTreeSet in rust-lang#59186 (and wasn't in rust-lang#57043). Already covered by the existing unit test. r? @KodrAus

rust-highfive assigned scottmcm Mar 14, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 14, 2019

rust-highfive assigned KodrAus and unassigned scottmcm Mar 19, 2019

KodrAus reviewed Mar 20, 2019

ssomers added a commit to ssomers/rust_bench_btreeset_intersection that referenced this pull request Mar 21, 2019

copy and adapt code and benchmarks from rust-lang/rust#59186

a1f7f3b

ssomers commented Mar 22, 2019

ssomers mentioned this pull request Mar 29, 2019

improve worst-case performance of BTreeSet intersection v1 #58577

Closed

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 29, 2019

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 29, 2019

improve worst-case performance of BTreeSet difference and intersection

f5fee8f

ssomers force-pushed the btreeset_intersection_revisited_again branch from bb7bf9b to f5fee8f Compare March 29, 2019 11:20

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 29, 2019

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 30, 2019

Centril added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 30, 2019

Centril mentioned this pull request Apr 2, 2019

Rollup of 5 pull requests #59653

Closed

Centril mentioned this pull request Apr 3, 2019

Rollup of 4 pull requests #59657

Merged

bors merged commit f5fee8f into rust-lang:master Apr 3, 2019

ssomers mentioned this pull request Apr 3, 2019

improve worst-case performance of HashSet.is_subset #59665

Merged

ssomers deleted the btreeset_intersection_revisited_again branch April 6, 2019 08:14

ssomers mentioned this pull request Oct 20, 2020

btree: merge the implementations of MergeIter #78015

Merged
