Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve worst-case performance of BTreeSet intersection v3 #59186

Merged

Conversation

Projects
None yet
6 participants
@ssomers
Copy link
Contributor

commented Mar 14, 2019

Variation of #59078 with Intersection remaining a struct

r? @scottmcm

@KodrAus

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2019

@rust-highfive rust-highfive assigned KodrAus and unassigned scottmcm Mar 19, 2019

@KodrAus

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2019

Hi @ssomers 👋

Would you prefer to close #58577 and #59078 and look at this PR instead?

I haven't had a chance to look at what's changed since the changeset I approved, could you summarize the differences? It looks like this one does keep the enum for the strategy private which I think is an improvement on the v2 implementation.

@ssomers

This comment has been minimized.

Copy link
Contributor Author

commented Mar 19, 2019

Sure. The difference with the approved state is:

  • struct Intersection has both fields replaced by the enum, even though both implementations share the same iterator. I think it makes the code more readable and it simplifies changing/adding algorithms that don't need that field.
  • The gut of both implementations is exposed (but hidden, again) as unstable feature, to allow the benchmark to compare them.
  • Thus new benchmarks contrast the performance of both implementations with the actual implementation chosen by the size rule.
  • The size rule favors the new implementation much more than before, but still less than looks optimal (on my system, for this benchmark, with these kind of elements). And if one of the sets is empty, the other set's iterator isn't even constructed.
  • Sets are always ordered by size, because it simplifies code and because size_hint already had to choose the small set.
  • The classic implementation no longer relies on Peekable (which greatly helps for union and difference, but not here).

PS and yes, I prefer to close the other two PRs.

} else {
(other, self)
};
if a_set.len() > b_set.len() / 16 {

This comment has been minimized.

Copy link
@KodrAus

KodrAus Mar 20, 2019

Contributor

Might be worth leaving a comment here about what this branch is for and why we use the constant 16 to decide which strategy to use.

This comment has been minimized.

Copy link
@ssomers

ssomers Mar 22, 2019

Author Contributor

Comment left, but it seems you have to find it yourself.

}
}
}
}

fn size_hint(&self) -> (usize, Option<usize>) {
(0, Some(min(self.a.len(), self.b.len())))
let max_size = match &self.inner {

This comment has been minimized.

Copy link
@KodrAus

KodrAus Mar 20, 2019

Contributor

This would actually be min_size, right?

This comment has been minimized.

Copy link
@ssomers

ssomers Mar 20, 2019

Author Contributor

Had I taken the time to look up the doc of size_hint, I would have called it upper_bound. It's the "min" of the input sets, and "max" of the size_hint, so I'm not a fan of min_size either. How about min_len?

This comment has been minimized.

Copy link
@KodrAus

KodrAus Mar 25, 2019

Contributor

min_len is ok with me 👍

b_iter: Iter<'a, T>,
},
Search {
a_iter: Iter<'a, T>, // for size_hint, should be the smaller of the sets

This comment has been minimized.

Copy link
@KodrAus

KodrAus Mar 20, 2019

Contributor

Since the implementation depends on a_iter being smaller I think we should choose more descriptive names here for a and b.

This comment has been minimized.

Copy link
@ssomers

ssomers Mar 20, 2019

Author Contributor

Frankly, I hate the name a because it's intermingled with the same lifetime identifier, but didn't dare to change them. I'll go for small and large in the Search case, small and other in the Stitch case, unless you object.

This comment has been minimized.

Copy link
@KodrAus

KodrAus Mar 25, 2019

Contributor

That sounds good!

@@ -1,5 +1,6 @@
#![feature(repr_simd)]
#![feature(test)]
#![feature(benches_btree_set)]

This comment has been minimized.

Copy link
@KodrAus

KodrAus Mar 20, 2019

Contributor

Before merging I think we should remove this feature and make the methods private (or just inline them). It's nice to get an idea of the performance characteristics of each strategy, but once we understand those I think we can move forward with just the general benchmarks.

This comment has been minimized.

Copy link
@ssomers

ssomers Mar 20, 2019

Author Contributor

But we only understand the performance characteristics:

  • on 1 platform and 1 machine (I could build it on Linux on another machine though, but nothing modern),
  • for one generation of the BTreeSet implementation. I see significant improvement from stable to nightly in my BTreeSet macro-benchmarks, tens of percents, with the same intersection implementation.

What alternatives are there for the feature litter?

  • Moving the benchmarks closer to the lib code is terrible: every tweak or comment requires over an hour to build.
  • Duplicating the lib code near/in the benchmark file is doable. You should notice if it's not up to date anymore when you check that the actual performance matches that of either implementation.
  • Or similarly, duplicating the lib code and benchmarks in some separate repository (like I already did in https://github.com/ssomers/Bron-Kerbosch/blob/master/rust/bron_kerbosch/src/util.rs).

This comment has been minimized.

Copy link
@KodrAus

KodrAus Mar 22, 2019

Contributor

@ssomers I think duplicating the code in a separate repository is the best way to go here 👍 I believe we can run benchmarks in CI here (I don't remember the command off the top of my head) so we can worry about this later.

@ssomers

This comment has been minimized.

Copy link
Contributor Author

commented Mar 21, 2019

I made the benchmarks more fine-grained with set sizes and plotted the results (still only one machine). I think this highly suggests that the ideal strategy rule is much more complicated (not just logarithmic with size), and if that rule doesn't spend the performance it gains on evaluating itself, it could be greatly off on a system with different word size, cache sizes and architecture.

But it also confirms that factor 16 seems quite reasonable. To prevent the <30% performance hit for intersection of particularly crafted large sets, it would have to be 19. From the viewpoint of random sets, factor 16 should be lowered instead.

But if I commit all 146 benchmarks I have now, it's simply annoying for someone casually checking performance in general. So I'm becoming convinced that it's better to move this case study over to a separate repository, and leave in this PR only the final rule, and the few benchmarks already merged in earlier (or less), and thus nothing exposed as public unstable. Checking the rust source code, there are other references to github repositories besides rust-lang so I can probably just create one myself.

PS the <30% performance hit is actually closer to 15%, apparently thanks to no longer using Peekable

ssomers added a commit to ssomers/rust_bench_btreeset_intersection that referenced this pull request Mar 21, 2019

@ssomers
Copy link
Contributor Author

left a comment

Here's the new comment (cause it seems rather difficult to find in the previous comment)

Oh well, this seems worse and can't be deleted.

@ssomers

This comment has been minimized.

Copy link
Contributor Author

commented Mar 24, 2019

I configured Travis on the separate repository and the readings on the linux build on that (virtual) machine are quite similar (with much less stability, as one would expect). The ideal factor is < 1 higher accross the range of sizes.

@KodrAus

This comment has been minimized.

Copy link
Contributor

commented Mar 25, 2019

Thanks for all your investigation work @ssomers!

Just for good measure, let's add another test case to liballoc::tests::btree::set::test_intersection that checks intersection of a very large set with a very small one so we can be sure it'll be using the other strategy. Something like:

check_intersection(&[11, 5000, 1, 3, 77, 8924, 103],
    &(0..1000).collect::<Vec<_>>(),
    &[11, 1, 3, 77, 103]);
@ssomers

This comment has been minimized.

Copy link
Contributor Author

commented Mar 26, 2019

I thought the test cases in the first PR were still there and they didn't even include non-subset samples. Rest assured, the proptest in the separate repository covered everything.

So are we there now? Nope, I just realized that is_subset/superset is really quite similar. I'll keep that out of this PR, but it might mean that the 16 becomes a named constant.

PS change of plan: it wasn't that much work, that revealed the comments in intersection weren't accurate, and the benchmarks clunky, so I committed it here anyway

@ssomers

This comment has been minimized.

Copy link
Contributor Author

commented Mar 27, 2019

I cooked up a similar change to the implementation of set difference. It only needs half of the peekables, and it benefits from the same performance boost as with intersection if the right hand set is huge. I cooked up similar code resulting in:

before:

test btree::set::difference_random_100_vs_100            ... bench:         914 ns/iter (+/- 27)
test btree::set::difference_random_100_vs_10k            ... bench:      45,641 ns/iter (+/- 605)
test btree::set::difference_random_10k_vs_100            ... bench:      63,561 ns/iter (+/- 512)
test btree::set::difference_random_10k_vs_10k            ... bench:     205,747 ns/iter (+/- 5,792)
test btree::set::difference_staggered_100_vs_100         ... bench:         989 ns/iter (+/- 19)
test btree::set::difference_staggered_100_vs_10k         ... bench:      35,919 ns/iter (+/- 418)
test btree::set::difference_staggered_10k_vs_10k         ... bench:      93,716 ns/iter (+/- 1,358)

after:

test btree::set::difference_random_100_vs_100            ... bench:         829 ns/iter (+/- 14)
test btree::set::difference_random_100_vs_10k            ... bench:       2,871 ns/iter (+/- 112)
test btree::set::difference_random_10k_vs_100            ... bench:      60,580 ns/iter (+/- 675)
test btree::set::difference_random_10k_vs_10k            ... bench:     188,386 ns/iter (+/- 1,727)
test btree::set::difference_staggered_100_vs_100         ... bench:         863 ns/iter (+/- 21)
test btree::set::difference_staggered_100_vs_10k         ... bench:       2,673 ns/iter (+/- 70)
test btree::set::difference_staggered_10k_vs_10k         ... bench:      83,875 ns/iter (+/- 2,977)

Do you want me to commit it here or later? (or not at all...)

@KodrAus

This comment has been minimized.

Copy link
Contributor

commented Mar 27, 2019

This is looking great! Thanks for giving these methods some TLC @ssomers.

Do you want me to commit it here or later? (or not at all...)

Yeh I think we can roll difference into this PR as well while we're working on these set operations. I'm happy with the implementation of intersection now.

@ssomers

This comment has been minimized.

Copy link
Contributor Author

commented Mar 28, 2019

I meant "push" instead of "commit", but github lists the commits according to time committed locally anyway. Confusing...

Anyways, it's all here now, and nothing changed to the implementation of intersection.

@ssomers

This comment has been minimized.

Copy link
Contributor Author

commented Mar 29, 2019

For future reference: if you wonder why we have to tediously implement clone for Difference and for Intersection, while BTreeSet itself gets away with an easy derive(Clone), it's because of #26925. BTreeSet doesn't have clone unless T has it, and that makes all the sense in the world. It doesn't make much sense for Difference and Intersection, because they hand out references to T.

@KodrAus

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

Alrighty, this PR introduces some additional complexity to BTreeSet, but is much more efficient when performing set operations between a large and small set, which I think is a reasonable case. There's no change in the public API. So let's merge this one in!

@bors r+

@bors

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

📌 Commit bb7bf9b has been approved by KodrAus

@KodrAus

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

@bors r-

Sorry, @ssomers do you mind squashing down some of that git history so we don't have a merge commit in there? I didn't see that one before.

@ssomers

This comment has been minimized.

Copy link
Contributor Author

commented Mar 29, 2019

I don't mind, but it's going to take a while to figure out how.

@ssomers ssomers force-pushed the ssomers:btreeset_intersection_revisited_again branch from bb7bf9b to f5fee8f Mar 29, 2019

@KodrAus

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

@bors r+

@bors

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

📌 Commit f5fee8f has been approved by KodrAus

@bors

This comment has been minimized.

Copy link
Contributor

commented Mar 30, 2019

⌛️ Testing commit f5fee8f with merge e0e27d7...

bors added a commit that referenced this pull request Mar 30, 2019

Auto merge of #59186 - ssomers:btreeset_intersection_revisited_again,…
… r=KodrAus

improve worst-case performance of BTreeSet intersection v3

Variation of [#59078](#59078) with `Intersection` remaining a struct

r? @scottmcm
@bors

This comment has been minimized.

Copy link
Contributor

commented Mar 30, 2019

💔 Test failed - checks-travis

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

commented Mar 30, 2019

The job dist-x86_64-netbsd of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
[01:52:59] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-unknown-netbsd" "-Zdual-proc-macros" "-j" "4" "--release" "--locked" "--color" "always" "--manifest-path" "/checkout/src/tools/miri/Cargo.toml" "--features" "rustc-workspace-hack/all-static" "--message-format" "json"
[01:52:59] expected success, got: exit code: 101
[01:52:59] [TIMING] ToolBuild { compiler: Compiler { stage: 2, host: "x86_64-unknown-linux-gnu" }, target: "x86_64-unknown-netbsd", tool: "miri", path: "src/tools/miri", mode: ToolRustc, is_optional_tool: true, source_type: Submodule, extra_features: [] } -- 18.310
[01:52:59] Unable to build miri, skipping dist
No output has been received in the last 30m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

Centril added a commit to Centril/rust that referenced this pull request Apr 2, 2019

Rollup merge of rust-lang#59186 - ssomers:btreeset_intersection_revis…
…ited_again, r=KodrAus

improve worst-case performance of BTreeSet intersection v3

Variation of [rust-lang#59078](rust-lang#59078) with `Intersection` remaining a struct

r? @scottmcm

@Centril Centril referenced this pull request Apr 2, 2019

Closed

Rollup of 5 pull requests #59653

bors added a commit that referenced this pull request Apr 3, 2019

Auto merge of #59653 - Centril:rollup-0kr2w0l, r=Centril
Rollup of 5 pull requests

Successful merges:

 - #55448 (Add 'partition_at_index/_by/_by_key' for slices.)
 - #59186 (improve worst-case performance of BTreeSet intersection v3)
 - #59514 (Remove adt_def from projections and downcasts in MIR)
 - #59619 (wasi: Implement more of the standard library)
 - #59630 (Shrink `mir::Statement`.)

Failed merges:

r? @ghost

Centril added a commit to Centril/rust that referenced this pull request Apr 3, 2019

Rollup merge of rust-lang#59186 - ssomers:btreeset_intersection_revis…
…ited_again, r=KodrAus

improve worst-case performance of BTreeSet intersection v3

Variation of [rust-lang#59078](rust-lang#59078) with `Intersection` remaining a struct

r? @scottmcm

@Centril Centril referenced this pull request Apr 3, 2019

Merged

Rollup of 4 pull requests #59657

bors added a commit that referenced this pull request Apr 3, 2019

Auto merge of #59657 - Centril:rollup-w5p98mc, r=Centril
Rollup of 4 pull requests

Successful merges:

 - #55448 (Add 'partition_at_index/_by/_by_key' for slices.)
 - #59186 (improve worst-case performance of BTreeSet intersection v3)
 - #59514 (Remove adt_def from projections and downcasts in MIR)
 - #59630 (Shrink `mir::Statement`.)

Failed merges:

r? @ghost

@bors bors merged commit f5fee8f into rust-lang:master Apr 3, 2019

1 check failed

homu Test failed
Details

Centril added a commit to Centril/rust that referenced this pull request Apr 5, 2019

Rollup merge of rust-lang#59665 - ssomers:hashset_revisited, r=KodrAus
improve worst-case performance of HashSet.is_subset

One more simple optimization opportunity for HashSet that was applied in BTreeSet in rust-lang#59186 (and wasn't in rust-lang#57043). Already covered by the existing unit test.

r? @KodrAus

@ssomers ssomers deleted the ssomers:btreeset_intersection_revisited_again branch Apr 6, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.