Fix HashSet::union performance #66280

stepancheg · 2019-11-10T23:57:34Z

Consider this example: small_set = 0..2, large_set = 0..1000.

To efficiently compute the union of these sets, we should

take all elements of the larger set
for each element of the smaller set check it is not in the larger set

This is exactly what this commit does.

This particular optimization was implemented a year ago, but the
author mistaken < and >.

Consider this example: small_set = 0..2, large_set = 0..1000. To efficiently compute the union of these sets, we should * take all elements of the larger set * for each element of the smaller set check it is not in the larger set This is exactly what this commit does. This particular optimization was implemented a year ago, but the author mistaken `<` and `>`.

rust-highfive · 2019-11-10T23:57:45Z

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

alexcrichton · 2019-11-11T18:38:32Z

@bors: r+ rollup

bors · 2019-11-11T18:38:33Z

📌 Commit 04a237b has been approved by alexcrichton

ssomers · 2019-11-12T00:28:40Z

What happened is that I measured it on free standing code with free standing benchmarks, and the code must have gotten mangled while transplanting it into libstd, and I didn't even try to transplant benchmarks. In general, there are few benchmarks in libstd/collections/hash compared to liballoc/benches/btree. I could make some but I guess it's too late to include in this PR.

Fix HashSet::union performance Consider this example: small_set = 0..2, large_set = 0..1000. To efficiently compute the union of these sets, we should * take all elements of the larger set * for each element of the smaller set check it is not in the larger set This is exactly what this commit does. This particular optimization was implemented a year ago, but the author mistaken `<` and `>`.

@ghost

Rollup of 11 pull requests Successful merges: - #65965 (Clean up librustc_typeck error_codes file) - #66230 (remove vestigial comments referring to defunct numeric trait hierarchy) - #66241 (bump openssl version) - #66257 (Drop long-section-names linker workaround for windows-gnu) - #66263 (make the error message more readable) - #66267 (Add rustdoc doc) - #66276 (Move lock into CodeStats) - #66278 (Fix error message about exported symbols from proc-macro crates) - #66280 (Fix HashSet::union performance) - #66299 (support issue = "none" in unstable attributes ) - #66309 (Tiny cleanup to size assertions) Failed merges: r? @ghost

ssomers · 2019-11-13T23:57:35Z

cargo benchcmp of a few benchmarks on modestly sized sets (10 versus 100 elements) says about this change:

name                                                before ns/iter  after ns/iter  diff ns/iter   diff %  speedup
collections::hash::bench::set_difference            124             125                       1    0.81%   x 0.99
collections::hash::bench::set_intersection          122             126                       4    3.28%   x 0.97
collections::hash::bench::set_symmetric_difference  1,624           1,821                   197   12.13%   x 0.89
collections::hash::bench::set_union                 1,581           266                  -1,315  -83.18%   x 5.94

Quite a lot better than correcting the "still measurable" improvement I reported in #57043, whatever it was that I tested.

introduce benchmarks of HashSet operations To avoid goofs such as corrected by #66280, I added benchmarks of binary HashSet operations. Due to the fact x.py keeps recompiling the whole shebang (or at least a big part of it) whenever you touch the test code, and because piling up all tests in one file does not strike me as future proof, I tried moving the hash benches to the separate place they are for liballoc/collections/btree. But it turns out that, in a cleaned checkout, x.py still recompiles the whole shebang whenever you touch the test code (PS or when you add or delete any irrelevant file). So I'm not going to add more tests, and I doubt others will, and these tests have proven their point already, so this PR is kind of pointless

rust-highfive assigned alexcrichton Nov 10, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 10, 2019

stepancheg mentioned this pull request Nov 10, 2019

Fix poor worst case performance of set intersection #57043

Merged

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 11, 2019

JohnTitor mentioned this pull request Nov 12, 2019

Rollup of 11 pull requests #66323

Merged

bors merged commit 04a237b into rust-lang:master Nov 12, 2019

stepancheg deleted the union branch November 14, 2019 00:06

ssomers mentioned this pull request Nov 14, 2019

introduce benchmarks of HashSet operations #66393

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix HashSet::union performance #66280

Fix HashSet::union performance #66280

Uh oh!

stepancheg commented Nov 10, 2019

Uh oh!

rust-highfive commented Nov 10, 2019

Uh oh!

alexcrichton commented Nov 11, 2019

Uh oh!

bors commented Nov 11, 2019

Uh oh!

ssomers commented Nov 12, 2019

Uh oh!

ssomers commented Nov 13, 2019

Uh oh!

Uh oh!

Fix HashSet::union performance #66280

Fix HashSet::union performance #66280

Uh oh!

Conversation

stepancheg commented Nov 10, 2019

Uh oh!

rust-highfive commented Nov 10, 2019

Uh oh!

alexcrichton commented Nov 11, 2019

Uh oh!

bors commented Nov 11, 2019

Uh oh!

ssomers commented Nov 12, 2019

Uh oh!

ssomers commented Nov 13, 2019

Uh oh!

Uh oh!