Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix HashSet::union performance #66280

Merged
merged 1 commit into from Nov 12, 2019

Conversation

@stepancheg
Copy link
Contributor

stepancheg commented Nov 10, 2019

Consider this example: small_set = 0..2, large_set = 0..1000.

To efficiently compute the union of these sets, we should

  • take all elements of the larger set
  • for each element of the smaller set check it is not in the larger set

This is exactly what this commit does.

This particular optimization was implemented a year ago, but the
author mistaken < and >.

Consider this example: small_set = 0..2, large_set = 0..1000.

To efficiently compute the union of these sets, we should
* take all elements of the larger set
* for each element of the smaller set check it is not in the larger set

This is exactly what this commit does.

This particular optimization was implemented a year ago, but the
author mistaken `<` and `>`.
@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Nov 10, 2019

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Nov 11, 2019

@bors: r+ rollup

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Nov 11, 2019

📌 Commit 04a237b has been approved by alexcrichton

@ssomers

This comment has been minimized.

Copy link
Contributor

ssomers commented Nov 12, 2019

What happened is that I measured it on free standing code with free standing benchmarks, and the code must have gotten mangled while transplanting it into libstd, and I didn't even try to transplant benchmarks. In general, there are few benchmarks in libstd/collections/hash compared to liballoc/benches/btree. I could make some but I guess it's too late to include in this PR.

JohnTitor added a commit to JohnTitor/rust that referenced this pull request Nov 12, 2019
Fix HashSet::union performance

Consider this example: small_set = 0..2, large_set = 0..1000.

To efficiently compute the union of these sets, we should
* take all elements of the larger set
* for each element of the smaller set check it is not in the larger set

This is exactly what this commit does.

This particular optimization was implemented a year ago, but the
author mistaken `<` and `>`.
bors added a commit that referenced this pull request Nov 12, 2019
Rollup of 11 pull requests

Successful merges:

 - #65965 (Clean up librustc_typeck error_codes file)
 - #66230 (remove vestigial comments referring to defunct numeric trait hierarchy)
 - #66241 (bump openssl version)
 - #66257 (Drop long-section-names linker workaround for windows-gnu)
 - #66263 (make the error message more readable)
 - #66267 (Add rustdoc doc)
 - #66276 (Move lock into CodeStats)
 - #66278 (Fix error message about exported symbols from proc-macro crates)
 - #66280 (Fix HashSet::union performance)
 - #66299 (support issue = "none" in unstable attributes )
 - #66309 (Tiny cleanup to size assertions)

Failed merges:

r? @ghost
@bors bors merged commit 04a237b into rust-lang:master Nov 12, 2019
4 checks passed
4 checks passed
pr Build #20191110.44 succeeded
Details
pr (Linux mingw-check) Linux mingw-check succeeded
Details
pr (Linux x86_64-gnu-llvm-6.0) Linux x86_64-gnu-llvm-6.0 succeeded
Details
pr (Linux x86_64-gnu-tools) Linux x86_64-gnu-tools succeeded
Details
@ssomers

This comment has been minimized.

Copy link
Contributor

ssomers commented Nov 13, 2019

cargo benchcmp of a few benchmarks on modestly sized sets (10 versus 100 elements) says about this change:

name                                                before ns/iter  after ns/iter  diff ns/iter   diff %  speedup
collections::hash::bench::set_difference            124             125                       1    0.81%   x 0.99
collections::hash::bench::set_intersection          122             126                       4    3.28%   x 0.97
collections::hash::bench::set_symmetric_difference  1,624           1,821                   197   12.13%   x 0.89
collections::hash::bench::set_union                 1,581           266                  -1,315  -83.18%   x 5.94

Quite a lot better than correcting the "still measurable" improvement I reported in #57043, whatever it was that I tested.

@stepancheg stepancheg deleted the stepancheg:union branch Nov 14, 2019
bors added a commit that referenced this pull request Nov 24, 2019
introduce benchmarks of HashSet operations

To avoid goofs such as corrected by #66280, I added benchmarks of binary HashSet operations.

Due to the fact x.py keeps recompiling the whole shebang (or at least a big part of it) whenever you touch the test code, and because piling up all tests in one file does not strike me as future proof, I tried moving the hash benches to the separate place they are for liballoc/collections/btree. But it turns out that, in a cleaned checkout, x.py still recompiles the whole shebang whenever you touch the test code (PS or when you add or delete any irrelevant file). So I'm not going to add more tests, and I doubt others will, and these tests have proven their point already, so this PR is kind of pointless
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.