Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up DefaultHasher, SipHasher, and SipHasher13. #69152

Conversation

nnethercote
Copy link
Contributor

@nnethercote nnethercote commented Feb 14, 2020

This PR applies the speedups to SipHasher128 from #68914 to the Sip hashers in libcore, and also adds the missing write_* methods required so that they can benefit from the speedups. Default hashing of integers is now something like 2.5x faster, and default hash tables should be more competitive with hash tables from the fxhash crate.

r? @michaelwoerister

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 14, 2020
@nnethercote
Copy link
Contributor Author

This should have negligible effect on rustc's own performance, because rustc uses fxhash everywhere. But let's check, just to make sure.

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Feb 14, 2020

⌛ Trying commit 66a83ae9f40856e3e4a21c3e3ea2605fca7dbd44 with merge 22f6b02ddd18fb74ee6551aea9a7dde4607f3cd8...

@nnethercote
Copy link
Contributor Author

Rust's Sip hashing is notorious for being slow, hence the existence of FnvHash{Map,Set} and FxHash{Map,Set}. This change is worth mentioning in the release notes. Is there a way to mark it for release note consideration?

@nagisa nagisa added relnotes Marks issues that should be documented in the release notes of the next release. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Feb 14, 2020
@bors
Copy link
Contributor

bors commented Feb 14, 2020

☀️ Try build successful - checks-azure
Build commit: 22f6b02ddd18fb74ee6551aea9a7dde4607f3cd8 (22f6b02ddd18fb74ee6551aea9a7dde4607f3cd8)

@rust-timer
Copy link
Collaborator

Queued 22f6b02ddd18fb74ee6551aea9a7dde4607f3cd8 with parent 1010408, future comparison URL.

@nnethercote
Copy link
Contributor Author

We have four benchmarks with significant regressions.

regex-opt
        avg: 2.2%       min: -0.1%      max: 4.7%
webrender-wrench-opt
        avg: 1.5%       min: 0.2%       max: 3.3%
cargo-opt
        avg: 1.2%       min: -0.1%      max: 2.0%
tokio-webpush-simple-opt
        avg: 1.0%       min: -0.0%      max: 1.6%

They are all opt builds, which suggests that code generation is the problem. (check results all show tiny improvements, mostly -0.1% to -0.2%.)

I did a diff with Cachegrind on regex-opt-clean, which showed the largest regression, and it confirms that the extra time is entirely due to LLVM code. I tested just the first commit and it doesn't cause the regression; this is unsurprising because it only changes short_write but short_write isn't actually used until the second commit. The second commit adds all the missing write_xyz methods, presumably their presence is causing more code to be generated. I tried removing the #[inline] from short_write, but unfortunately that didn't help.

@nnethercote
Copy link
Contributor Author

I tried a bunch of things such as de-inlining, removing write_* methods in various places, and even implemented sip hashing directly in DefaultHasher rather than it relying on SipHasher13, but at best I could only get rid of a bit less than half of the regression (using regex-opt-clean as the benchmark of choice). Hmm.

@michaelwoerister
Copy link
Member

@nnethercote Thanks for the PR!

I can't really r+ things in the standard library. Would you mind splitting out the SipHasher128 parts and ask @rust-lang/libs to review the rest?

@Amanieu
Copy link
Member

Amanieu commented Feb 20, 2020

From the libs perspective, as long as SipHasher continues to return the same output (given a fixed seed) then there should be no problem. Maybe we could add some tests for that?

@michaelwoerister
Copy link
Member

The official test vectors would be great.

@nnethercote
Copy link
Contributor Author

I have split out the SipHasher128 changes in #69332.

We have a decision to make about what to do with the rest of the PR. I see three options.

  • Simply remove sip::Hasher::short_write and sip::Hasher::write_{u8,usize} because they're dead code. And probably add a comment about that.

  • Keep sip::Hasher::short_write, change it to match SipHasher128::short_write, and add a comment about it being unused due to the lack of write_* methods in the various other hashers. This is a bit silly, but would at least move sip::Hasher and SipHasher128 closer together.

  • Do the full change: improve sip::Hasher::short_write and also add all the missing write_* methods. This will benefit every Rust program that uses HashMap/HashSet with default hashing (or DefaultHasher directly) at a cost of making compilation slower for a small fraction of programs. I can see reasonable people coming down on either side of this trade-off.

With respect to the final point, I did a local run and got better results than the CI run:

regex-opt
        avg: 2.7%       min: -0.1%      max: 5.9%
webrender-wrench-opt
        avg: 1.4%       min: 0.3%       max: 3.2%
cargo-opt
        avg: 1.4%       min: -0.1%      max: 2.4%
issue-46449-debug
        avg: -1.8%      min: -2.1%      max: -0.2%
tokio-webpush-simple-opt
        avg: 1.3%       min: -0.0%      max: 2.0%
webrender-debug
        avg: -1.0%      min: -1.9%      max: -0.1%
regression-31157-debug
        avg: -1.0%      min: -1.9%      max: -0.1%
cargo-debug
        avg: -1.3%      min: -1.8%      max: -0.1%
syn-debug
        avg: -1.0%      min: -1.7%      max: -0.1%
coercions-debug
        avg: -0.2%?     min: -1.3%?     max: 1.7%?
piston-image-debug
        avg: -0.8%      min: -1.5%      max: -0.1%
deeply-nested-debug
        avg: -1.0%      min: -1.4%      max: -0.2%
regex-debug
        avg: -0.9%      min: -1.4%      max: -0.1%
cranelift-codegen-debug
        avg: -0.9%      min: -1.4%      max: -0.1%
clap-rs-debug
        avg: -0.9%      min: -1.3%      max: -0.0%
hyper-2-debug
        avg: -0.9%      min: -1.3%      max: -0.2%
encoding-debug
        avg: -0.7%      min: -1.3%      max: -0.1%
tokio-webpush-simple-debug
        avg: -0.6%      min: -1.0%      max: 0.3%
ctfe-stress-4-opt
        avg: -0.5%?     min: -1.0%?     max: -0.0%?
serde-serde_derive-debug
        avg: -0.6%      min: -0.9%      max: -0.0%
ripgrep-debug
        avg: -0.4%      min: -0.9%      max: -0.0%
token-stream-stress-check
        avg: -0.9%      min: -0.9%      max: -0.9%
webrender-wrench-debug
        avg: -0.3%      min: -0.8%      max: 0.2%
clap-rs-opt
        avg: 0.2%       min: -0.2%      max: 0.8%
inflate-debug
        avg: -0.5%      min: -0.7%      max: -0.1%
webrender-opt
        avg: -0.4%      min: -0.7%      max: -0.1%
deep-vector-debug
        avg: -0.5%      min: -0.6%      max: -0.3%
cranelift-codegen-opt
        avg: -0.5%      min: -0.6%      max: -0.1%
futures-debug
        avg: -0.4%      min: -0.6%      max: -0.1%
html5ever-debug
        avg: -0.4%      min: -0.6%      max: -0.1%
encoding-opt
        avg: -0.3%      min: -0.5%      max: -0.1%
token-stream-stress-debug
        avg: -0.4%      min: -0.4%      max: -0.4%
token-stream-stress-opt
        avg: -0.4%      min: -0.4%      max: -0.3%
await-call-tree-debug
        avg: -0.4%      min: -0.4%      max: -0.2%
helloworld-check
        avg: -0.3%      min: -0.4%      max: -0.2%
html5ever-opt
        avg: -0.2%      min: -0.4%      max: -0.0%

The worst regressions are bigger than the best improvements, but there are quite a few more improvements than regressions. I looked at issue-46449-debug-check which had the best improvement, and again it was all differences in the LLVM back-end, which makes sense. The exact effect on compile times is going to vary a lot depending on the vagaries of inlining, and may end up being a wash.

Another interesting case is token-stream-stress-check-clean, which gets a 0.9% improvement due to the use of DefaultHasher somewhere (not sure where, maybe librustc_session?).

I'm leaning towards the third option (full PR) but I'm happy to hear other opinions.

cc @rust-lang/wg-compiler-performance

@nikic
Copy link
Contributor

nikic commented Feb 21, 2020

@nnethercote As this is trading off worse compile-time performance for better run-time performance, it would be nice to have some benchmark of what the actual run-time performance improvement would look like.

@nnethercote
Copy link
Contributor Author

it would be nice to have some benchmark of what the actual run-time performance improvement would look like.

That will depend heavily on the code in question and how much it uses DefaultHasher/HashMap/HashSet with integers. (E.g. rustc itself sees very little speedup because it barely uses those.) Microbenchmarks show that integer hashing is about 2.5x faster, though.

@nnethercote
Copy link
Contributor Author

I did some perf comparisons with rustc.

  • rust0: normal rustc
  • rust1: rust0 with all the FxHasher uses changed to DefaultHasher.
  • rust2: rust1 with this PR's improvements applied.

rust1 is massively slower than rust0. On check-clean runs of all benchmarks, slowdowns ranged from 4.3% to 128.6%.

rust2 is significantly faster than rust1. On check-clean runs of all benchmarks, speed-ups ranged from 0.9% to 19.6%. Not nearly enough to gain back the losses going from rust0 to rust1, but it shows the potential benefits to any Rust program that uses DefaultHasher heavily, as rust1 does.

This commit changes `sip::Hasher` to use the faster `short_write` approach
that was used for `SipHasher128` in rust-lang#68914.

This has no effect because `sip::Hasher::short_write` is currently
unused. See the next commit for more details, and a fix.

(One difference with rust-lang#68914 is that this commit doesn't apply the
`u8to64_le` change from that PR, because I found it is slower, because
it introduces `memcpy` calls with non-statically-known lengths.
Therefore, this commit also undoes the `u8to64_le` change in
`SipHasher128` for this reason. This doesn't affect `SipHasher128` much
because it doesn't use `u8to64_le` much, but I am making the change to
keep the two implementations consistent.)
…, `sip::Hasher`.

`sip::Hasher::short_write` is currently unused. It is called by
`sip::Hasher::write_{u8,usize}`, but those methods are also unused,
because `DefaultHasher`, `SipHasher` and `SipHasher13` don't implement
any of the `write_xyz` methods, so all their write operations end up
calling `sip::Hasher::write`.

I confirmed this by inserting a `panic!` in `sip::Hasher::short_write`
and running the tests -- they all passed.

This commit adds all the `write_xyz` methods to `DefaultHasher`,
`SipHasher`, `SipHasher13` and `sip::Hasher`, which means that
`short_write` will be used for all of them.

(I haven't properly implemented 128-bit writes in `sip::Hasher` and
`SipHasher128`. That could be a follow-up if something thinks it is
important.)

Some microbenchmarks I wrote show that this commit speeds up default
hashing of integers by about 2.5x, and can speed up hash tables that use
`DefaultHasher` and have keys that contain integers by up to 30%, though
the exact amount depends heavily on the details of the keys and how the
table is used. This makes default `Hash{Map,Set}` much more likely to be
competitive with `FxHash{Map,Set}`.
@nnethercote nnethercote force-pushed the speed-up-DefaultHasher-SipHasher-SipHasher13 branch from 66a83ae to 96c4e42 Compare February 24, 2020 00:48
@nnethercote
Copy link
Contributor Author

nnethercote commented Feb 24, 2020

I rebased. Let's measure again, see if the perf results vary.

@bors try @rust-timer queue

@bors
Copy link
Contributor

bors commented Feb 24, 2020

⌛ Trying commit 96c4e42 with merge 3fd2ff0fe14703eb6d09f9596264ddebb2b4b86d...

@bors
Copy link
Contributor

bors commented Feb 24, 2020

☀️ Try build successful - checks-azure
Build commit: 3fd2ff0fe14703eb6d09f9596264ddebb2b4b86d (3fd2ff0fe14703eb6d09f9596264ddebb2b4b86d)

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Feb 24, 2020

⌛ Trying commit 96c4e42 with merge d4aac45b9c1492286b3fd245a27eb35cbd66a89f...

@bors
Copy link
Contributor

bors commented Feb 24, 2020

☀️ Try build successful - checks-azure
Build commit: d4aac45b9c1492286b3fd245a27eb35cbd66a89f (d4aac45b9c1492286b3fd245a27eb35cbd66a89f)

@rust-timer
Copy link
Collaborator

Queued d4aac45b9c1492286b3fd245a27eb35cbd66a89f with parent 79cd224, future comparison URL.

@nnethercote
Copy link
Contributor Author

The latest CI perf run is slightly worse than the first one, with small regressions (~1%) on more of the benchmarks.

nnethercote added a commit to nnethercote/rust that referenced this pull request Feb 26, 2020
`sip::Hasher::short_write` is currently unused. It is called by
`sip::Hasher::write_{u8,usize}`, but those methods are also unused,
because `DefaultHasher`, `SipHasher` and `SipHasher13` don't implement
any of the `write_xyz` methods, so all their write operations end up
calling `sip::Hasher::write`.

(I confirmed this by inserting a `panic!` in `sip::Hasher::short_write`
and running the tests -- they all passed.)

The alternative would be to add all the missing `write_xyz` methods.
This does give some significant speed-ups, but it hurts compile times a
little in some cases. See rust-lang#69152 for details. This commit does the
conservative thing and doesn't change existing behaviour.
@nnethercote
Copy link
Contributor Author

We have a decision to make about what to do with the rest of the PR. I see three options.

* Simply remove `sip::Hasher::short_write` and `sip::Hasher::write_{u8,usize}` because they're dead code. And probably add a comment about that.

I have created #69471 for this option. I have been having trouble deciding between option 1 and option 3, so I ended up settling on option 1 because it preserves existing behaviour. (I did write a comment explaining the runtime perf win that has been given up.)

bors added a commit that referenced this pull request Feb 26, 2020
Remove `sip::Hasher::short_write`.

`sip::Hasher::short_write` is currently unused. It is called by
`sip::Hasher::write_{u8,usize}`, but those methods are also unused,
because `DefaultHasher`, `SipHasher` and `SipHasher13` don't implement
any of the `write_xyz` methods, so all their write operations end up
calling `sip::Hasher::write`.

(I confirmed this by inserting a `panic!` in `sip::Hasher::short_write`
and running the tests -- they all passed.)

The alternative would be to add all the missing `write_xyz` methods.
This does give some significant speed-ups, but it hurts compile times a
little in some cases. See #69152 for details. This commit does the
conservative thing and doesn't change existing behaviour.

r? @rust-lang/libs
@Dylan-DPC-zz
Copy link

Marking this as blocked on #69471

@Dylan-DPC-zz Dylan-DPC-zz added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 11, 2020
@nnethercote
Copy link
Contributor Author

I think we can just close this in favour of #69471.

@Dylan-DPC-zz Dylan-DPC-zz added S-blocked-closed and removed relnotes Marks issues that should be documented in the release notes of the next release. S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. labels Mar 11, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Mar 14, 2020
…te, r=dtolnay

Remove `sip::Hasher::short_write`.

`sip::Hasher::short_write` is currently unused. It is called by
`sip::Hasher::write_{u8,usize}`, but those methods are also unused,
because `DefaultHasher`, `SipHasher` and `SipHasher13` don't implement
any of the `write_xyz` methods, so all their write operations end up
calling `sip::Hasher::write`.

(I confirmed this by inserting a `panic!` in `sip::Hasher::short_write`
and running the tests -- they all passed.)

The alternative would be to add all the missing `write_xyz` methods.
This does give some significant speed-ups, but it hurts compile times a
little in some cases. See rust-lang#69152 for details. This commit does the
conservative thing and doesn't change existing behaviour.

r? @rust-lang/libs
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Mar 14, 2020
…te, r=dtolnay

Remove `sip::Hasher::short_write`.

`sip::Hasher::short_write` is currently unused. It is called by
`sip::Hasher::write_{u8,usize}`, but those methods are also unused,
because `DefaultHasher`, `SipHasher` and `SipHasher13` don't implement
any of the `write_xyz` methods, so all their write operations end up
calling `sip::Hasher::write`.

(I confirmed this by inserting a `panic!` in `sip::Hasher::short_write`
and running the tests -- they all passed.)

The alternative would be to add all the missing `write_xyz` methods.
This does give some significant speed-ups, but it hurts compile times a
little in some cases. See rust-lang#69152 for details. This commit does the
conservative thing and doesn't change existing behaviour.

r? @rust-lang/libs
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Mar 15, 2020
…te, r=dtolnay

Remove `sip::Hasher::short_write`.

`sip::Hasher::short_write` is currently unused. It is called by
`sip::Hasher::write_{u8,usize}`, but those methods are also unused,
because `DefaultHasher`, `SipHasher` and `SipHasher13` don't implement
any of the `write_xyz` methods, so all their write operations end up
calling `sip::Hasher::write`.

(I confirmed this by inserting a `panic!` in `sip::Hasher::short_write`
and running the tests -- they all passed.)

The alternative would be to add all the missing `write_xyz` methods.
This does give some significant speed-ups, but it hurts compile times a
little in some cases. See rust-lang#69152 for details. This commit does the
conservative thing and doesn't change existing behaviour.

r? @rust-lang/libs
@nnethercote nnethercote deleted the speed-up-DefaultHasher-SipHasher-SipHasher13 branch April 1, 2020 19:47
@jyn514 jyn514 added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-blocked-closed labels Mar 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants