Replace FNV with a faster hash function. #37229

Merged
merged 2 commits into rust-lang:master from nnethercote:FxHasher on Nov 9, 2016

Conversation

Contributor

nnethercote commented Oct 17, 2016

Hash table lookups are very hot in rustc profiles and the time taken within FnvHash itself is a big part of that. Although FNV is a simple hash, it processes its input one byte at a time. In contrast, Firefox has a homespun hash function that is also simple but works on multiple bytes at a time. So I tried it out and the results are compelling:

futures-rs-test  4.326s vs  4.212s --> 1.027x faster (variance: 1.001x, 1.007x)
helloworld       0.233s vs  0.232s --> 1.004x faster (variance: 1.037x, 1.016x)
html5ever-2016-  5.397s vs  5.210s --> 1.036x faster (variance: 1.009x, 1.006x)
hyper.0.5.0      5.018s vs  4.905s --> 1.023x faster (variance: 1.007x, 1.006x)
inflate-0.1.0    4.889s vs  4.872s --> 1.004x faster (variance: 1.012x, 1.007x)
issue-32062-equ  0.347s vs  0.335s --> 1.035x faster (variance: 1.033x, 1.019x)
issue-32278-big  1.717s vs  1.622s --> 1.059x faster (variance: 1.027x, 1.028x)
jld-day15-parse  1.537s vs  1.459s --> 1.054x faster (variance: 1.005x, 1.003x)
piston-image-0. 11.863s vs 11.482s --> 1.033x faster (variance: 1.060x, 1.002x)
regex.0.1.30     2.517s vs  2.453s --> 1.026x faster (variance: 1.011x, 1.013x)
rust-encoding-0  2.080s vs  2.047s --> 1.016x faster (variance: 1.005x, 1.005x)
syntex-0.42.2   32.268s vs 31.275s --> 1.032x faster (variance: 1.014x, 1.022x)
syntex-0.42.2-i 17.629s vs 16.559s --> 1.065x faster (variance: 1.013x, 1.021x)

(That's a stage1 compiler doing debug builds. Results for a stage2 compiler are similar.)

The attached commit is not in a state suitable for landing because I changed the implementation of FnvHasher without changing its name (because that would have required touching many lines in the compiler). Nonetheless, it is a good place to start discussions.

Profiles show very clearly that this new hash function is a lot faster to compute than FNV. The quality of the new hash function is less clear -- it seems to do better in some cases and worse in others (judging by the number of instructions executed in Hash{Map,Set}::get).

CC @brson, @arthurprs

Collaborator

rust-highfive commented Oct 17, 2016

r? @Aatch

(rust_highfive has picked a reviewer for you, use r? to override)

Contributor

arthurprs commented Oct 17, 2016

Do we have any backing data for this algorithm? Maybe from the Firefox development process/source? An SMHasher run?

Contributor

nnethercote commented Oct 17, 2016

I forgot to mention that there is something of an explanation about this hash function in the Firefox source: https://dxr.mozilla.org/mozilla-central/source/mfbt/HashFunctions.h#74-117.

I modified it from 32 bits to 64 bits by changing the multiplication factor from 0x9E3779B9 (the golden ratio in fixed point) to 0x517cc1b727220a95 (pi in fixed point). I switched from the golden ratio to pi because the golden ratio in 64-bit fixed point is an even number -- see http://stackoverflow.com/questions/5889238/why-is-xor-the-default-way-to-combine-hashes#comment54810251_27952689
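
For reference, the core mixing step is tiny. Here's a minimal sketch of the 64-bit variant (my own paraphrase, not the exact PR code; the 5-bit rotation matches the Firefox source linked above):

    const K: u64 = 0x517cc1b727220a95; // pi in 64-bit fixed point

    struct SketchHasher {
        hash: u64,
    }

    impl SketchHasher {
        #[inline]
        fn add_to_hash(&mut self, i: u64) {
            // rotate + xor + multiply: one word per step, versus one
            // byte per step for FNV
            self.hash = (self.hash.rotate_left(5) ^ i).wrapping_mul(K);
        }
    }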

This hash function was introduced into Firefox in https://bugzilla.mozilla.org/show_bug.cgi?id=729940. There's very little discussion in that bug report about how it was derived.

I'm happy to try SMHasher on it. But the ultimate workload for the hash function used within rustc is rustc itself, and it's clearly working well there.

Contributor

arthurprs commented Oct 17, 2016

I think it's worth discussing a couple more things while we're at it, so we nail this for good.

  • Do the calculations on a usize-sized hash; this will help a lot on 32-bit systems. It's fine to just expand it to u64 on finish. Std HashMap hashes will eventually become usize (#36567) anyway.
  • This works the same byte-at-a-time way for &str/&[int] slices, so the improvement is coming exclusively from integral hashing. For those, I'm curious if we can process usize-sized chunks at a time before falling back to byte-at-a-time.
Contributor

nnethercote commented Oct 17, 2016

I think you miswrote your second dot point... but I did some ad hoc profiling and found that the vast majority of occurrences are write_u32 and write_u64. write accounted for less than 1% of occurrences.

Contributor

arthurprs commented Oct 17, 2016

Did I? I can't find it. I guess I need more coffee.

Interesting, but I'm almost sure it'll show up when we eventually move everything away from SipHasher (the string interner, for example). SipHasher is still even higher in the profiles.

Contributor

bluss commented Oct 17, 2016

FNV and SipHasher both have the property that the stream of bytes to hash is "untyped": a u16 fed as a u16 or as its byte representation is hashed the same way.

But I don't think the Hash trait expects or requires that contract in any way, so this hash function's "typed" approach is fine.

But what I do think is that a well-behaved hasher must hash a slice of bytes the same way regardless of how you split it into subslices (as long as the order is the same). That means that any whole-word optimization for Hasher::write then needs to keep state across calls (which is exactly the thing that makes SipHasher a bit slow).
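
To illustrate the distinction with a small hypothetical example (the function and names are mine):

    use std::hash::Hasher;

    // With an "untyped" byte-stream hasher such as FNV, these two feeds
    // end in the same state; with a "typed" word-at-a-time hasher they
    // need not, and the Hash trait doesn't promise they will.
    fn feed<H: Hasher>(typed: &mut H, untyped: &mut H) {
        let x: u16 = 0xABCD;
        typed.write_u16(x);              // fed as a u16
        untyped.write(&x.to_ne_bytes()); // fed as its byte representation
    }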

Contributor

arthurprs commented Oct 17, 2016

But I don't think the Hash trait expects or requires that contract in any way, so this hash function's "typed" approach is fine.

Yeah, luckily the Hash trait doesn't impose any special streaming requirement.

Contributor

arthurprs commented Oct 17, 2016

I got curious, so I ran SMHasher on the 64-bit (PR) and original 32-bit hashes. I had to include two variants of each to see how both modes of the hasher behave (integral and byte-by-byte).

see gist for results: https://gist.github.com/arthurprs/5e57cd59586acd8c52dbb02b55711096

A few comments on the code in the PR.

Hashing integral types (write_...)

The quality is really bad, but it's so cheap to calculate for integral types (what rustc seems to be using FNV for) that it's still a win for this combination of workload and hashmap implementation. I'm fairly sure the compiler sees the 0 seed, so the hash boils down to a single IMUL instruction.
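
(A standalone check of that reduction, as a sketch using the 64-bit constant from the PR:)

    const K: u64 = 0x517cc1b727220a95;

    fn main() {
        let i: u64 = 0xDEAD_BEEF;
        // With a zero seed, 0.rotate_left(5) == 0 and 0 ^ i == i, so the
        // whole mixing step collapses to a single wrapping multiply.
        assert_eq!((0u64.rotate_left(5) ^ i).wrapping_mul(K), i.wrapping_mul(K));
    }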

Hashing slices (write_usize() + write())

The write_usize(slice.len()) will be faster and the write() slower compared to FNV, so it could potentially regress those cases.

I think the right way forward is to have two hashers in the rustc codebase, one general-purpose-ish and another for integral types. This PR has potential for the latter.

Contributor

nnethercote commented Oct 17, 2016

@arthurprs: Thank you for running these! I was about to do it myself but you've saved me the trouble.

Looking at the results... whelp, there are a lot of numbers there that I don't know how to interpret, though the "FAIL" results sound bad.

The write_usize(slice.len()) will be faster and the write() slower compared to fnv. So it could potentially regress those cases.

Why will write() be slower? Because FNV does xor + mul, while the new hash does rol + xor + mul? I guess it'll be slightly slower, but the extra rol should be cheap compared to the mul?
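
For concreteness, here are the two inner steps side by side (a sketch; the FNV constant is the standard 64-bit FNV-1a prime):

    // FNV-1a: xor + mul, one byte per step
    fn fnv1a_step(hash: u64, byte: u8) -> u64 {
        (hash ^ byte as u64).wrapping_mul(0x100_0000_01b3)
    }

    // New hash: rol + xor + mul, one word per step
    fn fx_step(hash: u64, word: u64) -> u64 {
        (hash.rotate_left(5) ^ word).wrapping_mul(0x517cc1b727220a95)
    }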

Contributor

arthurprs commented Oct 18, 2016

Why will write() be slower? Because FNV does xor + mul, while the new hash does rol + xor + mul? I guess it'll be slightly slower, but the extra rol should be cheap compared to the mul?

It's a 15% difference on my Intel Skylake processor: 690 MB/s vs 800 MB/s. You can see some rough numbers in the gist.

Contributor

nnethercote commented Oct 25, 2016

But what I do think is that a well behaved hasher must hash a slice of bytes the same way, regardless of how you split it into subslices (as long as the order is the same). That means that any whole-word optimization for Hasher::write then needs to keep a state (which is exactly a thing that makes SipHasher a bit slow).

Are you sure? Where does that requirement come from? I was thinking about changing write so that it processes 4 or 8 bytes at a time and then does single-byte clean-up for any excess bytes at the end...

Contributor

nnethercote commented Oct 25, 2016

New version. I've made the following changes.

  • FxHasher is now a separate type. FnvHasher still exists.
  • I've converted all uses of FnvHash{Map,Set} to FxHash{Map,Set}. I did some profiling and found that write calls (i.e. variable-length hash cases) account for less than 0.1% of occurrences. Even when I weight them by their length, they account for less than 2% of all FxHasher operations. So I don't think treating variable-length cases differently is worthwhile.
  • FxHasher now works with usize so that it will be faster on 32-bit machines (see the sketch after this list).
  • I remeasured and the speed-ups are basically unchanged from those in the first comment above.
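
A sketch of the shape of the usize-based write methods (illustrative; the actual fx.rs differs in details such as using a dedicated 32-bit constant on 32-bit targets):

    use std::hash::Hasher;

    const K: usize = 0x517cc1b727220a95u64 as usize; // truncates on 32-bit

    #[derive(Default)]
    struct FxSketch {
        hash: usize,
    }

    impl FxSketch {
        #[inline]
        fn add_to_hash(&mut self, i: usize) {
            self.hash = (self.hash.rotate_left(5) ^ i).wrapping_mul(K);
        }
    }

    impl Hasher for FxSketch {
        fn write(&mut self, bytes: &[u8]) {
            // The rare variable-length case: one byte per step.
            for b in bytes {
                self.add_to_hash(*b as usize);
            }
        }

        fn write_u32(&mut self, i: u32) {
            self.add_to_hash(i as usize);
        }

        fn write_u64(&mut self, i: u64) {
            // On 32-bit targets a u64 is fed as two words; on 64-bit, one.
            self.add_to_hash(i as usize);
            if cfg!(target_pointer_width = "32") {
                self.add_to_hash((i >> 32) as usize);
            }
        }

        fn finish(&self) -> u64 {
            self.hash as u64
        }
    }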

r? @arthurprs: what do you think?

Contributor

bluss commented Oct 25, 2016

@nnethercote I'm not sure; it's something that needs to be discussed and put into the documentation.

I think it's the logical rule implied by the construction of Hash. Imagine a chunked rope data structure. It should have the same hash value regardless of how it is chunked, as long as the whole string is the same. How it is chunked will determine how its data is fed to Hasher::write.

Contributor

bluss commented Oct 25, 2016

To make this concrete, imagine struct Rope(Vec<String>), where the actual string value is the concatenation of the strings in the representation. Rope(["a", "b"]) and Rope(["ab"]) should have the same hash.
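
A sketch of the kind of impl in question (hypothetical; note that it only gives Rope(["a", "b"]) and Rope(["ab"]) equal hashes if Hasher::write is invariant under chunking):

    use std::hash::{Hash, Hasher};

    struct Rope(Vec<String>);

    impl Hash for Rope {
        fn hash<H: Hasher>(&self, state: &mut H) {
            // Feed the logical string chunk by chunk: two ropes with the
            // same concatenation make different sequences of write() calls.
            for chunk in &self.0 {
                state.write(chunk.as_bytes());
            }
        }
    }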

Contributor

nnethercote commented Oct 25, 2016

(New version removes the println! statements that I accidentally left in...)

Contributor

arthurprs commented Oct 25, 2016

Looks good to me. Somebody from the core team should weigh in on how to move this forward.

I wouldn't be worried about the Hasher lacking the "strict streaming" characteristic, as the Hash trait is "strongly typed" and will make the same writes to the hasher every time.

Contributor

bors commented Oct 25, 2016

☔️ The latest upstream changes (presumably #37292) made this pull request unmergeable. Please resolve the merge conflicts.

Contributor

bors commented Oct 26, 2016

☔️ The latest upstream changes (presumably #37270) made this pull request unmergeable. Please resolve the merge conflicts.

Contributor

nnethercote commented Oct 31, 2016

With the notable exception of @arthurprs, this is being ignored. It's a big compile-speed win, the biggest one I know of, but I fear that concerns about theoretical worst cases will overwhelm the benefit that's been demonstrated widely in practice.

How can we move this forward?

Member

pnkfelix commented Oct 31, 2016

(Nominated for discussion amongst compiler team; hopefully that will help it move forward...)

Contributor

nikomatsakis commented Nov 1, 2016

I think the problem is that @Aatch hasn't been too active of late, so the PR went unnoticed. I have no strong opinion about what hash function we use --- basically, if it's faster, I'm for it. I'm curious if anyone has any objections.

@nikomatsakis assigned himself and unassigned @Aatch Nov 1, 2016

Contributor

nikomatsakis commented Nov 1, 2016

@rfcbot fcp merge

I'm not sure if this merits a FCP-style decision making process, but it seems harmless enough. Maybe if everyone is in favor we can avoid the need to discuss at the meeting. =) (In any case, I'd rather if we can conduct the discussion here on the PR in advance.)

rfcbot commented Nov 1, 2016

Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Contributor

arielb1 commented Nov 3, 2016

Could you have the Fnv -> Fx global rename in its own commit?

Contributor

nnethercote commented Nov 3, 2016

Could you have the Fnv -> Fx global rename in its own commit?

You mean this?

  • First commit adds the new fx.rs file.
  • Second commit changes all the FnvHashMap/Set occurrences to FxHashMap/Set.

Sure. I'll wait until I get full approval from the compiler team, because I have some other conflicts that I need to fix and I might as well do them later to reduce the likelihood of more conflicts afterwards.

Contributor

malbarbo commented Nov 4, 2016

How about defining type aliases DefaultHashMap and DefaultHashSet? Then in the future the concrete types can be easily changed.
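
Such aliases are cheap to define. A sketch of the shape, assuming the FxHasher type from this PR is in scope (the PR itself uses the names FxHashMap/FxHashSet):

    use std::collections::{HashMap, HashSet};
    use std::hash::BuildHasherDefault;

    // FxHasher must implement Hasher + Default for BuildHasherDefault.
    pub type DefaultHashMap<K, V> = HashMap<K, V, BuildHasherDefault<FxHasher>>;
    pub type DefaultHashSet<V> = HashSet<V, BuildHasherDefault<FxHasher>>;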

Contributor

arthurprs commented Nov 4, 2016

Although there's no one-size-fits-all for hashers, I think it's easier to opt out of it if necessary than the other way around. So +1 for the DefaultMap/Set aliases.

Contributor

nikomatsakis commented Nov 8, 2016

@nnethercote everybody is in favor!

Contributor

nnethercote commented Nov 8, 2016

I rebased and split the PR into two commits: one adding FxHasher, and one converting all FnvHash instances to FxHash instances.

I also remeasured and the results are similar to before.

futures-rs-test  4.020s vs  3.918s --> 1.026x faster (variance: 1.008x, 1.007x)
helloworld       0.225s vs  0.225s --> 0.999x faster (variance: 1.009x, 1.009x)
html5ever-2016-  3.800s vs  3.637s --> 1.045x faster (variance: 1.006x, 1.006x)
hyper.0.5.0      4.642s vs  4.521s --> 1.027x faster (variance: 1.006x, 1.007x)
inflate-0.1.0    3.714s vs  3.671s --> 1.012x faster (variance: 1.007x, 1.007x)
issue-32062-equ  0.300s vs  0.292s --> 1.029x faster (variance: 1.011x, 1.026x)
issue-32278-big  1.535s vs  1.484s --> 1.034x faster (variance: 1.024x, 1.006x)
jld-day15-parse  1.343s vs  1.272s --> 1.056x faster (variance: 1.001x, 1.012x)
ostn15_phf      19.419s vs 18.372s --> 1.057x faster (variance: 1.003x, 1.027x)
piston-image-0. 10.855s vs 10.464s --> 1.037x faster (variance: 1.004x, 1.010x)
reddit-stress    2.217s vs  2.133s --> 1.039x faster (variance: 1.009x, 1.006x)
regex.0.1.30     2.244s vs  2.185s --> 1.027x faster (variance: 1.019x, 1.004x)
rust-encoding-0  1.862s vs  1.814s --> 1.027x faster (variance: 1.002x, 1.007x)
syntex-0.42.2   29.155s vs 28.059s --> 1.039x faster (variance: 1.019x, 1.003x)
syntex-0.42.2-i 13.689s vs 12.897s --> 1.061x faster (variance: 1.010x, 1.007x)

(reddit-stress and ostn15_phf are a couple of programs I've been measuring that aren't in rustc-benchmarks.)

r? @nikomatsakis

nnethercote added some commits Nov 8, 2016

Replace FnvHasher use with FxHasher.
This speeds up compilation by 3--6% across most of rustc-benchmarks.
Contributor

nnethercote commented Nov 8, 2016

Ugh, this PR is so conflict-prone.

Contributor

nikomatsakis commented Nov 8, 2016

@bors r+

Contributor

bors commented Nov 8, 2016

📌 Commit 00e48af has been approved by nikomatsakis

eddyb added a commit to eddyb/rust that referenced this pull request Nov 9, 2016

Rollup merge of #37229 - nnethercote:FxHasher, r=nikomatsakis
Replace FNV with a faster hash function.


@eddyb referenced this pull request Nov 9, 2016

Merged

Rollup of 15 pull requests #37670

bors added a commit that referenced this pull request Nov 9, 2016

Auto merge of #37670 - eddyb:rollup, r=eddyb
Rollup of 15 pull requests

- Successful merges: #36868, #37134, #37229, #37250, #37370, #37428, #37432, #37472, #37524, #37614, #37622, #37627, #37636, #37644, #37654
- Failed merges: #37463, #37542, #37645

@bors merged commit 00e48af into rust-lang:master Nov 9, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@nnethercote deleted the nnethercote:FxHasher branch Nov 10, 2016

@brson added the relnotes label Nov 15, 2016

Contributor

cbreeden commented Jan 19, 2017

@nnethercote this hash is super fast on my dataset. Here are my tests for this hash on a personal round-robin hashset implementation with about 4500 u32 keys (Unicode code points):

test fnv        ... 
1 => 2586
2 => 1244
3 => 468
4 => 139
5 => 30
6 => 10
bench:      63,889 ns/iter (+/- 7,812)

test fxhasher   ... 
1 => 3305
2 => 1116
3 => 56
bench:      22,290 ns/iter (+/- 3,283)

test phf        ... bench:      72,287 ns/iter (+/- 6,156)
test static_fnv ... bench:      64,639 ns/iter (+/- 6,879)

The "N => count" lines show how many probes were required in the round robin to find the correct element.
This should probably be a crate. Do you mind if I make one out of it? Or should you?

Contributor

nnethercote commented Jan 19, 2017

@cbreeden I'm happy if you want to make a crate out of it. Make sure you observe the rustc license (of course) and you should probably make it clear in the docs that it's not a "well-designed" hash and so may not be suitable in all situations. Thanks.

Contributor

cbreeden commented Jan 19, 2017

Sounds good. Yeah, I got pretty lucky there, I'd say.

Contributor

cbreeden commented May 19, 2017

I went ahead and decided to modify the write(..) method to hash in 4-byte chunks:

    // Needs the byteorder crate in scope:
    // use byteorder::{NativeEndian, ReadBytesExt};
    fn write(&mut self, bytes: &[u8]) {
        let mut buf = bytes;
        // read_u32 advances `buf`, so this consumes 4-byte chunks.
        while buf.len() >= 4 {
            let n = buf.read_u32::<NativeEndian>().unwrap();
            self.write_u32(n);
        }

        // Hash any remaining tail bytes one at a time.
        for byte in buf {
            self.add_to_hash(*byte as usize);
        }
    }

Testing this with a few ASCII byte slices yields these results:

 name           old ns/iter  chunks ns/iter  diff ns/iter   diff %  speedup
 bench_3chars   2            3                          1   50.00%   x 0.67
 bench_4chars   3            2                         -1  -33.33%   x 1.50
 bench_11chars  8            5                         -3  -37.50%   x 1.60
 bench_12chars  9            3                         -6  -66.67%   x 3.00
 bench_23chars  21           8                        -13  -61.90%   x 2.62
 bench_24chars  24           6                        -18  -75.00%   x 4.00

It appears that there is a clear win for hashing any byte slice with length > 3, which I believe is the common case. For some reason there is a regression when hashing in chunks of u64 (x64 Intel i7-6600U @ 2.6 GHz, Windows 10).

@nnethercote I know you said .write() was called less than 1% of the time in your testing, but do you mind me asking what commands you used for the rustc profile benchmarks? I would be curious whether a patch like this would make any difference.

Contributor

cbreeden commented May 19, 2017

@nnethercote nevermind, sorry for the spam. I think you were using https://github.com/rust-lang-nursery/rustc-benchmarks. I'll try it out when I get back home on a computer that can compile rustc in a reasonable amount of time.

@emilio referenced this pull request in servo/servo Aug 2, 2017

Closed

style: Switch to FxHash for the style system. #17946
