New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hashmap: use siphash-1-3 as default hasher #33940

Merged
merged 1 commit into from Jul 2, 2016

Conversation

Projects
None yet
@seanmonstar
Contributor

seanmonstar commented May 29, 2016

Also exposes SipHash13 and SipHash24 in core::hash::sip, for those that want to differentiate.

For motivation, see this quote from the original issue:

we proposed SipHash-2-4 as a (strong) PRF/MAC, and so far no attack whatsoever has been found,
although many competent people tried to break it. However, fewer rounds may be sufficient and I would
be very surprised if SipHash-1-3 introduced weaknesses for hash tables.

This keeps a type alias of SipHasher to SipHash24, and since the internal default hasher of HashMap is specified as "not specified", changing it should not be a breaking change.

Closes #29754

@rust-highfive

This comment has been minimized.

Collaborator

rust-highfive commented May 29, 2016

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

#[stable(feature = "rust1", since = "1.0.0")]
pub struct SipHasher {
pub type SipHasher = SipHash24;

This comment has been minimized.

@sfackler

sfackler May 29, 2016

Member

We should probably make this an opaque tuple struct to allow us flexibility in the future to change what it corresponds to.

@seanmonstar seanmonstar force-pushed the seanmonstar:siphash-1-3 branch from fb81708 to c137114 May 29, 2016

@seanmonstar

This comment has been minimized.

Contributor

seanmonstar commented May 29, 2016

@sfackler good idea, done.

@bluss

This comment has been minimized.

Contributor

bluss commented May 30, 2016

Exposing SipHash (2-4) publicly was not needed, and we could have avoided it. I think we should not make SipHash13 public, to avoid repeating the mistake.

Rust is after all very reluctant to ship anything crypto-related.

SipHash13 should have tests that check it against the reference implementation. Here are the answers from the test vectors #29754 (comment)

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented May 31, 2016

Thanks for the PR! Do you have any benchmarks as well showing changes either here or there?

@alexcrichton alexcrichton added the T-libs label May 31, 2016

@seanmonstar

This comment has been minimized.

Contributor

seanmonstar commented May 31, 2016

@alexcrichton In the issue, there is this image of benchmarks, though it is from November 2015.

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented May 31, 2016

Excellent! I doubt the numbers have changed much in the meantime, so sounds good to me.

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented May 31, 2016

Leaving this tagged as T-libs for discussion during triage. I may not be present for the next one, so some things to think about:

  • Do we want to expose this as a public API? Was exposing SipHasher a mistake? I personally feel like we've been too minimal and wouldn't mind expanding our hash algorithms a bit.
  • Do we want to merge this now? Perhaps we should publish a siphash13 crate on crates.io and see how it fares for a bit with hash maps? Crates can already switch the hash algorithm on stable Rust, so experimentation is possible.
  • What do we want to do with SipHasher? Leave it as-is, deprecate it, or deprecate/rename it to SipHash24? My preference would be to introduce SipHash13 and SipHash24, deprecating SipHasher with a rename, although this would of course be influenced by the first question.
@brson

This comment has been minimized.

Contributor

brson commented May 31, 2016

I don't think it's worth deprecating SipHasher just because we named it wrong. Are these hashers only exposed in core or also in std?

I do think we should just change the default hasher to siphash 1-3.

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented May 31, 2016

The core::hash module I believe is wholesale reexported in std, so the contents should be the same. I also agree with @brson that we should switch the default to siphash 1-3, the only question in my mind is what to do about the APIs.

If we don't deprecate SipHasher, I'd vote for std::hash::SipHasher13 (negotiable on the "er"), I wouldn't want the extra sip module indirection. If we do deprecate it then I'd vote for std::hash::{SipHash13,SipHash24}. (still negotiable on "er")

@arthurprs

This comment has been minimized.

Contributor

arthurprs commented Jun 2, 2016

This is a welcome change.

I don't think we should expose the faster variant. In fact I think we should deprecate the existing hasher as well and don't expose any specific implementation.

@arthurprs

This comment has been minimized.

Contributor

arthurprs commented Jun 2, 2016

Here are updated benchmarks (I got a new computer)

Results http://i.imgur.com/QA6Ey9B.png

Code https://github.com/arthurprs/hash-rs

Sip1-3 is very competitive at the lower end and good enough in the high end.

@seanmonstar

This comment has been minimized.

Contributor

seanmonstar commented Jun 14, 2016

Since the functionality is desirable and provides an immediate performance improvement for HashMaps, would it be a good idea to merge the change of the default hasher, and leave naming/exposing variants/etc to the stabilization issue?

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented Jun 14, 2016

Unfortunately the libs team hasn't had a chance to talk about this just yet, but I suspect we'll all be on board with the current strategy basically.

@sfackler

This comment has been minimized.

Member

sfackler commented Jun 14, 2016

👍

@sorear

This comment has been minimized.

Contributor

sorear commented Jun 19, 2016

Would there be any interest in additionally using specialization to get rid of the trailing 0xFF sentinel on str slices? The sentinel is needed if you're hashing anything else after the string, but it's not needed for hashing a string by itself, and my tests last year indicated that removing the sentinel was a rather large performance gain.

I did a PR last year which optimized that, but it changed the public API of the Hash trait and was (rightly) rejected; now that we have specialization it seems possibly worthy to revisit, because it might be doable without a public API change now.

The biggest problem with short string SipHash performance isn't the number of rounds (although reducing that can't hurt), it's the u8 to u64 conversion step…

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented Jun 21, 2016

The libs team got a chance to discuss this PR today and the conclusions were:

  • Let's not have a public sip module in std::hash, instead reexport types under std::hash directly.
  • Let's not deprecate SipHasher yet (as this PR is doing), and consider later once we stabilize these new APIs
  • We want to consider renaming SipHash24 to SipHasher24 as that's more conventional currently (hasher implementations have "hasher", not "hash")

Unfortunately we realized that this is a breaking change as we didn't quite protect ourselves from changing the default hasher 100%. Today, this code compiles:

let a: SipHasher = RandomState::new().build_hasher();

That is, the implementation of BuildHasher has SipHasher as the stable output type. This should be something like std::collections::hash_map::DefaultHasher which we can swap the algorithm internally with. To me it seems that it's very unlikely for anything to break in practice when changing this, but we should do a crater run at least. There were other ideas about how to perhaps make this change and retain backwards compatibility, but they're somewhat ... outlandish.

Finally, when stabilizing these APIs, we'll have a few questions that arise:

  • Do we really want to stabilize more hash algorithms in std::hash? Maybe we can get away with an empty std::hash and a deprecated SipHasher in the long run. We're pretty likely to change the hash algorithm of hash maps over time so we don't necessarily want to have std::hash collect a bunch of historical old "broken" algorithms, but on the other hand they have very low maintenance costs, so it's perhaps not so bad.
  • What to do with SipHasher when other types are stabilized.

In any case though all the API additions here are unstable, so these are questions we can punt to stabilization for now. In the meantime we'd like to merge this to get the benefits of using SipHash 1-3 instead of 2-4.

@seanmonstar

This comment has been minimized.

Contributor

seanmonstar commented Jun 21, 2016

For backwards compatibility, there's 2 options:

  • impl BuilderHasher for RandomState { type Hasher = SipHash13; }
    This breaks the example you listed, let a: SipHash = RandomState::new().build_hasher(), but keeps SipHasher internally equivalent.
  • pub struct SipHasher(SipHasher13)
    This allows the example to still compile, but means that any tests using SipHasher with specific keys may start breaking, as the rounds are less and the result hash differs.
@alexcrichton

This comment has been minimized.

Member

alexcrichton commented Jun 21, 2016

Ideally we would have:

impl BuildHasher for RandomState {
    type Hasher = DefaultHasher;
    // ...
}

pub struct DefaultHasher(SipHasher);

// ...

We could then change the internals of DefaultHasher at any time.

Could you implement this and then we can at least see if there's any fallout in crater?

@seanmonstar seanmonstar force-pushed the seanmonstar:siphash-1-3 branch from c137114 to d674184 Jun 21, 2016

@seanmonstar

This comment has been minimized.

Contributor

seanmonstar commented Jun 21, 2016

I've added an unstable DefaultHasher to the map module, as you suggested. (make check is still slowly building on my machine).

@seanmonstar seanmonstar force-pushed the seanmonstar:siphash-1-3 branch 2 times, most recently from 78e2dd5 to 9f6b4bd Jun 21, 2016

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented Jun 21, 2016

The travis error looks legit

@seanmonstar seanmonstar force-pushed the seanmonstar:siphash-1-3 branch from 9f6b4bd to d657792 Jun 21, 2016

@alexcrichton

This comment has been minimized.

Member

alexcrichton commented Jul 1, 2016

@bors: retry

On Fri, Jul 1, 2016 at 12:21 PM, bors notifications@github.com wrote:

💔 Test failed - auto-win-msvc-64-opt-rustbuild
https://buildbot.rust-lang.org/builders/auto-win-msvc-64-opt-rustbuild/builds/1642


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#33940 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAD95HutNH8189Moy-Fd0tqP3mMW_SQ0ks5qRWjXgaJpZM4IpOE5
.

@bors

This comment has been minimized.

Contributor

bors commented Jul 1, 2016

⌛️ Testing commit db1b191 with merge 0141193...

bors added a commit that referenced this pull request Jul 1, 2016

Auto merge of #33940 - seanmonstar:siphash-1-3, r=alexcrichton
hashmap: use siphash-1-3 as default hasher

Also exposes `SipHash13` and `SipHash24` in `core:#️⃣:sip`, for those that want to differentiate.

For motivation, see this quote from the original issue:

> we proposed SipHash-2-4 as a (strong) PRF/MAC, and so far no attack whatsoever has been found,
although many competent people tried to break it. However, fewer rounds may be sufficient and I would
be very surprised if SipHash-1-3 introduced weaknesses for hash tables.

This keeps a type alias of `SipHasher` to `SipHash24`, and since the internal default hasher of HashMap is specified as "not specified", changing it should not be a breaking change.

Closes #29754

@bors bors merged commit db1b191 into rust-lang:master Jul 2, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details
@seanmonstar

This comment has been minimized.

Contributor

seanmonstar commented Jul 2, 2016

yay!

@sorear

This comment has been minimized.

Contributor

sorear commented Jul 2, 2016

\o/

Next question: would there be interest in another attempt to address #29754 (comment) , possibly via specialization?

@funny-falcon

This comment has been minimized.

funny-falcon commented Jul 2, 2016

@sorear if it is possible to split hasher to "advance" and "finalization", then it could be possible to keep whole 256bit SipHash state, feed it either with 64bit integer at once, or with string. Then run finalization once.

Strings should be feeded as usual siphash inner code without finalization (but including xoring with 0xff).

Integers are mixed in as one Siphash iteration + xoring with 0xff as prevention of string extention.

@funny-falcon

This comment has been minimized.

funny-falcon commented Jul 2, 2016

Probably, my suggestion is over complexified.

Perhaps, using just incremental Siphash implementation is just enough.

@sorear

This comment has been minimized.

Contributor

sorear commented Jul 2, 2016

@funny-falcon We already are using an incremental implementation. The problem is substantially more complicated than that. See the linked comment, and my #29139.

@pczarn

This comment has been minimized.

Contributor

pczarn commented Jul 3, 2016

@sorear Yes, let's use specialization. I'm currently writing RFCs. The implementation is already here: contain-rs/hashmap2#5

funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Jul 9, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Jul 10, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Jul 10, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Jul 10, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Jul 12, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Jul 12, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Sep 5, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Sep 26, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Sep 26, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Sep 27, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Nov 2, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Nov 4, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Nov 5, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Dec 8, 2016

switch SipHash from SipHash24 to SipHash13 variant
SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

hsbt pushed a commit to ruby/ruby that referenced this pull request Jan 20, 2017

switch SipHash from SipHash24 to SipHash13 variant
SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

From: Sokolov Yura aka funny_falcon <funny.falcon@gmail.com>
Date: Thu, 8 Dec 2016 20:31:29 +0300
Signed-off-by: Urabe, Shyouhei <shyouhei@ruby-lang.org>
Fixes: [Feature #13017]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57382 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment