Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Siphash to use one of the faster variants of the algorithm (Siphash13, Highwayhash) #29754

Closed
funny-falcon opened this Issue Nov 10, 2015 · 77 comments

Comments

Projects
None yet
@funny-falcon
Copy link

commented Nov 10, 2015

Jean-Philippe Aumasson answered on "can Siphash13 used instead of Siphash24 for hash function of hash tables":

we proposed SipHash-2-4 as a (strong) PRF/MAC, and so far no attack whatsoever has been found,
although many competent people tried to break it. However, fewer rounds may be sufficient and I would
be very surprised if SipHash-1-3 introduced weaknesses for hash tables.

So that, there is no reason to burn more cpu power on redundant rounds.

@funny-falcon funny-falcon changed the title Use lightweighter Siphash variant: Siphash13 instead of Siphash24 Use light Siphash variant: Siphash13 instead of Siphash24 Nov 10, 2015

@steveklabnik steveklabnik added the A-libs label Nov 10, 2015

@bstrie

This comment has been minimized.

Copy link
Contributor

commented Nov 10, 2015

Sounds like it would only take a two-line change to test the perf impact of this change. @Gankro, did you have some hashtable benchmark scripts lying around?

@arthurprs

This comment has been minimized.

Copy link
Contributor

commented Nov 10, 2015

That'd be interesting. Sub.

@Gankro

This comment has been minimized.

Copy link
Contributor

commented Nov 10, 2015

You can fork http://cglab.ca/~abeinges/blah/hash-rs/ if you want to do some ok-ish comparisons.

@bluss is my goto expert on siphash.

@Gankro

This comment has been minimized.

Copy link
Contributor

commented Nov 10, 2015

Note that direct consumers of SipHash may break if they decided to manually seed it and store the results. So this isn't a totally free change.

@bluss

This comment has been minimized.

Copy link
Contributor

commented Nov 11, 2015

Aumasson is the cryptographer, maybe we can ask him what he thinks about Rust going for SipHash-1-3 specifically? I'm curious what people have learned about SipHash in general now that it's been out for several years.

If we weaken it so much that it doesn't prevent hash flooding anymore, then we lose the reason to use SipHash and could use a faster hash function.

I know I tried long ago and noted that SipHash-1-2 didn't even pass the Smhasher testsuite (which is probably bad). (SipHash-N-M is for N hash rounds per 8 bytes of input and M fixed rounds as finalization).

@funny-falcon

This comment has been minimized.

Copy link
Author

commented Nov 11, 2015

Siphash-1-3 passes Smhasher. It is smallest configuration, which has good avalanche.

Siphash is protected from "seed independent collisions" with any configuration with good avalanche.

Siphash24 is recomended for MAC cause when you do MAC you put result in public, so there are other vectors of attack: plain text, chosen plain text, differential, etc. As cryptographers, Siphash authors simply recommend more rounds than minimal required to protect against future investigations.

Within hash-table result of hash-function is not published, so there is no way to recover seed. Then protection from "seed independent collisions" is just enough.

Note that direct consumers of SipHash may break if they decided to manually seed it and store the results. So this isn't a totally free change.

Agree. There should be separate SipHash13 for internal/hash-table usage, and SipHash (24) remains for direct/external usage.

@funny-falcon

This comment has been minimized.

Copy link
Author

commented Nov 11, 2015

Aumasson is the cryptographer, maybe we can ask him what he thinks about Rust going for SipHash-1-3 specifically?

I asked him year ago about using SipHash13 for hash tables. Answer is in issue text.

I think, we can ask him again.

@tarcieri

This comment has been minimized.

Copy link

commented Nov 12, 2015

If @veorq feels its fine then it should be fine. The worst case is there's a hashDoS against Siphash13, in which case you can upgrade to a stronger hash function.

@bstrie

This comment has been minimized.

Copy link
Contributor

commented Nov 12, 2015

@Gankro, how likely do you think this is to break code in the wild? I was under the impression that we considered the precise underlying default hash function to be an implementation detail (with the caveat that you can count on it to be cryptographically secure).

@tarcieri

This comment has been minimized.

Copy link

commented Nov 12, 2015

@bstrie from a security perspective, I'd say you definitely should not couple to the hash function in any way. Java made that mistake (the hashing algorithm was included in the Java Language Specification) and that made it so they were not able to adequately respond to hashDoS.

@bstrie

This comment has been minimized.

Copy link
Contributor

commented Nov 12, 2015

@tarcieri That's actually tied in with what I'm implying here. Supposing that we did switch to 1-3, and that an attack on 1-3 was later discovered, we'd want to reserve the right to bump it back up to 2-4 (or, you know, whatever the state of the art is by then) without risking massive breakage in the ecosystem. If changing the default hash function should prove to be disruptive at the current point in time then that's a separate problem that we need to consider, hence why I'm curious as to the true magnitude of such a change.

Furthermore, I get the impression that the breakage that Gankro's envisioning wouldn't be detectable by Crater.

@bstrie bstrie added the I-slow label Nov 12, 2015

@bluss

This comment has been minimized.

Copy link
Contributor

commented Nov 12, 2015

Using 1-3 doesn't fix the fundamental slowness of our hashing, but it will instead speed up the long data case. The short data case remains a problem, due to (I think):

  • Lack of one-shot hashing, means the code visits branches to read and or update the partially written tail
  • Strings and slices write both data and length / delimiter, which is slow since it enters SipHash::write twice, hitting the code in the first point.

Of course we should still go for it, if possible, but I don't think it's a substitute for allowing faster alternatives to be used (application may choose to sacrifice hash flooding defence, deciding it's not useful for them).

@tarcieri

This comment has been minimized.

Copy link

commented Nov 12, 2015

@bstrie it seems when Java was last affected by hashDoS they ultimately ended up making a breaking change to the language specification:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-May/010238.html

@ranma42

This comment has been minimized.

Copy link
Contributor

commented Nov 12, 2015

Did anybody experiment adding padding to 8 bytes in SipHash?
In particular, if we padded whatever is input to 8 bytes, it would be possible to remove the ntail handling code here

if self.ntail != 0 {
needed = 8 - self.ntail;
if length < needed {
self.tail |= u8to64_le!(msg, 0, length) << 8 * self.ntail;
self.ntail += length;
return
}
let m = self.tail | u8to64_le!(msg, 0, needed) << 8 * self.ntail;
self.v3 ^= m;
compress!(self.v0, self.v1, self.v2, self.v3);
compress!(self.v0, self.v1, self.v2, self.v3);
self.v0 ^= m;
self.ntail = 0;
}

and
self.tail = u8to64_le!(msg, i, left);
self.ntail = left;

@veorq

This comment has been minimized.

Copy link

commented Nov 12, 2015

SipHash designer here, haven't changed my opinion about SipHash-1-3 :-)

A warning, though: SipHash-1-3 leaves 4 rounds between the last attacker-controlled input and the output. There's a "distinguisher" on 4 rounds in https://eprint.iacr.org/2014/722.pdf, or in simplest terms a statistical bias that shows up given a specific difference pattern in the input of the 4-round sequence. But you can't inject that pattern in SipHash-1-3 because you don't control all the state. And even if you could inject that pattern the bias wouldn't be exploitable anyway.

@arthurprs

This comment has been minimized.

Copy link
Contributor

commented Nov 12, 2015

@bluss yeah, sooner of later that will need to be addressed as well.

@bstrie

This comment has been minimized.

Copy link
Contributor

commented Nov 12, 2015

@veorq, thanks for your input!

@bluss, indeed I don't believe anybody here is suggesting this would obviate the desired ability to provide an alternative hashing algorithm.

@bstrie

This comment has been minimized.

Copy link
Contributor

commented Nov 12, 2015

@ranma42, would you be so kind as to open an issue for that? It likely deserves its own discussion and shouldn't get tangled up with this one.

@tarcieri, do you know of any experience reports from the fallout of that change in the broader Java ecosystem?

@arthurprs

This comment has been minimized.

Copy link
Contributor

commented Nov 12, 2015

I benchmarked it with @Gankro machinery, it definitely improves the speed. See https://i.imgur.com/5dKecOW.jpg

As we got confirmation it's secure enough it looks like a worthy improvement.

@Gankro

This comment has been minimized.

Copy link
Contributor

commented Nov 12, 2015

Yeah, looks good. Only worry is people "inappropriately" depending on the algorithm being stable.

@arthurprs

This comment has been minimized.

Copy link
Contributor

commented Nov 12, 2015

Can't we include this variant as SipHasher13 and set it as the default hasher? I don't see how this can break anybodys code. Am I missing something?

@bluss

This comment has been minimized.

Copy link
Contributor

commented Nov 13, 2015

I don't think we should change std::hash::SipHash, since it's stable and documented to be SipHash-2-4. We can keep the new hasher private for now.

@bstrie

This comment has been minimized.

Copy link
Contributor

commented Nov 14, 2015

@bluss I can imagine support for an RFC to deprecate std::hash::SipHash and replace it with std::hash::SipHash24 and std::hash::SipHash13.

@bstrie

This comment has been minimized.

Copy link
Contributor

commented Nov 14, 2015

@Gankro, I'm leaning towards saying that this warrants an RFC, unless you think breakage will be essentially nanoscule. (Did we really document that our SipHash variant is specifically 2-4, and did we make it sound like that's guaranteed? If so, that's worth changing too.)

@bluss

This comment has been minimized.

Copy link
Contributor

commented Nov 14, 2015

I think our HashMap is completely free to pick its hash function. Changing it should not be a problem.

@Gankro

This comment has been minimized.

Copy link
Contributor

commented Nov 14, 2015

SipHash 2-4 is, as far as I can tell, "the" SipHash, so it's totally sound for it to be exposed as SipHash. Having WeakSipHash or whatevs should be fine.

@briansmith

This comment has been minimized.

Copy link

commented May 27, 2016

suggest proceeding with the Siphash13 ideia as it looks like a good compromise.

The benchmarks (https://i.imgur.com/5dKecOW.jpg) show real improvements (specially for smaller sizes, in which we are really slow right now) and it seems secure enough. Remember, it can always be upgraded without breakage.

Why not simply use a faster implementation of SipHash-2-4? Rust is currently using a known-to-be-slow implementation, so simply optimizing the implementation of the stronger, recommend, algorithm seems like the natural first step.

@arthurprs

This comment has been minimized.

Copy link
Contributor

commented May 27, 2016

@briansmith sounds like a good idea to me.

@ticki

This comment has been minimized.

Copy link
Contributor

commented May 27, 2016

I've been working on trhasher, which can be used to benchmark such an implementation.

@funny-falcon

This comment has been minimized.

Copy link
Author

commented May 27, 2016

@briansmith cause SipHash-1-3 is already secure enough for any usage in hash table.
More over, it is still not broken for any other usage, so at this moment of time, it is still cryptographically secure.
It is better to use "faster implementation of SipHash-1-3", than "faster implementation of SipHash-2-4".

bors added a commit that referenced this issue Jun 28, 2016

Auto merge of #33940 - seanmonstar:siphash-1-3, r=alexcrichton
hashmap: use siphash-1-3 as default hasher

Also exposes `SipHash13` and `SipHash24` in `core:#️⃣:sip`, for those that want to differentiate.

For motivation, see this quote from the original issue:

> we proposed SipHash-2-4 as a (strong) PRF/MAC, and so far no attack whatsoever has been found,
although many competent people tried to break it. However, fewer rounds may be sufficient and I would
be very surprised if SipHash-1-3 introduced weaknesses for hash tables.

This keeps a type alias of `SipHasher` to `SipHash24`, and since the internal default hasher of HashMap is specified as "not specified", changing it should not be a breaking change.

Closes #29754

bors added a commit that referenced this issue Jun 30, 2016

Auto merge of #33940 - seanmonstar:siphash-1-3, r=alexcrichton
hashmap: use siphash-1-3 as default hasher

Also exposes `SipHash13` and `SipHash24` in `core:#️⃣:sip`, for those that want to differentiate.

For motivation, see this quote from the original issue:

> we proposed SipHash-2-4 as a (strong) PRF/MAC, and so far no attack whatsoever has been found,
although many competent people tried to break it. However, fewer rounds may be sufficient and I would
be very surprised if SipHash-1-3 introduced weaknesses for hash tables.

This keeps a type alias of `SipHasher` to `SipHash24`, and since the internal default hasher of HashMap is specified as "not specified", changing it should not be a breaking change.

Closes #29754

bors added a commit that referenced this issue Jul 1, 2016

Auto merge of #33940 - seanmonstar:siphash-1-3, r=alexcrichton
hashmap: use siphash-1-3 as default hasher

Also exposes `SipHash13` and `SipHash24` in `core:#️⃣:sip`, for those that want to differentiate.

For motivation, see this quote from the original issue:

> we proposed SipHash-2-4 as a (strong) PRF/MAC, and so far no attack whatsoever has been found,
although many competent people tried to break it. However, fewer rounds may be sufficient and I would
be very surprised if SipHash-1-3 introduced weaknesses for hash tables.

This keeps a type alias of `SipHasher` to `SipHash24`, and since the internal default hasher of HashMap is specified as "not specified", changing it should not be a breaking change.

Closes #29754

bors added a commit that referenced this issue Jul 1, 2016

Auto merge of #33940 - seanmonstar:siphash-1-3, r=alexcrichton
hashmap: use siphash-1-3 as default hasher

Also exposes `SipHash13` and `SipHash24` in `core:#️⃣:sip`, for those that want to differentiate.

For motivation, see this quote from the original issue:

> we proposed SipHash-2-4 as a (strong) PRF/MAC, and so far no attack whatsoever has been found,
although many competent people tried to break it. However, fewer rounds may be sufficient and I would
be very surprised if SipHash-1-3 introduced weaknesses for hash tables.

This keeps a type alias of `SipHasher` to `SipHash24`, and since the internal default hasher of HashMap is specified as "not specified", changing it should not be a breaking change.

Closes #29754

bors added a commit that referenced this issue Jul 1, 2016

Auto merge of #33940 - seanmonstar:siphash-1-3, r=alexcrichton
hashmap: use siphash-1-3 as default hasher

Also exposes `SipHash13` and `SipHash24` in `core:#️⃣:sip`, for those that want to differentiate.

For motivation, see this quote from the original issue:

> we proposed SipHash-2-4 as a (strong) PRF/MAC, and so far no attack whatsoever has been found,
although many competent people tried to break it. However, fewer rounds may be sufficient and I would
be very surprised if SipHash-1-3 introduced weaknesses for hash tables.

This keeps a type alias of `SipHasher` to `SipHash24`, and since the internal default hasher of HashMap is specified as "not specified", changing it should not be a breaking change.

Closes #29754

@bors bors closed this in #33940 Jul 2, 2016

funny-falcon added a commit to funny-falcon/ruby that referenced this issue Jul 9, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this issue Jul 10, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this issue Jul 10, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this issue Jul 10, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this issue Jul 12, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this issue Jul 12, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this issue Sep 5, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this issue Sep 26, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this issue Sep 26, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this issue Sep 27, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this issue Nov 2, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this issue Nov 4, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this issue Nov 5, 2016

random.c: use sip_hash13 as faster but still safe alternative to sip_…
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

funny-falcon added a commit to funny-falcon/ruby that referenced this issue Dec 8, 2016

switch SipHash from SipHash24 to SipHash13 variant
SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

hsbt pushed a commit to ruby/ruby that referenced this issue Jan 20, 2017

switch SipHash from SipHash24 to SipHash13 variant
SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

From: Sokolov Yura aka funny_falcon <funny.falcon@gmail.com>
Date: Thu, 8 Dec 2016 20:31:29 +0300
Signed-off-by: Urabe, Shyouhei <shyouhei@ruby-lang.org>
Fixes: [Feature #13017]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57382 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

@pjanx pjanx referenced this issue Oct 20, 2018

Closed

Benchmark #1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.