New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: Extend Go's map crypto hash guarentee to all platforms in 1.5 #9365

Closed
tildeleb opened this Issue Dec 17, 2014 · 17 comments

Comments

Projects
None yet
9 participants
@tildeleb

tildeleb commented Dec 17, 2014

I originally wrote this as a response to #9295 but then realized that issue is closed and a patch is pending.
Also see OneOfOne/xxhash#2

It would be much better to extend Go's map crypto hash guarentee to all platforms.

With all due respect I think the proposed new fallback hash function is a mistake. Seriously, I was about to make a "Merry Christmas" post on go-nuts thanking the Go Authors for using a crypto quality hash to implement maps on X86-64. As many other languages struggle with their hash functions being hacked with differential crypto, Go is in much better shape.

My proposal is SipHash.
https://131002.net/siphash/

On that page note "C++ program to find universal (key-independent) multicollisions for CityHash64"
Run the program and see how trivial it is to generate multi collisions.

I wonder how many users have Go programs and APIs that accept uncontrolled input over the web and pump that into a map? The Go team has an opportunity here to protect users from themselves starting with 1.5.

There is already an X86_64 optimized version of SipHash here and it's in the "public domain".
https://github.com/dchest/siphash

I'll try to post some numbers in a few days showing that SipHash could be a viable replacement for the fallback hash.

FWIW, I am working on Go project to offer a large number of hash functions along with testing and benchmarking under one project here:
https://github.com/tildeleb/hashland

I took SMHasher and aeshash code from the Go runtime. There is also a proposal for a new, non streaming hash interface, and more. It's still in pretty rough shape, as I've been hacking on it every day. The benchmarking that it does is currently kind of broken and suffers from using an generic adapter function. I am working on that.

I really hope Go 1.5 can extend map's crypto guarentee to all platforms.

@cespare

This comment has been minimized.

Contributor

cespare commented Dec 17, 2014

To what existing guarantee are you referring?

@minux

This comment has been minimized.

Member

minux commented Dec 17, 2014

I think he means (strong) collision resistant.

If the OP could demonstrate that it's possible for an adversary to generate
collision for khr's new fallback hash proposal, then the chance that we
switch to siphash is better.

I'd rather implement AES hash for power64 and the upcoming arm64 port,
which have better chance of being used as server platforms. (both have
AES instructions on the server variants.)

@davecheney

This comment has been minimized.

Contributor

davecheney commented Dec 17, 2014

OP, could you be more specific about platforms that are exposed? You mention that amd64 is not affected, but then leave the loop open.

Do you mean arm, i386, or power64 ?

On 17 Dec 2014, at 20:34, Minux Ma notifications@github.com wrote:

I think he means (strong) collision resistant.

If the OP could demonstrate that it's possible for an adversary to generate
collision for khr's new fallback hash proposal, then the chance that we
switch to siphash is better.

I'd rather implement AES hash for power64 and the upcoming arm64 port,
which have better chance of being used as server platforms. (both have
AES instructions on the server variants.)

Reply to this email directly or view it on GitHub.

@minux

This comment has been minimized.

Member

minux commented Dec 17, 2014

Just FTR, both amd64 and i386 will use the AES hash if AESNI is available.

@tildeleb

This comment has been minimized.

tildeleb commented Dec 17, 2014

  1. Right. As of 1.4 any X86 platform without AESNI and all other platforms are vulnerable.

  2. I want to say up front that I am not a crypto hacker.

  3. The guarantee I am referring to is that some effort was expended to give the X86 platforms with AESNI a cryptographic based hash used in the implementation of map. That means it's much harder, if not impossible, to generate a (large) set of keys that produce identical hashes.

  4. The primary concern here is "hash-flooding DoS" attacks that drive an map/hash table insert of N items from O(N) to O(N^2). See the slides in #3, the code there, and read these:
    http://www.rootsecure.net/content/downloads/pdf/dos_via_algorithmic_complexity_attack.pdf
    https://131002.net/siphash/siphash_slides.pdf

  5. Note the last paper was written in 2003. These attacks have been known for a long time and just ignored. Differential cryptanalysis was "discovered" in the 1980's, known to IBM since 1974, and known to the NSA before that. So the basis for these attacks is at least 45+ years old! See the Wikipedia.

  6. If you read my OP there is a pointer to:
    https://131002.net/siphash/
    On that page note "C++ program to find universal (key-independent) multicollisions for CityHash64"
    Run the program and see how trivial it is to generate multi collisions. It's a very portable piece of C code. You type "make" and it just compiles. Let me do it for you:

    leb@hula:/gotest/src/github.com/tildeleb/hashland/citycollisions % make
    g++ citycollisions.cc city.cc -o citycollisions -O3
    g++ citycollisions.cc city.cc -o citycollisions_ascii -DASCII -O3
    leb@hula:
    /gotest/src/github.com/tildeleb/hashland/citycollisions % ./citycollisions 4
    128-bit key 1e20798f100c668c499d50fb660caabc
    CityHash64( 8e69324ad2a005ff2148534148202020, 16 ) = b85d70a55a402013
    CityHash64( 7ae31886221136ba2148534148202021, 16 ) = b85d70a55a402013
    CityHash64( a9a6b9c6888e94ff2148534148202022, 16 ) = b85d70a55a402013
    CityHash64( 85fdcf8310e3e9552148534148202023, 16 ) = b85d70a55a402013
    the program runs instantly and will generate as many collisions as you need.

  7. I am sure that khi's algorithm which is based on CityHash and XXHash is subject to the same differential crypto attacks used against Murmur3 and CityHash but I can't generate such an attack myself (yet).

  8. Google must have some cryptographers on staff. The AESNI hash functions used by map should be reviewed and vetted by someone with crypto credentials and then peer reviewed. The Hash32() and Hash64() variants only do 3 rounds using 2 alternating rounds keys. That's not many rounds. Hash() and HashStr() do a variable number of rounds based on the key length with a minimum of 5 rounds, again using 2 alternating round keys. It might be that more rounds or more round keys are called for, More rounds will slow it down. Adding more rounds keys probably won't delta cache effects which I wouldn't expect for small deltas of more round keys.

  9. While I don't have the numbers to back it up yet I suspect that SipHash with assembler benchmarks very close to AESNI on the same machines. If that's the case, I think I still vote for SIpHash over AES with special instructions.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Dec 17, 2014

I am not a crypto expert. But I believe that the Go runtime is somewhat resistant to this kind of attack because every map uses an individual hash seed that is chosen randomly at run time. Since an attacker who is not on the local machine has very limited visibility into map lookup times, I think it would be quite difficult to run such an attack remotely.

@ianlancetaylor ianlancetaylor changed the title from Extend Go's map crypto hash guarentee to all platforms in 1.5 to runtime: Extend Go's map crypto hash guarentee to all platforms in 1.5 Dec 17, 2014

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Dec 17, 2014

@tildeleb

This comment has been minimized.

tildeleb commented Dec 17, 2014

I am also not a crypto expert. The slides below show the hash flood attacks are independent of seed for CityHash and Murmur3.
https://131002.net/siphash/siphash_slides.pdf

Here is an example I just ran in HashLand that demonstrates seed independent collisions, what they call "multicollisions", for murmur3, 32 bit:

seed=0, key=daf1596449909da0dde3987638909728, hash=0xf36bb110
seed=0, key=32937a594990ec64dde3987638909728, hash=0xf36bb110
seed=0, key=daf1596449909da08542160a38904864, hash=0xf36bb110
seed=0, key=32937a594990ec648542160a38904864, hash=0xf36bb110
seed=1, key=daf1596449909da0dde3987638909728, hash=0xf36bb110
seed=1, key=32937a594990ec64dde3987638909728, hash=0xf36bb110
seed=1, key=daf1596449909da08542160a38904864, hash=0xf36bb110
seed=1, key=32937a594990ec648542160a38904864, hash=0xf36bb110

Just to be clear these keys are strings stored in a file in utf8/hex format and I converted them to []byte.

The source code that generated these keys is referenced above.

@randall77

This comment has been minimized.

Contributor

randall77 commented Dec 17, 2014

thanking the Go Authors for using a crypto quality hash to implement maps on X86-64

That's not what we did. AesHash is not a crypto quality hash. Despite its name, it has no crypto guarantees. Its only relation to AES is that it coopts assembly instructions that were originally designed to make AES faster. AesHash's main features are that it is fast and that it passes SMHasher, a good (non-cryptographic) hash test suite. A nice-to-have additional feature is that it can fold in some process startup randomness to thwart DoS attacks (In addtion to the per-map random seed that Ian mentioned).

I intend to add some startup randomness to the new hash functions once they are in. It should only require a judicious xor or two to add that in.

SipHash-2-4 is about 4x slower on small keys and 10x slower on large keys than the new functions for 1.5. (Comparing C versions of both)

@tildeleb

This comment has been minimized.

tildeleb commented Dec 18, 2014

I appreciate you clearing up the fact that aesHash is not a crypto quality hash. Unfortunately, "now matters are worse."

I still feel strongly that Go should seriously consider using a crypto quality hash for the map implementation in 1.5. As the code and slides referenced above demonstrate, using a seed with CitiyHash and Murmur3 is completely broken. Pick any seed you want, it doesn't make any difference. Yet, you still mention the "per-map random seed". It almost seems as if you are claiming you can thwart "hash-flooding DoS" attacks with your new hash code by doing a "judicious xor or two" combined with a random seed based on "some process startup randomness"? Is that your claim? If so, I suggest you write it up and present a paper for peer review. Why not send it to the SipHash authors for review? I am serious. Better to be proactively picked apart then to cower after you've been hacked.

One fine point about SipHash. If SipHash is parameterized (the benchmarked version below is not) without a performance loss and Go's map were ever hacked, binaries could be patched to increase c and/or d (# rounds) and the attacks could be instantly mitigated.

My AesHash vs SipHash benchmarks are very different than the numbers you mention. Mine are Go 1.4 w/ asm numbers. Only the hash function is called and the result is not saved. In my numbers siphash starts off 1.5 times slower for a 4 byte keys and ends up 2.1 times slower with 1k keys. My machine is an Intel i7-2860QM @ 2.5 GHz. Code is all in HashLand but you have to recompile to get these results. Results below. Assume for a second my numbers are correct.

I understand that everyone is motivated to increase the performance of Go. Me too, that's why I wrote #9337. Creating a new language is more than winning the benchmark wars. Go isn't going to win that one anyway.

Perhaps SipHash could be further optimized using AVX or AVX2 instructions. Perhaps the number of d rounds could be decreased. Maybe there's a faster crypto hash out there. Google should expend some of it's considerable resources to develop the next great high performance crypto hashing algorithm, pay some others to do so, or hold a $10M contest to find one.

Go can mitigate "hash-flooding DoS" attacks with a 2X performance hit to map. I admit that a 2X speed decrease is a big deal. Users would complain. Let's face the facts. Many users will continue to pump uncontrolled keys directly from the web into maps and put themselves, their employers, and maybe even our country, at risk. Look at the headlines today ("Sony"). Look at the slides I pointed to above where the SipHash authors pick appart Python, Java, and others. It takes real leadership to slow the map implementation down 1.5-2.1x to increase Go's security. Security is more important than it ever was. Performance is less important than it ever was. Please adjust your priorities. I beg you to reconsider.

"aeshash"
    ksize=4, n=100 M, size=400 MB, 62 Mhashes/sec, 246.8 MB/sec
    ksize=8, n=100 M, size=800 MB, 56 Mhashes/sec, 450.1 MB/sec
    ksize=16, n=100 M, size=1 GB, 57 Mhashes/sec, 905.6 MB/sec
    ksize=32, n=100 M, size=3 GB, 45 Mhashes/sec, 1.4 GB/sec
    ksize=64, n=100 M, size=6 GB, 31 Mhashes/sec, 2.0 GB/sec
    ksize=512, n=10 M, size=5 GB, 6 Mhashes/sec, 3.1 GB/sec
    ksize=1024, n=10 M, size=10 GB, 3 Mhashes/sec, 3.2 GB/sec

"siphash"
    ksize=4, n=100 M, size=400 MB, 41 Mhashes/sec, 165.4 MB/sec
    ksize=8, n=100 M, size=800 MB, 31 Mhashes/sec, 246.5 MB/sec
    ksize=16, n=100 M, size=1 GB, 25 Mhashes/sec, 405.3 MB/sec
    ksize=32, n=100 M, size=3 GB, 20 Mhashes/sec, 655.8 MB/sec
    ksize=64, n=100 M, size=6 GB, 14 Mhashes/sec, 924.2 MB/sec
    ksize=512, n=10 M, size=5 GB, 3 Mhashes/sec, 1.4 GB/sec
    ksize=1024, n=10 M, size=10 GB, 1 Mhashes/sec, 1.5 GB/sec
@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Dec 18, 2014

I couldn't get any useful information from the slides, but there is a good paper at https://131002.net/siphash/siphash.pdf . It basically argues that timing information is always available, even across the network. The basic attack on an HTTP server is to send many headers, attempting to get them to collide in such a way that hash lookups become less efficient. The Go net/http package is somewhat resistant to this kind of attack, because it limits the size of the headers in an incoming HTTP request. The default limit is 1MB. So an attack on the Go net/http package has only a limited number of strings to play with. And each new HTTP request gets a new map, with a different seed, so the earlier timing information is useless. So I don't see a serious attack on Go's HTTP server here.

An attack requires a single map stored across multiple requests, such that each new request can add entries to the map, with no limit on the number of entries, and where timing information is available for each new request. When that is the case, an attacker can cause the map to flood. In the case of Go's implementation this will mean a long series of overflow buckets for the same hash value, shifting map lookup time, and, perhaps more importantly, map insertion time, from O(1) to O(N).

It's difficult for me to judge the risks here. In a modern server, a single map that retains entries across multiple requests does not seem to be a likely case. A single map that accepts new entries with no bounds also seems unlikely, and such a map would seem to be susceptible to much simpler attackes. But it is certainly possible. The question at hand is whether to slow down the map implementation for everybody to protect against this unlikely but possible case.

Another possible approach would be to change the hash seed each time the hashmap grows. That would require sometimes computing the hash key twice while map growth was in progress. I think we already rehash each key as we evacuate an old bucket, so I don't think it would cost anything there. I think this would significantly reduce the scope of any attack, as the seed would have to be recomputed each time the hash map grows.

Anyhow, khr can make a real decision here. I'm having a hard time translating from the theoretical attacks on things like caching DNS servers to practical attacks on real Go programs, but there could well be aspects of this that I am missing.

@randall77

This comment has been minimized.

Contributor

randall77 commented Dec 18, 2014

As the code and slides referenced above demonstrate, using a seed with CitiyHash and Murmur3 is completely broken. Pick any seed you want, it doesn't make any difference.

It is not a surprise that hashes not designed for DoS resistance have no DoS resistance. For instance, CityHash does f(g(msg), seed). Collisions in g cause collisions in CityHash independent of seed. My conclusion: don't fold in the seed like that if you want DoS resistance.

It almost seems as if you are claiming you can thwart "hash-flooding DoS" attacks with your new hash code by doing a "judicious xor or two" combined with a random seed based on "some process startup randomness"? Is that your claim?

Yes, and there is an existence proof. SipHash does exactly that with 4 xors.

My AesHash vs SipHash benchmarks are very different than the numbers you mention.

That's something we should figure out one way or another. I found a performance bug in my implementation (byte instead of word loads for siphash). After fixing it I get more like 2.6x/6.7x (short/long) compared to the new hash algorithm and 2.9x/12.6x compared to aeshash.

My testbed is inside the SMHasher test suite, instructions to grab it and try it yourself are below.
Part of it may be overheads - from your aes numbers it looks like you're getting 40 cycles for a 4-byte key and I'm getting 23 cycles. The more overhead, the less difference you see between implementations.

Download http://keithandkatie.com/downloads/smhasher-gohash.tar.gz
tar xvfz ~/Downloads/smhasher-gohash.tar.gz
cd smhasher-gohash
mkdir build
cd build
cmake ..
make
./SMHasher SipHash64
./SMHasher GoHash64
./SMHasher AesHash64

Look at the headlines today ("Sony").

Equating DoS attacks with attacks like these doesn't help your argument.

As Ian said, no one is claiming that DoS attacks aren't worth defending against. The question is one of cost to every other user of maps. Also, DoS defenses don't have to be perfect. A low-cost solution that makes DoS attacks 100x harder may be better than a high-cost solution that eliminates them entirely.

Finally, note that the new fallback hash that started this whole conversation is 1e9x better than the old fallback hash. So, progress.

@mortdeus

This comment has been minimized.

mortdeus commented Dec 27, 2014

Perhaps it's worth considering overloading the make builtin for maps to take an optional hash.Hash interface argument that allows the developer to override the default hashing algorithm with whatever hashing algorithm they want?

@slimsag

This comment has been minimized.

slimsag commented Dec 27, 2014

Perhaps it's worth considering overloading the make builtin for maps to take an optional hash.Hash interface argument that allows the developer to override the default hashing algorithm with whatever hashing algorithm they want?

That does not sound like a good thing to do, IMHO. Maps should work bets for the most generic cases, anything else can be implemented elsewhere. I am not a crypto expert but I think this issue is about finding a way to improve maps in generic cases -- not adding special ones.

@mortdeus

This comment has been minimized.

mortdeus commented Dec 27, 2014

@slimsag Its worth pointing out that what I am recommending is an optional argument that is analogous to the optional "cap" argument when using the make built in to allocate a slice.

In other words the feature isn't something developer's have to interact with in general unless they explicitly want to use the feature.

In my opinion I don't think every map in a user's program would always benefit more from the security add by a crypto hash, than it would from the performance it must sacrifice in order to use a more computationally expensive hash.

Also its worth noting that allowing us to override the hash function could allow us to return a map's value's hash along side the actual value in a map lookup statement, (e.g val, hash := myMap[key]) which could be valuable in certain contexts.

@minux

This comment has been minimized.

Member

minux commented Jan 1, 2015

Modifying the language for optional security feature won't work well, because that ultimately
depends on the author of a package, who can't control where the package is going to be
used.

The author of a package might think that a certain map in this package is not going to hold
values controllable by external attackers, but that really depends on where the package is
used. What if the package is later used to receive inputs controlled by external attackers?

Even if we had such a feature of optionally using hardened hash functions for certain maps,
the only safe option for a package author is to always enable that for every map, because
the author can't be certain all the use cases of the package.

That said, I don't think there is any evidence that the current hash functions are insecure
against algorithmic DDoS. Unless there is concrete proofs, I don't think we need to do any
thing for this.
(By concrete proof, I really mean that you need to at least point out how the attacker can
reasonably get around of the per-map seed. Don't just cite some papers discussing some
other hashes, read the source, and see if the attack in those papers are applicable to ours.)

@minux

This comment has been minimized.

Member

minux commented Jan 1, 2015

I'm closing this issue, because I think there is nothing left to do.

@minux minux closed this Jan 1, 2015

gopherbot pushed a commit that referenced this issue Jan 7, 2015

runtime: use some startup randomness in the fallback hashes
Fold in some startup randomness to make the hash vary across
different runs.  This helps prevent attackers from choosing
keys that all map to the same bucket.

Also, reorganize the hash a bit.  Move the *m1 multiply to after
the xor of the current hash and the message.  For hash quality
it doesn't really matter, but for DDOS resistance it helps a lot
(any processing done to the message before it is merged with the
random seed is useless, as it is easily inverted by an attacker).

Update #9365

Change-Id: Ib19968168e1bbc541d1d28be2701bb83e53f1e24
Reviewed-on: https://go-review.googlesource.com/2344
Reviewed-by: Ian Lance Taylor <iant@golang.org>

@golang golang locked and limited conversation to collaborators Jun 25, 2016

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.