-
-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
About the security claims in README #17
Comments
Yo can use any MAC function, in particular UMAC and VMAC are very fast on large inputs. Also look at HighwayHash. They claim MAC properties, but dont published any analysis yet. |
No. In the simplest case a non-linear search, such as logarithmic, avoids O(n), but even normal linear open addressing schemes, such as Robin Hood were recently proved to be constant in the worst case. The hash function or the table size has not much to do with that (besides that it should not be degenerated).
That's why the tests are here to find such degenerated hash functions. I.e. perl5 used such a function until last year.
If the attacker wants to do a DDOS hash table there are much easier ways to perform that than finding the seed and generating collisions. Crypto analysis on a 32bit hash function for seed recovery is not needed because you can trivially brute force it or solve it with a simple solver. If Siphash or AES or SHA3 truncated to 32bit or any other. This is pure smoke & mirrors. There are some proven secure hash tables schemes which cannot degenerate to O(n) in the worst case, but they are only UHASH, Robin-Hood and some slow ones, like Cuckoo or ordered tables. |
Can you demonstrate that with SipHash? I.E. generation of collisions without knowing the seed faster than one success per 2^32 attempts |
No, but people demonstrates that with sha1 and sha2 via z3 or smtlib, not even using faster and easier methods. So symbolizing and solving siphash is considered trivial compared to SHA2 or md5, with a bit of data and known code. I brute forced 32bit hashes in 4-5 seconds. |
Can you please point us to the text demonstrating how to produce sha2-32 collisions faster than with brute force? I'm sure that you misundertood something since there are no known attacks at sha2 yet. |
With some state known, i.e. the collisions or the traversal order it's easy. E.g. https://jheusser.github.io/2013/02/03/satcoin.html for SHA-256 with several solvers, and also see the links in the References section. Esp. with cbmc. Formerly people encoded it into smtlib or z3, which was quite some work. You don't attack the hash function per se, you attack the hash table state to get the seed and then produce the collisions brute force. With a small table size you get a lot of easy to find collisions. Only ~10 rightmost bits needed. |
You can never trust the 32bit hash, you always have to compare the length and full keys with strcmp or better memcmp. Trusting a 32bit hash or anything <256 would be foolish. And yes, this is bad. That's why fast HTTP parsers use better tricks than hash lookups. They usually look at the first 32 or 64bit word and cmp that in one instruction. |
Thanks for the comments.
This is indeed the case and why hash tables with non-random keys or flaky characteristics should not be used on untrusted data. And it's the reason why Perl, Python, Ruby etc changed from DJB33*, FNV1 and other easy to attack hashes to SipHash. In your opinion, they could have picked any hash function (e.g. Murmur3) with a random seed? The attack on hash tables was based on easily being able to generate collisions, causing the http header hash table to hog the CPU memcmp'ing the key data while using very little network bandwidth.
Isn't this the reason why difficulty of key recovery is a characteristic of a good, secure hash function?
You imply that generating collisions is trivial, and not dependent on the hash function. Can you link to an attack demonstrating this? The satcoin example was intriguing, are you aware of this being applied to hash table DDoSing?
I guess this was why I opened this issue. The readme makes claims that counter research papers on the subject but don't cite any sources or give any background. You seem to be convinced that hash function security is a non-issue but could you back this up so it would be convincing to others too? You seem to have a wealth of knowledge on the topic, I would appreciate if you'd augment the readme with some references on the topic. Thanks. |
No, with a proper hash table you can guarantee constant lookup even for the worst case. But the hash function has almost nothing to do with that. (besides uhash, which uses two random simple mult hash functions, simpler than FNV1)
A random seed helps, but was not enough. SipHash does not help at all. It doesn't help security, it only makes it slower. The best is to forget about linear chaining. This is the real culprit.
Yes, I know the attacks. There are many similar scenarios. glibc, the linux kernel, most routers, dns resolvers, all use proper hash tables, which are mostly fast. To counter a DDOS just detect it, like collisions & (collisions % 16) and sleep a while. Or limit the attack vector as PHP did (MAX_POST_SIZE). Or randomize the iterator as Perl did to hide the seed (not a good idea, but still). collisions>8 don't appear in the wild, unless you are using linear probing and a higher fill factor. Where the collisions are all in the cache and so it doesn't cost at all.
A hash function cannot be secure by default. This is not possible with max 32bit, where only 7-10 bits are used to attack it.
Nope. I don't have the time to write such a thing, academia or script kiddies will do sooner or later.
No. Just timing attacks and using solvers to create hash function collisions, currently MD5.
It's pure logic. 32bit hash functions cannot be called secure. They can only be broken or not, or fast or not. Siphash has a prominent name behind it, but this still doesn't defy common sense. And it does harm to all those dynamic languages using it. There are also much more silly theories in hash tables in the wild, such as using primes for the size for better module distribution, esp. with open addressing. As in glibc and in the linux kernel. |
I'm not convinced that the claims about resistance to "hash flooding" algorithmic complexity DDoS attacks in the SECURITY chapter of README.md are correct. I'll address some of the claims.
The hash table collision resolution scheme does affect the performance but does not address the O(n) issues with hash collisions. A good resolution scheme can only help when (hash % tablesize) is equal between two hashes, but the hashes are not equal.
If two keys in the hash table give bit-to-bit identical hashes, the hash table implementation has to use strcmp() or similar to compare the entire key byte-by-byte. This will result in O(n^2) work with regards to the size of the input keys. This is bad when hashing HTTP request headers or other untrusted data.
This issue is made worse in the very simple hash functions because it's trivial to create hash collisions by appending a few bytes to the end of a key and strcmp() can't early exit before scanning the whole input.
Indeed, the motivation for SipHash is to make recovering the key from timing information (or other side channels) difficult. It's the only hash function that I could find with any credible, peer-reviewed cryptanalysis on key recover.
Too bad that SipHash is pretty slow (2000 MB/s on my Skylake, the SSE2, SSSE3 and AVX2 implementations were slower) due to poor instruction level parallelism caused by explicit dependencies between instructions (in particular when implemented with SIMD). There are hash functions in smhasher that perform 5x to 15x faster than that.
If anyone is aware of any better performing hash functions that have been designed and analyzed with key recovery difficulty in mind, please do share.
Please discuss.
The text was updated successfully, but these errors were encountered: