String hashes vulnerable to algorithmic complexity attacks #1616

Closed
kosta opened this Issue Jan 23, 2012 · 14 comments

Projects

None yet

7 participants

@kosta
kosta commented Jan 23, 2012

Hello!

The string hash function from here is vulnerable to an algorithmic complexity attack (from src/libcore/str.rs):

fn hash(&&s: str) -> uint {
    // djb hash.
    // FIXME: replace with murmur.

    let u: uint = 5381u;
    for c: u8 in s { u *= 33u; u += c as uint; }
    ret u;
}

Consider a web service that accepts e.g. a POST request in urlencoded form or JSON and constructs a map<String, String> from the POST data (extremely common case). For this hash function, it is extremely easy for an attacker to create input where all map keys will have the same hash. This allows for e.g. 1MB of POST data to get one CPU core busy for hours.

This is an issue that affects a lot of languages, see http://www.ocert.org/advisories/ocert-2011-003.html

Here a talk by the ppl creating the advisory: http://www.youtube.com/watch?v=R2Cq3CLI6H8

One solution is to use a random seed and to use XOR instead of adding. If this is to be replaced with murmur, make sure that the seed is random (I'm not an expert on murmur).

@marijnh
Contributor
marijnh commented Jan 23, 2012

Secure hashing and hash table hashing are different problems. This seems a bit of a non-issue.

@kosta
kosta commented Jan 23, 2012

This has nothing to do with cryptographic (=secure) hashing.

The issue is that simply filling a hash table with user-controlled strings can take hours if the user crafted malicious data (full of hash collisions).

If rust is a suitable choice to e.g. write a web server that accepts POST request, it would be undesired behaviour for a reasonably-sized POST request to eat hours of CPU time

@brson
Contributor
brson commented Jan 24, 2012

I've read this story, and I agree it's a real issue, and I'll be very happy if we ever reach a point where this turns into a vulnerability in production code. Patches welcome.

@kosta
kosta commented Jan 27, 2012

I'd like to write a patch but it seems there is no random number generator yet. Is that correct?

@marijnh
Contributor
marijnh commented Jan 27, 2012

There's std::rand. I still feel this bug in misguided. Our hash table allows any function to be plugged in, so for cases where it is important that attackers can't predict hashes, people can plug in a more complicated, safer function. For most situations, a simpler, faster function is preferable. Also, the same problem exists in every single hash function used in the Rust codebase, not just the one on strings.

@brson
Contributor
brson commented Jan 27, 2012

Also agreed with marijn. It's a much more pervasive issue than that one hash function.

@thoughtpolice
Contributor

Personally I also think this issue is somewhat misguided and can be mitigated by other means. I agree that something more robust than DJB hash should be used, but - and not to try and say this isn't an actual issue - is it not feasible to simply use a different data structure for such associative container, like a rebalancing binary search tree? This is the underlying implementation of Data.Map in Haskell and it works well. C++'s default associative container map is also a search tree underneath (and unordered_map is the hash-table version.)

The reason I say this is because they have better worst-case properties than a hash table which will degrade to O(n) with specially crafted input. Using a better hash function won't necessarily fix this worst case property (and let's not forget that randomness as a seed in a hash depends on a secure PRNG, which is a topic in and of itself.)

Yes, this is a real issue, and it's even affected the linux kernel. Sometimes the worst-case is actually very important as opposed to the average-case. I think it is strange that so many people default to hash tables for such containers. At best, this seems like an issue of people just using whatever the default their language provides them with - or what is familiar - as opposed to what makes sense (many scripting languages use hash tables as the default associative-data-structure as an implementation detail. It seems like "the most natural choice," but I'd argue that's a position of familiarity, not necessarily the reality.)

BST's are well understood, simple, and have predictable performance properties, as opposed to hash tables which have complicated performance properties due to both the hash and underlying implementation (and a much worse worst-case.) If you optimize and find out that dealing with the fields in POST data is a bottleneck of horrific proportions, then optimize with a hash table, and a random seed, whatever. Otherwise, use simple things that work and you'll avoid the problem.

Again I think everybody would accept a patch here and I'm not trying to say this is not an issue, but I feel that maybe these issues can be attacked in other, simpler ways with more predictable performance, as opposed to just adding randomness and a better hash, and hoping it goes away.

@killerswan
Contributor

I agree that maps are better than hashtables.

I have, however, been poking at a murmur3 implementation for a while now. I think it is probably still too slow, and it was the first thing I wrote in Rust. When I get around to it again, I'll ask for more optimization help: https://github.com/killerswan/murmur/blob/master/murmur.rs

I was also just reading about CityHash... http://code.google.com/p/cityhash/

@kosta
kosta commented Jan 30, 2012

So there's a lot of different points made here, let me re-cap

  1. marijnh: "For most situations, a simpler, faster function is preferable":

Fixing this is a matter of a) random seed generated once at each program startup, b) Using ^ instead of +, c) maybe change the 33 to a larger prime (still as easy to compute, e.g. 65537). This should not be slower nor too much complexity.

  1. marijnh: "the same problem exists in every single hash function used in the Rust codebase, not just the one on strings."

True. But a) ideally, all hash functions should be fixed and b) strings are the most popular dictionary key type, especially when you write a web service that accepts JSON or similar input.

  1. thoughtpolice/killerswan: maps are better than hashtables

For huge sizes, the log n average case difference can matter, that can be a valid reason to use a hash table. This is why hash tables should not be vulnerable to collision attacks. Independent of that point, it might be sensible to use maps as default container.

  1. killerswan: We should use murmur or CityHash or similar

Might be a good idea and would hopefully fix this bug (haven't looked at the actual code of those hash functions).

@graydon
Contributor
graydon commented Jan 30, 2012

I'd be happy to take a patch that randomizes the various hashes used in transient (in-memory) hashtables. Both in the runtime and in the core and standard libraries. We have a CPRNG in the runtime, exposed as std::rand (should probably move to core since it's in the runtime, and randomness is both small and quite essential for many programs). This is a legitimate request.

@catamorphism
Contributor

@graydon Can this be closed now?

@brson
Contributor
brson commented Sep 14, 2012

@graydon I'm also curious how we stand on hashing now. You've made a lot of improvements.

@graydon
Contributor
graydon commented Sep 19, 2012

I'll take another pass over src today and remove any remaining hokey hash functions; they should now all be obsolete / unused. But they have a way of re-appearing!

@graydon graydon closed this in 5e41739 Sep 19, 2012
@kosta
kosta commented Oct 16, 2012

Thanks for fixing this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment