[SR-3268] Hash-Dependent Iteration Order Produces Quadratic Behaviour in Dictionary/Set #45856
Comments
This is pretty interesting. It would be great to get some benchmarks added here: https://github.com/apple/swift/tree/master/benchmark
Comment by Alexis Beingessner (JIRA): Adding a benchmark now would probably be inappropriate, as it would DDoS the benchmarking infra. One should definitely be added with a fix.
It could be a small benchmark for now 🙂
This is such a neat bug. I read the Rust article but hadn't looked at whether it affects Swift collections as well; good find! One way to fix this without storing an additional per-collection seed is to XOR the storage capacity with an element's hash value before squeezing. If the source and destination collections have the same capacity, the quadratic behavior doesn't show up; and if they don't have the same capacity, the XOR guarantees that they get a different hash order for the same elements, eliminating the blow-ups. There's an additional cost for bucket lookups this way, but you avoid the quadratic behavior.
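A rough sketch of that idea (hypothetical code, not Swift's actual implementation; the stdlib `Hasher` stands in for the final mixing step here):

```swift
// Hypothetical sketch: fold the table capacity into the hash before a
// final mixing ("squeeze") step, so tables of different capacities
// derive different bucket orders from the same per-element hash.
func bucket(forHashValue hashValue: Int, capacity: Int) -> Int {
    var mixer = Hasher()                 // stands in for the squeeze step
    mixer.combine(hashValue ^ capacity)  // perturb with the capacity first
    // capacity is a power of two, so (capacity - 1) is a low-bit mask
    return mixer.finalize() & (capacity - 1)
}
```

Mixing the capacity in before the squeeze matters: XORing it in after would only flip a bit above the mask, which the final `& (capacity - 1)` discards.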
You don't need to store a per-collection seed, because each collection already has a source of identity: the memory address of its storage allocation. You can use relevant bits of that address to seed hashing.
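For illustration, a hypothetical helper (the name and bit choices below are assumptions, not the stdlib's) might derive such a seed like this:

```swift
// Hypothetical sketch: derive a per-instance hash seed from the
// address of the collection's storage object. The address is stable
// for the lifetime of the allocation, so iteration order stays
// consistent while the same storage is in use.
func storageSeed(for storage: AnyObject) -> Int {
    let address = UInt(bitPattern: ObjectIdentifier(storage))
    // The low bits are fixed by allocator alignment; shift them away
    // and fold in some higher bits for extra variation.
    return Int(bitPattern: (address >> 4) ^ (address >> 20))
}
```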
We're certainly susceptible to this effect; I attached some benchmarks (from #15017) demonstrating quadratic behavior for `dict.filter { _ in true }`. It's a wonderful gem of a benchmark curve, beautifully alternating between linear and quadratic segments. It's almost a shame we need to get rid of it! 🙂

Set and Dictionary currently use a global random seed; we should perturb that seed with something to eliminate this issue. Simply XORing either the storage address or the capacity into the seed would eliminate the O(n^2) behavior. Per-instance seeding with the storage address is generally a good idea, except in cases where we want repeatable results, such as in test suites. In those environments we disable random hash seeding, but these operations should still not regress to quadratic performance. So when a hash seed override is in effect, we should still use the storage capacity as the hash seed.
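For reference, a hypothetical measurement loop along these lines (using present-day stdlib APIs; `filter` returning a `Dictionary` requires Swift 4.1 or later):

```swift
import Foundation

// Hypothetical benchmark loop: time `filter { _ in true }`, which
// rebuilds a fresh Dictionary, at doubling sizes, and report the
// per-element cost. Quadratic segments show up as a rising trend.
for n in [100_000, 200_000, 400_000, 800_000, 1_600_000] {
    let dict = Dictionary(uniqueKeysWithValues: (0..<n).map { ($0, $0) })
    let start = Date()
    let copy = dict.filter { _ in true }
    let seconds = Date().timeIntervalSince(start)
    print(n, seconds / Double(n) * 1e9, "ns/element", copy.count)
}
```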
Is the storage capacity the requested capacity, or the real spare capacity of the allocation? If the latter, then that's still a source of instability. edit: This is assuming that malloc_size could give different answers for the same-sized malloc.
That's true, but Set/Dictionary ordering already depends on the allocated capacity, so making the dependency even tighter would not cause any regressions. Set/Dictionary is pretty good about having repeatable allocation patterns, so capacity-dependent ordering doesn't typically lead to nondeterminism in tests that are otherwise deterministic. edit: We don't currently make use of any extra space that malloc gives us. Our hash tables are restricted to power-of-two sizes, which makes it difficult to change this.
Why are power-of-two sizes required? I know that such restrictions are common in quadratically probed hash tables (e.g. LLVM's), so that the traversal is more evenly distributed modulo the table size, but I'm unfamiliar with why a linearly probed hash table would require this.
Comment by Alexis Beingessner (JIRA):

> Why are power-of-two sizes required?

It's so we can store the cap as `mask = realCap - 1` and implement all accesses as `idx & mask`.
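A minimal illustration of that trick (hypothetical names, not the stdlib's internals):

```swift
// With a power-of-two capacity, wrapping an index needs only a
// bitwise AND instead of an integer modulo.
struct Buckets {
    let mask: Int  // realCap - 1, e.g. capacity 16 -> mask 0b1111
    init(capacity: Int) {
        precondition(capacity > 0 && capacity & (capacity - 1) == 0,
                     "capacity must be a power of two")
        self.mask = capacity - 1
    }
    func bucket(forHashValue h: Int) -> Int { h & mask }
    func next(_ i: Int) -> Int { (i + 1) & mask }  // linear-probe step wraps for free
}
```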
That sounds like a good reason.
Fixed in #15239; the "Quadratic.attaresult" chart (attached) plots the per-element costs of `Dictionary.filter { _ in true }` before and after the fix.
Environment
Should be true in at least Swift 2 and Swift 3, and all platforms.
Additional Detail from JIRA
md5: ee40175f0f6337a3ee378a50f0a30cdf
Issue Description:
Rust recently ran into a nasty bug which led to quadratic behaviour in appending the contents of one HashMap to another. Swift has the same problem for Dictionary/Set. The problem is explained excellently here: http://accidentallyquadratic.tumblr.com/post/153545455987/rust-hash-iteration-reinsertion
In essence, the problem requires:

- iteration order that is tied to where elements hash in the collection (a linear scan of the backing array)
- a linear-probing-based collision resolution algorithm (first-come-first-served for Swift, Robin Hood for Rust)
- hashes mapped to indices via modulo (masking off bits)
- a max load factor greater than 50% (75% for Swift, 90% for Rust; not 100% sure whether below 50% is adequate)
- inserting the elements of one map into another with less capacity (as naturally occurs when inserting all the elements of a map into a freshly created one without reserving space)
With all of these combined, the larger map has become a soft hash collision computer for the smaller map. Elements near the start of the first and second half of the larger map are guaranteed to be mapped near the start of the smaller map. Once iteration of the larger map hits the midpoint of its array, this leads to tons of collisions at the start of the smaller map's array, snowballing into linear search times for empty buckets.
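To make the mapping concrete, here is a small illustration (assuming power-of-two capacities and mask-based indexing as listed above): the smaller table's bucket index is just the larger table's index with the top bit masked off, so buckets i and i + bigCapacity/2 of the large table land on the same small-table bucket.

```swift
// Illustration: how buckets of a capacity-16 table fold onto a
// capacity-8 table when indices are computed by masking. (This
// ignores probe displacement, which only makes matters worse.)
let bigCapacity = 16
let smallCapacity = 8
for bigBucket in 0..<bigCapacity {
    let smallBucket = bigBucket & (smallCapacity - 1)
    print("big bucket \(bigBucket) -> small bucket \(smallBucket)")
}
// Big buckets 0 and 8 both map to small bucket 0, 1 and 9 to 1, and
// so on: iterating the big table past its midpoint replays earlier
// destinations, piling collisions at the front of the small table.
```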
Rust's solution was to use a different seed for its hash function for every hashmap (in fact, older versions of Rust already did this, the change to a global seed was an attempted optimization). I'm not sure if this is even possible while we're using Foundation's hashcode system.
Alternatively, you could come up with some iteration scheme that is decoupled from the hash. Naively shifting the start of the linear scan is inadequate. This is likely to hurt iteration performance under normal operation, as it uses the cache less effectively.
Alternatively, we could move to a fancier collision resolution scheme (quadratic probing?). This is likely to hurt performance in normal operation, as it thrashes the cache.
Alternatively, we could lower the load factor. This will hurt iteration speed and increase memory usage for typical workloads.
We could also just decide that this code is "bad" and it performing poorly is acceptable. I'm not a fan of this solution, but y'know, it's there.
Here's a sample program exhibiting the problematic behaviour.
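A minimal sketch of such a program (a reconstruction for illustration, not the original listing) might be the following; it copies every element of a large Dictionary into a freshly created one without reserving capacity, which is exactly the setup described above. On current Swift, per-instance seeding from the fix should prevent the blow-up.

```swift
import Foundation

// Build a large Dictionary, then copy it element by element into a
// fresh one. The source's iteration order feeds the destination's
// insertion order, which is what triggers the quadratic behaviour.
let n = 1_000_000
var source = [Int: Int](minimumCapacity: n)
for i in 0..<n { source[i] = i }

let start = Date()
var destination = [Int: Int]()   // deliberately no reserveCapacity
for (key, value) in source {
    destination[key] = value
}
let seconds = Date().timeIntervalSince(start)
print("copied \(destination.count) elements in \(seconds)s")
```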
Running this on different input sizes clearly shows quadratic behaviour. The dip in run time at 4,000,000 elements is a result of the behaviour being very sensitive to the load factor.