Hashtable DoS: Use a better rolling hash. #3
Comments
I think the hashtable has been reimplemented to be more efficient, so this shouldn't be an issue anymore... right?
The fix would be to replace gettag() with a collision-resistant hash (such as siphash), but since the value being hashed is the weak sum this is pointless right now. If the weak sum is later replaced with a collision-resistant rolling sum, then gettag() will do the right thing automatically.
What would you use for the stronger rolling sum?
I don't have a good answer for that right now - ensuring the subsequent strong sum lookup and calculation is fast would be the best bet without changing things significantly. EDIT: that suggestion is a mitigation, not a fix.
To clarify, we're talking here about a malicious 'new' file? One approach would be to choose, at signature generation time, a random value to perturb the weak sums, and store that in the signature file. That might make it harder to generate collisions if the attacker doesn't have the signature file. I'm not sure whether the current rollsum algorithm would be able to accommodate this.
Yes, we're talking about a malicious new file. The current rolling sum won't support this - at least not as a secret suffix or prefix, because both components of the sum are linear in the input bytes (a sketch of why follows).
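A minimal sketch of the problem, assuming the usual rsync-style rollsum definition over a block X of bytes x_1..x_n:

```latex
a(X) = \Big(\sum_{i=1}^{n} x_i\Big) \bmod 2^{16}, \qquad
b(X) = \Big(\sum_{i=1}^{n} (n-i+1)\,x_i\Big) \bmod 2^{16}, \qquad
s(X) = a(X) + 2^{16}\,b(X)

% Appending a fixed secret suffix S (length m) to a block X (length n):
a(X \| S) = a(X) + a(S), \qquad
b(X \| S) = b(X) + m\,a(X) + b(S)

% So for any two same-length blocks A, B with a(A)=a(B) and b(A)=b(B),
% s(A \| S) = s(B \| S) for every choice of S. A secret prefix behaves the
% same way (and simply rolls out of the window anyway), so crafted
% collisions like "ABBA"/"BAAB" survive any fixed perturbation.
```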
One option would be to interpose a stream cypher before calculating the weak sums.
I'm fairly sure that will either break deduplication (the keystream depends on absolute position, so identical blocks at different offsets no longer produce identical sums) or break the "rolling" property (if you restarted the stream cipher every block).
We could also optionally disable the weak sums, and only match strong sums on aligned block boundaries. That would give up the actually novel point of the rsync algorithm, but could still be useful in many places for files that have only rewrites, not insertions/deletions.
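A hedged sketch of that aligned-only matching idea; the helpers `strong_sum` and `sig_lookup` are hypothetical, not librsync API:

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical helpers, for illustration only (not librsync API). */
void strong_sum(const unsigned char *buf, size_t len, unsigned char out[16]);
bool sig_lookup(const unsigned char sum[16], size_t *block_index);

/* Match only on aligned block boundaries using strong sums, with the weak
 * rolling sum disabled.  Finds in-place rewrites but cannot find matches
 * that have shifted because of insertions/deletions. */
void delta_aligned_only(const unsigned char *new_file, size_t len, size_t block_len)
{
    for (size_t off = 0; off + block_len <= len; off += block_len) {
        unsigned char sum[16];
        size_t match;

        strong_sum(new_file + off, block_len, sum);
        if (sig_lookup(sum, &match)) {
            /* emit a COPY of signature block `match` */
        } else {
            /* emit this block as a LITERAL */
        }
    }
    /* any trailing partial block would be emitted as a literal */
}
```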
I did some analysis of the rollsum and found that it has some pretty nasty clustering and collision behavior, particularly for small blocks of ASCII data. The new hashtable mitigates the clustering (which is nasty for open-addressing hashtables) by running MurmurHash's mix32 finalizer over the weaksum before using it as the hashtable index. It doesn't fix the collision problem.

Some of the work I did on tidying up the signature code was in preparation for making it possible to plug in alternative rollsum implementations. One of the simplest fixes for reducing collisions was summing the squares of the bytes instead of just the bytes. However, that is not going to fix the ABBA/BAAB crafted collisions. I have a bad feeling that any algorithm that can be efficiently "rolled" is going to be vulnerable to crafted collisions, and difficult to "random seed" effectively.
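For reference, the mix32 finalizer mentioned above is MurmurHash3's fmix32; a sketch of deriving a table index from the weak sum with it (index derivation only, the actual hashtable code may differ):

```c
#include <stdint.h>

/* MurmurHash3 fmix32 finalizer: spreads clustered weak sums across the
 * table's index bits without changing which values collide. */
static inline uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}

/* Illustrative indexing sketch for a power-of-two table: identical weak
 * sums still land in the same bucket, so crafted collisions are only
 * mitigated, not fixed. */
static inline uint32_t bucket_for(uint32_t weaksum, uint32_t table_size_pow2)
{
    return mix32(weaksum) & (table_size_pow2 - 1);
}
```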
Another update on this: I did some more analysis and research, which I've put up at https://github.com/dbaarda/rollsum-tests. TL;DR: RabinKarp is a much better rolling hash than the rollsum we are using.
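A minimal sketch of a RabinKarp-style polynomial rolling hash over a fixed window; the multiplier here is an illustrative odd constant, not necessarily the one the rollsum-tests work or librsync settled on:

```c
#include <stdint.h>
#include <stddef.h>

#define RK_MULT 0x08104225U  /* illustrative odd multiplier */

typedef struct {
    uint32_t hash;   /* sum of x_i * RK_MULT^(n-1-i), mod 2^32 */
    uint32_t mult_n; /* RK_MULT^n mod 2^32, used to remove the outgoing byte */
    size_t   n;      /* window length */
} rk_t;

static void rk_init(rk_t *rk, const unsigned char *buf, size_t n)
{
    rk->hash = 0;
    rk->mult_n = 1;
    rk->n = n;
    for (size_t i = 0; i < n; i++) {
        rk->hash = rk->hash * RK_MULT + buf[i];
        rk->mult_n *= RK_MULT;
    }
}

/* Roll the window by one byte: remove `out`, append `in`. */
static void rk_rotate(rk_t *rk, unsigned char out, unsigned char in)
{
    rk->hash = rk->hash * RK_MULT + in - rk->mult_n * out;
}
```

Because each byte is weighted by a distinct power of the multiplier, permuting bytes (the ABBA/BAAB case) changes the hash, unlike the plain byte-sum rollsum.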
I think this issue can be summarized as "use a better rolling hash". I'm going to make this the master bug tracking that and close a bunch of others as dupes of this.
The pending #163 adds RabinKarp support as the default. It makes deltas ~15% faster for large random files, because fewer hash collisions mean less strongsum calculation/comparison.
[ Extremely low impact DoS condition ]
The hashtable implementation is vulnerable to collisions - if somebody were to make a file full of rolling sum collisions (where md4 didn't collide), this would cause an md4 to be generated at each colliding offset, then a sequential search through the list.
A tree of weak sums pointing to a tree of md4 sums would avoid some of this worst-case behaviour, but you'd still have to compute the md4 at each collision.
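To put a rough (hedged) bound on that worst case: with b signature blocks sharing one weak sum, block length L, and a crafted new file whose rolling sum collides at nearly every one of its N offsets, the delta scan does on the order of

```latex
\text{work} \approx N \cdot \big( C_{\mathrm{md4}}(L) + b \cdot C_{\mathrm{cmp}} \big)
```

strong-sum and comparison work, versus roughly N cheap rolling updates plus about N/L strong sums for a benign file.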