
Hashtable DoS: Use a better rolling hash. #3

Closed
therealmik opened this issue Jul 15, 2014 · 13 comments
Comments

@therealmik (Contributor) commented Jul 15, 2014

[ Extremely low impact DoS condition ]

The hashtable implementation is vulnerable to collisions: if somebody were to craft a file full of rolling-sum collisions (where md4 does not collide), an md4 sum would be computed at each colliding offset, followed by a sequential search through the collision list.

A tree of weak sums pointing to a tree of md4 sums would avoid some of this worst-case behaviour, but you'd still have to compute the md4 at each collision.

@paulharris commented Jul 1, 2015

I think the hashtable has been reimplemented to be more efficient, so this shouldn't be an issue anymore... right?
Possibly fixed in merge #14

@therealmik (Contributor, Author) commented Jul 1, 2015

The fix would be to replace gettag() with a collision-resistant hash (such as siphash), but since the value being hashed is the weak sum this is pointless right now. If the weak sum is later replaced with a collision-resistant rolling sum, then gettag() will do the right thing automatically.
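To see why a stronger gettag() alone doesn't help, here is a small sketch (hypothetical code, not librsync's; blake2b stands in for siphash, which is not in the Python stdlib): any deterministic post-hash of the weak sum maps equal weak sums to equal tags, so crafted rolling-sum collisions survive it unchanged.

```python
import hashlib

def keyed_tag(weak: int, key: bytes) -> int:
    """Hypothetical collision-resistant replacement for gettag().
    blake2b is a stand-in for siphash; this is not librsync code."""
    d = hashlib.blake2b(weak.to_bytes(4, "big"), key=key, digest_size=4).digest()
    return int.from_bytes(d, "big")

# "ABBA" and "BAAB" already share the weak sum 0x028F0106 (quoted later in
# this thread), and keyed_tag only ever sees that sum -- so however strong
# the hash, the crafted collisions map to the same hashtable tag:
w_abba = w_baab = 0x028F0106
assert keyed_tag(w_abba, b"secret") == keyed_tag(w_baab, b"secret")
```

This is why the fix only pays off once the weak sum itself is collision-resistant: gettag() can never separate inputs its argument has already merged.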

@paulharris commented Jul 2, 2015

What would you use for the stronger rolling sum?

@therealmik (Contributor, Author) commented Jul 2, 2015

I don't have a good answer for that right now - ensuring the subsequent strong sum lookup and calculation is fast would be the best bet without changing things significantly.

EDIT: that suggestion is a mitigation, not a fix.

@sourcefrog (Contributor) commented Jul 2, 2015

To clarify, we're talking here about a malicious 'new' file?

One approach would be, at signature generation time, to choose a random value to perturb the weak sums, and store that in the signature file. That might make it harder to generate collisions, if the attacker doesn't have the signature file.

I'm not sure if the current rollsum algorithm would be able to accommodate this.

@therealmik (Contributor, Author) commented Jul 2, 2015

Yes, we're talking about a malicious new file.

The current rolling sum won't support this - at least not as a secret suffix or prefix:

028f0106 ABBA
028f0106 BAAB
0a1e024a fooBAAB
0a1e024a fooABBA
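The pairs above can be reproduced with a small Python sketch of the rsync-style weak sum (simplified: librsync's CHAR_OFFSET is omitted, and the `(s2 << 16) | s1` packing is an assumption chosen to match the quoted digests):

```python
def weaksum(data: bytes) -> int:
    """rsync-style rolling checksum, simplified sketch (no CHAR_OFFSET)."""
    s1 = s2 = 0
    n = len(data)
    for i, b in enumerate(data):
        s1 += b              # plain byte sum
        s2 += (n - i) * b    # each byte weighted by distance from window end
    return ((s2 & 0xFFFF) << 16) | (s1 & 0xFFFF)

# Both s1 and s2 are invariant under the ABBA/BAAB swap, so the pairs
# quoted above really do collide:
assert weaksum(b"ABBA") == weaksum(b"BAAB") == 0x028F0106
assert weaksum(b"fooABBA") == weaksum(b"fooBAAB") == 0x0A1E024A
```

The reason: s1 is permutation-invariant, and in s2 the symbol A takes weights {4, 1} in ABBA but {3, 2} in BAAB, both summing to 5 (likewise B), so the swap collides at any offset and under any secret prefix or suffix.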

@sourcefrog (Contributor) commented Jul 7, 2015

One option would be to interpose a stream cipher before calculating the weak sums.

@therealmik (Contributor, Author) commented Jul 7, 2015

I'm fairly sure that will either break deduplication or break the "rolling" property (if you restarted the stream cipher every block).
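A toy sketch of the deduplication half of that trade-off (hypothetical names, deliberately trivial keystream): once the bytes are XORed with a position-dependent keystream, identical plaintext blocks at different offsets no longer share a weak sum, so the matching step cannot find them.

```python
def weaksum(data: bytes) -> int:
    """Toy additive weak sum, just to illustrate the matching step."""
    return sum(data) & 0xFFFF

def keystream(offset: int, n: int) -> bytes:
    """Toy position-dependent keystream (stand-in for a real stream cipher)."""
    return bytes(((offset + i) * 131) & 0xFF for i in range(n))

block = b"same 8B!"
old = block + b"padding!" + block            # the same block appears twice
enc = bytes(b ^ k for b, k in zip(old, keystream(0, len(old))))

# In plaintext the duplicate blocks share a weak sum and would be matched ...
assert weaksum(old[0:8]) == weaksum(old[16:24])
# ... but under a position-dependent keystream the copies no longer agree,
# so deduplication of repeated content breaks.
assert weaksum(enc[0:8]) != weaksum(enc[16:24])
```

Restarting the cipher at every block boundary would restore deduplication, but then every candidate offset needs the whole window re-encrypted, which gives up the O(1) rolling property.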

@sourcefrog (Contributor) commented May 1, 2016

We could also optionally disable the weak sums and only match strong sums on aligned block boundaries. That would give up the actually novel point of the rsync algorithm, but could still be useful in many places for files that have only rewrites, not insertions/deletions.
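That fallback can be sketched in a few lines (hypothetical code; md5 is a stand-in for librsync's actual strong sum, and `aligned_matches` is an invented name): hash each aligned block of the old file, then look up each aligned block of the new file.

```python
import hashlib

def aligned_matches(old: bytes, new: bytes, blocksize: int) -> dict:
    """Match strong sums only at aligned block boundaries (no weak sums).
    Returns {offset in new: offset in old} for every matching block."""
    sigs = {hashlib.md5(old[i:i + blocksize]).digest(): i
            for i in range(0, len(old), blocksize)}
    return {j: sigs[d]
            for j in range(0, len(new), blocksize)
            for d in [hashlib.md5(new[j:j + blocksize]).digest()]
            if d in sigs}

old = b"AAAABBBBCCCCDDDD"
rewrite = b"AAAAXXXXCCCCDDDD"      # block 1 rewritten in place
inserted = b"ZAAAABBBBCCCCDDD"     # one byte inserted at the front

# In-place rewrites still match on the unchanged blocks ...
assert aligned_matches(old, rewrite, 4) == {0: 0, 8: 8, 12: 12}
# ... but a single inserted byte misaligns everything, so nothing matches:
assert aligned_matches(old, inserted, 4) == {}
```

The second assertion is exactly the cost being described: without a rolling weak sum there is no way to re-find content that has shifted by a non-block amount.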

@dbaarda (Member) commented Mar 1, 2017

I did some analysis of the rollsum and found that it has some pretty nasty clustering and collision behavior, particularly for small blocks of ASCII data. The new hashtable mitigates the clustering (which is nasty for open-addressing hashtables) by using MurmurHash3's mix32 finalizer on the weaksum before using it for the hashtable index. It doesn't fix the collision problem.
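That mitigation can be shown directly; this is MurmurHash3's standard 32-bit finalizer (fmix32), written out as a stand-alone sketch:

```python
def mix32(h: int) -> int:
    """MurmurHash3's 32-bit finalizer (fmix32), applied to the weak sum
    before it is used as an open-addressing hashtable index."""
    h &= 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h

# fmix32 is a bijection on 32-bit values: it scatters clustered weak sums
# across buckets, but it can never merge -- or separate -- equal inputs,
# so weak-sum collisions pass through unchanged.
assert len({mix32(i) for i in range(256)}) == 256
```

Being a bijection is exactly why it fixes clustering but not collisions: distinct weak sums stay distinct (spread out), while equal weak sums stay equal.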

Some of the work I did on tidying up the signature code was in preparation for making it possible to plug in alternative rollsum implementations. One of the simplest fixes for reducing collisions was making it sum the square of the bytes instead of just the bytes. However, that is not going to fix the ABBA/BAAB crafted collisions.
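One reading of that "sum the square of the bytes" tweak, as a hedged sketch (same assumed `(s2 << 16) | s1` packing as the plain rollsum), shows why the crafted pairs survive it:

```python
def weaksum_sq(data: bytes) -> int:
    """Rollsum variant that sums squared bytes instead of bytes
    (illustrative sketch of the tweak described above)."""
    s1 = s2 = 0
    n = len(data)
    for i, b in enumerate(data):
        s1 += b * b
        s2 += (n - i) * b * b
    return ((s2 & 0xFFFF) << 16) | (s1 & 0xFFFF)

# ABBA/BAAB is a permutation that gives each symbol the same total weight
# (A takes weights {4,1} vs {3,2}, both summing to 5), so squaring the
# bytes changes nothing about the crafted collision:
assert weaksum_sq(b"ABBA") == weaksum_sq(b"BAAB")
```

Squaring improves the spread of natural inputs but leaves the sum symmetric under these weight-preserving swaps, which is the property the attacker exploits.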

I have a bad feeling that any algorithm that can be efficiently "rolled" is going to be vulnerable to crafted collisions, and difficult to "random seed" effectively.

@dbaarda (Member) commented Jun 1, 2017

Another update on this; I did some more analysis and research which I've put up at:

https://github.com/dbaarda/rollsum-tests

TL;DR: RabinKarp is a much better rolling hash than the rollsum we are using.
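A minimal sketch of a polynomial (Rabin-Karp) rolling hash for comparison; the multiplier and the zero seed below are assumptions for illustration and may not match librsync's eventual implementation:

```python
MULT = 0x08104225   # odd multiplier mod 2**32; an assumption for illustration
MASK = 0xFFFFFFFF

def rk_hash(data: bytes) -> int:
    """Polynomial hash: sum of data[j] * MULT**(n-1-j) mod 2**32."""
    h = 0
    for b in data:
        h = (h * MULT + b) & MASK
    return h

def rk_roll(h: int, out: int, inp: int, mult_top: int) -> int:
    """O(1) update: drop byte `out` (weight mult_top = MULT**(n-1)), add `inp`."""
    return ((h - out * mult_top) * MULT + inp) & MASK

data, n = b"fooABBA", 4
mult_top = pow(MULT, n - 1, 1 << 32)
h = rk_hash(data[:n])
for i in range(1, len(data) - n + 1):
    h = rk_roll(h, data[i - 1], data[i + n - 1], mult_top)
    assert h == rk_hash(data[i:i + n])     # rolling matches recomputation

# Because each position gets a distinct multiplier, the ABBA/BAAB swap that
# fools the additive rollsum no longer collides:
assert rk_hash(b"ABBA") != rk_hash(b"BAAB")
```

The key difference from the additive rollsum is that byte weights are powers of MULT rather than small integers, so weight-preserving permutations like ABBA/BAAB stop working; it is still not cryptographically collision-resistant, just much harder to cluster accidentally.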

@dbaarda (Member) commented Jun 2, 2017

I think this issue can be summarized as "use a better rolling hash". I'm going to make this the master bug tracking that and close a bunch of others as dupes of this.

@dbaarda changed the title from "Hashtable DoS" to "Hashtable DoS: Use a better rolling hash." on Jun 2, 2017
@dbaarda (Member) commented Sep 6, 2019

Pending #163 adds RabinKarp support as the default. It makes deltas ~15% faster for large random files, due to fewer hash collisions resulting in fewer strongsum calculations/comparisons.

No branches or pull requests

4 participants