Skip to content

configurable collision checking #88

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Oct 25, 2019
Merged

configurable collision checking #88

merged 11 commits into from
Oct 25, 2019

Conversation

karlmcguire
Copy link
Contributor

@karlmcguire karlmcguire commented Oct 4, 2019

Not finished yet, but for purposes of discussion, this is what I'm thinking for collision prevention.


This change is Reviewable

@karlmcguire karlmcguire changed the title rudimentary collision checking configurable collision checking Oct 10, 2019
Copy link
Contributor Author

@karlmcguire karlmcguire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 8 files reviewed, 2 unresolved discussions (waiting on @jarifibrahim and @manishrjain)


store.go, line 26 at r2 (raw file):

type storeItem struct {
	hashed uint64

call it key


store.go, line 176 at r2 (raw file):

}

func (m *lockedMap) Update(hashed uint64, key, value interface{}) bool {

Just use keyHash or something.

Copy link
Contributor Author

@karlmcguire karlmcguire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 7 files at r2, 2 of 2 files at r3, 3 of 3 files at r4, 4 of 4 files at r5.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jarifibrahim and @manishrjain)

Copy link
Contributor Author

@karlmcguire karlmcguire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r6.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jarifibrahim and @manishrjain)

@erikdubbelboer
Copy link

I'm confused, how is this going to help with collisions. Doesn't appending something to an input that already has a collision also produce a collision?

If I use strings as keys, and I set Hashes to 2, it is just going to take 2 hashes of my string, one with 0 (code) appended at the end (code) and one with 1 appended at the end.
But if I have two strings that have a memhash collision, appending the exact same byte to both strings is still going to cause a collision.

I'm not familiar with memhash, but isn't it so that two strings that cause a collision will both bring the internal state of memhash to an identical point before the 0 and 1 get added to it, producing the exact same state and output again?

I'm not aware of any memhash collision examples that I could test this with. But you can see the same behavior with MD5 or SHA1 which I have collision examples for.
See: https://gist.github.com/erikdubbelboer/2a4d2a8bd56e897805a599cdb91772b2
This gist shows that adding anything to the inputs that already cause a collision will still cause a collision.

@karlmcguire karlmcguire restored the karl/collisions branch October 25, 2019 20:07
@karlmcguire
Copy link
Contributor Author

@erikdubbelboer You appear to be correct. We assumed that the probability of collision was the same for different inputs, but it appears after one collision all bets are off. I'm going to assume memhash performs the same way.

Thanks for the heads up.

@dgryski
Copy link

dgryski commented Oct 28, 2019

Modern hashes do not process a single byte at a time. There is no reason to believe a collision for strings s and t implies a collision for s + "0" and t + "0".

@erikdubbelboer
Copy link

@dgryski that's a big assumption to make. I showed that it is the case for SHA1 and MD5. What if the last 3 bytes of the inputs were already the same but the rest was different?

What I would suggest is doing an XOR with some pattern that changes for each hash N.
For example XOR with byte(n) + byte(n)<<3 + byte(n)<<6. This does lower the maximum number of hashes to 8 but that should be enough.

@joshua-goldstein joshua-goldstein deleted the karl/collisions branch September 15, 2022 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants