-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch from m3db/stackmurmur3 to a customized implementation of murmur3 #2
Conversation
bloom.go
Outdated
h1, h2 := hash.Sum128() | ||
hash = hash.Write(entropy) // Add entropy | ||
h3, h4 := hash.Sum128() | ||
h1, h2 := murmur3.Sum128(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this different semantically from the previous code? That is, previous code would be the equivalent of:
h1, h2 := murmur3.Sum128(data)
// hash hasn't reset I assume, so the new call is actually on data + entropy)
h3, h4 := murmur3.Sum128(data + entropy)
``
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I completely misread the code.
… hash function with padding for entropy
murmur.go
Outdated
h1, h2 = _h1, _h2 // restore state | ||
clen++ // add entropy byte to total hash data length | ||
switch len(data) + 1 { // process entropy byte *only*, no fallthrough - unless it's now a full block | ||
case 16: // == entire block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in chat--I think this is a reasonable way to write this, but I still found it confusing; maybe worth adding more commentary?
Something to the effect of:
Pretend that the entropy byte is the last byte of the array, and apply the Sum128() logic accordingly.
Logically, this is:
if len(data) + 1 == 16 {
// do one more bmix pass, and then skip the switch statement below (no cases would apply)
}
switch len(data) + 1 {
case 15:
// entropy byte is the last byte; treat it specially
k2 ^= entropy << 48
fallthrough
// now, continue processing the rest of the slice
// }
It's hard to write it this way because of the fallthrough's--we would need to (wastefully) check at every case whether this is actually the last byte (in which case, use `entropy`) or not (in which case, use the actual byte at that position
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... or something like that; dunno if that kind of verbiage is more clear or just more verbose :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation lgtm, nice fix! Like I mentioned--I had a bit of a hard time mapping it to the twmb implementation directly, but after sitting with it a bit it makes sense and seems correct. Dunno if there's much we can do to make that more clear so I'm good with this as it stands.
murmur.go
Outdated
switch rlen + 1 { | ||
case 15: | ||
k2 ^= entropy << 48 | ||
goto l14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: maybe a comment to the effect of:
// the entropy byte is the last byte (index 14)--process it, then proceed with the rest of the array (length 14
case below).
murmur.go
Outdated
k1, k2 = 0, 0 | ||
h1, h2 = _h1, _h2 // restore state to before final mixing | ||
clen++ // total hash data length now includes entropy | ||
// Fist, process entropy byte *only*. No fallthrough. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/fist/first/g
Use hand-rolled hash function, identical to the existing algo using
m3db/stackmurmur3
Code is all based on
twmb/murmur3
and is actually faster, without any use ofunsafe
or assembly foramd64