-
-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Murmurhash doesn't equal known murmurhashes. #10
Comments
Yeah, makes sense, this is Gary's code unchanged. Do you have a patch? |
I'm checking why it behaves differently. I tried this python library and it outputs correct hashes import pymmh3 as mmh3
mmh3.hash('foo')
# prints -156908512 |
Okay. I have a patch. Patch diff --git a/murmurhash.js b/murmurhash.js
index 6a73210..a4bbc73 100644
--- a/murmurhash.js
+++ b/murmurhash.js
@@ -117,7 +117,14 @@
h1 = ((((h1 & 0xffff) * 0xc2b2ae35) + ((((h1 >>> 16) * 0xc2b2ae35) & 0xffff) << 16))) & 0xffffffff;
h1 ^= h1 >>> 16;
- return h1 >>> 0;
+ h1 = h1 >>> 0;
+
+ if (h1 & 0x80000000 == 0) {
+ return h1;
+ }
+ else {
+ return -( (h1 ^ 0xFFFFFFFF) + 1 )
+ }
}
var murmur = MurmurHashV3; New Output var mmh3 = require('./murmurhash')
mmh3.v3('foo')
// prints -156908512 This output matches the output of python code. I can raise a pull request. |
Created PR. |
Actually, it's not really an issue.
Instead of a permanent code change, maybe we can drive this behaviour through a flag passed in argument? |
Yeah I was thinking that too, this would likely break hashes otherwise, lets flag control it. |
I would argue that this lib using unsigned integers as output is more valid. If you compare to the outputs from other hash functions or implementations of murmur3 beyond the referenced python one, in my experience they tend to be unsigned. If converting the number to a string to represent the hash, a quick way is const bit_shifts = [24, 16, 8, 0]
function to_bytes(input) {
let str = ""
bit_shifts.forEach(bits => {
str += `${((input >> bits) & 0xff)} `
})
return str
}
let a = -1621654532
let b = 2673312764
console.log(a.toString(36)); // "-qthon8"
console.log(b.toString(36)); // "187mdbw"
// Bitwise operations will convert to uint32 regardless
console.log(to_bytes(a)) // "159 87 131 252 "
console.log(to_bytes(b)) // "159 87 131 252 "
// or if you had converted the separate bytes into base36
console.log([
(159).toString(36),
(87).toString(36),
(131).toString(36),
(252).toString(36)
].join("")) // "4f2f3n70"
|
we'd probably want a new constructor for signed hashes, to ensure the existing behavior is preserved. Feel free to send me an updated PR (this one is pretty old) which a proposed signed constructor + tests, I'll merge + publish. |
Unfortunately I will have to think of another approach. The javascript codes are also incompatible with the encoding defined by RFC-2045, used to perform character encoding of the aforementioned Python library. I need to apply the hash algorithm to images and icons. |
I honestly don't think it should be opened for time being right now, in case @guloic is reccomending to change everything to python |
Environment
System: Windows 10, 64 bit.
Nodejs Version: 12.13.0
Steps to Reproduce
Then run the following code -
Expected
One of those outputs needs to be equal to
-156908512
which is the known murmurhash forfoo
Note
I observed this behaviour with Gary's original work too. Please refer to this issue.
The text was updated successfully, but these errors were encountered: