Why Murmur3 hashes differ from guava hashes? #3

Closed
jcornaz opened this issue Jul 4, 2018 · 4 comments
jcornaz commented Jul 4, 2018

Hello,

My understanding is that Murmur3.hash_x64_128 (from this project) should return the same result (in bytes) as Hashing.murmur3_128().hashBytes (from the Guava library).

But it doesn't. Is the difference expected, and if so, why?

Here is my code, in case I made an obvious mistake you could point out. (The code is written in Kotlin, but it should be easy to follow.)

import com.google.common.hash.Hashing
import com.sangupta.murmur.Murmur3
import org.junit.Test
import java.nio.ByteBuffer
import java.util.*
import kotlin.test.assertTrue

private const val SEED = 42

class Murmur3Test {

  @Test
  fun murmur32shouldCorrespondToGuavaHashes() {
    val guava = Hashing.murmur3_32(SEED)
    repeat(1000) {
      val data = UUID.randomUUID().toByteArray()

      val guavaResult = guava.hashBytes(data).asBytes()
      val murmurResult = Murmur3.hash_x86_32(data, 16, SEED.toLong()).asBytes()

      assertTrue(Arrays.equals(guavaResult, murmurResult))
    }
  }

  @Test
  fun murmur128shouldCorrespondToGuavaHashes() {
    val guava = Hashing.murmur3_128(SEED)
    repeat(1000) {
      val data = UUID.randomUUID().toByteArray()

      val guavaResult = guava.hashBytes(data).asBytes()
      val murmurResult = Murmur3.hash_x64_128(data, 16, SEED.toLong()).asBytes()

      assertTrue(Arrays.equals(guavaResult, murmurResult))
    }
  }
}

fun UUID.toByteArray(): ByteArray {
  val buffer = ByteBuffer.allocate(16)

  buffer.putLong(mostSignificantBits)
  buffer.putLong(leastSignificantBits)

  return buffer.array()
}

fun LongArray.asBytes(): ByteArray {
  val buffer = ByteBuffer.allocate(size * 8)

  forEach { buffer.putLong(it) }

  return buffer.array()
}

fun Long.asBytes(): ByteArray {
  val buffer = ByteBuffer.allocate(8)

  buffer.putLong(this)

  return buffer.array()
}
Owner

sangupta commented May 7, 2019

Hi @jcornaz - I wrote this library against hashes generated by the C++ implementation and confirmed that they matched. That was quite a long time ago, so I will need some time to debug this issue. My apologies for noticing it this late.

sangupta self-assigned this May 7, 2019
Owner

sangupta commented May 8, 2019

@jcornaz

I just added MurmurGuavaTest to test this. The generated hashes are the same; it's the endianness of the result that makes them look different.

I will probably add a converter to make the output equivalent to Guava's. I also checked the reference code: the C and Java implementations use different endianness.
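
To illustrate the point about endianness, here is a minimal sketch using only `java.nio`. The helper `toBytes` and the two long values are hypothetical stand-ins, not real hash output and not part of either library. It shows that the same pair of longs produces different byte arrays depending on the byte order used to serialize them. The `asBytes()` helpers in the question use `ByteBuffer`'s default order, which is big-endian; the assumption here, consistent with this thread, is that Guava lays out its bytes in little-endian order.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper: serialize each long using an explicit byte order.
fun LongArray.toBytes(order: ByteOrder): ByteArray {
    val buffer = ByteBuffer.allocate(size * 8).order(order)
    forEach { buffer.putLong(it) }
    return buffer.array()
}

// Render a byte array as lowercase hex for easy comparison.
fun ByteArray.toHex(): String = joinToString("") { "%02x".format(it) }

fun main() {
    // Arbitrary stand-in values, not real hash output.
    val hash = longArrayOf(0x0102030405060708L, 0x1112131415161718L)

    println(hash.toBytes(ByteOrder.BIG_ENDIAN).toHex())    // 01020304050607081112131415161718
    println(hash.toBytes(ByteOrder.LITTLE_ENDIAN).toHex()) // 08070605040302011817161514131211
}
```

Note that each 8-byte group is reversed on its own; the order of the two longs within the array does not change.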

Author

jcornaz commented May 9, 2019

> The generated hashes are the same; it's the endianness of the result that makes them look different.

OK, that makes sense.

Thanks for your investigation ;-)

I'll let you decide whether to close this issue or rename it.

Owner

@jcornaz

The hash is the same in Guava and in this library when the two are compared by value (as a long). I have added documentation on how to convert a long to byte[] in both big-endian and little-endian format (see 56545af).
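
A minimal sketch of what comparing by value could look like in a test, using only `java.nio`. The helpers `toLongs` and `lowIntToLittleEndianBytes` are hypothetical names, not part of Guava or this library, and the sample bytes are stand-ins rather than real hash output. The first helper decodes a byte array back into longs with an explicit byte order, so two results can be compared as longs instead of as raw bytes. The second covers the 32-bit case: judging from the test code in the question, `hash_x86_32` returns a long, and writing it with `putLong` yields 8 bytes, which can never equal a 4-byte array. Assuming Guava's 32-bit result is 4 bytes in little-endian order, only the low 32 bits should be written.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper: decode a byte array into longs, 8 bytes at a time.
fun ByteArray.toLongs(order: ByteOrder): LongArray {
    require(size % 8 == 0) { "expected a multiple of 8 bytes, got $size" }
    val buffer = ByteBuffer.wrap(this).order(order)
    return LongArray(size / 8) { buffer.getLong() }
}

// Hypothetical helper: write only the low 32 bits of a long, little-endian.
fun Long.lowIntToLittleEndianBytes(): ByteArray =
    ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(toInt()).array()

fun main() {
    // Stand-in for 16 bytes in little-endian layout, not real hash output.
    val littleEndianBytes =
        byteArrayOf(8, 7, 6, 5, 4, 3, 2, 1, 24, 23, 22, 21, 20, 19, 18, 17)
    val expected = longArrayOf(0x0102030405060708L, 0x1112131415161718L)

    // Compare by value rather than by byte layout.
    println(littleEndianBytes.toLongs(ByteOrder.LITTLE_ENDIAN).contentEquals(expected)) // true

    // The 32-bit result occupies 4 bytes, not 8.
    println(0xAABBCCDDL.lowIntToLittleEndianBytes().size) // 4
}
```

Comparing decoded longs keeps the test independent of whichever byte layout each library happens to choose.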

sangupta added the wontfix (As designed, Not worth the effort) label Sep 15, 2019