get murmurhash3 of binary file in python #30

Closed
milahu opened this issue Oct 31, 2021 · 1 comment

milahu commented Oct 31, 2021

getting the murmur3 hash of a text file is trivial,
and i can get the murmur2 hash of binary files,
see https://github.com/milahu/murmurhash-cli-python

how to get the murmur3 hash of a binary file?

there is https://pypi.org/project/mmh3-binary/ but it's an "empty fork"

expected API

#!/usr/bin/env python3

import mmh3

fd = open('/bin/sh', 'rb')
hash = mmh3.hash_from_buffer(fd)

fd is an io.BufferedReader

ideally, avoid passing a bytes array ... this should support "a million gigabyte" files in theory,
so the bytes should be "streamed" or "piped" into the mmh3 function

currently, mmh3 says

mmh3.hash_from_buffer(fd)
TypeError: a bytes-like object is required, not '_io.BufferedReader'

milahu commented Oct 31, 2021

solved in https://stackoverflow.com/questions/52706164/get-murmur-hash-of-a-file-with-python-3

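import mmh3, murmurhash

path = '/bin/sh'
# reads the whole file into memory (not streamed); the with-block closes the file
with open(path, 'rb') as f:
    data = f.read()

hash3 = mmh3.hash_bytes(data).hex()
hash2 = murmurhash.hash_bytes(data).to_bytes(4, byteorder='big', signed=True).hex()

# the 64 and 32 lines are truncated prefixes of the 128-bit hex digest
print(f"  mmh3 128 {hash3}")
print(f"  mmh3  64 {hash3[0:16]}")
print(f"  mmh3  32 {hash3[0:8]}")

# likewise, the 16 line is a prefix of the 32-bit value
print(f"  mmh2  32 {hash2}")
print(f"  mmh2  16 {hash2[0:4]}")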

the "ideally, avoid passing a bytes array" part (streaming the file into the hash function) is currently not supported in python mmh3, but it has low priority → closing
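
for reference, a minimal sketch of what "streaming" the bytes could look like. this is not part of mmh3: it is a pure-Python reimplementation of the 32-bit MurmurHash3 variant (x86_32), and the names StreamingMurmur3_32 and murmur3_32_of_stream are hypothetical, made up for this example. it keeps only the running hash state and up to 3 leftover bytes between chunks, so memory use does not grow with file size. being pure Python it will be much slower than the C extension.

```python
import io

MASK = 0xFFFFFFFF      # keep every intermediate value in 32 bits
C1 = 0xCC9E2D51        # MurmurHash3 x86_32 mixing constants
C2 = 0x1B873593


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def _mix_k(k):
    """Scramble one block (or the zero-padded tail) before it is xor-ed in."""
    k = (k * C1) & MASK
    k = _rotl32(k, 15)
    return (k * C2) & MASK


class StreamingMurmur3_32:
    """Hypothetical incremental hasher (illustration only, not an mmh3 API)."""

    def __init__(self, seed=0):
        self._h = seed & MASK
        self._tail = b""   # 0..3 bytes that do not yet form a full 4-byte block
        self._length = 0   # total number of bytes seen so far

    def update(self, data):
        self._length += len(data)
        buf = self._tail + data
        n_full = len(buf) // 4 * 4
        h = self._h
        for i in range(0, n_full, 4):
            k = int.from_bytes(buf[i:i + 4], "little")
            h ^= _mix_k(k)
            h = _rotl32(h, 13)
            h = (h * 5 + 0xE6546B64) & MASK
        self._h = h
        self._tail = buf[n_full:]

    def digest(self):
        """Return the unsigned 32-bit hash of everything fed so far."""
        h = self._h
        if self._tail:
            h ^= _mix_k(int.from_bytes(self._tail, "little"))
        h ^= self._length & MASK
        # final avalanche (fmix32)
        h ^= h >> 16
        h = (h * 0x85EBCA6B) & MASK
        h ^= h >> 13
        h = (h * 0xC2B2AE35) & MASK
        h ^= h >> 16
        return h


def murmur3_32_of_stream(stream, seed=0, chunk_size=1 << 20):
    """Hash a binary file-like object chunk by chunk."""
    hasher = StreamingMurmur3_32(seed)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.digest()


# usage: any object with a binary read() works, e.g. open(path, 'rb')
print(f"{murmur3_32_of_stream(io.BytesIO(b'hello world')):08x}")
```

with a real file this would be `with open('/bin/sh', 'rb') as fd: murmur3_32_of_stream(fd)`. the result should correspond to the unsigned 32-bit value that `mmh3.hash(data, seed, signed=False)` gives for the same bytes, but i have not checked that against every mmh3 version, so treat it as an assumption.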

@milahu milahu closed this as completed Oct 31, 2021