question: --algo blake3
#6
Comments
This is somewhat related to closed #3 |
Python bindings for the Rust library do exist, default to 1 thread, and accept a max_threads argument:
from blake3 import blake3
# Hash a large input using multiple threads. Note that this can be slower for
# inputs shorter than ~1 MB, and it's a good idea to benchmark it for your use
# case on your platform.
large_input = bytearray(1_000_000)
hash_single = blake3(large_input).digest()
hash_two = blake3(large_input, max_threads=2).digest()
hash_many = blake3(large_input, max_threads=blake3.AUTO).digest()
assert hash_single == hash_two == hash_many
|
You can take a look in the blake3 branch. I have not had time to test it so please let me know how it performs. |
Thank you for suggesting blake3. It really has a lot of improvements over md5 so I've made it the default. |
Wow, thank you for such a quick integration! And sorry for not yet reacting to your request for testing - I would have done that within the next few days. Now I'm definitely going to try the new algorithm 😋 I did record hashing/checking times with md5 :) Thank you! |
A small update on speeds:
|
Hmm, I forgot to include elapsed so I had to fix that first ;) With md5 (10 workers)
With blake3
@spock I think your IO is not able to keep up. |
I agree, it looks like IO is my bottleneck with |
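One rough way to check whether reading or hashing dominates is to time a read-only pass over a file against a read-and-hash pass. The sketch below is only an illustration and is not part of this project: `time_pass`, `compare`, and `CHUNK_SIZE` are hypothetical names, and it uses `hashlib` so it runs without extra packages.

```python
import hashlib
import time

CHUNK_SIZE = 1024 * 1024  # hypothetical read size (1 MiB)


def time_pass(path, hasher=None):
    """Read the file once, feeding each chunk to `hasher` if one is given.

    Returns (elapsed_seconds, bytes_read).
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            if hasher is not None:
                hasher.update(chunk)
            total += len(chunk)
    return time.perf_counter() - start, total


def compare(path, algo="md5"):
    """Time a read-only pass, then a read-and-hash pass, over the same file."""
    read_s, size = time_pass(path)
    hash_s, _ = time_pass(path, hashlib.new(algo))
    return {"bytes": size, "read_only_s": read_s, "read_and_hash_s": hash_s}
```

Keep in mind that the operating system usually caches the file after the first pass, so on a file that was not cached beforehand the first number mostly reflects disk speed and the second mostly reflects hashing cost. If the read-only pass is already the slower of the two, a faster hash is unlikely to help much.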
Hi, thank you for a promising-looking file bitrot/hash checker. I especially like the built-in logic of "modified content and date are fine, modified content alone is not" - this is exactly what I've been looking for!
For file integrity checking there is a rather new algorithm, BLAKE3, which is significantly faster than md5 (around 9x) and also claims to be better; its authors published an article with more details and benchmarks. It was designed specifically for file (content) hashing.
The primary (binary) implementation is in Rust (with parallelization), but there are also reference/educational non-parallel implementations in C and pure Python.
If you think this could be a nice --algo option, what could be the best way to integrate it? As you already have multi-worker support, I guess calling their single-threaded C library (or asking for single-thread processing from the main Rust library) would be the best? I haven't yet checked if Python bindings exist, but I'd assume they do.
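A minimal sketch of what such an --algo option could look like, assuming the `blake3` package from PyPI is installed when that algorithm is requested. This is not this project's actual code: `new_hasher`, `hash_file`, `build_parser`, and `CHUNK_SIZE` are hypothetical names, and the md5 default is only a placeholder. The hasher is left at its default of one thread, on the assumption that the existing workers already provide the parallelism.

```python
import argparse
import hashlib

try:
    # Third-party package (assumption: installed with `pip install blake3`).
    from blake3 import blake3
except ImportError:
    blake3 = None

CHUNK_SIZE = 1024 * 1024  # hypothetical read size (1 MiB)


def new_hasher(algo):
    """Return a fresh hasher object for the given algorithm name."""
    if algo == "blake3":
        if blake3 is None:
            raise RuntimeError("blake3 requested but the package is not installed")
        # Default max_threads is 1, so each worker hashes on a single thread.
        return blake3()
    # Anything else is delegated to the standard library.
    return hashlib.new(algo)


def hash_file(path, algo="md5"):
    """Hash a file in fixed-size chunks and return the hex digest."""
    hasher = new_hasher(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_parser():
    """Build a tiny illustrative command line with an --algo option."""
    parser = argparse.ArgumentParser(description="hash files (illustrative sketch)")
    parser.add_argument("--algo", default="md5", help="md5, sha256, blake3, ...")
    parser.add_argument("paths", nargs="*")
    return parser
```

Because the blake3 hasher offers the same `update()` and `hexdigest()` methods as the `hashlib` objects, the chunked loop in `hash_file` does not need to know which algorithm was chosen.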