Investigate different hashes #13
BLAKE3 may be similar to what we've implemented - see the comments in sigstore/sigstore-python#1018.
There seems to be a package available: https://pypi.org/project/blake3/
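For reference, that PyPI binding exposes a hashlib-style interface, so hashing a shard with it might look roughly like the sketch below (a minimal illustration assuming the package's `blake3()` constructor with `update()`/`hexdigest()`; not the project's actual implementation):

```python
# Hedged sketch: hashing one file shard with the blake3 PyPI package,
# assuming its hashlib-style update()/hexdigest() interface.
from blake3 import blake3

def hash_shard(path: str, offset: int, length: int, read_size: int = 1 << 20) -> str:
    """Hash `length` bytes of `path` starting at `offset`, reading in chunks."""
    hasher = blake3()
    with open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            data = f.read(min(read_size, remaining))
            if not data:
                break
            hasher.update(data)
            remaining -= len(data)
    return hasher.hexdigest()
```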
One thing to be aware of is that blake2 is not a NIST hash, and blake3 seems based on a similar idea (a reduced number of rounds in the compression function). It also fixes the shard size at 1 KiB, which we would likely want to be parameterizable in practice.
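To make the parameterization point concrete, here is a sketch of keeping the shard size configurable (the function name and the 1 GB default are hypothetical, not the project's actual API):

```python
# Hedged sketch: compute shard boundaries for a file with a configurable
# shard size instead of a size fixed by the hash construction.
# Names and the 1 GB default are illustrative only.
import os

def shard_boundaries(path: str, shard_size: int = 1_000_000_000) -> list[tuple[int, int]]:
    """Return (offset, length) pairs covering the file in `shard_size` pieces."""
    total = os.path.getsize(path)
    return [(offset, min(shard_size, total - offset))
            for offset in range(0, total, shard_size)]
```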
When we benchmarked model signing for different model sizes, disk performance became the bottleneck quite quickly, so my sense is that changing the hash function would have little to no impact. That being said, supporting different hash functions is good for the sake of modularity. But that seems like a different goal from what you're talking about here.
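On the modularity point, supporting different hash functions usually just means accepting any hashlib-style constructor instead of hard-coding one; a minimal sketch (illustrative, not the project's actual interface):

```python
# Hedged sketch: a pluggable hash function via any hashlib-style constructor.
import hashlib
from typing import Callable

def hash_file(path: str, hash_factory: Callable = hashlib.sha256,
              read_size: int = 1 << 20) -> str:
    hasher = hash_factory()
    with open(path, "rb") as f:
        while data := f.read(read_size):
            hasher.update(data)
    return hasher.hexdigest()

# Swapping algorithms is then a one-argument change, e.g.:
#   hash_file("model.bin", hashlib.blake2b)
#   hash_file("model.bin", blake3.blake3)  # if the blake3 package is installed
```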
How do you know disk performance was the bottleneck? It could be that the number of vCPUs was the bottleneck, since we parallelize by dispatching computation on each vCPU. Is there another graph showing that disk IO was the bottleneck?
That's fair, we don't know for sure (for reference, though, the machine had 48 vCPUs). But even if the bottleneck is the number of vCPUs, I still don't see why a different hash function would affect performance?
If the hash can compute each shard faster on each vCPU, we'd get a lower plateau (first part of the graph) and a slower linear increase (second part of the graph). Of course, memory also comes into play and could become the bottleneck too :)
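To illustrate the mechanism being discussed here (one hash task per shard dispatched across vCPUs, so a faster per-shard hash lowers the plateau), a rough sketch follows; it is not the project's actual implementation, and the shard size is only an assumption:

```python
# Hedged sketch: per-shard parallel hashing, one task per shard across
# worker processes. Illustrative only; not the project's implementation.
import hashlib
import os
from concurrent.futures import ProcessPoolExecutor

def _hash_shard(task: tuple[str, int, int]) -> bytes:
    path, offset, length = task
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        f.seek(offset)
        hasher.update(f.read(length))
    return hasher.digest()

def parallel_hash(path: str, shard_size: int = 1_000_000_000) -> list[bytes]:
    total = os.path.getsize(path)
    tasks = [(path, off, min(shard_size, total - off))
             for off in range(0, total, shard_size)]
    # On platforms that spawn workers, call this under `if __name__ == "__main__":`.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(_hash_shard, tasks))
```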
One thing I forgot to mention: you can see on the graph that the results are very similar across M2 and M3 (and at the end identical), despite the fact that M3 has twice the number of cores. So this again suggests that the bottleneck is disk IO. @mihaimaruseac has a graph about memory usage showing that it really never got that high (like 4-8 GB RAM, if I remember correctly), so I think that unless we seriously increase the shard size this is unlikely to become a bottleneck.
Here are the serial implementation numbers:
Doubling the model size approximately doubles the time here. But now, let's see the optimized method:
See the huge jump at the end; that is caused by disk reads. Confirmed by watching the process during hashing: RAM usage was never more than 4 GB (including the Python interpreter, etc.).
Thanks. So after 150 GB or so, it's better to store the model files on different SSDs, correct? Also, for files < 100 MB or so, naive hashing is faster (I suspect because there's no thread / process scheduling contention), correct?
I think so. I'm actually planning to add microbenchmarks as part of the API redesign. Will hopefully send a PR today (I need to do the same internally, so it depends on when I get the CL reviews).
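A microbenchmark along those lines could be as small as timing each hash configuration over the same file with `time.perf_counter` (a sketch under assumed names, not the planned PR):

```python
# Hedged sketch of a hashing microbenchmark: time several hashlib-style
# constructors over one file and report throughput. Illustrative only.
import hashlib
import os
import time

def benchmark(path: str, factories: dict, read_size: int = 1 << 20) -> None:
    size = os.path.getsize(path)
    for name, factory in factories.items():
        hasher = factory()
        start = time.perf_counter()
        with open(path, "rb") as f:
            while data := f.read(read_size):
                hasher.update(data)
        elapsed = time.perf_counter() - start
        print(f"{name}: {elapsed:.2f}s ({size / elapsed / 2**20:.1f} MiB/s)")

# Example: benchmark("model.bin", {"sha256": hashlib.sha256, "blake2b": hashlib.blake2b})
```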
A few other things to report in the graph / benchmark:
E.g. https://www.blake2.net (reduced rounds), https://github.com/BLAKE3-team/BLAKE3