go get lukechampine.com/blake3
blake3 implements the BLAKE3 cryptographic hash function.
This implementation aims to be performant without sacrificing (too much)
readability, in the hopes of eventually landing in
In addition to the pure-Go implementation, this package also contains AVX-512
and AVX2 routines (generated by
that greatly increase performance for large inputs and outputs.
Contributions are greatly appreciated. All contributors are eligible to receive an Urbit planet.
Tested on a 2020 MacBook Air (i5-7600K @ 3.80GHz). Benchmarks will improve as
soon as I get access to a beefier AVX-512 machine.
BenchmarkSum256/64           120 ns/op       533.00 MB/s
BenchmarkSum256/1024        2229 ns/op       459.36 MB/s
BenchmarkSum256/65536      16245 ns/op      4034.11 MB/s
BenchmarkWrite               245 ns/op      4177.38 MB/s
BenchmarkXOF                 246 ns/op      4159.30 MB/s

BenchmarkSum256/64           120 ns/op       533.00 MB/s
BenchmarkSum256/1024        2229 ns/op       459.36 MB/s
BenchmarkSum256/65536      31137 ns/op      2104.76 MB/s
BenchmarkWrite               487 ns/op      2103.12 MB/s
BenchmarkXOF                 329 ns/op      3111.27 MB/s

BenchmarkSum256/64           120 ns/op       533.00 MB/s
BenchmarkSum256/1024        2229 ns/op       459.36 MB/s
BenchmarkSum256/65536     133505 ns/op       490.89 MB/s
BenchmarkWrite              2022 ns/op       506.36 MB/s
BenchmarkXOF                1914 ns/op       534.98 MB/s
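For readers comparing the two columns: the MB/s figure follows from the input size and the ns/op figure. The sketch below shows that arithmetic, assuming the number after the slash in each BenchmarkSum256 name is the input size in bytes and that 1 MB means 10^6 bytes, as in Go's benchmark output. The helper name throughputMBps is made up for illustration and is not part of this package.

```go
package main

import "fmt"

// throughputMBps is a hypothetical helper (not part of blake3). It converts
// bytes processed per operation and nanoseconds per operation into MB/s,
// where 1 MB = 1e6 bytes.
func throughputMBps(bytesPerOp int, nsPerOp float64) float64 {
	// 1 byte/ns is 1e9 bytes/s, which is 1000 MB/s.
	return float64(bytesPerOp) / nsPerOp * 1000
}

func main() {
	// Sizes and ns/op values copied from the first group of figures above.
	fmt.Printf("%.2f MB/s\n", throughputMBps(64, 120))
	fmt.Printf("%.2f MB/s\n", throughputMBps(1024, 2229))
	fmt.Printf("%.2f MB/s\n", throughputMBps(65536, 16245))
}
```

The results should land within a fraction of a percent of the reported values; the small differences arise because the printed ns/op is rounded to a whole number.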
There is no assembly routine for single-block compressions. This is most noticeable for ~1KB inputs.
Each assembly routine inlines all 7 rounds, causing thousands of lines of duplicated code. Ideally the routines could be merged such that only a single routine is generated for AVX-512 and AVX2, without sacrificing too much performance.
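Some background on why the first shortcoming shows up at around 1 KB: in the BLAKE3 design, input is split into 1024-byte chunks, each made of 64-byte blocks, and the blocks inside a chunk are chained, so they have to be compressed one after another. A 1 KB input is therefore a single chunk of 16 sequential block compressions. The sketch below only counts chunks and blocks for a given input size; the function name layout is hypothetical, and nothing here reflects how this package actually dispatches work internally.

```go
package main

import "fmt"

const (
	blockSize = 64   // bytes per BLAKE3 block
	chunkSize = 1024 // bytes per BLAKE3 chunk (16 blocks)
)

// layout is a hypothetical helper (not part of blake3). It reports how many
// chunks an n-byte input spans and how many block compressions are needed to
// process the chunk contents. Parent-node compressions are not counted.
func layout(n int) (chunks, blocks int) {
	if n == 0 {
		return 1, 1 // the empty input is still one chunk with one (empty) block
	}
	chunks = (n + chunkSize - 1) / chunkSize
	// Full chunks contribute 16 blocks each.
	blocks = (n / chunkSize) * (chunkSize / blockSize)
	// A trailing partial chunk contributes its blocks, rounded up.
	blocks += (n%chunkSize + blockSize - 1) / blockSize
	return chunks, blocks
}

func main() {
	for _, n := range []int{64, 1024, 65536} {
		c, b := layout(n)
		fmt.Printf("%6d bytes: %3d chunk(s), %4d block compression(s)\n", n, c, b)
	}
}
```

For the three benchmark sizes this gives 1 chunk and 1 block, 1 chunk and 16 blocks, and 64 chunks and 1024 blocks. Only the last case has multiple chunks available to process side by side, which is consistent with the 65536-byte benchmark being the only Sum256 case whose throughput changes between the groups of figures.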