Optimized implementation of BlaBla for SSE2/SSSE3/AVX2
This project is an optimized implementation of BlaBla for CPUs supporting SSE2, SSSE3 or AVX2 instructions. A reference C implementation is also provided for comparison. Another reference C implementation was written by Frank Denis.
The optimization strategy is inspired by the AVX2 ChaCha implementation by Samuel Neves.
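For context, here is a minimal scalar sketch of the kind of round function such an optimization targets. It assumes BlaBla follows the ChaCha column/diagonal layout on sixteen 64-bit words with BLAKE2b's rotation constants (32, 24, 16, 63). The function names are hypothetical and are not taken from this repository.

```c
#include <stdint.h>

/* Rotate a 64-bit word right by n bits (valid for 0 < n < 64). */
static inline uint64_t rotr64(uint64_t x, unsigned n) {
    return (x >> n) | (x << (64 - n));
}

/* One quarter round on four 64-bit words.
 * Assumption: rotation constants 32, 24, 16, 63 as in BLAKE2b. */
static inline void blabla_quarter_round(uint64_t *a, uint64_t *b,
                                        uint64_t *c, uint64_t *d) {
    *a += *b; *d = rotr64(*d ^ *a, 32);
    *c += *d; *b = rotr64(*b ^ *c, 24);
    *a += *b; *d = rotr64(*d ^ *a, 16);
    *c += *d; *b = rotr64(*b ^ *c, 63);
}

/* One double round over a 4x4 state of 64-bit words:
 * four column quarter rounds, then four diagonal quarter rounds. */
static void blabla_double_round(uint64_t s[16]) {
    /* Columns */
    blabla_quarter_round(&s[0], &s[4], &s[8],  &s[12]);
    blabla_quarter_round(&s[1], &s[5], &s[9],  &s[13]);
    blabla_quarter_round(&s[2], &s[6], &s[10], &s[14]);
    blabla_quarter_round(&s[3], &s[7], &s[11], &s[15]);
    /* Diagonals */
    blabla_quarter_round(&s[0], &s[5], &s[10], &s[15]);
    blabla_quarter_round(&s[1], &s[6], &s[11], &s[12]);
    blabla_quarter_round(&s[2], &s[7], &s[8],  &s[13]);
    blabla_quarter_round(&s[3], &s[4], &s[9],  &s[14]);
}
```

The four quarter rounds within each group touch disjoint words, so they are independent of each other. A row of four 64-bit words fits in one 256-bit AVX2 register, which is what makes this structure a natural fit for SIMD.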
The project still lacks extensive benchmarks on multiple architectures, but current tests suggest a ~15% performance improvement over the AVX2 ChaCha implementation for the same number of rounds.
You can check that the code compiles and benchmark the various implementations as follows.
    make
    ./bench-ref
    ./bench-opt-sse2
    ./bench-opt-ssse3
    ./bench-opt-avx2
Guillaume Endignoux, while an intern at Kudelski Security