Add ARM Blake2b #845
Comments
I was talking with SN from the BLAKE2 team. Here's what he uncovered:
Closing. I don't think we want to take this on; BLAKE is already fast enough in plain C++, so the extra complexity doesn't seem worthwhile.
Attached and below is a patch for Blake2b using ARM NEON intrinsics. It's another partial patch, and others will have to complete it.
The code is based on a reference implementation provided by Samuel Neves (@sneves), who is one of the authors of BLAKE2. He provided it in a private email some time ago. It was recently revisited for some benchmarking, so it was a good time to offer a Botan cut-in. Samuel Neves and JP Aumasson should receive credit.
The dev-boards used for testing were a BeagleBoard v3 (Cortex-A8), Banana Pi (Cortex-A7) and CubieTruck v5 (Cortex-A7). The BeagleBoard was configured with `-march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=hard`, and the CubieTruck was configured with `-march=armv7-a -mtune=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard`.
Here are the relative numbers:
Only enable NEON for 32-bit ARM (A32). Do not use NEON for 64-bit ARM (AArch64). Here's a surprising result: (1) CXX and NEON run equally fast on the Cortex-A53 Pine64 and HiKey. (2) NEON runs 50% slower than CXX on the Cortex-A57 Overdrive 1000. I don't know what happens under AArch32 on these cores because I don't have a device.
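One way to express the "32-bit ARM only" rule is a compile-time guard. The following is a minimal sketch, not Botan's actual build logic: the macro `EXAMPLE_BLAKE2B_USE_NEON` and the helper `example_blake2b_uses_neon` are hypothetical names made up for illustration. It relies only on the compiler-predefined macros `__arm__`, `__aarch64__` and `__ARM_NEON` (older GCC releases spell the last one `__ARM_NEON__`).

```cpp
// Hypothetical guard: take the NEON path only on 32-bit ARM builds that
// have NEON enabled, and fall back to portable C++ everywhere else
// (including AArch64, where the measurements above showed no benefit).
#if (defined(__ARM_NEON) || defined(__ARM_NEON__)) && \
    defined(__arm__) && !defined(__aarch64__)
  #define EXAMPLE_BLAKE2B_USE_NEON 1
#else
  #define EXAMPLE_BLAKE2B_USE_NEON 0
#endif

// Hypothetical helper so callers can query the decision at compile time.
constexpr bool example_blake2b_uses_neon() {
  return EXAMPLE_BLAKE2B_USE_NEON == 1;
}
```

With the BeagleBoard flags quoted above this guard would be expected to select the NEON path, while an `-march=armv8-a` AArch64 build would select the portable path.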
A quick look at a disassembly of the A57 CXX code reveals GCC never generates a NEON instruction for the `compress` function. It's purely integer operations on the CPU, and nothing goes to the coprocessor.
Here are the A57 timings after configuring with `-march=armv8-a+crc+crypto -mtune=cortex-a57`:
Here is the updated `blake2.cpp` and the diff packaged as a ZIP file.
blake2_updated.zip
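For context on why the plain C++ `compress` maps so well onto 64-bit integer registers, here is a sketch of the BLAKE2b G mixing step as specified in RFC 7693. This is not the code from `blake2.cpp` or from the attached patch; the names `example_rotr64`, `ExampleQuad` and `example_blake2b_g` are hypothetical. Each step is a 64-bit add, XOR or rotate, which an AArch64 core can run directly in its general-purpose registers.

```cpp
#include <cstdint>

// Rotate a 64-bit word right by n bits (valid for 0 < n < 64).
constexpr std::uint64_t example_rotr64(std::uint64_t x, unsigned n) {
  return (x >> n) | (x << (64 - n));
}

// Four state words touched by one G call (hypothetical helper type).
struct ExampleQuad {
  std::uint64_t a, b, c, d;
};

// BLAKE2b G function per RFC 7693: rotation amounts 32, 24, 16, 63.
// x and y are the two message words selected for this call.
inline ExampleQuad example_blake2b_g(ExampleQuad s, std::uint64_t x,
                                     std::uint64_t y) {
  s.a = s.a + s.b + x;
  s.d = example_rotr64(s.d ^ s.a, 32);
  s.c = s.c + s.d;
  s.b = example_rotr64(s.b ^ s.c, 24);
  s.a = s.a + s.b + y;
  s.d = example_rotr64(s.d ^ s.a, 16);
  s.c = s.c + s.d;
  s.b = example_rotr64(s.b ^ s.c, 63);
  return s;
}
```

A NEON version has to pack pairs of these 64-bit words into 128-bit vectors and shuffle them between the column and diagonal steps, which is extra work the scalar code avoids.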