Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SHA-512 hardware acceleration on ARMv8 #3860

Merged
merged 1 commit into from
Jan 1, 2024
Merged

Conversation

securitykernel
Copy link
Collaborator

Adds support for ARMv8 hardware acceleration using instrinsics. It is heavily based on the existing SHA-256 implementation with changes to use the equivalent intrinsics for SHA-512.

Tested on a MacBook Pro with M2 Pro.

Baseline software:

$ ./botan speed --msec=3000 SHA-512
SHA-512 hash buffer size 1024 bytes: 489.480 MiB/sec (1468.440 MiB in 3000.000 ms)

ARMv8 intrinsics:

$ ./botan speed --msec=3000 SHA-512
SHA-512 hash buffer size 1024 bytes: 1307.809 MiB/sec (3923.428 MiB in 3000.000 ms)

Which means a factor 2,67 or 267% performance increase.

Even beats OpenSSL on the same machine (via Homebrew):

$ src/scripts/bench.py SHA-512
   INFO: Comparing Botan 3.3.0 with OpenSSL 3.2.0
   INFO: Running OpenSSL benchmark for SHA-512
   INFO: Running Botan benchmark for SHA-512
algo SHA-512 buf_size     16 botan    158735807 bps openssl     84757824 bps adv botan by 1.87
algo SHA-512 buf_size     64 botan    756406746 bps openssl    340762560 bps adv botan by 2.22
algo SHA-512 buf_size    256 botan    950167132 bps openssl    743652864 bps adv botan by 1.28
algo SHA-512 buf_size   1024 botan   1386804433 bps openssl   1238273024 bps adv botan by 1.12
algo SHA-512 buf_size   8192 botan   1565492078 bps openssl   1518950906 bps adv botan by 1.03
algo SHA-512 buf_size  16384 botan   1579776447 bps openssl   1556676608 bps adv botan by 1.01

Before, the baseline software was significantly slower:

$ src/scripts/bench.py SHA-512
   INFO: Comparing Botan 3.3.0 with OpenSSL 3.2.0
   INFO: Running OpenSSL benchmark for SHA-512
   INFO: Running Botan benchmark for SHA-512
algo SHA-512 buf_size     16 botan     69516433 bps openssl     82078000 bps adv openssl by 1.18
algo SHA-512 buf_size     64 botan    289047132 bps openssl    318918273 bps adv openssl by 1.10
algo SHA-512 buf_size    256 botan    389636626 bps openssl    711016210 bps adv openssl by 1.82
algo SHA-512 buf_size   1024 botan    504381686 bps openssl   1153344512 bps adv openssl by 2.29
algo SHA-512 buf_size   8192 botan    563204317 bps openssl   1458631111 bps adv openssl by 2.59
algo SHA-512 buf_size  16384 botan    580899115 bps openssl   1547288576 bps adv openssl by 2.66

@coveralls
Copy link

coveralls commented Dec 30, 2023

Coverage Status

coverage: 92.064% (+0.003%) from 92.061%
when pulling eb8384c on sk/sha512-armv8
into 5ca400e on master.

@randombit
Copy link
Owner

Regarding


algo SHA-512 buf_size     16 botan    158735807 bps openssl     84757824 bps adv botan by 1.87
algo SHA-512 buf_size     64 botan    756406746 bps openssl    340762560 bps adv botan by 2.22
algo SHA-512 buf_size    256 botan    950167132 bps openssl    743652864 bps adv botan by 1.28
algo SHA-512 buf_size   1024 botan   1386804433 bps openssl   1238273024 bps adv botan by 1.12
algo SHA-512 buf_size   8192 botan   1565492078 bps openssl   1518950906 bps adv botan by 1.03
algo SHA-512 buf_size  16384 botan   1579776447 bps openssl   1556676608 bps adv botan by 1.01

you probably noticed that our advantage is confined to the smaller input sizes. I recently had to benchmark several (Rust) SHA-256 libraries using SHA-NI and noticed something similar; basically every implementation was fast for long messages, but for small messages there were huge swings in performance, and with OpenSSL always the worst. I believe this is due to differences in buffer management. This is a good sign that we're doing the right thing in this regard.

Copy link
Owner

@randombit randombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small things but otherwise looks good to me, thanks!

src/lib/hash/sha2_64/sha2_64_armv8/sha2_64_armv8.cpp Outdated Show resolved Hide resolved
src/lib/hash/sha2_64/sha2_64_armv8/sha2_64_armv8.cpp Outdated Show resolved Hide resolved
@randombit
Copy link
Owner

CI failures look relevent, but are confusing.

iOS:


  src/lib/hash/sha2_64/sha2_64_armv8/sha2_64_armv8.cpp:76:16: error: always_inline function 'vsha512h2q_u64' requires target feature 'sha3', but would be inlined into function 'compress_digest_armv8' that is compiled without support for 'sha3'
        STATE3 = vsha512h2q_u64(TSTATE1, STATE1, STATE0);

sha3?

Android:


  src/lib/hash/sha2_64/sha2_64_armv8/sha2_64_armv8.cpp:94:14: error: always_inline function 'vsha512su1q_u64' requires target feature 'sha3', but would be inlined into function 'compress_digest_armv8' that is compiled without support for 'sha3'
        MSG2 = vsha512su1q_u64(vsha512su0q_u64(MSG2, MSG3), MSG1, vextq_u64(MSG6, MSG7, 1));

sha3?

GCC

  In file included from src/lib/hash/sha2_64/sha2_64_armv8/sha2_64_armv8.cpp:10:
  /usr/lib/gcc-cross/aarch64-linux-gnu/11/include/arm_neon.h:32278:1: error: inlining failed in call to ‘always_inline’ ‘uint64x2_t vsha512h2q_u64(uint64x2_t, uint64x2_t, uint64x2_t)’: target specific option mismatch
  32278 | vsha512h2q_u64 (uint64x2_t __a, uint64x2_t __b, uint64x2_t __c)
        | ^~~~~~~~~~~~~~
  src/lib/hash/sha2_64/sha2_64_armv8/sha2_64_armv8.cpp:374:30: note: called from here
    374 |       STATE1 = vsha512h2q_u64(TSTATE1, STATE3, STATE2);
        |                ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~

Doesn't specify but I'm going to go ahead and guess it's sha3

@randombit
Copy link
Owner

Oh right, instead of armv8crypto ISA flag you need armv8sha512. For whatever reason ARM bundles the SHA-512 and SHA-3 instructions in the same group.

@securitykernel
Copy link
Collaborator Author

securitykernel commented Jan 1, 2024

Oh right, instead of armv8crypto ISA flag you need armv8sha512. For whatever reason ARM bundles the SHA-512 and SHA-3 instructions in the same group.

That is it, thanks! Thanks for the quick review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants