Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM's SHA256 is very inefficient #56121

Open
nico opened this issue Jun 19, 2022 · 5 comments
Open

LLVM's SHA256 is very inefficient #56121

nico opened this issue Jun 19, 2022 · 5 comments

Comments

@nico
Copy link
Contributor

nico commented Jun 19, 2022

I was able to make ld64.lld 20% (!) faster by tweaking how it computes SHA256.

  1. The current SHA256 API requires putting the 32-byte result on the stack and the memcpy()ing it to the right place. Tweaking the API to have an outparam would save this.
  2. The SHA256 internals aren't very optimized. Using e.g. CommonCrypto/CommonDigest.h on macOS is faster. (17% speedup)
  3. (We don't parallelize the SHA256 implementation. Doing that helps a bit more (20%)

https://github.com/nico/llvm-project/commits/hash has details, proof-of-concept level.

I used the repro file at https://drive.google.com/file/d/1wWCeDWQ3OAyVwadyCdFZ0WXNB11ADnBK/view?usp=sharing as benchmark case.

@nico nico added the lld:MachO label Jun 19, 2022
@llvmbot
Copy link
Collaborator

llvmbot commented Jun 19, 2022

@llvm/issue-subscribers-lld-macho

@nico
Copy link
Contributor Author

nico commented Jun 20, 2022

As for what to do about this: Rather than calling out to system libraries (defensible on macOS, but weird on other unixen), we should probably try to make our sha256 impl less slow.

There are SIMD instructions for modern x86 chips: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sha-extensions.html (using the intrins)

There are also SIMD instructions for armv8+crypto (which M1 chips support): vsha256hq_u32, vsha256h2q_u32, vsha256su1q_u32.

Looks like there's public-domain code using these at https://sources.debian.org/src/libcrypto++/8.4.0-1/sha_simd.cpp/ which can provide inspiration for replacing SHA256::hashBlock() with a SIMD impl (…probably using something like https://llvm.org/devmtg/2014-10/Slides/Christopher-Function%20Multiversioning%20Talk.pdf ).

(Alternatively, llvm/lib/Support/BLAKE3/ can provide inspiration for going with asm instead of intrinsics – but intrinsics are probably easier to work with.)

@nico
Copy link
Contributor Author

nico commented Jun 21, 2022

858e8b1 did the cop-out for now.

But still slow when e.g. linking a mac/arm binary on a linux system.

@nico
Copy link
Contributor Author

nico commented Jun 22, 2022

0baf13e did the parallelization. (Commit message has some metrics.)

After these changes, things are fairly fast on macOS. Linking an arm64 mac binary on different hosts is still 2.55% slower than it needs to be; the fix for that is to make LLVM's SHA256 implementation faster as described in #56121 (comment)

@davidben
Copy link
Contributor

One thing to note is the Intel SHA extensions aren't all that common right now. As I understand it, they were originally only on low power chips. (Though AMD Zen has it now, and I think the newest Intel ones might too?) OpenSSL and friends also have SHA-256 implementations using more general purpose SIMD that kick in more often in x86 today.

SHA-256 acceleration on Arm is common though. It's part of the original cryptography extension from Armv8, so most Armv8 chips have it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants