New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: x/crypto/sha3: add SHA3 assembly implementation for ARMv7 #28148

Open
anonymouse64 opened this Issue Oct 11, 2018 · 7 comments

Comments

Projects
None yet
5 participants
@anonymouse64
Copy link

anonymouse64 commented Oct 11, 2018

Currently, there's no assembly implementation for SHA3 hashing for ARM platforms (specifically ARMv7). On ARMv7+ there are vector assembly instructions (known as NEON) available which greatly speed up the speed of SHA3 hashing. There is an upstream reference implementation (here: https://github.com/KeccakTeam/KeccakCodePackage/blob/master/lib/low/KeccakP-1600-times2/OptimizedAsmARM/KeccakP-1600-inplace-pl2-armv7a-neon-le-gcc.s) available that implements SHA3 hashing using these vector instructions and so I have ported this to Go.

Unfortunately, there is no support in the Go assembler/dissassembler for ARMv7 vector instructions, and so I wrote a small tool (available: https://github.com/anonymouse64/asm2go) which translates native assembly code for ARM into Go's plan9 based assembly unsupported opcode syntax in order to integrate the upstream implementation in Go.

I see approximately 3-4 time speedup in SHA3 hashing on a reference Raspberry Pi 3 Model B Revision 1.2 board:

benchmark                          old ns/op     new ns/op     delta
BenchmarkPermutationFunction-4     19033         6054          -68.19%
BenchmarkSha3_512_MTU-4            388484        137001        -64.73%
BenchmarkSha3_384_MTU-4            279054        100177        -64.10%
BenchmarkSha3_256_MTU-4            224595        81443         -63.74%
BenchmarkSha3_224_MTU-4            210459        79196         -62.37%
BenchmarkShake128_MTU-4            181445        66482         -63.36%
BenchmarkShake256_MTU-4            199495        71572         -64.12%
BenchmarkShake256_16x-4            2704755       1094933       -59.52%
BenchmarkShake256_1MiB-4           151870003     53953383      -64.47%
BenchmarkSha3_512_1MiB-4           283838790     97048578      -65.81%

benchmark                          old MB/s     new MB/s     speedup
BenchmarkPermutationFunction-4     10.51        33.03        3.14x
BenchmarkSha3_512_MTU-4            3.48         9.85         2.83x
BenchmarkSha3_384_MTU-4            4.84         13.48        2.79x
BenchmarkSha3_256_MTU-4            6.01         16.58        2.76x
BenchmarkSha3_224_MTU-4            6.41         17.05        2.66x
BenchmarkShake128_MTU-4            7.44         20.31        2.73x
BenchmarkShake256_MTU-4            6.77         18.86        2.79x
BenchmarkShake256_16x-4            6.06         14.96        2.47x
BenchmarkShake256_1MiB-4           6.90         19.43        2.82x

I opened a CL providing this implementation here: https://go-review.googlesource.com/c/crypto/+/119255 however I have not received any feedback on the CL, so here I am opening this issue to hopefully get more visibility on this.

@ALTree ALTree changed the title Add SHA3 assembly implementation on ARM x/crypto/sha3: add SHA3 assembly implementation for ARMv7 Oct 11, 2018

@gopherbot gopherbot added this to the Unreleased milestone Oct 11, 2018

@aead

This comment has been minimized.

Copy link
Contributor

aead commented Oct 12, 2018

Ref: #28126

@anonymouse64

This comment has been minimized.

Copy link

anonymouse64 commented Oct 12, 2018

As I wasn't aware of the mentioned Assembly Policy mentioned in the issue, I will provide responses to some of the points mentioned there that should be discussed when considering assembly contributions:

  • The change I have only implements the minimal KeccakF1600 hashing function, which is the minimal part that ought to be implemented in assembly.

This function is on the "fast path" when performing SHA3 hashing of large files, etc. I don't have specific benchmarks of implementing other parts of the hashing algorithm in assembly to compare to, but it should be clear from both the speedup I got and the fact that the amd64 implementation only implements this same function in assembly that the function I'm providing is the "minimal" amount to implement in assembly.

  • While my CL includes a large amount of generated assembly, I think because it's generated it's okay that there's a lot more of it than say in the equivalent amd64 assembly implementation.

I would argue this is okay because the generated assembly is generated from a relatively small amount of hand-written native assembly. The reason why I didn't write the native assembly in plan9 assembly is because Go's assembler doesn't support the ARM vector instructions needed for this implementation (and using vector instructions is why the implementation is much faster than the native Go implementation).
Additionally, this small hand-written assembly comes from the upstream reference implementation for SHA3, and is in fact the same place where the original amd64 assembly implementation comes from.

  • In the future, this assembly code could be simplified with either support for vector instructions in plan9 assembly code, or if the go compiler gained the ability to emit vector instructions when compiling specifically for ARMv7+ processors.

  • SHA3 is likely not going to become a legacy primitive anytime soon, considering it was specifically chosen in a public, open manner to replace older hashing mechanisms

  • Specifically for our use case at Canonical Ltd., our package management system snapd uses SHA3 hashing for almost all (possibly all not 100% sure) hashing of files.

One particular place this lack of performance on ARM hits us is during the first boot of a newly flashed Ubuntu Core ARM device, where the system when installing snaps (and other tasks) takes up a good portion of the time performing SHA3 hashes. An example difference for a customer device I can't unfortunately release more details on, performs the first boot in a factory environment about 1-2 minutes faster (approx 10% faster) when using this assembly implementation, and this time/percentage increases with the number of snaps the customer wants to configure their device with initially.

@andybons andybons changed the title x/crypto/sha3: add SHA3 assembly implementation for ARMv7 proposal: x/crypto/sha3: add SHA3 assembly implementation for ARMv7 Oct 17, 2018

@gopherbot gopherbot added the Proposal label Oct 17, 2018

@andybons

This comment has been minimized.

Copy link
Member

andybons commented Oct 17, 2018

@anonymouse64

This comment has been minimized.

Copy link

anonymouse64 commented Dec 3, 2018

Hi folks, any update on this? Is it now too late for this to be considered for go 1.12?

@andybons

This comment has been minimized.

Copy link
Member

andybons commented Dec 4, 2018

@anonymouse64 hi there. It is too late for this to make it into 1.12 since we are deep into the freeze period where only bugs are being fixed. More info is here: https://golang.org/wiki/Go-Release-Cycle

@anonymouse64

This comment has been minimized.

Copy link

anonymouse64 commented Dec 4, 2018

Okay, so can I at least get a decision about whether an assembly implementation of SHA3 for ARM is acceptable or not (disregarding what specific go release it goes into)?
My CL has been open for almost 6 months and this issue has been open for almost 2 months and I have not yet heard any feedback on whether it is acceptable or not. I would like to at least know if it is acceptable, as I was told that I should attempt to upstream this implementation rather than try to maintain our own fork of the SHA3 crypto library. If it's not acceptable, I would need to prepare for making the case for us to maintain a fork which would be highly unfortunate so it would be good for me to know.

@andybons

This comment has been minimized.

Copy link
Member

andybons commented Dec 4, 2018

I'm sorry you've been waiting so long for a response. Hopefully this finds itself at the top of @FiloSottile's notifications and can be looked at more closely soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment