Acceleration of chacha20 on aarch64 by SVE #17916

daniel-hu-arm · 2022-03-16T19:27:20Z

This patch accelerates chacha20 on aarch64 when Scalable Vector Extension
(SVE) is supported by CPU. Tested on modern micro-architecture with
256-bit SVE, it has the potential to improve performance up to 20%.
Optimization is only applied to the scenario where input data is over
256 bytes

Test result:
With SVE
type 1024 bytes 8192 bytes 16384 bytes
ChaCha20 1596373.33k 1639167.32k 1643173.21k

Without SVE (by Neon/Scalar)
type 1024 bytes 8192 bytes 16384 bytes
chacha20 1355487.91k 1372678.83k 1372662.44k

The assembly code has been reviewed internally by
ARM engineer Fangming.Fang@arm.com

Signed-off-by: Daniel Hu Daniel.Hu@arm.com

Checklist

documentation is added or updated
tests are added or updated

paulidale

Any chance of an independent review of the assembly file?

paulidale · 2022-03-17T04:16:05Z

crypto/arm64cpuid.pl

+.type	_armv8_sve_probe,%function
+_armv8_sve_probe:
+	AARCH64_VALID_CALL_TARGET
+	.long	0x04a03000	// eor z0.d,z0.d,z0.d


Is eor correct here? If so it would be assigning zero wouldn't it? Which would be kind of pointless.

It does not matter what the instruction does here for the probe. The code that calls it just sets the bit if SIGILL is not emitted by the instruction.

yes, the only purpose of this instruction is to test if SVE feature is supported by CPU (z0 register only available with SVE). if not, the instruction will cause SIGKILL sent to user process. if yes, it will set a scratch register (z0) to zero with no side-effect to caller.

We have the assembly code internally reviewed in Arm. but not sure if we can find reviewer from other perspective.

One thing that could help would be if you (or someone from Arm) were willing to become community maintainer for one of the aarch64 configuration targets that are currently unsupported:
https://www.openssl.org/policies/platformpolicy.html

@t8m we are keen to support the community. can you let us know what you expect of the role? in terms of people, equipment, process, effort, etc.

It would be really up to you to choose which one you're able to support. And of course you can support also multiple ones.

@t8m, could we identify 4 arm engineers contributing to OpenSSL as community maintainers? We expect there are some backups and supports from different time zone. Besides individual identity is provided, maybe an email alias like openssl-arm-support.arm.com can be set up to reach all of us. Is there any formal process or guide we can follow to apply for being the community maintainer?

Another question about platform policy. Reading https://www.openssl.org/blog/blog/2021/09/22/OpenSSL3-fips-submission, there are some platforms that were tested before OpenSSL 3.0 FIPS module was submitted for FIPS validation, are these tested platforms chosen from primary and community platforms? If not, how a platform can be involved to be tested when validate FIPS module?

Thanks.

You can have multiple people as the contact persons for the platform.

Then one of them simply has to make a PR against the openssl/web repo to make the necessary modifications to the platform policy page and have that PR approved. Once approved and merged its official.

There are no more requirements than what I wrote above in this comment thread.

The FIPS module validation is a separate topic - perhaps @mattcaswell can expand on that.

are these tested platforms chosen from primary and community platforms

Actually no. The FIPS work was sponsored by a number of companies (see https://www.openssl.org/blog/blog/2018/09/25/fips/). We included platforms nominated by our sponsors as well as a couple that the project itself nominated.

Moving forward the OpenSSL Management Committee (OMC) is responsible for maintaining the validation. It is likely that the OMC will want to keep the list of platforms tested to some minimal set. Our primary and secondary lists (currently no platforms are secondary) are likely to influence decisions about any additional platforms to be added if any.

Thanks @t8m for the guide, we will make that PR to apply for the community maintainer. And thanks @mattcaswell for the explanation about FIPS platform.

crypto/chacha/asm/chacha-armv8-sve.pl

crypto/arm64cpuid.pl

crypto/chacha/asm/chacha-armv8-sve.pl

samuel-lee-msft · 2022-04-19T11:40:07Z

With the simplification to remove handling tail cases in SVE I think the code is a lot cleaner - LGTM overall

This patch accelerates chacha20 on aarch64 when Scalable Vector Extension (SVE) is supported by CPU. Tested on modern micro-architecture with 256-bit SVE, it has the potential to improve performance up to 20% The solution takes a hybrid approach. SVE will handle multi-blocks that fit the SVE vector length, with Neon/Scalar to process any tail data Test result: With SVE type 1024 bytes 8192 bytes 16384 bytes ChaCha20 1596208.13k 1650010.79k 1653151.06k Without SVE (by Neon/Scalar) type 1024 bytes 8192 bytes 16384 bytes chacha20 1355487.91k 1372678.83k 1372662.44k The assembly code has been reviewed internally by ARM engineer Fangming.Fang@arm.com Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>

openssl-machine · 2022-04-29T09:00:16Z

This pull request is ready to merge

paulidale · 2022-05-03T04:40:02Z

Merged, thanks for the contribution.

This patch accelerates chacha20 on aarch64 when Scalable Vector Extension (SVE) is supported by CPU. Tested on modern micro-architecture with 256-bit SVE, it has the potential to improve performance up to 20% The solution takes a hybrid approach. SVE will handle multi-blocks that fit the SVE vector length, with Neon/Scalar to process any tail data Test result: With SVE type 1024 bytes 8192 bytes 16384 bytes ChaCha20 1596208.13k 1650010.79k 1653151.06k Without SVE (by Neon/Scalar) type 1024 bytes 8192 bytes 16384 bytes chacha20 1355487.91k 1372678.83k 1372662.44k The assembly code has been reviewed internally by ARM engineer Fangming.Fang@arm.com Signed-off-by: Daniel Hu <Daniel.Hu@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #17916)

This patch accelerates chacha20 on aarch64 when Scalable Vector Extension (SVE) is supported by CPU. Tested on modern micro-architecture with 256-bit SVE, it has the potential to improve performance up to 20% The solution takes a hybrid approach. SVE will handle multi-blocks that fit the SVE vector length, with Neon/Scalar to process any tail data Test result: With SVE type 1024 bytes 8192 bytes 16384 bytes ChaCha20 1596208.13k 1650010.79k 1653151.06k Without SVE (by Neon/Scalar) type 1024 bytes 8192 bytes 16384 bytes chacha20 1355487.91k 1372678.83k 1372662.44k The assembly code has been reviewed internally by ARM engineer Fangming.Fang@arm.com Signed-off-by: Daniel Hu <Daniel.Hu@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from openssl#17916) (cherry picked from commit b1b2146)

This patch accelerates chacha20 on aarch64 when Scalable Vector Extension (SVE) is supported by CPU. Tested on modern micro-architecture with 256-bit SVE, it has the potential to improve performance up to 20% The solution takes a hybrid approach. SVE will handle multi-blocks that fit the SVE vector length, with Neon/Scalar to process any tail data Test result: With SVE type 1024 bytes 8192 bytes 16384 bytes ChaCha20 1596208.13k 1650010.79k 1653151.06k Without SVE (by Neon/Scalar) type 1024 bytes 8192 bytes 16384 bytes chacha20 1355487.91k 1372678.83k 1372662.44k The assembly code has been reviewed internally by ARM engineer Fangming.Fang@arm.com Signed-off-by: Daniel Hu <Daniel.Hu@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #17916) (cherry picked from commit b1b2146)

paulidale reviewed Mar 17, 2022

View reviewed changes

t8m added branch: master Merge to master branch approval: review pending This pull request needs review by a committer approval: otc review pending This pull request needs review by an OTC member triaged: feature The issue/pr requests/adds a feature labels Mar 17, 2022

paulidale approved these changes Mar 31, 2022

View reviewed changes

paulidale removed the approval: otc review pending This pull request needs review by an OTC member label Mar 31, 2022