New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize AES-CTR for ARM Neoverse V1 and V2. #22733
Conversation
According to optimization guide of Neoverse V1 & V2, at least 8 data chunks should be interleaved for aes max performance. |
The CI failure indicates that the code won't work on arm-linux-gnueabihf. Can you please exclude it appropriately? |
Yes, I'm modifying my code. |
Unroll AES-CTR loops to a maximum 12 blocks for ARM Neoverse V1 and V2, to fully utilize their AES pipeline resources. Improvement on ARM Neoverse V1. Package Size(Bytes) 16 32 64 128 256 1024 Improvement(%) 3.93 -0.45 11.30 4.31 12.48 37.66 Package Size(Bytes) 1500 8192 16384 61440 65536 Improvement(%) 37.16 38.90 39.89 40.55 40.41 Change-Id: Ifb8fad9af22476259b9ba75132bc3d8010a7fdbd
4a27def
to
785fdbd
Compare
We(Tom and I) have reviewed this patch internally, it looks good to us. |
It would be nice if @tom-cosgrove-arm approves this formally here. |
Sure, and just a heads-up, @tom-cosgrove-arm is on vacation, we suppose he will make comments next week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been reviewed and approved internally (apologies, have been away on vacation)
This pull request is ready to merge |
Merged to the master branch. Thank you for your contribution. |
Unroll AES-CTR loops to a maximum 12 blocks for ARM Neoverse V1 and V2, to fully utilize their AES pipeline resources. Improvement on ARM Neoverse V1. Package Size(Bytes) 16 32 64 128 256 1024 Improvement(%) 3.93 -0.45 11.30 4.31 12.48 37.66 Package Size(Bytes) 1500 8192 16384 61440 65536 Improvement(%) 37.16 38.90 39.89 40.55 40.41 Change-Id: Ifb8fad9af22476259b9ba75132bc3d8010a7fdbd Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from #22733)
Unroll AES-CTR loops to a maximum 12 blocks for ARM Neoverse V1 and V2, to fully utilize their AES pipeline resources. Improvement on ARM Neoverse V1. Package Size(Bytes) 16 32 64 128 256 1024 Improvement(%) 3.93 -0.45 11.30 4.31 12.48 37.66 Package Size(Bytes) 1500 8192 16384 61440 65536 Improvement(%) 37.16 38.90 39.89 40.55 40.41 Change-Id: Ifb8fad9af22476259b9ba75132bc3d8010a7fdbd Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from openssl/openssl#22733) Signed-off-by: fly2x <fly2x@hitls.org>
Unroll AES-CTR loops to a maximum 12 blocks for ARM Neoverse V1 and V2, to fully utilize their AES pipeline resources. Improvement on ARM Neoverse V1. Package Size(Bytes) 16 32 64 128 256 1024 Improvement(%) 3.93 -0.45 11.30 4.31 12.48 37.66 Package Size(Bytes) 1500 8192 16384 61440 65536 Improvement(%) 37.16 38.90 39.89 40.55 40.41 Change-Id: Ifb8fad9af22476259b9ba75132bc3d8010a7fdbd Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from openssl#22733)
Checklist