Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize SM2 on aarch64 #20754

Closed
wants to merge 1 commit into from
Closed

Optimize SM2 on aarch64 #20754

wants to merge 1 commit into from

Conversation

xu-yi-zhou
Copy link
Contributor

@xu-yi-zhou xu-yi-zhou commented Apr 17, 2023

This patch optimizes SM2 for ARM processor using A64 instruction and precomputation table, which can speed up SM2 sign about 10 times and SM2 verify about 3 times. A new configure option no-sm2-precomp has been added to disable the precomputed table for point multiplicatin of the base point.

Perf data on Kunpeng-920 2.6GHz hardware looks like this:

sign verify sign/s verify/s
Before 0.0004s 0.0004s 2438.2 2460.9
After 0.0000s 0.0001s 27137.4 7434.0
After (no-sm2-precomp) 0.0001s 0.0002s 7745.5 4411.2

Signed-off-by: Xu Yizhou xuyizhou1@huawei.com

Checklist
  • documentation is added or updated

@github-actions github-actions bot added the severity: fips change The pull request changes FIPS provider sources label Apr 17, 2023
@paulidale paulidale added branch: master Merge to master branch triaged: feature The issue/pr requests/adds a feature labels Apr 17, 2023
@zzl360
Copy link

zzl360 commented Apr 17, 2023

any benchmark data?

@paulidale paulidale added approval: review pending This pull request needs review by a committer approval: otc review pending This pull request needs review by an OTC member labels Apr 17, 2023
@t8m t8m added the tests: exempted The PR is exempt from requirements for testing label Apr 17, 2023
@xu-yi-zhou
Copy link
Contributor Author

Hi @zzl360, I've updated the benchmark data.

Hi @paulidale @t8m @tom-cosgrove-arm, could you please review it?

Copy link
Contributor

@paulidale paulidale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits.

That's quite a performance boost and one large file.

crypto/ec/build.info Outdated Show resolved Hide resolved
crypto/ec/ecp_sm2p256.c Outdated Show resolved Hide resolved
crypto/ec/ecp_sm2p256.c Outdated Show resolved Hide resolved
@xu-yi-zhou
Copy link
Contributor Author

Hi @InfoHunter, could you please review this patch?

@xu-yi-zhou
Copy link
Contributor Author

ping for review

Copy link
Member

@t8m t8m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this optimization should be excluded if OPENSSL_SMALL_FOOTPRINT is defined. Given the large table size.

crypto/ec/ecp_sm2p256.c Outdated Show resolved Hide resolved
@paulidale
Copy link
Contributor

I wonder if this optimization should be excluded if OPENSSL_SMALL_FOOTPRINT is defined. Given the large table size.

It would be a bonus but aarch64 isn't normally associated with small devices.

crypto/ec/asm/ecp_sm2p256-armv8.pl Outdated Show resolved Hide resolved
crypto/ec/asm/ecp_sm2p256-armv8.pl Outdated Show resolved Hide resolved
crypto/ec/asm/ecp_sm2p256-armv8.pl Outdated Show resolved Hide resolved
crypto/ec/asm/ecp_sm2p256-armv8.pl Outdated Show resolved Hide resolved
crypto/ec/asm/ecp_sm2p256-armv8.pl Outdated Show resolved Hide resolved
crypto/ec/asm/ecp_sm2p256-armv8.pl Outdated Show resolved Hide resolved
crypto/ec/asm/ecp_sm2p256-armv8.pl Outdated Show resolved Hide resolved
crypto/ec/asm/ecp_sm2p256-armv8.pl Outdated Show resolved Hide resolved
crypto/ec/asm/ecp_sm2p256-armv8.pl Outdated Show resolved Hide resolved
crypto/ec/asm/ecp_sm2p256-armv8.pl Outdated Show resolved Hide resolved
@openssl-machine
Copy link
Collaborator

This PR is in a state where it requires action by @openssl/otc but the last update was 30 days ago

@tom-cosgrove-arm
Copy link
Contributor

(As I understand it, @docularxu is still reviewing, and there should be an update to the PR from the OP)

@paulidale paulidale added the stalled: awaiting contributor response This pull request is awaiting a response by the contributor label Jul 3, 2023
@tom-cosgrove-arm tom-cosgrove-arm removed the stalled: awaiting contributor response This pull request is awaiting a response by the contributor label Jul 6, 2023
@tom-cosgrove-arm
Copy link
Contributor

(removed the stalled label as the PR has been updated)

@docularxu
Copy link

I'm from Linaro and specializing in Arm platform optimization. The latest version from @xu-yi-zhou effectively addresses the concerns I had previously raised. Also, I verified this version on Apple silicon M2, and the results align with the performance improvements mentioned earlier:

which can speed up SM2 sign about 10 times and SM2 verify about 3 times.

With these, I would recommend merging this patch into the master branch.

crypto/ec/build.info Outdated Show resolved Hide resolved
crypto/ec/build.info Outdated Show resolved Hide resolved
@xu-yi-zhou
Copy link
Contributor Author

Hi all,

I'm thinking about whether to implement constant-time point multiplication and modular inversion to avoid side channel attack.

I'm trying the following to fix side channel,

  • implement constant-time ecp_sm2p256_point_double, ecp_sm2p256_point_add_affine and ecp_sm2p256_point_add
  • ignore the check for index in ecp_sm2p256_point_G_mul_by_scalar and ecp_sm2p256_point_P_mul_by_scalar

After these changes, the performance of sign is reduced by 2% (27137.4->26641.9) and the one of verify is reduced by 6% (7434.0->6973.9).

If the side channel protection is necessary, I still need to implement a gather method for precomputed table like ecp_nistz256_gather_w7, ecp_nistz256_gather_w5 to avoid memory access patterns, a constant time BN_MOD_INV for modular inversion. I'm not sure how much performance will decrease.

Welcome to discsuss.

@mspncp
Copy link
Contributor

mspncp commented Jul 11, 2023

With these, I would recommend merging this patch into the master branch.

By-the-way: thank you @docularxu for investing your time and sharing your valuable opinion. Your help is very much appreciated!

@InfoHunter
Copy link
Member

Hi all,

I'm thinking about whether to implement constant-time point multiplication and modular inversion to avoid side channel attack.

I'm trying the following to fix side channel,

  • implement constant-time ecp_sm2p256_point_double, ecp_sm2p256_point_add_affine and ecp_sm2p256_point_add
  • ignore the check for index in ecp_sm2p256_point_G_mul_by_scalar and ecp_sm2p256_point_P_mul_by_scalar

After these changes, the performance of sign is reduced by 2% (27137.4->26641.9) and the one of verify is reduced by 6% (7434.0->6973.9).

If the side channel protection is necessary, I still need to implement a gather method for precomputed table like ecp_nistz256_gather_w7, ecp_nistz256_gather_w5 to avoid memory access patterns, a constant time BN_MOD_INV for modular inversion. I'm not sure how much performance will decrease.

Welcome to discsuss.

Yes, I consider anti-side-channel should be a 'must' stuff prior to performance.

@xu-yi-zhou
Copy link
Contributor Author

We need compile-time config option for this that is on by default. The increased size of libcrypto also needs to be documented in CHANGES.md.

OK, I will do that as soon as possible.

@xu-yi-zhou
Copy link
Contributor Author

Hi all, I'm trying to add an option enable-sm2-precomp to enable the precomputed table. Do I need to add this option in .github/workflows/*.yml?

@t8m
Copy link
Member

t8m commented Aug 18, 2023

Yes, please. Add no-sm2-precomp to run-checker-daily.yml. Although it won't test much as this is arm specific and run-checker-daily runs on x86_64, it will at least test the disable-ability of the option.

@tom-cosgrove-arm
Copy link
Contributor

OTC: We need compile-time config option for this that is on by default. The increased size of libcrypto also needs to be documented in CHANGES.md.

My understanding of this request is that the large table would be included in default AArch64 builds, but can be disabled for smaller builds by configuring with no-sm2-precomp.

This seems to only use the large table if enable-sm2-precomp is given - @t8m is that what you wanted?

@xu-yi-zhou
Copy link
Contributor Author

@tom-cosgrove-arm I think you are right, and I also forgot to add the increased size of libcrypto in CHANGES.md, I will fix these.

CHANGES.md Outdated Show resolved Hide resolved
CHANGES.md Outdated Show resolved Hide resolved
INSTALL.md Outdated Show resolved Hide resolved
crypto/ec/build.info Show resolved Hide resolved
Copy link
Member

@t8m t8m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more nits

crypto/ec/asm/ecp_sm2p256-armv8.pl Outdated Show resolved Hide resolved
crypto/ec/asm/ecp_sm2p256-armv8.pl Outdated Show resolved Hide resolved
crypto/ec/asm/ecp_sm2p256-armv8.pl Outdated Show resolved Hide resolved
Signed-off-by: Xu Yizhou <xuyizhou1@huawei.com>
@t8m t8m added approval: review pending This pull request needs review by a committer tests: present The PR has suitable tests present and removed tests: exempted The PR is exempt from requirements for testing labels Aug 23, 2023
Copy link
Member

@beldmit beldmit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@beldmit beldmit added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer labels Aug 23, 2023
@openssl-machine openssl-machine added approval: ready to merge The 24 hour grace period has passed, ready to merge and removed approval: done This pull request has the required number of approvals labels Aug 24, 2023
@openssl-machine
Copy link
Collaborator

This pull request is ready to merge

@t8m
Copy link
Member

t8m commented Aug 24, 2023

Merged to master branch. Thank you for your contribution.

@t8m t8m closed this Aug 24, 2023
openssl-machine pushed a commit that referenced this pull request Aug 24, 2023
Signed-off-by: Xu Yizhou <xuyizhou1@huawei.com>

Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from #20754)
@sususu98
Copy link

This patch optimizes SM2 for ARM processor using A64 instruction and precomputation table, which can speed up SM2 sign about 10 times and SM2 verify about 3 times. A new configure option no-sm2-precomp has been added to disable the precomputed table for point multiplicatin of the base point.

Perf data on Kunpeng-920 2.6GHz hardware looks like this:

sign verify sign/s verify/s
Before 0.0004s 0.0004s 2438.2 2460.9
After 0.0000s 0.0001s 27137.4 7434.0
After (no-sm2-precomp) 0.0001s 0.0002s 7745.5 4411.2
Signed-off-by: Xu Yizhou xuyizhou1@huawei.com

Checklist
  • documentation is added or updated

Hello, I tested using the default compilation options on Kunpeng-920 2.6GHz hardware and obtained a signature performance of 25057 times/s and a signature verification performance of 7042 times/s. May I ask if you added other options to the data obtained during compilation

@xu-yi-zhou
Copy link
Contributor Author

Hello, I tested using the default compilation options on Kunpeng-920 2.6GHz hardware and obtained a signature performance of 25057 times/s and a signature verification performance of 7042 times/s. May I ask if you added other options to the data obtained during compilation

no other options, try using the taskset command to bind the CPU while testing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approval: ready to merge The 24 hour grace period has passed, ready to merge branch: master Merge to master branch severity: fips change The pull request changes FIPS provider sources tests: present The PR has suitable tests present triaged: feature The issue/pr requests/adds a feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet