8294731: Improve multiplicative inverse for secp256r1 implementation #10544

XueleiFan · 2022-10-03T19:00:44Z

Hi,

May I have this patch reviewed?

This is one of a few steps to improve the EC performance. The multiplicative inverse implementation could be improved for better performance.

For the secp256r1 prime p, the current multiplicative inverse implementation needs 256 squarings and 128 multiplications. With the patch, the operation needs 256 squarings and 13 multiplications.

For the secp256r1 order n, the current multiplicative inverse implementation needs 256 squarings and 169 multiplications. With the patch, the operation needs 256 squarings and 43 multiplications.

In EC operations, squaring is much faster than multiplication. Reducing the number of multiplications could speed up the multiplicative inverse significantly.
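
To make the operation counts for the prime p concrete, here is a minimal sketch of plain left-to-right square-and-multiply computing a^(p-2) mod p while counting squarings and multiplications. It is not the JDK code: the class and method names are made up for illustration, and it uses BigInteger for brevity, which is not constant-time.

```java
import java.math.BigInteger;

public class SquareAndMultiplyCount {
    // secp256r1 field prime: p = 2^256 - 2^224 + 2^192 + 2^96 - 1
    static final BigInteger P = BigInteger.ONE.shiftLeft(256)
            .subtract(BigInteger.ONE.shiftLeft(224))
            .add(BigInteger.ONE.shiftLeft(192))
            .add(BigInteger.ONE.shiftLeft(96))
            .subtract(BigInteger.ONE);

    static int squarings = 0;
    static int multiplications = 0;

    // Left-to-right binary exponentiation: one squaring per exponent bit,
    // plus one multiplication for every exponent bit that is set.
    static BigInteger pow(BigInteger base, BigInteger exp, BigInteger mod) {
        BigInteger result = BigInteger.ONE;
        for (int i = exp.bitLength() - 1; i >= 0; i--) {
            result = result.multiply(result).mod(mod);
            squarings++;
            if (exp.testBit(i)) {
                result = result.multiply(base).mod(mod);
                multiplications++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BigInteger a = BigInteger.valueOf(0x1234567);
        // Fermat's little theorem: a^(p-2) is the inverse of a modulo a prime p.
        BigInteger inv = pow(a, P.subtract(BigInteger.TWO), P);
        System.out.println("a * inv mod p = " + a.multiply(inv).mod(P));
        System.out.println("squarings = " + squarings
                + ", multiplications = " + multiplications);
    }
}
```

The exponent p-2 is 256 bits long with 128 bits set, which is where the 256 squarings and 128 multiplications of the generic approach come from.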

The ECDSA signature benchmark was run to see whether the performance improvement is visible. Here are the benchmark numbers before the patch is applied:

Benchmark        (messageLength)   Mode  Cnt     Score    Error  Units
Signatures.sign               64  thrpt   15  1412.644 ±  5.529  ops/s
Signatures.sign              512  thrpt   15  1407.711 ± 14.118  ops/s
Signatures.sign             2048  thrpt   15  1415.674 ±  6.965  ops/s
Signatures.sign            16384  thrpt   15  1395.582 ± 12.689  ops/s

And the following are the benchmark numbers after the patch is applied:

Signatures.sign               64  thrpt   15  1484.404 ± 10.705  ops/s
Signatures.sign              512  thrpt   15  1486.563 ±  7.514  ops/s
Signatures.sign             2048  thrpt   15  1479.866 ± 15.028  ops/s
Signatures.sign            16384  thrpt   15  1469.789 ±  3.844  ops/s

The performance improvement of the patch is about 5% for ECDSA signature. It looks like the improvement is not significant enough for now. But it may be 2+ times more in numbers when the scalar multiplication implementation is improved in a follow-up enhancement in another pull request.

For comparison, here are the benchmark numbers using BigInteger.modInverse():

Benchmark        (messageLength)   Mode  Cnt     Score     Error  Units
Signatures.sign               64  thrpt   15  1395.628 ± 180.649  ops/s
Signatures.sign              512  thrpt   15  1510.590 ±   9.826  ops/s
Signatures.sign             2048  thrpt   15  1514.282 ±   3.382  ops/s
Signatures.sign            16384  thrpt   15  1497.325 ±   6.854  ops/s

and numbers for using BigInteger.modPow():

Benchmark        (messageLength)   Mode  Cnt     Score    Error  Units
Signatures.sign               64  thrpt   15  1486.764 ± 17.908  ops/s
Signatures.sign              512  thrpt   15  1494.801 ± 14.072  ops/s
Signatures.sign             2048  thrpt   15  1500.170 ±  6.998  ops/s
Signatures.sign            16384  thrpt   15  1434.192 ± 49.269  ops/s

Enhancement for other curves will be considered in separate pull requests.

Thanks,
Xuelei

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8294731: Improve multiplicative inverse for secp256r1 implementation

Reviewers

Daniel Jeliński (@djelinski - Committer) ⚠️ Review applies to ae1df949
John Jiang (@johnshajiang - Reviewer) ⚠️ Review applies to ae1df949

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10544/head:pull/10544
$ git checkout pull/10544

Update a local copy of the PR:
$ git checkout pull/10544
$ git pull https://git.openjdk.org/jdk pull/10544/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 10544

View PR using the GUI difftool:
$ git pr show -t 10544

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10544.diff

bridgekeeper · 2022-10-03T19:02:04Z

👋 Welcome back xuelei! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2022-10-03T19:09:10Z

@XueleiFan The following label will be automatically applied to this pull request:

security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2022-10-04T21:08:17Z

Webrevs

mcpowers · 2022-10-06T16:11:17Z

It seems to me the scalar multiplication enhancement should be done first, or maybe integrated with this fix.
Do you have a bug number for the scalar multiplication enhancement?

XueleiFan · 2022-10-06T18:33:51Z

It seems to me the scalar multiplication enhancement should be done first, or maybe integrated with this fix. Do you have a bug number for the scalar multiplication enhancement?

I have not filed the scalar multiplication enhancement in JBS yet. There are a few places that could be improved for EC performance. However, the update would be big if they were all in one PR. In order to simplify the code review and implementation, I would like to break it down into small enhancements. I filed an umbrella RFE for the performance improvement in EC. The goal is to make the common EC crypto operations (key generation/exchange/signature) 3+ times faster, and to make TLS connections 20%+ faster.

I may have to wait for a few more weeks so that I can come up with the scalar multiplication pull request.

djelinski · 2022-10-06T19:10:54Z

src/java.base/share/classes/sun/security/util/math/IntegerModuloP.java

+
+                // calculate imp ^ (2^16 - 1)
+                MutableIntegerModuloP t = imp.mutable();
+                for (int i = 15; i != 0; i--) {


Fun!

You can further reduce the number of multiplications:
t3 = t^2 * t
t15 = t3 ^ 4 * t3
t255 = t15^16 * t15
t65535 = t255^256 * t255
only four multiplications to get t^(2^16 - 1)
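
The chain above can be checked with a short sketch. This is an illustration only, using BigInteger rather than the JDK's IntegerModuloP types; the class name and the `square`/`mul` helpers are hypothetical and exist just to count operations.

```java
import java.math.BigInteger;

public class AdditionChain16 {
    static int squarings = 0;
    static int multiplications = 0;

    // Squares x the given number of times, i.e. computes x^(2^times) mod m.
    static BigInteger square(BigInteger x, BigInteger mod, int times) {
        for (int i = 0; i < times; i++) {
            x = x.multiply(x).mod(mod);
            squarings++;
        }
        return x;
    }

    static BigInteger mul(BigInteger x, BigInteger y, BigInteger mod) {
        multiplications++;
        return x.multiply(y).mod(mod);
    }

    // Computes t^(2^16 - 1) mod m via t3 -> t15 -> t255 -> t65535.
    static BigInteger pow65535(BigInteger t, BigInteger mod) {
        BigInteger t3 = mul(square(t, mod, 1), t, mod);        // t^(2^2 - 1)
        BigInteger t15 = mul(square(t3, mod, 2), t3, mod);     // t^(2^4 - 1)
        BigInteger t255 = mul(square(t15, mod, 4), t15, mod);  // t^(2^8 - 1)
        return mul(square(t255, mod, 8), t255, mod);           // t^(2^16 - 1)
    }

    public static void main(String[] args) {
        // secp256r1 field prime: 2^256 - 2^224 + 2^192 + 2^96 - 1
        BigInteger p = BigInteger.ONE.shiftLeft(256)
                .subtract(BigInteger.ONE.shiftLeft(224))
                .add(BigInteger.ONE.shiftLeft(192))
                .add(BigInteger.ONE.shiftLeft(96))
                .subtract(BigInteger.ONE);
        BigInteger t = BigInteger.valueOf(0xCAFE);
        BigInteger viaChain = pow65535(t, p);
        BigInteger viaModPow = t.modPow(BigInteger.valueOf(65535), p);
        System.out.println("match = " + viaChain.equals(viaModPow));
        System.out.println("squarings = " + squarings
                + ", multiplications = " + multiplications);
    }
}
```

The chain uses 1 + 2 + 4 + 8 = 15 squarings and 4 multiplications, compared with 15 multiplications for a bit-by-bit loop over the same exponent.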

djelinski

could you also try using precomputed powers of t between 0-15? similar to what we do in ECOperations.multiply (see pointMultiples). This will also improve the number of multiplications.
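
As a rough illustration of the precomputed-powers idea, here is a fixed 4-bit window exponentiation on integers. It is a sketch under assumptions, not the code in ECOperations or in this PR: it uses BigInteger (not constant-time), and the class name, `WINDOW` constant, and `pow` method are hypothetical.

```java
import java.math.BigInteger;

public class WindowedPow {
    static final int WINDOW = 4;

    // Fixed-window exponentiation: precompute base^0 .. base^15, then
    // process the exponent 4 bits at a time, most significant window first.
    static BigInteger pow(BigInteger base, BigInteger exp, BigInteger mod) {
        BigInteger[] table = new BigInteger[1 << WINDOW];
        table[0] = BigInteger.ONE.mod(mod);
        for (int i = 1; i < table.length; i++) {
            table[i] = table[i - 1].multiply(base).mod(mod);
        }

        int windows = (exp.bitLength() + WINDOW - 1) / WINDOW;
        BigInteger result = BigInteger.ONE.mod(mod);
        for (int w = windows - 1; w >= 0; w--) {
            // Shift the accumulated exponent left by one window.
            for (int s = 0; s < WINDOW; s++) {
                result = result.multiply(result).mod(mod);
            }
            // Extract the 4-bit digit for this window.
            int digit = 0;
            for (int b = WINDOW - 1; b >= 0; b--) {
                digit = (digit << 1) | (exp.testBit(w * WINDOW + b) ? 1 : 0);
            }
            // One multiplication per window, whatever the digit value is.
            result = result.multiply(table[digit]).mod(mod);
        }
        return result;
    }

    public static void main(String[] args) {
        BigInteger mod = BigInteger.valueOf(1_000_000_007L);
        BigInteger base = BigInteger.valueOf(12345);
        BigInteger exp = new BigInteger("123456789abcdef0123456789", 16);
        System.out.println(pow(base, exp, mod).equals(base.modPow(exp, mod)));
    }
}
```

A larger window means fewer multiplications in the main loop but more table entries to compute and store, which is the trade-off between window sizes discussed later in this thread.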

ferakocz · 2022-10-07T12:29:22Z

According to our measurements, changing the body of IntegerModuloP.multiplicativeInverse() to

return getField().getElement(asBigInteger().modPow(modulus.subtract(BigInteger.TWO), modulus));

will result in a bit better performance improvement without the added complexity of new code for every curve.
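
For readers unfamiliar with the trick in the one-liner above: for a prime modulus p, Fermat's little theorem gives a^(p-2) ≡ a^(-1) (mod p). A standalone sketch using only BigInteger follows; IntegerModuloP is JDK-internal, so the class and method names here are hypothetical stand-ins.

```java
import java.math.BigInteger;

public class FermatInverse {
    // secp256r1 field prime: p = 2^256 - 2^224 + 2^192 + 2^96 - 1
    static final BigInteger P = BigInteger.ONE.shiftLeft(256)
            .subtract(BigInteger.ONE.shiftLeft(224))
            .add(BigInteger.ONE.shiftLeft(192))
            .add(BigInteger.ONE.shiftLeft(96))
            .subtract(BigInteger.ONE);

    // Inverse via exponentiation; only valid when modulus is prime
    // and a is not a multiple of it.
    static BigInteger inverse(BigInteger a, BigInteger modulus) {
        return a.modPow(modulus.subtract(BigInteger.TWO), modulus);
    }

    public static void main(String[] args) {
        BigInteger a = new BigInteger("123456789abcdef", 16);
        BigInteger viaPow = inverse(a, P);
        // Both approaches yield the same residue in [0, p).
        System.out.println(viaPow.equals(a.modInverse(P)));  // prints true
        System.out.println(a.multiply(viaPow).mod(P));       // prints 1
    }
}
```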

djelinski · 2022-10-07T15:37:24Z

@XueleiFan tests are failing after the last commit; see sun/security/ec/TestEC.java for example.

@ferakocz biginteger math is not constant-time, which is why it can't be used here.

ferakocz · 2022-10-07T15:46:00Z

@djelinski for this purpose, it doesn't matter if the exponentiation is not constant time, as its running time only depends on the value of the exponent, which is a known (public) value.

djelinski · 2022-10-07T16:31:13Z

BigInteger exponentiation time also depends on the base; quick benchmark:
BigInteger.ONE.modPow(mod.subtract(BigInteger.TWO), mod) vs BigInteger.TWO.modPow(mod.subtract(BigInteger.TWO), mod):

Benchmark        (messageLength)   Mode  Cnt         Score         Error  Units
Signatures.pow1               64  thrpt   15  67352286,115 ± 1281517,907  ops/s
Signatures.pow2               64  thrpt   15     62431,716 ±    1056,398  ops/s

for IntegerModuloP the result should not depend on base, and if it does, we should fix that.

XueleiFan · 2022-10-07T17:44:39Z

@XueleiFan tests are failing after the last commit; see sun/security/ec/TestEC.java for example.

@djelinski Thank you very much for the help with testing. The test passed in my testing, but I may have done something wrong in the commit. Anyway, I'm working on a further improvement, similar to your comments. I will make sure the test passes for the next commit.

XueleiFan · 2022-10-08T07:44:31Z

could you also try using precomputed powers of t between 0-15? similar to what we do in ECOperations.multiply (see pointMultiples). This will also improve the number of multiplications.

0-15 may be too much for the P256 order field because of the bits set in it. I tried 0-8 and 0-4. 0-4 has slightly better benchmark numbers. The two are about the same in the number of multiplications, but 0-8 uses more memory. In the last commit, 0-4 is used for caching as it is more memory friendly.

ferakocz · 2022-10-10T08:21:57Z

BigInteger exponentiation time also depends on the base; quick benchmark: BigInteger.ONE.modPow(mod.subtract(BigInteger.TWO), mod) vs BigInteger.TWO.modPow(mod.subtract(BigInteger.TWO), mod):
Benchmark        (messageLength)   Mode  Cnt         Score         Error  Units
Signatures.pow1               64  thrpt   15  67352286,115 ± 1281517,907  ops/s
Signatures.pow2               64  thrpt   15     62431,716 ±    1056,398  ops/s
for IntegerModuloP the result should not depend on base, and if it does, we should fix that.

Well, if you ever encounter the special-cased "ONE" during ECDSA signature, you have a bigger problem than that the exponentiation is not exactly constant time. Also, if you can get close enough to the system doing the signing to be able to measure the time of the exponentiation precisely enough to differentiate one really occurring base from another -- you only have one chance to measure, so cannot average out noise -- then again, you probably have better methods to get to the key than trying to measure time.

djelinski

Verified the benchmark on x86; very nice 5% improvement. Tier1-3 also passed. LGTM!

XueleiFan · 2022-10-31T14:58:02Z

Reviewer approval is required. Does anyone have cycles? Thanks!

openjdk · 2022-10-31T15:11:28Z

@XueleiFan This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8294731: Improve multiplicative inverse for secp256r1 implementation

Reviewed-by: djelinski, jjiang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

wangweij · 2022-10-31T15:16:09Z

Hi @XueleiFan, can you wait for approval from @ferakocz? Thanks.

XueleiFan · 2022-10-31T17:19:21Z

... you only have one chance to measure, so cannot average out noise ...

There are cases where one chance is enough to mount an attack. We normally don't discuss vulnerability details in public; please send me an email in private if more details are required.

... then again, you probably have better methods to get to the key than trying to measure time.

I may have to agree that better methods may exist. But better methods do not imply that we can let this method go.

XueleiFan · 2022-10-31T17:21:53Z

Hi @XueleiFan, can you wait for approval from @ferakocz? Thanks.

I will see if I can get it by the end of this Tuesday.

ferakocz · 2022-11-02T14:35:20Z

src/java.base/share/classes/sun/security/util/math/IntegerModuloP.java

+                // Note that the last 4 bits was handled in the for-lopp above
+                // as it hapeens to be 4. For bit set other than 4 bits, for
+                // example, 3 bits set (0x8), the value should be added back.
+                // d.setProduct(w[2]);


I think you can remove this comment, or at least fix your typos: "for-lopp" -> "for loop", "hapeens" -> "happens", "(0x8)" -> "(0x7)".
You can say something like: ' "if(k != -1) d.setProduct(w[k]);" is not necessary here as k is -1 at the end of the loop for this exponent'

Thank you for the suggestion. I would like to remove this comment as it looks more clear to me.

ferakocz · 2022-11-02T14:44:30Z

... you only have one chance to measure, so cannot average out noise ...

There are cases where one chance is enough to mount an attack. We normally don't discuss vulnerability details in public; please send me an email in private if more details are required.

... then again, you probably have better methods to get to the key than trying to measure time.

I may have to agree that better methods may exist. But better methods do not imply that we can let this method go.

Well, I doubt this would be one of those cases you have in mind...
Your method of computing the inverse looks good to me, but I still think that if we can achieve a better result with an existing general method then we should do that instead of writing special ones for every curve.
I think there is a risk in having more code, too.

XueleiFan · 2022-11-02T16:09:27Z

... you only have one chance to measure, so cannot average out noise ...

There are cases where one chance is enough to mount an attack. We normally don't discuss vulnerability details in public; please send me an email in private if more details are required.

... then again, you probably have better methods to get to the key than trying to measure time.

I may have to agree that better methods may exist. But better methods do not imply that we can let this method go.

Well, I doubt this would be one of those cases you have in mind... Your method of computing the inverse looks good to me, but I still think that if we can achieve a better result with an existing general method then we should do that instead of writing special ones for every curve. I think there is a risk in having more code, too.

There was a vulnerability reported to other TLS vendors for non-constant-time inversion computation, and we were OK. The performance of the new implementation in this PR is pretty close to that of using BigInteger. We might not want to take the risk of introducing the branchless implementation back.

I agree that more code means more risk. I hope the risk is under control and gets examined. It also comes with an advantage. With this update, if secp256r1 is broken, secp521r1 may still be safe, as they use different code bases.

Thank you for looking into the implementation.

XueleiFan · 2022-11-14T06:23:37Z

@ferakocz Do you have any further comments? What do you think about integrating the update?

ferakocz · 2022-11-15T13:30:46Z

It looks good to me.

XueleiFan · 2022-11-15T15:53:05Z

/integrate

openjdk · 2022-11-15T15:55:15Z

Going to push as commit c042b8e.
Since your change was applied there have been 30 commits pushed to the master branch:

d3051a7: 8296736: Some PKCS9Attribute can be created but cannot be encoded
decb1b7: 8286800: Assert in PhaseIdealLoop::dump_real_LCA is too strong
c49e484: 8294739: jdk/jshell/ToolShiftTabTest.java timed out
a45c9af: 8295814: jdk/jshell/CommandCompletionTest.java fails with "lists don't have the same size expected [2] but found [1]"
d0fae43: 8294947: Use 64bit atomics in patch_verified_entry on x86_64
6f467cd: 8295934: IGV: keep node selection when changing view or graph
9adb728: 8295070: Introduce more target combinations for compiler flags
8ab70d3: 8294775: Shenandoah: reduce contention on _threads_in_evac
5551cb6: 8293166: jdk/jfr/jvm/TestDumpOnCrash.java fails on Linux ppc64le and Linux aarch64
8a9eabb: 8296786: Limit VM modes for com/sun/jdi/JdbLastErrorTest.java
... and 20 more: https://git.openjdk.org/jdk/compare/657a0b2f1564e1754dbd64b776c53a52c480c901...master

Your commit was automatically rebased without conflicts.

openjdk · 2022-11-15T15:55:45Z

@XueleiFan Pushed as commit c042b8e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

8294731: Improve multiplicative inverse for EC implementation

c5a656d

openjdk bot added the security security-dev@openjdk.org label Oct 3, 2022

XueleiFan added 2 commits October 4, 2022 11:43

add field order impl

36e7cb5

replace tab with whitrespaces

1da02c5

XueleiFan changed the title ~~8294731: Improve multiplicative inverse for EC implementation~~ 8294731: Improve multiplicative inverse for secp256r1 implementation Oct 4, 2022

XueleiFan marked this pull request as ready for review October 4, 2022 21:02

openjdk bot added the rfr Pull request is ready for review label Oct 4, 2022

more performance improvement

b9bf026

djelinski reviewed Oct 6, 2022

djelinski mentioned this pull request Oct 7, 2022

8294997: Improve ECC math operations #10614

Closed

more improvement

ae1df94

djelinski approved these changes Oct 10, 2022

johnshajiang approved these changes Oct 31, 2022

openjdk bot added the ready Pull request is ready to be integrated label Oct 31, 2022

ferakocz reviewed Nov 2, 2022

XueleiFan added 3 commits November 2, 2022 09:14

remove unnecessary comment

79fe697

remove duplicated benchmark case

395d6e8

Merge

1e0149e

openjdk bot added the integrated Pull request has been integrated label Nov 15, 2022

openjdk bot closed this Nov 15, 2022

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 15, 2022

XueleiFan mentioned this pull request Nov 29, 2022

8294248: Use less limbs for P256 in EC implementation #10398

Closed
