Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8294731: Improve multiplicative inverse for secp256r1 implementation #10544

Closed
wants to merge 8 commits into from

Conversation

XueleiFan
Copy link
Member

@XueleiFan XueleiFan commented Oct 3, 2022

Hi,

May I have this patch reviewed?

This is one of a few steps to improve the EC performance. The multiplicative inverse implementation could be improved for better performance.

For secp256r1 prime p, the current multiplicative inverse impl needs 256 square and 128 multiplication. With the path, the operation needs 256 square and 13 multiplication.

For secp256r1 order n, the current multiplicative inverse impl needs 256 square and 169 multiplication. With the patch, the operation needs 256 square and 43 multiplication.

In EC operations, square operation is much faster than multiplication. Decreasing multiplication numbers could speed up the multiplicative inverse significantly.

The benchmark for ECDSA Signature is checked in order to see if the performance improvement is visible. Here are the benchmark numbers before the patch applied:

Benchmark        (messageLength)   Mode  Cnt     Score    Error  Units
Signatures.sign               64  thrpt   15  1412.644 ±  5.529  ops/s
Signatures.sign              512  thrpt   15  1407.711 ± 14.118  ops/s
Signatures.sign             2048  thrpt   15  1415.674 ±  6.965  ops/s
Signatures.sign            16384  thrpt   15  1395.582 ± 12.689  ops/s

And the following are the benchmarking after the patch applied.

Signatures.sign               64  thrpt   15  1484.404 ± 10.705  ops/s
Signatures.sign              512  thrpt   15  1486.563 ±  7.514  ops/s
Signatures.sign             2048  thrpt   15  1479.866 ± 15.028  ops/s
Signatures.sign            16384  thrpt   15  1469.789 ±  3.844  ops/s

The performance improvement of the patch is about 5% for ECDSA signature. It looks like the improvement is no significant enough for now. But it may be 2+ times more in numbers when the scalar multiplication implementation is improved in a follow-up enhancement in another pull request.

For comparing, here is the benchmarking numbers by using BigInteger.modInverse();

Benchmark        (messageLength)   Mode  Cnt     Score     Error  Units
Signatures.sign               64  thrpt   15  1395.628 ± 180.649  ops/s
Signatures.sign              512  thrpt   15  1510.590 ±   9.826  ops/s
Signatures.sign             2048  thrpt   15  1514.282 ±   3.382  ops/s
Signatures.sign            16384  thrpt   15  1497.325 ±   6.854  ops/s

and numbers for using BigInteger.modPow():

Benchmark        (messageLength)   Mode  Cnt     Score    Error  Units
Signatures.sign               64  thrpt   15  1486.764 ± 17.908  ops/s
Signatures.sign              512  thrpt   15  1494.801 ± 14.072  ops/s
Signatures.sign             2048  thrpt   15  1500.170 ±  6.998  ops/s
Signatures.sign            16384  thrpt   15  1434.192 ± 49.269  ops/s

Enhancement for other curves will be considered in separate pull requests.

Thanks,
Xuelei


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8294731: Improve multiplicative inverse for secp256r1 implementation

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10544/head:pull/10544
$ git checkout pull/10544

Update a local copy of the PR:
$ git checkout pull/10544
$ git pull https://git.openjdk.org/jdk pull/10544/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 10544

View PR using the GUI difftool:
$ git pr show -t 10544

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10544.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 3, 2022

👋 Welcome back xuelei! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Oct 3, 2022

@XueleiFan The following label will be automatically applied to this pull request:

  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the security security-dev@openjdk.org label Oct 3, 2022
@XueleiFan XueleiFan changed the title 8294731: Improve multiplicative inverse for EC implementation 8294731: Improve multiplicative inverse for secp256r1 implementation Oct 4, 2022
@XueleiFan XueleiFan marked this pull request as ready for review October 4, 2022 21:02
@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 4, 2022
@mlbridge
Copy link

mlbridge bot commented Oct 4, 2022

Webrevs

@mcpowers
Copy link
Contributor

mcpowers commented Oct 6, 2022

It seems to me the scalar multiplication enhancement should be done first, or maybe integrated with this fix.
Do you have a bug number for the scalar multiplication enhancement?

@XueleiFan
Copy link
Member Author

It seems to me the scalar multiplication enhancement should be done first, or maybe integrated with this fix. Do you have a bug number for the scalar multiplication enhancement?

I did not file the scalar multiplication enhancement in JBS yet. There are a few places that could be improved for the EC performance. However, the update is big if having them all in one PR. In order to simplify the code review and implementation, I would like to break it down into small enhancements. I filed an umbrella RFE for the performance improvement in EC. The goal to make the common EC crypto operations (key generation/exchange/signature) 3+ times faster, and make the TLS connections 20%+ faster .

I may have to wait for a few more weeks so that I can come up with the scalar multiplication pull request.


// calculate imp ^ (2^16 - 1)
MutableIntegerModuloP t = imp.mutable();
for (int i = 15; i != 0; i--) {
Copy link
Member

@djelinski djelinski Oct 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fun!

You can further reduce the number of multiplications:
t3 = t^2 * t
t15 = t3 ^ 4 * t3
t255 = t15^16 * t15
t65535 = t255^256 * t255
only four multiplications to get t^(2^16 - 1)

Copy link
Member

@djelinski djelinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you also try using precomputed powers of t between 0-15? similar to what we do in ECOperations.multiply (see pointMultiples). This will also improve the number of multiplications.

@ferakocz
Copy link
Contributor

ferakocz commented Oct 7, 2022

According to our measurements, changing the body of IntegerModuloP.multiplicativeInverse() to

return getField().getElement(asBigInteger().modPow(modulus.subtract(BigInteger.TWO), modulus));

will result in a bit better performance improvement without the added complexity of new code for every curve.

@djelinski
Copy link
Member

@XueleiFan tests are failing after the last commit; see sun/security/ec/TestEC.java for example.

@ferakocz biginteger math is not constant-time, which is why it can't be used here.

@ferakocz
Copy link
Contributor

ferakocz commented Oct 7, 2022

@djelinski for this purpose, it doesn't matter if the exponentiation is not constant time, as its running time only depends on the value of the exponent, which is a known (public) value.

@djelinski
Copy link
Member

BigInteger exponentiation time also depends on also depends on the base; quick benchmark:
BigInteger.ONE.modPow(mod.subtract(BigInteger.TWO), mod) vs BigInteger.TWO.modPow(mod.subtract(BigInteger.TWO), mod):

Benchmark        (messageLength)   Mode  Cnt         Score         Error  Units
Signatures.pow1               64  thrpt   15  67352286,115 ± 1281517,907  ops/s
Signatures.pow2               64  thrpt   15     62431,716 ±    1056,398  ops/s

for IntegerModuloP the result should not depend on base, and if it does, we should fix that.

@XueleiFan
Copy link
Member Author

@XueleiFan tests are failing after the last commit; see sun/security/ec/TestEC.java for example.

@djelinski Thank you very much for help for the testing. The test passed in my testing, but I may made something wrong in the commit. Anyway, I'm working on further improvement, similar to your comments. I will make sure the test passed for the next commit.

@XueleiFan
Copy link
Member Author

could you also try using precomputed powers of t between 0-15? similar to what we do in ECOperations.multiply (see pointMultiples). This will also improve the number of multiplications.

0-15 may be too much for the P256 order field because of the bit sets in it. I tried 0-8 and 0-4. 0-4 has a little bit better benchmark numbers. The two is about the same for multiplication numbers, but 0-8 uses more memory. In the last commit, 0-4 is used for caching as it is more memory friendly.

@ferakocz
Copy link
Contributor

BigInteger exponentiation time also depends on also depends on the base; quick benchmark: BigInteger.ONE.modPow(mod.subtract(BigInteger.TWO), mod) vs BigInteger.TWO.modPow(mod.subtract(BigInteger.TWO), mod):

Benchmark        (messageLength)   Mode  Cnt         Score         Error  Units
Signatures.pow1               64  thrpt   15  67352286,115 ± 1281517,907  ops/s
Signatures.pow2               64  thrpt   15     62431,716 ±    1056,398  ops/s

for IntegerModuloP the result should not depend on base, and if it does, we should fix that.

Well, if you ever encounter the special cased "ONE" during ECDSA signature, you have a bigger problem than that the exponentiation is not exactly constant time. Also, if you can get close enough to the system doing the signing to be able to measure the time of the exponentiation precisely enough to differentiate one really occurring base from another -- you only have one chance to measure, so cannot average out noise -- than again, you probably have better methods to get to the key than trying to measure time.

Copy link
Member

@djelinski djelinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified the benchmark on x86; very nice 5% improvement. Tier1-3 also passed. LGTM!

@XueleiFan
Copy link
Member Author

Reviewer approval is required. Anyone has cycle? Thanks!

@openjdk
Copy link

openjdk bot commented Oct 31, 2022

@XueleiFan This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8294731: Improve multiplicative inverse for secp256r1 implementation

Reviewed-by: djelinski, jjiang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 31, 2022
@wangweij
Copy link
Contributor

wangweij commented Oct 31, 2022

Hi @XueleiFan, can you wait for approval from @ferakocz? Thanks.

@XueleiFan
Copy link
Member Author

XueleiFan commented Oct 31, 2022

... you only have one chance to measure, so cannot average out noise ...

There are cases that one chance is enough to place an attack. We normally don't discuss vulnerability details in public, please send me an email in private if more details is required.

... than again, you probably have better methods to get to the key than trying to measure time.

I may have to agree that better methods may exist. But better methods do not imply that we can let this method go.

@XueleiFan
Copy link
Member Author

Hi @XueleiFan, can you wait for approval from @ferakocz? Thanks.

I will see if I can get it by the end of this Tuesday.

// Note that the last 4 bits was handled in the for-lopp above
// as it hapeens to be 4. For bit set other than 4 bits, for
// example, 3 bits set (0x8), the value should be added back.
// d.setProduct(w[2]);
Copy link
Contributor

@ferakocz ferakocz Nov 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove this comment, or at least fix your typos: "for-lopp" -> "for loop", "hapeens" -> "happens", "(0x8)" -> "(0x7)".
You can say something like.: ' "if(k != -1) d.setProduct(w[k]);" is not necessary here as k is -1 at the end of the loop for this exponent'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion. I would like to remove this comment as it looks more clear to me.

@ferakocz
Copy link
Contributor

ferakocz commented Nov 2, 2022

... you only have one chance to measure, so cannot average out noise ...

There are cases that one chance is enough to place an attack. We normally don't discuss vulnerability details in public, please send me an email in private if more details is required.

... than again, you probably have better methods to get to the key than trying to measure time.

I may have to agree that better methods may exist. But better methods do not imply that we can let this method go.

Well, I doubt this would be one of those cases you have in mind...
Your method of computing the inverse looks good to me, but I still think that if we can achieve a better result with an existing general method then we should do that instead of writing special ones for every curve.
I think there is a risk in having more code, too.

@XueleiFan
Copy link
Member Author

... you only have one chance to measure, so cannot average out noise ...

There are cases that one chance is enough to place an attack. We normally don't discuss vulnerability details in public, please send me an email in private if more details is required.

... than again, you probably have better methods to get to the key than trying to measure time.

I may have to agree that better methods may exist. But better methods do not imply that we can let this method go.

Well, I doubt this would be one of those cases you have in mind... Your method of computing the inverse looks good to me, but I still think that if we can achieve a better result with an existing general method then we should do that instead of writing special ones for every curve. I think there is a risk in having more code, too.

There was vulnerability reported to other TLS vendors for non-constant inversion computation and we was OK. The new implementation performance in this PR is pretty close to using BigIntegers. We might not want to take the risks by introducing the branchless implementation back.

I agree that more code means more risks. I hope the risk is under control and get examined. It also come with an advantage. With this update, if secp256r1 broken, secp521r1 may be still safe as they are using different code base.

Thank you for looking into the implementation.

@XueleiFan
Copy link
Member Author

@ferakocz Did you have further comment? What do you think if we integrate the update?

@ferakocz
Copy link
Contributor

It looks good to me.

@XueleiFan
Copy link
Member Author

/integrate

@openjdk
Copy link

openjdk bot commented Nov 15, 2022

Going to push as commit c042b8e.
Since your change was applied there have been 30 commits pushed to the master branch:

  • d3051a7: 8296736: Some PKCS9Attribute can be created but cannot be encoded
  • decb1b7: 8286800: Assert in PhaseIdealLoop::dump_real_LCA is too strong
  • c49e484: 8294739: jdk/jshell/ToolShiftTabTest.java timed out
  • a45c9af: 8295814: jdk/jshell/CommandCompletionTest.java fails with "lists don't have the same size expected [2] but found [1]"
  • d0fae43: 8294947: Use 64bit atomics in patch_verified_entry on x86_64
  • 6f467cd: 8295934: IGV: keep node selection when changing view or graph
  • 9adb728: 8295070: Introduce more target combinations for compiler flags
  • 8ab70d3: 8294775: Shenandoah: reduce contention on _threads_in_evac
  • 5551cb6: 8293166: jdk/jfr/jvm/TestDumpOnCrash.java fails on Linux ppc64le and Linux aarch64
  • 8a9eabb: 8296786: Limit VM modes for com/sun/jdi/JdbLastErrorTest.java
  • ... and 20 more: https://git.openjdk.org/jdk/compare/657a0b2f1564e1754dbd64b776c53a52c480c901...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Nov 15, 2022
@openjdk openjdk bot closed this Nov 15, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 15, 2022
@openjdk
Copy link

openjdk bot commented Nov 15, 2022

@XueleiFan Pushed as commit c042b8e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated security security-dev@openjdk.org
6 participants