8294731: Improve multiplicative inverse for secp256r1 implementation #10544
It seems to me the scalar multiplication enhancement should be done first, or maybe integrated with this fix.
I have not filed the scalar multiplication enhancement in JBS yet. There are a few places where EC performance could be improved. However, the update would be big with all of them in one PR. To simplify the code review and implementation, I would like to break it down into small enhancements. I filed an umbrella RFE for the performance improvement in EC. The goal is to make the common EC crypto operations (key generation/exchange/signature) 3+ times faster, and to make TLS connections 20%+ faster. It may take a few more weeks before I can come up with the scalar multiplication pull request.
// calculate imp ^ (2^16 - 1)
MutableIntegerModuloP t = imp.mutable();
for (int i = 15; i != 0; i--) {
Fun!
You can further reduce the number of multiplications:
t3 = t^2 * t
t15 = t3 ^ 4 * t3
t255 = t15^16 * t15
t65535 = t255^256 * t255
only four multiplications to get t^(2^16 - 1)
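The chain suggested above can be checked with a short sketch. It uses plain BigInteger instead of the JDK-internal IntegerModuloP types, so the class name and the squareTimes helper are illustrative assumptions and not code from this PR; the secp256r1 field prime is used only as a convenient modulus.

```java
import java.math.BigInteger;

final class AdditionChainSketch {
    // secp256r1 field prime, used here only as a convenient modulus.
    static final BigInteger P = new BigInteger(
        "FFFFFFFF" + "00000001" + "00000000" + "00000000"
      + "00000000" + "FFFFFFFF" + "FFFFFFFF" + "FFFFFFFF", 16);

    // Hypothetical helper: square x modulo P, n times in a row.
    static BigInteger squareTimes(BigInteger x, int n) {
        for (int i = 0; i < n; i++) {
            x = x.multiply(x).mod(P);
        }
        return x;
    }

    // t^(2^16 - 1) with 15 squarings and 4 multiplications.
    static BigInteger pow65535(BigInteger t) {
        BigInteger t3 = squareTimes(t, 1).multiply(t).mod(P);          // t^(2^2 - 1)
        BigInteger t15 = squareTimes(t3, 2).multiply(t3).mod(P);       // t^(2^4 - 1)
        BigInteger t255 = squareTimes(t15, 4).multiply(t15).mod(P);    // t^(2^8 - 1)
        return squareTimes(t255, 8).multiply(t255).mod(P);             // t^(2^16 - 1)
    }

    public static void main(String[] args) {
        BigInteger t = BigInteger.valueOf(123456789L);
        BigInteger expected = t.modPow(BigInteger.valueOf(65535), P);
        System.out.println(pow65535(t).equals(expected));
    }
}
```

Each step doubles the length of the run of one-bits in the exponent, which is why only four multiplications are needed on top of the 1 + 2 + 4 + 8 = 15 squarings.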
Could you also try using precomputed powers of t between 0-15, similar to what we do in ECOperations.multiply (see pointMultiples)? This would also reduce the number of multiplications.
According to our measurements, changing the body of IntegerModuloP.multiplicativeInverse() to return getField().getElement(asBigInteger().modPow(modulus.subtract(BigInteger.TWO), modulus)); will result in a slightly better performance improvement without the added complexity of new code for every curve.
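For readers unfamiliar with the identity behind that one-liner: for a prime modulus p and a nonzero a, Fermat's little theorem gives a^(p-2) = a^(-1) mod p. The following is a minimal standalone sketch using only java.math.BigInteger; the class and method names are assumptions for illustration, and it does not use the JDK-internal IntegerModuloP API.

```java
import java.math.BigInteger;

final class FermatInverseSketch {
    // secp256r1 field prime.
    static final BigInteger P = new BigInteger(
        "FFFFFFFF" + "00000001" + "00000000" + "00000000"
      + "00000000" + "FFFFFFFF" + "FFFFFFFF" + "FFFFFFFF", 16);

    // Inverse via Fermat's little theorem: a^(P - 2) mod P, valid for a != 0 mod P.
    static BigInteger inverse(BigInteger a) {
        return a.modPow(P.subtract(BigInteger.TWO), P);
    }

    public static void main(String[] args) {
        BigInteger a = new BigInteger("1234567890123456789012345678901234567890");
        BigInteger inv = inverse(a);
        // Both checks should print true.
        System.out.println(a.multiply(inv).mod(P).equals(BigInteger.ONE));
        System.out.println(inv.equals(a.modInverse(P)));
    }
}
```

This only demonstrates correctness of the identity; whether BigInteger arithmetic is acceptable here from a timing perspective is the question debated in the comments that follow.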
@XueleiFan tests are failing after the last commit; see
@ferakocz BigInteger math is not constant-time, which is why it can't be used here.
@djelinski for this purpose, it doesn't matter if the exponentiation is not constant time, as its running time only depends on the value of the exponent, which is a known (public) value.
BigInteger exponentiation time also depends on the base; quick benchmark:
For IntegerModuloP the result should not depend on the base, and if it does, we should fix that.
@djelinski Thank you very much for your help with the testing. The test passed in my testing, but I may have done something wrong in the commit. Anyway, I'm working on further improvements, similar to your comments. I will make sure the test passes for the next commit.
0-15 may be too much for the P256 order field because of the bits set in it. I tried 0-8 and 0-4; 0-4 has slightly better benchmark numbers. The two need about the same number of multiplications, but 0-8 uses more memory. In the last commit, 0-4 is used for caching as it is more memory friendly.
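To make the table-size trade-off concrete, here is a generic fixed-window exponentiation sketch with a cache of small powers. It is not the scheme used in this PR (which caches a 0-4 range tailored to the bit pattern of the order); the class name, the window parameter w, and the use of BigInteger are all assumptions for illustration.

```java
import java.math.BigInteger;

final class WindowedPowSketch {
    // Computes base^exp mod mod using a fixed window of w bits.
    static BigInteger pow(BigInteger base, BigInteger exp, BigInteger mod, int w) {
        // Cache base^0 .. base^(2^w - 1); a larger w means a bigger table.
        BigInteger[] table = new BigInteger[1 << w];
        table[0] = BigInteger.ONE;
        for (int i = 1; i < table.length; i++) {
            table[i] = table[i - 1].multiply(base).mod(mod);
        }

        BigInteger acc = BigInteger.ONE;
        int windows = (exp.bitLength() + w - 1) / w;
        for (int i = windows - 1; i >= 0; i--) {
            // Shift the accumulated exponent left by w bits: w squarings.
            for (int s = 0; s < w; s++) {
                acc = acc.multiply(acc).mod(mod);
            }
            // Read the next w-bit digit of the exponent, most significant bit first.
            int digit = 0;
            for (int b = w - 1; b >= 0; b--) {
                digit = (digit << 1) | (exp.testBit(i * w + b) ? 1 : 0);
            }
            // One multiplication per nonzero digit. This branch depends only
            // on the exponent, which is a public value in this use case.
            if (digit != 0) {
                acc = acc.multiply(table[digit]).mod(mod);
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        BigInteger mod = BigInteger.valueOf(1_000_003L);
        BigInteger exp = mod.subtract(BigInteger.TWO);
        BigInteger base = BigInteger.valueOf(7);
        System.out.println(pow(base, exp, mod, 2).equals(base.modPow(exp, mod)));
    }
}
```

A wider window lowers the number of multiplications in the main loop but raises the setup cost and memory for the table, which is in line with the observation above that a small cache can win in practice.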
Well, if you ever encounter the special-cased "ONE" during an ECDSA signature, you have a bigger problem than the exponentiation not being exactly constant time. Also, if you can get close enough to the system doing the signing to measure the time of the exponentiation precisely enough to differentiate one really occurring base from another (you only have one chance to measure, so you cannot average out noise), then again, you probably have better methods to get to the key than trying to measure time.
Verified the benchmark on x86; very nice 5% improvement. Tier1-3 also passed. LGTM!
Reviewer approval is required. Does anyone have cycles? Thanks!
@XueleiFan This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the ➡️ To integrate this PR with the above commit message to the |
Hi @XueleiFan, can you wait for approval from @ferakocz? Thanks. |
There are cases where one chance is enough to mount an attack. We normally don't discuss vulnerability details in public; please send me a private email if more details are required.
I may have to agree that better methods may exist. But the existence of better methods does not imply that we can let this one go.
I will see if I can get it by the end of this Tuesday. |
// Note that the last 4 bits was handled in the for-lopp above
// as it hapeens to be 4. For bit set other than 4 bits, for
// example, 3 bits set (0x8), the value should be added back.
// d.setProduct(w[2]); |
I think you can remove this comment, or at least fix your typos: "for-lopp" -> "for loop", "hapeens" -> "happens", "(0x8)" -> "(0x7)".
You can say something like: ' "if(k != -1) d.setProduct(w[k]);" is not necessary here as k is -1 at the end of the loop for this exponent'
Thank you for the suggestion. I would like to remove this comment, as that looks clearer to me.
Well, I doubt this would be one of those cases you have in mind... |
A vulnerability was reported to other TLS vendors for non-constant-time inversion computation, and we were OK. The performance of the new implementation in this PR is pretty close to using BigInteger. We might not want to take the risk of introducing the branchless implementation back. I agree that more code means more risk. I hope the risk is under control and gets examined. It also comes with an advantage: with this update, if secp256r1 is broken, secp521r1 may still be safe, as they use different code bases. Thank you for looking into the implementation.
@ferakocz Do you have any further comments? What do you think about integrating the update?
It looks good to me. |
/integrate |
Going to push as commit c042b8e.
Your commit was automatically rebased without conflicts. |
@XueleiFan Pushed as commit c042b8e. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Hi,
May I have this patch reviewed?
This is one of a few steps to improve the EC performance. The multiplicative inverse implementation could be improved for better performance.
For the secp256r1 prime p, the current multiplicative inverse implementation needs 256 squarings and 128 multiplications. With the patch, the operation needs 256 squarings and 13 multiplications.
For the secp256r1 order n, the current multiplicative inverse implementation needs 256 squarings and 169 multiplications. With the patch, the operation needs 256 squarings and 43 multiplications.
In EC operations, squaring is much faster than multiplication. Decreasing the number of multiplications could speed up the multiplicative inverse significantly.
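The baseline figures can be reproduced with a small sketch, under the assumption that the current implementation is a plain square-and-multiply over the exponent m - 2, costing one squaring per exponent bit and one multiplication per set bit. The class and method names are illustrative only.

```java
import java.math.BigInteger;

final class OpCountSketch {
    // secp256r1 field prime p.
    static final BigInteger P = new BigInteger(
        "FFFFFFFF" + "00000001" + "00000000" + "00000000"
      + "00000000" + "FFFFFFFF" + "FFFFFFFF" + "FFFFFFFF", 16);

    // secp256r1 group order n.
    static final BigInteger N = new BigInteger(
        "FFFFFFFF" + "00000000" + "FFFFFFFF" + "FFFFFFFF"
      + "BCE6FAAD" + "A7179E84" + "F3B9CAC2" + "FC632551", 16);

    // Assumed cost model: one squaring per bit of the exponent m - 2.
    static int squarings(BigInteger m) {
        return m.subtract(BigInteger.TWO).bitLength();
    }

    // Assumed cost model: one multiplication per set bit of m - 2.
    static int multiplications(BigInteger m) {
        return m.subtract(BigInteger.TWO).bitCount();
    }

    public static void main(String[] args) {
        System.out.println("p: " + squarings(P) + " squarings, "
            + multiplications(P) + " multiplications");
        System.out.println("n: " + squarings(N) + " squarings, "
            + multiplications(N) + " multiplications");
    }
}
```

Under this cost model the counts come out to 256 squarings with 128 multiplications for p and 169 for n, matching the numbers quoted above for the current implementation.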
The benchmark for ECDSA Signature was checked to see whether the performance improvement is visible. Here are the benchmark numbers before the patch was applied:
And the following are the benchmark numbers after the patch was applied:
The performance improvement of the patch is about 5% for ECDSA signature. The improvement does not look significant enough for now, but it may be 2+ times more in numbers once the scalar multiplication implementation is improved in a follow-up enhancement in another pull request.
For comparison, here are the benchmark numbers using BigInteger.modInverse():
and the numbers using BigInteger.modPow():
Enhancement for other curves will be considered in separate pull requests.
Thanks,
Xuelei
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10544/head:pull/10544
$ git checkout pull/10544
Update a local copy of the PR:
$ git checkout pull/10544
$ git pull https://git.openjdk.org/jdk pull/10544/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 10544
View PR using the GUI difftool:
$ git pr show -t 10544
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10544.diff