-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8333583: Crypto-XDH.generateSecret regression after JDK-8329538 #19728
Conversation
👋 Welcome back vpaprotsk! A progress list of the required criteria for merging this PR into |
@vpaprotsk This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 241 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@sviswa7, @vnkozlov, @ascarpino) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
@vpaprotsk The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
@ascarpino Would you mind reviewing this again please? Mostly java you reviewed before. |
/label -build |
@magicus |
src/hotspot/share/opto/runtime.cpp
Outdated
@@ -1414,8 +1414,8 @@ const TypeFunc* OptoRuntime::intpoly_montgomeryMult_P256_Type() { | |||
|
|||
// result type needed | |||
fields = TypeTuple::fields(1); | |||
fields[TypeFunc::Parms + 0] = TypeInt::INT; // carry bits in output | |||
const TypeTuple* range = TypeTuple::make(TypeFunc::Parms+1, fields); | |||
fields[TypeFunc::Parms + 0] = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A minor nit: here NULL could be nullptr instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
What causes regression in P256 "(~-8-14%)"? |
Actually, the other way around; reducePositive is now an unconditionally executed for both pure java and the intrinsic paths. Perhaps that's what is misleading, it was only the mult() intrinsic that was taking advantage of this 'skip reduction' before. (pure java did not benefit from removing reduction, so I kept it. Now 'keeping it' for both paths) |
Hi @vpaprotsk, |
Looking on Also you did not change assembler for intrinsic but you changed corresponding Java code ( |
Yep, I split the original java mult() into multImpl() and reducePositive().
The intrinsic used to return 1 (i.e. numAdds = 1), which would let the next operation decide if it needed to do the reduction or skip it. Now reducePositive() reduction always happens after the intrinsic (when it could had been skipped before). |
Let me know that I got it right:
I like new implementation because intrinsic matches Java code. It would allow avoid confusion I had. The only question left: do we need to do something about Java code which checks return value? It is always 0 now. And I don't see you changed such checks. |
Thats it exactly. Except I would correct the last two words
I disliked this too. I originally removed the Java reduction too, but it hurt the non-intrinsic performance, so put it back in. (Before I got distracted with this bug, I was actually working on next ECC iteration, and was trying to fix this mismatch. But I also hadn't realized how much this optimization actually helped.) There is also a 'bigger' complaint.. this optimization tried to use virtual methods to specialize one particular curve. Fairly standard practice. And it brought the other 'unaffected' curve down. If I can't use virtual methods for further optimizations.. how am I supposed to optimize further? Hmm. Not the time to discuss an answer, this release is going out, not the time to get 'creative', but this will give me problems next time I try to add code here.
(Correction: no return, void). numAdds is now again pretty much a 'private' concept to the IntegerPolynomial class, so figure it was fine before, it should be fine now? |
I did not mean it for this changes but as general improvement of code in other RFE. But it is up to core libs group to decide. |
Talking about future improvements. Is it possible to optimize reduction code by converting it to intrinsic too? Or code generated by C2 is good enough? |
I had some experiments to try where I was using virtual methods to add optimizations, similar to the optimization here (i.e. the default method 'does nothing' and have just one override). Perhaps this issue could had been solved differently and there is something to do on the compiler side i.e. requires a specific order of optimizations.. specialize the IntegerPolynomial.setProduct() hot path for XDH field type, inline mult() from XDH field, realize that the return is always zero, which allows whatever optimizations that werent run for 4% performance. (I don't yet know enough about the C2 to be able to answer or 'fix' that) |
There are examples in C2 how to check method's class holder (intrinsic's predicate) before executing intrinsic code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved for VM changes.
@TobiHartmann ran our testing and it passed. |
Thanks @vnkozlov @TobiHartmann ! |
@ferakocz just tagging you as reminder of (the many) items in your queue :) |
Sorry, I was out of office last week. I will take a deeper look at the changes tomorrow, but I have a question based on my first look at it: Do you attribute the performance loss of the XDH code path to the mult() function returning an int instead of being void? Do you think that this prevented some optimization in the hotspot compiler? |
@ferakocz, now I was out on long weekend...
That's exactly it. I 'proved experimentally' that that's the case. Though I haven't identified which exact sequence of optimizations is missing deterministically from compilation logs. That's beyond me yet. Identifying which optimization(s) is missing might be great for long term, but figured since we are closing down commits for this release, I should put something in soonest. This PR essentially 'reverts' the part of my ECC PR to original code. Which in turn should be easiest to review. |
Looks good to me. It would be good, though, to figure out what else could be done to regain the P256 performance with keeping the speed of this code path. |
@ascarpino please approve this change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with review by @ferakocz
Thanks @ferakocz @ascarpino /integrate |
@vpaprotsk |
/sponsor |
Going to push as commit f101e15.
Your commit was automatically rebased without conflicts. |
@sviswa7 @vpaprotsk Pushed as commit f101e15. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
/backport :jdk23 |
@vpaprotsk To use the |
/backport jdk:jdk23 |
@TobiHartmann the backport was successfully created on the branch backport-TobiHartmann-f101e153-jdk23 in my personal fork of openjdk/jdk. To create a pull request with this backport targeting openjdk/jdk:jdk23, just click the following link: The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:
If you need to update the source branch of the pull then run the following commands in a local clone of your personal fork of openjdk/jdk:
|
This fix recovers XDH performance but removes some of the P256 gains (~-8-14%). Still faster, but not as much.
The fix is to undo 'int' return type on mult()/square(), which allowed to return partially reduced result (e.g. this avoids extra reductions when mult() result is fed into addition). This is the behaviour before the Montgomery ECC PR.
XDH.generateSecret performance
before Montgomery PR:
after Montgomery PR:
with this PR:
P256 performance with/without mult intrinsic:
Performance before Montgomery PR:
Performance in master without mult() intrinsic
Performance in master with mult() intrinsic
THIS PR without mult intrinsic
THIS PR with mult intrinsic
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19728/head:pull/19728
$ git checkout pull/19728
Update a local copy of the PR:
$ git checkout pull/19728
$ git pull https://git.openjdk.org/jdk.git pull/19728/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 19728
View PR using the GUI difftool:
$ git pr show -t 19728
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19728.diff
Webrev
Link to Webrev Comment