JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions #5390
Mailing list message from Nick Gasson on hotspot-dev:
On 07/09/21 22:36 pm, Andrew Haley wrote:
Can you include this explanation in the code somewhere? Perhaps as a --
Right you are: I'm forever asking committers to do just that.
Looks good. I tested on several different machines and got speed-ups between 5x and 17x (dataSize=16384).
@theRealAph This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 112 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
This looks great. In particular, use of the kernel generator model to manage unrolling is something that should be used in all generated code that relies on unrolling. It is highly readable, which is rarely the case with hand-crafted code, because the generator methods clearly signal the structure of the interleaved code. It should also be far easier to update if the code ever needs revising. I suspect it would be hard to produce hand-crafted code that does significantly better when it comes to performance.
In case anyone is wondering why this one hasn't been committed yet. I could commit this now, and fix its time-to-safepoint later. Thoughts?
I'd commit it now in order to get experience with it, and fix time-to-safepoint later. There's still plenty of time left in the Java 18 schedule for the latter.
Not a review, but that's the best assembly code I think I've ever seen. Probably the only way to make it decisively better would be to code it in Java, using the Vector API on top of the (as yet uninvented) statically compiled but self-hosting System Java dialect.
I'm going to frame that and put it on my wall.
/integrate
Going to push as commit 4f3b626.
Your commit was automatically rebased without conflicts.
@theRealAph Pushed as commit 4f3b626. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
An interleaved version of AES/GCM.
Performance, now and then:
A note about the implementation for the reviewers:
Unrolled and hand-scheduled intrinsics are often written in a way that
I don't find satisfactory. Often they are a conglomeration of
copy-and-paste programming and C macros, which makes them hard to
understand and hard to maintain. I won't name any names, but there are
many examples to be found in free software across the Internet.
I spent a while thinking about a structured way to develop and
implement them, and I think I've got something better. The idea is
that you transform a pre-existing implementation into a generator for
the interleaved version. The transformation shouldn't be too hard to
do, but more importantly it should be possible for a reader to verify
that the interleaved and unrolled version performs the same function.
A generator takes the form of a subclass of `KernelGenerator`. The
core idea is that the programmer defines the base case of the
intrinsic and a method to generate a clone of it, shifted to a
different set of registers. `KernelGenerator` will then generate
several interleaved copies of the function, with each one using a
different set of registers.
The subclass must implement three methods: `length()`, which is the
number of instruction bundles in the intrinsic, `generate(int n)`,
which emits the nth instruction bundle in the intrinsic, and `next()`,
which takes an instance of the generator and returns a version of it,
shifted to a new set of registers.
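To make the three-method contract concrete, here is a minimal, self-contained sketch of the idea. It is not the HotSpot code: instead of emitting AArch64 instructions it appends text lines to a `Listing`, and the names `Listing`, `emit`, `AddOneKernel` and `interleaved_listing` are hypothetical, invented only for this illustration. The `unroll()` driver shows one plausible way to interleave the copies: build the shifted clones with `next()`, then emit bundle n of every copy before moving on to bundle n+1.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an assembler: emitted "instructions" are text lines.
using Listing = std::vector<std::string>;

class KernelGenerator {
public:
  KernelGenerator(Listing& out, int unrolls) : _out(out), _unrolls(unrolls) {
    assert(unrolls >= 1);
  }
  virtual ~KernelGenerator() = default;

  virtual int length() const = 0;                            // number of bundles
  virtual void generate(int n) = 0;                          // emit the nth bundle
  virtual std::unique_ptr<KernelGenerator> next() const = 0; // register-shifted clone

  // Emit bundle n of every copy before moving on to bundle n + 1.
  void unroll() {
    std::vector<std::unique_ptr<KernelGenerator>> clones;  // owns copies 1..unrolls-1
    std::vector<KernelGenerator*> gens{this};
    for (int i = 1; i < _unrolls; i++) {
      clones.push_back(gens.back()->next());
      gens.push_back(clones.back().get());
    }
    for (int n = 0; n < length(); n++) {
      for (KernelGenerator* g : gens) {
        g->generate(n);
      }
    }
  }

protected:
  void emit(const std::string& s) { _out.push_back(s); }
  Listing& _out;
  int _unrolls;
};

// Toy base case: load, add, store, all on one register v<reg>.
class AddOneKernel : public KernelGenerator {
public:
  AddOneKernel(Listing& out, int unrolls, int reg)
    : KernelGenerator(out, unrolls), _reg(reg) {}

  int length() const override { return 3; }

  void generate(int n) override {
    const std::string v = "v" + std::to_string(_reg);
    switch (n) {
      case 0: emit("ld  " + v); break;
      case 1: emit("add " + v); break;
      case 2: emit("st  " + v); break;
      default: assert(false && "bundle index out of range");
    }
  }

  std::unique_ptr<KernelGenerator> next() const override {
    return std::make_unique<AddOneKernel>(_out, _unrolls, _reg + 1);
  }

private:
  int _reg;
};

// Convenience wrapper: the interleaved listing for a given unroll count.
Listing interleaved_listing(int unrolls) {
  Listing out;
  AddOneKernel base(out, unrolls, 0);
  base.unroll();
  return out;
}
```

With `interleaved_listing(2)` the toy produces the loads for `v0` and `v1`, then both adds, then both stores, which is the interleaving pattern described above. Changing the argument is all it takes to change the unroll count.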
As an example, here's the inner loop of AES encryption:
(Some details elided for clarity.)
The generator for the unrolled version looks like:
The job of converting a single inline intrinsic is, as you can see,
not much more than adding a switch statement. Some instructions should
only be emitted once, rather than several times, such as the labels
and branches. (You can use a list of C++ lambdas rather than a switch
statement to do the same thing, very LISP, but that seems a bit of a
sledgehammer. YMMV.)
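For comparison, the lambda-list variant mentioned in the parenthesis might look roughly like the toy below. This is only a sketch under stated assumptions: `Step`, `interleave` and `toy_loop` are hypothetical names, output is text rather than machine code, and the `once` flag is just one possible way to mark steps (labels, branches) that should be emitted a single time rather than once per copy. The text above does not say how the PR itself handles that.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using Listing = std::vector<std::string>;

// One step of the kernel. `text` renders the step for the register
// assigned to a given copy; `once` marks steps emitted for copy 0 only.
struct Step {
  bool once;
  std::function<std::string(int reg)> text;
};

// Interleave `unrolls` copies: for each step, emit it for every copy
// (or only for copy 0 when the step is marked `once`).
Listing interleave(const std::vector<Step>& steps, int unrolls) {
  assert(unrolls >= 1);
  Listing out;
  for (const Step& s : steps) {
    for (int copy = 0; copy < unrolls; copy++) {
      if (s.once && copy != 0) continue;
      out.push_back(s.text(copy));
    }
  }
  return out;
}

// Toy loop body: a label, two per-copy instructions, and a branch.
Listing toy_loop(int unrolls) {
  auto v = [](int reg) { return "v" + std::to_string(reg); };
  const std::vector<Step> steps = {
    {true,  [](int)      { return std::string("loop:"); }},
    {false, [v](int reg) { return "ld  " + v(reg); }},
    {false, [v](int reg) { return "eor " + v(reg); }},
    {true,  [](int)      { return std::string("b.ne loop"); }},
  };
  return interleave(steps, unrolls);
}
```

The list of lambdas carries the same information as the `case` labels of a switch statement; the switch form simply keeps the kernel looking closer to the original straight-line intrinsic.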
I believe that this approach will be more maintainable and easier to
understand than other approaches we've seen. Also, the number of
unrolls is just a number that can be tweaked as required.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5390/head:pull/5390
$ git checkout pull/5390
Update a local copy of the PR:
$ git checkout pull/5390
$ git pull https://git.openjdk.java.net/jdk pull/5390/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 5390
View PR using the GUI difftool:
$ git pr show -t 5390
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5390.diff