JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm #20235

suchismith1993 · 2024-07-18T14:31:57Z

Currently, acceleration code for GHASH is missing on PPC64.

This implementation utilises SIMD instructions on Power and uses Karatsuba multiplication to obtain the final result.
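
For readers unfamiliar with the technique named above, here is a minimal Java sketch of Karatsuba multiplication for carry-less (GF(2)) products. It is not the PPC64 stub: it works on 32-bit halves of a 64-bit operand purely for illustration, and the names ClmulKaratsubaSketch, clmul32, clmul64Schoolbook and clmul64Karatsuba are hypothetical. The point is only that three half-width multiplies can replace four.

```java
final class ClmulKaratsubaSketch {
    /** Carry-less product of two values that each fit in 32 bits (result fits in 63 bits). */
    static long clmul32(long a, long b) {
        long r = 0;
        for (int i = 0; i < 32; i++) {
            if (((b >>> i) & 1L) != 0) {
                r ^= a << i; // XOR instead of add: there are no carries in GF(2)[x]
            }
        }
        return r;
    }

    /** Schoolbook 64x64 -> 128-bit carry-less product, returned as {high, low}. */
    static long[] clmul64Schoolbook(long a, long b) {
        long hi = 0, lo = 0;
        for (int i = 0; i < 64; i++) {
            if (((b >>> i) & 1L) != 0) {
                lo ^= a << i;
                if (i != 0) {
                    hi ^= a >>> (64 - i); // bits shifted out of the low word
                }
            }
        }
        return new long[] { hi, lo };
    }

    /** Karatsuba 64x64 -> 128-bit carry-less product using three 32x32 multiplies. */
    static long[] clmul64Karatsuba(long a, long b) {
        long aLo = a & 0xFFFFFFFFL, aHi = a >>> 32;
        long bLo = b & 0xFFFFFFFFL, bHi = b >>> 32;
        long lo = clmul32(aLo, bLo);
        long hi = clmul32(aHi, bHi);
        // (aLo ^ aHi)(bLo ^ bHi) ^ lo ^ hi leaves only the two cross terms.
        long mid = clmul32(aLo ^ aHi, bLo ^ bHi) ^ lo ^ hi;
        // product = hi * x^64 ^ mid * x^32 ^ lo
        return new long[] { hi ^ (mid >>> 32), lo ^ (mid << 32) };
    }

    public static void main(String[] args) {
        long a = 0x123456789ABCDEF0L, b = 0xFEDCBA9876543210L;
        long[] k = clmul64Karatsuba(a, b);
        long[] s = clmul64Schoolbook(a, b);
        System.out.printf("karatsuba  = %016x%016x%n", k[0], k[1]);
        System.out.printf("schoolbook = %016x%016x%n", s[0], s[1]);
    }
}
```

Both methods should print the same 128-bit value. A vectorised implementation would presumably apply the same split to the 64-bit halves of a 128-bit operand.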

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

JDK-8216437: PPC64: Add intrinsic for GHASH algorithm (Enhancement - P4)

Reviewers

Martin Doerr (@TheRealMDoerr - Reviewer)
Amit Kumar (@offamitkumar - Committer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20235/head:pull/20235
$ git checkout pull/20235

Update a local copy of the PR:
$ git checkout pull/20235
$ git pull https://git.openjdk.org/jdk.git pull/20235/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20235

View PR using the GUI difftool:
$ git pr show -t 20235

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20235.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2024-07-18T14:33:03Z

👋 Welcome back sroy! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2024-07-18T14:34:26Z

@suchismith1993 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8216437: PPC64: Add intrinsic for GHASH algorithm

Reviewed-by: mdoerr, amitkumar

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 128 new commits pushed to the master branch:

afb9134: 8355627: Don't use ThreadCritical for EventLog list
811f117: 8355980: RISC-V: remove vmclr_m before vmsXX and vmfXX
d29700c: 8344706: Implement JEP 512: Compact Source Files and Instance Main Methods
... and 125 more: https://git.openjdk.org/jdk/compare/0537c6927d4f617624672cfae06928f9738175ca...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@TheRealMDoerr, @offamitkumar) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

openjdk · 2024-07-18T14:34:48Z

@suchismith1993 The following label will be automatically applied to this pull request:

hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

bridgekeeper · 2024-09-12T19:19:28Z

@suchismith1993 This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper · 2024-11-07T19:33:33Z

@suchismith1993 This pull request has been inactive for more than 16 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

suchismith1993 · 2024-11-11T08:38:21Z

/open

openjdk · 2024-11-11T08:39:00Z

@suchismith1993 This pull request is now open

mlbridge · 2024-12-18T16:08:21Z

Webrevs

offamitkumar · 2024-12-18T16:23:49Z

src/hotspot/cpu/ppc/vm_version_ppc.cpp

-  if (UseGHASHIntrinsics) {
-    warning("GHASH intrinsics are not available on this CPU");
-    FLAG_SET_DEFAULT(UseGHASHIntrinsics, false);
+  if (FLAG_IS_DEFAULT(UseGHASHIntrinsics)) {


Just a passing comment: I think there should be a check for whether the underlying architecture supports the required vector instructions, and the intrinsic should be enabled only if it does.

TheRealMDoerr

Thanks for implementing it! Reviewing the algorithm will take more time. I already have some comments and suggestions.

TheRealMDoerr · 2024-12-20T17:03:26Z

src/hotspot/cpu/ppc/stubGenerator_ppc.cpp

    return start;
  }

+// Generate stub for ghash process  blocks.


There are multiple double-whitespaces in the new comments. Please clean them up!

TheRealMDoerr · 2024-12-20T17:08:02Z

src/hotspot/cpu/ppc/stubGenerator_ppc.cpp

+  VectorRegister vMask = VR24;
+  VectorRegister vS = VR25;
+  VectorSRegister vXS = VSR33;
+  Label L_end, L_aligned;


I suggest declaring only VectorRegisters and using ->to_vsr() below. This should improve readability.

Non-volatile VectorRegisters need to be preserved. See

jdk/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp

Line 3866 in bcb1bda

// Save non-volatile vector registers (frameless).

TheRealMDoerr · 2024-12-20T17:14:32Z

src/hotspot/cpu/ppc/stubGenerator_ppc.cpp

+  VectorSRegister vXS = VSR33;
+  Label L_end, L_aligned;
+
+  static const unsigned char perm_pattern[16] __attribute__((aligned(16))) = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};


This pattern can be produced by lvsl. Loading it from memory is not needed.

Hi @TheRealMDoerr

I had tried something like
__ lvsl(loadOrder, 0);

This generated the pattern below:
{0xf, 0xe, 0xd, 0xc, 0xb, 0xa, 0x9, 0x8, 0x7, 0x6, 0x5, 0x4, 0x3, 0x2, 0x1, 0x0}
This causes the data to be loaded into the vector in the wrong order.

The desired pattern is {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}

Since the data is stored as bytes and we don't have lxvb16x on Power8, the pattern has to be enforced.

Is there a better way to do this?
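
To make the effect of the two patterns concrete, here is a simplified software model of a vperm-style byte permute in Java. It assumes big-endian element numbering and ignores how a little-endian register view renumbers the elements, which is exactly the subtlety discussed above. BytePermuteSketch and permute are hypothetical names, not code from this PR.

```java
import java.util.Arrays;

final class BytePermuteSketch {
    /**
     * For each result byte i, the low 5 bits of pattern[i] select one byte
     * from the 32-byte concatenation a || b.
     */
    static byte[] permute(byte[] a, byte[] b, byte[] pattern) {
        byte[] out = new byte[16];
        for (int i = 0; i < 16; i++) {
            int sel = pattern[i] & 0x1F;
            out[i] = sel < 16 ? a[sel] : b[sel - 16];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = new byte[16];
        byte[] identity = new byte[16];
        byte[] reversed = new byte[16];
        for (int i = 0; i < 16; i++) {
            data[i] = (byte) (0xA0 + i);
            identity[i] = (byte) i;        // {0, 1, ..., 15}
            reversed[i] = (byte) (15 - i); // {15, 14, ..., 0}
        }
        // The identity pattern keeps the byte order; the reversed pattern flips it.
        System.out.println(Arrays.toString(permute(data, data, identity)));
        System.out.println(Arrays.toString(permute(data, data, reversed)));
    }
}
```

In this model the pattern {0, ..., 15} returns the input unchanged, while {0xf, ..., 0x0} returns it byte-reversed, which is why the wrong pattern shows up as data loaded in the wrong order.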

I know this code has been changed already, but for any future changes I would like to point out that you should use alignas(alignment) for alignment purposes rather than __attribute__((aligned(alignment))), as the HotSpot Style Guide recommends.

TheRealMDoerr · 2024-12-20T17:18:34Z

src/hotspot/cpu/ppc/stubGenerator_ppc.cpp

+// Arguments for generated stub:
+//      state:  R3_ARG1
+//      subkeyH:    R4_ARG2
+//      data: R5_ARG3


Argument "blocks" missing.

TheRealMDoerr · 2024-12-20T17:19:03Z

src/hotspot/cpu/ppc/stubGenerator_ppc.cpp

+  __ vsldoi(vHigherH, vSwappedH, vZero, 8);     // H.H
+  __ vxor(vTmp1, vTmp1, vTmp1);
+  __ vxor(vZero, vZero, vZero);
+  __ mtctr(blocks);


Can blocks be 0?
blocks is an int. The higher half of the register may possibly contain garbage and should be cleared. (Can be combined with 0 check if needed.)

In the Java code, the 0 check is handled:

jdk/src/java.base/share/classes/com/sun/crypto/provider/GHASH.java

Line 281 in a641932

while (blocks > 0) {

The C2 compiler replaces this Java code by the intrinsic.

len is passed to the stub without null check:

jdk/src/hotspot/share/opto/library_call.cpp

Line 7599 in f9b1133

state_start, subkeyH_start, data_start, len);

But I can see null checks in GHASH.java for all callers of processBlocks. So, your assertion should be fine.
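
The concern about the upper register half can be illustrated with plain integer arithmetic. This Java sketch only models a 64-bit register that holds a 32-bit int in its low half. The register value and the names LoopCountSketch and clearUpper32 are hypothetical, and whether the upper half can really hold garbage depends on the calling convention, which this thread does not settle.

```java
final class LoopCountSketch {
    /** Keeps only the low 32 bits of a 64-bit register image. */
    static long clearUpper32(long reg) {
        return reg & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        // Hypothetical register image: int blocks = 4 in the low half, garbage above it.
        long reg = 0xDEADBEEF00000004L;
        System.out.println("raw 64-bit value = " + Long.toUnsignedString(reg));
        System.out.println("after clearing   = " + clearUpper32(reg));
        // The cleaned value can also serve for a combined zero check.
        System.out.println("is zero          = " + (clearUpper32(reg) == 0));
    }
}
```

If a loop counter were taken from the whole 64-bit register, the raw value would give a wildly wrong iteration count, while the cleared value gives the intended 4.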

TheRealMDoerr · 2024-12-20T17:22:35Z

src/hotspot/cpu/ppc/stubGenerator_ppc.cpp

+    __ addi(data, data, 16);
+    __ bdnz(loop);
+  __ stxvd2x(vZero->to_vsr(), state);
+  __ blr();                                     // Return from function


Some empty lines would improve readability.

offamitkumar · 2024-12-23T05:25:53Z

src/hotspot/cpu/ppc/stubGenerator_ppc.cpp

+  __ vxor(vConstC2, vConstC2, vConstC2);
+  __ mtvrd(vConstC2, temp1);


Is that vxor instruction really necessary? mtvrd will overwrite the register anyway, so why do we need vConstC2 to be zero first?

offamitkumar · 2024-12-23T05:33:27Z

src/hotspot/cpu/ppc/stubGenerator_ppc.cpp

+  __ vsldoi(vSwappedH, vTmp2, vTmp2, 8);        // swap Lower and Higher Halves of subkey H
+  __ vsldoi(vLowerH, vZero, vSwappedH, 8);      // H.L
+  __ vsldoi(vHigherH, vSwappedH, vZero, 8);     // H.H
+  __ vxor(vTmp1, vTmp1, vTmp1);


I see that vTmp1 is used in the loop, and vpmsumd (line 741) should overwrite whatever it contains. So was this xor necessary?

offamitkumar · 2024-12-23T05:37:16Z

src/hotspot/cpu/ppc/stubGenerator_ppc.cpp

+  __ vxor(vTmp1, vTmp1, vTmp1);
+  __ vxor(vZero, vZero, vZero);
+  __ mtctr(blocks);
+  __ li(temp1, 0);


I find this load redundant as well. We load 0 again at line 722 in the loop.

offamitkumar · 2024-12-23T05:38:33Z

src/hotspot/cpu/ppc/stubGenerator_ppc.cpp

+  __ vsldoi(vLowerH, vZero, vSwappedH, 8);      // H.L
+  __ vsldoi(vHigherH, vSwappedH, vZero, 8);     // H.H
+  __ vxor(vTmp1, vTmp1, vTmp1);
+  __ vxor(vZero, vZero, vZero);


We do the same xor operation on vZero at line 721 in the loop, and it is not used before that. So can we get rid of this xor instruction as well?

openjdk · 2025-01-09T08:21:42Z

@suchismith1993 this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout ghash_processblocks
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

theRealAph · 2025-01-10T13:07:53Z

The commenting here is poor.

GHASH uses little-endian for the byte order, but big-endian for the bit order. For example, the polynomial 1 is represented as the 16-byte string 80 00 00 00 | 12 bytes of 00. So, we must either reverse the bytes in each word and do everything big-endian or reverse the bits in each byte and do it little-endian. Which do you do?

Sure, I could figure it out by reading the code, but please say.
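
The bit-order convention described above can be sketched in Java. This only illustrates the stated mapping (the polynomial 1 is the block 80 00 ... 00). GhashBitOrderSketch, monomial and coefficient are hypothetical names, not code from this PR or from GHASH.java.

```java
final class GhashBitOrderSketch {
    /** Returns the 16-byte GHASH block that encodes the monomial x^i (0 <= i < 128). */
    static byte[] monomial(int i) {
        byte[] block = new byte[16];
        // Coefficient i lives in byte i / 8; within a byte the most significant bit comes first.
        block[i >>> 3] = (byte) (0x80 >>> (i & 7));
        return block;
    }

    /** Reads the coefficient of x^i (0 or 1) from a 16-byte GHASH block. */
    static int coefficient(byte[] block, int i) {
        return (block[i >>> 3] >>> (7 - (i & 7))) & 1;
    }

    public static void main(String[] args) {
        System.out.printf("x^0   -> first byte 0x%02x%n", monomial(0)[0] & 0xFF);
        System.out.printf("x^1   -> first byte 0x%02x%n", monomial(1)[0] & 0xFF);
        System.out.printf("x^127 -> last byte  0x%02x%n", monomial(127)[15] & 0xFF);
    }
}
```

An implementation has to pick one register layout for these coefficients and say which, which is the documentation the comment above asks for.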

suchismith1993 · 2025-01-15T16:45:22Z

Hi Andrew

I would like to understand if I have fully understood your comment.

Currently the load instruction takes care of the endianness for the subkey and state. For loading the data, we enforce the endianness and reorder the bytes using vec_perm.
vec_perm(vH, vHigh, vLow, loadOrder);

I am assuming the inputs for GHASH follow the endianness required by the algorithm, as you have mentioned. I have made sure they are in the intended representation on both LE and BE platforms (using vec_perm and the appropriate load instructions).

In the algorithm that I have used, 0xC2 is the polynomial for reduction.

It is shifted by 56 bits to make it the most significant byte. I think this is little-endian byte order?
I just had to do the operations with the reduction polynomial to align it as per the algorithm.

I did not do any extra swapping for the subkey, state vector and input.

Is this what you are looking for?

theRealAph · 2025-01-15T18:28:55Z

Right, so in this implementation the low-order bits of the field polynomial (i.e. p = z^7+z^2+z+1) are represented as 0xC2, or 11000010. But you will note that there is a bit missing here: the low-order bits of the field polynomial should have four bits set. And in GHASH.java in the JDK, 0xe100000000000000 is used, which is a bit more obvious.

I think you're using the trick described in Intel's Optimized Galois-Counter-Mode Implementation on Intel® Architecture Processors, which represents the polynomial in a shifted form as, in effect, 1:C200000000000000.
Unfortunately, the constant vConstC2 does not appear anywhere in this PR, so I had no way to know that. I guess that this PR does not even compile.

The main problem is, though, that there is little commentary in the code which explains how things are encoded. If you're using a bit-reversed and shifted representation of a polynomial, you have to say that. If you're using the algorithm described in the Intel paper, you have to say that too. Have pity on the reader.
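
The arithmetic relation between the constants mentioned above can be checked with a few lines of Java. This is only a sketch of how 0x87, 0xe1 and 0xC2 relate. It does not claim to describe how the stub uses vConstC2, and GhashPolyConstantsSketch, reverseByte and shiftLeftOne are hypothetical names.

```java
final class GhashPolyConstantsSketch {
    /** Reverses the bit order within the low byte of v. */
    static int reverseByte(int v) {
        return Integer.reverse(v & 0xFF) >>> 24;
    }

    /** Shifts a 64-bit value left by one and returns {bit carried out, remaining 64 bits}. */
    static long[] shiftLeftOne(long v) {
        return new long[] { v >>> 63, v << 1 };
    }

    public static void main(String[] args) {
        int lowTerms = 0x87; // z^7 + z^2 + z + 1, with bit i = coefficient of z^i
        int reflected = reverseByte(lowTerms);
        long[] shifted = shiftLeftOne(0xE100000000000000L);
        System.out.printf("reflected low terms: 0x%02x (%d bits set)%n",
                reflected, Integer.bitCount(reflected));
        System.out.printf("shifted by one:      %d:%016x%n", shifted[0], shifted[1]);
        System.out.printf("bits set in 0xc2:    %d%n", Integer.bitCount(0xC2));
    }
}
```

The bit that appears to be missing from 0xC2 is the one carried out by the shift, which is why the shifted form is written as 1:C200000000000000.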

theRealAph · 2025-04-24T14:24:17Z

Please run AESGCMByteBuffer.encrypt and provide some before and after figures.

suchismith1993 · 2025-04-24T14:27:10Z

/integrate

openjdk · 2025-04-24T14:28:04Z

@suchismith1993 This pull request has not yet been marked as ready for integration.

suchismith1993 · 2025-04-27T17:20:30Z

Please run AESGCMByteBuffer.encrypt and provide some before and after figures.

Hi @TheRealMDoerr, was this suite run from your end? Was TestAESMain what you checked the run times with?

@theRealAph From my end, we saw an improvement of around 3x when running TestAESMain. Is that not a valid test suite?

If the improvement with this version is satisfactory, can we have this integrated and then pursue further improvements in a separate PR? I will open a JBS issue for that.

TheRealMDoerr · 2025-04-28T08:46:57Z

I've run tier1-4 on linux ppc64le and AIX for stability testing and only used TestAESMain to check the performance. I agree with Andrew that some performance numbers should be published. I think you can also report the performance results of TestAESMain, here.

suchismith1993 · 2025-04-28T09:07:05Z

Runtime without my changes

~/jdkHead/jdk/build/linux-ppc64le-server-fastdebug/jdk/bin/java -Xbatch -DcheckOutput=true -Dmode=GCM -DencInputOffset=1 -DencOutputOffset=1 -XX:DisableIntrinsic=_ghash_processBlocks -XX:+UnlockDiagnosticVMOptions -Xbootclasspath/a:. -cp . compiler.codegen.aes.TestAESMain 100000 100000

The output is as follows:

100000 iterations
For random generator using seed: 7133744594045351839
To re-run test with same seed value please add "-Djdk.test.lib.random.seed=7133744594045351839" to command line.

algorithm=AES, mode=GCM, paddingStr=NoPadding, msgSize=646, keySize=128, noReinit=false, checkOutput=true, encInputOffset=1, encOutputOffset=1, decOutputOffset=0, lastChunkSize=32
Algorithm: AES(128bit)
Encryption cipher provider: SunJCE version 24
Encryption cipher algorithm: AES/GCM/NoPadding
key: [16]: f8 f9 fa fb fc fd fe ff 00 01 02 03 04 05 06 07
input: [647]: 00 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e
encode: [671]: 00 7c 9d 30 e5 43 96 fd 53 28 c4 08 16 99 58 b2 60 a6 81 22 51 d9 fd f6 ab d7 4b f2 c9 1f d9 c6
decode: [647]: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
Starting encryption warm-up
Finished encryption warm-up
TestAESEncode runtime was 1495.395126 ms

algorithm=AES, mode=GCM, paddingStr=NoPadding, msgSize=646, keySize=128, noReinit=false, checkOutput=true, encInputOffset=1, encOutputOffset=1, decOutputOffset=0, lastChunkSize=32
Algorithm: AES(128bit)
Decryption cipher provider: SunJCE version 24
Decryption cipher algorithm: AES/GCM/NoPadding
key: [16]: f8 f9 fa fb fc fd fe ff 00 01 02 03 04 05 06 07
input: [647]: 00 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e
encode: [671]: 00 7c 9d 30 e5 43 96 fd 53 28 c4 08 16 99 58 b2 60 a6 81 22 51 d9 fd f6 ab d7 4b f2 c9 1f d9 c6
decode: [647]: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
Starting decryption warm-up
Finished decryption warm-up
TestAESDecode runtime was 1378.752962 ms

suchismith1993 · 2025-04-28T09:07:41Z

Runtime with changes

~/jdkHead/jdk/build/linux-ppc64le-server-fastdebug/jdk/bin/java -Xbatch -DcheckOutput=true -Dmode=GCM -DencInputOffset=1 -DencOutputOffset=1 -XX:+UnlockDiagnosticVMOptions -Xbootclasspath/a:. -cp . compiler.codegen.aes.TestAESMain 100000 100000

The output is as follows:
100000 iterations
For random generator using seed: 1980542562394450893
To re-run test with same seed value please add "-Djdk.test.lib.random.seed=1980542562394450893" to command line.

algorithm=AES, mode=GCM, paddingStr=NoPadding, msgSize=646, keySize=128, noReinit=false, checkOutput=true, encInputOffset=1, encOutputOffset=1, decOutputOffset=0, lastChunkSize=32
Algorithm: AES(128bit)
Encryption cipher provider: SunJCE version 24
Encryption cipher algorithm: AES/GCM/NoPadding
key: [16]: f8 f9 fa fb fc fd fe ff 00 01 02 03 04 05 06 07
input: [647]: 00 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e
encode: [671]: 00 7c 9d 30 e5 43 96 fd 53 28 c4 08 16 99 58 b2 60 a6 81 22 51 d9 fd f6 ab d7 4b f2 c9 1f d9 c6
decode: [647]: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
Starting encryption warm-up
Finished encryption warm-up
TestAESEncode runtime was 565.673321 ms

algorithm=AES, mode=GCM, paddingStr=NoPadding, msgSize=646, keySize=128, noReinit=false, checkOutput=true, encInputOffset=1, encOutputOffset=1, decOutputOffset=0, lastChunkSize=32
Algorithm: AES(128bit)
Decryption cipher provider: SunJCE version 24
Decryption cipher algorithm: AES/GCM/NoPadding
key: [16]: f8 f9 fa fb fc fd fe ff 00 01 02 03 04 05 06 07
input: [647]: 00 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e
encode: [671]: 00 7c 9d 30 e5 43 96 fd 53 28 c4 08 16 99 58 b2 60 a6 81 22 51 d9 fd f6 ab d7 4b f2 c9 1f d9 c6
decode: [647]: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
Starting decryption warm-up
Finished decryption warm-up
TestAESDecode runtime was 459.795885 ms

theRealAph · 2025-04-28T09:21:50Z

@theRealAph From my end, we saw an improvement of around 3x when running TestAESMain. Is that not a valid test suite?

If the improvement with this version is satisfactory, can we have this integrated and then pursue further improvements in a separate PR? I will open a JBS issue for that.

You should run the JMH test, like so:

fedora:theRealAph-jdk $ CONF=release make -k LOG=info build-microbenchmark CONF_CHECK=auto
fedora:theRealAph-jdk $  ./build/linux-aarch64-server-release/jdk/bin/java -Djmh.ignoreLock=true  -jar ./build/linux-aarch64-server-release/images/test/micro/benchmarks.jar -f 1 AESGCMByteBuffer.encrypt\$

Benchmark                                 (dataMethod)  (dataSize)  (keyLength)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.AESGCMByteBuffer.encrypt         direct        1024          128              thrpt    8  2216504.066 ± 12527.173  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt         direct        1500          128              thrpt    8  1505300.797 ±  8675.648  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt         direct        4096          128              thrpt    8   813518.431 ±  7513.509  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt         direct       16384          128              thrpt    8   233268.190 ±   975.616  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt           heap        1024          128              thrpt    8  2562063.056 ± 18200.538  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt           heap        1500          128              thrpt    8  1771049.922 ±  6444.924  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt           heap        4096          128              thrpt    8   934138.960 ±  5353.664  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt           heap       16384          128              thrpt    8   257884.039 ±   149.974  ops/s
o.o.b.j.c.small.AESGCMByteBuffer.encrypt        direct        1024          128              thrpt    8  2214159.143 ± 16196.670  ops/s
o.o.b.j.c.small.AESGCMByteBuffer.encrypt          heap        1024          128              thrpt    8  2578675.681 ± 22067.812  ops/s

suchismith1993 · 2025-04-28T12:48:43Z

Without GHASH change

Benchmark	Data Method	Data Size	Key Length	Mode	Count	Score (ops/s)	Error (ops/s)	Units
o.o.b.j.c.full.AESGCMByteBuffer.decrypt	direct	1024	128	thrpt	8	52020.466	± 756.766	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decrypt	direct	1500	128	thrpt	8	35524.179	± 587.709	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decrypt	direct	4096	128	thrpt	8	14065.545	± 94.679	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decrypt	direct	16384	128	thrpt	8	3494.208	± 36.804	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decrypt	heap	1024	128	thrpt	8	53579.051	± 521.148	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decrypt	heap	1500	128	thrpt	8	37105.385	± 755.540	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decrypt	heap	4096	128	thrpt	8	14122.494	± 78.641	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decrypt	heap	16384	128	thrpt	8	3570.723	± 18.136	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart	direct	1024	128	thrpt	8	50573.814	± 858.171	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart	direct	1500	128	thrpt	8	35402.422	± 761.839	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart	direct	4096	128	thrpt	8	13948.808	± 121.955	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart	direct	16384	128	thrpt	8	3555.491	± 27.543	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart	heap	1024	128	thrpt	8	52583.092	± 786.567	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart	heap	1500	128	thrpt	8	36563.715	± 365.381	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart	heap	4096	128	thrpt	8	13974.515	± 88.673	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart	heap	16384	128	thrpt	8	3552.996	± 25.234	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt	direct	1024	128	thrpt	8	53387.361	± 690.909	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt	direct	1500	128	thrpt	8	36970.383	± 495.504	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt	direct	4096	128	thrpt	8	13919.025	± 88.704	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt	direct	16384	128	thrpt	8	3582.015	± 12.920	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt	heap	1024	128	thrpt	8	53631.653	± 449.160	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt	heap	1500	128	thrpt	8	37890.654	± 291.797	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt	heap	4096	128	thrpt	8	14324.705	± 33.475	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt	heap	16384	128	thrpt	8	3563.167	± 18.069	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart	direct	1024	128	thrpt	8	52676.705	± 828.404	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart	direct	1500	128	thrpt	8	36329.914	± 475.700	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart	direct	4096	128	thrpt	8	14062.787	± 118.448	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart	direct	16384	128	thrpt	8	3579.154	± 16.530	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart	heap	1024	128	thrpt	8	53562.594	± 317.060	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart	heap	1500	128	thrpt	8	36811.085	± 320.696	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart	heap	4096	128	thrpt	8	14086.269	± 54.366	ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart	heap	16384	128	thrpt	8	3563.559	± 19.188	ops/s
o.o.b.j.c.small.AESGCMByteBuffer.decrypt	direct	1024	128	thrpt	8	52021.706	± 827.404	ops/s
o.o.b.j.c.small.AESGCMByteBuffer.decrypt	heap	1024	128	thrpt	8	53550.519	± 457.500	ops/s
o.o.b.j.c.small.AESGCMByteBuffer.decryptMultiPart	direct	1024	128	thrpt	8	50392.121	± 890.139	ops/s
o.o.b.j.c.small.AESGCMByteBuffer.decryptMultiPart	heap	1024	128	thrpt	8	52771.665	± 547.670	ops/s
o.o.b.j.c.small.AESGCMByteBuffer.encrypt	direct	1024	128	thrpt	8	53258.597	± 758.263	ops/s
o.o.b.j.c.small.AESGCMByteBuffer.encrypt	heap	1024	128	thrpt	8	54603.228	± 343.555	ops/s
o.o.b.j.c.small.AESGCMByteBuffer.encryptMultiPart	direct	1024	128	thrpt	8	52796.661	± 870.566	ops/s
o.o.b.j.c.small.AESGCMByteBuffer.encryptMultiPart	heap	1024	128	thrpt	8	53488.007	± 441.574	ops/s

suchismith1993 · 2025-04-28T13:01:03Z

With my changes

Benchmark	Data Method	Data Size	Key Length	Mode	Cnt	Score	Error	Units
AESGCMByteBuffer.decrypt	direct	1024	128	thrpt	8	192164.655	±2499.922	ops/s
AESGCMByteBuffer.decrypt	direct	1500	128	thrpt	8	138590.675	±1718.893	ops/s
AESGCMByteBuffer.decrypt	direct	4096	128	thrpt	8	60015.129	±516.554	ops/s
AESGCMByteBuffer.decrypt	direct	16384	128	thrpt	8	15705.840	±101.889	ops/s
AESGCMByteBuffer.decrypt	heap	1024	128	thrpt	8	234618.808	±3508.043	ops/s
AESGCMByteBuffer.decrypt	heap	1500	128	thrpt	8	153490.970	±1991.507	ops/s
AESGCMByteBuffer.decrypt	heap	4096	128	thrpt	8	59706.883	±393.104	ops/s
AESGCMByteBuffer.decrypt	heap	16384	128	thrpt	8	15282.959	±35.228	ops/s
AESGCMByteBuffer.decryptMultiPart	direct	1024	128	thrpt	8	169563.728	±3262.014	ops/s
AESGCMByteBuffer.decryptMultiPart	direct	1500	128	thrpt	8	125917.360	±2171.133	ops/s
AESGCMByteBuffer.decryptMultiPart	direct	4096	128	thrpt	8	57233.798	±1219.124	ops/s
AESGCMByteBuffer.decryptMultiPart	direct	16384	128	thrpt	8	15314.450	±267.215	ops/s
AESGCMByteBuffer.decryptMultiPart	heap	1024	128	thrpt	8	199834.254	±2929.256	ops/s
AESGCMByteBuffer.decryptMultiPart	heap	1500	128	thrpt	8	143659.707	±2019.578	ops/s
AESGCMByteBuffer.decryptMultiPart	heap	4096	128	thrpt	8	57676.269	±760.886	ops/s
AESGCMByteBuffer.decryptMultiPart	heap	16384	128	thrpt	8	14899.282	±194.883	ops/s
AESGCMByteBuffer.encrypt	direct	1024	128	thrpt	8	217833.792	±2839.966	ops/s
AESGCMByteBuffer.encrypt	direct	1500	128	thrpt	8	152150.607	±2203.853	ops/s
AESGCMByteBuffer.encrypt	direct	4096	128	thrpt	8	60091.726	±812.084	ops/s
AESGCMByteBuffer.encrypt	direct	16384	128	thrpt	8	15720.273	±85.991	ops/s
AESGCMByteBuffer.encrypt	heap	1024	128	thrpt	8	218901.548	±2687.554	ops/s
AESGCMByteBuffer.encrypt	heap	1500	128	thrpt	8	153527.621	±1816.675	ops/s
AESGCMByteBuffer.encrypt	heap	4096	128	thrpt	8	58896.329	±1637.968	ops/s
AESGCMByteBuffer.encrypt	heap	16384	128	thrpt	8	15226.399	±17.957	ops/s
AESGCMByteBuffer.encryptMultiPart	direct	1024	128	thrpt	8	197339.940	±2428.986	ops/s
AESGCMByteBuffer.encryptMultiPart	direct	1500	128	thrpt	8	136931.341	±2782.111	ops/s
AESGCMByteBuffer.encryptMultiPart	direct	4096	128	thrpt	8	59652.962	±750.375	ops/s
AESGCMByteBuffer.encryptMultiPart	direct	16384	128	thrpt	8	15667.096	±58.490	ops/s
AESGCMByteBuffer.encryptMultiPart	heap	1024	128	thrpt	8	214639.739	±4077.556	ops/s
AESGCMByteBuffer.encryptMultiPart	heap	1500	128	thrpt	8	155557.214	±2422.094	ops/s
AESGCMByteBuffer.encryptMultiPart	heap	4096	128	thrpt	8	58895.472	±1538.650	ops/s
AESGCMByteBuffer.encryptMultiPart	heap	16384	128	thrpt	8	15038.955	±44.792	ops/s
AESGCMByteBuffer.small.decrypt	direct	1024	128	thrpt	8	192555.048	±3710.757	ops/s
AESGCMByteBuffer.small.decrypt	heap	1024	128	thrpt	8	235177.894	±4321.018	ops/s
AESGCMByteBuffer.small.decryptMultiPart	direct	1024	128	thrpt	8	167625.340	±2418.147	ops/s
AESGCMByteBuffer.small.decryptMultiPart	heap	1024	128	thrpt	8	200193.172	±3319.042	ops/s
AESGCMByteBuffer.small.encrypt	direct	1024	128	thrpt	8	216340.878	±4651.345	ops/s
AESGCMByteBuffer.small.encrypt	heap	1024	128	thrpt	8	231760.813	±4271.094	ops/s
AESGCMByteBuffer.small.encryptMultiPart	direct	1024	128	thrpt	8	195748.230	±5825.305	ops/s
AESGCMByteBuffer.small.encryptMultiPart	heap	1024	128	thrpt	8	215594.033	±4254.075	ops/s

suchismith1993 · 2025-05-02T11:56:06Z

Hi @theRealAph, can you help me understand this result? Since ops/s is higher with the GHASH code, does that indicate a speedup?

theRealAph · 2025-05-02T12:05:11Z

Yes, an increase in ops/s is what we want.
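
As a worked example of reading these numbers, the sketch below computes speedup factors from figures quoted earlier in this thread: the AESGCMByteBuffer.encrypt direct/1024 rows of the two JMH tables and the two TestAESEncode runtimes. Pairing those rows as the same benchmark is an assumption, and SpeedupSketch with its two methods is a hypothetical helper.

```java
final class SpeedupSketch {
    /** Throughput (ops/s): higher is better, so speedup = after / before. */
    static double throughputSpeedup(double beforeOpsPerSec, double afterOpsPerSec) {
        return afterOpsPerSec / beforeOpsPerSec;
    }

    /** Runtime (ms): lower is better, so speedup = before / after. */
    static double runtimeSpeedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    public static void main(String[] args) {
        // JMH encrypt, direct buffer, 1024 bytes: without and with the GHASH change.
        double jmh = throughputSpeedup(53387.361, 217833.792);
        // TestAESEncode runtime: without and with the intrinsic.
        double testAes = runtimeSpeedup(1495.395126, 565.673321);
        System.out.printf("JMH encrypt direct/1024: %.2fx%n", jmh);
        System.out.printf("TestAESEncode runtime:   %.2fx%n", testAes);
    }
}
```

For that JMH row the ratio is a little above 4x. The TestAESMain runs came from a fastdebug build with different random seeds, so their ratio is only a rough indication.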

suchismith1993 · 2025-05-02T12:26:35Z

Thank you everyone.

suchismith1993 · 2025-05-02T12:26:44Z

/integrate

openjdk · 2025-05-02T12:27:06Z

@suchismith1993
Your change (at version 423c868) is now ready to be sponsored by a Committer.

TheRealMDoerr · 2025-05-02T12:29:49Z

/sponsor

openjdk · 2025-05-02T12:30:35Z

Going to push as commit cdad6d7.
Since your change was applied there have been 128 commits pushed to the master branch:

afb9134: 8355627: Don't use ThreadCritical for EventLog list
811f117: 8355980: RISC-V: remove vmclr_m before vmsXX and vmfXX
d29700c: 8344706: Implement JEP 512: Compact Source Files and Instance Main Methods
... and 125 more: https://git.openjdk.org/jdk/compare/0537c6927d4f617624672cfae06928f9738175ca...master

Your commit was automatically rebased without conflicts.

openjdk · 2025-05-02T12:30:45Z

@TheRealMDoerr @suchismith1993 Pushed as commit cdad6d7.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

suchismith1993 · 2025-05-15T07:46:45Z

/backport jdk21u-dev

suchismith1993 · 2025-05-15T07:46:58Z

/backport jdk24u-dev

openjdk · 2025-05-15T07:48:22Z

@suchismith1993 the backport was successfully created on the branch backport-suchismith1993-cdad6d78-master in my personal fork of openjdk/jdk21u-dev. To create a pull request with this backport targeting openjdk/jdk21u-dev:master, just click the following link:

➡️ Create pull request

The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:

Hi all,

This pull request contains a backport of commit cdad6d78 from the openjdk/jdk repository.

The commit being backported was authored by Suchismith Roy on 2 May 2025 and was reviewed by Martin Doerr and Amit Kumar.

Thanks!

If you need to update the source branch of the pull then run the following commands in a local clone of your personal fork of openjdk/jdk21u-dev:

$ git fetch https://github.com/openjdk-bots/jdk21u-dev.git backport-suchismith1993-cdad6d78-master:backport-suchismith1993-cdad6d78-master
$ git checkout backport-suchismith1993-cdad6d78-master
# make changes
$ git add paths/to/changed/files
$ git commit --message 'Describe additional changes made'
$ git push https://github.com/openjdk-bots/jdk21u-dev.git backport-suchismith1993-cdad6d78-master

openjdk · 2025-05-15T07:49:06Z

@suchismith1993 The target repository jdk24u-dev is not a valid target for backports.
List of valid target repositories: openjdk/jdk, openjdk/jdk11u, openjdk/jdk11u-dev, openjdk/jdk17u, openjdk/jdk17u-dev, openjdk/jdk21u, openjdk/jdk21u-dev, openjdk/jdk24u, openjdk/jdk7u, openjdk/jdk8u, openjdk/jdk8u-dev, openjdk/jfx, openjdk/jfx17u, openjdk/jfx21u, openjdk/jfx24u, openjdk/lilliput-jdk17u, openjdk/lilliput-jdk21u, openjdk/shenandoah-jdk21u, openjdk/shenandoah-jdk8u.
Supplying the organization/group prefix is optional.

openjdk bot added the hotspot hotspot-dev@openjdk.org label Jul 18, 2024

bridgekeeper bot closed this Nov 7, 2024

openjdk bot reopened this Nov 11, 2024

suchismith1993 force-pushed the ghash_processblocks branch from 59f5c2a to 913be49 December 9, 2024 15:14

suchismith1993 changed the title ~~skeleton code~~ PPC64: Add intrinsic for GHASH algorithm Dec 9, 2024

suchismith1993 changed the title ~~PPC64: Add intrinsic for GHASH algorithm~~ JDK-8216437 : [PPC64] Add intrinsic for GHASH algorithm Dec 9, 2024

suchismith1993 changed the title ~~JDK-8216437 : [PPC64] Add intrinsic for GHASH algorithm~~ JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm Dec 9, 2024

suchismith1993 marked this pull request as ready for review December 18, 2024 16:03

openjdk bot added the rfr Pull request is ready for review label Dec 18, 2024

offamitkumar reviewed Dec 18, 2024

TheRealMDoerr reviewed Dec 20, 2024

offamitkumar reviewed Dec 23, 2024

openjdk bot added and removed the rfr label Jan 8, 2025

openjdk bot added and removed the merge-conflict label Jan 9, 2025

TheRealMDoerr reviewed Jan 9, 2025

src/hotspot/cpu/ppc/vm_version_ppc.cpp (outdated, resolved)

offamitkumar reviewed Jan 9, 2025

src/hotspot/cpu/ppc/stubGenerator_ppc.cpp (outdated, resolved)

offamitkumar approved these changes Apr 28, 2025

openjdk bot added the ready Pull request is ready to be integrated label Apr 28, 2025

openjdk bot added the sponsor Pull request is ready to be sponsored label May 2, 2025

openjdk bot added the integrated Pull request has been integrated label May 2, 2025

openjdk bot closed this May 2, 2025

openjdk bot removed the ready, rfr, and sponsor labels May 2, 2025

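		// Zero vConstC2 (x ^ x == 0).
		__ vxor(vConstC2, vConstC2, vConstC2);
		// Move the 64-bit GPR temp1 into vConstC2 (mtvsrd writes doubleword 0).
		__ mtvrd(vConstC2, temp1);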
