@suchismith1993
Contributor

@suchismith1993 suchismith1993 commented Jul 18, 2024

JBS Issue : JDK-8216437

Currently acceleration code for GHASH is missing for PPC64.

This implementation utilises SIMD instructions on Power and uses Karatsuba multiplication to obtain the final result.
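
As a rough illustration of the Karatsuba idea mentioned above, the following sketch multiplies two 128-bit polynomials over GF(2) using three 64-bit carry-less multiplies instead of four. It is plain Java on `long` values, not the PPC64 stub from this PR: the class and method names are hypothetical, the software `clmul64` loop merely stands in for a hardware carry-less multiply, and GHASH's bit reflection and modular reduction are deliberately left out.

```java
import java.util.Arrays;

final class KaratsubaClmulSketch {
    /** Carry-less (GF(2)[x]) product of two 64-bit polynomials, returned as {high, low}. */
    static long[] clmul64(long a, long b) {
        long hi = 0, lo = 0;
        for (int i = 0; i < 64; i++) {
            if (((b >>> i) & 1L) != 0) {
                lo ^= a << i;
                if (i != 0) {
                    hi ^= a >>> (64 - i); // bits of a that spill past bit 63
                }
            }
        }
        return new long[] {hi, lo};
    }

    /** Combines the three partial products into four 64-bit words, most significant first. */
    private static long[] combine(long[] hh, long midHi, long midLo, long[] ll) {
        return new long[] {hh[0], hh[1] ^ midHi, ll[0] ^ midLo, ll[1]};
    }

    /** 128x128 -> 256-bit carry-less product using three 64-bit multiplies (Karatsuba). */
    static long[] karatsuba128(long aHi, long aLo, long bHi, long bLo) {
        long[] hh = clmul64(aHi, bHi);
        long[] ll = clmul64(aLo, bLo);
        long[] mm = clmul64(aHi ^ aLo, bHi ^ bLo);
        // Over GF(2): (aHi+aLo)(bHi+bLo) + aHi*bHi + aLo*bLo = aHi*bLo + aLo*bHi
        return combine(hh, mm[0] ^ hh[0] ^ ll[0], mm[1] ^ hh[1] ^ ll[1], ll);
    }

    /** Reference version with four 64-bit multiplies, used only for comparison. */
    static long[] schoolbook128(long aHi, long aLo, long bHi, long bLo) {
        long[] hh = clmul64(aHi, bHi);
        long[] ll = clmul64(aLo, bLo);
        long[] hl = clmul64(aHi, bLo);
        long[] lh = clmul64(aLo, bHi);
        return combine(hh, hl[0] ^ lh[0], hl[1] ^ lh[1], ll);
    }

    public static void main(String[] args) {
        long aHi = 0x0123456789ABCDEFL, aLo = 0xFEDCBA9876543210L;
        long bHi = 0x0F1E2D3C4B5A6978L, bLo = 0x8796A5B4C3D2E1F0L;
        long[] k = karatsuba128(aHi, aLo, bHi, bLo);
        long[] s = schoolbook128(aHi, aLo, bHi, bLo);
        System.out.println("Karatsuba matches schoolbook: " + Arrays.equals(k, s));
    }
}
```

The saving is one multiply per 128-bit product at the cost of a few extra XORs, which is the usual reason Karatsuba is chosen when the carry-less multiply is the expensive step.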


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8216437: PPC64: Add intrinsic for GHASH algorithm (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20235/head:pull/20235
$ git checkout pull/20235

Update a local copy of the PR:
$ git checkout pull/20235
$ git pull https://git.openjdk.org/jdk.git pull/20235/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20235

View PR using the GUI difftool:
$ git pr show -t 20235

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20235.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Jul 18, 2024

👋 Welcome back sroy! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jul 18, 2024

@suchismith1993 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8216437: PPC64: Add intrinsic for GHASH algorithm

Reviewed-by: mdoerr, amitkumar

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 128 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@TheRealMDoerr, @offamitkumar) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk

openjdk bot commented Jul 18, 2024

@suchismith1993 The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Jul 18, 2024
@bridgekeeper

bridgekeeper bot commented Sep 12, 2024

@suchismith1993 This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper

bridgekeeper bot commented Nov 7, 2024

@suchismith1993 This pull request has been inactive for more than 16 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Nov 7, 2024
@suchismith1993
Contributor Author

/open

@openjdk openjdk bot reopened this Nov 11, 2024
@openjdk

openjdk bot commented Nov 11, 2024

@suchismith1993 This pull request is now open

@suchismith1993 suchismith1993 changed the title from "skeleton code" to "PPC64: Add intrinsic for GHASH algorithm" Dec 9, 2024
@suchismith1993 suchismith1993 changed the title from "PPC64: Add intrinsic for GHASH algorithm" to "JDK-8216437 : [PPC64] Add intrinsic for GHASH algorithm" Dec 9, 2024
@suchismith1993 suchismith1993 changed the title from "JDK-8216437 : [PPC64] Add intrinsic for GHASH algorithm" to "JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm" Dec 9, 2024
@suchismith1993 suchismith1993 marked this pull request as ready for review December 18, 2024 16:03
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 18, 2024
@mlbridge

mlbridge bot commented Dec 18, 2024

if (UseGHASHIntrinsics) {
warning("GHASH intrinsics are not available on this CPU");
FLAG_SET_DEFAULT(UseGHASHIntrinsics, false);
if (FLAG_IS_DEFAULT(UseGHASHIntrinsics)) {
Member

Just a passing comment: I guess there should be a check for whether the underlying architecture supports vector instructions. The intrinsic should be enabled only if it does.

Contributor

@TheRealMDoerr TheRealMDoerr left a comment

Thanks for implementing it! Reviewing the algorithm will take more time. I already have some comments and suggestions.

return start;
}

// Generate stub for ghash process blocks.
Contributor

There are multiple double-whitespaces in the new comments. Please clean them up!

VectorRegister vMask = VR24;
VectorRegister vS = VR25;
VectorSRegister vXS = VSR33;
Label L_end, L_aligned;
Contributor

I suggest declaring VectorRegisters only and using ->to_vsr() below. This should improve readability.

Non-volatile VectorRegisters need to be preserved. See

// Save non-volatile vector registers (frameless).

VectorSRegister vXS = VSR33;
Label L_end, L_aligned;

static const unsigned char perm_pattern[16] __attribute__((aligned(16))) = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
Contributor

This pattern can be produced by lvsl. Loading it from memory is not needed.

Contributor Author

Hi @TheRealMDoerr

I had tried something like
__ lvsl(loadOrder, 0);

This generated the pattern below:
{0xf, 0xe, 0xd, 0xc, 0xb, 0xa, 0x9, 0x8, 0x7, 0x6, 0x5, 0x4, 0x3, 0x2, 0x1, 0x0}
This causes the data to be loaded into the vector in the wrong order.

The desired pattern is {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}

Since the data is stored in bytes and we don't have lxvb16x on POWER8, the pattern has to be enforced.

Is there a better way to do this?
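
The exchange above turns on which control pattern is fed to the permute. As a rough model only, the sketch below treats a vperm-style operation as "pick byte `pattern[i] & 0x1F` out of the 32-byte concatenation of two inputs". The class and method names are hypothetical, and the model uses plain array indices: it does not attempt to reproduce the element numbering that lvsl or vperm actually use on LE versus BE hardware, which is exactly the subtlety under discussion.

```java
import java.util.Arrays;

final class BytePermuteSketch {
    /** Array-index model of a permute: out[i] = (a ++ b)[pattern[i] & 0x1F]. */
    static byte[] permute(byte[] a, byte[] b, byte[] pattern) {
        byte[] out = new byte[16];
        for (int i = 0; i < 16; i++) {
            int idx = pattern[i] & 0x1F;
            out[i] = idx < 16 ? a[idx] : b[idx - 16];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = new byte[16];
        byte[] ascending = new byte[16];   // {0, 1, ..., 15}
        byte[] descending = new byte[16];  // {15, 14, ..., 0}
        for (int i = 0; i < 16; i++) {
            data[i] = (byte) (0xA0 + i);
            ascending[i] = (byte) i;
            descending[i] = (byte) (15 - i);
        }
        // The ascending pattern leaves the bytes where they are ...
        System.out.println(Arrays.equals(permute(data, data, ascending), data));
        // ... while the descending pattern reverses the whole 16-byte block.
        System.out.println(permute(data, data, descending)[0] == data[15]);
    }
}
```

In this model a reversed control pattern flips the byte order of the loaded block, which matches the symptom described in the comment.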

Contributor

I know this code has been changed already, but I would like to point out that, for any future changes, you should use alignas(alignment) for alignment purposes rather than __attribute__((aligned(alignment))), as the HotSpot Style Guide recommends.

// Arguments for generated stub:
// state: R3_ARG1
// subkeyH: R4_ARG2
// data: R5_ARG3
Contributor

Argument "blocks" missing.

__ vsldoi(vHigherH, vSwappedH, vZero, 8); // H.H
__ vxor(vTmp1, vTmp1, vTmp1);
__ vxor(vZero, vZero, vZero);
__ mtctr(blocks);
Contributor

Can blocks be 0?
blocks is an int. The higher half of the register may possibly contain garbage and should be cleared. (Can be combined with 0 check if needed.)
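
To make the concern concrete, here is a minimal sketch of what "clearing the upper half" means when a 32-bit int arrives in a 64-bit register. It is plain Java with hypothetical names and a made-up register value; it only models the masking, not the actual stub code or any particular PPC64 instruction.

```java
final class BlockCountSketch {
    /** Keeps only the low 32 bits of a 64-bit register image, i.e. zero-extends the int. */
    static long clearUpperHalf(long registerImage) {
        return registerImage & 0xFFFF_FFFFL;
    }

    public static void main(String[] args) {
        // Hypothetical register contents: block count 4 in the low half, garbage above it.
        long dirty = 0xDEADBEEF_00000004L;
        long blocks = clearUpperHalf(dirty);

        // Used unmasked as a 64-bit loop counter, the garbage would dominate the value.
        System.out.println("unmasked: " + Long.toUnsignedString(dirty));
        System.out.println("masked:   " + blocks);

        // The zero check can be applied to the masked value in the same place.
        if (blocks == 0) {
            System.out.println("nothing to process");
        }
    }
}
```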

Contributor Author

@suchismith1993 suchismith1993 Jan 8, 2025

In the Java code, the 0 check is handled.

Contributor

The C2 compiler replaces this Java code by the intrinsic.

Contributor

len is passed to the stub without null check:

state_start, subkeyH_start, data_start, len);

But I can see null checks in GHASH.java for all callers of processBlocks. So, your assertion should be fine.

__ addi(data, data, 16);
__ bdnz(loop);
__ stxvd2x(vZero->to_vsr(), state);
__ blr(); // Return from function
Contributor

Some empty lines would improve readability.

Comment on lines 691 to 692
__ vxor(vConstC2, vConstC2, vConstC2);
__ mtvrd(vConstC2, temp1);
Member

Is that vxor instruction really necessary? mtvrd will overwrite it anyway. So why do we want to be sure that there is 0 in vConstC2?

__ vsldoi(vSwappedH, vTmp2, vTmp2, 8); // swap Lower and Higher Halves of subkey H
__ vsldoi(vLowerH, vZero, vSwappedH, 8); // H.L
__ vsldoi(vHigherH, vSwappedH, vZero, 8); // H.H
__ vxor(vTmp1, vTmp1, vTmp1);
Member

I see that vTmp1 is used in the loop, and vpmsumd (line 741) should overwrite whatever it contains. So was this xor necessary?

__ vxor(vTmp1, vTmp1, vTmp1);
__ vxor(vZero, vZero, vZero);
__ mtctr(blocks);
__ li(temp1, 0);
Member

I find this load redundant as well. We are loading 0 again at line 722 in the loop.

__ vsldoi(vLowerH, vZero, vSwappedH, 8); // H.L
__ vsldoi(vHigherH, vSwappedH, vZero, 8); // H.H
__ vxor(vTmp1, vTmp1, vTmp1);
__ vxor(vZero, vZero, vZero);
Member

We are doing the same xor operation on vZero at line 721 in the loop, and it is not used before that. So can we get rid of this xor instruction as well?

@openjdk openjdk bot added rfr Pull request is ready for review and removed rfr Pull request is ready for review labels Jan 8, 2025
@openjdk

openjdk bot commented Jan 9, 2025

@suchismith1993 this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout ghash_processblocks
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed merge-conflict Pull request has merge conflict with target branch labels Jan 9, 2025
@theRealAph
Contributor

The commenting here is poor.

GHASH uses little-endian for the byte order, but big-endian for the bit order. For example, the polynomial 1 is represented as the 16-byte string 80 00 00 00 | 12 bytes of 00. So, we must either reverse the bytes in each word and do everything big-endian or reverse the bits in each byte and do it little-endian. Which do you do?

Sure, I could figure it out by reading the code, but please say.
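
The convention described above can be checked with a few lines of Java. This is only a sketch with hypothetical names: it encodes the polynomial "1" as the byte string 80 00 ... 00 from the comment, reads a coefficient by counting bits from the most significant bit of byte 0, and shows where that coefficient lands when the first eight bytes are read as a big-endian word.

```java
import java.nio.ByteBuffer;

final class GhashBitOrderSketch {
    /** Coefficient of z^i in a 16-byte block: bit i counted from the most significant bit of byte 0. */
    static int coefficient(byte[] block, int i) {
        return (block[i / 8] >>> (7 - (i % 8))) & 1;
    }

    public static void main(String[] args) {
        byte[] one = new byte[16];
        one[0] = (byte) 0x80; // the polynomial "1": 80 followed by 15 zero bytes

        System.out.println("coefficient of z^0: " + coefficient(one, 0));
        System.out.println("coefficient of z^1: " + coefficient(one, 1));

        // ByteBuffer reads big-endian by default, so the z^0 coefficient
        // ends up in the top bit of the first 64-bit word.
        long firstWord = ByteBuffer.wrap(one).getLong();
        System.out.println("first word: 0x" + Long.toHexString(firstWord));
    }
}
```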

@suchismith1993
Contributor Author

> The commenting here is poor.
>
> GHASH uses little-endian for the byte order, but big-endian for the bit order. For example, the polynomial 1 is represented as the 16-byte string 80 00 00 00 | 12 bytes of 00. So, we must either reverse the bytes in each word and do everything big-endian or reverse the bits in each byte and do it little-endian. Which do you do?
>
> Sure, I could figure it out by reading the code, but please say.

Hi Andrew

I would like to check whether I have understood your comment correctly.

Currently the load instruction takes care of the endianness for the subkey and state. When loading the data, we enforce the endianness by reordering the bytes with vec_perm:
vec_perm(vH, vHigh, vLow, loadOrder);

I am assuming the inputs to GHASH follow the endianness required by the algorithm, as you have mentioned. I have made sure they are in the intended representation on both LE and BE platforms (using vec_perm and the appropriate load instructions).

In the algorithm that I have used, 0xC2 is the polynomial for reduction.

It is shifted by 56 bits to make it the most significant byte. I think this is little-endian byte order?
I only had to do the operations with the reduction polynomial to align it as per the algorithm.

I did not do any extra swapping for the subkey, state vector and input.

Is this what you are looking for?

@theRealAph
Contributor

> > The commenting here is poor.
> > GHASH uses little-endian for the byte order, but big-endian for the bit order. For example, the polynomial 1 is represented as the 16-byte string 80 00 00 00 | 12 bytes of 00. So, we must either reverse the bytes in each word and do everything big-endian or reverse the bits in each byte and do it little-endian. Which do you do?
> > Sure, I could figure it out by reading the code, but please say.
>
> Hi Andrew
>
> I would like to check whether I have understood your comment correctly.
>
> Currently the load instruction takes care of the endianness for the subkey and state. When loading the data, we enforce the endianness by reordering the bytes with vec_perm: vec_perm(vH, vHigh, vLow, loadOrder);
>
> I am assuming the inputs to GHASH follow the endianness required by the algorithm, as you have mentioned. I have made sure they are in the intended representation on both LE and BE platforms (using vec_perm and the appropriate load instructions).
>
> In the algorithm that I have used, 0xC2 is the polynomial for reduction.
>
> It is shifted by 56 bits to make it the most significant byte. I think this is little-endian byte order? I only had to do the operations with the reduction polynomial to align it as per the algorithm.

Right, so in this implementation the low-order bits of the field polynomial (i.e. p = z^7+z^2+z+1) are represented as 0xC2, or 11000010. But you will note that there is a bit missing here: the low-order bits of the field polynomial should have four bits set. And in GHASH.java in the JDK, 0xe100000000000000 is used, which is a bit more obvious.

I think you're using the trick described in Intel's Optimized Galois-Counter-Mode Implementation on Intel® Architecture Processors, which represents the polynomial in a shifted form as, in effect, 1:C200000000000000.
Unfortunately, the constant vConstC2 does not appear anywhere in this PR, so I had no way to know that. I guess that this PR does not even compile.

The main problem is, though, that there is little commentary in the code which explains how things are encoded. If you're using a bit-reversed and shifted representation of a polynomial, you have to say that. If you're using the algorithm described in the Intel paper, you have to say that too. Have pity on the reader.
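
One way to see how the two constants in this comment relate is pure bit arithmetic: shifting 0xE100000000000000 left by one bit carries a 1 out of the 64-bit word and leaves 0xC200000000000000 behind, which is the "1:C200000000000000" shape quoted above. The sketch below (hypothetical class and method names) checks only that arithmetic and the bit counts; it makes no claim about how the stub or the Intel paper derives or uses the constant.

```java
final class ReductionConstantSketch {
    static final long E1_FORM = 0xE100000000000000L; // constant cited from GHASH.java in the comment
    static final long C2_FORM = 0xC2L << 56;         // 0xC2 moved into the most significant byte

    /** Low 64 bits of v shifted left by one. */
    static long shiftedLow64(long v) {
        return v << 1;
    }

    /** The bit that falls off the top when v is shifted left by one. */
    static long carriedOutBit(long v) {
        return v >>> 63;
    }

    public static void main(String[] args) {
        System.out.println("E1 form bits set: " + Long.bitCount(E1_FORM));
        System.out.println("C2 form bits set: " + Long.bitCount(C2_FORM));
        System.out.println("carried out:      " + carriedOutBit(E1_FORM));
        System.out.println("remaining word:   0x" + Long.toHexString(shiftedLow64(E1_FORM)));
        System.out.println("equals C2 form:   " + (shiftedLow64(E1_FORM) == C2_FORM));
    }
}
```

The fourth set bit that looks "missing" from 0xC2 is the one carried out of the word by the shift.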

@theRealAph
Contributor

Please run AESGCMByteBuffer.encrypt and provide some before and after figures.

@suchismith1993
Contributor Author

/integrate

@openjdk

openjdk bot commented Apr 24, 2025

@suchismith1993 This pull request has not yet been marked as ready for integration.

@suchismith1993
Contributor Author

> Please run AESGCMByteBuffer.encrypt and provide some before and after figures.

Hi @TheRealMDoerr, was this suite run from your end? Was it TestAESMain that you checked the run times on?

@theRealAph From my end, we saw an improvement of around 3 times after running TestAESMain. Is that not a valid test suite?

If the improvement with this version is satisfactory, can we have this integrated and then pursue further improvements in a separate PR? I will open a JBS issue for the same.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 28, 2025
@TheRealMDoerr
Contributor

> Hi @TheRealMDoerr, was this suite run from your end? Was it TestAESMain that you checked the run times on?

I've run tier1-4 on linux ppc64le and AIX for stability testing and only used TestAESMain to check the performance. I agree with Andrew that some performance numbers should be published. I think you can also report the performance results of TestAESMain, here.

@suchismith1993
Contributor Author

suchismith1993 commented Apr 28, 2025

Runtime without my changes

~/jdkHead/jdk/build/linux-ppc64le-server-fastdebug/jdk/bin/java -Xbatch -DcheckOutput=true -Dmode=GCM -DencInputOffset=1 -DencOutputOffset=1 -XX:DisableIntrinsic=_ghash_processBlocks -XX:+UnlockDiagnosticVMOptions -Xbootclasspath/a:. -cp . compiler.codegen.aes.TestAESMain 100000 100000

The output is as below:

100000 iterations
For random generator using seed: 7133744594045351839
To re-run test with same seed value please add "-Djdk.test.lib.random.seed=7133744594045351839" to command line.

algorithm=AES, mode=GCM, paddingStr=NoPadding, msgSize=646, keySize=128, noReinit=false, checkOutput=true, encInputOffset=1, encOutputOffset=1, decOutputOffset=0, lastChunkSize=32
Algorithm: AES(128bit)
Encryption cipher provider: SunJCE version 24
Encryption cipher algorithm: AES/GCM/NoPadding
key: [16]: f8 f9 fa fb fc fd fe ff 00 01 02 03 04 05 06 07
input: [647]: 00 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e
encode: [671]: 00 7c 9d 30 e5 43 96 fd 53 28 c4 08 16 99 58 b2 60 a6 81 22 51 d9 fd f6 ab d7 4b f2 c9 1f d9 c6
decode: [647]: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
Starting encryption warm-up
Finished encryption warm-up
TestAESEncode runtime was 1495.395126 ms

algorithm=AES, mode=GCM, paddingStr=NoPadding, msgSize=646, keySize=128, noReinit=false, checkOutput=true, encInputOffset=1, encOutputOffset=1, decOutputOffset=0, lastChunkSize=32
Algorithm: AES(128bit)
Decryption cipher provider: SunJCE version 24
Decryption cipher algorithm: AES/GCM/NoPadding
key: [16]: f8 f9 fa fb fc fd fe ff 00 01 02 03 04 05 06 07
input: [647]: 00 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e
encode: [671]: 00 7c 9d 30 e5 43 96 fd 53 28 c4 08 16 99 58 b2 60 a6 81 22 51 d9 fd f6 ab d7 4b f2 c9 1f d9 c6
decode: [647]: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
Starting decryption warm-up
Finished decryption warm-up
TestAESDecode runtime was 1378.752962 ms

@suchismith1993
Contributor Author

suchismith1993 commented Apr 28, 2025

Runtime with changes

~/jdkHead/jdk/build/linux-ppc64le-server-fastdebug/jdk/bin/java -Xbatch -DcheckOutput=true -Dmode=GCM -DencInputOffset=1 -DencOutputOffset=1 -XX:+UnlockDiagnosticVMOptions -Xbootclasspath/a:. -cp . compiler.codegen.aes.TestAESMain 100000 100000

The output is as below:
100000 iterations
For random generator using seed: 1980542562394450893
To re-run test with same seed value please add "-Djdk.test.lib.random.seed=1980542562394450893" to command line.

algorithm=AES, mode=GCM, paddingStr=NoPadding, msgSize=646, keySize=128, noReinit=false, checkOutput=true, encInputOffset=1, encOutputOffset=1, decOutputOffset=0, lastChunkSize=32
Algorithm: AES(128bit)
Encryption cipher provider: SunJCE version 24
Encryption cipher algorithm: AES/GCM/NoPadding
key: [16]: f8 f9 fa fb fc fd fe ff 00 01 02 03 04 05 06 07
input: [647]: 00 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e
encode: [671]: 00 7c 9d 30 e5 43 96 fd 53 28 c4 08 16 99 58 b2 60 a6 81 22 51 d9 fd f6 ab d7 4b f2 c9 1f d9 c6
decode: [647]: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
Starting encryption warm-up
Finished encryption warm-up
TestAESEncode runtime was 565.673321 ms

algorithm=AES, mode=GCM, paddingStr=NoPadding, msgSize=646, keySize=128, noReinit=false, checkOutput=true, encInputOffset=1, encOutputOffset=1, decOutputOffset=0, lastChunkSize=32
Algorithm: AES(128bit)
Decryption cipher provider: SunJCE version 24
Decryption cipher algorithm: AES/GCM/NoPadding
key: [16]: f8 f9 fa fb fc fd fe ff 00 01 02 03 04 05 06 07
input: [647]: 00 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e
encode: [671]: 00 7c 9d 30 e5 43 96 fd 53 28 c4 08 16 99 58 b2 60 a6 81 22 51 d9 fd f6 ab d7 4b f2 c9 1f d9 c6
decode: [647]: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
Starting decryption warm-up
Finished decryption warm-up
TestAESDecode runtime was 459.795885 ms

@theRealAph
Contributor

theRealAph commented Apr 28, 2025

> @theRealAph From my end, we saw an improvement of around 3 times after running TestAESMain. Is that not a valid test suite?
>
> If the improvement with this version is satisfactory, can we have this integrated and then pursue further improvements in a separate PR? I will open a JBS issue for the same.

You should run the JMH test, like so:

fedora:theRealAph-jdk $ CONF=release make -k LOG=info build-microbenchmark CONF_CHECK=auto
fedora:theRealAph-jdk $  ./build/linux-aarch64-server-release/jdk/bin/java -Djmh.ignoreLock=true  -jar ./build/linux-aarch64-server-release/images/test/micro/benchmarks.jar -f 1 AESGCMByteBuffer.encrypt\$

Benchmark                                 (dataMethod)  (dataSize)  (keyLength)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.AESGCMByteBuffer.encrypt         direct        1024          128              thrpt    8  2216504.066 ± 12527.173  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt         direct        1500          128              thrpt    8  1505300.797 ±  8675.648  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt         direct        4096          128              thrpt    8   813518.431 ±  7513.509  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt         direct       16384          128              thrpt    8   233268.190 ±   975.616  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt           heap        1024          128              thrpt    8  2562063.056 ± 18200.538  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt           heap        1500          128              thrpt    8  1771049.922 ±  6444.924  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt           heap        4096          128              thrpt    8   934138.960 ±  5353.664  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt           heap       16384          128              thrpt    8   257884.039 ±   149.974  ops/s
o.o.b.j.c.small.AESGCMByteBuffer.encrypt        direct        1024          128              thrpt    8  2214159.143 ± 16196.670  ops/s
o.o.b.j.c.small.AESGCMByteBuffer.encrypt          heap        1024          128              thrpt    8  2578675.681 ± 22067.812  ops/s

@suchismith1993
Contributor Author

suchismith1993 commented Apr 28, 2025

Without GHASH change

Benchmark                                           Data Method  Data Size  Key Length  Provider   Mode  Count       Score       Error  Units
o.o.b.j.c.full.AESGCMByteBuffer.decrypt                  direct       1024         128            thrpt      8   52020.466  ±  756.766  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decrypt                  direct       1500         128            thrpt      8   35524.179  ±  587.709  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decrypt                  direct       4096         128            thrpt      8   14065.545  ±   94.679  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decrypt                  direct      16384         128            thrpt      8    3494.208  ±   36.804  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decrypt                    heap       1024         128            thrpt      8   53579.051  ±  521.148  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decrypt                    heap       1500         128            thrpt      8   37105.385  ±  755.540  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decrypt                    heap       4096         128            thrpt      8   14122.494  ±   78.641  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decrypt                    heap      16384         128            thrpt      8    3570.723  ±   18.136  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart         direct       1024         128            thrpt      8   50573.814  ±  858.171  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart         direct       1500         128            thrpt      8   35402.422  ±  761.839  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart         direct       4096         128            thrpt      8   13948.808  ±  121.955  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart         direct      16384         128            thrpt      8    3555.491  ±   27.543  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart           heap       1024         128            thrpt      8   52583.092  ±  786.567  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart           heap       1500         128            thrpt      8   36563.715  ±  365.381  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart           heap       4096         128            thrpt      8   13974.515  ±   88.673  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.decryptMultiPart           heap      16384         128            thrpt      8    3552.996  ±   25.234  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt                  direct       1024         128            thrpt      8   53387.361  ±  690.909  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt                  direct       1500         128            thrpt      8   36970.383  ±  495.504  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt                  direct       4096         128            thrpt      8   13919.025  ±   88.704  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt                  direct      16384         128            thrpt      8    3582.015  ±   12.920  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt                    heap       1024         128            thrpt      8   53631.653  ±  449.160  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt                    heap       1500         128            thrpt      8   37890.654  ±  291.797  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt                    heap       4096         128            thrpt      8   14324.705  ±   33.475  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encrypt                    heap      16384         128            thrpt      8    3563.167  ±   18.069  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart         direct       1024         128            thrpt      8   52676.705  ±  828.404  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart         direct       1500         128            thrpt      8   36329.914  ±  475.700  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart         direct       4096         128            thrpt      8   14062.787  ±  118.448  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart         direct      16384         128            thrpt      8    3579.154  ±   16.530  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart           heap       1024         128            thrpt      8   53562.594  ±  317.060  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart           heap       1500         128            thrpt      8   36811.085  ±  320.696  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart           heap       4096         128            thrpt      8   14086.269  ±   54.366  ops/s
o.o.b.j.c.full.AESGCMByteBuffer.encryptMultiPart           heap      16384         128            thrpt      8    3563.559  ±   19.188  ops/s
o.o.b.j.c.small.AESGCMByteBuffer.decrypt                 direct       1024         128            thrpt      8   52021.706  ±  827.404  ops/s
o.o.b.j.c.small.AESGCMByteBuffer.decrypt                   heap       1024         128            thrpt      8   53550.519  ±  457.500  ops/s
o.o.b.j.c.small.AESGCMByteBuffer.decryptMultiPart        direct       1024         128            thrpt      8   50392.121  ±  890.139  ops/s
o.o.b.j.c.small.AESGCMByteBuffer.decryptMultiPart          heap       1024         128            thrpt      8   52771.665  ±  547.670  ops/s
o.o.b.j.c.small.AESGCMByteBuffer.encrypt                 direct       1024         128            thrpt      8   53258.597  ±  758.263  ops/s
o.o.b.j.c.small.AESGCMByteBuffer.encrypt                   heap       1024         128            thrpt      8   54603.228  ±  343.555  ops/s
o.o.b.j.c.small.AESGCMByteBuffer.encryptMultiPart        direct       1024         128            thrpt      8   52796.661  ±  870.566  ops/s
o.o.b.j.c.small.AESGCMByteBuffer.encryptMultiPart          heap       1024         128            thrpt      8   53488.007  ±  441.574  ops/s

@suchismith1993
Contributor Author

With my changes

Benchmark                                 Data Method  Data Size  Key Length  Provider   Mode    Cnt       Score       Error  Units
AESGCMByteBuffer.decrypt                       direct       1024         128            thrpt      8  192164.655  ± 2499.922  ops/s
AESGCMByteBuffer.decrypt                       direct       1500         128            thrpt      8  138590.675  ± 1718.893  ops/s
AESGCMByteBuffer.decrypt                       direct       4096         128            thrpt      8   60015.129  ±  516.554  ops/s
AESGCMByteBuffer.decrypt                       direct      16384         128            thrpt      8   15705.840  ±  101.889  ops/s
AESGCMByteBuffer.decrypt                         heap       1024         128            thrpt      8  234618.808  ± 3508.043  ops/s
AESGCMByteBuffer.decrypt                         heap       1500         128            thrpt      8  153490.970  ± 1991.507  ops/s
AESGCMByteBuffer.decrypt                         heap       4096         128            thrpt      8   59706.883  ±  393.104  ops/s
AESGCMByteBuffer.decrypt                         heap      16384         128            thrpt      8   15282.959  ±   35.228  ops/s
AESGCMByteBuffer.decryptMultiPart              direct       1024         128            thrpt      8  169563.728  ± 3262.014  ops/s
AESGCMByteBuffer.decryptMultiPart              direct       1500         128            thrpt      8  125917.360  ± 2171.133  ops/s
AESGCMByteBuffer.decryptMultiPart              direct       4096         128            thrpt      8   57233.798  ± 1219.124  ops/s
AESGCMByteBuffer.decryptMultiPart              direct      16384         128            thrpt      8   15314.450  ±  267.215  ops/s
AESGCMByteBuffer.decryptMultiPart                heap       1024         128            thrpt      8  199834.254  ± 2929.256  ops/s
AESGCMByteBuffer.decryptMultiPart                heap       1500         128            thrpt      8  143659.707  ± 2019.578  ops/s
AESGCMByteBuffer.decryptMultiPart                heap       4096         128            thrpt      8   57676.269  ±  760.886  ops/s
AESGCMByteBuffer.decryptMultiPart                heap      16384         128            thrpt      8   14899.282  ±  194.883  ops/s
AESGCMByteBuffer.encrypt                       direct       1024         128            thrpt      8  217833.792  ± 2839.966  ops/s
AESGCMByteBuffer.encrypt                       direct       1500         128            thrpt      8  152150.607  ± 2203.853  ops/s
AESGCMByteBuffer.encrypt                       direct       4096         128            thrpt      8   60091.726  ±  812.084  ops/s
AESGCMByteBuffer.encrypt                       direct      16384         128            thrpt      8   15720.273  ±   85.991  ops/s
AESGCMByteBuffer.encrypt                         heap       1024         128            thrpt      8  218901.548  ± 2687.554  ops/s
AESGCMByteBuffer.encrypt                         heap       1500         128            thrpt      8  153527.621  ± 1816.675  ops/s
AESGCMByteBuffer.encrypt                         heap       4096         128            thrpt      8   58896.329  ± 1637.968  ops/s
AESGCMByteBuffer.encrypt                         heap      16384         128            thrpt      8   15226.399  ±   17.957  ops/s
AESGCMByteBuffer.encryptMultiPart              direct       1024         128            thrpt      8  197339.940  ± 2428.986  ops/s
AESGCMByteBuffer.encryptMultiPart              direct       1500         128            thrpt      8  136931.341  ± 2782.111  ops/s
AESGCMByteBuffer.encryptMultiPart              direct       4096         128            thrpt      8   59652.962  ±  750.375  ops/s
AESGCMByteBuffer.encryptMultiPart              direct      16384         128            thrpt      8   15667.096  ±   58.490  ops/s
AESGCMByteBuffer.encryptMultiPart                heap       1024         128            thrpt      8  214639.739  ± 4077.556  ops/s
AESGCMByteBuffer.encryptMultiPart                heap       1500         128            thrpt      8  155557.214  ± 2422.094  ops/s
AESGCMByteBuffer.encryptMultiPart                heap       4096         128            thrpt      8   58895.472  ± 1538.650  ops/s
AESGCMByteBuffer.encryptMultiPart                heap      16384         128            thrpt      8   15038.955  ±   44.792  ops/s
AESGCMByteBuffer.small.decrypt                 direct       1024         128            thrpt      8  192555.048  ± 3710.757  ops/s
AESGCMByteBuffer.small.decrypt                   heap       1024         128            thrpt      8  235177.894  ± 4321.018  ops/s
AESGCMByteBuffer.small.decryptMultiPart        direct       1024         128            thrpt      8  167625.340  ± 2418.147  ops/s
AESGCMByteBuffer.small.decryptMultiPart          heap       1024         128            thrpt      8  200193.172  ± 3319.042  ops/s
AESGCMByteBuffer.small.encrypt                 direct       1024         128            thrpt      8  216340.878  ± 4651.345  ops/s
AESGCMByteBuffer.small.encrypt                   heap       1024         128            thrpt      8  231760.813  ± 4271.094  ops/s
AESGCMByteBuffer.small.encryptMultiPart        direct       1024         128            thrpt      8  195748.230  ± 5825.305  ops/s
AESGCMByteBuffer.small.encryptMultiPart          heap       1024         128            thrpt      8  215594.033  ± 4254.075  ops/s

@suchismith1993
Contributor Author

Hi @theRealAph, can you help me understand this result? Since ops/s is increasing with the GHASH code, does it suggest a speedup?

@theRealAph
Contributor

> Hi @theRealAph, can you help me understand this result? Since ops/s is increasing with the GHASH code, does it suggest a speedup?

Yes, an increase in ops/s is what we want.
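
For a sense of scale, the ratios can be worked out directly from the figures posted in this thread. The sketch below (hypothetical class and method names) takes a few rows from the "without" and "with" JMH tables and the TestAESMain runtimes; pairing the `o.o.b.j.c.full` rows with the unprefixed rows of the second table is an assumption, since the two tables label benchmarks differently. With these inputs the JMH throughput ratios come out at roughly 4x and the TestAESMain runtime ratios somewhat below 3x.

```java
final class SpeedupSketch {
    /** Throughput speedup: ops/s with the change divided by ops/s without it. */
    static double throughputSpeedup(double opsBefore, double opsAfter) {
        return opsAfter / opsBefore;
    }

    /** Runtime speedup: time without the change divided by time with it. */
    static double runtimeSpeedup(double msBefore, double msAfter) {
        return msBefore / msAfter;
    }

    public static void main(String[] args) {
        // JMH rows copied from the two tables in this thread (ops/s, higher is better).
        System.out.printf("encrypt heap 1024:   %.2fx%n", throughputSpeedup(53631.653, 218901.548));
        System.out.printf("encrypt heap 16384:  %.2fx%n", throughputSpeedup(3563.167, 15226.399));
        System.out.printf("decrypt direct 1024: %.2fx%n", throughputSpeedup(52020.466, 192164.655));

        // TestAESMain runtimes from the earlier comments (ms, lower is better).
        System.out.printf("TestAESEncode:       %.2fx%n", runtimeSpeedup(1495.395126, 565.673321));
        System.out.printf("TestAESDecode:       %.2fx%n", runtimeSpeedup(1378.752962, 459.795885));
    }
}
```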

@suchismith1993
Contributor Author

Thank you everyone.

@suchismith1993
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label May 2, 2025
@openjdk

openjdk bot commented May 2, 2025

@suchismith1993
Your change (at version 423c868) is now ready to be sponsored by a Committer.

@TheRealMDoerr
Contributor

/sponsor

@openjdk

openjdk bot commented May 2, 2025

Going to push as commit cdad6d7.
Since your change was applied there have been 128 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 2, 2025
@openjdk openjdk bot closed this May 2, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels May 2, 2025
@openjdk

openjdk bot commented May 2, 2025

@TheRealMDoerr @suchismith1993 Pushed as commit cdad6d7.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@suchismith1993
Contributor Author

/backport jdk21u-dev

@suchismith1993
Contributor Author

/backport jdk24u-dev

@openjdk

openjdk bot commented May 15, 2025

@suchismith1993 the backport was successfully created on the branch backport-suchismith1993-cdad6d78-master in my personal fork of openjdk/jdk21u-dev. To create a pull request with this backport targeting openjdk/jdk21u-dev:master, just click the following link:

➡️ Create pull request

The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:

Hi all,

This pull request contains a backport of commit cdad6d78 from the openjdk/jdk repository.

The commit being backported was authored by Suchismith Roy on 2 May 2025 and was reviewed by Martin Doerr and Amit Kumar.

Thanks!

If you need to update the source branch of the pull then run the following commands in a local clone of your personal fork of openjdk/jdk21u-dev:

$ git fetch https://github.com/openjdk-bots/jdk21u-dev.git backport-suchismith1993-cdad6d78-master:backport-suchismith1993-cdad6d78-master
$ git checkout backport-suchismith1993-cdad6d78-master
# make changes
$ git add paths/to/changed/files
$ git commit --message 'Describe additional changes made'
$ git push https://github.com/openjdk-bots/jdk21u-dev.git backport-suchismith1993-cdad6d78-master

@openjdk

openjdk bot commented May 15, 2025

@suchismith1993 The target repository jdk24u-dev is not a valid target for backports.
List of valid target repositories: openjdk/jdk, openjdk/jdk11u, openjdk/jdk11u-dev, openjdk/jdk17u, openjdk/jdk17u-dev, openjdk/jdk21u, openjdk/jdk21u-dev, openjdk/jdk24u, openjdk/jdk7u, openjdk/jdk8u, openjdk/jdk8u-dev, openjdk/jfx, openjdk/jfx17u, openjdk/jfx21u, openjdk/jfx24u, openjdk/lilliput-jdk17u, openjdk/lilliput-jdk21u, openjdk/shenandoah-jdk21u, openjdk/shenandoah-jdk8u.
Supplying the organization/group prefix is optional.
