8318756 Create better internal buffer for AEADs #16487
Conversation
👋 Welcome back ascarpino! A progress list of the required criteria for merging this PR into the target branch will be added to the body of your pull request.
@ascarpino The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Noteworthy perf data (ops/sec)
Memory reduction (bytes / op)
Webrevs
Thanks for doing this! A few comments in line.
/**
 * Create an instance with no buffer
 */
public AEADBufferedStream() {
Never used. And once you remove it, buf is never null.
Sure
public AEADBufferedStream(int len) {
    buf = new byte[len];
    count = 0;
no need to zero-initialize, it's zero by default
Yes
 * @return internal or new byte array of non-blocksize data.
 */
@Override
public synchronized byte[] toByteArray() {
Remove this synchronized; the new write methods are not synchronized, so this does not add value.
Interesting, I thought I was stuck with synchronized from the parent.
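As an aside, a minimal illustrative sketch of the language rule at play (hypothetical class name, not the patch): synchronized is not inherited, so an override of a synchronized method is free to drop the modifier.

import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical example: ByteArrayOutputStream.toByteArray() is declared
// synchronized, but this override legally omits the modifier.
class UnsynchronizedBuffer extends ByteArrayOutputStream {
    @Override
    public byte[] toByteArray() {
        // buf and count are protected fields inherited from ByteArrayOutputStream.
        return Arrays.copyOf(buf, count);
    }
}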
 */
@Override
public synchronized byte[] toByteArray() {
    if (buf.length > count) {
Can this ever happen? And if it can, would it be better to return the whole buffer and have the caller extract the necessary range?
Yes, because it can reuse the allocated buffer during encryption scenarios when multi-part ops send non-block size data. For decryption, it should never happen.
    count = buf.length;
} else {
    if (buf.length < (count + len)) {
        buf = Arrays.copyOf(buf, count + len);
This will copy a lot if the user performs many small writes, for example when decrypting with CipherInputStream; see the AESGCMCipherOutputStream benchmark.
As I understand the ByteArrayOutputStream code, ArraysSupport.newLength() will double the allocation each time. So if the buffer is 1k in size and it wants to add one more byte, the method will allocate 2k.
I agree that in my change, if you send one byte at a time, it will be doing a lot of allocating. But I don't believe that is the general case. If you have examples of apps doing small allocations, let me know and I can come up with a different scheme. But at this point I thought it was better to risk more allocations in the less-likely situation than unused allocation in what I see as the more common case.
Well, as stated above, any application using CipherInputStream will do O(N) reallocations here, bringing back JDK-8298249; you might want to check with the reporter if this actually affects any real applications.
For reference, the results with this patch:
Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
AESGCMCipherOutputStream.decrypt 16384 128 thrpt 40 30325.669 ± 1616.428 ops/s
AESGCMCipherOutputStream.decrypt 1048576 128 thrpt 40 7.034 ± 0.479 ops/s
Baseline:
Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
AESGCMCipherOutputStream.decrypt 16384 128 thrpt 40 52171.497 ± 6229.841 ops/s
AESGCMCipherOutputStream.decrypt 1048576 128 thrpt 40 470.844 ± 179.817 ops/s
i.e. with the patch, decryption is 40% slower for 16KB stream, 98% (or 50x) slower for 1MB stream.
Ok, I updated the code to handle it more like it did before. With one update(), the memory usage is still what I intended. It was the CipherOutputStream case, with multiple update() calls, that was the problem.
After my change, that perf test showed a 30% performance increase for 16k and 15% for 1M.
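To make the trade-off being discussed concrete, here is a rough sketch of the two growth strategies (illustrative helper, not the patch): exact-size growth copies the whole buffer on every write that does not fit, while doubling amortizes the copies.

import java.util.Arrays;

class GrowthStrategies {
    // Exact-size growth: N small appends cost O(N^2) bytes copied in total,
    // but the buffer never holds unused capacity.
    static byte[] growExact(byte[] buf, int count, int len) {
        return (buf.length < count + len) ? Arrays.copyOf(buf, count + len) : buf;
    }

    // Amortized doubling, roughly what ByteArrayOutputStream does through
    // ArraysSupport.newLength(): only O(log N) copies for N small appends,
    // at the cost of up to 2x unused capacity (overflow handling omitted).
    static byte[] growDoubling(byte[] buf, int count, int len) {
        if (buf.length < count + len) {
            return Arrays.copyOf(buf, Math.max(count + len, buf.length * 2));
        }
        return buf;
    }
}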
iv_index = (iv_index + 1) % IV_MODULO;
return r;
return new GCMParameterSpec(96, iv, iv_index, 12);
Also use 128-bit tag here.
AlgorithmParameterSpec getNewSpec() {
    iv_index = (iv_index + 1) % IV_MODULO;
    return new GCMParameterSpec(96, iv, iv_index, 12);
Also use 128-bit tag here.
AlgorithmParameterSpec getNewSpec() {
    iv_index = (iv_index + 1) % IV_MODULO;
    return new GCMParameterSpec(96, iv, iv_index, 12);
Also use 128-bit tag here.
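For reference, a hedged sketch of what the suggested change amounts to (the IV_MODULO value and setup here are assumptions made to keep the snippet self-contained):

import java.security.SecureRandom;
import java.security.spec.AlgorithmParameterSpec;
import javax.crypto.spec.GCMParameterSpec;

class GcmSpecSource {
    private static final int IV_MODULO = 1024;            // illustrative value
    private final byte[] iv = new byte[IV_MODULO + 12];   // room for a 12-byte slice at any offset
    private int iv_index = 0;

    GcmSpecSource() {
        new SecureRandom().nextBytes(iv);
    }

    // A 128-bit tag instead of 96-bit, with the same 12-byte IV slice.
    AlgorithmParameterSpec getNewSpec() {
        iv_index = (iv_index + 1) % IV_MODULO;
        return new GCMParameterSpec(128, iv, iv_index, 12);
    }
}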
@Benchmark
public void decrypt() throws Exception {
    decryptCipher.init(Cipher.DECRYPT_MODE, ks,
        encryptCipher.getParameters().
Consider using a different method to obtain the parameter spec here; the flamegraph suggests that getParameters is responsible for up to 20% of the time spent in these benchmarks.
Thanks for pointing that out. I have decided to leave this as is. There are optimizations that can be made to getParameters that I would like the benchmark to measure.
There was no optimization available for getParameters, so I saved the parameters from the encryption op, which greatly improved the test performance.
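For readers following along, a minimal sketch of that approach outside the JMH harness (class and variable names are illustrative, not the benchmark's actual code): initialize the encrypting cipher once, save its AlgorithmParameters, and reuse them for every decrypt init.

import java.security.AlgorithmParameters;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

class SavedParamsExample {
    public static void main(String[] args) throws Exception {
        SecretKey ks = KeyGenerator.getInstance("AES").generateKey();
        byte[] data = new byte[16 * 1024];

        Cipher encryptCipher = Cipher.getInstance("AES/GCM/NoPadding");
        encryptCipher.init(Cipher.ENCRYPT_MODE, ks);
        byte[] ct = encryptCipher.doFinal(data);

        // Save the parameters once instead of calling getParameters() inside
        // the measured decrypt path.
        AlgorithmParameters savedParams = encryptCipher.getParameters();

        Cipher decryptCipher = Cipher.getInstance("AES/GCM/NoPadding");
        decryptCipher.init(Cipher.DECRYPT_MODE, ks, savedParams);
        byte[] pt = decryptCipher.doFinal(ct);
    }
}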
 */

public AEADBufferedStream(int len) {
    buf = new byte[len];
Suggested change:
-    buf = new byte[len];
+    super(len);
otherwise buf will be initialized twice, once here, once in the base class constructor
Yeah.. better memory performance too.
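In other words, a simplified sketch of the suggested constructor (not the full class): delegating to super(len) allocates the backing array exactly once.

import java.io.ByteArrayOutputStream;

class AEADBufferedStream extends ByteArrayOutputStream {
    public AEADBufferedStream(int len) {
        super(len);   // ByteArrayOutputStream(int size) already allocates buf = new byte[size]
    }
}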
}

/*
 * Optimized version of bufferCrypt from CipherSpi.java. Direct
Well, I was under the impression that the bulk of the optimization comes from using a modified engineGetOutputSize that does not allocate when isUpdate == true and decrypting. That's not documented anywhere.
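For context, a hypothetical helper illustrating the idea the comment refers to; this is not the provider's actual engineGetOutputSize code, just the shape of the calculation as described.

class OutputSizeSketch {
    static int outputSize(boolean decrypting, boolean isUpdate,
                          int buffered, int inLen, int tagLen) {
        if (decrypting) {
            // A decrypt update only buffers input until doFinal, so it produces
            // no output and no output array needs to be allocated for it.
            if (isUpdate) {
                return 0;
            }
            // doFinal while decrypting: all buffered data plus the new input,
            // minus the authentication tag.
            return Math.max(buffered + inLen - tagLen, 0);
        }
        // Encrypting: conservatively report all data plus room for the tag.
        return buffered + inLen + tagLen;
    }
}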
} else {
    try {
        outArray = engineDoFinal(inArray, inOfs, inLen);
    } catch (BadPaddingException e) {
Should we really be hiding the BadPaddingException here and in other places in this method? AEADBadTagException extends BadPaddingException, and I'm pretty sure we don't want to hide it.
CipherSpi.engineUpdate() only throws ShortBufferException. That forces many exceptions into a ProviderException, which extends RuntimeException. The BadPaddingException doesn't matter because CC20 doesn't use padding. AEADBadTagException should be specified and thrown, not wrapped.
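A minimal sketch of that distinction (the interface and names below are assumptions for illustration, not the patch): let the tag failure surface as a checked exception and wrap only what the caller's signature cannot declare.

import java.security.ProviderException;
import javax.crypto.AEADBadTagException;
import javax.crypto.BadPaddingException;

class ExceptionHandlingSketch {
    interface AeadCall {
        byte[] doFinal(byte[] in, int ofs, int len) throws BadPaddingException;
    }

    static byte[] finish(AeadCall call, byte[] in, int ofs, int len)
            throws AEADBadTagException {
        try {
            return call.doFinal(in, ofs, len);
        } catch (AEADBadTagException e) {
            throw e;                        // authentication failures must be visible
        } catch (BadPaddingException e) {
            throw new ProviderException(e); // AEAD modes have no padding to fail
        }
    }
}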
    src.position(pos + len);
    return;
}
if (buf == null) {
leftover from previous implementation; buf is never null.
I redid a lot of this code in this update
    return bufferCrypt(input, output, true);
} catch (AEADBadTagException e) {
    // exception is never thrown by update ops
    return 0;
Suggested change:
-    return 0;
+    throw new AssertionError(e);
Changed aeadbuffer usage away from byte[].length to aeadbuffer.size()
int blen = buf.length;
// Create a new larger buffer and append the new data
if (blen < count + len) {
    buf = Arrays.copyOf(buf, ArraysSupport.newLength(blen, blen + len,
Suggested change:
-    buf = Arrays.copyOf(buf, ArraysSupport.newLength(blen, blen + len,
+    buf = Arrays.copyOf(buf, ArraysSupport.newLength(blen, count + len - blen,
The second parameter is minGrowth; count + len - blen would be more appropriate.
Actually it should be len. That is the minGrowth, as that is the size being added to the buffer, and the buffer then grows by Math.max(len, blen).
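To make the semantics explicit, here is a simplified stand-in for the jdk.internal.util.ArraysSupport.newLength contract as discussed here (the real method also handles overflow, which is omitted):

class NewLengthSketch {
    // newLength(oldLength, minGrowth, prefGrowth): grow by the larger of the
    // two hints.
    static int newLength(int oldLength, int minGrowth, int prefGrowth) {
        return oldLength + Math.max(minGrowth, prefGrowth);
    }
}

Under that contract, newLength(blen, len, blen) grows the buffer by Math.max(len, blen): at least the incoming data, preferably the current buffer size (i.e. doubling).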
 * method must use {@code size()} for the relevant data length as the
 * returning byte[] maybe larger.
 *
 * @return internal or new byte array of non-blocksize data.
Please update the doc - the method always returns the internal array. I would create a new method to retrieve the internal buffer instead of overriding toByteArray; it would be less surprising to future code editors.
ok
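A sketch of what such an accessor could look like (the method name is an assumption, not necessarily what the patch ended up with): expose the internal array explicitly and leave the toByteArray() contract alone.

import java.io.ByteArrayOutputStream;

class AEADBufferedStream extends ByteArrayOutputStream {
    /**
     * Returns the internal buffer without copying. Callers must use size()
     * for the valid data length, because the array may be larger.
     */
    public byte[] getBuffer() {
        return buf;
    }
}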
LGTM. Thank you!
@ascarpino This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 772 new commits pushed to the target branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the target branch, type /integrate in a new comment.
/integrate
Going to push as commit dc9c77b.
Your commit was automatically rebased without conflicts.
@ascarpino Pushed as commit dc9c77b. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
@ascarpino I'm seeing GHA test failures on
[1] https://github.com/eirbjo/jdk/actions/runs/7145974602#user-content-com_sun_crypto_provider_cipher_aead_aeadbuffertest
This change did not introduce the problem. The test just exposed a problem that has existed since https://bugs.openjdk.org/browse/JDK-8247645 was integrated in JDK 20. A bug was filed: https://bugs.openjdk.org/browse/JDK-8321542
Hi,
I need a review of a new internal buffer class called AEADBufferedStream. AEADBufferedStream extends ByteArrayOutputStream, but eliminates some data checking and copying that is not necessary for what GaloisCounterMode.java and ChaCha20Cipher.java need.
The change's greatest benefit is with decryption operations. ChaCha20-Poly1305 saw larger performance gains by adopting techniques similar to those AES/GCM already uses.
The new buffer shows up to a 21% bytes/sec performance increase for decryption with ChaCha20-Poly1305 and 12% with AES/GCM. 16K data sizes saw a memory usage reduction of 46% with AES/GCM and 83% with ChaCha20-Poly1305. These results come from the JMH tests updated in this request; memory usage was measured with the JMH GC profiler's gc.alloc.rate.norm metric.
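For readers unfamiliar with the class, here is a rough, simplified sketch of the idea (illustrative only; the real class has more methods and checks):

import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class AEADBufferedStreamSketch extends ByteArrayOutputStream {
    AEADBufferedStreamSketch(int len) {
        super(len);
    }

    @Override
    public void write(byte[] in, int off, int len) {
        // No Objects.checkFromIndexSize() and no synchronization; the calling
        // cipher implementation is trusted to pass valid ranges.
        if (buf.length < count + len) {
            buf = Arrays.copyOf(buf, count + len);
        }
        System.arraycopy(in, off, buf, count, len);
        count += len;
    }

    // Expose the internal array; callers use size() for the valid length.
    byte[] getBuffer() {
        return buf;
    }
}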
thanks
Tony
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16487/head:pull/16487
$ git checkout pull/16487
Update a local copy of the PR:
$ git checkout pull/16487
$ git pull https://git.openjdk.org/jdk.git pull/16487/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16487
View PR using the GUI difftool:
$ git pr show -t 16487
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16487.diff
Webrev
Link to Webrev Comment