8285398: Cache the results of constraint checks #8349
Conversation
👋 Welcome back djelinski! A progress list of the required criteria for merging this PR into
@djelinski The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
before:
after:
After this patch the HandshakeContext initialization practically disappears from the CPU profile; it only takes ~5% in TLS 1.2 resumption, and much less in the remaining scenarios.
Mailing list message from Peter Firmstone on security-dev: Nice. On 22/04/2022 6:47 am, Daniel Jeliński wrote: -- Peter Firmstone
    Boolean result = cache.get(algorithm);
    if (result != null) {
        return result;
    }
    result = checkAlgorithm(disabledAlgorithms, algorithm, decomposer);
    cache.put(algorithm, result);
    return result;
Would it be worth using cache.computeIfAbsent, or do you want to avoid the lambda allocation overhead and potentially blocking concurrent handshakes on the writer thread?
Suggested change:

    Boolean result = cache.get(algorithm);
    if (result != null) {
        return result;
    }
    result = checkAlgorithm(disabledAlgorithms, algorithm, decomposer);
    cache.put(algorithm, result);
    return result;

becomes:

    return cache.computeIfAbsent(algorithm,
            algo -> checkAlgorithm(disabledAlgorithms, algo, decomposer));
I generally prefer using get over computeIfAbsent when optimizing for performance. While computeIfAbsent is more concise, it's also much slower than get when the entry is already present.
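A minimal sketch of the get-first pattern under discussion; expensiveCheck is a hypothetical stand-in for the patch's checkAlgorithm. On a cache hit, get is a plain lookup with no lambda allocation; the check-then-put race is benign here because the computed value is deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheLookup {
    private static final Map<String, Boolean> cache = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the expensive constraint check.
    private static boolean expensiveCheck(String algorithm) {
        return !algorithm.startsWith("MD5");
    }

    // get-first pattern: a hit costs one map lookup and allocates nothing.
    // Two threads may race and both compute the value; since the result is
    // deterministic, the duplicate put is harmless.
    static boolean isAllowed(String algorithm) {
        Boolean result = cache.get(algorithm);
        if (result != null) {
            return result;
        }
        result = expensiveCheck(algorithm);
        cache.put(algorithm, result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("SHA-256"));    // computed: true
        System.out.println(isAllowed("MD5withRSA")); // computed: false
        System.out.println(isAllowed("SHA-256"));    // cache hit: true
    }
}
```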
    private final Set<String> disabledAlgorithms;
    private final Constraints algorithmConstraints;
    private volatile SoftReference<Map<String, Boolean>> cacheRef =
It looks like a one-clear-for-all mechanism: once the clearing happens, the full map is collected, and the GC clears SoftReferences fairly eagerly. Maybe the performance could be further improved in practice by using a soft reference for each map entry (see the sun.security.util.Cache code for an example).
I will have another look next week.
Right, soft references are likely to be cleaned if they are not used in an entire GC cycle.
Using a soft reference for each map entry would not help here; note that all keys and all values in this map are GC roots (keys are enum names, values are TRUE and FALSE), so in practice they would never be collected.
Even if we made the entries collectable, it would not help with the TLS case; SSLEngine & SSLSocket only check the constraints once at the beginning of the handshake, so they would all expire at the same time anyway. What's even worse, we would then have to clear the expired entries from the map.
I also considered limiting the size of the map. That's a tricky subject; we would need to get the size limit exactly right. Given that the algorithms are always checked in the same order, if the cache were too small, we would get zero hits.
With all the above in mind I decided not to use sun.security.util.Cache here.
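The zero-hit effect mentioned above can be illustrated with a small experiment, not taken from the patch: with an LRU cache one entry smaller than a working set that is always accessed in the same order, each access evicts exactly the entry that will be needed next.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruZeroHits {
    // Counts cache hits for a fixed, repeating access pattern against an
    // access-order LinkedHashMap used as a simple LRU cache.
    static int countHits(int capacity, String[] algorithms, int rounds) {
        Map<String, Boolean> lru = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> e) {
                return size() > capacity;
            }
        };
        int hits = 0;
        for (int round = 0; round < rounds; round++) {
            for (String alg : algorithms) {
                if (lru.get(alg) != null) {
                    hits++;
                } else {
                    lru.put(alg, Boolean.TRUE); // miss: compute and cache
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] algs = {"A", "B", "C", "D"};
        // Capacity one below the working set: every access misses.
        System.out.println(countHits(3, algs, 5)); // prints 0
        // Capacity matching the working set: every round after the first hits.
        System.out.println(countHits(4, algs, 5)); // prints 16
    }
}
```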
FWIW: I wouldn't expect SoftReference (as opposed to WeakReference) to be eagerly cleaned.
SoftReferences are guaranteed to survive one GC after use; beyond that their lifespan is determined by SoftRefLRUPolicyMSPerMB and the amount of memory available.
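For context, HotSpot exposes this policy as a command-line flag: SoftRefLRUPolicyMSPerMB (default 1000) is the number of milliseconds a softly reachable object is kept alive per megabyte of free heap, so smaller heaps clear soft references sooner. A sketch of exercising it under memory pressure; the benchmark class name is hypothetical:

```shell
# Keep soft references alive ~1000 ms per MB of free heap (HotSpot default).
# With -Xmx64m there is little free heap, so softly reachable caches are
# cleared much sooner than with a large heap.
java -Xmx64m -XX:SoftRefLRUPolicyMSPerMB=1000 SSLHandshakeBenchmark
```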
"With all the above in mind I decided not to use sun.security.util.Cache here"
I did not mean to suggest using Cache and its timeout for this update.
SoftReference and this patch should work well on boxes with larger memory. Performance improvement is sometimes a trade-off between memory and CPU. Did you have a chance to check the performance in a limited-memory circumstance, like "-mx64M"?
I ran a few tests:
-Xmx16m, with this patch, still better than the original version:
Benchmark (resume) (tlsVersion) Mode Cnt Score Error Units
SSLHandshake.doHandshake true TLSv1.2 thrpt 5 4477.444 ± 375.975 ops/s
SSLHandshake.doHandshake true TLS thrpt 5 681.471 ± 72.531 ops/s
SSLHandshake.doHandshake false TLSv1.2 thrpt 5 335.366 ± 89.056 ops/s
SSLHandshake.doHandshake false TLS thrpt 5 308.711 ± 90.134 ops/s
-Xmx8m, before patch:
Benchmark (resume) (tlsVersion) Mode Cnt Score Error Units
SSLHandshake.doHandshake true TLSv1.2 thrpt 5 153.760 ± 12.025 ops/s
SSLHandshake.doHandshake true TLS thrpt 5 70.700 ± 7.506 ops/s
SSLHandshake.doHandshake false TLSv1.2 thrpt 5 67.459 ± 4.325 ops/s
SSLHandshake.doHandshake false TLS thrpt 5 64.695 ± 1.647 ops/s
after:
Benchmark (resume) (tlsVersion) Mode Cnt Score Error Units
SSLHandshake.doHandshake true TLSv1.2 thrpt 5 557.324 ± 50.269 ops/s
SSLHandshake.doHandshake true TLS thrpt 5 155.258 ± 13.866 ops/s
SSLHandshake.doHandshake false TLSv1.2 thrpt 5 181.755 ± 29.319 ops/s
SSLHandshake.doHandshake false TLS thrpt 5 120.627 ± 25.832 ops/s
Much slower, but still faster with the patch.
With -Xmx4m the test failed with OOM.
Thanks for the additional testing. The improvement is impressive.
ascarpino left a comment
I'm good with this idea. It's not caching the whole list like I struggled with, but it's a good solution for the algorithm name based constraints. It also shows that more caching probably would help further.
I'll let Xuelei finish his review comments.
coffeys left a comment
Another nice performance boost in this area. Looks good to me.
@djelinski This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 47 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
+1.

Thanks for reviewing! /integrate

Going to push as commit 00e9c96.
Your commit was automatically rebased without conflicts.

@djelinski Pushed as commit 00e9c96. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Profiling the TLS handshakes using SSLHandshake benchmark shows that a large portion of time is spent in HandshakeContext initialization, specifically in DisabledAlgorithmConstraints class.
There are only a few instances of that class, and they are immutable. Caching the results should be a low-risk operation.
The cache is implemented as a softly reachable ConcurrentHashMap; this way it can be removed from memory after a period of inactivity. Under normal circumstances the cache holds no more than 100 algorithms.
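The design described above can be sketched as follows; class and method names are illustrative, not the exact fields of DisabledAlgorithmConstraints, and expensiveCheck stands in for the real constraint check.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a softly reachable cache: the whole map is held through a single
// SoftReference, so the GC may discard the entire cache under memory pressure
// or after a period of inactivity, and it is lazily rebuilt on demand.
public class SoftCache {
    private volatile SoftReference<Map<String, Boolean>> cacheRef =
            new SoftReference<>(null);

    public boolean permits(String algorithm) {
        Map<String, Boolean> cache = cacheRef.get();
        if (cache == null) {
            // Cache was collected (or never created); start a fresh one.
            cache = new ConcurrentHashMap<>();
            cacheRef = new SoftReference<>(cache);
        }
        Boolean result = cache.get(algorithm);
        if (result != null) {
            return result;
        }
        result = expensiveCheck(algorithm);
        cache.put(algorithm, result);
        return result;
    }

    // Hypothetical stand-in for the real constraint check.
    private boolean expensiveCheck(String algorithm) {
        return !algorithm.contains("MD5");
    }
}
```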
Progress
Issue
Reviewers
Reviewing
Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/8349/head:pull/8349
$ git checkout pull/8349

Update a local copy of the PR:
$ git checkout pull/8349
$ git pull https://git.openjdk.java.net/jdk pull/8349/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 8349

View PR using the GUI difftool:
$ git pr show -t 8349

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/8349.diff