Skip to content

Conversation

@djelinski
Copy link
Member

@djelinski djelinski commented Apr 21, 2022

Profiling the TLS handshakes using SSLHandshake benchmark shows that a large portion of time is spent in HandshakeContext initialization, specifically in DisabledAlgorithmConstraints class.

There are only a few instances of that class, and they are immutable. Caching the results should be a low-risk operation.

The cache is implemented as a softly reachable ConcurrentHashMap; this way it can be removed from memory after a period of inactivity. Under normal circumstances the cache holds no more than 100 algorithms.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/8349/head:pull/8349
$ git checkout pull/8349

Update a local copy of the PR:
$ git checkout pull/8349
$ git pull https://git.openjdk.java.net/jdk pull/8349/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 8349

View PR using the GUI difftool:
$ git pr show -t 8349

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/8349.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 21, 2022

👋 Welcome back djelinski! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Apr 21, 2022

@djelinski The following label will be automatically applied to this pull request:

  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the security security-dev@openjdk.org label Apr 21, 2022
@djelinski
Copy link
Member Author

before:

Benchmark                 (resume)  (tlsVersion)   Mode  Cnt     Score      Error  Units
SSLHandshake.doHandshake      true       TLSv1.2  thrpt    5  2165.081 ± 440.204  ops/s
SSLHandshake.doHandshake      true           TLS  thrpt    5   534.681 ± 146.931  ops/s
SSLHandshake.doHandshake     false       TLSv1.2  thrpt    5   369.104 ±  11.143  ops/s
SSLHandshake.doHandshake     false           TLS  thrpt    5   253.903 ±  58.056  ops/s

after:

Benchmark                 (resume)  (tlsVersion)   Mode  Cnt      Score     Error  Units
SSLHandshake.doHandshake      true       TLSv1.2  thrpt    5  10440.501 ± 478.177  ops/s
SSLHandshake.doHandshake      true           TLS  thrpt    5    762.995 ±  33.842  ops/s
SSLHandshake.doHandshake     false       TLSv1.2  thrpt    5    440.471 ±  52.867  ops/s
SSLHandshake.doHandshake     false           TLS  thrpt    5    305.928 ±  57.847  ops/s

After this patch the HandshakeContext initialization practically disappears from the CPU profile; it only takes ~5% in TLS1.2 resumption, and much less in the remaining scenarios.

@djelinski djelinski marked this pull request as ready for review April 21, 2022 20:42
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 21, 2022
@mlbridge
Copy link

mlbridge bot commented Apr 21, 2022

Webrevs

@mlbridge
Copy link

mlbridge bot commented Apr 21, 2022

Mailing list message from Peter Firmstone on security-dev:

Nice.

On 22/04/2022 6:47 am, Daniel Jeli?ski wrote:

--
Regards,

Peter Firmstone
0498 286 363
Zeus Project Services Pty Ltd.

Comment on lines +963 to +969
Boolean result = cache.get(algorithm);
if (result != null) {
return result;
}
result = checkAlgorithm(disabledAlgorithms, algorithm, decomposer);
cache.put(algorithm, result);
return result;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth using cache.computeIfAbsent or do you want to avoid lambda allocation overhead and potentially blocking concurrent handshakes on writer thread?

Suggested change
Boolean result = cache.get(algorithm);
if (result != null) {
return result;
}
result = checkAlgorithm(disabledAlgorithms, algorithm, decomposer);
cache.put(algorithm, result);
return result;
return cache.computeIfAbsent(algorithm, algo -> checkAlgorithm(disabledAlgorithms, algo, decomposer));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally prefer using get over computeIfAbsent when optimizing for performance. While computeIfAbsent is more concise, it's also much slower than get when the entry is already present.


private final Set<String> disabledAlgorithms;
private final Constraints algorithmConstraints;
private volatile SoftReference<Map<String, Boolean>> cacheRef =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a one-clear-for-all mechanism. Once the clearing happens, the full map will be collected. For SoftReferences, it clears them fairly eagerly. Maybe, the performance could be further improved in practice by using soft reference for each map entry (See sun.security.util.Cache code for an example).

I will have another look next week.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, soft references are likely to be cleaned if they are not used in an entire GC cycle.
Using a soft reference for each map entry would not help here; note that all keys and all values in this map are GC roots (keys are enum names, values are TRUE and FALSE), so in practice they would never be collected.
Even if we made the entries collectable, it would not help with the TLS case; SSLEngine & SSLSocket only check the constraints once at the beginning of the handshake, so they would all expire at the same time anyway. What's even worse, we would then have to clear the expired entries from the map.

I also considered limiting the size of the map. That's a tricky subject; we would need to get the size limit exactly right. Given that the algorithms are always checked in the same order, if the cache were too small, we would get zero hits.
With all the above in mind I decided not to use sun.security.util.Cache here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW: I wouldn't expect SoftReference (as opposed to WeakReference) to be eagerly cleaned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SoftReferences are guaranteed to survive one GC after use; beyond that their lifespan is determined by SoftRefLRUPolicyMSPerMB and the amount of memory available.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With all the above in mind I decided not to use sun.security.util.Cache here

I was not meant to use Cache and timeout for this update.

SoftReference and this patch should work good in larger memory boxes. Performance improvement sometimes is a trade-off game between memory and CPU. Did you have a chance to check the performance in the limited memory circumstance? like '-mx64M".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I run a few tests:
-Xmx16m, with this patch, still better than the original version:

Benchmark                 (resume)  (tlsVersion)   Mode  Cnt     Score     Error  Units
SSLHandshake.doHandshake      true       TLSv1.2  thrpt    5  4477.444 ± 375.975  ops/s
SSLHandshake.doHandshake      true           TLS  thrpt    5   681.471 ±  72.531  ops/s
SSLHandshake.doHandshake     false       TLSv1.2  thrpt    5   335.366 ±  89.056  ops/s
SSLHandshake.doHandshake     false           TLS  thrpt    5   308.711 ±  90.134  ops/s

-Xmx8m, before patch:

Benchmark                 (resume)  (tlsVersion)   Mode  Cnt    Score    Error  Units
SSLHandshake.doHandshake      true       TLSv1.2  thrpt    5  153.760 ± 12.025  ops/s
SSLHandshake.doHandshake      true           TLS  thrpt    5   70.700 ±  7.506  ops/s
SSLHandshake.doHandshake     false       TLSv1.2  thrpt    5   67.459 ±  4.325  ops/s
SSLHandshake.doHandshake     false           TLS  thrpt    5   64.695 ±  1.647  ops/s

after:

Benchmark                 (resume)  (tlsVersion)   Mode  Cnt    Score    Error  Units
SSLHandshake.doHandshake      true       TLSv1.2  thrpt    5  557.324 ± 50.269  ops/s
SSLHandshake.doHandshake      true           TLS  thrpt    5  155.258 ± 13.866  ops/s
SSLHandshake.doHandshake     false       TLSv1.2  thrpt    5  181.755 ± 29.319  ops/s
SSLHandshake.doHandshake     false           TLS  thrpt    5  120.627 ± 25.832  ops/s

Much slower, but still faster with the patch.
With -Xmx4m the test failed with OOM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additional testing. The improvement is impressive.

Copy link
Contributor

@ascarpino ascarpino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this idea. It's not caching the whole list like I struggled with, but it's a good solution for the algorithm name based constraints. It also shows that more caching probably would help further.
I'll let Xuelei finish his review comments

Copy link
Contributor

@coffeys coffeys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nice performance boost in this area. Looks good to me.

@openjdk
Copy link

openjdk bot commented Apr 25, 2022

@djelinski This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8285398: Cache the results of constraint checks

Reviewed-by: coffeys, xuelei

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 47 new commits pushed to the master branch:

  • 4bf2c18: 8285435: Show file and line in MacroAssembler::verify_oop for AArch64 and RISC-V platforms (Port from x86)
  • d435d69: 8285614: Fix typo in java.lang.Float
  • 3416bfa: 8283022: com/sun/crypto/provider/Cipher/AEAD/GCMBufferTest.java failing with -Xcomp after 8273297
  • 80a7f7b: 8267690: Revisit (Doc)Tree search implemented by throwing an exception
  • 9b82708: 8284319: Test runtime/cds/appcds/TestParallelGCWithCDS.java fails in repo-loom
  • fb60594: 8285477: Add a PRECISION public static field to j.l.Float and j.l.Double
  • 1e79ded: 8284889: runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest.java#custom-cl-zgc timed out
  • 414918d: 8285389: EdDSA trimming zeros
  • 293bc5e: 8129778: Few awt test fail for Solaris 11 with RuntimeException
  • 36f2e52: 8225777: java/awt/Mixing/MixingOnDialog.java fails on Ubuntu
  • ... and 37 more: https://git.openjdk.java.net/jdk/compare/81a8e2f8b32ad27aed45c4f6966e8d9ecf8b0fc9...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 25, 2022
@XueleiFan
Copy link
Member

It also shows that more caching probably would help further.

+1.

@djelinski
Copy link
Member Author

Thanks for reviewing!

/integrate

@openjdk
Copy link

openjdk bot commented Apr 26, 2022

Going to push as commit 00e9c96.
Since your change was applied there have been 47 commits pushed to the master branch:

  • 4bf2c18: 8285435: Show file and line in MacroAssembler::verify_oop for AArch64 and RISC-V platforms (Port from x86)
  • d435d69: 8285614: Fix typo in java.lang.Float
  • 3416bfa: 8283022: com/sun/crypto/provider/Cipher/AEAD/GCMBufferTest.java failing with -Xcomp after 8273297
  • 80a7f7b: 8267690: Revisit (Doc)Tree search implemented by throwing an exception
  • 9b82708: 8284319: Test runtime/cds/appcds/TestParallelGCWithCDS.java fails in repo-loom
  • fb60594: 8285477: Add a PRECISION public static field to j.l.Float and j.l.Double
  • 1e79ded: 8284889: runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest.java#custom-cl-zgc timed out
  • 414918d: 8285389: EdDSA trimming zeros
  • 293bc5e: 8129778: Few awt test fail for Solaris 11 with RuntimeException
  • 36f2e52: 8225777: java/awt/Mixing/MixingOnDialog.java fails on Ubuntu
  • ... and 37 more: https://git.openjdk.java.net/jdk/compare/81a8e2f8b32ad27aed45c4f6966e8d9ecf8b0fc9...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 26, 2022
@openjdk openjdk bot closed this Apr 26, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 26, 2022
@openjdk
Copy link

openjdk bot commented Apr 26, 2022

@djelinski Pushed as commit 00e9c96.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

integrated Pull request has been integrated security security-dev@openjdk.org

Development

Successfully merging this pull request may close these issues.

6 participants