Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8298381: Improve handling of session tickets for multiple SSLContexts #11590

Closed
wants to merge 7 commits into from

Conversation

simonis
Copy link
Member

@simonis simonis commented Dec 8, 2022

Currently, TLS session tickets introduced by JDK-8211018 in JDK 13 (i.e. SessionTicketExtension$StatelessKey) are generated in the class SessionTicketExtension and they use a single, global key ID (currentKeyID) for all SSLContexts.

This is problematic if more than one SSLContext is used, because every context which requests a session ticket will increment the global id currentKeyID when it creates a ticket. This means that in turn all the other contexts won't be able to find a ticket under the new id in their SSLContextImpl and create a new one (again incrementing currentKeyID). In fact, every time a ticket is requested from a different context, this will transitively trigger a new ticket creation in all the other contexts. We've observed millions of session ticket accumulating for some workloads.

Another issue with the curent implementation is that cleanup is racy because the underlying data structure (i.e. keyHashMap in SSLContextImpl) as well as the cleanup code itself are not threadsafe.

I therefor propose to move currentKeyID into the SSLContextImpl to solve these issues.

The following test program (contributed by Steven Collison (https://raycoll.com/)) can be used to demonstrate the current behaviour. It outputs the number of StatelessKey instances at the end of the program. Opening 1000 connections with a single SSLContext results in a single StatelessKey instance being created:

$ java -XX:+UseSerialGC -Xmx16m -cp ~/Java/ SSLSocketServerMultipleSSLContext 9999 1 1000
605:             1             32  sun.security.ssl.SessionTicketExtension$StatelessKey (java.base@20-internal)

The same example with the 1000 connections being opened alternatively on thwo different contexts will instead create 1000 StatelessKey instances:

$ java -XX:+UseSerialGC -Xmx16m -cp ~/Java/ SSLSocketServerMultipleSSLContext 9999 2 1000
  11:          1000          32000  sun.security.ssl.SessionTicketExtension$StatelessKey (java.base@20-internal)

With my proposed patch, the numbers goes back to two instances again:

$ java -XX:+UseSerialGC -Xmx16m -cp ~/Java/ SSLSocketServerMultipleSSLContext 9999 2 1000
611:             2             64  sun.security.ssl.SessionTicketExtension$StatelessKey (java.base@20-internal)

I've attached the test program to the JBS issue. If you think it makes sense, I can probably convert it into a JTreg test.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8298381: Improve handling of session tickets for multiple SSLContexts

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11590/head:pull/11590
$ git checkout pull/11590

Update a local copy of the PR:
$ git checkout pull/11590
$ git pull https://git.openjdk.org/jdk pull/11590/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 11590

View PR using the GUI difftool:
$ git pr show -t 11590

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11590.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 8, 2022

👋 Welcome back simonis! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 8, 2022
@openjdk
Copy link

openjdk bot commented Dec 8, 2022

@simonis The following label will be automatically applied to this pull request:

  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the security security-dev@openjdk.org label Dec 8, 2022
@mlbridge
Copy link

mlbridge bot commented Dec 8, 2022

Webrevs

@mrserb
Copy link
Member

mrserb commented Dec 8, 2022

I have asked some of the next questions already here. Would like to mention some of them;

  • The main question I have: is it safe to assume that the SSLContextImpl can be shared across the threads? If yes, it seems it lacks synchronization here and there. Like this patch added a synchronization around getNextKey/cleanup/destroy methods but it does not prevent usage of the key after creation and destroying it by different threads.
  • Is it safe to have duplicated currentKeyID per ssl context and use that during encryption/description as part of the "Additional Authentication Data"?

@simonis
Copy link
Member Author

simonis commented Dec 11, 2022

I have asked some of the next questions already here. Would like to mention some of them;

* The main question I have: is it safe to assume that the [SSLContextImpl](https://github.com/openjdk/jdk/pull/11590/files#diff-fc40f443cd9d2236d4279b83e6b6e662771d08faa571851f9de3de4925a2c36c) can be shared across the threads? 

I can't answer this for sure but at least I haven't found any documentation which would prevent multi-threaded usage. On the other hand, SSLContextImpl contains the following implementation note:

 * Implementation note: Instances of this class and the child classes are
 * immutable, except that the context initialization (SSLContext.init()) may
 * reset the key, trust managers and source of secure random.

So this comment will probably have to be removed once we move the session ticket key (i.e. currentKeyID) into this class. But the session key hash map has already lived in SSLContextImpl before, so the claim has already been violated.

*..If yes, it seems it lacks synchronization here and there. Like this patch added a synchronization around getNextKey/cleanup/destroy methods but it does not prevent usage of the key after creation and destroying it by different threads.

You're right, but that's actually an improvement compared to the initial implementation where cleanup/destroy wasn't synchronized at all :)

With regards to the missing synchronization of key usage and key destruction, I think this patch doesn't change the existing behavior because it wasn't synchronized before either. On the other hand, this patch fixes the implementation such that it only creates new tickets every hour (i.e. jdk.tls.server.statelessKeyTimeout) per context by default. This means that each keyHashMap should not contain more than 24 valid session keys by default (i.e. jdk.tls.server.sessionTicketTimeout/jdk.tls.server.statelessKeyTimeout) and cleanup should be called at most once per hour. Even if a session key which is in use will be destroyed, the worst case consequence would be that the verification of a corresponding session ticket could fail. But that is not critical because the SSL handshake would just fail and fall back to full session renegotiation in that case.

* Is it safe to have duplicated currentKeyID per ssl context and use that during encryption/description as part of the "Additional Authentication Data"?

With this change there will be no duplicate key IDs per context, only different contexts might eventually use the same session key (i.e. currentKeyID). But that should still be quite rare, because they both start from a different random number. And even if it happens, I don't see a problem with it. According to RFC 5077, the key ID is an opaque data structure which is transmitted in plain text. It is only used by the server to identify the correct private key for decrypting the clients session ticket. It's up to the application to make sure that clients with session tickets will connect to the correct context which is capable of decrypting the ticket. Notice, that this was already the case in the original implementation where different contexts had different key IDs but still every context could only decrypt session tickets with keys it had issued before. I think "JDK-8250965: Distributed TLS sessions implementation" is an attempt to solve this problem more generally.

@XueleiFan
Copy link
Member

The same example with the 1000 connections being opened alternatively on two different contexts will instead create 1000 StatelessKey instances:

That's obviously not the expected behaviors. It is a good catch for the static currentKeyID issue.

What do you think to move SSLContextImpl.keyHashMap into SSLSessionContextImpl? I would like to have SSLContextImpl focusing on configuration.

@mrserb
Copy link
Member

mrserb commented Dec 11, 2022

You're right, but that's actually an improvement compared to the initial implementation where cleanup/destroy wasn't synchronized at all :)
With regards to the missing synchronization of key usage and key destruction, I think this patch doesn't change the existing behavior because it wasn't synchronized before either.

I think behavior is changed, since the synchronization problem was hidden by generation of many keys. And if we start to use one key by many threads, we will need to carefully sync it, but if we just add synchronization per ssl contex we will make encode/decode methods single threaded per ssl context, which is unfortunate.

@simonis
Copy link
Member Author

simonis commented Dec 12, 2022

The same example with the 1000 connections being opened alternatively on two different contexts will instead create 1000 StatelessKey instances:

That's obviously not the expected behaviors. It is a good catch for the static currentKeyID issue.

What do you think to move SSLContextImpl.keyHashMap into SSLSessionContextImpl? I would like to have SSLContextImpl focusing on configuration.

Do you propose to only move SSLContextImpl.keyHashMap, or both SSLContextImpl.keyHashMap and SSLContextImpl.currentKeyID or SSLContextImpl.keyHashMap, SSLContextImpl.currentKeyID and all the corresponding accessor methods (i.e. addSessionKey(), cleanupSessionKeys(), getKey() and getID()) into SSLSessionContextImpl?

Also, I assume you'd probably like to keep it in SSLContextImpl.serverCache rather then SSLContextImpl.clinetCache, right?

@simonis
Copy link
Member Author

simonis commented Dec 12, 2022

You're right, but that's actually an improvement compared to the initial implementation where cleanup/destroy wasn't synchronized at all :)
With regards to the missing synchronization of key usage and key destruction, I think this patch doesn't change the existing behavior because it wasn't synchronized before either.

I think behavior is changed, since the synchronization problem was hidden by generation of many keys. And if we start to use one key by many threads, we will need to carefully sync it, but if we just add synchronization per ssl contex we will make encode/decode methods single threaded per ssl context, which is unfortunate.

Not sure what you mean? Do you refer to the SessionTicketSpecs encrypt()/decrypt() methods?What do you mean by "we make them single threaded per ssl context"?

encrypt() will call getCurrentKey() which isn't synchronized. Only once an hour or so, when the current key has expired, getCurrentKey() will call getNextKey() which is synchronized on the ssl context. decrypt() only calls getKey() which is never synchronized. I can't see a problem here.

@mrserb
Copy link
Member

mrserb commented Dec 12, 2022

encrypt() will call getCurrentKey() which isn't synchronized.

If we think that all that try/catch blocks in the encode/cleanup will save us, then why we added the sync block around cleanup. But If we try synchronize work with keys, and added the new block around cleanup then why the usage of the keys(who call getCurrentKey) are not synchronized by anything, am I not sure that it is safe to use the key after/during destruction, probably some write/read locks will help.

@simonis
Copy link
Member Author

simonis commented Dec 20, 2022

Hi @XueleiFan,

I've moved keyHashMap and currentKeyID into SSLSessionContextImpl as you've requested and also did some refactoring and simplification (i.e. moved most of the implementation from SessionTicketExtension to SSLSessionContextImpl).

Can you please have a look at the new version?

Thanks,
Volker

@simonis
Copy link
Member Author

simonis commented Dec 22, 2022

Hi @XueleiFan, @ascarpino,

Thank you for your reviews and comments. I think I've addressed all of your suggestions in the latest version of the PR.

There's just one more thing I had to change. Now that I've moved all the logic to SSLSessionContextImpl I can't get a reference to SSLContextImpls secure random instance any more when creating a new StatelessKey. For simplicity I've therefor decided to use the KeyGenerator.init() method which doesn't require an extra secure random instance.

If you think that it is important that the KeyGenerator uses the same secure random instance like the corresponding SSLContextImpl, I can easily change SSLSessionContextImpl::getKey() to take a SecureRandom argument and pass SSLContextImpls secure random instance down to SSLSessionContextImpl::getKey() (and up to StatelessKey::<init>) when calling getKey() from KeyState::getCurrentKey().

What do you think?

@ascarpino
Copy link
Contributor

For simplicity I've therefor decided to use the KeyGenerator.init() method which
doesn't require an extra secure random instance.

I understand the logic, but some users like to use a particular provider, keygen should use the random specified in the HandshakeContext.

@ascarpino
Copy link
Contributor

@simonis, I don't think @XueleiFan and I are going to finish reviews by Friday. As Oracle has next week off, I suggest you update the copyrights to 2023 and we can integrate next year one all the comments have been resolved.

…mprove initialization in SSLSessionContextImpl
@simonis simonis closed this Dec 23, 2022
@simonis simonis reopened this Dec 23, 2022
@simonis
Copy link
Member Author

simonis commented Dec 23, 2022

@simonis, I don't think @XueleiFan and I are going to finish reviews by Friday. As Oracle has next week off, I suggest you update the copyrights to 2023 and we can integrate next year one all the comments have been resolved.

From my understanding there hasn't been much left to resolve except using HandshakeContexts secure random for the key generation in StatelessKey and some minor initialization improvements in SSLSessionContextImpl which I've just pushed both.

So if you'll find some time to approve the PR this year it'll be great. Otherwise, it will be easy to update the copyright years if that will be last issue :)

Thanks for your support and have a nice and peaceful end of year,
Volker

Copy link
Member

@XueleiFan XueleiFan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks!

@openjdk
Copy link

openjdk bot commented Jan 2, 2023

@simonis This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8298381: Improve handling of session tickets for multiple SSLContexts

Reviewed-by: xuelei, ascarpino, serb

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 362 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 2, 2023
@simonis
Copy link
Member Author

simonis commented Jan 3, 2023

Looks good to me. Thanks!

Thanks @XueleiFan!

I've updated the copyright year to 2023 and will wait one or two more days just in case @ascarpino wants to take one more look as well.

@jnimeh
Copy link
Member

jnimeh commented Jan 3, 2023

Hi @simonis, I am sorry for chiming in so late on this issue. I do think it might be worthwhile to make your proof-of-concept code into a jtreg test as you mentioned in your summary. I think it really comes down to how feasible the conversion would be. It's always better to have an automated test if we can, but it depends on if jtreg code can get access to reliable information about the session tickets via the session cache and know that things are behaving as intended.

@simonis
Copy link
Member Author

simonis commented Jan 9, 2023

Hi @simonis, I am sorry for chiming in so late on this issue. I do think it might be worthwhile to make your proof-of-concept code into a jtreg test as you mentioned in your summary. I think it really comes down to how feasible the conversion would be. It's always better to have an automated test if we can, but it depends on if jtreg code can get access to reliable information about the session tickets via the session cache and know that things are behaving as intended.

I don't think the current reproducer can easily be converted into a cheap and stable jterg test. The current version is quite heavyweight because it calls jcmd GC.class_histogram and not really stable because it depends on the number of live StatelessKey objects in the heap. So for now I'd prefer to finally fix this issue even without an attached automatic test.

@simonis
Copy link
Member Author

simonis commented Jan 10, 2023

/reviewer credit ascarpino

@simonis
Copy link
Member Author

simonis commented Jan 10, 2023

/reviewer credit mrserb

@openjdk
Copy link

openjdk bot commented Jan 10, 2023

@simonis
Reviewer ascarpino successfully credited.

@simonis
Copy link
Member Author

simonis commented Jan 10, 2023

/reviewer credit schlosna

@openjdk
Copy link

openjdk bot commented Jan 10, 2023

@simonis Could not parse mrserb as a valid reviewer.
Syntax: /reviewer (credit|remove) [@user | openjdk-user]+. For example:

  • /reviewer credit @openjdk-bot
  • /reviewer credit duke
  • /reviewer credit @user1 @user2

@openjdk
Copy link

openjdk bot commented Jan 10, 2023

@simonis Could not parse schlosna as a valid reviewer.
Syntax: /reviewer (credit|remove) [@user | openjdk-user]+. For example:

  • /reviewer credit @openjdk-bot
  • /reviewer credit duke
  • /reviewer credit @user1 @user2

@simonis
Copy link
Member Author

simonis commented Jan 10, 2023

/reviewer credit @mrserb @schlosna

@openjdk
Copy link

openjdk bot commented Jan 10, 2023

@simonis Could not parse @schlosna as a valid reviewer.
Syntax: /reviewer (credit|remove) [@user | openjdk-user]+. For example:

  • /reviewer credit @openjdk-bot
  • /reviewer credit duke
  • /reviewer credit @user1 @user2

@simonis
Copy link
Member Author

simonis commented Jan 10, 2023

/reviewer credit @mrserb

@openjdk
Copy link

openjdk bot commented Jan 10, 2023

@simonis
Reviewer serb successfully credited.

@simonis
Copy link
Member Author

simonis commented Jan 10, 2023

/integrate

@openjdk
Copy link

openjdk bot commented Jan 10, 2023

Going to push as commit debe587.
Since your change was applied there have been 362 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 10, 2023
@openjdk openjdk bot closed this Jan 10, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 10, 2023
@openjdk
Copy link

openjdk bot commented Jan 10, 2023

@simonis Pushed as commit debe587.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated security security-dev@openjdk.org
Development

Successfully merging this pull request may close these issues.

6 participants