client-cert is absent from resumed sessions #1992

Closed
wtarreau opened this issue Mar 4, 2022 · 11 comments · Fixed by #2103

@wtarreau

wtarreau commented Mar 4, 2022

Hi,

given that I had the question today relayed from a customer by our support team, I had a second look at the draft to see what was planned for resumed connections, and I'm seeing this:

... and proves possession of the corresponding private key to a server when
    negotiating a TLS connection or the resumption of such a connection ...

The problem is that on resumed connections the client does not present the certificate again; it uses a ticket or a session ID to match a cached entry, or some other mechanism I'm not familiar with (please bear with me, I'm not SSL-fluent). In our case, when this happens, the related fields are empty (since the info is missing from the TLS stack), and we encourage customers to log something else as a complement to match new connections against a previously verified session (e.g. session ID).

I think it's important to explicitly state that such certificate info is not necessarily available in that case, and that either the proxy and application have to find other ways to persist the information between connections, or the cases where the certificate is matched need to be narrowed down to the strict minimum. For example, I remember that a decade ago, when I was working for a bank, the security team absolutely wanted to distinguish really authenticated connections from resumed ones. In particular, they refused to consider as 2FA anything that resulted from a resumed connection, because they wouldn't be able to prove that the authentication happened at that precise instant if they had to defend this in court.
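As an illustration of the distinction described above, here is a minimal sketch of how a server-side component might record whether client authentication happened in the current handshake or was inherited from a resumed session. It assumes an object that behaves like Python's `ssl.SSLSocket` after the handshake (`session_reused` and `getpeercert(binary_form=True)`). The function name, the log field names, and the `FakeConn` stand-in are hypothetical, and whether the certificate is present on a resumed session still depends on the underlying TLS library.

```python
import hashlib
from typing import Optional


def client_auth_log_fields(conn) -> dict:
    """Build log fields for a server-side TLS connection (hypothetical helper).

    `conn` must expose `session_reused` and `getpeercert(binary_form=True)`,
    as ssl.SSLSocket does once the handshake has completed.
    """
    der: Optional[bytes] = conn.getpeercert(binary_form=True)
    fingerprint = hashlib.sha256(der).hexdigest() if der else None
    resumed = bool(conn.session_reused)
    return {
        "resumed": resumed,
        # None when the TLS stack did not keep (or never saw) a client cert.
        "client_cert_sha256": fingerprint,
        # True only if a certificate was presented in a full handshake.
        "fresh_client_auth": fingerprint is not None and not resumed,
    }


class FakeConn:
    """Hypothetical stand-in for a post-handshake ssl.SSLSocket."""

    def __init__(self, session_reused: bool, der: Optional[bytes]):
        self.session_reused = session_reused
        self._der = der

    def getpeercert(self, binary_form: bool = False):
        return self._der


full = client_auth_log_fields(FakeConn(False, b"placeholder-der-bytes"))
resumed = client_auth_log_fields(FakeConn(True, None))
```

Logging `resumed` next to the fingerprint lets an auditor tell a freshly authenticated connection apart from a resumed one, even when the TLS stack does keep the certificate.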

Given how the different versions of TLS have evolved, I'm not sure whether there is a stable session identifier that's available across multiple versions. But I do think that if we document a common method to pass this client-cert info to the server, we also need to provide "something" for all the resumed connections that do not convey that info anymore. Otherwise users might fall back to poor solutions due to the difficulty of understanding all the details of the various versions (and it's very likely that I've been one of those giving such wrong advice, not knowing how to do better). It's particularly important in load-balanced environments because you cannot count on the front LB to keep a copy of the client-cert, as the resumed connection might very well end up on another node, and you cannot count on the server either since different requests will often reach different servers. So the only thing that remains is that "thing" that the client uses to be recognized on resumption (sorry for the imprecise language, it seems to me that up to TLS 1.2 it was a ticket, but I'm not sure for 1.3).

@martinthomson
Contributor

Hi Willy,

This is likely a limitation of your TLS implementation. A perfectly reasonable limitation, but not something that is consistent across all implementations.

What happens with resumption is that any state from the initial connection can be carried across over into a resumed session. Obviously, things that happened on the first connection after a ticket was issued can't be assumed to be available, but generally anything that happened in the original handshake could be used.

What the specification assumes is that the details of the handshake are remembered and can be used again. There's a bunch of other stuff like which TLS version and whatnot, some of which the specification requires endpoints to remember (see for example this paragraph, which insists that the KDF remain the same). When it comes to authentication state, however, TLS is not very consistent and so implementations are not.

Some implementations (like NSS and BoringSSL) remember the client certificate. The way they do this is they put the whole certificate chain in the session ticket. That makes the tickets pretty large, which can badly affect the resumption handshake: whatever you save on sending certificates is eaten up with a huge ticket in the first message. Worse, this first message is bad DoS mojo as you have to accept a whole lot of state before you are really sure that the client is OK (there are ways around this, but they are all pretty bad). Others might save the certificates server side, but that adds a whole different type of fragility. Others just "forget" that the client offered a certificate. That last one is probably what you are seeing here.

This is a whole new sort of problem to solve. What you might need in this case is a way to recover an old certificate from a log based on some identifier. The problem here is that the identifier isn't stable either. TLS doesn't have a single consistent value you can use across all versions and all types of resumption.

@wtarreau
Author

wtarreau commented Mar 8, 2022

Hi Martin,

thanks for the detailed explanation. The only viable way to convey the cert info is via the ticket anyway, because several consecutive connections may reach different LBs and there's no way to be faster than light to synchronize them on any external data. What you mention about BoringSSL is interesting given that it derives from OpenSSL. I'll suggest that my coworkers have a look there, in case we find that the behavior is configurable.
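To illustrate why state carried in the ticket works across a fleet while node-local state does not, here is a minimal sketch in which any node holding the shared key can validate state that another node issued. This is not the TLS ticket format: real tickets are encrypted and handled inside the TLS library, and this sketch only authenticates the state with an HMAC. The names `seal` and `unseal` and the key handling are assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = 32  # size of an HMAC-SHA-256 tag in bytes


def seal(shared_key: bytes, state: bytes) -> bytes:
    """Prefix `state` with an HMAC tag computed under the fleet-wide key."""
    tag = hmac.new(shared_key, state, hashlib.sha256).digest()
    return tag + state


def unseal(shared_key: bytes, blob: bytes) -> Optional[bytes]:
    """Return the state if the tag verifies under `shared_key`, else None."""
    if len(blob) < TAG_LEN:
        return None
    tag, state = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(shared_key, state, hashlib.sha256).digest()
    return state if hmac.compare_digest(tag, expected) else None


# Node A issues the blob; node B, holding the same key, accepts it
# without any synchronization between the two nodes.
fleet_key = b"k" * 32  # placeholder key, for demonstration only
blob_from_node_a = seal(fleet_key, b"client-cert-info")
seen_by_node_b = unseal(fleet_key, blob_from_node_a)
```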

Anyway, my point remains about the importance of mentioning this in the spec (not the details). Just put a big warning that the presence of certificate information in resumed sessions is implementation-dependent, because that's an easy trap everyone falls into, and it's too bad when an architecture is designed around it and the problem is only discovered after deployment.

@davidben
Contributor

davidben commented Mar 8, 2022

(BoringSSL's behavior was one of the things we changed from OpenSSL's. OpenSSL only retains the leaf certificate across resumption. We found that having different information available across resumptions led to bugs.)

That, alas, does lead to a huge ticket as @martinthomson mentioned. There's a third solution which avoids this and I think demonstrates an issue with this draft's kind of TLS/HTTP split. (We haven't implemented this in BoringSSL, as client certs aren't a huge priority for us, but I think it's clearly the right design.)

At the end of the day, your server presumably uses client certificates as the proof for some sort of application-specific identity. That identity may be an email address, a username, or something else entirely. You can think of the VerifyCertificate process not as returning yes/no, but as returning the verified identity or an error.

If you can capture your application-specific notion of identity, that's all you need to retain after the handshake and in the ticket. It's also all you need to pass down to the application. This will be much more compact than the input certificate chain. But it's incompatible with the Client-Cert and Client-Cert-Chain headers, which use a much larger intermediate representation. Of course, per the "don't make resumption behave differently" rule, any application which does this should make the full chain equally unavailable on initial and resumption handshakes.
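A minimal sketch of that design, under stated assumptions: verification returns an identity or raises, and only the identity is serialized as the state that must survive resumption. The `VerifiedIdentity` fields, the JSON encoding, and the fake `verify_certificate` body are hypothetical; a real verifier would perform path validation and derive the identity from the leaf certificate. The certificate bytes below are placeholders, not measurements.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class VerifiedIdentity:
    """Hypothetical application-level identity proven by the client cert."""
    user: str
    expires_at: int  # Unix time after which a full handshake is required


class VerificationError(Exception):
    pass


def verify_certificate(chain: List[bytes], now: int) -> VerifiedIdentity:
    """Stand-in verifier: returns an identity or raises, never yes/no."""
    if not chain:
        raise VerificationError("no client certificate presented")
    # Placeholder result; real code would validate `chain` and read the leaf.
    return VerifiedIdentity(user="alice@example.com", expires_at=now + 3600)


def ticket_state(identity: VerifiedIdentity) -> bytes:
    """Serialize only what needs to be retained across resumption."""
    return json.dumps(asdict(identity), separators=(",", ":")).encode()


def restore_identity(state: bytes, now: int) -> VerifiedIdentity:
    """Rebuild the identity on resumption, rejecting it once expired."""
    identity = VerifiedIdentity(**json.loads(state))
    if now >= identity.expires_at:
        raise VerificationError("identity expired; full handshake required")
    return identity


placeholder_chain = [b"\x00" * 1500] * 3  # stand-in for DER certificates
identity = verify_certificate(placeholder_chain, now=1000)
state = ticket_state(identity)
```

The application sees the same `VerifiedIdentity` on initial and resumed handshakes, which is the consistency property argued for above.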

(As an analogy, when clients verify server certificates, the identity is the SAN list, perhaps with some metadata like expiry. We don't really need the original certificates. Though browsers tend to provide these silly "view certificate" buttons, so we've got a bit of a hard time there.)

Back to the original topic, I think we need to treat resumption behavior as a prerequisite to deploying a system around Client-Cert or Client-Cert-Chain. If you retain the EE certificate across resumption, you can use Client-Cert and enable resumption. If the full chain is only available on initial handshakes, you must either disable resumption or not implement Client-Cert-Chain. Trying to do something clever across TLS and HTTP session state will not work. I've seen far too many cases of bad server deployments breaking flakily because the system made assumptions about the relationship between HTTP requests, connections, and TLS sessions.
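The prerequisite check described above can be sketched as a small decision function. This is only an illustration: the `Plan` type, the function name, and its boolean inputs are hypothetical, and the capability flags would have to come from testing the TLS stack actually deployed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    resumption: bool
    send_client_cert: bool
    send_client_cert_chain: bool


def plan_deployment(leaf_survives: bool, chain_survives: bool,
                    keep_resumption: bool) -> Plan:
    """Pick settings so each field is either always or never available.

    leaf_survives / chain_survives: whether the TLS stack still exposes the
    end-entity certificate / the rest of the chain on resumed sessions.
    keep_resumption: whether the operator wants resumption for
    client-authenticated connections.
    """
    if not keep_resumption:
        # Every handshake is a full one, so both values are always present.
        return Plan(resumption=False, send_client_cert=True,
                    send_client_cert_chain=True)
    # With resumption enabled, only send what survives resumption.
    send_leaf = leaf_survives
    # The chain field accompanies Client-Cert, so it needs the leaf as well.
    send_chain = leaf_survives and chain_survives
    return Plan(resumption=True, send_client_cert=send_leaf,
                send_client_cert_chain=send_chain)


# A stack that keeps only the leaf across resumption (as described for
# OpenSSL earlier in the thread), with resumption kept on.
leaf_only_plan = plan_deployment(leaf_survives=True, chain_survives=False,
                                 keep_resumption=True)
```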

@bc-pi
Contributor

bc-pi commented Mar 8, 2022

I did have a moment of hesitation when writing "... or the resumption of such a connection ..." as I didn't know the details but figured there might be implementation-dependent limitations or variations around what/how data is retained across resumption. I don't think this draft can do anything about it. Other than a mention? What else can be said? Please propose some text, if you can.

@bc-pi
Contributor

bc-pi commented Mar 8, 2022

Is there a reasonable way to say that, in order to use Client-Cert / Client-Cert-Chain, the TLS implementation needs to retain the client cert info across resumption or not offer resumption to clients that established mutually authenticated connections?

@wtarreau
Author

wtarreau commented Mar 8, 2022

For me it's not possible to "maintain" it, as it's not necessarily the same physical machine. And one essential property of tickets is that they're usable across a fleet of reverse proxies that share the same keys. I tend to think that it's not this draft's business to try to solve the TLS-level problems or architectural shortcomings; however, the draft needs to warn unsuspecting adopters about them.

Something around this maybe?

"The ability to extract a client certificate and/or an issuer chain from a resumed TLS connection is entirely specific to the implementation and sometimes also to the TLS version in use. Implementers are strongly encouraged to verify if their implementation matches their expectations or to make sure the application has some way of retaining such information once already learned. One possibility to work around implementation limits is to completely disable resumption when client-cert is needed".

@b---c
Contributor

b---c commented Mar 8, 2022

Thanks @wtarreau, I can work with that or similar. Probably/maybe in the Deployment Considerations somewhere like a new subsection. I'd love for @martinthomson to take a look and tell us why it's wrong though and how it could be better.

@martinthomson
Contributor

I might add David's suggestion to recommend against providing a value that won't be consistent after resumption, so:

Some TLS implementations do not retain client certificate information when resuming. Providing inconsistent values of Client-Cert and Client-Cert-Chain when resuming might lead to errors, so implementations that are unable to provide these values SHOULD either disable resumption for connections with client certificates or omit a Client-Cert or Client-Cert-Chain field if it might not be available after resuming.
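As a rough sketch of the "omit the field if it might not be available after resuming" option, the helper below builds the header fields a proxy would forward. It assumes the Structured Fields byte-sequence encoding (base64 between colons, with Client-Cert-Chain as a comma-separated list), and the function names and the `leaf_stable` / `chain_stable` flags are hypothetical knobs reflecting what the operator knows about their TLS stack.

```python
import base64
from typing import Dict, Optional, Sequence


def sf_bytes(data: bytes) -> str:
    """Encode bytes as a Structured Fields byte sequence: :base64:."""
    return ":" + base64.b64encode(data).decode("ascii") + ":"


def client_cert_fields(leaf_der: Optional[bytes],
                       chain_der: Sequence[bytes],
                       *, leaf_stable: bool,
                       chain_stable: bool) -> Dict[str, str]:
    """Emit only the fields that would also be present after resumption.

    leaf_stable / chain_stable say whether the TLS stack still provides the
    leaf / the rest of the chain on resumed connections. A field that is
    not stable is omitted everywhere, including on full handshakes.
    """
    fields: Dict[str, str] = {}
    if leaf_der is not None and leaf_stable:
        fields["Client-Cert"] = sf_bytes(leaf_der)
        if chain_der and chain_stable:
            fields["Client-Cert-Chain"] = ", ".join(
                sf_bytes(cert) for cert in chain_der)
    return fields


# Placeholder bytes stand in for DER-encoded certificates.
headers = client_cert_fields(b"leaf-der", [b"ca-der"],
                             leaf_stable=True, chain_stable=False)
```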

@wtarreau
Author

wtarreau commented Mar 9, 2022

I guess I'm fine with Martin's proposal.

@bc-pi
Contributor

bc-pi commented Mar 10, 2022

I'll borrow from both suggestions to piece something together.

@enygren
Contributor

enygren commented Dec 6, 2022

Opened #2345 since it would be good to also have discussion in Security Considerations. There are some very sharp edges here (beyond the ones mentioned already) due to variations in how implementations do resumption and client certs in combination in ways that could be surprising to people who are not experts in TLS implementations.
