Broken accessors for fields moved from SSL_SESSION to SSL #10280

kaduk · 2019-10-28T14:14:48Z

Background

There's been a fair bit of (relatively) recent activity in moving various
data from the SSL_SESSION or CERT object to the SSL object, largely to
facilitate thread-safety with respect to SSL_SESSION reuse, but also for
interactions with the order in which various processing/callbacks happen.
For example:

#9157 moved 'shared_sigalgs' from
CERT to SSL to fix SSL_check_chain() in the cert_cb.

#9162 moved 'peer_ecpointformats',
'peer_ciphers', and 'peer_supportedgroups' from SSL_SESSION to SSL for
thread-safety purposes.

#8383 moved 'tick_identity' from
SSL_SESSION to SSL for thread-safety purposes.

Attempt to generalize the situation

The immediate trigger here is that #9162 resulted in us being unable to log
what group was used for key exchange on a resumption handshake, since
SSL_get_shared_curve(s, 0) no longer has a list of groups in the session to
access, but I'd like to have a more general discussion about which types of
information belongs where, and how that interacts with the accessors that we
provide to applications. (This is complicated by some of the accessors
being under-documented as to whether they only work during specific
callbacks/the handshake vs. only on the initial connection vs. also on
resumed connections.) There is perhaps some prior art in boringssl, which
(as I understand it) goes even further to categorize information as (1)
needed only during the handshake, (2) stored per-connection (SSL), or (3)
kept past resumption (SSL_SESSION). (It also has a clear separation between
config and connection state, which seems wise.)

Ignoring the distinction between (1) and (2) for now (as I don't think we've
successfully enforced such a distinction for our accessor APIs in the past),
I think it feels fairly natural to have our processing flow use the SSL
structure for holding "the raw bits the client sent" as we are unpacking the
ClientHello from wire format. The question then would become what and when
to copy from there to the SSL_SESSION, and how that interacts with
accessors. In an abstract sense, it doesn't really seem like "the whole
list of things the peer offered" is a property of the session, but "the one
thing that we picked/negotiated" is, so that we'd store (e.g.) the sigalg we
used, or the group used for key exchange, but not the whole list from the
peer. Even if there was no key exchange for the current connection (as is
the case for TLS 1.2 resumption), it may still be useful for auditing
purposes to know what group was used to initialize the key hierarchy, so
that (for example) the use of weak groups/keys therefrom can be tracked
across all affected transactions.

Assuming that there is acceptance on the above formulation for when/what to
put in the SSL_SESSION, with our current policy on immutable sessions (per
#10210), we would only be able to make such copies into the session on the
initial handshake, so as to keep the SSL_SESSION immutable on resumption.
But, this makes TLS 1.3 resumptions problematic, since the resumption PSK is
bound only to the HKDF hash, and the cipher can change, a new key exchange
is done (potentially in a new group), etc. There's a conflict between the
stance of "we store in the session what was actually negotiated" and
immutability of sessions. One way out would be to assert that things like
negotiated-group and cipher are only properties of the session for TLS 1.2
and prior, and per-connection properties in TLS 1.3, and make the accessors
apply that condition as well; another would be to relax the immutability of
sessions and take the stance that resumed TLS 1.3 connections generate new
(abstract and SSL_SESSION) sessions. I find the latter more appealing in
that it preserves the (1)/(2)/(3) division from above, but am not so tied to
it so as to oppose other approaches.

Accessors

Now, let's consider how the already-existing accessor functions have thrown
wrenches into the nice pretty abstract thoughts from above :)
I will not bring an exhaustive analysis in here, but spot-check a couple
examples.

shared groups

I'll start with SSL_get_shared_curve(), the one that started me down this
track. Since renamed to SSL_get_shared_group(), it's documented as:

SSL_get_shared_group() returns shared group n for a server-side
SSL ssl. If n is -1 then the total number of shared groups is
returned, which may be zero. Other than for diagnostic purposes,
most applications will only be interested in the first shared group
so n is normally set to zero. If the value n is out of range,
NID_undef is returned.

The implementation is:

    case SSL_CTRL_GET_SHARED_GROUP:
        {
            uint16_t id = tls1_shared_group(s, larg);
 
            if (larg != -1)
                return tls1_group_id2nid(id);
            return id;
        }

tls1_shared_group() has to go through the effort of fetching both local and
peer supported groups (honoring SSL_OP_CIPHER_SERVER_PREFERENCE for which
order to use) and computing the intersection. tls1_get_peer_groups() is
what actually goes to reach into s->ext.peer_supportedgroups, which is
not populated for TLS 1.2 resumptions:

int tls_parse_ctos_supported_groups(SSL *s, PACKET *pkt, unsigned int context,
                                    X509 *x, size_t chainidx)
    [...]

    if (!s->hit || SSL_IS_TLS13(s)) {
        OPENSSL_free(s->ext.peer_supportedgroups);
        s->ext.peer_supportedgroups = NULL;
        s->ext.peer_supportedgroups_len = 0;
        if (!tls1_save_u16(&supported_groups_list,
                           &s->ext.peer_supportedgroups,
                           &s->ext.peer_supportedgroups_len)) {
            SSLfatal(s, SSL_AD_INTERNAL_ERROR,
                     SSL_F_TLS_PARSE_CTOS_SUPPORTED_GROUPS,
                     ERR_R_INTERNAL_ERROR);
            return 0;
        }
    }

The documentation seems to just say "a server-side SSL", which would include
resumption flows, making the current behavior not match the documentation.
(It also would seem to include pre-handshake SSLs and during the
client_hello_cb, but we'll ignore that...). On the one hand, it does say
that basically only 0 is useful as the second argument, but if we were to
make a compat shim so that SSL_get_shared_group(s, 0) succeeds and returns a
cached group from the session even on resumptions, then we'd still have the
documented behavior of SSL_get_shared_group(s, -1) to cope with, and our
choices are: lie and return 1 (bad) or return what the actual number would
have been but not support retrieving all of them (also bad) or to actually
cache the entire list from the client in the session (philosophically bad
but maybe tolerable engineering).
As an attempt out of the mess, I'd propose to just update the recently added
SSL_get_negotiated_group() API to grab that group from the session even on
resumption flows (as opposed to the current implementation, that only works
for full handshakes). Then we'd retroactively define SSL_get_shared_group()
to only work on full handshakes.

shared sigalgs

For another example, let's look at SSL_get_shared_sigalgs(); it seems to be
largely similar to SSL_get_shared_curve(), offering an interface that grabs
a specific index in the shared list. It has multiple possible outputs for
NID vs. raw protocol constant, and has to cope with both old-style (separate
signature and hash) vs. TLS 1.3-style (combined SignatureScheme), so there's
more output parameters:

SSL_get_shared_sigalgs() returns information about the shared signature
algorithms supported by peer s. The parameter idx indicates the index
of the shared signature algorithm to return starting from zero. The signature
algorithm NID is written to *psign, the hash NID to *phash and the
sign and hash NID to *psignhash. The raw signature and hash values
are written to *rsig and *rhash.

The sigalgs are only saved for initial handshakes:

int tls_parse_ctos_sig_algs(SSL *s, PACKET *pkt, unsigned int context, X509 *x,
                            size_t chainidx)
{
    PACKET supported_sig_algs;

    if (!PACKET_as_length_prefixed_2(pkt, &supported_sig_algs)
            || PACKET_remaining(&supported_sig_algs) == 0) {
        SSLfatal(s, SSL_AD_DECODE_ERROR,
                 SSL_F_TLS_PARSE_CTOS_SIG_ALGS, SSL_R_BAD_EXTENSION);
        return 0;
    }

    if (!s->hit && !tls1_save_sigalgs(s, &supported_sig_algs, 0)) {
        SSLfatal(s, SSL_AD_DECODE_ERROR,
                 SSL_F_TLS_PARSE_CTOS_SIG_ALGS, SSL_R_BAD_EXTENSION);
        return 0;
    }

[side note: tls1_save_sigalgs() has a probably-a-bug in that it still errors out
early if s->cert == NULL, which should have been removed in #9157 ]

The peer's sigalgs are saved into the SSL (well, the s3, but they're the
same now), and the shared sigalgs are set by
tls_early_post_process_client_hello()->tls1_set_server_sigalgs()->tls1_process_sigalgs()->tls1_set_shared_sigalgs(),
and only for non-resumptions (the check is in
tls_early_post_process_client_hello()).

So resumed handshakes are not saving the peer's sigalgs or attempting to
compute the shared sigalgs, and thus SSL_get_shared_sigalgs() is doomed to
failure for such connections.

There doesn't seem to be a whole lot that's fundamentally different from
SSL_get_shared_group(), so I won't repeat the discussion.

Path forward

I'd like to get a sense from other people about whether the (1)+(2)/(3)
categorization of information makes sense; whether we feel a need to restore
the ability of accessors to work on resumed connections; and (if so) whether
we should do so by differentiating accessor implementations between TLS 1.2
and 1.3, by giving TLS 1.3 a new session per connection, or some other
means.

The text was updated successfully, but these errors were encountered:

mattcaswell · 2019-10-29T14:50:04Z

IMO the whole purpose of an SSL_SESSION is to store data required for a resumption. If it isn't necessary for a resumption then we shouldn't put it there. Anything we have in there carries with it an overhead, because SSL_SESSIONs are serialisable. So the more "stuff" means bigger ticket sizes, larger session caches etc.

The accessors issue seems to be one of under-documentation. We have not clearly defined under what circumstances fields are available across resumptions - some data is available "by accident" and some is not. Applications then get written based on assumptions of what is available which may not match what was intended.

I do think the (1)+(2)/(3) categorisation is useful - only stuff in (3) belongs in the SSL_SESSION.

FYI I have a half done PR somewhere which tries to make things a little cleaner for SSL_SESSION. I would like to introduce a new "immutable" flag. As soon as an SSL_SESSION enters a state where it might be shared with other threads the "immutable" flag gets set. All "setters" check the immutable flag and fail if they are called and it is set. At least that's the idea...still a work in progress.

davidben · 2019-10-29T16:45:10Z

There is perhaps some prior art in boringssl, which (as I understand it) goes even further to categorize information as (1) needed only during the handshake, (2) stored per-connection (SSL), or (3) kept past resumption (SSL_SESSION). (It also has a clear separation between config and connection state, which seems wise.)

For background, our separation between (1) and (2) is so we release some per-connection memory after the handshake completes. Note, however, that this is sometimes a breaking change when that state has accessors. For instance, SSL_get_client_CA_list in BoringSSL is only defined to work during a client certificate selection callback (or when the handshake is paused during one of those callbacks). This is a breaking change, but one we find worthwhile to trim memory.

The config and connection separation was originally to simplify SSL_clear, which sadly folks use. The separation means we can just drop all the connection state structs and recreate them, rather than manually clear everything. We've also since moved the per-connection handshake-only config into a struct that, if the caller disavows SSL_clear, renegotiation, and some getters, we will also shed after the handshake completes.

One way out would be to assert that things like negotiated-group and cipher are only properties of the session for TLS 1.2 and prior, and per-connection properties in TLS 1.3, and make the accessors apply that condition as well; another would be to relax the immutability of sessions and take the stance that resumed TLS 1.3 connections generate new (abstract and SSL_SESSION) sessions. I find the latter more appealing in that it preserves the (1)/(2)/(3) division from above, but am not so tied to it so as to oppose other approaches.

The cipher suite needs to be saved in TLS 1.3 sessions for 0-RTT. The negotiated group doesn't strictly need to be saved, but it's a plausible strategy for predicting key shares and reducing HRRs. (We don't do this but it's an option if it ever becomes necessary.)

Our strategy is to keep SSL_SESSIONs immutable but, in TLS 1.3, resumption clones the SSL_SESSION rather than keeping it around like TLS 1.2. This is also useful given ticket renewal is much more common in TLS 1.3. (So I think that's the latter option though I'm not sure if I'm reading it right.)

kaduk · 2019-10-29T17:02:44Z

@davidben your description of what you "think [is] the latter option" matches up with what I had in mind for it, so thanks for clarifying.

kaduk · 2019-11-04T19:14:42Z

@mattcaswell do you have any public WIP for what you were thinking about for SSL_SESSION immutability (or is it just finding all the places where we are behaving in a thread-unsafe manner, a la #9176)?

I would like to start writing some code in this space, but am still not entirely sure that we have agreement on the target end-state.

I think that there is some fundamental conflict in the design space that stems from TLS 1.3 changing some things (e.g., supported/negotiated groups) from being per-session to being per-connection. So we seem to be presented with a choice between having separate behavior for where we store/access that information depending on the TLS version, or producing a new session structure for each TLS 1.3 connection. In my mind, having to apply the TLS version conditional in every accessor is risk-prone, which favors making a new session structure for TLS 1.3 connections

I guess I can start trying to normalize the (1)+(2)/(3) split in terms of structure layout, but we may need to take what falls into (3) on a case-by-case basis.

beldmit · 2019-11-05T17:35:56Z

BTW, my colleagues have found the other information related to lack of information in the SSL_SESSION object.

In TLS 1.3, in the case of post-handshake auth in the resumption mode, when the server requests a certificate from the client, the client responds with the empty list of certificates.

It is caused by the client's lack of signature schemas in common. The function tls_parse_ctos_sig_algs does not parse the extensions signature_algorithms and signature_algorithms_cert being called on resumption.

I have a more detailed description of the bug and proposed fixes and I think it worth moving the problem to a separate issue.

beldmit · 2019-11-06T16:47:23Z

Created a separate issue #10370

kaduk added the issue: bug report The issue was opened to report a bug label Oct 28, 2019

mattcaswell added triaged: bug The issue/pr is/fixes a bug and removed issue: bug report The issue was opened to report a bug labels Oct 29, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Broken accessors for fields moved from SSL_SESSION to SSL #10280

Broken accessors for fields moved from SSL_SESSION to SSL #10280

kaduk commented Oct 28, 2019

mattcaswell commented Oct 29, 2019

davidben commented Oct 29, 2019

kaduk commented Oct 29, 2019

kaduk commented Nov 4, 2019

beldmit commented Nov 5, 2019 •

edited

beldmit commented Nov 6, 2019

Broken accessors for fields moved from SSL_SESSION to SSL #10280

Broken accessors for fields moved from SSL_SESSION to SSL #10280

Comments

kaduk commented Oct 28, 2019

Background

Attempt to generalize the situation

Accessors

shared groups

shared sigalgs

Path forward

mattcaswell commented Oct 29, 2019

davidben commented Oct 29, 2019

kaduk commented Oct 29, 2019

kaduk commented Nov 4, 2019

beldmit commented Nov 5, 2019 • edited

beldmit commented Nov 6, 2019

beldmit commented Nov 5, 2019 •

edited