Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ticket callbacks in TLSv1.3 #6198

Closed
wants to merge 6 commits into from

Conversation

mattcaswell
Copy link
Member

The return value from the ticket_key callback was not properly handled in
TLSv1.3, so that a ticket was always renewed even if the callback
requested that it should not be.

Also the ticket decrypt callback was not being called at all in TLSv1.3.

As a result of fixing the above issues I encountered an unrelated bug that was causing a test failure. When a server is writing (using SSL_write_early_data()) to an unauthenticated client, the buffering bio is still in place for writing - so we need to ensure a BIO_flush() occurs. There's a commit in this PR to do that.

This documents the behaviour that a ticket will always be issued following a TLSv1.3 resumption. This is different to TLSv1.2 which defaults to not issuing a new ticket on resumption. The behaviour in both protocols can be changed by using the callbacks.

Finally I added a test for the various callbacks.

Checklist
  • documentation is added or updated
  • tests are added or updated

@mattcaswell mattcaswell added branch: master Merge to master branch branch: 1.1.1 Merge to OpenSSL_1_1_1-stable branch labels May 8, 2018
@mattcaswell mattcaswell added this to the 1.1.1 milestone May 8, 2018
@mattcaswell
Copy link
Member Author

Ping @vdukhovni. This doesn't do everything that we talked about (more PRs to come). But its a start.

ssl/t1_lib.c Outdated
default:
return SSL_TICKET_FATAL_ERR_OTHER;
}

return ret;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is never hit, now; as all cases in the switch() return.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! I suggest refactor the default case out and put it after the switch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

*/
st->hand_state = TLS_ST_OK;
} else {
st->hand_state = TLS_ST_SW_SESSION_TICKET;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's any better to just always do st->hand_state = TLS_ST_SW_SESSION_TICKET;, and override it to TLS_ST_OK only in the else if (s->hit && !s->ext.ticket_expected) case. I find it odd to set the same var=value twice within an if/else chain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Done.

* 2: either s->tls_session_secret_cb was set, or a ticket was offered but
* couldn't be decrypted because of a non-fatal error.
* 3: a ticket was successfully decrypted and *ret was set.
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed because of SSL_TICKET_RETURN defines?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The meanings behind the different return values are in the header where they are defined. It is unnecessary to repeat them here.

@vdukhovni
Copy link

I'm not sure how the first commit relates to the rest. Is it a separate bugfix?

ssl/t1_lib.c Outdated
goto err;
}
if (mlen == 0)
goto end;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here ret is still SSL_TICKET_FATAL_ERR_OTHER, is that the right return value (maybe it is)? Perhaps be explicit anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - I did that.

ssl/t1_lib.c Outdated
}
eticklen -= mlen;
/* Check HMAC of encrypted ticket */
if (HMAC_Update(hctx, etick, eticklen) <= 0
|| HMAC_Final(hctx, tick_hmac, NULL) <= 0) {
goto err;
goto end;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And same here, implicit initial return value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

ssl/t1_lib.c Outdated
/*
* If set, the decrypt_ticket_cb() is always called regardless of the
* return value determined above. The callback is responsible for checking
* |ret| before it performs any action

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation looks rather misleading with respect to the above statement:

       dec_cb is the application defined callback invoked after session ticket
       decryption has been attempted and any session ticket application data
       is available.  The application can call
       SSL_SESSION_get_ticket_appdata() at this time to retrieve the
       application data. 

Sure sounds like the ticket is guaranteed decrypted when the callback is made. Indeed why make the callback if it is not? The only hint of more interesting possibilities is much further down under RETURN VALUES:

       The dec_cb callback must return one of the following SSL_TICKET_RETURN
       values. Under normal circumstances the retv value is returned
       unmodified, but the callback can change the behavior of the post-ticket
       decryption code by returning something different. The dec_cb callback
       must check the retv value before performing any action.

The last sentence does not quite convey that the ticket might not be available...

Perhaps I'm misreading the docs, but I'm confused after the first pass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The callback can be called even in the event of a failure to decrypt. On looking at the code though it's a bit strange. There are some situations where the callback is not called at all (e.g. if the client sends an empty TLSv1.2 ticket extension, or if there is no TLSv1.2 ticket extension) and other cases where it is called following an internal error (e.g. malloc failure in libssl), and again other times when it is called following a failed decrypt (e.g. because the server was restarted and we have a different ticket key).

Since these callbacks are new for 1.1.1 we have the opportunity to clean this up. Perhaps @tmshort can comment on what the original intention for this was?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the intent was to know if/when a ticket was received, and what happened after attempted decryption. If no ticket extension is received then it should not be called (tickets not used in this case). If an empty ticket extension is received, then it should be called with the appropriate retv value. The documentation should be cleared up.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide the motivation for this design choice? It rather invites applications that register for this callback to do apparently bad things, like modify error conditions into non-error conditions, or attempt to use the session pointer when no decryption took place and it is uninitialized, ... Is there a good use-case for making the callback on failure?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, applications (I'm looking from CDNs) will want to know if a bad ticket has been received; they may want to abort the connection (rather than create a new session) and/or may want to not send a new ticket. CDNs also report on these conditions to track potential stale clients and/or DDOS. I can see that if the SSL_TICKET_FATAL_ERR_OTHER/MALLOC condition is sent to the function that it shouldn't be changed; and there may be one or two other conditions as well that should not occur. These can be checked after the callback to ensure compliance (or an assert of some type).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First and foremost, the documentation needs to clearly spell out the contract between OpenSSL and the callback author. When will the callback be called (or not called), what parameters it will see under various conditions, and what return values it may use to signal some desired outcome.

How, for example, might an application indicate that no new tickets should be issued if decryption fails? Normally, that would just result in a full handshake with new tickets issued. I don't, for example, see any return codes that signal suppression of ticket issuance while continuing the handshake. And SSL_TICKET_ERR_OTHER is rather a mixture of symptoms, I don't see it as a good condition for applications to raise.

The lack of complete documentation indicates to me a good likelihood of an incomplete design. What should we do here? Since these callbacks are new in 1.1.1, perhaps they should be removed and wait for the next release and a more complete design?

The commit that created the callback just said

commit df0fed9aab239e2e9a269d06637a6442051dee3b
Author: Todd Short <tshort@akamai.com>
Date:   Wed Mar 15 13:25:55 2017 -0400

    Session Ticket app data
    
    Adds application data into the encrypted session ticket
    
    Reviewed-by: Paul Dale <paul.dale@oracle.com>
    Reviewed-by: Matt Caswell <matt@openssl.org>
    (Merged from https://github.com/openssl/openssl/pull/3802)

Neither this comment, nor #3802 suggests a broader agenda of giving applications control of the ticket behaviour, all that's promised and reviewed is access to "appdata", for which the callback should only happen on success.

So my impression is that the non-success cases are underspecified, and perhaps not ready for prime-time???

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the result of parsing the application data, the application may choose to use or discard the ticket; that was part of the intent. If that wasn't clear, I apologize. I'm more than willing to create a PR that updates the documentation and sanitizes the input/output of the callback.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something needs to be done, since the current contract is rather under-specified. For example under "RETURN VALUES" only the input semantics of the decrypt callback retv are specified. There is no documentation of what output values may be specified which states are valid input states from which to return a given output... There's no warning about when the ss parameter might be invalid and SSL_SESSION_get0_ticket_appdata() should not be called.

If the callback is to stay it needs complete documentation, from which we can assess whether the implementation matches the design contract. If that's not possible to complete before the release, I'd like to see the feature delayed to a later release.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest the callback should not be called in the event of any of the "fatal" return codes occurring. It can be called in any other event but the callback should not attempt to turn an error return code into a success. It seems to me to be within the scope of this PR to do that, so I'll give that a go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of FATAL errors, I generally agree (not strong enough thoughts to disagree). Once @mattcaswell updates the code, I will update the documentation to reflect reality.


gen_tick_called = dec_tick_called = 0;

clntsess = SSL_get1_session(clientssl);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be sure that with TLS 1.3 the server's ticket message has been received by this point? I'm not sure how far beyond TLS finished create_ssl_objects proceeds?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. create_ssl_connection has an explicit client side read at the end of it to make sure we've processed any NewSessionTicket that the server might have sent, so we can rely on the session being available at this point.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmshort Once @mattcaswell updates the code, I will update the documentation to reflect reality.

I think that's somewhat backwards, please update the document to reflect the intended design, then we can assess the code to see whether it matches the documentation!

As much as possible write documentation first, then the code, then update the documentation with any lessons learned while writing the code that led to design revisions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As such, that makes this PR dependent on that one; it will take me a day or two as $DAYJOB takes priority.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can update the documentation in this PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmshort I'm concerned that this may be harder to understand by users, than simply passing back the original return value (which implies making the decision). But, I suspect a "hint" would be better, so that no protocol specifications are broken.

If the callback should always just return its input value, then it should just be void. So, in fact, giving it the freedom to return any of the same range of values Is the most complex design. Indeed it even makes possible for the appearance of successful decryption when none took place, possibly causing problems later.

There are only three possible outcomes, abort, continue without new ticket, continue with new ticket. The callback's return value should not mirror the decryption status, it can't retroactively make something decrypt successfully that did not.

Please document clearly the actions that the callback can take, and then we can base the implementation on the documented available actions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not confuse the return value of the callback with the return value of its caller. The caller will process the callback's return value and choose its own return value appropriately. The callback should not know the return-value contract of its caller, and will not be a tall-call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the callback should always just return its input value, then it should just be void.

That is absolutely not the case, but can be used by a callback that just wants to get the application data. The intent was to manipulate ticket use based on the application data (and other parameters).

Regardless, it sounds as though this is going to change.

One option that is useful, but not yet mentioned in this discussion, is to IGNORE the ticket (i.e. do a full handshake); I don't see that's possible with the trichotomy option (the original code would have used SSL_TICKET_NONE as a return).

Thus, I see at least 5 possible results from the callback:

  1. Abort the handshake (-1)
  2. Ignore the ticket, do not issue a new ticket (0)
  3. Ignore the ticket, issue a new ticket (1)
  4. Use the ticket, do not issue a new ticket (2)
  5. Use the ticket, issue a new ticket (3)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do need all 5 outcomes they should still use a separate set of values from the input states. We must not coerce an exact correspondence between the callback's return values and the return-value contract of its caller. Note also that "use the ticket" makes no sense when there is no ticket to use, so the caller would have treat 4/5 as identical to 2/3 when no ticket was obtained.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm AOK with that. And yes, if there was no ticket, then "Use the ticket" makes no sense; and should be ignored.

@vdukhovni
Copy link

Overall, this appears to work, Postfix no longer sees unsolicited tickets being issued.

@mattcaswell
Copy link
Member Author

I'm not sure how the first commit relates to the rest. Is it a separate bugfix?

I should probably have explicitly called out in my intro that this PR does introduce a change in behaviour for TLSv1.3 PSKs. Previously, following a successful attempt to connect with a PSK, we still sent a new session ticket back. This seems strange to me for a PSK, so the behaviour is now that we never send session tickets back if the original connection was via PSK.

This change in behaviour caused a test to fail. On investigation it turns out that it was due to an unrelated latent bug that only got triggered due to the behaviour change. The first commit fixes that issue. So, although not directly related to this PR, it is needed because without it the tests would fail for this PR.

@mattcaswell
Copy link
Member Author

I have pushed a new commit to deal with the travis failure. This was a mem leak which, again, was caused by the PSK change in behaviour I described above. This time the problem was a latent bug in sslapitest itself (again not directly related to this PR, but without it the tests will fail so it is needed here).

I also pushed a fixup commit to address the feedback comments so far. All comments have been dealt with except for @vdukhovni's one about the documentation. Awaiting input from @tmshort.

@mattcaswell
Copy link
Member Author

New commit pushed that reworks the decrypt ticket callback:

  • Don't call the decrypt ticket callback if we've already encountered a fatal error.
  • Do call it if we have an empty ticket present.
  • Change the return code so that it can only return 3 values to avoid problems where the callback sends back invalid or non-sensical return codes.
  • Documentation updated accordingly

@mattcaswell
Copy link
Member Author

Ok, here's another attempt with the 5 return values. Fixup commit pushed. One of the commit messages from an earlier commit will need some tweaking - but I'll rebase this PR and fix that when we seem to be getting some agreement.


=item SSL_TICKET_RETURN_ERROR

An error occurred.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be more clear, perhaps even renamed to SSL_TICKET_RETURN_ABORT. The idea is that the callback is choosing to abort the handshake, whether due an error in the callback or policy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it as suggested. Also added a few more words to describe its intended use and also a caution about the multi-tickets thing in TLSv1.3


An error occurred.

=item SSL_TICKET_RETURN_NO_TICKET

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this to SSL_TICKET_RETURN_IGNORE

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Do not use a ticket (if one was available). Do not send a renewed ticket to the
client.

=item SSL_TICKET_RETURN_NO_TICKET_RENEW

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this to SSL_TICKET_RETURN_IGNORE_RENEW

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

should return this value if B<status> is B<SSL_TICKET_EMPTY> or
B<SSL_TICKET_NO_DECRYPT>.

=item SSL_TICKET_RETURN_TICKET

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this to SSL_TICKET_RETURN_USE

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

If the callback does not wish to change the default ticket behaviour then it
should return this value if B<status> is B<SSL_TICKET_SUCCESS>.

=item SSL_TICKET_RETURN_TICKET_RENEW

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this to SSL_TICKET_RETURN_USE_RENEW

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

strlen(appdata));
}

static int dec_tick_cb(SSL *s, SSL_SESSION *ss, const unsigned char *keyname,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the latest changes, the return value of dec_tick_cb is once again SSL_TICKET_RETURN.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

decryption. The B<keyname> and B<keyname_len> identify the key used to decrypt the
session ticket. The B<dec_cb> callback is defined as type
decryption has been attempted and any session ticket application data is
available (if any). If ticket decryption was successful then the B<ss> argument

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The (if any) looks redundant given the and any on the previous line.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks from the code like in TLS 1.3 multiple PSK identities might be sent, even multiple tickets, and we keep decrypting until one succeeds. What does that mean for the contract with the callbacks? Applications should not be too eager to force an abort if they don't like a (valid) ticket, because there may be another ticket? Is forcing an abort a good idea? They can always hang up after the handshake completes, and at that time they'll have seen all the tickets... This is tricky I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the redundant "if any". Also added a caution in the SSL_TICKET_RETURN_ABORT section about multi-tickets.

@vdukhovni
Copy link

My review is complete. Once the final comments are addressed, I'll be ready to approve. If clients can present multiple tickets at the same time in TLS 1.3, then the docs need to warn the decrypt callback authors about that... Just because one ticket is not suitable, does not mean they'll all be bad. But if multiple tickets are not allowed, the issue is moot, but then the code allows multiple tickets to be sent it seems...

@vdukhovni
Copy link

Reading the TLS 1.3 spec, it sure looks like multiple tickets from the client are OK, however odd that may or may not be. Not sure whether we need to worry about clients doing that in practice. That is, will clients ever do it for a good reason? If not, then the decrypt callback need not worry about aborting before it has seen all the tickets.

When a server call SSL_write_early_data() to write to an unauthenticated
client the buffering BIO is still in place, so we should ensure we flush
the write.
The return value from the ticket_key callback was not properly handled in
TLSv1.3, so that a ticket was *always* renewed even if the callback
requested that it should not be.

Also the ticket decrypt callback was not being called at all in TLSv1.3.
&& (ret == SSL_TICKET_EMPTY
|| ret == SSL_TICKET_NO_DECRYPT
|| ret == SSL_TICKET_SUCCESS
|| ret == SSL_TICKET_SUCCESS_RENEW)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I note elsewhere, SSL_TICKET_NONE causes the decrypt ticket callback to not be called. The comment above implies that only fatal errors prevent invocation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SSL_TICKET_NONE is not a possible state in this context, so the documentation and code are fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I don't see a path. Just feels like something is missing... but it's fine.

@vdukhovni
Copy link

Question. I notice that with TLS 1.3 we don't really decide whether to use a PSK (including resumption) until we've checked the "binder". This currently happens after the decrypt callback. If the binder is invalid, we ignore it, and do a full handshake, but the decrypt callback may have pulled in data from the discarded ticket!

Should the decrypt callback happen after the binder check? Is the intent to filter acceptable tickets or to restore some application context from the ticket? If the latter, the restored context may not be correct. Not sure whether the subsequent full handshake fixes that or not...

@tmshort
Copy link
Contributor

tmshort commented May 10, 2018

Interesting. The application may use the ticket app-data to choose whether or not to discard the session, or potentially use [the session]. The dec_cb API clearly (I hope) indicates that the SSL_SESSION provided has not yet been attached to the SSL structure in question.

We could potentially document this behavior, as the intent of getting the ticket app-data was to assist in making a choice about using (or not) the incoming ticket/session. It was not necessarily meant to be saved off (although a "bad" application might do this, the data is always available via API from the SSL_SESSION).

So, the application can always, after the SSL_accept() completes, look at the "official" ticket app-data from the actual session.

@vdukhovni
Copy link

Should we close out this PR and deal with the semantics of the appdata separately, or take it across the line here? I am concerned that with TLS 1.3 the callback perhaps happens too early. It could probably be delayed until the binder is checked? This would also mean that indeed we pick a single ticket for the callback to look at simplifying the model. But on the other hand, the NO_DECRYPT et. al. cases go away. Frankly that too might be a win. How important is it for the callback to see the failure cases, and why these particular failure cases and not some other ways in which the handshake might be anomalous.

Risking Matt's enduring patience to keep bending this PR to the ever-evolving requirements, I ask you to reconsider the design. I think that none of the failure cases should be exposed to the callback. Just give a valid ticket that will otherwise be used, and let it override use and/or re-issue.

@richsalz
Copy link
Contributor

Seeing failures can be important, to be able to report to DoS-mitigating systems, for example.

@tmshort
Copy link
Contributor

tmshort commented May 10, 2018

The failure cases are useful for reporting (e.g. CDN). The current design allows the application to guide the choice of ticket use and ticket validity (especially if there are multiple tickets), which was a big part of the reasoning behind this. So, moving the dec_cb to later very much changes the intended use.

@mattcaswell
Copy link
Member Author

I think it is not a problem to call the callback before the binder is checked. The binder check should always pass. If it doesn't then this is a protocol error and the handshake will fail.

@vdukhovni
Copy link

Right good point, it seemed like we might fall back to a full handshake, but, since per the TLS 1.3 spec, we always abort, that removes the issue.

@vdukhovni
Copy link

So my final observation is that we pass all PSKs to both the external PSK callback and to the session ticket callback, and see who bites. We can't presently easily tell them apart. Perhaps we should (in TLS 1.3 and later) prepend a "magic" prefix to resumption PSKs that will distinguish them from external PSKs, we can add it when vending the (therefore slightly longer) session ticket, and look for it when processing TLS PSKs, allowing us to pass only the right type of PSK to each callback.

How about: "\0STRPSK\0". The two NULLs make for less likely external PSK identity names, and "STRPSK" would short for "session ticket resumption PSK". This is a 64-bit value and can be tested efficiently. (We can bikeshed this prefix if desired. :-) Any comments on using a prefix to tag resumption PSKs?

@richsalz
Copy link
Contributor

I'd like to know what other stacks do and why or why not they distinguish.

@vdukhovni
Copy link

vdukhovni commented May 10, 2018

@richsalz I'd like to know what other stacks do and why or why not they distinguish.

I already asked on the TLS WG list. So far ekr has said that NSS does not support external PSKs, which makes his life easy. What should have been the case is that the extension carried a small int to distinguish different PSK types, with (0) for resumption and (1) for external. But he did not need that, and nobody else spoke up, so we need to make our own ad-hoc heuristics... :-(

@vdukhovni
Copy link

It would be good to hear from more than just ekr, not sure whether anyone else will answer though...

@vdukhovni
Copy link

vdukhovni commented May 10, 2018

Sounds like we're almost ready to close out this PR. I'll do a final read-through, and expect to signoff later today. If we do decide to prefix resumption PSKs for TLS 1.3 that can probably be done separately. So far only ekr is answering, with no actionable information. NSS: no external PSKs, Mint: no resumption. :-(

Matt any thoughts on this last topic?

* If s->tls_session_secret_cb is set and we're not doing TLSv1.3 then we are
* expecting a pre-shared key ciphersuite, in which case we have no use for
* session tickets and one will never be decrypted, nor will
* s->ext.ticket_expected be set to 1.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this true? Why can't we have a session_secret_cb and still be willing to do non-PSK handshakes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

session_secret_cb is a bit odd. It is only there to support EAP-FAST. This has its own special implementation of pre-shared keys not based on the standard mechanism. If we're in an EAP-FAST setting then it only makes sense to do the EAP-FAST variant of TLS which implies using the EAP-FAST pre shared keys. It also implies special semantics for session tickets. Those tickets are not provisioned via the NewSessionTicket mechanism - the provisioning is done out-of-band. Incoming tickets, since they were not generated by OpenSSL in the first place, are not understood by us either. Therefore we do not decrypt them, nor do we set ticket_expected to 1 (which would have the impact of causing a NewSessionTicket to be sent).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm about to sign-off on the PR, but I'd like to ask you to add a bit of commentary about this at these places, and perhaps above the definition of SSL_set_session_secret_cb(). Perhaps separately some day document this interface, unless it is intended to be only for users "in the know"...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to ask you to add a bit of commentary about this at these places, and perhaps above the definition of SSL_set_session_secret_cb().

Will do - but in a separate PR.

* abbreviated handshake based on external mechanism to
* calculate the master secret later.
*/
ret = SSL_TICKET_NO_DECRYPT;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing the above question, why does SSL_set_session_secret_cb() (sadly undocumented) preclude the use of resumption of non-PSK sessions (assuming of course non-PSK sessions are allowed to coexist with that callback set).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because EAP-FAST implies special semantics for session tickets. These tickets were not generated by OpenSSL in the first place, so we cannot decrypt them.

@vdukhovni
Copy link

I thought I'd be done, but I'm confused about the semantics of session_secret_cb. I hope that won't be too big a rathole (or at all).

@vdukhovni
Copy link

Thanks everyone. On to other things...

@mattcaswell
Copy link
Member Author

Phew! Pushed! Thanks!

levitte pushed a commit that referenced this pull request May 11, 2018
When a server call SSL_write_early_data() to write to an unauthenticated
client the buffering BIO is still in place, so we should ensure we flush
the write.

Reviewed-by: Viktor Dukhovni <viktor@openssl.org>
(Merged from #6198)
levitte pushed a commit that referenced this pull request May 11, 2018
The return value from the ticket_key callback was not properly handled in
TLSv1.3, so that a ticket was *always* renewed even if the callback
requested that it should not be.

Also the ticket decrypt callback was not being called at all in TLSv1.3.

Reviewed-by: Viktor Dukhovni <viktor@openssl.org>
(Merged from #6198)
levitte pushed a commit that referenced this pull request May 11, 2018
Reviewed-by: Viktor Dukhovni <viktor@openssl.org>
(Merged from #6198)
levitte pushed a commit that referenced this pull request May 11, 2018
Reviewed-by: Viktor Dukhovni <viktor@openssl.org>
(Merged from #6198)
levitte pushed a commit that referenced this pull request May 11, 2018
The recent change in behaviour where you do not get a NewSessionTicket
message sent if you established the connection using a PSK caused a mem
leak to be triggered in sslapitest. It was actually a latent bug and we
were just lucky we never hit it before. The problem is due to complexity
with the way PSK sessions were set up in the early_data tests. PSK session
reference counting was handled differently to normal session reference
counting. This meant there were lots of special cases in the code where
we don't free a session if it is a PSK. It makes things easier if we just
handle PSK reference counts in the same way as other session reference
counts, and then we can remove all of the special case code.

Reviewed-by: Viktor Dukhovni <viktor@openssl.org>
(Merged from #6198)
levitte pushed a commit that referenced this pull request May 11, 2018
Don't call the decrypt ticket callback if we've already encountered a
fatal error. Do call it if we have an empty ticket present.

Change the return code to have 5 distinct returns codes and separate it
from the input status value.

Reviewed-by: Viktor Dukhovni <viktor@openssl.org>
(Merged from #6198)
@kaduk
Copy link
Contributor

kaduk commented May 14, 2018

I think it is not a problem to call the callback before the binder is checked. The binder check should always pass. If it doesn't then this is a protocol error and the handshake will fail.

I think I misread this the first time, so just to double-check: we implement the suggested behavior from the spec, that we decide which PSK we will attempt to use, then attempt to validate the binder. If the binder validation fails, we abort the handshake, so it is "safe" to grab appdata before validating the binder because the session object will be destroyed if the validation fails.

N.B. that scare-quotes are needed around "safe", because the appdata callback may do other things in the application's control flow that have externally visible actions! So I think I would still prefer to have the appdata callback run only after binder validation.

@mattcaswell
Copy link
Member Author

I think I misread this the first time, so just to double-check: we implement the suggested behavior from the spec, that we decide which PSK we will attempt to use, then attempt to validate the binder. If the binder validation fails, we abort the handshake, so it is "safe" to grab appdata before validating the binder because the session object will be destroyed if the validation fails.

Correct.

N.B. that scare-quotes are needed around "safe", because the appdata callback may do other things in the application's control flow that have externally visible actions! So I think I would still prefer to have the appdata callback run only after binder validation.

This is a bit of chicken-and-egg situation. The callback forms part of the decision making process about which ticket we accept. Therefore it could be called for each ticket presented to us until we find an acceptable one. As you point out the spec's suggested behaviour is to select the ticket first and only validate the binder for the one you selected. But if we want to validate the binder before we call the callback then that necessarily implies going against the suggested behaviour.

In reality of course we have many callbacks that can be invoked during the processing of a handshake, any of which might have externally visible side-effects, but before we know that the connection is successful. Perhaps there should be some generic guidance somewhere for callback writers that points this issue out.

@tmshort
Copy link
Contributor

tmshort commented May 14, 2018

To limit callback side-effects, the SSL_SESSION and SSL arguments can be made const, but that would limit their usefulness. Invoking the callback after the binder limits it's usefulness in assisting selection of the session. I agree with @mattcaswell, there are a number of places where callbacks can cause side-effects, they all can't be eliminated because someone writes a bug in their callback.

@vdukhovni
Copy link

Yes, at most a warning about side-effects that persist beyond the present SSL session. At ticket decrypt time we haven't even seen the finished message, and can't rule out MiTM, etc. so there's nothing here that's specific to this particular extension, and such global side-effects seem unlikely...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch: master Merge to master branch branch: 1.1.1 Merge to OpenSSL_1_1_1-stable branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants