
AFAICT openssl's strategy for handling TLS 1.3 session tickets makes it impossible to reliably implement communication patterns where the server never sends application-level data #7948

Open
njsmith opened this issue Dec 23, 2018 · 13 comments

Comments


@njsmith njsmith commented Dec 23, 2018

I maintain a Python networking library called Trio, and I've been struggling to get it working with openssl v1.1.1/TLS 1.3. We use openssl with memory BIOs and have an extensive test suite that passes with earlier openssl versions, but hits a number of problems after upgrading to v1.1.1. The main issue seems to be the session tickets that openssl sends after the handshake in server mode, and how they affect connections where the client never calls SSL_read.
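For readers unfamiliar with the memory-BIO pattern mentioned above, here's a minimal client-side sketch using Python's stdlib `ssl` module (not Trio's actual internals): the TLS state machine runs against in-memory buffers, and the application is responsible for ferrying bytes between those buffers and the network.

```python
import ssl

# Sketch of the memory-BIO pattern (client side). Certificate checking is
# disabled because this demo never talks to a real peer.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

incoming = ssl.MemoryBIO()   # bytes from the network -> TLS engine
outgoing = ssl.MemoryBIO()   # bytes from the TLS engine -> network
tls = ctx.wrap_bio(incoming, outgoing)

try:
    tls.do_handshake()
except ssl.SSLWantReadError:
    pass  # expected: no server response has been fed into `incoming` yet

hello = outgoing.read()      # the ClientHello, waiting for us to ship it
print(hello[0] == 0x16)      # True: 0x16 is the TLS handshake record type
```

Everything openssl wants to send, including post-handshake session tickets, shows up in `outgoing` the same way, which is why the ticket behavior discussed below is visible to libraries like Trio.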

Due diligence: I found these previous issues/PRs that are all about the exact same issue that I'm facing, but having read them carefully I still can't figure out how to make this work: #6342, #6904, #6944. Also, for reference, this is the Trio bug: python-trio/trio#819

TCP (as I understand it)

Let's ignore TLS for the moment and just talk about TCP. Suppose we have a client that connects, sends some data, and then disconnects, without ever calling recv. In pseudo-code:

# Socket client
sock = connect(...)
while ...:
    sock.send(...)
sock.close()

# Socket server (safe)
sock = accept()
while ...:
    sock.recv(...)
    if eof:
        sock.close()
        break

This isn't a terribly common pattern, but it's perfectly legal and reliable. Call this the safe pattern.

But, TCP has a gotcha: suppose we have the server send a bit of data, and change nothing else. In particular the client still never calls recv:

# Socket server (unsafe)
sock = accept()
sock.send(<one byte of data>)   # <--- This is the only line that's different
while ...:
    sock.recv(...)
    if eof:
        sock.close()
        break

Now, this is a little funny looking, because the server is sending some data that the client will never read. But, whatever, that shouldn't affect anything, right? This shouldn't affect the data being sent from the client→server, right?

Well, that would be logical, but it's wrong! In the unsafe pattern, an arbitrary amount of the data sent by the client can be lost, even though the client code didn't change at all.

This happens because of arcane details of how TCP works: in the safe pattern, when the client calls close, the client's kernel sends a FIN packet, and the server's kernel queues that up behind all the other data in the server's receive buffer, and everything proceeds in an orderly fashion. But in the unsafe pattern, the client has incoming data. And if there's incoming data before or after a close, then RFC 1122 says:

If such a host issues a CLOSE call while received data is still pending in TCP, or if new data is received after CLOSE is called, its TCP SHOULD send a RST to show that data was lost.

So in this case, the client's kernel might send an RST, instead of or in addition to the FIN. And then when the server's kernel sees an RST packet, it discards all buffered data. So any data the client sent that's still sitting in the server's kernel buffers disappears forever.
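This data loss can be demonstrated with plain stdlib sockets. The RFC 1122 close-with-unread-data behavior is timing-dependent, so this hedged sketch forces the RST deterministically instead, via SO_LINGER(onoff=1, linger=0), which makes close() send an RST rather than a FIN:

```python
import socket
import struct
import threading
import time

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
result = {}

def server():
    conn, _ = listener.accept()
    time.sleep(0.5)             # let the client's RST arrive before we read
    try:
        result["data"] = conn.recv(65536)
    except ConnectionResetError:
        result["data"] = None   # the kernel discarded the buffered bytes
    conn.close()

t = threading.Thread(target=server)
t.start()
client = socket.create_connection(listener.getsockname())
client.sendall(b"x" * 1024)     # lands in the server's kernel receive buffer
client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                  struct.pack("ii", 1, 0))
client.close()                  # close() now sends RST, not FIN
t.join()
print(result["data"])           # often None on Linux: the 1024 bytes were lost
```

The exact outcome is kernel-dependent, but on common Linux stacks the server's recv fails with ECONNRESET even though the client successfully sent every byte.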


What does this have to do with TLS?

Generally speaking, it should be possible to take any application that uses raw TCP, and switch it to use TLS-over-TCP instead, right? (With the notable exception of half-closed connections, but never mind.) So let's port our client/server to use TLS:

# TLS-over-TCP client
tlssock = connect(...)
tlssock.do_handshake()
while ...:
    tlssock.send(...)
tlssock.send_close_notify()  # Often skipped in practice, but let's be standards-compliant
tlssock.close_tcp()

# TLS-over-TCP server ("safe")
tlssock = accept()
tlssock.do_handshake()
while ...:
    tlssock.recv(...)
    if eof:
        tlssock.send_close_notify()
        tlssock.close_tcp()
        break

Now here's the issue: with TLS 1.2 and earlier, if we follow the "safe pattern" at the application layer, like this, then openssl will ultimately translate that into the "safe pattern" at the TCP layer. But with TLS 1.3, openssl's habit of sending session tickets after the handshake means that this exact same code now produces the unsafe pattern at the TCP layer. The server→client session tickets could cause the client→server application data to be lost.

#6944 changes how openssl reacts to getting notified of a client close while sending session tickets, so that the server can keep calling recv. But that doesn't help if the kernel has already discarded the buffer that recv is trying to read out of. The problem here is all inside the TCP stacks; there's nothing openssl can do about it, except avoid sending the session tickets in the first place.

There's also a secondary problem, but it's more theoretical: if the server→client buffer is small enough, then this code could deadlock at the handshake – the server's call to SSL_do_handshake won't return until the client calls SSL_read, but the client is calling SSL_send, which will eventually block until the server calls SSL_read, but the server can't because it's waiting for the client to call SSL_read... I'm not sure if there are any realistic cases where people use buffers that are small enough to trigger this though. (We have some torture tests with small buffers to flush out problems like this, which of course did catch it.)
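The deadlock *shape* (not TLS itself) can be sketched with stdlib sockets: two peers both writing into small bounded buffers while neither reads. Under TLS 1.3, the server's post-handshake session-ticket write can play the role of one of these writers. Timeouts stand in for "blocked forever" so the demo terminates:

```python
import socket

a, b = socket.socketpair()
for s in (a, b):
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)
    s.settimeout(0.5)

def send_until_blocked(sock):
    sent = 0
    try:
        while True:
            sent += sock.send(b"x" * 4096)
    except socket.timeout:
        return sent             # buffer full and the peer isn't reading

ra = send_until_blocked(a)      # the "client" wedges mid-send...
rb = send_until_blocked(b)      # ...and so does the "server"
print(ra > 0, rb > 0)           # True True
```

In a real blocking program there are no timeouts, so once both sides are wedged mid-send, neither ever returns.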

What to do?

Of course we can disable session tickets, but this is difficult in our case because Python's openssl bindings don't expose SSL_set_num_tickets and SSL_CTX_set_session_ticket_cb. Also, as a generic networking library, we wouldn't want to disable session tickets in general. But we also really don't want to have to explain to our users that enabling TLS is just a matter of switching from a SocketStream to an SSLStream with an identical API, except that if you happen to know that your client might not ever read data, then on the server side you have to call this special API before the handshake. That's a super leaky abstraction.

We could do full bidirectional-shutdown, as suggested here. This seems problematic, though – the RFCs explicitly say that bidi shutdown is never required, bidi shutdown has always been useless before, and I've never heard of any existing software that does it. Trio doesn't have any way to force its peers to use it. If OpenSSL is going to say that all TLS 1.3 implementations that want to interoperate with OpenSSL have to switch to bidi shutdown, then that seems like a huge ask. Also, isn't TLS 1.3 supposed to reduce the number of round-trips we make?

The best idea I have is:

  • don't send tickets automatically after the TLS 1.3 handshake
  • automatically send tickets the first time the server calls SSL_write
  • also provide an explicit SSL_write_tickets function to send tickets immediately
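To make the proposal concrete, here's how the "server never sends application data" case might look under it, in the same pseudocode style as above (`write_tickets` is the hypothetical function from the last bullet, not an existing openssl API):

```
# Hypothetical TLS-over-TCP server under the proposed heuristic
tlssock = accept()
tlssock.do_handshake()      # proposal: no tickets sent here
tlssock.write_tickets()     # explicit opt-in for servers that never call send
while ...:
    tlssock.recv(...)
    if eof:
        tlssock.send_close_notify()
        tlssock.close_tcp()
        break
```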
@kroeckx kroeckx commented Dec 23, 2018

Due diligence: I found these previous issues/PRs that are all about the exact same issue that I'm facing, but having read them carefully I still can't figure out how to make this work: #6342, #6904, #6944. Also, for reference, this is the Trio bug: python-trio/trio#819

Have you read the shutdown documentation after #7188 has been merged?

Did you read the wiki about TLS 1.3, in particular the section about session?
https://wiki.openssl.org/index.php/TLS1.3#Sessions

Generally speaking, it should be possible to take any application that uses raw TCP, and switch it to use TLS-over-TCP instead, right?

Yes. But generally speaking, any encapsulation of a protocol by another protocol causes problems. Generally, it's hard to abstract things away. And you point out yourself above that there is an unsafe pattern in TCP.

So let's port our client/server to use TLS:

# TLS-over-TCP client
tlssock = connect(...)
tlssock.do_handshake()
while ...:
    tlssock.send(...)
tlssock.send_close_notify()  # Often skipped in practice, but let's be standards-compliant
tlssock.close_tcp()

You really can't skip sending the close notify. If you skip that you're vulnerable to a truncation attack. The other side should react to that with a protocol error.

The standards-compliant case would be to receive the close notify before closing the TCP socket. The other side could send back a close notify alert; closing the connection without waiting for it can cause your unsafe TCP pattern. So this really is the unsafe TLS example.

That you need to close the TLS layer before you close the TCP layer is one of the many ways in which TLS leaks things to the application.

There's also a secondary problem, but it's more theoretical: if the server→client buffer is small enough, then this code could deadlock at the handshake – the server's call to SSL_do_handshake won't return until the client calls SSL_read, but the client is calling SSL_send, which will eventually block until the server calls SSL_read, but the server can't because it's waiting for the client to call SSL_read... I'm not sure if there are any realistic cases where people use buffers that are small enough to trigger this though. (We have some torture tests with small buffers to flush out problems like this, which of course did catch it.)

I'm sure that you can cause any TCP connection to deadlock.

We could do full bidirectional-shutdown, as suggested here. This seems problematic, though – the RFCs explicitly say that bidi shutdown is never required, bidi shutdown has always been useless before, and I've never heard of any existing software that does it. Trio doesn't have any way to force its peers to use it. If OpenSSL is going to say that all TLS 1.3 implementations that want to interoperate with OpenSSL have to switch to bidi shutdown, then that seems like a huge ask.

The only case where the bidirectional shutdown is required is when you want to resume the session. I currently don't see when only a one way shutdown shouldn't work, as long as you disable tickets.

Any TLS 1.3 implementation that wants to support resumption really is going to have the same problem. And you really want to support resumption.

So for clients that only send data and never receive any, TLS 1.3 really forces you to either disable resumption or do a bidirectional shutdown.

Also, isn't TLS 1.3 supposed to reduce the number of round-trips we make?

Only at the start of the connection, to get the data faster.

The best idea I have is:

* don't send tickets automatically after the TLS 1.3 handshake
* automatically send tickets the first time the server calls `SSL_write`

You really also want a client that only writes to support resumption. In that case if SSL_write() is never called, you can never resume the session.

@njsmith njsmith commented Dec 24, 2018

Have you read the shutdown documentation after #7188 has been merged?
Did you read the wiki about TLS 1.3, in particular the section about session?
https://wiki.openssl.org/index.php/TLS1.3#Sessions

Yes :-)

generally speaking, any encapsulation of a protocol by another protocol causes problems. Generally, it's hard to abstract things away.

It's true that this kind of encapsulation is difficult, but previous versions of openssl managed it, and current openssl manages it except when using TLS 1.3 with session tickets.

And you point it out yourself above that there is an unsafe pattern in TCP.

For sure. But it used to be that if you used the "safe pattern" at the application level, that would map onto the "safe pattern" at the TCP level, and vice-versa – that's a non-leaky abstraction. The regression is that now if you use the safe pattern at the application level, openssl will silently convert that into the unsafe pattern at the TCP level. That's a leaky abstraction.

You really can't skip sending the close notify. If you skip that you're vulnerable to a truncation attack. The other side should react to that with a protocol error.

Yeah, Trio supports two modes: by default it does unidirectional close_notify and expects unidirectional close_notify. Or, if you set https_compatible=True, then it neither sends nor expects close_notify, since this is required for interoperability with most real HTTPS servers. This is mostly orthogonal to the current issue though; the session tickets break both modes in the same way.

The standards compliant case would be to receive the close notify before closing the tcp socket. The other side could send back a close notify alert, closing the connection without waiting for it can cause your unsafe TCP pattern. So this really is the unsafe TLS example.

Actually, no! It's not at all obvious, but AFAICT it really is true that safe patterns used to map to safe patterns and vice-versa, even if you have a mix of peers using unidirectional and bidi shutdown.

In your example, say that our program implements bidi shutdown, and we're talking to a peer that does unidirectional shutdown. It sends us a close_notify, and we send one back. You're right, our close_notify may provoke them to send a RST... but the fact that we've already received their close_notify means that we've already emptied our receive buffer, so their RST can't cause any harm. Disaster is narrowly averted.

Another tricky case is when our peer performs a unidirectional shutdown while we're in the middle of sending data. In this case, the data we're sending might provoke them to send a RST back, and that might cause their close_notify to be lost, and now we don't know where the end of their data is. But, this still doesn't break the abstraction, because in this case plain TCP suffers from exactly the same problem: if they close their socket when we're in the middle of sending data, then their FIN may be lost, and we don't know where the end of their data is. So the semantics are annoying, but they're predictable and consistent regardless of whether you're using TLS.

That you need to close the TLS layer before you close the TCP layer is one of the many ways in which TLS leaks things to the application.

Not sure what you mean. Empirically, this is something that we abstracted away and it used to work fine. Trio's SSLStream and SocketStream classes both have close methods with the same semantics as far as the user's concerned; one of them sends close_notify (at least in standards compliant mode) and the other doesn't, but they don't have to care about that.

I'm sure that you can cause any TCP connection to deadlock.

??? "Writing reliable network protocols is impossible, so there's no point in trying" is not the attitude I was hoping to hear from an openssl dev :-(.

Empirically, we have plenty of protocol implementations that survive our deadlock torture test just fine, including previous versions of openssl.

And actually, now that I think about it, this deadlock thing is actually more of a practical problem than I realized, because it means that with openssl 1.1.1 we can no longer use the deadlock torture test to test protocols that run on top of TLS :-(

The only case where the bidirectional shutdown is required is when you want to resume the session. I currently don't see when only a one way shutdown shouldn't work, as long as you disable tickets.

Right, but here I'm talking about the case where you don't disable tickets. If tickets aren't disabled, then @mattcaswell pointed out in #6904 that the example client/server can be made to work again by turning on bidi shutdown on the client, so it's a possible workaround.

Any TLS 1.3 implementation that wants to support resumption really is going to have the same problem. And you really want to support resumption.

So for clients that only send data and never receive any, TLS 1.3 really forces you to either disable resumption or do a bidirectional shutdown.

My understanding is that bidi shutdown on the client is sufficient to prevent data loss in cases like my example client/server, but it isn't sufficient to support resumption, because openssl doesn't process session tickets that it receives after sending a close_notify.

In any case, sure, it would be nice if our example client/server could support resumption. It's too bad that with TLS 1.3, they can't, without further changes. But that doesn't mean it's OK to stop transmitting their data reliably! Our client/server here are only relying on the traditional guarantees made by TCP and all previous versions of openssl.

You really also want a client that only writes to support resumption. In that case if SSL_write() is never called, you can never resume the session.

In your quote you dropped my third bullet point, which partially addresses this?

Besides... current openssl has exactly the same flaw in practice: if you have a protocol where the server never calls SSL_write, then presumably the client never calls SSL_read, and so it doesn't matter whether the server sends the tickets during the handshake, the client's not going to read them anyway.

@kroeckx kroeckx commented Dec 24, 2018

And you point it out yourself above that there is an unsafe pattern in TCP.

For sure. But it used to be that if you used the "safe pattern" at the application level, that would map onto the "safe pattern" at the TCP level, and vice-versa – that's a non-leaky abstraction. The regression is that now if you use the safe pattern at the application level, openssl will silently convert that into the unsafe pattern at the TCP level. That's a leaky abstraction.

The safe pattern for both TCP and TLS is to make sure that both ends agree all data has been sent before closing the connection. TLS has a mechanism for that, and it's to send the close notify in both directions.

You really can't skip sending the close notify. If you skip that you're vulnerable to a truncation attack. The other side should react to that with a protocol error.

Yeah, Trio supports two modes: by default it does unidirectional close_notify and expects unidirectional close_notify. Or, if you set https_compatible=True, then it neither sends nor expects close_notify, since this is required for interoperability with most real HTTPS servers. This is mostly orthogonal to the current issue though; the session tickets break both modes in the same way.

I don't know enough about http(s), but is it always clear from the protocol when it's finished sending the data? Is a truncation attack possible?

In https tests I've done I've seen any combination of close notify that's possible, including no and bidirectional.

The standards compliant case would be to receive the close notify before closing the tcp socket. The other side could send back a close notify alert, closing the connection without waiting for it can cause your unsafe TCP pattern. So this really is the unsafe TLS example.

Actually, no! It's not at all obvious, but AFAICT it really is true that safe patterns used to map to safe patterns and vice-versa, even if you have a mix of peers using unidirectional and bidi shutdown.

In your example, say that our program implements bidi shutdown, and we're talking to a peer that does unidirectional shutdown. It sends us a close_notify, and we send one back. You're right, our close_notify may provoke them to send a RST... but the fact that we've already received their close_notify means that we've already emptied our receive buffer, so their RST can't cause any harm. Disaster is narrowly averted.

TLS 1.3 makes it very explicit that sending a close notify only closes the write direction. The other peer is still allowed to send application data back. If it still wants to send application data, and not yet the close notify, you do have a problem. In TCP you can also do a shutdown(SHUT_WR), the connection isn't fully closed.

OpenSSL has supported that for a very long time, even when the older TLS standards didn't say that was supported.
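The TCP half-close analogy above can be shown with plain stdlib sockets: after the client calls shutdown(SHUT_WR), its FIN closes only the write direction, and the server can still send data back that the client can still read.

```python
import socket
import threading

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
out = {}

def server():
    conn, _ = listener.accept()
    data = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:           # client's FIN: its write side is closed
            break
        data += chunk
    conn.sendall(b"reply")      # our write direction still works
    conn.close()
    out["got"] = data

t = threading.Thread(target=server)
t.start()
client = socket.create_connection(listener.getsockname())
client.sendall(b"request")
client.shutdown(socket.SHUT_WR)  # half-close: we can no longer send, only read
reply = client.recv(4096)
client.close()
t.join()
print(out["got"], reply)         # b'request' b'reply'
```

TLS 1.3's "close_notify only closes the write direction" rule gives the TLS layer the same half-close semantics.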

I'm sure that you can cause any TCP connection to deadlock.

??? "Writing reliable network protocols is impossible, so there's no point in trying" is not the attitude I was hoping to hear from an openssl dev :-(.

That's not at all what I'm saying. I'm saying that if both sides of a connection are doing the wrong thing, it's very easy to cause a deadlock. If you think there is a bug in OpenSSL that makes it deadlock while talking to itself, file a bug. Example code would be nice.

My understanding is that bidi shutdown on the client is sufficient to prevent data loss in cases like my example client/server, but it isn't sufficient to support resumption, because openssl doesn't process session tickets that it receives after sending a close_notify.

We've fixed processing session tickets after sending close notify in #7114.

You really also want a client that only writes to support resumption. In that case if SSL_write() is never called, you can never resume the session.

In your quote you dropped my third bullet point, which partially addresses this?

Oh, you want a server that never calls SSL_write() to call some other function instead if it wants to be able to resume? That suddenly means servers stop supporting resumption by default, and even if a server changes to do it, the client still has the same problem: it needs to read the session.

@njsmith njsmith commented Dec 27, 2018

TLS 1.3 makes it very explicit that sending a close notify only closes the write direction

Oh wow, I'd missed this, and it's super exciting, thank you! That fixes the one place where it used to be impossible to abstract away the difference between TLS and other byte-stream transports. That's really my main concern here... I want to be able to write protocol code in a generic way, that works the same on TCP, TLS, or whatever other transport makes sense. The openssl 1.1.1 session ticket handling breaks that guarantee.

The safe pattern for both TCP and TLS is to make sure that both ends agree all data has been sent before closing the connection. TLS has a mechanism for that, and it's to send the close notify in both directions.

In the past it's always been safe to take any program that worked correctly using plain TCP, and switch it to use TLS with unidirectional close_notify. I guess you can argue that unidirectional close_notify was always "wrong" in some vague existential sense, but it was explicitly allowed by the RFCs and it worked in practice, which seems more important to me.

That's not at all what I'm saying. I'm saying that if both sides of a connection are doing the wrong thing, it's very easy to cause a deadlock. If you think there is a bug in OpenSSL that makes it deadlock while talking to itself, file a bug. Example code would be nice.

This is the bug you are asking me to file :-). My test suite literally started deadlocking when I upgraded to 1.1.1. It's a test where the client does SSL_do_handshake → SSL_write, while the server does SSL_do_handshake → SSL_read, and the transport layer between them has minimal buffering. This is the same communication pattern used by e.g. HTTP/1.1, and it worked fine with previous versions of openssl.

The deadlock happens because the server's SSL_do_handshake doesn't return until it finishes writing out the session tickets, but by the time the server starts trying to write the tickets, the client's SSL_do_handshake has already completed and the client has switched to SSL_write. Neither write can complete because neither side is reading → deadlock.

I could make a standalone reproducer (in Python, say), if that would be helpful, but the problem is very straightforward. As long as the server's SSL_do_handshake sends session tickets but the client's SSL_do_handshake doesn't read them, you have a deadlock hazard here.

We've fixed processing session tickets after sending close notify in #7114.

Oh, excellent, thanks for the update.

Oh, you want a server that never calls SSL_write() to call some other function instead if it wants to be able to resume? That suddenly changes servers not to support resumption any more, and if the server changes to do it, the client still has the same problem that it needs to read the session.

I think this is confusing and we need to break it down by cases :-)

Case 1: client/server where the server sends application data to the client:

  • On TLS 1.2 and earlier: works correctly and supports resumption
  • On TLS 1.3 with openssl's current session ticket heuristic: works correctly and supports resumption
  • On TLS 1.3 with the alternative session ticket heuristic I suggested: works correctly and supports resumption

Case 2: client/server where the server never sends application data to client, with non-bidirectional close:

  • On TLS 1.2 and earlier: works correctly and supports resumption
  • On TLS 1.3 with openssl's current session ticket heuristic: can lose data and doesn't support resumption. Can be fixed by modifying the client to use bidi shutdown.
  • On TLS 1.3 with the alternative session ticket heuristic I suggested: works correctly, but doesn't support resumption. Can be fixed by modifying the client to use bidi shutdown + the server to call SSL_write_tickets.

Case 3: client/server where the server never sends application data to the client, with bidi close:

  • On TLS 1.2 and earlier: works correctly and supports resumption
  • On TLS 1.3 with openssl's current session ticket heuristic: works correctly and supports resumption
  • On TLS 1.3 with the alternative session ticket heuristic I suggested: works correctly, but doesn't support resumption. Can be fixed by modifying the server to call SSL_write_tickets.

So both strategies have some downsides compared to the old TLS 1.2 way of doing things, but make slightly different trade-offs about who suffers if they don't update their programs, and what kind of suffering they experience. It seems like openssl is saying that it's OK if some "case 2" users lose data, because in exchange, some "case 3" users will get slightly lower latency.

My impression is that for existing apps, "case 1" is far more common than "case 2", and "case 2" is far more common than "case 3".

I don't think it's a good trade-off to sacrifice correctness in the relatively common case, in order to speed up the relatively uncommon case.

To be clear though, I'm not like, wedded to that particular proposal or anything. It's just one idea, and I totally get that these are tricky issues and TLS 1.3's new session ticket design forces implementors to make awkward trade-offs. But I think for something as fundamental and widely-used as openssl, it's worth getting the edge cases right.

@kroeckx kroeckx commented Dec 27, 2018

This comment has been minimized.

@njsmith njsmith commented Dec 27, 2018

Is this the scenario you're worried about?

  1. Client sends all the data
  2. Client sends close_notify
  3. Client starts reading from socket waiting for close_notify in response
  4. Server receives the client's close_notify
  5. Server closes the socket, without sending its own close_notify

The server here has violated the spec – you aren't supposed to just close the socket like that. So the client TLS library should report an unclean shutdown. But... beyond that, I don't think anything changes? The server did receive all the data, and the client will safely read any session tickets before it sees a clean TCP-level shutdown and reports an unclean TLS-level shutdown.

I might not be understanding what you're worried about.

@kroeckx kroeckx commented Dec 27, 2018

This comment has been minimized.

@njsmith njsmith commented Dec 28, 2018

Where I feel like I'm missing something is, I don't understand how this scenario relates to session tickets :-). It's generally true that peers shouldn't violate the spec and doing so has some consequences, but the consequences don't seem to be any different here than they would be in any other case?

@kroeckx kroeckx commented Dec 28, 2018

This comment has been minimized.

@njsmith njsmith commented Dec 29, 2018

Note that your safe example of TLS with TLS 1.2 also produces the unsafe TCP pattern: the server will try to send back a close notify.

I think I already explained why this isn't true up above? I'll try explaining again in the hopes that I was just unclear, but if there's a deeper disagreement lmk.

The "unsafe TCP pattern" requires a very specific combination of things all happen together:

  1. Peer A sends data,
  2. ...and then closes the connection.
  3. Peer B sends data,
  4. ...and hasn't yet read all of the data that Peer A sent

With TLS 1.2, I think you're talking about the case where the client (peer A) sends a unidirectional close_notify and then closes its socket, and then the server (peer B) does what the spec says it should do: after it receives the client's close_notify, it sends back its own close_notify. This means that conditions 1, 2, and 3 are all satisfied... but condition 4 is not. If the server has received the client's close_notify, then by definition the server has already read all the client's data, and can't possibly lose any of it, no matter whether it sends its own close_notify back or not.

@kroeckx kroeckx commented Dec 29, 2018

This comment has been minimized.

@kroeckx kroeckx commented Dec 29, 2018

This comment has been minimized.

@njsmith njsmith commented Jan 3, 2019

But at the same time, the TCP connection is not closed properly, because the server will attempt to write a close notify back, and there will be some error. I think this is an unsafe pattern that just happens to work.

Yeah, I guess it's again a matter of definitions... I don't know if anyone intentionally designed this behavior, or if it's an accidental outcome of several different features in TCP and its common implementations. But either way, the behavior has been standard and documented for decades, and common protocols like websockets are designed around it.

Since there might be applications that do this, something needs to change. I'm currently not sure if that something is openssl, or the client application. And I currently don't know of any real affected client other than some test suite.

Yeah, I don't know of any real affected applications either, and I totally sympathize with your reluctance to change things based on a weird corner case like this. The reason I think this is important, though, is the way it breaks abstractions.

I'm not writing an application; I'm writing a generic networking library. I guess most applications that use TLS don't use the openssl C API directly, but go through some layer like Trio (my library), or the node.js tls module, or similar. With TLS 1.2, it was certainly difficult for networking library authors to understand all the details of the openssl API and to abstract away the differences between TCP vs. TLS-over-TCP, but it was possible, and that was a cost paid once by the networking libraries, instead of over and over by every application developer.

Like you, I have no idea whether any of my users are writing applications that would be broken by the new session ticket behavior, so I have to assume the worst. If I can't abstract away the differences, then I have to take on the burden of figuring out exactly what openssl does and doesn't guarantee, how that translates into the higher-level API that my library provides, and then teach all my users about these rare corner cases, just in case they happen to ever write an app that would run into them.

There's been a ton of work in recent years to make TLS more universal and accessible. Right now in Trio it's extremely simple... you write trio.open_tcp_stream(host, port) to make a regular TCP connection, and trio.open_ssl_over_tcp_stream(host, port) to make a TLS-protected TCP connection; they act identically after that. If you want to run TLS over an arbitrary transport, you just write new_stream = trio.ssl.SSLStream(transport_stream, ssl_context); exotic features like TLS over TLS are trivial. When people implement new protocols, it's very common to have bugs that sneak through testing because they never happen on loopback, and only rarely on real networks; in Trio you can flip a switch to test your protocol against simulated nasty network conditions to shake out these kinds of bugs.

But this all relies on being able to abstract away the difference between TCP and TLS. If that goes away, then it doesn't really matter that it's only in an obscure edge condition; we still risk losing all these features.

njsmith added a commit to njsmith/trio that referenced this issue Jan 7, 2019
It's currently a mess because of how openssl v1.1.1 handles session
tickets. There isn't consensus yet about whether this is an openssl
bug or what:

python-trio#819
openssl/openssl#7948
openssl/openssl#7967