Closing a TLS socket's input stream unexpectedly breaks its output stream as well #906
This is not a bug, it is a design decision. Shutdown/close of a stream issues a "proper" TLS shutdown, which effectively closes both directions of communication. Specifically, per https://datatracker.ietf.org/doc/html/rfc2246#section-7.2.1, one may not close only half of the communication channel for a TLS socket, except as the initiator of a full close - and even then you should really wait for the peer's acknowledging close_notify.
This is a bug in Seastar - either in Seastar's TLS or HTTP layer, but it's not "not a bug": The Seastar HTTP server code, as it is currently written, has one fiber that reads requests and builds a list of replies to send, and a second fiber which actually sends these replies. The read fiber closes what it thinks is its read-only input stream when it doesn't want to read any more - but at that point, the reply fiber might not yet have sent the last response. The reply fiber still holds on to what it thinks is its write-only output stream, and successfully (it thinks) writes to it.

You can say that the HTTP server can be rewritten not to do this - we can move the close() of both input and output streams until after both fibers are done. But I think we shouldn't have to do this - I claim that the TLS API is broken and should be fixed instead: TLS gives you separate input and output streams, and the Seastar API dictates that you must close each of them. If you make the change I'm proposing (TLS should close the session only when both input and output streams are closed), the HTTP server works as-is.
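The proposed semantics can be sketched in plain C++. This is an illustrative sketch only, not Seastar's actual API: the session issues the real TLS close_notify only once both half-closes have been requested, so a write after the input stream is closed still works.

```cpp
// Hypothetical sketch (names are illustrative, not Seastar's API):
// a TLS session that defers the real TLS shutdown until *both*
// directions have been closed.
class tls_session_sketch {
    bool _input_closed = false;
    bool _output_closed = false;
    int _shutdowns_sent = 0;   // counts real close_notify shutdowns issued
    void maybe_shutdown() {
        // Only when both half-closes have happened do we send the
        // close_notify that tears down both directions.
        if (_input_closed && _output_closed) {
            ++_shutdowns_sent;
        }
    }
public:
    void close_input()  { _input_closed = true;  maybe_shutdown(); }
    void close_output() { _output_closed = true; maybe_shutdown(); }
    // The output direction stays usable until it is itself closed,
    // regardless of what happened to the input direction.
    bool output_usable() const { return !_output_closed; }
    int shutdowns_sent() const { return _shutdowns_sent; }
};
```

With these semantics, the read fiber's close of the input stream no longer races with the reply fiber's last write; the full TLS shutdown happens exactly once, when the second stream is closed.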
While I agree with you mostly, the code is the way it is because otherwise certain parts of a certain database break (or broke) when using transparent TLS, because they did not properly close both streams (even while having them), resulting in constant TLS warnings about improperly terminated connections.
I think we should make this change - it's better than fixing httpd to work around this problem and also documenting the problem for other users to know about. In any case we need to make one of these changes to Seastar. One thing you can do is add a log message whenever a TLS session is destroyed without being closed first, so code that has this problem (i.e., closes one of the streams but forgets to close the other) becomes aware that it needs to be fixed.
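A minimal sketch of such a diagnostic, assuming a guard object owned by the session (the names here are hypothetical, and Seastar would report through its logger rather than stderr):

```cpp
#include <cstdio>

// Hypothetical sketch: warn when a TLS session is destroyed without
// having been properly closed first.
class session_close_guard {
    bool _closed = false;
public:
    // Counts warnings emitted, so the diagnostic is observable in tests.
    static inline int warnings = 0;

    void mark_closed() { _closed = true; }

    ~session_close_guard() {
        if (!_closed) {
            ++warnings;
            // Stand-in for Seastar's logger in this sketch.
            std::fprintf(stderr,
                "warning: TLS session destroyed without close()\n");
        }
    }
};
```

Destroying a session without closing it produces a warning; a properly closed session destructs silently, so well-behaved code sees no noise.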
It turns out that this bug was already discovered more than four years ago, in #232. It was fixed for normal TCP sockets, but never for TLS sockets.
Note to self: If we don't fix this issue quickly (like #232, which has been gathering dust for four years), we should quickly:
@nyh so what's the verdict? https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/835/testReport/alternator_tests/AlternatorTest/test_tls_connection/ keeps failing... |
@bhalevy I'm not familiar with this dtest, but I don't see how it can be related to this issue, which is caused by code (e.g., the Seastar HTTP server's error handling) that writes a reply in parallel with closing the connection. Unless the probing done by this test gets a 404 (does it?), I don't think this is the case. Do you have Scylla logs for this failing test that may shed additional light?
@nyh the node log is https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/835/artifact/logs-all.release.2/1623060985816_alternator_tests.AlternatorTest.test_tls_connection/node1.log The test merely probes the connection using the https url:
I opened https://github.com/scylladb/scylla-dtest/issues/2170 for this dtest failure. |
This issue was first exposed in the context of scylladb/scylladb#8691, which uses Seastar's HTTP server with SSL, but it appears to be a bug in our TLS layer, not limited to that scenario.
The underlying bug appears to be that closing a TLS connection's input stream immediately closes its output stream as well, which shouldn't be the case: it's fine to continue writing to a socket's output stream after closing its input stream, and Seastar's HTTP server implementation assumes this is allowed.
In some cases, Seastar's HTTP server decides not to allow reusing a connection, and closes it. In this case, `httpd.cc`'s function `connection::read()` stops reading more requests, and ends with `return _read_buf.close();`.
However, at that point, the HTTP server still has not written the response to the socket. The HTTP server has a separate fiber sending the reply, and it turns out that if it does this after the `_read_buf.close()` has happened, the output will not be sent. Although the handler already formatted a reply, the client will never receive it, and it will appear to the client like an empty (and seemingly valid) reply.

Note that this bug does not exist with ordinary TCP sockets - the HTTP server can close the read stream and still write to the write stream. The last write only gets lost when a TLS socket is used.
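The TCP behavior the HTTP server relies on can be checked with a plain POSIX sketch, using `socketpair()` as a stand-in for a connected TCP socket (the function name is illustrative):

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <string>

// Demonstrates TCP-style half-close: after shutting down the read side
// of a socket, writes on it still succeed and the peer still receives
// the bytes. This is the assumption httpd makes after _read_buf.close().
inline std::string half_close_then_write() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return "socketpair failed";
    }
    // Close only the read direction of fds[0], analogous to what the
    // HTTP server's read fiber does when it stops reading requests.
    shutdown(fds[0], SHUT_RD);
    const char msg[] = "HTTP/1.1 200 OK";
    // Writing after the read-side shutdown still succeeds...
    ssize_t n = write(fds[0], msg, sizeof(msg) - 1);
    if (n < 0) {
        close(fds[0]); close(fds[1]);
        return "write failed";
    }
    // ...and the peer still receives the reply bytes.
    char buf[64] = {};
    ssize_t r = read(fds[1], buf, sizeof(buf));
    close(fds[0]); close(fds[1]);
    return r > 0 ? std::string(buf, r) : "read failed";
}
```

With a TLS socket in place of the plain socket pair, it is this last write that currently gets lost.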
To demonstrate that this is a race where output written to the TLS socket after `_read_buf.close()` gets lost, I added a short delay before calling `_read_buf.close()` in `httpd.cc`, and indeed, the reply no longer gets lost. The following two versions work (the first probably works by chance, because it depends on the order in which continuations get run):

CC @elcallio @amnonh