Closing a TLS socket's input stream unexpectedly breaks its output stream as well #906

Open
nyh opened this issue May 23, 2021 · 10 comments

@nyh
Contributor

nyh commented May 23, 2021

This issue was first exposed in the context of Scylla's scylladb/scylladb#8691, which uses Seastar's HTTP server with SSL, but it appears to be a bug in our TLS layer, not limited to that scenario.

The underlying bug appears to be that closing a TLS connection's input stream immediately closes its output stream as well, which shouldn't be the case: it's fine to continue writing to a socket's output stream after closing its input stream, and Seastar's HTTP server implementation assumes this is allowed:

In some cases, Seastar's HTTP server decides not to allow reusing the connection, and closes it. In this case, httpd.cc's function connection::read() stops reading more requests, and ends with:

return _read_buf.close();

However, at that point the HTTP server still has not written the response to the socket. The HTTP server has a separate fiber sending the reply, and it turns out that if it does this after the _read_buf.close() happened, the output will not be sent. Although the handler already formatted a reply, the client will never receive it, and to the client it will appear like an empty (and seemingly valid) reply.

Note that this bug does not exist with ordinary TCP sockets - the HTTP server can close the read stream and still write to the write stream. That last write only gets lost when a TLS socket is used.
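
To make the failure concrete, here is a minimal sketch of the pattern (hypothetical code for illustration, not the actual httpd implementation - the function name and the canned response are made up):

#include <seastar/core/do_with.hh>
#include <seastar/core/future.hh>
#include <seastar/net/api.hh>

// Close the input side first, then write the reply - the order httpd
// effectively uses. With a plain TCP connected_socket the reply arrives;
// with a TLS socket, closing the input stream already shut down the whole
// session, so the write is silently lost.
seastar::future<> reply_after_input_close(seastar::connected_socket cs) {
    return seastar::do_with(cs.input(), cs.output(), std::move(cs),
            [] (auto& in, auto& out, auto&) {
        return in.close().then([&out] {
            return out.write("HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n");
        }).then([&out] {
            return out.flush();
        }).then([&out] {
            return out.close();
        });
    });
}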

To demonstrate that this is a race where output written to the TLS socket after _read_buf.close() gets lost, I added a short delay before calling _read_buf.close() in httpd.cc, and indeed the reply no longer gets lost. The following two versions work (the first probably works by chance, because it depends on the order in which continuations get run):

return skip_entire_stream(_read_buf).then([this] {
    return _read_buf.close();
});

return sleep(1s).then([this] {
    return _read_buf.close();
});
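
(skip_entire_stream is not a stock Seastar helper; a minimal sketch of what it could look like - reading and discarding input until end-of-stream - is:)

#include <seastar/core/future-util.hh>
#include <seastar/core/iostream.hh>

// Drain the stream: an empty buffer from read() signals end-of-stream,
// so the subsequent close() only runs after the peer has nothing more
// to send.
static seastar::future<> skip_entire_stream(seastar::input_stream<char>& in) {
    return seastar::repeat([&in] {
        return in.read().then([] (seastar::temporary_buffer<char> buf) {
            return buf.empty() ? seastar::stop_iteration::yes
                               : seastar::stop_iteration::no;
        });
    });
}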

CC @elcallio @amnonh

@elcallio
Contributor

This is not a bug, it is a design decision. Shutdown/close of the streams issues a "proper" TLS shutdown, which effectively closes both directions of communication. See specifically https://datatracker.ietf.org/doc/html/rfc2246#section-7.2.1 - one may not close only half of the communication channel for a TLS socket, other than as the initiator of a full connection close - and even then you should really wait for the ack.

@nyh
Contributor Author

nyh commented May 24, 2021

This is a bug in Seastar - either in Seastar's TLS layer or in its HTTP server - but it's not "not a bug":

The Seastar HTTP server code, as currently written, has one fiber that reads requests and builds a list of replies to send, and a second fiber which actually sends those replies. The read fiber closes what it thinks is its read-only input stream when it doesn't want to read any more - but at that point, the reply fiber might not have sent the last response yet. The reply fiber still holds on to what it thinks is its write-only output stream, and successfully (it thinks) writes to it.

You can say that the HTTP server could be rewritten not to do this - we could move the close() of both the input and output streams to after both fibers are done. But I think we shouldn't have to do this - I claim that the TLS API is broken and should be fixed instead: TLS gives you separate input and output streams, and the Seastar API dictates that you must close() each of those before you stop using them - but those "stop using them" moments don't necessarily happen at the same time. I'm not saying there is actually a way to close half a TLS connection - rather, I'm saying that the TLS code should wait until both the input and output streams have been closed before closing the session. Think of this as the user being asked to close the streams (a Seastar requirement) - not as a request to "close half a connection". We shouldn't abort the session when the first of the two streams is closed.
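
A minimal sketch of the proposed semantics (the class and all names here are hypothetical, not Seastar's actual TLS implementation): the first close() of either stream is merely recorded, and only the second one triggers the real TLS shutdown.

#include <seastar/core/future.hh>

class tls_session_sketch {
    bool _input_closed = false;
    bool _output_closed = false;

    seastar::future<> shutdown() {
        // Placeholder: the real implementation would send close_notify
        // (and arguably wait for the peer's ack) and tear the session down.
        return seastar::make_ready_future<>();
    }
    seastar::future<> maybe_shutdown() {
        // Only terminate the TLS session once *both* halves are closed.
        if (_input_closed && _output_closed) {
            return shutdown();
        }
        return seastar::make_ready_future<>();
    }
public:
    seastar::future<> close_input() {
        _input_closed = true;
        return maybe_shutdown();
    }
    seastar::future<> close_output() {
        _output_closed = true;
        return maybe_shutdown();
    }
};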

If you make the change I'm proposing (TLS closes the session only when both the input and output streams have been close()ed), another benefit is "least surprise". The HTTP server was written this way - sending a reply after closing the input stream - because this works well with TCP, so nobody had to think about when to close the input stream. Making it work with TLS as well removes one more chance for a developer to be surprised when some server code they are writing suddenly doesn't work with TLS.

@elcallio
Contributor

While I mostly agree with you, the code is the way it is because otherwise certain parts of a certain database break/broke when using transparent TLS, since they did not properly close both streams (even while having them), resulting in constant TLS warnings about improperly terminated connections.
I can make the change, but it might have a breaking impact on existing usages.

@nyh
Contributor Author

nyh commented May 24, 2021

> While I mostly agree with you, the code is the way it is because otherwise certain parts of a certain database break/broke when using transparent TLS, since they did not properly close both streams (even while having them), resulting in constant TLS warnings about improperly terminated connections.
> I can make the change, but it might have a breaking impact on existing usages.

I think we should make this change - I think it's better than fixing httpd to work around the problem and also documenting the problem for other users to be aware of. In any case, we need to make one of these changes to Seastar.

One thing you could do is add a log message whenever a TLS session is destroyed without being closed first, so code that has this problem (i.e., closes one of the streams but forgets to close the other) becomes aware that it needs to be fixed.
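
A sketch of that diagnostic, reusing the hypothetical session sketch from above (seastar::logger is real; everything else here is made up for illustration):

#include <seastar/util/log.hh>

static seastar::logger tls_log("tls");

class tls_session_sketch {
    bool _input_closed = false;
    bool _output_closed = false;
public:
    // ... close_input()/close_output() as in the earlier sketch ...
    ~tls_session_sketch() {
        // Complain about code that forgot to close() one of the streams.
        if (!_input_closed || !_output_closed) {
            tls_log.warn("TLS session destroyed without closing both streams"
                         " (input closed: {}, output closed: {})",
                         _input_closed, _output_closed);
        }
    }
};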

@nyh
Contributor Author

nyh commented May 24, 2021

It turns out that this bug was already discovered more than four years ago, in #232. It was fixed for ordinary TCP sockets, but never for TLS sockets.

@nyh
Contributor Author

nyh commented May 24, 2021

Note to self: if we don't fix this issue quickly (the way #232 has been gathering dust for four years), we should at least quickly:

  1. Document this difference in a comment somewhere (I'm not even sure where).
  2. Fix the Seastar HTTP server not to rely on whether or not this issue is fixed. In other words, the HTTP server should close() both the input and output streams together, in the same place, once it knows that neither is needed any more (see the sketch below).
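
A sketch of point 2 (assuming the connection keeps its output stream in a member alongside the existing _read_buf; the _write_buf name and the helper are guesses, not httpd's actual code):

// Close both halves in one place, after the read and reply fibers are
// both done, instead of relying on half-close semantics:
seastar::future<> connection::close_streams() {
    return _read_buf.close().then([this] {
        return _write_buf.close();
    });
}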

@bhalevy
Member

bhalevy commented Jun 8, 2021

@nyh
Contributor Author

nyh commented Jun 8, 2021

@bhalevy I'm not familiar with this dtest, but I don't see how it can be related to this issue, which is caused by code (e.g., the Seastar HTTP server's error handling) that writes a reply in parallel with closing the connection. Unless the probing done by this test gets a 404 (does it?), I don't think that is the case. Do you have Scylla logs for this failing test that might shed additional light?

@bhalevy
Member

bhalevy commented Jun 8, 2021

@nyh the node log is https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/835/artifact/logs-all.release.2/1623060985816_alternator_tests.AlternatorTest.test_tls_connection/node1.log

The test merely probes the connection using the HTTPS URL:
https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/835/testReport/alternator_tests/AlternatorTest/test_tls_connection/

Traceback (most recent call last):
  File "/usr/lib64/python3.7/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/lib64/python3.7/unittest/case.py", line 645, in run
    testMethod()
  File "/jenkins/workspace/scylla-master/dtest-release/scylla-dtest/alternator_tests.py", line 757, in test_tls_connection
    self.prepare_dynamodb_cluster(num_of_nodes=3, is_encrypted=True)
  File "/jenkins/workspace/scylla-master/dtest-release/scylla-dtest/alternator_utils.py", line 257, in prepare_dynamodb_cluster
    self._add_api_for_node(node=node, timeout=timeout)
  File "/jenkins/workspace/scylla-master/dtest-release/scylla-dtest/alternator_utils.py", line 212, in _add_api_for_node
    self.wait_for_alternator(node=node, timeout=timeout)
  File "/jenkins/workspace/scylla-master/dtest-release/scylla-dtest/alternator_utils.py", line 207, in wait_for_alternator
    raise TimeoutError(f"Timed out probing {node_urls} after {timeout} seconds")
TimeoutError: Timed out probing {'node1': 'https://127.0.21.1:8043/'} after 300 seconds

@nyh
Contributor Author

nyh commented Jun 8, 2021

I opened https://github.com/scylladb/scylla-dtest/issues/2170 for this dtest failure.
