
When HTTP server sends an error reply, don't close the connection before client TCP stack acks the response #1325

Open · nyh opened this issue Dec 1, 2022 · 2 comments

nyh (Contributor) commented Dec 1, 2022

Seastar's HTTP server has a content_length_limit feature. If it is enabled and a client starts sending a request whose Content-Length exceeds the limit, Seastar httpd immediately sends a "payload too large" response and closes the connection. The client might still be sending its very large request at that point, but when we close the connection after sending the response, the client is expected to read our response and handle it properly, not just report a write failure.
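Schematically, with an illustrative 1MB limit and a 1GB upload, the exchange the server intends looks like this (headers abbreviated):

```
PUT /upload HTTP/1.1
Host: example.com
Content-Length: 1073741824      <-- exceeds the configured limit

HTTP/1.1 413 Payload Too Large  <-- sent, and the connection closed,
                                    while the client is still uploading
```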

Our implementation almost works as expected, but not always. The problem is the "reset problem" explained in RFC 7230 §6.6 (https://tools.ietf.org/html/rfc7230#section-6.6):

If a server performs an immediate close of a TCP connection, there is
a significant risk that the client will not be able to read the last
HTTP response. If the server receives additional data from the
client on a fully closed connection, such as another request that was
sent by the client before receiving the server's response, the
server's TCP stack will send a reset packet to the client;
unfortunately, the reset packet might erase the client's
unacknowledged input buffers before they can be read and interpreted
by the client's HTTP parser.

To avoid the TCP reset problem, servers typically close a connection
in stages. First, the server performs a half-close by closing only
the write side of the read/write connection. The server then
continues to read from the connection until it receives a
corresponding close by the client, or until the server is reasonably
certain that its own TCP stack has received the client's
acknowledgement of the packet(s) containing the server's last
response. Finally, the server fully closes the connection.

ScyllaDB has a test which uses Seastar's content_length_limit feature and expects the HTTP client to read an orderly error response. As shown in scylladb/scylladb#12143, in some rare cases the client gets an RST, exactly as explained above, and instead of reading the error response it fails with an exception:

E           urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

To fix this, Seastar needs to do what is explained above: after sending the payload-too-large response, it shouldn't immediately close the connection, but instead half-close it (the equivalent of shutdown(SHUT_WR)) and wait until all written bytes have been acknowledged by the client's TCP stack (the equivalent of the SIOCOUTQ ioctl, but how do we wait for this event?). Only then can we fully close the connection.
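Part of the difficulty is that there is no readiness event for "send queue fully acknowledged"; at the POSIX level the closest thing is to poll. A minimal blocking sketch of the staged close, assuming Linux's SIOCOUTQ ioctl (the helper name, poll interval, and deadline are illustrative, and a real Seastar fix would need a non-blocking equivalent):

```cpp
// Illustrative POSIX-level sketch, not Seastar code: half-close, then
// poll SIOCOUTQ until the peer has ACKed everything we wrote, then close.
#include <linux/sockios.h>   // SIOCOUTQ
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <thread>

void close_after_ack(int fd, std::chrono::seconds deadline = std::chrono::seconds(10)) {
    shutdown(fd, SHUT_WR);   // half-close: send FIN, keep the read side open
    auto give_up = std::chrono::steady_clock::now() + deadline;
    for (;;) {
        // TCP keeps sent-but-unacknowledged bytes in the send queue, so a
        // SIOCOUTQ result of 0 means the client's stack ACKed our response.
        int queued = 0;
        if (ioctl(fd, SIOCOUTQ, &queued) < 0 || queued == 0) {
            break;
        }
        if (std::chrono::steady_clock::now() > give_up) {
            break;           // misbehaving client: don't wait forever
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }
    close(fd);  // per RFC 7230, now reasonably safe to fully close
}
```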

The function we need to fix is generate_error_reply_and_close(). The Seastar HTTP server implementation uses it in the content_length_limit case, and also in other cases.
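In Seastar terms, a minimal sketch of the staged close that generate_error_reply_and_close() would need might look like the following. This is not the actual httpd code: linger_close() is a hypothetical helper, it assumes the error response has already been written and flushed, and the 10-second bound on the drain phase is an arbitrary choice:

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/iostream.hh>
#include <seastar/core/loop.hh>
#include <seastar/core/lowres_clock.hh>
#include <seastar/core/with_timeout.hh>
#include <seastar/net/api.hh>

using namespace std::chrono_literals;

// Stage the close as RFC 7230 §6.6 recommends: half-close, drain, close.
seastar::future<> linger_close(seastar::connected_socket& socket,
                               seastar::input_stream<char>& in) {
    // Stage 1: half-close. Send our FIN but leave the read side open, so
    // request bytes still in flight don't make the kernel answer with an
    // RST that could erase the client's unread error response.
    socket.shutdown_output();

    // Stage 2: keep reading (and discarding) until the client notices the
    // response and closes its side; read() yields an empty buffer on EOF.
    auto drain = seastar::repeat([&in] {
        return in.read().then([] (seastar::temporary_buffer<char> buf) {
            return buf.empty() ? seastar::stop_iteration::yes
                               : seastar::stop_iteration::no;
        });
    });

    // Stage 3: bound the drain so a misbehaving client can't pin the
    // connection forever, then fully close either way.
    return seastar::with_timeout(seastar::lowres_clock::now() + 10s,
                                 std::move(drain))
        .handle_exception([] (std::exception_ptr) {
            // Timed out or read failed: give up and close anyway.
        });
}
```

The drain here stands in for the RFC's ACK-wait alternative; how to observe the ACK itself from Seastar is exactly the open question above.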

avikivity (Member) commented:
Why not continue reading (and drop the data) until the sender stops transmitting?

nyh added a commit to nyh/scylla that referenced this issue Dec 1, 2022
In a recent commit 757d2a4, we removed the "xfail" mark from the test
test_manual_requests.py::test_too_large_request_content_length
because it started to pass on more modern versions of Python, with a
urllib3 bug fixed.

Unfortunately, the celebration was premature: it turns out that although
the test now *usually* passes, it sometimes fails. This is caused by
a Seastar bug, scylladb/seastar#1325, which I opened scylladb#12166 to
track in this project. So unfortunately we need to add the "xfail" mark
back to this test.

Note that although the test will now be marked "xfail", it will actually
pass most of the time, so it will appear as "xpass" to people running it.
I put a note in the xfail reason string as a reminder of why this is
happening.

Fixes scylladb#12143
Refs scylladb#12166
Refs scylladb/seastar#1325

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
avikivity pushed a commit to scylladb/scylladb that referenced this issue Dec 1, 2022
nyh (Contributor, Author) commented Dec 4, 2022

> Why not continue reading (and drop the data) until the sender stops transmitting?

That's a good question, and one we've discussed several times already. It is definitely a possible option and there's nothing "wrong" with it, but I believe it is not the best solution. Imagine that an application wants to limit uploads (PUT or POST) to 1MB, and a user tries to upload a 1GB file. With your proposal, the user may sit happily for an hour while the 1GB is being uploaded, only to discover at the very end that Seastar was simply ignoring everything they sent and held back the error until the upload finished. Instead, the behavior we have right now is that the user's upload is interrupted right at the beginning (with Content-Length) or after uploading 1MB (if the sender uses chunked encoding with small chunks). I think that's better: reading the entire 1GB is not only wasteful, it is also a bad user experience.

MeepoYang pushed a commit to storage-scsg/scylladb that referenced this issue Apr 28, 2024