
When HTTP server sends an error reply, don't close the connection before client TCP stack acks the response #1325

Open · nyh opened this issue Dec 1, 2022 · 2 comments

nyh (Contributor) commented Dec 1, 2022

Seastar's HTTP server has a content_length_limit feature. If it is enabled and a client starts sending a request whose Content-Length exceeds the limit, Seastar httpd immediately sends a "payload too large" response and closes the connection. The client might still be sending its very large request at that point, but when we close the connection after sending the response, the client is expected to read our response and handle it properly, not just report a write failure.
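Schematically, with an illustrative 1MB limit and a 1GB upload, the exchange the server intends looks like this (headers abbreviated):

```
PUT /upload HTTP/1.1
Host: example.com
Content-Length: 1073741824      <-- exceeds the configured limit

HTTP/1.1 413 Payload Too Large  <-- sent, and the connection closed,
                                    while the client is still uploading
```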

Our implementation almost works as expected, but not always. The problem is the "reset problem" explained in RFC 7230 §6.6 (https://tools.ietf.org/html/rfc7230#section-6.6):

If a server performs an immediate close of a TCP connection, there is
a significant risk that the client will not be able to read the last
HTTP response. If the server receives additional data from the
client on a fully closed connection, such as another request that was
sent by the client before receiving the server's response, the
server's TCP stack will send a reset packet to the client;
unfortunately, the reset packet might erase the client's
unacknowledged input buffers before they can be read and interpreted
by the client's HTTP parser.

To avoid the TCP reset problem, servers typically close a connection
in stages. First, the server performs a half-close by closing only
the write side of the read/write connection. The server then
continues to read from the connection until it receives a
corresponding close by the client, or until the server is reasonably
certain that its own TCP stack has received the client's
acknowledgement of the packet(s) containing the server's last
response. Finally, the server fully closes the connection.

ScyllaDB has a test which uses Seastar's content_length_limit feature and expects the HTTP client to read an orderly error response. As shown in scylladb/scylladb#12143, in some rare cases the client gets an RST, exactly as explained above, and instead of reading the error response it fails with an exception:

E           urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

To fix this, Seastar needs to do what is explained above: after sending the payload-too-large response, it shouldn't immediately close the connection, but instead half-close it (the equivalent of shutdown(SHUT_WR)) and wait until all written bytes have been acknowledged by the client's TCP stack (the equivalent of the SIOCOUTQ ioctl, but how do we wait for this event?). Only then can we fully close the connection.
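Part of the difficulty is that there is no readiness event for "send queue fully acknowledged"; at the POSIX level the closest thing is to poll. A minimal blocking sketch of the staged close, assuming Linux's SIOCOUTQ ioctl (the helper name, poll interval, and deadline are illustrative, and a real Seastar fix would need a non-blocking equivalent):

```cpp
// Illustrative POSIX-level sketch, not Seastar code: half-close, then
// poll SIOCOUTQ until the peer has ACKed everything we wrote, then close.
#include <linux/sockios.h>   // SIOCOUTQ
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <thread>

void close_after_ack(int fd, std::chrono::seconds deadline = std::chrono::seconds(10)) {
    shutdown(fd, SHUT_WR);   // half-close: send FIN, keep the read side open
    auto give_up = std::chrono::steady_clock::now() + deadline;
    for (;;) {
        // TCP keeps sent-but-unacknowledged bytes in the send queue, so a
        // SIOCOUTQ result of 0 means the client's stack ACKed our response.
        int queued = 0;
        if (ioctl(fd, SIOCOUTQ, &queued) < 0 || queued == 0) {
            break;
        }
        if (std::chrono::steady_clock::now() > give_up) {
            break;           // misbehaving client: don't wait forever
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }
    close(fd);  // per RFC 7230, now reasonably safe to fully close
}
```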

The function we need to fix is generate_error_reply_and_close(). The Seastar HTTP server implementation uses it in the content_length_limit case, and also in other cases.
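In Seastar terms, a minimal sketch of the staged close that generate_error_reply_and_close() would need might look like the following. This is not the actual httpd code: linger_close() is a hypothetical helper, it assumes the error response has already been written and flushed, and the 10-second bound on the drain phase is an arbitrary choice:

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/iostream.hh>
#include <seastar/core/loop.hh>
#include <seastar/core/lowres_clock.hh>
#include <seastar/core/with_timeout.hh>
#include <seastar/net/api.hh>

using namespace std::chrono_literals;

// Stage the close as RFC 7230 §6.6 recommends: half-close, drain, close.
seastar::future<> linger_close(seastar::connected_socket& socket,
                               seastar::input_stream<char>& in) {
    // Stage 1: half-close. Send our FIN but leave the read side open, so
    // request bytes still in flight don't make the kernel answer with an
    // RST that could erase the client's unread error response.
    socket.shutdown_output();

    // Stage 2: keep reading (and discarding) until the client notices the
    // response and closes its side; read() yields an empty buffer on EOF.
    auto drain = seastar::repeat([&in] {
        return in.read().then([] (seastar::temporary_buffer<char> buf) {
            return buf.empty() ? seastar::stop_iteration::yes
                               : seastar::stop_iteration::no;
        });
    });

    // Stage 3: bound the drain so a misbehaving client can't pin the
    // connection forever, then fully close either way.
    return seastar::with_timeout(seastar::lowres_clock::now() + 10s,
                                 std::move(drain))
        .handle_exception([] (std::exception_ptr) {
            // Timed out or read failed: give up and close anyway.
        });
}
```

The drain here stands in for the RFC's ACK-wait alternative; how to observe the ACK itself from Seastar is exactly the open question above.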

avikivity (Member) commented:
Why not continue reading (and drop the data) until the sender stops transmitting?

nyh added a commit to nyh/scylla that referenced this issue Dec 1, 2022
In a recent commit 757d2a4, we removed the "xfail" mark from the test
test_manual_requests.py::test_too_large_request_content_length
because it started to pass on more modern versions of Python, with a
urllib3 bug fixed.

Unfortunately, the celebration was premature: it turns out that although
the test now *usually* passes, it sometimes fails. This is caused by
a Seastar bug, scylladb/seastar#1325, which I opened scylladb#12166 to
track in this project. So unfortunately we need to add the "xfail" mark
back to this test.

Note that although the test will now be marked "xfail", it will actually
pass most of the time, so it will appear as "xpass" to people running it.
I put a note in the xfail reason string as a reminder of why this is
happening.

Fixes scylladb#12143
Refs scylladb#12166
Refs scylladb/seastar#1325

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
avikivity pushed a commit to scylladb/scylladb that referenced this issue Dec 1, 2022
nyh (Contributor, Author) commented Dec 4, 2022

> Why not continue reading (and drop the data) until the sender stops transmitting?

That's a good question, and one we've discussed several times already. It is definitely a possible option and there's nothing "wrong" with it, but I believe it is not the best solution. Imagine that an application wants to limit uploads (PUT or POST) to 1MB, and a user tries to upload a 1GB file. With your proposal, the user may sit happily for an hour while the 1GB is being uploaded, only to discover at the very end that Seastar was simply ignoring everything they sent and held back the error until the upload finished. Instead, the behavior we have right now is that the user's upload is interrupted right at the beginning (with Content-Length) or after uploading 1MB (if the sender uses chunked encoding with small chunks). I think that's better: reading the entire 1GB is not only wasteful, it is also a bad user experience.

MeepoYang pushed a commit to storage-scsg/scylladb that referenced this issue Apr 28, 2024