Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incomplete response when using HTTP2+CONNECT tunneling #28329

Open
zhenqxuMSFT opened this issue Jul 11, 2023 · 5 comments
Open

Incomplete response when using HTTP2+CONNECT tunneling #28329

zhenqxuMSFT opened this issue Jul 11, 2023 · 5 comments
Labels
bug no stalebot Disables stalebot from closing an issue question Questions that are neither investigations, bugs, nor enhancements

Comments

@zhenqxuMSFT
Copy link

Description:
envoy version: 1.25.6

We are using HTTP2+CONNECT to tunnel TCP connections over an HTTP2 connection. The setup looks like:

client --HTTP--> envoy1 --TCP over HTTP2 tunnel---> envoy2 --HTTP--> server

client makes HTTP 1.1 requests.
envoy1 runs a tcp_proxy filter and multiplex the tcp connection on an HTTP2 stream.
envoy2 runs 2 HCMs. The first HCM demultiplexes the HTTP2 stream and feeds to an internal listener with a second HCM. The second HCM routes the original HTTP request to upstream server.
The server serves large binary content using chunk encoding.

When client makes a request without "Connection: close" header, everything works fine. However, if it makes a request with "Connection: close" header, sometimes the response is incomplete - the tail of the response is not received.
With curl an error message shows: transfer closed with outstanding read data remaining

I did some investigation here. There are 5 connections involved:
C1: stands for client -> envoy1
C2: stands for envoy1 -> envoy2 external listener C2.log
C3: stands for envoy2 first HCM -> upstream (internal listener) C3.log
C4: stands for downstream (envoy2 first HCM) -> internal listener C4.log
C5: stands for envoy2 second HCM -> server C5.log
C3 and C4 are physically same connection but logically 2 connections from envoy2 perspective

From logs, I can tell that C5 received whole response from server and C4 wrote whole response. Also, C3 got the whole response. The problem is though C2 successfully encoded the whole response but C2 failed to write all the data before resetting the stream.
I also did a packet capture at envoy1 side and it shows envoy2 reset the stream thus following DATA frames are not being sent.
image

key logs from C2:
[2023-07-11 12:33:45.331][17][trace][http] [source/common/http/conn_manager_impl.cc:1606] [C2][S4803202076355283748] encoding data via codec (size=16384 end_stream=false)
[2023-07-11 12:33:45.332][17][debug][router] [source/common/router/upstream_request.cc:500] [C2][S4803202076355283748] resetting pool request
[2023-07-11 12:33:45.332][17][trace][http] [source/common/http/conn_manager_impl.cc:1606] [C2][S4803202076355283748] encoding data via codec (size=8597 end_stream=true)
[2023-07-11 12:33:45.332][17][debug][http] [source/common/http/conn_manager_impl.cc:212] [C2][S4803202076355283748] doEndStream() resetting stream
[2023-07-11 12:33:45.332][17][debug][http] [source/common/http/conn_manager_impl.cc:1649] [C2][S4803202076355283748] stream reset
[2023-07-11 12:33:45.332][17][trace][http2] [source/common/http/http2/codec_impl.cc:809] [C2] deferred reset stream
[2023-07-11 12:33:45.332][17][trace][http2] [source/common/http/http2/codec_impl.cc:1316] [C2] about to send frame type=3, flags=0
[2023-07-11 12:33:45.332][17][trace][http2] [source/common/http/http2/codec_impl.cc:1338] [C2] send data: bytes=13
[2023-07-11 12:33:45.332][17][trace][connection] [source/common/network/connection_impl.cc:483] [C2] writing 13 bytes, end_stream false
[2023-07-11 12:33:45.332][17][trace][http2] [source/common/http/http2/codec_impl.cc:1218] [C2] sent frame type=3, stream_id=1, length=4
[2023-07-11 12:33:45.332][17][debug][http2] [source/common/http/http2/codec_impl.cc:1247] [C2] sent reset code=0
[2023-07-11 12:33:45.332][17][debug][http2] [source/common/http/http2/codec_impl.cc:1363] [C2] stream 1 closed: 0

Though log says 'deferred reset stream' but actually there's no delay, it sends the RST immediately, causing nghttp2 to drop pending DATA frames.

Some questions:

  1. I see here suggested using nghttp2_on_frame_send_callback to ensure DATA being sent before RST. But it seems envoy is not using it. Is there any special reason here?
    case NGHTTP2_DATA: {
    // This should be the case since we're sending these frames. It's possible
    // that codec fuzzers would incorrectly send frames for non-existent streams
    // which is why this is not an assert.
    if (stream != nullptr) {
    const bool end_stream_sent = flags & NGHTTP2_FLAG_END_STREAM;
    stream->local_end_stream_sent_ = end_stream_sent;
    if (end_stream_sent) {
    stream->onEndStreamEncoded();
    }
    }
    break;
    }
  2. I do not quite understand the logic of deferred reset stream. It seems to me that if a stream is set to be deferred reset, any send operation on any stream (even not same one) can cause the stream to be reset? Because sendPendingFrames() always iterates over all the deferred reset streams and resets them. Does the 'defer' take effect in fact?
  3. (Correct me if this does not make sense) Today when the upstream closes the connection, the stream is reset because stream.filter_manager_.remoteDecodeComplete() is false. It makes sense for normal HTTP2 requests to reset the stream when request is not completed. However,for CONNECT, it's always false unless client ends the tunnel. Also, from CONNECT request perspective, the request is already completed so should we handle it in the same way as normal HTTP2 request completed?
@zhenqxuMSFT zhenqxuMSFT added bug triage Issue requires triage labels Jul 11, 2023
@alyssawilk
Copy link
Contributor

my guess is it's related to this
https://github.com/envoyproxy/envoy/blob/main/source/common/http/conn_manager_impl.cc#L258
because that's the main code difference between the connection close case and not. That said I have no idea how the internal listener handles fast connection close cc @kyessenov for how that might cause problems.

@alyssawilk alyssawilk added question Questions that are neither investigations, bugs, nor enhancements and removed triage Issue requires triage labels Jul 12, 2023
@zhenqxuMSFT
Copy link
Author

zhenqxuMSFT commented Jul 12, 2023

@alyssawilk It's not related to the internal listener. I also tested it without internal listener, the issue still persists. In that test the envoy2 setup looks like:
--HTTP2--> HCM --HTTP1.1--> (normal listener) --> HCM ------->

Also, https://github.com/envoyproxy/envoy/blob/main/source/common/http/conn_manager_impl.cc#L258 happens after the RST and the first HCM's saw_connection_close_ should be false (because there's no 'Connection: close' header from the first HCM's perspective). So it may not be the cause. The reset happens here:

stream.response_encoder_->getStream().resetStream(StreamResetReason::LocalReset);

The difference between connection close and not is whether C4 is closed after write flush complete. If C4 is closed then the stream in the first HCM is reset due to the HTTP2+CONNECT case. The second HCM is behaving properly in the whole request flow.

@zhenqxuMSFT
Copy link
Author

@alyssawilk @kyessenov comments are appreciated.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Aug 16, 2023
@ahmelsayed
Copy link

@alyssawilk @kyessenov would appreciate commenting on this or resetting the stale bot.

@kyessenov kyessenov added no stalebot Disables stalebot from closing an issue and removed stale stalebot believes this issue/PR has not been touched recently labels Aug 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug no stalebot Disables stalebot from closing an issue question Questions that are neither investigations, bugs, nor enhancements
Projects
None yet
Development

No branches or pull requests

4 participants