reverse_tunnels: intercept RPING on upstream reverse connections #43666
agrawroh merged 1 commit into envoyproxy:main
Conversation
Signed-off-by: aakugan <aakashganapathy2@gmail.com>
e5dd925 to f248e1e
@aakugan can you detail the steps leading to this issue a little more? Is this a race where the upstream receives a plaintext RPONG from the downstream after having already sent a data request on an HTTP/2 stream?
Yes @basundhara-c, that is correct. Under heavy load / latency we observed that the connection may get upgraded (i.e., the `PRI ...` HTTP/2 preface was sent and the codec is waiting for the SETTINGS frame) before the downstream RPONG is received; the RPONG is then passed directly to the `client_codec_`, causing this issue.
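For context, the failure mode described above can be modeled in a few lines: after the client sends the HTTP/2 connection preface (`PRI * HTTP/2.0 ...`), the codec expects the peer's first bytes to be a SETTINGS frame (frame type 0x4). A stray plaintext RPONG in front of that frame makes the first-frame check fail. This is a minimal standalone sketch, not Envoy code; `kRpong` and `first_frame_is_settings` are illustrative names:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// Illustrative 5-byte ping reply marker (assumed; not the actual Envoy constant).
constexpr std::string_view kRpong = "RPONG";

// An HTTP/2 frame header is 9 bytes: 3-byte length, 1-byte type, 1-byte flags,
// 4-byte stream id. After the preface, the first frame from the peer must be
// SETTINGS (type 0x4); anything else is a protocol error.
bool first_frame_is_settings(std::string_view wire) {
  return wire.size() >= 9 && static_cast<uint8_t>(wire[3]) == 0x4;
}
```

An empty SETTINGS frame is 9 bytes with the type byte set to 0x4; prepending an RPONG shifts that byte out of position, which matches the "unexpected data while we expected SETTINGS frame" error the codec reports.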
/assign @botengyao |
/gemini review |
Code Review
This pull request refactors the RPING handling logic into a new RpingInterceptor class, centralizing the logic and fixing an issue where RPING messages could interfere with HTTP/2 connection upgrades on upstream reverse connections. However, the RpingInterceptor::read() implementation introduces critical flaws that lead to protocol desynchronization and data corruption. Specifically, it leaks partial internal protocol bytes to the caller and returns incorrect byte counts after draining messages, which can cause connection failures and potentially be exploited to bypass protocol-level security checks. Additionally, a pre-existing correctness issue regarding partial RPING message handling has been carried over, which could lead to further protocol errors in downstream filters.
```cpp
if (buffer.length() == 0) {
  return Api::IoCallUint64Result{expected, Api::IoError::none()};
}

  // RPING followed by application data. Disable echo and return the remaining data.
  ENVOY_LOG(trace,
            "RpingInterceptor: received application data after RPING, "
            "disabling RPING echo for FD: {}",
            fd_);
  ping_echo_active_ = false;
  // The adjusted return value is the number of bytes excluding the drained RPING. It should be
  // transparent to upper layers that the RPING was processed.
  const uint64_t adjusted =
      (result.return_value_ >= expected) ? (result.return_value_ - expected) : 0;
  return Api::IoCallUint64Result{adjusted, Api::IoError::none()};
}

// If partial data could be the start of RPING (only when fewer than expected bytes).
if (len < expected) {
  const absl::string_view rping_prefix =
      ReverseConnectionUtility::PING_MESSAGE.substr(0, static_cast<size_t>(len));
  if (peek_sv == rping_prefix) {
    ENVOY_LOG(trace,
              "RpingInterceptor: partial RPING received ({} bytes), waiting "
              "for more.",
              len);
    return result; // Wait for more data.
```
The RpingInterceptor::read() implementation contains critical flaws that lead to protocol desynchronization and data corruption.
Specifically, at line 58, when a partial RPING is read, the interceptor incorrectly returns the result to the caller. This exposes internal protocol bytes ("RPI...") to upper-layer protocols such as HTTP/2 or TLS, causing protocol errors and connection termination. The interceptor should hide RPING messages completely, including partial ones, by returning an EAGAIN error to signal that the caller should not process the buffer and should wait for more data; the partial bytes remain in the buffer until a subsequent read completes the message.
Additionally, there are two related issues: incorrect return values on drain (lines 32-34), where the function reports that data was added even when the buffer is empty, and data desynchronization in the adjusted return value (lines 44-46), where the calculation does not account for bytes consumed by previous partial reads. Together these cause the Envoy network stack to lose synchronization with the data stream, leading to connection failures and potential bypass of protocol-level security checks.
Applying the fix for partial RPING handling will require updating tests that check for partial RPING handling, such as ChoppedRpingCompletesAndDrainsInSingleBuffer in rping_interceptor_test.cc and ReadPartialDataAndStateTransitions in downstream_reverse_connection_io_handle_test.cc.
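The behavior the review asks for — drain a complete RPING transparently, and hide a partial prefix by returning EAGAIN — can be sketched standalone. This is a simplified model and not the actual `RpingInterceptor` API: the socket buffer is a `std::string`, and `kEagain` stands in for `IoSocketError::create(SOCKET_ERROR_AGAIN)`:

```cpp
#include <cassert>
#include <string>
#include <string_view>

constexpr std::string_view kRping = "RPING"; // assumed 5-byte ping marker
constexpr int kEagain = -11;                 // stand-in for SOCKET_ERROR_AGAIN

// `buffer` holds the bytes just read from the socket. Returns the count of
// bytes the caller may process, or kEagain when the caller must retry.
// Mutates `buffer` to drain hidden RPING bytes.
int intercept_read(std::string& buffer) {
  if (buffer.rfind(kRping, 0) == 0) {
    buffer.erase(0, kRping.size()); // drain the complete RPING
    // Nothing left: do not report the drained bytes as application data.
    return buffer.empty() ? kEagain : static_cast<int>(buffer.size());
  }
  // Partial RPING prefix: hide it from the caller and ask it to retry; the
  // bytes stay in the buffer until more data arrives.
  if (buffer.size() < kRping.size() && kRping.substr(0, buffer.size()) == buffer) {
    return kEagain;
  }
  return static_cast<int>(buffer.size()); // plain application data
}
```

The key property is that no caller ever observes a byte of "RPING", whole or partial, which is what keeps the HTTP/2 codec's framing intact.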
Suggested change (at `return result; // Wait for more data.`):

```cpp
// Return EAGAIN to prevent the caller from processing the partial RPING data.
// The data remains in the buffer and will be processed with more data on the next read.
return Api::IoCallUint64Result(0, IoSocketError::create(SOCKET_ERROR_AGAIN));
```
RPING is only 5 bytes and is usually sent in a single TCP segment, so partial reads are unlikely in practice. Returning EAGAIN would need e2e testing and possibly upstream Envoy changes, so I’d prefer to handle that in a follow-up enhancement and don’t think we should block this PR on it.
agrawroh
left a comment
@basundhara-c We should do a follow-up to remove the partial case if we think it can't happen in practice and see if we could return EAGAIN as suggested.
Commit Message
Intercept RPING on upstream reverse connections
Additional Description
If an RPING is sent and the connection is then upgraded to HTTP/2 via the connection pool, the downstream reverse connection IO handle's RPING response causes a 502 protocol error.
More specifically: Remote peer returned unexpected data while we expected SETTINGS frame
Additional logs:
The "Alert" entry was a custom log line I added for easier debugging.
It's better for both the downstream and upstream reverse connection IO handles to share a common read implementation.
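The shared-read-implementation point can be illustrated with a toy composition (the class names here are hypothetical, not the real Envoy classes): both IO handles hold the same interceptor, so RPING draining behaves identically on the downstream and upstream paths and cannot drift apart:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>

// Toy model: one interceptor owns the RPING-draining logic.
struct RpingInterceptorSketch {
  static constexpr std::string_view kRping = "RPING";
  // Drains a leading RPING from `data`; returns true if one was present.
  bool drainRping(std::string& data) const {
    if (data.rfind(kRping, 0) == 0) {
      data.erase(0, kRping.size());
      return true;
    }
    return false;
  }
};

// Both reverse-connection IO handles delegate to the shared interceptor,
// so RPING never reaches either codec through either path.
struct ReverseConnIoHandleSketch {
  std::shared_ptr<RpingInterceptorSketch> interceptor;
  std::string read(std::string wire) const {
    interceptor->drainRping(wire);
    return wire;
  }
};
```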
Testing
Unit tests.
Additionally, I manually validated the example in the docs, which still works after the change.