
tcp proxy: add new option to let tcp proxy check the drain close status#44567

Merged
wbpcode merged 10 commits into
envoyproxy:mainfrom
wbpcode:dev-fix-tcp-proxy
May 8, 2026

Conversation

@wbpcode
Member

@wbpcode wbpcode commented Apr 22, 2026

Commit Message: tcp proxy: add new option to let tcp proxy check the drain close status
Additional Description:

To close #44419.

I considered some other solutions, such as registering a callback with the drain close manager for every connection. But that would introduce huge complexity to ensure all these callbacks are called correctly on the correct threads, and it would impact our core code tree.

So I finally selected the simplest one. It is clean, safe, easy to review and maintain, and keeps the logic similar to HCM.

Risk Level: low. Touches core code, but is guarded by a new proto API.
Testing: unit.
Docs Changes: n/a.
Release Notes: added.
Platform Specific Features: n/a.

Signed-off-by: wbpcode/wangbaiping <wbphub@gmail.com>
@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @adisuissa
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #44567 was opened by wbpcode.

see: more, trace.

@wbpcode
Member Author

wbpcode commented Apr 22, 2026

/retest

* All tcp proxy stats. @see stats_macros.h
*/
#define ALL_TCP_PROXY_STATS(COUNTER, GAUGE) \
COUNTER(downstream_cx_drain_close) \
Contributor

Is this counter described in a doc?

Member Author

Let me check.

Comment thread api/envoy/extensions/filters/network/tcp_proxy/v3/tcp_proxy.proto Outdated
wbpcode added 4 commits April 29, 2026 02:53
Signed-off-by: wbpcode/wangbaiping <wbphub@gmail.com>
Signed-off-by: wbpcode/wangbaiping <wbphub@gmail.com>
Signed-off-by: wbpcode/wangbaiping <wbphub@gmail.com>
Comment thread source/common/tcp_proxy/tcp_proxy.cc
Member

@agrawroh agrawroh left a comment

Just one question, LGTM otherwise

agrawroh previously approved these changes May 2, 2026
@agrawroh
Member

agrawroh commented May 2, 2026

/retest

@wbpcode
Member Author

wbpcode commented May 4, 2026

I guess the CI failure has nothing to do with this PR...

@wbpcode wbpcode requested a review from Copilot May 4, 2026 03:36
@wbpcode
Member Author

wbpcode commented May 4, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a check_drain_close configuration option to the TCP proxy filter. When enabled, the filter checks for a drain signal after each read or write operation and closes the downstream connection using FlushWrite if a drain is requested. The changes include updates to the API, documentation, implementation in tcp_proxy.cc, and comprehensive unit tests. A review comment suggests that the drain check in onData should be extended to cover active proxying paths where an upstream connection exists, ensuring consistent behavior.
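The behavior summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in tcp_proxy.cc: `FakeDrainDecision`, `FakeConnection`, `SketchFilter`, and `sketchUsage` are hypothetical stand-ins, and only the names `check_drain_close`, `downstream_cx_drain_close`, and `FlushWrite` come from this pull request.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; these are not Envoy's real types.
enum class CloseType { FlushWrite, NoFlush };

struct FakeDrainDecision {
  bool draining = false;
  // Polled by the filter; returns true once a drain close is requested.
  bool drainClose() const { return draining; }
};

struct FakeConnection {
  bool open = true;
  CloseType last_close_type = CloseType::NoFlush;
  void close(CloseType type) {
    open = false;
    last_close_type = type;
  }
};

class SketchFilter {
public:
  SketchFilter(bool check_drain_close, const FakeDrainDecision& drain, FakeConnection& downstream)
      : check_drain_close_(check_drain_close), drain_(drain), downstream_(downstream) {}

  // Data read from downstream: proxying is omitted, then the drain state is polled.
  void onDownstreamData() { maybeDrainClose(); }
  // Data written to downstream from upstream: same poll after the write.
  void onUpstreamData() { maybeDrainClose(); }

  // Counter named after the stat added in this pull request.
  std::uint64_t downstream_cx_drain_close = 0;

private:
  void maybeDrainClose() {
    // Opt-in: nothing happens unless the option is enabled.
    if (!check_drain_close_ || !downstream_.open) {
      return;
    }
    if (drain_.drainClose()) {
      ++downstream_cx_drain_close;
      downstream_.close(CloseType::FlushWrite);
    }
  }

  const bool check_drain_close_;
  const FakeDrainDecision& drain_;
  FakeConnection& downstream_;
};

// Usage: the connection stays open until I/O happens while a drain is requested.
inline void sketchUsage() {
  FakeDrainDecision drain;
  FakeConnection conn;
  SketchFilter filter(/*check_drain_close=*/true, drain, conn);
  filter.onDownstreamData();
  assert(conn.open);
  drain.draining = true;
  filter.onUpstreamData();
  assert(!conn.open && conn.last_close_type == CloseType::FlushWrite);
}
```

Because the check only runs on a read or write, a connection with no traffic is never inspected in this sketch, which is the trade-off raised later in the review.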

Comment thread source/common/tcp_proxy/tcp_proxy.cc
Contributor

Copilot AI left a comment

Pull request overview

Adds an opt-in TCP proxy feature to proactively honor drain-close decisions during active TCP data flow, reducing the likelihood of reconnect storms at the end of listener drain.

Changes:

  • Add check_drain_close API field and wire it into TcpProxy::Filter to close downstream connections with FlushWrite when drain close is requested.
  • Add a new stat (downstream_cx_drain_close) and a new local close reason (tcp_proxy_drain_close) for observability.
  • Add/extend unit tests and update TCP proxy stats documentation.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| test/mocks/server/factory_context.h | Extend mock factory context with MockListenerInfo to support listener direction-dependent behavior. |
| test/mocks/server/factory_context.cc | Provide default listenerInfo()/direction() behavior for tests. |
| test/extensions/filters/network/tcp_proxy/config_test.cc | Add config parsing test for the new check_drain_close field. |
| test/common/tcp_proxy/tcp_proxy_test.cc | Add unit coverage for drain-close behavior on downstream read/upstream write and for inbound-only scope selection. |
| source/common/tcp_proxy/tcp_proxy.h | Add new stat and config accessors for drain-close decision/scope and the feature flag. |
| source/common/tcp_proxy/tcp_proxy.cc | Implement drain-close check and downstream closure after read/write handling when enabled. |
| envoy/stream_info/stream_info.h | Add TcpProxyDrainClose local close reason string. |
| docs/root/configuration/listeners/network_filters/tcp_proxy_filter.rst | Document the new downstream drain-close counter. |
| changelogs/current.yaml | Add release note entry describing the new TCP proxy drain-close check feature. |
| api/envoy/extensions/filters/network/tcp_proxy/v3/tcp_proxy.proto | Add check_drain_close field and update next-free-field annotation. |

Comment thread changelogs/current.yaml
Signed-off-by: wbpcode/wangbaiping <wbphub@gmail.com>
// the downstream connection is closed with ``FlushWrite``.
//
// This is disabled by default for backward compatibility.
bool check_drain_close = 24;
Contributor

@kyessenov kyessenov May 4, 2026

Curious: does the HTTP connection manager do something similar? Does it poll the state of connections, or does it register a callback with the drain manager? I feel like the polling method adds non-determinism (e.g. active connections are closed but idle connections remain stale), which makes it hard to use.

Member Author

HCM polls the state of the drain manager at the end of every request. This is also one reason why I chose the polling implementation.

> I feel like the polling method adds non-determinism (e.g. active connections are closed but idle connections remain stale), which makes it hard to use.

Yeah, this is a known shortcoming. But a callback-based solution would bring huge complexity to the core code (callback lifetime management, thread safety, graceful draining, and so on). I guess an appropriate idle timeout setting combined with the polling should be good enough for most scenarios?
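To make the polling plus idle timeout trade-off concrete, here is a rough model, assuming a simplified world in which every poll after drain start reports "close" (a real drain decision may be more gradual). `SketchConnection`, `sketchCloseTime`, and `sketchIdleUsage` are hypothetical names, and all times are measured from the start of the drain.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

using Seconds = std::chrono::seconds;

// Hypothetical model of one downstream connection during a drain.
struct SketchConnection {
  Seconds last_activity;          // last read/write at or before drain start
  std::optional<Seconds> next_io; // first read/write after drain start, if any
};

// When the connection would close: on its next I/O (the poll sees the drain),
// or when the idle timeout fires, whichever comes first.
inline Seconds sketchCloseTime(const SketchConnection& conn, Seconds idle_timeout) {
  const Seconds idle_deadline = conn.last_activity + idle_timeout;
  if (conn.next_io.has_value() && *conn.next_io < idle_deadline) {
    return *conn.next_io;
  }
  return idle_deadline;
}

// Usage: an active connection closes at its next I/O, an idle one at the idle timeout.
inline void sketchIdleUsage() {
  const Seconds idle_timeout{60};
  const SketchConnection active{Seconds{0}, Seconds{5}};
  const SketchConnection idle{Seconds{0}, std::nullopt};
  assert(sketchCloseTime(active, idle_timeout) == Seconds{5});
  assert(sketchCloseTime(idle, idle_timeout) == Seconds{60});
}
```

In this model the idle timeout bounds how long a silent connection can outlive the drain, which is why the two settings are suggested together.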

@wbpcode
Member Author

wbpcode commented May 5, 2026

/retest

Contributor

@adisuissa adisuissa left a comment

/lgtm api

@repokitteh-read-only repokitteh-read-only Bot removed the api label May 5, 2026
agrawroh previously approved these changes May 6, 2026
Member

@ggreenway ggreenway left a comment

Looks good overall; just a few nits

/wait

Comment thread docs/root/configuration/listeners/network_filters/tcp_proxy_filter.rst Outdated
// after each read or write. When drain close is requested for the listener's traffic direction,
// the downstream connection is closed with ``FlushWrite``.
//
// This is disabled by default for backward compatibility.
Member

Should we enable this by default and leave a runtime setting to temporarily change the default? It seems like much better behavior to have this enabled.

Member Author

@wbpcode wbpcode May 7, 2026

I am not sure whether there are cases where users want to keep existing TCP connections alive for as long as possible by setting a long drain duration. Unlike HTTP, we have no way to close the connection gracefully.
So I am slightly inclined to keep an option to control it. But I can change the bool to a wrapped value so that it would be easier to change the default value in the future. WDYT?
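The reason a wrapped value helps is that it can distinguish "not set" from "explicitly false", so the default can change later without overriding users who opted out. A minimal sketch of that idea, using `std::optional<bool>` as a stand-in for the wrapped field; `SketchConfig`, `kDefaultCheckDrainClose`, `effectiveCheckDrainClose`, and `sketchWrappedUsage` are hypothetical and not part of the generated proto API:

```cpp
#include <cassert>
#include <optional>

// Hypothetical config struct; std::optional models a wrapped bool with presence.
struct SketchConfig {
  std::optional<bool> check_drain_close; // unset means "use the default"
};

// Assumed default for this sketch; the PR states the feature is disabled by default.
constexpr bool kDefaultCheckDrainClose = false;

// Resolve the effective setting: an explicit value wins, otherwise the default applies.
inline bool effectiveCheckDrainClose(const SketchConfig& config,
                                     bool default_value = kDefaultCheckDrainClose) {
  return config.check_drain_close.value_or(default_value);
}

// Usage: flipping the default only affects configs that left the field unset.
inline void sketchWrappedUsage() {
  const SketchConfig unset{};
  const SketchConfig opted_out{false};
  assert(!effectiveCheckDrainClose(unset));
  assert(effectiveCheckDrainClose(unset, /*default_value=*/true));
  assert(!effectiveCheckDrainClose(opted_out, /*default_value=*/true));
}
```

With a plain `bool`, the unset and explicit-false cases would be indistinguishable, so a future default of `true` could not respect an explicit opt-out.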

Member Author

@wbpcode wbpcode May 7, 2026

If you strongly prefer to use a runtime guard and enable it by default, I am also fine with that. Feel free to let me know.

Member

I don't feel strongly, it was just a thought. I'm ok leaving it as is.

Contributor

If you used a message here, you could have put a timer-based check as well in the future. I don't think we see TCP keep-alives in Envoy, unlike HTTP, so a connection can be stuck without drain or data without a timer.

Signed-off-by: wbpcode/wangbaiping <wbphub@gmail.com>
@wbpcode
Member Author

wbpcode commented May 8, 2026

This API was reviewed before, and the latest change only switches it to a wrapped message.

@wbpcode wbpcode merged commit 4a9081f into envoyproxy:main May 8, 2026
29 of 30 checks passed
@wbpcode wbpcode deleted the dev-fix-tcp-proxy branch May 8, 2026 08:44
jiaweiw02 pushed a commit to jiaweiw02/envoy that referenced this pull request May 15, 2026
alrzazz pushed a commit to alrzazz/envoy that referenced this pull request May 20, 2026
Successfully merging this pull request may close these issues.

gradually drain for raw TCP connections to avoid reconnect storm

7 participants