listener_manager: implement filter-chain drain-close callback #44932
jronak wants to merge 1 commit into
See #44567 for TCP connection draining. Note: for now we prefer to have the connection actively check the drain flag (the same as HTTP's design) rather than rely on callbacks, because:
/wait
Wire addOnDrainCloseCb through listener and per-filter-chain factory contexts so drain-close signaling is consistent for server shutdown, LDS listener removal, and in-place filter chain updates.

Per-listener drain managers are children of the server drain manager; listener factory context forwards registration to the listener drain manager; each per-filter-chain context keeps local callbacks and registers with the parent drain decision once, with startDraining() for chains removed in-place.

Add unit tests for cascade, idempotency, reuse vs removal, and late registration.

Follow-up: use this hook in tcp_proxy to drain live connections with jitter over the drain window in a separate change.

Test: bazel test //test/common/listener_manager:filter_chain_manager_impl_test

Signed-off-by: Ronak Jain <ronakjainc@gmail.com>
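To make the cascade described in the commit message concrete, here is a minimal, single-threaded sketch. `DrainNode` and `DrainCb` are hypothetical names invented for illustration and are not Envoy's drain manager or factory context types. The sketch also assumes that a callback registered after draining has started fires immediately; the PR's actual late-registration behavior may differ.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not Envoy's API.
using DrainCb = std::function<void()>;

class DrainNode {
public:
  explicit DrainNode(DrainNode* parent = nullptr) : parent_(parent) {}

  // Keep the callback locally and register with the parent at most once.
  // Assumption: a late registration on a draining node fires right away.
  void addOnDrainCloseCb(DrainCb cb) {
    if (draining_) {
      cb();
      return;
    }
    callbacks_.push_back(std::move(cb));
    if (parent_ != nullptr && !registered_with_parent_) {
      registered_with_parent_ = true;
      // No unregistration is modeled: the parent must not outlive this node.
      parent_->addOnDrainCloseCb([this]() { startDraining(); });
    }
  }

  // Idempotent: the second and later calls do nothing.
  void startDraining() {
    if (draining_) {
      return;
    }
    draining_ = true;
    std::vector<DrainCb> cbs = std::move(callbacks_);
    callbacks_.clear();
    for (auto& cb : cbs) {
      cb();
    }
  }

  bool draining() const { return draining_; }

private:
  DrainNode* parent_;
  bool draining_{false};
  bool registered_with_parent_{false};
  std::vector<DrainCb> callbacks_;
};
```

With a `server` node, a `listener` child, and two filter-chain children, calling `startDraining()` on one chain models an in-place removal and leaves its sibling untouched, while calling it on `server` cascades to every chain that has not drained yet. Thread awareness, which the discussion below raises, is deliberately left out of this sketch.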
258c6c2 to 1cff095
Ah nice #44567, I have something similar in our internal fork, sharing a couple of things that didn't work great with the flag-check approach in case it's worth keeping the callback:
I'm thinking that with the callback manager, the callback is pushed into the connections impacted by the drain along with its delay value, and tcp-proxy can set up the connection closure after the delay. Also sorry, idk why I thought CallbackManager was thread aware. Updated the PR to make the callback management thread aware, lmk if it still makes sense to proceed.
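A rough sketch of the "callback carries a delay, the connection closes after it" idea. `jitteredDrainDelay` and `PendingClose` are hypothetical helpers, not tcp_proxy code. The sketch assumes the delay is drawn uniformly from the drain window; the PR only says the drain time is random. A real filter would arm a dispatcher timer instead of comparing time points by hand.

```cpp
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical helper: pick a close delay uniformly in [0, window].
std::chrono::milliseconds jitteredDrainDelay(std::chrono::milliseconds window,
                                             std::mt19937_64& rng) {
  if (window.count() <= 0) {
    return std::chrono::milliseconds{0};
  }
  std::uniform_int_distribution<std::int64_t> dist(0, window.count());
  return std::chrono::milliseconds{dist(rng)};
}

// Hypothetical per-connection state recording when the connection should close.
struct PendingClose {
  bool scheduled{false};
  std::chrono::steady_clock::time_point close_at{};

  // Drain callback body: only the first notification schedules the close.
  void onDrainClose(std::chrono::milliseconds delay,
                    std::chrono::steady_clock::time_point now) {
    if (scheduled) {
      return;
    }
    scheduled = true;
    close_at = now + delay;
  }

  // True once the scheduled deadline has passed.
  bool due(std::chrono::steady_clock::time_point now) const {
    return scheduled && now >= close_at;
  }
};
```

Because every connection gets its own delay, closures are spread over the window rather than bunched at its end, and an idle connection still closes because the deadline does not depend on traffic.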
@wbpcode waiting for your response.
I guess graceful draining should help in these cases, because every connection will get a different drain flag based on a time-based percentage?
I think this is a common problem for inactive connections. My initial plan was to design a new timer that checks the flag periodically if necessary, which would keep the drain flag as the single way to learn the drain status. But I am open to this.
A drain manager would be better if we can come up with a design that avoids impact on core code. Let me take a full pass through Envoy's current drain manager design to see the performance and code impact. And sorry for my delay! 🙇 Slack me directly if I miss your message again!
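For comparison, the flag-check approach with a time-based percentage could look roughly like the following. `shouldDrainClose` is a hypothetical function written for this discussion, not Envoy's drain manager code, and the linear ramp is an assumption about what "time-based percentage" means here.

```cpp
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical flag check: the chance of answering "drain now" grows
// linearly from 0 at the start of the window to 1 at its end.
bool shouldDrainClose(std::chrono::milliseconds elapsed,
                      std::chrono::milliseconds window,
                      std::mt19937_64& rng) {
  if (window.count() <= 0 || elapsed >= window) {
    return true;
  }
  if (elapsed.count() <= 0) {
    return false;
  }
  std::uniform_int_distribution<std::int64_t> dist(0, window.count() - 1);
  return dist(rng) < elapsed.count();
}
```

The check only runs when something calls it. An active connection hits it on its normal data path, but an inactive one never does, which is why the periodic timer mentioned above would be needed to keep the flag as the only source of drain status.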
Commit Message: listener_manager: implement filter-chain drain-close callback
Additional Description: Today, Envoy network filters cannot reliably react to drain events. addOnDrainCloseCb on the network filter factory context is a no-op, so filters miss server shutdown, listener removal via LDS, and in-place filter chain replacement. This change implements addOnDrainCloseCb, allowing network filters to register drain callbacks; the callbacks are invoked with a random drain time to spread out the cleanup. As a follow-up, we will use this hook in tcp_proxy to drain live TCP connections gradually over the configured drain window instead of letting them sit until forced shutdown.
Risk Level: Low
Testing: Integration and Unit
Docs Changes: Updated
Release Notes: NA
Platform Specific Features: NA