Closed
Labels
design proposal (needs design doc/proposal before implementation), stale (stalebot believes this issue/PR has not been touched recently)
Description
The listener model worked well in the old world:
- Connections are force-closed after a fixed drain time.
- Deployments use a number of bind_to_port = false listeners, and only a few of them are expected to be updated at any one time.
However, I am seeing the following trends and problems:
- bind_to_port is being deprecated, and the small listeners are being migrated into a single listener with a huge filter chain collection. We lose the ability to update one small listener. Instead, the singleton listener is updated and drained in its entirety.
- In Istio, a service coming up or going down can trigger a listener update. Combined with the previous point, every Envoy in the cluster could see a listener update and have all of its connections drained.
- For long-lived or non-HTTP connections there may be no graceful way to close, e.g. a gRPC streaming channel or a long-running MySQL transaction.
- A malicious or buggy xDS implementation (e.g. one that produces a non-deterministically hashed listener config) could update a listener frequently and create an endless stream of draining listeners. Each draining listener stays on the heap until its drain timeout expires.
- Various legacy services treat connections as pets and do not expect them to be closed from underneath.
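To illustrate the non-deterministic config point, here is a minimal sketch of how a control plane could avoid spurious listener updates by fingerprinting a canonical serialization of the config. It assumes the config is available as a plain dictionary. The names listener_fingerprint and needs_update are hypothetical and are not part of Envoy or any xDS library.

```python
import hashlib
import json


def listener_fingerprint(config: dict) -> str:
    """Hash a canonical JSON form so that dict key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_update(current: dict, proposed: dict) -> bool:
    """Push a listener update only when the canonical content differs."""
    return listener_fingerprint(current) != listener_fingerprint(proposed)


# Same content, different key order: no update should be pushed.
a = {"name": "virtual", "port": 15001, "filter_chains": [{"sni": "a.local"}]}
b = {"filter_chains": [{"sni": "a.local"}], "port": 15001, "name": "virtual"}
# A real change: an extra filter chain.
c = {"name": "virtual", "port": 15001,
     "filter_chains": [{"sni": "a.local"}, {"sni": "b.local"}]}
```

Note that this only normalizes key order. List order (for example, the order of filter chains) still affects the fingerprint, so a control plane would also need to emit lists in a stable order.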
I propose changing the listener drain model:
- Allow an unlimited drain timeout for as long as connections are alive, for example by having connections ref-count the listener.
- Instead of always waiting out the drain time window, announce drain completion early once no connections are left (a minor change).
- Audit draining: if the draining listeners or connections overload the process, force-close the connections and remove the listener.
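The three points above can be sketched as a small model. This is only an illustration under stated assumptions, not Envoy code: Listener, DrainManager, and max_draining are hypothetical names, connections are represented by integer ids, and "overloaded" is simplified to "too many draining listeners at once".

```python
from typing import Callable, List, Set


class Listener:
    """A listener kept alive by the connections that reference it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.connections: Set[int] = set()
        self.draining = False
        self.on_drained: Callable[["Listener"], None] = lambda _l: None

    def accept(self, conn_id: int) -> None:
        if self.draining:
            raise RuntimeError("a draining listener accepts no new connections")
        self.connections.add(conn_id)

    def close(self, conn_id: int) -> None:
        self.connections.discard(conn_id)
        self._check_drained()

    def start_drain(self, on_drained: Callable[["Listener"], None]) -> None:
        # No timer: the listener lives as long as any connection does.
        self.draining = True
        self.on_drained = on_drained
        self._check_drained()

    def force_close(self) -> None:
        self.connections.clear()
        self._check_drained()

    def _check_drained(self) -> None:
        # Announce drain completion as soon as the last connection is gone.
        if self.draining and not self.connections:
            self.on_drained(self)


class DrainManager:
    """Tracks draining listeners and audits them against a limit."""

    def __init__(self, max_draining: int) -> None:
        self.max_draining = max_draining  # hypothetical overload threshold
        self.draining: List[Listener] = []  # oldest first
        self.removed: List[str] = []

    def drain(self, listener: Listener) -> None:
        self.draining.append(listener)
        listener.start_drain(self._remove)
        self.audit()

    def audit(self) -> None:
        # Over the limit: force-close the oldest draining listeners first.
        while len(self.draining) > self.max_draining:
            self.draining[0].force_close()

    def _remove(self, listener: Listener) -> None:
        if listener in self.draining:
            self.draining.remove(listener)
            self.removed.append(listener.name)


manager = DrainManager(max_draining=1)
a, b, c = Listener("a"), Listener("b"), Listener("c")
a.accept(1)
b.accept(2)
manager.drain(c)  # no connections: removed immediately
manager.drain(a)  # one live connection: stays in the draining list
manager.drain(b)  # exceeds the limit: the audit force-closes "a"
b.close(2)        # last connection gone: "b" is removed
```

In this model a listener with no connections is removed as soon as draining starts, a listener with live connections stays indefinitely, and the audit bounds how many draining listeners can pile up when updates arrive faster than connections close.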