Should IPVS graceful termination ignore "inactive" connections? #81308

Open
andrewsykim opened this issue Aug 12, 2019 · 7 comments

Comments

@andrewsykim
Member

commented Aug 12, 2019

What would you like to be added:
Currently the IPVS proxier considers both active and inactive connections when deciding whether an IPVS real server should be deleted. For many TCP-based applications, this means the real server is not deleted until the graceful termination timeout (15m) is reached. I think it's worth re-evaluating whether IPVS graceful termination should simply ignore inactive connections when deciding whether to remove a real server.

Why is this needed:
Improves the graceful termination logic in the IPVS proxier by removing stale real servers faster.
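
To make the proposal concrete, here is a minimal sketch of the two deletion policies being compared. The `realServer` type, its fields, and `canDelete` are hypothetical names for illustration only and are not taken from the kube-proxy source.

```go
package main

import "fmt"

// realServer is a hypothetical, simplified view of an IPVS real server.
// It only carries the two connection counters discussed in this issue.
type realServer struct {
	Address      string
	ActiveConn   int // connections IPVS still considers active
	InactiveConn int // connections IPVS reports as inactive
}

// canDelete reports whether a real server may be removed.
// With ignoreInactive=false it mirrors the behavior described above
// (both counters must be zero); with ignoreInactive=true it models
// the proposed change (only active connections block deletion).
func canDelete(rs realServer, ignoreInactive bool) bool {
	if ignoreInactive {
		return rs.ActiveConn == 0
	}
	return rs.ActiveConn+rs.InactiveConn == 0
}

func main() {
	rs := realServer{Address: "10.0.0.5:8080", ActiveConn: 0, InactiveConn: 3}
	fmt.Println("current policy: ", canDelete(rs, false))
	fmt.Println("proposed policy:", canDelete(rs, true))
}
```

Under the current policy the example server is kept because of its three inactive connections; under the proposed policy it would be removed immediately.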

@andrewsykim

Member Author

commented Aug 12, 2019

/sig network
/assign @andrewsykim @m1093782566 @lbernail

@lbernail

Contributor

commented Aug 12, 2019

Inactive connections are connections that are no longer Established, so they map to TCP connections that are closing (a FIN has been seen). By default they time out after 2 minutes, which is the tcpfin timeout parameter:

Timeout (tcp tcpfin udp): 900 120 300

This timer maps to the CLOSE_WAIT / TIME_WAIT TCP connection status so I think we should keep this behavior.
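
As an illustration of how the three values in that line can be read, here is a small sketch that parses the timeout line into durations. The `ipvsTimeouts` type and `parseTimeouts` function are hypothetical helpers, and the code assumes the line has exactly the format quoted above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// ipvsTimeouts holds the three IPVS timeouts in the order they are printed.
type ipvsTimeouts struct {
	TCP    time.Duration // established TCP connections
	TCPFin time.Duration // TCP connections after a FIN has been seen
	UDP    time.Duration // UDP entries
}

// parseTimeouts parses a line such as
// "Timeout (tcp tcpfin udp): 900 120 300" (values in seconds).
func parseTimeouts(line string) (ipvsTimeouts, error) {
	idx := strings.LastIndex(line, ":")
	if idx < 0 {
		return ipvsTimeouts{}, fmt.Errorf("no ':' in %q", line)
	}
	fields := strings.Fields(line[idx+1:])
	if len(fields) != 3 {
		return ipvsTimeouts{}, fmt.Errorf("expected 3 values, got %d", len(fields))
	}
	var d [3]time.Duration
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return ipvsTimeouts{}, fmt.Errorf("bad timeout %q: %w", f, err)
		}
		d[i] = time.Duration(n) * time.Second
	}
	return ipvsTimeouts{TCP: d[0], TCPFin: d[1], UDP: d[2]}, nil
}

func main() {
	to, err := parseTimeouts("Timeout (tcp tcpfin udp): 900 120 300")
	if err != nil {
		panic(err)
	}
	fmt.Println(to.TCP, to.TCPFin, to.UDP) // prints "15m0s 2m0s 5m0s"
}
```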

However there is a pathological situation where a node crashes and the real-servers it hosts never get the chance to send a FIN. In that case remote clients will keep retrying to send packets until TCP stops retrying. In this case, the connection remains in the Active state (no FIN/RST packets have been seen) and the connection will only be deleted after tcp_retries2 retransmissions, which usually means around 15 minutes (see: https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html)
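
The "around 15 minutes" figure can be reproduced with a back-of-the-envelope calculation. The sketch below assumes the retransmission timeout starts at the Linux minimum of 200 ms, doubles after every unanswered retransmission, and is capped at 120 s; on a real connection the starting RTO depends on the measured round-trip time, so the result is only a lower-bound estimate. `retransmitTimeout` and the two constants are illustrative names.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed Linux defaults for the RTO bounds (TCP_RTO_MIN / TCP_RTO_MAX).
const (
	defaultRTOMin = 200 * time.Millisecond
	defaultRTOMax = 120 * time.Second
)

// retransmitTimeout estimates how long TCP keeps retrying before giving up.
// It sums the wait before each of the `retries` retransmissions plus the
// final wait after the last one, doubling the RTO each time up to rtoMax.
func retransmitTimeout(retries int, rtoMin, rtoMax time.Duration) time.Duration {
	total := time.Duration(0)
	rto := rtoMin
	for i := 0; i <= retries; i++ {
		total += rto
		rto *= 2
		if rto > rtoMax {
			rto = rtoMax
		}
	}
	return total
}

func main() {
	// Default tcp_retries2 = 15.
	fmt.Println(retransmitTimeout(15, defaultRTOMin, defaultRTOMax)) // prints "15m24.6s"
	// A lower value, e.g. 8, cuts the wait to under two minutes.
	fmt.Println(retransmitTimeout(8, defaultRTOMin, defaultRTOMax)) // prints "1m42.2s"
}
```

With these assumptions, lowering tcp_retries2 from 15 to 8 shortens the time a dead backend's connections stay Active from roughly 15 minutes to under 2 minutes.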

For this situation, I think decreasing tcp_retries2 is an interesting option.

I also think we should decrease the default IPVS timeout values (especially for UDP) and/or make them configurable.

@andrewsykim

Member Author

commented Aug 12, 2019

Ah, this makes more sense. I was originally confused by the rsGracefulDeletePeriod constant (the constant is not used; I deleted it in #81312).

> For this situation, I think decreasing tcp_retries2 is an interesting option.

^ going to do some research on this option as well and report back here.

@lbernail

Contributor

commented Aug 12, 2019

There is some background here: #77903
As well as here: #71358

@athenabot


commented Aug 15, 2019

/triage unresolved

Comment /remove-triage unresolved when the issue is assessed and confirmed.

🤖 I am a bot run by vllry. 👩‍🔬

@andrewsykim

Member Author

commented Aug 15, 2019

/remove-triage unresolved

@juliantaylor


commented Aug 22, 2019

This may be important, as the current kube-proxy IPVS settings cause issues with deployments under high load from single clients; see #81775.
Though that issue causes active connections to continue existing on dead backends.
