[proxier/ipvs] Disable graceful termination for UDP traffic #77802

Merged
merged 1 commit into from May 29, 2019

@lbernail
Contributor

commented May 13, 2019

What type of PR is this?

/kind bug

What this PR does / why we need it:
Disable graceful termination for non-TCP flows. The rationale is that with a high number of UDP connections (DNS or syslog, for instance), stale UDP connections to terminated or terminating backends are reused and fail, and if the load is high enough they never expire.

This patch disables graceful termination for UDP. It is not perfect, but I believe it is much better. What happens is the following:

  • UDP target pod becomes NotReady
  • IP is removed from RealServers immediately
  • Because of expire_nodest_conn, any new packet reusing the port generates a write error and the connection is removed from the IPVS connection table (if the application retries, it is routed to an active pod)
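
The sequence above can be sketched as a small decision function. This is only an illustration under stated assumptions: `realServer` and `removeBackend` are hypothetical names, not kube-proxy's actual types, and the predicate follows the condition discussed in this PR (UDP real servers are deleted right away, other protocols are drained while they still have connections).

```go
package main

import "fmt"

// realServer is a hypothetical, simplified model of an IPVS destination.
// It is not the type used by kube-proxy.
type realServer struct {
	Addr         string
	Protocol     string // protocol of the owning virtual server, lowercase
	ActiveConn   int
	InactiveConn int
}

// removeBackend reports what happens to a real server whose pod is no
// longer ready: "draining" (graceful termination) or "deleted".
func removeBackend(rs realServer) string {
	// UDP is never drained; stale entries are left to expire_nodest_conn.
	if rs.Protocol != "udp" && rs.ActiveConn+rs.InactiveConn != 0 {
		return "draining"
	}
	return "deleted"
}

func main() {
	servers := []realServer{
		{Addr: "10.0.0.1:53", Protocol: "udp", ActiveConn: 0, InactiveConn: 120},
		{Addr: "10.0.0.2:443", Protocol: "tcp", ActiveConn: 4, InactiveConn: 1},
		{Addr: "10.0.0.3:443", Protocol: "tcp"},
	}
	for _, rs := range servers {
		fmt.Printf("%s (%s): %s\n", rs.Addr, rs.Protocol, removeBackend(rs))
	}
}
```

The UDP entry is deleted even though it still has connections in the table, which is the behavior change this PR describes.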

Which issue(s) this PR fixes:
Fixes #76664

Special notes for your reviewer:
In addition, I think we should set much more aggressive UDP timeouts (5s, perhaps, instead of the default 300s) and make the timeout configurable with a flag for specific use cases (I haven't seen a use case for long-lived UDP "connections" with a very low amount of traffic).

I'm not a big fan of strings.ToLower(string(v1.ProtocolTCP)). Any thoughts on this?

Does this PR introduce a user-facing change?:

IPVS: Disable graceful termination for UDP traffic to solve issues with a high number of UDP connections (DNS / syslog in particular)

/assign @m1093782566
/sig network
/area IPVS

@k8s-ci-robot

Contributor

commented May 13, 2019

Hi @lbernail. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@lbernail

Contributor Author

commented May 15, 2019

/ok-to-test

@m1093782566

Member

commented May 15, 2019

LGTM overall.

cc @andrewsykim for review.

// Delete TCP RS with no connections
// For other protocols, don't perform graceful termination
// (existing connections will be deleted because sysctlExpireNoDestConn=1)
if rsToDelete.VirtualServer.Protocol == strings.ToLower(string(v1.ProtocolTCP)) && rs.ActiveConn+rs.InactiveConn != 0 {

@andrewsykim

andrewsykim May 15, 2019

Member

re: strings.ToLower(string(v1.ProtocolTCP)), v1.ProtocolTCP is mainly a type used by the Service API and is only applicable when comparing values in that resource. For comparing protocols from VirtualServer, we can probably use a separate constant for this.

@andrewsykim

andrewsykim May 15, 2019

Member

I personally think just rsToDelete.VirtualServer.Protocol == "tcp" is also fine, having a constant may not be necessary either :)

@lbernail

lbernail May 15, 2019

Author Contributor

I completely agree
I just changed it to "tcp"

// For TCP, InactiveConn are connections not in ESTABLISHED state
if rs.ActiveConn+rs.InactiveConn != 0 {
// Delete TCP RS with no connections
// For other protocols, don't perform graceful termination

@andrewsykim

andrewsykim May 15, 2019

Member

Does this also apply for SCTP?

@lbernail

lbernail May 15, 2019

Author Contributor

I'm not familiar with SCTP so I chose to enable graceful termination for TCP only

I had a quick look at the protocol and it seems to include a handshake and a shutdown which should make connection tracking similar to TCP (so when a backend stops, the equivalent of a FIN is sent and the connection should be removed from the IPVS conntrack).

If you have a simple test scenario in mind I can look into it (the change is very simple: we can just use rsToDelete.VirtualServer.Protocol != "udp" instead).

@andrewsykim

andrewsykim May 15, 2019

Member

I'm not super familiar with it either, was more asking in case you knew :P. I think handling this on a case-by-case basis makes sense. We can revisit if it's a problem for SCTP. Going to give another pass at this PR some time tomorrow to make sure we're not missing any edge cases :)

@andrewsykim

andrewsykim May 18, 2019

Member

I put more thought into this, and I think to be safe, we should only apply this behavior to protocols we know for sure run into the issue (UDP). So rsToDelete.VirtualServer.Protocol != "udp" like you mentioned. Otherwise this could have unintended outcomes for SCTP that we aren't aware of.

@andrewsykim

andrewsykim May 18, 2019

Member

i.e. don't change the behavior for sctp unless we have a good reason.

@lbernail

lbernail May 27, 2019

Author Contributor

Let's do that. Not sure how to test it but I'm pretty sure SCTP will have real connection tracking
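
To make the difference between the two conditions discussed in this thread concrete, here is a minimal sketch comparing them for TCP, UDP and SCTP. The function names are hypothetical; only the two comparisons (`== "tcp"` and `!= "udp"`) come from the review comments.

```go
package main

import "fmt"

// gracefulTCPOnly mirrors the first revision quoted in the review:
// only TCP real servers with connections are drained.
func gracefulTCPOnly(protocol string, conns int) bool {
	return protocol == "tcp" && conns != 0
}

// gracefulExceptUDP mirrors the revision agreed on in this thread:
// every protocol except UDP keeps graceful termination.
func gracefulExceptUDP(protocol string, conns int) bool {
	return protocol != "udp" && conns != 0
}

func main() {
	// One open connection per protocol, so only the protocol check differs.
	for _, p := range []string{"tcp", "udp", "sctp"} {
		fmt.Printf("%s: tcp-only=%t except-udp=%t\n",
			p, gracefulTCPOnly(p, 1), gracefulExceptUDP(p, 1))
	}
}
```

The two variants differ only for SCTP: the `!= "udp"` form leaves its existing behavior untouched.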

@bowei

Member

commented May 18, 2019

Is there a way to add a test?

@andrewsykim

Member

commented May 19, 2019

@bowei, added some initial unit tests for graceful termination in #78088 to keep the scope of this PR smaller :)

@jsravn

Contributor

commented May 22, 2019

Are you sure about expire_nodest_conn? I thought because we set conn_reuse_mode=0, that flag has no effect - it is effectively 0.

@lbernail

Contributor Author

commented May 27, 2019

> Are you sure about expire_nodest_conn? I thought because we set conn_reuse_mode=0, that flag has no effect - it is effectively 0.

Yes, I thought that too, but it turns out the doc is either wrong or incomplete. I tested it and it works.

@lbernail

Contributor Author

commented May 28, 2019

/test pull-kubernetes-kubemark-e2e-gce-big

@m1093782566

Member

commented May 28, 2019

/lgtm

Please squash the commits @lbernail

@k8s-ci-robot k8s-ci-robot added the lgtm label May 28, 2019

@lbernail lbernail force-pushed the DataDog:lbernail/no-graceful-udp branch from 4ed9a69 to 9ff0685 May 28, 2019

@k8s-ci-robot k8s-ci-robot removed the lgtm label May 28, 2019

@lbernail

Contributor Author

commented May 28, 2019

> Please squash the commits @lbernail

Sure, just did it

@m1093782566

Member

commented May 28, 2019

/lgtm

/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label May 28, 2019

@k8s-ci-robot

Contributor

commented May 28, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lbernail, m1093782566

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 944a7e2 into kubernetes:master May 29, 2019

20 checks passed

cla/linuxfoundation lbernail authorized
pull-kubernetes-bazel-build Job succeeded.
pull-kubernetes-bazel-test Job succeeded.
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
pull-kubernetes-e2e-gce Job succeeded.
pull-kubernetes-e2e-gce-100-performance Job succeeded.
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
pull-kubernetes-typecheck Job succeeded.
pull-kubernetes-verify Job succeeded.
pull-publishing-bot-validate Skipped.
tide In merge pool.

k8s-ci-robot added a commit that referenced this pull request May 31, 2019

Merge pull request #78569 from DataDog/automated-cherry-pick-of-#7780…
…2-upstream-release-1.12

Automated cherry pick of #77802 upstream release 1.12

k8s-ci-robot added a commit that referenced this pull request Jun 1, 2019

Merge pull request #78567 from DataDog/automated-cherry-pick-of-#7780…
…2-upstream-release-1.14

Automated cherry pick of #77802 upstream release 1.14

k8s-ci-robot added a commit that referenced this pull request Jun 4, 2019

Merge pull request #78568 from DataDog/automated-cherry-pick-of-#7780…
…2-upstream-release-1.13

Automated cherry pick of #77802 upstream release 1.13