New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable graceful termination for UDP flows when using kube-proxy in IPVS mode #71515

Merged
merged 4 commits into from Nov 30, 2018

Conversation

@lbernail
Copy link
Contributor

lbernail commented Nov 28, 2018

What type of PR is this?
/kind bug

What this PR does / why we need it:
The current graceful termination logic is to delete a RS if the number of active connections is 0. This makes sense for TCP but for UDP the number of active connections is always 0. This is an issue for DNS queries because the RS will be deleted but the IPVS connection will remain until it expires (5mn by default) and if there are a lot of DNS queries, the port will be reused and queries blackholed.

Of course for this to work properly the service needs to continue to serve queries until the connections expire (this works fine with the lameduck option of coredns).

Which issue(s) this PR fixes *
Fixes #71514

Special notes for your reviewer:
In addition to this fix, the PR also includes a small fix in the delete function which was never removing entries from the graceTerminateRSList and a small improvement in logging to use the full RS name in logs to identify the service associated with it (helpful for DNS because when a pod is deleted both the TCP and UDP endpoints are removed). I can of course create separate PRs if needed

Does this PR introduce a user-facing change?:

UDP connections now support graceful termination in IPVS mode

/assign @m1093782566

lbernail added some commits Nov 28, 2018

[kube-proxy/ipvs] Display RS complete name in logs
Help distinguish UDP and TCP RS (useful for DNS which uses both)
[kube-proxy/ipvs] Handle UDP graceful termination
The current logic is to delete a RS if the number of active connections
is 0. This makes sense for TCP but for UDP the number of active
connections is always 0. This is an issue for DNS queries because the RS
will be deleted but the IPVS connection will remain until it expires
(5mn by default) and if there are a lot of DNS queries, the port will be
reused and queries blackholed. Of course for this to work properly the
service needs to continue to serve queries until the connections expire
(this works fine with the lameduck option of coredns).
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Nov 28, 2018

Hi @lbernail. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added sig/network and removed needs-sig labels Nov 28, 2018

@k8s-ci-robot k8s-ci-robot requested review from Lion-Wei and thockin Nov 28, 2018

@m1093782566

This comment has been minimized.

Copy link
Member

m1093782566 commented Nov 29, 2018

/ok-to-test

@@ -75,10 +75,10 @@ func (q *graceTerminateRSList) remove(rs *listItem) bool {

uniqueRS := rs.String()
if _, ok := q.list[uniqueRS]; ok {
return false
delete(q.list, uniqueRS)

This comment has been minimized.

@m1093782566

m1093782566 Nov 29, 2018

Member

@Lion-Wei Can you check this piece of code?

@m1093782566

This comment has been minimized.

Copy link
Member

m1093782566 commented Nov 29, 2018

LGTM

/milestone v1.13

/approve

@k8s-ci-robot k8s-ci-robot added this to the v1.13 milestone Nov 29, 2018

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Nov 29, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lbernail, m1093782566

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@m1093782566

This comment has been minimized.

Copy link
Member

m1093782566 commented Nov 29, 2018

/retest

@m1093782566

This comment has been minimized.

Copy link
Member

m1093782566 commented Nov 29, 2018

Adding

/priority critical-urgent

as there are some related outstanding issues.

/lgtm

@Lion-Wei given the priority, I am going to merge this PR. Please feel free to put your comments here.

@Lion-Wei

This comment has been minimized.

Copy link
Contributor

Lion-Wei commented Nov 29, 2018

@lbernail @m1093782566 I'm so sorry for this mistake, thanks for caching this.
All looks good to me.

@lbernail

This comment has been minimized.

Copy link
Contributor

lbernail commented Nov 29, 2018

I don't think we have tests for graceful termination yet. Happy to help work on them

@m1093782566

This comment has been minimized.

Copy link
Member

m1093782566 commented Nov 29, 2018

We can limit this PR to UDP if you prefer and discuss TCP later?

Sounds like a good plan. We need to fix the UDP graceful termination first as there are some outstanding issues.

I don't think we have tests for graceful termination yet. Happy to help work on them

Great!

@m1093782566

This comment has been minimized.

Copy link
Member

m1093782566 commented Nov 29, 2018

/hold cancel

@m1093782566

This comment has been minimized.

Copy link
Member

m1093782566 commented Nov 29, 2018

@lbernail gofmt please.

/lgtm

Will LGTM again when you push the update to fix go fmt.

@k8s-ci-robot k8s-ci-robot added the lgtm label Nov 29, 2018

@lbernail lbernail force-pushed the DataDog:lbernail/udp-graceful branch from 5024875 to b11233a Nov 29, 2018

@k8s-ci-robot k8s-ci-robot removed the lgtm label Nov 29, 2018

@lbernail

This comment has been minimized.

Copy link
Contributor

lbernail commented Nov 29, 2018

go fmt fixed
sorry about that

@m1093782566

This comment has been minimized.

Copy link
Member

m1093782566 commented Nov 29, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Nov 29, 2018

@AishSundar

This comment has been minimized.

Copy link
Contributor

AishSundar commented Nov 29, 2018

/hold

@AishSundar

This comment has been minimized.

Copy link
Contributor

AishSundar commented Nov 29, 2018

@lbernail @m1093782566 was this PR meant for 1.13? we no longer accept merges into 1.13 branch and only extremely critical urgent fixes can to be CP'ed if needed.

@nikopen

This comment has been minimized.

Copy link
Member

nikopen commented Nov 29, 2018

@lbernail @m1093782566 this fix looks rather important, but we need to assess the criticality of this fix and how stable it is to merge for 1.13.0 (release is on Monday). Let us know about the details:

  • Is this a new bug on 1.13-alpha or is it longstanding / should be backported to previous releases?
  • How well has it been tested / what are the possibilities of release-blocking CI tests failing?
  • Can it be merged for 1.13.1?

I think this can go in, and be reverted if it introduces any CI problems due to lack of time.

@m1093782566

This comment has been minimized.

Copy link
Member

m1093782566 commented Nov 30, 2018

I think it can be merged in 1.13.1.

@AishSundar

This comment has been minimized.

Copy link
Contributor

AishSundar commented Nov 30, 2018

@tpepper and @aleksandra-malinowska to consider this as a possible 1.13.1 patch release candidate.

/milestone clear

/hold cancel

@k8s-ci-robot k8s-ci-robot removed this from the v1.13 milestone Nov 30, 2018

@k8s-ci-robot k8s-ci-robot merged commit 25c9ac6 into kubernetes:master Nov 30, 2018

18 checks passed

cla/linuxfoundation lbernail authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details

k8s-ci-robot added a commit that referenced this pull request Dec 10, 2018

Merge pull request #71847 from DataDog/automated-cherry-pick-of-#7151…
…5-upstream-release-1.12

Automated cherry pick of #71515 upstream release 1.12

k8s-ci-robot added a commit that referenced this pull request Dec 10, 2018

Merge pull request #71881 from DataDog/automated-cherry-pick-of-#7151…
…5-upstream-release-1.13

Automated cherry pick of #71515 upstream release 1.13

k8s-ci-robot added a commit that referenced this pull request Dec 10, 2018

Merge pull request #71848 from DataDog/automated-cherry-pick-of-#7151…
…5-upstream-release-1.11

Automated cherry pick of #71515 upstream release 1.11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment