Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable graceful termination for UDP flows when using kube-proxy in IPVS mode #71515

Merged
merged 4 commits into from
Nov 30, 2018

Conversation

lbernail
Copy link
Contributor

What type of PR is this?
/kind bug

What this PR does / why we need it:
The current graceful termination logic is to delete a RS if the number of active connections is 0. This makes sense for TCP but for UDP the number of active connections is always 0. This is an issue for DNS queries because the RS will be deleted but the IPVS connection will remain until it expires (5mn by default) and if there are a lot of DNS queries, the port will be reused and queries blackholed.

Of course for this to work properly the service needs to continue to serve queries until the connections expire (this works fine with the lameduck option of coredns).

Which issue(s) this PR fixes *
Fixes #71514

Special notes for your reviewer:
In addition to this fix, the PR also includes a small fix in the delete function which was never removing entries from the graceTerminateRSList and a small improvement in logging to use the full RS name in logs to identify the service associated with it (helpful for DNS because when a pod is deleted both the TCP and UDP endpoints are removed). I can of course create separate PRs if needed

Does this PR introduce a user-facing change?:

UDP connections now support graceful termination in IPVS mode

/assign @m1093782566

Help distinguish UDP and TCP RS (useful for DNS which uses both)
The current logic is to delete a RS if the number of active connections
is 0. This makes sense for TCP but for UDP the number of active
connections is always 0. This is an issue for DNS queries because the RS
will be deleted but the IPVS connection will remain until it expires
(5mn by default) and if there are a lot of DNS queries, the port will be
reused and queries blackholed. Of course for this to work properly the
service needs to continue to serve queries until the connections expire
(this works fine with the lameduck option of coredns).
@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Nov 28, 2018
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 28, 2018
@k8s-ci-robot
Copy link
Contributor

Hi @lbernail. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 28, 2018
@m1093782566
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 29, 2018
@@ -75,10 +75,10 @@ func (q *graceTerminateRSList) remove(rs *listItem) bool {

uniqueRS := rs.String()
if _, ok := q.list[uniqueRS]; ok {
return false
delete(q.list, uniqueRS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lion-Wei Can you check this piece of code?

@m1093782566
Copy link
Contributor

LGTM

/milestone v1.13

/approve

@k8s-ci-robot k8s-ci-robot added this to the v1.13 milestone Nov 29, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lbernail, m1093782566

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 29, 2018
@m1093782566
Copy link
Contributor

/retest

@m1093782566
Copy link
Contributor

m1093782566 commented Nov 29, 2018

Adding

/priority critical-urgent

as there are some related outstanding issues.

/lgtm

@Lion-Wei given the priority, I am going to merge this PR. Please feel free to put your comments here.

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 29, 2018
@Lion-Wei
Copy link

@lbernail @m1093782566 I'm so sorry for this mistake, thanks for caching this.
All looks good to me.

@lbernail
Copy link
Contributor Author

I don't think we have tests for graceful termination yet. Happy to help work on them

@m1093782566
Copy link
Contributor

We can limit this PR to UDP if you prefer and discuss TCP later?

Sounds like a good plan. We need to fix the UDP graceful termination first as there are some outstanding issues.

I don't think we have tests for graceful termination yet. Happy to help work on them

Great!

@m1093782566
Copy link
Contributor

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 29, 2018
@m1093782566
Copy link
Contributor

m1093782566 commented Nov 29, 2018

@lbernail gofmt please.

/lgtm

Will LGTM again when you push the update to fix go fmt.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 29, 2018
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 29, 2018
@lbernail
Copy link
Contributor Author

go fmt fixed
sorry about that

@m1093782566
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 29, 2018
@AishSundar
Copy link
Contributor

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 29, 2018
@AishSundar
Copy link
Contributor

@lbernail @m1093782566 was this PR meant for 1.13? we no longer accept merges into 1.13 branch and only extremely critical urgent fixes can to be CP'ed if needed.

@nikopen
Copy link
Contributor

nikopen commented Nov 29, 2018

@lbernail @m1093782566 this fix looks rather important, but we need to assess the criticality of this fix and how stable it is to merge for 1.13.0 (release is on Monday). Let us know about the details:

  • Is this a new bug on 1.13-alpha or is it longstanding / should be backported to previous releases?
  • How well has it been tested / what are the possibilities of release-blocking CI tests failing?
  • Can it be merged for 1.13.1?

I think this can go in, and be reverted if it introduces any CI problems due to lack of time.

@m1093782566
Copy link
Contributor

I think it can be merged in 1.13.1.

@AishSundar
Copy link
Contributor

@tpepper and @aleksandra-malinowska to consider this as a possible 1.13.1 patch release candidate.

/milestone clear

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 30, 2018
@k8s-ci-robot k8s-ci-robot removed this from the v1.13 milestone Nov 30, 2018
@k8s-ci-robot k8s-ci-robot merged commit 25c9ac6 into kubernetes:master Nov 30, 2018
k8s-ci-robot added a commit that referenced this pull request Dec 10, 2018
…5-upstream-release-1.12

Automated cherry pick of #71515 upstream release 1.12
k8s-ci-robot added a commit that referenced this pull request Dec 10, 2018
…5-upstream-release-1.13

Automated cherry pick of #71515 upstream release 1.13
k8s-ci-robot added a commit that referenced this pull request Dec 10, 2018
…5-upstream-release-1.11

Automated cherry pick of #71515 upstream release 1.11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

In IPVS mode, kube-proxy doesn't handle graceful termination of UDP flows
6 participants