Enable graceful termination for UDP flows when using kube-proxy in IPVS mode #71515
Help distinguish UDP and TCP RS (useful for DNS which uses both)
The current logic deletes an RS once its number of active connections reaches 0. This makes sense for TCP, but for UDP the number of active connections is always 0. This is a problem for DNS queries: the RS is deleted while the IPVS connection remains until it expires (5 minutes by default), and if there are a lot of DNS queries, the port is reused and queries are blackholed. Of course, for graceful termination to work properly, the service needs to keep serving queries until the connections expire (this works fine with the lameduck option of CoreDNS).
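The difference between the two protocols can be sketched as follows. This is only an illustration, not the code from this PR: the `realServer` type, its field names, and both helper functions are hypothetical, and the "new" check assumes the fix waits for inactive connections as well as active ones.

```go
package main

import "fmt"

// realServer is a hypothetical stand-in for an IPVS real server (RS) entry.
type realServer struct {
	Protocol     string // "TCP" or "UDP"
	ActiveConn   int    // connections counted as active (always 0 for UDP)
	InactiveConn int    // other tracked connections, e.g. UDP flows not yet expired
}

// canDeleteOld mirrors the logic described above: only ActiveConn is checked,
// so a UDP real server is always considered safe to delete.
func canDeleteOld(rs realServer) bool {
	return rs.ActiveConn == 0
}

// canDeleteNew is an assumed version of the fix: the real server is kept
// until no tracked connection of any kind remains.
func canDeleteNew(rs realServer) bool {
	return rs.ActiveConn+rs.InactiveConn == 0
}

func main() {
	// A DNS backend that still has 12 unexpired UDP flows pointing at it.
	dns := realServer{Protocol: "UDP", ActiveConn: 0, InactiveConn: 12}
	fmt.Println(canDeleteOld(dns), canDeleteNew(dns)) // prints "true false"
}
```

With the old check the UDP real server is removed immediately, while the assumed new check keeps it until the remaining flows have expired.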
Hi @lbernail. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
@@ -75,10 +75,10 @@ func (q *graceTerminateRSList) remove(rs *listItem) bool {

	uniqueRS := rs.String()
	if _, ok := q.list[uniqueRS]; ok {
		return false
	delete(q.list, uniqueRS)
@Lion-Wei Can you check this piece of code?
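To make the concern with this hunk concrete, here is a minimal sketch of an inverted existence check, assuming the truncated context above reads as "return false when the entry is present". The `rsList` type, the string key format, and both method names are hypothetical simplifications, not the actual `graceTerminateRSList` implementation.

```go
package main

import "fmt"

// rsList is a hypothetical, simplified list of real servers pending
// graceful termination, keyed by a unique string name.
type rsList struct {
	list map[string]bool
}

// removeBuggy returns early exactly when the entry exists, so delete is
// only ever reached for keys that are absent and nothing is removed.
func (q *rsList) removeBuggy(key string) bool {
	if _, ok := q.list[key]; ok {
		return false
	}
	delete(q.list, key) // no-op: key is known to be absent here
	return true
}

// removeFixed returns early only when the entry is missing, and otherwise
// deletes it and reports success.
func (q *rsList) removeFixed(key string) bool {
	if _, ok := q.list[key]; !ok {
		return false
	}
	delete(q.list, key)
	return true
}

func main() {
	key := "10.0.0.10:53/UDP/10.1.2.3:53" // illustrative key format only
	q := &rsList{list: map[string]bool{key: true}}

	fmt.Println(q.removeBuggy(key), len(q.list)) // prints "false 1"
	fmt.Println(q.removeFixed(key), len(q.list)) // prints "true 0"
}
```

With the inverted check, entries accumulate in the list indefinitely, which matches the description of a function that "was never removing entries".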
LGTM
/milestone v1.13
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: lbernail, m1093782566
The full list of commands accepted by this bot can be found here. The pull request process is described here
/retest
Adding /priority critical-urgent as there are some related outstanding issues. /lgtm @Lion-Wei given the priority, I am going to merge this PR. Please feel free to put your comments here.
@lbernail @m1093782566 I'm so sorry for this mistake, thanks for catching this.
I don't think we have tests for graceful termination yet. Happy to help work on them.
Sounds like a good plan. We need to fix the UDP graceful termination first as there are some outstanding issues.
Great!
/hold cancel
@lbernail gofmt please. /lgtm Will LGTM again when you push the update to fix go fmt.
Force-pushed from 5024875 to b11233a
go fmt fixed
/lgtm
/hold
@lbernail @m1093782566 was this PR meant for 1.13? We no longer accept merges into the 1.13 branch, and only extremely critical, urgent fixes can be cherry-picked if needed.
@lbernail @m1093782566 this fix looks rather important, but we need to assess the criticality of this fix and how stable it is to merge for 1.13.0 (release is on Monday). Let us know about the details:
I think this can go in, and be reverted if it introduces any CI problems due to lack of time.
I think it can be merged in 1.13.1.
@tpepper and @aleksandra-malinowska to consider this as a possible 1.13.1 patch release candidate. /milestone clear /hold cancel
…5-upstream-release-1.12 Automated cherry pick of #71515 upstream release 1.12
…5-upstream-release-1.13 Automated cherry pick of #71515 upstream release 1.13
…5-upstream-release-1.11 Automated cherry pick of #71515 upstream release 1.11
What type of PR is this?
/kind bug
What this PR does / why we need it:
The current graceful termination logic deletes an RS once its number of active connections reaches 0. This makes sense for TCP, but for UDP the number of active connections is always 0. This is a problem for DNS queries: the RS is deleted while the IPVS connection remains until it expires (5 minutes by default), and if there are a lot of DNS queries, the port is reused and queries are blackholed.
Of course, for graceful termination to work properly, the service needs to keep serving queries until the connections expire (this works fine with the lameduck option of CoreDNS).
Which issue(s) this PR fixes *
Fixes #71514
Special notes for your reviewer:
In addition to this fix, the PR also includes a small fix in the delete function, which was never removing entries from the graceTerminateRSList, and a small improvement in logging to use the full RS name in logs to identify the service associated with it (helpful for DNS because when a pod is deleted both the TCP and UDP endpoints are removed). I can of course create separate PRs if needed.
Does this PR introduce a user-facing change?:
/assign @m1093782566