ipvs support graceful termination #66012
Conversation
/cc @jsravn @jhorwit2 @rramkumar1
@Lion-Wei: GitHub didn't allow me to request PR reviews from the following users: jsravn. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@Lion-Wei Wouldn't it be better to check for active connections, rather than using an arbitrary timeout? This should be available via the netfilter interface as active connections. When it reaches 0, it should be safe to remove the real server.
Previously discussed at #64947 (comment).
pkg/proxy/ipvs/proxier.go
Outdated
	continue
}
portNum, err := strconv.Atoi(port)
glog.V(5).Infof("new ep %q is in graceful delete list", uniqueRS)
err := proxier.gracefuldeleteManager.DeleteRsImmediately(uniqueRS)
Won't this drop existing connections if an endpoint is flapping? For example, if a pod is under load and goes NotReady temporarily (indicating it's rejecting new connections, but still processing existing ones), I don't think we want to drop connections. It should just remove it from the queue rather than deleting the real server first, and update the weight to 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, thanks. Updating the weight to 1 would be better.
But there is another situation: both the service and pod have been deleted, and the user then creates another service and pod that match the original vs/rs in the delete list. I'm a little worried about that case, though it seems like it doesn't matter much.
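Concretely, the un-delete path could look something like this (a rough sketch, not this PR's actual code; InTerminationList and Remove are hypothetical manager methods, and weight 1 mirrors what kube-proxy normally assigns to real servers):

	// If the endpoint is pending graceful deletion, cancel the pending
	// delete and restore its weight instead of dropping existing connections.
	if proxier.gracefuldeleteManager.InTerminationList(uniqueRS) {
		proxier.gracefuldeleteManager.Remove(uniqueRS) // cancel the timed delete
		rs.Weight = 1                                  // start accepting new connections again
		if err := proxier.ipvs.UpdateRealServer(vs, rs); err != nil {
			glog.Errorf("Failed to restore real server %s: %v", uniqueRS, err)
		}
	}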
// Delete old endpoints
for _, ep := range curEndpoints.Difference(newEndpoints).UnsortedList() {
	// if curEndpoint is in gracefulDelete, skip
	uniqueRS := vs.String() + "/" + ep
I think this identifier should either be created in graceful_delete.go (pass in vs and rs instead of the string), or defined higher up so it's not duplicated with line 1503.
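For example (a sketch only; the helper name is made up):

	// GetUniqueRSName builds the ID used by the graceful-termination
	// queue, so the "vs/rs" format is defined in exactly one place.
	func GetUniqueRSName(vs *utilipvs.VirtualServer, rs *utilipvs.RealServer) string {
		return vs.String() + "/" + rs.String()
	}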
It would be ideal if there were a way to detect whether the connection is active.
@jsravn @m1093782566 I think checking active connections would be better, but I'm worried about reliability: is having no active connections enough to prove that the rs can be safely removed? And do other connection states matter, e.g. FIN_WAIT/TIME_WAIT? Also, we would need a new IPVS interface in pkg/util/ipvs to get a real server's active connection count.
@Lion-Wei For TCP connections, the active count is the correct thing to use. If a connection is in FIN_WAIT/TIME_WAIT then it has been closed by one side, so I think it's fine to close it at that point (after which, for FIN_WAIT at least, that side will get a reset if it tries to send packets). UDP is trickier: I think UDP only increments "InActCount", though I would need to test this to be sure. So for UDP you would have to wait for InActCount to drop to 0. You can see here how ipvsadm retrieves those values; it might require modifying the ipvs util package if it doesn't have them yet: https://git.kernel.org/pub/scm/utils/kernel/ipvsadm/ipvsadm.git/tree/libipvs/libipvs.c#n893
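In Go terms, the check being discussed might look roughly like this (a sketch; it assumes ActiveConn/InactiveConn counters are exposed on the RealServer type, which may require extending pkg/util/ipvs as noted above):

	// canDelete reports whether a real server can be removed safely.
	// TCP: only established connections block deletion; FIN_WAIT and
	// TIME_WAIT entries are counted as inactive and may be reset.
	// UDP: flows only ever show up in the inactive counter, so that
	// must also drain to zero.
	func canDelete(rs *utilipvs.RealServer, protocol string) bool {
		if protocol == "UDP" {
			return rs.ActiveConn == 0 && rs.InactiveConn == 0
		}
		return rs.ActiveConn == 0
	}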
First round of comments.
One observation: I see that the iptables implementation (#60074) takes in a command line arg which dictates the termination delay. How come we are not doing that here?
Also, that implementation spawns a goroutine for each endpoint, whereas you have the priority queue implementation. What was the reasoning for the queue?
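For context, the goroutine-per-endpoint pattern is roughly the following (a generic sketch, not the actual code from #60074; terminationDelay and ipvsInterface are illustrative names):

	// One timer goroutine per terminating endpoint: simple, but a large,
	// churny cluster ends up with one goroutine per in-flight deletion.
	go func(vs *utilipvs.VirtualServer, rs *utilipvs.RealServer) {
		time.Sleep(terminationDelay) // hypothetical configurable delay
		if err := ipvsInterface.DeleteRealServer(vs, rs); err != nil {
			glog.Errorf("Failed to delete real server %v: %v", rs, err)
		}
	}(vs, rs)

A single timed queue replaces all of those sleeping goroutines with one worker.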
pkg/proxy/ipvs/graceful_delete.go
Outdated
@@ -0,0 +1,250 @@
/* |
I feel like a more appropriate name for the file would be graceful_termination.go. Any objections to that?
pkg/proxy/ipvs/graceful_delete.go
Outdated
}

// GracefulDeleteRSQueueQueue is a priority heap where the lowest ProcessAt is at the front of the queue
type GracefulDeleteRSQueueQueue []*GracefulDeleteRS
Can you call this gracefulTerminationQueue and make it package private? The queue is an implementation detail of the manager.
pkg/proxy/ipvs/graceful_delete.go
Outdated
// GracefulDeleteRS stores real server information and the process time.
// If nothing special happens, the real server will be deleted after the process time.
type GracefulDeleteRS struct {
Can you call this queueItem or something like that? GracefulDeleteRS is a really confusing name.
pkg/proxy/ipvs/graceful_delete.go
Outdated
}

// GracefulDeleteRSQueueQueue is a priority heap where the lowest ProcessAt is at the front of the queue
type GracefulDeleteRSQueueQueue []*GracefulDeleteRS
Just an idea: why not make the internal type of the heap a map with the key being the queue item and the value being the index? That way you can use the native properties of the map to preserve uniqueness, and you should still be able to implement the sort and heap interfaces. Then you won't need the UniqueQueue. Maybe it doesn't make sense, but I just wanted to throw that idea out there.
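One way to read that suggestion is a slice-backed heap with an auxiliary index map for uniqueness, in the style of the container/heap documentation (a sketch with hypothetical names; the map gives the de-duplication, the slice keeps heap order):

	type queueItem struct {
		UniqueRS  string
		ProcessAt time.Time
	}

	// timedQueue implements heap.Interface; the index map gives O(1)
	// membership checks and de-duplication without a separate UniqueQueue.
	type timedQueue struct {
		items []*queueItem
		index map[string]int // UniqueRS -> position in items
	}

	func (q *timedQueue) Len() int           { return len(q.items) }
	func (q *timedQueue) Less(i, j int) bool { return q.items[i].ProcessAt.Before(q.items[j].ProcessAt) }
	func (q *timedQueue) Swap(i, j int) {
		q.items[i], q.items[j] = q.items[j], q.items[i]
		q.index[q.items[i].UniqueRS] = i
		q.index[q.items[j].UniqueRS] = j
	}
	func (q *timedQueue) Push(x interface{}) {
		item := x.(*queueItem)
		q.index[item.UniqueRS] = len(q.items)
		q.items = append(q.items, item)
	}
	func (q *timedQueue) Pop() interface{} {
		item := q.items[len(q.items)-1]
		q.items = q.items[:len(q.items)-1]
		delete(q.index, item.UniqueRS)
		return item
	}

Items go in and out via heap.Push and heap.Pop, and _, pending := q.index[uniqueRS] answers the uniqueness question directly.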
pkg/proxy/ipvs/graceful_delete.go
Outdated
	return nil
}

func (m *GracefulDeleteManager) TryDeleteRs() {
make this package private
pkg/proxy/ipvs/graceful_delete.go
Outdated
	}
}

func (m *GracefulDeleteManager) RSInGracefulDelete(uniqueRS string) bool {
Can we call this PendingGracefulTermination?
This function checks whether the rs is in the graceful termination list. I think InTerminationList would be better.
pkg/proxy/ipvs/graceful_delete.go
Outdated
	return exist
}

func (m *GracefulDeleteManager) GracefulDeleteRS(vs *utilipvs.VirtualServer, rs *utilipvs.RealServer) error {
can we call this GracefullyTerminate?
@jsravn, I think using active connections to determine whether an rs can be deleted is a terrific idea, but that work is going to take some time because of the vendor repo. Shipping with a graceful timeout is another solution, though not the best. Maybe we can let this in to solve the problem first and follow up after. @jhorwit2 @m1093782566 @rramkumar1 Thoughts?
@rramkumar1 I was recently busy with other work, sorry for the delayed response. Also, I'm a little hesitant about that struct: with a FIFO queue we don't have to traverse every gracefully terminating endpoint to decide which should be deleted, and in a large cluster we might have a lot of rs in graceful termination.
@Lion-Wei I think that's okay. I agree it should be a configurable timeout, since some people might want to disable it or change the timeout (thinking of the bug that led to UDP connections being aggressively flushed in the iptables proxier).
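Hypothetically, the flag wiring could be as small as this (a sketch only; the flag name and config field are made up and do not exist in this PR):

	// Make the termination window configurable; a zero duration would
	// effectively disable graceful termination.
	fs.DurationVar(&config.IPVSGracefulTerminationPeriod, "ipvs-graceful-termination-period",
		config.IPVSGracefulTerminationPeriod,
		"Grace period before a terminating IPVS real server is removed; 0 removes it immediately.")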
We should be a bit careful about creating new API/flags, especially the new …
Quick update: turns out …
Is anything left here, except for a rebase so the conflict is resolved?
@akumria @jsravn @lbernail @m1093782566 Connection-based graceful termination is finished; please take another look.
@Lion-Wei: this looks good. I think we'll test it as soon as it makes it to an alpha release.
@lbernail I think this should be cherry-picked to 1.11; that's necessary.
/approve
Thanks, @m1093782566.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Lion-Wei, m1093782566. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/test pull-kubernetes-integration
Will this make it into a 1.12.x release, along with #69267?
@zachaller Yeah, I think so.
What this PR does / why we need it:
Add a timed queue to handle ipvs graceful delete.
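At a high level, the manager wakes up on a fixed interval and drains every queued item whose deadline has passed (a sketch; the interval and stop-channel plumbing are illustrative, not this PR's exact code):

	// run periodically pops and deletes real servers whose ProcessAt
	// time is in the past, leaving the rest queued.
	func (m *GracefulDeleteManager) run(stopCh <-chan struct{}) {
		ticker := time.NewTicker(time.Minute)
		defer ticker.Stop()
		for {
			select {
			case <-stopCh:
				return
			case <-ticker.C:
				m.TryDeleteRs()
			}
		}
	}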
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #57841
Special notes for your reviewer:
Release note: