
ipvs support graceful termination #66012

Merged: 2 commits into kubernetes:master, Sep 28, 2018

Conversation

@Lion-Wei (Author) commented Jul 10, 2018

What this PR does / why we need it:
Add a timed queue to handle IPVS graceful deletion.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #57841

Special notes for your reviewer:

Release note:

The IPVS proxier mode now supports connection-based graceful termination.
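
For reviewers skimming the approach, here is a minimal sketch of the timed-deletion idea described above. All names here (gracefulItem, terminationManager, the one-minute tick and the deleteRS callback) are illustrative assumptions, not the PR's actual types:

package gracefulterm

import (
	"sync"
	"time"
)

// gracefulItem records a real server that is pending deletion and the
// earliest time it may actually be removed.
type gracefulItem struct {
	key       string // e.g. "<virtual server>/<real server>"
	processAt time.Time
}

// terminationManager is a hypothetical stand-in for the PR's graceful
// delete manager; the real implementation keeps entries in a time-ordered
// queue rather than a map.
type terminationManager struct {
	mu      sync.Mutex
	pending map[string]gracefulItem
}

func newTerminationManager() *terminationManager {
	return &terminationManager{pending: map[string]gracefulItem{}}
}

// markForDeletion schedules a real server for removal after the grace period.
func (m *terminationManager) markForDeletion(key string, grace time.Duration) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.pending[key]; !ok {
		m.pending[key] = gracefulItem{key: key, processAt: time.Now().Add(grace)}
	}
}

// run deletes expired entries on every tick until stopCh is closed.
func (m *terminationManager) run(stopCh <-chan struct{}, deleteRS func(key string) error) {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return
		case now := <-ticker.C:
			m.mu.Lock()
			for key, it := range m.pending {
				if now.After(it.processAt) {
					if err := deleteRS(key); err == nil {
						delete(m.pending, key)
					}
				}
			}
			m.mu.Unlock()
		}
	}
}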

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 10, 2018
@Lion-Wei (Author)

/cc @jsravn @jhorwit2 @rramkumar1
/assign @m1093782566
Please take a look, thanks.

@k8s-ci-robot (Contributor)

@Lion-Wei: GitHub didn't allow me to request PR reviews from the following users: jsravn.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @jsravn @jhorwit2 @rramkumar1
/assign @m1093782566
Please take a look, thanks.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Lion-Wei Lion-Wei changed the title ipvs support graceful termination [wip]ipvs support graceful termination Jul 10, 2018
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 10, 2018
@jsravn (Contributor) commented Jul 10, 2018

@Lion-Wei Wouldn't it be better to check for active connections, rather than using an arbitrary timeout? This should be available via the netfilter interface as active connections. When it reaches 0, it should be safe to remove the real server.
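
To illustrate, a rough sketch of gating deletion on the active connection count; realServerStats is a hypothetical interface standing in for whatever IPVS stats accessor would be needed, not an existing kube-proxy API:

// Sketch only: realServerStats is a hypothetical accessor for IPVS
// connection counters, not kube-proxy's actual pkg/util/ipvs interface.
type realServerStats interface {
	ActiveConnections(vsAddress, rsAddress string) (int, error)
}

// safeToDelete reports whether a real server has no established
// connections left and can therefore be removed from the virtual server.
func safeToDelete(stats realServerStats, vs, rs string) (bool, error) {
	active, err := stats.ActiveConnections(vs, rs)
	if err != nil {
		return false, err
	}
	return active == 0, nil
}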

@jsravn (Contributor) commented Jul 10, 2018

Previously discussed at #64947 (comment).

continue
}
portNum, err := strconv.Atoi(port)
glog.V(5).Infof("new ep %q is in graceful delete list", uniqueRS)
err := proxier.gracefuldeleteManager.DeleteRsImmediately(uniqueRS)
@jsravn (Contributor) commented on the diff, Jul 10, 2018

Won't this drop existing connections if an endpoint is flapping? For example, if a pod is under load and goes NotReady temporarily (indicating it's rejecting new connections, but still processing existing ones). I don't think we want to drop connections in this case. It should just remove it from the queue rather than deleting the real server first. And update the weight to 1.
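
To make that concrete, a sketch reusing the hypothetical terminationManager from the earlier sketch: when a flapping endpoint reappears, cancel the pending delete and restore its weight through a callback, instead of deleting and re-adding the real server.

// Sketch (hypothetical API): cancel a pending graceful delete when the
// endpoint comes back, and restore its weight via the supplied callback
// (e.g. setting the real server's weight back to 1) rather than deleting
// the real server and dropping its established connections.
func (m *terminationManager) cancelDeletion(key string, restoreWeight func() error) error {
	m.mu.Lock()
	delete(m.pending, key)
	m.mu.Unlock()
	return restoreWeight()
}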

@Lion-Wei (Author)

That's a good point, thanks. Updating the weight to 1 would be better.
But there is another situation: both the service and pod have been deleted, and the user creates another service and pod that match the original vs/rs still in the delete list.
I'm a little worried about that case, though it probably doesn't matter.

// Delete old endpoints
for _, ep := range curEndpoints.Difference(newEndpoints).UnsortedList() {
// if curEndpoint is in gracefulDelete, skip
uniqueRS := vs.String() + "/" + ep
A contributor commented on the diff

I think this identifier should either be created in graceful_delete.go (pass in vs and rs instead of the string), or defined higher up so it's not duplicated with line 1503.
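
For example, a small shared helper (hypothetical name, shown only to illustrate the suggestion) would keep the key format in one place:

import "fmt"

// uniqueRealServerKey builds the "<virtual server>/<endpoint>" identifier
// used to look up a real server in the graceful-delete list. Hypothetical
// helper; vs is anything with a String method.
func uniqueRealServerKey(vs fmt.Stringer, endpoint string) string {
	return vs.String() + "/" + endpoint
}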

@m1093782566 (Contributor)

@Lion-Wei Wouldn't it be better to check for active connections, rather than using an arbitrary timeout?

It would be ideal if there is a way to detect if the connection is active.

@Lion-Wei (Author)

@jsravn @m1093782566 I think checking active connections would be better, but I'm worried about reliability. Is having no active connections enough to prove that the rs can be safely removed? And do other connection states matter, e.g. FIN_WAIT/TIME_WAIT?

We would also need a new IPVS interface in pkg/util/ipvs to get a real server's active connection count.

@jsravn (Contributor) commented Jul 10, 2018

@Lion-Wei For TCP connections, the active count is the correct thing to use. If a connection is in FIN_WAIT/TIME_WAIT it has been closed by one side, so I think it's fine to close it at this point (after which, for FIN_WAIT at least, that side will get a reset if it tries to send packets).

UDP is trickier. I think UDP only increments "InActCount", though I would need to test this to be sure. So for UDP you would have to wait for InActCount to drop to 0.

You can see here how ipvsadm retrieves those values, I guess it might require modifying the ipvs util package if it doesn't have it yet: https://git.kernel.org/pub/scm/utils/kernel/ipvsadm/ipvsadm.git/tree/libipvs/libipvs.c#n893.
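
As a sketch of how the two counters might be combined per protocol: the field names mirror ipvsadm's ActiveConn/InActConn columns, and the Go types here are assumptions rather than anything in this PR.

// rsCounters is a hypothetical view of a real server's IPVS counters.
type rsCounters struct {
	ActiveConn   int
	InactiveConn int
}

// drained reports whether a real server can be removed safely.
func drained(protocol string, c rsCounters) bool {
	if protocol == "UDP" {
		// UDP flows only show up in the inactive counter.
		return c.InactiveConn == 0
	}
	// For TCP, FIN_WAIT/TIME_WAIT entries are counted as inactive,
	// so only established connections block deletion.
	return c.ActiveConn == 0
}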

@rramkumar1 (Contributor) left a comment

First round of comments.

One observation: I see that the iptables implementation (#60074) takes in a command line arg which dictates the termination delay. How come we are not doing that here?

Also, that implementation spawns a goroutine for each endpoint, whereas you have the priority queue implementation. What was the reasoning for the queue?

@@ -0,0 +1,250 @@
/*
A contributor commented on the diff

I feel like a more appropriate name for the file would be graceful_termination.go. Any objections to that?

}

// GracefulDeleteRSQueueQueue is a priority heap where the lowest ProcessAt is at the front of the queue
type GracefulDeleteRSQueueQueue []*GracefulDeleteRS
A contributor commented on the diff

Can you call this gracefulTerminationQueue and make it package private? The queue is an implementation detail of the manager.


// GracefulDeleteRS stores real server information and the process time.
// If nothing special happened, real server will be delete after process time.
type GracefulDeleteRS struct {
A contributor commented on the diff

Can you call this queueItem or something like that? GracefulDeleteRS is a really confusing name.

}

// GracefulDeleteRSQueueQueue is a priority heap where the lowest ProcessAt is at the front of the queue
type GracefulDeleteRSQueueQueue []*GracefulDeleteRS
A contributor commented on the diff

Just an idea: why not make the internal type of the heap a map with the key being the queue item and the value being the index? That way you can use the native properties of the map to preserve uniqueness, and you should still be able to implement the sort and heap interfaces. Then you won't need the UniqueQueue. Maybe it does not make sense, but I just wanted to throw that idea out there.
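
One way to sketch that idea (adapted slightly: a slice-backed heap plus a key-to-item map, so uniqueness checks are O(1); all names here are illustrative, not the PR's types):

import (
	"container/heap"
	"time"
)

// item and uniqueHeap are illustrative names, not the PR's types.
type item struct {
	key       string
	processAt time.Time
	index     int // position in the slice, maintained via Swap
}

// uniqueHeap is a min-heap ordered by processAt that also tracks
// membership by key, so duplicate enqueues can be rejected without a
// separate unique queue.
type uniqueHeap struct {
	items []*item
	byKey map[string]*item
}

func newUniqueHeap() *uniqueHeap {
	return &uniqueHeap{byKey: map[string]*item{}}
}

func (h *uniqueHeap) Len() int           { return len(h.items) }
func (h *uniqueHeap) Less(i, j int) bool { return h.items[i].processAt.Before(h.items[j].processAt) }
func (h *uniqueHeap) Swap(i, j int) {
	h.items[i], h.items[j] = h.items[j], h.items[i]
	h.items[i].index = i
	h.items[j].index = j
}

func (h *uniqueHeap) Push(x interface{}) {
	it := x.(*item)
	it.index = len(h.items)
	h.items = append(h.items, it)
	h.byKey[it.key] = it
}

func (h *uniqueHeap) Pop() interface{} {
	old := h.items
	n := len(old)
	it := old[n-1]
	h.items = old[:n-1]
	delete(h.byKey, it.key)
	return it
}

// add enqueues a key unless a deletion for it is already pending.
func (h *uniqueHeap) add(key string, processAt time.Time) bool {
	if _, exists := h.byKey[key]; exists {
		return false
	}
	heap.Push(h, &item{key: key, processAt: processAt})
	return true
}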

return nil
}

func (m *GracefulDeleteManager) TryDeleteRs() {
A contributor commented on the diff

make this package private

}
}

func (m *GracefulDeleteManager) RSInGracefulDelete(uniqueRS string) bool {
A contributor commented on the diff

Can we call this PendingGracefulTermination?

@Lion-Wei (Author)

This function checks whether the rs is in the graceful termination list. I think InTerminationList would be better.

return exist
}

func (m *GracefulDeleteManager) GracefulDeleteRS(vs *utilipvs.VirtualServer, rs *utilipvs.RealServer) error {
A contributor commented on the diff

can we call this GracefullyTerminate?

@Lion-Wei (Author)

@jsravn, I think using active connections to determine whether an rs can be deleted is a terrific idea, but that work is going to take some time, since the vendored docker/libnetwork repo may need to discuss it and decide whether it is necessary. Anyway, I raised an issue in docker/libnetwork: moby/libnetwork#2237

I think deleting after a grace period is another solution, though not the best one. Maybe we can get this in to solve the problem first, and once the docker/libnetwork side work is finished, switch to the better approach.

@jhorwit2 @m1093782566 @rramkumar1 Thoughts?

@Lion-Wei (Author)

@rramkumar1 I have been busy with other work recently, sorry for the delayed response.
First, I don't think we need a parameter for the termination delay, since users should not have to worry about it.
I also don't think a goroutine per rs is a good idea, because that could mean a lot of goroutines in a large cluster with a lot of pod churn.

Also, I'm a little hesitant about changing the struct: if we use a FIFO queue, we don't have to traverse all gracefully terminating endpoints to decide which should be deleted, and in a large cluster we might have a lot of rs in graceful termination.
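
That is the main payoff of a time-ordered queue: the cleanup pass only looks at the front. Continuing the hypothetical uniqueHeap sketch from earlier:

// popExpired removes and returns every entry whose processAt has passed.
// Because the heap is ordered by processAt, this touches only the front of
// the queue instead of traversing every pending real server.
func (h *uniqueHeap) popExpired(now time.Time) []*item {
	var expired []*item
	for h.Len() > 0 && !h.items[0].processAt.After(now) {
		expired = append(expired, heap.Pop(h).(*item))
	}
	return expired
}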

@jsravn (Contributor) commented Jul 19, 2018

@Lion-Wei I think that's okay. I agree it should be a configurable timeout, some people might want to disable it or change the timeout (thinking of the bug that led to UDP connections being aggressively flushed in the iptables proxier).

@m1093782566 (Contributor) commented Jul 20, 2018

@Lion-Wei I think that's okay. I agree it should be a configurable timeout, some people might want to disable it or change the timeout (thinking of the bug that led to UDP connections being aggressively flushed in the iptables proxier).

We should be a bit careful about creating new APIs/flags, especially since a new timeout flag would be deprecated very soon when/if we add connection detection support. I am not sure how urgent it is, but configuring a timeout and detecting connections are different behaviors, and we need to understand the effects on users.

@Lion-Wei Lion-Wei changed the title [wip]ipvs support graceful termination ipvs support graceful termination Jul 20, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 20, 2018
@Lion-Wei (Author)

Quick update: it turns out docker/libnetwork has the same problem, and they solve it in the same way.

Once a service is disabled we are down weighting it to 0 so that new connection are not going to be LB on that, and we are removing the ipvs rule once the container exit. The time between the weight to 0 and the removal of the ipvs rule is managed by the graceful shutdown period.
moby/libnetwork#2237 (comment)
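
Sketched against hypothetical stand-ins for kube-proxy's IPVS util types (not this PR's actual code), that pattern is roughly: set the real server's weight to 0 so it receives no new connections, then delete the rule only after the grace period or once connections drain.

// Hypothetical minimal types standing in for pkg/util/ipvs.
type virtualServer struct{ Address string }
type realServer struct {
	Address string
	Weight  int
}
type ipvsHandle interface {
	UpdateRealServer(*virtualServer, *realServer) error
	DeleteRealServer(*virtualServer, *realServer) error
}

// beginGracefulDelete stops new connections from being load-balanced to
// the real server by setting its weight to 0; existing connections keep
// flowing until the entry is deleted later.
func beginGracefulDelete(h ipvsHandle, vs *virtualServer, rs *realServer) error {
	rs.Weight = 0
	return h.UpdateRealServer(vs, rs)
}

// finishGracefulDelete removes the real server once the grace period has
// elapsed or its connections have drained.
func finishGracefulDelete(h ipvsHandle, vs *virtualServer, rs *realServer) error {
	return h.DeleteRealServer(vs, rs)
}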

@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Aug 25, 2018
@akumria commented Aug 30, 2018

Is anything left here, except for a rebase so the conflict is resolved?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 30, 2018
@Lion-Wei (Author) commented Sep 5, 2018

@akumria @jsravn @lbernail @m1093782566, connection-based graceful termination is finished; please take another look.

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 5, 2018
@lbernail (Contributor) commented Sep 6, 2018

@Lion-Wei : this looks good. I think we'll test it as soon as it makes it to an alpha release
Do you think this can make it to the 1.11 branch?

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 16, 2018
@Lion-Wei (Author)

@lbernail I think this should be cherry-picked to 1.11; that's necessary.
But first we need to get this merged. :(

@m1093782566 (Contributor)

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 27, 2018
@Lion-Wei (Author)

Thanks, @m1093782566.
@jsravn @lbernail @rramkumar1 Can anyone help by adding an lgtm, so we can get this merged? 😃

@m1093782566 (Contributor)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 28, 2018
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Lion-Wei, m1093782566

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Lion-Wei (Author)

/test pull-kubernetes-integration

@k8s-ci-robot k8s-ci-robot merged commit 8ea6b2c into kubernetes:master Sep 28, 2018
@zachaller (Contributor) commented Oct 8, 2018

Will this make it into a 1.12.x release? Along with #69267

@Lion-Wei (Author) commented Oct 8, 2018

@zachaller Yeah, I think so.

k8s-ci-robot added a commit that referenced this pull request Oct 24, 2018
#66012-#69268-upstream-release-1.12

Automated cherry pick of #68874: update vendor #66012: ipvs support graceful termination #69268: fix bugs introduced by ipvs graceful termination
k8s-ci-robot added a commit that referenced this pull request Nov 8, 2018
#66012-#69268-upstream-release-1.11

Automated cherry pick of #68874: update vendor #66012: ipvs support graceful termination #69268: fix bugs introduced by ipvs graceful termination
Successfully merging this pull request may close these issues.

ipvs proxier doesn't respect graceful termination