ipvs support graceful termination #66012

Merged
merged 2 commits into kubernetes:master on Sep 28, 2018

Contributor

Lion-Wei commented Jul 10, 2018

What this PR does / why we need it:
Add a timed queue to handle ipvs graceful delete.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #57841

Special notes for your reviewer:

Release note:

IPVS proxier mode now supports connection-based graceful termination.

Contributor

Lion-Wei commented Jul 10, 2018

/cc @jsravn @jhorwit2 @rramkumar1
/assign @m1093782566
Please take a look, thanks.

Contributor

k8s-ci-robot commented Jul 10, 2018

@Lion-Wei: GitHub didn't allow me to request PR reviews from the following users: jsravn.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @jsravn @jhorwit2 @rramkumar1
/assign @m1093782566
Please take a look, thanks.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot requested a review from rramkumar1 Jul 10, 2018

@Lion-Wei Lion-Wei changed the title from ipvs support graceful termination to [wip]ipvs support graceful termination Jul 10, 2018

Contributor

jsravn commented Jul 10, 2018

@Lion-Wei Wouldn't it be better to check for active connections, rather than using an arbitrary timeout? This should be available via the netfilter interface as active connections. When it reaches 0, it should be safe to remove the real server.

Contributor

jsravn commented Jul 10, 2018

Previously discussed at #64947 (comment).

pkg/proxy/ipvs/proxier.go (outdated)
	// Delete old endpoints
	for _, ep := range curEndpoints.Difference(newEndpoints).UnsortedList() {
		// if curEndpoint is in gracefulDelete, skip
		uniqueRS := vs.String() + "/" + ep

@jsravn

jsravn Jul 10, 2018

Contributor

I think this identifier should either be created in graceful_delete.go (pass in vs and rs instead of the string), or defined higher up so it's not duplicated with line 1503.

Member

m1093782566 commented Jul 10, 2018

@Lion-Wei Wouldn't it be better to check for active connections, rather than using an arbitrary timeout?

It would be ideal if there is a way to detect if the connection is active.

Contributor

Lion-Wei commented Jul 10, 2018

@jsravn @m1093782566 I think checking active connections would be better, but I was worried about reliability. Is having no active connections enough to prove that the rs can be safely removed? And do other connection states matter, e.g. FIN_WAIT/TIME_WAIT?

We would also need a new IPVS interface in pkg/util/ipvs to get the real server's active connection count.

Contributor

jsravn commented Jul 10, 2018

@Lion-Wei For TCP connections, active count is the correct thing to use. If it is in FIN_WAIT/TIME_WAIT then it's been closed by one side so I think it's fine to close the connection at this point (after which, for FIN_WAIT at least, the one side will get a reset if they try to send packets).

UDP is trickier. I think UDP only increments "InActConn", but I would need to test this to be sure. So for UDP you would have to wait for InActConn to drop to 0.

You can see here how ipvsadm retrieves those values, I guess it might require modifying the ipvs util package if it doesn't have it yet: https://git.kernel.org/pub/scm/utils/kernel/ipvsadm/ipvsadm.git/tree/libipvs/libipvs.c#n893.
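
A minimal sketch of the check described in this comment, assuming the two per-destination counters that ipvsadm reports are available. The `RealServer` type, its field names, and `canRemove` are hypothetical and are not taken from pkg/util/ipvs. The UDP branch follows the untested guess above that UDP traffic only shows up in the inactive counter.

```go
package main

import "fmt"

// RealServer is a hypothetical, simplified view of one IPVS destination.
// The two counters correspond to the ActiveConn and InActConn columns that
// ipvsadm prints; the type itself is an assumption made for illustration.
type RealServer struct {
	Address      string
	Protocol     string // "TCP" or "UDP"
	ActiveConn   int
	InactiveConn int
}

// canRemove reports whether the destination looks drained.
// TCP: only the active count matters, since FIN_WAIT/TIME_WAIT entries are
// already closed by one side.
// UDP: wait for the inactive count to reach zero (assumption from the thread).
func canRemove(rs RealServer) bool {
	if rs.Protocol == "UDP" {
		return rs.InactiveConn == 0
	}
	return rs.ActiveConn == 0
}

func main() {
	servers := []RealServer{
		{Address: "10.244.1.5:8080", Protocol: "TCP", ActiveConn: 0, InactiveConn: 4},
		{Address: "10.244.1.6:8080", Protocol: "TCP", ActiveConn: 2, InactiveConn: 0},
		{Address: "10.244.1.7:53", Protocol: "UDP", ActiveConn: 0, InactiveConn: 1},
	}
	for _, rs := range servers {
		fmt.Printf("%s %s removable=%t\n", rs.Protocol, rs.Address, canRemove(rs))
	}
}
```

Keeping the protocol decision in one small function means the UDP rule can be changed in one place if testing shows the guess is wrong.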

@rramkumar1

First round of comments.

One observation: I see that the iptables implementation (#60074) takes in a command line arg which dictates the termination delay. How come we are not doing that here?

Also, that implementation spawns a goroutine for each endpoint, whereas you have the priority queue implementation. What was the reasoning for the queue?

pkg/proxy/ipvs/graceful_delete.go (7 outdated review threads, collapsed)

Contributor

Lion-Wei commented Jul 19, 2018

@jsravn, I think using active connections to determine whether an rs can be deleted is a terrific idea, but this work is going to take some time, because the vendored repo docker/libnetwork may need to discuss whether it is necessary.
Anyway, I raised an issue in docker/libnetwork: docker/libnetwork#2237

I think a graceful timeout is another solution, though not the best one. Maybe we can merge this to solve the problem first, and switch to the better way after the docker/libnetwork side work is finished.

@jhorwit2 @m1093782566 @rramkumar1 Thoughts?

Contributor

Lion-Wei commented Jul 19, 2018

@rramkumar1 I have been busy with other work recently, sorry for the delayed response.
First, I don't think we need a parameter for the termination delay, because users should not have to worry about it.
And I don't think one goroutine per rs is a good idea, because that could create a lot of goroutines in a large cluster with a lot of pod churn.

Also, I'm a little hesitant about the data structure. If we use a FIFO queue, we don't have to traverse every gracefully terminating endpoint to decide which ones should be deleted; in a large cluster we might have a lot of rs in graceful termination.

Contributor

jsravn commented Jul 19, 2018

@Lion-Wei I think that's okay. I agree it should be a configurable timeout, some people might want to disable it or change the timeout (thinking of the bug that led to UDP connections being aggressively flushed in the iptables proxier).

Member

m1093782566 commented Jul 20, 2018

@Lion-Wei I think that's okay. I agree it should be a configurable timeout, some people might want to disable it or change the timeout (thinking of the bug that led to UDP connections being aggressively flushed in the iptables proxier).

We should be a bit careful about creating new API/flags, especially since the new timeout flag would be deprecated very soon when/if we add connection detection support. I am not sure how urgent it is, but configuring a timeout and detecting connections are different behaviors, and we need to understand the effects on users.

@Lion-Wei Lion-Wei changed the title from [wip]ipvs support graceful termination to ipvs support graceful termination Jul 20, 2018

Contributor

Lion-Wei commented Jul 23, 2018

Quick update: it turns out docker/libnetwork has the same problem, and they solve it the same way.

Once a service is disabled we are down weighting it to 0 so that new connection are not going to be LB on that, and we are removing the ipvs rule once the container exit. The time between the weight to 0 and the removal of the ipvs rule is managed by the graceful shutdown period.
docker/libnetwork#2237 (comment)
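
The two-step removal quoted above can be sketched as follows: first set the weight to 0 so the backend receives no new connections, then delete the entry later. This is an in-memory illustration only; `pool`, `backend`, `drain`, `remove`, and `eligible` are hypothetical names, not the docker/libnetwork or kube-proxy API.

```go
package main

import "fmt"

// backend is a hypothetical stand-in for an IPVS real server entry.
type backend struct {
	addr   string
	weight int
}

// pool is an in-memory stand-in for one virtual server's destination list.
type pool struct {
	backends []backend
}

// drain sets the weight of addr to 0 so it stops receiving new connections.
// The entry itself stays, so established connections can keep flowing.
func (p *pool) drain(addr string) {
	for i := range p.backends {
		if p.backends[i].addr == addr {
			p.backends[i].weight = 0
		}
	}
}

// remove deletes addr from the pool; call it once the grace period is over.
func (p *pool) remove(addr string) {
	kept := p.backends[:0]
	for _, b := range p.backends {
		if b.addr != addr {
			kept = append(kept, b)
		}
	}
	p.backends = kept
}

// eligible lists the backends that may still be picked for new connections.
func (p *pool) eligible() []string {
	var out []string
	for _, b := range p.backends {
		if b.weight > 0 {
			out = append(out, b.addr)
		}
	}
	return out
}

func main() {
	p := &pool{backends: []backend{
		{addr: "10.244.1.5:8080", weight: 1},
		{addr: "10.244.2.7:8080", weight: 1},
	}}

	p.drain("10.244.1.5:8080")
	fmt.Println("eligible after drain:", p.eligible())
	fmt.Println("entries after drain:", len(p.backends))

	p.remove("10.244.1.5:8080")
	fmt.Println("entries after remove:", len(p.backends))
}
```

The gap between `drain` and `remove` is the graceful shutdown period; what ends that period (a timer or a connection count reaching zero) is the design question debated in this thread.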

akumria commented Aug 30, 2018

Is anything left here, except for a rebase so the conflict is resolved?

Contributor

Lion-Wei commented Sep 4, 2018

@akumria That depends. If we need this feature urgently, then I say we merge this PR as a temporary solution and send another PR for connection-based termination.

Member

m1093782566 commented Sep 4, 2018

@Lion-Wei

Thank you for all the great effort you have put in, including the docker/libnetwork contribution.

I believe connection-based termination for IPVS can make the implementation simpler and easier to review. What do you think?

Contributor

Lion-Wei commented Sep 4, 2018

@m1093782566 That definitely makes sense. Either way, we'll get there eventually.

Contributor

Lion-Wei commented Sep 5, 2018

/test pull-kubernetes-e2e-gce

Contributor

Lion-Wei commented Sep 5, 2018

/assign @thockin
Please take a look when you get a chance, thanks.

Contributor

Lion-Wei commented Sep 5, 2018

@akumria @jsravn @lbernail @m1093782566, connection-based graceful termination is finished, please take another look.

Contributor

lbernail commented Sep 6, 2018

@Lion-Wei: this looks good. I think we'll test it as soon as it makes it to an alpha release.
Do you think this can make it to the 1.11 branch?

Contributor

Lion-Wei commented Sep 20, 2018

@lbernail I think this should be cherry-picked to 1.11; that's necessary.
But first we need to get this merged. :(

Member

m1093782566 commented Sep 27, 2018

/approve

Contributor

Lion-Wei commented Sep 28, 2018

Thanks, @m1093782566.
@jsravn @lbernail @rramkumar1 Can anyone help me add an lgtm so we can get this merged? 😃

Member

m1093782566 commented Sep 28, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Sep 28, 2018

Contributor

k8s-ci-robot commented Sep 28, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Lion-Wei, m1093782566

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

Lion-Wei commented Sep 28, 2018

/test pull-kubernetes-integration

@k8s-ci-robot k8s-ci-robot merged commit 8ea6b2c into kubernetes:master Sep 28, 2018

18 checks passed

cla/linuxfoundation Lion-Wei authorized
pull-kubernetes-bazel-build Job succeeded.
pull-kubernetes-bazel-test Job succeeded.
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
pull-kubernetes-e2e-gce-100-performance Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-integration Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
pull-kubernetes-typecheck Job succeeded.
pull-kubernetes-verify Job succeeded.
tide In merge pool.

Contributor

zachaller commented Oct 8, 2018

Will this make it into a 1.12.x release? Along with #69267

Contributor

Lion-Wei commented Oct 8, 2018

@zachaller Yeah, I think so.
