Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix ipvs_svc deletion #80942

Merged
merged 1 commit into from Aug 13, 2019

Conversation

@gongguan
Copy link
Contributor

commented Aug 3, 2019

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:
In ipvs mode, when service deleted but rs(pod) exist, related ipvs rule and address bound in ipvs0 will not be deleted.
I think it's unreasonable that kubernetes-service deleted but the service ip is available(ipvs rule not deleted). When you create same service again, new virtual service(totally two ipvs virtual service) will proxy to the same rs, too. And the deleted service ip will exist on both ipvs rule and ipvs0 device.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Fix a bug in the IPVS proxier where virtual servers are not cleaned up even though the corresponding Service object was deleted.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Aug 3, 2019

Hi @gongguan. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@gongguan

This comment has been minimized.

Copy link
Contributor Author

commented Aug 3, 2019

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug and removed needs-kind labels Aug 3, 2019

@gongguan

This comment has been minimized.

Copy link
Contributor Author

commented Aug 3, 2019

/sig network
/area ipvs

@gongguan

This comment has been minimized.

Copy link
Contributor Author

commented Aug 3, 2019

another question:
graceful termination can be deleted too?
As service deleted, can ipvs virtual server be deleted directly, then unbind svc addr in dev ipvs0

@gongguan

This comment has been minimized.

Copy link
Contributor Author

commented Aug 3, 2019

there is a logical error in current code:
graceful termination will never executed

@gongguan

This comment has been minimized.

Copy link
Contributor Author

commented Aug 4, 2019

@andrewsykim

This comment has been minimized.

Copy link
Member

commented Aug 5, 2019

/ok-to-test

@andrewsykim

This comment has been minimized.

Copy link
Member

commented Aug 5, 2019

/priority important-soon

@gongguan

This comment has been minimized.

Copy link
Contributor Author

commented Aug 9, 2019

Can you update the release notes please

Done

@andrewsykim

This comment has been minimized.

Copy link
Member

commented Aug 10, 2019

/approve

Leaving final lgtm to @lbernail. Thanks @gongguan!

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Aug 10, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewsykim, gongguan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@lbernail

This comment has been minimized.

Copy link
Contributor

commented Aug 12, 2019

This seems completely reasonable but I think it will introduce a behavior that is different from iptables. In iptables, removing a service will delete the iptables rules but I don't think it will purge the conntrack which will allow existing connections to terminate gracefully. I'm not sure what the expected behavior is.

I also expected a service deletion to trigger a deletion of the matching endpoint resource which should trigger graceful termination. I'm going to investigate why this is not happening. In any case the current behavior is definitely a bug.

Sorry for the delay I haven't had much time lately

@lbernail

This comment has been minimized.

Copy link
Contributor

commented Aug 12, 2019

OK I confirm that service deletion doesn't trigger a deletion of the real servers which I assumed would have been triggered here: https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/proxier.go#L768-L770

I'm going to debug this with a custom build

@lbernail

This comment has been minimized.

Copy link
Contributor

commented Aug 12, 2019

Also, it would be great to add a test case for this so we never have a regression (this is a pretty bad bug...)

@andrewsykim

This comment has been minimized.

Copy link
Member

commented Aug 12, 2019

I also expected a service deletion to trigger a deletion of the matching endpoint resource which should trigger graceful termination. I

It does, but for TCP it takes 15m for those endpoints to be gracefully terminated and so the virtual server doesn't delete for 15m as well. So I think either we delete the virtual server right away (this PR) or we decrease the graceful termination timeout to a lower value.

@lbernail

This comment has been minimized.

Copy link
Contributor

commented Aug 12, 2019

@andrewsykim in the test I did, gracefultermination was not triggered at all (and weight remains at 1) so I suspect something is wrong in the handler

Anyway, if we agree we can remove services without gracefultermination this is not important

The PR looks good. @gongguan can you add a test for this specific scenario? (by fixing the test in TestCleanLegacyService)

@lbernail

This comment has been minimized.

Copy link
Contributor

commented Aug 12, 2019

@andrewsykim OK I think I know what happens: syncProxyRules is called for service deletion and endpoint deletion, but the reconcile loop only considers active services: https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/proxier.go#L902

So the only way to trigger gracefulTermination for real servers on service deletion would be in cleanLegacyService

@andrewsykim

This comment has been minimized.

Copy link
Member

commented Aug 12, 2019

Thanks for clarifying @lbernail!

Anyway, if we agree we can remove services without gracefultermination this is not important

IMO if a Service is deleted there's no reason to try graceful termination. I will ask some folks in SIG Network though to confirm if this shouldn't be the case.

The PR looks good. @gongguan can you add a test for this specific scenario?

Added a test case for this here #81309, PTAL :)

/lgtm

Thanks @gongguan!

@k8s-ci-robot k8s-ci-robot added the lgtm label Aug 12, 2019

@lbernail

This comment has been minimized.

Copy link
Contributor

commented Aug 12, 2019

Thanks for adding the test!
We'll need to backport this to all supported branches

@lbernail

This comment has been minimized.

Copy link
Contributor

commented Aug 12, 2019

I just remember the context behind the graceful termination behavior: #71894
TL;DR we did not have expirenodestconn before which means when we removed a VS (and therefore RS) on service deletion, established connections were not terminated and their traffic was being blackholed.

Because we now set expirenodestconn, established connections will be terminated as soon as a packet is sent

@andrewsykim

This comment has been minimized.

Copy link
Member

commented Aug 13, 2019

/retest

@gongguan

This comment has been minimized.

Copy link
Contributor Author

commented Aug 13, 2019

Added a test case for this here #81309, PTAL :)

Thanks for @andrewsykim 's test. By the way, what else need to be done for me?

@andrewsykim

This comment has been minimized.

Copy link
Member

commented Aug 13, 2019

Not much from here, PR should be merged by the bot shortly :). Thanks @gongguan!

@k8s-ci-robot k8s-ci-robot merged commit 12a085f into kubernetes:master Aug 13, 2019

23 checks passed

cla/linuxfoundation gongguan authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details

@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Aug 13, 2019

k8s-ci-robot added a commit that referenced this pull request Aug 20, 2019
Merge pull request #81483 from andrewsykim/automated-cherry-pick-of-#…
…80942-origin-release-1.15

Automated cherry pick of #80942: Fix a bug in the IPVS proxier where virtual servers are not
k8s-ci-robot added a commit that referenced this pull request Aug 20, 2019
Merge pull request #81482 from andrewsykim/automated-cherry-pick-of-#…
…80942-origin-release-1.14

Automated cherry pick of #80942: Fix a bug in the IPVS proxier where virtual servers are not
k8s-ci-robot added a commit that referenced this pull request Aug 20, 2019
Merge pull request #81481 from andrewsykim/automated-cherry-pick-of-#…
…80942-origin-release-1.13

Automated cherry pick of #80942: Fix a bug in the IPVS proxier where virtual servers are not
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.