Add Service Load Balancer finalizer support #78262

Merged
merged 4 commits into from Jun 1, 2019

Conversation

@MrHohn
Member

commented May 23, 2019

What type of PR is this?
/kind feature

What this PR does / why we need it:
KEP link: https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/20190423-service-lb-finalizer.md

This PR adds finalizer protection for Service load balancers. It defines an alpha feature gate that controls adding the finalizer, while finalizer removal is always enabled regardless of the gate.
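
To make the add/remove asymmetry concrete, here is a minimal sketch, not the code in this PR. The `service` struct, the `gateEnabled` flag, and the helper names are hypothetical stand-ins for `v1.Service`, the alpha feature gate, and the controller's own helpers. The finalizer name is the one that shows up in the patch quoted later in this thread.

```go
package main

import "fmt"

// Finalizer name as it appears in the patch quoted in this conversation.
const lbCleanupFinalizer = "service.kubernetes.io/load-balancer-cleanup"

// service is a hypothetical, trimmed-down stand-in for v1.Service.
type service struct {
	Finalizers []string
}

// hasFinalizer reports whether the named finalizer is present.
func hasFinalizer(s *service, name string) bool {
	for _, f := range s.Finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// addFinalizerIfGated adds the finalizer only when the (hypothetical)
// gate flag is on. This models the feature-gated "addition" half.
func addFinalizerIfGated(s *service, gateEnabled bool) {
	if !gateEnabled || hasFinalizer(s, lbCleanupFinalizer) {
		return
	}
	s.Finalizers = append(s.Finalizers, lbCleanupFinalizer)
}

// removeFinalizer never consults the gate. This models the "removal is
// always on" half: a finalizer that was added earlier can still be cleared.
func removeFinalizer(s *service) {
	kept := make([]string, 0, len(s.Finalizers))
	for _, f := range s.Finalizers {
		if f != lbCleanupFinalizer {
			kept = append(kept, f)
		}
	}
	s.Finalizers = kept
}

func main() {
	svc := &service{Finalizers: []string{"bar"}}
	addFinalizerIfGated(svc, false)
	fmt.Println("gate off:", svc.Finalizers)
	addFinalizerIfGated(svc, true)
	fmt.Println("gate on:", svc.Finalizers)
	removeFinalizer(svc)
	fmt.Println("after cleanup:", svc.Finalizers)
}
```

One plausible reading of this split is that a Service which picked up the finalizer while the gate was on can still be deleted after the gate is turned off.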

Note that this is largely based on #54569 and #65912. One significant difference is that I haven't removed the cached service layer, given the complexity and risk of hosting both pre-finalizer and with-finalizer control flow logic. Keeping the cache layer makes this easier.

End-to-end tests are being added in #78410.

Which issue(s) this PR fixes:

Fixes #53451. Also ref kubernetes/enhancements#980 and kubernetes/cloud-provider#16.

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Finalizer Protection for Service LoadBalancers is now added as Alpha (disabled by default). This feature ensures the Service resource is not fully deleted until the correlating load balancer resources are deleted.
@MrHohn

Member Author

commented May 23, 2019

/test all

@MrHohn MrHohn force-pushed the MrHohn:svc-finalizer-cleanup2 branch 5 times, most recently from b1deee2 to 8cc9c3b May 24, 2019

@k8s-ci-robot k8s-ci-robot added size/XL and removed size/L labels May 24, 2019

@MrHohn MrHohn force-pushed the MrHohn:svc-finalizer-cleanup2 branch from 8cc9c3b to c195103 May 24, 2019

@MrHohn

Member Author

commented May 24, 2019

So I'm having trouble with unit testing:

--- FAIL: TestSyncLoadBalancerIfNeeded (0.06s)
    --- FAIL: TestSyncLoadBalancerIfNeeded/service_with_finalizer_that_no_longer_wants_LB (0.00s)
        service_controller_test.go:319: Service hasFinalizer=true, want false
    --- FAIL: TestSyncLoadBalancerIfNeeded/service_that_needs_cleanup (0.00s)
        service_controller_test.go:319: Service hasFinalizer=true, want false
--- FAIL: TestPatchFinalizer (0.00s)
    --- FAIL: TestPatchFinalizer/remove_finalizer (0.00s)
        service_controller_test.go:1136: Service hasFinalizer = true, want false
--- FAIL: TestPatchStatus (0.00s)
    --- FAIL: TestPatchStatus/clear_status (0.00s)
        service_controller_test.go:1214: Got status {[{8.8.8.8 }]}, want &LoadBalancerStatus{Ingress:[],}
FAIL

It seems the fake kube client doesn't properly support the delete directive in patch operations, so every test case that requires removing a field fails. I can confirm that the generated patch bytes do contain the directive:

{"metadata":{"$deleteFromPrimitiveList/finalizers":["service.kubernetes.io/load-balancer-cleanup"],"$setElementOrder/finalizers":["bar"],"finalizers":["bar"]}}

My current plan is to fix the fake kube client.
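
For readers unfamiliar with the directive, the sketch below shows what honoring `$deleteFromPrimitiveList/finalizers` means for the patch quoted above. It is a deliberately simplified, stdlib-only illustration and not the fake client's or the strategic merge patch library's implementation: `applyFinalizerPatch` and `contains` are hypothetical helpers, and `$setElementOrder` is ignored.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Key prefix used by the delete directive in the quoted patch.
const deletePrefix = "$deleteFromPrimitiveList/"

func contains(list []string, item string) bool {
	for _, v := range list {
		if v == item {
			return true
		}
	}
	return false
}

// applyFinalizerPatch is a hypothetical helper: it merges the patch's
// "finalizers" list into current, then removes every entry named by the
// delete directive. Ordering directives are not handled.
func applyFinalizerPatch(current []string, patch []byte) ([]string, error) {
	var doc struct {
		Metadata map[string]json.RawMessage `json:"metadata"`
	}
	if err := json.Unmarshal(patch, &doc); err != nil {
		return nil, fmt.Errorf("decoding patch: %w", err)
	}

	result := append([]string(nil), current...)

	// Merge step: add any listed finalizers that are not present yet.
	if raw, ok := doc.Metadata["finalizers"]; ok {
		var add []string
		if err := json.Unmarshal(raw, &add); err != nil {
			return nil, fmt.Errorf("decoding finalizers: %w", err)
		}
		for _, f := range add {
			if !contains(result, f) {
				result = append(result, f)
			}
		}
	}

	// Delete step: drop the entries named by the directive. A client that
	// skips this step leaves the finalizer in place.
	if raw, ok := doc.Metadata[deletePrefix+"finalizers"]; ok {
		var del []string
		if err := json.Unmarshal(raw, &del); err != nil {
			return nil, fmt.Errorf("decoding delete directive: %w", err)
		}
		kept := make([]string, 0, len(result))
		for _, f := range result {
			if !contains(del, f) {
				kept = append(kept, f)
			}
		}
		result = kept
	}
	return result, nil
}

func main() {
	patch := []byte(`{"metadata":{"$deleteFromPrimitiveList/finalizers":["service.kubernetes.io/load-balancer-cleanup"],"$setElementOrder/finalizers":["bar"],"finalizers":["bar"]}}`)
	current := []string{"bar", "service.kubernetes.io/load-balancer-cleanup"}
	got, err := applyFinalizerPatch(current, patch)
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

If the delete step is skipped, the result still contains the cleanup finalizer, which matches the `hasFinalizer=true, want false` failures above.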

@MrHohn MrHohn force-pushed the MrHohn:svc-finalizer-cleanup2 branch from c195103 to 63f0876 May 24, 2019

@thockin
Member

left a comment

Gate is OK

/approve

@k8s-ci-robot

Contributor

commented May 30, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewsykim, MrHohn, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

// syncLoadBalancerIfNeeded ensures that service's status is synced up with loadbalancer
// i.e. creates loadbalancer for service if requested and deletes loadbalancer if the service
// doesn't want a loadbalancer no more. Returns whatever error occurred.
- func (s *ServiceController) syncLoadBalancerIfNeeded(key string, service *v1.Service) error {
+ func (s *ServiceController) syncLoadBalancerIfNeeded(service *v1.Service, key string) (loadBalancerOperation, error) {
// Note: It is safe to just call EnsureLoadBalancer. But, on some clouds that requires a delete & create,

@tedyu

tedyu May 30, 2019

Contributor

It seems EnsureLoadBalancer should be written as ensureLoadBalancer

@MrHohn

MrHohn May 30, 2019

Author Member

Thanks for the input. This is a pretty old comment, and I think it refers to s.balancer.EnsureLoadBalancer(). But yes, we should do a follow-up cleanup of comments that no longer apply.

@andrewsykim

Member

commented May 31, 2019

/priority important-soon

MrHohn and others added some commits May 23, 2019

Add Load Balancer finalizer support
- Always try to remove finalizer upon load balancer cleanup
- Add finalizer prior to load balancer creation (feature gated)
- Cache logic fix-ups
- Event type/message fix-ups
- Use runtime.HandleError() on eaten errors

Co-authored-by: Josh Horwitz <horwitzja@gmail.com>

@MrHohn MrHohn force-pushed the MrHohn:svc-finalizer-cleanup2 branch from 0bcc94a to 64198a4 May 31, 2019

@k8s-ci-robot k8s-ci-robot removed the lgtm label May 31, 2019

@MrHohn

Member Author

commented May 31, 2019

@andrewsykim Sorry, I rebased to fix the merge conflict in kube_features.go. Would be great to have a re-lgtm, thanks!

@andrewsykim

Member

commented May 31, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label May 31, 2019

@MrHohn

Member Author

commented May 31, 2019

/retest

@k8s-ci-robot k8s-ci-robot merged commit bc32307 into kubernetes:master Jun 1, 2019

21 checks passed

cla/linuxfoundation: MrHohn authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-conformance-image-test: Skipped.
pull-kubernetes-cross: Skipped.
pull-kubernetes-dependencies: Job succeeded.
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-csi-serial: Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gce-storage-slow: Skipped.
pull-kubernetes-godeps: Skipped.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-local-e2e: Skipped.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-node-e2e-containerd: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
pull-publishing-bot-validate: Skipped.
tide: In merge pool.

@timoreimann

Contributor

commented Jun 12, 2019

@MrHohn apologies for commenting on this merged PR with a late thought:

Is there a need to consider the edge case where the type is changed from LoadBalancer, the LB resource is about to be cleaned up, and while that process is still ongoing (possibly delayed/retried due to API connectivity issues or other events), the type is being switched back to LoadBalancer again? Would we then cancel the deletion efforts (or delete and recreate in sequence)?

It didn't become immediately clear to me by looking at the change, but my familiarity with the code is limited.

@bowei

Member

commented Jun 12, 2019

While the object is in a finalization state, I would imagine that changing fields in the object should have no effect. In other words, the user would have to wait until the finalizers are done processing and the object has been fully removed before recreating it.

@MrHohn

Member Author

commented Jun 12, 2019

@timoreimann No worries at all and thanks for giving thoughts on this.

For the case you mentioned (type changed from LoadBalancer and then back), there are two possible outcomes, depending on the timeline.

The first outcome is "Delete and Recreate". It happens if the service controller starts processing the type:LoadBalancer->other update before it receives the type:other->LoadBalancer update.

The second outcome is "No-op or Reconciling". It happens if the service controller hasn't yet processed the type:LoadBalancer->other update when the type:other->LoadBalancer update arrives, so it processes both updates together. In this case the service controller simply calls EnsureLoadBalancer() again, which is either a no-op if nothing changed or a reconciliation if there are other valid changes.
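
The two timelines can be sketched as a small simulation. This is only an illustration under a stated assumption: each sync acts on the latest Service state visible when it runs, so updates that arrive between syncs collapse into one. `simulate`, `serviceType`, and the `"ensure"`/`"delete"` labels are hypothetical names, not the controller's actual code.

```go
package main

import "fmt"

type serviceType string

const (
	typeLoadBalancer serviceType = "LoadBalancer"
	typeOther        serviceType = "ClusterIP"
)

// simulate replays a sequence of type updates. syncAfter lists the update
// indices after which the controller gets to run a sync; each sync sees
// only the state left by that update (the latest one at that moment).
func simulate(updates []serviceType, syncAfter []int) []string {
	var ops []string
	for _, i := range syncAfter {
		if i < 0 || i >= len(updates) {
			continue // ignore sync points that do not match an update
		}
		if updates[i] == typeLoadBalancer {
			ops = append(ops, "ensure") // stands in for EnsureLoadBalancer()
		} else {
			ops = append(ops, "delete") // stands in for load balancer deletion
		}
	}
	return ops
}

func main() {
	// LoadBalancer -> other, then other -> LoadBalancer.
	updates := []serviceType{typeOther, typeLoadBalancer}

	// Outcome 1: a sync runs between the two updates.
	fmt.Println(simulate(updates, []int{0, 1})) // prints [delete ensure]

	// Outcome 2: both updates arrive before the controller syncs.
	fmt.Println(simulate(updates, []int{1})) // prints [ensure]
}
```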

@MrHohn

Member Author

commented Jun 12, 2019

While the object is in a finalization state, I would imagine changing the fields in the object should not have any effect.

Note that in this case the behavior is in fact the same as before this finalizer support, because the LB deletion is triggered by a type-change update rather than by deletion of an object that has a finalizer.

But yes, when the object is in a finalization state, updates won't have any effect.
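
The distinction between the two deletion triggers can be sketched as a single decision function. This is an assumption-laden illustration rather than the controller's logic: the `service` struct, its `Deleting` field (standing in for a non-nil deletion timestamp), and `nextAction` are all hypothetical.

```go
package main

import "fmt"

const lbCleanupFinalizer = "service.kubernetes.io/load-balancer-cleanup"

// service is a hypothetical, trimmed-down stand-in for v1.Service.
type service struct {
	Type       string
	Finalizers []string
	Deleting   bool // stands in for a non-nil deletion timestamp
}

func hasLBFinalizer(s *service) bool {
	for _, f := range s.Finalizers {
		if f == lbCleanupFinalizer {
			return true
		}
	}
	return false
}

// nextAction picks what a sync would do. Once the object is being
// finalized, the Type field is no longer consulted, so later edits to it
// have no effect. (Simplification: a deleting object without the
// finalizer just falls through to the type check.)
func nextAction(s *service) string {
	if s.Deleting && hasLBFinalizer(s) {
		return "cleanup-then-remove-finalizer"
	}
	if s.Type != "LoadBalancer" {
		return "delete-load-balancer" // type-change path, same as before finalizers
	}
	return "ensure-load-balancer"
}

func main() {
	finalizing := &service{Type: "LoadBalancer", Finalizers: []string{lbCleanupFinalizer}, Deleting: true}
	fmt.Println(nextAction(finalizing))

	typeChanged := &service{Type: "ClusterIP", Finalizers: []string{lbCleanupFinalizer}}
	fmt.Println(nextAction(typeChanged))
}
```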

@timoreimann

Contributor

commented Jun 12, 2019

@MrHohn @bowei thanks for explaining -- happy to hear we're able to cope with the described scenario just fine. 👍
