Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing Potential Race Condition in EndpointSlice Controller. #85703

Merged

Conversation

@robscott
Copy link
Member

robscott commented Nov 27, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
This adds a new EndpointSlice tracker to keep track of the expected resource versions of EndpointSlices associated with each Service managed by the EndpointSlice controller. This should prevent a potential race where a syncService call could happen with an incomplete view of EndpointSlices if additions or deletions hadn't fully propagated to the cache yet. Additionally, this ensures that external changes to EndpointSlices will be handled by the EndpointSlice controller.

Which issue(s) this PR fixes:
Fixes #85353

Does this PR introduce a user-facing change?:

Fix EndpointSlice controller race condition and ensure that it handles external changes to EndpointSlices.

/sig network
/priority important-soon
/cc @liggitt @bowei

@k8s-ci-robot k8s-ci-robot requested review from bowei and liggitt Nov 27, 2019
@robscott robscott force-pushed the robscott:endpointslice-controller-race-fix branch 2 times, most recently from 725d634 to c57dc16 Nov 27, 2019
prevEndpointSlice := obj.(*discovery.EndpointSlice)
endpointSlice := obj.(*discovery.EndpointSlice)
if managedByChanged(prevEndpointSlice, endpointSlice) || (managedByController(endpointSlice) && !c.endpointSliceTracker.UpToDate(endpointSlice)) {
c.queueServiceForEndpointSlice(endpointSlice)

This comment has been minimized.

Copy link
@liggitt

liggitt Dec 16, 2019

Member

if either the old or new endpoint slices were managed by us, and the service it references changed, don't we need to queue one (or maybe both) of the referenced services? maybe the UpToDate check covers this case... it's a little dense to think through

This comment has been minimized.

Copy link
@robscott

robscott Dec 16, 2019

Author Member

That's not a use case I'd planned for. The controller itself will never change the Service an EndpointSlice belongs to. Of course this could happen if someone were intentionally messing with EndpointSlices, but they could also break other things as well. With that said, the UpToDate check would fail if the EndpointSlice service had been changed since it would not expect that EndpointSlice name or resource version for the given Service.

@robscott

This comment has been minimized.

Copy link
Member Author

robscott commented Dec 18, 2019

For better visibility:
/assign @bowei @liggitt


// queueServiceForEndpointSlice attempts to queue the corresponding Service for
// the provided EndpointSlice.
func (c *Controller) queueServiceForEndpointSlice(endpointSlice *discovery.EndpointSlice) {

This comment has been minimized.

Copy link
@bowei

bowei Dec 24, 2019

Member

...ForEndpointSlice seems redundant?

This comment has been minimized.

Copy link
@robscott

robscott Dec 24, 2019

Author Member

Yeah, I was trying to differentiate between what we're already doing for Pods where all related Services get queued up when a Pod changes. Service is potentially redundant here as the only queue referenced in this controller is the Service queue. I just tend to favor verbose function names so there's no question what is happening.

This comment has been minimized.

Copy link
@robscott

robscott Dec 26, 2019

Author Member

I haven't really thought of a better name here. Open to alternatives. I think the ForEndpointSlice provides useful context/differentiation here, but I can still remove that if you'd prefer.

pkg/controller/endpointslice/endpointslice_tracker.go Outdated Show resolved Hide resolved
@robscott robscott force-pushed the robscott:endpointslice-controller-race-fix branch 2 times, most recently from 3e7615e to 82c0611 Dec 24, 2019
This adds a new EndpointSlice tracker to keep track of the expected resource versions of EndpointSlices associated with each Service managed by the EndpointSlice controller. This should prevent a potential race where a syncService call could happen with an incomplete view of EndpointSlices if additions or deletions hadn't fully propagated to the cache yet. Additionally, this ensures that external changes to EndpointSlices will be handled by the EndpointSlice controller.
@robscott robscott force-pushed the robscott:endpointslice-controller-race-fix branch from 82c0611 to c75787b Dec 26, 2019
@bowei

This comment has been minimized.

Copy link
Member

bowei commented Dec 26, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Dec 26, 2019
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Dec 26, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bowei, robscott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@robscott

This comment has been minimized.

Copy link
Member Author

robscott commented Dec 26, 2019

/retest

@robscott

This comment has been minimized.

Copy link
Member Author

robscott commented Dec 26, 2019

/retest

1 similar comment
@robscott

This comment has been minimized.

Copy link
Member Author

robscott commented Dec 26, 2019

/retest

@k8s-ci-robot k8s-ci-robot merged commit d4ba10e into kubernetes:master Dec 27, 2019
14 of 15 checks passed
14 of 15 checks passed
tide Not mergeable. Retesting: pull-kubernetes-kubemark-e2e-gce-big
Details
cla/linuxfoundation robscott authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-kind Job succeeded.
Details
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Dec 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.