Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Establishing Controller to avoid race between Established condition and CRs actually served #63068

Merged
merged 1 commit into from May 29, 2018

Conversation

@xmudrii
Copy link
Member

commented Apr 24, 2018

What this PR does / why we need it: If you create CR shorty after CRD, it can happen that it returns error that CRD doesn't exists, even if it exists and is Established. This implements the Establishing Controller, is used to Establish CRD once we're sure it's ready and CRs are served. For more details, check issue #62725.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #62725

Add Establishing Controller on CRDs to avoid race between Established condition and CRs actually served. In HA setups, the Established condition is delayed by 5 seconds.

/sig api-machinery
/area custom-resources
/cc sttts deads2k

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Apr 24, 2018

@xmudrii: Adding do-not-merge/release-note-label-needed because the release note process has not been followed.

One of the following labels is required "release-note", "release-note-action-required", or "release-note-none".
Please see: https://git.k8s.io/community/contributors/devel/pull-requests.md#write-release-notes-if-needed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

// So, this is called when CRD is updated. That includes creation, but termination as well.
// Probably some logic like in the naming controller, but only to fire state change
// if CRD is creating for the first time.
if !apiextensions.IsCRDConditionTrue(newCRD, apiextensions.Established) {

This comment has been minimized.

Copy link
@sttts

sttts Apr 24, 2018

Contributor

&& IsCRDConditationTrue(newCRD, Installing)

apiextensions.SetCRDCondition(newCRD, establishedCondition)

// This is used to updated the server. The above `SetCRDCondition` only sets in storage.
//crdClient, err := internalclientset.NewForConfig(s.GenericAPIServer.LoopbackClientConfig)

This comment has been minimized.

Copy link
@sttts

sttts Apr 24, 2018

Contributor

this client should be created once outside of the crd handler and then passed to NewCustomResourceDefinitionHandler.

//}
}
// Update server with new CRD condition.
_, err := r.client.Apiextensions().CustomResourceDefinitions().UpdateStatus(newCRD)

This comment has been minimized.

Copy link
@sttts

sttts Apr 25, 2018

Contributor

this needs a retry loop for conflict errors (if there are multiple servers changing the CRD, these errors are normal), actually around the whole if clause.


// Is something like this required? As we have already updated the local copy
// I think nothing is needed at this point.
//newObj = crd

This comment has been minimized.

Copy link
@sttts

sttts Apr 25, 2018

Contributor

I don't think it matters much. If there is a change actually done, we will get another update (i.e. call here).

@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Apr 25, 2018

@@ -119,6 +122,7 @@ func NewCustomResourceDefinitionHandler(
delegate: delegate,
restOptionsGetter: restOptionsGetter,
admission: admission,
ec: establishingController,

This comment has been minimized.

Copy link
@sttts

sttts Apr 26, 2018

Contributor

gofmt

@@ -290,6 +295,8 @@ func (r *crdHandler) updateCustomResourceDefinition(oldObj, newObj interface{})
r.customStorageLock.Lock()
defer r.customStorageLock.Unlock()

r.ec.queue.AddAfter(newCRD.Name, 1 * time.Millisecond)

This comment has been minimized.

Copy link
@sttts

sttts Apr 26, 2018

Contributor

is there a difference to a zero delay?

This comment has been minimized.

Copy link
@sttts

sttts Apr 26, 2018

Contributor

and here we would need the case distinction: if apiserver-count > 1, then use a higher number. And we need a good comment why this is necessary (about the race).

This comment has been minimized.

Copy link
@xmudrii

xmudrii Apr 27, 2018

Author Member

This is still not addressed (apiserver-count part), but I've updated to use zero delay. No difference at all, it works and is more intuitive.

glog.Infof("Starting NamingConditionController")
defer glog.Infof("Shutting down NamingConditionController")

// only start one worker thread since its a slow moving API and the naming conflict resolution bits aren't thread-safe

This comment has been minimized.

Copy link
@sttts

sttts Apr 26, 2018

Contributor

comment does not match

defer utilruntime.HandleCrash()
defer ec.queue.ShutDown()

glog.Infof("Starting NamingConditionController")

This comment has been minimized.

Copy link
@sttts

sttts Apr 26, 2018

Contributor

update log string

}

// EstablishCRD turns CRD into established and returns
// is it successful or error

This comment has been minimized.

Copy link
@sttts

sttts Apr 26, 2018

Contributor

grammar

defer ec.queue.Done(key)
crd, _ := ec.client.Apiextensions().CustomResourceDefinitions().Get(key.(string), v1.GetOptions{})

// Poll because if multiple servers try to change CRD (e.g. in HA) this can fail.

This comment has been minimized.

Copy link
@sttts

sttts Apr 26, 2018

Contributor

fail with conflict error


// Poll because if multiple servers try to change CRD (e.g. in HA) this can fail.
// That failure is normal, and so, we're trying to set it to Established again.
err := wait.Poll(500 * time.Millisecond, 3 * time.Second, func() (bool, error) {

This comment has been minimized.

Copy link
@sttts

sttts Apr 26, 2018

Contributor

what do other controllers do in the error case? re-queue the key?

This comment has been minimized.

Copy link
@xmudrii

xmudrii Apr 27, 2018

Author Member

In case of naming controller, yes.

var err error
_, err = ec.client.Apiextensions().CustomResourceDefinitions().UpdateStatus(crd)
if err != nil {
return false, err

This comment has been minimized.

Copy link
@sttts

sttts Apr 26, 2018

Contributor

this is not enough: imagine another server updates the crd meanwhile. Then the CRD's meta.ResourceVersion of your instance here will have an old value. The apiserver will reply with a conflict error for that that reason (failed optimistic concurrency). So you have to get the latest version of the CRD and try again with your update.

In other words: the current code (the wait.Poll) will always use the old crd instance. If the update fails with a conflict, the current code will fail again and again (until 3 seconds are up).

If you just stop the whole updating attemp instead, the update of the CRD will arrive at the informer eventually and your controller will get another chance to update. So I guess we don't even have to requeue. All is automatic. Compare the naming controller if that uses this logic as well.

This comment has been minimized.

Copy link
@xmudrii

xmudrii Apr 27, 2018

Author Member

That totally makes sense. Thank for pointing this out.

The requeue is not needed at this point, but I've added it later. I've updated this code to get new CRD in every Poll's iteration/try. That should be better I think.

The questionable part is: if establishCRD() fails, should I requeue the key and after how long? In this review of EstablishingController it requeues it after the failure in the proccessNextWorkItem function.

@lavalamp

This comment has been minimized.

Copy link
Member

commented Apr 26, 2018

/assign @yliaog

@xmudrii xmudrii force-pushed the xmudrii:fix-62725 branch 2 times, most recently from 5b4f7d5 to f48e679 Apr 27, 2018

@xmudrii xmudrii changed the title [WIP] Implement Installing status for CRD to fix #62725 Implement Installing status for CRD to fix #62725 Apr 27, 2018

"k8s.io/apimachinery/pkg/util/wait"
"k8s.io/client-go/tools/cache"
"k8s.io/client-go/util/workqueue"

This comment has been minimized.

Copy link
@sttts

sttts May 29, 2018

Contributor

nit: move "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions" also into this block

This comment has been minimized.

Copy link
@xmudrii

xmudrii May 29, 2018

Author Member

done

@@ -191,7 +192,7 @@ func (c *NamingConditionController) calculateNamesAndConditions(in *apiextension
namesAcceptedCondition.Message = "no conflicts found"
}

// set EstablishedCondition to true if all names are accepted. Never set it back to false.
// set EstablishedCondition initially to false, then set it to true in establishing controller.

This comment has been minimized.

Copy link
@sttts

sttts May 29, 2018

Contributor

maybe add the why: The establishing controller will see the NamesAccepted condition when it arrives through the shared informer. At that time the API endpoint handler will serve the endpoint, avoiding a race which we had if we set Established to true here.

This comment has been minimized.

Copy link
@xmudrii

xmudrii May 29, 2018

Author Member

done

@xmudrii xmudrii force-pushed the xmudrii:fix-62725 branch from 6c65b2f to e5b415d May 29, 2018

@xmudrii

This comment has been minimized.

Copy link
Member Author

commented May 29, 2018

Addressed the latest code review

@sttts

This comment has been minimized.

Copy link
Contributor

commented May 29, 2018

/lgtm
/approve

apiextensions-apiserver: add establishing controller to avoid race be…
…tween established and CRs actually served

@xmudrii xmudrii force-pushed the xmudrii:fix-62725 branch from e5b415d to 2bf66c3 May 29, 2018

@k8s-ci-robot k8s-ci-robot removed the lgtm label May 29, 2018

@xmudrii

This comment has been minimized.

Copy link
Member Author

commented May 29, 2018

Updated go-fmt. Need relabeling again

@sttts

This comment has been minimized.

Copy link
Contributor

commented May 29, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label May 29, 2018

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented May 29, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sttts, xmudrii

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

commented May 29, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@xmudrii

This comment has been minimized.

Copy link
Member Author

commented May 29, 2018

/test pull-kubernetes-e2e-kops-aws

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

commented May 29, 2018

Automatic merge from submit-queue (batch tested with PRs 64258, 63068). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 8ad67d3 into kubernetes:master May 29, 2018

15 of 18 checks passed

Submit Queue Required Github CI test is not green: pull-kubernetes-e2e-kops-aws
Details
pull-kubernetes-e2e-kops-aws Job triggered.
Details
pull-kubernetes-kubemark-e2e-gce-big Job triggered.
Details
cla/linuxfoundation xmudrii authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details

@xmudrii xmudrii deleted the xmudrii:fix-62725 branch May 29, 2018

crdHandler := NewCustomResourceDefinitionHandler(
versionDiscoveryHandler,
groupDiscoveryHandler,
s.Informers.Apiextensions().InternalVersion().CustomResourceDefinitions(),
delegateHandler,
c.ExtraConfig.CRDRESTOptionsGetter,
c.GenericConfig.AdmissionControl,
establishingController,
c.ExtraConfig.MasterCount,

This comment has been minimized.

Copy link
@liggitt

liggitt Jun 11, 2018

Member

this only applies when the master-count endpoint reconciler is used. that reconciler is no longer the default and is deprecated in 1.11 due to its inability to remove dead apiservers (#63383)

This comment has been minimized.

Copy link
@sttts

sttts Jun 11, 2018

Contributor

In that case in HA we are not much better than before, but certainly not worse. We can switch to a default timeout by default. Longterm we need some checkpointing mechanism to really solve this.

With the new lease type, is there a similarly simple way to detect HA?

This comment has been minimized.

Copy link
@liggitt

liggitt Jun 11, 2018

Member

With the new lease type, is there a similarly simple way to detect HA?

@rphillips ^

@xmudrii

This comment has been minimized.

Copy link
Member Author

commented Jun 11, 2018

@lavalamp

This comment has been minimized.

Copy link
Member

commented Jun 11, 2018

We need to solve this problem for a number of items, I am not sure I wanted a specific solution for just this problem, especially not if it uses the master count mechanism. I wasn't able to read this and it looks like it merged before @yliaog had a chance to review, also. @sttts for this kind of change I'd like to have either deads2k or myself sign off--adding coordination mechanisms between apiservers is kind of a big deal.

@xmudrii

This comment has been minimized.

Copy link
Member Author

commented Jun 11, 2018

@lavalamp I would love to help with this if I can. I'm absolutely aware this is not going to be an easy task, and that there's a lot of work behind this, but I would definitely love to help around.

Also, if I can hotfix this for 1.11, I'm willing to do that as soon as possible. What comes to my mind, as @sttts mentioned above, having some predefined timeout could work. The only negative downside here I see is that non-HA users can be annoyed, but there should be no other downsides.

Let's say start with 2-3 seconds at max, or even 5, and let's see how it is going to work. Not a big delay, will not cause any downsides, beside maybe annoyance to some users. But could be good as a temporary hotfix.

What I see as a possible reason for focusing on CRDs is that mostly CRDs are affected by this race. I did some research, and maybe you can find details in #57042 helpful.

I had reproduced some of the stuff mentioned in that issue, such as <invalid> age for pods as well, but we never managed to connect this race with the <invalid> age at all, so I would leave the age <invalid> problem a side for now.

Issue #63656 can be relevant to this as well.

This definitely needs a better fix, but we hoped this is going to help at least CRDs, as we think they're mostly affected by this.

If I can help with anything, let me know. Maybe it would be best to discuss this on the SIG call, but it's next week and I'm not sure is it going to be late, if there something to be fixed before 1.11 release.

@sttts

This comment has been minimized.

Copy link
Contributor

commented Jun 12, 2018

@lavalamp the actual change here was not the coordination mechanism, but a race that even showed up in single master setup. The idea with the seconds was actually proposed by @deads2k. I agree that this deserves a larger discussion, and duct-tape is not the right solution.

I would like to see this coordination topic somewhere in our roadmap, or even block further dynamic configuration mechanism until it is solved (audit is just around the corner; componentconfigs become more and more a thing). /cc @kubernetes/sig-architecture-misc-use-only-as-a-last-resort

@lavalamp

This comment has been minimized.

Copy link
Member

commented Jun 12, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.