apiextensions apiserver: update storage version for custom resources #96403
Force-pushed from f77616e to b9a60cd.
```go
		return nil
	}
}
sv.OwnerReferences = append(sv.OwnerReferences, ref)
```
Append an owner reference so stale StorageVersions are deleted when the CRD is deleted. This does not guarantee StorageVersions will be updated when the CRD gets updated.
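The owner-reference mechanism can be sketched as below. This is a hedged, self-contained illustration: `OwnerReference` and `StorageVersion` are simplified stand-ins for `metav1.OwnerReference` and the internal StorageVersion object, and `ensureCRDOwnerRef` is a hypothetical helper, not the PR's exact code.

```go
package main

import "fmt"

// OwnerReference is a simplified stand-in for metav1.OwnerReference,
// carrying just the fields relevant to garbage collection here.
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
}

// StorageVersion is a simplified stand-in for the internal
// StorageVersion API object.
type StorageVersion struct {
	Name            string
	OwnerReferences []OwnerReference
}

// ensureCRDOwnerRef appends an owner reference pointing at the CRD if one
// with the same UID is not already present, so that deleting the CRD
// garbage-collects the StorageVersion object.
func ensureCRDOwnerRef(sv *StorageVersion, crdName, crdUID string) {
	for _, ref := range sv.OwnerReferences {
		if ref.UID == crdUID {
			return // already owned by this CRD
		}
	}
	sv.OwnerReferences = append(sv.OwnerReferences, OwnerReference{
		APIVersion: "apiextensions.k8s.io/v1",
		Kind:       "CustomResourceDefinition",
		Name:       crdName,
		UID:        crdUID,
	})
}

func main() {
	sv := &StorageVersion{Name: "foos.stable.example.com"}
	ensureCRDOwnerRef(sv, "foos.stable.example.com", "uid-1")
	ensureCRDOwnerRef(sv, "foos.stable.example.com", "uid-1") // idempotent
	fmt.Println(len(sv.OwnerReferences)) // prints 1
}
```

Note that this only ties the StorageVersion's lifetime to the CRD's deletion; as the comment above says, it does nothing on CRD updates.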
```go
// This means stale records can still exist in the cluster if an apiserver
// hasn't got any request. All apiservers have to at least serve one
// request (can be a GET) for a CR's storage version record to be
// up-to-date, and to trigger storage migration.
```
An alternative to the owner-reference approach is to let crdHandler garbage-collect the StorageVersion whenever we tear down old storage. That approach is more responsive, but it's more expensive (apiextensions-apiserver replicas conflict with each other in an HA cluster) and only helps with some rare corner cases.
@caesarxuchao Specifically, in an HA cluster, after a CRD gets an encoding version update:
1. If all three servers have served requests with the new storage, storage migration will be triggered under both approaches.
2. If only one or two of the three servers have served requests with the new storage, storage migration won't happen under the owner-reference approach, but will under the alternative.
3. If none of the three servers has served requests with the new storage, storage migration won't happen under the owner-reference approach; under the alternative it happens only by chance:
   a. if the storage migrator observed the old StorageVersion object getting shrunk to one or two server records, it will trigger the migration;
   b. if the watch events get compressed (e.g. informer reconnection) and the storage migrator only saw a DELETE event, it won't trigger the migration.

Case 3 (updating the schema without even a read after the update) should be extremely rare. I don't expect case 2 to happen, assuming a sufficient number (>3) of requests have been made to the servers and the load balancer works reasonably.

That said, we can let the storage migrator send read requests periodically to eliminate case 3 and increase the chance of landing in case 1. Also, the alternative can be a future improvement if one region being starved is a concern.
Having this logic in the handler chain feels misplaced. Why not just register event handlers for CRDs instead, i.e. put it into pkg/controllers/storageversion?
Also above we have event handlers: https://github.com/kubernetes/kubernetes/pull/96403/files#diff-01e7a2059fe62e95ec32e2589c01b7218100de327647c641c334d38f2c5aa06eL206.
Not sure what pkg/controllers/storageversion is.

Let me think about using the event handlers to create/update storage versions. I don't want to use the event handlers to garbage-collect storage versions, for the reason I mentioned in: https://github.com/kubernetes/kubernetes/pull/96403/files#r520386107.
I agree using the event handlers to create/update storage versions works better, and it solves https://github.com/kubernetes/kubernetes/pull/96403/files#r520403155. Will update the PR.
Force-pushed from b9a60cd to e10ee04.
```go
utilfeature.DefaultFeatureGate.Enabled(genericfeatures.APIServerIdentity) {
	kubeclientset, err := kubernetes.NewForConfig(s.GenericAPIServer.LoopbackClientConfig)
	if err != nil {
		return nil, fmt.Errorf("failed to create kubernetes clientset: %v", err)
```
looks like the bots picked me from /test
should Kubernetes be uppercase here, or can it be dropped, similar to line 173: `return nil, fmt.Errorf("failed to create clientset: %v", err)`?
Thanks for reviewing! `kubernetes` is the package name: `k8s.io/client-go/kubernetes.NewForConfig`. I have it to distinguish from L173, which is the apiextensions clientset: `k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset.NewForConfig`. Changed the error to `failed to create clientset for storage versions` to make it less confusing.
```diff
@@ -0,0 +1,2 @@
+approvers:
+- roycaihw
```
Potentially the SIG label should be here: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/apimachinery/OWNERS#L25-L26. Not sure if more approvers should be added here too; ideally, yes.
Added the label and @caesarxuchao as an approver. Sadly I couldn't find a sig-api-machinery-approvers alias.
```go
// Send a request to make the server create handler and update storage version
_, err = dynamicClient.Resource(gvr).Namespace("default").List(context.TODO(), metav1.ListOptions{})
if err != nil {
	t.Fatalf("unexpected error when listing foos: %v", err)
```
replace "foos" with gvr.Resource or just "resources"?
IIRC, the error here should already include what failed to be listed.
```go
}

var storageVersion apiserverinternalv1alpha1.ServerStorageVersion
if err := wait.PollImmediate(100*time.Millisecond, 10*time.Second, func() (bool, error) {
```
do you see a case where 10 seconds might not be enough, i.e. a risk of introducing a flake?
Good point. Actually we don't need the wait; I was doing something else and forgot about it. Removed.
```go
	t.Fatalf("failed to get storage version for custom resources: %v", err)
}
if !strings.HasPrefix(storageVersion.APIServerID, "kube-apiserver-") {
	t.Fatalf("apiserver ID doesn't contain kube-apiserver- prefix, has: %v", apiserverID)
```
I think apiserverID is "" at this point.
It's populated by the apiserver-identity feature:
```go
id = "kube-apiserver-" + uuid.New().String()
```
Force-pushed from e10ee04 to 00b04cc.
/triage accepted
Force-pushed from 00b04cc to 820227e.
```diff
@@ -508,6 +520,9 @@ func (r *crdHandler) updateCustomResourceDefinition(oldObj, newObj interface{})
 	klog.V(4).Infof("Updating customresourcedefinition %s", newCRD.Name)
+	r.removeStorage_locked(newCRD.UID)
+	if err := r.updateStorageVersionFor(newCRD); err != nil {
```
I'm a little worried about having this inside the lock, as it calls out via a client (and potentially blocks). Same above.
discussed offline. Decoupled the update logic and the lock
I was hoping for a change more like this, where we intercept every write operation:

```diff
diff --git a/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/customresource_handler.go b/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/customresource_handler.go
index b317e30ebd2..263b4524154 100644
--- a/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/customresource_handler.go
+++ b/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/customresource_handler.go
@@ -402,16 +402,31 @@ func (r *crdHandler) serveResource(w http.ResponseWriter, req *http.Request, req
 		responsewriters.ErrorNegotiated(err, Codecs, schema.GroupVersion{Group: requestInfo.APIGroup, Version: requestInfo.APIVersion}, w, req)
 		return nil
 	}
+	if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+		return err
+	}
 	return handlers.CreateResource(storage, requestScope, r.admission)
 case "update":
+	if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+		return err
+	}
 	return handlers.UpdateResource(storage, requestScope, r.admission)
 case "patch":
+	if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+		return err
+	}
 	return handlers.PatchResource(storage, requestScope, r.admission, supportedTypes)
 case "delete":
 	allowsOptions := true
+	if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+		return err
+	}
 	return handlers.DeleteResource(storage, allowsOptions, requestScope, r.admission)
 case "deletecollection":
 	checkBody := true
+	if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+		return err
+	}
 	return handlers.DeleteCollection(storage, checkBody, requestScope, r.admission)
 default:
 	responsewriters.ErrorNegotiated(
@@ -430,8 +445,14 @@ func (r *crdHandler) serveStatus(w http.ResponseWriter, req *http.Request, reque
 case "get":
 	return handlers.GetResource(storage, requestScope)
 case "update":
+	if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+		return err
+	}
 	return handlers.UpdateResource(storage, requestScope, r.admission)
 case "patch":
+	if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+		return err
+	}
 	return handlers.PatchResource(storage, requestScope, r.admission, supportedTypes)
 default:
 	responsewriters.ErrorNegotiated(
@@ -450,8 +471,14 @@ func (r *crdHandler) serveScale(w http.ResponseWriter, req *http.Request, reques
 case "get":
 	return handlers.GetResource(storage, requestScope)
 case "update":
+	if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+		return err
+	}
 	return handlers.UpdateResource(storage, requestScope, r.admission)
 case "patch":
+	if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+		return err
+	}
 	return handlers.PatchResource(storage, requestScope, r.admission, supportedTypes)
 default:
 	responsewriters.ErrorNegotiated(
```

By confining the intercepting logic to a single method and leaving the rest of the custom resource handling logic alone, we can:
As I recall the KEP, our goal for CRs is that we never store a CR with an encoding version that doesn't match what is stored in the API. This is slightly more complicated than for built-in resources, since the encoding version can change without a restart. However, I think we could simplify slightly with a flow like:

I don't know exactly where the code change would be offhand, but I don't know that we need asynchronous writing in this case. The first few mutating requests after an encoding version change would have higher than normal latency, but there are not that many of them. As I recall, there's a controller-y aspect to the writing for the built-in resources. I think a similar thing could be built using a cache that checks each request to see if it matches, and does the synchronous write when it doesn't.
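The cache idea above could be sketched like this (illustrative names only; in the real flow the "write" would be a blocking StorageVersion update via the loopback client rather than a counter bump):

```go
package main

import (
	"fmt"
	"sync"
)

// storageVersionCache remembers the encoding version last published for
// each CRD, so the first mutating request after a change performs the
// synchronous StorageVersion write and steady-state requests pay nothing.
// The type and field names here are assumptions, not the PR's code.
type storageVersionCache struct {
	mu        sync.Mutex
	published map[string]string // CRD UID -> published storage version
	writes    int               // counts simulated API writes
}

// ensurePublished does the synchronous write only when the desired
// version differs from what was last published for this CRD.
func (c *storageVersionCache) ensurePublished(crdUID, desired string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.published[crdUID] == desired {
		return // already consistent; no write needed
	}
	// In the real flow this would block on updating the StorageVersion
	// object before letting the request proceed.
	c.writes++
	c.published[crdUID] = desired
}

func main() {
	c := &storageVersionCache{published: map[string]string{}}
	c.ensurePublished("uid-1", "v1") // first request after change: writes
	c.ensurePublished("uid-1", "v1") // steady state: no write
	c.ensurePublished("uid-1", "v2") // encoding version changed: writes
	fmt.Println(c.writes) // prints 2
}
```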
@deads2k There could be some race conditions. Imagine:

The asynchronous channel makes sure the updates happen in the right order, and later updates begin only after the earlier ones' graceful teardown finishes.
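The ordering argument can be demonstrated with a minimal sketch (illustrative types, not the PR's code): a single worker draining one channel processes updates strictly in the order they were enqueued.

```go
package main

import "fmt"

// svUpdate represents a pending StorageVersion update triggered by a CRD
// change. This is a stripped-down illustration of the ordering property
// described above.
type svUpdate struct {
	crdName string
	version string
}

// processInOrder drains updates through a channel consumed by a single
// worker goroutine: a later update cannot begin until the earlier one
// (including any graceful teardown it performs) has finished.
func processInOrder(updates []svUpdate) []string {
	ch := make(chan svUpdate, len(updates))
	for _, u := range updates {
		ch <- u
	}
	close(ch)

	done := make(chan []string)
	go func() {
		var order []string
		for u := range ch {
			// Real code would tear down the old storage here before
			// publishing the new storage version.
			order = append(order, u.version)
		}
		done <- order
	}()
	return <-done
}

func main() {
	got := processInOrder([]svUpdate{
		{"foos.example.com", "v1"},
		{"foos.example.com", "v2"},
	})
	fmt.Println(got) // prints [v1 v2]
}
```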
@liggitt Thanks. That's doable. I will update the PR.
When a storage version update is aborted because the CRD was deleted, we don't need to reject in-flight CR requests. We can safely allow these requests to proceed, because the CRD finalizer guarantees no write requests can succeed after a CRD is deleted: CREATE requests get 405 Not Allowed, other requests get 404 Not Found.

In theory, "CRD deletion and the CRD finalizer tearing down CRs" could happen between 1. crdHandler seeing the CRD with terminating=false and 2. crdHandler serving the CR create request. This commit makes crdHandler read the latest CRD from the cache (shared with the CRD finalizer) right before serving the create request, to make the gap as narrow as possible; the chance of the race stays low.
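The narrowed race window can be illustrated with a hypothetical helper (`checkCreateAllowed` and the simplified `crd` type are stand-ins; the real handler consults the shared informer cache and returns proper HTTP status codes):

```go
package main

import "fmt"

// crd is a stand-in holding only the field the check needs.
type crd struct {
	name        string
	terminating bool
}

// checkCreateAllowed re-reads the CRD via a cache lookup function
// (standing in for the shared informer cache) immediately before serving
// a CR create, so the window between "saw terminating=false" and "served
// the request" is as small as possible.
func checkCreateAllowed(lookup func(name string) (*crd, bool), name string) error {
	c, ok := lookup(name)
	if !ok {
		// CRD already gone: the real handler returns 404 Not Found.
		return fmt.Errorf("the server could not find the requested resource")
	}
	if c.terminating {
		// CRD being deleted: the real handler returns 405 Not Allowed.
		return fmt.Errorf("create not allowed while custom resource definition is terminating")
	}
	return nil
}

func main() {
	cache := map[string]*crd{
		"foos.example.com": {name: "foos.example.com", terminating: true},
	}
	lookup := func(name string) (*crd, bool) { c, ok := cache[name]; return c, ok }
	fmt.Println(checkCreateAllowed(lookup, "foos.example.com"))
}
```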
@roycaihw: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Are you sure? Updates have to be allowed in order to remove finalizers on CR instances blocking cleanup, right? By "after a CRD is deleted", do you mean "actually deleted" or "deletionTimestamp persisted on the CRD"?
The cleanup goes directly to the storage, so it won't be blocked by the HTTP handler.

I mean actually deleted. That is when crdHandler gets a DELETE watch event and starts tearing down the storage. The CRD finalizer controller has already done its job at that point, so we can unblock write requests knowing they won't succeed.
I may be wrong; let me double-check. But even if CR cleanup needs to go through the HTTP handler, we won't deadlock. The order would be:
@roycaihw: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-contributor-experience at kubernetes/community.

Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-contributor-experience at kubernetes/community.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/close
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Implements the "CRDs" section of the KEP.
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/sig api-machinery
/assign @sttts @caesarxuchao