apiextensions apiserver: update storage version for custom resources #96403

Closed
wants to merge 10 commits

Conversation

@roycaihw (Member) commented Nov 10, 2020

What type of PR is this?
/kind feature

What this PR does / why we need it:
Implements the "CRDs" section of the KEP.

Does this PR introduce a user-facing change?:

Add a feature in the API server to update the storage version for custom resources. To use this feature, the internal.apiserver.k8s.io/v1alpha1 API must be enabled and both the StorageVersionAPI and APIServerIdentity feature gates must be turned on.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190802-dynamic-coordinated-storage-version.md

/sig api-machinery
/assign @sttts @caesarxuchao

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Nov 10, 2020
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 10, 2020
@k8s-ci-robot k8s-ci-robot added area/apiserver area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Nov 10, 2020
return nil
}
}
sv.OwnerReferences = append(sv.OwnerReferences, ref)
Member Author (roycaihw):

Append an owner reference so that stale StorageVersions get garbage collected when the CRD is deleted. This does not guarantee StorageVersions will be updated when the CRD gets updated.
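
For readers skimming the thread, here is a minimal sketch of what appending such an owner reference might look like; the function name and wiring are illustrative, not the PR's actual code:

```go
package sketch

import (
	apiserverinternalv1alpha1 "k8s.io/api/apiserverinternal/v1alpha1"
	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// appendCRDOwnerRef marks the CRD as an owner of its StorageVersion object, so
// that deleting the CRD lets the garbage collector remove the stale record.
// Illustrative sketch only.
func appendCRDOwnerRef(sv *apiserverinternalv1alpha1.StorageVersion, crd *apiextensionsv1.CustomResourceDefinition) {
	ref := metav1.OwnerReference{
		APIVersion: apiextensionsv1.SchemeGroupVersion.String(),
		Kind:       "CustomResourceDefinition",
		Name:       crd.Name,
		UID:        crd.UID,
	}
	// Skip the append if the reference is already present.
	for _, r := range sv.OwnerReferences {
		if r.UID == ref.UID {
			return
		}
	}
	sv.OwnerReferences = append(sv.OwnerReferences, ref)
}
```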

// This means stale records can still exist in the cluster if an apiserver
// hasn't got any request. All apiservers have to at least serve one
// request (can be a GET) for a CR's storage version record to be
// up-to-date, and to trigger storage migration.
Member Author (roycaihw):

An alternative to the owner-reference approach is to let crdHandler garbage collect the StorageVersion whenever we tear down old storage. That approach is more responsive, but it's more expensive (apiextensions-apiserver replicas conflict with each other in an HA cluster) and only helps with a few rare corner cases.

@roycaihw (Member Author), Nov 10, 2020:

@caesarxuchao Specifically, in an HA cluster, after a CRD gets an encoding version update:

  1. if all three servers have served requests with the new storage, storage migration will be triggered under both approaches.
  2. if only one or two of the three servers have served requests with the new storage, storage migration won't happen under the owner-reference approach, but it will under the alternative.
  3. if none of the three servers has served requests with the new storage, storage migration won't happen under the owner-reference approach; under the alternative it may happen, by chance:
    a. if the storage migrator observes the old StorageVersion object shrinking to one or two server records, it will trigger the migration
    b. if the watch events get compressed (e.g. informer reconnection) and the storage migrator only sees a DELETE event, it won't trigger the migration

Case 3 (updating the schema without even a read after the update) should be extremely rare. I don't expect case 2 to happen, assuming a sufficient number (>3) of requests have been made to the servers and the load balancer works reasonably.

That said, we can let the storage migrator send read requests periodically to eliminate case 3 and increase the chance of case 1. The alternative can also be a future improvement if one region being starved is a concern.

@sttts (Contributor), Nov 11, 2020:

Having this logic in the handler chain feels misplaced. Why is it there? Just register event handlers for CRDs, i.e. put it into pkg/controllers/storageversion.

@roycaihw (Member Author), Nov 11, 2020:

Not sure what pkg/controllers/storageversion is.

Let me think about using the event handlers to create/update storage versions. I don't want to use the event handlers to garbage collect storage versions, for the reason I mentioned in https://github.com/kubernetes/kubernetes/pull/96403/files#r520386107.

Member Author (roycaihw):

I agree that using the event handlers to create/update storage versions works better, and it solves https://github.com/kubernetes/kubernetes/pull/96403/files#r520403155. Will update the PR.

utilfeature.DefaultFeatureGate.Enabled(genericfeatures.APIServerIdentity) {
kubeclientset, err := kubernetes.NewForConfig(s.GenericAPIServer.LoopbackClientConfig)
if err != nil {
return nil, fmt.Errorf("failed to create kubernetes clientset: %v", err)
Member:

Looks like the bots picked me from /test.

Should "kubernetes" be uppercase here, or can it be dropped, similar to line 173:

return nil, fmt.Errorf("failed to create clientset: %v", err)

Member Author (roycaihw):

Thanks for reviewing! Here kubernetes is the package name: k8s.io/client-go/kubernetes.NewForConfig. I kept it to distinguish this from L173, which uses the apiextensions clientset: k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset.NewForConfig. I changed the error to "failed to create clientset for storage versions" to make it less confusing.

@@ -0,0 +1,2 @@
approvers:
- roycaihw
Member:

Potentially the SIG label should be here:
https://github.com/kubernetes/kubernetes/blob/master/test/e2e/apimachinery/OWNERS#L25-L26

Not sure if more approvers should be added here too. Ideally, yes.

Member Author (roycaihw):

Added the label and @caesarxuchao to approvers. Sadly, I couldn't find a sig-api-machinery-approvers alias.

// Send a request to make the server create handler and update storage version
_, err = dynamicClient.Resource(gvr).Namespace("default").List(context.TODO(), metav1.ListOptions{})
if err != nil {
t.Fatalf("unexpected error when listing foos: %v", err)
Member:

Replace "foos" with gvr.Resource, or just "resources"? IIRC, the error here already includes what failed to be listed.

}

var storageVersion apiserverinternalv1alpha1.ServerStorageVersion
if err := wait.PollImmediate(100*time.Millisecond, 10*time.Second, func() (bool, error) {
Member:

Do you see a case where 10 seconds might not be enough, i.e. could this introduce a flake?

Member Author (roycaihw):

Good point. Actually we don't need the wait; I was doing something else and forgot about it. Removed.

t.Fatalf("failed to get storage version for custom resources: %v", err)
}
if !strings.HasPrefix(storageVersion.APIServerID, "kube-apiserver-") {
t.Fatalf("apiserver ID doesn't contain kube-apiserver- prefix, has: %v", apiserverID)
Member:

I think apiserverID is "" at this point.

Member Author (roycaihw):

It's populated by the apiserver-identity feature:

id = "kube-apiserver-" + uuid.New().String()

@fedebongio (Contributor):

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 10, 2020
@@ -508,6 +520,9 @@ func (r *crdHandler) updateCustomResourceDefinition(oldObj, newObj interface{})

klog.V(4).Infof("Updating customresourcedefinition %s", newCRD.Name)
r.removeStorage_locked(newCRD.UID)
if err := r.updateStorageVersionFor(newCRD); err != nil {
Contributor:

I'm a little worried about having this inside the lock, as it calls out via a client (and potentially blocks). Same above.

Member Author (roycaihw):

Discussed offline. Decoupled the update logic from the lock.

@liggitt (Member) commented Feb 3, 2021

I was hoping for a change more like this, where we intercept every write operation:

diff --git a/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/customresource_handler.go b/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/customresource_handler.go
index b317e30ebd2..263b4524154 100644
--- a/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/customresource_handler.go
+++ b/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/customresource_handler.go
@@ -402,16 +402,31 @@ func (r *crdHandler) serveResource(w http.ResponseWriter, req *http.Request, req
 			responsewriters.ErrorNegotiated(err, Codecs, schema.GroupVersion{Group: requestInfo.APIGroup, Version: requestInfo.APIVersion}, w, req)
 			return nil
 		}
+		if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+			return err
+		}
 		return handlers.CreateResource(storage, requestScope, r.admission)
 	case "update":
+		if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+			return err
+		}
 		return handlers.UpdateResource(storage, requestScope, r.admission)
 	case "patch":
+		if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+			return err
+		}
 		return handlers.PatchResource(storage, requestScope, r.admission, supportedTypes)
 	case "delete":
 		allowsOptions := true
+		if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+			return err
+		}
 		return handlers.DeleteResource(storage, allowsOptions, requestScope, r.admission)
 	case "deletecollection":
 		checkBody := true
+		if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+			return err
+		}
 		return handlers.DeleteCollection(storage, checkBody, requestScope, r.admission)
 	default:
 		responsewriters.ErrorNegotiated(
@@ -430,8 +445,14 @@ func (r *crdHandler) serveStatus(w http.ResponseWriter, req *http.Request, reque
 	case "get":
 		return handlers.GetResource(storage, requestScope)
 	case "update":
+		if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+			return err
+		}
 		return handlers.UpdateResource(storage, requestScope, r.admission)
 	case "patch":
+		if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+			return err
+		}
 		return handlers.PatchResource(storage, requestScope, r.admission, supportedTypes)
 	default:
 		responsewriters.ErrorNegotiated(
@@ -450,8 +471,14 @@ func (r *crdHandler) serveScale(w http.ResponseWriter, req *http.Request, reques
 	case "get":
 		return handlers.GetResource(storage, requestScope)
 	case "update":
+		if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+			return err
+		}
 		return handlers.UpdateResource(storage, requestScope, r.admission)
 	case "patch":
+		if err := crdInfo.WaitForConsistentStorageVersion(req.Context()); err != nil {
+			return err
+		}
 		return handlers.PatchResource(storage, requestScope, r.admission, supportedTypes)
 	default:
 		responsewriters.ErrorNegotiated(

That confines the intercepting logic to a single method and leaves the rest of the custom resource handling logic alone.

In that single method, we can (sketched below):

  • no-op if the feature is disabled or if there's only a single storageVersion (e.g. nothing to converge on)
  • return fast once servers converge
  • wait, honoring request timeout (or optionally a failsafe)
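
To make those three bullets concrete, here is a rough, self-contained sketch of the shape such a method could take. The type and its fields are invented for illustration; this is not the PR's implementation:

```go
package sketch

import (
	"context"
	"fmt"
)

// storageVersionGate is a hypothetical helper holding the signal that this
// server's StorageVersion record for a CRD is up to date.
type storageVersionGate struct {
	enabled bool          // whether the StorageVersionAPI feature is on
	updated chan struct{} // closed once the storage version update has landed
}

// WaitForConsistentStorageVersion blocks a write until the storage version
// update has completed, honoring the request's context deadline.
func (g *storageVersionGate) WaitForConsistentStorageVersion(ctx context.Context) error {
	// No-op if the feature is disabled: there is nothing to converge on.
	if !g.enabled {
		return nil
	}
	select {
	case <-g.updated:
		// Fast path: the update already completed.
		return nil
	case <-ctx.Done():
		// Honor the request timeout rather than blocking indefinitely.
		return fmt.Errorf("timed out waiting for storage version update: %w", ctx.Err())
	}
}
```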

@deads2k (Contributor) commented Feb 3, 2021

As I recall the KEP, our goal for CRs is that we never store a CR with an encoding version that doesn't match what is recorded in the API. This is slightly more complicated than for the built-in resources, since the encoding version can change without a restart. However, I think we could simplify slightly with a flow like:

  1. create storage for CR as normal
  2. read-only requests are allowed immediately
  3. first mutating request for a CR arrives
  4. create or update the storage version. If this fails, fail the request
  5. accept mutating requests

I don't know exactly where the code change would be offhand, but I don't think we need asynchronous writing in this case. The first few mutating requests after an encoding version change would have higher-than-normal latency, but there aren't that many of them.

As I recall, there's a controller-y aspect to how the storage version gets written for the built-in resources. I think a similar thing could be built using a cache that checks on each request whether the recorded version matches, and does the synchronous write when it doesn't.
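
As an illustration of that flow (not the PR's code; the type, its fields, and writeFn are hypothetical), a cached synchronous write on the first mutating request could look roughly like this:

```go
package sketch

import (
	"context"
	"sync"
)

// lazyStorageVersionWriter records the CR encoding version on the first
// mutating request after a change, and is a cheap cache hit afterwards.
type lazyStorageVersionWriter struct {
	mu      sync.Mutex
	written string // encoding version already recorded, if any
	// writeFn would be a client call that creates/updates the StorageVersion object.
	writeFn func(ctx context.Context, version string) error
}

// ensure is called at the start of every mutating request. It is a no-op once
// the current encoding version has been recorded; otherwise it performs the
// write synchronously and fails the request if the write fails.
func (w *lazyStorageVersionWriter) ensure(ctx context.Context, current string) error {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.written == current {
		return nil // cache hit: the recorded version already matches
	}
	if err := w.writeFn(ctx, current); err != nil {
		return err // the first mutating request after the change pays the latency
	}
	w.written = current
	return nil
}
```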

@roycaihw (Member Author) commented Feb 11, 2021

@deads2k There could be some race conditions. Imagine:

  1. ...
  2. updating the storage version to v1.
  3. CRD got updated to v2
  4. apiserver saw the watch event, removed the v1 storage from the storage map, and put it into graceful teardown
  5. (the v1 storage could still update the storage version to v1, and finish serving the in-flight mutating requests)
  6. a new mutating request came in
  7. apiserver created the v2 storage, updating the storage version to v2
  • The v2 update might happen before the v1 update. In that case the storage version ends up being v1.
  • Or, even if the storage version updates happened in the right order (first v1, then v2), some v1 in-flight mutating requests could still get served after the v2 storage version update. Furthermore, if a storage migration happened after the v2 update and before those v1 in-flight requests completed, v1 data would get persisted in etcd.

The asynchronous channel makes sure the updates happen in the right order, and that a later update begins only after the earlier one's graceful teardown finishes.
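
Purely as an illustration of that ordering guarantee (the types and function are invented, not the PR's implementation), a per-CRD channel consumed by a single worker could look like this:

```go
package sketch

import "context"

// storageVersionUpdate is a hypothetical work item: record a new encoding
// version, but only after the previous storage's graceful teardown finishes.
type storageVersionUpdate struct {
	version          string
	teardownFinished <-chan struct{} // closed when the old storage has drained
	update           func(ctx context.Context, version string) error
}

// processUpdates drains a per-CRD channel in order, so a v2 update can never
// race ahead of a still-draining v1 storage.
func processUpdates(ctx context.Context, updates <-chan storageVersionUpdate) {
	for u := range updates {
		select {
		case <-u.teardownFinished:
			// The previous storage finished serving its in-flight requests.
		case <-ctx.Done():
			return
		}
		// A real implementation would surface errors to the pending
		// requests; they are ignored here to keep the sketch short.
		_ = u.update(ctx, u.version)
	}
}
```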

@roycaihw (Member Author) commented Feb 11, 2021

I was hoping for a change more like this, where we intercept every write operation:

@liggitt Thanks. That's doable. I will update the PR

When a storage version update is aborted because the CRD was deleted, we
don't need to reject in-flight CR requests. We can safely allow these
requests to proceed, because the CRD finalizer guarantees no write
requests can succeed after a CRD is deleted: CREATE requests get
405 Method Not Allowed, and other requests get 404 Not Found.
In theory, "CRD deletion and the CRD finalizer tearing down CRs" could
happen in between (1) crdHandler seeing the CRD with terminating=false
and (2) crdHandler serving the CR create request.

This commit makes crdHandler read the latest CRD from the cache (shared
with the CRD finalizer) right before serving the create request, to keep
the gap as narrow as possible so that the chance of hitting the race
stays low.
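
A hypothetical sketch of that last-moment check (the function names are invented; only the lister and condition types come from the upstream API packages):

```go
package sketch

import (
	"fmt"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	listers "k8s.io/apiextensions-apiserver/pkg/client/listers/apiextensions/v1"
)

// isTerminating reports whether the CRD carries the Terminating condition.
func isTerminating(crd *apiextensionsv1.CustomResourceDefinition) bool {
	for _, cond := range crd.Status.Conditions {
		if cond.Type == apiextensionsv1.Terminating && cond.Status == apiextensionsv1.ConditionTrue {
			return true
		}
	}
	return false
}

// refuseCreateIfTerminating re-reads the CRD from the shared informer cache
// right before a create is served, narrowing the window in which a create
// could slip past a CRD that has already started terminating.
func refuseCreateIfTerminating(lister listers.CustomResourceDefinitionLister, crdName string) error {
	crd, err := lister.Get(crdName)
	if err != nil {
		// Includes NotFound: the CRD is already gone, so the request
		// would fail downstream anyway.
		return err
	}
	if isTerminating(crd) {
		return fmt.Errorf("custom resource definition %s is terminating", crdName)
	}
	return nil
}
```
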
@roycaihw (Member Author):

I checked the CRD deletion workflow: we don't need to reject CR writes if a CRD is deleted; see commit message 8661f73 for more detail. One concern raised was #99181; I added a mitigation in 6ddc1eb.

@k8s-ci-robot (Contributor) commented Feb 18, 2021

@roycaihw: The following test failed, say /retest to rerun all failed tests:

Test name                    Commit    Details   Rerun command
pull-kubernetes-bazel-test   100aa3b   link      /test pull-kubernetes-bazel-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@liggitt (Member) commented Feb 18, 2021

the CRD finalizer guarantees no write requests can succeed after a CRD is deleted

are you sure? updates have to be allowed in order to remove finalizers on CR instances blocking cleanup, right?

by "after a CRD is deleted", do you mean "actually deleted" or "deletionTimestamp persisted on the CRD"?

@roycaihw (Member Author):

updates have to be allowed in order to remove finalizers on CR instances blocking cleanup, right?

The cleanup goes directly to the storage, so it won't be blocked by the http handler.

by "after a CRD is deleted"

I mean actually deleted. This is when crdHandler gets a DELETE watch event and starts tearing down the storage. The CRD finalizer controller has already done its job, so we can unblock write requests knowing they won't succeed.

@roycaihw (Member Author):

updates have to be allowed in order to remove finalizers on CR instances blocking cleanup, right?

The cleanup goes directly to the storage, so it won't be blocked by the http handler.

I may be wrong. Let me double check.

But even if CR cleanup needs to go through the http handler, we won't deadlock. The order would be:

  1. storage version update succeeds
  2. CR cleanup finishes
  3. CRD is actually deleted
  4. crdHandler sees the DELETE event, aborts storage version update (no-op since the update already succeeded), unblocks write requests (no-op)

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 6, 2021
@k8s-ci-robot (Contributor):

@roycaihw: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fejta-bot:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 5, 2021
@k8s-triage-robot:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 4, 2021
@k8s-triage-robot:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor):

@k8s-triage-robot: Closed this PR.

In response to the /close command above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
area/apiserver area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: New KEP