
Handle status.storedVersions migration during upgrades #6691

Closed
furkatgofurov7 opened this issue Jun 21, 2022 · 3 comments · Fixed by #6749

@furkatgofurov7 (Member) commented Jun 21, 2022

User Story

As a developer/user/operator, I would like to upgrade from old API versions to newer ones in sequence (v1a3 => v1a4 => v1beta1) without having to manually remove stored versions and re-apply CRDs in the middle of the upgrade process (at the v1a4 => v1beta1 step, since the v1a4 CRDs would still have v1a3 as a storedVersion in their status).

Detailed Description

We have a scenario where we upgrade CAPM3 v1a4 => v1a5 => v1beta1 (CAPI v1a3 => v1a4 => v1beta1). The problem is that clusterctl fails to upgrade from v1a5 => v1beta1 (see logs below) because status.storedVersions in the CRD still includes the v1a4 API version.

Patching CustomResourceDefinition="metal3clusters.infrastructure.cluster.x-k8s.io"
Retrying with backoff Cause="failed to patch provider object: CustomResourceDefinition.apiextensions.k8s.io \"metal3clusters.infrastructure.cluster.x-k8s.io\" is invalid: status.storedVersions[0]: Invalid value: \"v1alpha4\": must appear in spec.versions"
Patching CustomResourceDefinition="metal3clusters.infrastructure.cluster.x-k8s.io"
Retrying with backoff Cause="failed to patch provider object: CustomResourceDefinition.apiextensions.k8s.io \"metal3clusters.infrastructure.cluster.x-k8s.io\" is invalid: status.storedVersions[0]: Invalid value: \"v1alpha4\": must appear in spec.versions"
[... the same patch/retry-with-backoff cycle repeats ...]

The request: should clusterctl take care of removing the old version from status.storedVersions, or how is this handled in general?

Anything else you would like to add:

There has been a long discussion in the Slack thread, xref: https://kubernetes.slack.com/archives/C8TSNPY4T/p1655805341632159

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 21, 2022
@furkatgofurov7 (Member, Author)

Based on the Slack conversation we had, here is a small summary of the points raised.

It seems the way we could tackle this as of now is:

  1. Run the storage version migrator.

  2. Remove the old version from the CustomResourceDefinition status.storedVersions field (see the sketch below).

This is based on https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/#previous-storage-versions
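
A minimal sketch of step 2 using the apiextensions client-go clientset, assuming step 1 (the storage version migrator) has already rewritten every CR so the old version can safely be dropped. Function and variable names are illustrative, not clusterctl's actual code:

```go
package main

import (
	"context"

	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// dropStoredVersion removes oldVersion from a CRD's status.storedVersions.
// Only safe after every CR has been re-stored at the current storage version.
func dropStoredVersion(ctx context.Context, client apiextensionsclient.Interface, crdName, oldVersion string) error {
	crd, err := client.ApiextensionsV1().CustomResourceDefinitions().Get(ctx, crdName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Keep every stored version except the one we migrated away from.
	kept := make([]string, 0, len(crd.Status.StoredVersions))
	for _, v := range crd.Status.StoredVersions {
		if v != oldVersion {
			kept = append(kept, v)
		}
	}
	crd.Status.StoredVersions = kept
	// storedVersions lives in the status subresource, so UpdateStatus is required.
	_, err = client.ApiextensionsV1().CustomResourceDefinitions().UpdateStatus(ctx, crd, metav1.UpdateOptions{})
	return err
}
```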

/cc @sbueringer @fabriziopandini @lentzi90 @chrischdi

@schrej (Member) commented Jun 22, 2022

“but everything will be migrated over time”

This is not necessarily true. The apiserver only upgrades the storage version of an object on write, so if an object never gets written, it never gets upgraded. Ideally, we would automatically write the object during reconcile whenever its storage version is outdated, even if nothing else changed. If I'm not mistaken, controllers reconcile all of their objects after start, so that would automatically convert everything.
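
A hedged sketch of that idea with controller-runtime, using CAPI's Machine as an example. The outdatedStorageVersion helper and the annotation it checks are hypothetical; they stand in for exactly the detection mechanism discussed in the following paragraphs:

```go
package controllers

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type MachineReconciler struct {
	client.Client
}

func (r *MachineReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	machine := &clusterv1.Machine{}
	if err := r.Get(ctx, req.NamespacedName, machine); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// If the stored copy predates the current storage version, a no-op
	// update makes the apiserver rewrite it at the current version.
	// A real implementation would also strip the marker before writing.
	if outdatedStorageVersion(machine) {
		if err := r.Update(ctx, machine); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}

// outdatedStorageVersion is a hypothetical check; here it looks for a
// made-up marker annotation that a conversion webhook could set (see the
// conversion sketch further down).
func outdatedStorageVersion(obj client.Object) bool {
	_, ok := obj.GetAnnotations()["example.x-k8s.io/converted-from-old-version"]
	return ok
}
```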

I think we discussed this at KubeCon during the CAPI meeting. One way to get around this would be to have some annotation/label on objects whose storage version is outdated. The idea was to talk to sig-apimachinery and suggest adding such an annotation. But I think we don't even need their support: conversion happens through conversion webhooks, which are implemented by the operators themselves. It should be possible to add an annotation during migration, when the target version is not the storage version. This could potentially be added at the controller-runtime level, solving this problem for lots of operators at once.

Operators probably always reconcile the latest API version of an object, right? So when the controller gets an object that is already stored at that version, no conversion occurs. If conversion webhooks add a label during conversion, the controller would notice that the object was piped through conversion and is therefore not stored at the latest API version.
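
A sketch of the webhook-side idea, assuming controller-runtime's Hub/Convertible conversion pattern. The annotation key is made up for illustration, and the v1beta1 import path is assumed:

```go
package v1alpha4

import (
	v1beta1 "github.com/metal3-io/cluster-api-provider-metal3/api/v1beta1" // import path assumed
	"sigs.k8s.io/controller-runtime/pkg/conversion"
)

// ConvertTo converts this (old) v1alpha4 Metal3Cluster to the v1beta1 Hub
// version. Because ConvertTo runs when the stored version differs from the
// requested one, it is a natural place to mark the object as converted.
func (src *Metal3Cluster) ConvertTo(dstRaw conversion.Hub) error {
	dst := dstRaw.(*v1beta1.Metal3Cluster)
	// ... the usual field-by-field conversion is elided ...
	if dst.Annotations == nil {
		dst.Annotations = map[string]string{}
	}
	// Hypothetical marker that a reconciler (see the sketch above) can use
	// to detect that the stored copy is still at an old version.
	dst.Annotations["example.x-k8s.io/converted-from-old-version"] = "v1alpha4"
	return nil
}
```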

@sbueringer (Member) commented Jun 27, 2022

Hey folks,
we've implemented a CRD migration in clusterctl upgrade in #6749, as we have to do it for cert-manager anyway.

I think it's a clean and straightforward solution for our issue and won't require a multi-month/year effort to get something implemented upstream in Kubernetes.

I looked over the KEPs, and it seems like an upstream solution will take a while (and would then probably not be available in old versions of Kubernetes).

I would highly prefer a straightforward solution in clusterctl over modifying webhooks and controllers in all providers to write all CRs on upgrades (this would also not work for cert-manager).

If an upstream solution lands in either controller-runtime or Kubernetes over time, and it works with all the controller-runtime/Kubernetes versions we need it for, we can of course deprecate our mechanism, as it would then no longer be necessary.
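
For context, a simplified sketch of what such a CRD storage-version migration amounts to (not the actual #6749 implementation, which also has to deal with conflicts and list pagination): rewrite every CR so it is re-stored at the current storage version, then shrink status.storedVersions to just that version. It assumes apiextensions/v1 is registered in the client's scheme:

```go
package main

import (
	"context"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func migrateCRD(ctx context.Context, c client.Client, crd *apiextensionsv1.CustomResourceDefinition) error {
	// Find the current storage version.
	var storageVersion string
	for _, v := range crd.Spec.Versions {
		if v.Storage {
			storageVersion = v.Name
			break
		}
	}
	// List all objects of this CRD at the storage version.
	list := &unstructured.UnstructuredList{}
	list.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   crd.Spec.Group,
		Version: storageVersion,
		Kind:    crd.Spec.Names.ListKind,
	})
	if err := c.List(ctx, list); err != nil {
		return err
	}
	// A no-op write per object re-stores it at the current storage version.
	for i := range list.Items {
		if err := c.Update(ctx, &list.Items[i]); err != nil {
			return err
		}
	}
	// Now only the current storage version is actually stored.
	crd.Status.StoredVersions = []string{storageVersion}
	return c.Status().Update(ctx, crd)
}
```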
