
Storage upgrade mechanism #52185

Open · lavalamp opened this issue Sep 8, 2017 · 22 comments

@lavalamp (Member) commented Sep 8, 2017

Before we remove an API object version, we need to be certain that all stored objects have been upgraded to a version that will be readable in the future. The old plan for that was to run the cluster/upgrade-storage-objects.sh script after each upgrade. Unfortunately, there are a number of problems with this:

  • The script is old and incomplete, as API authors haven't consistently added their API objects to it (no doubt due to a lack of documentation in the right places).
  • Instructions to run the script in the release notes have been spotty.
  • We believe not all distributions have been running the script after upgrades.

Therefore, we need to design a robust solution for this problem, one which works in the face of HA installations, user-provided apiservers, rollbacks, patch version releases, etc. It probably won't look much like a script you run after an upgrade; instead, it may look more like a system where apiservers come to consensus on the desired storage version, plus a controller that does the job the script was supposed to do whenever that consensus changes.

In the meantime, it is not safe to remove API object versions, so we are instituting a moratorium on API object version removal until this system is in place. (Deprecations are still fine.)
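
For concreteness, here is a hedged sketch of what the consensus half of such a system could look like. Nothing below is an existing Kubernetes API; the types and the decision rule are invented for illustration. The idea is that each apiserver publishes the versions it prefers and can decode, and a migration controller rewrites stored objects only once every live server agrees:

```go
package main

import "fmt"

// APIServerStatus is a hypothetical record each apiserver would publish
// (e.g. into etcd) for a given resource.
type APIServerStatus struct {
	ServerID          string
	PreferredVersion  string   // the version this server encodes by default
	DecodableVersions []string // the versions this server can read
}

// agreedStorageVersion returns the storage version all live apiservers
// prefer, or "" when they disagree (e.g. mid rolling upgrade). A migration
// controller would run the no-op rewrite only after this value changes.
func agreedStorageVersion(statuses []APIServerStatus) string {
	if len(statuses) == 0 {
		return ""
	}
	agreed := statuses[0].PreferredVersion
	for _, s := range statuses[1:] {
		if s.PreferredVersion != agreed {
			return "" // mixed fleet: hold off on migration
		}
	}
	return agreed
}

func main() {
	// During a rolling upgrade one server still prefers v1beta1, so no
	// migration is triggered; once both prefer v1, the controller acts.
	fleet := []APIServerStatus{
		{ServerID: "apiserver-a", PreferredVersion: "v1", DecodableVersions: []string{"v1beta1", "v1"}},
		{ServerID: "apiserver-b", PreferredVersion: "v1beta1", DecodableVersions: []string{"v1beta1", "v1"}},
	}
	fmt.Printf("agreed storage version: %q\n", agreedStorageVersion(fleet))
}
```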

@thockin (Member) commented Sep 8, 2017

Thanks, Daniel. Good summary. To echo:

> it is not safe to remove API object versions, so we are instituting a moratorium on API object version removal

Emphasis on removal. It sucks to carry old APIs forward, but this is a liability I think we can't ignore. We've gotten lucky this far.

@kubernetes/sig-network-api-reviews @cmluciano @dcbw @danwinship

@kubernetes/api-approvers @kubernetes/api-reviewers

@smarterclayton (Contributor) commented Sep 8, 2017

For context: the OpenShift approach is to force a migration before every upgrade starts, and the migration has to complete. Migration is a kubectl command that reads all objects and writes back a no-op change (which forces storage turnover). We use this for protobuf migration, field defaulting, new versions, storage version conversion, selfLink fixup, etc.

It's basically a production version of upgrade-storage-objects.sh. Depending on how complicated we want to get, I'd say it's the minimum viable path; the cluster-consensus approach is probably fine but more complicated.

https://github.com/openshift/origin/blob/master/pkg/oc/admin/migrate/storage/storage.go
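For readers who haven't seen the approach, here is a minimal sketch of such a no-op read/write pass, written against recent client-go (v0.18+) signatures rather than the linked OpenShift code. GVR discovery, pagination, conflict retries, and rate limiting are all omitted, and the resource chosen is just an example:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// One resource to migrate; a real tool would walk every GVR returned
	// by the discovery API.
	gvr := schema.GroupVersionResource{
		Group: "networking.k8s.io", Version: "v1", Resource: "networkpolicies",
	}

	ctx := context.Background()
	list, err := client.Resource(gvr).Namespace(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for i := range list.Items {
		obj := &list.Items[i]
		// Writing the object back unchanged forces the apiserver to
		// re-encode it at the current storage version. resourceVersion is
		// preserved, so a concurrent write surfaces as a conflict instead
		// of being silently overwritten.
		if _, err := client.Resource(gvr).Namespace(obj.GetNamespace()).Update(ctx, obj, metav1.UpdateOptions{}); err != nil {
			fmt.Printf("migrate %s/%s failed: %v\n", obj.GetNamespace(), obj.GetName(), err)
		}
	}
}
```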

@thockin (Member) commented Sep 8, 2017

Is this blocking the removal of Alpha APIs, too?

@lavalamp (Member, Author) commented Sep 8, 2017

@smarterclayton Is that an upstream command?

@lavalamp (Member, Author) commented Sep 8, 2017

@thockin I think it depends on how the alpha API is storing its objects.

@smarterclayton (Contributor) commented Sep 8, 2017

Downstream, it is a bunch of utility code for a migration framework (we have other migrators for things like RBAC, alpha annotations to beta fields, and image references) and then a set of simple commands. I don't know that I think of this as kubectl so much as kubeadm, or a separate command for admins (in OpenShift, oc adm holds the admin-focused commands and oc the end-user-focused ones, but we don't quite have the equivalent in kube).

@lavalamp (Member, Author) commented Sep 8, 2017

@smarterclayton Gotcha. I will take a look at it, but that still doesn't sound like an ideal way to solve the problem. Like, it's good if you're going to run one or two of these things with super-well-trained humans doing the upgrade, but Kubernetes isn't installed like that :)

@smarterclayton (Contributor) commented Sep 9, 2017

@enj (Member) commented Sep 13, 2017

I have worked closely with OpenShift's installer/upgrade team on their use of the oc adm migrate storage command. I will say that running it automatically pre- and post-upgrade is the trivial part; the command is surprisingly good at finding random edge cases. It could be turned into a controller of sorts, though I am not sure what state the controller would try to drive the system from/to.
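
For reference, the typical invocation (as documented for OpenShift releases of that era; exact flags may vary by release) dry-runs first and then rewrites everything:

```sh
# List what would be rewritten, then perform the no-op rewrite for real.
oc adm migrate storage --include='*'
oc adm migrate storage --include='*' --confirm
```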

@timothysc (Member) commented Sep 14, 2017

Per conversation in sig-api-machinery, there was general consensus that it makes more sense to have this be a controller's responsibility, with some set of strategies.

@bgrant0607 (Member) commented Sep 14, 2017

We need to figure out downgrades too, which are similar. Right now, a downgrade strands the stored objects of any newly added APIs.

@liggitt (Member) commented Sep 14, 2017

> We need to figure out downgrades too, which are similar. Right now, a downgrade strands the stored objects of any newly added APIs.

Brand new resources will get orphaned on downgrade, but should be inert.

New versions of existing resources shouldn't become the storage version until n+1 releases after they are introduced (or rolling HA API upgrades won't work). If we only support single-version downgrade, there shouldn't be any issues.
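
A hypothetical timeline makes the rule concrete (group and versions invented for illustration):

```
1.x:   foo/v2 is introduced and served, but objects are still stored as foo/v1
1.x+1: foo/v2 becomes the storage version; a 1.x server can still read it,
       so a single-version downgrade from 1.x+1 to 1.x is safe
```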

@bgrant0607 (Member) commented Sep 19, 2017

Related: #4855, #3562, #46073, #23233

@bgrant0607 (Member) commented Sep 19, 2017

@liggitt Upon re-upgrade, inert resources have some of the timebomb characteristics of the annotations discussed in #30819.

@fejta-bot commented Jan 6, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@enj (Member) commented Jan 9, 2018

/lifecycle frozen

lavalamp self-assigned this Jan 9, 2018

@thockin (Member) commented Jan 17, 2018

Any status update here? We'd like to remove the deprecated v1beta1 NetworkPolicy.

@liggitt (Member) commented Jan 17, 2018

I haven't seen a better suggestion than a no-op read/write cycle via the API on the affected resources.

Enforcing that this is done seems like the responsibility of the deployment mechanism, so it can be staged after all apiservers in an HA deployment are upgraded.

@liggitt (Member) commented Jan 17, 2018

In the meantime, you can start disabling serving of the deprecated version by default. The types still remain in tree, so they can still be read from storage, and we still carry the in-tree debt, but it pushes people toward the new types as new clusters are deployed.
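
For illustration, serving a deprecated resource could be switched off per apiserver via --runtime-config (per-resource keys of this form were supported for the extensions group; the resource shown is only an example):

```sh
# Stop serving the deprecated extensions/v1beta1 NetworkPolicy while the
# networking.k8s.io/v1 version stays available; stored objects remain readable.
kube-apiserver \
  --runtime-config=extensions/v1beta1/networkpolicies=false
```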

bgrant0607 referenced this issue Jan 22, 2018: Cluster versioning #4855 (open, 2 of 4 tasks complete)
@bgrant0607 (Member) commented Jan 22, 2018

Speaking of HA, today each apiserver has its own independent default storage versions, which are imposed immediately upon apiserver upgrade. That doesn't seem desirable.
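At the time, the only lever for this was the kube-apiserver --storage-versions flag (later deprecated). A sketch of how a deployment tool might pin encoding during a rolling HA upgrade, with an illustrative group/version:

```sh
# On the already-upgraded members of the HA set, keep encoding the apps group
# at the old version until every apiserver is upgraded; then drop the pin
# and run a storage migration.
kube-apiserver \
  --storage-versions=apps/v1beta1
```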

@caesarxuchao