
Cluster versioning #4855

Closed
zmerlynn opened this issue Feb 26, 2015 · 30 comments
Comments

@zmerlynn (Member) commented Feb 26, 2015

This issue is a proposal and collection of work-items for the cluster versioning mechanics for 1.0. It is meant to contain concepts from #2524 and decisions from the 2015/02 Kubernetes Meet-up as to what to encapsulate/cut for 1.0:

Versioning requirements for kubelet:

  • kubelet version tuple must be reported to the apiserver for the cross-product of, at least, (kubelet, docker, kernel). Since, as we'll see in the Upgrade section, these versions are bundled, the tuple itself may be encodable linearly (i.e. "kubelet image 97" for a given cloud provider / node image). (#5948)
  • It should be easy to query the current versions of all kubelets from the API (a query sketch follows this list).
  • apiserver must speak only the capabilities of the least capable kubelet software version (the first part of the tuple)
  • apiserver of version n.x must be accessible to a kubelet of version n.y if x>=y. When in doubt, the versioning policy trumps this issue as to what API versions are required to interoperate, except that for 1.0 master components are always allowed to assume they're ahead of kubelet.
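
(Editorial sketch, not part of the original proposal.) Once kubelets report this tuple in their Node status, the "query all kubelet versions" bullet reduces to an ordinary node list. A minimal sketch using a present-day client-go, where the tuple surfaces as status.nodeInfo (kubeletVersion, containerRuntimeVersion, kernelVersion):

```go
// Sketch: list every node's reported version tuple via client-go.
// Assumes a kubeconfig at the default location; error handling is minimal.
package main

import (
	"context"
	"fmt"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, n := range nodes.Items {
		info := n.Status.NodeInfo
		// The (kubelet, container runtime, kernel) tuple discussed above.
		fmt.Printf("%s: kubelet=%s runtime=%s kernel=%s\n",
			n.Name, info.KubeletVersion, info.ContainerRuntimeVersion, info.KernelVersion)
	}
}
```

The same fields are also visible from the command line with `kubectl get nodes -o wide`.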
@rsokolowski (Contributor) commented Mar 3, 2015

Hi @zmerlynn, do you need any help with this issue? If so, which work items could I work on? Thanks!

@roberthbailey (Member) commented Mar 14, 2015

This is a P1 and needs an owner. @zmerlynn, if you are planning to drive it, please assign it to yourself; otherwise, it sounds like @rsokolowski is interested in starting on it.

@zmerlynn (Member, Author) commented Mar 14, 2015

I think a chunk of this work is actually on Dawn, but let me figure out how to piece it out.

@zmerlynn (Member, Author) commented Mar 16, 2015

cc @dchen1107

I think the first bullet, reporting the actual version somewhere, is probably somewhere on the node team. It seems like it could hook in with the status update mechanism for #4562 so that we're just doing one status update, but we don't necessarily need to update that often.

The second bullet (keeping track of all the versions) is ostensibly the NodeController.

The third bullet, having apiserver only speak the minimum, I'm not sure where it fits: ideally in a rolling upgrade scenario, you'd have some interconnection from NodeController to the apiserver that specifies the minimum, and when the last node reports in, an event fires back to apiserver. I'm not sure what follows this pattern right now.

The last bullet is waiting on the merge of #4833 and is otherwise, I think, a testing and policy bullet, which is complicated in its own right but doesn't need a lot of upfront design.
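
A hypothetical illustration of the "least capable kubelet" computation the third bullet describes (this is not existing NodeController code; the version strings and the use of golang.org/x/mod/semver are assumptions for the sketch):

```go
// Sketch: derive the minimum kubelet version across the nodes' reported versions,
// i.e. the version whose capabilities the apiserver would have to limit itself to.
package main

import (
	"fmt"

	"golang.org/x/mod/semver"
)

// minKubeletVersion returns the lowest semantic version in the list, or "" if none parse.
func minKubeletVersion(versions []string) string {
	min := ""
	for _, v := range versions {
		if !semver.IsValid(v) {
			continue // skip unparseable reports instead of failing the whole computation
		}
		if min == "" || semver.Compare(v, min) < 0 {
			min = v
		}
	}
	return min
}

func main() {
	reported := []string{"v1.2.0", "v1.1.7", "v1.2.3"} // e.g. gathered from Node status
	fmt.Println("least capable kubelet:", minKubeletVersion(reported)) // v1.1.7
}
```

In a rolling upgrade, recomputing this minimum as each node reports in is what would eventually let the apiserver "unlock" newer capabilities once the last old kubelet is gone.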

@zmerlynn mentioned this issue Mar 18, 2015
@alex-mohr changed the title from "Cluster Versioning and Upgrade in V1.0" to "upgrade: Cluster Versioning and Upgrade in V1.0" on Mar 19, 2015
@mbforbes (Contributor) commented Mar 28, 2015

This is awesome, @zmerlynn—thanks for nailing down the version requirements and writing out the 1.0 upgrade plan in understandable English.

I'm going to be removing the upgrade parts of this to shove them in the other rollups (master #6075 and node #6079). That way, this can be specifically the versioning rollup and we can split the work nicely into three separate issues. I think I've covered everything upgrade-related in those, but I really liked the clear description of upgrades for 1.0 you wrote in this, so I'm going to transfer that to one or both of them.

Let me know if I missed something in this transition, and sorry if I confused anyone's issue linking by doing this!

@mbforbes changed the title from "upgrade: Cluster Versioning and Upgrade in V1.0" to "Cluster versioning for V1.0" on Mar 28, 2015
@mbforbes (Contributor) commented Mar 28, 2015

  • assigning @zmerlynn as the overall owner to drive, even if individual work items are handled across teams
@bgrant0607 (Member) commented Apr 7, 2017

Field gate proposal: https://docs.google.com/document/d/1wuoSqHkeT51mQQ7dIFhUKrdi3-1wbKrNWeIL4cKb9zU/edit#

Note also that the concept of "profiles" is being discussed, for multiple purposes.

@fejta-bot commented Jan 2, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@roberthbailey (Member) commented Jan 22, 2018

@zmerlynn - can you triage this issue?

@bgrant0607 (Member) commented Jan 22, 2018

/remove-lifecycle stale
/lifecycle frozen

I'd like to keep this open. We have a number of challenges around cluster lifecycle that I'd like to consider holistically:

  • Field-level versioning: #34508
  • HA masters mangle data: #46073
  • Storage upgrade: #52185
  • Teardown: #4630
  • Run levels: #54522
  • I don't remember which issue covers resource stranding upon rollback.
  • Add-on management: #23233
@bgrant0607 (Member) commented Jan 26, 2018

Operations:

  • Turn up
  • Upgrade
  • Downgrade
  • Teardown

Strawman upgrade sequence:

  1. Upgrade masters (apiserver, controller manager, scheduler)
  2. Enable new APIs
  3. Upgrade addons (caveats: don't remove old DaemonSets, maybe support API version discovery)
  4. Disable old APIs
  5. Upgrade nodes

Default storage versions can't be updated until after step 1, since an apiserver still running the old release wouldn't be able to decode objects persisted at the newer storage version.
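
To make that ordering constraint concrete, here is a purely illustrative sketch (not an actual Kubernetes component; the cluster type and version strings are invented for the example) of how an orchestrator could gate the default storage-version bump on step 1 being complete:

```go
// Sketch: refuse to bump the default storage version while any apiserver replica
// is still running the old release, since that replica could not decode objects
// persisted at the newer storage version.
package main

import "fmt"

type cluster struct {
	apiserverVersions []string // one entry per master replica
	targetVersion     string
}

func (c *cluster) allMastersUpgraded() bool {
	for _, v := range c.apiserverVersions {
		if v != c.targetVersion {
			return false
		}
	}
	return true
}

func (c *cluster) bumpDefaultStorageVersion() error {
	if !c.allMastersUpgraded() {
		return fmt.Errorf("masters still at mixed versions: %v", c.apiserverVersions)
	}
	fmt.Println("default storage version bumped for release", c.targetVersion)
	return nil
}

func main() {
	c := &cluster{apiserverVersions: []string{"v1.9.0", "v1.10.0"}, targetVersion: "v1.10.0"}
	if err := c.bumpDefaultStorageVersion(); err != nil {
		fmt.Println("blocked:", err) // step 1 not finished yet
	}
}
```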

@krmayankk (Contributor) commented Aug 3, 2020

@bgrant0607 why is this frozen? It seems it was originally created for 1.0 in 2015, but it has a lot of relevant details that could become documentation or a KEP for how clusters should be upgraded; I'm not sure I've seen any such documentation.

@bgrant0607 (Member) commented Aug 3, 2020

@krmayankk It's fine to close this issue, though AFAIK, neither the upgrade/downgrade sequencing, nor lifecycle stages, nor teardown operations, nor the other proposals mentioned above (#4855 (comment)) have been implemented. More specific issues, such as #54522 have been filed.

/close

@k8s-ci-robot (Contributor) commented Aug 3, 2020

@bgrant0607: Closing this issue.

In response to this:

@krmayankk It's fine to close this issue, though AFAIK, neither the upgrade/downgrade sequencing, nor lifecycle stages, nor teardown operations, nor the other proposals mentioned above (#4855 (comment)) have been implemented. More specific issues, such as #54522 have been filed.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
