
Cluster Controller #7459

Closed
lavalamp opened this issue Apr 28, 2015 · 10 comments
Labels
area/upgrade · kind/design · priority/backlog · sig/cluster-lifecycle

Comments

@lavalamp
Member

This is my post-1.0 vision for how cluster upgrades happen. Writing it up since it's been mentioned a few times.

cluster-controller

  • A new cluster component.
  • Runs in a pod, with admin privileges.
  • Takes over from the bootstrapping process when a new cluster is started.
  • Keeps a desired cluster state object, stored in the system with other API objects (a rough sketch follows this list). This includes:
    • The desired node configuration.
    • The desired master component versions & configurations.
  • Runs a sync loop, observing cluster state, and changing it to match desired state.
    • Important: rate limited!
  • Provisioning a new node and/or reinstalling an existing node is pushed into the cloudprovider interface.
    • We add a declarative node configuration so we can say "make node X look like {config}" or "make a new node like {config}". We could start out with this being the salt script we currently use.
  • Cluster upgrades and rollbacks are done by pushing a new desired state.
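For concreteness, here is a minimal sketch (in Go) of the desired-state object and the rate-limited sync loop described in the list above. Every type, field, and function name here is a hypothetical illustration, not an existing Kubernetes API:

```go
package main

import (
	"fmt"
	"time"
)

// ClusterSpec is the desired state the cluster-controller reconciles toward.
type ClusterSpec struct {
	NodeConfig       string            // declarative node config, e.g. the salt script we use today
	NodeCount        int               // desired number of nodes
	MasterComponents map[string]string // component name -> desired version
}

// ClusterStatus is what the controller observes about the running cluster.
type ClusterStatus struct {
	ReadyNodes       int
	MasterComponents map[string]string // component name -> observed version
}

// syncOnce compares observed state to desired state and issues at most one
// corrective action per call, so the ticker in main acts as a crude rate limit.
func syncOnce(spec ClusterSpec, status ClusterStatus) {
	if status.ReadyNodes < spec.NodeCount {
		fmt.Println("provisioning one node via the cloudprovider interface using", spec.NodeConfig)
		return
	}
	for name, want := range spec.MasterComponents {
		if got := status.MasterComponents[name]; got != want {
			fmt.Printf("upgrading %s from %q to %q\n", name, got, want)
			return
		}
	}
}

// observe would query the API server and cloud provider; stubbed here.
func observe() ClusterStatus {
	return ClusterStatus{
		ReadyNodes:       2,
		MasterComponents: map[string]string{"kube-apiserver": "v1.0.6"},
	}
}

func main() {
	spec := ClusterSpec{
		NodeConfig:       "salt://node.sls",
		NodeCount:        3,
		MasterComponents: map[string]string{"kube-apiserver": "v1.1.0"},
	}
	for range time.Tick(30 * time.Second) { // rate-limited sync loop
		syncOnce(spec, observe())
	}
}
```

An upgrade or rollback would then just be a change to the stored ClusterSpec; the loop converges the cluster toward it, one bounded step at a time.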
@lavalamp lavalamp added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. kind/design Categorizes issue or PR as related to design. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Apr 28, 2015
@kelseyhightower
Contributor

@lavalamp Thanks for posting this. We are doing a lot of this work in Tectonic today and from this list I see plenty of areas where we can collaborate around managing the cluster lifecycle.

@alex-mohr
Contributor

@lavalamp Thanks -- and this does mirror some of what GKE has started to build as well, but we'd rather have more of that functionality living within the cluster than outside it.

FYI @roberthbailey @zmerlynn

@mbforbes
Contributor

/sub

@davidopp
Member

/subscribe

@alex-mohr alex-mohr added this to the v1.0-post milestone Apr 30, 2015
@alex-mohr alex-mohr added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Apr 30, 2015
@alex-mohr
Contributor

Flagging this as v1.0-post because I don't think we block v1 on it, but also upping priority to p1 because I want us to drive this soon after v1 ships.

That said, @kelseyhightower if you have cycles in the short term and/or want to upstream what you've found works in Tectonic, happy to review.

@davidopp
Member

Just to make sure I understand: when you say "provisioning a new node and/or reinstalling an existing node is pushed into the cloudprovider interface", you mean the cluster-controller, which is running the sync loop, will call into the cloudprovider interface to provision/reinstall?

@alex-mohr
Contributor

@davidopp I read @lavalamp's proposal as an implementation proposal to meet a requirement of roughly "in order to support k8s.apiserver.resizeClusterToHaveNodes(N), we need to make some 'master' component have sufficient knowledge and capability to dynamically create new nodes according to a template."

It's still unclear to me whether that's exactly a ClusterController that does it all, or whether the necessary functionality gets federated out to a ProvisionsAndDeletesNodesController, a ManagesExistingNodesController, a CloudProviderReconcilerController, etc.

Longer term, we'll want to support more than one node type, so my vague mental model looks a lot like making a "NodeReplicationController" a first-class entity, allowing multiples of them, and having an optional overall CapacityManagerController that can turn the various NodeReplicationController knobs as needed to meet SLOs.

And of course, if someone wants to manually manage their node database or use their own custom automation, I assume we have some form of opt outs for all of it, except for the basic RegisterNodeWithKubernetes plumbing.
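To make that mental model a bit more concrete, here is a minimal Go sketch of the "NodeReplicationController as a first-class entity plus an overall CapacityManagerController" idea above. All names here are hypothetical illustrations, not existing APIs:

```go
package main

import "fmt"

// NodeReplicationController declares a pool of identically configured nodes.
type NodeReplicationController struct {
	Name     string
	Template string // declarative node config shared by every node in the pool
	Replicas int    // desired node count for this pool
}

// CapacityManager turns the replica knobs on the pools to meet an SLO.
type CapacityManager struct {
	Pools []*NodeReplicationController
}

// Scale adds a node to the first pool when observed utilization exceeds the target.
func (c *CapacityManager) Scale(utilization, target float64) {
	if utilization > target && len(c.Pools) > 0 {
		c.Pools[0].Replicas++
		fmt.Printf("scaling pool %q to %d replicas\n", c.Pools[0].Name, c.Pools[0].Replicas)
	}
}

func main() {
	general := &NodeReplicationController{Name: "general", Template: "salt://node.sls", Replicas: 3}
	cm := CapacityManager{Pools: []*NodeReplicationController{general}}
	cm.Scale(0.92, 0.80) // utilization above target, so add a node to the pool
}
```

Supporting multiple node types would then just mean multiple NodeReplicationController objects, with the capacity manager (or a human, for the opt-out case) adjusting their replica counts.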

@lavalamp
Member Author

lavalamp commented May 1, 2015

@davidopp Yes.

@alex-mohr Yeah, that's one possible factoring of this, although we may want to defer some of that complexity until after we have the basic path working.

@bgrant0607 bgrant0607 removed this from the v1.0-post milestone Jul 24, 2015
@roberthbailey roberthbailey added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. team/control-plane labels Aug 27, 2015
@bgrant0607 bgrant0607 removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Sep 10, 2015
@bgrant0607
Member

Let's not build something that's unnecessarily restricted to just cluster configuration.

Related: component registration #13216

First thing we should do is move addons to Deployment, when that is ready. Then, once we have kubectl apply (#1702), updates of addons will be fully declarative.

If we want to effectively run kubectl apply as a continuous service, then we need something like Deployment Manager #3685.

Also, component configuration should be stored by the system #1627. If we need a way to orchestrate a rolling update of node configuration, we should think about how to do that in a way that would be useful in other scenarios, as well.

cc @jackgr @karlkfi @nikhiljindal @derekwaynecarr @fgrzadkowski

@bgrant0607 bgrant0607 added this to the v1.2-candidate milestone Sep 12, 2015
@davidopp davidopp added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Dec 15, 2015
@roberthbailey
Contributor

Quoting from Kubernetes Architectural Roadmap (was Core/Layers Working Doc):

The community has developed numerous tools, such as minikube, kubeadm, bootkube, kube-aws, kops, kargo, kubernetes-anywhere, and so on. As can be seen from the diversity of tools, there is no one-size-fits-all solution for cluster deployment and management (e.g., upgrades).

I'm going to close this generic issue and let each provisioning solution solve this as they see fit.
