Proposal to rework Kubernetes deployment CLI #5472

zmerlynn · 2015-03-13T23:47:29Z

The following is a proposal for reworking the existing kube-up.sh, kube-down.sh, and kube-push.sh to eventually be in Go for advanced cloud deployments, while allowing the existing shell investment to continue for a period of time.

Define a deploy/deploycmd/api versioned Cluster object. This object holds all general configuration variables of the cluster and the add-on pods, plus IAAS relevant variables specific to each IAAS provider. (Issue to be filed soon to haggle over what's in these structures.)
Create a kubedeploy command with up, down, and push verbs, each of which take a cluster YAML. The up and push verbs take a path (possibly optional in the case of push) to the Kubernetes binaries to deploy.
The initial implementation of kubedeploy only has to act as a YAML -> env variable translator. Re-write the existing shell scripts to use common env variables (they're actually all close, e.g. config-default.sh, etc.). Then kubedeploy can take the given API object, spew a chunk of environment variables, and carefully execute the existing bash scripts (vigorous handwaving, but this part is not that technically challenging, just a hairy yak).
We can then define the interfaces in Go for what an "enlightened" deployment looks like, and in the process try to transition the GCE cloud provider over (keeping an eye on, say, Vagrant as the N=2 case.)

As an added benefit, the last bullet gives us a place to start upstreaming GKE cluster deployment code, which has been a desire for the GKE team for a while. Oh, and @satnam6502 will probably love it for scalability, because I think he gouges his eyes out every time he deals with the bash in util.sh.

This will also give us a place for kubedeploy upgrade in the fullness of time, which is a distinct use case from push.

Thoughts?

cc @jlowdermilk, @brendandburns, @alex-mohr, and anyone else that cares.

The text was updated successfully, but these errors were encountered:

j3ffml · 2015-03-13T23:49:57Z

cc @satnam6502. Apologies to @satnamsingh who this probably does not interest :)

satnam6502 · 2015-03-13T23:58:26Z

After decades of using (and writing) hardware CAD tools I have developed a very very very high pain threshold.

justinsb · 2015-03-15T00:34:06Z

+1 from me (wearing my AWS-porting hat). Would we also replace the Salt code?

I have heard that coordinating other clouds to do the port is a blocker. However, there is also a lot of work today for other clouds to keep up with the latest functionality. I'd much rather do a solid day of rewriting work on the AWS cluster scripts and then have the ongoing work be much easier, by virtue of eliminating copy-and-paste and making more use of strong-typing.

zmerlynn · 2015-03-15T01:43:33Z

My goal was to start at the top and work my way down. kubedeploy would first do something like an object -> env translator, so with the very first cut for "unenlightened" cloud providers, you'd end up with something that almost acted how ssh-agent acts when you run it: it spews the appropriate bash environment variables that would drive the pre-existing scripts. This is simply a bridge, and to keep my sanity. It would eliminate pieces of the existing aws directory, I'm sure, and unify a lot of the environment variables between the different scripts, but I'm not sure it's a drastic refactor yet (but it's hard to project ahead until I see the code ahead).

I think the more drastic unification comes when you can fold it into the Go provider and join the fold there, because presumably we can then start to use radical concepts like "libraries" and "common functions" between providers.

I also think that once providers become enlightened, they can consider moving the receiving end of their deployment to something more interesting than bash, too. The work I did in #5119 is an example of moving the GCE deployment to a simplistic YAML based on environment variables, but this could also be a strongly typed Go object as well - I just didn't go that way because both the sender and receiver were bash.

derekwaynecarr · 2015-03-15T04:46:21Z

+1 to eliminate bash.
+1 to eliminate salt soon after.

Sent from my iPhone

On Mar 14, 2015, at 9:44 PM, Zach Loafman notifications@github.com wrote:

My goal was to start at the top and work my way down. kubedeploy would first do something like an object -> env translator, so with the very first cut for "unenlightened" cloud providers, you'd end up with something that almost acted how ssh-agent acts when you run it: it spews the appropriate bash environment variables that would drive the pre-existing scripts. This is simply a bridge, and to keep my sanity. It would eliminate pieces of the existing aws directory, I'm sure, and unify a lot of the environment variables between the different scripts, but I'm not sure it's a drastic refactor yet (but it's hard to project ahead until I see the code ahead).

I think the more drastic unification comes when you can fold it into the Go provider and join the fold there, because presumably we can then start to use radical concepts like "libraries" and "common functions" between providers.

I also think that once providers become enlightened, they can consider moving the receiving end of their deployment to something more interesting than bash, too. The work I did in #5119 is an example of moving the GCE deployment to a simplistic YAML based on environment variables, but this could also be a strongly typed Go object as well - I just didn't go that way because both the sender and receiver were bash.

—
Reply to this email directly or view it on GitHub.

j3ffml · 2015-03-15T23:16:23Z

+1 from me. The go -> bash translation sounds unpleasant but is probably the easiest way to bridge the existing mess with a better future.

erictune · 2015-03-16T17:03:58Z

I'd be inclined to take a bottom-up approach instead. That is, look at each shell function that is called by kube-up, and see if it makes sense to replace it with a small go program.

This lets you work in small increments, rather than a big-bang PR that forces ever other cloud-provider to do a big rewrite, and would allow use to work on it in parallel -- maybe as part of a fixit day.

erictune · 2015-03-16T17:11:42Z

@dchen1107

zmerlynn · 2015-03-16T17:15:03Z

I might call that "middle up", but you have a point. (The layering distinction depends on whether you cut the consumer at the node/salt/etc or not.)

My initial proposal was the way it was to start out with a simplistic design. It wouldn't be hard to introduce a second interface that others could glom onto that was an intermediate building block interface, though.

erictune · 2015-03-16T17:39:16Z

I think that there are a lot of individual functions in kube-up that don't belong in kube-up or a kubedeploy. They belong in kubectl, addons, or pkg/cloudprovider. I figured move as many as possible to those places before writing kubedeploy. I'll try to file specific issues for the things I have in mind.

zmerlynn · 2015-03-16T17:52:49Z

I think a lot of these fall out as you end up writing the libraries for enlightened providers, but the key is also that they don't exist today. I think it's actually somewhat difficult to see some of them apriori, but I'd love your input.

zmerlynn · 2015-03-16T17:53:39Z

(In particular, I'm worried that if we try to design it too much ahead of time, we'll end up with a structure we'll actually regret.)

mbforbes · 2015-03-16T23:45:20Z

+me

satnam6502 · 2015-03-16T23:48:43Z

Just as we have washed our hands of SOAP I think we should stop rubbing salt into our wounds and see if we can devise a setup is that is... just... a... Go program... reclaiming all that misc ad hoc config into the more civilized world of first class programming.

zmerlynn · 2015-03-16T23:56:00Z

I was thinking of reviving DCERPC, or at least ASN.1. Or maybe we could just use turtles all the way down and wrap ASN.1 in JSON.

bgrant0607 · 2015-03-17T00:39:10Z

As if we don't currently have a structure we regret?

What we're not happy with:

Salt, bash, bash that generates Salt, Go that generates bash that generates Salt, ...
Env vars plumbed through said scripts as the configuration mechanism
Fragile code that breaks continuously and/or leaks cloud resources
Intertwined code for cluster provisioning, OS configuration, network configuration, master deployment, etc.
Unprincipled division of functionality between /cluster and /cloudprovider
Lots of copy/paste and/or manual work to port to new combinations of OS distributions + cloud providers + network setups + update machinery
Most platforms are "supported" via DIY-flavor "getting-started guides" and/or blog posts

Where I think we want to get to:

Minimization of things that need to be done before the first node of a cluster is created.
Minimization of code that differs between environments (cloud providers, etc.).
Minimize differences between systemd- and init-based distributions.
It should be possible to start from a preconfigured image, with no post-facto OS image configuration required. We'd need some spec of what such an image should look like.
Users should be able to use whatever tool they please (Salt, Ansible, Puppet, Chef, Bosh, provider-specific image updater, etc.) for post-turnup image updates.
More standardized approach to network configuration.
Master node(s) should be configured identically to minion nodes.
Cloud-provider-specific functions moved behind a more comprehensive and principled cloudprovider API, in such a way that Kubernetes itself can invoke the functionality to automatically provision resources on demand, as necessary. Ideally this would build upon a library that's not specific to just Kubernetes Make it possible to build a cloudprovider outside the kubernetes repo and run on stock kubernetes #2770.
Faux cloudprovider for bare-metal installations. The best example at the moment would be CoreOS-based approaches using fleet, kube-register, etc.
All Kubernetes components (esp. master components and addons, but even Kubelet, IMO) should be able to be launched, updated, and configured using Kubernetes (i.e., self-hosted).
Node auto-configuration: Kubelet should configure Docker (or whatever container runtime), networking, etc. on the node using plugins.
Node self-registration.

The real problem is the first bootstrap step: creating things that need to exist before the first node of the cluster can be created (and creating that first node). That really should be done by a controller, cron job, or shared service -- but Real Code (TM), not scripts. By definition, it needs to run outside the cluster, but it has to have credentials/access necessary to create the cluster.

zmerlynn · 2015-03-18T16:50:09Z

@bgrant0607: I actually don't disagree with most of what you said, but we might disagree on the timelines to get there. I think we've been bleeding long enough on the front, with no concrete definition of what a cluster is at initial deployment. Some of that definition is definitely going to change over time:

I'd love to get to a world where the master is configured identical to nodes. This is actually more true today than it was before (breaking the master/minion relationship in Salt helped), but there's a lot of both deployment related pieces and k8s related pieces that need to happen first.
The self-hosting world seems not too far off, except for Kubelet. As someone eyeing upgrade, I'm very interested in this space, but Kubelet also seems like the nut to crack, too, because it has tendrils everywhere else in your list, in particular, things like: Upgrade of k8s components, node image upgrade, etc. So that timeline being farther out means we're "stuck" here for a little bit supporting particular forms. That's actually fine. At the contributors conference, for upgrade, we basically punted and said "look, we're going to support binary upgrade however you want within reason, but there's versioning health requirements we need to implement (Cluster versioning #4855)). This proposal is actually talking about a step beyond that that would allow specific providers to implement installation and upgrading specific mechanisms (GCE/SaltStack, etc.)
That said, there's things on that list if I don't get, like node self-registration, and better networking (in particular, fixing CIDR allocation for cbr0 style allocations), I'm probably going to cry. These are both necessary for autoscaling to work in GCE Managed Instance Groups. But that's slightly off-vector from this (but not orthogonal, in that it's hitting the same code, certainly).

To expand on the current proposal more after thinking about it some (I'll edit the original comment if this seems reasonable):

There are pieces that definitely belong in pkg/cloudprovider: Anything we actively call gcloud for in the current case should probably be in there, up to and including VM/MIG creation (VM creation is the most debatable, but it would allow k8s to do autoscaling eventually)
We need a separate library that handles the procedural generation of the configuration prior to pkg/cloudprovider, i.e. for "GCE/Salt", because right now pkg/cloudprovider is actually just IAAS, not IAAS x config. This could be a deployment library, it could be something else. (I know how this code is structured in GKE, and it's basically a pkg/deploy that would call into pkg/cloudprovider.

Additionally, I'm trying to figure out a path that also doesn't ditch a bunch of code along the way (the Go generating env variables part). That said, an alternative plan, is that we could just offer kubedeploy and say "Look, here's the new way." The Getting Started Guides for GCE and anyone else that came along would use it, and others would still use kube-up.sh and still others would be Getting Started Guides (Bring-Your-Own-Everything).

bgrant0607 · 2015-03-19T06:05:43Z

@zmerlynn I didn't really understand what problem(s) you were trying to address with your original proposal. The title of the issue was much more broad than the proposal, and there was no context with respect to previous proposals, such as #2303.

A single command with subcommands that consumes configuration would be superior to the hodgepodge of scripts we currently have -- a cluster deployment CLI. Hmm. That sounds familiar. However, does that facilitate solutions to the deeper problems, or is it just an independent improvement to UX?

Which configuration are you thinking? The overarching turnup workflow? Node OS configuration? Kubernetes binary configuration (command-line flags, etc.)?

zmerlynn · 2015-03-19T16:24:33Z

@bgrant0607: This is an improvement to UX, and it's the first UX that the end-user sees. That UX isn't kubectl, etc., today the first UX they see is either a hodge-podge Getting Started guide, or kube-up.sh, if they're lucky. If they run kube-up.sh, they get a wall of really ugly text output, and have about a 99% chance of creating a cluster (on GCE). Is that the first touch you want from OSS?

But no, this proposal isn't touching anything below the existing binary deployment/setup/installation/whatever you want to call it. If #2303 comes to full fruition, a chunk of this gets simpler. I consider that "far future", though.

a cluster deployment CLI. Hmm. That sounds familiar.

It should. I explicitly called out in the initial comment: "As an added benefit, the last bullet gives us a place to start upstreaming GKE cluster deployment code, which has been a desire for the GKE team for a while." But right now, that code lives behind GKE API, which our CLI is hitting. We don't need the deployment-API complexity within k8s, but GKE would like to get to a place where we can share this code.

bgrant0607 · 2015-03-19T17:45:12Z

kubedeploy SGTM.

As for directory location, why not put the code parallel to kubectl directories, in cmd and pkg?

alex-mohr · 2015-03-19T21:42:13Z

Tagging this as milestone v1 as it seems like the most expedient path to getting upgrade working. But we need to be careful not to over-invest in refactoring that isn't in the critical v1 path?

jayunit100 · 2015-08-26T14:35:04Z

thanks for sharing this idea. the scripts do need to be cleaned up, is python on the table for this work? or is Go a hard requirement? Seems like a scripting language would more natural fit for the broader ops community.

justinsb · 2015-08-26T15:01:34Z

@jayunit100 I don't think anything is off the table (?) - certainly not from my point of view. My thought was that go is the language we're using elsewhere, and that we could eventually move/share code around (e.g. we might decide that the kube-controller-manager should launch / manage / scale the minions). I feel that I'm primarily doing an experiment into whether the idea of a program that generates scripts/templates works and I'd like to continue the experiment in go; if it works then we should have the discussion about python vs go vs X. And of course, if you wanted to conduct an experiment with python you should feel free - the more options we have the better!

mbforbes · 2015-08-26T16:14:56Z

I love Python (really!), but I will note that we've found a lot of value in keeping the language choice the same across different sections of the project. Sharing code and data structures is one benefit; having language consistency is also just nice as a dev. And I imagine this code would be built and maintained by the Kubernetes contributors and community, which is already all in Go.

zmerlynn · 2015-08-26T16:26:43Z

Also, keep in mind that "written in Go" really means "compiled and distributed as executable", in general, versus requiring an additional interpreter.

roberthbailey · 2015-08-26T16:58:32Z

I think @justinsb hit the nail on the head:

we could eventually move/share code around (e.g. we might decide that the kube-controller-manager should launch / manage / scale the minions).

This is the best reason to use Go. We are going to need to refactor the way that we interface with cloud providers, and having all of the code in a single language is going to significantly reduce the barriers to doing so. I'm hoping that we can start moving some of the launch/scale code into the master sooner rather than later (as an example).

pires · 2015-08-26T17:07:13Z

Scripting is evil. Even more evil than flags. Scripting killed my father. Jokes aside, if we want Kubernetes to become self-sufficient in terms of provisioning (scaling included) we definitely need to follow a Go path. After all, Kubernetes is built in Go.

That said, thank you @justinsb! I'm very interested in seeing something like your solution, but with the option to add other providers as well and integrate with master scaling features. I'm willing to contribute here as said in older comments in this thread.

Thanks @bgrant0607 for starting #12245.

bgrant0607 · 2015-08-27T01:55:42Z

I agree that shell scripts are evil, and that the mix of Salt, scripts, templating, etc. is inscrutable.

I am in favor of Go, for the reasons others mentioned.

In addition to #12245, #1553, and #1627, two more prerequisites for declarative self-hosting are underway: Deployment #1743 and Daemon #1518.

bgrant0607 · 2015-08-27T01:58:45Z

@jayunit100 More and more of our ops teams internally are switching from python to Go. Go provides a better foundation for incrementally improving the level of intelligence in the automation. What needs to be done manually today someone will want to automate in a quarter.

jayunit100 · 2015-08-27T13:10:01Z

Okay, now I think I understand the reason why this decision is being made : it will help to prevent bitrot and tie all the code together in a type-safe, and machine automatable way.

bgrant0607 · 2015-09-10T23:14:28Z

Update:

We have an HA configuration working, which is one prerequisite for full self-hosting of critical control-plane components #246.

Horizontal auto-scaling on GCE is working. Furthermore, at least on clouds / IaaS (e.g., GCE) and resource brokers (e.g. Mesos), I want to see Kubernetes driving provisioning of nodes, storage, etc. eventually. Cluster size definitely shouldn't be considered static configuration. Also, unlike ReplicationController, I expect clusters to be heterogeneous, so I don't think node count makes sense to be part of the desired state. It probably should be expressed in terms of other objectives, like utilization level, cluster and node oversubscription levels, max pod-creation burst size, etc.

We're now running a private registry: #1319

We're changing component configuration from flags to config files: #12245

We're working on a ConfigData API: #1553, #6477

We plan to use ConfigData to distribute component configuration: #1627

Both Deployment #1743 and DaemonSet #1518 APIs are underway.

kubectl apply #1702 is underway. Combined with Deployment, this will enable declarative updates of anything hosted by Kubernetes.

Deployment Manager integration #3685 is underway.

Proposal to rework Kubernetes deployment CLI #5472

Proposal to rework Kubernetes deployment CLI #5472

Comments

zmerlynn commented Mar 13, 2015

j3ffml commented Mar 13, 2015

satnam6502 commented Mar 13, 2015

justinsb commented Mar 15, 2015

zmerlynn commented Mar 15, 2015

derekwaynecarr commented Mar 15, 2015

j3ffml commented Mar 15, 2015

erictune commented Mar 16, 2015

erictune commented Mar 16, 2015

zmerlynn commented Mar 16, 2015

erictune commented Mar 16, 2015

zmerlynn commented Mar 16, 2015

zmerlynn commented Mar 16, 2015

mbforbes commented Mar 16, 2015

satnam6502 commented Mar 16, 2015

zmerlynn commented Mar 16, 2015

bgrant0607 commented Mar 17, 2015

zmerlynn commented Mar 18, 2015

bgrant0607 commented Mar 19, 2015

zmerlynn commented Mar 19, 2015

bgrant0607 commented Mar 19, 2015

alex-mohr commented Mar 19, 2015

jayunit100 commented Aug 26, 2015

justinsb commented Aug 26, 2015

mbforbes commented Aug 26, 2015

zmerlynn commented Aug 26, 2015

roberthbailey commented Aug 26, 2015

pires commented Aug 26, 2015

bgrant0607 commented Aug 27, 2015

bgrant0607 commented Aug 27, 2015

jayunit100 commented Aug 27, 2015

bgrant0607 commented Sep 10, 2015

chris-codaio commented Sep 25, 2015

jayunit100 commented Sep 28, 2015

JeanMertz commented Oct 5, 2015

davidopp commented Dec 15, 2015

fgrzadkowski commented Dec 15, 2015

roberthbailey commented Dec 15, 2015

zmerlynn commented Dec 15, 2015

davidopp commented Feb 19, 2016