Proposal to rework Kubernetes deployment CLI #5472

Closed
zmerlynn opened this issue Mar 13, 2015 · 71 comments
Labels
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@zmerlynn
Member

The following is a proposal for reworking the existing kube-up.sh, kube-down.sh, and kube-push.sh to eventually be in Go for advanced cloud deployments, while allowing the existing shell investment to continue for a period of time.

  • Define a deploy/deploycmd/api versioned Cluster object. This object holds all general configuration variables of the cluster and the add-on pods, plus IaaS-relevant variables specific to each IaaS provider. (Issue to be filed soon to haggle over what's in these structures; a rough sketch follows this list.)
  • Create a kubedeploy command with up, down, and push verbs, each of which take a cluster YAML. The up and push verbs take a path (possibly optional in the case of push) to the Kubernetes binaries to deploy.
  • The initial implementation of kubedeploy only has to act as a YAML -> env variable translator. Re-write the existing shell scripts to use common env variables (they're actually all close, e.g. config-default.sh, etc.). Then kubedeploy can take the given API object, spew a chunk of environment variables, and carefully execute the existing bash scripts (vigorous handwaving, but this part is not that technically challenging, just a hairy yak).
  • We can then define the interfaces in Go for what an "enlightened" deployment looks like, and in the process try to transition the GCE cloud provider over (keeping an eye on, say, Vagrant as the N=2 case.)
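
To make the first two bullets more concrete, here is a minimal sketch of what such a versioned Cluster object could look like. Every field name below is an illustrative assumption, not a settled API; deciding the real structure is exactly what the follow-up issue would haggle over.

```go
// Hypothetical shape only -- the real deploy/deploycmd/api structure is TBD.
package api

// Cluster holds the general cluster configuration plus per-IaaS settings.
type Cluster struct {
	APIVersion string       `yaml:"apiVersion"` // e.g. "deploy/v1beta1"
	Name       string       `yaml:"name"`
	NumNodes   int          `yaml:"numNodes"`
	Addons     []string     `yaml:"addons,omitempty"` // e.g. "dns", "monitoring"
	Provider   ProviderSpec `yaml:"provider"`
}

// ProviderSpec carries the IaaS-specific knobs; exactly one member is set.
type ProviderSpec struct {
	GCE *GCESpec `yaml:"gce,omitempty"`
	AWS *AWSSpec `yaml:"aws,omitempty"`
}

type GCESpec struct {
	Project string `yaml:"project"`
	Zone    string `yaml:"zone"`
}

type AWSSpec struct {
	Region string `yaml:"region"`
	VPC    string `yaml:"vpc,omitempty"`
}
```

Under those assumptions, usage would look something like `kubedeploy up cluster.yaml <path-to-binaries>`, with `kubedeploy down cluster.yaml` and `kubedeploy push cluster.yaml` as the other verbs (flag and argument shapes are placeholders, not a proposal).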

As an added benefit, the last bullet gives us a place to start upstreaming GKE cluster deployment code, which has been a desire for the GKE team for a while. Oh, and @satnam6502 will probably love it for scalability, because I think he gouges his eyes out every time he deals with the bash in util.sh.

This will also give us a place for kubedeploy upgrade in the fullness of time, which is a distinct use case from push.

Thoughts?

cc @jlowdermilk, @brendandburns, @alex-mohr, and anyone else that cares.

@zmerlynn zmerlynn added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Mar 13, 2015
@zmerlynn zmerlynn self-assigned this Mar 13, 2015
@j3ffml
Contributor

j3ffml commented Mar 13, 2015

cc @satnam6502. Apologies to @satnamsingh who this probably does not interest :)

@satnam6502
Contributor

After decades of using (and writing) hardware CAD tools I have developed a very very very high pain threshold.

@satnam6502 satnam6502 added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Mar 14, 2015
@justinsb
Member

+1 from me (wearing my AWS-porting hat). Would we also replace the Salt code?

I have heard that coordinating the other clouds to do the port is a blocker. However, there is also a lot of work today for the other clouds just to keep up with the latest functionality. I'd much rather do a solid day of rewriting work on the AWS cluster scripts and then have the ongoing work be much easier, by virtue of eliminating copy-and-paste and making more use of strong typing.

@zmerlynn
Member Author

My goal was to start at the top and work my way down. kubedeploy would first do something like an object -> env translator, so with the very first cut for "unenlightened" cloud providers, you'd end up with something that acts much like ssh-agent does when you run it: it spews the appropriate bash environment variables to drive the pre-existing scripts. This is simply a bridge, and to keep my sanity. It would eliminate pieces of the existing aws directory, I'm sure, and unify a lot of the environment variables between the different scripts, but I'm not sure it's a drastic refactor yet (it's hard to project until I see the code).

I think the more drastic unification comes when you can fold it into the Go provider and join the fold there, because presumably we can then start to use radical concepts like "libraries" and "common functions" between providers.

I also think that once providers become enlightened, they can consider moving the receiving end of their deployment to something more interesting than bash, too. The work I did in #5119 is an example of moving the GCE deployment to a simplistic YAML based on environment variables, but this could be a strongly typed Go object as well - I just didn't go that way because both the sender and receiver were bash.
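
For illustration, a rough sketch of that ssh-agent-style bridge, assuming a tiny stand-in Cluster struct; the environment variable names are placeholders for whatever the unified scripts would actually agree on, not the real contract:

```go
// Sketch only: translate a parsed Cluster object into the env vars the
// existing cluster scripts expect, then drive kube-up.sh with them.
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// Cluster is a stand-in for the proposed API object; fields are illustrative.
type Cluster struct {
	Name     string
	Provider string // "gce", "aws", ...
	Zone     string
	NumNodes int
}

// envFor appends the derived variables to the current environment. The
// variable names here are assumptions for the sketch.
func envFor(c Cluster) []string {
	return append(os.Environ(),
		fmt.Sprintf("KUBERNETES_PROVIDER=%s", c.Provider),
		fmt.Sprintf("CLUSTER_NAME=%s", c.Name),
		fmt.Sprintf("ZONE=%s", c.Zone),
		fmt.Sprintf("NUM_MINIONS=%d", c.NumNodes),
	)
}

// up "spews" the env vars at the pre-existing bash and lets it do the work.
func up(c Cluster) error {
	cmd := exec.Command("./cluster/kube-up.sh")
	cmd.Env = envFor(c)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	c := Cluster{Name: "e2e-test", Provider: "gce", Zone: "us-central1-b", NumNodes: 4}
	if err := up(c); err != nil {
		fmt.Fprintln(os.Stderr, "kube-up failed:", err)
		os.Exit(1)
	}
}
```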

@derekwaynecarr
Member

+1 to eliminate bash.
+1 to eliminate salt soon after.


@j3ffml
Contributor

j3ffml commented Mar 15, 2015

+1 from me. The Go -> bash translation sounds unpleasant but is probably the easiest way to bridge the existing mess with a better future.

@erictune
Member

I'd be inclined to take a bottom-up approach instead. That is, look at each shell function that is called by kube-up, and see if it makes sense to replace it with a small Go program.

This lets you work in small increments, rather than a big-bang PR that forces every other cloud provider to do a big rewrite, and it would allow us to work on it in parallel -- maybe as part of a fixit day.

@erictune
Member

@dchen1107

@zmerlynn
Member Author

I might call that "middle up", but you have a point. (The layering distinction depends on whether you cut the consumer at the node/salt/etc or not.)

My initial proposal was the way it was to start out with a simplistic design. It wouldn't be hard to introduce a second, intermediate building-block interface that others could glom onto, though.

@erictune
Member

I think that there are a lot of individual functions in kube-up that don't belong in kube-up or a kubedeploy. They belong in kubectl, addons, or pkg/cloudprovider. I figured we'd move as many as possible to those places before writing kubedeploy. I'll try to file specific issues for the things I have in mind.

@zmerlynn
Member Author

I think a lot of these fall out as you end up writing the libraries for enlightened providers, but the key is also that they don't exist today. I think it's actually somewhat difficult to see some of them a priori, but I'd love your input.

@zmerlynn
Member Author

(In particular, I'm worried that if we try to design it too much ahead of time, we'll end up with a structure we'll actually regret.)

@mbforbes
Contributor

+me

@satnam6502
Contributor

Just as we have washed our hands of SOAP, I think we should stop rubbing salt into our wounds and see if we can devise a setup that is... just... a... Go program... reclaiming all that miscellaneous ad hoc config into the more civilized world of first-class programming.

@zmerlynn
Member Author

I was thinking of reviving DCERPC, or at least ASN.1. Or maybe we could just use turtles all the way down and wrap ASN.1 in JSON.

@bgrant0607
Member

As if we don't currently have a structure we regret?

What we're not happy with:

  • Salt, bash, bash that generates Salt, Go that generates bash that generates Salt, ...
  • Env vars plumbed through said scripts as the configuration mechanism
  • Fragile code that breaks continuously and/or leaks cloud resources
  • Intertwined code for cluster provisioning, OS configuration, network configuration, master deployment, etc.
  • Unprincipled division of functionality between /cluster and /cloudprovider
  • Lots of copy/paste and/or manual work to port to new combinations of OS distributions + cloud providers + network setups + update machinery
  • Most platforms are "supported" via DIY-flavor "getting-started guides" and/or blog posts

Where I think we want to get to:

  • Minimize the things that need to be done before the first node of a cluster is created.
  • Minimize the code that differs between environments (cloud providers, etc.).
  • Minimize differences between systemd- and init-based distributions.
  • It should be possible to start from a preconfigured image, with no post-facto OS image configuration required. We'd need some spec of what such an image should look like.
  • Users should be able to use whatever tool they please (Salt, Ansible, Puppet, Chef, Bosh, provider-specific image updater, etc.) for post-turnup image updates.
  • More standardized approach to network configuration.
  • Master node(s) should be configured identically to minion nodes.
  • Cloud-provider-specific functions moved behind a more comprehensive and principled cloudprovider API, in such a way that Kubernetes itself can invoke the functionality to automatically provision resources on demand, as necessary. Ideally this would build upon a library that's not specific to just Kubernetes ("Make it possible to build a cloudprovider outside the kubernetes repo and run on stock kubernetes" #2770). A hypothetical interface sketch follows at the end of this comment.
  • Faux cloudprovider for bare-metal installations. The best example at the moment would be CoreOS-based approaches using fleet, kube-register, etc.
  • All Kubernetes components (esp. master components and addons, but even Kubelet, IMO) should be able to be launched, updated, and configured using Kubernetes (i.e., self-hosted).
  • Node auto-configuration: Kubelet should configure Docker (or whatever container runtime), networking, etc. on the node using plugins.
  • Node self-registration.

The real problem is the first bootstrap step: creating things that need to exist before the first node of the cluster can be created (and creating that first node). That really should be done by a controller, cron job, or shared service -- but Real Code (TM), not scripts. By definition, it needs to run outside the cluster, but it has to have credentials/access necessary to create the cluster.
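
As a thought experiment only (this is not the existing pkg/cloudprovider interface, and every name below is an assumption), the "more comprehensive and principled cloudprovider API" bullet above might look something along these lines, so that a controller or the out-of-cluster bootstrap job could provision resources on demand:

```go
// Hypothetical sketch; method names and shapes are illustrative assumptions.
package provisioner

import "context"

// NodeSpec describes a node to be created by the IaaS layer.
type NodeSpec struct {
	Name        string
	MachineType string
	Image       string // a preconfigured image, per the list above
}

// Provisioner is what Kubernetes itself (or the bootstrap job that runs
// outside the cluster) could call to create and destroy cluster resources.
type Provisioner interface {
	EnsureNetwork(ctx context.Context, clusterName string) error
	CreateNodes(ctx context.Context, clusterName string, specs []NodeSpec) error
	DeleteNodes(ctx context.Context, clusterName string, names []string) error
	Teardown(ctx context.Context, clusterName string) error
}
```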

@zmerlynn
Member Author

@bgrant0607: I actually don't disagree with most of what you said, but we might disagree on the timelines to get there. I think we've been bleeding long enough on this front, with no concrete definition of what a cluster is at initial deployment. Some of that definition is definitely going to change over time:

  • I'd love to get to a world where the master is configured identically to nodes. This is actually more true today than it was before (breaking the master/minion relationship in Salt helped), but there are a lot of both deployment-related and k8s-related pieces that need to happen first.
  • The self-hosting world seems not too far off, except for Kubelet. As someone eyeing upgrade, I'm very interested in this space, but Kubelet also seems like the hardest nut to crack, because it has tendrils everywhere else in your list, in particular things like upgrade of k8s components, node image upgrade, etc. That timeline being farther out means we're "stuck" here for a little bit supporting particular forms. That's actually fine. At the contributors conference, for upgrade, we basically punted and said "look, we're going to support binary upgrade however you want within reason, but there are versioning health requirements we need to implement" (Cluster versioning #4855). This proposal is actually talking about a step beyond that, one that would allow specific providers to implement installation- and upgrade-specific mechanisms (GCE/SaltStack, etc.).
  • That said, there are things on that list that I'm probably going to cry about if I don't get, like node self-registration and better networking (in particular, fixing CIDR allocation for cbr0-style allocations). These are both necessary for autoscaling to work with GCE Managed Instance Groups. But that's slightly off-vector from this (though not orthogonal, in that it's hitting the same code, certainly).

To expand on the current proposal more after thinking about it some (I'll edit the original comment if this seems reasonable):

  • There are pieces that definitely belong in pkg/cloudprovider: anything we actively call gcloud for in the current case should probably be in there, up to and including VM/MIG creation (VM creation is the most debatable, but it would allow k8s to do autoscaling eventually).
  • We need a separate library that handles the procedural generation of the configuration prior to pkg/cloudprovider, i.e. for "GCE/Salt", because right now pkg/cloudprovider is really just IaaS, not IaaS x config. This could be a deployment library, or it could be something else. (I know how this code is structured in GKE, and it's basically a pkg/deploy that would call into pkg/cloudprovider; see the layering sketch at the end of this comment.)

Additionally, I'm trying to figure out a path that doesn't ditch a bunch of code along the way (the part where Go generates env variables). That said, an alternative plan is that we could just offer kubedeploy and say "Look, here's the new way." The Getting Started Guides for GCE and anyone else that came along would use it, others would still use kube-up.sh, and still others would follow Getting Started Guides (Bring-Your-Own-Everything).
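
A sketch of the layering described in these bullets, with all names made up for illustration: a hypothetical pkg/deploy that generates the provider/config-specific artifacts (the "IaaS x config" half, e.g. GCE/Salt) and then drives the IaaS half that would live in pkg/cloudprovider:

```go
// Sketch only; these interfaces are assumptions, not an agreed-upon API.
package deploy

// ConfigGenerator covers the "IaaS x config" half: rendering whatever the
// master and nodes need before or at first boot.
type ConfigGenerator interface {
	MasterStartupScript(clusterName string) (string, error)
	NodeStartupScript(clusterName string) (string, error)
}

// IaaS covers the half that belongs in pkg/cloudprovider: the calls we
// currently shell out to gcloud for (networks, VMs, instance groups, ...).
type IaaS interface {
	CreateMaster(clusterName, startupScript string) error
	CreateNodeGroup(clusterName, startupScript string, count int) error
}

// Deployer ties the two together; "kubedeploy up" would end up here.
type Deployer struct {
	Config ConfigGenerator
	Infra  IaaS
}

func (d *Deployer) Up(clusterName string, nodes int) error {
	ms, err := d.Config.MasterStartupScript(clusterName)
	if err != nil {
		return err
	}
	ns, err := d.Config.NodeStartupScript(clusterName)
	if err != nil {
		return err
	}
	if err := d.Infra.CreateMaster(clusterName, ms); err != nil {
		return err
	}
	return d.Infra.CreateNodeGroup(clusterName, ns, nodes)
}
```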

@bgrant0607
Member

@zmerlynn I didn't really understand what problem(s) you were trying to address with your original proposal. The title of the issue was much broader than the proposal, and there was no context with respect to previous proposals, such as #2303.

A single command with subcommands that consumes configuration would be superior to the hodgepodge of scripts we currently have -- a cluster deployment CLI. Hmm. That sounds familiar. However, does that facilitate solutions to the deeper problems, or is it just an independent improvement to UX?

Which configuration are you thinking? The overarching turnup workflow? Node OS configuration? Kubernetes binary configuration (command-line flags, etc.)?

@zmerlynn zmerlynn changed the title Proposal to rework Kubernetes deployment Proposal to rework Kubernetes deployment CLI Mar 19, 2015
@zmerlynn
Member Author

@bgrant0607: This is an improvement to UX, and it's the first UX that the end-user sees. That UX isn't kubectl, etc.; today the first thing they see is either a hodge-podge Getting Started guide or, if they're lucky, kube-up.sh. If they run kube-up.sh, they get a wall of really ugly text output and have about a 99% chance of creating a cluster (on GCE). Is that the first touch you want from OSS?

But no, this proposal isn't touching anything below the existing binary deployment/setup/installation/whatever you want to call it. If #2303 comes to full fruition, a chunk of this gets simpler. I consider that "far future", though.

a cluster deployment CLI. Hmm. That sounds familiar.

It should. I explicitly called out in the initial comment: "As an added benefit, the last bullet gives us a place to start upstreaming GKE cluster deployment code, which has been a desire for the GKE team for a while." But right now, that code lives behind the GKE API, which our CLI is hitting. We don't need the deployment-API complexity within k8s, but GKE would like to get to a place where we can share this code.

@bgrant0607
Member

kubedeploy SGTM.

As for directory location, why not put the code parallel to kubectl directories, in cmd and pkg?

@alex-mohr alex-mohr added this to the v1.0 milestone Mar 19, 2015
@alex-mohr
Contributor

Tagging this as milestone v1 as it seems like the most expedient path to getting upgrade working. But we need to be careful not to over-invest in refactoring that isn't in the critical v1 path?

@jayunit100
Member

Thanks for sharing this idea. The scripts do need to be cleaned up. Is Python on the table for this work, or is Go a hard requirement? It seems like a scripting language would be a more natural fit for the broader ops community.

@justinsb
Member

@jayunit100 I don't think anything is off the table (?) - certainly not from my point of view. My thought was that Go is the language we're using elsewhere, and that we could eventually move/share code around (e.g. we might decide that the kube-controller-manager should launch / manage / scale the minions). I feel that I'm primarily running an experiment into whether the idea of a program that generates scripts/templates works, and I'd like to continue the experiment in Go; if it works, then we should have the discussion about Python vs Go vs X. And of course, if you wanted to conduct an experiment with Python, you should feel free - the more options we have the better!

@mbforbes
Contributor

I love Python (really!), but I will note that we've found a lot of value in keeping the language choice the same across different sections of the project. Sharing code and data structures is one benefit; having language consistency is also just nice as a dev. And I imagine this code would be built and maintained by the Kubernetes contributors and community, which is already all in Go.

@zmerlynn
Member Author

Also, keep in mind that "written in Go" really means "compiled and distributed as executable", in general, versus requiring an additional interpreter.

@roberthbailey
Contributor

I think @justinsb hit the nail on the head:

we could eventually move/share code around (e.g. we might decide that the kube-controller-manager should launch / manage / scale the minions).

This is the best reason to use Go. We are going to need to refactor the way that we interface with cloud providers, and having all of the code in a single language is going to significantly reduce the barriers to doing so. I'm hoping that we can start moving some of the launch/scale code into the master sooner rather than later (as an example).

@pires
Contributor

pires commented Aug 26, 2015

Scripting is evil. Even more evil than flags. Scripting killed my father. Jokes aside, if we want Kubernetes to become self-sufficient in terms of provisioning (scaling included) we definitely need to follow a Go path. After all, Kubernetes is built in Go.

That said, thank you @justinsb! I'm very interested in seeing something like your solution, but with the option to add other providers as well and to integrate with master scaling features. I'm willing to contribute here, as I said in older comments in this thread.

Thanks @bgrant0607 for starting #12245.

@bgrant0607
Member

I agree that shell scripts are evil, and that the mix of Salt, scripts, templating, etc. is inscrutable.

I am in favor of Go, for the reasons others mentioned.

In addition to #12245, #1553, and #1627, two more prerequisites for declarative self-hosting are underway: Deployment #1743 and Daemon #1518.

@bgrant0607
Member

@jayunit100 More and more of our ops teams internally are switching from python to Go. Go provides a better foundation for incrementally improving the level of intelligence in the automation. What needs to be done manually today someone will want to automate in a quarter.

@jayunit100
Member

Okay, now I think I understand the reason why this decision is being made: it will help to prevent bitrot and tie all the code together in a type-safe, machine-automatable way.

@bgrant0607
Member

Update:

We have an HA configuration working, which is one prerequisite for full self-hosting of critical control-plane components #246.

Horizontal auto-scaling on GCE is working. Furthermore, at least on clouds / IaaS (e.g., GCE) and resource brokers (e.g. Mesos), I want to see Kubernetes driving provisioning of nodes, storage, etc. eventually. Cluster size definitely shouldn't be considered static configuration. Also, unlike ReplicationController, I expect clusters to be heterogeneous, so I don't think node count makes sense to be part of the desired state. It probably should be expressed in terms of other objectives, like utilization level, cluster and node oversubscription levels, max pod-creation burst size, etc.

We're now running a private registry: #1319

We're changing component configuration from flags to config files: #12245

We're working on a ConfigData API: #1553, #6477

We plan to use ConfigData to distribute component configuration: #1627

Both Deployment #1743 and DaemonSet #1518 APIs are underway.

kubectl apply #1702 is underway. Combined with Deployment, this will enable declarative updates of anything hosted by Kubernetes.

Deployment Manager integration #3685 is underway.

See also #7459 (comment)

cc @fgrzadkowski

@bgrant0607 bgrant0607 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Sep 10, 2015
@bgrant0607 bgrant0607 added this to the v1.2-candidate milestone Sep 12, 2015
@chris-codaio
Contributor

FWIW, I think the use of most scripting here is a mistake.

I've been trying to wade through this stuff for a few weeks now, and I've come to the conclusion that the only sane, sustainable, and maintainable option is a pure declarative config for the cluster setup itself.

I would argue that for GCP & AWS, the IaaS provisioning system should use DeploymentManager & CloudFormation. For other systems, use some existing shim that does the same thing.

The underlying OS should be initialized with cloud-config (i.e., make CoreOS the preferred option); no need for standalone scripts, Salt, Ansible, SSH, etc.

It's also important to recognize that these deployments need to fit into existing infrastructure - don't assume that the kubernetes bring-up system (in whatever form) is deploying into a vacuum. There need to be hooks to plug in where necessary (e.g., which network in a GCE project or which VPC in an AWS project).

@jayunit100
Member

Terraform is a possibility here also.

@JeanMertz

I'm probably not saying anything new here, but looking at Nomad as one way to solve this would be a wise choice in my opinion.

That is:

  • single binary (for both master and workers (and client?))
  • declarative configuration on how the cluster should behave
  • -dev command for single local master+worker instance

@davidopp
Member

@mikedanese @fgrzadkowski Any thoughts on this issue? It's marked P1 and v1.2-candidate but seems like a long-term issue. It seems to be mostly about replacing our deployment shell scripts with Go programs.

@fgrzadkowski
Contributor

My understanding is that we don't want to rewrite things like Ansible/cloud-init etc. on our own in Go. Instead, we want to provide tools/scripts that reuse existing technologies, but organize them in a way that is more composable and modular. In my opinion we should close this issue, wait for the proposal to be merged, and then file specific issues once we actually know what we want to do.

@roberthbailey @mikedanese Thoughts?

@roberthbailey
Contributor

I agree with @fgrzadkowski. We should find some pieces of #18287 that we can target for 1.2 instead.

@zmerlynn
Member Author

Move to close as well. The issue was probably targeting 1.2 as a holistic deployment piece rather than a specific piece, and now that there are specific pieces, let's go with those.

@davidopp
Member

Reading through the most recent comments, it sounds like this issue is subsumed by
https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/cluster-deployment.md
so I'm going to close this issue.
