Proposal to rework Kubernetes deployment CLI #5472
cc @satnam6502. Apologies to @satnamsingh who this probably does not interest :)
After decades of using (and writing) hardware CAD tools I have developed a very very very high pain threshold.
+1 from me (wearing my AWS-porting hat). Would we also replace the Salt code? I have heard that coordinating other clouds to do the port is a blocker. However, there is also a lot of work today for other clouds to keep up with the latest functionality. I'd much rather do a solid day of rewriting work on the AWS cluster scripts and then have the ongoing work be much easier, by virtue of eliminating copy-and-paste and making more use of strong-typing.
My goal was to start at the top and work my way down. I think the more drastic unification comes when you can fold it into the Go provider and join the fold there, because presumably we can then start to use radical concepts like "libraries" and "common functions" between providers. I also think that once providers become enlightened, they can consider moving the receiving end of their deployment to something more interesting than bash, too. The work I did in #5119 is an example of moving the GCE deployment to a simplistic YAML based on environment variables, but this could also be a strongly typed Go object as well - I just didn't go that way because both the sender and receiver were bash.
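(For illustration only: a minimal sketch of what such a strongly typed Go object might look like; the package, type, and field names below are invented and not part of the proposal or any existing Kubernetes API.)

```go
package deployconfig

// NodeDeployConfig is a hypothetical, strongly typed stand-in for the
// environment-variable-driven YAML used in #5119. Every field name here is
// illustrative; the real set of knobs would be decided per provider.
type NodeDeployConfig struct {
	ClusterName       string   `yaml:"clusterName"`
	MasterAddress     string   `yaml:"masterAddress"`
	DNSDomain         string   `yaml:"dnsDomain"`
	DockerOptions     []string `yaml:"dockerOptions,omitempty"`
	EnableNodeLogging bool     `yaml:"enableNodeLogging"`
}
```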
+1 to eliminate bash.
+1 from me. The Go -> bash translation sounds unpleasant but is probably the easiest way to bridge the existing mess with a better future.
I'd be inclined to take a bottom-up approach instead. That is, look at each shell function that is called by `kube-up.sh`. This lets you work in small increments, rather than a big-bang PR that forces every other cloud provider to do a big rewrite, and it would allow us to work on it in parallel -- maybe as part of a fixit day.
I might call that "middle up", but you have a point. (The layering distinction depends on whether you cut the consumer at the node/salt/etc or not.) My initial proposal was the way it was to start out with a simplistic design. It wouldn't be hard to introduce a second interface that others could glom onto that was an intermediate building block interface, though.
I think that there are a lot of individual functions in
I think a lot of these fall out as you end up writing the libraries for enlightened providers, but the key is also that they don't exist today. I think it's actually somewhat difficult to see some of them a priori, but I'd love your input.
(In particular, I'm worried that if we try to design it too much ahead of time, we'll end up with a structure we'll actually regret.)
+me |
Just as we have washed our hands of SOAP, I think we should stop rubbing salt into our wounds and see if we can devise a setup that is... just... a... Go program... reclaiming all that misc ad hoc config into the more civilized world of first-class programming.
I was thinking of reviving DCERPC, or at least ASN.1. Or maybe we could just use turtles all the way down and wrap ASN.1 in JSON.
As if we don't currently have a structure we regret? What we're not happy with:
Where I think we want to get to:
The real problem is the first bootstrap step: creating things that need to exist before the first node of the cluster can be created (and creating that first node). That really should be done by a controller, cron job, or shared service -- but Real Code (TM), not scripts. By definition, it needs to run outside the cluster, but it has to have credentials/access necessary to create the cluster.
@bgrant0607: I actually don't disagree with most of what you said, but we might disagree on the timelines to get there. I think we've been bleeding long enough on the front, with no concrete definition of what a cluster is at initial deployment. Some of that definition is definitely going to change over time:
To expand on the current proposal more after thinking about it some (I'll edit the original comment if this seems reasonable):
Additionally, I'm trying to figure out a path that also doesn't ditch a bunch of code along the way (the Go-generating-env-variables part). That said, an alternative plan is that we could just offer
@zmerlynn I didn't really understand what problem(s) you were trying to address with your original proposal. The title of the issue was much broader than the proposal, and there was no context with respect to previous proposals, such as #2303. A single command with subcommands that consumes configuration would be superior to the hodgepodge of scripts we currently have -- a cluster deployment CLI. Hmm. That sounds familiar. However, does that facilitate solutions to the deeper problems, or is it just an independent improvement to UX? Which configuration are you thinking of? The overarching turnup workflow? Node OS configuration? Kubernetes binary configuration (command-line flags, etc.)?
@bgrant0607: This is an improvement to UX, and it's the first UX that the end-user sees. That UX isn't But no, this proposal isn't touching anything below the existing binary deployment/setup/installation/whatever you want to call it. If #2303 comes to full fruition, a chunk of this gets simpler. I consider that "far future", though.
It should. I explicitly called out in the initial comment: "As an added benefit, the last bullet gives us a place to start upstreaming GKE cluster deployment code, which has been a desire for the GKE team for a while." But right now, that code lives behind the GKE API, which our CLI is hitting. We don't need the deployment-API complexity within k8s, but GKE would like to get to a place where we can share this code.
As for directory location, why not put the code parallel to
Tagging this as milestone v1 as it seems like the most expedient path to getting upgrade working. But we need to be careful not to over-invest in refactoring that isn't in the critical v1 path?
Thanks for sharing this idea. The scripts do need to be cleaned up. Is Python on the table for this work, or is Go a hard requirement? It seems like a scripting language would be a more natural fit for the broader ops community.
@jayunit100 I don't think anything is off the table (?) - certainly not from my point of view. My thought was that Go is the language we're using elsewhere, and that we could eventually move/share code around (e.g. we might decide that the kube-controller-manager should launch / manage / scale the minions). I feel that I'm primarily doing an experiment into whether the idea of a program that generates scripts/templates works, and I'd like to continue the experiment in Go; if it works then we should have the discussion about Python vs Go vs X. And of course, if you wanted to conduct an experiment with Python you should feel free - the more options we have the better!
I love Python (really!), but I will note that we've found a lot of value in keeping the language choice the same across different sections of the project. Sharing code and data structures is one benefit; having language consistency is also just nice as a dev. And I imagine this code would be built and maintained by the Kubernetes contributors and community, which is already all in Go.
Also, keep in mind that "written in Go" really means "compiled and distributed as executable", in general, versus requiring an additional interpreter.
I think @justinsb hit the nail on the head:
This is the best reason to use Go. We are going to need to refactor the way that we interface with cloud providers, and having all of the code in a single language is going to significantly reduce the barriers to doing so. I'm hoping that we can start moving some of the launch/scale code into the master sooner rather than later (as an example).
Scripting is evil. Even more evil than flags. Scripting killed my father. Jokes aside, if we want Kubernetes to become self-sufficient in terms of provisioning (scaling included) we definitely need to follow a Go path. After all, Kubernetes is built in Go. That said, thank you @justinsb! I'm very interested in seeing something like your solution, but with the option to add other providers as well and integrate with
Thanks @bgrant0607 for starting #12245.
I agree that shell scripts are evil, and that the mix of Salt, scripts, templating, etc. is inscrutable. I am in favor of Go, for the reasons others mentioned. In addition to #12245, #1553, and #1627, two more prerequisites for declarative self-hosting are underway: Deployment #1743 and Daemon #1518.
@jayunit100 More and more of our ops teams internally are switching from Python to Go. Go provides a better foundation for incrementally improving the level of intelligence in the automation. What needs to be done manually today someone will want to automate in a quarter.
Okay, now I think I understand the reason why this decision is being made: it will help to prevent bitrot and tie all the code together in a type-safe and machine-automatable way.
Update: We have an HA configuration working, which is one prerequisite for full self-hosting of critical control-plane components #246. Horizontal auto-scaling on GCE is working.
Furthermore, at least on clouds / IaaS (e.g., GCE) and resource brokers (e.g., Mesos), I want to see Kubernetes driving provisioning of nodes, storage, etc. eventually. Cluster size definitely shouldn't be considered static configuration. Also, unlike ReplicationController, I expect clusters to be heterogeneous, so I don't think node count makes sense to be part of the desired state. It probably should be expressed in terms of other objectives, like utilization level, cluster and node oversubscription levels, max pod-creation burst size, etc.
We're now running a private registry: #1319
We're changing component configuration from flags to config files: #12245
We're working on a ConfigData API: #1553, #6477
We plan to use ConfigData to distribute component configuration: #1627
Both Deployment #1743 and DaemonSet #1518 APIs are underway.
Deployment Manager integration #3685 is underway. See also #7459 (comment)
FWIW, I think the use of most scripting here is a mistake. I've been trying to wade through this stuff for a few weeks now, and I've come to the conclusion that the only sane, sustainable, and maintainable option is a pure declarative config for the cluster setup itself. I would argue that for GCP & AWS, the IaaS provisioning system should use Deployment Manager & CloudFormation. For other systems, use some existing shim that does the same thing. The underlying OS should be initialized with cloud-config (i.e., make CoreOS the preferred option) ... no need for standalone scripts, Salt, Ansible, SSH, etc. It's also important to recognize that these deployments need to fit into existing infrastructure - don't assume that the Kubernetes bring-up system (in whatever form) is deploying into a vacuum. There need to be hooks to plug in where necessary (e.g., which network in a GCE project or which VPC in an AWS project).
Terraform is a possibility here also.
I'm probably not saying anything new here, but looking at Nomad as one way to solve this would be a wise choice in my opinion. That is:
@mikedanese @fgrzadkowski Any thoughts on this issue? It's marked P1 and v1.2-candidate but seems like a long-term issue. It seems to be mostly about replacing our deployment shell scripts with Go programs.
My understanding is that we don't want to rewrite things like Ansible/cloud-init etc. on our own in Go. Instead we want to provide tools/scripts that reuse existing technologies, but organized in a way that is more composable and modular. In my opinion we should close this issue, wait for the proposal to be merged, and then file specific issues once we actually know what we want to do. @roberthbailey @mikedanese Thoughts?
I agree with @fgrzadkowski. We should find some pieces of #18287 that we can target for 1.2 instead.
Move to close as well. The issue was probably targeting 1.2 as a holistic deployment piece rather than a specific piece, and now that there are specific pieces, let's go with those.
Reading through the most recent comments, it sounds like this issue is subsumed by |
The following is a proposal for reworking the existing `kube-up.sh`, `kube-down.sh`, and `kube-push.sh` to eventually be in Go for advanced cloud deployments, while allowing the existing shell investment to continue for a period of time.

- A `deploy/deploycmd/api` versioned `Cluster` object. This object holds all general configuration variables of the cluster and the add-on pods, plus IAAS-relevant variables specific to each IAAS provider. (Issue to be filed soon to haggle over what's in these structures.)
- A `kubedeploy` command with `up`, `down`, and `push` verbs, each of which take a cluster YAML. The `up` and `push` verbs take a path (possibly optional in the case of `push`) to the Kubernetes binaries to deploy.
- `kubedeploy` only has to act as a YAML -> env variable translator. Re-write the existing shell scripts to use common env variables (they're actually all close, e.g. `config-default.sh`, etc.). Then `kubedeploy` can take the given API object, spew a chunk of environment variables, and carefully execute the existing bash scripts (vigorous handwaving, but this part is not that technically challenging, just a hairy yak).

As an added benefit, the last bullet gives us a place to start upstreaming GKE cluster deployment code, which has been a desire for the GKE team for a while. Oh, and @satnam6502 will probably love it for scalability, because I think he gouges his eyes out every time he deals with the bash in `util.sh`.

This will also give us a place for `kubedeploy upgrade` in the fullness of time, which is a distinct use case from `push`.

Thoughts?
cc @jlowdermilk, @brendandburns, @alex-mohr, and anyone else that cares.
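To make the last bullet concrete, here is a minimal sketch of the YAML -> env variable translation, assuming a hypothetical `Cluster` schema, invented environment-variable names, and `cluster/kube-up.sh` as the entry point; it is illustrative only, not the actual implementation:

```go
// Sketch of "kubedeploy up": read a cluster YAML, translate it into the
// environment variables the existing bash scripts already consume, then
// execute kube-up.sh. Field and variable names here are assumptions.
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"

	"gopkg.in/yaml.v2"
)

// Cluster is a stand-in for the proposed versioned Cluster object.
type Cluster struct {
	Provider string `yaml:"provider"` // e.g. "gce", "aws"
	Project  string `yaml:"project"`
	Zone     string `yaml:"zone"`
	NumNodes int    `yaml:"numNodes"`
}

func main() {
	// Load the cluster config (path hard-coded to keep the sketch short).
	data, err := os.ReadFile("cluster.yaml")
	if err != nil {
		log.Fatal(err)
	}
	var c Cluster
	if err := yaml.Unmarshal(data, &c); err != nil {
		log.Fatal(err)
	}

	// Translate the typed config into env vars that config-default.sh and
	// friends would understand (variable names are assumptions).
	env := append(os.Environ(),
		"KUBERNETES_PROVIDER="+c.Provider,
		"PROJECT="+c.Project,
		"ZONE="+c.Zone,
		fmt.Sprintf("NUM_NODES=%d", c.NumNodes),
	)

	// Carefully execute the existing bash entry point with that environment.
	cmd := exec.Command("./cluster/kube-up.sh")
	cmd.Env = env
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```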