Dramatically simplify Kubernetes deployment #2303

Closed
22 tasks
jbeda opened this issue Nov 11, 2014 · 23 comments
Labels
area/build-release priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@jbeda
Contributor

jbeda commented Nov 11, 2014

We need to dramatically simplify deploying Kubernetes.

Task list to reduce/eliminate the bash/salt necessary for getting a cluster up and running:

  • Provide an 'all-in-one' binary for server components. This was started in Create a standalone k8s binary, capable of running a full cluster #2121, but we should let users specify individual components (e.g. kube-proxy) or a batch of servers (node, master); see the sketch after this list.
    • Integrate etcd so that it isn't necessary to download/start separately. Users should be able to easily turn this off and use an existing etcd.
    • Integrate flannel so that there is a networking solution that works out of the box. This eliminates the need to statically assign container subnets to nodes.
    • Integrate flannel-route-manager so that more advanced configurations (GCE) are supported easily.
  • Allow for kubelets to securely nominate themselves to join the cluster
  • Implement the bootstrap API. (Defined below but should be broken out into a new issue).
    • Finish definition of the bootstrap API
    • Implement a new binary/mode for serving the bootstrap API
    • Host the bootstrap API on k8s.io
    • Have the unified binary race to be the master via the bootstrap API. Have it publish its location.
    • Have the minion use the bootstrap API to find the api server and self-register
    • Have kubectl use the bootstrap API to find the api server (and CA for the api server).
  • Reduce/eliminate command line flags. These don't work well in a clustered environment. Thoughts in Store master and node config in API #1627.
    • Have initial config of cluster come from bootstrap API
  • Recast add-ons as post-cluster deploy scripts/tools that run on top of kubernetes. Try to eliminate any node configuration that isn't done through the kubernetes API
    • Node monitoring
    • Cluster monitoring
    • Node logging
    • DNS
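
As a sketch of what the 'all-in-one' binary item above could look like: a single dispatcher that maps a role argument to either one component or a bundle. The role names and the embedded etcd/flannel behavior here are assumptions for illustration, not an existing CLI.

// Hypothetical sketch only: one "kubernetes" binary that can run a single
// component or a bundle of them. Role names and messages are illustrative.
package main

import (
	"fmt"
	"os"
)

func main() {
	role := "node"
	if len(os.Args) > 1 {
		role = os.Args[1]
	}
	switch role {
	case "apiserver", "controller-manager", "scheduler", "kubelet", "kube-proxy":
		// Run exactly one component, for users who want an 'exploded' cluster.
		fmt.Printf("starting single component: %s\n", role)
	case "master":
		// Bundle: apiserver + scheduler + controller-manager, with etcd
		// embedded unless the user points at an existing etcd.
		fmt.Println("starting master bundle (embedded etcd unless an external one is configured)")
	case "node":
		// Bundle: kubelet + kube-proxy, with flannel providing networking
		// out of the box.
		fmt.Println("starting node bundle (embedded flannel unless disabled)")
	default:
		fmt.Fprintf(os.Stderr, "unknown role %q\n", role)
		os.Exit(1)
	}
}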

cc: @eparis @smarterclayton

The end result is that we'd love for something like this to work:

Once per cluster start up:

$ kubernetes cluster-create
Creating new cluster via https://k8s.io/
Cluster ID: f95e695d6eac75b7
Admin certificate saved to ~/.k8s/f95e695d6eac75b7/admin.crt
Admin key saved to ~/.k8s/f95e695d6eac75b7/admin.key

Launch kubernetes on multiple machines with:
  kubernetes --bootstrap=https://k8s.io/k/f95e695d6eac75b7 --bootstrap-key=041fb4965cd31be8

And then on every node in the cluster:

kubernetes --bootstrap=https://k8s.io/k/f95e695d6eac75b7 --bootstrap-key=041fb4965cd31be8
docker -d $(cat /var/run/kube-docker-flags)

And then from the client, you can use the same URL to auth to the servers as an 'admin' user. It'll use the bootstrap url to get admin keys, certs and such. If you connect to a node and it isn't the master, it'll return a redirect to whoever is currently the master. This way updating the bootstrap URL can lag actual master election.

kubectl --bootstrap=https://k8s.io/k/f95e695d6eac75b7 list pods
@smarterclayton
Contributor


Obvious things to think about:

  • Reduce the number of server side binaries to 1 that can morph to take on
    (perhaps multiple) roles. @brendandburns already started this with Create a standalone k8s binary, capable of running a full cluster #2121.
    • Users could still run an 'exploded' set of servers (and we would test
      this to keep ourselves honest) but the common case for small clusters
      would be a single binary.
    • Consider binding in etcd and flannel into the combined binary also.

Run it in a Docker container - I think we can make everything we need work with:

$ docker run -v /var/run/docker.sock:/var/run/docker.sock --net=host --privileged openshift/origin start

(that's kubelet plus master all in one, I don't know what in the container would block us).

  • Reduce the amount of data that needs to be populated across the cluster to
    get stuff up and running. Probably define a 'cluster bootstrap' API that
    both provides for a secret and a simple API endpoint used to start the cluster.
    This has been called a 'cluster discovery API' or a 'rally point' in the
    past and is similar to what etcd does and what Docker is planning for
    cluster.
    • Open issue for how much logic we put there.
    • Could be used to simplify/bootstrap auth
    • Allocating new clusters should be doable via the API w/ no auth too
    • Implementation of the API should be part of k8s and runnable in a
      container
    • Host a version of this on https://bootstrap.k8s.io.
  • Rethink how nodes are registered with the master. In addition to the
    explicit registration that we have today (where some already trusted
    user/component/server does the registration) we could either:
    • Run in a 'wide open' mode where anyone coming from a whitelisted CIDR is
      accepted as a node.
    • Set up a pattern where we get enough auth to the nodes (via the bootstrap
      API or other means) so that the nodes have permission to 'self register'.

With scoped tokens we can do "this token gives you the right to identify yourself as X and set your own minion description, as well as watch for pods under X". Unfortunately you'd have to have one token per minion there, so it's not exactly easy. So some sort of back-and-forth bootstrapper seems inevitable: "client: hey, I'm X", "server: I see you at X, here's your token to create yourself".
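
A minimal sketch of that exchange, with made-up types and field names (this is not an existing Kubernetes API):

// Hypothetical request/response pair for the "hey, I'm X" handshake above.
package bootstrap

// RegistrationRequest is what the node sends: its claimed name plus the
// client cert it intends to use from then on.
type RegistrationRequest struct {
	NodeName   string `json:"nodeName"`
	ClientCert []byte `json:"clientCert"`
}

// RegistrationResponse is the server's "I see you at X" answer: the address
// it observed for the node, plus a token scoped to creating/updating that one
// node and watching pods bound to it.
type RegistrationResponse struct {
	ObservedAddress string `json:"observedAddress"`
	ScopedToken     string `json:"scopedToken"`
}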

As much as possible our ops team(s) wants to be able to push a button to provision new infra and have it self join, and are usually willing to embed whatever secrets are needed to make that easy into the image or image template.

The end result is that we'd love for something like this to work:

Once per cluster start up:

$ KUBE_BOOTSTRAP=$(curl -s https://bootstrap.k8s.io/v1/new)
$ echo $KUBE_BOOTSTRAP
https://bootstrap.k8s.io/v1/cluster/f95e695d6eac75b78743092a99fcc7de/

And then on every node in the cluster:

kube-cluster --bootstrap=${KUBE_BOOTSTRAP}
docker -d $(cat /var/run/kube-docker-flags)

And then from the client, you can use the same (or perhaps related, derived?)
URL to auth to the servers. It'll use the discovery url to get admin keys,
certs and such. Connecting to any node will forward to the master.

kubectl --bootstrap=${KUBE_BOOTSTRAP} list pods

If you are running on a different network you can override the host that
kubectl talks to yet still use the bootstrap to get auth data. This could
eliminate the hackery around the HTTP password and using SSH to get
certs and such.

@jbeda
Contributor Author

jbeda commented Nov 11, 2014

I think that we'll want to have a scalable set of choices -- from having a shared token across all minions that lets them join to allowing for per-minion tokens that are much trickier to distribute but also more secure.

Another option would be to borrow the salt model:

  • A node uses the bootstrap API to find the master.
  • The node then sends a request to the master with something like {client cert, name, IP}. This sits in an async queue.
  • The master then decides to either approve/reject that node. This could be done via policy (accept everyone!), be done by the cloud provider or done by hand. The cert could also be pre-approved before the node ever asks. If we want to get really fancy we could do some sort of PKI with signing and stuff based on IP but I'm not sure that is worth the complexity.
  • The node now has permission to register itself and watch/modify what it needs to do.

In this case, the bootstrap API is purely for discovery and not for auth bootstrapping.
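
A rough sketch of the approval queue in the salt-style flow above; the types and the "accept everyone" policy are purely illustrative, not existing Kubernetes code:

// Hypothetical types for the pending-node queue described above.
package approval

// PendingNode is the {client cert, name, IP} tuple sitting in the async queue.
type PendingNode struct {
	Name       string
	IP         string
	ClientCert []byte
}

// Policy decides whether a pending node may register itself. It could accept
// everyone, ask the cloud provider, check a pre-approved cert list, or wait
// for a human.
type Policy interface {
	Approve(n PendingNode) (bool, error)
}

// acceptEveryone is the "accept everyone!" policy from the list above.
type acceptEveryone struct{}

func (acceptEveryone) Approve(PendingNode) (bool, error) { return true, nil }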

@stp-ip
Member

stp-ip commented Nov 12, 2014

I dislike the non-timeout nature of the shared token. In my opinion it eases deployment considerably, but the attack vector of only needing this one secret to get into the cluster is not dealt with during the later lifetime of the cluster.
My suggestion would be to use time sensitive shared or per-minion tokens.
You could set up a token to be valid for 24 hours and bake it into the latest hardware to be deployed. After 24 hours you can be sure that at least the attack window is done and perhaps can verify that only your hardware was added.
Perhaps make it possible to list all registered nodes for a token + the total number, which could be used to check against hardware nodes deployed in these 24 hours.
Nothing fancy, but at least it's simple and in my opinion reduces the attack surface.
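
A sketch of the time-bounded token idea, assuming a simple shared secret with an expiry (the format is hypothetical):

// Hypothetical time-bounded join token: accepted only inside its window.
package jointoken

import (
	"crypto/subtle"
	"time"
)

type Token struct {
	Secret   string
	NotAfter time.Time // e.g. issued-at + 24h
}

// Valid reports whether the presented secret matches and the window is open.
func (t Token) Valid(presented string, now time.Time) bool {
	if now.After(t.NotAfter) {
		return false // the 24h attack window has closed
	}
	return subtle.ConstantTimeCompare([]byte(t.Secret), []byte(presented)) == 1
}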

@smarterclayton
Contributor

It seems like it would be fairly easy for an ops team to rotate that token every X hours via a script against their IaaS, and maybe have a window where they overlap (valid for 9 hours, regenerate every 8).
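
For that overlap scheme (valid for 9 hours, regenerated every 8), a sketch of a validator that still honors the previous token during the one-hour overlap; again, nothing here is an existing API:

// Hypothetical rotating shared secret with an overlap window.
package rotation

import "time"

type rotatingSecret struct {
	current, previous string
	rotatedAt         time.Time     // when "current" was minted
	period            time.Duration // e.g. 8 * time.Hour
	validity          time.Duration // e.g. 9 * time.Hour
}

func (r *rotatingSecret) accepts(presented string, now time.Time) bool {
	if presented == r.current {
		return true
	}
	// The previous secret expires validity-period after the last rotation,
	// which gives the one-hour overlap described above.
	return presented == r.previous && now.Before(r.rotatedAt.Add(r.validity-r.period))
}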


@jbeda
Contributor Author

jbeda commented Nov 12, 2014

I'm leaning more and more to having this bootstrap API not be used for widespread identity or auth but rather for cluster metadata and simple auth. We will need auth to the bootstrap API itself.

Quick spec that could work for the bootstrap service.

Kubernetes Cluster Bootstrap API

When bootstrapping up a cluster, there are two things that we need to do: Identify the master of the cluster (for both worker nodes and clients) and establish the 'admin' credentials for administering the cluster.

The Kubernetes Cluster Bootstrap API is optional -- it is possible to get clusters started without using this API. It is also possible to easily run this API inside of a constrained private network.

The basic flow -- some of these steps are optional to further lock down the security of the cluster.

  1. The user creates a new cluster. This is typically an unauthenticated operation. As a result of creating the cluster, the user gets back a set of information:
    • A cluster-id that is used as part of the cluster-bootstrap-url. This should be guarded but is not super sensitive.
    • A bootstrap-auth-token. This is a single use token. It can be used to establish a single writer into the bootstrap API. It is typically time bounded (1 hour?)
    • Credentials for the cluster admin account. This includes a public cert and a private key used for TLS client auth.
  2. The user starts up a set of servers and gives them the cluster-bootstrap-url along with the bootstrap-auth-token.
  3. The servers boot up and race to see who can claim to be master. They do this using the bootstrap-auth-token. Only one wins and writes its addresses to the bootstrap API.
  4. Other servers that don't win the race use the bootstrap API to find the master. They then register themselves with the master. Whether the master automatically accepts those workers is a matter of policy, out of scope for the bootstrap API.
  5. The server that won the master race can add more 'writer' keys to the bootstrap API. That way if the master dies a new master can be elected and update the bootstrap API.
  6. The user uses a Kubernetes client (kubectl) to talk to the cluster. They can either specify the appropriate config data directly or can simply specify the cluster-bootstrap-url. kubectl will use the admin key with client TLS to identify itself to the master.
  7. From here more scoped (password?) auth can be configured and the cluster can be further configured.

Further notes:

  • The user can create/supply their own admin public cert when creating the cluster. Or they can opt out of using the bootstrap API to set up the admin credentials altogether. The admin cert can be cleared at any time.
  • The IP range that the master claims can be restricted to a known good range.

API Definition

  • <base> must be over TLS. We would host a public one on https://k8s.io.
  • Auth:
    • All auth is done via client TLS certs. There are three levels of auth:
      • none -- this method/endpoint is publicly accessible. The random cluster-id is the only protection.
      • bootstrap-cert -- this method/endpoint is accessible to any cert in the bootstrap-cert list or the admin-cert.
      • admin-cert -- this method/endpoint is only accessible to the admin-cert. If there is no admin-cert or the admin-cert is lost, this can never be modified or accessed.
  • <base>/k/
    • POST: creates a new cluster bootstrap endpoint.
      • Auth: none
      • Post body:
        • admin client options -- NONE, CREATE, or specify a cert
        • bootstrap cert -- list of certificates that can be used to write to the bootstrap API.
        • bootstrap token request -- true (with time limit) if a bootstrap token should be returned.
        • master IP range -- the master IP must be confined to this range.
      • Response:
        • cluster id -- long random string
        • cluster url -- <base>/k/<cluster-id>
        • bootstrap token
        • The public cert and private key
  • <base>/k/<cluster-id>/bootstrap-certs This is the list of TLS client certs that are allowed to write to the bootstrap API endpoint.
    • GET: returns the list of bootstrap-certs along with the time and IP address that added them.
      • Auth: bootstrap-cert
    • PUT: updates the list of bootstrap-certs. Users must use an ETag to ensure that concurrent writes aren't clobbered. The admin-cert cannot be modified with this method.
      • Auth: bootstrap-cert
  • <base>/k/<cluster-id>/claim-token Used to add an entry to the bootstrap-cert list via a token.
    • POST
      • Auth: None, but must present valid one use token
      • Returns: Nothing, 200 if everything is good.
  • <base>/k/<cluster-id>/mint-token Create a new one use token that can be used to add a bootstrap-cert
    • POST
      • Auth: bootstrap-cert
      • Post Body: the time limit. Up to 24 hours.
      • Response: The token.
  • <base>/cluster-meta/<cluster-id>/admin-cert
    • GET: Gets the public key used for the admin account.
      • Auth: bootstrap-cert
    • PUT: Sets a new admin key. Can be cleared completely in which case there is no admin key.
      • Auth: admin-cert
  • <base>/cluster-meta/<cluster-id>/master. This is used to get/set the current master for the cluster.
    • This resource includes:
      • The current IP of the master. This is what other nodes in the cluster should use.
      • Preferred IPs or DNS names for clients. This is a priority list from most likely to least likely.
      • The CA cert used for server identification. If blank, then any system trusted CA would suffice.
    • GET: Gets the current master.
      • Auth: none
    • PUT: Sets the master.
      • Auth: bootstrap-cert
      • Optimistic concurrency here -- the user must use an ETag to make sure that they aren't clobbering others in a start-up race (see the sketch below).
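
A client-side sketch of steps 2 and 3 against the draft above: claim a bootstrap-cert slot with the one-use token, then race to set the master using ETag/If-Match so losers fail instead of clobbering the winner. A single base URL is assumed (the draft splits endpoints across /k/ and /cluster-meta/), the way the one-use token is presented is not specified in the draft, and client TLS setup is omitted:

// Hypothetical client for the draft bootstrap API above. Paths, the token
// header, and error handling are illustrative only.
package bootstrapclient

import (
	"bytes"
	"fmt"
	"net/http"
)

func claimAndRace(clusterURL, oneUseToken, myAddress string) error {
	// Claim a bootstrap-cert slot using the one-use token.
	claim, _ := http.NewRequest("POST", clusterURL+"/claim-token", nil)
	claim.Header.Set("Authorization", "Bearer "+oneUseToken) // assumption: bearer-style presentation
	resp, err := http.DefaultClient.Do(claim)
	if err != nil {
		return err
	}
	resp.Body.Close()

	// Read the current master record and remember its ETag.
	resp, err = http.Get(clusterURL + "/master")
	if err != nil {
		return err
	}
	etag := resp.Header.Get("ETag")
	resp.Body.Close()

	// Race to claim mastership: If-Match means losers get 412 Precondition
	// Failed instead of clobbering the winner.
	put, _ := http.NewRequest("PUT", clusterURL+"/master", bytes.NewBufferString(myAddress))
	put.Header.Set("If-Match", etag)
	resp, err = http.DefaultClient.Do(put)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusPreconditionFailed {
		fmt.Println("lost the master race; falling back to GET /master to find the winner")
		return nil
	}
	fmt.Println("won the master race")
	return nil
}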

@jbeda
Contributor Author

jbeda commented Nov 13, 2014

Other previous work here is the etcd discovery protocol: https://github.com/coreos/etcd/blob/master/Documentation/discovery-protocol.md

@bgrant0607
Member

Question: Are you thinking we'd run pods on the master node just like on any other node?

One issue users have raised is that in a small cluster the master has minimal requirements, whereas the other nodes need to accommodate the requirements of the application being run on the cluster, and even single-digit numbers of pods may have significant cpu, memory, and/or flash requirements.

@jbeda
Contributor Author

jbeda commented Nov 20, 2014

@bgrant0607 Yes -- I want to enable a mode where you have 1-5 machines and just want to kick the tires. It should be dead simple to get stuff running in that situation. Download/install one binary/package and copy/paste around a super minimal amount of info.

@goltermann goltermann added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Dec 3, 2014
@smarterclayton
Contributor

Has the discussion from the face-to-face been reflected in an issue? If so can we link it here?

@jbeda
Contributor Author

jbeda commented Dec 15, 2014

I'm going to pick this stuff up again this week and break it down into a bunch of sub-items. I think that @erictune and @kelseyhightower were going to write some stuff up. I'm happy to get on that though.

@kapilt

kapilt commented Dec 15, 2014

fwiw there's similar work to the etcd discovery protocol embedded in swarmd's discovery.


@jbeda
Contributor Author

jbeda commented Dec 15, 2014

@kapilt Yup -- I bugged them to document it. I'd love for us to do something that is a little bit more secure than that. Right now if you can steal the single token/cluster id, you can steal the cluster.

We also need to bootstrap the cluster parameters and the admin account.

@timothysc
Member

So I'm a huge +1 for this, with one caveat: a whitelist option on the master (#3103), with reverse DNS lookup on minions that want to join. That should narrow the vector.

jbeda added a commit to jbeda/kubernetes that referenced this issue Jan 7, 2015
This is related to kubernetes#2303 and steals from kubernetes#2435.
@davidopp davidopp added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Feb 17, 2015
@jdef
Contributor

jdef commented Mar 11, 2015

xref mesosphere/kubernetes-mesos#169

@alex-mohr alex-mohr removed this from the v1.0 milestone Mar 19, 2015
@alex-mohr alex-mohr added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Mar 19, 2015
@alex-mohr
Contributor

We now separately track the items from this uber-proposal that are part of milestone v1, so removing that tag.

@alex-mohr
Contributor

Created label cluster/platform/mesos

On Thu, Mar 19, 2015 at 2:40 PM, Timothy St. Clair wrote:

  • @alex-mohr, @jdef: could we get a new label for tracking mesos-framework integration pieces?



@roberthbailey
Contributor

I've broken cluster bootstrapping out into #5754.

@roberthbailey
Contributor

I've broken the all-in-one binary out into #5755.

@roberthbailey
Contributor

For reference, "Allow for kubelets to securely nominate themselves to join the cluster" is being tracked in #3168.

@roberthbailey
Contributor

And "Recast add-ons as post-cluster deploy scripts/tools that run on top of kubernetes" is being tracked in #3579.

@roberthbailey
Contributor

Closing this umbrella issue now that all of the various parts have been split into separate issues.

@sandric
Contributor

sandric commented Apr 15, 2015

@jbeda, is this implemented or in progress? I mean the combined master/minion installation. I just didn't see it among the issues @roberthbailey split this monstrous issue into. Can you point me to the discussion? thx.
