Design doc for clustering. #3281

Merged 2 commits on Jan 9, 2015
60 changes: 60 additions & 0 deletions docs/design/clustering.md
@@ -0,0 +1,60 @@
# Clustering in Kubernetes


## Overview
The term "clustering" refers to the process of having all members of the kubernetes cluster find and trust each other. There are multiple different ways to achieve clustering with different security and usability profiles. This document attempts to lay out the user experiences for clustering that Kubernetes aims to address.

Once a cluster is established, the following is true:

1. **Master -> Node** The master needs to know which nodes can take work and what their current status is with respect to capacity.
1. **Location** The master knows the name and location of all of the nodes in the cluster.
Member:

does location mean IP address or something else?

Contributor:

Do we want to state here that for now, the master must reach the node via a pathway that the node can reach itself? Or do we want to bake in the distinction that what node thinks of as its name may not be how the master reaches it? I would prefer the former, but expect people to ask about the latter.

Contributor Author:

Not sure what you mean by "pathway". Adding some clarification here that we need consistency so that we can verify certificates, at the least.

Suggest some language and I'll include it :)

* For the purposes of this doc, location and name should be enough information so that the master can open a TCP connection to the Node. Most probably we will make this either an IP address or a DNS name. It is going to be important to be consistent here (master must be able to reach kubelet on that DNS name) so that we can verify certificates appropriately.
2. **Target AuthN** A way to securely talk to the kubelet on that node. Currently we call out to the kubelet over HTTP. This should be over HTTPS and the master should know what CA to trust for that node.
3. **Caller AuthN/Z** This would be the master verifying itself (and permissions) when calling the node. Currently, this is only used to collect statistics as authorization isn't critical. This may change in the future though.
2. **Node -> Master** The nodes currently talk to the master to know which pods have been assigned to them and to publish events.
1. **Location** The nodes must know where the master is.
Contributor:

Should clarify here that this is currently "master" but could be "masters" and eventually will change over time (maybe as "not-considered-yet-but-known-issues" in a later section)

Contributor Author:

Adding a note.

2. **Target AuthN** Since the master is assigning work to the nodes, it is critical that they verify whom they are talking to.
3. **Caller AuthN/Z** The nodes publish events and so must be authenticated to the master. Ideally this authentication is specific to each node so that authorization can be narrowly scoped. The details of the work to run (including things like environment variables) might be considered sensitive and should be locked down also.

**Note:** While the description here refers to a singular Master, in the future we should enable multiple Masters operating in an HA mode. While the "Master" is currently the combination of the API Server, Scheduler and Controller Manager, we will restrict ourselves to thinking about the main API and policy engine -- the API Server.
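To make the requirements above concrete, the following is a minimal sketch in Go of the per-node and per-master trust information implied by (1.i)-(1.iii) and (2.i)-(2.iii). None of these types exist in Kubernetes; the names and fields are illustrative assumptions only.

```go
// Illustrative sketch only: the trust and location data implied by the
// requirements above. None of these types exist in Kubernetes today.
package clustering

import "crypto/x509"

// NodeRecord is what the master would track for each node (1.i, 1.ii, 1.iii).
type NodeRecord struct {
	// Name/location the master uses to open a TCP connection to the kubelet;
	// it must be consistent with what the node's serving certificate covers.
	Address string

	// CA pool trusted when dialing the kubelet over HTTPS (Target AuthN).
	KubeletCAs *x509.CertPool

	// Credential the master presents to the kubelet (Caller AuthN/Z).
	ClientCertPEM []byte
	ClientKeyPEM  []byte
}

// MasterInfo is what each node would need about the master (2.i, 2.ii, 2.iii).
type MasterInfo struct {
	// Location of the master's API endpoint.
	Address string

	// Root CA used to verify the master's serving certificate.
	MasterCAs *x509.CertPool

	// Per-node credential used when publishing events and fetching assigned
	// pods, ideally scoped so authorization can be narrow.
	ClientCertPEM []byte
	ClientKeyPEM  []byte
}
```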

## Current Implementation

A central authority (generally the master) is responsible for determining the set of machines which are members of the cluster. Calls to create and remove worker nodes in the cluster are restricted to this single authority, and any other requests to add or remove worker nodes are rejected. (1.i).

Communication from the master to nodes is currently over HTTP and is not secured or authenticated in any way. (1.ii, 1.iii).

The location of the master is communicated out of band to the nodes. For GCE, this is done via Salt. Other cluster instructions/scripts use other methods. (2.i)

Currently, most communication from the node to the master is over HTTP. When it is done over HTTPS, there is no verification of the master's certificate (2.ii).
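As a rough illustration of the gap described above, here is a Go sketch (not actual kubelet code) contrasting HTTPS with no certificate verification against HTTPS verified with an explicitly distributed master CA; the CA file path is hypothetical.

```go
// Sketch only: unverified HTTPS vs. HTTPS pinned to a distributed master CA.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"net/http"
	"os"
)

func main() {
	// Roughly the current behavior: HTTPS without verifying the master's cert.
	insecure := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}

	// Desired behavior: trust only the cluster CA distributed to the node.
	caPEM, err := os.ReadFile("/var/lib/kubelet/master-ca.pem") // hypothetical path
	if err != nil {
		panic(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)
	verified := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}}

	_ = insecure
	_ = verified
}
```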

Currently, the node/kubelet is authenticated to the master via a token shared across all nodes. This token is distributed out of band (using Salt for GCE) and is optional. If it is not present then the kubelet is unable to publish events to the master. (2.iii)
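The shared-token scheme roughly amounts to attaching the same credential to every node's requests, as in the Go sketch below. The token file path and the use of a bearer `Authorization` header are assumptions for illustration, not a description of the current kubelet implementation.

```go
// Sketch only: a shared, out-of-band-distributed token attached to every
// request from the kubelet to the master.
package main

import (
	"net/http"
	"os"
	"strings"
)

type tokenRoundTripper struct {
	token string
	next  http.RoundTripper
}

func (t *tokenRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	req.Header.Set("Authorization", "Bearer "+t.token)
	return t.next.RoundTrip(req)
}

func newClient(tokenPath string) *http.Client {
	raw, err := os.ReadFile(tokenPath)
	if err != nil {
		// The token is optional today; without it the kubelet cannot publish
		// events to the master.
		return http.DefaultClient
	}
	return &http.Client{Transport: &tokenRoundTripper{
		token: strings.TrimSpace(string(raw)),
		next:  http.DefaultTransport,
	}}
}

func main() {
	client := newClient("/var/lib/kubelet/token") // hypothetical path
	_ = client
}
```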

Our current mix of out-of-band communication doesn't meet all of our needs from a security point of view and is difficult to set up and configure.

## Proposed Solution

The proposed solution will provide a range of options for setting up and maintaining a secure Kubernetes cluster. We want to allow both for centrally controlled systems (leveraging pre-existing trust and configuration systems) and for more ad-hoc, automagic systems that are incredibly easy to set up.

The building blocks of an easier solution:

* **Move to TLS** We will move to using TLS for all intra-cluster communication. We will explicitly identify the trust chain (the set of trusted CAs) as opposed to trusting the system CAs. We will also use client certificates for all AuthN.
* [optional] **API driven CA** Optionally, we will run a CA in the master that will mint certificates for the nodes/kubelets. There will be pluggable policies that will automatically approve certificate requests here as appropriate.
* **CA approval policy** This is a pluggable policy object that can automatically approve CA signing requests. Stock policies will include `always-reject`, `queue` and `insecure-always-approve`. With `queue` there would be an API for evaluating and accepting/rejecting requests. Cloud providers could implement a policy here that verifies other out-of-band information and automatically approves/rejects based on other external factors. (A sketch of this pluggable interface appears after this list.)
* **Scoped Kubelet Accounts** These accounts are per-minion and (optionally) give a minion permission to register itself.
Member:

We should think about how policy is linked to a new Kubelet account. Several options come to mind:

  1. Something automatically creates one or more new policy statements each time a kubelet account is generated. This will result in a lot of policy statements, but a simpler policy language will suffice.
  2. Kubelet accounts are automatically created as members of a Kubelet group. Policy is written with the group as the principal, and individual kubelet accounts are not mentioned. This avoids duplicating policy lines, but it means we need to add "groups" as a core resource in Kubernetes, and we will need some way to restrict the access of an individual kubelet to the resources intended for it, such as a policy statement with a condition clause that compares the IP of the kubelet with the IP the pod is bound to, etc.
  3. Special-case handling for kubelet accounts that is different from other principals.

Contributor:

Practically speaking, the need to say "the kubelet is allowed to do things that are specific to its inherent identity" is similar to service accounts or other special users. It would be ideal if we could somehow tie identity to policy in a way that a single policy represents what an identity of form X can do. In the long run, we expect the kubelet policy, or extension plugins, to have well-defined rights and behaviors. It seems better to make that possible to do in an easily understandable way (in the long run). Does the kubelet policy really need to be that flexible? We might want to add to what it can do, but its core policy is likely a fundamental aspect of the system.

Possibly that calls for something like a special policy rule that applies to a kubelet identity with a default checker that can have overrides.

```
Policy:
   any-identity-that-matches-this-regex: kubelet@<host>
      acts-as-kubelet
      allow fooresource GET <...>
```

where `acts-as-kubelet` is an alias for "we have a special coded checker that has the minimal policy for a kubelet". That reduces the need for policy to be totally generic to the level of checking request parameters and such.

Contributor Author:

I was thinking something more along the lines of what @smarterclayton is suggesting to start with. We'd name the kubelet accounts with something we could glob on (kubelet@host or kubelet:host or whatever) and then apply policy there. We could hard code it to start.

* To start with, we'd have the kubelets generate a cert/account in the form of `kubelet:<host>`. We would then hard-code policy such that that particular account is given appropriate permissions. Over time, we can make the policy engine more generic.
* [optional] **Bootstrap API endpoint** This is a helper service hosted outside of the Kubernetes cluster that helps with initial discovery of the master.
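The CA approval policy bullet above is the most code-like of these building blocks. Below is a small Go sketch of what such a pluggable policy interface and its stock policies might look like; the interface, type names, and queue semantics are assumptions, not a proposed API.

```go
// Sketch of the pluggable CA approval policy described above. The interface
// and the three stock policies are illustrative; a real API would differ.
package ca

import "errors"

// Decision is the outcome of evaluating a certificate signing request.
type Decision int

const (
	Reject Decision = iota
	Approve
	Queue // hold for out-of-band (e.g. admin) review
)

// SigningRequest carries the information a policy gets to look at.
type SigningRequest struct {
	NodeName string
	CSRPEM   []byte
}

// ApprovalPolicy decides what to do with an incoming signing request.
type ApprovalPolicy interface {
	Evaluate(req SigningRequest) (Decision, error)
}

// Stock policy: always-reject.
type alwaysReject struct{}

func (alwaysReject) Evaluate(SigningRequest) (Decision, error) { return Reject, nil }

// Stock policy: insecure-always-approve.
type insecureAlwaysApprove struct{}

func (insecureAlwaysApprove) Evaluate(SigningRequest) (Decision, error) { return Approve, nil }

// Stock policy: queue. Requests are parked for explicit approval via an API,
// matching the manual-approval step in the dynamic clustering flow below.
type queuePolicy struct {
	pending chan SigningRequest
}

func (q *queuePolicy) Evaluate(req SigningRequest) (Decision, error) {
	select {
	case q.pending <- req:
		return Queue, nil
	default:
		return Reject, errors.New("signing request queue is full")
	}
}
```

A cloud-provider policy would implement the same interface and consult external information (for example, the provider's inventory of machines) before returning `Approve` or `Reject`.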

### Static Clustering

In this sequence diagram there is an out-of-band admin entity that creates all certificates and distributes them. It also makes sure that the kubelets know where to find the master. This provides a lot of control but is more difficult to set up, as lots of information must be communicated outside of Kubernetes.

![Static Sequence Diagram](clustering/static.png)
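For a sense of what the out-of-band admin tooling in this flow has to produce, here is a hedged Go sketch using the standard `crypto/x509` package to mint the cluster CA and a per-kubelet certificate. The names, validity periods, and the `kubelet:<host>` CN are illustrative assumptions, and error handling is elided.

```go
// Sketch only: an out-of-band admin tool minting the cluster trust root and a
// kubelet certificate before distributing them (e.g. via Salt).
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"time"
)

func main() {
	// Cluster trust root (the "ca-root" handed to the master and kubelets).
	caKey, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "cluster-ca"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().AddDate(1, 0, 0),
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
	}
	caDER, _ := x509.CreateCertificate(rand.Reader, caTmpl, caTmpl, &caKey.PublicKey, caKey)
	caCert, _ := x509.ParseCertificate(caDER)

	// Per-node certificate (the "kubelet-cert"), signed by the CA. The CN
	// follows the kubelet:<host> naming floated in the discussion above.
	nodeKey, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	nodeTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: "kubelet:node-1.example.com"},
		DNSNames:     []string{"node-1.example.com"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().AddDate(1, 0, 0),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
	}
	nodeDER, _ := x509.CreateCertificate(rand.Reader, nodeTmpl, caCert, &nodeKey.PublicKey, caKey)

	// caDER and nodeDER would be PEM-encoded and distributed out of band,
	// along with the master's location.
	_, _ = caDER, nodeDER
}
```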

### Dynamic Clustering

This diagram shows dynamic clustering using the bootstrap API endpoint. That API endpoint is used to both find the location of the master and communicate the root CA for the master.

This flow has the admin manually approving the kubelet signing requests. This is the `queue` policy defined above. This manual intervention could be replaced by code that can verify the signing requests via other means.

![Dynamic Sequence Diagram](clustering/dynamic.png)
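The bootstrap API endpoint in this flow only needs to store and return the master's location and root CA per cluster. The Go sketch below shows one possible shape of that service; the URL paths, JSON field names, and query parameters are assumptions for illustration.

```go
// Sketch only: a minimal bootstrap endpoint backing the dynamic flow above.
package main

import (
	"encoding/json"
	"net/http"
	"sync"
)

// MasterRecord is what setMaster stores and get-master returns: enough for a
// kubelet to locate the master and verify its certificate.
type MasterRecord struct {
	Location string `json:"master-location"`
	CAPEM    string `json:"master-ca"`
}

type bootstrap struct {
	mu      sync.RWMutex
	masters map[string]MasterRecord // keyed by cluster ID
}

func (b *bootstrap) setMaster(w http.ResponseWriter, r *http.Request) {
	var rec MasterRecord
	if err := json.NewDecoder(r.Body).Decode(&rec); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	b.mu.Lock()
	b.masters[r.URL.Query().Get("cluster")] = rec
	b.mu.Unlock()
	w.WriteHeader(http.StatusNoContent)
}

func (b *bootstrap) getMaster(w http.ResponseWriter, r *http.Request) {
	b.mu.RLock()
	rec, ok := b.masters[r.URL.Query().Get("cluster")]
	b.mu.RUnlock()
	if !ok {
		http.NotFound(w, r)
		return
	}
	json.NewEncoder(w).Encode(rec)
}

func main() {
	b := &bootstrap{masters: map[string]MasterRecord{}}
	http.HandleFunc("/set-master", b.setMaster)
	http.HandleFunc("/get-master", b.getMaster)
	http.ListenAndServe(":8080", nil)
}
```

In the sequence diagram above, `setMaster` corresponds to the master registering itself with the bootstrap endpoint, and `get-master` to the kubelet's discovery call that returns the master's location and root CA.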
1 change: 1 addition & 0 deletions docs/design/clustering/.gitignore
@@ -0,0 +1 @@
DroidSansMono.ttf
16 changes: 16 additions & 0 deletions docs/design/clustering/Makefile
@@ -0,0 +1,16 @@
FONT := DroidSansMono.ttf

PNGS := $(patsubst %.seqdiag,%.png,$(wildcard *.seqdiag))

.PHONY: all
all: $(PNGS)

.PHONY: watch
watch:
	fswatch *.seqdiag | xargs -n 1 sh -c "make || true"

$(FONT):
	curl -sLo $@ https://googlefontdirectory.googlecode.com/hg/apache/droidsansmono/$(FONT)

%.png: %.seqdiag $(FONT)
	seqdiag -a -f '$(FONT)' $<
9 changes: 9 additions & 0 deletions docs/design/clustering/README.md
@@ -0,0 +1,9 @@
This directory contains diagrams for the clustering design doc.

This depends on the `seqdiag` [utility](http://blockdiag.com/en/seqdiag/index.html). Assuming you have a working Python install, this should be installable with

```bash
pip install seqdiag
```

Just call `make` to regenerate the diagrams.
Binary file added docs/design/clustering/dynamic.png
24 changes: 24 additions & 0 deletions docs/design/clustering/dynamic.seqdiag
@@ -0,0 +1,24 @@
seqdiag {
activation = none;


user[label = "Admin User"];
bootstrap[label = "Bootstrap API\nEndpoint"];
master;
kubelet[stacked];

user -> bootstrap [label="createCluster", return="cluster ID"];
user <-- bootstrap [label="returns\n- bootstrap-cluster-uri"];

user ->> master [label="start\n- bootstrap-cluster-uri"];
master => bootstrap [label="setMaster\n- master-location\n- master-ca"];

user ->> kubelet [label="start\n- bootstrap-cluster-uri"];
kubelet => bootstrap [label="get-master", return="returns\n- master-location\n- master-ca"];
kubelet ->> master [label="signCert\n- unsigned-kubelet-cert", return="returns\n- kubelet-cert"];
user => master [label="getSignRequests"];
user => master [label="approveSignRequests"];
kubelet <<-- master [label="returns\n- kubelet-cert"];

kubelet => master [label="register\n- kubelet-location"]
}
Binary file added docs/design/clustering/static.png
16 changes: 16 additions & 0 deletions docs/design/clustering/static.seqdiag
@@ -0,0 +1,16 @@
seqdiag {
activation = none;

admin[label = "Manual Admin"];
ca[label = "Manual CA"]
master;
kubelet[stacked];

admin => ca [label="create\n- master-cert"];
admin ->> master [label="start\n- ca-root\n- master-cert"];

admin => ca [label="create\n- kubelet-cert"];
admin ->> kubelet [label="start\n- ca-root\n- kubelet-cert\n- master-location"];

kubelet => master [label="register\n- kubelet-location"];
}