kubernetes · smarterclayton · Feb 20, 2015 · Nov 11, 2014 · smarterclayton · Feb 18, 2015
diff --git a/docs/design/security.md b/docs/design/security.md
@@ -97,6 +97,8 @@ A pod runs in a *security context* under a *service account* that is defined by
 * Secret distribution via files https://github.com/GoogleCloudPlatform/kubernetes/pull/2030
 * Docker secrets https://github.com/docker/docker/pull/6697
 * Docker vault https://github.com/docker/docker/issues/10310
+* Service Accounts: https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/service_accounts.md
+* Secret volumes https://github.com/GoogleCloudPlatform/kubernetes/4126
 
 ## Specific Design Points
 
@@ -112,4 +114,4 @@ Both the Kubelet and Kube Proxy need information related to their specific roles
 
 The controller manager for Replication Controllers and other future controllers act on behalf of a user via delegation to perform automated maintenance on Kubernetes resources. Their ability to access or modify resource state should be strictly limited to their intended duties and they should be prevented from accessing information not pertinent to their role.  For example, a replication controller needs only to create a copy of a known pod configuration, to determine the running state of an existing pod, or to delete an existing pod that it created - it does not need to know the contents or current state of a pod, nor have access to any data in the pods attached volumes.
 
-The Kubernetes pod scheduler is responsible for reading data from the pod to fit it onto a minion in the cluster.  At a minimum, it needs access to view the ID of a pod (to craft the binding), its current state, any resource information necessary to identify placement, and other data relevant to concerns like anti-affinity, zone or region preference, or custom logic.  It does not need the ability to modify pods or see other resources, only to create bindings.  It should not need the ability to delete bindings unless the scheduler takes control of relocating components on failed hosts (which could be implemented by a separate component that can delete bindings but not create them).  The scheduler may need read access to user or project-container information to determine preferential location (underspecified at this time).
+The Kubernetes pod scheduler is responsible for reading data from the pod to fit it onto a minion in the cluster.  At a minimum, it needs access to view the ID of a pod (to craft the binding), its current state, any resource information necessary to identify placement, and other data relevant to concerns like anti-affinity, zone or region preference, or custom logic.  It does not need the ability to modify pods or see other resources, only to create bindings.  It should not need the ability to delete bindings unless the scheduler takes control of relocating components on failed hosts (which could be implemented by a separate component that can delete bindings but not create them).  The scheduler may need read access to user or project-container information to determine preferential location (underspecified at this time).
diff --git a/docs/design/security_context.md b/docs/design/security_context.md
@@ -172,13 +172,13 @@ type IDMapping struct {
 
 // IDMappingRange specifies a mapping between container IDs and node IDs
 type IDMappingRange struct {
-	// ContainerID is the starting container ID
+	// ContainerID is the starting container UID or GID
 	ContainerID int
 
-	// HostID is the starting host ID
+	// HostID is the starting host UID or GID
 	HostID int
 
-	// Length is the length of the ID range
+	// Length is the length of the UID/GID range
 	Length int
 }
 
@@ -187,4 +187,4 @@ type IDMappingRange struct {
 
 #### Security Context Lifecycle
 
-The lifecycle of a security context will be tied to that of a service account. It is expected that a service account with a default security context will be created for every Kubernetes namespace (without administrator intervention). If resources need to be allocated when creating a security context (for example, assign a range of host uids/gids), a pattern such as [finalizers](https://github.com/GoogleCloudPlatform/kubernetes/issues/3585) can be used before declaring the security context / service account / namespace ready for use.
+The lifecycle of a security context will be tied to that of a service account. It is expected that a service account with a default security context will be created for every Kubernetes namespace (without administrator intervention). If resources need to be allocated when creating a security context (for example, assign a range of host uids/gids), a pattern such as [finalizers](https://github.com/GoogleCloudPlatform/kubernetes/issues/3585) can be used before declaring the security context / service account / namespace ready for use.
diff --git a/docs/design/service_accounts.md b/docs/design/service_accounts.md
@@ -0,0 +1,164 @@
+# Service Accounts
+
+## Motivation 
+
+Processes in Pods may need to call the Kubernetes API.  For example:
+  - scheduler
+  - replication controller
+  - minion controller
+  - a map-reduce type framework which has a controller that then tries to make a dynamically determined number of workers and watch them
+  - continuous build and push system
+  - monitoring system
+
+They also may interact with services other than the Kubernetes API, such as:
+  - an image repository, such as docker -- both when the images are pulled to start the containers, and for writing
+    images in the case of pods that generate images.
+  - accessing other cloud services, such as blob storage, in the context of a large, integrated cloud offering (hosted
+    or private).
+  - accessing files in an NFS volume attached to the pod
+
+## Design Overview
+A service account binds together several things:
+  - a *name*, understood by users, and perhaps by peripheral systems, for an identity
+  - a *principal* that can be authenticated and [authorized](../authorization.md)
+  - a [security context](./security_context.md), which defines the Linux Capabilities, User IDs, Group IDs, and other
+    capabilities and controls on interaction with the file system and OS.
+  - a set of [secrets](./secrets.md), which a container may use to
+    access various networked resources.
+
+## Design Discussion
+
+A new object Kind is added:
+```go
+type ServiceAccount struct {
+    TypeMeta   `json:",inline" yaml:",inline"`
+    ObjectMeta `json:"metadata,omitempty" yaml:"metadata,omitempty"`
+
+    username string
+    securityContext ObjectReference // reference to a securityContext object
+    secrets []ObjectReference // references to secret objects
+}
+```
+
+The name ServiceAccount is chosen because it is widely used already (e.g. by Kerberos and LDAP)
+to refer to this type of account.  Note that it has no relation to kubernetes Service objects.
+
+The ServiceAccount object does not include any information that could not be defined separately:
+  - username can be defined however users are defined.
+  - securityContext and secrets are only referenced and are created using the REST API.
+
+The purpose of the serviceAccount object is twofold:
+  - to bind usernames to securityContexts and secrets, so that the username can be used to refer succinctly
+    in contexts where explicitly naming securityContexts and secrets would be inconvenient
+  - to provide an interface to simplify allocation of new securityContexts and secrets.
+These features are explained later.
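The binding role described above can be sketched as a lookup from a username to the objects it refers to. This is only an illustration under stated assumptions, not the proposed implementation: `ObjectReference` is reduced to a stand-in with `Kind` and `Name`, the fields are exported for readability, `TypeMeta` and `ObjectMeta` are omitted, and `registry` and `lookup` are hypothetical names.

```go
package main

import "fmt"

// ObjectReference is a simplified stand-in for the real API type (assumption).
type ObjectReference struct {
	Kind string
	Name string
}

// ServiceAccount mirrors the three fields proposed in this document.
type ServiceAccount struct {
	Username        string
	SecurityContext ObjectReference
	Secrets         []ObjectReference
}

// registry is a hypothetical in-memory index keyed by username.
type registry map[string]ServiceAccount

// lookup resolves a username to the security context and secret
// references bound to it. The final result is false if the username
// is not a known service account.
func (r registry) lookup(username string) (ObjectReference, []ObjectReference, bool) {
	sa, ok := r[username]
	if !ok {
		return ObjectReference{}, nil, false
	}
	return sa.SecurityContext, sa.Secrets, true
}

func main() {
	r := registry{
		"build-bot": {
			Username:        "build-bot",
			SecurityContext: ObjectReference{Kind: "SecurityContext", Name: "restricted"},
			Secrets:         []ObjectReference{{Kind: "Secret", Name: "build-bot-token"}},
		},
	}
	sc, secrets, ok := r.lookup("build-bot")
	fmt.Println(sc.Name, len(secrets), ok)
}
```

A caller that only knows the username can thus obtain both references in one step, which is the convenience the first bullet describes.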
+
+### Names
+
+From the standpoint of the Kubernetes API, a `user` is any principal which can authenticate to the Kubernetes API.
+This includes a human running `kubectl` on her desktop and a container in a Pod on a Node making API calls.
+
+There is already a notion of a username in kubernetes, which is populated into a request context after authentication.
+However, there is no API object representing a user.  While this may evolve, it is expected that in mature installations,
+the canonical storage of user identifiers will be handled by a system external to kubernetes. 
+
+Kubernetes does not dictate how to divide up the space of user identifier strings.  User names can be
+simple Unix-style short usernames (e.g. `alice`), or may be qualified to allow for federated identity
+(e.g. `alice@example.com` vs `alice@example.org`).  Naming conventions may distinguish service accounts from user
+accounts (e.g. `alice@example.com` vs `build-service-account-a3b7f0@foo-namespace.service-accounts.example.com`),
+but Kubernetes does not require this.
+
+Kubernetes also does not require that there be a distinction between human and Pod users.  It will be possible
+to set up a cluster where Alice the human talks to the Kubernetes API as username `alice` and starts pods that
+also talk to the API as user `alice` and write files to NFS as user `alice`.  However, this is not recommended.
+
+Instead, it is recommended that Pods and Humans have distinct identities, and reference implementations will
+make this distinction.
+
+The distinction is useful for a number of reasons:
+  - the requirements for humans and automated processes are different:
+    - Humans need a wide range of capabilities to do their daily activities. Automated processes often have more narrowly-defined activities.
+    - Humans may better tolerate the exceptional conditions created by expiration of a token.  Remembering to handle
+      this in a program is more annoying.  So, either long-lasting credentials or automated rotation of credentials is
+      needed.
+    - A Human typically keeps credentials on a machine that is not part of the cluster and so not subject to automatic
+      management.  A VM with a role/service-account can have its credentials automatically managed.
+  - the identity of a Pod cannot in general be mapped to a single human.
+    - If policy allows, it may be created by one human, and then updated by another, and another, until its behavior cannot be attributed to a single human.  
+
+**TODO**: consider getting rid of separate serviceAccount object and just rolling its parts into the SecurityContext or
+Pod Object.
+
+The `secrets` field is a list of references to secret objects that a process started as that service account should
+have access to in order to assert that role.
+
+The secrets are not inline with the serviceAccount object.  This way, most or all users can have permission to `GET /serviceAccounts` so they can remind themselves
+what serviceAccounts are available for use.
+
+Nothing will prevent creation of a serviceAccount with two secrets of type `SecretTypeKubernetesAuth`, or secrets of two
+different types.  Kubelet and client libraries will have some behavior, TBD, to handle the case of multiple secrets of a
+given type (pick first or provide all and try each in order, etc).
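The two candidate behaviors mentioned above ("pick first" and "provide all and try each in order") might look like the following. The actual behavior is TBD in this document, so this is a sketch only: `Secret` is a simplified stand-in, the string value given to `SecretTypeKubernetesAuth` is a placeholder, and `firstOfType` and `allOfType` are hypothetical helper names.

```go
package main

import "fmt"

// SecretTypeKubernetesAuth is named in this document; its value here is a placeholder (assumption).
const SecretTypeKubernetesAuth = "kubernetes-auth"

// Secret is a simplified stand-in for the secret object (assumption).
type Secret struct {
	Name string
	Type string
}

// firstOfType implements "pick first": it returns the first secret of
// the given type in list order.
func firstOfType(secrets []Secret, typ string) (Secret, bool) {
	for _, s := range secrets {
		if s.Type == typ {
			return s, true
		}
	}
	return Secret{}, false
}

// allOfType implements "provide all": it returns every secret of the
// given type, preserving list order so a caller can try each in turn.
func allOfType(secrets []Secret, typ string) []Secret {
	var out []Secret
	for _, s := range secrets {
		if s.Type == typ {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	secrets := []Secret{
		{Name: "token-a", Type: SecretTypeKubernetesAuth},
		{Name: "registry", Type: "other"},
		{Name: "token-b", Type: SecretTypeKubernetesAuth},
	}
	first, _ := firstOfType(secrets, SecretTypeKubernetesAuth)
	fmt.Println(first.Name, len(allOfType(secrets, SecretTypeKubernetesAuth)))
}
```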
+
+When a serviceAccount and a matching secret exist, then a `User.Info` for the serviceAccount and a `BearerToken` from the secret
+are added to the map of tokens used by the authentication process in the apiserver, and similarly for other types.  (We
+might have some types that do not do anything on apiserver but just get pushed to the kubelet.)
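As a rough illustration of the token map described above, the sketch below registers a bearer token for a service account and resolves it on a later request. It assumes a plain in-memory map; `UserInfo` is a simplified stand-in for the apiserver's `User.Info`, and `tokenAuthenticator`, `addServiceAccount`, and `authenticate` are hypothetical names, not existing apiserver APIs.

```go
package main

import "fmt"

// UserInfo is a simplified stand-in for the apiserver's user info (assumption).
type UserInfo struct {
	Name string
}

// tokenAuthenticator is a hypothetical holder for the map of tokens
// consulted during authentication.
type tokenAuthenticator struct {
	tokens map[string]UserInfo
}

// addServiceAccount records the bearer token taken from a service
// account's matching secret, so requests bearing it resolve to that user.
func (a *tokenAuthenticator) addServiceAccount(username, bearerToken string) {
	if a.tokens == nil {
		a.tokens = make(map[string]UserInfo)
	}
	a.tokens[bearerToken] = UserInfo{Name: username}
}

// authenticate returns the user registered for the presented token, if any.
func (a *tokenAuthenticator) authenticate(bearerToken string) (UserInfo, bool) {
	u, ok := a.tokens[bearerToken]
	return u, ok
}

func main() {
	var auth tokenAuthenticator
	auth.addServiceAccount("build-bot", "s3cr3t-token")
	user, ok := auth.authenticate("s3cr3t-token")
	fmt.Println(user.Name, ok)
}
```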
+
+### Pods
+The `PodSpec` is extended to have a `Pods.Spec.ServiceAccountUsername` field.  If this is unset, then a
+default value is chosen.  If it is set, then the corresponding value of `Pods.Spec.SecurityContext` is set by the
+Service Account Finalizer (see below). 
+
+TBD: how policy limits which users can make pods with which service accounts.
+
+### Authorization
+Kubernetes API Authorization Policies refer to users.  Pods created with a `Pods.Spec.ServiceAccountUsername` typically
+get a `Secret` which allows them to authenticate to the Kubernetes APIserver as a particular user.  So any
+policy that is desired can be applied to them.
+
+A higher level workflow is needed to coordinate creation of serviceAccounts, secrets and relevant policy objects.
+Users are free to extend kubernetes to put this business logic wherever is convenient for them, though the
+Service Account Finalizer is one place where this can happen (see below).
+
+### Kubelet
+
+The kubelet will treat as "not ready to run" (needing a finalizer to act on it) any Pod which has an empty
+SecurityContext.  
+
+The kubelet will set a default, restrictive, security context for any pods created from non-Apiserver config
+sources (http, file).
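The two kubelet rules above could be expressed roughly as follows. This is a sketch under assumptions: `Pod` and `SecurityContext` are simplified stand-ins, the `Source` values and the `restrictive-default` name are placeholders, and the default is assumed to apply only when the pod carries no security context of its own.

```go
package main

import "fmt"

// SecurityContext is a simplified stand-in; only a name is modeled (assumption).
type SecurityContext struct {
	Name string
}

// Pod is a simplified stand-in for the pod object (assumption).
type Pod struct {
	Name            string
	Source          string // "api", "http", or "file" (placeholder values)
	SecurityContext *SecurityContext
}

// applySourceDefaults gives pods from non-apiserver sources a
// restrictive default security context.
// Assumption: the default is applied only when none is already set.
func applySourceDefaults(p *Pod) {
	if p.Source != "api" && p.SecurityContext == nil {
		p.SecurityContext = &SecurityContext{Name: "restrictive-default"}
	}
}

// readyToRun reports whether the kubelet may start the pod. A pod with
// an empty security context still needs a finalizer to act on it.
func readyToRun(p Pod) bool {
	return p.SecurityContext != nil
}

func main() {
	fromAPI := Pod{Name: "web", Source: "api"}
	fromFile := Pod{Name: "static", Source: "file"}
	applySourceDefaults(&fromAPI)
	applySourceDefaults(&fromFile)
	fmt.Println(readyToRun(fromAPI), readyToRun(fromFile))
}
```

In this sketch an apiserver pod without a security context stays "not ready" until something else fills it in, while a file or HTTP pod becomes runnable immediately under the restrictive default.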
+
+Kubelet watches apiserver for secrets which are needed by pods bound to it.
+
+**TODO**: how to only let kubelet see secrets it needs to know.
+
+### The service account finalizer
+
+There are several ways to use Pods with SecurityContexts and Secrets.
+
+One way is to explicitly specify the securityContext and all secrets of a Pod when the pod is initially created,
+like this:
+
+**TODO**: example of pod with explicit refs.
+
+Another way is with the *Service Account Finalizer*, a plugin process which is optional, and which handles
+business logic around service accounts. 
+
+The Service Account Finalizer watches Pods, Namespaces, and ServiceAccount definitions.
+
+First, if it finds pods which have a `Pod.Spec.ServiceAccountUsername` but no `Pod.Spec.SecurityContext` set,
+then it copies in the referenced securityContext and secrets references for the corresponding `serviceAccount`.
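The copy step just described might be sketched like this. It assumes the pod spec holds references (rather than inline objects) for its security context and secrets, which this document does not settle; `PodSpec`, `ObjectReference`, and `ServiceAccount` are simplified stand-ins and `finalize` is a hypothetical name.

```go
package main

import "fmt"

// ObjectReference is a simplified stand-in for the real API type (assumption).
type ObjectReference struct {
	Kind string
	Name string
}

// ServiceAccount mirrors the fields proposed in this document.
type ServiceAccount struct {
	Username        string
	SecurityContext ObjectReference
	Secrets         []ObjectReference
}

// PodSpec models only the fields the finalizer touches (assumption).
type PodSpec struct {
	ServiceAccountUsername string
	SecurityContext        *ObjectReference
	Secrets                []ObjectReference
}

// finalize copies the service account's securityContext and secret
// references into a pod spec that names a service account but has no
// security context yet. It reports whether the spec was modified.
func finalize(spec *PodSpec, accounts map[string]ServiceAccount) bool {
	if spec.ServiceAccountUsername == "" || spec.SecurityContext != nil {
		return false
	}
	sa, ok := accounts[spec.ServiceAccountUsername]
	if !ok {
		return false
	}
	sc := sa.SecurityContext
	spec.SecurityContext = &sc
	// Copy the slice so later edits to the pod do not alias the account.
	spec.Secrets = append([]ObjectReference(nil), sa.Secrets...)
	return true
}

func main() {
	accounts := map[string]ServiceAccount{
		"build-bot": {
			Username:        "build-bot",
			SecurityContext: ObjectReference{Kind: "SecurityContext", Name: "restricted"},
			Secrets:         []ObjectReference{{Kind: "Secret", Name: "build-bot-token"}},
		},
	}
	spec := PodSpec{ServiceAccountUsername: "build-bot"}
	fmt.Println(finalize(&spec, accounts), spec.SecurityContext.Name, len(spec.Secrets))
}
```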
+
+Second, if ServiceAccount definitions change, it may take some actions.
+**TODO**: decide what actions it takes when a serviceAccount definition changes.  Does it stop pods, or just
+allow someone to list the ones that are out of spec?  In general, people may want to customize this.
+
+Third, if a new namespace is created, it may create a new serviceAccount for that namespace.  This may include
+a new username (e.g. `NAMESPACE-default-service-account@serviceaccounts.$CLUSTERID.kubernetes.io`), a new
+securityContext, a newly generated secret to authenticate that serviceAccount to the Kubernetes API, and default
+policies for that service account.
+**TODO**: more concrete example.  What are typical default permissions for default service account (e.g. readonly access
+to services in the same namespace and read-write access to events in that namespace?)
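For the username portion of this step, the example pattern above can be filled in mechanically. The sketch below only formats and recognizes names that follow that pattern; Kubernetes does not require any naming convention, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultServiceAccountUsername fills in the example pattern
// NAMESPACE-default-service-account@serviceaccounts.$CLUSTERID.kubernetes.io.
func defaultServiceAccountUsername(namespace, clusterID string) string {
	return fmt.Sprintf("%s-default-service-account@serviceaccounts.%s.kubernetes.io", namespace, clusterID)
}

// looksLikeServiceAccount reports whether a username carries the
// service account domain for the given cluster. This is a
// convention-only check (assumption), not an authorization decision.
func looksLikeServiceAccount(username, clusterID string) bool {
	return strings.HasSuffix(username, "@serviceaccounts."+clusterID+".kubernetes.io")
}

func main() {
	name := defaultServiceAccountUsername("dev", "cluster1")
	fmt.Println(name, looksLikeServiceAccount(name, "cluster1"))
}
```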
+
+Finally, it may provide an interface to automate creation of new serviceAccounts.  In that case, the user may want
+to GET serviceAccounts to see what has been created.
+