design: reduce scope of node on node object #911

Merged · 2 commits · Nov 12, 2018
keps/sig-auth/0000-20170814-bounding-self-labeling-kubelets.md (new file, 141 additions)

---
kep-number: 0
title: Bounding Self-Labeling Kubelets
authors:
- "@mikedanese"
- "@liggitt"
owning-sig: sig-auth
participating-sigs:
- sig-node
- sig-storage
reviewers:
- "@saad-ali"
- "@tallclair"
approvers:
- "@thockin"
- "@smarterclayton"
creation-date: 2017-08-14
last-updated: 2018-10-31
status: implementable
---

# Bounding Self-Labeling Kubelets

## Motivation

Today the node client has total authority over its own Node labels.
This ability is incredibly useful for the node auto-registration flow.
The kubelet reports a set of well-known labels, as well as additional
labels specified on the command line with `--node-labels`.

While this distributed method of registration is convenient and expedient, it
has two problems that a centralized approach would not have. The minor problem is
that it makes management difficult: instead of configuring labels in a centralized
place, we must configure `N` kubelet command lines. More significantly, the
approach greatly compromises security. Below are two straightforward escalations
on an initially compromised node that exhibit the attack vector.

### Capturing Dedicated Workloads

Suppose company `foo` needs to run an application that deals with PII on
dedicated nodes to comply with government regulation. A common mechanism for
implementing dedicated nodes in Kubernetes today is to set a label or taint
(e.g. `foo/dedicated=customer-info-app`) on the node and to select these
dedicated nodes in the workload controller running `customer-info-app`.
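
For illustration only (not part of this proposal), a minimal sketch of a workload
pinning itself to such dedicated nodes via a `nodeSelector` on that label; the pod
name and image are assumptions:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// dedicatedPod builds a Pod that can only schedule onto nodes carrying the
// dedicated-node label from the example above. Illustrative sketch only; the
// pod name and image are made up.
func dedicatedPod() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "customer-info-app"},
		Spec: corev1.PodSpec{
			NodeSelector: map[string]string{
				"foo/dedicated": "customer-info-app",
			},
			Containers: []corev1.Container{
				{Name: "app", Image: "registry.example.com/customer-info-app"},
			},
		},
	}
}

func main() {
	fmt.Println(dedicatedPod().Spec.NodeSelector)
}
```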

Since the node self-reports labels upon registration, an intruder can easily
register a compromised node with label `foo/dedicated=customer-info-app`. The
scheduler will then bind `customer-info-app` to the compromised node, potentially
giving the intruder easy access to the PII.

This attack also extends to secrets. Suppose company `foo` runs its outward-facing
nginx on dedicated nodes to reduce exposure of the company's publicly
trusted server certificates. They use the secret mechanism to distribute the
serving certificate key. An intruder captures the dedicated nginx workload in
the same way and can now use the node certificate to read the company's serving
certificate key.

## Proposal

1. Modify the `NodeRestriction` admission plugin to prevent Kubelets from self-setting labels
within the `k8s.io` and `kubernetes.io` namespaces *except for these specifically allowed labels/prefixes*:

```
kubernetes.io/hostname
kubernetes.io/instance-type
kubernetes.io/os
kubernetes.io/arch

beta.kubernetes.io/instance-type
beta.kubernetes.io/os
beta.kubernetes.io/arch

failure-domain.beta.kubernetes.io/zone
failure-domain.beta.kubernetes.io/region

failure-domain.kubernetes.io/zone
failure-domain.kubernetes.io/region

[*.]kubelet.kubernetes.io/*
[*.]node.kubernetes.io/*
```

2. Reserve and document the `node-restriction.kubernetes.io/*` label prefix for cluster administrators
who want to label their `Node` objects centrally for isolation purposes.

> The `node-restriction.kubernetes.io/*` label prefix is reserved for cluster administrators
> to isolate nodes. These labels cannot be self-set by kubelets when the `NodeRestriction`
> admission plugin is enabled.

This accomplishes the following goals:

- continues allowing people to use arbitrary labels under their own namespaces any way they wish
- supports legacy labels kubelets are already adding
- provides a place under the `kubernetes.io` label namespace for node isolation labeling
- provides a place under the `kubernetes.io` label namespace for kubelets to self-label with kubelet- and node-specific labels
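
As a rough illustration of the enforcement described above, the admission-side check
could look something like the sketch below; the function and variable names are
assumptions, not the actual `NodeRestriction` plugin code:

```go
package main

import (
	"fmt"
	"strings"
)

// Label keys a kubelet may self-set, mirroring the allowed list in the
// proposal above. Illustrative sketch only.
var allowedKeys = map[string]bool{
	"kubernetes.io/hostname":                   true,
	"kubernetes.io/instance-type":              true,
	"kubernetes.io/os":                         true,
	"kubernetes.io/arch":                       true,
	"beta.kubernetes.io/instance-type":         true,
	"beta.kubernetes.io/os":                    true,
	"beta.kubernetes.io/arch":                  true,
	"failure-domain.beta.kubernetes.io/zone":   true,
	"failure-domain.beta.kubernetes.io/region": true,
	"failure-domain.kubernetes.io/zone":        true,
	"failure-domain.kubernetes.io/region":      true,
}

// Label domains (and their subdomains) a kubelet may self-set, i.e.
// [*.]kubelet.kubernetes.io/* and [*.]node.kubernetes.io/*.
var allowedDomains = []string{"kubelet.kubernetes.io", "node.kubernetes.io"}

// kubeletMaySetLabel reports whether a kubelet would be allowed to self-set
// the given label key under this proposal. Keys outside the kubernetes.io
// and k8s.io namespaces remain unrestricted.
func kubeletMaySetLabel(key string) bool {
	domain := key
	if i := strings.Index(key, "/"); i >= 0 {
		domain = key[:i]
	}
	restricted := domain == "kubernetes.io" || domain == "k8s.io" ||
		strings.HasSuffix(domain, ".kubernetes.io") || strings.HasSuffix(domain, ".k8s.io")
	if !restricted {
		return true
	}
	if allowedKeys[key] {
		return true
	}
	for _, d := range allowedDomains {
		if domain == d || strings.HasSuffix(domain, "."+d) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(kubeletMaySetLabel("kubernetes.io/hostname"))              // true
	fmt.Println(kubeletMaySetLabel("node-restriction.kubernetes.io/team")) // false: reserved for admins
	fmt.Println(kubeletMaySetLabel("example.com/gpu"))                     // true: unrestricted namespace
}
```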

## Implementation Timeline

v1.13:

* Kubelet deprecates setting `kubernetes.io` or `k8s.io` labels via `--node-labels`,
other than the specifically allowed labels/prefixes described above,
and warns when invoked with `kubernetes.io` or `k8s.io` labels outside that set (sketched below).
* NodeRestriction admission prevents kubelets from adding/removing/modifying `[*.]node-restriction.kubernetes.io/*` labels on Node *create* and *update*
* NodeRestriction admission prevents kubelets from adding/removing/modifying `kubernetes.io` or `k8s.io`
labels other than the specifically allowed labels/prefixes described above on Node *update* only
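
For illustration, the kubelet-side handling in the first v1.13 item could be roughly
the following; `allowed` stands in for the same check sketched in the Proposal section,
and the function name and structure are assumptions rather than actual kubelet code:

```go
package main

import (
	"fmt"
	"log"
)

// checkNodeLabelsFlag sketches the --node-labels handling: warn on restricted
// kubernetes.io/k8s.io keys while deprecation is in effect (v1.13), and reject
// them once the deprecation period ends (v1.15). Illustrative only.
func checkNodeLabelsFlag(labels map[string]string, allowed func(string) bool, reject bool) error {
	for key := range labels {
		if allowed(key) {
			continue
		}
		if reject {
			return fmt.Errorf("--node-labels may not include restricted label %q", key)
		}
		log.Printf("WARNING: --node-labels includes restricted label %q; this is deprecated and will be rejected in a future release", key)
	}
	return nil
}

func main() {
	labels := map[string]string{"node-restriction.kubernetes.io/pci": "true"}
	// Stubbed allowed-check for the sketch; the real check would use the
	// allowed labels/prefixes list from the proposal.
	_ = checkNodeLabelsFlag(labels, func(string) bool { return false }, false)
}
```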

v1.15:

* Kubelet removes the ability to set `kubernetes.io` or `k8s.io` labels via `--node-labels`
other than the specifically allowed labels/prefixes described above (deprecation period
of 6 months for CLI elements of admin-facing components is complete)

v1.17:

* NodeRestriction admission prevents kubelets from adding/removing/modifying `kubernetes.io` or `k8s.io`
labels other than the specifically allowed labels/prefixes described above on Node *update* and *create*
(oldest supported kubelet running against a v1.17 apiserver is v1.15)

## Alternatives Considered

### File or flag-based configuration of the apiserver to allow specifying allowed labels

* A fixed set of labels and label prefixes is simpler to reason about, and makes every cluster behave consistently
* File-based config isn't easily inspectable to be able to verify enforced labels
* File-based config isn't easily kept in sync in HA apiserver setups

### API-based configuration of the apiserver to allow specifying allowed labels

* A fixed set of labels and label prefixes is simpler to reason about, and makes every cluster behave consistently
* An API object that controls the allowed labels is a potential escalation path for a compromised node

### Allow kubelets to add any labels they wish, and add NoSchedule taints if disallowed labels are added

* To be robust, this approach would also likely involve a controller to automatically inspect labels and remove the NoSchedule taint. This seemed overly complex. Additionally, it was difficult to come up with a tainting scheme that preserved information about which labels were the cause.

### Forbid all labels regardless of namespace except for a specifically allowed set

* This was much more disruptive to existing usage of `--node-labels`.
* This was much more difficult to integrate with other systems allowing arbitrary topology labels like CSI.
* This placed restrictions on how labels outside the `kubernetes.io` and `k8s.io` label namespaces could be used, which didn't seem proper.