# Resource Quota - Scoping resources

## Problem Description

### Ability to limit compute requests and limits

The existing `ResourceQuota` API object constrains the total amount of compute
resource requests. This is useful when a cluster-admin is interested in
controlling explicit resource guarantees such that there is a relatively
strong guarantee that pods created by users who stay within their quota will find
enough free resources in the cluster to schedule. The end-user creating
the pod is expected to have intimate knowledge of their minimum required resources
as well as their potential limits.

There are many environments where a cluster-admin does not extend this level
of trust to their end-users because users often request too much resource and
have trouble reasoning about what they hope to have available for their
application versus what their application actually needs. In these environments,
the cluster-admin will often expose only a single value (the limit) to the end-user.
Internally, they may choose a variety of strategies for setting the request.
For example, some cluster operators are focused on satisfying a particular over-commit
ratio and may choose to set the request as a factor of the limit to control
over-commit. Other cluster operators may defer to a resource estimation tool that
sets the request based on known historical trends. In these environments, the
cluster-admin is interested in exposing a quota to end-users that maps
to their desired limit instead of their request, since the limit is the value the
user manages.

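To make the over-commit strategy concrete, a small sketch follows; the helper
name and the 4x ratio are illustrative assumptions, not part of this proposal.

```
// Illustrative only: derive an internal request from a user-visible limit
// when the operator targets a fixed over-commit ratio. The function name and
// the 4x ratio below are hypothetical examples, not part of the quota API.
func requestFromLimit(limitMilliCPU, overCommitRatio int64) int64 {
	if overCommitRatio <= 0 {
		return limitMilliCPU
	}
	return limitMilliCPU / overCommitRatio
}

// requestFromLimit(2000, 4) == 500: a user-specified 2-core (2000m) limit
// maps to a 500m internal request under a 4x over-commit target.
```
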
### Ability to limit impact to node and promote fair-use

The current `ResourceQuota` API object does not provide the ability
to quota best-effort pods separately from pods with resource guarantees.
For example, if a cluster-admin applies a quota that caps requested
cpu at 10 cores and memory at 10Gi, all pods in the namespace must
make an explicit resource request for cpu and memory to satisfy
quota. This prevents a namespace with a quota from supporting best-effort
pods.

In practice, the cluster-admin wants to control the impact of best-effort
pods on the cluster, but not restrict the ability to run best-effort pods
altogether.

As a result, the cluster-admin requires the ability to control the
maximum number of active best-effort pods. In addition, the cluster-admin
requires the ability to scope a quota that limits compute resources to
exclude best-effort pods.

### Ability to quota long-running vs bounded-duration compute resources

The cluster-admin may want to quota end-users separately
based on long-running vs bounded-duration compute resources.

For example, a cluster-admin may offer more compute resources
for long-running pods that are expected to have a more permanent residence
on the node than for bounded-duration pods. Many batch-style workloads
tend to consume as much resource as they can until something else applies
the brakes. As a result, these workloads tend to operate at their limit,
while many traditional web applications may often consume closer to their
request if there is no active traffic. An operator that wants to control
density will offer lower quota limits for batch workloads than for web applications.

A classic example is a PaaS deployment where the cluster-admin may
allow a separate budget for pods that run a web application vs pods that
build web applications.

Another example is providing more quota to a database pod than to a
pod that performs a database migration.

## Use Cases

* As a cluster-admin, I want the ability to quota
  * compute resource requests
  * compute resource limits
  * compute resources for terminating vs non-terminating workloads
  * compute resources for best-effort vs non-best-effort pods

## Proposed Change

### New quota tracked resources

Support the following resources that can be tracked by quota.

| Resource Name | Description |
| ------------- | ----------- |
| cpu | total cpu requests (backwards compatibility) |
| cpu.request | total cpu requests |
| cpu.limit | total cpu limits |
| memory | total memory requests (backwards compatibility) |
| memory.request | total memory requests |
| memory.limit | total memory limits |

Review discussion on the resource names:

> Another alternative would be something closer to the name of the field, such as `requests.cpu`. I agree `cpu.request` reads better, but it is ad hoc and possibly more difficult to handle extended resources offered by nodes, such as gpus.

> I am fine to update to `requests.cpu` in the implementation. Will take a note to do that.

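As a sketch of how these totals could be derived, the snippet below sums
requests and limits across a pod's containers; the stand-in types are
simplified assumptions rather than the real Pod API and quantity types.

```
// Simplified stand-in types; a real implementation would use the Kubernetes
// Pod API and resource.Quantity rather than raw int64 values.
type container struct {
	cpuRequestMilli, cpuLimitMilli int64
	memRequestBytes, memLimitBytes int64
}

type podSpec struct {
	containers []container
}

// usage sums requests and limits across a pod's containers, which is the
// amount a quota tracking cpu.request, cpu.limit, memory.request and
// memory.limit would charge for that pod.
func usage(p podSpec) (cpuReq, cpuLim, memReq, memLim int64) {
	for _, c := range p.containers {
		cpuReq += c.cpuRequestMilli
		cpuLim += c.cpuLimitMilli
		memReq += c.memRequestBytes
		memLim += c.memLimitBytes
	}
	return
}
```
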
### Resource Quota Scopes

Add the ability to associate a set of `scopes` to a quota.

A quota will only measure usage for a `resource` if it matches
the intersection of enumerated `scopes`.

Adding a `scope` to a quota limits the set of resources
it supports to those that pertain to the `scope`. Specifying
a resource on the quota object outside of the allowed set
would result in a validation error.

| Scope | Description |
| ----- | ----------- |
| Terminating | Match `kind=Pod` where `spec.activeDeadlineSeconds >= 0` |
| NotTerminating | Match `kind=Pod` where `spec.activeDeadlineSeconds = nil` |
| BestEffort | Match `kind=Pod` where `status.qualityOfService in (BestEffort)` |
| NotBestEffort | Match `kind=Pod` where `status.qualityOfService not in (BestEffort)` |

Review discussion on the `Terminating` scope:

> This doesn't take restartPolicy into account? A Pod might not have a deadline, but still could terminate in 30 seconds if it had restartPolicy Never or OnFailure.

> Saw that there was some commentary about this. I commented on #20199. I'm ok with a separate mechanism to require reasonable deadlines on terminating pods, but that's worth mentioning here as the solution.

> There is nothing stopping another scope that matches on RestartPolicy being added in the future as well...

A `BestEffort` scope restricts a quota to tracking the following resources:

* pod

A `Terminating`, `NotTerminating`, or `NotBestEffort` scope restricts a quota to
tracking the following resources:

* pod
* memory, memory.request, memory.limit
* cpu, cpu.request, cpu.limit

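The sketch below illustrates how the scope matching described above could be
evaluated for a pod; the helper names and the simplified pod fields are
assumptions for illustration, not the actual implementation.

```
// Illustrative sketch only; the actual evaluator works on the real Pod type.
type ResourceQuotaScope string

const (
	ScopeTerminating    ResourceQuotaScope = "Terminating"
	ScopeNotTerminating ResourceQuotaScope = "NotTerminating"
	ScopeBestEffort     ResourceQuotaScope = "BestEffort"
	ScopeNotBestEffort  ResourceQuotaScope = "NotBestEffort"
)

// simplified stand-in for the pod fields the scopes inspect
type podInfo struct {
	activeDeadlineSeconds *int64 // nil when the pod has no bounded duration
	bestEffort            bool   // true when status.qualityOfService is BestEffort
}

func matchesScope(p podInfo, scope ResourceQuotaScope) bool {
	switch scope {
	case ScopeTerminating:
		return p.activeDeadlineSeconds != nil
	case ScopeNotTerminating:
		return p.activeDeadlineSeconds == nil
	case ScopeBestEffort:
		return p.bestEffort
	case ScopeNotBestEffort:
		return !p.bestEffort
	}
	return false
}

// A pod is tracked by a scoped quota only if it matches every enumerated scope.
func matchesAllScopes(p podInfo, scopes []ResourceQuotaScope) bool {
	for _, scope := range scopes {
		if !matchesScope(p, scope) {
			return false
		}
	}
	return true
}
```

For example, a build pod with `activeDeadlineSeconds` set and guaranteed
resources would match `[Terminating, NotBestEffort]`, while a best-effort pod
would not.
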
## Data Model Impact

```
// The following identify resource constants for Kubernetes object types
const (
	// CPU Request, in cores
	ResourceCPURequest ResourceName = "cpu.request"
	// CPU Limit, in cores
	ResourceCPULimit ResourceName = "cpu.limit"
	// Memory Request, in bytes
	ResourceMemoryRequest ResourceName = "memory.request"
	// Memory Limit, in bytes
	ResourceMemoryLimit ResourceName = "memory.limit"
)

// A scope is a filter that matches an object
type ResourceQuotaScope string

const (
	ResourceQuotaScopeTerminating    ResourceQuotaScope = "Terminating"
	ResourceQuotaScopeNotTerminating ResourceQuotaScope = "NotTerminating"
	ResourceQuotaScopeBestEffort     ResourceQuotaScope = "BestEffort"
	ResourceQuotaScopeNotBestEffort  ResourceQuotaScope = "NotBestEffort"
)

// ResourceQuotaSpec defines the desired hard limits to enforce for Quota.
// The quota matches by default on all objects in its namespace.
// The quota can optionally match objects that satisfy a set of scopes.
type ResourceQuotaSpec struct {
	// Hard is the set of desired hard limits for each named resource
	Hard ResourceList `json:"hard,omitempty"`
	// Scopes is the set of filters that must match an object for it to be
	// tracked by the quota
	Scopes []ResourceQuotaScope `json:"scopes,omitempty"`
}
```

Review discussion on the data model:

> Aside: Could you please rename ResourceList to ResourceMap?

> Opened and assigned issue: #21584

> Nit: This needs patch merge key tags.

> Also note that API field comments should be in terms of json field names, or plain English (without mentioning names/types).

> Also assumes the addition of

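For illustration, the `quota-terminating` object from the example later in
this document could be constructed against these types roughly as follows;
the snippet assumes the existing `ResourcePods` constant and the
`resource.MustParse` helper from the API types.

```
// Rough illustration of building a scoped quota spec with the types above.
spec := ResourceQuotaSpec{
	Hard: ResourceList{
		ResourcePods:        resource.MustParse("2"),
		ResourceMemoryLimit: resource.MustParse("1Gi"),
		ResourceCPULimit:    resource.MustParse("2"),
	},
	Scopes: []ResourceQuotaScope{
		ResourceQuotaScopeTerminating,
		ResourceQuotaScopeNotBestEffort,
	},
}
```
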
## Rest API Impact

None.

## Security Impact

None.

## End User Impact

The `kubectl` commands that render quota should display its scopes.

## Performance Impact

This feature will make it more common for certain clusters to have multiple
quota objects in a namespace. This increases the number of quota objects
that need to be incremented during creation of an object in admission
control, and the number of quota objects that need to be updated during
controller loops.

## Developer Impact

None.

## Alternatives

This proposal initially enumerated a solution that leveraged a
`FieldSelector` on a `ResourceQuota` object. A `FieldSelector`
grouped an `APIVersion` and `Kind` with a selector over its
fields that supported set-based requirements. It would have allowed
a quota to track objects based on cluster-defined attributes.

Review discussion on this alternative:

> FieldSelector came up for MetadataPolicy, also. I'd be in favor of that.

> That said, I'm also ok with the approach proposed above.

> Long term, I am in favor of FieldSelector to handle edge cases, and scopes to handle common cases.

For example, a quota could do the following:

* match `Kind=Pod` where `spec.restartPolicy in (Always)`
* match `Kind=Pod` where `spec.restartPolicy in (Never, OnFailure)`
* match `Kind=Pod` where `status.qualityOfService in (BestEffort)`
* match `Kind=Service` where `spec.type in (LoadBalancer)`
  * see [#17484](https://github.com/kubernetes/kubernetes/issues/17484)
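
A purely hypothetical sketch of what such a `FieldSelector` could have looked
like; these type and field names were never adopted and are shown only to
make the alternative concrete.

```
// Hypothetical types for the FieldSelector alternative; not part of any
// Kubernetes API.
type FieldSelectorRequirement struct {
	Field    string   // e.g. "spec.restartPolicy"
	Operator string   // e.g. "In", "NotIn"
	Values   []string // e.g. []string{"Never", "OnFailure"}
}

type FieldSelector struct {
	APIVersion   string // e.g. "v1"
	Kind         string // e.g. "Pod"
	Requirements []FieldSelectorRequirement
}
```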

Theoretically, it would enable support for fine-grained tracking
on a variety of resource types. While extremely flexible, there
are cons to this approach that make it premature to pursue
at this time.

* Generic field selectors are not yet settled art
  * see [#1362](https://github.com/kubernetes/kubernetes/issues/1362)
  * see [#19804](https://github.com/kubernetes/kubernetes/pull/19804)
* Discovery API Limitations
  * Not possible to discover the set of field selectors supported by a kind.
  * Not possible to discover if a field is readonly, readwrite, or immutable
    post-creation.

The quota system would want to validate that a field selector is valid,
and it would only want to select on those fields that are readonly/immutable
post-creation to make resource tracking work during update operations.

The current proposal could grow to support a `FieldSelector` on a
`ResourceQuotaSpec` and support a simple migration path to convert
`scopes` to the matching `FieldSelector` once the project has identified
how it wants to handle `fieldSelector` requirements longer term.

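Under the hypothetical `FieldSelector` types sketched above, such a migration
could map each scope to an equivalent selector; the operator names such as
`Exists` and `DoesNotExist` are assumptions.

```
// Hypothetical mapping from scopes to the FieldSelector sketch above,
// reusing the scope constants from the earlier matching sketch.
func fieldSelectorForScope(scope ResourceQuotaScope) FieldSelector {
	switch scope {
	case ScopeTerminating:
		return FieldSelector{APIVersion: "v1", Kind: "Pod", Requirements: []FieldSelectorRequirement{
			{Field: "spec.activeDeadlineSeconds", Operator: "Exists"},
		}}
	case ScopeNotTerminating:
		return FieldSelector{APIVersion: "v1", Kind: "Pod", Requirements: []FieldSelectorRequirement{
			{Field: "spec.activeDeadlineSeconds", Operator: "DoesNotExist"},
		}}
	case ScopeBestEffort:
		return FieldSelector{APIVersion: "v1", Kind: "Pod", Requirements: []FieldSelectorRequirement{
			{Field: "status.qualityOfService", Operator: "In", Values: []string{"BestEffort"}},
		}}
	case ScopeNotBestEffort:
		return FieldSelector{APIVersion: "v1", Kind: "Pod", Requirements: []FieldSelectorRequirement{
			{Field: "status.qualityOfService", Operator: "NotIn", Values: []string{"BestEffort"}},
		}}
	}
	return FieldSelector{}
}
```
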
This proposal previously discussed a solution that leveraged a
`LabelSelector` as a mechanism to partition quota. This is potentially
interesting to explore in the future to allow `namespace-admins` to
quota workloads based on local knowledge. For example, a quota
could match all kinds that match the selector
`tier=cache, environment in (dev, qa)` separately from quota that
matched `tier=cache, environment in (prod)`. However, labels are
insufficient selection targets for `cluster-administrators` to control
footprint. In those instances, you need fields that are cluster-controlled
and not user-defined.

Review discussion on labels vs fields:

> This general issue seems to keep coming up... It seems like we need label-namespace-level ACLs on labels, so we can then create a label namespace to house non-user-modifiable system-generated labels. I wonder if we could even get rid of the concept of field selector entirely if we do that, if we publish every field as a label (?!?)...

> I am not sure I want the ability to publish every field as a label, but I do think it may make some sense to let you put system style labels in a different space with different ACLs. As part of defining your API object, you can write the projection code that promotes specific fields on an object to that special label-space. That said, even with this capability, I do think scopes are easier to understand for common use cases.

> We shouldn't publish every field as a label. ACLs on labels and annotations would be useful.

> What is the downside of publishing every field as a label (in a separate namespace)? It would allow us to unify field selectors and label selectors (both in use and in implementation/code). If it's in a special namespace we could hide it when displaying kubectl unless you ask to see that namespace. It would eliminate other kinds of redundancies too, for example having to publish QoS both as a field and as a label (the latter so you can use it in MetadataPolicy, the former for reasons @derekwaynecarr will remember but not me).

## Example

### Scenario 1

The cluster-admin wants to restrict the following:

* limit 2 best-effort pods
* limit 2 terminating pods that cannot use more than 1Gi of memory and 2 cpu cores
* limit 4 long-running pods that cannot use more than 4Gi of memory and 4 cpu cores
* limit 6 pods in total, 10 replication controllers

This would require the following quotas to be added to the namespace:

```
$ cat quota-best-effort
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota-best-effort
spec:
  hard:
    pods: "2"
  scopes:
  - BestEffort

$ cat quota-terminating
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota-terminating
spec:
  hard:
    pods: "2"
    memory.limit: 1Gi
    cpu.limit: 2
  scopes:
  - Terminating
  - NotBestEffort

$ cat quota-longrunning
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota-longrunning
spec:
  hard:
    pods: "4"
    memory.limit: 4Gi
    cpu.limit: 4
  scopes:
  - NotTerminating
  - NotBestEffort

$ cat quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota
spec:
  hard:
    pods: "6"
    replicationcontrollers: "10"
```

In the above scenario, every pod creation will result in its usage being
tracked by `quota`, since it has no additional scoping. The pod will then
be tracked by exactly one additional quota object based on the scopes it
matches. In order for the pod creation to succeed, it must not violate
the constraint of any matching quota. For example, a best-effort pod
would only be created if there is available quota in both `quota-best-effort`
and `quota`.

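A minimal sketch of the admission-time check implied above, reusing the
scope-matching helpers from the earlier sketch; resource arithmetic is
reduced to simple counters keyed by resource name rather than the real
quantity types, and the names are illustrative only.

```
// Illustrative admission check, building on podInfo and matchesAllScopes
// from the scope-matching sketch above.
type quotaDoc struct {
	scopes []ResourceQuotaScope // empty means the quota matches all pods
	hard   map[string]int64
	used   map[string]int64
}

// admitPod returns true only if adding podUsage would keep every matching
// quota within its hard limits.
func admitPod(p podInfo, podUsage map[string]int64, quotas []quotaDoc) bool {
	for _, q := range quotas {
		if !matchesAllScopes(p, q.scopes) {
			continue
		}
		for name, hard := range q.hard {
			if q.used[name]+podUsage[name] > hard {
				return false
			}
		}
	}
	return true
}
```
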
## Implementation

### Assignee

@derekwaynecarr

### Work Items

* Add support for requests and limits
* Add support for scopes in quota-related admission and controller code

## Dependencies

None.

Longer term, we should evaluate what we want to do with `fieldSelector`, as
requests for different quota semantics will continue to grow.

## Testing

Appropriate unit and e2e tests will be authored.

## Documentation Impact

Existing resource quota documentation and examples will be updated.

Additional review discussion:

> For this whole section, is the key reason admins treat these pods differently in policies due to the duration, or is duration merely correlated with importance? That is, admins set aside fewer guaranteed resources for these short-duration jobs because they can generally afford brief downtime, not merely because they'll finish quickly. It just so happens that most such jobs can finish quickly.

> Typically, batch-style workloads consume as much resource as they can until something applies the brakes, while transactional web applications consume resource in response to user requests. Transactional web applications tend to stay put once scheduled until a deployment occurs; batch-style workloads cause more bursts in scheduling. Cluster administrators coming from a traditional PaaS background often plan their clusters with a view to long-running workloads. Admins in this case want to prioritize their cluster planning around transactional web applications that tend not to move once scheduled and that scale based on incoming traffic they can measure. They are often willing to provide higher quota limits for those workloads because most applications never actually consume up to their limit. Batch workloads, on the other hand, are greedy and often consume up to their limit until done, and therefore may have a greater impact on what is actually consumed on the node to which they are scheduled.

> Batch workloads may use many more resources than requested, but at a lower QoS level, with throughput more important than latency. They also often have weaker availability requirements, and may be deferred in time. Additionally, because they are bounded in duration, they are launched more often, and a rate limit on creation is needed.