Loosen the meaning of status in API conventions (#5842)

* Loosen the meaning of status in API conventions

  As per KEP kubernetes/enhancements#2527 - remove the "from observation"
  guidance and add some discussion of when to use status fields vs. additional
  types for allocated resources.

* Backtick spec and status

* Act on feedback
kubernetes · Jan 10, 2022 · b20f613
1 parent f058d9c
commit b20f613
Showing 1 changed file with 156 additions and 29 deletions.
diff --git a/contributors/devel/sig-architecture/api-conventions.md b/contributors/devel/sig-architecture/api-conventions.md
@@ -150,7 +150,7 @@ sub-resources. Common subresources include:
    * `/binding`: Used to bind a resource representing a user request (e.g., Pod,
 PersistentVolumeClaim) to a cluster infrastructure resource (e.g., Node,
 PersistentVolume).
-   * `/status`: Used to write just the status portion of a resource. For
+   * `/status`: Used to write just the `status` portion of a resource. For
 example, the `/pods` endpoint only allows updates to `metadata` and `spec`,
 since those reflect end-user intent. An automated process should be able to
 modify status for users to see by sending an updated Pod kind to the server to
@@ -250,25 +250,35 @@ tooling to decorate objects with additional metadata for their own use.
 #### Spec and Status
 
 By convention, the Kubernetes API makes a distinction between the specification
-of the desired state of an object (a nested object field called "spec") and the
+of the desired state of an object (a nested object field called `spec`) and the
 status of the object at the current time (a nested object field called
-"status"). The specification is a complete description of the desired state,
+`status`). The specification is a complete description of the desired state,
 including configuration settings provided by the user,
 [default values](#defaulting) expanded by the system, and properties initialized
 or otherwise changed after creation by other ecosystem components (e.g.,
 schedulers, auto-scalers), and is persisted in stable storage with the API
 object. If the specification is deleted, the object will be purged from the
-system. The status summarizes the current state of the object in the system, and
-is usually persisted with the object by automated processes but may be
-generated on the fly. At some cost and perhaps some temporary degradation in
-behavior, the status could be reconstructed by observation if it were lost.
-
-When a new version of an object is POSTed or PUT, the "spec" is updated and
-available immediately. Over time the system will work to bring the "status" into
-line with the "spec". The system will drive toward the most recent "spec"
-regardless of previous versions of that stanza. In other words, if a value is
+system.
+
+The `status` summarizes the current state of the object in the system, and is
+usually persisted with the object by automated processes but may be generated
+on the fly.  As a general guideline, fields in `status` should be the most recent
+observations of actual state, but they may contain information such as the
+results of allocations or similar operations which are executed in response to
+the object's `spec`.  See [below](#representing-allocated-values) for more
+details.
+
+Types with both `spec` and `status` stanzas can (and usually should) have distinct
+authorization scopes for them.  This allows users to be granted full write
+access to `spec` and read-only access to `status`, while relevant controllers are
+granted read-only access to `spec` but full write access to `status`.
+
+When a new version of an object is POSTed or PUT, the `spec` is updated and
+available immediately. Over time the system will work to bring the `status` into
+line with the `spec`. The system will drive toward the most recent `spec`
+regardless of previous versions of that stanza. For example, if a value is
 changed from 2 to 5 in one PUT and then back down to 3 in another PUT the system
-is not required to 'touch base' at 5 before changing the "status" to 3. In other
+is not required to 'touch base' at 5 before changing the `status` to 3. In other
 words, the system's behavior is *level-based* rather than *edge-based*. This
 enables robust behavior in the presence of missed intermediate state changes.
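
The level-based behavior described above can be illustrated with a minimal Go sketch. The `Widget` type and its `Replicas` fields are hypothetical and exist only for this illustration; the point is that `reconcile` compares the latest `spec` with the current `status` and never consults the history of `spec` changes.

```go
package main

import "fmt"

// Widget is a hypothetical object with a desired and an observed replica count.
type Widget struct {
	Spec   WidgetSpec
	Status WidgetStatus
}

type WidgetSpec struct{ Replicas int }
type WidgetStatus struct{ Replicas int }

// reconcile is level-based: it looks only at the latest spec and the current
// status, and moves status one step closer to spec. It reports whether it
// changed anything.
func reconcile(w *Widget) bool {
	switch {
	case w.Status.Replicas < w.Spec.Replicas:
		w.Status.Replicas++
	case w.Status.Replicas > w.Spec.Replicas:
		w.Status.Replicas--
	default:
		return false // already converged
	}
	return true
}

func main() {
	w := &Widget{Spec: WidgetSpec{Replicas: 2}, Status: WidgetStatus{Replicas: 2}}
	w.Spec.Replicas = 5 // first PUT
	w.Spec.Replicas = 3 // second PUT arrives before the controller runs

	for reconcile(w) {
	}
	fmt.Println(w.Status.Replicas) // status converges on the latest spec
}
```

Because the controller never saw the intermediate value, `status` moves from 2 to 3 without passing through 5.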
 
@@ -279,8 +289,8 @@ specification should have declarative rather than imperative names and
 semantics -- they represent the desired state, not actions intended to yield the
 desired state.
 
-The PUT and POST verbs on objects MUST ignore the "status" values, to avoid
-accidentally overwriting the status in read-modify-write scenarios. A `/status`
+The PUT and POST verbs on objects MUST ignore the `status` values, to avoid
+accidentally overwriting the `status` in read-modify-write scenarios. A `/status`
 subresource MUST be provided to enable system components to update statuses of
 resources they manage.
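
As a rough sketch of why the main endpoint ignores `status` (and why `/status` writes only `status`), consider the read-modify-write scenario below. The `Object` type and the `updateMain` and `updateStatus` helpers are hypothetical stand-ins for server-side update handling, not actual apiserver code.

```go
package main

import "fmt"

// Object is a hypothetical resource with the two conventional stanzas.
type Object struct {
	Spec   string
	Status string
}

// updateMain models PUT on the main endpoint: the incoming spec is accepted,
// but whatever status the client sent is replaced by the stored status.
func updateMain(stored, incoming Object) Object {
	incoming.Status = stored.Status
	return incoming
}

// updateStatus models PUT on the /status subresource: only status is taken
// from the incoming object; the stored spec is kept.
func updateStatus(stored, incoming Object) Object {
	stored.Status = incoming.Status
	return stored
}

func main() {
	stored := Object{Spec: "replicas=2", Status: "ready=1"}

	// A user reads the object, and then a controller updates status.
	userCopy := stored
	stored = updateStatus(stored, Object{Status: "ready=2"})

	// The user writes back a modified spec together with the stale status.
	userCopy.Spec = "replicas=3"
	stored = updateMain(stored, userCopy)

	fmt.Printf("%+v\n", stored)
}
```

The stored object ends up with the user's new `spec` and the controller's newer `status`; the stale `status` in the user's copy is discarded.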
 
@@ -295,21 +305,20 @@ alternative resource representations that allow mutation of the status, or
 performing custom actions on the object.
 
 All objects that represent a physical resource whose state may vary from the
-user's desired intent SHOULD have a "spec" and a "status". Objects whose state
-cannot vary from the user's desired intent MAY have only "spec", and MAY rename
-"spec" to a more appropriate name.
+user's desired intent SHOULD have a `spec` and a `status`. Objects whose state
+cannot vary from the user's desired intent MAY have only `spec`, and MAY rename
+`spec` to a more appropriate name.
 
-Objects that contain both spec and status should not contain additional
+Objects that contain both `spec` and `status` should not contain additional
 top-level fields other than the standard metadata fields.
 
 Some objects which are not persisted in the system - such as `SubjectAccessReview`
-and other webhook style calls - may choose to add spec and status to encapsulate
-a "call and response" pattern. The spec is the request (often a request for
-information) and the status is the response. For these RPC like objects the only
+and other webhook style calls - may choose to add `spec` and `status` to encapsulate
+a "call and response" pattern. The `spec` is the request (often a request for
+information) and the `status` is the response. For these RPC like objects the only
 operation may be POST, but having a consistent schema between submission and
 response reduces the complexity of these clients.
 
-
 ##### Typical status properties
 
 **Conditions** provide a standard mechanism for higher-level status reporting
@@ -343,7 +352,7 @@ Conditions are most useful when they follow some consistent conventions:
 
    * Not all controllers will observe the previous advice about reporting
      "Unknown" or "False" values. For known conditions, the absence of a
-     condition status should be interpreted the same as `Unknown`, and
+     condition `status` should be interpreted the same as `Unknown`, and
      typically indicates that reconciliation has not yet finished (or that the
      resource state may not yet be observable).
 
@@ -365,10 +374,10 @@ Conditions are most useful when they follow some consistent conventions:
   resource, rather than describing the current state transitions. This
   typically means that the name should be an adjective ("Ready", "OutOfDisk")
   or a past-tense verb ("Succeeded", "Failed") rather than a present-tense verb
-  ("Deploying"). Intermediate states may be indicated by setting the status of
+  ("Deploying"). Intermediate states may be indicated by setting the `status` of
   the condition to `Unknown`.
 
-  * For state transitions which take a long period of time (rule of thumb: > 1
+  * For state transitions which take a long period of time (e.g. more than 1
     minute), it is reasonable to treat the transition itself as an observed
     state. In these cases, the Condition (such as "Resizing") itself should not
     be transient, and should instead be signalled using the
@@ -414,7 +423,7 @@ can cause a large fan-out effect for some resources.
 Condition types should be named in PascalCase. Short condition names are
 preferred (e.g. "Ready" over "MyResourceReady").
 
-Condition status values may be `True`, `False`, or `Unknown`. The absence of a
+Condition `status` values may be `True`, `False`, or `Unknown`. The absence of a
 condition should be interpreted the same as `Unknown`.  How controllers handle
 `Unknown` depends on the Condition in question.
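
A small Go sketch of the "absent means `Unknown`" rule follows. The `Condition` and `ConditionStatus` types are simplified local definitions written for this example, and `statusOf` is a hypothetical helper name.

```go
package main

import "fmt"

// ConditionStatus and Condition are simplified local types for illustration.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

type Condition struct {
	Type   string
	Status ConditionStatus
}

// statusOf returns the status of the named condition. A condition that is
// missing, or present with an empty status, is reported as Unknown.
func statusOf(conditions []Condition, condType string) ConditionStatus {
	for _, c := range conditions {
		if c.Type == condType {
			if c.Status == "" {
				return ConditionUnknown
			}
			return c.Status
		}
	}
	return ConditionUnknown
}

func main() {
	conditions := []Condition{{Type: "Ready", Status: ConditionTrue}}
	fmt.Println(statusOf(conditions, "Ready"))
	fmt.Println(statusOf(conditions, "Resizing")) // not reported yet
}
```

How a caller reacts to `Unknown` still depends on the condition in question; the helper only normalizes the lookup.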
 
@@ -486,7 +495,7 @@ referring object's status.
 
 For references to specific objects, see [Object references](#object-references).
 
-References in the status of the referee to the referrer may be permitted, when
+References in the `status` of the referee to the referrer may be permitted, when
 the references are one-to-one and do not need to be frequently updated,
 particularly in an edge-based manner.
 
@@ -1671,7 +1680,6 @@ be less than 256", "must be greater than or equal to 0".  Do not use words
 like "larger than", "bigger than", "more than", "higher than", etc.
 * When specifying numeric ranges, use inclusive ranges when possible.
 
-
 ## Automatic Resource Allocation And Deallocation
 
 API objects often are [union](#Unions) object containing the following:
@@ -1694,3 +1702,122 @@ allocates resources such as `NodePorts` and `ClusterIPs` and automatically fill
 represent them in case of the service is of type `NodePort` or `ClusterIP` (`discriminator` values).
 These resources and the fields representing them are automatically cleared when  the users changes
 service type to `ExternalName` where these resources and field values no longer apply.
+
+## Representing Allocated Values
+
+Many API types include values that are allocated on behalf of the user from
+some larger space (e.g. IP addresses from a range, or storage bucket names).
+These allocations are usually driven by controllers asynchronously to the
+user's API operations.  Sometimes the user can request a specific value and a
+controller must confirm or reject that request.  There are many examples of
+this in Kubernetes, and there are a handful of patterns used to represent it.
+
+The common theme among all of these is that the system should not trust users
+with such fields, and must verify or otherwise confirm such requests before
+using them.
+
+Some examples:
+
+* Service `clusterIP`: Users may request a specific IP in `spec` or will be
+  allocated one (in the same `spec` field).  If a specific IP is requested, the
+  apiserver will either confirm that IP is available or, failing that, will
+  reject the API operation synchronously (rare).  Consumers read the result
+  from `spec`.  This is safe because the value is either valid or it is never
+  stored.
+* Service `loadBalancerIP`: Users may request a specific IP in `spec` or will
+  be allocated one which is reported in `status`.  If a specific IP is
+  requested, the LB controller will either ensure that IP is available or
+  report failure asynchronously.  Consumers read the result from `status`.
+  This is safe because most users do not have access to write to `status`.
+* PersistentVolumeClaims: Users may request a specific PersistentVolume in
+  `spec` or will be allocated one (in the same `spec` field).  If a specific PV
+  is requested, the volume controller will either ensure that the volume is
+  available or report failure asynchronously.  Consumers read the result by
+  examining both the PVC and the PV.  This is more complicated than the others
+  because the `spec` value is stored before being confirmed, which could lead
+  to a user accessing someone else's PV (only hypothetically, thanks to extra
+  checking).
+* VolumeSnapshots: Users may request a particular source to be snapshotted in
+  `spec`.  The details of the resulting snapshot are reflected in `status`.
+
+A counter-example:
+
+* Service `externalIPs`: Users must specify one or more specific IPs in `spec`.
+  The system cannot easily verify those IPs (by their definition, they are
+  external). Consumers read the result from `spec`.  This is UNSAFE and has
+  caused problems with untrusted users.
+
+In the past, API conventions dictated that `status` fields always come from
+observation, which made some of these cases more complicated than necessary.
+The conventions have been updated to allow `status` to hold such allocated
+values.  This is not a one-size-fits-all solution, though.
+
+### When to use a `spec` field
+
+New APIs should almost never store allocated values in `spec`; they should use
+`status` instead.  PersistentVolumes might have been simpler had they done so.
+
+### When to use a `status` field
+
+Storing such values in `status` is the easiest and most straightforward
+pattern.  This is appropriate when:
+
+* the allocated value is highly coupled to the rest of the object (e.g. pod
+  resource allocations)
+* the allocated value is always or almost always needed (i.e. most instances of
+  this type will have a value)
+* the schema and controller are known a priori (i.e. it's not an extension)
+* it is "safe" to allow the controller(s) to write to `status` (i.e.
+  there's low risk of them causing problems via other `status` fields).
+
+Consumers of such values can look at the `status` field for the "final" value
+or an error or condition indicating why the allocation could not be performed.
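
One way a consumer might read an allocated value from `status` is sketched below. The `ExampleStatus` type, its `AllocatedIP` field, and the `Allocated` condition type are all hypothetical names chosen for illustration; a real API would define its own.

```go
package main

import "fmt"

// Condition is a simplified local type for illustration.
type Condition struct {
	Type    string
	Status  string // "True", "False", or "Unknown"
	Message string
}

// ExampleStatus is a hypothetical status stanza holding an allocated value.
type ExampleStatus struct {
	AllocatedIP string
	Conditions  []Condition
}

// allocationResult returns the final allocated value if there is one.
// Otherwise it returns a description of why no value is available yet.
func allocationResult(st ExampleStatus) (value string, problem string) {
	if st.AllocatedIP != "" {
		return st.AllocatedIP, ""
	}
	for _, c := range st.Conditions {
		if c.Type == "Allocated" && c.Status == "False" {
			return "", c.Message
		}
	}
	return "", "allocation pending"
}

func main() {
	ok := ExampleStatus{AllocatedIP: "10.0.0.7"}
	failed := ExampleStatus{Conditions: []Condition{
		{Type: "Allocated", Status: "False", Message: "range exhausted"},
	}}

	fmt.Println(allocationResult(ok))
	fmt.Println(allocationResult(failed))
}
```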
+
+#### Sequencing operations
+
+Since almost everything is happening asynchronously to almost everything else,
+controller implementations should take care around the ordering of operations.
+For example, whether the controller updates a `status` field before or after it
+actuates a change depends on what guarantees need to be made to observers of
+the system.  In some cases, writing to a `status` field represents an
+acknowledgement or acceptance of a `spec` value, and it is OK to write it before
+actuation.  However, if it would be problematic for a client to observe the
+`status` value before it is actuated then the controller must actuate first and
+update `status` afterward.  In some rarer cases, controllers will need to
+acknowledge, then actuate, then update to a "final" value.
+
+Controllers must take care to consider how a `status` field will be handled in
+the case of interrupted control loops (e.g. controller crash and restart), and
+must act idempotently and consistently.  This is particularly important when
+using an informer-fed cache, which might not be updated with recent writes.
+Using a resourceVersion precondition to detect the "conflict" is the common
+pattern in this case.  See [this issue](http://issue.k8s.io/105199) for an
+example.
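
The resourceVersion precondition mentioned above can be sketched with an in-memory store. Everything here is an assumption made for illustration: `Store`, `Object`, `ErrConflict`, and `ensureAllocated` are hypothetical, and the integer `ResourceVersion` is a simplification (real resourceVersions are opaque strings). The sketch shows a controller working from a stale cached copy that detects the conflict, re-reads, and avoids overwriting a value written before it restarted.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict is returned when a write is based on a stale copy.
var ErrConflict = errors.New("conflict: resourceVersion is stale")

// Object is a hypothetical resource; Allocated stands in for a status field.
type Object struct {
	ResourceVersion int // simplified; real resourceVersions are opaque strings
	Allocated       string
}

// Store is a toy stand-in for the API server's storage.
type Store struct{ current Object }

func (s *Store) Get() Object { return s.current }

// UpdateStatus succeeds only if the caller's copy matches the stored version.
func (s *Store) UpdateStatus(obj Object) (Object, error) {
	if obj.ResourceVersion != s.current.ResourceVersion {
		return Object{}, ErrConflict
	}
	obj.ResourceVersion++
	s.current = obj
	return obj, nil
}

// ensureAllocated is idempotent: it writes candidate only if no value is
// recorded, and on conflict it re-reads and re-evaluates instead of
// blindly retrying the same write.
func ensureAllocated(s *Store, cached Object, candidate string) (Object, error) {
	obj := cached
	for attempt := 0; attempt < 3; attempt++ {
		if obj.Allocated != "" {
			return obj, nil // already allocated, nothing to do
		}
		obj.Allocated = candidate
		updated, err := s.UpdateStatus(obj)
		if err == nil {
			return updated, nil
		}
		if !errors.Is(err, ErrConflict) {
			return Object{}, err
		}
		obj = s.Get() // the cache was stale; start over from a fresh read
	}
	return Object{}, ErrConflict
}

func main() {
	// The store already holds a value written before a controller restart.
	store := &Store{current: Object{ResourceVersion: 2, Allocated: "10.0.0.7"}}
	stale := Object{ResourceVersion: 1} // the informer cache has not caught up

	got, err := ensureAllocated(store, stale, "10.0.0.9")
	fmt.Println(got.Allocated, err)
}
```

In this run the stale write is rejected, the fresh read shows the earlier allocation, and the existing value is kept rather than replaced.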
+
+### When to use a different type
+
+Storing allocated values in a different type is more complicated but also more
+flexible.  This is most appropriate when:
+
+* the allocated value is optional (i.e. many instances of this type will not
+  have a value at all)
+* the schema and controller are not known a priori (i.e. it's an extension)
+* the schema is sufficiently complicated (i.e. it doesn't make sense to burden
+  the main type with it)
+* access control for this type demands finer granularity than "all of status"
+* the lifecycle of the allocated value is different from the lifecycle of the
+  allocation holder
+
+Services and Endpoints could be considered a form of this pattern, as could
+PersistentVolumes and PersistentVolumeClaims.
+
+When using this pattern, you must account for the lifecycle of the allocated
+objects (who cleans them up and when) as well as the "linkage" between them and
+the main type (often using the same name, an object-ref field, or a selector).
+
+There will always be some cases which could follow either path, and these will
+need human evaluation to decide.  For example, Service `clusterIP` is highly
+coupled to the rest of Service and most instances use it.  But it also is
+strictly optional and has an increasingly complicated schema of related fields.
+An argument could be made for either path.