Loosen the meaning of status in API conventions (#5842)
* Loosen the meaning of status in API conventions

As per KEP kubernetes/enhancements#2527 -
remove the "from observation" guidance and add some discussion of when
to use status fields vs. additional types for allocated resources.

* Backtick spec and status

* Act on feedback
thockin committed Jan 10, 2022
1 parent f058d9c commit b20f613
Showing 1 changed file with 156 additions and 29 deletions.
contributors/devel/sig-architecture/api-conventions.md
@@ -150,7 +150,7 @@ sub-resources. Common subresources include:
* `/binding`: Used to bind a resource representing a user request (e.g., Pod,
PersistentVolumeClaim) to a cluster infrastructure resource (e.g., Node,
PersistentVolume).
* `/status`: Used to write just the `status` portion of a resource. For
example, the `/pods` endpoint only allows updates to `metadata` and `spec`,
since those reflect end-user intent. An automated process should be able to
modify status for users to see by sending an updated Pod kind to the server to
@@ -250,25 +250,35 @@ tooling to decorate objects with additional metadata for their own use.
#### Spec and Status

By convention, the Kubernetes API makes a distinction between the specification
of the desired state of an object (a nested object field called `spec`) and the
status of the object at the current time (a nested object field called
`status`). The specification is a complete description of the desired state,
including configuration settings provided by the user,
[default values](#defaulting) expanded by the system, and properties initialized
or otherwise changed after creation by other ecosystem components (e.g.,
schedulers, auto-scalers), and is persisted in stable storage with the API
object. If the specification is deleted, the object will be purged from the
system.

The `status` summarizes the current state of the object in the system, and is
usually persisted with the object by automated processes but may be generated
on the fly. As a general guideline, fields in `status` should be the most recent
observations of actual state, but they may contain information such as the
results of allocations or similar operations which are executed in response to
the object's `spec`. See [below](#representing-allocated-values) for more
details.

Types with both `spec` and `status` stanzas can (and usually should) have distinct
authorization scopes for them. This allows users to be granted full write
access to `spec` and read-only access to `status`, while relevant controllers are
granted read-only access to `spec` but full write access to `status`.

When a new version of an object is POSTed or PUT, the `spec` is updated and
available immediately. Over time the system will work to bring the `status` into
line with the `spec`. The system will drive toward the most recent `spec`
regardless of previous versions of that stanza. For example, if a value is
changed from 2 to 5 in one PUT and then back down to 3 in another PUT the system
is not required to 'touch base' at 5 before changing the `status` to 3. In other
words, the system's behavior is *level-based* rather than *edge-based*. This
enables robust behavior in the presence of missed intermediate state changes.
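
A minimal Python sketch of the level-based behavior described above, assuming a hypothetical `Widget` type with `spec.replicas` and `status.replicas` fields and a hypothetical `reconcile` function (neither is a real Kubernetes API). Because the controller only compares the current `spec` with the current `status`, the intermediate value 5 is never acted on:

```python
from dataclasses import dataclass, field


@dataclass
class WidgetSpec:
    replicas: int = 0  # desired state, written by the user


@dataclass
class WidgetStatus:
    replicas: int = 0  # observed state, written by the controller


@dataclass
class Widget:
    spec: WidgetSpec = field(default_factory=WidgetSpec)
    status: WidgetStatus = field(default_factory=WidgetStatus)


def reconcile(obj: Widget, actions: list) -> None:
    """Level-based step: compare the *current* spec with the current status."""
    if obj.status.replicas != obj.spec.replicas:
        # "Actuate" by moving straight to the desired level, then record it.
        obj.status.replicas = obj.spec.replicas
        actions.append(obj.status.replicas)


w = Widget()
actions = []
w.spec.replicas = 2
reconcile(w, actions)   # controller runs and reaches 2
w.spec.replicas = 5     # first PUT...
w.spec.replicas = 3     # ...second PUT lands before the controller runs again
reconcile(w, actions)   # controller sees only the latest level
print(actions)          # [2, 3]: it never "touched base" at 5
```

A real controller would run this comparison in a loop driven by watch events; the point here is only that nothing in `reconcile` depends on the sequence of earlier `spec` values.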

@@ -279,8 +289,8 @@ specification should have declarative rather than imperative names and
semantics -- they represent the desired state, not actions intended to yield the
desired state.

The PUT and POST verbs on objects MUST ignore the `status` values, to avoid
accidentally overwriting the `status` in read-modify-write scenarios. A `/status`
subresource MUST be provided to enable system components to update statuses of
resources they manage.
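
The following sketch shows one way a server could enforce this split. It assumes hypothetical `prepare_for_update` and `prepare_for_status_update` hooks that run before an object is persisted (illustrative names, not the real apiserver API). A write to the main resource keeps the stored `status`, and a write to `/status` keeps the stored `spec`:

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Obj:
    spec: dict = field(default_factory=dict)
    status: dict = field(default_factory=dict)


def prepare_for_update(new: Obj, old: Obj) -> Obj:
    """PUT on the main resource: accept the new spec, ignore any status sent."""
    result = copy.deepcopy(new)
    result.status = copy.deepcopy(old.status)
    return result


def prepare_for_status_update(new: Obj, old: Obj) -> Obj:
    """PUT on /status: accept the new status, ignore any spec sent."""
    result = copy.deepcopy(new)
    result.spec = copy.deepcopy(old.spec)
    return result


stored = Obj(spec={"replicas": 2}, status={"readyReplicas": 2})

# A user did a read-modify-write from a stale copy; honoring its status
# would clobber what the controller last reported.
after_user = prepare_for_update(
    Obj(spec={"replicas": 4}, status={"readyReplicas": 0}), stored)

# A controller writes through /status; its copy of spec is ignored.
after_controller = prepare_for_status_update(
    Obj(spec={"replicas": 99}, status={"readyReplicas": 4}), after_user)

print(after_controller)  # Obj(spec={'replicas': 4}, status={'readyReplicas': 4})
```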

@@ -295,21 +305,20 @@ alternative resource representations that allow mutation of the status, or
performing custom actions on the object.

All objects that represent a physical resource whose state may vary from the
user's desired intent SHOULD have a `spec` and a `status`. Objects whose state
cannot vary from the user's desired intent MAY have only `spec`, and MAY rename
`spec` to a more appropriate name.

Objects that contain both `spec` and `status` should not contain additional
top-level fields other than the standard metadata fields.

Some objects which are not persisted in the system - such as `SubjectAccessReview`
and other webhook style calls - may choose to add `spec` and `status` to encapsulate
a "call and response" pattern. The `spec` is the request (often a request for
information) and the `status` is the response. For these RPC like objects the only
operation may be POST, but having a consistent schema between submission and
response reduces the complexity of these clients.
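
A rough sketch of the call-and-response pattern. It assumes a hypothetical, simplified `AccessReview` type loosely modeled on `SubjectAccessReview`, plus a hypothetical in-memory `POLICY`. The POST handler returns the request `spec` unchanged, fills in `status`, and stores nothing:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessReviewSpec:
    user: str
    verb: str
    resource: str


@dataclass
class AccessReviewStatus:
    allowed: bool
    reason: str


@dataclass
class AccessReview:
    spec: AccessReviewSpec                       # the request
    status: Optional[AccessReviewStatus] = None  # the response


# Hypothetical policy: the (user, verb, resource) tuples that are permitted.
POLICY = {("alice", "get", "pods")}


def create_review(review: AccessReview) -> AccessReview:
    """POST-only handler: echo the spec back with status filled in; store nothing."""
    s = review.spec
    allowed = (s.user, s.verb, s.resource) in POLICY
    reason = "matched policy" if allowed else "no matching rule"
    return AccessReview(spec=s, status=AccessReviewStatus(allowed, reason))


resp = create_review(AccessReview(AccessReviewSpec("alice", "get", "pods")))
print(resp.status)  # AccessReviewStatus(allowed=True, reason='matched policy')
```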

##### Typical status properties

**Conditions** provide a standard mechanism for higher-level status reporting
@@ -343,7 +352,7 @@ Conditions are most useful when they follow some consistent conventions:

* Not all controllers will observe the previous advice about reporting
"Unknown" or "False" values. For known conditions, the absence of a
condition `status` should be interpreted the same as `Unknown`, and
typically indicates that reconciliation has not yet finished (or that the
resource state may not yet be observable).

@@ -365,10 +374,10 @@ Conditions are most useful when they follow some consistent conventions:
resource, rather than describing the current state transitions. This
typically means that the name should be an adjective ("Ready", "OutOfDisk")
or a past-tense verb ("Succeeded", "Failed") rather than a present-tense verb
("Deploying"). Intermediate states may be indicated by setting the `status` of
the condition to `Unknown`.

* For state transitions which take a long period of time (e.g. more than 1
minute), it is reasonable to treat the transition itself as an observed
state. In these cases, the Condition (such as "Resizing") itself should not
be transient, and should instead be signalled using the
@@ -414,7 +423,7 @@ can cause a large fan-out effect for some resources.
Condition types should be named in PascalCase. Short condition names are
preferred (e.g. "Ready" over "MyResourceReady").

Condition `status` values may be `True`, `False`, or `Unknown`. The absence of a
condition should be interpreted the same as `Unknown`. How controllers handle
`Unknown` depends on the Condition in question.
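
Because an absent condition means the same as `Unknown`, consumers can centralize the lookup. A minimal sketch, assuming a simplified `Condition` record (the fields shown are illustrative, not the full upstream type) and a hypothetical `condition_status` helper:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Condition:
    type: str    # PascalCase name, e.g. "Ready"
    status: str  # "True", "False", or "Unknown"
    reason: str = ""


def condition_status(conditions: List[Condition], cond_type: str) -> str:
    """Return the condition's status, treating a missing condition as "Unknown"."""
    for c in conditions:
        if c.type == cond_type:
            return c.status
    return "Unknown"


conds = [Condition(type="Ready", status="False", reason="PodsPending")]
print(condition_status(conds, "Ready"))        # False
print(condition_status(conds, "Progressing"))  # Unknown (condition is absent)
```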

@@ -486,7 +495,7 @@ referring object's status.

For references to specific objects, see [Object references](#object-references).

References in the `status` of the referee to the referrer may be permitted, when
the references are one-to-one and do not need to be frequently updated,
particularly in an edge-based manner.

@@ -1671,7 +1680,6 @@ be less than 256", "must be greater than or equal to 0". Do not use words
like "larger than", "bigger than", "more than", "higher than", etc.
* When specifying numeric ranges, use inclusive ranges when possible.

## Automatic Resource Allocation And Deallocation

API objects are often [union](#Unions) objects containing the following:
@@ -1694,3 +1702,122 @@ allocates resources such as `NodePorts` and `ClusterIPs` and automatically fill
represent them in case the service is of type `NodePort` or `ClusterIP` (`discriminator` values).
These resources and the fields representing them are automatically cleared when the user changes
the service type to `ExternalName`, where these resources and field values no longer apply.
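
A simplified sketch of that clearing step. It assumes a hypothetical `ServiceSpec` with only `type`, `cluster_ip`, and `node_port` (the real Service has more fields, and node ports live under `ports`), plus a hypothetical `clear_unused_allocations` function that records what it releases:

```python
from dataclasses import dataclass
from typing import Optional

# Simplified: which discriminator values (spec.type) use which allocated fields.
USES_CLUSTER_IP = {"ClusterIP", "NodePort", "LoadBalancer"}
USES_NODE_PORT = {"NodePort", "LoadBalancer"}


@dataclass
class ServiceSpec:
    type: str                         # the union discriminator
    cluster_ip: Optional[str] = None  # allocated by the system
    node_port: Optional[int] = None   # allocated by the system


def clear_unused_allocations(spec: ServiceSpec, released: list) -> None:
    """Release and clear allocated fields that no longer apply to spec.type."""
    if spec.type not in USES_CLUSTER_IP and spec.cluster_ip is not None:
        released.append(("clusterIP", spec.cluster_ip))
        spec.cluster_ip = None
    if spec.type not in USES_NODE_PORT and spec.node_port is not None:
        released.append(("nodePort", spec.node_port))
        spec.node_port = None


svc = ServiceSpec(type="NodePort", cluster_ip="10.96.0.10", node_port=30080)
released = []
svc.type = "ExternalName"  # the user switches the discriminator
clear_unused_allocations(svc, released)
print(svc)       # ServiceSpec(type='ExternalName', cluster_ip=None, node_port=None)
print(released)  # [('clusterIP', '10.96.0.10'), ('nodePort', 30080)]
```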

## Representing Allocated Values

Many API types include values that are allocated on behalf of the user from
some larger space (e.g. IP addresses from a range, or storage bucket names).
These allocations are usually driven by controllers asynchronously to the
user's API operations. Sometimes the user can request a specific value and a
controller must confirm or reject that request. There are many examples of
this in Kubernetes, and there are a handful of patterns used to represent it.

The common theme among all of these is that the system should not trust users
with such fields, and must verify or otherwise confirm such requests before
using them.

Some examples:

* Service `clusterIP`: Users may request a specific IP in `spec` or will be
allocated one (in the same `spec` field). If a specific IP is requested, the
apiserver will either confirm that IP is available or, failing that, will
reject the API operation synchronously (rare). Consumers read the result
from `spec`. This is safe because the value is either valid or it is never
stored.
* Service `loadBalancerIP`: Users may request a specific IP in `spec` or will
be allocated one which is reported in `status`. If a specific IP is
requested, the LB controller will either ensure that IP is available or
report failure asynchronously. Consumers read the result from `status`.
This is safe because most users do not have access to write to `status`.
* PersistentVolumeClaims: Users may request a specific PersistentVolume in
`spec` or will be allocated one (in the same `spec` field). If a specific PV
is requested, the volume controller will either ensure that the volume is
available or report failure asynchronously. Consumers read the result by
examining both the PVC and the PV. This is more complicated than the others
because the `spec` value is stored before being confirmed, which could lead
to a user accessing someone else's PV (only hypothetically, thanks to extra
checking).
* VolumeSnapshots: Users may request a particular source to be snapshotted in
`spec`. The details of the resulting snapshot are reflected in `status`.

A counter-example:

* Service `externalIPs`: Users must specify one or more specific IPs in `spec`.
The system cannot easily verify those IPs (by their definition, they are
external). Consumers read the result from `spec`. This is UNSAFE and has
caused problems with untrusted users.
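
The Service `clusterIP` pattern above (verify or allocate synchronously, otherwise reject the write) can be sketched as follows. `IPAllocator` and `AllocationError` are hypothetical names, and for simplicity the sketch hands out the first free address. It only shows why an invalid value never reaches storage:

```python
import ipaddress
from typing import Optional


class AllocationError(Exception):
    """Raised during the API write, so the request is rejected and nothing is stored."""


class IPAllocator:
    """Allocates addresses from one CIDR; explicit requests are verified first."""

    def __init__(self, cidr: str):
        self._network = ipaddress.ip_network(cidr)
        self._used = set()

    def allocate(self, requested: Optional[str] = None) -> str:
        if requested is not None:
            ip = ipaddress.ip_address(requested)
            if ip not in self._network or ip in self._used:
                raise AllocationError(f"{requested} is not available")
            self._used.add(ip)
            return str(ip)
        for ip in self._network.hosts():  # simplest policy: first free address
            if ip not in self._used:
                self._used.add(ip)
                return str(ip)
        raise AllocationError("range exhausted")


alloc = IPAllocator("10.96.0.0/29")
first = alloc.allocate()              # no request: the system picks one
pinned = alloc.allocate("10.96.0.5")  # request verified and granted
rejected = None
try:
    alloc.allocate("10.96.0.5")       # already taken: fail synchronously
except AllocationError as err:
    rejected = str(err)
print(first, pinned, rejected)  # 10.96.0.1 10.96.0.5 10.96.0.5 is not available
```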

In the past, API conventions dictated that `status` fields always come from
observation, which made some of these cases more complicated than necessary.
The conventions have been updated to allow `status` to hold such allocated
values. This is not a one-size-fits-all solution, though.

### When to use a `spec` field

New APIs should almost never store allocated values in `spec`. Instead, they
should use `status`. PersistentVolumes might have been simpler if we had done this.

### When to use a `status` field

Storing such values in `status` is the easiest and most straightforward
pattern. This is appropriate when:

* the allocated value is highly coupled to the rest of the object (e.g. pod
resource allocations)
* the allocated value is always or almost always needed (i.e. most instances of
this type will have a value)
* the schema and controller are known a priori (i.e. it's not an extension)
* it is "safe" to allow the controller(s) to write to `status` (i.e.
there's low risk of them causing problems via other `status` fields).

Consumers of such values can look at the `status` field for the "final" value
or an error or condition indicating why the allocation could not be performed.
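
A sketch of the `status`-field pattern. It assumes hypothetical `Spec`/`Status` records, a hypothetical `Allocated` condition type, and a plain list standing in for the pool of free values. The user's `spec` value is treated only as a request. The controller records either the confirmed value or a condition explaining the failure:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Spec:
    requested_ip: Optional[str] = None  # user input: a request, not trusted


@dataclass
class Status:
    allocated_ip: Optional[str] = None  # written only by the controller
    conditions: List[dict] = field(default_factory=list)


def set_allocated_condition(status: Status, value: str, reason: str) -> None:
    """Replace (not append) the hypothetical "Allocated" condition."""
    status.conditions = [c for c in status.conditions if c["type"] != "Allocated"]
    status.conditions.append({"type": "Allocated", "status": value, "reason": reason})


def reconcile_allocation(spec: Spec, status: Status, free: List[str]) -> None:
    """Confirm the request (or pick a value) and record the outcome in status."""
    if status.allocated_ip is not None:
        return  # already allocated; repeated calls are no-ops
    if spec.requested_ip is None and free:
        status.allocated_ip = free.pop(0)
    elif spec.requested_ip is not None and spec.requested_ip in free:
        free.remove(spec.requested_ip)
        status.allocated_ip = spec.requested_ip
    else:
        set_allocated_condition(status, "False", "Unavailable")
        return
    set_allocated_condition(status, "True", "Confirmed")


free = ["192.0.2.10", "192.0.2.11"]
ok_spec, ok_status = Spec(requested_ip="192.0.2.11"), Status()
reconcile_allocation(ok_spec, ok_status, free)

bad_spec, bad_status = Spec(requested_ip="203.0.113.7"), Status()
reconcile_allocation(bad_spec, bad_status, free)

print(ok_status.allocated_ip, bad_status.allocated_ip)  # 192.0.2.11 None
print(bad_status.conditions[0]["status"])               # False
```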

#### Sequencing operations

Since almost everything is happening asynchronously to almost everything else,
controller implementations should take care around the ordering of operations.
For example, whether the controller updates a `status` field before or after it
actuates a change depends on what guarantees need to be made to observers of
the system. In some cases, writing to a `status` field represents an
acknowledgement or acceptance of a `spec` value, and it is OK to write it before
actuation. However, if it would be problematic for a client to observe the
`status` value before it is actuated then the controller must actuate first and
update `status` afterward. In some rarer cases, controllers will need to
acknowledge, then actuate, then update to a "final" value.
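
The two orderings can be contrasted with a small sketch. `sync`, the `acknowledged`/`current` status keys, and the `fake_actuate` callback are all hypothetical. The recorded `events` list just makes the order visible:

```python
from typing import Callable, List


def sync(status: dict, desired: str, actuate: Callable[[str], str],
         events: List[str], acknowledge_first: bool) -> None:
    """Apply `desired`, ordering status writes around actuation."""
    if acknowledge_first:
        # Safe only if observers treat this as "accepted", not "done".
        status["acknowledged"] = desired
        events.append("status: acknowledged")
    result = actuate(desired)
    events.append("actuated")
    # Publishing the final value only after actuation means nobody can
    # observe it before it is true.
    status["current"] = result
    events.append("status: final")


def fake_actuate(value: str) -> str:
    # Stand-in for real work (e.g. programming a load balancer).
    return f"applied-{value}"


ack_status, ack_events = {}, []
sync(ack_status, "v2", fake_actuate, ack_events, acknowledge_first=True)

late_status, late_events = {}, []
sync(late_status, "v2", fake_actuate, late_events, acknowledge_first=False)

print(ack_events)   # ['status: acknowledged', 'actuated', 'status: final']
print(late_events)  # ['actuated', 'status: final']
```

With `acknowledge_first=True` this is the "acknowledge, then actuate, then update to a final value" sequence. With `False` it is "actuate first and update `status` afterward".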

Controllers must take care to consider how a `status` field will be handled in
the case of interrupted control loops (e.g. controller crash and restart), and
must act idempotently and consistently. This is particularly important when
using an informer-fed cache, which might not be updated with recent writes.
Using a `resourceVersion` precondition to detect the "conflict" is the common
pattern in this case. See [this issue](http://issue.k8s.io/105199) for an
example.
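
A toy illustration of the `resourceVersion` precondition. It assumes an in-memory `Store` that mimics the API server's optimistic-concurrency check (in the real API, a write with a stale `resourceVersion` fails with an HTTP 409 Conflict). The hypothetical `allocate_once` starts from a stale cached copy, hits the conflict, re-reads, and discovers that an earlier run already allocated a value:

```python
import copy


class Conflict(Exception):
    """The caller's resourceVersion is stale."""


class Store:
    """Toy object store with resourceVersion-based optimistic concurrency."""

    def __init__(self, obj: dict):
        self._obj = copy.deepcopy(obj)
        self._rv = 1

    def get(self):
        return copy.deepcopy(self._obj), self._rv

    def update(self, obj: dict, resource_version: int) -> int:
        if resource_version != self._rv:
            raise Conflict(f"stored version {self._rv}, caller had {resource_version}")
        self._obj = copy.deepcopy(obj)
        self._rv += 1
        return self._rv


def allocate_once(store: Store, cached: dict, cached_rv: int, candidate: str) -> str:
    """Record an allocation at most once, even when starting from a stale cache."""
    obj, rv = cached, cached_rv
    while True:
        existing = obj.get("status", {}).get("allocated")
        if existing is not None:
            return existing  # a previous run already allocated; reuse it
        updated = copy.deepcopy(obj)
        updated.setdefault("status", {})["allocated"] = candidate
        try:
            store.update(updated, rv)  # precondition: rv must still be current
            return candidate
        except Conflict:
            obj, rv = store.get()  # cache was stale: re-read and re-evaluate


store = Store({"status": {}})
stale_copy, stale_rv = store.get()  # what an informer cache might still hold

# An earlier run wrote an allocation and crashed before its cache caught up.
store.update({"status": {"allocated": "10.0.0.1"}}, 1)

result = allocate_once(store, stale_copy, stale_rv, candidate="10.0.0.2")
print(result)  # 10.0.0.1: the earlier allocation wins, no double allocation
```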

### When to use a different type

Storing allocated values in a different type is more complicated but also more
flexible. This is most appropriate when:

* the allocated value is optional (i.e. many instances of this type will not
have a value at all)
* the schema and controller are not known a priori (i.e. it's an extension)
* the schema is sufficiently complicated (i.e. it doesn't make sense to burden
the main type with it)
* access control for this type demands finer granularity than "all of status"
* the lifecycle of the allocated value is different from the lifecycle of the
allocation holder

Services and Endpoints could be considered a form of this pattern, as could
PersistentVolumes and PersistentVolumeClaims.

When using this pattern, you must account for the lifecycle of the allocated
objects (who cleans them up and when) as well as the "linkage" between them and
the main type (often using the same name, an object-ref field, or a selector).
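
A minimal sketch of the lifecycle concern. It assumes allocations are linked to their owners by sharing the same name, and that a hypothetical `collect_orphans` pass frees allocations whose owner no longer exists:

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Allocation:
    name: str   # linkage: same name as the owning object
    value: str  # the allocated value, e.g. an address


def collect_orphans(owners: Set[str], allocations: Dict[str, Allocation]) -> List[str]:
    """Delete allocations whose owner no longer exists; return the freed names."""
    # sorted() materializes the list before any deletion happens.
    orphaned = sorted(name for name in allocations if name not in owners)
    for name in orphaned:
        del allocations[name]
    return orphaned


owners = {"web", "db"}
allocations = {
    "web": Allocation("web", "10.0.0.1"),
    "db": Allocation("db", "10.0.0.2"),
    "old-job": Allocation("old-job", "10.0.0.3"),  # its owner was deleted
}
print(collect_orphans(owners, allocations))  # ['old-job']
print(sorted(allocations))                   # ['db', 'web']
```

In a real cluster, `metadata.ownerReferences` together with the built-in garbage collector is a common way to get this cleanup without a custom pass. The sketch only makes the name-based linkage explicit.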

There will always be some cases which could follow either path, and these will
need human evaluation to decide. For example, Service `clusterIP` is highly
coupled to the rest of Service and most instances use it. But it is also
strictly optional and has an increasingly complicated schema of related fields.
An argument could be made for either path.