From fc36fec10812398c5b42f04ac065f0ab2403f800 Mon Sep 17 00:00:00 2001
From: Kenneth Owens
Date: Thu, 23 Feb 2017 16:26:39 -0800
Subject: [PATCH 01/12] initial StatefulSet updates proposal

---
 .../design-proposals/statefulset-update.md | 850 ++++++++++++++++++
 1 file changed, 850 insertions(+)
 create mode 100644 contributors/design-proposals/statefulset-update.md

diff --git a/contributors/design-proposals/statefulset-update.md b/contributors/design-proposals/statefulset-update.md
new file mode 100644
index 00000000000..eceaa227c2f
--- /dev/null
+++ b/contributors/design-proposals/statefulset-update.md
@@ -0,0 +1,850 @@
+# StatefulSet Updates
+
+**Author**: kow3ns@
+
+**Status**: Proposal
+
+## Abstract
+Currently (as of Kubernetes 1.6), `.Spec.Replicas`, and
+`.Spec.Template.Containers` are the only mutable fields of the
+StatefulSet API object. Updating `.Spec.Replicas` will scale the number of Pods
+in the StatefulSet. Updating `.Spec.Template.Containers` causes all subsequently
+created Pods to have the specified containers. In order to cause the
+StatefulSet controller to apply its updated `.Spec`, users must manually delete
+each Pod. This manual method of applying updates is error prone. The
+implementation of this proposal will add the capability to perform ordered,
+automated, sequential updates.
+
+## Affected Components
+1. API Server
+1. Kubectl
+1. StatefulSet Controller
+1. StatefulSetSpec API object
+1. StatefulSetStatus API object
+
+## Use Cases
+Upon implementation, this design will support the following in scope use cases,
+and it will not rule out the future implementation of the out of scope use
+cases.
+
+### In Scope
+- As the administrator of a stateful application, in order to vertically scale
+my application, I want to update resource limits or requested resources.
+- As the administrator of a stateful application, in order to deploy critical
+security updates, break fix patches, and feature releases, I want to update
+container images.
+- As the administrator of a stateful application, in order to update my
+application's configuration, I want to update environment variables, container
+entry point commands or parameters, or configuration files.
+- As the administrator of the logging and monitoring infrastructure for my
+organization, in order to add logging and monitoring side cars, I want to patch
+containers to add images.
+
+### Out of Scope
+- As the administrator of a stateful application, in order to increase the
+application's storage capacity, I want to update PersistentVolumes.
+- As the administrator of a stateful application, in order to update the
+network configuration of the application, I want to update Services and
+container ports in a consistent way.
+- As the administrator of a stateful application, when I scale my application
+horizontally, I want associated PodDisruptionBudgets to be adjusted to
+compensate for the application's scaling.
+
+## Assumptions
+ - StatefulSet update must support singleton StatefulSets. However, an update in
+   this case will cause a temporary outage. This is acceptable as a single
+   process application is, by definition, not highly available.
+ - Disruption in Kubernetes is controlled by PodDisruptionBudgets. As
+   StatefulSet updates progress one Pod at a time, and only occur when all
+   other Pods have a Status of Running and a Ready Condition, they can not
+   violate reasonable PodDisruptionBudgets.
+ - Without priority and preemption, there is no guarantee that an update will
+   not block due to a loss of capacity or due to the scheduling of another Pod
+   between Pod termination and Pod creation. This is mitigated by blocking the
+   update when a Pod fails to schedule. Remediation will require operator
+   intervention. This implementation is no worse than the current behavior with
+   respect to eviction.
+ - We will eventually implement a signal that is delivered to Pods to indicate
+   the
+   [reason for termination](https://github.com/kubernetes/kubernetes/issues/1462).
+   This will be a general implementation, usable for any Pod in a Kubernetes
+   cluster. It is, therefore, out of scope to design such a mechanism here.
+ - Kubelet does not support resizing a container's resources without terminating
+   the Pod. In place resource reallocation is out of scope for this design.
+   Vertical scaling must be performed destructively.
+ - The primary means of configuration update will be configuration files,
+   command line flags, environment variables, or ConfigMaps consumed as one
+   of the former.
+ - In place configuration update via SIGHUP is not universally
+   supported, and Kubelet provides no mechanism to perform this currently. Pod
+   reconfiguration will be performed destructively.
+ - Stateful applications are likely to evolve wire protocols and storage formats
+   between versions. In most cases, when updating the application's Pod's
+   containers, it will not be safe to roll back or forward to an arbitrary
+   version. StatefulSet update should work well when rolling out an update,
+   or performing a rollback, between two specific revisions of the StatefulSet.
+
+## Requirements
+This design is based on the following requirements.
+- Users must be able to update the containers of a StatefulSet's Pods.
+  - Updates to container commands, images, resources and configuration must be
+    supported.
+- The update must progress in a sequential, deterministic order and respect the
+  StatefulSet
+  [identity](https://kubernetes.io/docs/concepts/abstractions/controllers/statefulsets/#pod-identity),
+  [deployment, and scaling](https://kubernetes.io/docs/concepts/abstractions/controllers/statefulsets/#deployment-and-scaling-guarantee)
+  guarantees.
+- A failed update must halt.
+- Users must be able to roll back an update.
+- Users must be able to roll forward to fix a failing/failed update.
+- Users must be able to view the status of an update.
+- Users should be able to view a bounded history of the updates that have been
+applied to the StatefulSet.
+
+## API Object
+
+The following modifications will be made to the StatefulSetStatus API object.
+
+```go
+type StatefulSetStatus struct {
+	// ObservedGeneration and Replicas fields are omitted for brevity.
+
+	// TemplateRevision, if not nil, is the revision of the PodTemplate
+	// that was used to create Pods with ordinals in the sequence
+	// [0,CurrentReplicas).
+	TemplateRevision *int64 `json:"templateRevision,omitempty"`
+
+	// TargetTemplateRevision, if not nil, is the revision of the PodTemplate
+	// that was used to create Pods with ordinals in the sequence
+	// [Replicas - TargetReplicas, Replicas).
+	TargetTemplateRevision *int64 `json:"targetTemplateRevision,omitempty"`
+
+	// ReadyReplicas is the current number of Pods, created by the StatefulSet
+	// controller, that have a Status of Running and a Ready Condition.
+	ReadyReplicas int32 `json:"readyReplicas,omitempty"`
+
+	// CurrentReplicas is the number of Pods created by the StatefulSet
+	// controller from the PodTemplateSpec indicated by TemplateRevision.
+	CurrentReplicas int32 `json:"currentReplicas,omitempty"`
+
+	// TargetReplicas is the number of Pods created by the StatefulSet
+	// controller from the PodTemplateSpec indicated by TargetTemplateRevision.
+	TargetReplicas int32 `json:"targetReplicas,omitempty"`
+}
+```
+
+The following modifications will be made to the StatefulSetSpec API object.
+
+```go
+type StatefulSetSpec struct {
+	// Replicas, Selector, VolumeClaimTemplates, and ServiceName
+	// omitted for brevity.
+	Template v1.PodTemplateSpec `json:"template"`
+
+	// TemplateRevision is a monotonically increasing, 64 bit, integer used to
+	// indicate the version of the PodTemplateSpec.
If nil, the
+	// StatefulSetController has not initialized its revision history,
+	// change tracking is not enabled, and all Pods will be created from
+	// Template.
+	TemplateRevision *int64 `json:"templateRevision"`
+
+	// RevisionPartition partitions the Pods in the StatefulSet by ordinal such
+	// that all Pods with a lower ordinal will be created from the PodTemplate that
+	// represents the current revision of the StatefulSet's revision history and
+	// all Pods with a greater or equal ordinal will be created from the
+	// PodTemplate that represents the target revision of the StatefulSet's
+	// revision history.
+	RevisionPartition *int32 `json:"revisionPartition,omitempty"`
+
+	// RevisionHistoryDepth is the maximum number of PodTemplates that will
+	// be maintained in the StatefulSet's revision history. It must be at
+	// least two.
+	RevisionHistoryDepth int32 `json:"revisionHistoryDepth,omitempty"`
+}
+```
+
+Additionally, we introduce the following constants.
+
+```go
+// StatefulSetPodTemplateLabel is the label applied to a PodTemplate to allow
+// the StatefulSet controller to select the PodTemplates in its revision
+// history.
+const StatefulSetPodTemplateLabel = "created-by-statefulset"
+
+// StatefulSetTemplateRevisionLabel is the label applied to a PodTemplate or
+// Pod to indicate the position of the object's Template in the revision
+// history of a StatefulSet.
+const StatefulSetTemplateRevisionLabel = "statefulset-template-revision"
+```
+
+## StatefulSet Controller
+The StatefulSet controller will watch for modifications to StatefulSet and Pod
+API objects. When a StatefulSet is created or updated, or when one
+of the Pods in a StatefulSet is updated or deleted, the StatefulSet
+controller will attempt to create, update, or delete Pods to conform the
+current state of the system to the user declared target state.
+The user declared target state of the system, with respect to an individual
+StatefulSet, is determined as below.
+
+### Target State
+The declared target state of a StatefulSet requires that all Pods in the
+StatefulSet conform to exactly one or two PodTemplates in the StatefulSet's
+revision history. If the declared target state references two PodTemplates, as
+is the case when a user wants to perform a canary update or a phased roll out,
+the Pods are partitioned around an ordinal such that all Pods with a lower
+ordinal conform to one PodTemplate and all Pods with a greater or equal ordinal
+conform to the other. The conditions that define this state in terms of the
+StatefulSet's StatefulSetSpec and StatefulSetStatus are below.
+
+1. The StatefulSet contains exactly one Pod for each ordinal in the sequence
+`[0,.Spec.Replicas)`.
+1. If the StatefulSet's `.Spec.RevisionPartition` is nil, then the following is
+true.
+    1. The StatefulSet's `.Status.TemplateRevision` is equal to its
+    `.Status.TargetTemplateRevision`.
+    1. All Pods in the StatefulSet have been generated from the PodTemplate
+    labeled with a `StatefulSetTemplateRevisionLabel` equal to its
+    `.Status.TemplateRevision`.
+1. If the StatefulSet's `.Spec.RevisionPartition` is not nil, then the following
+is true.
+    1. All Pods with ordinals in the sequence `[0,.Spec.RevisionPartition)` have
+    been generated from the PodTemplate in the StatefulSet's revision history
+    that is labeled with a `StatefulSetTemplateRevisionLabel` equal to
+    `.Status.TemplateRevision`.
+    1. All Pods with ordinals in the sequence
+    `[.Spec.RevisionPartition,.Spec.Replicas)` have been created with the
+    PodTemplate in the StatefulSet's revision history that is labeled with a
+    `StatefulSetTemplateRevisionLabel` equal to
+    `.Status.TargetTemplateRevision`.
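The target state conditions above reduce to a per-ordinal choice between the current and the target revision. The following is a minimal Go sketch of that choice, not the controller's implementation. The helper `revisionForOrdinal` and the use of plain integers for revisions are hypothetical, and treating a nil partition like a partition of `0` is an assumption made only for this illustration.

```go
package main

import "fmt"

// revisionForOrdinal is a hypothetical helper, not part of the proposed API.
// It returns the revision that the Pod with the given ordinal should be at in
// the declared target state. Ordinals below the partition stay at the current
// revision; ordinals at or above it move to the target revision.
// Assumption: a nil partition is treated like a partition of 0.
func revisionForOrdinal(ordinal int32, partition *int32, current, target int64) int64 {
	if partition != nil && ordinal < *partition {
		return current
	}
	return target
}

func main() {
	// A canary for a 3 replica StatefulSet: the partition is replicas - 1.
	partition := int32(2)
	for ordinal := int32(0); ordinal < 3; ordinal++ {
		rev := revisionForOrdinal(ordinal, &partition, 1, 2)
		fmt.Printf("web-%d -> revision %d\n", ordinal, rev)
	}
}
```

With a partition of `2` and three replicas, only `web-2` is assigned the target revision, which is the canary case described later in this proposal.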
+
+### Revised Controller Algorithm
+The StatefulSet controller will use the following algorithm to continue to
+make progress toward the user declared [target state](#target-state) while
+respecting the controller's
+[identity](https://kubernetes.io/docs/concepts/abstractions/controllers/statefulsets/#pod-identity),
+[deployment, and scaling](https://kubernetes.io/docs/concepts/abstractions/controllers/statefulsets/#deployment-and-scaling-guarantee)
+guarantees.
+
+1. The controller will
+[reconstruct the revision history](#history-reconstruction) of the StatefulSet.
+1. The controller will process any [template updates](#template-updates) to
+ensure that the StatefulSet's revision history is consistent with the user
+declared desired state.
+1. The controller will select all Pods in the StatefulSet, filter out any Pods
+not owned by the StatefulSet, and sort the remaining Pods in ordinal order.
+1. If any Pods with ordinals in the sequence `[0,.Spec.Replicas)` have not been
+created, for the Pod corresponding to the lowest such ordinal, the controller
+will [select the PodTemplate](#podtemplate-selection) from the StatefulSet's
+revision history corresponding to the Pod's ordinal and create the Pod.
+1. If all Pods in the sequence `[0,.Spec.Replicas)` have been created, but any
+have a Status other than Running or do not have a Ready Condition, the
+StatefulSet controller will wait for these Pods to either become Running and
+Ready, or to be completely deleted.
+1. If all Pods in the sequence `[0,.Spec.Replicas)` have a Status of Running and
+a Condition indicating Ready, and if `.Spec.Replicas` is less than
+`.Status.Replicas`, the controller will delete the Pod corresponding to the
+largest ordinal. This implies that scaling takes precedence over Pod updates.
+1.
If all Pods in the range `[0,.Spec.Replicas)` have a Status of Running and
+a Ready Condition, if `.Spec.Replicas` is equal to `.Status.Replicas`, and if
+there are Pods that do not match the
+[declared desired PodTemplate](#podtemplate-selection), the Pod corresponding to
+the largest ordinal will be deleted.
+1. If the StatefulSet controller has achieved the
+[declared target state](#target-state), and if that state has a
+`.Spec.RevisionPartition` equal to `0`, the StatefulSet controller will
+[complete any in progress updates](#update-completion).
+1. The controller will [report its status](#status-reporting).
+1. The controller will perform any necessary
+[maintenance of its revision history](#history-maintenance).
+
+### StatefulSet Revision History
+The StatefulSet controller will use labeled, versioned PodTemplates to keep a
+history of updates performed on a StatefulSet. The number of stored PodTemplates
+is considered to be the depth of the StatefulSet's revision history. The
+maximum revision history depth for a StatefulSet must be at least two, but it
+may be greater.
+
+#### PodTemplate Creation
+When the StatefulSet controller creates a PodTemplate for a StatefulSet, it will
+do the following.
+
+1. The controller will set the PodTemplate's `.PodTemplateSpec` field to the
+StatefulSet's `.Spec.Template` field.
+1. The controller will create a ControllerRef object in the PodTemplate's
+`.OwnerReferences` list to mediate selector overlapping.
+1. The controller will label the PodTemplate with a
+`StatefulSetPodTemplateLabel` set to the StatefulSet's `.Name` to allow for
+selection of the PodTemplates that comprise the StatefulSet's revision history.
+1. The controller will label the PodTemplate with a
+`StatefulSetTemplateRevisionLabel` set to the StatefulSet's
+`.Spec.TemplateRevision`.
+1. The controller will set the Name of the PodTemplate to a concatenation of the
+`.Name` of the StatefulSet and the `.Spec.TemplateRevision`.
+1.
The controller will then create the PodTemplate.
+
+#### PodTemplate Deletion
+When the StatefulSet controller deletes a PodTemplate in the revision
+history of a StatefulSet it will do the following.
+
+1. If the PodTemplate's ControllerRef does not match the StatefulSet, the
+controller will not delete the PodTemplate. In this way, we prevent selector
+overlap from causing the deletion of PodTemplates that are part of another
+object's revision history. In practice, these PodTemplates will be filtered out
+prior to history maintenance.
+1. If the PodTemplate's ControllerRef matches the StatefulSet, the
+StatefulSet controller will delete the PodTemplate.
+
+#### History Reconstruction
+In order to reconstruct the history of revisions to a StatefulSet, the
+StatefulSet controller will do the following.
+
+1. If the StatefulSet's `.Spec.TemplateRevision` is nil, the StatefulSet
+has never been updated, and its history has never been initialized. This is
+the state the object will be in when a cluster is first upgraded from a version
+that does not support StatefulSet update to a version that does. In this case,
+the controller will not enforce PodTemplate revisions. When creating Pods,
+it will always use the StatefulSet's `.Spec.Template`. Otherwise, the controller
+will continue as below.
+1. The controller will select all PodTemplates with a
+`StatefulSetPodTemplateLabel` matching the `.Name` field of the StatefulSet.
+1. The controller will filter out all PodTemplates that do not contain a
+ControllerRef matching the StatefulSet. If the controller selects
+PodTemplates that it does not own, it will report an error, but it will continue
+reconstructing the StatefulSet's history.
+1. The controller will filter out all PodTemplates that do not have a
+`StatefulSetTemplateRevisionLabel` mapped to a valid revision. This can only
+occur if the user purposefully deletes the label.
In this case, the
+controller will report an error, but it will continue reconstructing the
+StatefulSet's revision history.
+1. For all the remaining PodTemplates, the controller will sort them in
+ascending order by the value mapped to their `StatefulSetTemplateRevisionLabel`.
+This will reconstruct a list of PodTemplates from oldest to newest. Note that,
+as the revision is monotonically increasing for an individual StatefulSet, and
+as we use ControllerRef to mitigate selector overlap, the StatefulSet's history
+is a strictly ordered set.
+
+#### History Maintenance
+In order to prevent the revision history of the StatefulSet from exceeding
+memory or storage limits, the StatefulSet controller will periodically prune
+the oldest PodTemplates from the StatefulSet's revision history.
+
+1. The StatefulSet controller will
+[reconstruct the revision history](#history-reconstruction)
+of the StatefulSet.
+1. If the number of PodTemplates in the StatefulSet's revision history is
+greater than the StatefulSet's `.Spec.RevisionHistoryDepth`, the
+StatefulSet controller will delete PodTemplates, starting with the head of
+the revision history, until the depth of the revision history is equal to
+the StatefulSet's `.Spec.RevisionHistoryDepth`.
+1. As a StatefulSet's `.Spec.RevisionHistoryDepth` is always at least two, and
+as the PodTemplates corresponding to `.Status.TemplateRevision`
+or `.Status.TargetTemplateRevision` are always the most recent PodTemplates
+in the revision history, the StatefulSet controller will not delete any
+PodTemplates that represent the current or target revisions.
+
+### Template Updates
+The StatefulSet controller will create PodTemplates upon mutation of the
+`.Spec.Template` of a StatefulSet.
+
+1. When the StatefulSet controller observes a mutation to a StatefulSet's
+`.Spec.Template` it will compare the `.Spec.TemplateRevision` to the
+`.Status.TargetTemplateRevision`.
+1.
If the `.Spec.TemplateRevision` is equivalent to the
+`.Status.TargetTemplateRevision`, no update has occurred. Note that, in the
+event that both are nil, they are considered to be equivalent, and we expect
+this to occur after an initial upgrade to a version of Kubernetes that supports
+StatefulSet update from one that does not.
+1. If the `.Status.TemplateRevision` field is nil, and the
+`.Spec.TemplateRevision` is not nil, then the StatefulSet has no revision
+history. To initialize its revision history, the StatefulSet controller will
+set both `.Status.TemplateRevision` and `.Status.TargetTemplateRevision`
+to `.Spec.TemplateRevision` and
+[create a new PodTemplate](#podtemplate-creation).
+1. If the `.Status.TemplateRevision` is not nil, and if the
+`.Spec.TemplateRevision` is not equal to the `.Status.TargetTemplateRevision`,
+the StatefulSet controller will do the following.
+    1. The controller will
+    [reconstruct the revision history](#history-reconstruction) of the
+    StatefulSet.
+    1. If the revision history of the StatefulSet contains a PodTemplate
+    whose `.PodTemplateSpec` is semantically, deeply equivalent to the
+    StatefulSet's `.Spec.Template`, the youngest such PodTemplate will be used
+    as the target PodTemplate.
+    1. If no such PodTemplate exists, the StatefulSet controller will
+    [create a new PodTemplate](#podtemplate-creation) from the StatefulSet's
+    `.Spec.Template`, and it will use this as the target PodTemplate.
+    1. The controller will update the StatefulSet's
+    `.Status.TargetTemplateRevision` based on the selection made above.
+
+### PodTemplate Selection
+When the StatefulSet controller creates the Pods in a StatefulSet, it will use
+the following criteria to select the PodTemplateSpec used to create a
+Pod. These criteria allow the controller to continue to make progress toward
+its target state, while respecting its guarantees and allowing updates to be
+rolled back and forward.
+
+1.
If the StatefulSet's `.Spec.TemplateRevision` is nil, then the cluster
+has been upgraded from a version that does not support StatefulSet update to
+a version that does.
+    1. In this case the `.Spec.Template` is the current revision,
+    and no Pods in the StatefulSet should be labeled with a
+    `StatefulSetTemplateRevisionLabel`.
+    1. The StatefulSet will initialize its revision history on the first
+    update to its `.Spec.Template`.
+1. If the StatefulSet's `.Spec.TemplateRevision` is equal to its
+`.Status.TemplateRevision`, then there is no update in progress and all
+Pods will be created from the PodTemplate matching this revision.
+1. If the Pod's ordinal is in the sequence `[0,.Status.CurrentReplicas)`,
+then it was previously created from the PodTemplate matching the
+StatefulSet's `.Status.TemplateRevision`, and it will be recreated
+from this PodTemplate.
+1. If the Pod's ordinal is in the sequence
+   `[.Spec.Replicas-.Status.TargetReplicas,.Spec.Replicas)`, then it was
+   previously created from the PodTemplate matching the StatefulSet's
+   `.Status.TargetTemplateRevision`, and it will be recreated from this
+   PodTemplate.
+1. If the ordinal does not meet either of the prior two conditions, and
+if the ordinal is in the sequence `[0, .Spec.RevisionPartition)`, it will be
+created from the PodTemplate matching the StatefulSet's
+`.Status.TemplateRevision`.
+1. Otherwise, the Pod is created from the PodTemplate matching the
+StatefulSet's `.Status.TargetTemplateRevision`.
+
+### Update Completion
+A StatefulSet update is complete when the following conditions are met.
+
+1. All Pods with ordinals in the sequence `[0,.Spec.Replicas)` have a Status of
+Running and a Ready Condition.
+1. The StatefulSet's `.Spec.RevisionPartition` is equal to `0`.
+1.
All Pods in the StatefulSet are labeled with a
+`StatefulSetTemplateRevisionLabel` equal to the StatefulSet's
+`.Status.TargetTemplateRevision` (this implies they have been created from
+the PodTemplate at that revision).
+
+When a StatefulSet update is complete, the controller will signal completion by
+doing the following.
+
+1. The controller will set the StatefulSet's `.Status.TemplateRevision` to its
+`.Status.TargetTemplateRevision`.
+1. The controller will set the StatefulSet's `.Status.CurrentReplicas` to its
+`.Status.TargetReplicas`.
+1. The controller will set the StatefulSet's `.Status.TargetReplicas` to 0.
+
+### Status Reporting
+After processing the creation, update, or deletion of a StatefulSet or Pod,
+the StatefulSet controller will record its status by persisting
+a StatefulSetStatus object. This has two purposes.
+
+1. It allows the StatefulSet controller to recreate the exact StatefulSet
+membership in the event of a hard restart of the entire system.
+1. It communicates the current state of the StatefulSet to clients. Using the
+`.Status.ObservedGeneration`, clients can construct a linearizable view of
+the operations performed by the controller.
+
+When the StatefulSet controller records the status of a StatefulSet it will
+do the following.
+
+1. The controller will update the `.Status.ObservedGeneration` to communicate
+the `.Generation` of the StatefulSet object that was observed.
+1. The controller will set the `.Status.Replicas` to the current number of
+created Pods.
+1. The controller will set the `.Status.ReadyReplicas` to the current number of
+Pods that have a Status of Running and a Ready Condition.
+1. The controller will set the `.Status.TemplateRevision` and
+`.Status.TargetTemplateRevision`
+in accordance with [maintaining its revision history](#history-maintenance)
+and the status of any [complete updates](#update-completion).
+1.
The controller will set the `.Status.CurrentReplicas` to the number of
+Pods that it has created from the PodTemplate that corresponds to the
+current revision of the StatefulSet.
+1. The controller will set the `.Status.TargetReplicas` to the number of Pods
+that it has created from the PodTemplate that corresponds to the target
+revision of the StatefulSet.
+1. The controller will then persist the StatefulSetStatus to make it durable and
+communicate it to observers.
+
+## API Server
+The API Server will perform validation for StatefulSet updates and ensure that
+a StatefulSet's `.Spec.TemplateRevision` is a generator for a strictly
+monotonically increasing sequence.
+
+### StatefulSet Validation
+As is currently implemented, the API Server will not allow mutation to any
+fields of the StatefulSet object other than `.Spec.Replicas` and
+`.Spec.Template.Containers`. This design imposes the following, additional
+constraints.
+
+1. The `.Spec.RevisionHistoryDepth` must be greater than or equal to `2`.
+1. The `.Spec.RevisionPartition` must be in the sequence `[0,.Spec.Replicas)`.
+
+### TemplateRevision Maintenance
+It will be the responsibility of the API Server to enforce that updates to
+StatefulSet's `.Spec.Template` atomically increment the
+`.Spec.TemplateRevision` counter. There is no need for the value to be
+strictly sequential, but it must be strictly, monotonically increasing.
+As validation will not allow mutation to any field other than the
+`.Spec.Template.Containers` field, the API Server need not track all fields of
+StatefulSet's `.Spec` for modifications, but it must trigger an update to the
+revision when the current and previous `.Spec.Template` versions fail a test for
+deep semantic equality.
+
+## Kubectl
+Kubectl will use the `rollout` command to control and provide the status of
+StatefulSet updates.
+
+ - `kubectl rollout status statefulset `: displays the status
+   of a StatefulSet update.
+ - `kubectl rollout undo statefulset `: triggers a rollback
+   of the current update.
+ - `kubectl rollout history statefulset `: displays the
+   StatefulSet's revision history.
+
+## Usage
+This section demonstrates how the design functions in typical usage scenarios.
+
+### Initial Deployment
+Users can create a StatefulSet using `kubectl create`.
+
+Given the following manifest `web.yaml`
+
+```yaml
+apiVersion: apps/v1beta1
+kind: StatefulSet
+metadata:
+  name: web
+spec:
+  serviceName: "nginx"
+  replicas: 3
+  template:
+    metadata:
+      labels:
+        app: nginx
+    spec:
+      containers:
+      - name: nginx
+        image: gcr.io/google_containers/nginx-slim:0.8
+        ports:
+        - containerPort: 80
+          name: web
+        volumeMounts:
+        - name: www
+          mountPath: /usr/share/nginx/html
+  volumeClaimTemplates:
+  - metadata:
+      name: www
+      annotations:
+        volume.alpha.kubernetes.io/storage-class: anything
+    spec:
+      accessModes: [ "ReadWriteOnce" ]
+      resources:
+        requests:
+          storage: 1Gi
+```
+
+Users can use the following command to create the StatefulSet.
+
+```shell
+kubectl create -f web.yaml
+```
+
+The only difference between the proposed and current implementation is that
+the proposed implementation will initialize the StatefulSet's revision history
+upon initial creation.
+
+### Rolling out an Update
+Users can create a rolling update using `kubectl apply`. If a user creates a
+StatefulSet [as above](#initial-deployment), the user can trigger a rolling
+update by updating the image (as in the manifest below).
+
+```yaml
+apiVersion: apps/v1beta1
+kind: StatefulSet
+metadata:
+  name: web
+spec:
+  serviceName: "nginx"
+  replicas: 3
+  template:
+    metadata:
+      labels:
+        app: nginx
+    spec:
+      containers:
+      - name: nginx
+        image: gcr.io/google_containers/nginx-slim:0.9
+        ports:
+        - containerPort: 80
+          name: web
+        volumeMounts:
+        - name: www
+          mountPath: /usr/share/nginx/html
+  volumeClaimTemplates:
+  - metadata:
+      name: www
+      annotations:
+        volume.alpha.kubernetes.io/storage-class: anything
+    spec:
+      accessModes: [ "ReadWriteOnce" ]
+      resources:
+        requests:
+          storage: 1Gi
+```
+
+
+Users can use the following command to trigger a rolling update.
+
+```shell
+kubectl apply -f web.yaml
+```
+
+### Canaries
+Users can create a canary using `kubectl apply`. The only difference between a
+[rolling update](#rolling-out-an-update) and a canary is that the
+`.Spec.RevisionPartition` is set to `.Spec.Replicas - 1`.
+
+```yaml
+apiVersion: apps/v1beta1
+kind: StatefulSet
+metadata:
+  name: web
+spec:
+  serviceName: "nginx"
+  replicas: 3
+  template:
+    metadata:
+      labels:
+        app: nginx
+    spec:
+      containers:
+      - name: nginx
+        image: gcr.io/google_containers/nginx-slim:0.9
+        ports:
+        - containerPort: 80
+          name: web
+        volumeMounts:
+        - name: www
+          mountPath: /usr/share/nginx/html
+  revisionPartition: 2
+  volumeClaimTemplates:
+  - metadata:
+      name: www
+      annotations:
+        volume.alpha.kubernetes.io/storage-class: anything
+    spec:
+      accessModes: [ "ReadWriteOnce" ]
+      resources:
+        requests:
+          storage: 1Gi
+```
+
+Users can also simultaneously scale up and add a canary. This reduces risk
+for some deployment scenarios by adding additional capacity for the canary.
+
+```yaml
+apiVersion: apps/v1beta1
+kind: StatefulSet
+metadata:
+  name: web
+spec:
+  serviceName: "nginx"
+  replicas: 4
+  template:
+    metadata:
+      labels:
+        app: nginx
+    spec:
+      containers:
+      - name: nginx
+        image: gcr.io/google_containers/nginx-slim:0.9
+        ports:
+        - containerPort: 80
+          name: web
+        volumeMounts:
+        - name: www
+          mountPath: /usr/share/nginx/html
+  revisionPartition: 3
+  volumeClaimTemplates:
+  - metadata:
+      name: www
+      annotations:
+        volume.alpha.kubernetes.io/storage-class: anything
+    spec:
+      accessModes: [ "ReadWriteOnce" ]
+      resources:
+        requests:
+          storage: 1Gi
+```
+
+### Staged Roll Outs
+Users can create a staged roll out using `kubectl apply`. The only difference
+between a [canary](#canaries) and a staged roll out is that the
+`.Spec.RevisionPartition` is set to a value less than `.Spec.Replicas - 1`.
+
+```yaml
+apiVersion: apps/v1beta1
+kind: StatefulSet
+metadata:
+  name: web
+spec:
+  serviceName: "nginx"
+  replicas: 3
+  template:
+    metadata:
+      labels:
+        app: nginx
+    spec:
+      containers:
+      - name: nginx
+        image: gcr.io/google_containers/nginx-slim:0.9
+        ports:
+        - containerPort: 80
+          name: web
+        volumeMounts:
+        - name: www
+          mountPath: /usr/share/nginx/html
+  revisionPartition: 2
+  volumeClaimTemplates:
+  - metadata:
+      name: www
+      annotations:
+        volume.alpha.kubernetes.io/storage-class: anything
+    spec:
+      accessModes: [ "ReadWriteOnce" ]
+      resources:
+        requests:
+          storage: 1Gi
+```
+
+Staged roll outs can be used to roll out a configuration, image, or resource
+update to some portion of the fleet maintained by the StatefulSet prior to
+updating the entire fleet. It is useful to support linear, geometric, and
+exponential roll out of an update. Users can modify the
+`.Spec.RevisionPartition` to allow the roll out to progress.
+
+```yaml
+apiVersion: apps/v1beta1
+kind: StatefulSet
+metadata:
+  name: web
+spec:
+  serviceName: "nginx"
+  replicas: 3
+  template:
+    metadata:
+      labels:
+        app: nginx
+    spec:
+      containers:
+      - name: nginx
+        image: gcr.io/google_containers/nginx-slim:0.9
+        ports:
+        - containerPort: 80
+          name: web
+        volumeMounts:
+        - name: www
+          mountPath: /usr/share/nginx/html
+  revisionPartition: 1
+  volumeClaimTemplates:
+  - metadata:
+      name: www
+      annotations:
+        volume.alpha.kubernetes.io/storage-class: anything
+    spec:
+      accessModes: [ "ReadWriteOnce" ]
+      resources:
+        requests:
+          storage: 1Gi
+```
+
+### Rollbacks
+To roll back an update, users can use the `kubectl rollout` command.
+
+The command below will roll back the `web` StatefulSet to the previous revision in
+its history. If a roll out is in progress, it will stop deploying the target
+revision, and roll back to the current revision.
+
+```shell
+kubectl rollout undo statefulset web
+```
+
+### Rolling Forward
+Rolling back is usually the safest, and often the fastest, strategy to mitigate
+deployment failure, but rolling forward is sometimes the only practical solution
+for stateful applications (e.g. a user has a minor configuration error but has
+already modified the storage format for the application). Users can use
+sequential `kubectl apply` invocations to update the
+`.Status.TargetTemplateRevision` of a StatefulSet. This will respect the
+`.Spec.RevisionPartition` with respect to the target state, and it therefore
+interacts well with canaries and staged roll outs.
+Note that, while users can update the target template revision, they can not
+update the current template revision. The only way to advance the current
+template revision is to successfully complete an update.
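Rolling back and rolling forward both end in the same controller decision described under Template Updates: reuse the youngest revision whose template matches the new `.Spec.Template`, or create a new revision. The following Go sketch illustrates that decision under loud assumptions. `Template`, `Revision`, and `selectTarget` are hypothetical names, the template is reduced to an image string, and `reflect.DeepEqual` stands in for the semantic deep equality test the proposal calls for.

```go
package main

import (
	"fmt"
	"reflect"
)

// Template is a hypothetical stand-in for v1.PodTemplateSpec. Only the
// container image is modeled here.
type Template struct {
	Image string
}

// Revision is a hypothetical pairing of a revision number and its template.
type Revision struct {
	Number   int64
	Template Template
}

// selectTarget returns the youngest revision in history whose template is
// deeply equal to tmpl. If no such revision exists, it appends a new revision
// numbered next and returns it. history must be sorted oldest to newest.
func selectTarget(history []Revision, tmpl Template, next int64) ([]Revision, Revision) {
	for i := len(history) - 1; i >= 0; i-- {
		if reflect.DeepEqual(history[i].Template, tmpl) {
			return history, history[i]
		}
	}
	created := Revision{Number: next, Template: tmpl}
	return append(history, created), created
}

func main() {
	history := []Revision{
		{Number: 1, Template: Template{Image: "nginx-slim:0.8"}},
		{Number: 2, Template: Template{Image: "nginx-slim:0.9"}},
	}

	// Rolling back: the 0.8 template already exists, so revision 1 is reused.
	_, target := selectTarget(history, Template{Image: "nginx-slim:0.8"}, 3)
	fmt.Println("rollback target:", target.Number)

	// Rolling forward: an unseen template (hypothetical tag) creates revision 3.
	history, target = selectTarget(history, Template{Image: "nginx-slim:0.10"}, 3)
	fmt.Println("roll forward target:", target.Number, "history depth:", len(history))
}
```

Scanning from the newest entry backwards is what makes the youngest matching revision win when the same template appears more than once in the history.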
+ +## Tests +- Updating a StatefulSet's containers will trigger updates to the StatefulSet's +Pods respecting the +[identity](https://kubernetes.io/docs/concepts/abstractions/controllers/statefulsets/#pod-identity) +and [deployment, and scaling](https://kubernetes.io/docs/concepts/abstractions/controllers/statefulsets/#deployment-and-scaling-guarantee) +guarantees. +- A StatefulSet update will block on failure. +- A StatefulSet update can be rolled back. +- A StatefulSet update can be rolled forward by applying another update. +- A StatefulSet update's status can be retrieved. +- A StatefulSet's revision history contains all updates with respect to the +configured revision history depth. +- A StatefulSet update can create a canary. +- A StatefulSet update can be performed in stages. + +## Future Work +In the future, we may implement the following features to enhance StatefulSet +updates. + +### Termination Reason +Without communicating a signal indicating the reason for termination to a Pod in +a StatefulSet, as proposed [here](https://github.com/kubernetes/kubernetes/issues/1462), +the tenant application has no way to determine if it is being terminated due to +a scale down operation or due to an update. +Consider a BASE distributed storage application like Cassandra, where 2 TiB of +persistent data is not atypical, and the data distribution is not identical on +every server. We want to enable two distinct behaviors based on reason for +termination. + +- If the termination is due to scale down, during the configured termination +grace period, the entry point of the Pod should cause the application to drain +its client connections, replicate its persisted data (so that the cluster is not +left under replicated) and decommission the application to remove it from the +cluster. +- If the termination is due to a temporary capacity loss (e.g. 
an update or an +image upgrade), the application should drain all of its client connections, +flush any in memory data structures to the file system, and synchronize the +file system with storage media. It should not redistribute its data. + +If the application implements the strategy of always redistributing its data, +we unnecessarily increase recovery time during an update and incur the +additional network and storage cost of two full data redistributions for every +updated node. +It should be noted that this is already an issue for Node cordon and Pod eviction +(due to drain or taints), and applications can use the same mitigation as they +would for these events for StatefulSet update. + +### VolumeTemplatesSpec Updates +While this proposal does not address +[VolumeTemplateSpec updates](https://github.com/kubernetes/kubernetes/issues/41015), +this would be a valuable feature for production users of storage systems that use +intermittent compaction as a form of garbage collection. Applications that use +log structured merge trees with size tiered compaction (e.g. Cassandra) or append +only B(+/*) Trees (e.g. Couchbase) can temporarily double their storage usage when +compacting their on disk storage. If there is insufficient space for compaction +to progress, these applications will either fail or degrade until +additional capacity is added. While, if the user is using AWS EBS or GCE PD, +there are valid manual workarounds to expand the size of a PD, it would be +useful to automate the resize via updates to the StatefulSet's +VolumeClaimTemplates. + +### In Place Updates +Currently configuration, images, and resource request/limits updates are all +performed destructively. Without a [termination reason](https://github.com/kubernetes/kubernetes/issues/1462) +implementation, there is little value to implementing in place image updates, +and configuration and resource request/limit updates are not possible. 
+When [termination reason](#https://github.com/kubernetes/kubernetes/issues/1462) is implemented we may modify +the behavior of StatefulSet update to only update, rather than delete and +create, Pods when the only mutated value is the container image, and if resizable +resource request/limits is implemented, we may extend the above to +allow for updates to Pod resources. From 708300f8ca01c5f08bce904ce28e6ecc849bb996 Mon Sep 17 00:00:00 2001 From: Kenneth Owens Date: Mon, 3 Apr 2017 13:43:08 -0700 Subject: [PATCH 02/12] address kargakis@ comments --- .../design-proposals/statefulset-update.md | 59 ++++++++++--------- 1 file changed, 30 insertions(+), 29 deletions(-) diff --git a/contributors/design-proposals/statefulset-update.md b/contributors/design-proposals/statefulset-update.md index eceaa227c2f..962e664881b 100644 --- a/contributors/design-proposals/statefulset-update.md +++ b/contributors/design-proposals/statefulset-update.md @@ -47,7 +47,7 @@ applications storage capacity, I want to update PersistentVolumes. network configuration of the application, I want to update Services and container ports in a consistent way. - As the administrator of a stateful application, when I scale my application -horizontally, I went associated PodDistruptionBudgets to be adjusted to +horizontally, I want associated PodDistruptionBudgets to be adjusted to compensate for the application's scaling. ## Assumptions @@ -109,14 +109,14 @@ The following modifications will be made to the StatefulSetStatus API object. type StatefulSetStatus struct { // ObservedGeneration and Replicas fields are ommitted for brevity. - // CurrentTemplateRevision, if not nil, is the revision of the PodTemplate - // that was used to create Pods with ordinals in the sequence - // [0,CurrentRevisionReplicas). + // TemplateRevision, if not nil, is the revision of the PodTemplate that was + // used to create Pods with ordinals in the sequence + // [0,CurrentReplicas). 
TemplateRevision *int64 `json:"templateRevision,omitempty"` // TargetTemplateRevision, if not nil, is the revision of the PodTemplate // that was used to create Pods with ordinals in the sequence - // [Replicas - TargetRevisionReplicas, Replicas). + // [Replicas - UpdatedReplicas, Replicas). TargetTemplateRevision *int64 `json:"targetTemplateRevision,omitempty"` // ReadyReplicas is the current number of Pods, created by the StatefulSet @@ -127,9 +127,9 @@ The following modifications will be made to the StatefulSetStatus API object. // controller from the PodTemplateSpec indicated by CurrentTemplateRevision. CurrentReplicas int32 `json:"currentReplicas,omitempty"` - // TargetRevisionReplicas is the number of Pods created by the StatefulSet + // UpdatedReplicas is the number of Pods created by the StatefulSet // controller from the PodTemplateSpec indicated by TargetTemplateRevision. - TargeReplicas int32 `json:"taretReplicas,omitempty"` + UpdatedReplicas int32 `json:"taretReplicas,omitempty"` } ``` @@ -148,7 +148,7 @@ type StatefulSetSpec struct { // Template. TemplateRevision *int64 `json:"templateRevision"` - // RevisionParition paritions the Pods in the StatefulSet by ordinal such + // RevisionPartition partitions the Pods in the StatefulSet by ordinal such // that all Pods with a lower ordinal will be created from the PodTemplate that // represents the current revision of the StatefulSet's revision history and // all Pods with an a greater or equal ordinal will be created from the @@ -156,10 +156,10 @@ type StatefulSetSpec struct { // revision history. RevisionPartition *int32 `json:"revisionPartition,omitempty` - // RevisionHistoryDepth is the maximum number of PodTemplates that will + // RevisionHistoryLimit is the maximum number of PodTemplates that will // be maintained in the StatefulSet's revision history. It must be at // least two. 
- RevisionHisotryDepth int32 `json:historyRevisionDepth,omitempty` + RevisionHisotryLimit int32 `json:historyRevisionDepth,omitempty` } ``` @@ -201,7 +201,7 @@ StatefulSet's StatefulSetSpec and StatefulSetStatus are below. 1. The StatefulSet's `.Status.TemplateRevision` is equal to its `.Status.TargetRevision`. 1. All Pods in the StatefulSet have been generated from the PodTemplate - labled with a `StatefulSetTemplateRevisionLabel` equal to its + labeled with a `StatefulSetTemplateRevisionLabel` equal to its `.Status.TemplateRevision`. 1. If the StatefulSet's `.Spec.RevisionPartition` is not nil, then the following is true. @@ -258,8 +258,8 @@ the largest ordinal will be deleted. ### StatefulSet Revision History The StatefulSet controller will use labeled, versioned PodTemplates to keep a history of updates preformed on a StatefulSet. The number of stored PodTemplates -is considered to be the depth of the StatefulSet's revision history. The -maximum revision history depth for a StatefulSet must be at least two, but it +is considered to be the limit of the StatefulSet's revision history. The +maximum revision history limit for a StatefulSet must be at least two, but it may be greater. #### PodTemplate Creation @@ -290,7 +290,8 @@ overlap from causing the deletion of PodTemplates that are part of another object's revision history. In practice, these PodTemplates will be filtered out prior to history maintenance. 1. If the PodTemplate's ControllerRef matches the StatefulSet, the -StatefulSet controller will delete the PodTemplate. +StatefulSet controller orphan the PodTemplate by removing its ControllerRef, +and it will allow the PodTemplate to be deleted via garbage collection. #### History Reconstruction In order to reconstruct the history of revisions to a StatefulSet, the @@ -330,11 +331,11 @@ the oldest PodTemplates from the StatefulSet's revision history. [reconstruct the revision history](#history-reconstruction) of the StatefulSet. 1. 
If the number of PodTemplates in the StatefulSet's revision history is -greater than the StatefulSet's `.Spec.RevisionHistoryDepth`, the +greater than the StatefulSet's `.Spec.RevisionHistoryLimit, the StatefulSet controller will delete PodTemplates, starting with the head of -the revision history, until the depth of the revision history is equal to -the StatefulSet's `.Spec.RevisionHistoryDepth`. -1. As a StatefulSet's `.Spec.RevisionHistoryDepth` is always at least two, and +the revision history, until the limit of the revision history is equal to +the StatefulSet's `.Spec.RevisionHistoryLimit`. +1. As a StatefulSet's `.Spec.RevisionHistoryLimit` is always at least two, and as the PodTemplates corresponding to `.Status.TemplateRevision` or `.Status.TargetTemplateRevision` are always the most recent PodTemplates in the revision history, the StatefulSet controller will not delete any @@ -397,7 +398,7 @@ then it was previously created from the PodTemplate matching the StatefulSet's `.Status.TemplateRevision`, and it will be recreated from this PodTemplate. 1. If the Pod's ordinal is in the sequence - `[.Spec.Replicas-.Status.TargetReplicas,.Spec.Replicas)`, then it was + `[.Spec.Replicas-.Status.UpdatedReplicas,.Spec.Replicas)`, then it was previously created from the PodTemplate matching the StatefulSet's, `.Status.TargetTemplateRevision`, and it will be recreated from this PodTemplate. @@ -425,8 +426,8 @@ doing the following. 1. The controller will set the StatefulSet's `.Status.TemplateRevision` to its `.Status.TargetTemplateRevision`. 1. The controller will set the StatefulSet's `Status.CurrentReplicas` to its -`Status.TargetReplicas`. -1. The controller will set the StatefulSet's `Status.TargetReplicas` to 0. +`Status.UpdatedReplicas`. +1. The controller will set the StatefulSet's `Status.UpdatedReplicas` to 0. 
### Status Reporting After processing the creation, update, or deletion of a StatefulSet or Pod, @@ -455,7 +456,7 @@ and the status of any [complete updates](#update-completion). 1. The controller will set the `.Status.CurrentReplicas` to the number of Pods that it has created from the PodTemplate that corresponds to the current revision of the StatefulSet. -1. The controller will set the `.Status.TargetReplicas` to the number of Pods +1. The controller will set the `.Status.UpdatedReplicas` to the number of Pods that it has created from the PodTemplate that corresponds to the target revision of the StatefulSet. 1. The controller will then persist the StatefulSetStatus make it durable and @@ -788,7 +789,7 @@ guarantees. - A StatefulSet update can be rolled forward by applying another update. - A StatefulSet update's status can be retrieved. - A StatefulSet's revision history contains all updates with respect to the -configured revision history depth. +configured revision history limit. - A StatefulSet update can create a canary. - A StatefulSet update can be performed in stages. @@ -803,7 +804,7 @@ the tenant application has no way to determine if it is being terminated due to a scale down operation or due to an update. Consider a BASE distributed storage application like Cassandra, where 2 TiB of persistent data is not atypical, and the data distribution is not identical on -every server. We want to enable two distinct behaviors based on reason for +every server. We want to enable two distinct behaviors based on the reason for termination. - If the termination is due to scale down, during the configured termination @@ -843,8 +844,8 @@ Currently configuration, images, and resource request/limits updates are all performed destructively. Without a [termination reason](https://github.com/kubernetes/kubernetes/issues/1462) implementation, there is little value to implementing in place image updates, and configuration and resource request/limit updates are not possible. 
-When [termination reason](#https://github.com/kubernetes/kubernetes/issues/1462) is implemented we may modify -the behavior of StatefulSet update to only update, rather than delete and -create, Pods when the only mutated value is the container image, and if resizable -resource request/limits is implemented, we may extend the above to -allow for updates to Pod resources. +When [termination reason](#https://github.com/kubernetes/kubernetes/issues/1462) +is implemented we may modify the behavior of StatefulSet update to only update, +rather than delete and create, Pods when the only mutated value is the container + image, and if resizable resource request/limits is implemented, we may extend + the above to allow for updates to Pod resources. From ad19d350a4b9b0eef79fe292aae4c0cdce318efc Mon Sep 17 00:00:00 2001 From: Kenneth Owens Date: Mon, 3 Apr 2017 16:18:15 -0700 Subject: [PATCH 03/12] Change Revision to Generation for consistency Make reference to fact that Deployment and Stateful set both update from a specific revision to a specific revision --- .../design-proposals/statefulset-update.md | 158 +++++++++--------- 1 file changed, 80 insertions(+), 78 deletions(-) diff --git a/contributors/design-proposals/statefulset-update.md b/contributors/design-proposals/statefulset-update.md index 962e664881b..fac75b77ebd 100644 --- a/contributors/design-proposals/statefulset-update.md +++ b/contributors/design-proposals/statefulset-update.md @@ -54,7 +54,7 @@ compensate for the application's scaling. - StatefulSet update must support singleton StatefulSets. However, an update in this case will cause a temporary outage. This is acceptable as a single process application is, by definition, not highly available. - - Disruption in Kubernetes is controlled by PodDistruptionBugets. As + - Disruption in Kubernetes is controlled by PodDistruptionBudgets. 
As StatefulSet updates progress one Pod at a time, and only occur when all other Pods have a Status of Running and a Ready Condition, they can not violate reasonable PodDisrutptionBugdets. @@ -81,8 +81,10 @@ compensate for the application's scaling. - Stateful applications are likely to evolve wire protocols and storage formats between versions. In most cases, when updating the application's Pod's containers, it will not be safe to roll back or forward to an arbitrary - version. StatefulSet update should work well when rolling out an update, - or performing a rollback, between two specific revisions of the StatefulSet. + version. Controller based Pod update should work well when rolling out an + update, or performing a rollback, between two specific revisions of the + controlled API object. This is how Deployment functions, and this property is, + perhaps, even more critical for stateful applications. ## Requirements This design is based on the following requirements. @@ -109,26 +111,26 @@ The following modifications will be made to the StatefulSetStatus API object. type StatefulSetStatus struct { // ObservedGeneration and Replicas fields are ommitted for brevity. - // TemplateRevision, if not nil, is the revision of the PodTemplate that was + // TemplateGeneration, if not nil, is the generation of the PodTemplate that was // used to create Pods with ordinals in the sequence // [0,CurrentReplicas). - TemplateRevision *int64 `json:"templateRevision,omitempty"` + TemplateGeneration *int64 `json:"templateGeneration,omitempty"` - // TargetTemplateRevision, if not nil, is the revision of the PodTemplate + // TargetTemplateGeneration, if not nil, is the generation of the PodTemplate // that was used to create Pods with ordinals in the sequence // [Replicas - UpdatedReplicas, Replicas). 
- TargetTemplateRevision *int64 `json:"targetTemplateRevision,omitempty"` + TargetTemplateGeneration *int64 `json:"targetTemplateGeneration,omitempty"` // ReadyReplicas is the current number of Pods, created by the StatefulSet // controller, that have a Status for Running and a Ready Condition. ReadyReplicas int32 `json:"readyReplicas,omitempty"` - // CurrentRevisionReplicas is the number of Pods created by the StatefulSet - // controller from the PodTemplateSpec indicated by CurrentTemplateRevision. + // CurrentGenerationReplicas is the number of Pods created by the StatefulSet + // controller from the PodTemplateSpec indicated by CurrentTemplateGeneration. CurrentReplicas int32 `json:"currentReplicas,omitempty"` // UpdatedReplicas is the number of Pods created by the StatefulSet - // controller from the PodTemplateSpec indicated by TargetTemplateRevision. + // controller from the PodTemplateSpec indicated by TargetTemplateGeneration. UpdatedReplicas int32 `json:"taretReplicas,omitempty"` } ``` @@ -141,25 +143,25 @@ type StatefulSetSpec struct { // ommitted for brevity. v1.PodTemplateSpec `json:"template"` - // TemplateRevision is a monotonically increasing, 64 bit, integer used to + // TemplateGeneration is a monotonically increasing, 64 bit, integer used to // indicate the version of the of the PodTemplateSpec. If nil, the // StatefulSetController has not initialized its revision history, // change tracking is not enabled, and all Pods will be created from // Template. 
- TemplateRevision *int64 `json:"templateRevision"` + TemplateGeneration *int64 `json:"templateGeneration"` - // RevisionPartition partitions the Pods in the StatefulSet by ordinal such + // GenerationPartition partitions the Pods in the StatefulSet by ordinal such // that all Pods with a lower ordinal will be created from the PodTemplate that // represents the current revision of the StatefulSet's revision history and // all Pods with an a greater or equal ordinal will be created from the // PodTemplate that represents the target revision of the StatefulSet's // revision history. - RevisionPartition *int32 `json:"revisionPartition,omitempty` + GenerationPartition *int32 `json:"generationPartition,omitempty` // RevisionHistoryLimit is the maximum number of PodTemplates that will // be maintained in the StatefulSet's revision history. It must be at // least two. - RevisionHisotryLimit int32 `json:historyRevisionDepth,omitempty` + RevisionHisotryLimit int32 `json:revisionHistoryLimit,omitempty` } ``` @@ -171,10 +173,10 @@ Additionally, we introduce the following constants. // history. const StatefulSetPodTemplateLabel = "created-by-statefulset" -// StatefulSetTemplateRevisionLabel is the label applied to a PodTemplate or +// StatefulSetTemplateGenerationLabel is the label applied to a PodTemplate or // Pod to indicate the position of the object's Template in the revision // history of a StatefulSet. -const StatefulSetTemplateRevisionLabel = "statefulset-template-revision" +const StatefulSetTemplateGenerationLabel = "statefulset-template-revision" ``` ## StatefulSet Controller @@ -197,23 +199,23 @@ conform to the other. The conditions that define this state in terms of the StatefulSet's StatefulSetSpec and StatefulSetStatus are below. 1. The StatefulSet contains exactly `[0,.Spec.Replicas)` Pods. -1. If StatefulSet's `.Spec.RevisionPartition` is nil, then the following is true. - 1. 
The StatefulSet's `.Status.TemplateRevision` is equal to its - `.Status.TargetRevision`. +1. If StatefulSet's `.Spec.GenerationPartition` is nil, then the following is true. + 1. The StatefulSet's `.Status.TemplateGeneration` is equal to its + `.Status.TargetGeneration`. 1. All Pods in the StatefulSet have been generated from the PodTemplate - labeled with a `StatefulSetTemplateRevisionLabel` equal to its - `.Status.TemplateRevision`. -1. If the StatefulSet's `.Spec.RevisionPartition` is not nil, then the following + labeled with a `StatefulSetTemplateGenerationLabel` equal to its + `.Status.TemplateGeneration`. +1. If the StatefulSet's `.Spec.GenerationPartition` is not nil, then the following is true. - 1. All Pods with ordinals is the sequence `[0,.Spec.RevisionPartition)` have + 1. All Pods with ordinals is the sequence `[0,.Spec.GenerationPartition)` have been generated from the PodTemplate in the StatefulSet's revision history - that is labeled with a `StatefulSetTemplateRevisionLabel` equal to - `.Status.TemplateRevision`. + that is labeled with a `StatefulSetTemplateGenerationLabel` equal to + `.Status.TemplateGeneration`. 1. All Pods with ordinals in the sequence - `[Spec.RevisionParition,.Spec.Replicas)` have been created with the + `[Spec.GenerationParition,.Spec.Replicas)` have been created with the PodTemplate in the StatefulSet's revision history that is labeled with a - `StatefulSetTemplateRevisionLabel` equal to - `.Status.TargetTemplateRevision`. + `StatefulSetTemplateGenerationLabel` equal to + `.Status.TargetTemplateGeneration`. ### Revised Controller Algorithm The StatefulSet controller will use the following algorithm to continue to @@ -274,10 +276,10 @@ StatefulSet's `.Spec.Template` field. `StatefulSetPodTemplateLabel` set to the StatefulSet's `.Name` to allow for selection of the PodTemplates that comprise the StatefulSet's revision history. 1. 
The controller will label the PodTemplate with a -`StaefulSetTemplateRevisionLabel` set to the StatefulSet's -`.Spec.TemplateRevision`. +`StaefulSetTemplateGenerationLabel` set to the StatefulSet's +`.Spec.TemplateGeneration`. 1. The controller will set the Name of the PodTemplate to a concatenation of the -`.Name` of the StatefulSet and the `.Spec.TemplateRevision`. +`.Name` of the StatefulSet and the `.Spec.TemplateGeneration`. 1. The controller will then create the PodTemplate. #### PodTemplate Deletion @@ -297,7 +299,7 @@ and it will allow the PodTemplate to be deleted via garbage collection. In order to reconstruct the history of revisions to a StatefulSet, the StatefulSet controller will do the following. -1. If the StatefulSet's `.Spec.TemplateRevision` is nil, the StatefulSet +1. If the StatefulSet's `.Spec.TemplateGeneration` is nil, the StatefulSet has never been updated, and its history has never been initialized. This is the state the object will be in when a cluster is first upgraded from a version that does not support StatefulSet update to a version that does. In this case, @@ -311,12 +313,12 @@ ControllerRef matching the the StatefulSet. If the controller selects PodTemplates that it does not own, it will report an error, but it will continue reconstructing the StatefulSet's history. 1. The controller will filter out all PodTemplates that do not have a -`StatefulSetTemplateRevisionLabel` mapped to a valid revision. This can only +`StatefulSetTemplateGenerationLabel` mapped to a valid revision. This can only occur if the user purposefully deletes the label. In this case, the controller will report an error, but it will continue reconstructing the StatefulSet's revision history. 1. For all the remaining PodTemplates, the controller will sort them in -ascending order by the value mapped to their `StatefulSetTemplateRevisionLabel`. +ascending order by the value mapped to their `StatefulSetTemplateGenerationLabel`. 
This will reconstruct a list of PodTemplates from oldest to newest. Note that, as the revision is monotonically increasing for an individual StatefulSet, and as we use ControllerRef to mitigate selector overlap, the StatefulSet's history @@ -331,13 +333,13 @@ the oldest PodTemplates from the StatefulSet's revision history. [reconstruct the revision history](#history-reconstruction) of the StatefulSet. 1. If the number of PodTemplates in the StatefulSet's revision history is -greater than the StatefulSet's `.Spec.RevisionHistoryLimit, the +greater than the StatefulSet's `.Spec.RevisionHistoryLimit`, the StatefulSet controller will delete PodTemplates, starting with the head of the revision history, until the limit of the revision history is equal to the StatefulSet's `.Spec.RevisionHistoryLimit`. 1. As a StatefulSet's `.Spec.RevisionHistoryLimit` is always at least two, and -as the PodTemplates corresponding to `.Status.TemplateRevision` -or `.Status.TargetTemplateRevision` are always the most recent PodTemplates +as the PodTemplates corresponding to `.Status.TemplateGeneration` +or `.Status.TargetTemplateGeneration` are always the most recent PodTemplates in the revision history, the StatefulSet controller will not delete any `PodTemplates` that represent the current or target revisions. @@ -346,21 +348,21 @@ The StatefulSet controller will create PodTemplates upon mutation of the `.Spec.Template` of a StatefulSet. 1. When the StafefulSet controller observes a mutation to a StatefulSet's - `.Spec.Template` it will compare the `.Spec.TemplateRevision` to the - `.Status.TargetTemplateRevision`. -1. If the `.Spec.TemplateRevision` is equivalent to the -`.Status.TargetTemplateRevision`, no update has occurred. Note that, in the + `.Spec.Template` it will compare the `.Spec.TemplateGeneration` to the + `.Status.TargetTemplateGeneration`. +1. If the `.Spec.TemplateGeneration` is equivalent to the +`.Status.TargetTemplateGeneration`, no update has occurred. 
Note that, in the event that both are nil, they are considered to be equivalent, and we expect this to occur after an initial upgrade to a version of Kubernetes that supports StatefulSet update form one that does not. -1. If the `.Status.TemplateRevision` field is nil, and the -`.Spec.TemplateRevision` is not nil, then the StatefulSet has no revision +1. If the `.Status.TemplateGeneration` field is nil, and the +`.Spec.TemplateGeneration` is not nil, then the StatefulSet has no revision history. To initialize its revision history, the StatefulSet controller will -set both `.Status.TemplateRevision` and `.Status.TargetTemplateRevision` -to `.Spec.TemplateRevision` and +set both `.Status.TemplateGeneration` and `.Status.TargetTemplateGeneration` +to `.Spec.TemplateGeneration` and [create a new PodTemplate](#podtemplate-creation). -1. If the `.Status.TemplateRevision` is not nil, and if the -`.Spec.TemplateRevision` is not equal to the `.Status.TargetTemplateRevision`, +1. If the `.Status.TemplateGeneration` is not nil, and if the +`.Spec.TemplateGeneration` is not equal to the `.Status.TargetTemplateGeneration`, the StatefulSet controller will do the following. 1. The controller will [reconstruct the revision history](#history-reconsturction) of the @@ -382,49 +384,49 @@ Pod. These criteria allow the controller to continue to make progress toward its target state, while respecting its guarantees and allowing for rolling updates back and forward. -1. If the StatefulSet's `.Spec.TemplateRevision` is nil, then the cluster +1. If the StatefulSet's `.Spec.TemplateGeneration` is nil, then the cluster has been upgraded from a version that does not support StatefulSet update to a version that does. 1. In this case the `.Spec.Template` is the current revision, and no Pods in the StatefulSet should be labeled with a - `StatefulSetPodTemplateRevision` label. + `StatefulSetPodTemplateGeneration` label. 1. 
The StatefulSet will initialize its revision history on the first update to its `.Spec.Template`. -1. If the StatefulSet's `.Spec.TemplateRevision` is equal to its -`.Status.TemplateRevision`, then there is no update in progress and all +1. If the StatefulSet's `.Spec.TemplateGeneration` is equal to its +`.Status.TemplateGeneration`, then there is no update in progress and all Pods will be created from the PodTemplate matching this revision. 1. If the Pod's ordinal is in the sequence `[0,.Status.CurrentReplicas)`, then it was previously created from the PodTemplate matching the -StatefulSet's `.Status.TemplateRevision`, and it will be recreated +StatefulSet's `.Status.TemplateGeneration`, and it will be recreated from this PodTemplate. 1. If the Pod's ordinal is in the sequence `[.Spec.Replicas-.Status.UpdatedReplicas,.Spec.Replicas)`, then it was previously created from the PodTemplate matching the StatefulSet's, - `.Status.TargetTemplateRevision`, and it will be recreated from this + `.Status.TargetTemplateGeneration`, and it will be recreated from this PodTemplate. 1. If the ordinal does not meet either of the prior two conditions, and -if ordinal is in the sequence `[0, .Spec.RevisionPartition)`, it will be created +if ordinal is in the sequence `[0, .Spec.GenerationPartition)`, it will be created from the PodTemplate matching the StatefulSet's -`.Status.TemplateRevision`. +`.Status.TemplateGeneration`. 1. Otherwise, the Pod is created from the PodTemplate matching the -StatefulSet's `.Status.TargetTemplateRevision`. +StatefulSet's `.Status.TargetTemplateGeneration`. ### Update Completion A StatefulSet update is complete when the following conditions are met. 1. All Pods with ordinals in the sequence `[0,.Spec.Replicas)` have a Status of Running and a Ready Condition. -1. The StatefulSet's `.Spec.RevisionPartition` is equal to `0`. +1. The StatefulSet's `.Spec.GenerationPartition` is equal to `0`. 1. 
All Pods in the StatefulSet are labeled with a -`StatefulSetTemplateRevisionLabel` equal to the StatefulSet's -`.Status.TargetTemplateRevision` (This implies they have been created from +`StatefulSetTemplateGenerationLabel` equal to the StatefulSet's +`.Status.TargetTemplateGeneration` (This implies they have been created from the PodTemplate at that revision). When a StatefulSet update is complete, the controller will signal completion by doing the following. -1. The controller will set the StatefulSet's `.Status.TemplateRevision` to its -`.Status.TargetTemplateRevision`. +1. The controller will set the StatefulSet's `.Status.TemplateGeneration` to its +`.Status.TargetTemplateGeneration`. 1. The controller will set the StatefulSet's `Status.CurrentReplicas` to its `Status.UpdatedReplicas`. 1. The controller will set the StatefulSet's `Status.UpdatedReplicas` to 0. @@ -449,8 +451,8 @@ the `.Generation` of the StatefulSet object that was observed. created Pods. 1. The controller will set the `.Status.ReadyReplicas` to the current number of Pods that have a Status of Running and a ReadyCondition. -1. The controller will set the `.Status.TemplateRevision` and -`.Status.TargetTemplateRevision` +1. The controller will set the `.Status.TemplateGeneration` and +`.Status.TargetTemplateGeneration` in accordance with [maintaining its revision history](#history-maintenance) and the status of any [complete updates](#update-completion). 1. The controller will set the `.Status.CurrentReplicas` to the number of @@ -464,7 +466,7 @@ communicate it to observers. ## API Server The API Server will perform validation for StatefulSet updates and ensure that -a StatefulSet's `.Spec.TemplateRevision` is a generator for a strictly +a StatefulSet's `.Spec.TemplateGeneration` is a generator for a strictly monotonically increasing sequence. ### StatefulSet Validation @@ -473,13 +475,13 @@ fields of the StatefulSet object other than `.Spec.Replicas` and `.Spec.Template.Containers`. 
This design imposes the following, additional constraints. -1. The `.Spec.RevisionHistoryDepty` must be greater than or equal to `2`. +1. The `.Spec.RevisionHistoryLimit` must be greater than or equal to `2`. 1. The `.Spec.PositionOrdinal` must be in the sequence `[0,.Spec.Replicas)`. -### TemplateRevision Maintenance +### TemplateGeneration Maintenance It will be the responsibility of the API Server to enforce that updates to StatefulSet's `.Spec.Template` atomically increment the -`.Spec.TemplateRevision` counter. There is no need for the value to be +`.Spec.TemplateGeneration` counter. There is no need for the value to be strictly sequential, but it must be strictly, monotonically increasing. As validation will not allow mutation to any field other than the `.Spec.Template.Containers` field, the API Server need not track all fields of @@ -599,7 +601,7 @@ kubectl apply -f web.yaml ### Canaries Users can create a canary using `kubectl apply`. The only difference between a [rolling update](#rolling-out-an-update) and a canary is that the - `.Spec.RevisionPartition` is set to `.Spec.Replicas - 1`. + `.Spec.GenerationPartition` is set to `.Spec.Replicas - 1`. ```yaml apiVersion: apps/v1beta1 @@ -623,7 +625,7 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - revisionPartition: 2 + generationParition: 2 volumeClaimTemplates: - metadata: name: www @@ -661,7 +663,7 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - partitionOrdinal: 3 + generationParition: 3 volumeClaimTemplates: - metadata: name: www @@ -676,7 +678,7 @@ spec: ### Staged Roll Outs Users can create a canary using `kubectl apply`. The only difference between a - [canary](#canaries) and a staged roll out is that the `.Spec.RevisionPartition` + [canary](#canaries) and a staged roll out is that the `.Spec.GenerationPartition` is set to value less than `.Spec.Replicas - 1`. 
```yaml @@ -701,7 +703,7 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - revisionParition: 2 + generationParition: 2 volumeClaimTemplates: - metadata: name: www @@ -718,7 +720,7 @@ Staged roll outs can be used to roll out a configuration, image, or resource update to some portion of the fleet maintained by the StatefulSet prior to updating the entire fleet. It is useful to support linear, geometric, and exponential roll out of an update. Users can modify the -`.Spec.RevisionPartition` to allow the roll out to progress. +`.Spec.GenerationPartition` to allow the roll out to progress. ```yaml apiVersion: apps/v1beta1 @@ -742,7 +744,7 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - revisionPartition: 1 + generationParition: 1 volumeClaimTemplates: - metadata: name: www @@ -771,8 +773,8 @@ Rolling back is usually the safest, and often the fastest, strategy to mitigate deployment failure, but rolling forward is sometimes the only practical solution for stateful applications (e.g. A users has a minor configuration error but has already modified the storage format for the application). Users can use -sequential `kubectl apply`'s to update the `.Status.TargetRevision` of a -StatefulSet. This will respect the `.Spec.RevisionPartition` with respect to the +sequential `kubectl apply`'s to update the `.Status.TargetGeneration` of a +StatefulSet. This will respect the `.Spec.GenerationPartition` with respect to the target state, and it therefor interacts well with canaries and staged roll outs. Note that, while users can update the target template revision, they can not update the current template revision. The only way to advance the current @@ -829,7 +831,7 @@ would for these events for StatefulSet update. 
While this proposal does not address [VolumeTemplateSpec updates](https://github.com/kubernetes/kubernetes/issues/41015), this would be a valuable feature for production users of storage systems that use -intermittent compaction as a form of garbage collection. Application that use +intermittent compaction as a form of garbage collection. Applications that use log structured merge trees with size tiered compaction (e.g Cassandra) or append only B(+/*) Trees (e.g Couchbase) can temporarily double their storage usage when compacting their on disk storage. If there is insufficient space for compaction From 68dae83826d25b61421dbc76a4176d3a5205938e Mon Sep 17 00:00:00 2001 From: Kenneth Owens Date: Tue, 4 Apr 2017 16:41:23 -0700 Subject: [PATCH 04/12] Updates the algorithm to relabel Pods with deeply semantically equivalent PodTemplates keepting the target PodTemplate equivalent to Spec.Template Changes TargetTemplateRevision to UpdatedTemplateRevision Changes TemplateRevision to CurrentTemplateRevision Adds clarity to the relationship between PodTemplate Creation and Template Updates --- .../design-proposals/statefulset-update.md | 222 +++++++++--------- 1 file changed, 108 insertions(+), 114 deletions(-) diff --git a/contributors/design-proposals/statefulset-update.md b/contributors/design-proposals/statefulset-update.md index fac75b77ebd..dd7af22c6a1 100644 --- a/contributors/design-proposals/statefulset-update.md +++ b/contributors/design-proposals/statefulset-update.md @@ -105,42 +105,12 @@ applied to the StatefulSet. ## API Object -The following modifications will be made to the StatefulSetStatus API object. - -```go - type StatefulSetStatus struct { - // ObservedGeneration and Replicas fields are ommitted for brevity. - - // TemplateGeneration, if not nil, is the generation of the PodTemplate that was - // used to create Pods with ordinals in the sequence - // [0,CurrentReplicas). 
- TemplateGeneration *int64 `json:"templateGeneration,omitempty"` - - // TargetTemplateGeneration, if not nil, is the generation of the PodTemplate - // that was used to create Pods with ordinals in the sequence - // [Replicas - UpdatedReplicas, Replicas). - TargetTemplateGeneration *int64 `json:"targetTemplateGeneration,omitempty"` - - // ReadyReplicas is the current number of Pods, created by the StatefulSet - // controller, that have a Status for Running and a Ready Condition. - ReadyReplicas int32 `json:"readyReplicas,omitempty"` - - // CurrentGenerationReplicas is the number of Pods created by the StatefulSet - // controller from the PodTemplateSpec indicated by CurrentTemplateGeneration. - CurrentReplicas int32 `json:"currentReplicas,omitempty"` - - // UpdatedReplicas is the number of Pods created by the StatefulSet - // controller from the PodTemplateSpec indicated by TargetTemplateGeneration. - UpdatedReplicas int32 `json:"taretReplicas,omitempty"` -} -``` - The following modifications will be made to the StatefulSetSpec API object. ```go type StatefulSetSpec struct { // Replicas, Selector, Template, VolumeClaimsTemplate, and ServiceName - // ommitted for brevity. + // omitted for brevity. v1.PodTemplateSpec `json:"template"` // TemplateGeneration is a monotonically increasing, 64 bit, integer used to @@ -159,20 +129,46 @@ type StatefulSetSpec struct { GenerationPartition *int32 `json:"generationPartition,omitempty` // RevisionHistoryLimit is the maximum number of PodTemplates that will - // be maintained in the StatefulSet's revision history. It must be at - // least two. + // be maintained in the StatefulSet's revision history. The revision history + // consists of all revisions not reprented by a currently applied + // PodTemplate. The default value is 2. RevisionHisotryLimit int32 `json:revisionHistoryLimit,omitempty` } ``` -Additionally, we introduce the following constants. +The following modifications will be made to the StatefulSetStatus API object. 
-```go -// StatefulSetPodTemplateLabel is the label applied to a PodTemplate to allow -// the StatefulSet controller to select the PodTemplates in its revision -// history. -const StatefulSetPodTemplateLabel = "created-by-statefulset" +```go + type StatefulSetStatus struct { + // ObservedGeneration and Replicas fields are omitted for brevity. + + // CurrentTemplateGeneration, if not nil, is the generation of the + // PodTemplate that was used to create Pods with ordinals in the sequence + // [0,CurrentReplicas). + CurrentTemplateGeneration *int64 `json:"currnetTemplateGeneration,omitempty"` + + // UpdatedTemplateGeneration, if not nil, is the generation of the PodTemplate + // that was used to create Pods with ordinals in the sequence + // [Replicas - UpdatedReplicas, Replicas). + UpdatedTemplateGeneration *int64 `json:"updatedTemplateGeneration,omitempty"` + + // ReadyReplicas is the current number of Pods, created by the StatefulSet + // controller, that have a Status for Running and a Ready Condition. + ReadyReplicas int32 `json:"readyReplicas,omitempty"` + + // CurrentReplicas is the number of Pods created by the StatefulSet + // controller from the PodTemplateSpec indicated by CurrentTemplateGeneration. + CurrentReplicas int32 `json:"currentReplicas,omitempty"` + + // UpdatedReplicas is the number of Pods created by the StatefulSet + // controller from the PodTemplateSpec indicated by TargetTemplateGeneration. + UpdatedReplicas int32 `json:"taretReplicas,omitempty"` +} +``` + +Additionally, we introduce the following constant. +```go // StatefulSetTemplateGenerationLabel is the label applied to a PodTemplate or // Pod to indicate the position of the object's Template in the revision // history of a StatefulSet. @@ -199,14 +195,15 @@ conform to the other. The conditions that define this state in terms of the StatefulSet's StatefulSetSpec and StatefulSetStatus are below. 1. The StatefulSet contains exactly `[0,.Spec.Replicas)` Pods. -1. 
If StatefulSet's `.Spec.GenerationPartition` is nil, then the following is true. - 1. The StatefulSet's `.Status.TemplateGeneration` is equal to its - `.Status.TargetGeneration`. +1. If StatefulSet's `.Spec.GenerationPartition` is nil, then the following is +true. + 1. The StatefulSet's `.Status.CurrentTemplateGeneration` is equal to its + `.Status.UpdatedTemplateGeneration`. 1. All Pods in the StatefulSet have been generated from the PodTemplate labeled with a `StatefulSetTemplateGenerationLabel` equal to its `.Status.TemplateGeneration`. -1. If the StatefulSet's `.Spec.GenerationPartition` is not nil, then the following -is true. +1. If the StatefulSet's `.Spec.GenerationPartition` is not nil, then the +following is true. 1. All Pods with ordinals is the sequence `[0,.Spec.GenerationPartition)` have been generated from the PodTemplate in the StatefulSet's revision history that is labeled with a `StatefulSetTemplateGenerationLabel` equal to @@ -215,7 +212,7 @@ is true. `[Spec.GenerationParition,.Spec.Replicas)` have been created with the PodTemplate in the StatefulSet's revision history that is labeled with a `StatefulSetTemplateGenerationLabel` equal to - `.Status.TargetTemplateGeneration`. + `.Status.UpdatedTemplateGeneration`. ### Revised Controller Algorithm The StatefulSet controller will use the following algorithm to continue to @@ -237,13 +234,12 @@ created, for the Pod corresponding to the lowest such ordinal, the controller will [select the PodTemplate](#podtemplate-selection) from the StatefulSet's revision history corresponding to the Pod's ordinal and create the Pod. 1. If all Pods in the sequence `[0,.Spec.Replicas)` have been created, but if any -have a Status other than Running or do not have a Ready Condition, the -StatefulSet controller will wait for these Pods to either become Running and -Ready, or to be completely deleted. -1. 
If all Pods in the sequence `[0,.Spec.Replicas)` have a Status of Running and -a Condition indicating Ready, and if `.Spec.Replicas` is less than -`.Status.Replicas`, the controller will delete the Pod corresponding to the -largest ordinal. This implies that scaling takes precedence over Pod updates. +have a do not have a Ready Condition, the StatefulSet controller will wait for +these Pods to either become Ready, or to be completely deleted. +1. If all Pods in the sequence `[0,.Spec.Replicas)` have a Condition indicating +Ready, and if `.Spec.Replicas` is less than `.Status.Replicas`, the controller +will delete the Pod corresponding to the largest ordinal. This implies that +scaling takes precedence over Pod updates. 1. If all Pods in the range `[0,.Spec.Replicas)` have a Status of Running and a Ready Condition, if `.Spec.Replicas` is equal to `.Status.Replicas`, and if there are Pods that do not match the @@ -251,7 +247,7 @@ there are Pods that do not match the the largest ordinal will be deleted. 1. If the StatefulSet controller has achieved the [declared target state](#target-state), and if that state has a -`.Spec.ParitionOrdinal` equal to `0`, the StatefulSet controller will +`.Spec.GenrationPartition` equal to `0`, the StatefulSet controller will [complete any in progress updates](#update-completion). 1. The controller will [report its status](#status-reporting). 1. The controller will perform any necessary @@ -260,22 +256,24 @@ the largest ordinal will be deleted. ### StatefulSet Revision History The StatefulSet controller will use labeled, versioned PodTemplates to keep a history of updates preformed on a StatefulSet. The number of stored PodTemplates -is considered to be the limit of the StatefulSet's revision history. The -maximum revision history limit for a StatefulSet must be at least two, but it -may be greater. +is considered to be the size of the StatefulSet's revision history. 
The +maximum size of a StatefulSet's revision history is two (these are the current +and target PodTemplates) plus the history limit (represented by its +`.Spec.RevisionHistoryLimit`). #### PodTemplate Creation -When the StatefulSet controller creates a PodTemplate for a StatefulSet, it will -do the following. +When the `.Spec.Template` of a StatefulSet is [updated](#template-updates), the +StatefulSet controller will create a new PodTemplate to represent the new +revision of the StatefulSet. When the controller creates a PodTemplate for a +StatefulSet, it will do the following. 1. The controller will set the PodTemplate's `.PodTemplateSpec` field to the StatefulSet's `.Spec.Template` field. 1. The controller will create a ControllerRef object in the PodTemplate's `.OwnerReferences` list to mediate selector overlapping. -1. The controller will label the PodTemplate with a -`StatefulSetPodTemplateLabel` set to the StatefulSet's `.Name` to allow for -selection of the PodTemplates that comprise the StatefulSet's revision history. -1. The controller will label the PodTemplate with a +1. The controller will label the PodTemplate with a with a label matching the +StatefulSet's `.Spec.Selector` to allow for the selection of the PodTemplate. +1. The controller will label the PodTemplate's PodTemplateSpec with a `StaefulSetTemplateGenerationLabel` set to the StatefulSet's `.Spec.TemplateGeneration`. 1. The controller will set the Name of the PodTemplate to a concatenation of the @@ -292,8 +290,7 @@ overlap from causing the deletion of PodTemplates that are part of another object's revision history. In practice, these PodTemplates will be filtered out prior to history maintenance. 1. If the PodTemplate's ControllerRef matches the StatefulSet, the -StatefulSet controller orphan the PodTemplate by removing its ControllerRef, -and it will allow the PodTemplate to be deleted via garbage collection. +StatefulSet controller will delete the PodTemplate. 
#### History Reconstruction In order to reconstruct the history of revisions to a StatefulSet, the @@ -332,16 +329,14 @@ the oldest PodTemplates from the StatefulSet's revision history. 1. The StatefulSet controller will [reconstruct the revision history](#history-reconstruction) of the StatefulSet. +1. The StatefulSet will remove any PodTemplates that correspond to created +Pods. There should be at most two of these, the PodTemplates corresponding +to the current and target revisions. 1. If the number of PodTemplates in the StatefulSet's revision history is greater than the StatefulSet's `.Spec.RevisionHistoryLimit`, the StatefulSet controller will delete PodTemplates, starting with the head of -the revision history, until the limit of the revision history is equal to +the revision history, until the size of the revision history is equal to the StatefulSet's `.Spec.RevisionHistoryLimit`. -1. As a StatefulSet's `.Spec.RevisionHistoryLimit` is always at least two, and -as the PodTemplates corresponding to `.Status.TemplateGeneration` -or `.Status.TargetTemplateGeneration` are always the most recent PodTemplates -in the revision history, the StatefulSet controller will not delete any -`PodTemplates` that represent the current or target revisions. ### Template Updates The StatefulSet controller will create PodTemplates upon mutation of the @@ -349,40 +344,39 @@ The StatefulSet controller will create PodTemplates upon mutation of the 1. When the StafefulSet controller observes a mutation to a StatefulSet's `.Spec.Template` it will compare the `.Spec.TemplateGeneration` to the - `.Status.TargetTemplateGeneration`. + `.Status.UpdatedTemplateGeneration`. 1. If the `.Spec.TemplateGeneration` is equivalent to the -`.Status.TargetTemplateGeneration`, no update has occurred. Note that, in the +`.Status.UpdatedTemplateGeneration`, no update has occurred. 
Note that, in the event that both are nil, they are considered to be equivalent, and we expect this to occur after an initial upgrade to a version of Kubernetes that supports StatefulSet update form one that does not. 1. If the `.Status.TemplateGeneration` field is nil, and the `.Spec.TemplateGeneration` is not nil, then the StatefulSet has no revision history. To initialize its revision history, the StatefulSet controller will -set both `.Status.TemplateGeneration` and `.Status.TargetTemplateGeneration` +set both `.Status.CurrentTemplateGeneration` and `.Status.UpdatedTemplateGeneration` to `.Spec.TemplateGeneration` and [create a new PodTemplate](#podtemplate-creation). -1. If the `.Status.TemplateGeneration` is not nil, and if the -`.Spec.TemplateGeneration` is not equal to the `.Status.TargetTemplateGeneration`, +1. If the `.Status.CurrentTemplateGeneration` is not nil, and if the +`.Spec.TemplateGeneration` is not equal to the `.Status.UpdatedTemplateGeneration`, the StatefulSet controller will do the following. 1. The controller will [reconstruct the revision history](#history-reconsturction) of the StatefulSet. - 1. If the revision history of the StatefulSet contains a PodTemplate + 1. If the revision history of the StatefulSet contains any PodTemplate whose `.PodTemplateSpec` is semantically, deeply equivalent to the - StatefulSet's `.Spec.Template`, the youngest such PodTemplate will be used - as the target PodTemplate. - 1. If no such PodTemplate exists, the StatefulSet controller will - [create a new PodTemplate](#podtemplate-creation) from the StatefulSet's - `.Spec.Template`, and it will use this as the target PodTemplate. - 1. The controller will update the StatefulSet's `.Status.TargetTemplate` - based on the selection made above. + StatefulSet's `.Spec.Template`, the controller will record the revision of + all such templates. + 1. 
The controller will update the `StatefulSetPodTemplateGeneration` label + of all Pods that match any revision recorded above. + 1. The controller will update the StatefulSet's + `.Status.UpdatedTemplateGeneration` to the new revision. ### PodTemplate Selection When the StatefulSet controller creates the Pods in a StatefulSet, it will use -the following criteria to select the PodTemplateSpec used to create a -Pod. These criteria allow the controller to continue to make progress toward -its target state, while respecting its guarantees and allowing for rolling -updates back and forward. +the following criteria to select the PodTemplate used to create a Pod. These +criteria allow the controller to continue to make progress toward its target +state, while respecting its guarantees and allowing for rolling updates back +and forward. 1. If the StatefulSet's `.Spec.TemplateGeneration` is nil, then the cluster has been upgraded from a version that does not support StatefulSet update to @@ -393,23 +387,23 @@ a version that does. 1. The StatefulSet will initialize its revision history on the first update to its `.Spec.Template`. 1. If the StatefulSet's `.Spec.TemplateGeneration` is equal to its -`.Status.TemplateGeneration`, then there is no update in progress and all +`.Status.CurrentTemplateGeneration`, then there is no update in progress and all Pods will be created from the PodTemplate matching this revision. 1. If the Pod's ordinal is in the sequence `[0,.Status.CurrentReplicas)`, then it was previously created from the PodTemplate matching the -StatefulSet's `.Status.TemplateGeneration`, and it will be recreated +StatefulSet's `.Status.CurrentTemplateGeneration`, and it will be recreated from this PodTemplate. 1. 
If the Pod's ordinal is in the sequence `[.Spec.Replicas-.Status.UpdatedReplicas,.Spec.Replicas)`, then it was previously created from the PodTemplate matching the StatefulSet's, - `.Status.TargetTemplateGeneration`, and it will be recreated from this + `.Status.UpdatedTemplateGeneration`, and it will be recreated from this PodTemplate. 1. If the ordinal does not meet either of the prior two conditions, and if ordinal is in the sequence `[0, .Spec.GenerationPartition)`, it will be created from the PodTemplate matching the StatefulSet's -`.Status.TemplateGeneration`. +`.Status.CurrentTemplateGeneration`. 1. Otherwise, the Pod is created from the PodTemplate matching the -StatefulSet's `.Status.TargetTemplateGeneration`. +StatefulSet's `.Status.UpdatedTemplateGeneration`. ### Update Completion A StatefulSet update is complete when the following conditions are met. @@ -419,14 +413,14 @@ Running and a Ready Condition. 1. The StatefulSet's `.Spec.GenerationPartition` is equal to `0`. 1. All Pods in the StatefulSet are labeled with a `StatefulSetTemplateGenerationLabel` equal to the StatefulSet's -`.Status.TargetTemplateGeneration` (This implies they have been created from +`.Status.UpdatedTemplateGeneration` (This implies they have been created from the PodTemplate at that revision). When a StatefulSet update is complete, the controller will signal completion by doing the following. -1. The controller will set the StatefulSet's `.Status.TemplateGeneration` to its -`.Status.TargetTemplateGeneration`. +1. The controller will set the StatefulSet's `.Status.CurrentTemplateGeneration` to its +`.Status.UpdatedTemplateGeneration`. 1. The controller will set the StatefulSet's `Status.CurrentReplicas` to its `Status.UpdatedReplicas`. 1. The controller will set the StatefulSet's `Status.UpdatedReplicas` to 0. @@ -450,9 +444,9 @@ the `.Generation` of the StatefulSet object that was observed. 1. The controller will set the `.Status.Replicas` to the current number of created Pods. 
1. The controller will set the `.Status.ReadyReplicas` to the current number of -Pods that have a Status of Running and a ReadyCondition. -1. The controller will set the `.Status.TemplateGeneration` and -`.Status.TargetTemplateGeneration` +Pods that have a Ready Condition. +1. The controller will set the `.Status.CurrentTemplateGeneration` and +`.Status.UpdatedTemplateGeneration` in accordance with [maintaining its revision history](#history-maintenance) and the status of any [complete updates](#update-completion). 1. The controller will set the `.Status.CurrentReplicas` to the number of @@ -475,8 +469,10 @@ fields of the StatefulSet object other than `.Spec.Replicas` and `.Spec.Template.Containers`. This design imposes the following, additional constraints. -1. The `.Spec.RevisionHistoryLimit` must be greater than or equal to `2`. -1. The `.Spec.PositionOrdinal` must be in the sequence `[0,.Spec.Replicas)`. +1. The `.Spec.GenerationPartition` must be in the sequence `[0,.Spec.Replicas)`. +1. The `.Spec.TemplateGeneration` must only be mutated by the API Server. It +may be set upon creation by the user, but, after this, it is not mutable by +the user. ### TemplateGeneration Maintenance It will be the responsibility of the API Server to enforce that updates to @@ -504,7 +500,7 @@ StatefulSet updates. This section demonstrates how the design functions in typical usage scenarios. ### Initial Deployment -Users can create a StatefulSet using `kubectl create`. +Users can create a StatefulSet using `kubectl apply`. Given the following manifest `web.yaml` @@ -545,7 +541,7 @@ spec: Users can use the following command to create the StatefulSet. 
```shell -kubectl create -f web.yaml +kubectl apply -f web.yaml ``` The only difference between the proposed and current implementation is that @@ -663,7 +659,7 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - generationParition: 3 + generationPartition: 3 volumeClaimTemplates: - metadata: name: www @@ -676,9 +672,9 @@ spec: storage: 1Gi ``` -### Staged Roll Outs +### Phased Roll Outs Users can create a canary using `kubectl apply`. The only difference between a - [canary](#canaries) and a staged roll out is that the `.Spec.GenerationPartition` + [canary](#canaries) and a phased roll out is that the `.Spec.GenerationPartition` is set to value less than `.Spec.Replicas - 1`. ```yaml @@ -703,7 +699,7 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - generationParition: 2 + generationPartition: 2 volumeClaimTemplates: - metadata: name: www @@ -716,7 +712,7 @@ spec: storage: 1Gi ``` -Staged roll outs can be used to roll out a configuration, image, or resource +Phased roll outs can be used to roll out a configuration, image, or resource update to some portion of the fleet maintained by the StatefulSet prior to updating the entire fleet. It is useful to support linear, geometric, and exponential roll out of an update. Users can modify the @@ -744,7 +740,7 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - generationParition: 1 + generationPartition: 1 volumeClaimTemplates: - metadata: name: www @@ -773,12 +769,10 @@ Rolling back is usually the safest, and often the fastest, strategy to mitigate deployment failure, but rolling forward is sometimes the only practical solution for stateful applications (e.g. A users has a minor configuration error but has already modified the storage format for the application). Users can use -sequential `kubectl apply`'s to update the `.Status.TargetGeneration` of a -StatefulSet. 
This will respect the `.Spec.GenerationPartition` with respect to the -target state, and it therefor interacts well with canaries and staged roll outs. -Note that, while users can update the target template revision, they can not -update the current template revision. The only way to advance the current -template revision is to successfully complete an update. +sequential `kubectl apply`'s to update the StatefulSet's current +[target state](#target-state). The StatefulSet's `.Spec.GenerationPartition` +will be respected, and it therefore interacts well with canaries and phased roll + outs. ## Tests - Updating a StatefulSet's containers will trigger updates to the StatefulSet's From d5d445a5e79814a9011f6cdd21e027a5f8b7393f Mon Sep 17 00:00:00 2001 From: Kenneth Owens Date: Thu, 6 Apr 2017 09:50:32 -0700 Subject: [PATCH 05/12] typos --- contributors/design-proposals/statefulset-update.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/contributors/design-proposals/statefulset-update.md b/contributors/design-proposals/statefulset-update.md index dd7af22c6a1..413e8d95067 100644 --- a/contributors/design-proposals/statefulset-update.md +++ b/contributors/design-proposals/statefulset-update.md @@ -111,7 +111,6 @@ The following modifications will be made to the StatefulSetSpec API object. type StatefulSetSpec struct { // Replicas, Selector, Template, VolumeClaimsTemplate, and ServiceName // omitted for brevity. - v1.PodTemplateSpec `json:"template"` // TemplateGeneration is a monotonically increasing, 64 bit, integer used to // indicate the version of the of the PodTemplateSpec. If nil, the @@ -255,7 +254,7 @@ the largest ordinal will be deleted. ### StatefulSet Revision History The StatefulSet controller will use labeled, versioned PodTemplates to keep a -history of updates preformed on a StatefulSet. The number of stored PodTemplates +history of updates performed on a StatefulSet. 
The number of stored PodTemplates is considered to be the size of the StatefulSet's revision history. The maximum size of a StatefulSet's revision history is two (these are the current and target PodTemplates) plus the history limit (represented by its @@ -621,7 +620,7 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - generationParition: 2 + generationPartition: 2 volumeClaimTemplates: - metadata: name: www From 42f7e37181f4d83ae6cfd1ba834d8a42f3bdd86b Mon Sep 17 00:00:00 2001 From: Kenneth Owens Date: Fri, 5 May 2017 12:53:24 -0700 Subject: [PATCH 06/12] Updates the proposal to use a struct to indicate the strategy as per smarterclayton@ suggestion Updates the proposal to use the common history mechanism agreed upon by the SIG apps subgroup --- .../design-proposals/statefulset-update.md | 548 +++++++++--------- 1 file changed, 259 insertions(+), 289 deletions(-) diff --git a/contributors/design-proposals/statefulset-update.md b/contributors/design-proposals/statefulset-update.md index 413e8d95067..db6a4d946f4 100644 --- a/contributors/design-proposals/statefulset-update.md +++ b/contributors/design-proposals/statefulset-update.md @@ -5,7 +5,7 @@ **Status**: Proposal ## Abstract -Currently (as of Kubernetes 1.6), `.Spec.Replicas`, and +Currently (as of Kubernetes 1.6), `.Spec.Replicas` and `.Spec.Template.Containers` are the only mutable fields of the StatefulSet API object. Updating `.Spec.Replicas` will scale the number of Pods in the StatefulSet. Updating `.Spec.Template.Containers` causes all subsequently @@ -47,17 +47,17 @@ applications storage capacity, I want to update PersistentVolumes. network configuration of the application, I want to update Services and container ports in a consistent way. 
- As the administrator of a stateful application, when I scale my application -horizontally, I want associated PodDistruptionBudgets to be adjusted to +horizontally, I want associated PodDisruptionBudgets to be adjusted to compensate for the application's scaling. ## Assumptions - StatefulSet update must support singleton StatefulSets. However, an update in this case will cause a temporary outage. This is acceptable as a single process application is, by definition, not highly available. - - Disruption in Kubernetes is controlled by PodDistruptionBudgets. As + - Disruption in Kubernetes is controlled by PodDisruptionBudgets. As StatefulSet updates progress one Pod at a time, and only occur when all other Pods have a Status of Running and a Ready Condition, they can not - violate reasonable PodDisrutptionBugdets. + violate reasonable PodDisruptionBudgets. - Without priority and preemption, there is no guarantee that an update will not block due to a loss of capacity or due to the scheduling of another Pod between Pod termination and Pod creation. This is mitigated by blocking the @@ -66,7 +66,10 @@ compensate for the application's scaling. respect to eviction. - We will eventually implement a signal that is delivered to Pods to indicate the - [reason for termination](https://github.com/kubernetes/kubernetes/issues/1462). + [reason for termination](https://github.com/kubernetes/community/pull/541). + - StatefulSet updates will use the methodology outlined in the + [controller history](https://github.com/kubernetes/community/pull/594) proposal + for version tracking, update detection, and rollback detection. This will be a general implementation, usable for any Pod in a Kubernetes cluster. It is, therefore, out of scope to design such a mechanism here. - Kubelet does not support resizing a container's resources without terminating @@ -108,29 +111,59 @@ applied to the StatefulSet. The following modifications will be made to the StatefulSetSpec API object. 
```go +// StatefulSetUpdateStrategy indicates the strategy that the StatefulSet +// controller will use to perform updates. It includes any additional parameters +// necessary to perform the update for the indicated strategy. +type StatefulSetUpdateStrategy struct { + // Type indicates the type of the StatefulSetUpdateStrategy. + Type StatefulSetUpdateStrategyType + // Partition is used to communicate the ordinal at which to partition + // the StatefulSet when Type is PartitionedStatefulSetStrategyType. This + // value must be set when Type is PartitionedStatefulSetStrategyType, + // and it must be nil otherwise. + Partition *PartitionedStatefulSet +} + +// StatefulSetUpdateStrategyType is a string enumeration type that enumerates +// all possible update strategies for the StatefulSet controller. +type StatefulSetUpdateStrategyType string + +const ( + // PartitionedStatefulSetStrategyType indicates that updates will only be + // applied to a partition of the StatefulSet. This is useful for canaries + // and phased roll outs. + PartitionedStatefulSetStrategyType StatefulSetUpdateStrategyType = "Partitioned" + // RollingUpdateStatefulSetStrategyType indicates that updates will be + // applied to all Pods in the StatefulSet with respect to the StatefulSet + // ordering constraints. + RollingUpdateStatefulSetStrategyType StatefulSetUpdateStrategyType = "RollingUpdate" + // OnDeleteStatefulSetStrategyType triggers the legacy behavior. Version + // tracking and ordered rolling restarts are disabled. Pods are recreated + // from the StatefulSetSpec when they are manually deleted. + OnDeleteStatefulSetStrategyType StatefulSetUpdateStrategyType = "OnDelete" +) + +// PartitionedStatefulSet contains the parameters used with the +// PartitionedStatefulSetStrategyType. +type PartitionedStatefulSet struct { + // Ordinal indicates the ordinal at which the StatefulSet should be + // partitioned. + Ordinal uint32 +} + type StatefulSetSpec struct { // Replicas, Selector, Template, VolumeClaimsTemplate, and ServiceName // omitted for brevity.
- // TemplateGeneration is a monotonically increasing, 64 bit, integer used to - // indicate the version of the of the PodTemplateSpec. If nil, the - // StatefulSetController has not initialized its revision history, - // change tracking is not enabled, and all Pods will be created from - // Template. - TemplateGeneration *int64 `json:"templateGeneration"` - - // GenerationPartition partitions the Pods in the StatefulSet by ordinal such - // that all Pods with a lower ordinal will be created from the PodTemplate that - // represents the current revision of the StatefulSet's revision history and - // all Pods with an a greater or equal ordinal will be created from the - // PodTemplate that represents the target revision of the StatefulSet's - // revision history. - GenerationPartition *int32 `json:"generationPartition,omitempty` + // UpdateStrategy indicates the StatefulSetUpdateStrategy that will be + // employed to update Pods in the StatefulSet when a revision is made to + // Template or VolumeClaimsTemplate. + UpdateStrategy StatefulSetUpdateStrategy `json:"updateStrategy,omitempty"` // RevisionHistoryLimit is the maximum number of PodTemplates that will // be maintained in the StatefulSet's revision history. The revision history - // consists of all revisions not reprented by a currently applied - // PodTemplate. The default value is 2. + // consists of all revisions not represented by a currently applied + // StatefulSetSpec version. The default value is 2. RevisionHisotryLimit int32 `json:revisionHistoryLimit,omitempty` } ``` @@ -141,77 +174,46 @@ The following modifications will be made to the StatefulSetStatus API object. type StatefulSetStatus struct { // ObservedGeneration and Replicas fields are omitted for brevity.
- // CurrentTemplateGeneration, if not nil, is the generation of the - // PodTemplate that was used to create Pods with ordinals in the sequence + // CurrentVersion, if not empty, indicates the version of the PodTemplateSpec, + // VolumeClaimsTemplate tuple used to generate Pods in the sequence // [0,CurrentReplicas). - CurrentTemplateGeneration *int64 `json:"currnetTemplateGeneration,omitempty"` + CurrentVersion string `json:"currentVersion,omitempty"` - // UpdatedTemplateGeneration, if not nil, is the generation of the PodTemplate - // that was used to create Pods with ordinals in the sequence - // [Replicas - UpdatedReplicas, Replicas). - UpdatedTemplateGeneration *int64 `json:"updatedTemplateGeneration,omitempty"` + // UpdateVersion, if not empty, indicates the version of the PodTemplateSpec, + // VolumeClaimsTemplate tuple used to generate Pods in the sequence + // [Replicas-UpdatedReplicas,Replicas). + UpdateVersion string `json:"updateVersion,omitempty"` // ReadyReplicas is the current number of Pods, created by the StatefulSet - // controller, that have a Status for Running and a Ready Condition. + // controller, that have a Status of Running and a Ready Condition. ReadyReplicas int32 `json:"readyReplicas,omitempty"` // CurrentReplicas is the number of Pods created by the StatefulSet - // controller from the PodTemplateSpec indicated by CurrentTemplateGeneration. + // controller from the PodTemplateSpec, VolumeClaimsTemplate tuple indicated + // by CurrentVersion. CurrentReplicas int32 `json:"currentReplicas,omitempty"` // UpdatedReplicas is the number of Pods created by the StatefulSet - // controller from the PodTemplateSpec indicated by TargetTemplateGeneration. + // controller from the PodTemplateSpec, VolumeClaimsTemplate tuple indicated + // by UpdateVersion. UpdatedReplicas int32 `json:"taretReplicas,omitempty"` } ``` -Additionally, we introduce the following constant. +Additionally we introduce the following constant.
-```go -// StatefulSetTemplateGenerationLabel is the label applied to a PodTemplate or -// Pod to indicate the position of the object's Template in the revision -// history of a StatefulSet. -const StatefulSetTemplateGenerationLabel = "statefulset-template-revision" -``` +```go +// StatefulSetVersionLabel is the label used by the StatefulSet controller to track +// which version of StatefulSet's StatefulSetSpec was used to generate a Pod. +const StatefulSetVersionLabel = "StatefulSetVersion" +``` ## StatefulSet Controller The StatefulSet controller will watch for modifications to StatefulSet and Pod API objects. When a StatefulSet is created or updated, or when one of the Pods in a StatefulSet is updated or deleted, the StatefulSet controller will attempt to create, update, or delete Pods to conform the -current state of the system to the user declared target state. -The user declared target state of the system, with respect to an individual -StatefulSet, is determined as below. - -### Target State -The declared target state of a StatefulSet requires that all Pods in the -StatefulSet conform to exactly one or two PodTemplates in the StatefulSet's -revision history. If the declared target state references two PodTemplates, as -is the case when a user wants to perform a canary update or a phased roll out, -they are partitioned around an ordinal such that all Pods with a lower ordinal -conform to one PodTemplate and all Pods with a greater or equal ordinal -conform to the other. The conditions that define this state in terms of the -StatefulSet's StatefulSetSpec and StatefulSetStatus are below. - -1. The StatefulSet contains exactly `[0,.Spec.Replicas)` Pods. -1. If StatefulSet's `.Spec.GenerationPartition` is nil, then the following is -true. - 1. The StatefulSet's `.Status.CurrentTemplateGeneration` is equal to its - `.Status.UpdatedTemplateGeneration`. - 1. 
All Pods in the StatefulSet have been generated from the PodTemplate - labeled with a `StatefulSetTemplateGenerationLabel` equal to its - `.Status.TemplateGeneration`. -1. If the StatefulSet's `.Spec.GenerationPartition` is not nil, then the -following is true. - 1. All Pods with ordinals is the sequence `[0,.Spec.GenerationPartition)` have - been generated from the PodTemplate in the StatefulSet's revision history - that is labeled with a `StatefulSetTemplateGenerationLabel` equal to - `.Status.TemplateGeneration`. - 1. All Pods with ordinals in the sequence - `[Spec.GenerationParition,.Spec.Replicas)` have been created with the - PodTemplate in the StatefulSet's revision history that is labeled with a - `StatefulSetTemplateGenerationLabel` equal to - `.Status.UpdatedTemplateGeneration`. +current state of the system to the user declared [target state](#target-state). ### Revised Controller Algorithm The StatefulSet controller will use the following algorithm to continue to @@ -219,215 +221,176 @@ make progress toward the user declared [target state](#target-state) while respecting the controller's [identity](https://kubernetes.io/docs/concepts/abstractions/controllers/statefulsets/#pod-identity), [deployment, and scaling](https://kubernetes.io/docs/concepts/abstractions/controllers/statefulsets/#deployment-and-scaling-guarantee) -guarantees. +guarantees. The StatefulSet controller will use the technique proposed in +[Controller History](https://github.com/kubernetes/community/pull/594) to +snapshot and version its [target Object state](#target-pod-state). +1. The controller will reconstruct the +[revision history](#history-reconstruction) of the StatefulSet. 1. The controller will -[reconstruct the revision history](#history-reconstruction) of the StatefulSet. -1. 
The controller will process any [template updates](#template-updates) to +[process any updates to its StatefulSetSpec](#specification-updates) to ensure that the StatefulSet's revision history is consistent with the user declared desired state. 1. The controller will select all Pods in the StatefulSet, filter any Pods not owned by the StatefulSet, and sort the remaining Pods in ordinal order. +1. For all created Pods, the controller will perform any necessary +[non-destructive state reconciliation](#pod-state-reconciliation). 1. If any Pods with ordinals in the sequence `[0,.Spec.Replicas)` have not been created, for the Pod corresponding to the lowest such ordinal, the controller -will [select the PodTemplate](#podtemplate-selection) from the StatefulSet's -revision history corresponding to the Pod's ordinal and create the Pod. +will create the Pod with its declared [target Pod state](#target-pod-state). 1. If all Pods in the sequence `[0,.Spec.Replicas)` have been created, but if any -have a do not have a Ready Condition, the StatefulSet controller will wait for -these Pods to either become Ready, or to be completely deleted. -1. If all Pods in the sequence `[0,.Spec.Replicas)` have a Condition indicating -Ready, and if `.Spec.Replicas` is less than `.Status.Replicas`, the controller -will delete the Pod corresponding to the largest ordinal. This implies that -scaling takes precedence over Pod updates. +do not have a Ready Condition, the StatefulSet controller will wait for these +Pods to either become Ready, or to be completely deleted. +1. If all Pods in the sequence `[0,.Spec.Replicas)` have a Ready Condition, and +if `.Spec.Replicas` is less than `.Status.Replicas`, the controller will delete +the Pod corresponding to the largest ordinal. This implies that scaling takes +precedence over Pod updates. 1. 
If all Pods in the range `[0,.Spec.Replicas)` have a Status of Running and a Ready Condition, if `.Spec.Replicas` is equal to `.Status.Replicas`, and if -there are Pods that do not match the -[declared desired PodTemplate](#podtemplate-selection), the Pod corresponding to -the largest ordinal will be deleted. +there are Pods that do not match their [target Pod state](#target-pod-state), +the Pod with the largest ordinal in that set will be deleted. 1. If the StatefulSet controller has achieved the -[declared target state](#target-state), and if that state has a -`.Spec.GenrationPartition` equal to `0`, the StatefulSet controller will +[declared target state](#target-state) the StatefulSet controller will [complete any in progress updates](#update-completion). 1. The controller will [report its status](#status-reporting). 1. The controller will perform any necessary [maintenance of its revision history](#history-maintenance). +### Target State +The target state of the StatefulSet controller with respect to an individual +StatefulSet is defined as follows. + +1. The StatefulSet contains exactly `[0,Spec.Replicas)` Pods. +1. All Pods in the StatefulSet have the correct +[target Pod state](#target-pod-state). + +### Target Pod State +As in the [Controller History](https://github.com/kubernetes/community/pull/594) +proposal we define the target Object state of StatefulSetSpec specification type +object to be the `.Template` and `.VolumeClaimsTemplate`. The latter is currently +immutable, but we will version it as one day this constraint may be lifted. This +state provides enough information to generate a Pod and its associated +PersistentVolumeClaims. The target Pod State for a Pod in a StatefulSet is as +follows. +1. The Pods PersistentVolumeClaims have been created. + - Note that we do not currently delete PersistentVolumeClaims. +1. If the Pod's ordinal is in the sequence `[0,.Spec.Replicas)` the Pod should +have a Ready Condition. This implies the Pod is Running. 
+1. If the Pod's ordinal is greater than or equal to `.Spec.Replicas`, the Pod +should be completely terminated and deleted. +1. If the StatefulSet's `.Spec.UpdateStrategy.Type` is equal to +`OnDeleteStatefulSetStrategyType` then no version tracking is performed. Pods +can be at an arbitrary version and will be recreated from the current +`.Spec.Template` and `.Spec.VolumeClaimsTemplate` when they are deleted. +1. If the StatefulSet's `.Spec.UpdateStrategy.Type` is equal to +`RollingUpdateStatefulSetStrategyType` then the version of the Pod should be +as follows. + 1. If the Pod's ordinal is in the sequence `[0,.Status.CurrentReplicas)`, + the Pod should be consistent with the version indicated by `.Status.CurrentVersion`. + 1. If the Pod's ordinal is in the sequence + `[.Status.Replicas - .Status.UpdatedReplicas, .Status.Replicas)` + the Pod should be consistent with the version indicated by + `.Status.UpdateVersion`. +1. If the StatefulSet's `.Spec.UpdateStrategy.Type` is equal to +`PartitionedStatefulSetStrategyType` then the version of the Pod should be +as follows. + 1. If the Pod's ordinal is in the sequence `[0,.Status.CurrentReplicas)`, + the Pod should be consistent with the version indicated by `.Status.CurrentVersion`. + 1. If the Pod's ordinal is in the sequence + `[.Status.Replicas - .Status.UpdatedReplicas, .Status.Replicas)` the Pod + should be consistent with the version indicated by `.Status.UpdateVersion`. + 1. If the Pod does not meet either of the prior two conditions, and if + its ordinal is in the sequence `[0, .Spec.UpdateStrategy.Partition.Ordinal)`, + it should be consistent with the version indicated by + `.Status.CurrentVersion`. + 1. Otherwise, the Pod should be consistent with the version indicated + by `.Status.UpdateVersion`. + +### Pod State Reconciliation +In order to reconcile a Pod with its declared desired +[target state](#target-pod-state) the StatefulSet controller will do the +following. + +1. 
If the Pod is already consistent with its target state the controller will do +nothing. +1. If the Pod is labeled with a `StatefulSetVersionLabel` that indicates +the Pod was generated from a version of the StatefulSetSpec that is semantically +equivalent to, but not equal to, the [target version](#target-pod-state), the +StatefulSet controller will update the Pod with a `StatefulSetVersionLabel` +indicating the new semantically equivalent version. This form of reconciliation +is non-destructive. +1. If the Pod was not created from the target version, the Pod will be deleted +and recreated from that version. This form of reconciliation is destructive. + +### Specification Updates +The StatefulSet controller will [snapshot](#snapshot-creation) its target +Object state when mutations are made to its `.Spec.Template` or +`.Spec.VolumeClaimsTemplate` (Note that the latter is currently immutable). + +1. When the StatefulSet controller observes a mutation to a StatefulSet's + `.Spec.Template` it will snapshot its target Object state and compare +the snapshot with the version indicated by its `.Status.UpdateVersion`. +1. If the current state is equivalent to the version indicated by +`.Status.UpdateVersion` no update has occurred. +1. If the `Status.CurrentVersion` field is empty, then the StatefulSet has no +revision history. To initialize its revision history, the StatefulSet controller +will set both `.Status.CurrentVersion` and `.Status.UpdateVersion` to the +version of the current snapshot. +1. If the `.Status.CurrentVersion` is not empty, and if the +`.Status.UpdateVersion` is not equal to the version of the current snapshot, +the StatefulSet controller will set the `.Status.UpdateVersion` to the version +indicated by the current snapshot. + ### StatefulSet Revision History -The StatefulSet controller will use labeled, versioned PodTemplates to keep a -history of updates performed on a StatefulSet. 
The number of stored PodTemplates -is considered to be the size of the StatefulSet's revision history. The -maximum size of a StatefulSet's revision history is two (these are the current -and target PodTemplates) plus the history limit (represented by its -`.Spec.RevisionHistoryLimit`). - -#### PodTemplate Creation -When the `.Spec.Template` of a StatefulSet is [updated](#template-updates), the -StatefulSet controller will create a new PodTemplate to represent the new -revision of the StatefulSet. When the controller creates a PodTemplate for a -StatefulSet, it will do the following. - -1. The controller will set the PodTemplate's `.PodTemplateSpec` field to the -StatefulSet's `.Spec.Template` field. -1. The controller will create a ControllerRef object in the PodTemplate's -`.OwnerReferences` list to mediate selector overlapping. -1. The controller will label the PodTemplate with a with a label matching the -StatefulSet's `.Spec.Selector` to allow for the selection of the PodTemplate. -1. The controller will label the PodTemplate's PodTemplateSpec with a -`StaefulSetTemplateGenerationLabel` set to the StatefulSet's -`.Spec.TemplateGeneration`. -1. The controller will set the Name of the PodTemplate to a concatenation of the -`.Name` of the StatefulSet and the `.Spec.TemplateGeneration`. -1. The controller will then create the PodTemplate. - -#### PodTemplate Deletion -When the `StatefulSet` controller deletes a PodTemplate in the revision -history of a StatefulSet it will do the following. - -1. If the PodTemplate's ControllerRef does not match the StatefulSet, the -controller will not delete the PodTemplate. In this way, we prevent selector -overlap from causing the deletion of PodTemplates that are part of another -object's revision history. In practice, these PodTemplates will be filtered out -prior to history maintenance. -1. If the PodTemplate's ControllerRef matches the StatefulSet, the -StatefulSet controller will delete the PodTemplate. 
+The StatefulSet controller will use the technique proposed in +[Controller History](https://github.com/kubernetes/community/pull/594) to +snapshot and version its target Object state. + +#### Snapshot Creation +In order to snapshot a version of its target Object state, the controller will +serialize and store the `.Spec.Template` and `.Spec.VolumeClaimsTemplate` +along with the `.Generation` in each snapshot. Each snapshot will be labeled +with the StatefulSet's `.Spec.Selector`. #### History Reconstruction -In order to reconstruct the history of revisions to a StatefulSet, the -StatefulSet controller will do the following. - -1. If the StatefulSet's `.Spec.TemplateGeneration` is nil, the StatefulSet -has never been updated, and its history has never been initialized. This is -the state the object will be in when a cluster is first upgraded from a version -that does not support StatefulSet update to a version that does. In this case, -the controller will not enforce PodTemplate revisions. When creating Pods, -it will always use the StatefulSet's `.Spec.Template`. Otherwise, the controller -will continue as below. -1. The controller will select all PodTemplates with a -`StatefulSetPodTemplateLabel` matching the `.Name` field of the StatefulSet. -1. The controller will filter out all PodTemplates that do not contain a -ControllerRef matching the the StatefulSet. If the controller selects -PodTemplates that it does not own, it will report an error, but it will continue -reconstructing the StatefulSet's history. -1. The controller will filter out all PodTemplates that do not have a -`StatefulSetTemplateGenerationLabel` mapped to a valid revision. This can only -occur if the user purposefully deletes the label. In this case, the -controller will report an error, but it will continue reconstructing the -StatefulSet's revision history. -1. 
For all the remaining PodTemplates, the controller will sort them in -ascending order by the value mapped to their `StatefulSetTemplateGenerationLabel`. -This will reconstruct a list of PodTemplates from oldest to newest. Note that, -as the revision is monotonically increasing for an individual StatefulSet, and -as we use ControllerRef to mitigate selector overlap, the StatefulSet's history - is a strictly ordered set. +As proposed in +[Controller History](https://github.com/kubernetes/community/pull/594), in +order to reconstruct the revision history of a StatefulSet, the StatefulSet +controller will select all snapshots based on its `Spec.Selector` and sort them +by the contained `.Generation`. This will produce an ordered set of +revisions to the StatefulSet's target Object state. #### History Maintenance In order to prevent the revision history of the StatefulSet from exceeding memory or storage limits, the StatefulSet controller will periodically prune -the oldest PodTemplates from the StatefulSet's revision history. - -1. The StatefulSet controller will -[reconstruct the revision history](#history-reconstruction) -of the StatefulSet. -1. The StatefulSet will remove any PodTemplates that correspond to created -Pods. There should be at most two of these, the PodTemplates corresponding -to the current and target revisions. -1. If the number of PodTemplates in the StatefulSet's revision history is -greater than the StatefulSet's `.Spec.RevisionHistoryLimit`, the -StatefulSet controller will delete PodTemplates, starting with the head of -the revision history, until the size of the revision history is equal to -the StatefulSet's `.Spec.RevisionHistoryLimit`. - -### Template Updates -The StatefulSet controller will create PodTemplates upon mutation of the -`.Spec.Template` of a StatefulSet. - -1. 
When the StafefulSet controller observes a mutation to a StatefulSet's - `.Spec.Template` it will compare the `.Spec.TemplateGeneration` to the - `.Status.UpdatedTemplateGeneration`. -1. If the `.Spec.TemplateGeneration` is equivalent to the -`.Status.UpdatedTemplateGeneration`, no update has occurred. Note that, in the -event that both are nil, they are considered to be equivalent, and we expect -this to occur after an initial upgrade to a version of Kubernetes that supports -StatefulSet update form one that does not. -1. If the `.Status.TemplateGeneration` field is nil, and the -`.Spec.TemplateGeneration` is not nil, then the StatefulSet has no revision -history. To initialize its revision history, the StatefulSet controller will -set both `.Status.CurrentTemplateGeneration` and `.Status.UpdatedTemplateGeneration` -to `.Spec.TemplateGeneration` and -[create a new PodTemplate](#podtemplate-creation). -1. If the `.Status.CurrentTemplateGeneration` is not nil, and if the -`.Spec.TemplateGeneration` is not equal to the `.Status.UpdatedTemplateGeneration`, -the StatefulSet controller will do the following. - 1. The controller will - [reconstruct the revision history](#history-reconsturction) of the - StatefulSet. - 1. If the revision history of the StatefulSet contains any PodTemplate - whose `.PodTemplateSpec` is semantically, deeply equivalent to the - StatefulSet's `.Spec.Template`, the controller will record the revision of - all such templates. - 1. The controller will update the `StatefulSetPodTemplateGeneration` label - of all Pods that match any revision recorded above. - 1. The controller will update the StatefulSet's - `.Status.UpdatedTemplateGeneration` to the new revision. - -### PodTemplate Selection -When the StatefulSet controller creates the Pods in a StatefulSet, it will use -the following criteria to select the PodTemplate used to create a Pod. 
These -criteria allow the controller to continue to make progress toward its target -state, while respecting its guarantees and allowing for rolling updates back -and forward. - -1. If the StatefulSet's `.Spec.TemplateGeneration` is nil, then the cluster -has been upgraded from a version that does not support StatefulSet update to -a version that does. - 1. In this case the `.Spec.Template` is the current revision, - and no Pods in the StatefulSet should be labeled with a - `StatefulSetPodTemplateGeneration` label. - 1. The StatefulSet will initialize its revision history on the first - update to its `.Spec.Template`. -1. If the StatefulSet's `.Spec.TemplateGeneration` is equal to its -`.Status.CurrentTemplateGeneration`, then there is no update in progress and all -Pods will be created from the PodTemplate matching this revision. -1. If the Pod's ordinal is in the sequence `[0,.Status.CurrentReplicas)`, -then it was previously created from the PodTemplate matching the -StatefulSet's `.Status.CurrentTemplateGeneration`, and it will be recreated -from this PodTemplate. -1. If the Pod's ordinal is in the sequence - `[.Spec.Replicas-.Status.UpdatedReplicas,.Spec.Replicas)`, then it was - previously created from the PodTemplate matching the StatefulSet's, - `.Status.UpdatedTemplateGeneration`, and it will be recreated from this - PodTemplate. -1. If the ordinal does not meet either of the prior two conditions, and -if ordinal is in the sequence `[0, .Spec.GenerationPartition)`, it will be created -from the PodTemplate matching the StatefulSet's -`.Status.CurrentTemplateGeneration`. -1. Otherwise, the Pod is created from the PodTemplate matching the -StatefulSet's `.Status.UpdatedTemplateGeneration`. +its revision history so that no more than `.Spec.RevisionHistoryLimit` non-live +versions of target Object state are preserved. ### Update Completion -A StatefulSet update is complete when the following conditions are met. - -1. 
All Pods with ordinals in the sequence `[0,.Spec.Replicas)` have a Status of -Running and a Ready Condition. -1. The StatefulSet's `.Spec.GenerationPartition` is equal to `0`. -1. All Pods in the StatefulSet are labeled with a -`StatefulSetTemplateGenerationLabel` equal to the StatefulSet's -`.Status.UpdatedTemplateGeneration` (This implies they have been created from -the PodTemplate at that revision). - -When a StatefulSet update is complete, the controller will signal completion by -doing the following. - -1. The controller will set the StatefulSet's `.Status.CurrentTemplateGeneration` to its -`.Status.UpdatedTemplateGeneration`. -1. The controller will set the StatefulSet's `Status.CurrentReplicas` to its -`Status.UpdatedReplicas`. -1. The controller will set the StatefulSet's `Status.UpdatedReplicas` to 0. +The criteria for update completion are as follows. + +1. If the StatefulSet's `.Spec.UpdateStrategy.Type` is equal to +`OnDeleteStatefulSetStrategyType` then no version tracking is performed. In +this case an update can never be in progress. +1. If the StatefulSet's `.Spec.UpdateStrategy.Type` is equal to +`PartitionedStatefulSetStrategyType` updates can not complete. The version +indicated by `.Status.UpdateVersion` will only be applied to Pods with ordinals +in the sequence `[.Spec.UpdateStrategy.Partition.Ordinal,.Spec.Replicas)`. +1. If the StatefulSet's `.Spec.UpdateStrategy.Type` is equal to +`RollingUpdateStatefulSetStrategyType`, then an update is complete when the +StatefulSet is at its [target state](#target-state). The StatefulSet controller +will signal update completion as follows. + 1. The controller will set `.Status.CurrentVersion` to the value of + `.Status.UpdateVersion`. + 1. The controller will set `.Status.CurrentReplicas` to + `.Status.UpdatedReplicas`. Note that this value will be equal to + `.Status.Replicas`. + 1. The controller will set `.Status.UpdatedReplicas` to 0.
### Status Reporting After processing the creation, update, or deletion of a StatefulSet or Pod, -the StatefulSet controller will record its status by persisting the -a StatefulSetStatus object. This has two purposes. +the StatefulSet controller will record its status by persisting a +StatefulSetStatus object. This has two purposes. 1. It allows the StatefulSet controller to recreate the exact StatefulSet membership in the event of a hard restart of the entire system. @@ -444,23 +407,20 @@ the `.Generation` of the StatefulSet object that was observed. created Pods. 1. The controller will set the `.Status.ReadyReplicas` to the current number of Pods that have a Ready Condition. -1. The controller will set the `.Status.CurrentTemplateGeneration` and -`.Status.UpdatedTemplateGeneration` -in accordance with [maintaining its revision history](#history-maintenance) -and the status of any [complete updates](#update-completion). +1. The controller will set the `.Status.CurrentVersion` and +`.Status.UpdateVersion` in accordance with the StatefulSet's +[revision history](#statefulset-revision-history) and +any [complete updates](#update-completion). 1. The controller will set the `.Status.CurrentReplicas` to the number of -Pods that it has created from the PodTemplate that corresponds to the -current revision of the StatefulSet. +Pods that it has created from the version indicated by +`.Status.CurrentVersion`. 1. The controller will set the `.Status.UpdatedReplicas` to the number of Pods -that it has created from the PodTemplate that corresponds to the target -revision of the StatefulSet. +that it has created from the version indicated by `.Status.UpdateVersion`. 1. The controller will then persist the StatefulSetStatus make it durable and communicate it to observers. ## API Server -The API Server will perform validation for StatefulSet updates and ensure that -a StatefulSet's `.Spec.TemplateGeneration` is a generator for a strictly -monotonically increasing sequence.
+The API Server will perform validation for StatefulSet creation and updates. ### StatefulSet Validation As is currently implemented, the API Server will not allow mutation to any @@ -468,21 +428,13 @@ fields of the StatefulSet object other than `.Spec.Replicas` and `.Spec.Template.Containers`. This design imposes the following, additional constraints. -1. The `.Spec.GenerationPartition` must be in the sequence `[0,.Spec.Replicas)`. -1. The `.Spec.TemplateGeneration` must only be mutated by the API Server. It -may be set upon creation by the user, but, after this, it is not mutable by -the user. - -### TemplateGeneration Maintenance -It will be the responsibility of the API Server to enforce that updates to -StatefulSet's `.Spec.Template` atomically increment the -`.Spec.TemplateGeneration` counter. There is no need for the value to be -strictly sequential, but it must be strictly, monotonically increasing. -As validation will not allow mutation to any field other than the -`.Spec.Template.Containers` field, the API Server need not track all fields of -StatefulSet's `.Spec` for modifications, but it must trigger an update to the -revision when the current and previous `.Spec.Template` versions fail a test for -deep semantic equality. +1. If the `.Spec.UpdateStrategy.Type` is equal to +`PartitionedStatefulSetStrategyType`, the API Server should fail validation +if any of the following conditions are true. + 1. `.Spec.UpdateStrategy.Partition` is nil. + 1. `.Spec.UpdateStrategy.Partition` is not nil, and + `.Spec.UpdateStrategy.Partition.Ordinal` is not in the sequence + `(0,.Spec.Replicas)`. ## Kubectl Kubectl will use the `rollout` command to control and provide the status of @@ -550,7 +502,7 @@ upon initial creation. ### Rolling out an Update Users can create a rolling update using `kubectl apply`. If a user creates a StatefulSet [as above](#initial-deployment), the user can trigger a rolling -update by updating image (as in the manifest as below).
+update by updating the image (as in the manifest as below). ```yaml apiVersion: apps/v1beta1 @@ -565,6 +517,8 @@ spec: labels: app: nginx spec: + updateStrategy: + type: RollingUpdate containers: - name: nginx image: gcr.io/google_containers/nginx-slim:0.9 @@ -596,7 +550,9 @@ kubectl apply -f web.yaml ### Canaries Users can create a canary using `kubectl apply`. The only difference between a [rolling update](#rolling-out-an-update) and a canary is that the - `.Spec.GenerationPartition` is set to `.Spec.Replicas - 1`. + `.Spec.UpdateStrategy.Type` is set to `ParitionedStatefulSetStrategyType` and + the `.Spec.UpdateStrategy.Partition.Ordinal` is set to `.Spec.Replicas-1`. + ```yaml apiVersion: apps/v1beta1 @@ -611,6 +567,10 @@ spec: labels: app: nginx spec: + updateStrategy: + type: Partitioned + partition: + ordinal: 2 containers: - name: nginx image: gcr.io/google_containers/nginx-slim:0.9 @@ -620,7 +580,7 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - generationPartition: 2 + volumeClaimTemplates: - metadata: name: www @@ -649,6 +609,10 @@ spec: labels: app: nginx spec: + updateStrategy: + type: Partitioned + partition: + ordinal: 3 containers: - name: nginx image: gcr.io/google_containers/nginx-slim:0.9 @@ -658,7 +622,6 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - generationPartition: 3 volumeClaimTemplates: - metadata: name: www @@ -673,8 +636,9 @@ spec: ### Phased Roll Outs Users can create a canary using `kubectl apply`. The only difference between a - [canary](#canaries) and a phased roll out is that the `.Spec.GenerationPartition` - is set to value less than `.Spec.Replicas - 1`. + [canary](#canaries) and a phased roll out is that the + `.Spec.UpdateStrategy.Partition.Ordinal` is set to a value less than + `.Spec.Replicas-1`. 
```yaml apiVersion: apps/v1beta1 @@ -683,12 +647,16 @@ metadata: name: web spec: serviceName: "nginx" - replicas: 3 + replicas: 4 template: metadata: labels: app: nginx spec: + updateStrategy: + type: Partitioned + partition: + ordinal: 2 containers: - name: nginx image: gcr.io/google_containers/nginx-slim:0.9 @@ -698,7 +666,6 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - generationPartition: 2 volumeClaimTemplates: - metadata: name: www @@ -715,7 +682,7 @@ Phased roll outs can be used to roll out a configuration, image, or resource update to some portion of the fleet maintained by the StatefulSet prior to updating the entire fleet. It is useful to support linear, geometric, and exponential roll out of an update. Users can modify the -`.Spec.GenerationPartition` to allow the roll out to progress. +`.Spec.UpdateStrategy.Partition.Ordinal` to allow the roll out to progress. ```yaml apiVersion: apps/v1beta1 @@ -730,6 +697,10 @@ spec: labels: app: nginx spec: + updateStrategy: + type: Partitioned + partition: + ordinal: 1 containers: - name: nginx image: gcr.io/google_containers/nginx-slim:0.9 @@ -739,7 +710,6 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - generationPartition: 1 volumeClaimTemplates: - metadata: name: www @@ -794,7 +764,7 @@ updates. ### Termination Reason Without communicating a signal indicating the reason for termination to a Pod in -a StatefulSet, as proposed [here](https://github.com/kubernetes/kubernetes/issues/1462), +a StatefulSet, as proposed [here](https://github.com/kubernetes/community/pull/541), the tenant application has no way to determine if it is being terminated due to a scale down operation or due to an update. Consider a BASE distributed storage application like Cassandra, where 2 TiB of @@ -836,7 +806,7 @@ VolumeClaimsTemplate. ### In Place Updates Currently configuration, images, and resource request/limits updates are all -performed destructively. 
Without a [termination reason](https://github.com/kubernetes/kubernetes/issues/1462) +performed destructively. Without a [termination reason](https://github.com/kubernetes/community/pull/541) implementation, there is little value to implementing in place image updates, and configuration and resource request/limit updates are not possible. When [termination reason](#https://github.com/kubernetes/kubernetes/issues/1462) From 713d7c0609c6c4686ec8ae7cc24e648784629c09 Mon Sep 17 00:00:00 2001 From: Kenneth Owens Date: Mon, 8 May 2017 15:37:43 -0700 Subject: [PATCH 07/12] Changes CurrentVersion and UpdateVersion to CurrentRevision and UpdateRevision Change StatefulSetVersionLabel to StatefulSetRevisionLabel and update the const value to be consistent with other controllers. RevisionHistoryLimit is *int32 ParitionStatefulSet typo corrected --- .../design-proposals/statefulset-update.md | 73 +++++++++---------- 1 file changed, 36 insertions(+), 37 deletions(-) diff --git a/contributors/design-proposals/statefulset-update.md b/contributors/design-proposals/statefulset-update.md index db6a4d946f4..2d27c9c2532 100644 --- a/contributors/design-proposals/statefulset-update.md +++ b/contributors/design-proposals/statefulset-update.md @@ -121,8 +121,7 @@ type StatefulSetUpdateStrategy struct { // the StatefulSet when Type is PartitionedStatefulSetStrategyType. This // value must be set when Type is PartitionedStatefulSetStrategyType, // and it must be nil otherwise. - Partition *ParitionedStatefulSet -} + Partition *PartitionedStatefulSet // StatefulSetUpdateStrategyType is a string enumeration type that enumerates // all possible update strategies for the StatefulSet controller. @@ -148,7 +147,7 @@ const ( type PartitionedStatefulSet struct { // Ordinal indicates the ordinal at which the StatefulSet should be // partitioned. 
- Ordianl uint32 + Ordinal int32 } type StatefulSetSpec struct { @@ -164,7 +163,7 @@ type StatefulSetSpec struct { // be maintained in the StatefulSet's revision history. The revision history // consists of all revisions not represented by a currently applied // StatefulSetSpec version. The default value is 2. - RevisionHisotryLimit int32 `json:revisionHistoryLimit,omitempty` + RevisionHistoryLimit *int32 `json:revisionHistoryLimit,omitempty` } ``` @@ -174,15 +173,15 @@ The following modifications will be made to the StatefulSetStatus API object. type StatefulSetStatus struct { // ObservedGeneration and Replicas fields are omitted for brevity. - // CurrentVersion, if not empty, indicates the version of PodSpecTemplate, + // CurrentRevision, if not empty, indicates the version of PodSpecTemplate, // VolumeClaimsTemplate tuple used to generate Pods in the sequence // [0,CurrentReplicas). - CurrentVersion string `json:"currentVersion,omitempty"` + CurrentRevision string `json:"currentVersion,omitempty"` - // UpdatedVersion, if not empty, indicates the version of PodSpecTemplate, + // UpdateRevision, if not empty, indicates the version of PodSpecTemplate, // VolumeClaimsTemplate tuple used to generate Pods in the sequence // [Replicas-UpdatedReplicas,Replicas) - UpdateVersion string `json:"updateVersion,omitempty"` + UpdateRevision string `json:"updateVersion,omitempty"` // ReadyReplicas is the current number of Pods, created by the StatefulSet // controller, that have a Status of Running and a Ready Condition. @@ -190,12 +189,12 @@ The following modifications will be made to the StatefulSetStatus API object. // CurrentReplicas is the number of Pods created by the StatefulSet // controller from the PodTemplateSpec, VolumeClaimsTemplate tuple indicated - // by CurrentVersion. + // by CurrentRevision. 
CurrentReplicas int32 `json:"currentReplicas,omitempty"` // UpdatedReplicas is the number of Pods created by the StatefulSet // controller from the PodTemplateSpec, VolumeClaimsTemplate tuple indicated - // by CurrentVersion. + // by UpdateRevision. UpdatedReplicas int32 `json:"taretReplicas,omitempty"` } ``` @@ -203,9 +202,9 @@ The following modifications will be made to the StatefulSetStatus API object. Additionally we introduce the following constant. ```go -// StatefulSetVersionLabel is the label used by StatefulSet controller to track +// StatefulSetRevisionLabel is the label used by StatefulSet controller to track // which version of StatefulSet's StatefulSetSpec was used generate a Pod. -const StatefulSetVersionLabel = "StatefulSetVersion" +const StatefulSetRevisionLabel = "statefulset.kubernetes.io/revision" ``` ## StatefulSet Controller @@ -278,33 +277,33 @@ follows. have a Ready Condition. This implies the Pod is Running. 1. If Pod's ordinal is greater than or equal to `.Spec.Replicas`, the Pod should be completely terminated and deleted. -1. If StatefulSet's `Spec.UpdateStrategy.Type` is equal to -`OnDeleteStatefulSetStrategyType` then no version tracking is performed. Pods -can be at an arbitrary version and will be recreated from the current +1. If the StatefulSet's `Spec.UpdateStrategy.Type` is equal to +`OnDeleteStatefulSetStrategyType`, no version tracking is performed, Pods +can be at an arbitrary version, and they will be recreated from the current `.Spec.Template` and `.Spec.VolumeClaimsTemplate` when the are deleted. 1. If StatefulSet's `Spec.UpdateStrategy.Type` is equal to `RollingUpdateStatefulSetStrategyType` then the version of the Pod should be as follows. 1. If the Pod's ordinal is in the sequence `[0,.Status.CurrentReplicas)`, - the Pod should consistent with version indicated by `Status.CurrentVersion`. + the Pod should be consistent with version indicated by `Status.CurrentRevision`. 1. 
If the Pod's ordinal is in the sequence `[.Status.Replicas - .Status.UpdatedReplicas, .Status.Replicas)` the Pod should be consistent with the version indicated by - `Status.UpdateVersion`. + `Status.UpdateRevision`. 1. If the StatefulSet's `.Spec.UpdateStrategy.Type` is equal to `PartitionedStatefulSetStrategyType` then the version of the Pod should be as follows. 1. If the Pod's ordinal is in the sequence `[0,.Status.CurrentReplicas)`, - the Pod should consistent with version indicated by `Status.CurrentVersion`. + the Pod should be consistent with version indicated by `Status.CurrentRevision`. 1. If the Pod's ordinal is in the sequence `[.Status.Replicas - .Status.UpdatedReplicas, .Status.Replicas)`the Pod - should be consistent with the version indicated by `Status.UpdateVersion`. + should be consistent with the version indicated by `Status.UpdateRevision`. 1. If the Pod does not meet either of the prior two conditions, and if ordinal is in the sequence `[0, .Spec.UpdateStrategy.Partition.Ordinal)`, it should be consistent with the version indicated by - `Status.CurrentVersion`. + `Status.CurrentRevision`. 1. Otherwise, the Pod should be consistent with the version indicated - by `Status.UpdateVersion`. + by `Status.UpdateRevision`. ### Pod State Reconciliation In order to reconcile a Pod with declared desired @@ -313,10 +312,10 @@ following. 1. If the Pod is already consistent with its target state the controller will do nothing. -1. If the Pod is labeled with a `StatefulSetVersionLabel` that indicates +1. If the Pod is labeled with a `StatefulSetRevisionLabel` that indicates the Pod was generated from a version of the StatefulSetSpec that is semantically equivalent to, but not equal to, the [target version](#target-pod-state), the -StatefulSet controller will update the Pod with a `StatefulSetVersionLabel` +StatefulSet controller will update the Pod with a `StatefulSetRevisionLabel` indicating the new semantically equivalent version. 
This form of reconciliation is non-destructive. 1. If the Pod was not created from the target version, the Pod will be deleted @@ -329,16 +328,16 @@ Object state when mutations are made to its `.Spec.Template` or 1. When the StatefulSet controller observes a mutation to a StatefulSet's `.Spec.Template` it will snapshot its target Object state and compare -the snapshot with the version indicated by its `.Status.UpdateVersion`. +the snapshot with the version indicated by its `.Status.UpdateRevision`. 1. If the current state is equivalent to the version indicated by -`.Status.UpdateVersion` no update has occurred. -1. If the `Status.CurrentVersion` field is empty, then the StatefulSet has no +`.Status.UpdateRevision` no update has occurred. +1. If the `Status.CurrentRevision` field is empty, then the StatefulSet has no revision history. To initialize its revision history, the StatefulSet controller -will set both `.Status.CurrentVersion` and `.Status.UpdateVersion` to the +will set both `.Status.CurrentRevision` and `.Status.UpdateRevision` to the version of the current snapshot. -1. If the `.Status.CurrentVersion` is not empty, and if the -`.Status.UpdateVersion` is not equal to the version of the current snapshot, -the StatefulSet controller will set the `.Status.UpdateVersion` to the version +1. If the `.Status.CurrentRevision` is not empty, and if the +`.Status.UpdateRevision` is not equal to the version of the current snapshot, +the StatefulSet controller will set the `.Status.UpdateRevision` to the version indicated by the current snapshot. ### StatefulSet Revision History @@ -374,14 +373,14 @@ The criteria for update completion is as follows. this case an update can never be in progress. 1. If the StatefulSet's `.Spec.UpdateStrategy.Type` is equal to `PartitionedStatefulSetStrategyType` updates can not complete. 
The version -indicated `.Status.UpdateVersion` will only be applied to Pods with ordinals +indicated `.Status.UpdateRevision` will only be applied to Pods with ordinals in the sequence `(0,.Spec.UpdateStrategy.Partition.Ordinal]` 1. If the StatefulSet's `.Spec.UpdateStrategy.Type` is equal to `RollingUpdateStatefulSetStrategyType`, then an update is complete when the StatefulSet is at its [target state](#target-state). The StatefulSet controller will signal update completion as follows. - 1. The controller will set `.Status.CurrentVersion` to the value of - `.Staus.UpdateVersion`. + 1. The controller will set `.Status.CurrentRevision` to the value of + `.Staus.UpdateRevision`. 1. The controller will set `.Status.CurrentReplicas` to `.Status.UpdateReplicas`. Note that this value will be equal to `.Status.Replicas`. @@ -407,15 +406,15 @@ the `.Generation` of the StatefulSet object that was observed. created Pods. 1. The controller will set the `.Status.ReadyReplicas` to the current number of Pods that have a Ready Condition. -1. The controller will set the `.Status.CurrentVersion` and -`.Status.UpdateVersion` in accordance with StatefulSet's +1. The controller will set the `.Status.CurrentRevision` and +`.Status.UpdateRevision` in accordance with StatefulSet's [revision history](#statefulset-revision history) and any [complete updates](#update-completion). 1. The controller will set the `.Status.CurrentReplicas` to the number of Pods that it has created from the version indicated by -`.Status.CurrentVersion`. +`.Status.CurrentRevision`. 1. The controller will set the `.Status.UpdatedReplicas` to the number of Pods -that it has created from the version indicated by `.Status.UpdateVersion`. +that it has created from the version indicated by `.Status.UpdateRevision`. 1. The controller will then persist the StatefulSetStatus make it durable and communicate it to observers. 
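The fallback rules for the Partition strategy in the Target Pod State hunk of the patch above can be sketched as follows. This is a minimal sketch, not the controller's implementation: `targetRevision` and the revision names `rev-1` and `rev-2` are hypothetical, and the sketch deliberately ignores the two earlier rules based on `.Status.CurrentReplicas` and `.Status.UpdatedReplicas`. It assumes only that ordinals in `[0, .Spec.UpdateStrategy.Partition.Ordinal)` stay at the current revision and that all other ordinals take the update revision.

```go
package main

import "fmt"

// targetRevision is a hypothetical helper, not part of the proposed API.
// It applies only the two fallback rules for the Partition strategy:
// ordinals in [0, partition) keep the current revision, and every other
// ordinal is expected to be at the update revision.
func targetRevision(ordinal, partition int32, currentRevision, updateRevision string) string {
	if ordinal < partition {
		return currentRevision
	}
	return updateRevision
}

func main() {
	// Four replicas with the partition at ordinal 2 (illustrative values).
	for ordinal := int32(0); ordinal < 4; ordinal++ {
		fmt.Printf("web-%d -> %s\n", ordinal, targetRevision(ordinal, 2, "rev-1", "rev-2"))
	}
}
```

With these rules, setting the partition ordinal to `.Spec.Replicas-1` leaves exactly one Pod on the update revision, which matches the canary scenario described in the proposal.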
From 04267226d78ed1394a9bbe155f86cb590d655115 Mon Sep 17 00:00:00 2001 From: Kenneth Owens Date: Mon, 8 May 2017 16:41:36 -0700 Subject: [PATCH 08/12] Add section of validation to StatefulSetStatus --- contributors/design-proposals/statefulset-update.md | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/contributors/design-proposals/statefulset-update.md b/contributors/design-proposals/statefulset-update.md index 2d27c9c2532..81f70dbae09 100644 --- a/contributors/design-proposals/statefulset-update.md +++ b/contributors/design-proposals/statefulset-update.md @@ -106,7 +106,7 @@ This design is based on the following requirements. - Users should be able to view a bounded history of the updates that have been applied to the StatefulSet. -## API Object +## API Objects The following modifications will be made to the StatefulSetSpec API object. @@ -434,7 +434,13 @@ if any of the following conditions are true. 1. `.Spec.UpdateStratgegy.Parition` is not nil, and `.Spec.UpdateStrategy.Partition.Ordinal` not in the sequence `(0,.Spec.Replicas)`. - +1. The API Server will fail validation on any update to a StatefulSetStatus +object if any of the following conditions are true. + 1. `.Status.Replicas` is negative. + 1. `.Status.ReadyReplicas` is negative or greater than `.Status.Replicas`. + 1. `.Status.CurrentReplicas` is negative or greater than `.Status.Replicas`. + 1. `.Status.UpdatedReplicas` is negative or greater than `.Status.Replicas`. + ## Kubectl Kubectl will use the `rollout` command to control and provide the status of StatefulSet updates. 
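The StatefulSetStatus validation rules added by the patch above could be checked along these lines. This is a hedged sketch rather than the API Server's validation code: the `status` type is a hypothetical, trimmed stand-in for `StatefulSetStatus` that carries only the four replica counters, and `validateStatus` is an illustrative name.

```go
package main

import (
	"errors"
	"fmt"
)

// status is a hypothetical, trimmed stand-in for StatefulSetStatus that
// keeps only the replica counters named in the validation rules.
type status struct {
	Replicas        int32
	ReadyReplicas   int32
	CurrentReplicas int32
	UpdatedReplicas int32
}

// validateStatus returns an error if Replicas is negative, or if any other
// counter is negative or greater than Replicas.
func validateStatus(s status) error {
	if s.Replicas < 0 {
		return errors.New("replicas must not be negative")
	}
	counters := []struct {
		name  string
		value int32
	}{
		{"readyReplicas", s.ReadyReplicas},
		{"currentReplicas", s.CurrentReplicas},
		{"updatedReplicas", s.UpdatedReplicas},
	}
	for _, c := range counters {
		if c.value < 0 || c.value > s.Replicas {
			return fmt.Errorf("%s must be in [0, %d], got %d", c.name, s.Replicas, c.value)
		}
	}
	return nil
}

func main() {
	// A status in the middle of a rolling update: accepted.
	fmt.Println(validateStatus(status{Replicas: 3, ReadyReplicas: 3, CurrentReplicas: 1, UpdatedReplicas: 2}))
	// More ready Pods than Pods: rejected.
	fmt.Println(validateStatus(status{Replicas: 3, ReadyReplicas: 4}))
}
```

Checking `Replicas` first keeps the later bounds checks meaningful, since every other counter is compared against it.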
From ae7047fe721c8bc2b58336e65910278f82623553 Mon Sep 17 00:00:00 2001 From: Kenneth Owens Date: Tue, 9 May 2017 10:59:41 -0700 Subject: [PATCH 09/12] Update JSON serialization annotation to reflect the correct names for CurrentRevision and UpdateRevision --- contributors/design-proposals/statefulset-update.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/contributors/design-proposals/statefulset-update.md b/contributors/design-proposals/statefulset-update.md index 81f70dbae09..ad542a986b8 100644 --- a/contributors/design-proposals/statefulset-update.md +++ b/contributors/design-proposals/statefulset-update.md @@ -176,12 +176,12 @@ The following modifications will be made to the StatefulSetStatus API object. // CurrentRevision, if not empty, indicates the version of PodSpecTemplate, // VolumeClaimsTemplate tuple used to generate Pods in the sequence // [0,CurrentReplicas). - CurrentRevision string `json:"currentVersion,omitempty"` + CurrentRevision string `json:"currentRevision,omitempty"` // UpdateRevision, if not empty, indicates the version of PodSpecTemplate, // VolumeClaimsTemplate tuple used to generate Pods in the sequence // [Replicas-UpdatedReplicas,Replicas) - UpdateRevision string `json:"updateVersion,omitempty"` + UpdateRevision string `json:"updateRevision,omitempty"` // ReadyReplicas is the current number of Pods, created by the StatefulSet // controller, that have a Status of Running and a Ready Condition. 
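The effect of the corrected JSON annotations in the patch above can be seen in a small serialization sketch. The `statusRevisions` type is a hypothetical subset of `StatefulSetStatus` holding only the two renamed fields, `marshalRevisions` is an illustrative helper, and the revision names `web-1` and `web-2` are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusRevisions is a hypothetical subset of StatefulSetStatus containing
// only the two fields whose JSON names the patch corrects.
type statusRevisions struct {
	CurrentRevision string `json:"currentRevision,omitempty"`
	UpdateRevision  string `json:"updateRevision,omitempty"`
}

// marshalRevisions serializes the two revision names to JSON.
func marshalRevisions(current, update string) string {
	out, err := json.Marshal(statusRevisions{CurrentRevision: current, UpdateRevision: update})
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	// Prints {"currentRevision":"web-1","updateRevision":"web-2"}
	fmt.Println(marshalRevisions("web-1", "web-2"))
}
```

Before this patch the same fields would have been serialized as `currentVersion` and `updateVersion`, so clients reading the status by its Go field name would have looked for keys that were not present.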
From 8caee797f7aad9380eeef2bf369b290af169db84 Mon Sep 17 00:00:00 2001 From: Kenneth Owens Date: Mon, 15 May 2017 21:05:02 -0700 Subject: [PATCH 10/12] Address fourth round of comments --- .../design-proposals/statefulset-update.md | 53 ++++++++++--------- 1 file changed, 29 insertions(+), 24 deletions(-) diff --git a/contributors/design-proposals/statefulset-update.md b/contributors/design-proposals/statefulset-update.md index ad542a986b8..ca0c9214138 100644 --- a/contributors/design-proposals/statefulset-update.md +++ b/contributors/design-proposals/statefulset-update.md @@ -100,7 +100,7 @@ This design is based on the following requirements. [deployment, and scaling](https://kubernetes.io/docs/concepts/abstractions/controllers/statefulsets/#deployment-and-scaling-guarantee) guarantees. - A failed update must halt. -- Users must be able to rollback an update. +- Users must be able to roll back an update. - Users must be able to roll forward to fix a failing/failed update. - Users must be able to view the status of an update. - Users should be able to view a bounded history of the updates that have been @@ -113,38 +113,42 @@ The following modifications will be made to the StatefulSetSpec API object. ```go // StatefulSetUpdateStrategy indicates the strategy that the StatefulSet // controller will use to perform updates. It includes any additional parameters -// nessecary to preform the update for the indicated strategy. +// necessary to preform the update for the indicated strategy. type StatefulSetUpdateStrategy struct { // Type indicates the type of the StatefulSetUpdateStrategy. Type StatefulSetUpdateStrategyType // Partition is used to communicate the ordinal at which to partition - // the StatefulSet when Type is PartitionedStatefulSetStrategyType. This - // value must be set when Type is PartitionedStatefulSetStrategyType, + // the StatefulSet when Type is PartitionStatefulSetStrategyType. 
This + // value must be set when Type is PartitionStatefulSetStrategyType, // and it must be nil otherwise. - Partition *PartitionedStatefulSet + Partition *PartitionStatefulSetStrategy // StatefulSetUpdateStrategyType is a string enumeration type that enumerates // all possible update strategies for the StatefulSet controller. type StatefulSetUpdateStrategyType string const ( - // PartitionedStatefulSetStrategyType indicates that updates will only be + // PartitionStatefulSetStrategyType indicates that updates will only be // applied to a partition of the StatefulSet. This is useful for canaries - // and phased roll outs. - PartitionedStatefulSetStrategyType StatefulSetUpdateStrategyType = "Partitioned" + // and phased roll outs. When a scale operation is performed with this + // strategy, new Pods will be created from the updated specification. + PartitionStatefulSetStrategyType StatefulSetUpdateStrategyType = "Partition" // RollingUpdateStatefulSetStrategyType indicates that update will be // applied to all Pods in the StatefulSet with respect to the StatefulSet - // ordering constraints. + // ordering constraints. When a scale operation is performed with this + // strategy, new Pods will be created from the updated specification. RollingUpdateStatefulSetStrategyType = "RollingUpdate" // OnDeleteStatefulSetStrategyType triggers the legacy behavior. Version // tracking and ordered rolling restarts are disabled. Pods are recreated - // from the StatefulSetSpec when they are manually deleted. + // from the StatefulSetSpec when they are manually deleted. When a scale + // operation is performed with this strategy, new Pods will be created + // from the current specification. OnDeleteStatefulSetStrategyType = "OnDelete" ) -// PartitionedStatefulSet contains the parameters used with the -// PartitionedStatefulSetStrategyType. 
-type PartitionedStatefulSet struct { +// PartitionStatefulSetStrategy contains the parameters used with the +// PartitionStatefulSetStrategyType. +type PartitionStatefulSetStrategy struct { // Ordinal indicates the ordinal at which the StatefulSet should be // partitioned. Ordinal int32 @@ -159,7 +163,7 @@ type StatefulSetSpec struct { // Template or VolumeClaimsTemplate. UpdateStrategy StatefulSetUpdateStrategy `json:"updateStrategy,omitempty` - // RevisionHistoryLimit is the maximum number of PodTemplates that will + // RevisionHistoryLimit is the maximum number of revisions that will // be maintained in the StatefulSet's revision history. The revision history // consists of all revisions not represented by a currently applied // StatefulSetSpec version. The default value is 2. @@ -291,7 +295,7 @@ as follows. the Pod should be consistent with the version indicated by `Status.UpdateRevision`. 1. If the StatefulSet's `.Spec.UpdateStrategy.Type` is equal to -`PartitionedStatefulSetStrategyType` then the version of the Pod should be +`PartitionStatefulSetStrategyType` then the version of the Pod should be as follows. 1. If the Pod's ordinal is in the sequence `[0,.Status.CurrentReplicas)`, the Pod should be consistent with version indicated by `Status.CurrentRevision`. @@ -372,9 +376,9 @@ The criteria for update completion is as follows. `OnDeleteStatefulSetStrategyType` then no version tracking is performed. It this case an update can never be in progress. 1. If the StatefulSet's `.Spec.UpdateStrategy.Type` is equal to -`PartitionedStatefulSetStrategyType` updates can not complete. The version +`PartitionStatefulSetStrategyType` updates can not complete. The version indicated `.Status.UpdateRevision` will only be applied to Pods with ordinals -in the sequence `(0,.Spec.UpdateStrategy.Partition.Ordinal]` +in the sequence `(.Spec.UpdateStrategy.Partition.Ordinal,.Spec.Replicas)`. 1. 
If the StatefulSet's `.Spec.UpdateStrategy.Type` is equal to `RollingUpdateStatefulSetStrategyType`, then an update is complete when the StatefulSet is at its [target state](#target-state). The StatefulSet controller @@ -428,7 +432,7 @@ fields of the StatefulSet object other than `.Spec.Replicas` and constraints. 1. If the `.Spec.UpdateStrategy.Type` is equal to -`RollingUpdateStatefulSetStrategyType`, the API Server should fail validation +`PartitionStatefulSetStrategyType`, the API Server should fail validation if any of the following conditions are true. 1. `.Spec.UpdateStrategy.Partition` is nil. 1. `.Spec.UpdateStratgegy.Parition` is not nil, and @@ -573,7 +577,7 @@ spec: app: nginx spec: updateStrategy: - type: Partitioned + type: Partition partition: ordinal: 2 containers: @@ -615,7 +619,7 @@ spec: app: nginx spec: updateStrategy: - type: Partitioned + type: Partition partition: ordinal: 3 containers: @@ -659,7 +663,7 @@ spec: app: nginx spec: updateStrategy: - type: Partitioned + type: Partition partition: ordinal: 2 containers: @@ -703,7 +707,7 @@ spec: app: nginx spec: updateStrategy: - type: Partitioned + type: Partition partition: ordinal: 1 containers: @@ -772,6 +776,7 @@ Without communicating a signal indicating the reason for termination to a Pod in a StatefulSet, as proposed [here](https://github.com/kubernetes/community/pull/541), the tenant application has no way to determine if it is being terminated due to a scale down operation or due to an update. + Consider a BASE distributed storage application like Cassandra, where 2 TiB of persistent data is not atypical, and the data distribution is not identical on every server. We want to enable two distinct behaviors based on the reason for @@ -801,8 +806,8 @@ While this proposal does not address this would be a valuable feature for production users of storage systems that use intermittent compaction as a form of garbage collection. 
Applications that use log structured merge trees with size tiered compaction (e.g Cassandra) or append -only B(+/*) Trees (e.g Couchbase) can temporarily double their storage usage when -compacting their on disk storage. If there is insufficient space for compaction +only B(+/*) Trees (e.g Couchbase) can temporarily double their storage requirement +during compaction. If there is insufficient space for compaction to progress, these applications will either fail or degrade until additional capacity is added. While, if the user is using AWS EBS or GCE PD, there are valid manual workarounds to expand the size of a PD, it would be From d54be8f60f88b23ac5cec3916d222b27408df218 Mon Sep 17 00:00:00 2001 From: Kenneth Owens Date: Mon, 22 May 2017 07:48:45 -0700 Subject: [PATCH 11/12] Address mikegrass comments --- .../design-proposals/statefulset-update.md | 32 ++++++++++--------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/contributors/design-proposals/statefulset-update.md b/contributors/design-proposals/statefulset-update.md index ca0c9214138..800e801d5f7 100644 --- a/contributors/design-proposals/statefulset-update.md +++ b/contributors/design-proposals/statefulset-update.md @@ -38,7 +38,7 @@ application's configuration, I want to update environment variables, container entry point commands or parameters, or configuration files. - As the administrator of the logging and monitoring infrastructure for my organization, in order to add logging and monitoring side cars, I want to patch -containers to add images. +a Pods' containers to add images. ### Out of Scope - As the administrator of a stateful application, in order to increase the @@ -199,7 +199,7 @@ The following modifications will be made to the StatefulSetStatus API object. // UpdatedReplicas is the number of Pods created by the StatefulSet // controller from the PodTemplateSpec, VolumeClaimsTemplate tuple indicated // by UpdateRevision. 
- UpdatedReplicas int32 `json:"taretReplicas,omitempty"` + UpdatedReplicas int32 `json:"updatedReplicas,omitempty"` } ``` @@ -236,7 +236,7 @@ ensure that the StatefulSet's revision history is consistent with the user declared desired state. 1. The controller will select all Pods in the StatefulSet, filter any Pods not owned by the StatefulSet, and sort the remaining Pods in ordinal order. -1. For all created Pods, the controller will perform any nessecary +1. For all created Pods, the controller will perform any necessary [non-destructive state reconciliation](#pod-state-reconciliation). 1. If any Pods with ordinals in the sequence `[0,.Spec.Replicas)` have not been created, for the Pod corresponding to the lowest such ordinal, the controller @@ -248,7 +248,7 @@ Pods to either become Ready, or to be completely deleted. if `.Spec.Replicas` is less than `.Status.Replicas`, the controller will delete the Pod corresponding to the largest ordinal. This implies that scaling takes precedence over Pod updates. -1. If all Pods in the range `[0,.Spec.Replicas)` have a Status of Running and +1. If all Pods in the sequence `[0,.Spec.Replicas)` have a Status of Running and a Ready Condition, if `.Spec.Replicas` is equal to `.Status.Replicas`, and if there are Pods that do not match their [target Pod state](#target-pod-state), the Pod with the largest ordinal in that set will be deleted. @@ -263,7 +263,7 @@ the Pod with the largest ordinal in that set will be deleted. The target state of the StatefulSet controller with respect to an individual StatefulSet is defined as follows. -1. The StatefulSet contains exactly `[0,Spec.Replicas)` Pods. +1. The StatefulSet contains exactly `[0,.Spec.Replicas)` Pods. 1. All Pods in the StatefulSet have the correct [target Pod state](#target-pod-state). @@ -300,7 +300,7 @@ as follows. 1. If the Pod's ordinal is in the sequence `[0,.Status.CurrentReplicas)`, the Pod should be consistent with version indicated by `Status.CurrentRevision`. 1. 
If the Pod's ordinal is in the sequence - `[.Status.Replicas - .Status.UpdatedReplicas, .Status.Replicas)`the Pod + `[.Status.Replicas - .Status.UpdatedReplicas, .Status.Replicas)` the Pod should be consistent with the version indicated by `Status.UpdateRevision`. 1. If the Pod does not meet either of the prior two conditions, and if ordinal is in the sequence `[0, .Spec.UpdateStrategy.Partition.Ordinal)`, @@ -366,15 +366,15 @@ revisions to the StatefulSet's target Object state. #### History Maintenance In order to prevent the revision history of the StatefulSet from exceeding memory or storage limits, the StatefulSet controller will periodically prune -its revision history so that no more that `.Spec.RevisionHisotryLimit` non-live +its revision history so that no more that `.Spec.RevisionHistoryLimit` non-live versions of target Object state are preserved. ### Update Completion The criteria for update completion is as follows. 1. If the StatefulSet's `.Spec.UpdateStrategy.Type` is equal to -`OnDeleteStatefulSetStrategyType` then no version tracking is performed. It -this case an update can never be in progress. +`OnDeleteStatefulSetStrategyType` then no version tracking is performed. In +this case, an update can never be in progress. 1. If the StatefulSet's `.Spec.UpdateStrategy.Type` is equal to `PartitionStatefulSetStrategyType` updates can not complete. The version indicated `.Status.UpdateRevision` will only be applied to Pods with ordinals @@ -384,11 +384,11 @@ in the sequence `(.Spec.UpdateStrategy.Partition.Ordinal,.Spec.Replicas)`. StatefulSet is at its [target state](#target-state). The StatefulSet controller will signal update completion as follows. 1. The controller will set `.Status.CurrentRevision` to the value of - `.Staus.UpdateRevision`. + `.Status.UpdateRevision`. 1. The controller will set `.Status.CurrentReplicas` to - `.Status.UpdateReplicas`. Note that this value will be equal to + `.Status.UpdatedReplicas`. 
Note that this value will be equal to `.Status.Replicas`. - 1. The controller will set `.Status.UpdateReplicas` to 0. + 1. The controller will set `.Status.UpdatedReplicas` to 0. ### Status Reporting After processing the creation, update, or deletion of a StatefulSet or Pod, @@ -412,7 +412,7 @@ created Pods. Pods that have a Ready Condition. 1. The controller will set the `.Status.CurrentRevision` and `.Status.UpdateRevision` in accordance with StatefulSet's -[revision history](#statefulset-revision history) and +[revision history](#statefulset-revision-history) and any [complete updates](#update-completion). 1. The controller will set the `.Status.CurrentReplicas` to the number of Pods that it has created from the version indicated by @@ -435,7 +435,7 @@ constraints. `PartitionStatefulSetStrategyType`, the API Server should fail validation if any of the following conditions are true. 1. `.Spec.UpdateStrategy.Partition` is nil. - 1. `.Spec.UpdateStratgegy.Parition` is not nil, and + 1. `.Spec.UpdateStrategy.Parition` is not nil, and `.Spec.UpdateStrategy.Partition.Ordinal` not in the sequence `(0,.Spec.Replicas)`. 1. The API Server will fail validation on any update to a StatefulSetStatus @@ -559,7 +559,7 @@ kubectl apply -f web.yaml ### Canaries Users can create a canary using `kubectl apply`. The only difference between a [rolling update](#rolling-out-an-update) and a canary is that the - `.Spec.UpdateStrategy.Type` is set to `ParitionedStatefulSetStrategyType` and + `.Spec.UpdateStrategy.Type` is set to `PartitionStatefulSetStrategyType` and the `.Spec.UpdateStrategy.Partition.Ordinal` is set to `.Spec.Replicas-1`. @@ -604,6 +604,8 @@ spec: Users can also simultaneously scale up and add a canary. This reduces risk for some deployment scenarios by adding additional capacity for the canary. +For example, in the manifest below, `.Spec.Replicas` is increased to `4` while +`.Spec.UpdateStrategy.Partition.Ordinal` is set to `.Spec.Replicas-1`. 
```yaml apiVersion: apps/v1beta1 From ac00afd78d363075f925b487356be0c0e55c5d75 Mon Sep 17 00:00:00 2001 From: Kenneth Owens Date: Mon, 22 May 2017 07:50:49 -0700 Subject: [PATCH 12/12] Update golang spacing --- .../design-proposals/statefulset-update.md | 70 +++++++++---------- 1 file changed, 35 insertions(+), 35 deletions(-) diff --git a/contributors/design-proposals/statefulset-update.md b/contributors/design-proposals/statefulset-update.md index 800e801d5f7..c880186127b 100644 --- a/contributors/design-proposals/statefulset-update.md +++ b/contributors/design-proposals/statefulset-update.md @@ -115,13 +115,13 @@ The following modifications will be made to the StatefulSetSpec API object. // controller will use to perform updates. It includes any additional parameters // necessary to preform the update for the indicated strategy. type StatefulSetUpdateStrategy struct { - // Type indicates the type of the StatefulSetUpdateStrategy. - Type StatefulSetUpdateStrategyType - // Partition is used to communicate the ordinal at which to partition - // the StatefulSet when Type is PartitionStatefulSetStrategyType. This - // value must be set when Type is PartitionStatefulSetStrategyType, - // and it must be nil otherwise. - Partition *PartitionStatefulSetStrategy + // Type indicates the type of the StatefulSetUpdateStrategy. + Type StatefulSetUpdateStrategyType + // Partition is used to communicate the ordinal at which to partition + // the StatefulSet when Type is PartitionStatefulSetStrategyType. This + // value must be set when Type is PartitionStatefulSetStrategyType, + // and it must be nil otherwise. + Partition *PartitionStatefulSetStrategy // StatefulSetUpdateStrategyType is a string enumeration type that enumerates // all possible update strategies for the StatefulSet controller. 
@@ -157,17 +157,17 @@ type PartitionStatefulSetStrategy struct { type StatefulSetSpec struct { // Replicas, Selector, Template, VolumeClaimsTemplate, and ServiceName // omitted for brevity. - - // UpdateStrategy indicates the StatefulSetUpdateStrategy that will be - // employed to update Pods in the StatefulSet when a revision is made to - // Template or VolumeClaimsTemplate. - UpdateStrategy StatefulSetUpdateStrategy `json:"updateStrategy,omitempty` - - // RevisionHistoryLimit is the maximum number of revisions that will - // be maintained in the StatefulSet's revision history. The revision history - // consists of all revisions not represented by a currently applied - // StatefulSetSpec version. The default value is 2. - RevisionHistoryLimit *int32 `json:revisionHistoryLimit,omitempty` + + // UpdateStrategy indicates the StatefulSetUpdateStrategy that will be + // employed to update Pods in the StatefulSet when a revision is made to + // Template or VolumeClaimsTemplate. + UpdateStrategy StatefulSetUpdateStrategy `json:"updateStrategy,omitempty"` + + // RevisionHistoryLimit is the maximum number of revisions that will + // be maintained in the StatefulSet's revision history. The revision history + // consists of all revisions not represented by a currently applied + // StatefulSetSpec version. The default value is 2. + RevisionHistoryLimit *int32 `json:"revisionHistoryLimit,omitempty"` } ``` @@ -176,30 +176,30 @@ The following modifications will be made to the StatefulSetStatus API object. ```go type StatefulSetStatus struct { // ObservedGeneration and Replicas fields are omitted for brevity. - - // CurrentRevision, if not empty, indicates the version of PodSpecTemplate, - // VolumeClaimsTemplate tuple used to generate Pods in the sequence - // [0,CurrentReplicas).
- CurrentRevision string `json:"currentRevision,omitempty"` - - // UpdateRevision, if not empty, indicates the version of PodSpecTemplate, + + // CurrentRevision, if not empty, indicates the version of PodSpecTemplate, + // VolumeClaimsTemplate tuple used to generate Pods in the sequence + // [0,CurrentReplicas). + CurrentRevision string `json:"currentRevision,omitempty"` + + // UpdateRevision, if not empty, indicates the version of PodSpecTemplate, // VolumeClaimsTemplate tuple used to generate Pods in the sequence // [Replicas-UpdatedReplicas,Replicas) - UpdateRevision string `json:"updateRevision,omitempty"` - - // ReadyReplicas is the current number of Pods, created by the StatefulSet + UpdateRevision string `json:"updateRevision,omitempty"` + + // ReadyReplicas is the current number of Pods, created by the StatefulSet // controller, that have a Status of Running and a Ready Condition. - ReadyReplicas int32 `json:"readyReplicas,omitempty"` - - // CurrentReplicas is the number of Pods created by the StatefulSet + ReadyReplicas int32 `json:"readyReplicas,omitempty"` + + // CurrentReplicas is the number of Pods created by the StatefulSet // controller from the PodTemplateSpec, VolumeClaimsTemplate tuple indicated // by CurrentRevision. - CurrentReplicas int32 `json:"currentReplicas,omitempty"` - - // UpdatedReplicas is the number of Pods created by the StatefulSet + CurrentReplicas int32 `json:"currentReplicas,omitempty"` + + // UpdatedReplicas is the number of Pods created by the StatefulSet // controller from the PodTemplateSpec, VolumeClaimsTemplate tuple indicated // by UpdateRevision. - UpdatedReplicas int32 `json:"updatedReplicas,omitempty"` + UpdatedReplicas int32 `json:"updatedReplicas,omitempty"` } ```
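The status fields above are the ones the proposal's Update Completion section manipulates. As a minimal sketch only, the three completion steps could be expressed as follows. The type `statefulSetStatus`, the function `signalCompletion`, and the completion test (`UpdatedReplicas == Replicas`) are hypothetical names and simplifications introduced here for illustration; the proposal itself defines completion in terms of the StatefulSet reaching its target state.

```go
package main

import "fmt"

// statefulSetStatus is a hypothetical, trimmed-down stand-in for the
// StatefulSetStatus fields used during update completion.
type statefulSetStatus struct {
	Replicas        int32
	CurrentRevision string
	UpdateRevision  string
	CurrentReplicas int32
	UpdatedReplicas int32
}

// signalCompletion applies the three completion steps from the proposal.
// Assumption: an update is treated as complete once every replica has been
// created from UpdateRevision; a real controller would check target state.
func signalCompletion(s *statefulSetStatus) {
	if s.UpdateRevision == "" || s.UpdatedReplicas != s.Replicas {
		return // update still in progress, leave status untouched
	}
	s.CurrentRevision = s.UpdateRevision // step 1
	s.CurrentReplicas = s.UpdatedReplicas // step 2, equals Replicas here
	s.UpdatedReplicas = 0                 // step 3
}

func main() {
	s := statefulSetStatus{
		Replicas:        3,
		CurrentRevision: "rev-1",
		UpdateRevision:  "rev-2",
		UpdatedReplicas: 3,
	}
	signalCompletion(&s)
	fmt.Printf("%+v\n", s)
}
```

After a completed update, `CurrentRevision` and `UpdateRevision` name the same revision, which is how a reader of the status can tell that no update is in progress.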