
Commit

Merge pull request #3193 from mattcary/ss124
KEP-1847: Update beta milestone to 1.25
k8s-ci-robot committed Jun 23, 2022
2 parents 9098231 + 289d18c commit 8c6d57e
Showing 3 changed files with 118 additions and 100 deletions.
2 changes: 2 additions & 0 deletions keps/prod-readiness/sig-apps/1847.yaml
@@ -1,3 +1,5 @@
kep-number: 1847
alpha:
approver: "@wojtek-t"
beta:
approver: "@johnbelamaric"
196 changes: 104 additions & 92 deletions keps/sig-apps/1847-autoremove-statefulset-pvcs/README.md
@@ -25,6 +25,10 @@
- [Mutating <code>PersistentVolumeClaimRetentionPolicy</code>](#mutating-)
- [Cluster role change for statefulset controller](#cluster-role-change-for-statefulset-controller)
- [Test Plan](#test-plan)
- [Unit tests](#unit-tests)
- [Integration tests](#integration-tests)
- [E2E tests](#e2e-tests)
  - [Upgrade/downgrade & feature enabled/disable tests](#upgradedowngrade--feature-enableddisable-tests)
- [Graduation Criteria](#graduation-criteria)
- [Alpha release](#alpha-release)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
@@ -46,14 +50,14 @@
Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [X] (R) KEP approvers have approved the KEP status as `implementable`
- [X] (R) Design details are appropriately documented
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes


@@ -111,7 +115,7 @@ The following changes are required:

These fields may be set to the following values.
* `Retain` - the default policy, which is also used when no policy is
specified. This specifies the existing behaviour: when a StatefulSet is
specified. This specifies the existing behavior: when a StatefulSet is
deleted or scaled down, no action is taken with respect to the PVCs
created by the StatefulSet.
* `Delete` - specifies that the appropriate PVCs as described above will be
@@ -177,7 +181,7 @@ VolumeClaimTemplate it will be deleted according to the deletion policy.
Currently the PVCs created by StatefulSet are not deleted automatically. Using
`whenScaled` or `whenDeleted` set to `Delete` would delete the PVCs
automatically. Since this involves persistent data being deleted, users should
take appropriate care using this feature. Having the `Retain` behaviour as
take appropriate care using this feature. Having the `Retain` behavior as
default will ensure that PVCs remain intact, and only a conscious choice made
by the user will cause any persistent data to be deleted.
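
As a concrete illustration, a minimal StatefulSet sketch using the new field is
shown below (the name, image, and storage size are placeholders, and the field
is only honored when the `StatefulSetAutoDeletePVC` feature gate is enabled):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web                        # placeholder name
spec:
  serviceName: web
  replicas: 2
  # Both fields default to Retain; Delete opts in to automatic PVC cleanup.
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete            # remove PVCs when the StatefulSet is deleted
    whenScaled: Delete             # remove PVCs for Pods removed by a scale-down
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: registry.k8s.io/nginx-slim:0.8   # placeholder image
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```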

@@ -305,38 +309,43 @@ In order to update the PVC ownerReference, the `buildControllerRoles` will be up

### Test Plan

1. Unit tests

1. e2e/integration tests
- With `Delete` policies for `whenScaled` and `whenDeleted`
1. Create 2 pod statefulset, scale to 1 pod, confirm PVC deleted
1. Create 2 pod statefulset, add data to PVs, scale to 1 pod, scale back to 2, confirm PV empty.
1. Create 2 pod statefulset, delete stateful set, confirm PVCs deleted.
1. Create 2 pod statefulset, add data to PVs, manually delete one pod, confirm pod comes back and PV still has data (PVC not deleted).
1. As above, but manually delete all pods in stateful set.
1. Create 2 pod statefulset, add data to PVs, manually delete one pod, immediately scale down to one pod, confirm PVC is deleted.
1. Create 2 pod statefulset, add data to PVs, manually delete one pod, immediately scale down to one pod, scale back to two pods, confirm PV is empty.
1. Create 2 pod statefulset, add data to PVs, perform a rolling update, confirm PVCs don't get deleted and PV contents remain intact and useful in the updated pods.
- With `Delete` policy for `whenDeleted` only
1. Create 2 pod statefulset, scale to 1 pod, confirm PVC still exists,
1. Create 2 pod statefulset, add data to PVs, scale to 1 pod, scale back to 2, confirm PV has data (PVC not deleted).
1. Create 2 pod statefulset, delete stateful set, confirm PVCs deleted
1. Create 2 pod statefulset, add data to PVs, manually delete one pod, confirm pod comes back and PV has data (PVC not deleted).
1. As above, but manually delete all pods in stateful set.
1. Create 2 pod statefulset, add data to PVs, manually delete one pod, immediately scale down to one pod, confirm PVC exists.
1. Create 2 pod statefulset, add data to PVs, manually delete one pod, immediately scale down to one pod, scale back to two pods, confirm PV has data.
- Retain:
1. same tests as above, but PVCs not deleted in any case and confirm data intact on the PV.
- Pod restart tests:
1. Create statefulset, perform rolling update, confirm PVC data still exists.
- `--cascade=false` tests.
1. Upgrade/Downgrade tests
[X] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Unit tests

- `k8s.io/kubernetes/pkg/controller/statefulset`: `2022-06-15`: `85.5%`
- `k8s.io/kubernetes/pkg/registry/apps/statefulset`: `2022-06-15`: `68.4%`
- `k8s.io/kubernetes/pkg/registry/apps/statefulset/storage`: `2022-06-15`: `64%`


##### Integration tests

- `test/integration/statefulset`: `2022-06-15`: These do not appear to be
  running in a job visible to the triage dashboard; see, for example, a search
  for the previously existing [TestStatefulSetStatusWithPodFail](https://storage.googleapis.com/k8s-triage/index.html?test=TestStatefulSetStatusWithPodFail).

Added `TestAutodeleteOwnerRefs` to `k8s.io/kubernetes/test/integration/statefulset`.

##### E2E tests

- `ci-kubernetes-e2e-gci-gce-statefulset`: `2022-06-15`: `3/646 Failures`
- Note that as this is behind the `StatefulSetAutoDeletePVC` feature gate,
tests for this KEP are not being run.

Added `Feature:StatefulSetAutoDeletePVC` tests to `k8s.io/kubernetes/test/e2e/apps/`.

##### Upgrade/downgrade & feature enabled/disable tests

These should be added as e2e tests, but we have not yet determined whether
there is a mechanism for running upgrade/downgrade tests.

1. Create a statefulset in a previous version and upgrade to the version
   supporting this feature. The PVCs should remain intact.
2. Downgrade to an earlier version and check that PVCs under the `Retain`
   policy remain intact, while PVCs whose policies had been set to `Delete`
   are deleted or kept depending on whether their owner references had
   already been set.
3. Feature disablement/enablement test for the alpha feature flag
   `statefulset-autodelete-pvcs` (a cluster configuration sketch for toggling
   the gate is shown after this list).
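
A cluster configuration sketch for toggling the gate, referenced in item 3
above. This assumes a kind-based test cluster, which the KEP does not
prescribe; flipping the gate and recreating the cluster approximates the
disable/enable path:

```yaml
# kind-config.yaml (hypothetical file name)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  # Set to false and recreate the cluster to exercise the disablement path.
  StatefulSetAutoDeletePVC: true
nodes:
- role: control-plane
- role: worker
```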


### Graduation Criteria
@@ -373,10 +382,10 @@ are not involved so there is no version skew between nodes and the control plane
resource (e.g. `dropDisabledFields`).

* **Does enabling the feature change any default behavior?**
No. What happens during StatefulSet deletion differs from current behaviour
No. What happens during StatefulSet deletion differs from current behavior
only when the user explicitly specifies the
`PersistentVolumeClaimRetentionPolicy`. Hence there is no user-visible
behaviour change by default.
behavior change by default.

* **Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?**
Yes. Disabling the feature gate will cause the new field to be ignored. If the feature
@@ -407,72 +416,71 @@ are not involved so there is no version skew between nodes and the control plane

### Rollout, Upgrade and Rollback Planning

_TBD upon graduation to beta._

* **How can a rollout fail? Can it impact already running workloads?**
Try to be as paranoid as possible - e.g., what if some components will restart
mid-rollout?
If there is a control plane update that disables the feature while a stateful
set is in the process of being deleted or scaled down, it is undefined which
PVCs will be deleted. Before the update, PVCs will have been marked for
deletion via owner references (illustrated below); until the updated controller
has a chance to reconcile, some PVCs may be garbage collected before it can
remove those owner references. We do not consider this a true failure, as it
should be clear to an operator that there is an inherent race condition when a
cluster update happens during a stateful set scale down or delete.
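
To make the race concrete, a sketch of what a "marked" PVC might look like: the
controller adds an owner reference so the garbage collector removes the claim
once its owner is gone. Names and the UID are placeholders; with `whenScaled`
the owner is the Pod, with `whenDeleted` it is the StatefulSet.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-web-1                 # <claim template>-<statefulset>-<ordinal>, placeholder
  ownerReferences:
  - apiVersion: v1
    kind: Pod                      # scale-down case: the Pod is the owner
    name: web-1
    uid: 11111111-2222-3333-4444-555555555555   # placeholder UID
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
```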

* **What specific metrics should inform a rollback?**
The operator can monitor the `statefulset_pvcs_owned_by_*` metrics to see if
there are possible pending deletions. If consistent behavior is required, the
operator can wait for those metrics to stabilize. For example,
`statefulset_pvcs_owned_by_pod` going to zero indicates all scale down
deletions are complete.

* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
Describe manual testing that was done and the outcomes.
Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
Yes. The race condition wasn't exposed, but we confirmed the PVCs were updated correctly.

* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
fields of API types, flags, etc.?**
Even if applying deprecation policies, they may still surprise some users.
Enabling the feature also enables the `PersistentVolumeClaimRetentionPolicy`
API field.

### Monitoring Requirements

_TBD upon graduation to beta._

* **How can an operator determine if the feature is in use by workloads?**
Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
checking if there are objects with field X set) may be a last resort. Avoid
logs or events for this purpose.
`statefulset_when_deleted_policy` or `statefulset_when_scaled_policy` will
have nonzero counts for the `delete` policy fields.

* **What are the SLIs (Service Level Indicators) an operator can use to determine
the health of the service?**
- [ ] Metrics
- Metric name:
- [Optional] Aggregation method:
- Components exposing the metric:
- [ ] Other (treat as last resort)
- Details:
- Metric name: `statefulset_reconcile_delay`
- [Optional] Aggregation method: `quantile`
- Components exposing the metric: `pkg/controller/statefulset`
- Metric name: `statefulset_unhealthy_pods`
- [Optional] Aggregation method: `sum`
- Components exposing the metric: `pkg/controller/statefulset`

* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
At a high level, this usually will be in the form of "high percentile of SLI
per day <= X". It's impossible to provide comprehensive guidance, but at the very
high level (needs more precise definitions) those may be things like:
- per-day percentage of API calls finishing with 5XX errors <= 1%
- 99% percentile over day of absolute value from (job creation time minus expected
job creation time) for cron job <= 10%
- 99,9% of /health requests per day finish with 200 code

The reconcile delay (the time between statefulset reconciliation loops) should be
low. For example, the 99th percentile should be at most a few minutes (an
illustrative alerting rule is sketched below).

This can be combined with the unhealthy pod count, although since unhealthy pods
usually indicate an application error rather than a problem with the stateful set
controller, this is more of a per-cluster decision for the operator.
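
As an illustration only (assuming `statefulset_reconcile_delay` is exposed as a
Prometheus histogram reported in seconds, which this KEP does not guarantee),
an operator-side alert on the 99th percentile could look roughly like:

```yaml
# prometheus-rules.yaml (illustrative; metric name and shape are assumptions)
groups:
- name: statefulset-controller
  rules:
  - alert: StatefulSetReconcileDelayHigh
    # The 300 threshold assumes the histogram is in seconds (5 minutes).
    expr: |
      histogram_quantile(0.99,
        sum(rate(statefulset_reconcile_delay_bucket[10m])) by (le)) > 300
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "StatefulSet reconcile delay p99 above 5 minutes"
```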

* **Are there any missing metrics that would be useful to have to improve observability
of this feature?**
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
implementation difficulties, etc.).

### Dependencies
The stateful set controller has not had any metrics in the past, despite
being a core Kubernetes feature for some time. Hence, which metrics are useful
in practice remains an open question, in spite of the maturity of the controller.

_TBD upon graduation to beta._
### Dependencies

* **Does this feature depend on any specific services running in the cluster?**
Think about both cluster-level services (e.g. metrics-server) as well
as node-level agents (e.g. specific version of CRI). Focus on external or
optional services that are needed. For example, if this feature depends on
a cloud provider API, or upon an external software-defined storage or network
control plane.

For each of these, fill in the following—thinking about running existing user workloads
and creating new ones, as well as about cluster-level services (e.g. DNS):
- [Dependency name]
- Usage description:
- Impact of its outage on the feature:
- Impact of its degraded performance or high-error rates on the feature:
No, outside of depending on the scheduler, the garbage collector, and volume
management (provisioning, attaching, etc.), as does almost everything in
Kubernetes. This feature does not add any new dependencies beyond those the
stateful set controller already had.


### Scalability
@@ -517,28 +525,32 @@ resource usage (CPU, RAM, disk, IO, ...) in any components?**

### Troubleshooting

_TBD on beta graduation._

The Troubleshooting section currently serves the `Playbook` role. We may consider
splitting it into a dedicated `Playbook` document (potentially with some monitoring
details). For now, we leave it here.

* **How does this feature react if the API server and/or etcd is unavailable?**

PVC deletion will be paused. If the control plane becomes unavailable in the middle
of a stateful set being deleted or scaled down, there may be deleted Pods whose
PVCs have not yet been deleted. Deletion will continue normally after the
control plane returns.

* **What are other known failure modes?**
For each of them, fill in the following information by copying the below template:
- [Failure mode brief description]
- Detection: How can it be detected via metrics? Stated another way:
how can an operator troubleshoot without logging into a master or worker node?
- Mitigations: What can be done to stop the bleeding, especially for already
running user workloads?
- Diagnostics: What are the useful log messages and their required logging
levels that could help debug the issue?
Not required until feature graduated to beta.
- Testing: Are there any tests for failure mode? If not, describe why.
- PVCs from a stateful set not being deleted as expected.
- Detection: This can be detected by lower-than-expected counts of the
  `statefulset_pvcs_owned_by_*` metrics and by an operator listing and examining
  PVCs (an illustrative example of a conflicting claim is sketched after this list).
- Mitigations: We expect this to happen only if there are other,
operator-installed, controllers that are also managing owner refs on
PVCs. Any such PVCs can be deleted manually. The conflicting controllers
will have to be manually discovered.
- Diagnostics: Logs from kube-controller-manager and stateful set controller.
- Testing: Tests are in place for confirming owner refs are added by the
  `StatefulSet` controller, but Kubernetes does not test against external
  custom controllers.
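
For the failure mode above, an illustrative (hypothetical) conflicting claim
might carry owner references from both the StatefulSet controller and some
operator-installed controller, for example:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-web-0                            # placeholder
  ownerReferences:
  - apiVersion: apps/v1
    kind: StatefulSet                         # reference set by the StatefulSet controller
    name: web
    uid: 22222222-3333-4444-5555-666666666666 # placeholder UID
  - apiVersion: example.com/v1                # hypothetical operator-installed controller
    kind: BackupPolicy
    name: nightly-backups
    uid: 33333333-4444-5555-6666-777777777777 # placeholder UID
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
```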

* **What steps should be taken if SLOs are not being met to determine the problem?**

Stateful set SLOs are new with this feature and are in the process of being
evaluated. If they are not being met, the kube-controller-manager (where the
stateful set controller lives) should be examined and/or restarted.

[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos

20 changes: 12 additions & 8 deletions keps/sig-apps/1847-autoremove-statefulset-pvcs/kep.yaml
@@ -15,19 +15,19 @@ reviewers:
- "@msau42"
- "@janetkuo"
prr-approvers:
- "@wojtek-t"
- "@johnbelamaric"
approvers:
- "@msau42"
- "@janetkuo"

stage: alpha
stage: beta

latest-milestone: "v1.23"
latest-milestone: "v1.25"

milestone:
alpha: "v1.23"
beta: "v1.24"
stable: "v1.25"
beta: "v1.25"
stable: "v1.27"

feature-gates:
- name: StatefulSetAutoDeletePVC
@@ -36,6 +36,10 @@ feature-gates:
- kube-apiserver
disable-supported: true

# The following PRR answers are required at beta release
# metrics:
# Currently no metrics is planned for alpha release.
metrics:
- statefulset_reconcile_delay
- statefulset_unhealthy_pods
- statefulset_pvcs_owned_by_pod
- statefulset_pvcs_owned_by_set
- statefulset_when_deleted_policy
- statefulset_when_scaled_policy
