From 53669b7ae269894c2c878b3f93c2e7714c7098ae Mon Sep 17 00:00:00 2001 From: Lee Verberne Date: Fri, 10 Jun 2022 17:54:38 +0200 Subject: [PATCH] KEP-277: Target stable in 1.25 --- keps/prod-readiness/sig-node/277.yaml | 2 + .../277-ephemeral-containers/README.md | 402 ++++++++++++++---- .../277-ephemeral-containers/kep.yaml | 4 +- 3 files changed, 329 insertions(+), 79 deletions(-) diff --git a/keps/prod-readiness/sig-node/277.yaml b/keps/prod-readiness/sig-node/277.yaml index b2db29fd726..1ff2b71a718 100644 --- a/keps/prod-readiness/sig-node/277.yaml +++ b/keps/prod-readiness/sig-node/277.yaml @@ -3,3 +3,5 @@ alpha: approver: "@johnbelamaric" beta: approver: "@johnbelamaric" +stable: + approver: "@johnbelamaric" diff --git a/keps/sig-node/277-ephemeral-containers/README.md b/keps/sig-node/277-ephemeral-containers/README.md index 026c5cfb2b4..5554e26d876 100644 --- a/keps/sig-node/277-ephemeral-containers/README.md +++ b/keps/sig-node/277-ephemeral-containers/README.md @@ -33,6 +33,11 @@ - [Updating a Pod](#updating-a-pod) - [Container Runtime Interface (CRI) changes](#container-runtime-interface-cri-changes) - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Test Plan](#test-plan-1) - [Graduation Criteria](#graduation-criteria) - [Alpha -> Beta Graduation](#alpha---beta-graduation) - [Beta -> GA Graduation](#beta---ga-graduation) @@ -631,14 +636,7 @@ updated to support container namespace targeting, described fully in -This feature will be tested with a combination of unit, integration and e2e -tests. In particular: -* Field validation (e.g. of Container fields disallowed in Ephemeral Containers) - will be tested in unit tests. -* Pod update semantics will be tested in integration tests. -* Ephemeral Container creation will be tested in e2e-node. 
+[x] I/we understand the owners of the involved components may require updates to +existing tests to make this code solid enough prior to committing the changes necessary +to implement this enhancement. + +##### Prerequisite testing updates + + + +N/A - This feature was implemented prior to the addition of this section to the KEP +template. + +##### Unit tests + + + +Complete unit test coverage is possible. + + + +This enhancement was implemented prior to test coverage reporting. -None of the tests for this feature are unusual or tricky. + +##### Integration tests + + + +- `k8s.io/kubernetes/test/integration/pods/pods_test.go`: https://storage.googleapis.com/k8s-triage/index.html?test=TestPodCreateEphemeralContainers +- `k8s.io/kubernetes/test/integration/pods/pods_test.go`: https://storage.googleapis.com/k8s-triage/index.html?test=TestPodPatchEphemeralContainers +- `k8s.io/kubernetes/test/integration/pods/pods_test.go`: https://storage.googleapis.com/k8s-triage/index.html?test=TestPodUpdateEphemeralContainers +- `k8s.io/kubernetes/test/integration/pods/pods_test.go`: https://storage.googleapis.com/k8s-triage/index.html?test=TestPodEphemeralContainersDisabled + +##### e2e tests + + + +- `k8s.io/kubernetes/test/e2e/common/node/ephemeral_containers.go`: https://storage.googleapis.com/k8s-triage/index.html?test=Ephemeral%20Containers + +### Test Plan ### Graduation Criteria @@ -662,19 +722,19 @@ None of the tests for this feature are unusual or tricky. - [x] Ephemeral Containers API has been in alpha for at least 2 releases. - [x] Ephemeral Containers support namespace targeting. -- [ ] Metrics for Ephemeral Containers are added to existing contain creation +- [x] Metrics for Ephemeral Containers are added to existing container creation metrics. - [x] CLI using Ephemeral Containers for debugging checked into a Kubernetes project repository (e.g. in `kubectl` or a `kubectl` plugin). 
- [x] A task on https://kubernetes.io/docs/tasks/ describes how to troubleshoot a running pod using Ephemeral Containers. -- [ ] Ephemeral Container creation is covered by e2e-node tests. -- [ ] Update via `/ephemeralcontainers` validates entire PodSpec to protect against future bugs. +- [x] Ephemeral Container creation is covered by e2e-node tests. +- [x] Update via `/ephemeralcontainers` validates entire PodSpec to protect against future bugs. #### Beta -> GA Graduation -- [ ] Ephemeral Containers have been in beta for at least 2 releases. -- [ ] Ephemeral Containers see use in 3 projects or articles. +- [x] Ephemeral Containers have been in beta for at least 2 releases. +- [x] Ephemeral Containers see use in 3 projects or articles. - [ ] Ephemeral Container creation is covered by [conformance tests]. - [ ] The following cosmetic codebase TODOs are resolved: - [ ] kubectl incorrectly suggests a debug container can be reattached after exit @@ -730,34 +790,69 @@ you need any help or guidance. ### Feature Enablement and Rollback -_This section must be completed when targeting alpha to a release._ + + +###### How can this feature be enabled / disabled in a live cluster? + + -* **How can this feature be enabled / disabled in a live cluster?** - [x] Feature gate (also fill in values in `kep.yaml`) - Feature gate name: EphemeralContainers - Components depending on the feature gate: kube-apiserver, kubelet - - [ ] Other - - Describe the mechanism: - - Will enabling / disabling the feature require downtime of the control - plane? - - Will enabling / disabling the feature require downtime or reprovisioning - of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled). -* **Does enabling the feature change any default behavior?** +###### Does enabling the feature change any default behavior? + + No, this feature does not change existing behavior. -* **Can the feature be disabled once it has been enabled (i.e. 
can we roll back - the enablement)?** +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + + Yes. Any running ephemeral containers will continue to run, but they will become inaccessible and exit when the Pod is deleted. -* **What happens if we reenable the feature if it was previously rolled back?** +###### What happens if we reenable the feature if it was previously rolled back? This behaves as expected: the feature will begin working again. -* **Are there any tests for feature enablement/disablement?** +###### Are there any tests for feature enablement/disablement? + + Some unit tests are exercised with the feature both enabled and disabled to verify proper behavior in both cases. Integration test verify that the API @@ -780,9 +875,21 @@ _This section must be completed when targeting alpha to a release._ ### Rollout, Upgrade and Rollback Planning -_This section must be completed when targeting beta graduation to a release._ + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + + This feature allows setting a new field, `ephemeralContainers` in a Pod spec. Enabling the feature won't affect existing workloads since they were not @@ -790,16 +897,24 @@ _This section must be completed when targeting beta graduation to a release._ Component restarts won't affect this feature. -* **What specific metrics should inform a rollback?** +###### What specific metrics should inform a rollback? + + A rollback is only indicated if there's a catastrophic failure that prevents the cluster from functioning normally, for example if pod or container creation begins to fail. -* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?** - Describe manual testing that was done and the outcomes. - Longer term, we may want to require automated upgrade/rollback tests, but we - are missing a bunch of machinery and tooling and can't do that now. 
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + Since this feature is not critical to production workloads, the main risk is that enabling the feature by default will adversely affect other components. @@ -831,16 +946,30 @@ _This section must be completed when targeting beta graduation to a release._ ephemeral containers. We'll investigate whether it's possible to add ephemeral containers to these existing tests. -* **Is the rollout accompanied by any deprecations and/or removals of features, APIs, -fields of API types, flags, etc.?** +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + No. ### Monitoring Requirements + -_This section must be completed when targeting beta graduation to a release._ -* **How can an operator determine if the feature is in use by workloads?** +###### How can an operator determine if the feature is in use by workloads? + + This information is available by examining pod objects in the API server for the field `pod.spec.ephemeralContainers`. Additionally, the kubelet surfaces @@ -855,38 +984,86 @@ _This section must be completed when targeting beta graduation to a release._ when this kubelet starts containers, idnexed by `container_type`. Ephemeral containers have a `container_type` of `ephemeral_container`. -* **What are the SLIs (Service Level Indicators) an operator can use to determine -the health of the service?** +###### How can someone using this feature know that it is working for their instance? + + + +- [x] Events + - Event Reason: (same as Containers/InitContainers) +- [x] API .status + - Other field: pod.status.ephemeralContainerStatuses[x].state + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? 
+ + + - [x] Metrics - Metric name: `apiserver_request_total{component="apiserver",resource="pods",subresource="ephemeralcontainers"}` (apiserver), `kubelet_started_containers_errors_total{container_type="ephemeral_container"}` - [Optional] Aggregation method: Aggregate by container type - Components exposing the metric: apiserver, kubelet - - [ ] Other (treat as last resort) - - Details: -* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?** - At a high level, this usually will be in the form of "high percentile of SLI - per day <= X". It's impossible to provide comprehensive guidance, but at the very - high level (needs more precise definitions) those may be things like: +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + Ephemeral containers are, by design, best effort. We are unable to offer an SLO for ephemeral containers until the kubelet supports some sort of dynamic resource reallocation. -* **Are there any missing metrics that would be useful to have to improve observability -of this feature?** + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + No. ### Dependencies -_This section must be completed when targeting beta graduation to a release._ + + +###### Does this feature depend on any specific services running in the cluster? -* **Does this feature depend on any specific services running in the cluster?** + - Runtime support for [Namespace targeting]. 
- Usage description: One feature of Ephemeral containers, namespace @@ -900,20 +1077,42 @@ _This section must be completed when targeting beta graduation to a release._ ### Scalability -_For alpha, this section is encouraged: reviewers should consider these questions -and attempt to answer them._ + -_For GA, this section is required: approvers should be able to confirm the -previous answers based on experience in the field._ +###### Will enabling / using this feature result in any new API calls? -* **Will enabling / using this feature result in any new API calls?** + Not in a meaningful way. Any additional calls would fall within existing usage patterns of humans interactive with Pods. -* **Will enabling / using this feature result in introducing new API types?** +###### Will enabling / using this feature result in introducing new API types? + + There an no new Kinds for storage, but new types are used in `v1.Pod`. Ephemeral containers are added by writing a `v1.Pod` containing @@ -925,13 +1124,24 @@ previous answers based on experience in the field._ - Supported number of objects per cluster: same as Pods - Supported number of objects per namespace: same as Pods -* **Will enabling / using this feature result in any new calls to the cloud -provider?** +###### Will enabling / using this feature result in any new calls to the cloud provider? + + No. -* **Will enabling / using this feature result in increasing size or count of -the existing API objects?** +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + + - API type(s): v1.Pod - Estimated increase in size: Additional `Container` for each Ephemeral @@ -939,14 +1149,31 @@ the existing API objects?** manually by humans. 
- Estimated amount of new objects: N/A -* **Will enabling / using this feature result in increasing time taken by any -operations covered by [existing SLIs/SLOs]?** +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + + When users add additional containers to a Pod, the pod will have additional containers to shut down and garbage collect when the Pod exits. -* **Will enabling / using this feature result in non-negligible increase of -resource usage (CPU, RAM, disk, IO, ...) in any components?** +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + + Not automatically. Use of this feature will result in additional containers running on kubelets, but it does not change the amount of resources allocated @@ -954,17 +1181,36 @@ resource usage (CPU, RAM, disk, IO, ...) in any components?** ### Troubleshooting + -_This section must be completed when targeting beta graduation to a release._ - -* **How does this feature react if the API server and/or etcd is unavailable?** +###### How does this feature react if the API server and/or etcd is unavailable? Identical to other (non-ephemeral) containers. -* **What are other known failure modes?** +###### What are other known failure modes? + + + - Addition of ephemeral container is prohibited by API server - Detection: API server metric described in monitoring section - Mitigations: None. This doesn't affect user workloads. @@ -991,7 +1237,7 @@ _This section must be completed when targeting beta graduation to a release._ without a restart by removing authorization to `ephemeralcontainers` subresource via [RBAC](https://kubernetes.io/docs/reference/access-authn-authz/rbac/). -* **What steps should be taken if SLOs are not being met to determine the problem?** +###### What steps should be taken if SLOs are not being met to determine the problem? 
Troubleshoot using apiserver and kubelet error logs. @@ -1014,6 +1260,8 @@ _This section must be completed when targeting beta graduation to a release._ - *2021-05-14*: Add additional graduation criteria - *2021-07-09*: Revert KEP to alpha because of the new API introduced in 1.22. - *2021-08-23*: Updated KEP for beta release in 1.23. +- *2022-06-10*: Updated Testing and Production Readiness sections to new format. +- *2022-06-10*: Updated KEP for stable release in 1.25. ## Drawbacks diff --git a/keps/sig-node/277-ephemeral-containers/kep.yaml b/keps/sig-node/277-ephemeral-containers/kep.yaml index 99f61dc145e..76387224869 100644 --- a/keps/sig-node/277-ephemeral-containers/kep.yaml +++ b/keps/sig-node/277-ephemeral-containers/kep.yaml @@ -19,12 +19,12 @@ see-also: - "/keps/sig-cli/1441-kubectl-debug" # The target maturity stage in the current dev cycle for this KEP. -stage: beta +stage: stable # The most recent milestone for which work toward delivery of this KEP has been # done. This can be the current (upcoming) milestone, if it is being actively # worked on. -latest-milestone: "v1.23" +latest-milestone: "v1.25" # The milestone at which this feature was, or is targeted to be, at each stage. milestone: