- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as `implementable`
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- "Implementation History" section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
The primary goal of this enhancement is to define, in a manner common to the Kubelet and the Kubernetes control plane, the operating system on which a pod would like to run.
Identifying the OS of a pod at API server admission time is crucial for applying the appropriate security constraints to it. Without information about the pod's target OS, some admission plugins may apply unnecessary security constraints to the pod (which may prevent it from running properly) or, in the worst case, apply no security constraints at all.
Admission plugins and/or admission webhooks can identify the operating system on which a pod intends to run.
- Interaction with container runtimes.
- Interaction with scheduler (this may change in future).
We propose to add a new field called `os` to the pod spec to identify the OS of the containers specified in the pod. There is no default value for the `os` field or for the `Name` field in the `OS` struct.
```go
type PodSpec struct {
	// If specified, indicates the type of OS on which the pod intends
	// to run. This field is alpha-level and is only honored by servers that
	// enable the IdentifyPodOS feature. This is used to help identify the
	// OS authoritatively during API server admission.
	// +optional
	OS *OS
}

// OS has information on the type of OS.
// We're making this a struct for possible future expansion.
type OS struct {
	// Name of the OS. Currently supported values are linux and windows.
	Name string
}
```

With the above change, all admission plugins that validate or mutate the pod can identify the pod's OS authoritatively and act accordingly. At present, the PodSecurityAdmission plugin is the only admission plugin that can leverage this field on the pod spec. In addition, API validators such as ValidatePod and ValidatePodUpdate should be modified. In the future, we could add a validating admission plugin for Windows pods, since Linux and Windows host capabilities differ and not all Kubernetes features are supported on Windows hosts.
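The validation change can be pictured with a minimal sketch. It assumes simplified mirror types rather than the real `k8s.io/api` packages: `validatePodOS` is a hypothetical helper, not the actual ValidatePod code, and `SELinuxType` / `RunAsUserName` are hypothetical stand-ins for a Linux-only and a Windows-only security setting. It shows the shape of the rule: an omitted `os` passes unchanged, a set `os` needs a supported name, and OS-specific fields that do not apply to the declared OS are rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// OS mirrors the API type above.
type OS struct {
	Name string
}

// PodSpec is a simplified, hypothetical mirror of the real PodSpec.
type PodSpec struct {
	OS            *OS
	SELinuxType   string // hypothetical stand-in for a Linux-only field
	RunAsUserName string // hypothetical stand-in for a Windows-only field
}

// validatePodOS is a hypothetical sketch of OS-aware pod validation.
func validatePodOS(spec PodSpec) error {
	if spec.OS == nil {
		return nil // os omitted: validation behaves as it does today
	}
	switch spec.OS.Name {
	case "linux":
		if spec.RunAsUserName != "" {
			return errors.New("RunAsUserName is Windows-only; not allowed for os.name=linux")
		}
	case "windows":
		if spec.SELinuxType != "" {
			return errors.New("SELinuxType is Linux-only; not allowed for os.name=windows")
		}
	case "":
		return errors.New("os.name is required when os is set")
	default:
		return fmt.Errorf("unsupported os.name %q: must be linux or windows", spec.OS.Name)
	}
	return nil
}

func main() {
	// A Windows pod carrying a Linux-only setting is rejected.
	fmt.Println(validatePodOS(PodSpec{OS: &OS{Name: "windows"}, SELinuxType: "container_t"}))
}
```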
As a Kubernetes cluster administrator, I want appropriate security contexts to be applied to my Windows pods as well as my Linux pods.
As a Kubernetes cluster administrator, I want to use my own admission webhook for pods based on the OS they intend to run on.
Since the OS field is optional in the pod spec, we can expect users to omit it when submitting a pod to the API server. In this scenario, we expect admission plugins to treat the pod exactly as they do today.
The primary risk is a bug in the implementation of the admission plugins that validate or mutate based on the OS field in the pod spec. The best mitigation for that scenario is unit testing with the feature gate both enabled and disabled. Additionally, end users may be confused about the functional consequences of setting the new OS field, given that it is optional.
- Pod spec API validation will be adjusted to ensure that values are not set for OS-specific fields that are irrelevant to the Pod's OS
- Unit tests will be added to ensure new fields in the pod spec are classified as OS-specific or not (and which OSes they are allowed for)
- E2e test that demonstrates only the required OS-specific fields are applied to pods at API admission time
- Pod Security Standards will be reviewed and updated to indicate which Pod OSes they apply to
- The restricted Pod Security Standard will be reviewed to see if there are OS-specific requirements that should be added
- The PodSecurity admission implementation will be updated to skip checks which do not apply to the Pod's OS
- Unit and E2e tests which demonstrate the PodSecurity admission plugin is behaving correctly with the new OS field
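The "skip checks which do not apply to the Pod's OS" step can be sketched as a simple filter. This is a hypothetical illustration, not the PodSecurity implementation: the `check` type, its `AppliesTo` list, and the check IDs are assumptions made for the example. A pod without `os` keeps running every check, matching today's behavior.

```go
package main

import "fmt"

type OS struct{ Name string }

type PodSpec struct{ OS *OS }

// check is a hypothetical PodSecurity check tagged with the OSes it applies to.
// An empty AppliesTo means the check is OS-neutral.
type check struct {
	ID        string
	AppliesTo []string
}

// applies reports whether a check should run for the given pod.
func applies(c check, spec PodSpec) bool {
	if spec.OS == nil || len(c.AppliesTo) == 0 {
		return true // OS unknown or check is OS-neutral: keep running it
	}
	for _, name := range c.AppliesTo {
		if name == spec.OS.Name {
			return true
		}
	}
	return false
}

// applicableChecks returns the IDs of the checks relevant to the pod.
func applicableChecks(checks []check, spec PodSpec) []string {
	var ids []string
	for _, c := range checks {
		if applies(c, spec) {
			ids = append(ids, c.ID)
		}
	}
	return ids
}

func main() {
	// Illustrative check list; the IDs are hypothetical.
	checks := []check{
		{ID: "hostNamespaces"},
		{ID: "seccompProfile", AppliesTo: []string{"linux"}},
		{ID: "seLinuxOptions", AppliesTo: []string{"linux"}},
	}
	fmt.Println(applicableChecks(checks, PodSpec{OS: &OS{Name: "windows"}})) // [hostNamespaces]
	fmt.Println(applicableChecks(checks, PodSpec{}))                         // [hostNamespaces seccompProfile seLinuxOptions]
}
```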
Pod Security Standards are to be changed in the 1.25 timeframe to accommodate the supported kubelet and kube-apiserver version skew.
Apart from the above API changes, we intend to make the following changes to the Kubelet:
- The Kubelet should reject a pod if it cannot honor `pod.Spec.OS.Name`, for instance if `OS.Name` does not match the host OS.
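A minimal sketch of the kubelet-side check follows, assuming simplified mirror types. `admitPodOS` is a hypothetical helper, not the kubelet's actual admit handler. It relies on the fact that Go's `runtime.GOOS` reports `linux` and `windows` using the same strings as the supported `OS.Name` values.

```go
package main

import (
	"fmt"
	"runtime"
)

type OS struct{ Name string }

type PodSpec struct{ OS *OS }

// admitPodOS is a hypothetical sketch: a pod that declares an OS the node
// cannot run is rejected before any container is started. hostOS would
// normally be runtime.GOOS on the node.
func admitPodOS(spec PodSpec, hostOS string) (bool, string) {
	if spec.OS == nil {
		return true, "" // no OS declared: nothing to compare
	}
	if spec.OS.Name != hostOS {
		return false, fmt.Sprintf("pod OS %q does not match node OS %q", spec.OS.Name, hostOS)
	}
	return true, ""
}

func main() {
	ok, reason := admitPodOS(PodSpec{OS: &OS{Name: "windows"}}, runtime.GOOS)
	fmt.Println(ok, reason)
}
```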
We let users explicitly specify nodeSelectors/nodeAffinities+tolerations or RuntimeClasses to express their intention to run on a particular OS. However, in the future, once the OS struct expands, we can see whether those fields can be leveraged to express scheduling constraints. During alpha, we assume there are no scheduling implications.
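Because the `os` field does not steer scheduling, a pod that targets Windows still needs a nodeSelector (or affinity/tolerations) on the well-known `kubernetes.io/os` node label. The sketch below, assuming simplified mirror types, shows a pod carrying both and a hypothetical `selectorMatchesOS` helper that reports whether they agree. Nothing in this KEP mandates such a check; it only illustrates how the two signals relate.

```go
package main

import "fmt"

// osLabel is the well-known node label that records the node's OS.
const osLabel = "kubernetes.io/os"

type OS struct{ Name string }

type PodSpec struct {
	OS           *OS
	NodeSelector map[string]string
}

// selectorMatchesOS is a hypothetical consistency check between the OS field
// and the node selector. A missing selector or a missing OS field counts as
// agreement, since only one of the two signals is present.
func selectorMatchesOS(spec PodSpec) bool {
	want, ok := spec.NodeSelector[osLabel]
	if !ok || spec.OS == nil {
		return true
	}
	return want == spec.OS.Name
}

func main() {
	pod := PodSpec{
		OS:           &OS{Name: "windows"},
		NodeSelector: map[string]string{osLabel: "windows"},
	}
	fmt.Println(selectorMatchesOS(pod)) // true
}
```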
- Unit tests covering API server defaulting to various fields within pod with and without this feature
- Unit tests covering admission plugins which validate/mutate pod spec based on this feature
- Unit and E2E tests for Kubelet changes.
- Updates to the `sig-windows` tagged tests so that they direct Windows scheduling for all pods.
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
- k8s.io/kubernetes/pkg/apis/core/validation: 06/03/2022: 82.1% of statements
- k8s.io/kubernetes/pkg/apis/core/validation/validation.go:2979: 06/03/2022: 76.2% of statements
- k8s.io/kubernetes/pkg/apis/core/validation/validation.go:3488: 06/03/2022: 92.0% of statements
- k8s.io/kubernetes/pkg/apis/core/validation/validation.go:3573: 06/03/2022: 100.0% of statements
- k8s.io/kubernetes/pkg/apis/core/validation/validation.go:3590: 06/03/2022: 65.3% of statements
- k8s.io/kubernetes/pkg/apis/core/validation/validation.go:6360: 06/03/2022: 100.0% of statements
The pod security standard integration tests will be updated to include OS specific validations.
There are e2e tests which:
- validate that the kubelet rejects pods whose OS field does not match the underlying OS
- validate that the kubelet reconciles and corrects the node's OS label if it is changed
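The label-reconciliation behavior exercised by the second e2e test can be sketched as follows. `reconcileOSLabel` is a hypothetical helper rather than the kubelet's actual node-status code. It assumes the label map is non-nil and that the kubelet's own OS comes from `runtime.GOOS`.

```go
package main

import (
	"fmt"
	"runtime"
)

// osLabel is the well-known node label that records the node's OS.
const osLabel = "kubernetes.io/os"

// reconcileOSLabel is a hypothetical sketch: if the node's OS label was
// changed, restore it to the kubelet's own OS. It returns true when the
// labels were modified. labels must be non-nil.
func reconcileOSLabel(labels map[string]string, hostOS string) bool {
	if labels[osLabel] == hostOS {
		return false
	}
	labels[osLabel] = hostOS
	return true
}

func main() {
	labels := map[string]string{osLabel: "bogus"}
	changed := reconcileOSLabel(labels, runtime.GOOS)
	fmt.Println(changed, labels[osLabel])
}
```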
- Feature implemented behind a feature flag
- Basic unit and e2e tests completed and enabled
- Expand the e2e tests with more scenarios
- Gather feedback from end users
- Tests are in Testgrid and linked in the KEP
- 2 examples of end users using this field
- The latest version of OpenShift is using OS field in pod spec to identify pod OS and apply SCCs judiciously
- Rancher, CAPZ plan to use this field.
- Upgrades:
When upgrading from a release without this feature to a release with the `IdentifyPodOS` feature enabled, there is no change to existing behavior unless the user specifies the OS field in the pod spec.
- Downgrades:
When downgrading from a release with this feature to a release without `IdentifyPodOS`, the current behavior will continue.
If the feature gate is enabled, there are some kubelet implications: the code that strips security constraints based on OS can be removed, and the kubelet admission (denial) logic described above needs to be added. Older kubelets without this change will continue to strip the unnecessary fields from the pod spec, which is the current behavior.
- Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: IdentifyPodOS
  - Components depending on the feature gate:
    - kubelet
    - kube-apiserver
A Kubelet with a mis-scheduled pod (for example, a Windows pod placed on a Linux node) will fail before trying to run a container (that is, it never invokes the underlying CRI), rather than after.
Yes. Using the feature gate is the only way to enable/disable this feature. We'd have unit tests which exercise the update validation code that changes as a result of this feature. The change to update validation comes from the fact that we will allow certain fields to be empty or invalidated when the OS field in the pod spec is set.
The admission plugins and the Kubelet can act based on the OS field in the pod spec if it is set by the end user.
Yes, unit and e2e tests with the feature enabled and disabled.
It shouldn't impact already running workloads. This is an opt-in feature, since users need to explicitly set the OS field (`.spec.os`) in the Pod spec. If the feature is disabled, the field is preserved if it was already set in the persisted pod object; otherwise it is silently dropped.
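The gate-disabled behavior described above can be sketched with simplified mirror types. `dropDisabledOSField` is a hypothetical name for illustration; the real API server logic may be organized differently. On create, `oldSpec` is nil; on update, it is the stored object, and an `os` value already persisted there is kept so that updates do not wipe it.

```go
package main

import "fmt"

type OS struct{ Name string }

type PodSpec struct{ OS *OS }

// dropDisabledOSField is a hypothetical sketch: with the gate off, a newly
// supplied OS field is silently cleared, unless the stored (old) object
// already had it set.
func dropDisabledOSField(newSpec, oldSpec *PodSpec, gateEnabled bool) {
	if gateEnabled {
		return
	}
	if oldSpec != nil && oldSpec.OS != nil {
		return // already persisted: preserve it
	}
	newSpec.OS = nil
}

func main() {
	spec := &PodSpec{OS: &OS{Name: "linux"}}
	dropDisabledOSField(spec, nil, false) // create with the gate disabled
	fmt.Println(spec.OS == nil)           // true
}
```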
N/A
Not yet tested.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
None. This feature is additive.
By looking at the pod's spec. Some fields, like those set by Security Context Constraints, may not be applied to the pod spec if the appropriate OS has been set.
Windows and Linux pods with the correct OS field set in the pod spec run properly on the desired OS.
- Events
- Event Reason: The corresponding admission plugin (the PodSecurityAdmission plugin) will send a pod denied event.
- Event Reason: The Kubelet will send a pod admitted/denied event.
- API .status
- Condition name:
- Other field:
- Other (treat as last resort)
- Details:
All the pods which have OS field set in the pod spec should have OS specific constraints/defaults applied.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
- Metric name: kube_pod_status_phase
- Metric name: apiserver_request_total
- [Optional] Aggregation method:
- Components exposing the metric:
- Metric name:
- [] Other (treat as last resort)
If the pod is rejected by admission plugins, we'd get a 4xx error. An increase in 4xx errors during pod creation/update would give us an indication of the health. This can be measured via the apiserver_request_total metric.
If the pod is admitted by the kube-apiserver but rejected by the kubelet, the kube_pod_status_phase metric would give us an indication of where the failure is happening.
Are there any missing metrics that would be useful to have to improve observability of this feature?
kube-apiserver and kubelet
No
Yes
No
It increases the size of the Pod object, since a new string field is added.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
No
API validation would fail if the API server and/or etcd is unavailable, and the pod object would not be persisted to etcd.
- Windows pods bypassing Windows-specific validation, and Linux pods bypassing Linux-specific validation, even after the `IdentifyPodOS` feature gate is enabled.
  - Detection: Looking at the kube_pod_status_phase metric
  - Mitigations: Disable the `IdentifyPodOS` feature gate
  - Diagnostics: Increase the log level of the API server
  - Testing: Yes, unit tests are already in place
- Both Windows and Linux pods are rejected when the `IdentifyPodOS` feature gate is enabled.
  - Detection: Looking at the apiserver_request_total metric
  - Mitigations: Disable the `IdentifyPodOS` feature gate
  - Diagnostics: Increase the log level of the API server
  - Testing: Yes, unit tests are already in place
Disabling the `IdentifyPodOS` feature gate will help in determining the problem.
- 2021-09-08: Initial KEP merged
- 2021-10-29: Initial implementation PR merged
- 2021-12-07: Feature introduced in v1.23.0 as `Alpha`
- 2022-01-19: Graduate the feature to Beta proposed
- 2022-05-03: Graduated to `Beta` in v1.24.0
- 2022-05-09: Graduate the feature to GA proposed
- 2022-08-23: Graduated to `Stable` in v1.25.0
Identifying the pods targeting Windows nodes during the scheduling phase can be done in the following ways:
- Based on nodeSelector and tolerations in the pod spec with Windows node specific labels.
- Based on runtimeclasses in the pod spec
RuntimeClass is a higher-level abstraction that is translated into nodeSelectors+tolerations. While a nodeSelector with the reserved OS label is good enough, it has the following shortcomings:
- Piggybacking on nodeSelectors+tolerations to definitively identify the pod OS may not be an ideal experience for end users, as they also use the same abstractions to convey scheduling constraints.
- The host OS is not always the same as the container OS, and this field can still be used in that case. For example, Linux containers on Windows (using WSL).