
release-note: Describe issues around node admission in 1.22 #107348

Merged

Conversation

smarterclayton
Contributor

The 1.22 release fixed an issue where the resources used by terminating
pods were not always properly accounted for. As a consequence, certain
workloads that saturate a single node with pods may see increased pod
creation failures until existing pods fully terminate. Inform users of
that change and link to where it will be resolved in the future.

What type of PR is this?

/kind documentation

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@k8s-ci-robot added the release-note-none, kind/documentation, size/XS, and do-not-merge/needs-sig labels on Jan 5, 2022
@k8s-ci-robot
Contributor

@smarterclayton: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-triage and needs-priority labels on Jan 5, 2022
@k8s-ci-robot added the cncf-cla: yes label on Jan 5, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved, area/release-eng, and sig/release labels and removed the do-not-merge/needs-sig label on Jan 5, 2022
@@ -826,6 +826,10 @@ A regression bug was found where guaranteed Pods with multiple containers do not

If CSIMigrationvSphere feature gate is enabled, user should not upgrade to Kubernetes v1.22. vSphere CSI Driver does not support Kubernetes v1.22 yet because it uses v1beta1 CRD APIs. Support for v1.22 will be added at a later release. Check the following document for supported Kubernetes releases for a given [vSphere CSI Driver version](https://vsphere-csi-driver.sigs.k8s.io/compatiblity_matrix.html#compatibility-matrix-for-vsphere-csi-driver).

### Workloads that saturate nodes with pods may see pods that fail due to node admission

1.22 addressed a long-standing issue in the Kubelet where terminating pods were [vulnerable to race conditions](https://github.com/kubernetes/kubernetes/pull/102344) leading to early shutdown, resource leaks, or long delays in actually completing pod shutdown. As a consequence of this change, the Kubelet now correctly takes into account the resources of running and terminating pods when deciding to accept new pods, since terminating pods are still holding on to those resources. This stricter handling may surface to end users as pod rejections when creating pods that are scheduled to mostly full nodes that have other terminating pods holding the resources the new pods need. The most likely error would be a pod set to the `Failed` phase with reason set to `OutOfCpu` or `OutOfMemory`, but any resource on the node that has some fixed limit (including persistent volume counts on cloud nodes, exclusive CPU cores, or unique hardware devices) could trigger the failure. While this behavior is correct, it reduces the throughput of pod execution and creates user-visible warnings; [future versions of Kubernetes will minimize the likelihood users see rejected pods](https://github.com/kubernetes/kubernetes/issues/106884).
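
For anyone triaging a cluster after upgrading, a minimal sketch of how these rejections might be found, assuming client-go and a kubeconfig at the default location (the helper is illustrative, not part of this PR; the exact casing of the status reason string can vary by resource, so the sketch only matches the `OutOf` prefix):

```go
// Sketch: list pods the kubelet rejected at admission after the 1.22
// accounting change. Assumes a kubeconfig at ~/.kube/config; error
// handling and output format are illustrative only.
package main

import (
	"context"
	"fmt"
	"path/filepath"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Admission rejections leave the pod in the Failed phase with a
	// reason such as OutOfCpu or OutOfMemory, as described above.
	pods, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
		FieldSelector: "status.phase=Failed",
	})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		if strings.HasPrefix(p.Status.Reason, "OutOf") {
			fmt.Printf("%s/%s rejected on %s: %s (%s)\n",
				p.Namespace, p.Name, p.Spec.NodeName, p.Status.Reason, p.Status.Message)
		}
	}
}
```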
Member

It's a "warning" in the case of a workload pod (ReplicaSet, StatefulSet, etc.), but it's a complete failure in the case of plain pods or Jobs with a small backoffLimit.
Can we suggest that users not use plain pods or backoffLimit=0?
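
For example, a rough sketch using the k8s.io/api Go types (names and values are placeholders, not from this PR): a Job with a non-zero backoffLimit is retried by the Job controller if its pod is rejected at admission, while backoffLimit: 0 fails permanently on the first rejection.

```go
// Rough sketch: a Job whose backoffLimit tolerates a few admission
// rejections. All names and values are placeholders.
package example

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func int32Ptr(i int32) *int32 { return &i }

// exampleJob returns a Job that the Job controller retries up to three
// times if a pod fails, e.g. is rejected with OutOfCpu on a saturated
// node. With backoffLimit set to 0, a single rejection would mark the
// whole Job as failed.
func exampleJob() *batchv1.Job {
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "example"},
		Spec: batchv1.JobSpec{
			BackoffLimit: int32Ptr(3),
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:    "work",
						Image:   "busybox",
						Command: []string{"sh", "-c", "echo done"},
					}},
				},
			},
		},
	}
}
```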

Contributor Author

It was already a complete failure for those users during eviction anyway. Do we have a doc that describes the recommended interactions here that we could reference directly, rather than inlining it? I'd probably prefer that.

https://kubernetes.io/docs/concepts/scheduling-eviction/#pod-disruption is close; we could potentially describe this in more detail there.

https://kubernetes.io/docs/tasks/extend-kubernetes/ is probably a good place where we could describe "writing a controller that schedules pods onto a cluster" in detail, including this warning.

Added a stub sentence pending a decision on where to put a better section.

Member

can we say "user-visible pod failures" instead?

Maybe a better link is https://kubernetes.io/docs/concepts/workloads/

Contributor Author

In the context of the discussion, https://kubernetes.io/docs/concepts/workloads/ doesn't seem to address taking kubelet rejection into account directly. Which part do you view as important to communicate to users?

@dims
Member

dims commented Jan 14, 2022

/assign alculquicondor

@smarterclayton one suggestion inline from the reviewer!

CHANGELOG/CHANGELOG-1.22.md (outdated review thread, resolved)
@alculquicondor
Member

/lgtm

@k8s-ci-robot added the lgtm label on Feb 4, 2022
@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

1 similar comment

@k8s-ci-robot merged commit a274ec0 into kubernetes:master on Feb 5, 2022
@k8s-ci-robot added this to the v1.24 milestone on Feb 5, 2022