
Fix grace period override used for immediate evictions in eviction manager #119570

Closed

Conversation

claassen

@claassen claassen commented Jul 25, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

Currently, when evicting pods due to disk or memory pressure, or when a pod's storage use has exceeded the configured ephemeral-storage resource limit, we use a grace period override of 0 during eviction. Due to the logic here, this actually ends up allowing the pod's full configured grace period rather than performing an immediate eviction:

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pod_workers.go#L1000-L1002

To actually perform an immediate eviction, we should instead use a grace period override of 1 in these cases.

Which issue(s) this PR fixes:

Fixes #115819

Special notes for your reviewer:

Note that the linked issue also mentions a similar problem when force deleting pods via --grace-period=0 --force; this PR makes no attempt to address that issue. There are some notes saying the --force option uses a grace period of 0 for backwards compatibility, though I am not sure of the reason. Pod deletion using --now (or the equivalent --grace-period=1) behaves as expected, which illustrates that the correct grace period override for immediate eviction is 1 rather than 0.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jul 25, 2023
@linux-foundation-easycla

linux-foundation-easycla bot commented Jul 25, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 25, 2023
@k8s-ci-robot
Contributor

Welcome @claassen!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 25, 2023
@k8s-ci-robot
Contributor

Hi @claassen. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: claassen
Once this PR has been reviewed and has the lgtm label, please assign random-liu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jul 25, 2023
@claassen claassen marked this pull request as ready for review July 25, 2023 20:45
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 25, 2023
@@ -294,7 +295,7 @@ func TestDiskPressureNodeFs_VerifyPodStatus(t *testing.T) {
 wantPodStatus: v1.PodStatus{
 	Phase:  v1.PodFailed,
 	Reason: "Evicted",
-	Message: "The node was low on resource: ephemeral-storage. Threshold quantity: 2Gi, available: 1536Mi. ",
+	Message: "The node was low on resource: ephemeral-storage. Threshold quantity: 2Gi, available: 1536Mi. Container above-requests was using 700Mi, request is 100Mi, has larger consumption of ephemeral-storage. ",
Author
The messages here changed because previously the test helper was not setting the container name on the ContainerStats objects, so the eviction message wasn't picking up the container's resource usage.

@bart0sh bart0sh added this to Triage in SIG Node PR Triage Jul 26, 2023
@bart0sh
Contributor

bart0sh commented Aug 2, 2023

/triage accepted
/priority important-soon
/assign @bobbypage @SergeyKanzhelev

@rphillips
Member

/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction
/test pull-crio-cgroupv1-node-e2e-eviction

@k8s-ci-robot
Contributor

@claassen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction 12a8836 link false /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction
pull-crio-cgroupv1-node-e2e-eviction 12a8836 link false /test pull-crio-cgroupv1-node-e2e-eviction

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@tuibeovince

Can't we fix this for all cases if we handle it in pod_workers.go? Something like:

diff --git a/pkg/kubelet/pod_workers.go b/pkg/kubelet/pod_workers.go
index 20e8b493a8f..b90ccad3e5a 100644
--- a/pkg/kubelet/pod_workers.go
+++ b/pkg/kubelet/pod_workers.go
@@ -981,10 +981,12 @@ func calculateEffectiveGracePeriod(status *podSyncStatus, pod *v1.Pod, options *
        // enforce the restriction that a grace period can only decrease and track whatever our value is,
        // then ensure a calculated value is passed down to lower levels
        gracePeriod := status.gracePeriod
+       overriden := false
        // this value is bedrock truth - the apiserver owns telling us this value calculated by apiserver
        if override := pod.DeletionGracePeriodSeconds; override != nil {
                if gracePeriod == 0 || *override < gracePeriod {
                        gracePeriod = *override
+                       overriden = true
                }
        }
        // we allow other parts of the kubelet (namely eviction) to request this pod be terminated faster
@@ -992,12 +994,13 @@ func calculateEffectiveGracePeriod(status *podSyncStatus, pod *v1.Pod, options *
                if override := options.PodTerminationGracePeriodSecondsOverride; override != nil {
                        if gracePeriod == 0 || *override < gracePeriod {
                                gracePeriod = *override
+                               overriden = true
                        }
                }
        }
        // make a best effort to default this value to the pod's desired intent, in the event
        // the kubelet provided no requested value (graceful termination?)
-       if gracePeriod == 0 && pod.Spec.TerminationGracePeriodSeconds != nil {
+       if !overriden && gracePeriod == 0 && pod.Spec.TerminationGracePeriodSeconds != nil {
                gracePeriod = *pod.Spec.TerminationGracePeriodSeconds
        }
        // no matter what, we always supply a grace period of 1

I agree with the perspective that fixing the problem on the pod_workers.go side may be enough to cover all the cases. But since there is a code block that checks for any grace period value < 1 and raises it to 1:

    // no matter what, we always supply a grace period of 1

keeping this PR's fix of assigning a grace period of 1 during evictions skips that condition check and is a less complicated implementation. In my opinion, there might be merit in applying the fix in both eviction_manager.go and pod_workers.go. What do you think?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 4, 2023
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@zcahana

zcahana commented Dec 12, 2023

In my opinion, there might be merit in applying the fix in both eviction_manager.go and pod_workers.go. What do you think?

Applying the fix to pod_workers.go will have the added benefit of solving this for force deleted pods, which too are stopped with their full terminationGracePeriodSeconds instead of "immediately" (see #108741, as well as this comment in the original issue solved by this PR). While not explicitly in-scope for this PR, this could catch 2 (related) birds.

@tuibeovince

Applying the fix to pod_workers.go will have the added benefit of solving this for force deleted pods, which too are stopped with their full terminationGracePeriodSeconds instead of "immediately" (see #108741, as well as this comment in the original issue solved by this PR). While not explicitly in-scope for this PR, this could catch 2 (related) birds.

I share the same sentiments; a fix in pod_workers.go may well be sufficient. Still, as stated in the issue this PR solves, the convention around "immediate" deletion, the documentation, and so on must also be addressed with equal priority.

Sorry, but I am still unfamiliar with the process of redefining conventions (I would love to know).

@dims dims added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 4, 2024
@tuibeovince

There is a PR that I am also looking into that addresses the problem entirely on the pod_workers.go end (#120451) This might be a point of interest or a place for further discussion about this fix.

@tuibeovince

@Seaiii Thank you. I will look into your PR as well.

@Seaiii

Seaiii commented Jan 12, 2024

@Seaiii Thank you. I will look into your PR as well.

Oh, sorry, I misunderstood: you were talking about forcing the first deletion to use a grace period of 0. The issue I mentioned earlier no longer allows a first-time deletion with a value of 0, so I closed it.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

SIG Node PR Triage automation moved this from Needs Approver to Done Feb 11, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rphillips
Member

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Feb 13, 2024
SIG Node PR Triage automation moved this from Done to Triage Feb 13, 2024
@k8s-ci-robot
Contributor

@rphillips: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bart0sh
Contributor

bart0sh commented Feb 13, 2024

/remove-lifecycle rotten

@claassen please, rebase the PR, thanks.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 13, 2024
@bart0sh bart0sh moved this from Triage to Waiting on Author in SIG Node PR Triage Feb 13, 2024
@olyazavr
Contributor

👋 We've encountered this bug in our setup, and would be interested in seeing this merged in

@olyazavr
Contributor

I've rebased this PR here: #124063

@bart0sh
Contributor

bart0sh commented Mar 29, 2024

/close
in favor of #124063

@k8s-ci-robot
Contributor

@bart0sh: Closed this PR.

In response to this:

/close
in favor of #124063

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

SIG Node PR Triage automation moved this from Waiting on Author to Done Mar 29, 2024
Labels
area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

kubelet: Evicted and force deleted pods get their full termination grace period when they should not