-
Notifications
You must be signed in to change notification settings - Fork 39k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix grace period override used for immediate evictions in eviction manager #119570
Fix grace period override used for immediate evictions in eviction manager #119570
Conversation
Welcome @claassen! |
Hi @claassen. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: claassen The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -294,7 +295,7 @@ func TestDiskPressureNodeFs_VerifyPodStatus(t *testing.T) { | |||
wantPodStatus: v1.PodStatus{ | |||
Phase: v1.PodFailed, | |||
Reason: "Evicted", | |||
Message: "The node was low on resource: ephemeral-storage. Threshold quantity: 2Gi, available: 1536Mi. ", | |||
Message: "The node was low on resource: ephemeral-storage. Threshold quantity: 2Gi, available: 1536Mi. Container above-requests was using 700Mi, request is 100Mi, has larger consumption of ephemeral-storage. ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The messages here changed because previously in the test helper we were not setting the container name on the ContainerStats
objects so it wasn't picking up the resource usage
/triage accepted |
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction |
@claassen: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
I agree with this perspective that fixing the problem on pod_workers.go side may be enough to cover all the cases.
leaving this PR's fix for assigning the grace period to 1 during evictions will skip this one condition check and be a less complicated implementation. In my opinion, there might be merit in applying the fix in both eviction_manager.go and pod_workers.go. What do you think? |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Applying the fix to |
I share the same sentiments. A fix in pod_workers.go may just be sufficient. Still, as stated in the issue being solved by this PR, the convention of "immediate", the documentation, and such must also be addressed with equal priority. Sorry but I am still unfamiliar with the process of redefining conventions yet (I would love to know). |
There is a PR that I am also looking into that addresses the problem entirely on the |
@Seaiii Thank you. I will look into your PR as well. |
Oh, sorry, I misunderstood, you were talking about forcing the first deletion to be 0. I mentioned issue before and now it doesn't allow first time deletion with a value of 0 so I turned it off |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/reopen |
@rphillips: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/remove-lifecycle rotten @claassen please, rebase the PR, thanks. |
👋 We've encountered this bug in our setup, and would be interested in seeing this merged in |
I've rebased this PR here: #124063 |
/close |
@bart0sh: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Currently when evicting pods due to disk or memory pressure, or when a pod's storage use has exceeded the configured ephemeral-storage resource limit we use a grace period override of 0 during eviction which actually ends up allowing the pod's full configured grace period rather than performing an immediate eviction due to the logic here:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pod_workers.go#L1000-L1002
In order to actually perform an immediate eviction we should instead use a grace period override of 1 in these cases.
Which issue(s) this PR fixes:
Fixes #115819
Special notes for your reviewer:
Note that the linked issue also mentions a similar problem when force deleting pods via
--grace-period=0 --force
. This PR makes no attempt to address that issue. There are some notes around the--force
option using a grace period of 0 for backwards compatibility which I am not sure the reason for, but pod deletion using--now
or the equivalent--grace-period=1
behaves as expected and serves to illustrate how the correct grace period override for immediate eviction should be 1 rather than 0.Does this PR introduce a user-facing change?
NONE
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
NONE