liveness/readiness probe is executed and failed while pod is terminated #52817
Comments
/kind bug
/sig node
PING
Since the upgrade to 1.7, our deployment rollouts seem to have a higher failure rate. Occasionally a pod comes up but no readiness probe is ever started. It stays in that state, blocking the entire deployment. I usually have to delete the pod so it is rescheduled and a new readiness probe is fired. I wonder if these issues are related.
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/reopen
We see this consistently with all pods that define a liveness or readiness probe. Whenever we roll out a new deployment, the pods that are terminated emit a failed liveness/readiness probe AFTER they have been terminated. We have considered adding a … Is this an impossible-to-solve race condition between Kubernetes moving parts?
@cpnielsen: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@sunao-uehara Do you still experience this? This happens to us running Kubernetes 1.11.4.
I'm on k8s 1.12.7 and this is happening to me as well.
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.6", GitCommit:"a21fdbd78dde8f5447f5f6c331f7eb6f80bd684e", GitTreeState:"clean", BuildDate:"2018-07-26T10:04:08Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"} — this still happens. I think we should just add a "Terminating" phase to the pod; when a pod is in Terminating, stop the checks.
/reopen |
@pigletfly: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Still experiencing this on Kubernetes 1.12.4, and seeing the failed probe event. So it seems that after the pod is deleted, the probe keeps running. I am wondering if the probe was started before the pod deletion.
/area kubelet
/priority important-soon
@matthyx: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle rotten
@matthyx: GitHub didn't allow me to assign the following users: ashleyschuett. Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
1.16.9 here. Same issue.
/assign
I am also facing this issue. We have pods that have a lot of cleanup to do during shutdown; it can take up to 5 minutes for them to terminate gracefully. During this time the livenessProbe detects failure and restarts the pod, which is not what we want. I am unable to prevent the service that handles the liveness check from stopping while the cleanup is happening. It would be better if the pod were immediately removed from the service and the probes stopped while the shutdown is performed. Basically, this ends up with k8s never actually being able to terminate the pod.
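For reference, a sketch of the standard knobs involved in a slow graceful shutdown — the field names are real Kubernetes API, but the pod name, image, endpoint, and timings are illustrative. Note that, per this thread, none of this stops the kubelet from probing while the pod terminates, which is exactly the bug being reported:

```yaml
# Sketch: a pod that needs up to ~5 minutes of cleanup on shutdown.
# Real fields; illustrative names and values.
apiVersion: v1
kind: Pod
metadata:
  name: slow-shutdown-app        # illustrative name
spec:
  # Give the cleanup time to finish before the kubelet sends SIGKILL.
  terminationGracePeriodSeconds: 330
  containers:
  - name: app
    image: example/app:1.0       # illustrative image
    lifecycle:
      preStop:
        exec:
          # Runs before SIGTERM; often used to start draining connections.
          command: ["sh", "-c", "/cleanup.sh"]   # illustrative script
    livenessProbe:
      httpGet:
        path: /healthz           # illustrative endpoint
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
```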
I will take that point and propose a shutdown probe to the API sig.
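To make the idea concrete, here is a purely hypothetical sketch of what such a shutdown probe could look like. `shutdownProbe` does not exist in the Kubernetes API; this only illustrates the shape of the proposal:

```yaml
# HYPOTHETICAL: shutdownProbe is NOT a real Kubernetes field.
# The idea: while a pod is terminating, stop liveness/readiness probing
# and instead probe whether shutdown has completed.
spec:
  containers:
  - name: app
    shutdownProbe:               # hypothetical field
      httpGet:
        path: /shutdown-status   # illustrative endpoint
        port: 8080
      periodSeconds: 5           # would only run once the pod is Terminating
```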
Same here at 1.18.8
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Kubernetes version 1.19.3. Same issue.
Same here, Kubernetes 1.19.4.
/remove-lifecycle rotten
/assign
Should I consider a cherry-pick for 1.20 and 1.19? (maybe 1.18 too?)
@matthyx we're running 1.16 and being hit by this continuously when some of our elixir apps shut down, so cherry-picking it into 1.18 would at least put it closer in our update path 👼
What happened:
The liveness/readiness probe is executed and fails while the pod is being terminated. It happens only once, during pod termination. The issue started after upgrading from v1.6.x to v1.7.
How to reproduce it (as minimally and precisely as possible):
Execute `kubectl delete pod nginx-A1` to delete the pod, so the status of nginx-A1 changes to `Terminating`. Right after that, the liveness and readiness probes appear to be executed and fail, but only once. An nginx reverse proxy is running in the pod, so I just use the `httpGet` method for both the liveness and readiness probes.
Here is my Deployment config.
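(The config itself did not survive this copy; below is a minimal reconstruction based only on what the report states — an nginx reverse proxy with httpGet liveness and readiness probes. Names, ports, and timings are guesses.)

```yaml
# Reconstruction, not the original config: an nginx Deployment with
# httpGet liveness/readiness probes. All values are illustrative.
apiVersion: apps/v1              # the original 1.7-era config likely used
kind: Deployment                 # extensions/v1beta1 or apps/v1beta1
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
```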
Here is the Events log from `kubectl describe pod nginx-A1`:
Environment: