Node status down -> pod status unknown -> pod can't be deleted, cordon doesn't finish. #51333
Comments
@jayunit100
Here's a snippet reproducing the exact behaviour.
/sig node
@jayunit100 Have you tried forceful deletion?
This is expected behavior. Previously, the node controller would have deleted these pods, but the behavior changed in Kubernetes 1.5 to require the admin to forcefully delete the pod, per the above comment.
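For anyone hitting this, a minimal sketch of what "forceful delete" means in practice; the pod and namespace names below are placeholders:

# Force-remove the stuck pod object from the API server without waiting
# for the (unreachable) kubelet to confirm termination.
kubectl delete pod my-pod --namespace my-namespace --grace-period=0 --force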
Regardless of the intent of the 1.5 changes, as things stand now at release 1.9.2, even when using --grace-period=0 --force the pod is not deleted. The status remains Terminated. The web UI shows the status with a "moon" icon whose tooltip reads "This pod is in a pending state", and the status column displays Terminated:ExitCode:$. But this is only the beginning of many other potential woes. If the pod was part of a DaemonSet, the pod cannot be replaced. And, by the way, the DaemonSet cannot be deleted because it references the terminated-but-pending pod. I hate cliché terms, but the term "zombie pod" someone used in another post almost seems appropriate here. The only way I can see to resolve this at the moment is to clean up the etcd data. Since there are no etcd cleanup tools for Kubernetes, that means the etcd data must be wiped, which really means the cluster must be torn down and stood up again. That is obviously not elegant, and it is disruptive, which in most production scenarios is not an acceptable alternative.
@lsylvain We have the same problem. The workaround we've found for the time being is to create a node object with the name of the one that died, bounce your schedulers & controllers, and then delete the node object. This seems to allow the scheduler to pick a new, good node.
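For reference, a rough sketch of that workaround, assuming the dead node was named node-1 and the control plane components run under systemd (adjust for static pods or your distribution):

# 1. Recreate a stub Node object with the same name as the node that died.
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Node
metadata:
  name: node-1
EOF

# 2. Bounce the scheduler and controller-manager.
systemctl restart kube-scheduler kube-controller-manager

# 3. Once the stuck pods have been cleaned up and rescheduled, remove the stub Node object again.
kubectl delete node node-1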
Sorry for the slow response @kostyrev. Glad you were able to figure it out and that it worked for you too!
Hi, I met the same problem. When a node crashed, we expected the pods to be recreated on other matching (labeled) nodes, but these pods got stuck.
Hi, we are facing the same issue. When a node has crashed or been deleted and a pod was running there, we cannot remove the pod from the list:
However, it is still there, not deleted. Is there any way to remove it from the list?
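In case it helps, a hedged sketch of the two cleanup paths mentioned in this thread; node, pod, and namespace names are placeholders:

# If the Node object for the dead machine is still registered, deleting it lets the
# controller manager garbage-collect the pods that were bound to it.
kubectl get nodes
kubectl delete node dead-node-1

# Otherwise, force-remove the stuck pod entry from the API server directly.
kubectl get pods --all-namespaces -o wide | grep Unknown
kubectl delete pod my-pod -n my-namespace --grace-period=0 --force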
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
A pod went into a zombie state (Unknown) where it wouldn't be deleted after its node went down.
What you expected to happen:
Pods should be readily deletable.
How to reproduce it (as minimally and precisely as possible):
Not sure, but in this case we had a pod with 3 containers and the node went down (literally, we couldn't ssh into it). After that, no matter how many times you run
kubectl delete pod
the delete operation succeeds rather than failing, yet the pod is never actually deleted.
Anything else we need to know?:
Looking at the pod conditions, it clearly is aware that it lost its node.
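For context, the sort of inspection meant here (pod and namespace names are placeholders):

# Show the pod's recorded status and conditions.
kubectl describe pod my-pod -n my-namespace

# Or pull just the conditions out of the pod object.
kubectl get pod my-pod -n my-namespace -o jsonpath='{.status.conditions}'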
Environment:
Kubernetes version (use kubectl version):