Pods are not moved when Node in NotReady state #55713
/sig node |
@marczahn how long did you wait after turning off the Kubelet? By default pods won't be moved for 5 minutes, which is configurable via the following flag on the controller manager.
This allows for cases like a node reboot to not reschedule pods unnecessarily. |
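The flag itself did not survive the formatting above; a sketch of checking it, assuming the flag in question is kube-controller-manager's --pod-eviction-timeout (default 5m0s) and a kubeadm-style static-pod control plane:

```bash
# Show the current eviction timeout, if set explicitly. The manifest path is
# a kubeadm convention; adjust for your control plane layout.
grep pod-eviction-timeout /etc/kubernetes/manifests/kube-controller-manager.yaml

# To change it, add to the kube-controller-manager command line, e.g.:
#   --pod-eviction-timeout=1m0s
```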
I know this parameter and I was waiting way longer than the eviction-timeout. Nothing happened at all. |
We encountered the same problem. Our k8s version is 1.8.4, docker version is 1.12.4 |
I wrote a script that can be run as a cronjob:
It actually checks whether a node is down and not drained, and vice versa. Hope it helps |
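The script itself was not preserved in this thread; a hypothetical reconstruction of the kind of cronjob described (drain nodes that report NotReady, uncordon them once they are Ready again):

```bash
#!/bin/bash
# Sketch only: assumes kubectl is configured with cluster-admin access.
kubectl get nodes --no-headers | while read -r name status _; do
  case "$status" in
    NotReady)
      # Node is down and not yet drained: evict its pods so they can be
      # rescheduled elsewhere.
      kubectl drain "$name" --ignore-daemonsets --force --grace-period=0
      ;;
    Ready,SchedulingDisabled)
      # Node recovered but is still cordoned: re-enable scheduling.
      kubectl uncordon "$name"
      ;;
  esac
done
```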
Got the same issue on 1.9.3. No eviction after 30 minutes. |
+1. Is this the intended behavior? If it is, then load balancers should keep serving traffic to those pods (now they do not). |
We encountered the same problem on 1.6.3. |
Got the same problem as @erkules: tried to drain a master node (in a setup of 3 nodes), but the command keeps hanging. If it helps, the cluster is installed on AWS using kops 1.9.x, with 3 masters in separate AZs (m4.large instances). |
I am not quite sure if this is related to this issue. Let me know if I should create a new one.
root@kube-controller-1:~# kubectl get all
NAME READY STATUS RESTARTS AGE
pod/webapper-856ff74c66-59b2t 1/1 Running 0 9h
pod/webapper-856ff74c66-qhlmb 1/1 Running 0 9h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.32.0.1 <none> 443/TCP 2d
service/webapper ClusterIP 10.32.0.100 <none> 8080/TCP 6h
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/webapper 2 2 2 0 9h
NAME DESIRED CURRENT READY AGE
replicaset.apps/webapper-856ff74c66 2 2 0 9h
root@kube-controller-1:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube-worker-1 NotReady <none> 2d v1.11.0
kube-worker-2 NotReady <none> 2d v1.11.0
root@kube-controller-1:~# kubectl exec -ti webapper-856ff74c66-qhlmb sh
Error from server: error dialing backend: dial tcp 10.0.0.17:10250: connect: connection refused
I am using the latest Kubernetes version:
root@kube-controller-1:~# kube-apiserver --version
Kubernetes v1.11.0
root@kube-controller-1:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"} |
I have just run into this issue (v1.10.1). I suspect it has something to do with volumes not being detached/unmounted. |
I encountered a similar issue while experimenting with the Kubernetes autoscaler. When I manually stop a node VM, the node itself goes into NotReady state, and after a while the pod scheduled on the removed node goes into Unknown state. At this point Kubernetes behaves correctly by creating a new pod, and the autoscaler creates a new node to schedule it. However, the removed pod gets stuck in Unknown state. The original node cannot be removed from Kubernetes by the autoscaler, because the autoscaler still thinks there is load (i.e. the stuck pod) on the node.
This is part of the output of describing the NotReady worker node, which shows that Kubernetes still thinks the stuck pod is scheduled on this node:
If we have to manually and forcefully remove the pod using "kubectl delete pod --force --grace-period=0", it means the autoscaler is affected and cannot correctly manage cluster resources without user interference. |
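For anyone hitting the same stuck-in-Unknown state, a sketch of the manual cleanup described above (the node name is a placeholder):

```bash
NODE=worker-2   # placeholder: the NotReady node
# List every pod still bound to the dead node, then force-delete each one so
# controllers and the autoscaler can move on.
kubectl get pods --all-namespaces --field-selector "spec.nodeName=$NODE" \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
while read -r ns pod; do
  kubectl delete pod "$pod" -n "$ns" --force --grace-period=0
done
```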
@zeelichsheng check kubernetes/enhancements#551 and #58635. |
Hi, I have the same problem. No pods are evicted if a node is "NotReady", even after the eviction timeout has passed. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
I am also seeing this issue. |
Surely this is one of the first failure modes everyone tests? It's the first worker-related failure I tested while evaluating Kubernetes. I even gracefully shut down the worker node and let all kube processes exit cleanly. IMO it very much violates the Principle of Least Astonishment that pods assigned to NotReady nodes remain in the Running state. (1.13.3 with a single-node test cluster.) |
@jtackaberry It's not so simple. The cluster nodes need an external monitor or hypervisor to reliably determine whether the NotReady node is actually shut down, in order to take into account a possible split-brain scenario. In other words, you cannot assume that pods are not running just because the node is not responding. See: kubernetes/enhancements#719 |
@jitendra1987 can you test a cluster with 1 master and 2 workers? I also tested with 1 master and 1 worker and pod eviction didn't happen. However, when there are 3 machines in the cluster, it happens normally. |
With 1 master + 2 nodes configuration the problem isn't reproducible. |
Any fix yet? Annoying issue |
Fixed in 1.21.0: pods moved to the healthy worker01 node from worker02, which went down. |
Apparently there is still some issue: #101674 |
I seem to have this issue with v1.21.0 on Debian10/amd64, a test cluster of 1 master and 3 worker nodes: I create a pod without a nodeSelector, find the node it is running on, and shut down or power off its host, emulating hardware failure or maintenance. I expect my pod to be recreated on another healthy node, but this never happens. I have to reapply the pod definition to make it run again, and it says "pod XXX created". Expected behavior: the pod should have been moved/recreated on some healthy node when the timeout expires. |
@elChipardo Did you actually read the preceding comment of @victor-sudakov? He mentions v1.21.0. |
I confirm the fix and validated with Kubernetes 1.20.6 (contained in Rke 1.2.8 / Rancher 2.5.8). |
What do you mean by "confirm the fix"? I've just checked: on Kubernetes v1.21.1/Debian10, when a Node is powered off or dies, its Pods stay in Terminating status forever and never get moved elsewhere. When the Node is back alive, its Pods are gone for good and have to be redeployed again. |
In Kubernetes 1.20.4, the shutdown of a node results in the node being NotReady, but the pods hosted by the node run like nothing happened (though logs and exec do not work, which is normal). However, we noticed that pods from StatefulSets are not moved to another node and stay in Terminating, while pods from Deployments and Jobs are also in Terminating but get rescheduled elsewhere. Maybe that is your case. |
I have never seen this happen unless the pods are part of a Deployment. If you have created a bare pod ("kind: Pod", not "kind: Deployment"), it never gets rescheduled. Maybe it's by design? |
@victor-sudakov Yes, a Pod is by definition bound to a certain Node. Rescheduling is nothing more than deleting the Pod and creating a new one, which is usually handled by a ReplicaSet, which in turn is usually owned by a Deployment. |
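To make the distinction concrete, a quick sketch (names and image are placeholders): a bare Pod stays bound to its node forever, while a Deployment's ReplicaSet creates a replacement Pod when the original is lost:

```bash
kubectl run solo --image=nginx                              # bare Pod: never rescheduled
kubectl create deployment webapp --image=nginx --replicas=2 # Pods get replaced
```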
I have the same situation in version 1.20.0: when shutting down a node, it remains tainted.
But if I taint the node myself with
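The exact taint was lost above; purely as an illustration (not the commenter's original command), manually applying the unreachable NoExecute taint (which the node lifecycle controller normally adds on its own) forces eviction of pods that don't tolerate it:

```bash
# <node-name> is a placeholder. Pods without a matching toleration are
# evicted once their tolerationSeconds (default 300s) expire.
kubectl taint nodes <node-name> node.kubernetes.io/unreachable:NoExecute
```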
This seems to be the same issue as well: #98851. Can anyone confirm in which version it is solved? |
It sounds like this has been fixed in all supported versions of k8s: #55713 (comment) as such, I'm closing this |
@haircommander: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
I am facing a similar issue with DaemonSets. DaemonSet pods remain in the Running state when a node is in a not-ready state. Because the pod is still Running, the headless service exposing the DaemonSet as endpoints returns the IP address of the DaemonSet pod corresponding to the not-ready node. I understand that the DaemonSet pod remaining in the Running state is expected behavior, as the DaemonSet controller is not able to reach the API server, but is there an option to ensure that the headless service doesn't return the IP address of the pod corresponding to the down node? |
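For context on why DaemonSet pods stay Running on NotReady nodes: the DaemonSet controller automatically gives its pods NoExecute tolerations for the unreachable and not-ready taints, so they are never evicted. A quick way to verify this (pod name is a placeholder):

```bash
# Look for node.kubernetes.io/unreachable and node.kubernetes.io/not-ready
# tolerations with effect NoExecute and no tolerationSeconds.
kubectl get pod <daemonset-pod> -o jsonpath='{.spec.tolerations}'
```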
Maybe you need to set this for the kube-apiserver: the value of |
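The flag name above was also lost in formatting; one plausible candidate (an assumption, not confirmed by the comment) is the pair of kube-apiserver flags that control the default tolerationSeconds injected into every pod:

```bash
# Assumption: kubeadm-style manifest path. These flags (default 300) control
# how long pods tolerate the not-ready/unreachable taints before eviction.
grep default-.*-toleration-seconds /etc/kubernetes/manifests/kube-apiserver.yaml

# To evict sooner, set for example:
#   --default-not-ready-toleration-seconds=60
#   --default-unreachable-toleration-seconds=60
```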
I'm still having this issue on EKS. |
Did you solve this problem? |
I think this is the best workaround for now. I use k8s 1.20.7 and |
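The details are cut off above; one common workaround in this family (a sketch with placeholder names, not necessarily what the commenter used) is to shorten the default 300-second unreachable toleration per workload, so replacement pods are created sooner after a node goes NotReady:

```bash
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example          # placeholder workload name
spec:
  replicas: 2
  selector:
    matchLabels: {app: example}
  template:
    metadata:
      labels: {app: example}
    spec:
      containers:
      - name: web
        image: nginx     # placeholder image
      # Evict 30s after the node becomes unreachable instead of 300s.
      tolerations:
      - key: node.kubernetes.io/unreachable
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 30
EOF
```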
This doesn't seem to apply to StatefulSets that have PVCs on a terminated/powered-off node (using rook-ceph with RBD). The cronjob that @marczahn noted should work, as I have tested it manually: #55713 (comment). But shouldn't this functionality be covered by the Scheduler? |
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
To simulate a crashed worker node I stopped the kubelet service on that node (Debian Jessie).
The node got into Unknown state (resp. NotReady) as expected:
The pods running on lls-lon-testing01 stay declared as running:
But they are declared as ready: false on describe:
What you expected to happen:
I expected the pods on the "crashed" node to be moved to the remaining node.
How to reproduce it (as minimally and precisely as possible):
In my situation: having a node (A) and a master+node (B) installed with Kubespray, running at least one pod on each node, stopping Kubelet on A, and waiting.
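A minimal command-level sketch of these reproduction steps (host name is assumed):

```bash
ssh node-a 'systemctl stop kubelet'   # simulate the crash on node A
kubectl get nodes -w                  # wait for A to turn NotReady
kubectl get pods -o wide -w           # watch whether A's pods get moved
```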
Anything else we need to know?:
Environment:
- Kubernetes version (kubectl version):
- Kernel (uname -a): Linux lls-lon-db-master 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux