K3s claims that pods are running but hosts (nodes) are dead #1264
Comments
I restarted everything and booted only the master, and got the same result.
I waited 30 minutes but nothing happened. I managed to drain the nodes manually with this script:
although this script should never be needed; the whole point of Kubernetes is its ability to self-heal.
I found that when you delete a NotReady node, Kubernetes will actually reassign its pods, but the worker gets added back to the cluster only after the k3s-agent service is restarted.
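A hedged sketch of that recovery sequence (the node name is a placeholder; `k3s-agent` is the systemd unit of a standard k3s agent install):

```sh
# On the server: deleting the NotReady node lets its pods be rescheduled.
kubectl delete node worker1

# On the failed worker, once it is powered back on: restart the agent so it
# re-registers with the cluster.
sudo systemctl restart k3s-agent
```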
I powered off a worker node (worker2) on my 3-node Raspberry Pi 4 cluster running Rook/Ceph some 3.5 hours ago, and my cluster still has not really recovered. If we overlook the WordPress failure (the new instance cannot bind to the PVC because Kubernetes still thinks there is a claim from the terminating instance on the powered-off node), the k3s-provisioned Traefik LB instance is still listed as terminating and hanging there. The things that have recovered are the ones (mostly Rook) that do not have a PVC, so even though the instances on the failed node are still listed as terminating, that does not stop the new instances from coming up. Am I missing something here regarding Kubernetes node failure?
So powering up the 'failed' node allowed all the 'terminating' instances to finally end; the Rook config sorted itself out, and my WordPress instance finally came back along with Certman, as the PVC (on WordPress) was finally released.
I learned that there's a difference between having a node in NotReady state and deleting the node. When your node goes into NotReady state, Kubernetes will not reschedule its running pods to other nodes, because Kubernetes cannot distinguish between a node restart, a network error, or a kubelet error. Kubernetes will reschedule pods only when it's sure that they are not running, and just because a node is NotReady does not mean its pods are not running. They might still be running; the fact that Kubernetes cannot communicate with the kubelet does not prove otherwise. :/ It's really a bummer for me.
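To illustrate (a hedged sketch; the node name is a placeholder), you can watch this distinction play out:

```sh
kubectl get nodes                            # the lost node shows NotReady
kubectl get pods -o wide                     # its pods still show as Running
kubectl describe node worker1 | grep Taints  # node.kubernetes.io/unreachable taint
kubectl delete node worker1                  # only now are its pods rescheduled
```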
Although that's just my point of view, it's really weird that k3s on its own does not seem to support this. The script that I published cordons the faulty nodes, drains them, and eventually deletes them; it will uncordon a node once it's back in Ready state. K3s seems to rejoin the master only when it restarts, though.
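A minimal sketch of what such a script might look like (my reconstruction under stated assumptions, not the author's published script; the drain timeout is illustrative):

```sh
#!/bin/sh
# Cordon, drain, and delete any node stuck in NotReady.
for node in $(kubectl get nodes --no-headers | awk '$2 == "NotReady" { print $1 }'); do
  kubectl cordon "$node"
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --force --timeout=60s
  kubectl delete node "$node"
done

# Uncordon any node that has come back Ready but is still unschedulable.
for node in $(kubectl get nodes --no-headers | awk '$2 == "Ready,SchedulingDisabled" { print $1 }'); do
  kubectl uncordon "$node"
done
```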
Please see https://kubernetes.io/docs/concepts/architecture/nodes/, from that link:
So pods stuck in a Terminating or Unknown state are expected here. The key in the original issue is, "Controller detected that all Nodes are not-Ready. Entering master disruption mode.", which looks to be related to kubernetes/kubernetes#42733. If all of the nodes become NotReady at the same time, the controller enters this master disruption mode and stops evicting pods.
@erikwilson In my case none of the pods were in a Terminating/Unknown state (it was the same when only one node was NotReady), and that issue was fixed? 🤔 Will set that
It looks like the expected behavior; also see, from that docs link:
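For context (my summary of the upstream defaults, not the elided docs quote): the admission controller adds NoExecute tolerations for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with tolerationSeconds: 300 to every pod, which is why pods on a lost node are only marked for eviction after about five minutes. You can inspect them with (pod name is a placeholder):

```sh
kubectl get pod my-pod -o jsonpath='{.spec.tolerations}'
```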
OK, thanks. That would tie in with the last time I tried this, as it was on an earlier version of Kubernetes and I was not aware of that change. Also, I think I had previously done this on a Rancher-managed cluster with some node-management options set, so I never had an issue.
Hi. I’m experiencing the same issue and mitigated it with the following script in my launch template user data:

```sh
kubectl get nodes |
  awk -v "host=$(hostname)" '$1 != host && $2 == "NotReady" { print $1 }' |
  xargs --no-run-if-empty kubectl delete node
```

So when one node goes down, the autoscaling group creates a new instance that will run the above script when booting. I advise you to triple check that. The node draining was not working and got stuck forever, since the target node was dead. So much for HA!
I am seeing this issue when using kube-vip in a DaemonSet; more information about my issue is here. k3s version: My masters' config:

```yaml
cluster-init: true
cluster-cidr: 10.69.0.0/16
disable:
  - flannel
  - traefik
  - servicelb
  - metrics-server
  - local-storage
disable-cloud-controller: true
disable-network-policy: true
docker: false
flannel-backend: none
kubelet-arg:
  - "feature-gates=GracefulNodeShutdown=true"
  - "feature-gates=MixedProtocolLBService=true"
node-ip: 192.168.42.10
service-cidr: 10.96.0.0/16
tls-san:
  - 192.168.69.5
write-kubeconfig-mode: '644'
kube-controller-manager-arg:
  - "address=0.0.0.0"
  - "bind-address=0.0.0.0"
kube-proxy-arg:
  - "metrics-bind-address=0.0.0.0"
kube-scheduler-arg:
  - "address=0.0.0.0"
  - "bind-address=0.0.0.0"
etcd-expose-metrics: true
```

My worker nodes:

```yaml
kubelet-arg:
  - "feature-gates=GracefulNodeShutdown=true"
  - "feature-gates=MixedProtocolLBService=true"
node-ip: 192.168.42.13
```

I can see the taints were added to my
What's managing those pods? A DaemonSet, a Deployment, etc.? Whatever's going on here is core Kubernetes behavior; I suspect it's just not doing what you expected.
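One piece of that core behavior worth noting (my addition, since kube-vip here runs as a DaemonSet): the DaemonSet controller automatically gives its pods NoExecute tolerations for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with no time limit, so those pods are never evicted from a NotReady node. A quick way to confirm (pod name is a placeholder):

```sh
kubectl get pod kube-vip-abc12 -n kube-system -o jsonpath='{.spec.tolerations}'
```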
Hi,
@jawabuu Is there any document I can refer to about the arguments mentioned in your notes?
Any updates? I am experiencing the same behavior. |
This would be the responsibility of the Kubernetes controller-manager. Can you show the output of
Hi, may I check whether there is any solution for this problem? I am using v1.21.4 and also see the problem.
@cwalterhk you appear to have an agent that is running a newer version of Kubernetes than the server. This is not supported; please upgrade your servers if you are going to have agents running 1.22.
I just created a new cluster using the latest version. However, I still see the same problem: even when s1 is not available, the pods do not restart on other nodes.
After waiting for about 8 minutes, it is terminating. Thank you very much. Can I check how to detect the failure faster and restart the pods on other nodes?
With these options I was able to reduce your mentioned 8 minutes to ~20 seconds:
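The exact options are not preserved in this thread, so here is a hedged sketch of the standard Kubernetes knobs that control this timing, passed as k3s server arguments (the values are illustrative assumptions, not the commenter's):

```sh
k3s server \
  --kube-controller-manager-arg "node-monitor-period=4s" \
  --kube-controller-manager-arg "node-monitor-grace-period=16s" \
  --kube-apiserver-arg "default-not-ready-toleration-seconds=20" \
  --kube-apiserver-arg "default-unreachable-toleration-seconds=20"
```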
And where did you put these parameters? On the master node(s)? Or on the workers as well?
Same problem here:
I had the same problem.
Hey, where exactly did you pass these arguments?
To the ExecStart in the systemd service:
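For illustration, a hedged sketch of that edit on a default k3s install (the unit path and flag values are assumptions carried over from the sketch above):

```sh
# /etc/systemd/system/k3s.service
#   ExecStart=/usr/local/bin/k3s server \
#       --kube-controller-manager-arg node-monitor-grace-period=16s \
#       --kube-apiserver-arg default-unreachable-toleration-seconds=20

# Reload and restart after editing the unit:
sudo systemctl daemon-reload
sudo systemctl restart k3s
```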
Closing, as this appears to be expected upstream behavior with a valid workaround.
Unfortunately, many of these parameters have been removed in Kubernetes v1.27. See for example the
They have not been removed. They've been listed as deprecated for ages, but I am not aware of any actual work to remove them and force use of a config file.
I have a Deployment with a PVC attached in ReadWriteOnce mode. To test this, I turned off the k3s service on one of the nodes. After waiting some time, the pods did go into a Terminating state, but now the Deployment with the PVC won't start, because the volume is still attached to the older pod. Is it possible to delete or evict the pods instead of them being stuck in the Terminating stage?
@anshuman852 I think this "Terminating" state means it is trying to perform an eviction or delete but is not able to, either because the kubelet is not responding or because there is a finalizer on the pod. You can try checking the pod manifests and the kube-controller-manager logs to see what is happening and what the issue is.
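As a concrete starting point (a hedged sketch; pod and namespace names are placeholders), you can inspect the stuck pod and, as a last resort, force-delete it:

```sh
# Check for finalizers and the reported status on the stuck pod.
kubectl get pod my-pod -n my-namespace -o yaml | grep -A 5 finalizers
kubectl describe pod my-pod -n my-namespace

# Last resort: skip the grace period and remove the pod object. Only do this
# if the node is truly dead; with a ReadWriteOnce volume, forcing deletion
# while the old pod is actually still running risks data corruption.
kubectl delete pod my-pod -n my-namespace --grace-period=0 --force
```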
Version:
k3s version v1.0.0 (18bd921)
Describe the bug
I have a cluster that consists of 1 master and 3 workers. After I unplugged the 3 workers, none of the running pods were reassigned from the workers to the master, and kubectl claims that they are alive:
I believe that self-healing should happen and all those pods should be run on the master. I plugged in one worker, and the pods from the two other ones were not assigned to it.
journalctl from the last 20 minutes: