K3S Claims that pods are running but hosts (nodes) are dead #1264

kamilgregorczyk · 2020-01-02T20:31:44Z

Version:
k3s version v1.0.0 (18bd921)

Describe the bug
I have a cluster that consists of 1 master and 3 workers, after I unplugged 3 workers none of running pods were reassigned to master from workers and Kubectl claims that they are alive:

➜  ~ kubectl get nodes
NAME      STATUS     ROLES    AGE   VERSION
worker2   NotReady   node     15d   v1.16.3-k3s.2
worker1   NotReady   node     15d   v1.16.3-k3s.2
worker3   NotReady   node     15d   v1.16.3-k3s.2
master    Ready      master   16d   v1.16.3-k3s.2

➜  ~ kubectl get pods --all-namespaces -o wide
NAMESPACE              NAME                                                   READY   STATUS    RESTARTS   AGE    IP              NODE      NOMINATED NODE   READINESS GATES
kube-system            metrics-server-6d684c7b5-8fzld                         1/1     Running   29         16d    10.42.0.139     master    <none>           <none>
metallb-system         speaker-lv7cq                                          1/1     Running   7          3d2h   192.168.0.201   master    <none>           <none>
default                nginx-1-775985c86-4q5xq                                1/1     Running   18         5d7h   10.42.0.142     master    <none>           <none>
kube-system            coredns-d798c9dd-f2wrb                                 1/1     Running   28         16d    10.42.0.140     master    <none>           <none>
kube-system            local-path-provisioner-58fb86bdfd-8sbzq                1/1     Running   4          32h    10.42.0.141     master    <none>           <none>
kubernetes-dashboard   kubernetes-dashboard-5996555fd8-k684f                  1/1     Running   23         15d    10.42.2.59      worker2   <none>           <none>
metallb-system         speaker-hdq7h                                          1/1     Running   5          3d2h   192.168.0.203   worker2   <none>           <none>
kube-system            nginx-nginx-ingress-controller-595c6b856c-m6997        1/1     Running   3          31h    10.42.2.58      worker2   <none>           <none>
kube-system            nginx-nginx-ingress-default-backend-6595d9d88b-vff2c   1/1     Running   2          30h    10.42.1.59      worker1   <none>           <none>
metallb-system         speaker-54h22                                          1/1     Running   5          3d2h   192.168.0.202   worker1   <none>           <none>
metallb-system         controller-57967b9448-mjgcb                            1/1     Running   5          3d2h   10.42.1.60      worker1   <none>           <none>
kubernetes-dashboard   dashboard-metrics-scraper-76585494d8-bzccd             1/1     Running   31         15d    10.42.1.58      worker1   <none>           <none>
metallb-system         speaker-grzfq                                          1/1     Running   6          3d2h   192.168.0.204   worker3   <none>           <none>

I believe that the self healing should happen and it should run all those pods on master, I plugged in one worker and pods from two other ones were not assigned to it

journalctl from last 20 minutes:

pi@master:~ $ sudo journalctl -u k3s --since "20 minutes ago"
-- Logs begin at Wed 2020-01-01 22:17:01 CET, end at Thu 2020-01-02 21:30:46 CET. --
Jan 02 21:11:02 master k3s[550]: time="2020-01-02T21:11:02.232431780+01:00" level=info msg="Updating TLS secret for k3s-serving (count: 8): map[listener.cattle.io/cn-10.43.0.1:10.43
Jan 02 21:11:02 master k3s[550]: E0102 21:11:02.273803     550 controller.go:117] error syncing 'kube-system/k3s-serving': handler tls-storage: Secret "k3s-serving" is invalid: meta
Jan 02 21:11:07 master k3s[550]: I0102 21:11:07.220756     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:12:07 master k3s[550]: I0102 21:12:07.249238     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:12:24 master k3s[550]: time="2020-01-02T21:12:24.244186839+01:00" level=info msg="Updating TLS secret for k3s-serving (count: 8): map[listener.cattle.io/cn-10.43.0.1:10.43
Jan 02 21:12:24 master k3s[550]: E0102 21:12:24.293070     550 controller.go:117] error syncing 'kube-system/k3s-serving': handler tls-storage: Secret "k3s-serving" is invalid: meta
Jan 02 21:13:07 master k3s[550]: I0102 21:13:07.278093     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:14:07 master k3s[550]: I0102 21:14:07.292650     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:14:14 master k3s[550]: I0102 21:14:14.490492     550 controller.go:606] quota admission added evaluator for: replicasets.apps
Jan 02 21:14:14 master k3s[550]: I0102 21:14:14.505901     550 event.go:274] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"default", Name:"nginx-1", UID:"9f62d494-264e-43af
Jan 02 21:14:14 master k3s[550]: I0102 21:14:14.557512     550 event.go:274] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"nginx-1-775985c86", UID:"4984c18f
Jan 02 21:14:14 master k3s[550]: I0102 21:14:14.570165     550 event.go:274] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"nginx-1-775985c86", UID:"4984c18f
Jan 02 21:14:14 master k3s[550]: I0102 21:14:14.662089     550 event.go:274] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"default", Name:"nginx-1", UID:"55f3f3a9-ba7c-44de-
Jan 02 21:14:42 master k3s[550]: E0102 21:14:42.365014     550 machine.go:288] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or d
Jan 02 21:15:07 master k3s[550]: I0102 21:15:07.319919     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:15:08 master k3s[550]: time="2020-01-02T21:15:08.153599154+01:00" level=info msg="Updating TLS secret for k3s-serving (count: 8): map[listener.cattle.io/cn-10.43.0.1:10.43
Jan 02 21:15:08 master k3s[550]: E0102 21:15:08.180379     550 controller.go:117] error syncing 'kube-system/k3s-serving': handler tls-storage: Secret "k3s-serving" is invalid: meta
Jan 02 21:16:07 master k3s[550]: I0102 21:16:07.334593     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:17:04 master k3s[550]: time="2020-01-02T21:17:04.958137785+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.203:49916: i/o
Jan 02 21:17:07 master k3s[550]: I0102 21:17:07.362715     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:17:29 master k3s[550]: time="2020-01-02T21:17:29.298245766+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.202:35906: i/o
Jan 02 21:17:30 master k3s[550]: time="2020-01-02T21:17:30.580422869+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.201:59938: i/o
Jan 02 21:17:30 master k3s[550]: time="2020-01-02T21:17:30.580988438+01:00" level=error msg="Remotedialer proxy error" error="read tcp 192.168.0.201:59938->192.168.0.201:6443: i/o t
Jan 02 21:17:31 master k3s[550]: time="2020-01-02T21:17:31.605583377+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.204:52256: i/o
Jan 02 21:17:34 master k3s[550]: I0102 21:17:34.106828     550 event.go:274] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker2", UID:"b2288a80-db5b-43be-9c00-4330be9
Jan 02 21:17:35 master k3s[550]: time="2020-01-02T21:17:35.587707038+01:00" level=info msg="Connecting to proxy" url="wss://192.168.0.201:6443/v1-k3s/connect"
Jan 02 21:17:37 master k3s[550]: E0102 21:17:37.476561     550 pod_workers.go:191] Error syncing pod ed9ae66f-71eb-49b0-b0d3-05dedd447d5f ("local-path-provisioner-58fb86bdfd-8sbzq_k
Jan 02 21:17:38 master k3s[550]: time="2020-01-02T21:17:38.200829872+01:00" level=error msg="Failed to connect to proxy" error="dial tcp 192.168.0.201:6443: connect: no route to hos
Jan 02 21:17:38 master k3s[550]: time="2020-01-02T21:17:38.200932464+01:00" level=error msg="Remotedialer proxy error" error="dial tcp 192.168.0.201:6443: connect: no route to host"
Jan 02 21:17:39 master k3s[550]: E0102 21:17:39.318397     550 resource_quota_controller.go:407] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the ser
Jan 02 21:17:39 master k3s[550]: E0102 21:17:39.823064     550 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.233.67
Jan 02 21:17:42 master k3s[550]: W0102 21:17:42.211175     550 garbagecollector.go:640] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to
Jan 02 21:17:43 master k3s[550]: time="2020-01-02T21:17:43.201207406+01:00" level=info msg="Connecting to proxy" url="wss://192.168.0.201:6443/v1-k3s/connect"
Jan 02 21:17:44 master k3s[550]: time="2020-01-02T21:17:44.521597124+01:00" level=error msg="Failed to connect to proxy" error="dial tcp 192.168.0.201:6443: connect: no route to hos
Jan 02 21:17:44 master k3s[550]: time="2020-01-02T21:17:44.521784474+01:00" level=error msg="Remotedialer proxy error" error="dial tcp 192.168.0.201:6443: connect: no route to host"
Jan 02 21:17:44 master k3s[550]: E0102 21:17:44.827117     550 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.233.67
Jan 02 21:17:45 master k3s[550]: E0102 21:17:45.545554     550 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
Jan 02 21:17:47 master k3s[550]: E0102 21:17:47.649744     550 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
Jan 02 21:17:47 master k3s[550]: time="2020-01-02T21:17:47.843063318+01:00" level=info msg="Handling backend connection request [worker2]"
Jan 02 21:17:49 master k3s[550]: I0102 21:17:49.252599     550 event.go:274] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"nginx-nginx-ingress-controller-595c6
Jan 02 21:17:49 master k3s[550]: I0102 21:17:49.252704     550 event.go:274] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"helm-install-traefik-7k7qv", UID:"",
Jan 02 21:17:49 master k3s[550]: I0102 21:17:49.252742     550 event.go:274] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kubernetes-dashboard", Name:"kubernetes-dashboard-599655
Jan 02 21:17:49 master k3s[550]: I0102 21:17:49.252776     550 event.go:274] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"helm-install-traefik-vgn7b", UID:"",
Jan 02 21:17:49 master k3s[550]: time="2020-01-02T21:17:49.522014337+01:00" level=info msg="Connecting to proxy" url="wss://192.168.0.201:6443/v1-k3s/connect"
Jan 02 21:17:49 master k3s[550]: time="2020-01-02T21:17:49.557293978+01:00" level=info msg="Handling backend connection request [master]"
Jan 02 21:17:49 master k3s[550]: time="2020-01-02T21:17:49.767827024+01:00" level=info msg="Handling backend connection request [worker1]"
Jan 02 21:17:52 master k3s[550]: time="2020-01-02T21:17:52.492071111+01:00" level=info msg="Handling backend connection request [worker3]"
Jan 02 21:17:56 master k3s[550]: E0102 21:17:56.271504     550 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
Jan 02 21:17:58 master k3s[550]: E0102 21:17:58.666991     550 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
Jan 02 21:18:02 master k3s[550]: time="2020-01-02T21:18:02.494515040+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.204:52364: i/o
Jan 02 21:18:04 master k3s[550]: time="2020-01-02T21:18:04.771525434+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.202:36488: i/o
Jan 02 21:18:07 master k3s[550]: I0102 21:18:07.390554     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:18:07 master k3s[550]: time="2020-01-02T21:18:07.844839055+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.203:51336: i/o
Jan 02 21:18:39 master k3s[550]: I0102 21:18:39.293129     550 event.go:274] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker2", UID:"b2288a80-db5b-43be-9c00-4330be9
Jan 02 21:18:39 master k3s[550]: I0102 21:18:39.442367     550 event.go:274] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker1", UID:"f4226dde-8c79-473b-88c3-9d65ffa
Jan 02 21:18:39 master k3s[550]: I0102 21:18:39.782884     550 event.go:274] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker3", UID:"23d267c0-0917-4852-82da-830b9e9
Jan 02 21:18:39 master k3s[550]: I0102 21:18:39.927947     550 node_lifecycle_controller.go:1058] Controller detected that all Nodes are not-Ready. Entering master disruption mode.
Jan 02 21:18:40 master k3s[550]: E0102 21:18:40.072909     550 daemon_controller.go:302] metallb-system/speaker failed with : error storing status for daemon set &v1.DaemonSet{TypeM
Jan 02 21:18:40 master k3s[550]: E0102 21:18:40.156328     550 daemon_controller.go:302] metallb-system/speaker failed with : error storing status for daemon set &v1.DaemonSet{TypeM
Jan 02 21:19:07 master k3s[550]: I0102 21:19:07.418583     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:19:42 master k3s[550]: E0102 21:19:42.370539     550 machine.go:288] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or d
Jan 02 21:20:07 master k3s[550]: I0102 21:20:07.434826     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:20:35 master k3s[550]: time="2020-01-02T21:20:35.907270451+01:00" level=info msg="Updating TLS secret for k3s-serving (count: 8): map[listener.cattle.io/cn-10.43.0.1:10.43
Jan 02 21:20:35 master k3s[550]: E0102 21:20:35.950947     550 controller.go:117] error syncing 'kube-system/k3s-serving': handler tls-storage: Secret "k3s-serving" is invalid: meta
Jan 02 21:21:07 master k3s[550]: I0102 21:21:07.463278     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:22:07 master k3s[550]: I0102 21:22:07.512476     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:23:07 master k3s[550]: I0102 21:23:07.549374     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:24:07 master k3s[550]: I0102 21:24:07.564454     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:24:42 master k3s[550]: E0102 21:24:42.365024     550 machine.go:288] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or d
Jan 02 21:25:07 master k3s[550]: I0102 21:25:07.579445     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:26:07 master k3s[550]: I0102 21:26:07.594475     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:27:07 master k3s[550]: I0102 21:27:07.623520     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:27:30 master k3s[550]: I0102 21:27:30.102377     550 node_lifecycle_controller.go:1085] Controller detected that some Nodes are Ready. Exiting master disruption mode.
Jan 02 21:27:30 master k3s[550]: time="2020-01-02T21:27:30.185484725+01:00" level=info msg="Handling backend connection request [worker2]"
Jan 02 21:28:07 master k3s[550]: I0102 21:28:07.664740     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:29:07 master k3s[550]: I0102 21:29:07.699334     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:29:42 master k3s[550]: E0102 21:29:42.365156     550 machine.go:288] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or d
Jan 02 21:30:07 master k3s[550]: I0102 21:30:07.713346     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io

The text was updated successfully, but these errors were encountered:

kamilgregorczyk · 2020-01-02T20:37:25Z

I restarted everything and booted only master to get the same result

kamilgregorczyk · 2020-01-02T22:10:59Z

I waited 30 minutes but nothing happen, I managed to drain nodes manually with this script:

#!/bin/bash

KUBECTL="/usr/local/bin/kubectl"

NOT_READY_NODES=$($KUBECTL get nodes | grep  'NotReady' | awk '{print $1}')

while IFS= read -r line; do
	if [[ ! $line =~ [^[:space:]] ]] ; then
  		continue
	fi
	echo "Found $line node to be dead, draining..."
	$KUBECTL drain --ignore-daemonsets --force $line
done <<< "$NOT_READY_NODES"

READY_NODES=$(kubectl get nodes | grep '\sReady,SchedulingDisabled' | awk '{print $1}')

while IFS= read -r line; do
	if [[ ! $line =~ [^[:space:]] ]] ; then
  		continue
	fi
	echo "Found $line node to be online again, undraining..."
	$KUBECTL uncordon $line
done <<< "$READY_NODES"

although this script should never be needed, the whole point of Kubernetes is to have ability to self healing

kamilgregorczyk · 2020-01-05T10:11:06Z

I found that when you delete a NotReady node it will actually reassign pods but worker gets added to the cluster only after k3s-agent service is rebooted

serverbaboon · 2020-01-06T16:30:11Z

I powered off a worker node (worker2) on my 3 node Raspberry PI4 cluster running Rook/Ceph some 3.5 hours ago and my cluster still has not really recovered. If we overlook the the Wordpress failure due to the fact that the the new instance cannot bind to the pvc because still thinks there is a claim from the terminating instance on the powered off node the k3s provisioned traefik lb instance is still listed as terminating and hanging there.

The things that have recovered are the ones (mostly rook) that do not have a pvc so even though the instances on the failed node are still listed as terminating it does not stop the new instances coming up.

Am I missing something here regarding Kunernetes node failure.

NAMESPACE                 NAME                                                    READY   STATUS              RESTARTS   AGE     IP               NODE      NOMINATED NODE   READINESS GATES
kube-system               pod/helm-install-traefik-2zd8t                          0/1     Completed           0          11d     10.42.0.3        master    <none>           <none>
kubernetes-dashboard      pod/kubernetes-dashboard-544f4d6b8c-4bmbm               1/1     Running             1          2d      10.42.1.127      worker1   <none>           <none>
kubernetes-dashboard      pod/dashboard-metrics-scraper-744c77948-n2z5w           1/1     Running             1          2d      10.42.1.126      worker1   <none>           <none>
kube-system               pod/svclb-traefik-zq5sw                                 3/3     Running             30         11d     10.42.1.128      worker1   <none>           <none>
cert-manager              pod/cert-manager-5c47f46f57-ww4ql                       1/1     Running             1          45h     10.42.0.114      master    <none>           <none>
kube-system               pod/metrics-server-6d684c7b5-pgmtf                      1/1     Running             9          11d     10.42.0.117      master    <none>           <none>
kube-system               pod/local-path-provisioner-58fb86bdfd-xxkr9             1/1     Running             9          11d     10.42.0.118      master    <none>           <none>
kube-system               pod/svclb-traefik-q6tx6                                 3/3     Running             27         11d     10.42.0.115      master    <none>           <none>
cert-manager              pod/cert-manager-webhook-547567b88f-4nhx9               1/1     Running             1          45h     10.42.0.112      master    <none>           <none>
kube-system               pod/coredns-d798c9dd-b5h2l                              1/1     Running             9          11d     10.42.0.119      master    <none>           <none>
kube-system               pod/traefik-65bccdc4bd-2qglj                            1/1     Running             9          11d     10.42.0.116      master    <none>           <none>
rook-ceph                 pod/rook-discover-dthqw                                 1/1     Running             0          17h     10.42.0.120      master    <none>           <none>
rook-ceph                 pod/rook-discover-jb5gm                                 1/1     Running             0          17h     10.42.1.129      worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-agent-fhct7                               1/1     Running             0          17h     192.168.10.107   worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-agent-wkl5s                               1/1     Running             0          17h     192.168.10.102   master    <none>           <none>
rook-ceph                 pod/rook-ceph-mon-a-7987b7749c-dqhv9                    1/1     Running             0          17h     10.42.1.132      worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-mon-c-59d7b8fb4d-7sqjj                    1/1     Running             0          17h     10.42.0.122      master    <none>           <none>
rook-ceph                 pod/rook-ceph-crashcollector-worker1-6bbbbf6696-zxzqc   1/1     Running             0          17h     10.42.1.133      worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-crashcollector-master-8cf749cdc-zw6ph     1/1     Running             0          17h     10.42.0.123      master    <none>           <none>
rook-ceph                 pod/rook-ceph-osd-1-dbb578859-6rv64                     1/1     Running             0          17h     10.42.1.135      worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-osd-2-6c7d9966cd-56ggs                    1/1     Running             0          17h     10.42.0.125      master    <none>           <none>
rook-ceph                 pod/rook-ceph-tools-57d8bd875b-nzmdh                    1/1     Running             0          17h     192.168.10.107   worker1   <none>           <none>
default                   pod/adminer-69bcfb4764-bngsb                            1/1     Running             0          15h     10.42.0.129      master    <none>           <none>
rook-cockroachdb-system   pod/rook-cockroachdb-operator-784f89dcc5-hgzq7          1/1     Running             0          5h59m   10.42.0.130      master    <none>           <none>
default                   pod/mariadb-0                                           1/1     Running             0          4h4m    10.42.1.143      worker1   <none>           <none>
kube-system               pod/svclb-traefik-lxfjb                                 3/3     Running             21         10d     10.42.2.117      worker2   <none>           <none>
rook-ceph                 pod/rook-discover-grnvm                                 1/1     Running             0          17h     10.42.2.119      worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-agent-5nz5d                               1/1     Running             0          17h     192.168.10.95    worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-mgr-a-7f65b8f79f-kqzvw                    1/1     Terminating         2          17h     10.42.2.122      worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-mgr-a-7f65b8f79f-p7vrh                    1/1     Running             0          3h32m   10.42.1.144      worker1   <none>           <none>
default                   pod/wordpress-6c7c6fcccf-8hsvc                          1/1     Terminating         0          4h8m    10.42.2.134      worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-osd-0-6786789854-6qzd5                    1/1     Terminating         0          17h     10.42.2.125      worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-mon-b-565bc66f97-64q84                    1/1     Terminating         0          17h     10.42.2.121      worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-crashcollector-worker2-67895bf8df-f8cqr   1/1     Terminating         0          17h     10.42.2.126      worker2   <none>           <none>
cert-manager              pod/cert-manager-cainjector-6659d6844d-krnhk            1/1     Terminating         2          45h     10.42.2.116      worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-operator-6d794bf987-plntb                 1/1     Terminating         0          17h     10.42.2.118      worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-mon-b-565bc66f97-gs8h5                    0/1     Pending             0          3h27m   <none>           <none>    <none>           <none>
rook-ceph                 pod/rook-ceph-osd-0-6786789854-6v765                    0/1     Pending             0          3h27m   <none>           <none>    <none>           <none>
default                   pod/wordpress-6c7c6fcccf-8mhdd                          0/1     ContainerCreating   0          3h27m   <none>           worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-crashcollector-worker2-67895bf8df-5sksv   0/1     Pending             0          3h27m   <none>           <none>    <none>           <none>
rook-ceph                 pod/rook-ceph-operator-6d794bf987-bq6zm                 1/1     Running             0          3h27m   10.42.0.133      master    <none>           <none>
cert-manager              pod/cert-manager-cainjector-6659d6844d-7p7p5            1/1     Running             0          3h27m   10.42.1.145      worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-osd-prepare-master-bphzw                  0/1     Completed           0          3h5m    10.42.0.135      master    <none>           <none>
rook-ceph                 pod/rook-ceph-osd-prepare-worker1-mdhmt                 0/1     Completed           0          3h5m    10.42.1.146      worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-mon-d-canary-666965574c-62b2f             0/1     Pending             0          15m     <none>           <none>    <none>           <none>

NAMESPACE   NAME           STATUS     ROLES    AGE   VERSION         INTERNAL-IP      EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION      CONTAINER-RUNTIME
            node/worker2   NotReady   <none>   10d   v1.16.3-k3s.2   192.168.15.9    <none>        Ubuntu 19.10   5.3.0-1014-raspi2   containerd://1.3.0-k3s.5
            node/master    Ready      master   11d   v1.16.3-k3s.2   192.168.15.10   <none>        Ubuntu 19.10   5.3.0-1014-raspi2   containerd://1.3.0-k3s.5
            node/worker1   Ready      <none>   11d   v1.16.3-k3s.2   192.168.15.11   <none>        Ubuntu 19.10   5.3.0-1014-raspi2   containerd://1.3.0-k3s.5

serverbaboon · 2020-01-06T16:38:07Z

So powering up the 'failed' node allowed all the 'terminating' instances to finally end, the Rook config sorted itself out and my Wordpress instance finally came back along with Certman as the pvc (on work press) finally was released,

kamilgregorczyk · 2020-01-06T17:14:11Z

I learned that there's a difference between having a node in NotReady state and deleting the node. When your node goes into NotReady state then Kubernetes will not reschedule running pods to other ones as Kubernetes cannot distinguish between node restart, network error or kubelet error. Kubernetes will reschedule pods only when it's sure that they are not running and just because node is in NotReady state does not mean that pods are not running, they might be running but just the fact that Kubernetes cannot communicate with kubelet does not mean that they are not running :/ It's really a bummer for me as

There should be a deadline like if a node is NotReady for 5 minutes then it should drain it with force, no matter if something might be running or not
Pods that are potentially running on NotReady notes should be marked somehow, definitely not shown as 1/1 Running via kubectl

Although that's just my point of view, it's really weird that k3s on it's own does not seem to support --pod-eviction-timeout flag which is 5 minutes by default

The script that I published cordons the faulty nodes, drains them and then eventually deletes them, it will uncordon the node once it's in Ready state. K3s seems to be rejoining the master only when it restarts though

erikwilson · 2020-01-06T17:37:15Z

Please see https://kubernetes.io/docs/concepts/architecture/nodes/, from that link:

In versions of Kubernetes prior to 1.5, the node controller would force delete these unreachable pods from the apiserver. However, in 1.5 and higher, the node controller does not force delete pods until it is confirmed that they have stopped running in the cluster. You can see the pods that might be running on an unreachable node as being in the Terminating or Unknown state. In cases where Kubernetes cannot deduce from the underlying infrastructure if a node has permanently left a cluster, the cluster administrator may need to delete the node object by hand. Deleting the node object from Kubernetes causes all the Pod objects running on the node to be deleted from the apiserver, and frees up their names.

So pods stuck in a Terminating state but with a duplicate running on another node look to be expected. The --pod-eviction-timeout flag should be able to be set like:
k3s server --kube-controller-manager-arg pod-eviction-timeout=1m.

The key in the original issue is, "Controller detected that all Nodes are not-Ready. Entering master disruption mode.", looks to be related to kubernetes/kubernetes#42733. If all of the nodes become Not Ready the controller manager may refuse to evict.

kamilgregorczyk · 2020-01-06T17:47:28Z

@erikwilson in my case none of the pods was in Terminating/Unknown state (it was the same when only one node was NotReady) and that issue was fixed? 🤔 Will set that --kube-controller-manager-arg pod-eviction-timeout=1m flag and see what happens

erikwilson · 2020-01-06T17:49:21Z

It looks like the expected behavior, also see from that docs link:

The corner case is when all zones are completely unhealthy (i.e. there are no healthy nodes in the cluster). In such case, the node controller assumes that there’s some problem with master connectivity and stops all evictions until some connectivity is restored.

serverbaboon · 2020-01-06T18:04:58Z

@ericwilson

Ok thanks, that would tie in with last time I tried this as it was on an earlier version of Kubernetes and I was not aware of that change, also I think I have done this on a Rancher managed cluster with some node management options set so never had an issue.

izeau · 2020-04-29T10:04:31Z

Hi. I’m experiencing the same issue and mitigated it with the following script in my launch template user data:

kubectl get nodes |
  awk -v "host=$(hostname)" '$1 != host && $2 == "NotReady" { print $1 }' |
  xargs --no-run-if-empty kubectl delete node

So when one node goes down, the autoscaling group creates a new instance that will run the above script when booting.

I advise you to triple check that hostname returns the correct hostname for your nodes, otherwise you risk deleting the current node...

The node draining was not working and getting stuck forever since the target node was dead. So much for HA!

onedr0p · 2021-08-23T14:23:30Z

I am seeing this issue with using kube-vip in a daemonset, more information about my issue is here.

k3s version: v1.21.4+k3s1
Ubuntu version: 21.04

My masters config:

cluster-init: true
cluster-cidr: 10.69.0.0/16
disable:
- flannel
- traefik
- servicelb
- metrics-server
- local-storage
disable-cloud-controller: true
disable-network-policy: true
docker: false
flannel-backend: none
kubelet-arg:
- "feature-gates=GracefulNodeShutdown=true"
- "feature-gates=MixedProtocolLBService=true"
node-ip: 192.168.42.10
service-cidr: 10.96.0.0/16
tls-san:
- 192.168.69.5
write-kubeconfig-mode: '644'
kube-controller-manager-arg:
- "address=0.0.0.0"
- "bind-address=0.0.0.0"
kube-proxy-arg:
- "metrics-bind-address=0.0.0.0"
kube-scheduler-arg:
- "address=0.0.0.0"
- "bind-address=0.0.0.0"
etcd-expose-metrics: true

My worker nodes:

kubelet-arg:
- "feature-gates=GracefulNodeShutdown=true"
- "feature-gates=MixedProtocolLBService=true"
node-ip: 192.168.42.13

I can see the taints were added to my k8s-0 node but the pods are not being evicted:

ubuntu@k8s-1:~$ sudo k3s kubectl get ds/kube-vip -n kube-system -o yaml
...
  taints:
  - effect: NoSchedule
    key: node.kubernetes.io/unreachable
    timeAdded: "2021-08-23T13:48:30Z"
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    timeAdded: "2021-08-23T13:48:36Z"
...

ubuntu@k8s-1:~$ sudo k3s kubectl get nodes
NAME    STATUS     ROLES                       AGE   VERSION
k8s-0   NotReady   control-plane,etcd,master   65d   v1.21.4+k3s1
k8s-1   Ready      control-plane,etcd,master   65d   v1.21.4+k3s1
k8s-2   Ready      control-plane,etcd,master   65d   v1.21.4+k3s1
k8s-3   Ready      worker                      65d   v1.21.4+k3s1
k8s-4   Ready      worker                      65d   v1.21.4+k3s1
k8s-5   Ready      worker                      65d   v1.21.4+k3s1

ubuntu@k8s-1:~$ sudo k3s kubectl get po -n kube-system -l "app.kubernetes.io/instance=kube-vip" -o wide
kube-vip-jk96t                                  1/1     Running     4          30d     192.168.42.12   k8s-2   <none>           <none>
kube-vip-kdg8x                                  1/1     Running     4          30d     192.168.42.11   k8s-1   <none>           <none>
kube-vip-r9vhx                                  1/1     Running     5          30d     192.168.42.10   k8s-0   <none>           <none>

brandond · 2021-08-26T03:34:40Z

Whats managing those pods? Daemonset/deployment/etc? Whatever's going on here is core Kubernetes behavior; I suspect it's just not doing what you expected.

piellick · 2021-10-04T07:08:30Z

Hi,
a workaround to use pod-eviction-timeout on K3s 1.21.4 ?

Cryingmouse · 2021-12-10T04:23:59Z

@jawabuu Is there any document I can refer to about the arguments mentioned in your notes?

Hey @brandond @erikwilson I'm able to reproduce this consistently in v1.20.4+k3s1 Start k3s with the following flags (any number of nodes)
 "--kubelet-arg 'node-status-update-frequency=4s'",
    "--kube-controller-manager-arg 'node-monitor-period=2s'",
    "--kube-controller-manager-arg 'node-monitor-grace-period=16s'",
    "--kube-apiserver-arg 'default-not-ready-toleration-seconds=20'",
    "--kube-apiserver-arg 'default-unreachable-toleration-seconds=20'"
Power off a node, it is marked as NotReady as expected Wait for pods on that node to be rescheduled. This does not happen. Pods stay in Running state indefinitely.

Tested v1.21.1+k3s1 and it works as expected. For anyone coming across this please not that pod-eviction-timeout is not used post 1.13

bufo333 · 2022-01-02T11:48:59Z

@jawabuu Is there any document I can refer to about the arguments mentioned in your notes?
Hey @brandond @erikwilson I'm able to reproduce this consistently in v1.20.4+k3s1 Start k3s with the following flags (any number of nodes)
 "--kubelet-arg 'node-status-update-frequency=4s'",
    "--kube-controller-manager-arg 'node-monitor-period=2s'",
    "--kube-controller-manager-arg 'node-monitor-grace-period=16s'",
    "--kube-apiserver-arg 'default-not-ready-toleration-seconds=20'",
    "--kube-apiserver-arg 'default-unreachable-toleration-seconds=20'"
Power off a node, it is marked as NotReady as expected Wait for pods on that node to be rescheduled. This does not happen. Pods stay in Running state indefinitely.
Tested v1.21.1+k3s1 and it works as expected. For anyone coming across this please not that pod-eviction-timeout is not used post 1.13

Any updates? I am experiencing the same behavior.

brandond · 2022-01-04T00:27:46Z

This would be the responsibility of the Kubernetes controller-manager. Can you show the output of kubectl get node,lease -n kube-system -o wide ?

ccwalterhk · 2022-02-11T14:33:41Z

Hi, may I check any solution for this problem? I am using v1.21.4. I also see the problem.

NAME         STATUS   ROLES                  AGE     VERSION
k3-slave3    Ready    <none>                 118d    v1.21.5+k3s2
k3s-slave2   Ready    <none>                 139d    v1.21.4+k3s1
k3s-slave4   Ready    <none>                 6d23h   v1.22.6+k3s1
k3-master    Ready    control-plane,master   139d    v1.21.4+k3s1
k3s-slave1   Ready    <none>                 139d    v1.21.4+k3s1

brandond · 2022-02-11T17:59:56Z

@cwalterhk you appear to have an agent that is running a newer version of Kubernetes than the server. This is not supported; please upgrade your servers if you are going to have agents running 1.22

ccwalterhk · 2022-02-12T06:31:59Z

I just created a new cluster using the latest version. However, I still see the same problem. Even s1 is not available, pods does not restart to other nodes.

walter@k3s-m1-mark3:~$ k get node -o wide
NAME           STATUS     ROLES                  AGE    VERSION        INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
k3s-s1-mark3   NotReady   <none>                 169m   v1.22.6+k3s1   192.168.1.91   <none>        Ubuntu 20.04.3 LTS   5.4.0-99-generic   containerd://1.5.9-k3s1
k3s-m1-mark3   Ready      control-plane,master   171m   v1.22.6+k3s1   192.168.1.90   <none>        Ubuntu 20.04.3 LTS   5.4.0-99-generic   containerd://1.5.9-k3s1
k3s-s2-mark3   Ready      <none>                 169m   v1.22.6+k3s1   192.168.1.92   <none>        Ubuntu 20.04.3 LTS   5.4.0-99-generic   containerd://1.5.9-k3s1
walter@k3s-m1-mark3:~$ k get pod -o wide
NAME                           READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
hello-world-7884c6997d-h9nwx   1/1     Running   0          6m42s   10.42.0.10   k3s-m1-mark3   <none>           <none>
hello-world-7884c6997d-vxlsl   1/1     Running   0          6m42s   10.42.0.9    k3s-m1-mark3   <none>           <none>
hello-world-7884c6997d-gbx8x   1/1     Running   0          6m42s   10.42.0.11   k3s-m1-mark3   <none>           <none>
hello-world-7884c6997d-cdfp8   1/1     Running   0          6m42s   10.42.2.6    k3s-s2-mark3   <none>           <none>
hello-world-7884c6997d-2ksws   1/1     Running   0          6m42s   10.42.2.4    k3s-s2-mark3   <none>           <none>
hello-world-7884c6997d-2gflm   1/1     Running   0          6m42s   10.42.2.5    k3s-s2-mark3   <none>           <none>
hello-world-7884c6997d-hsnct   1/1     Running   0          6m42s   10.42.1.6    k3s-s1-mark3   <none>           <none>
hello-world-7884c6997d-5xhf7   1/1     Running   0          6m42s   10.42.1.4    k3s-s1-mark3   <none>           <none>
hello-world-7884c6997d-gzbvq   1/1     Running   0          6m42s   10.42.1.5    k3s-s1-mark3   <none>           <none>
hello-world-7884c6997d-gh5qc   1/1     Running   0          6m42s   10.42.1.3    k3s-s1-mark3   <none>           <none>
walter@k3s-m1-mark3:~$

ccwalterhk · 2022-02-12T06:36:02Z

After waiting for about 8 minutes, it is terminating. Thank you very much. Can I check how to detect failure faster and restart the pods in another nodes?

walter@k3s-m1-mark3:~$ k get pod -o wide
NAME                           READY   STATUS        RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
hello-world-7884c6997d-h9nwx   1/1     Running       0          10m     10.42.0.10   k3s-m1-mark3   <none>           <none>
hello-world-7884c6997d-vxlsl   1/1     Running       0          10m     10.42.0.9    k3s-m1-mark3   <none>           <none>
hello-world-7884c6997d-gbx8x   1/1     Running       0          10m     10.42.0.11   k3s-m1-mark3   <none>           <none>
hello-world-7884c6997d-cdfp8   1/1     Running       0          10m     10.42.2.6    k3s-s2-mark3   <none>           <none>
hello-world-7884c6997d-2ksws   1/1     Running       0          10m     10.42.2.4    k3s-s2-mark3   <none>           <none>
hello-world-7884c6997d-2gflm   1/1     Running       0          10m     10.42.2.5    k3s-s2-mark3   <none>           <none>
hello-world-7884c6997d-gzbvq   1/1     Terminating   0          10m     10.42.1.5    k3s-s1-mark3   <none>           <none>
hello-world-7884c6997d-5xhf7   1/1     Terminating   0          10m     10.42.1.4    k3s-s1-mark3   <none>           <none>
hello-world-7884c6997d-gh5qc   1/1     Terminating   0          10m     10.42.1.3    k3s-s1-mark3   <none>           <none>
hello-world-7884c6997d-hsnct   1/1     Terminating   0          10m     10.42.1.6    k3s-s1-mark3   <none>           <none>
hello-world-7884c6997d-wqzfq   1/1     Running       0          2m47s   10.42.0.12   k3s-m1-mark3   <none>           <none>
hello-world-7884c6997d-6bsx4   1/1     Running       0          2m47s   10.42.0.13   k3s-m1-mark3   <none>           <none>
hello-world-7884c6997d-njgdd   1/1     Running       0          2m47s   10.42.2.8    k3s-s2-mark3   <none>           <none>
hello-world-7884c6997d-9w8vh   1/1     Running       0          2m47s   10.42.2.7    k3s-s2-mark3   <none>           <none>
walter@k3s-m1-mark3:~$

dfoxg · 2022-02-12T17:08:38Z

With these options I was able to reduce your mentioned 8 minutes to ~20 seconds:

--kubelet-arg "node-status-update-frequency=4s" \
--kube-controller-manager-arg "node-monitor-period=4s" \
--kube-controller-manager-arg "node-monitor-grace-period=16s" \
--kube-controller-manager-arg "pod-eviction-timeout=20s" \
--kube-apiserver-arg "default-not-ready-toleration-seconds=20" \
--kube-apiserver-arg "default-unreachable-toleration-seconds=20" \

janvanveldhuizen · 2022-03-30T08:54:58Z

With these options I was able to reduce your mentioned 8 minutes to ~20 seconds:

And where did you put these parameters? On the master node(s)? Or on the workers as well?

helletheone · 2022-07-13T17:05:00Z

same problem here:

k3s-agent-large-ilg Ready 17m v1.23.8+k3s2
k3s-agent-large-kmf Ready 6d8h v1.23.8+k3s2
k3s-agent-small-uui Ready 32m v1.23.8+k3s2
k3s-control-plane-fsn1-dke Ready control-plane,etcd,master 6d8h v1.23.8+k3s2

Jeffote · 2022-08-24T04:54:29Z

With these options I was able to reduce your mentioned 8 minutes to ~20 seconds:

--kubelet-arg "node-status-update-frequency=4s" \
--kube-controller-manager-arg "node-monitor-period=4s" \
--kube-controller-manager-arg "node-monitor-grace-period=16s" \
--kube-controller-manager-arg "pod-eviction-timeout=20s" \
--kube-apiserver-arg "default-not-ready-toleration-seconds=20" \
--kube-apiserver-arg "default-unreachable-toleration-seconds=20" \

I had the same problem.
After I added those to the systemctl service, those setting are applied to every new pod. So i had to terminate the old ones by hand, and i worked like a charm on the new one.
My k3s version is v1.24.3+k3s1

timowevel1 · 2022-09-10T17:48:28Z

--kubelet-arg

Hey, where exactly did you pass these arguments?

Jeffote · 2022-09-12T04:30:58Z

to the ExecStart in the systemd service:
ExecStart=/usr/local/bin/k3s server --https-listen-port '7443' '--kubelet-arg' "node-status-update-frequency=4s" etc

caroline-suse-rancher · 2022-11-30T17:27:25Z

Closing as this appears to be expected upstream behavior with a valid workaround

framctr · 2023-09-21T06:54:12Z

With these options I was able to reduce your mentioned 8 minutes to ~20 seconds:

--kubelet-arg "node-status-update-frequency=4s" \
--kube-controller-manager-arg "node-monitor-period=4s" \
--kube-controller-manager-arg "node-monitor-grace-period=16s" \
--kube-controller-manager-arg "pod-eviction-timeout=20s" \
--kube-apiserver-arg "default-not-ready-toleration-seconds=20" \
--kube-apiserver-arg "default-unreachable-toleration-seconds=20" \

Unfortunately, many of these parameters are removed from Kubernetes v1.27. See for example the node-status-update-frequency argument on the official Kubernetes docs.

brandond · 2023-09-21T07:29:38Z

They have not been removed. They've been listed as depreciated for ages but I am not aware of any actual work to remove them and force use of a config file.

anshuman852 · 2023-09-26T06:50:12Z

I have a deployment with a pvc attached in mode ReadWriteOnce, So to test this, i turned off the k3s service on one of the nodes, after waiting some time, the pods did go to terminating state, but now the deployment with pvc wont start because of the volume is still attached to the older pod

is it possible to delete or evict the pods instead of them being stuck in terminating stage?

camaeel · 2023-09-26T07:20:48Z

@anshuman852 I think this "terminating" state means it tries to perform eviction or delete. But it is not able - either because kubelet is not responding or because there is finalizer on the pod. You can try checking pod manifests and logs of kube-controller-manager what is happening and what is the issue.

davidnuzik added this to the Backlog milestone Jan 14, 2020

davidnuzik added the [zube]: To Triage label Jan 14, 2020

billimek mentioned this issue Jan 17, 2020

k3s not recovering when a node is lost billimek/k8s-gitops#111

Closed

davidnuzik added the [zube]: Unscheduled label May 1, 2020

zube bot added [zube]: To Triage and removed [zube]: To Triage labels May 1, 2020

davidnuzik added the [zube]: Unscheduled label May 1, 2020

zube bot added [zube]: To Triage and removed [zube]: To Triage labels May 1, 2020

onedr0p mentioned this issue Aug 23, 2021

Disabling the primary k3s master node results in kube api becoming unreachable. kube-vip/kube-vip#260

Closed

zifeo mentioned this issue Oct 14, 2021

Unregister node from RKE2 after agent deletion remche/terraform-openstack-rke2#12

Closed

dereknola removed the [zube]: Unscheduled label Jun 23, 2022

brandond mentioned this issue Sep 26, 2022

Random K3s node gets marked as "NotReady" at random intervals for a random amount of time. #6178

Closed

brandond mentioned this issue Oct 10, 2022

Node crash and all pod on it still running ? #6239

Closed

caroline-suse-rancher modified the milestones: Not Scheduled, Backlog Nov 21, 2022

caroline-suse-rancher closed this as completed Nov 30, 2022

brandond mentioned this issue Mar 8, 2024

Some pods are falsely evicted from the stopped node #9703

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

K3S Claims that pods are running but hosts (nodes) are dead #1264

K3S Claims that pods are running but hosts (nodes) are dead #1264

kamilgregorczyk commented Jan 2, 2020

kamilgregorczyk commented Jan 2, 2020

kamilgregorczyk commented Jan 2, 2020

kamilgregorczyk commented Jan 5, 2020

serverbaboon commented Jan 6, 2020

serverbaboon commented Jan 6, 2020

kamilgregorczyk commented Jan 6, 2020 •

edited

Loading

erikwilson commented Jan 6, 2020

kamilgregorczyk commented Jan 6, 2020 •

edited

Loading

erikwilson commented Jan 6, 2020

serverbaboon commented Jan 6, 2020 •

edited

Loading

izeau commented Apr 29, 2020 •

edited

Loading

onedr0p commented Aug 23, 2021 •

edited

Loading

brandond commented Aug 26, 2021 •

edited

Loading

piellick commented Oct 4, 2021

Cryingmouse commented Dec 10, 2021 •

edited

Loading

bufo333 commented Jan 2, 2022

brandond commented Jan 4, 2022 •

edited

Loading

ccwalterhk commented Feb 11, 2022

brandond commented Feb 11, 2022

ccwalterhk commented Feb 12, 2022

ccwalterhk commented Feb 12, 2022 •

edited

Loading

dfoxg commented Feb 12, 2022

janvanveldhuizen commented Mar 30, 2022

helletheone commented Jul 13, 2022 •

edited

Loading

Jeffote commented Aug 24, 2022 •

edited

Loading

timowevel1 commented Sep 10, 2022

Jeffote commented Sep 12, 2022 •

edited

Loading

caroline-suse-rancher commented Nov 30, 2022

framctr commented Sep 21, 2023 •

edited

Loading

brandond commented Sep 21, 2023

anshuman852 commented Sep 26, 2023

camaeel commented Sep 26, 2023

K3S Claims that pods are running but hosts (nodes) are dead #1264

K3S Claims that pods are running but hosts (nodes) are dead #1264

Comments

kamilgregorczyk commented Jan 2, 2020

kamilgregorczyk commented Jan 2, 2020

kamilgregorczyk commented Jan 2, 2020

kamilgregorczyk commented Jan 5, 2020

serverbaboon commented Jan 6, 2020

serverbaboon commented Jan 6, 2020

kamilgregorczyk commented Jan 6, 2020 • edited Loading

erikwilson commented Jan 6, 2020

kamilgregorczyk commented Jan 6, 2020 • edited Loading

erikwilson commented Jan 6, 2020

serverbaboon commented Jan 6, 2020 • edited Loading

izeau commented Apr 29, 2020 • edited Loading

onedr0p commented Aug 23, 2021 • edited Loading

brandond commented Aug 26, 2021 • edited Loading

piellick commented Oct 4, 2021

Cryingmouse commented Dec 10, 2021 • edited Loading

bufo333 commented Jan 2, 2022

brandond commented Jan 4, 2022 • edited Loading

ccwalterhk commented Feb 11, 2022

brandond commented Feb 11, 2022

ccwalterhk commented Feb 12, 2022

ccwalterhk commented Feb 12, 2022 • edited Loading

dfoxg commented Feb 12, 2022

janvanveldhuizen commented Mar 30, 2022

helletheone commented Jul 13, 2022 • edited Loading

Jeffote commented Aug 24, 2022 • edited Loading

timowevel1 commented Sep 10, 2022

Jeffote commented Sep 12, 2022 • edited Loading

caroline-suse-rancher commented Nov 30, 2022

framctr commented Sep 21, 2023 • edited Loading

brandond commented Sep 21, 2023

anshuman852 commented Sep 26, 2023

camaeel commented Sep 26, 2023

kamilgregorczyk commented Jan 6, 2020 •

edited

Loading

kamilgregorczyk commented Jan 6, 2020 •

edited

Loading

serverbaboon commented Jan 6, 2020 •

edited

Loading

izeau commented Apr 29, 2020 •

edited

Loading

onedr0p commented Aug 23, 2021 •

edited

Loading

brandond commented Aug 26, 2021 •

edited

Loading

Cryingmouse commented Dec 10, 2021 •

edited

Loading

brandond commented Jan 4, 2022 •

edited

Loading

ccwalterhk commented Feb 12, 2022 •

edited

Loading

helletheone commented Jul 13, 2022 •

edited

Loading

Jeffote commented Aug 24, 2022 •

edited

Loading

Jeffote commented Sep 12, 2022 •

edited

Loading

framctr commented Sep 21, 2023 •

edited

Loading