
Pods are not moved when Node in NotReady state #55713

Closed
marczahn opened this issue Nov 14, 2017 · 91 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@marczahn

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

What happened:

To simulate a crashed worker node I stopped the kubelet service on that node (Debian Jessie).
The node went into Unknown state (i.e. NotReady) as expected:

NAME                STATUS     ROLES         AGE       VERSION
lls-lon-db-master   Ready      master,node   7d        v1.8.0+coreos.0
lls-lon-testing01   NotReady   node          6d        v1.8.0+coreos.0

The pods running on lls-lon-testing01 stay declared as Running:

test-core-services   infrastructure-service-deployment-5cb868f49-94gh4                 1/1       Running   0          4h        10.233.96.204   lls-lon-testing01

But the pod is declared as Ready: False on describe:

Name:           infrastructure-service-deployment-5cb868f49-94gh4
Namespace:      test-core-services
Node:           lls-lon-testing01/10.100.0.5
Start Time:     Tue, 14 Nov 2017 10:31:41 +0000
Labels:         app=infrastructure-service
                pod-template-hash=176424905
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"test-core-services","name":"infrastructure-service-deployment-5cb868f49","uid":"d...
Status:         Running
IP:             10.233.96.204
Created By:     ReplicaSet/infrastructure-service-deployment-5cb868f49
Controlled By:  ReplicaSet/infrastructure-service-deployment-5cb868f49
Containers:
  infrastructure-service:
    Container ID:  docker://3b750d7cad0c24386cade1e4fedac24ab2621f4991d3302d15c30d9e68749b7b
    Image:         index.docker.io/looplinesystems/infrastructure-service:latest
    Image ID:      docker-pullable://looplinesystems/infrastructure-service@sha256:632591a86ca67f3e19718727e717b07da3b5c79251ce9deede969588b6958272
    Ports:         7110/TCP, 7111/TCP, 7112/TCP
    Command:
      /infrastructurectl
      daemon
    State:          Running
      Started:      Tue, 14 Nov 2017 10:31:48 +0000
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      infrastructure-service-config  Secret  Optional: false
    Environment:                     <none>
    Mounts:
      /var/log/services from logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hm2hs (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  logs:
    Type:  HostPath (bare host directory volume)
    Path:  /var/log/services
  default-token-hm2hs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hm2hs
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:          <none>

What you expected to happen:
I expected the pods on the "crashed" node to be moved to the remaining node.

How to reproduce it (as minimally and precisely as possible):
In my situation: have a node (A) and a master+node (B) installed with Kubespray, run at least one pod on each node, stop the kubelet on A and wait.
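A minimal sketch of these reproduction steps (node name and grep pattern are placeholders, not from the original report):

# On node A: simulate a crashed worker by stopping the kubelet
sudo systemctl stop kubelet

# On the master (B): watch the node go NotReady and check whether the
# pods that were scheduled on A ever get rescheduled elsewhere
kubectl get nodes -w
kubectl get pods --all-namespaces -o wide | grep <node-A-name>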

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0+coreos.0", GitCommit:"a65654ef5b593ac19fbfaf33b1a1873c0320353b", GitTreeState:"clean", BuildDate:"2017-09-29T21:51:03Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0+coreos.0", GitCommit:"a65654ef5b593ac19fbfaf33b1a1873c0320353b", GitTreeState:"clean", BuildDate:"2017-09-29T21:51:03Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
Intel(R) Xeon(R) CPU E5-2670
4 Cores
4 GB RAM
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 8 (jessie)
  • Kernel (e.g. uname -a): Linux lls-lon-db-master 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux
  • Install tools: Kubespray
  • Others: -
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Nov 14, 2017
@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Nov 14, 2017
@marczahn
Author

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Nov 14, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Nov 14, 2017
@jhorwit2
Contributor

jhorwit2 commented Dec 11, 2017

@marczahn how long did you wait after turning off the kubelet? By default pods won't be moved for 5 minutes, which is configurable via the following flag on the controller manager.

--pod-eviction-timeout duration                                     The grace period for deleting pods on failed nodes. (default 5m0s)

This allows cases like a node reboot to avoid rescheduling pods unnecessarily.
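For reference, a sketch of checking and changing that flag, assuming a kubeadm-style static pod manifest (the path may differ on other installs, e.g. Kubespray):

# See whether a custom eviction timeout is configured
grep pod-eviction-timeout /etc/kubernetes/manifests/kube-controller-manager.yaml

# The flag as it would appear on the kube-controller-manager command line:
#   --pod-eviction-timeout=5m0s
# The kubelet restarts the static pod automatically after the manifest is edited.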

@marczahn
Author

I know this parameter and I was waiting way longer than the eviction timeout. Nothing happened at all.

@jamesgetx

We encountered the same problem. Our k8s version is 1.8.4, Docker version is 1.12.4.

@marczahn
Author

I wrote a script that can be run as a cron job:

#!/bin/sh

KUBECTL="/usr/local/bin/kubectl"

# NotReady nodes that have not been drained yet (no SchedulingDisabled)
NOT_READY_NODES=$($KUBECTL get nodes | grep -P 'NotReady(?!,SchedulingDisabled)' | awk '{print $1}' | xargs echo)
# Ready nodes that are still cordoned/drained (SchedulingDisabled)
READY_NODES=$($KUBECTL get nodes | grep '\sReady,SchedulingDisabled' | awk '{print $1}' | xargs echo)

echo "Unready nodes that are not drained yet: $NOT_READY_NODES"
echo "Ready nodes that are still drained: $READY_NODES"


for node in $NOT_READY_NODES; do
  echo "Node $node not drained yet, draining..."
  $KUBECTL drain --ignore-daemonsets --force $node
  echo "Done"
done;

for node in $READY_NODES; do
  echo "Node $node still drained, uncordoning..."
  $KUBECTL uncordon $node
  echo "Done"
done;

It checks whether a node is down but not drained yet, and vice versa; a cron entry for it is sketched below. Hope it helps.
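A sketch of such a cron entry, assuming the script is saved as /usr/local/bin/drain-notready.sh (path, schedule, and log file are placeholders):

# /etc/cron.d/drain-notready: run the check every minute as root
* * * * * root /usr/local/bin/drain-notready.sh >> /var/log/drain-notready.log 2>&1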

@erkules

erkules commented Mar 6, 2018

Got the same issue on 1.9.3. No eviction after 30 minutes.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

@albertvaka

+1. Is this the intended behavior? If it is, then load balancers should keep serving traffic to those pods (now they do not).

@mypine

mypine commented May 17, 2018

We encountered the same problem on 1.6.3.

@trajakovic

trajakovic commented May 30, 2018

Got the same problem as @erkules:

kubectl version                                                                                                                                       
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-05-12T04:12:12Z", GoVersion:"go1.9.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

I tried to drain the master node (in a setup of 3 nodes), but the command keeps hanging.

If it helps, the cluster is installed on AWS using kops 1.9.x, with 3 masters in separate AZs (m4.large instances).

kubectl logs -n kube-system -f kube-controller-manager-ip-<redacted>


I0529 21:59:35.448967       1 flags.go:52] FLAG: --address="0.0.0.0"
I0529 21:59:35.449022       1 flags.go:52] FLAG: --allocate-node-cidrs="true"
I0529 21:59:35.449096       1 flags.go:52] FLAG: --allow-untagged-cloud="false"
I0529 21:59:35.449106       1 flags.go:52] FLAG: --allow-verification-with-non-compliant-keys="false"
I0529 21:59:35.449116       1 flags.go:52] FLAG: --alsologtostderr="false"
I0529 21:59:35.449122       1 flags.go:52] FLAG: --attach-detach-reconcile-sync-period="1m0s"
I0529 21:59:35.449157       1 flags.go:52] FLAG: --cidr-allocator-type="RangeAllocator"
I0529 21:59:35.449191       1 flags.go:52] FLAG: --cloud-config=""
I0529 21:59:35.449216       1 flags.go:52] FLAG: --cloud-provider="aws"
I0529 21:59:35.449229       1 flags.go:52] FLAG: --cloud-provider-gce-lb-src-cidrs="130.211.0.0/22,35.191.0.0/16,209.85.152.0/22,209.85.204.0/22"
I0529 21:59:35.449276       1 flags.go:52] FLAG: --cluster-cidr="100.96.0.0/11"
I0529 21:59:35.449283       1 flags.go:52] FLAG: --cluster-name="staging.kubernetes.sa.dev.superbet.k8s.local"
I0529 21:59:35.449290       1 flags.go:52] FLAG: --cluster-signing-cert-file="/srv/kubernetes/ca.crt"
I0529 21:59:35.449296       1 flags.go:52] FLAG: --cluster-signing-key-file="/srv/kubernetes/ca.key"
I0529 21:59:35.449302       1 flags.go:52] FLAG: --concurrent-deployment-syncs="5"
I0529 21:59:35.449313       1 flags.go:52] FLAG: --concurrent-endpoint-syncs="5"
I0529 21:59:35.449319       1 flags.go:52] FLAG: --concurrent-gc-syncs="20"
I0529 21:59:35.449325       1 flags.go:52] FLAG: --concurrent-namespace-syncs="10"
I0529 21:59:35.449331       1 flags.go:52] FLAG: --concurrent-rc-syncs="5"
I0529 21:59:35.449337       1 flags.go:52] FLAG: --concurrent-replicaset-syncs="5"
I0529 21:59:35.449343       1 flags.go:52] FLAG: --concurrent-resource-quota-syncs="5"
I0529 21:59:35.449349       1 flags.go:52] FLAG: --concurrent-service-syncs="1"
I0529 21:59:35.449355       1 flags.go:52] FLAG: --concurrent-serviceaccount-token-syncs="5"
I0529 21:59:35.449361       1 flags.go:52] FLAG: --configure-cloud-routes="true"
I0529 21:59:35.449367       1 flags.go:52] FLAG: --contention-profiling="false"
I0529 21:59:35.449373       1 flags.go:52] FLAG: --controller-start-interval="0s"
I0529 21:59:35.449379       1 flags.go:52] FLAG: --controllers="[*]"
I0529 21:59:35.449449       1 flags.go:52] FLAG: --deleting-pods-burst="0"
I0529 21:59:35.449456       1 flags.go:52] FLAG: --deleting-pods-qps="0.1"
I0529 21:59:35.449466       1 flags.go:52] FLAG: --deployment-controller-sync-period="30s"
I0529 21:59:35.449473       1 flags.go:52] FLAG: --disable-attach-detach-reconcile-sync="false"
I0529 21:59:35.449479       1 flags.go:52] FLAG: --enable-dynamic-provisioning="true"
I0529 21:59:35.449485       1 flags.go:52] FLAG: --enable-garbage-collector="true"
I0529 21:59:35.449491       1 flags.go:52] FLAG: --enable-hostpath-provisioner="false"
I0529 21:59:35.449529       1 flags.go:52] FLAG: --enable-taint-manager="true"
I0529 21:59:35.449536       1 flags.go:52] FLAG: --experimental-cluster-signing-duration="8760h0m0s"
I0529 21:59:35.449568       1 flags.go:52] FLAG: --feature-gates=""
I0529 21:59:35.449598       1 flags.go:52] FLAG: --flex-volume-plugin-dir="/usr/libexec/kubernetes/kubelet-plugins/volume/exec/"
I0529 21:59:35.449606       1 flags.go:52] FLAG: --horizontal-pod-autoscaler-downscale-delay="5m0s"
I0529 21:59:35.449612       1 flags.go:52] FLAG: --horizontal-pod-autoscaler-sync-period="30s"
I0529 21:59:35.449618       1 flags.go:52] FLAG: --horizontal-pod-autoscaler-tolerance="0.1"
I0529 21:59:35.449628       1 flags.go:52] FLAG: --horizontal-pod-autoscaler-upscale-delay="3m0s"
I0529 21:59:35.449634       1 flags.go:52] FLAG: --horizontal-pod-autoscaler-use-rest-clients="true"
I0529 21:59:35.449640       1 flags.go:52] FLAG: --insecure-experimental-approve-all-kubelet-csrs-for-group=""
I0529 21:59:35.449646       1 flags.go:52] FLAG: --kube-api-burst="30"
I0529 21:59:35.449652       1 flags.go:52] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
I0529 21:59:35.449660       1 flags.go:52] FLAG: --kube-api-qps="20"
I0529 21:59:35.449667       1 flags.go:52] FLAG: --kubeconfig="/var/lib/kube-controller-manager/kubeconfig"
I0529 21:59:35.449674       1 flags.go:52] FLAG: --large-cluster-size-threshold="50"
I0529 21:59:35.449680       1 flags.go:52] FLAG: --leader-elect="true"
I0529 21:59:35.449686       1 flags.go:52] FLAG: --leader-elect-lease-duration="15s"
I0529 21:59:35.449692       1 flags.go:52] FLAG: --leader-elect-renew-deadline="10s"
I0529 21:59:35.449698       1 flags.go:52] FLAG: --leader-elect-resource-lock="endpoints"
I0529 21:59:35.449705       1 flags.go:52] FLAG: --leader-elect-retry-period="2s"
I0529 21:59:35.449711       1 flags.go:52] FLAG: --log-backtrace-at=":0"
I0529 21:59:35.449721       1 flags.go:52] FLAG: --log-dir=""
I0529 21:59:35.449728       1 flags.go:52] FLAG: --log-flush-frequency="5s"
I0529 21:59:35.449735       1 flags.go:52] FLAG: --loglevel="1"
I0529 21:59:35.449741       1 flags.go:52] FLAG: --logtostderr="true"
I0529 21:59:35.449747       1 flags.go:52] FLAG: --master=""
I0529 21:59:35.449753       1 flags.go:52] FLAG: --min-resync-period="12h0m0s"
I0529 21:59:35.449759       1 flags.go:52] FLAG: --namespace-sync-period="5m0s"
I0529 21:59:35.449766       1 flags.go:52] FLAG: --node-cidr-mask-size="24"
I0529 21:59:35.449772       1 flags.go:52] FLAG: --node-eviction-rate="0.1"
I0529 21:59:35.449779       1 flags.go:52] FLAG: --node-monitor-grace-period="40s"
I0529 21:59:35.449786       1 flags.go:52] FLAG: --node-monitor-period="5s"
I0529 21:59:35.449792       1 flags.go:52] FLAG: --node-startup-grace-period="1m0s"
I0529 21:59:35.449826       1 flags.go:52] FLAG: --node-sync-period="0s"
I0529 21:59:35.449834       1 flags.go:52] FLAG: --pod-eviction-timeout="5m0s"
I0529 21:59:35.449867       1 flags.go:52] FLAG: --port="10252"
I0529 21:59:35.449877       1 flags.go:52] FLAG: --profiling="true"
I0529 21:59:35.449884       1 flags.go:52] FLAG: --pv-recycler-increment-timeout-nfs="30"
I0529 21:59:35.449917       1 flags.go:52] FLAG: --pv-recycler-minimum-timeout-hostpath="60"
I0529 21:59:35.449924       1 flags.go:52] FLAG: --pv-recycler-minimum-timeout-nfs="300"
I0529 21:59:35.449955       1 flags.go:52] FLAG: --pv-recycler-pod-template-filepath-hostpath=""
I0529 21:59:35.449962       1 flags.go:52] FLAG: --pv-recycler-pod-template-filepath-nfs=""
I0529 21:59:35.449985       1 flags.go:52] FLAG: --pv-recycler-timeout-increment-hostpath="30"
I0529 21:59:35.449992       1 flags.go:52] FLAG: --pvclaimbinder-sync-period="15s"
I0529 21:59:35.449998       1 flags.go:52] FLAG: --register-retry-count="10"
I0529 21:59:35.450004       1 flags.go:52] FLAG: --resource-quota-sync-period="5m0s"
I0529 21:59:35.450011       1 flags.go:52] FLAG: --root-ca-file="/srv/kubernetes/ca.crt"
I0529 21:59:35.450018       1 flags.go:52] FLAG: --route-reconciliation-period="10s"
I0529 21:59:35.450024       1 flags.go:52] FLAG: --secondary-node-eviction-rate="0.01"
I0529 21:59:35.450031       1 flags.go:52] FLAG: --service-account-private-key-file="/srv/kubernetes/server.key"
I0529 21:59:35.450039       1 flags.go:52] FLAG: --service-cluster-ip-range=""
I0529 21:59:35.450045       1 flags.go:52] FLAG: --service-sync-period="5m0s"
I0529 21:59:35.450051       1 flags.go:52] FLAG: --stderrthreshold="2"
I0529 21:59:35.450057       1 flags.go:52] FLAG: --terminated-pod-gc-threshold="12500"
I0529 21:59:35.450064       1 flags.go:52] FLAG: --unhealthy-zone-threshold="0.55"
I0529 21:59:35.450071       1 flags.go:52] FLAG: --use-service-account-credentials="true"
I0529 21:59:35.450077       1 flags.go:52] FLAG: --v="2"
I0529 21:59:35.450084       1 flags.go:52] FLAG: --version="false"
I0529 21:59:35.450095       1 flags.go:52] FLAG: --vmodule=""
I0529 21:59:35.450113       1 controllermanager.go:108] Version: v1.9.3

@rchicoli

I am not quite sure if this is related to this issue. Let me know if I should create a new one.
After restarting the cluster, the Kubernetes API is reporting the wrong Pod status.
As you can see, all nodes are offline (kubelet and Docker are not running), so I expected at least an Unknown pod status. Somehow it shows Running, even a few minutes later.

root@kube-controller-1:~# kubectl get all
NAME                            READY     STATUS    RESTARTS   AGE
pod/webapper-856ff74c66-59b2t   1/1       Running   0          9h
pod/webapper-856ff74c66-qhlmb   1/1       Running   0          9h

NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
service/kubernetes   ClusterIP   10.32.0.1     <none>        443/TCP    2d
service/webapper     ClusterIP   10.32.0.100   <none>        8080/TCP   6h

NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/webapper   2         2         2            0           9h

NAME                                  DESIRED   CURRENT   READY     AGE
replicaset.apps/webapper-856ff74c66   2         2         0         9h

root@kube-controller-1:~# kubectl get nodes
NAME            STATUS     ROLES     AGE       VERSION
kube-worker-1   NotReady   <none>    2d        v1.11.0
kube-worker-2   NotReady   <none>    2d        v1.11.0

root@kube-controller-1:~# kubectl exec -ti webapper-856ff74c66-qhlmb sh
Error from server: error dialing backend: dial tcp 10.0.0.17:10250: connect: connection refused

I am using the latest Kubernetes version:

root@kube-controller-1:~# kube-apiserver --version
Kubernetes v1.11.0
root@kube-controller-1:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}

@adampl

adampl commented Jul 24, 2018

I have just run into this issue (v1.10.1). I suspect it has something to do with volumes not being detached/unmounted.

@zeelichsheng

zeelichsheng commented Aug 13, 2018

I encountered a similar issue. I was experimenting with the Kubernetes autoscaler. When I manually stop a node VM, the node itself goes into NotReady state, and after a while the pod scheduled on the removed node goes into Unknown state.

At this point, Kubernetes behaves correctly by creating a new pod, and autoscaler creates a new node to schedule the new pod.

However, the removed pod gets stuck in Unknown state. The original node cannot be removed by autoscaler from Kubernetes because autoscaler still thinks there is load (i.e. the stuck pod) on the node.

NAME                       READY   STATUS    RESTARTS   AGE
busybox-6b76d7d9c8-7xb48   0/1     Pending   0          2m
busybox-6b76d7d9c8-xlgpz   1/1     Unknown   0          11m

NAME                                          STATUS     ROLES    AGE   VERSION
master-5f517752-9b64-11e8-8caa-0612df8b7178   Ready      master   4d    v1.10.2
worker-a20f2c3e-9f19-11e8-a52a-0612df8b7178   NotReady   worker   10m   v1.10.2
worker-f2f997c8-9f1a-11e8-a52a-0612df8b7178   Ready      worker   56s   v1.10.2

This is part of the output of describing the NotReady worker node, which shows that Kubernetes still thinks the stuck pod is scheduled on this node:

Non-terminated Pods:  (2 in total)
  Namespace   Name                       CPU Requests   CPU Limits   Memory Requests   Memory Limits
  default     busybox-6b76d7d9c8-xlgpz   1 (50%)        1 (50%)      0 (0%)            0 (0%)

If we have to manually and forcefully remove the pod using "kubectl delete pod --force --grace-period=0", it means the autoscaler is affected and cannot correctly manage cluster resources without user intervention.
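For reference, a sketch of that manual cleanup, using the pod name from the output above (namespace assumed to be default):

# Force-remove the API object for a pod stuck in Unknown on a dead node.
# Nothing is stopped on the node itself, since it is unreachable.
kubectl delete pod busybox-6b76d7d9c8-xlgpz -n default --force --grace-period=0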

@adampl

adampl commented Aug 13, 2018

@zeelichsheng check kubernetes/enhancements#551 and #58635.

@huyqut

huyqut commented Sep 21, 2018

Hi, I have the same problem. No pods are evicted when a node is "NotReady", even after the --pod-eviction-timeout set on kube-controller-manager has elapsed. Are there any workarounds?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 20, 2018
@hzxuzhonghu
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 21, 2018
@jitendra1987

jitendra1987 commented Jan 29, 2019

I am also seeing this issue.
Cluster: 1 master, 1 worker
kubernetes=1.12.5, docker-ce=18.09.1
Steps:

  1. Launched a few test pods scheduled to the worker.
  2. Then shut down the worker node.
  3. Observed that Kubernetes marked the worker as NotReady, but pod eviction did not start even after 5 minutes.

@jtackaberry

jtackaberry commented Feb 2, 2019

Surely this is one of the first failure modes everyone tests? It's the first worker-related failure I tested while evaluating Kubernetes. I even gracefully shut down the worker node and let all kube processes exit cleanly. IMO it very much violates the Principle of Least Astonishment that pods assigned to NotReady nodes remain in the Running state.

(1.13.3 with a single node test cluster.)

@adampl

adampl commented Feb 5, 2019

@jtackaberry It's not so simple. The cluster nodes need an external monitor or hypervisor to reliably determine whether the NotReady node is actually shut down, in order to take into account a possible split-brain scenario. In other words, you cannot assume that pods are not running just because the node is not responding. See: kubernetes/enhancements#719

@huyqut

huyqut commented Feb 11, 2019

@jitendra1987 can you test a cluster with 1 master and 2 workers? I also tested with 1 master and 1 worker and pod eviction didn't happen. However, when there are 3 machines in the cluster, it happens normally.

@ironreality

With a 1 master + 2 nodes configuration the problem isn't reproducible.
With 1 master + 1 node it occurs.
Tested with a kubeadm-installed cluster.
The testing platform: VirtualBox + Ubuntu 16.04 + K8s 1.13.3 + Docker 18.09.2.

@Craftoncu

Any fix yet? Annoying issue

@inboxamitraj

Any fix yet? Annoying issue

Fixed in 1.21.0: pods moved to the healthy worker01 node from worker02, which went down.
vagrant@master01:~$ k get pods -o wide
NAME                          READY   STATUS        RESTARTS   AGE     IP          NODE       NOMINATED NODE   READINESS GATES
test-deploy-686f764bd-4vsqh   1/1     Running       0          5m53s   10.32.0.3   worker01
test-deploy-686f764bd-5nvz6   1/1     Running       0          5m53s   10.32.0.4   worker01
test-deploy-686f764bd-c76v8   1/1     Terminating   1          21h     10.32.0.2   worker02
test-deploy-686f764bd-wqk6j   1/1     Terminating   1          21h     10.32.0.3   worker02

@adampl

adampl commented May 14, 2021

Apparently there is still some issue: #101674

@victor-sudakov

I seem to have this issue with v1.21.0 on Debian 10/amd64, a test cluster of 1 master and 3 worker nodes:

I create a pod without a nodeSelector, find the node it is running on, and shut down or power off its host, emulating hardware failure or maintenance. I expect my pod to be recreated on another healthy node, but this never happens. I have to reapply the pod definition to make it run again, and then it says "pod XXX created".

Expected behavior: the pod should have been moved/recreated on some healthy node when the timeout expires.

@hconnan

hconnan commented May 19, 2021

Hi guys, I got the same issue on a Kubernetes cluster v1.19.7, i.e. some nodes did not get the NoExecute taint as expected.
However, this has been fixed recently in:

Hope it's helpful!

@adampl

adampl commented May 21, 2021

@elChipardo Did you actually read the preceding comment of @victor-sudakov? He mentions v1.21.0.

@antoinetran

I confirm the fix and validated with Kubernetes 1.20.6 (contained in Rke 1.2.8 / Rancher 2.5.8).

@victor-sudakov

I confirm the fix and validated with Kubernetes 1.20.6 (contained in Rke 1.2.8 / Rancher 2.5.8).

What do you mean by "confirm the fix"? I've just checked, on Kubernetes v1.21.1/Debian10, when a Node is powered off or dies, its Pods are in Terminating status forever, and never get moved elsewhere. When the Node is back alive, its Pods are gone for good and have to be redeployed again.

@antoinetran

antoinetran commented Jun 9, 2021

I confirm the fix and validated with Kubernetes 1.20.6 (contained in Rke 1.2.8 / Rancher 2.5.8).

What do you mean by "confirm the fix"? I've just checked, on Kubernetes v1.21.1/Debian10, when a Node is powered off or dies, its Pods are in Terminating status forever, and never get moved elsewhere. When the Node is back alive, its Pods are gone for good and have to be redeployed again.

In Kubernetes 1.20.4: the shutdown of a node results in the node being NotReady, but the pods hosted by the node run like nothing happened. However, doing logs or exec does not work (which is normal).
In Kubernetes 1.20.6: the shutdown of a node results, after the eviction timeout, in the pods being in Terminating status and being rescheduled on other nodes. The never-ending Terminating seems normal to me.

However, we noticed that pods from StatefulSets are not moved to another node and stay in Terminating, while pods from Deployments and Jobs are also in Terminating but are rescheduled elsewhere. Maybe that is your case.

@victor-sudakov

with pods being rescheduled in other nodes

I have never seen this happen unless the pods are part of a deployment. If you have created a pod ("kind: Pod", not "kind: Deployment") it never gets rescheduled. Maybe it's by design?

@adampl

adampl commented Jun 10, 2021

@victor-sudakov Yes, a Pod is by definition bound to a certain Node. Rescheduling is nothing more than deleting the old Pod and creating a new one, which is usually controlled by a ReplicaSet, which is usually owned by a Deployment.
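A short illustration of that difference, with placeholder names and image (not from this thread):

# A bare Pod is bound to its node; if the node dies, nothing recreates it.
kubectl run standalone-pod --image=nginx --restart=Never

# A Deployment owns a ReplicaSet, which creates replacement Pods on healthy
# nodes once the originals are evicted or deleted.
kubectl create deployment web-demo --image=nginx --replicas=2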

@ricosega

I have the same situation in version 1.20.0: when shutting down a node, it stays tainted with node.kubernetes.io/unreachable:NoSchedule forever and all the pods stay in status Running, just like here:

Apparently there is still some issue: #101674

But if I taint the node myself with node.kubernetes.io/unreachable:NoExecute (a sketch of this manual taint appears at the end of this comment), then after the eviction time I've noticed the same things @antoinetran said:

  • The pods from Deployments are rescheduled to another node, but the old one is kept in Terminating status forever.
  • The pods from StatefulSets always remain in Terminating status and are never rescheduled.

This seems to be the same issue as well: #98851

Can anyone confirm in which version this is solved?
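For reference, a sketch of the manual taint described above (node name is a placeholder):

# Manually add the NoExecute taint that the node lifecycle controller would
# normally apply to an unreachable node; pods without a matching toleration
# (or whose tolerationSeconds have expired) are then evicted.
kubectl taint nodes <node-name> node.kubernetes.io/unreachable:NoExecute

# Remove the taint again once the node is healthy:
kubectl taint nodes <node-name> node.kubernetes.io/unreachable:NoExecute-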

@haircommander
Contributor

It sounds like this has been fixed in all supported versions of k8s: #55713 (comment)

as such, I'm closing this
/close

@k8s-ci-robot
Contributor

@haircommander: Closing this issue.

In response to this:

It sounds like this has been fixed in all supported versions of k8s: #55713 (comment)

as such, I'm closing this
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@amitkatyal

I am facing a similar issue with DaemonSets. The DaemonSet pods remain in the Running state when a node is in a NotReady state. Even though the node is NotReady, since the pod is still in the Running state, the headless service exposing the DaemonSet as endpoints returns the IP address of the DaemonSet pod on the NotReady node.
Because the headless service returns the IP address of a DaemonSet pod that is not actually running, this causes the problem.

I understand that the DaemonSet pod remaining in the Running state is expected behavior, as the DaemonSet controller is not able to reach the API server, but is there an option to ensure that the headless service doesn't return the IP address of the pod on the down node?
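A diagnostic sketch for this situation, assuming a headless Service named my-daemon-svc (a placeholder); the Endpoints object should drop pods whose Ready condition is False unless publishNotReadyAddresses is set:

# Which pod IPs is the headless Service currently exposing?
kubectl get endpoints my-daemon-svc -o wide

# Does the Service intentionally publish not-ready addresses? If this prints
# "true", pods that are NotReady stay in the endpoints/DNS records.
kubectl get service my-daemon-svc -o jsonpath='{.spec.publishNotReadyAddresses}'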

@zhangguanzhang

zhangguanzhang commented Sep 18, 2021

You may need to set this for kube-apiserver:

   --enable-admission-plugins=....,DefaultTolerationSeconds \
  --default-not-ready-toleration-seconds=60 \
  --default-unreachable-toleration-seconds=60 \

The default for kube-controller-manager is --node-monitor-grace-period=50s, so if you set --default-unreachable-toleration-seconds=60, a pod will become Terminating 50s+60s after a node shutdown.
The state does not affect how the Service directs traffic, and the pod should go through to the end of its lifecycle, so the better configuration is:

   --enable-admission-plugins=....,DefaultTolerationSeconds \
  --default-not-ready-toleration-seconds=300 \
  --default-unreachable-toleration-seconds=10 \
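To see these defaults take effect, one can inspect the tolerations that the DefaultTolerationSeconds admission plugin injects into newly created pods (pod name is a placeholder):

# New pods get node.kubernetes.io/not-ready and node.kubernetes.io/unreachable
# tolerations with the configured tolerationSeconds (300s by default).
kubectl get pod <pod-name> -o jsonpath='{.spec.tolerations}'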

@knkarthik

I'm still having this issue on EKS v1.20.7-eks-d88609. The behaviour is the same as observed by @ricosega and others.

Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.7-eks-d88609", GitCommit:"d886092805d5cc3a47ed5cf0c43de38ce442dfcb", GitTreeState:"clean", BuildDate:"2021-07-31T00:29:12Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"linux/amd64"}

@zhangguanzhang

I'm still having this issue on EKS v1.20.7-eks-d88609. The behaviour is the same as observed by @ricosega and others.

Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.7-eks-d88609", GitCommit:"d886092805d5cc3a47ed5cf0c43de38ce442dfcb", GitTreeState:"clean", BuildDate:"2021-07-31T00:29:12Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"linux/amd64"}

Did you solve this problem?

@riupie

riupie commented Oct 13, 2021

You may need to set this for kube-apiserver:

   --enable-admission-plugins=....,DefaultTolerationSeconds \
  --default-not-ready-toleration-seconds=60 \
  --default-unreachable-toleration-seconds=60 \

The default for kube-controller-manager is --node-monitor-grace-period=50s, so if you set --default-unreachable-toleration-seconds=60, a pod will become Terminating 50s+60s after a node shutdown. The state does not affect how the Service directs traffic, and the pod should go through to the end of its lifecycle, so the better configuration is:

   --enable-admission-plugins=....,DefaultTolerationSeconds \
  --default-not-ready-toleration-seconds=300 \
  --default-unreachable-toleration-seconds=10 \

I think this is the best workaround for now. I use k8s 1.20.7 and pod-eviction-timeout still does not work.

@m0sh1x2

m0sh1x2 commented Oct 13, 2021

You may need to set this for kube-apiserver:

   --enable-admission-plugins=....,DefaultTolerationSeconds \
  --default-not-ready-toleration-seconds=60 \
  --default-unreachable-toleration-seconds=60 \

The default for kube-controller-manager is --node-monitor-grace-period=50s, so if you set --default-unreachable-toleration-seconds=60, a pod will become Terminating 50s+60s after a node shutdown. The state does not affect how the Service directs traffic, and the pod should go through to the end of its lifecycle, so the better configuration is:

   --enable-admission-plugins=....,DefaultTolerationSeconds \
  --default-not-ready-toleration-seconds=300 \
  --default-unreachable-toleration-seconds=10 \

I think this is the best workaround for now. I use k8s 1.20.7 and pod-eviction-timeout still does not work.

This doesn't seem to apply to StatefulSets that have PVCs on a terminated/powered-off node (using rook-ceph with RBD). The cron job script that @marczahn posted should work, as I have tested it manually #55713 (comment), but shouldn't this functionality be covered by the scheduler?
