Pods are not moved when Node in NotReady state #55713

Open · marczahn opened this issue Nov 14, 2017 · 41 comments

@marczahn commented Nov 14, 2017

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

What happened:

To simulate a crashed worker node I stopped the kubelet service on that node (Debian Jessie).
The node went into Unknown state (i.e. NotReady) as expected:

NAME                STATUS     ROLES         AGE       VERSION
lls-lon-db-master   Ready      master,node   7d        v1.8.0+coreos.0
lls-lon-testing01   NotReady   node          6d        v1.8.0+coreos.0

The pods running on lls-lon-testing01 remain reported as Running:

test-core-services   infrastructure-service-deployment-5cb868f49-94gh4                 1/1       Running   0          4h        10.233.96.204   lls-lon-testing01

But the pod is reported as Ready: False in kubectl describe:

Name:           infrastructure-service-deployment-5cb868f49-94gh4
Namespace:      test-core-services
Node:           lls-lon-testing01/10.100.0.5
Start Time:     Tue, 14 Nov 2017 10:31:41 +0000
Labels:         app=infrastructure-service
                pod-template-hash=176424905
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"test-core-services","name":"infrastructure-service-deployment-5cb868f49","uid":"d...
Status:         Running
IP:             10.233.96.204
Created By:     ReplicaSet/infrastructure-service-deployment-5cb868f49
Controlled By:  ReplicaSet/infrastructure-service-deployment-5cb868f49
Containers:
  infrastructure-service:
    Container ID:  docker://3b750d7cad0c24386cade1e4fedac24ab2621f4991d3302d15c30d9e68749b7b
    Image:         index.docker.io/looplinesystems/infrastructure-service:latest
    Image ID:      docker-pullable://looplinesystems/infrastructure-service@sha256:632591a86ca67f3e19718727e717b07da3b5c79251ce9deede969588b6958272
    Ports:         7110/TCP, 7111/TCP, 7112/TCP
    Command:
      /infrastructurectl
      daemon
    State:          Running
      Started:      Tue, 14 Nov 2017 10:31:48 +0000
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      infrastructure-service-config  Secret  Optional: false
    Environment:                     <none>
    Mounts:
      /var/log/services from logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hm2hs (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  logs:
    Type:  HostPath (bare host directory volume)
    Path:  /var/log/services
  default-token-hm2hs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hm2hs
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:          <none>

What you expected to happen:
I expected the pods on the "crashed" node to be moved to the remaining node.

How to reproduce it (as minimally and precisely as possible):
In my situation: a node (A) and a master+node (B) installed with Kubespray. Run at least one pod on each node, stop the kubelet on A, and wait.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0+coreos.0", GitCommit:"a65654ef5b593ac19fbfaf33b1a1873c0320353b", GitTreeState:"clean", BuildDate:"2017-09-29T21:51:03Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0+coreos.0", GitCommit:"a65654ef5b593ac19fbfaf33b1a1873c0320353b", GitTreeState:"clean", BuildDate:"2017-09-29T21:51:03Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
Intel(R) Xeon(R) CPU E5-2670
4 Cores
4 GB RAM
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 8 (jessie)
  • Kernel (e.g. uname -a): Linux lls-lon-db-master 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux
  • Install tools: Kubespray
  • Others: -
@marczahn (Author) commented Nov 14, 2017

/sig node

@jhorwit2 (Member) commented Dec 11, 2017

@marczahn how long did you wait after turning off the kubelet? By default pods won't be moved for 5 minutes, which is configurable via the following flag on the controller manager:

--pod-eviction-timeout duration                                     The grace period for deleting pods on failed nodes. (default 5m0s)

This avoids rescheduling pods unnecessarily in cases like a node reboot.
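
For reference, a sketch of where that flag typically lives, assuming a kubeadm-style control plane where kube-controller-manager runs as a static pod (other installers such as Kubespray or kops may configure the arguments elsewhere); the 2m0s value is only an example:

# check whether the flag is set on the running controller manager
ps -ef | grep '[k]ube-controller-manager' | grep -o 'pod-eviction-timeout=[^ ]*'

# on kubeadm, add the flag under the container's command: list in
# /etc/kubernetes/manifests/kube-controller-manager.yaml, for example
#   - --pod-eviction-timeout=2m0s
# the kubelet picks up the manifest change and restarts the controller manager.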

@marczahn (Author) commented Dec 11, 2017

I know this parameter, and I waited far longer than the eviction timeout. Nothing happened at all.

@jamesgetx commented Dec 21, 2017

We encountered the same problem. Our k8s version is 1.8.4, docker version is 1.12.4

@marczahn (Author) commented Jan 18, 2018

I wrote a script that can be run as a cron job:

#!/bin/sh

KUBECTL="/usr/local/bin/kubectl"

# Get only nodes which are not drained yet
NOT_READY_NODES=$($KUBECTL get nodes | grep -P 'NotReady(?!,SchedulingDisabled)' | awk '{print $1}' | xargs echo)
# Get nodes which are Ready again but still drained (cordoned)
READY_NODES=$($KUBECTL get nodes | grep '\sReady,SchedulingDisabled' | awk '{print $1}' | xargs echo)

echo "Unready nodes that are undrained: $NOT_READY_NODES"
echo "Ready nodes: $READY_NODES"


for node in $NOT_READY_NODES; do
  echo "Node $node not drained yet, draining..."
  $KUBECTL drain --ignore-daemonsets --force $node
  echo "Done"
done;

for node in $READY_NODES; do
  echo "Node $node still drained, uncordoning..."
  $KUBECTL uncordon $node
  echo "Done"
done;

It checks whether a node is down but not yet drained, and vice versa. Hope it helps.
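
For completeness, a minimal crontab entry to run it every minute (the script path and log file below are just assumed locations):

# run the drain check every minute
* * * * * /usr/local/bin/drain-notready.sh >> /var/log/drain-notready.log 2>&1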

@erkules commented Mar 6, 2018

Got the same issue on 1.9.3. No eviction after 30 minutes.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

@albertvaka commented Mar 14, 2018

+1. Is this the intended behavior? If it is, then load balancers should keep serving traffic to those pods (now they do not).

@mypine commented May 17, 2018

We encountered the same problem on 1.6.3.

@trajakovic commented May 30, 2018

Got the same problem as @erkules.

kubectl version                                                                                                                                       
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-05-12T04:12:12Z", GoVersion:"go1.9.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

Tried to drain the master node (in a setup of 3 nodes), but the command keeps hanging.

If it helps, the cluster is installed on AWS using kops 1.9.x, with 3 masters in separate AZs (m4.large instances).

kubectl logs -n kube-system -f kube-controller-manager-ip-<redacted>


I0529 21:59:35.448967       1 flags.go:52] FLAG: --address="0.0.0.0"
I0529 21:59:35.449022       1 flags.go:52] FLAG: --allocate-node-cidrs="true"
I0529 21:59:35.449096       1 flags.go:52] FLAG: --allow-untagged-cloud="false"
I0529 21:59:35.449106       1 flags.go:52] FLAG: --allow-verification-with-non-compliant-keys="false"
I0529 21:59:35.449116       1 flags.go:52] FLAG: --alsologtostderr="false"
I0529 21:59:35.449122       1 flags.go:52] FLAG: --attach-detach-reconcile-sync-period="1m0s"
I0529 21:59:35.449157       1 flags.go:52] FLAG: --cidr-allocator-type="RangeAllocator"
I0529 21:59:35.449191       1 flags.go:52] FLAG: --cloud-config=""
I0529 21:59:35.449216       1 flags.go:52] FLAG: --cloud-provider="aws"
I0529 21:59:35.449229       1 flags.go:52] FLAG: --cloud-provider-gce-lb-src-cidrs="130.211.0.0/22,35.191.0.0/16,209.85.152.0/22,209.85.204.0/22"
I0529 21:59:35.449276       1 flags.go:52] FLAG: --cluster-cidr="100.96.0.0/11"
I0529 21:59:35.449283       1 flags.go:52] FLAG: --cluster-name="staging.kubernetes.sa.dev.superbet.k8s.local"
I0529 21:59:35.449290       1 flags.go:52] FLAG: --cluster-signing-cert-file="/srv/kubernetes/ca.crt"
I0529 21:59:35.449296       1 flags.go:52] FLAG: --cluster-signing-key-file="/srv/kubernetes/ca.key"
I0529 21:59:35.449302       1 flags.go:52] FLAG: --concurrent-deployment-syncs="5"
I0529 21:59:35.449313       1 flags.go:52] FLAG: --concurrent-endpoint-syncs="5"
I0529 21:59:35.449319       1 flags.go:52] FLAG: --concurrent-gc-syncs="20"
I0529 21:59:35.449325       1 flags.go:52] FLAG: --concurrent-namespace-syncs="10"
I0529 21:59:35.449331       1 flags.go:52] FLAG: --concurrent-rc-syncs="5"
I0529 21:59:35.449337       1 flags.go:52] FLAG: --concurrent-replicaset-syncs="5"
I0529 21:59:35.449343       1 flags.go:52] FLAG: --concurrent-resource-quota-syncs="5"
I0529 21:59:35.449349       1 flags.go:52] FLAG: --concurrent-service-syncs="1"
I0529 21:59:35.449355       1 flags.go:52] FLAG: --concurrent-serviceaccount-token-syncs="5"
I0529 21:59:35.449361       1 flags.go:52] FLAG: --configure-cloud-routes="true"
I0529 21:59:35.449367       1 flags.go:52] FLAG: --contention-profiling="false"
I0529 21:59:35.449373       1 flags.go:52] FLAG: --controller-start-interval="0s"
I0529 21:59:35.449379       1 flags.go:52] FLAG: --controllers="[*]"
I0529 21:59:35.449449       1 flags.go:52] FLAG: --deleting-pods-burst="0"
I0529 21:59:35.449456       1 flags.go:52] FLAG: --deleting-pods-qps="0.1"
I0529 21:59:35.449466       1 flags.go:52] FLAG: --deployment-controller-sync-period="30s"
I0529 21:59:35.449473       1 flags.go:52] FLAG: --disable-attach-detach-reconcile-sync="false"
I0529 21:59:35.449479       1 flags.go:52] FLAG: --enable-dynamic-provisioning="true"
I0529 21:59:35.449485       1 flags.go:52] FLAG: --enable-garbage-collector="true"
I0529 21:59:35.449491       1 flags.go:52] FLAG: --enable-hostpath-provisioner="false"
I0529 21:59:35.449529       1 flags.go:52] FLAG: --enable-taint-manager="true"
I0529 21:59:35.449536       1 flags.go:52] FLAG: --experimental-cluster-signing-duration="8760h0m0s"
I0529 21:59:35.449568       1 flags.go:52] FLAG: --feature-gates=""
I0529 21:59:35.449598       1 flags.go:52] FLAG: --flex-volume-plugin-dir="/usr/libexec/kubernetes/kubelet-plugins/volume/exec/"
I0529 21:59:35.449606       1 flags.go:52] FLAG: --horizontal-pod-autoscaler-downscale-delay="5m0s"
I0529 21:59:35.449612       1 flags.go:52] FLAG: --horizontal-pod-autoscaler-sync-period="30s"
I0529 21:59:35.449618       1 flags.go:52] FLAG: --horizontal-pod-autoscaler-tolerance="0.1"
I0529 21:59:35.449628       1 flags.go:52] FLAG: --horizontal-pod-autoscaler-upscale-delay="3m0s"
I0529 21:59:35.449634       1 flags.go:52] FLAG: --horizontal-pod-autoscaler-use-rest-clients="true"
I0529 21:59:35.449640       1 flags.go:52] FLAG: --insecure-experimental-approve-all-kubelet-csrs-for-group=""
I0529 21:59:35.449646       1 flags.go:52] FLAG: --kube-api-burst="30"
I0529 21:59:35.449652       1 flags.go:52] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
I0529 21:59:35.449660       1 flags.go:52] FLAG: --kube-api-qps="20"
I0529 21:59:35.449667       1 flags.go:52] FLAG: --kubeconfig="/var/lib/kube-controller-manager/kubeconfig"
I0529 21:59:35.449674       1 flags.go:52] FLAG: --large-cluster-size-threshold="50"
I0529 21:59:35.449680       1 flags.go:52] FLAG: --leader-elect="true"
I0529 21:59:35.449686       1 flags.go:52] FLAG: --leader-elect-lease-duration="15s"
I0529 21:59:35.449692       1 flags.go:52] FLAG: --leader-elect-renew-deadline="10s"
I0529 21:59:35.449698       1 flags.go:52] FLAG: --leader-elect-resource-lock="endpoints"
I0529 21:59:35.449705       1 flags.go:52] FLAG: --leader-elect-retry-period="2s"
I0529 21:59:35.449711       1 flags.go:52] FLAG: --log-backtrace-at=":0"
I0529 21:59:35.449721       1 flags.go:52] FLAG: --log-dir=""
I0529 21:59:35.449728       1 flags.go:52] FLAG: --log-flush-frequency="5s"
I0529 21:59:35.449735       1 flags.go:52] FLAG: --loglevel="1"
I0529 21:59:35.449741       1 flags.go:52] FLAG: --logtostderr="true"
I0529 21:59:35.449747       1 flags.go:52] FLAG: --master=""
I0529 21:59:35.449753       1 flags.go:52] FLAG: --min-resync-period="12h0m0s"
I0529 21:59:35.449759       1 flags.go:52] FLAG: --namespace-sync-period="5m0s"
I0529 21:59:35.449766       1 flags.go:52] FLAG: --node-cidr-mask-size="24"
I0529 21:59:35.449772       1 flags.go:52] FLAG: --node-eviction-rate="0.1"
I0529 21:59:35.449779       1 flags.go:52] FLAG: --node-monitor-grace-period="40s"
I0529 21:59:35.449786       1 flags.go:52] FLAG: --node-monitor-period="5s"
I0529 21:59:35.449792       1 flags.go:52] FLAG: --node-startup-grace-period="1m0s"
I0529 21:59:35.449826       1 flags.go:52] FLAG: --node-sync-period="0s"
I0529 21:59:35.449834       1 flags.go:52] FLAG: --pod-eviction-timeout="5m0s"
I0529 21:59:35.449867       1 flags.go:52] FLAG: --port="10252"
I0529 21:59:35.449877       1 flags.go:52] FLAG: --profiling="true"
I0529 21:59:35.449884       1 flags.go:52] FLAG: --pv-recycler-increment-timeout-nfs="30"
I0529 21:59:35.449917       1 flags.go:52] FLAG: --pv-recycler-minimum-timeout-hostpath="60"
I0529 21:59:35.449924       1 flags.go:52] FLAG: --pv-recycler-minimum-timeout-nfs="300"
I0529 21:59:35.449955       1 flags.go:52] FLAG: --pv-recycler-pod-template-filepath-hostpath=""
I0529 21:59:35.449962       1 flags.go:52] FLAG: --pv-recycler-pod-template-filepath-nfs=""
I0529 21:59:35.449985       1 flags.go:52] FLAG: --pv-recycler-timeout-increment-hostpath="30"
I0529 21:59:35.449992       1 flags.go:52] FLAG: --pvclaimbinder-sync-period="15s"
I0529 21:59:35.449998       1 flags.go:52] FLAG: --register-retry-count="10"
I0529 21:59:35.450004       1 flags.go:52] FLAG: --resource-quota-sync-period="5m0s"
I0529 21:59:35.450011       1 flags.go:52] FLAG: --root-ca-file="/srv/kubernetes/ca.crt"
I0529 21:59:35.450018       1 flags.go:52] FLAG: --route-reconciliation-period="10s"
I0529 21:59:35.450024       1 flags.go:52] FLAG: --secondary-node-eviction-rate="0.01"
I0529 21:59:35.450031       1 flags.go:52] FLAG: --service-account-private-key-file="/srv/kubernetes/server.key"
I0529 21:59:35.450039       1 flags.go:52] FLAG: --service-cluster-ip-range=""
I0529 21:59:35.450045       1 flags.go:52] FLAG: --service-sync-period="5m0s"
I0529 21:59:35.450051       1 flags.go:52] FLAG: --stderrthreshold="2"
I0529 21:59:35.450057       1 flags.go:52] FLAG: --terminated-pod-gc-threshold="12500"
I0529 21:59:35.450064       1 flags.go:52] FLAG: --unhealthy-zone-threshold="0.55"
I0529 21:59:35.450071       1 flags.go:52] FLAG: --use-service-account-credentials="true"
I0529 21:59:35.450077       1 flags.go:52] FLAG: --v="2"
I0529 21:59:35.450084       1 flags.go:52] FLAG: --version="false"
I0529 21:59:35.450095       1 flags.go:52] FLAG: --vmodule=""
I0529 21:59:35.450113       1 controllermanager.go:108] Version: v1.9.3
@rchicoli commented Jul 14, 2018

I am not quite sure if this is related to this issue. Let me know if I should create a new one.
After restarting the cluster, the Kubernetes API reports the wrong pod status.
As you can see, all nodes are offline (kubelet and docker are not running), so I expected at least an Unknown pod status. Somehow it still shows Running, even a few minutes later.

root@kube-controller-1:~# kubectl get all
NAME                            READY     STATUS    RESTARTS   AGE
pod/webapper-856ff74c66-59b2t   1/1       Running   0          9h
pod/webapper-856ff74c66-qhlmb   1/1       Running   0          9h

NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
service/kubernetes   ClusterIP   10.32.0.1     <none>        443/TCP    2d
service/webapper     ClusterIP   10.32.0.100   <none>        8080/TCP   6h

NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/webapper   2         2         2            0           9h

NAME                                  DESIRED   CURRENT   READY     AGE
replicaset.apps/webapper-856ff74c66   2         2         0         9h

root@kube-controller-1:~# kubectl get nodes
NAME            STATUS     ROLES     AGE       VERSION
kube-worker-1   NotReady   <none>    2d        v1.11.0
kube-worker-2   NotReady   <none>    2d        v1.11.0

root@kube-controller-1:~# kubectl exec -ti webapper-856ff74c66-qhlmb sh
Error from server: error dialing backend: dial tcp 10.0.0.17:10250: connect: connection refused

I am using the latest Kubernetes version:

root@kube-controller-1:~# kube-apiserver --version
Kubernetes v1.11.0
root@kube-controller-1:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
@adampl commented Jul 24, 2018

I have just run into this issue (v1.10.1). I suspect it has something to do with volumes not being detached/unmounted.

@zeelichsheng commented Aug 13, 2018

I encountered a similar issue. I was experimenting with the Kubernetes autoscaler. When I manually stop a node VM, the node itself goes into NotReady state, and after a while the pod scheduled on the removed node goes into Unknown state.

At this point, Kubernetes behaves correctly by creating a new pod, and autoscaler creates a new node to schedule the new pod.

However, the removed pod gets stuck in Unknown state. The original node cannot be removed from Kubernetes by the autoscaler, because the autoscaler still thinks there is load (i.e. the stuck pod) on the node.

NAME READY STATUS RESTARTS AGE
busybox-6b76d7d9c8-7xb48 0/1 Pending 0 2m
busybox-6b76d7d9c8-xlgpz 1/1 Unknown 0 11m

NAME STATUS ROLES AGE VERSION
master-5f517752-9b64-11e8-8caa-0612df8b7178 Ready master 4d v1.10.2
worker-a20f2c3e-9f19-11e8-a52a-0612df8b7178 NotReady worker 10m v1.10.2
worker-f2f997c8-9f1a-11e8-a52a-0612df8b7178 Ready worker 56s v1.10.2

This is part of the output of describing the NotReady worker node, which shows that Kubernetes still thinks the stuck pod is scheduled on this node:

Non-terminated Pods: (2 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits


default busybox-6b76d7d9c8-xlgpz 1 (50%) 1 (50%) 0 (0%) 0 (0%)

If we have to manually force-remove the pod using "kubectl delete pod --force --grace-period=0", it means the autoscaler is affected and cannot correctly manage cluster resources without user intervention.

@huyqut commented Sep 21, 2018

Hi, I have the same problem. No pods are evicted when a node is "NotReady", even after the --pod-eviction-timeout set on kube-controller-manager has elapsed. Are there any workarounds?

@fejta-bot commented Dec 20, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@hzxuzhonghu (Member) commented Dec 21, 2018

/remove-lifecycle stale

@jitendra1987 commented Jan 29, 2019

I am also seeing this issue.
Cluster : 1 master, 1 worker
kubernetes=1.12.5, docker-ce=18.09.1
steps:

  1. Launched a few test pods scheduled to the worker.
  2. Shut down the worker node.
  3. Observed that Kubernetes marked the worker as NotReady, but pod eviction did not start even after 5 minutes.
@jtackaberry commented Feb 2, 2019

Surely this is one of the first failure modes everyone tests? It's the first worker-related failure I tested while evaluating Kubernetes. I even gracefully shut down the worker node and let all kube processes exit cleanly. IMO it very much violates the Principle of Least Astonishment that pods assigned to NotReady nodes remain in the Running state.

(1.13.3 with a single node test cluster.)

@adampl commented Feb 5, 2019

@jtackaberry It's not so simple. The cluster nodes need an external monitor or hypervisor to reliably determine whether the NotReady node is actually shut down, in order to take into account a possible split-brain scenario. In other words, you cannot assume that pods are not running just because the node is not responding. See: kubernetes/enhancements#719

@huyqut commented Feb 11, 2019

@jitendra1987 can you test a cluster with 1 master and 2 workers? I also tested with 1 master and 1 worker and pod eviction didn't happen. However, when there are 3 machines in the cluster, it happens normally.

@ironreality commented Feb 26, 2019

With a 1 master + 2 nodes configuration the problem isn't reproducible.
With 1 master + 1 node it does occur.
Tested with a kubeadm-installed cluster.
The testing platform: VirtualBox + Ubuntu 16.04 + K8s 1.13.3 + Docker 18.09.2.

@fejta-bot commented May 27, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@deepankermishra commented Jun 7, 2019

I am hitting the same issue on a 5-node cluster. After deploying a StatefulSet nginx app and ensuring that each node is assigned only 1 pod, pods are not rescheduled.

[root@temp-1gs-514e4f-default-0 ~]# kubectl get nodes
NAME                        STATUS     ROLES              AGE       VERSION
temp-1gs-514e4f-default-0   Ready      etcd,master,node   4d        v1.11.6-5+2f651cb7b21f47
temp-1gs-514e4f-default-1   Ready      etcd,master,node   4d        v1.11.6-5+2f651cb7b21f47
temp-1gs-514e4f-default-2   Ready      etcd,node          4d        v1.11.6-5+2f651cb7b21f47
temp-1gs-514e4f-default-3   NotReady   node               4d        v1.11.6-5+2f651cb7b21f47
temp-1gs-514e4f-default-4   NotReady   node               4d        v1.11.6-5+2f651cb7b21f47
[root@temp-1gs-514e4f-default-0 ~]# kubectl get pods -o wide
NAME       READY     STATUS        RESTARTS   AGE       IP            NODE                        NOMINATED NODE
nginx1-0   1/1       Terminating   0          12m       10.100.3.8    temp-1gs-514e4f-default-3   <none>
nginx1-1   1/1       Running       0          12m       10.100.2.8    temp-1gs-514e4f-default-2   <none>
nginx1-2   1/1       Running       0          12m       10.100.1.9    temp-1gs-514e4f-default-1   <none>
nginx1-3   1/1       Terminating   0          12m       10.100.4.12   temp-1gs-514e4f-default-4   <none>
nginx1-4   1/1       Running       0          12m       10.100.0.14   temp-1gs-514e4f-default-0   <none>
[root@temp-1gs-514e4f-default-0 ~]# kubectl get pods -o wide
NAME       READY     STATUS        RESTARTS   AGE       IP            NODE                        NOMINATED NODE
nginx1-0   1/1       Terminating   0          17m       10.100.3.8    temp-1gs-514e4f-default-3   <none>
nginx1-1   1/1       Running       0          17m       10.100.2.8    temp-1gs-514e4f-default-2   <none>
nginx1-2   1/1       Running       0          17m       10.100.1.9    temp-1gs-514e4f-default-1   <none>
nginx1-3   1/1       Terminating   0          17m       10.100.4.12   temp-1gs-514e4f-default-4   <none>
nginx1-4   1/1       Running       0          17m       10.100.0.14   temp-1gs-514e4f-default-0   <none>
[root@temp-1gs-514e4f-default-0 ~]# 
@huangjiasingle commented Jun 24, 2019

@marczahn you can check the pod spec's tolerations and the taints on the NotReady node. The logic of pod eviction when the kubelet is down is this: the controller matches the pod's tolerations against the node's NoExecute taints; pods that don't tolerate the taint are evicted right away, and pods that tolerate it with a tolerationSeconds are evicted once that period expires (the action is to call the apiserver to delete the pod with deleteOption{gracePeriod:0}).
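
A quick way to do that check (node and pod names are placeholders):

# what NoExecute taints does the NotReady node carry?
kubectl describe node <notready-node> | grep -A3 Taints
# what tolerations does the pod have?
kubectl get pod <pod-name> -o jsonpath='{.spec.tolerations}'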

@yanchicago commented Jul 9, 2019

Same issue here. The pods on the cordoned nodes are not terminated and started on the other ready nodes. After manually force-terminating the pods, the pods are rescheduled on the same cordoned/NotReady,SchedulingDisabled nodes.

@swaykg commented Jul 22, 2019

I'm having the same issue: the node is NotReady, but the pods show Running status even though they are hanging, and they do not reschedule. Any workaround for this issue?

@neolit123 (Member) commented Aug 3, 2019

still a problem in 1.15.1.
/sig scheduling
/lifecycle frozen

@Huang-Wei (Member) commented Aug 4, 2019

Some "implicit" points to share here:

  • If the cluster is running using kubeadm (or an equivalent tool), the master isn't counted as a regular worker.
    func IsMasterNode(nodeName string) bool {
  • If all workers become unhealthy, an internal "fully disrupted" mode is triggered and eviction is paused - as there is no place to evict and re-place the pods.
    // - fullyDisrupted if there're no Ready Nodes,
  • By default, when a node becomes not ready, the pod carries tolerations for the unreachable and not-ready taints with a default 300-second grace period. So wait at least that long to see if it's evicted (a quick check is sketched below).

That means it's expected to see the pods staying there if there are no Ready worker nodes (again, the master node doesn't count, even if you take out the taints). What #55713 (comment) described is consistent with this.
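
To verify the third point in practice (pod name is a placeholder): the not-ready/unreachable tolerations with tolerationSeconds: 300 are added automatically by the DefaultTolerationSeconds admission plugin unless the pod sets its own, and you can watch whether the pod is actually removed once that period runs out:

# confirm the default tolerations are present on the pod
kubectl describe pod <pod-name> | grep -A4 Tolerations
# watch the pod; eviction should happen roughly 300s after the node goes NotReady
kubectl get pod <pod-name> -o wide -w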

So please re-check your env, and if you still think you hit a bug, please provide the following result:

  • kubectl version --short
  • kubectl get no -o wide
  • kubectl get po --all-namespaces -o wide
  • any featuregates you explicitly enabled/disabled
@neolit123 (Member) commented Aug 4, 2019

@Huang-Wei
thanks,

If the cluster is running using kubeadm (or an equivalent tool), master isn't counted as a regular worker.

there is actually a plan to deprecate IsMasterNode:
kubernetes/enhancements#1144
#80238

is this the logic you mention?
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/nodelifecycle/node_lifecycle_controller.go#L715-L717

it seems to use the node name, and in my tests the node does not have "master" in the name.
but kubeadm labels the node as node-role.kubernetes.io/master and taints it using the same node-role as key + effect:NoSchedule. so regular workloads will not schedule on the "master" node.

If all workers become unhealthy, it will trigger an internal "fully disrupted" mode, the eviction behavior will be paused - as there is no place to evict and re-place the pods.

this seems to apply to DaemonSets too. a DaemonSet Pod scheduled on a worker that became NotReady remains Running even past 300 seconds. if the node is shut down and NotReady, one has to manually clean up such a Pod. but deleting the Node (kubectl delete) does make the Pods terminate / reschedule.

this can be verified with kubeadm by observing the kube-proxy DaemonSet or the CNI plugin DaemonSet (e.g. WeaveNet).

By default, when a node becomes not ready, the pod will be applied with a toleration (with effect unreachable or not-ready) with default 300 seconds as a grace period. So wait for that time to see if it's evicted.

nope, Pods are not evicted and remain with status Ready after the grace period

That means it's as expected to see the pods staying there if there is no Ready worker nodes (again, master node doesn't count, even if you take out the taints). What #55713 (comment)
described was consistent with this.

i have only tested the single master + single worker case, but i think it's still viable as DaemonSet Pods remain Running on the NotReady worker node.

So please re-check your env, and if you still think you hit a bug, please provide the following result:

kubectl version --short

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-08-04T13:03:21Z", GoVersion:"go1.12.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:09:21Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

kubectl get no -o wide

NAME         STATUS     ROLES    AGE     VERSION        INTERNAL-IP     EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION      CONTAINER-RUNTIME
lubo2        NotReady   <none>   5m6s    v1.15.1        192.168.0.103   <none>        Ubuntu 17.10   4.13.0-41-generic   docker://18.6.3
luboitvbox   Ready      master   7m18s   v1.15.1        192.168.0.102   <none>        Ubuntu 17.10   4.13.0-41-generic   docker://18.6.3
right after the worker node joined:

$ kubectl get po --all-namespaces -o wide
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE     IP              NODE         NOMINATED NODE   READINESS GATES
kube-system   coredns-5c98db65d4-jfsls             1/1     Running   0          3m50s   10.32.0.2       luboitvbox   <none>           <none>
kube-system   coredns-5c98db65d4-q8tw2             1/1     Running   0          3m50s   10.44.0.1       lubo2        <none>           <none>
kube-system   etcd-luboitvbox                      1/1     Running   0          6m55s   192.168.0.102   luboitvbox   <none>           <none>
kube-system   kube-apiserver-luboitvbox            1/1     Running   0          7m12s   192.168.0.102   luboitvbox   <none>           <none>
kube-system   kube-controller-manager-luboitvbox   1/1     Running   0          7m5s    192.168.0.102   luboitvbox   <none>           <none>
kube-system   kube-proxy-ft4dt                     1/1     Running   0          8m7s    192.168.0.102   luboitvbox   <none>           <none>
kube-system   kube-proxy-rdgqp                     1/1     Running   0          6m6s    192.168.0.103   lubo2        <none>           <none>
kube-system   kube-scheduler-luboitvbox            1/1     Running   0          7m10s   192.168.0.102   luboitvbox   <none>           <none>
kube-system   weave-net-qr8vb                      2/2     Running   0          8m7s    192.168.0.102   luboitvbox   <none>           <none>
kube-system   weave-net-vxcsb                      2/2     Running   1          6m6s    192.168.0.103   lubo2        <none>           <none>

# after the grace period and the second node has become NotReady:
# one of the CoreDNS Pods (Deployment with 2 replicas) enters Terminating state, but remains there.
# a new replacement Pod for it is scheduled on the `luboitvbox` node.
# the weave / kube-proxy Pods (from DaemonSet) remain with status Running on the
# node `lubo2` that has been NotReady for more than 300 seconds.

$ kubectl get po --all-namespaces -o wide
NAMESPACE     NAME                                 READY   STATUS        RESTARTS   AGE     IP              NODE         NOMINATED NODE   READINESS GATES
kube-system   coredns-5c98db65d4-fmmkh             1/1     Running       0          3m22s   10.32.0.3       luboitvbox   <none>           <none>
kube-system   coredns-5c98db65d4-jfsls             1/1     Running       0          9m45s   10.32.0.2       luboitvbox   <none>           <none>
kube-system   coredns-5c98db65d4-q8tw2             1/1     Terminating   0          9m45s   10.44.0.1       lubo2        <none>           <none>
kube-system   etcd-luboitvbox                      1/1     Running       0          12m     192.168.0.102   luboitvbox   <none>           <none>
kube-system   kube-apiserver-luboitvbox            1/1     Running       0          13m     192.168.0.102   luboitvbox   <none>           <none>
kube-system   kube-controller-manager-luboitvbox   1/1     Running       0          13m     192.168.0.102   luboitvbox   <none>           <none>
kube-system   kube-proxy-ft4dt                     1/1     Running       0          14m     192.168.0.102   luboitvbox   <none>           <none>
kube-system   kube-proxy-rdgqp                     1/1     Running       0          12m     192.168.0.103   lubo2        <none>           <none>
kube-system   kube-scheduler-luboitvbox            1/1     Running       0          13m     192.168.0.102   luboitvbox   <none>           <none>
kube-system   weave-net-qr8vb                      2/2     Running       0          14m     192.168.0.102   luboitvbox   <none>           <none>
kube-system   weave-net-vxcsb                      2/2     Running       1          12m     192.168.0.103   lubo2        <none>           <none>
$ kubectl describe po coredns-5c98db65d4-q8tw2 -n kube-system
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s

$ kubectl describe po kube-proxy-rdgqp -n kube-system
...
Tolerations:     
                 CriticalAddonsOnly
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule


$ kubectl describe po weave-net-vxcsb -n kube-system
...
Tolerations:     :NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule

any featuregates you explicitly enabled/disabled

nope.

i think what the users are expecting is that in cases of failure where the node cannot be recovered, the Pods associated with it should terminate or reschedule to different nodes.

also i think Node objects that are NotReady for more than > X minutes should be optionally garbage collected (not sure if there is an option for that already).
but perhaps that's something for https://github.com/kubernetes/autoscaler or an external controller to perform.

@tedyu (Contributor) commented Aug 4, 2019

Node objects that are not NotReady for more than > X minutes should be optionally garbage collected

I guess the double negation is unintended - you meant 'are NotReady for more than > X minutes'.

@neolit123 (Member) commented Aug 4, 2019

I guess the double negation is unintended - you meant 'are NotReady for more than > X minutes'.

yes! updated.

@Huang-Wei (Member) commented Aug 5, 2019

there is actually a plan to deprecate IsMasterNode

That sounds good. At least it gives kubeadm users an option to use two machines to build a "real" 2-node cluster, so as to avoid "fully disrupted" mode.

this seems to apply to DaemonSets too. a DaemonSet Pod scheduled on a worker that became NotReady remains Running even past 300 seconds.

Daemonset is special here. By default, a DaemonSet pod is given multiple tolerations: not-ready:NoExecute, unreachable:NoExecute, unschedulable:NoSchedule, etc. So a DaemonSet pod can tolerate almost all unhealthy conditions (taints) of a node.

BTW: it's also pointless to "remove" a daemonset pod to another node, b/c daemonset literally means one pod per node.

if the node is shutdown but NotReady one has to manually clean such a Pod. but deleting the Node (kubectl delete) does make the Pods terminate / reschedule.

If a Node becomes NotReady, the admin/operator should get the issue fixed. And if it's fixed by simple deletion (k delete node), the pods are cleaned up as well. So the behavior sounds reasonable.

# after the grace period and the second node has become NotReady:
# one of the CoreDNS Pods (Deployment with 2 replicas) enters Terminating state, but remains there.
# a new replacement Pod for it is scheduled on the luboitvbox node.

This is what we discussed here. CoreDNS is a Deployment, so in a non-fully-disrupted env, its pod on the "unreachable" node will be evicted after 300 seconds. In your case, a replacement pod is put onto node luboitvbox.

Why is it still "Terminating"? It's because a pod deletion from the apiserver/etcd needs a "2-phase verification": one phase is setting metadata.deletionTimestamp to a non-nil value (done by the controller-manager), the other is getting the deletion confirmation from the kubelet. Apparently the latter can't be satisfied, so the pod metadata is still there. But if the node comes back online, the controller-manager/kubelet will get it reconciled.
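
A quick way to see that first phase on the stuck CoreDNS pod from the earlier output (a non-empty result means the deletion was requested but never confirmed by the kubelet):

kubectl get pod coredns-5c98db65d4-q8tw2 -n kube-system -o jsonpath='{.metadata.deletionTimestamp}'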

# the weave / kube-proxy Pods (from DaemonSet) remain with status Running on the
# node lubo2 that has been NotReady for more than 300 seconds.

Again, Daemonset is different.

@misterikkit (Contributor) commented Aug 5, 2019

For anyone who encountered this issue with StatefulSet pods, this is a known issue that we hope to address with kubernetes/enhancements#1116.

In particular, StatefulSetController does not create a replacement pod until it is sure that the volume has been unmounted. The only way to be sure today is with pod deletion, because we can't distinguish between, say, a network partition and a node that has lost power.

Same issue here. The pods on the cordoned nodes are not terminated and started on the other ready nodes. After manually force-terminating the pods, the pods are rescheduled on the same cordoned/NotReady,SchedulingDisabled nodes.

@yanchicago I think you are hitting a slightly different problem. I have seen this in my lab, where I delete a pod from a cordoned node, ReplicaSetController creates a new pod, and it gets scheduled to the cordoned node (because it has a blanket toleration).

@misterikkit (Contributor) commented Aug 5, 2019

if the node is shutdown but NotReady one has to manually clean such a Pod. but deleting the Node (kubectl delete) does make the Pods terminate / reschedule.

If a Node becomes NotReady, the admin/operator should get the issue fixed. And if it's fixed by simple deletion (k delete node), the pods are cleaned up as well. So the behavior sounds reasonable.

It would be better to force-delete the pods on that node, assuming the node is going to rejoin the cluster when it is healthy again. IIRC Node objects being deleted and recreated caused other issues in the past.
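
Concretely, that manual cleanup looks like this (pod and namespace are placeholders; note this only removes the API object and cannot guarantee the container actually stopped on the unreachable node):

kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force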

@neolit123 (Member) commented Aug 5, 2019

@Huang-Wei thanks for the explanation.
this clarifies the picture and it seems that ultimately node cleanup is in the hands of the operator, which will then trigger rescheduling / removal of the pods hosted on such a node.

@sarjeet2013 commented Sep 17, 2019

I am currently seeing the same behavior in 1.15. From what I read, this is still the intended behavior for StatefulSet pods. But I am seeing that the DaemonSet pod still continues to show up as Running, while the StatefulSet pods are stuck in Terminating status. Is this intended behavior as well?

What's the temporary workaround or solution for now, until Kubernetes fixes it natively? Is force-deleting a pod a viable option, or is deleting the node object considered viable?

@mattmattox commented Sep 24, 2019

I was able to work around this using this script to force-drain any node that has been in NotReady status for longer than 5 minutes (adjustable); it will then uncordon the node after it returns.

@RajdeepSardar commented Oct 5, 2019

I found pods still Running while the node had become NotReady. But I noticed that the pods terminated about 5 minutes after the node became NotReady.

Digging deeper, I found 2 parameters useful here.

pod-eviction-timeout at the Kubernetes cluster level

If you want to manage this at the pod level, you can add tolerations. After adding the tolerations below, my pod started terminating within 30 seconds:

spec:
  tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 30
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 30

Details about related cluster parameters can be found here
https://github.com/kubernetes-sigs/kubespray/blob/master/docs/kubernetes-reliability.md

@holmesb commented Oct 10, 2019

"BTW: it's also pointless to "remove" a daemonset pod to another node, b/c daemonset literally means one pod per node."

@Huang-Wei, since daemonset pods remain Running throughout a node failure that is not due to an OS/K8s crash, this can cause problems when the node returns. For example, MetalLB DS pods publish routes to the BGP router at startup, but only infrequently thereafter. If already "Running", they take a long time to publish routes, which can cause network problems and downtime. If I manually restart the DS pod on the recovered node, MLB works fine. Also, as others have suggested, using this script is a valid workaround, but not a root fix.

I have raised an MLB issue to make this route-publishing event occur more frequently, but it would help if k8s made this "remain running during node failure" DS pod behaviour configurable, so they are restarted upon node recovery.
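
For reference, restarting the MetalLB DaemonSet (speaker) pod manually looks roughly like this (assuming MetalLB's stock metallb-system namespace; names will differ per install):

# find the speaker pod on the recovered node, then delete it so the DaemonSet recreates it
kubectl get pods -n metallb-system -o wide | grep <recovered-node>
kubectl delete pod -n metallb-system <speaker-pod-on-that-node>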

@non7top commented Jan 8, 2020

Still an issue. Simple scenario: a cluster with 3 masters and 3 workers (all in VirtualBox); shut down all of them, then start 2 of each type, so 1 worker and 1 master are down.
Right after startup all nodes are marked as Ready (which is also not intuitive). After the default timeouts, 2 nodes are marked as NotReady as expected (since they are powered off). One of the coredns pods gets relocated to a running node (expected behaviour). Otherwise, all pods from the 2 non-existent nodes are still marked as Running, which is totally unexpected and obviously not right. At the very least those should be indicated as having problems.

(and yes, I waited for 40 minutes as you can see on the coredns pod)

non7top@kube-master01:~$ kubectl version --short
Client Version: v1.17.0
Server Version: v1.17.0
non7top@kube-master01:~$ kubectl get no -o wide
NAME            STATUS     ROLES    AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
kube-master01   Ready      master   29h   v1.17.0   10.0.10.51    <none>        Ubuntu 18.04.3 LTS   4.15.0-72-generic   docker://19.3.5
kube-master02   Ready      master   27h   v1.17.0   10.0.10.105   <none>        Ubuntu 18.04.3 LTS   4.15.0-72-generic   docker://19.3.5
kube-master03   NotReady   master   23h   v1.17.0   10.0.10.229   <none>        Ubuntu 18.04.3 LTS   4.15.0-72-generic   docker://19.3.5
kube-node01     Ready      <none>   28h   v1.17.0   10.0.10.111   <none>        Ubuntu 18.04.3 LTS   4.15.0-72-generic   docker://19.3.5
kube-node02     NotReady   <none>   27h   v1.17.0   10.0.10.183   <none>        Ubuntu 18.04.3 LTS   4.15.0-72-generic   docker://19.3.5
kube-node03     Ready      <none>   30m   v1.17.0   10.0.10.157   <none>        Ubuntu 18.04.3 LTS   4.15.0-74-generic   docker://19.3.5
non7top@kube-master01:~$ kubectl get po --all-namespaces -o wide
NAMESPACE     NAME                                    READY   STATUS        RESTARTS   AGE   IP            NODE            NOMINATED NODE   READINESS GATES
kube-system   coredns-6955765f44-mtcpm                1/1     Running       0          43m   10.244.0.6    kube-master01   <none>           <none>
kube-system   coredns-6955765f44-xgpfh                1/1     Running       1          23h   10.244.1.3    kube-node01     <none>           <none>
kube-system   coredns-6955765f44-zkgn6                1/1     Terminating   0          23h   10.244.2.2    kube-node02     <none>           <none>
kube-system   etcd-kube-master01                      1/1     Running       8          29h   10.0.10.51    kube-master01   <none>           <none>
kube-system   etcd-kube-master02                      1/1     Running       3          24h   10.0.10.105   kube-master02   <none>           <none>
kube-system   etcd-kube-master03                      1/1     Running       1          23h   10.0.10.229   kube-master03   <none>           <none>
kube-system   kube-apiserver-kube-master01            1/1     Running       199        29h   10.0.10.51    kube-master01   <none>           <none>
kube-system   kube-apiserver-kube-master02            1/1     Running       6          24h   10.0.10.105   kube-master02   <none>           <none>
kube-system   kube-apiserver-kube-master03            1/1     Running       1          23h   10.0.10.229   kube-master03   <none>           <none>
kube-system   kube-controller-manager-kube-master01   1/1     Running       9          29h   10.0.10.51    kube-master01   <none>           <none>
kube-system   kube-controller-manager-kube-master02   1/1     Running       3          24h   10.0.10.105   kube-master02   <none>           <none>
kube-system   kube-controller-manager-kube-master03   1/1     Running       1          23h   10.0.10.229   kube-master03   <none>           <none>
kube-system   kube-flannel-ds-amd64-b5b46             1/1     Running       1          23h   10.0.10.229   kube-master03   <none>           <none>
kube-system   kube-flannel-ds-amd64-pchx5             1/1     Running       9          28h   10.0.10.51    kube-master01   <none>           <none>
kube-system   kube-flannel-ds-amd64-rxmfj             1/1     Running       4          28h   10.0.10.112   kube-node01     <none>           <none>
kube-system   kube-flannel-ds-amd64-swjzr             1/1     Running       9          24h   10.0.10.105   kube-master02   <none>           <none>
kube-system   kube-flannel-ds-amd64-tsp6n             1/1     Running       0          22m   10.0.10.157   kube-node03     <none>           <none>
kube-system   kube-flannel-ds-amd64-z8wlj             1/1     Running       0          27h   10.0.10.183   kube-node02     <none>           <none>
kube-system   kube-proxy-6z8b2                        1/1     Running       0          30m   10.0.10.157   kube-node03     <none>           <none>
kube-system   kube-proxy-gkrdz                        1/1     Running       4          29h   10.0.10.51    kube-master01   <none>           <none>
kube-system   kube-proxy-lrsh6                        1/1     Running       0          27h   10.0.10.183   kube-node02     <none>           <none>
kube-system   kube-proxy-p5jqd                        1/1     Running       1          23h   10.0.10.229   kube-master03   <none>           <none>
kube-system   kube-proxy-pgzxl                        1/1     Running       1          28h   10.0.10.112   kube-node01     <none>           <none>
kube-system   kube-proxy-rf267                        1/1     Running       2          27h   10.0.10.105   kube-master02   <none>           <none>
kube-system   kube-scheduler-kube-master01            1/1     Running       8          29h   10.0.10.51    kube-master01   <none>           <none>
kube-system   kube-scheduler-kube-master02            1/1     Running       2          24h   10.0.10.105   kube-master02   <none>           <none>
kube-system   kube-scheduler-kube-master03            1/1     Running       1          23h   10.0.10.229   kube-master03   <none>           <none>