Pod running on NotReady node all the time #98511

Closed
CaoDonghui123 opened this issue Jan 28, 2021 · 10 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
kind/support: Categorizes issue or PR as a support question.
needs-sig: Indicates an issue or PR lacks a `sig/foo` label and requires one.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@CaoDonghui123 (Contributor)

What happened:
I have three nodes. When I shut down cdh-k8s-3.novalocal, the pods running on it stay in the Running state indefinitely.

# kubectl get node
NAME                  STATUS     ROLES                  AGE   VERSION
cdh-k8s-1.novalocal   Ready      control-plane,master   15d   v1.20.0
cdh-k8s-2.novalocal   Ready      <none>                 9d    v1.20.0
cdh-k8s-3.novalocal   NotReady   <none>                 9d    v1.20.0
# kubectl get pod -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP              NODE                  NOMINATED NODE   READINESS GATES
nginx-deployment-66b6c48dd5-5jtqv   1/1     Running   0          3h28m   10.244.26.110   cdh-k8s-3.novalocal   <none>           <none>
nginx-deployment-66b6c48dd5-fntn4   1/1     Running   0          3h28m   10.244.26.108   cdh-k8s-3.novalocal   <none>           <none>
nginx-deployment-66b6c48dd5-vz7hr   1/1     Running   0          3h28m   10.244.26.109   cdh-k8s-3.novalocal   <none>           <none>

What you expected to happen:
The pod status should change to Unknown.

How to reproduce it (as minimally and precisely as possible):
Steps (see the command sketch below):

- 1. Create a deployment with kubectl.
- 2. Shut down the node the pods are scheduled on.
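
A minimal sketch of the reproduction commands, assuming the manifest under "Anything else we need to know?" is saved as nginx-deployment.yaml (the file name is an assumption) and that cdh-k8s-3.novalocal is the node being shut down:

# kubectl apply -f nginx-deployment.yaml          # create the deployment
# kubectl get pod -o wide                         # note which node the pods land on
# ssh cdh-k8s-3.novalocal 'sudo shutdown -h now'  # power off that node (any hard shutdown works)
# kubectl get node                                # after ~40s the node shows NotReady
# kubectl get pod -o wide                         # the pods on it still show Running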

Anything else we need to know?:
The deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
# kubectl get deployment
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   0/3     3            0           3h28m

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0", GitCommit:"af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38", GitTreeState:"clean", BuildDate:"2020-12-08T17:59:43Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0", GitCommit:"af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38", GitTreeState:"clean", BuildDate:"2020-12-08T17:51:19Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@CaoDonghui123 added the kind/bug label on Jan 28, 2021
@k8s-ci-robot added the needs-sig label on Jan 28, 2021
@k8s-ci-robot (Contributor)

@CaoDonghui123: There are no sig labels on this issue. Please add an appropriate label by using one of the following commands:

  • /sig <group-name>
  • /wg <group-name>
  • /committee <group-name>

Please see the group list for a listing of the SIGs, working groups, and committees available.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-triage label on Jan 28, 2021
@k8s-ci-robot (Contributor)

@CaoDonghui123: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@CaoDonghui123 (Contributor, Author) commented Jan 28, 2021

I found this in the docs:

DaemonSet pods are created with NoExecute tolerations for the following taints with no tolerationSeconds:

node.kubernetes.io/unreachable
node.kubernetes.io/not-ready
This ensures that DaemonSet pods are never evicted due to these problems.
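
For illustration, those default tolerations look roughly like this in a DaemonSet pod's spec (a sketch based on the quoted docs, not pulled from this cluster; note there is no tolerationSeconds, so the pods tolerate the taints indefinitely):

      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"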

I'm not sure whether this needs to be fixed.
If it does, I will try to fix it.

@lunhuijie (Contributor) commented Jan 28, 2021

There are two relevant settings.
One is --node-monitor-grace-period: it controls when a node's status changes from Ready to NotReady (default 40s).
The other is --pod-eviction-timeout: it controls when the pods on a NotReady node are evicted and rescheduled onto other nodes. You can also set this per pod in its YAML via tolerationSeconds (default 300s).
I tried it: after waiting 300s, the pod status had changed.
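
For reference, both of these are kube-controller-manager flags; a sketch with the upstream default values (the values shown are the defaults, not taken from this cluster):

# kube-controller-manager flags
--node-monitor-grace-period=40s   # node is marked NotReady after this long without status updates
--pod-eviction-timeout=5m0s       # pods on a NotReady node are evicted after this long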

@neolit123 (Member)

please try the support channels:
https://github.com/kubernetes/kubernetes/blob/master/SUPPORT.md

/kind support
/close

@k8s-ci-robot added the kind/support label on Jan 28, 2021
@k8s-ci-robot (Contributor)

@neolit123: Closing this issue.

In response to this:

please try the support channels:
https://github.com/kubernetes/kubernetes/blob/master/SUPPORT.md

/kind support
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@CaoDonghui123 (Contributor, Author)

There are two relevant settings.
One is --node-monitor-grace-period: it controls when a node's status changes from Ready to NotReady (default 40s).
The other is --pod-eviction-timeout: it controls when the pods on a NotReady node are evicted and rescheduled onto other nodes. You can also set this per pod in its YAML via tolerationSeconds (default 300s).
I tried it: after waiting 300s, the pod status had changed.

@lunhuijie I think it has nothing to do with these two settings. I added this to the YAML:

      tolerations:
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 20
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 20

This may fix it.
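
A quick way to check (a sketch; it assumes the deployment above is re-applied with these tolerations added to the pod template):

# kubectl apply -f nginx-deployment.yaml
# kubectl get pod -o wide -w    # once the node goes NotReady, the pods should be evicted roughly 20s later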

@lunhuijie (Contributor)

What I mean is that this is a normal, expected scenario.
See https://www.qikqiak.com/post/kubelet-sync-node-status/ for more details.

@CaoDonghui123 (Contributor, Author)

You mean that if I wait 5m0s, the pod status will have changed? I waited overnight and the pod is still Running.

@lunhuijie (Contributor)

A wrong kubelet parameter can cause this to happen (I wanted to set nodeStatusUpdateFrequency = 4s but had set nodeStatusUpdateFrequency = "4s"); when I checked the kubelet logs I found nothing, and in that state the pod never changes its status. Check your parameters again; if there is no problem with them, maybe you can open another issue and remind me. Thanks a lot!
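
For reference, a minimal sketch of where that parameter lives when the kubelet is started with --config (nodeStatusUpdateFrequency is a real KubeletConfiguration field; the value shown is just the upstream default, not this cluster's setting):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
nodeStatusUpdateFrequency: 10s   # how often the kubelet posts its node status to the API server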
