Pod running on NotReady node all the time #98511

Closed
CaoDonghui123 opened this issue Jan 28, 2021 · 10 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
kind/support: Categorizes issue or PR as a support question.
needs-sig: Indicates an issue or PR lacks a `sig/foo` label and requires one.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@CaoDonghui123 (Contributor)

What happened:
I have three nodes. When I shut down cdh-k8s-3.novalocal, the pods running on it stay in the Running state indefinitely.

# kubectl get node
NAME                  STATUS     ROLES                  AGE   VERSION
cdh-k8s-1.novalocal   Ready      control-plane,master   15d   v1.20.0
cdh-k8s-2.novalocal   Ready      <none>                 9d    v1.20.0
cdh-k8s-3.novalocal   NotReady   <none>                 9d    v1.20.0
# kubectl get pod -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP              NODE                  NOMINATED NODE   READINESS GATES
nginx-deployment-66b6c48dd5-5jtqv   1/1     Running   0          3h28m   10.244.26.110   cdh-k8s-3.novalocal   <none>           <none>
nginx-deployment-66b6c48dd5-fntn4   1/1     Running   0          3h28m   10.244.26.108   cdh-k8s-3.novalocal   <none>           <none>
nginx-deployment-66b6c48dd5-vz7hr   1/1     Running   0          3h28m   10.244.26.109   cdh-k8s-3.novalocal   <none>           <none>

What you expected to happen:
The pod status should change to Unknown.

How to reproduce it (as minimally and precisely as possible):
Steps (see the command sketch below):

- 1. Create a deployment with kubectl.
- 2. Shut down the node the pods are scheduled on.
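
A minimal sketch of the reproduction commands, assuming the manifest under "Anything else we need to know?" is saved as nginx-deployment.yaml (the file name is an assumption) and that cdh-k8s-3.novalocal is the node being shut down:

# kubectl apply -f nginx-deployment.yaml          # create the deployment
# kubectl get pod -o wide                         # note which node the pods land on
# ssh cdh-k8s-3.novalocal 'sudo shutdown -h now'  # power off that node (any hard shutdown works)
# kubectl get node                                # after ~40s the node shows NotReady
# kubectl get pod -o wide                         # the pods on it still show Running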

Anything else we need to know?:
The deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
# kubectl get deployment
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   0/3     3            0           3h28m

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0", GitCommit:"af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38", GitTreeState:"clean", BuildDate:"2020-12-08T17:59:43Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0", GitCommit:"af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38", GitTreeState:"clean", BuildDate:"2020-12-08T17:51:19Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@CaoDonghui123 added the kind/bug label on Jan 28, 2021
@k8s-ci-robot added the needs-sig label on Jan 28, 2021
@k8s-ci-robot (Contributor)

@CaoDonghui123: There are no sig labels on this issue. Please add an appropriate label by using one of the following commands:

  • /sig <group-name>
  • /wg <group-name>
  • /committee <group-name>

Please see the group list for a listing of the SIGs, working groups, and committees available.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-triage label on Jan 28, 2021
@k8s-ci-robot (Contributor)

@CaoDonghui123: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@CaoDonghui123 (Contributor, Author) commented Jan 28, 2021

I found this in the docs:

DaemonSet pods are created with NoExecute tolerations for the following taints with no tolerationSeconds:

node.kubernetes.io/unreachable
node.kubernetes.io/not-ready
This ensures that DaemonSet pods are never evicted due to these problems.
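
For illustration, those default tolerations look roughly like this in a DaemonSet pod's spec (a sketch based on the quoted docs, not pulled from this cluster; note there is no tolerationSeconds, so the pods tolerate the taints indefinitely):

      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"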

I'm not sure whether this needs to be fixed.
If it does, I will try to fix it.

@lunhuijie (Contributor) commented Jan 28, 2021

There are two relevant settings.
One is --node-monitor-grace-period: it controls when a node's status changes from Ready to NotReady (default 40s).
The other is --pod-eviction-timeout: it controls when the pods on a NotReady node are evicted and rescheduled onto other nodes. You can also set this per pod in its YAML via tolerationSeconds (default 300s).
I tried it: after waiting 300s, the pod status had changed.
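
For reference, both of these are kube-controller-manager flags; a sketch with the upstream default values (the values shown are the defaults, not taken from this cluster):

# kube-controller-manager flags
--node-monitor-grace-period=40s   # node is marked NotReady after this long without status updates
--pod-eviction-timeout=5m0s       # pods on a NotReady node are evicted after this long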

@neolit123 (Member)

please try the support channels:
https://github.com/kubernetes/kubernetes/blob/master/SUPPORT.md

/kind support
/close

@k8s-ci-robot added the kind/support label on Jan 28, 2021
@k8s-ci-robot (Contributor)

@neolit123: Closing this issue.

In response to this:

please try the support channels:
https://github.com/kubernetes/kubernetes/blob/master/SUPPORT.md

/kind support
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@CaoDonghui123 (Contributor, Author)

There are two relevant settings.
One is --node-monitor-grace-period: it controls when a node's status changes from Ready to NotReady (default 40s).
The other is --pod-eviction-timeout: it controls when the pods on a NotReady node are evicted and rescheduled onto other nodes. You can also set this per pod in its YAML via tolerationSeconds (default 300s).
I tried it: after waiting 300s, the pod status had changed.

@lunhuijie I think it has nothing to do with these two settings. I added this to the YAML:

      tolerations:
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 20
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 20

This may fix it.
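
A quick way to check (a sketch; it assumes the deployment above is re-applied with these tolerations added to the pod template):

# kubectl apply -f nginx-deployment.yaml
# kubectl get pod -o wide -w    # once the node goes NotReady, the pods should be evicted roughly 20s later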

@lunhuijie (Contributor)

What I mean is that this is a normal, expected scenario.
See https://www.qikqiak.com/post/kubelet-sync-node-status/ for more details.

@CaoDonghui123 (Contributor, Author)

You mean that if I wait 5m0s, the pod status will have changed? I waited overnight and the pod is still Running.

@lunhuijie (Contributor)

A wrong kubelet parameter can cause this to happen (I wanted to set nodeStatusUpdateFrequency = 4s but had set nodeStatusUpdateFrequency = "4s"); when I checked the kubelet logs I found nothing, and in that state the pod never changes its status. Check your parameters again; if there is no problem with them, maybe you can open another issue and remind me. Thanks a lot!
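
For reference, a minimal sketch of where that parameter lives when the kubelet is started with --config (nodeStatusUpdateFrequency is a real KubeletConfiguration field; the value shown is just the upstream default, not this cluster's setting):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
nodeStatusUpdateFrequency: 10s   # how often the kubelet posts its node status to the API server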
