
docs: StatefulSet pod is never evicted from shutdown node #54368

Closed
at1984z opened this issue Oct 22, 2017 · 25 comments
Assignees
Labels
area/app-lifecycle area/node-lifecycle Issues or PRs related to Node lifecycle area/nodecontroller area/stateful-apps area/usability lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@at1984z

at1984z commented Oct 22, 2017

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:

  1. A StatefulSet at scale 1 is created.
  2. The only pod is placed and running on one of the 2 worker nodes.
  3. The worker node with the running pod shuts down and never starts up again.
  4. The pod is never moved to the other node.

What you expected to happen:

  1. The pod would move to the other node after the default tolerations "node.alpha.kubernetes.io/notReady:NoExecute for 300s" and "node.alpha.kubernetes.io/unreachable:NoExecute for 300s" expire.
  2. Alternatively, "kubectl delete pod pod-on-shutdown-node" would induce the expected movement while the node is down -- that did not happen either.

How to reproduce it (as minimally and precisely as possible):

  1. Create a StatefulSet spec with one container and one replica in, say, sset.yml (a minimal sketch follows this list).
  2. Have a Kubernetes installation with 2 worker nodes.
  3. kubectl create -f sset.yml
  4. kubectl get pod, to see on which node the only pod is scheduled, say, node N.
  5. Shut down node N with "shutdown -h".
  6. Check that the pod has not moved to the other worker node within 10 minutes of node N halting.
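
For reference, a minimal sset.yml sketch that matches the pod described later in this thread (apps/v1beta2 was the current StatefulSet API version for 1.8; the headless Service named "nginx" is an assumption and must be created separately):

    apiVersion: apps/v1beta2          # use apps/v1 on 1.9+ clusters
    kind: StatefulSet
    metadata:
      name: web
    spec:
      serviceName: "nginx"            # assumes a headless Service "nginx" already exists
      replicas: 1
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          terminationGracePeriodSeconds: 10
          containers:
          - name: nginx
            image: gcr.io/google_containers/nginx-slim:0.8
            ports:
            - containerPort: 80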

Anything else we need to know?:

  1. A Deployment behaves as indicated in the "What you expected to happen" section.

Environment:

  • Kubernetes version (use kubectl version): 1.8.1
  • Cloud provider or hardware configuration: Virtual machines with Vagrant 2.0.0 and VirtualBox 5.1.28-117968 on an Intel(R) Xeon(R) CPU E5-2690 v3 (24 cores) with Ubuntu 16.04 LTS
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.3 LTS (VM)
  • Kernel (e.g. uname -a): 4.4.0-96-generic (VM)
  • Install tools: kubeadm 1.8.1-00
  • Others:

Edit: The goal of this issue is to update the documentation and clarify the expected behavior, as per: #54368 (comment)

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 22, 2017
@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Oct 22, 2017
@at1984z
Author

at1984z commented Oct 22, 2017

/sig area/app-lifecycle area/stateful-apps area/usability

@at1984z
Author

at1984z commented Oct 22, 2017

/sig area/node-lifecycle are/nodecontroller

@dims
Member

dims commented Oct 22, 2017

/area app-lifecycle
/area stateful-apps
/area usability
/area node-lifecycle
/area nodecontroller

@dims
Member

dims commented Oct 22, 2017

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Oct 22, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Oct 22, 2017
@dixudx
Member

dixudx commented Oct 23, 2017

/assign

@dixudx
Member

dixudx commented Oct 24, 2017

Thanks @at1984z for finding this bug.

I did reproduce this.

The pod gets stuck and cannot be evicted.

root@server-01:~# kubectl get node
NAME        STATUS     ROLES     AGE       VERSION
server-01   Ready      master    1d        v1.8.1
server-02   NotReady   <none>    1d        v1.8.1
root@server-01:~# kubectl get sts
NAME      DESIRED   CURRENT   AGE
web       1         1         21h
root@server-01:~# kubectl describe pod web-0
Name:                      web-0
Namespace:                 default
Node:                      server-02/10.0.2.15
Start Time:                Mon, 23 Oct 2017 05:15:39 +0000
Labels:                    app=nginx
                           controller-revision-hash=web-68cf95767f
Annotations:               kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"StatefulSet","namespace":"default","name":"web","uid":"3599ae6e-b7b1-11e7-a673-023e88328eba","apiVersion":...
Status:                    Terminating (expires Mon, 23 Oct 2017 08:22:52 +0000)
Termination Grace Period:  10s
Reason:                    NodeLost
Message:                   Node server-02 which was running pod web-0 is unresponsive
IP:                        172.17.0.2
Created By:                StatefulSet/web
Controlled By:             StatefulSet/web
Containers:
  nginx:
    Container ID:   docker://d3d1dc99ba5f6a29870f94fe2ad41124fe5326955973d736a65740841d03e118
    Image:          gcr.io/google_containers/nginx-slim:0.8
    Image ID:       docker-pullable://gcr.io/google_containers/nginx-slim@sha256:8b4501fe0fe221df663c22e16539f399e89594552f400408303c42f3dd8d0e52
    Port:           80/TCP
    State:          Running
      Started:      Mon, 23 Oct 2017 05:16:03 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-q26fw (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  default-token-q26fw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-q26fw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

And kubectl delete pod web-0 cannot delete it, even with --force.

@at1984z
Author

at1984z commented Oct 24, 2017

@dixudx I should clarify that with Deployments a new pod instance gets created on an available node after the notReady and unreachable tolerations expire, but the old instance is not evicted from the lost node and cannot be deleted either.

@dixudx
Member

dixudx commented Oct 25, 2017

old instance is not evicted from the lost node and it cannot be deleted either

@at1984z Yes. This is because pod.spec.terminationGracePeriodSeconds is set to a non-zero value, so the API server waits for the kubelet to confirm graceful termination; with the node unreachable, that confirmation never arrives and the pod stays stuck in Terminating.

Please refer to #54472 and my comment.
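
For completeness, a sketch of the force-deletion form documented for StatefulSet pods, which skips waiting for the (unreachable) kubelet to confirm termination. It should only be run once the node is known to be permanently down, since it can otherwise violate the at-most-one-pod guarantee:

    # bypass graceful termination entirely; only safe when the node is confirmed gone
    kubectl delete pod web-0 --grace-period=0 --force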

@smarterclayton
Contributor

This is by design. When a node goes "down", the master does not know whether it was a safe down (deliberate shutdown) or a network partition. If the master said "ok, the pod is deleted" then the pod could actually be running somewhere on the cluster, thus violating the guarantees of stateful sets only having one pod.

In your case, if you intend the node to be deleted, you must delete the node object. That will cause the master to understand that you wish the node to be gone, and delete the pods.
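
For example, using the node name from the reproduction above (a sketch; substitute your own node name):

    # tell the control plane the node is gone for good
    kubectl delete node server-02

After the Node object is deleted, the pods bound to it are cleaned up and the StatefulSet controller recreates web-0 on a remaining node.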

@smarterclayton
Contributor

smarterclayton commented Oct 25, 2017

If you think that https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#pod-identity does not clearly explain this behavior, we should fix the documentation to describe the expected outcome.

You can also see https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/pod-safety.md for a more detailed explanation why this is by design.

@at1984z
Author

at1984z commented Oct 25, 2017

@smarterclayton Why is the issue re-opened? Is it to fix the design and implementation or to fix documentation?

@smarterclayton
Contributor

To fix the documentation (if it wasn't clear that this is desired behavior, I think that's a documentation gap).

It's also possible the stateful set should record a better message on the pod indicating that without a positive admin action it can't safely reschedule the pod.

@at1984z
Author

at1984z commented Oct 25, 2017

@smarterclayton Given the explanations above, I'd appreciate the answers to a couple of questions below.

  1. Why do Deployment and ReplicationController behave differently from StatefulSet in the described scenario? Should an issue be filed against them? The network partitioning argument can be applied to them as well.
  2. How will the "Taint based Evictions" and "Taint Nodes by Conditions" features (see https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/) introduced in 1.8 be implemented and used, in light of this implied inability of Kubernetes to deal with network partitioning? Will the system operator have to monitor the nodes and set taints at will?

@dixudx
Member

dixudx commented Oct 26, 2017

Why do Deployment and ReplicationController behave differently from StatefulSet in the described scenario?

@at1984z This is because a StatefulSet is designed to maintain a sticky identity for each of its Pods. These pods are created in order from the same spec, with a stable network identity and stable storage, but they are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.

For Deployments and ReplicationControllers, no such restriction applies.

How will "Taint based Evictions" and "Taint Nodes by Conditions" features (see https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/) introduced in 1.8 be implemented and used in light of implied inability of kubernetes to deal with network partitioning? Will the system operator have to monitor the nodes and set taints at will?

If the node is down or inaccessible, the master stops receiving heartbeats from it. New pods will not be scheduled onto NotReady nodes, and the master cannot successfully evict the pods already running on them, because the kubelet is unreachable to confirm termination.

            {
                "lastHeartbeatTime": "2017-10-26T02:41:02Z",
                "lastTransitionTime": "2017-10-26T02:41:46Z",
                "message": "Kubelet stopped posting node status.",
                "reason": "NodeStatusUnknown",
                "status": "Unknown",
                "type": "Ready"
            }

Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes.
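
As an illustration, a sketch of shortening the default 300-second eviction delay by declaring an explicit toleration in the pod template (using the alpha key shown in the describe output above; later releases use node.kubernetes.io/not-ready and node.kubernetes.io/unreachable):

      tolerations:
      - key: "node.alpha.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 60     # start eviction after 60s instead of 300s

For a StatefulSet pod this only changes when the pod enters Terminating; the safety argument above still prevents it from being rescheduled until the Node object is deleted or the pod is force-deleted.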

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 24, 2018
@saschagrunert
Member

Looks like the docs have not been updated yet.
/reopen

@saschagrunert saschagrunert removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 28, 2022
@k8s-ci-robot
Contributor

@saschagrunert: Reopened this issue.

In response to this:

Looks like the docs have not been updated, yet.
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Jul 28, 2022
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jul 28, 2022
@k8s-ci-robot
Contributor

@at1984z: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@saschagrunert saschagrunert changed the title StatefulSet pod is never evicted from shutdown node docs: StatefulSet pod is never evicted from shutdown node Jul 28, 2022
@pacoxu pacoxu added this to Triage in SIG Node Bugs Aug 3, 2022
@endocrimes endocrimes moved this from Triage to Done in SIG Node Bugs Aug 3, 2022
@endocrimes
Member

/remove-kind bug

@k8s-ci-robot k8s-ci-robot removed the kind/bug Categorizes issue or PR as related to a bug. label Aug 3, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 1, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 1, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 31, 2022