
docs: StatefulSet pod is never evicted from shutdown node #54368

Closed
at1984z opened this issue Oct 22, 2017 · 25 comments
Assignees
Labels
area/app-lifecycle area/node-lifecycle Issues or PRs related to Node lifecycle area/nodecontroller area/stateful-apps area/usability lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@at1984z

at1984z commented Oct 22, 2017

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:

  1. A StatefulSet at scale 1 is created.
  2. The only pod is placed and running on one of the 2 worker nodes.
  3. The worker node with the running pod shuts down and never starts up again.
  4. The pod is never moved to the other node.

What you expected to happen:

  1. The pod would move to the other node after the default tolerations "node.alpha.kubernetes.io/notReady:NoExecute for 300s" and "node.alpha.kubernetes.io/unreachable:NoExecute for 300s" expire.
  2. Alternatively, "kubectl delete pod pod-on-shutdown-node" would induce the expected movement while the node is down -- that did not happen either.

How to reproduce it (as minimally and precisely as possible):

  1. Create a StatefulSet spec with one container and one replica in, say, sset.yml (a minimal sketch follows this list).
  2. Have a Kubernetes installation with 2 worker nodes.
  3. kubectl create -f sset.yml
  4. kubectl get pod, to see on which node the only pod is scheduled, say, node N.
  5. Shut down node N with "shutdown -h".
  6. Check that the pod has not moved to the other worker node within 10 minutes of node N halting.
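
For reference, a minimal sset.yml sketch that matches the pod described later in this thread (apps/v1beta2 was the current StatefulSet API version for 1.8; the headless Service named "nginx" is an assumption and must be created separately):

    apiVersion: apps/v1beta2          # use apps/v1 on 1.9+ clusters
    kind: StatefulSet
    metadata:
      name: web
    spec:
      serviceName: "nginx"            # assumes a headless Service "nginx" already exists
      replicas: 1
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          terminationGracePeriodSeconds: 10
          containers:
          - name: nginx
            image: gcr.io/google_containers/nginx-slim:0.8
            ports:
            - containerPort: 80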

Anything else we need to know?:

  1. A Deployment behaves as indicated in the "What you expected to happen" section.

Environment:

  • Kubernetes version (use kubectl version): 1.8.1
  • Cloud provider or hardware configuration: Virtual machines with Vagrant 2.0.0 and VirtualBox 5.1.28-117968 on an Intel(R) Xeon(R) CPU E5-2690 v3 (24 cores) with Ubuntu 16.04 LTS
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.3 LTS (VM)
  • Kernel (e.g. uname -a): 4.4.0-96-generic (VM)
  • Install tools: kubeadm 1.8.1-00
  • Others:

Edit: The goal of this issue is to update the documentation and clarify the expected behavior, as per: #54368 (comment)

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 22, 2017
@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Oct 22, 2017
@at1984z
Author

at1984z commented Oct 22, 2017

/sig area/app-lifecycle area/stateful-apps area/usability

@at1984z
Author

at1984z commented Oct 22, 2017

/sig area/node-lifecycle are/nodecontroller

@dims
Member

dims commented Oct 22, 2017

/area app-lifecycle
/area stateful-apps
/area usability
/area node-lifecycle
/area nodecontroller

@dims
Member

dims commented Oct 22, 2017

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Oct 22, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Oct 22, 2017
@dixudx
Member

dixudx commented Oct 23, 2017

/assign

@dixudx
Member

dixudx commented Oct 24, 2017

Thanks @at1984z for finding this bug.

I did reproduce this.

The pod gets stuck and cannot be evicted.

root@server-01:~# kubectl get node
NAME        STATUS     ROLES     AGE       VERSION
server-01   Ready      master    1d        v1.8.1
server-02   NotReady   <none>    1d        v1.8.1
root@server-01:~# kubectl get sts
NAME      DESIRED   CURRENT   AGE
web       1         1         21h
root@server-01:~# kubectl describe pod web-0
Name:                      web-0
Namespace:                 default
Node:                      server-02/10.0.2.15
Start Time:                Mon, 23 Oct 2017 05:15:39 +0000
Labels:                    app=nginx
                           controller-revision-hash=web-68cf95767f
Annotations:               kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"StatefulSet","namespace":"default","name":"web","uid":"3599ae6e-b7b1-11e7-a673-023e88328eba","apiVersion":...
Status:                    Terminating (expires Mon, 23 Oct 2017 08:22:52 +0000)
Termination Grace Period:  10s
Reason:                    NodeLost
Message:                   Node server-02 which was running pod web-0 is unresponsive
IP:                        172.17.0.2
Created By:                StatefulSet/web
Controlled By:             StatefulSet/web
Containers:
  nginx:
    Container ID:   docker://d3d1dc99ba5f6a29870f94fe2ad41124fe5326955973d736a65740841d03e118
    Image:          gcr.io/google_containers/nginx-slim:0.8
    Image ID:       docker-pullable://gcr.io/google_containers/nginx-slim@sha256:8b4501fe0fe221df663c22e16539f399e89594552f400408303c42f3dd8d0e52
    Port:           80/TCP
    State:          Running
      Started:      Mon, 23 Oct 2017 05:16:03 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-q26fw (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  default-token-q26fw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-q26fw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

And kubectl delete pod web-0 cannot delete it, even with --force.

@at1984z
Author

at1984z commented Oct 24, 2017

@dixudx I should clarify that with Deployments a new pod instance gets created on an available node after the notReady and unreachable tolerations expire, but the old instance is not evicted from the lost node and cannot be deleted either.

@dixudx
Member

dixudx commented Oct 25, 2017

old instance is not evicted from the lost node and it cannot be deleted either

@at1984z Yes. This is because pod.spec.terminationGracePeriodSeconds is set to a non-zero value, so the API server waits for the kubelet to confirm graceful termination; with the node unreachable, that confirmation never arrives and the pod stays stuck in Terminating.

Please refer to #54472 and my comment.
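
For completeness, a sketch of the force-deletion form documented for StatefulSet pods, which skips waiting for the (unreachable) kubelet to confirm termination. It should only be run once the node is known to be permanently down, since it can otherwise violate the at-most-one-pod guarantee:

    # bypass graceful termination entirely; only safe when the node is confirmed gone
    kubectl delete pod web-0 --grace-period=0 --force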

@smarterclayton
Contributor

This is by design. When a node goes "down", the master does not know whether it was a safe down (deliberate shutdown) or a network partition. If the master said "ok, the pod is deleted" then the pod could actually be running somewhere on the cluster, thus violating the guarantees of stateful sets only having one pod.

In your case, if you intend the node to be deleted, you must delete the node object. That will cause the master to understand that you wish the node to be gone, and delete the pods.
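
For example, using the node name from the reproduction above (a sketch; substitute your own node name):

    # tell the control plane the node is gone for good
    kubectl delete node server-02

After the Node object is deleted, the pods bound to it are cleaned up and the StatefulSet controller recreates web-0 on a remaining node.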

@smarterclayton
Contributor

smarterclayton commented Oct 25, 2017

If you think that https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#pod-identity does not clearly explain this behavior, we should fix the documentation to describe the expected outcome.

You can also see https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/pod-safety.md for a more detailed explanation why this is by design.

@at1984z
Author

at1984z commented Oct 25, 2017

@smarterclayton Why is the issue re-opened? Is it to fix the design and implementation or to fix documentation?

@smarterclayton
Contributor

To fix the documentation (if it wasn't clear that this is desired behavior, I think that's a documentation gap).

It's also possible the stateful set should record a better message on the pod indicating that without a positive admin action it can't safely reschedule the pod.

@at1984z
Author

at1984z commented Oct 25, 2017

@smarterclayton Given the explanations above, I'd appreciate the answers to a couple of questions below.

  1. Why do Deployment and ReplicationController behave differently from StatefulSet in the described scenario? Should an issue be filed against them? The network partitioning argument can be applied to them as well.
  2. How will the "Taint based Evictions" and "Taint Nodes by Conditions" features (see https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/) introduced in 1.8 be implemented and used, in light of this implied inability of Kubernetes to deal with network partitioning? Will the system operator have to monitor the nodes and set taints at will?

@dixudx
Member

dixudx commented Oct 26, 2017

Why do Deployment and ReplicationController behave differently from StatefulSet in the described scenario?

@at1984z This is because a StatefulSet is designed to maintain a sticky identity for each of its Pods. These pods are created in order from the same spec, with a stable network identity and stable storage, but they are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.

For Deployments and ReplicationControllers, no such restriction applies.

How will "Taint based Evictions" and "Taint Nodes by Conditions" features (see https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/) introduced in 1.8 be implemented and used in light of implied inability of kubernetes to deal with network partitioning? Will the system operator have to monitor the nodes and set taints at will?

If the node is down or inaccessible, the master stops receiving heartbeats from it. New pods will not be scheduled onto NotReady nodes, and the master cannot successfully evict the pods already running on them, because the kubelet is unreachable to confirm termination.

            {
                "lastHeartbeatTime": "2017-10-26T02:41:02Z",
                "lastTransitionTime": "2017-10-26T02:41:46Z",
                "message": "Kubelet stopped posting node status.",
                "reason": "NodeStatusUnknown",
                "status": "Unknown",
                "type": "Ready"
            }

Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes.
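
As an illustration, a sketch of shortening the default 300-second eviction delay by declaring an explicit toleration in the pod template (using the alpha key shown in the describe output above; later releases use node.kubernetes.io/not-ready and node.kubernetes.io/unreachable):

      tolerations:
      - key: "node.alpha.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 60     # start eviction after 60s instead of 300s

For a StatefulSet pod this only changes when the pod enters Terminating; the safety argument above still prevents it from being rescheduled until the Node object is deleted or the pod is force-deleted.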

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 24, 2018
@saschagrunert
Member

Looks like the docs have not been updated yet.
/reopen

@saschagrunert saschagrunert removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 28, 2022
@k8s-ci-robot
Contributor

@saschagrunert: Reopened this issue.

In response to this:

Looks like the docs have not been updated, yet.
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Jul 28, 2022
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jul 28, 2022
@k8s-ci-robot
Contributor

@at1984z: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@saschagrunert saschagrunert changed the title StatefulSet pod is never evicted from shutdown node docs: StatefulSet pod is never evicted from shutdown node Jul 28, 2022
@pacoxu pacoxu added this to Triage in SIG Node Bugs Aug 3, 2022
@endocrimes endocrimes moved this from Triage to Done in SIG Node Bugs Aug 3, 2022
@endocrimes
Member

/remove-kind bug

@k8s-ci-robot k8s-ci-robot removed the kind/bug Categorizes issue or PR as related to a bug. label Aug 3, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 1, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 1, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 31, 2022