Node controller to not force delete pods #35235

Merged

Conversation

foxish
Contributor

@foxish foxish commented Oct 20, 2016

Fixes #35145

  • Adds e2e tests covering PetSet, RC, and Job.
  • Removes the other locations where the NodeController force-deletes pods and covers them with tests.

Release note:

Node controller no longer force-deletes pods from the api-server.

* For StatefulSet (previously PetSet), this change means creation of replacement pods is blocked until old pods are definitely not running (indicated either by the kubelet returning from partitioned state, or deletion of the Node object, or deletion of the instance in the cloud provider, or force deletion of the pod from the api-server). This has the desirable outcome of "fencing" to prevent "split brain" scenarios; a sketch of the admin's options follows this list.
* For all other existing controllers, this has no effect on their ability to replace pods, because those controllers do not reuse pod names (they use generate-name).
* User-written controllers that reuse pod object names should evaluate this change.
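
A minimal sketch of the admin-side options above, assuming a StatefulSet pod named web-0 (selected by a hypothetical label app=web) that was running on a now-partitioned node named node-1; both names are placeholders:

```sh
# While node-1 is merely NotReady, the StatefulSet does not create a
# replacement for web-0; this can be observed with a watch:
kubectl get pods -l app=web --watch

# Once the admin has confirmed the old instance is truly gone, either of
# these unblocks the replacement:
kubectl delete node node-1                    # remove the Node object, or
kubectl delete pod web-0 --grace-period=0     # force-delete the pod from the api-server
```

Note that newer kubectl releases may additionally require --force alongside --grace-period=0.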

@foxish foxish added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Oct 20, 2016
@foxish foxish added this to the v1.5 milestone Oct 20, 2016
@foxish foxish self-assigned this Oct 20, 2016
@foxish foxish force-pushed the node-controller-no-force-deletion branch from 6b7c2b6 to 2c48694 Compare October 20, 2016 21:37
@foxish foxish added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Oct 20, 2016
@k8s-github-robot k8s-github-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 20, 2016
@AlexMioMio

@foxish Why are we abandoning this feature? I think force-deleting pods may be useful in some situations.

@foxish
Contributor Author

foxish commented Oct 21, 2016

@AlexMioMio, please see the discussion in #35145 and #34160, and feel free to comment there. This is still a WIP.

@smarterclayton
Contributor

We are not abandoning this feature, but we are making it the administrator's responsibility to decide to force delete. Node controller does not know whether the machine is partitioned and coming back or not, so if the node controller deletes those pods it can cause split brain in the cluster for pet sets.

@smarterclayton
Contributor

With this change, if you want to delete a node (because it's no longer running), delete the node and its pods will be cleaned up. If you want to delete a pod, run kubectl delete pod NAME --grace-period=0.
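
In command form (NAME is a placeholder; newer kubectl releases may also require --force alongside --grace-period=0):

```sh
# The machine is permanently gone: deleting the Node object cleans up its pods.
kubectl delete node NAME

# Force-delete a single pod from the api-server without waiting for the kubelet.
kubectl delete pod NAME --grace-period=0
```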

@foxish foxish force-pushed the node-controller-no-force-deletion branch from 2c48694 to 84e1676 Compare October 24, 2016 17:41
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 24, 2016
@foxish foxish closed this Oct 28, 2016
@foxish foxish deleted the node-controller-no-force-deletion branch October 28, 2016 17:02
@foxish foxish restored the node-controller-no-force-deletion branch October 28, 2016 18:53
@foxish foxish reopened this Oct 28, 2016
@foxish foxish force-pushed the node-controller-no-force-deletion branch from 02f5c1c to fa74c29 Compare October 28, 2016 22:11
@foxish foxish changed the title WIP: Node controller to not force delete pods Node controller to not force delete pods Oct 28, 2016
@foxish
Contributor Author

foxish commented Oct 28, 2016

Waiting for #35581 to merge before rebase. The other parts are now ready for review.

@k8s-github-robot k8s-github-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 28, 2016
@foxish foxish force-pushed the node-controller-no-force-deletion branch from fa74c29 to 92f4f21 Compare October 28, 2016 22:20
@foxish foxish assigned smarterclayton and erictune and unassigned foxish Oct 28, 2016
@foxish foxish force-pushed the node-controller-no-force-deletion branch from e1a15b3 to cf9c787 Compare November 1, 2016 18:34
@foxish foxish force-pushed the node-controller-no-force-deletion branch from cf9c787 to 7194101 Compare November 1, 2016 18:46
@foxish
Contributor Author

foxish commented Nov 1, 2016

Addressed comments by @erictune, will reapply LGTM after tests pass.

@foxish foxish added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 1, 2016
@foxish foxish added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Nov 1, 2016
@k8s-github-robot

Automatic merge from submit-queue

@ozbillwang

ozbillwang commented Feb 15, 2017

I understand the request came from #35145 and #34160, but we are seeing side effects from this change.

We run Kubernetes in an AWS autoscaling group, so Kubernetes nodes are constantly being scaled up and down. When terminated nodes stay in NotReady status, a lot of DaemonSet pods end up stuck in Pending status.

Since disabling force deletion of pods/nodes is incompatible with the old behavior, could we have a choice here, so that force deletion can be enabled as an option during setup?

@smarterclayton
Contributor

That violates the safety guarantees of the cluster - if those nodes aren't coming back, you should delete the node. Can you describe exactly why the nodes are in the NotReady state? That seems like a flaw in the autoscaler or the cloud provider - if the instance is deleted in AWS, the cloud provider should remove the instance from the API (the node should be deleted), which should clear the pods off the node, regardless of readiness.

@ozbillwang

ozbillwang commented Feb 15, 2017

Thanks, @smarterclayton

We set up both Kubernetes masters and nodes with AWS autoscaling groups (ASG). The version is 1.4.6.

When a node is scaled down by the ASG in AWS, the instance is terminated, but the node stays in the node list (kubectl get nodes) with NotReady status forever.

My colleague runs another Kubernetes stack on v1.2; after nodes are terminated by the ASG in that environment, they disappear from the node list within a very short time.

I tried to fix this issue but failed. I think the problem may be related to this pull request.

@foxish
Contributor Author

foxish commented Feb 15, 2017

When a node is scaled down by the ASG in AWS, the instance is terminated, but the node stays in the node list (kubectl get nodes) with NotReady status forever.

I think that's the issue here. The nodes should always be kept in sync with the underlying infrastructure. The node object should be removed either by the cloud-provider-specific code or by an external loop; it shouldn't stick around indefinitely as "NotReady".
The rationale for this change is also explained in https://kubernetes.io/docs/admin/node/#condition

In your case, I'd expect the cloud provider specific code to figure out that the instance is gone and remove the node object.
/cc @justinsb for AWS
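
A hedged sketch of how a cluster admin could check for and clean up such stale nodes in the meantime; the node name below is a made-up example:

```sh
# List nodes and spot any stuck in NotReady.
kubectl get nodes

# Inspect the node's spec; its external/provider ID identifies the backing
# cloud instance, which can then be checked on the AWS side.
kubectl get node ip-10-0-1-23.ec2.internal -o yaml

# If the instance is confirmed terminated, remove the stale Node object;
# the pods that were bound to it will then be cleaned up.
kubectl delete node ip-10-0-1-23.ec2.internal
```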

@foxish foxish deleted the node-controller-no-force-deletion branch February 15, 2017 02:56
@smarterclayton
Contributor

smarterclayton commented Feb 15, 2017 via email

@foxish
Contributor Author

foxish commented Feb 15, 2017

I confirmed that the node object does get deleted when the instance is deleted in the case of GCE and GKE. So the fault here is likely in the AWS cloud-provider-specific code.

@foxish
Contributor Author

foxish commented Feb 15, 2017

It is also possible that the instance is being stopped, but not deleted, in this case. If so, the cloud provider code may still find the instance and the NodeController wouldn't delete the node (not sure about this). The responsibility then lies with the cluster admin to clean up such nodes.

@gmarek
Contributor

gmarek commented Feb 15, 2017

Maybe we should delete stopped Nodes as well?

@ozbillwang

ozbillwang commented Feb 15, 2017

Thanks for the comments above.

I am still not sure whether this is related to my environment only. This PR was merged into version 1.5, but my Kubernetes version is only 1.4.6, so force delete should still be enabled.

@gmarek
Contributor

gmarek commented Feb 15, 2017

The only thing that was changed here was the handling of Pods on not-ready Nodes. The code that is supposed to remove Nodes when they're no longer present in the cloud provider was untouched (i.e. a Node should disappear when the corresponding VM is deleted and the cloud provider starts answering "NotFound" when asked about it). We also moved the code that was responsible for deleting Pods from non-existent Nodes, but that should have been a no-op.
