
Kubelet does not restart or reregister in response to removed Node API object #71398

Open
liggitt opened this issue Nov 24, 2018 · 16 comments · May be fixed by #123535
Labels
  • kind/bug Categorizes issue or PR as related to a bug.
  • lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
  • sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
  • sig/node Categorizes an issue or PR as relevant to SIG Node.
  • triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@liggitt
Member

liggitt commented Nov 24, 2018

Once a kubelet has started up, if its Node API object is removed, the kubelet perpetually attempts and fails to update the status on the now-missing Node object.

I would have expected it to do one of the following:

  • exit after a period of time or number of retries
  • re-register the Node object
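
To make the second option concrete, here is a minimal sketch (not the actual kubelet code; `registerNode` stands in for the kubelet's existing registration logic and the names are invented for the example) of a status-update path that re-registers when it finds the Node object gone, instead of retrying forever:

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// registerNode is a stand-in for the kubelet's existing registration logic.
func registerNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node := &corev1.Node{ObjectMeta: metav1.ObjectMeta{Name: nodeName}}
	_, err := client.CoreV1().Nodes().Create(ctx, node, metav1.CreateOptions{})
	return err
}

// syncNodeStatus sketches how the status-update path could react to a
// deleted Node object: re-register it rather than retry the update forever.
func syncNodeStatus(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	_, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		// The Node API object is gone; recreate it so status updates can resume.
		return registerNode(ctx, client, nodeName)
	}
	if err != nil {
		return err
	}
	// ... update node.Status and patch it back, as the kubelet does today ...
	return nil
}
```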

/kind bug
/sig node

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Nov 24, 2018
@liggitt liggitt added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Nov 24, 2018
@Pingan2017
Member

If no one plans to work on it, I'd like to take this.

@justinsb
Member

I'm a little unsure about whether we should be re-registering the Node. My guess is that if a Node was deleted, it was usually because the administrator is preparing to terminate the underlying machine. I guess the scenario we're thinking about here is that the node controller removed it mistakenly? I don't know if there's some other scenario?

We had a bug in kops where, when a machine is deleted from a GCE MIG, it often comes back with the same name. So if the machine came back faster than the node controller could remove the Node, the new machine would assume the previous identity. And - critically - it wouldn't remove any taints, and of course we had cordoned it prior to shutdown.

(In theory this can happen on AWS also, but it's less likely)

Maybe there's a better fix than the one we did in kops - e.g. maybe the kubelet could know whether it was the "same machine" or not and remove its own taints on boot, but I'm not sure.

I think this is also a sig-cluster-lifecycle issue therefore.

/sig cluster-lifecycle

@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Nov 26, 2018
@liggitt
Member Author

liggitt commented Nov 26, 2018

Since the kubelet would re-register the Node object if its process restarted, the current behavior is fragile at best, and perpetually attempting and failing to report status really seems like a bug.

@liggitt
Member Author

liggitt commented Nov 26, 2018

if a Node was deleted, it was usually because the administrator is preparing to terminate the underlying machine

wouldn't the proper order be to:

  • terminate the machine (or in a cloud environment, have the cloud provider terminate the instance)
  • delete the Node object (or in a cloud environment, let the node controller delete the Node object)

Deleting the node object first is racy if the kubelet process happens to restart.

I guess the scenario we're thinking about here is that the node controller removed it mistakenly? I don't know if there's some other scenario?

accidental deletion, or assumption that a deleted Node object will get re-registered and updated by the kubelet (since that's what the kubelet currently does in some cases)

@justinsb
Member

@liggitt I agree - deleting the Node first breaks if the kubelet restarts (but less pathologically than all your new nodes coming up cordoned, IMO - the node that is about to be terminated gets uncordoned and presumably gets pods scheduled to it that will be short-lived).

Terminating the machine and then deleting the Node is also potentially problematic - e.g. if we fail to delete the Node (or get delayed and delete the new Node).

None of these are likely, but we needed to do something to stop nodes with the same Name coming back cordoned.

I do think we should figure out what we want to do here - maybe involving the kubelet checking the instance id or machine id. I'm hoping that this overlaps with the cluster-api work (cc @roberthbailey ), which is why I looped in sig-cluster-lifecycle also.
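
As a rough illustration of that idea (entirely hypothetical: the annotation name and helper are invented, and reading /etc/machine-id is just one possible identity source), the kubelet could compare a boot-time machine identity against what was recorded on the existing Node object before deciding whether it is safe to clear its own taints:

```go
package sketch

import (
	"context"
	"os"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// hypotheticalIdentityAnnotation is an invented annotation name for this sketch.
const hypotheticalIdentityAnnotation = "example.com/machine-identity"

// currentMachineID reads the systemd machine id as one possible identity source.
func currentMachineID() (string, error) {
	b, err := os.ReadFile("/etc/machine-id")
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

// isSameMachine reports whether the existing Node object appears to have been
// registered by this same machine, which the kubelet could use to decide
// whether clearing its own taints on boot is safe.
func isSameMachine(ctx context.Context, client kubernetes.Interface, nodeName string) (bool, error) {
	id, err := currentMachineID()
	if err != nil {
		return false, err
	}
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	return node.Annotations[hypotheticalIdentityAnnotation] == id, nil
}
```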

@roberthbailey
Contributor

Deleting the node object first is racy if the kubelet process happens to restart.

I think that @justinsb is saying that both orders are racy (at least in cloud environments) for different reasons.

Terminating the machine and then deleting the Node is also potentially problematic - e.g. if we fail to delete the Node (or get delayed and delete the new Node).

This is certainly true for certain types of deletions. If you "recreate" a VM that is part of a MIG, it comes back with the same name. Also, if you delete and recreate a preemptible VM (on GCE), it will come back with the same name (and a different IP address). Not having deleted the Node object in both of those cases can be problematic if the new kubelet / machine assumes the identity (and any running pods) of the old kubelet / machine.

/cc @dchen1107

@erwbgy

erwbgy commented Feb 3, 2019

Since this issue is still open I presume there was no conclusion to this topic? If there was, could someone post an update here?

Personally I like the "exit after a period of time or number of retries" option @liggitt proposed, with a default of 0 meaning retry forever (keeping the current behavior), while still making it possible to have the node re-registered after some time when the kubelet restarts, if that is what the admin wants.
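
A minimal sketch of what such a knob might look like; the field name and behavior are hypothetical, not an existing kubelet flag:

```go
package sketch

import (
	"fmt"
	"os"
)

// nodeStatusConfig holds a hypothetical configuration field. A value of 0
// keeps today's behavior (retry forever); a positive value makes the kubelet
// exit once Node status updates have failed that many times in a row, so a
// restart can re-register the Node.
type nodeStatusConfig struct {
	NodeStatusUpdateMaxFailures int
}

// handleStatusUpdateFailure is called after each failed status update with
// the number of consecutive failures seen so far.
func handleStatusUpdateFailure(cfg nodeStatusConfig, consecutiveFailures int) {
	if cfg.NodeStatusUpdateMaxFailures > 0 && consecutiveFailures >= cfg.NodeStatusUpdateMaxFailures {
		fmt.Fprintln(os.Stderr, "node status updates kept failing; exiting so a restart can re-register the Node")
		os.Exit(1)
	}
	// Otherwise keep retrying, as the kubelet does today.
}
```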

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 4, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 3, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@liggitt liggitt reopened this Apr 5, 2022
@liggitt liggitt added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Apr 5, 2022
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 5, 2022
@SergeyKanzhelev SergeyKanzhelev added this to Triage in SIG Node Bugs Apr 6, 2022
@SergeyKanzhelev
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 6, 2022
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Triaged in SIG Node Bugs Apr 6, 2022
@sftim
Contributor

sftim commented Dec 20, 2022

I'm a little unsure about whether we should be re-registering the Node. My guess is that if a Node was deleted, it was usually because the administrator is preparing to terminate the underlying machine. I guess the scenario we're thinking about here is that the node controller removed it mistakenly? I don't know if there's some other scenario?

You can set up your cluster with a finalizer on a node, so that deleting the Node triggers deletion of the underlying computer. For example, Karpenter (a cluster autoscaler) does this.

In that case, the kubelet won't need to reregister because the computer shutdown that Karpenter triggers will stop the kubelet and the running containers.

However, if you don't define such a finalizer, the kubelet could / should attempt to reregister. It'd also be nice to have a metric available for the number of times that the kubelet has seen its node object deleted (etc).
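
For example, a sketch of such a metric using client_golang; the metric name is invented here and is not an existing kubelet metric:

```go
package sketch

import (
	"github.com/prometheus/client_golang/prometheus"
)

// nodeObjectDeletionsSeen is a hypothetical counter for how often the kubelet
// has observed its own Node API object deleted.
var nodeObjectDeletionsSeen = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "kubelet_node_object_deletions_seen_total",
	Help: "Number of times the kubelet observed its own Node API object deleted.",
})

func init() {
	prometheus.MustRegister(nodeObjectDeletionsSeen)
}

// recordNodeObjectDeleted would be called wherever the kubelet notices
// (e.g. via a watch event or a NotFound error) that its Node object is gone.
func recordNodeObjectDeleted() {
	nodeObjectDeletionsSeen.Inc()
}
```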

@sftim
Contributor

sftim commented Dec 20, 2022

💭 we could make a per-node addon that sets itself as a finalizer for the associated node? The addon could detect pending node deletion and trigger a graceful kubelet shutdown before finally allowing the Node to be deleted from the API.

If the OS then restarts the kubelet because that's what the sysadmin has configured, I'd be happy for that kubelet to register as a new node (presumably by the same name) and take things from there.
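
A rough sketch of what such an addon could do, assuming it runs on each node with a client-go clientset; the finalizer name and the systemd-based shutdown hook are invented for illustration:

```go
package sketch

import (
	"context"
	"os/exec"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// hypotheticalFinalizer is an invented finalizer name for this sketch.
const hypotheticalFinalizer = "example.com/graceful-kubelet-shutdown"

// handleNodeDeletion would run when a watch on the Node reports that its
// DeletionTimestamp is set: stop the kubelet gracefully, then drop the
// finalizer so the API server can complete the deletion.
func handleNodeDeletion(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if node.DeletionTimestamp == nil {
		return nil // not being deleted; nothing to do
	}

	// Hypothetical shutdown hook: stop the kubelet gracefully via systemd.
	if err := exec.CommandContext(ctx, "systemctl", "stop", "kubelet").Run(); err != nil {
		return err
	}

	// Remove our finalizer so the Node can actually be removed from the API.
	kept := node.Finalizers[:0]
	for _, f := range node.Finalizers {
		if f != hypotheticalFinalizer {
			kept = append(kept, f)
		}
	}
	node.Finalizers = kept
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```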

@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jan 19, 2024
@cartermckinnon
Contributor

/triage accepted

I think this issue is still relevant. We've observed a scenario on EKS in which:

  1. A node joins the cluster.
  2. A few minutes later, the AWS cloud node lifecycle controller receives an empty response from ec2:DescribeInstances (due to eventual consistency) and determines the instance no longer exists, so it deletes the Node object.
  3. The instance hangs around forever, costing money and doing nothing.

If kubelet were to re-register with the API server in this scenario, everything would be fine. 😄

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 27, 2024