Node taints should go back to baseline on machine restart #404

Closed

philpearl opened this issue Nov 21, 2022 · 15 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@philpearl

We apply a taint to nodes, and remove the taint once our daemon is up and running on the node. This then allows other pods to run on the node. We don't want other pods to run on the node until the daemon is up & running.

This works well until the underlying machine is restarted. If the Kubernetes node is destroyed and we get a fresh node, everything is OK. If not, our taint is not reapplied, and pods are started on the node even though the daemon is not fully active.
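For readers unfamiliar with the pattern, a startup taint like the one described could look roughly like the sketch below, using client-go. This is a minimal sketch, not the reporter's code: the taint key `example.com/daemon-not-ready` is a placeholder (the issue doesn't name the real key), and the effect could equally be NoExecute depending on requirements.

```go
package startuptaint

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// startupTaint keeps the scheduler from placing ordinary pods on the node
// until the daemon's controller removes it. The key is a placeholder, not
// the reporter's actual taint key.
var startupTaint = corev1.Taint{
	Key:    "example.com/daemon-not-ready",
	Value:  "true",
	Effect: corev1.TaintEffectNoSchedule,
}

// addStartupTaint re-reads the Node and appends the taint if it is missing.
func addStartupTaint(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	for _, t := range node.Spec.Taints {
		if t.Key == startupTaint.Key {
			return nil // already tainted
		}
	}
	node.Spec.Taints = append(node.Spec.Taints, startupTaint)
	_, err = cs.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```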

I believe the changes made under #322 (Remove taints re-application from node annotator) have caused or contributed to this issue.

  • What would be the view on reverting #322, or making the behaviour configurable in some way? Perhaps having a set of taints that are always re-applied when the controller starts?
  • Is it possible to ensure the node is destroyed and recreated at the Kubernetes level if the underlying VM is replaced or restarted? We're using auto-scaling node pools.
@philpearl
Author

@achandrasekar @mikedanese I wonder if you considered this regression when you were looking at #322?

@aojea
Member

aojea commented Dec 15, 2022

We apply a taint to nodes, and remove the taint once our daemon is up and running on the node. This then allows other pods to run on the node. We don't want other pods to run on the node until the daemon is up & running.

can you expand on this?

do you have a controller that watches nodes and taints them based on events?

@philpearl
Author

Yes, there's a controller that watches for the daemon and it removes the taint when it sees the daemon has started.
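As a rough illustration of that controller shape (an assumed sketch, not the reporter's actual code): watch the daemon's pods, and once the pod on a given node reports Ready, strip the startup taint from that node. Again client-go is assumed, and the taint key is the same placeholder as in the earlier sketch.

```go
package startuptaint

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// daemonReady reports whether the daemon pod has a Ready condition of True.
func daemonReady(pod *corev1.Pod) bool {
	for _, c := range pod.Status.Conditions {
		if c.Type == corev1.PodReady && c.Status == corev1.ConditionTrue {
			return true
		}
	}
	return false
}

// removeStartupTaint rewrites the Node's taints without the given key,
// letting the scheduler start placing other pods on the node.
func removeStartupTaint(ctx context.Context, cs kubernetes.Interface, nodeName, taintKey string) error {
	node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	var kept []corev1.Taint
	for _, t := range node.Spec.Taints {
		if t.Key != taintKey {
			kept = append(kept, t)
		}
	}
	node.Spec.Taints = kept
	_, err = cs.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```

The issue described in this thread is precisely that nothing re-runs the "add" side when the machine restarts but the Node object survives, so `removeStartupTaint` has nothing to undo and pods schedule freely before the daemon is ready.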

@aojea
Member

aojea commented Dec 15, 2022

Yes, there's a controller that watches for the daemon and it removes the taint when it sees the daemon has started.

shouldn't you be watching the Node and acting on the Node object?

you can detect a node has changed based on the UUID
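One reading of that suggestion (my assumption, not stated explicitly in the thread): after a machine restart the kubelet reports a new boot ID in `.status.nodeInfo.bootID`, while the Node object's UID only changes if the Node is deleted and recreated. A watcher could compare the two to distinguish a reboot from a recreation. The field names come from the core/v1 API; the comparison logic is a sketch.

```go
package startuptaint

import (
	corev1 "k8s.io/api/core/v1"
)

// sameNodeRebooted reports whether the same Node object came back from a
// machine restart: the object UID is unchanged but the kubelet now reports a
// different boot ID. This is the case the reporter says is broken.
func sameNodeRebooted(old, current *corev1.Node) bool {
	return old.UID == current.UID &&
		old.Status.NodeInfo.BootID != current.Status.NodeInfo.BootID
}

// nodeRecreated reports whether the Node object itself was replaced, which is
// the case the reporter says already works (a fresh node gets the taint).
func nodeRecreated(old, current *corev1.Node) bool {
	return old.UID != current.UID
}
```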

@philpearl
Author

We tried that too, but there's nothing that guarantees we get to make our change to the node before the scheduler starts putting pods on it. That's what the taint would guarantee, but it's not added back even though the rest of the node state has returned to baseline.

@achandrasekar
Contributor

@philpearl #322 was not a regression. In fact it was a fix to address the regression introduced with #285, which introduced the behavior of reapplying taints. A lot of people rely on the older behavior of not reapplying taints: they use these taints much like you described, except they only need them on new node startup, not on node restarts. Since #285 was a regression for them, we had to revert the change.

But ideally, like you mentioned, we need a way to configure if the taints should be reapplied and when.

is it possible to ensure the node is destroyed and recreated at the kubernetes level if the underlying VM is replaced or restarted? We're using auto-scaling node pools.

I believe that on VM replacement, like during preemption, this should already happen. Let me know if this is not the case. Restart behavior should be addressed separately.

@philpearl
Author

@achandrasekar thanks for the update - it's good to understand the full history! Is finding a way to configure whether and when taints should be reapplied something you're working on, or that might happen soon? If not, would you be open to a PR? (Although it looks like I'd have a huge amount of ramp-up to do to work out how to build/test/deploy!)

@achandrasekar
Contributor

@philpearl There is work planned to address this, but we don't have an ETA yet. I can provide an update once we do. The tainting behavior has a lot of implications for the Cluster Autoscaler's scale-up/down behavior, so accepting a PR on this is difficult.

@achandrasekar
Contributor

Are you facing the issue only with preemptible nodes like the one mentioned here - cilium/cilium#21594? There is ongoing work to address this specific case.

@thejosephstevens

Hey all, I think I might be running into the same issue here (also running Cilium with NoExecute taints). The specific symptom I'm seeing is a node reboot event (like Node nodename-c9lq has been rebooted, boot id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx), after which some pods actually continue running through the reboot (they have an age that dates back to before the reboot).

The effective outcome is the same as what Phil is dealing with. I was expecting the NoExecute taint to kick in, blow those pods away, and keep them from coming back until the daemon starts, but instead they persisted, which resulted in application failures.

Let me know if there's anything I can provide that would be helpful to debugging this, this is impacting our customers with some regularity.
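One way to confirm this symptom from the outside is to list the pods still running on the rebooted node and flag any that don't tolerate the NoExecute taint. This is a debugging sketch, not part of cloud-provider-gcp; it assumes Cilium's `node.cilium.io/agent-not-ready` taint key, so substitute whatever key your install actually uses.

```go
package startuptaint

import (
	corev1 "k8s.io/api/core/v1"
)

// toleratesTaint reports whether the pod declares a toleration matching the taint.
func toleratesTaint(pod *corev1.Pod, taint corev1.Taint) bool {
	for i := range pod.Spec.Tolerations {
		if pod.Spec.Tolerations[i].ToleratesTaint(&taint) {
			return true
		}
	}
	return false
}

// shouldHaveBeenEvicted returns the names of pods still running on a node
// carrying a NoExecute taint even though they do not tolerate it.
func shouldHaveBeenEvicted(pods []corev1.Pod, taint corev1.Taint) []string {
	var names []string
	for i := range pods {
		if pods[i].Status.Phase == corev1.PodRunning && !toleratesTaint(&pods[i], taint) {
			names = append(names, pods[i].Name)
		}
	}
	return names
}
```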

@owais

owais commented Mar 10, 2023

We are facing the same issue with underlying VM restarts in GKE. It is pretty easy to replicate by restarting the underlying VM instance. In some cases, some pods (from both Deployments and DaemonSets) run before Cilium and enter a bad state requiring restarts.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 8, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 8, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 19, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
