Node taints should go back to baseline on machine restart #404

Closed

philpearl opened this issue Nov 21, 2022 · 15 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@philpearl

We apply a taint to nodes, and remove the taint once our daemon is up and running on the node. This then allows other pods to run on the node. We don't want other pods to run on the node until the daemon is up & running.

This works well until the underlying machine is restarted. If the Kubernetes node is destroyed and we get a fresh node, everything is OK. If not, our taint is not reapplied, and pods are started on the node even though the daemon is not fully active.
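For readers unfamiliar with the pattern, a startup taint like the one described could look roughly like the sketch below, using client-go. This is a minimal sketch, not the reporter's code: the taint key `example.com/daemon-not-ready` is a placeholder (the issue doesn't name the real key), and the effect could equally be NoExecute depending on requirements.

```go
package startuptaint

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// startupTaint keeps the scheduler from placing ordinary pods on the node
// until the daemon's controller removes it. The key is a placeholder, not
// the reporter's actual taint key.
var startupTaint = corev1.Taint{
	Key:    "example.com/daemon-not-ready",
	Value:  "true",
	Effect: corev1.TaintEffectNoSchedule,
}

// addStartupTaint re-reads the Node and appends the taint if it is missing.
func addStartupTaint(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	for _, t := range node.Spec.Taints {
		if t.Key == startupTaint.Key {
			return nil // already tainted
		}
	}
	node.Spec.Taints = append(node.Spec.Taints, startupTaint)
	_, err = cs.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```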

I believe the changes made under #322 (Remove taints re-application from node annotator) have caused or contributed to this issue.

  • What would be the view on reverting #322, or making the behaviour configurable in some way? Perhaps having a set of taints that are always re-applied when the controller starts?
  • Is it possible to ensure the node is destroyed and recreated at the Kubernetes level if the underlying VM is replaced or restarted? We're using auto-scaling node pools.
@philpearl
Author

@achandrasekar @mikedanese I wonder if you considered this regression when you were looking at #322?

@aojea
Member

aojea commented Dec 15, 2022

We apply a taint to nodes, and remove the taint once our daemon is up and running on the node. This then allows other pods to run on the node. We don't want other pods to run on the node until the daemon is up & running.

can you expand on this?

do you have a controller that watches nodes and taints them based on events?

@philpearl
Author

Yes, there's a controller that watches for the daemon and it removes the taint when it sees the daemon has started.
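As a rough illustration of that controller shape (an assumed sketch, not the reporter's actual code): watch the daemon's pods, and once the pod on a given node reports Ready, strip the startup taint from that node. Again client-go is assumed, and the taint key is the same placeholder as in the earlier sketch.

```go
package startuptaint

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// daemonReady reports whether the daemon pod has a Ready condition of True.
func daemonReady(pod *corev1.Pod) bool {
	for _, c := range pod.Status.Conditions {
		if c.Type == corev1.PodReady && c.Status == corev1.ConditionTrue {
			return true
		}
	}
	return false
}

// removeStartupTaint rewrites the Node's taints without the given key,
// letting the scheduler start placing other pods on the node.
func removeStartupTaint(ctx context.Context, cs kubernetes.Interface, nodeName, taintKey string) error {
	node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	var kept []corev1.Taint
	for _, t := range node.Spec.Taints {
		if t.Key != taintKey {
			kept = append(kept, t)
		}
	}
	node.Spec.Taints = kept
	_, err = cs.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```

The issue described in this thread is precisely that nothing re-runs the "add" side when the machine restarts but the Node object survives, so `removeStartupTaint` has nothing to undo and pods schedule freely before the daemon is ready.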

@aojea
Member

aojea commented Dec 15, 2022

Yes, there's a controller that watches for the daemon and it removes the taint when it sees the daemon has started.

shouldn't you be watching the Node and acting on the Node object?

you can detect a node has changed based on the UUID
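One reading of that suggestion (my assumption, not stated explicitly in the thread): after a machine restart the kubelet reports a new boot ID in `.status.nodeInfo.bootID`, while the Node object's UID only changes if the Node is deleted and recreated. A watcher could compare the two to distinguish a reboot from a recreation. The field names come from the core/v1 API; the comparison logic is a sketch.

```go
package startuptaint

import (
	corev1 "k8s.io/api/core/v1"
)

// sameNodeRebooted reports whether the same Node object came back from a
// machine restart: the object UID is unchanged but the kubelet now reports a
// different boot ID. This is the case the reporter says is broken.
func sameNodeRebooted(old, current *corev1.Node) bool {
	return old.UID == current.UID &&
		old.Status.NodeInfo.BootID != current.Status.NodeInfo.BootID
}

// nodeRecreated reports whether the Node object itself was replaced, which is
// the case the reporter says already works (a fresh node gets the taint).
func nodeRecreated(old, current *corev1.Node) bool {
	return old.UID != current.UID
}
```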

@philpearl
Author

We tried that too, but there's nothing that guarantees we get to make our change to the node before the scheduler starts putting pods on it. That's what the taint would guarantee, but it's not added back even though the rest of the node state has returned to baseline.

@achandrasekar
Contributor

@philpearl #322 was not a regression. In fact it was a fix to address the regression introduced with #285, which introduced the behavior of reapplying taints. A lot of people rely on the older behavior of not reapplying taints: they use these taints much like you described, except they only need them on new node startup, not on node restarts. Since #285 was a regression for them, we had to revert the change.

But ideally, like you mentioned, we need a way to configure if the taints should be reapplied and when.

is it possible to ensure the node is destroyed and recreated at the kubernetes level if the underlying VM is replaced or restarted? We're using auto-scaling node pools.

I believe that on VM replacement, like during preemption, this should already happen. Let me know if this is not the case. Restart behavior should be addressed separately.

@philpearl
Author

@achandrasekar thanks for the update - it's good to understand the full history! Is finding a way to configure whether and when taints should be reapplied something you're working on, or that might happen soon? If not, would you be open to a PR? (Although it looks like I'd have a huge amount of ramp-up to do to work out how to build/test/deploy!)

@achandrasekar
Contributor

@philpearl There is work planned to address this, but we don't have an ETA yet. I can provide an update once we do. The tainting behavior has a lot of implications for the Cluster Autoscaler's scale-up/down behavior, so accepting a PR on this is difficult.

@achandrasekar
Contributor

Are you facing the issue only with preemptible nodes like the one mentioned here - cilium/cilium#21594? There is ongoing work to address this specific case.

@thejosephstevens

Hey all, I think I might be running into the same issue here (also running Cilium with NoExecute taints). The specific symptom I'm seeing is a node reboot event (like Node nodename-c9lq has been rebooted, boot id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx), after which some pods actually continue running through the reboot (they have an age that dates back to before the reboot).

The effective outcome is the same as what Phil is dealing with. I was expecting the NoExecute taint to kick in, blow those pods away, and keep them from coming back until the daemon starts, but instead they persisted, which resulted in application failures.

Let me know if there's anything I can provide that would be helpful to debugging this, this is impacting our customers with some regularity.
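One way to confirm this symptom from the outside is to list the pods still running on the rebooted node and flag any that don't tolerate the NoExecute taint. This is a debugging sketch, not part of cloud-provider-gcp; it assumes Cilium's `node.cilium.io/agent-not-ready` taint key, so substitute whatever key your install actually uses.

```go
package startuptaint

import (
	corev1 "k8s.io/api/core/v1"
)

// toleratesTaint reports whether the pod declares a toleration matching the taint.
func toleratesTaint(pod *corev1.Pod, taint corev1.Taint) bool {
	for i := range pod.Spec.Tolerations {
		if pod.Spec.Tolerations[i].ToleratesTaint(&taint) {
			return true
		}
	}
	return false
}

// shouldHaveBeenEvicted returns the names of pods still running on a node
// carrying a NoExecute taint even though they do not tolerate it.
func shouldHaveBeenEvicted(pods []corev1.Pod, taint corev1.Taint) []string {
	var names []string
	for i := range pods {
		if pods[i].Status.Phase == corev1.PodRunning && !toleratesTaint(&pods[i], taint) {
			names = append(names, pods[i].Name)
		}
	}
	return names
}
```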

@owais

owais commented Mar 10, 2023

We are facing the same issue with underlying VM restarts in GKE. It is pretty easy to replicate by restarting the underlying VM instance. In some cases, some pods (from both Deployments and DaemonSets) run before Cilium and enter a bad state requiring restarts.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 8, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 8, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 19, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
