drain daemonsets #75482
Comments
Taking a guess here...
/sig node
/milestone v1.16 I'd say it's a bug. Prioritizing this for the next release, ideally.
Hey, I would like to give this a try. @nikopen, do you think a new contributor to Kubernetes could fix this issue? If yes, could you give me some directions?
I was going through the docs and found this statement:
Is this really a bug, or is it the intended behavior?
@paivagustavo I think the implementation is not the problem (it's likely trivial); the problem lies mostly in the decision-making from the responsible SIG. It could be viewed as either a bug or intended behavior, but there should be a way to manually force a complete drain.
Yeah. Generally you want the network stack (kube-proxy, flannel, etc.) to continue to function while draining the node of pods, so you don't want those deleted too early. But you do want a way to eventually clean all pods off completely. Typically I've seen upgrades of docker/containerd/etc. lose track of pods because of this issue: kubelet still thinks they are on the node, but the engine no longer finds them. I hit it again today, so we do need a solution. Maybe an additional kind of drain that taints the node as going completely offline and deletes all remaining pods? That would prevent the DaemonSets from relaunching pods on it.
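A minimal sketch of that taint-based idea as it can be done by hand today, assuming a custom taint key (the name "maintenance" is purely illustrative): DaemonSet pods only tolerate a fixed set of built-in taints by default, so an arbitrary NoExecute taint evicts them and keeps the DaemonSet controller from rescheduling replacements.

```sh
# Drain regular workloads first; DaemonSet pods are skipped by design.
kubectl drain <node-name> --ignore-daemonsets

# Apply a custom NoExecute taint. DaemonSet pods do not tolerate arbitrary
# NoExecute taints unless the DaemonSet adds its own toleration, so they
# get evicted and are not relaunched while the taint is present.
kubectl taint nodes <node-name> maintenance=true:NoExecute

# Verify the node is empty (static/mirror pods may still show up).
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>

# After the maintenance, remove the taint and uncordon.
kubectl taint nodes <node-name> maintenance=true:NoExecute-
kubectl uncordon <node-name>
```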
@kfox1111 Hello! I'm the bug triage lead for the 1.16 release cycle, and since this issue is tagged for 1.16 but hasn't been updated in a long time, I'd like to check its status. Code freeze starts on August 29th (about 1.5 weeks from now), which means a PR needs to be ready (and merged) by then. Are we still targeting this issue for 1.16?
I hope so, but I don't think anyone is actively working on it at the moment, so it will probably miss 1.16.
/milestone v1.17
Bug triage for 1.17 here, with a gentle reminder that code freeze for this release is on Nov. 18. Is this issue still intended for 1.17?
This still periodically affects me. Need a fix.
Correction: code freeze is on Thursday, November 14. Sorry for the mix-up.
Are you able to address this by performing a power cycle? Typically, we do not recommend updating the host without rebooting it.
No. Sometimes upgrading the container engine can cause it to lose track of containers, unsafely but persistently (storage). The instructions say to clean all containers off of the system before upgrading the container runtime, but DaemonSets ignore this and don't actually drain (which is sometimes useful). There needs to be an easy way to completely remove all running Kubernetes-managed containers from a host: something like a flag on the host that says "kill all the rest and don't let them restart automatically", as sketched below.
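For reference, a hedged sketch of how that "kill all the rest" step can be done manually today on a node whose runtime speaks CRI and has crictl installed (an assumption about host tooling, not an official drain mode):

```sh
# Stop kubelet so it cannot restart or recreate the containers.
systemctl stop kubelet

# Stop and remove every CRI-managed container and pod sandbox.
crictl stop $(crictl ps -q)
crictl rm $(crictl ps -aq)
crictl rmp $(crictl pods -q)

# Upgrade the container runtime here, then bring kubelet back.
systemctl start kubelet
```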
/milestone v1.18
We ran into this today; the issue is the result of a design decision in containerd:
Source: https://github.com/containerd/containerd/blob/master/docs/ops.md#systemd So when containerd is upgraded, it abandons its child processes by design. All the ignored DaemonSet pods still running on the drained node are therefore abandoned, and Kubernetes will not pick them back up after uncordoning. This seems to be the issue @kfox1111 describes above.
This is resolved by rebooting the node, as @derekwaynecarr suggests, since the reboot kills the abandoned child processes. If needed I can provide some examples. In short, it is reproducible by the steps sketched below:
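A rough sketch of the reproduction, under the assumption that containerd runs under the systemd unit from the ops.md linked above, whose KillMode=process setting makes systemd kill only the main daemon and leave the shim child processes running:

```sh
# 1. Drain the node; DaemonSet pods stay behind by design.
kubectl drain <node-name> --ignore-daemonsets

# 2. Upgrade containerd (a plain restart stands in for the upgrade here).
#    KillMode=process means systemd kills only the daemon itself; the shims
#    owning the remaining DaemonSet containers keep running, now abandoned.
systemctl restart containerd

# 3. The orphaned shim processes are still visible on the host.
ps -ef | grep containerd-shim

# 4. After uncordoning, Kubernetes does not cleanly pick the old DaemonSet
#    pods back up; only a reboot clears the abandoned processes.
kubectl uncordon <node-name>
```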
Containerd has it for sure; other runtimes might as well. The need to drain a node completely of workload is a reasonable one on its own, though, regardless of which runtime is involved. I've also seen kubelet continuously complain in the logs about pods that disappeared after a containerd upgrade, so just rebooting doesn't entirely fix the issue.
Pretty sure it's not on anyone's radar anymore. Someone should bring it back up to sig-node, maybe?
Got it. Sorry, I'm not aware of the process; what exactly does one have to do to bring this to sig-node's attention?
Sorry, I don't really know. Guessing it may need to be brought up at one of the meetings?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules: after 90d of inactivity, lifecycle/stale is applied; after 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied; after 30d of inactivity since lifecycle/rotten was applied, the issue is closed.
You can: mark this issue as fresh with /remove-lifecycle stale, mark it as rotten with /lifecycle rotten, close it with /close, or offer to help out with Issue Triage.
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
Still an issue. :/
+1
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules: after 90d of inactivity, lifecycle/stale is applied; after 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied; after 30d of inactivity since lifecycle/rotten was applied, the issue is closed.
You can: mark this issue as fresh with /remove-lifecycle stale, mark it as rotten with /lifecycle rotten, close it with /close, or offer to help out with Issue Triage.
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
/remove-kind bug
/priority backlog to reflect reality
This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged. You can: confirm that this issue is still relevant with /triage accepted (org members only), deprioritize it with /priority important-longterm or /priority backlog, or close it with /close.
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted
/remove-priority important-soon
This feature might eventually be supported with Declarative Node Maintenance: kubernetes/enhancements#4213
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules: after 90d of inactivity, lifecycle/stale is applied; after 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied; after 30d of inactivity since lifecycle/rotten was applied, the issue is closed.
You can: mark this issue as fresh with /remove-lifecycle stale, close it with /close, or offer to help out with Issue Triage.
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
What would you like to be added:
A flag for kubectl drain to support draining DaemonSet-managed pods too (see the sketch below).
Why is this needed:
I needed to completely clear container runtime images off a node to fix an issue, and the DaemonSets kept trying to keep their pods running.
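For context, a sketch of today's behavior versus what is being requested; the last flag is hypothetical and does not exist in kubectl:

```sh
# Today: drain refuses to proceed while DaemonSet-managed pods are present,
# failing with an error like "cannot delete DaemonSet-managed Pods
# (use --ignore-daemonsets to ignore)".
kubectl drain node-1

# The only built-in escape hatch leaves the DaemonSet pods running:
kubectl drain node-1 --ignore-daemonsets

# Requested: a flag that also evicts DaemonSet-managed pods
# (hypothetical name; no such flag exists today).
kubectl drain node-1 --drain-daemonset-pods
```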