
kubelet does not start even when the agent node is unable to connect to the server node #1686

Closed
carlosrmendes opened this issue Apr 25, 2020 · 18 comments

@carlosrmendes

carlosrmendes commented Apr 25, 2020

I have an agent node with some static pods (I've set the --kubelet-arg=pod-manifest-path=... argument), but sometimes that node goes offline and may get rebooted. The problem is that when the node starts after a reboot while still offline, k3s doesn't start anything (i.e. containerd, kubelet, ...), and therefore the static pods don't start.

Is there any solution or workaround for this use case?
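
For illustration, a minimal sketch of the setup being described (the manifest directory, server address, and pod spec below are hypothetical stand-ins, since the actual path is elided above):

```bash
# Point the kubelet embedded in the k3s agent at a static pod directory.
# The directory, server address, and pod spec are illustrative only.
k3s agent --server https://my-server:6443 --token-file /etc/k3s/token \
  --kubelet-arg=pod-manifest-path=/etc/k3s/static-pods

# Any manifest dropped into that directory is started by the kubelet
# directly, without the API server scheduling it:
mkdir -p /etc/k3s/static-pods
cat <<'EOF' > /etc/k3s/static-pods/local-app.yaml
apiVersion: v1
kind: Pod
metadata:
  name: local-app
spec:
  containers:
  - name: local-app
    image: nginx:alpine
EOF
```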

@carlosrmendes carlosrmendes changed the title kubelet won't start even of the agent node could not connect to the server node kubelet does not start even when the agent node is unable to connect to the server node Apr 27, 2020
@brandond
Contributor

brandond commented Apr 27, 2020

I tried to do the same thing - I was hoping to run etcd as a static pod, but there's a chicken-and-egg problem where k3s won't start until etcd is up, so it won't start the static pod, etc.

@carlosrmendes
Author

carlosrmendes commented Apr 27, 2020

as described in https://kubernetes.io/docs/tasks/configure-pod-container/static-pod/:
"Static Pods are managed directly by the kubelet daemon on a specific node, without the API server observing them"

Given that, IMO the kubelet in the k3s process should be started before the agent node tries to connect to the server node.

@brandond
Contributor

brandond commented Apr 28, 2020

Yeah, having the agent run without server connectivity would be great. Bonus points if the kubelet on server instances would start up independently of the apiserver, so that we could do things like running etcd or MySQL as a static pod.

@ibuildthecloud
Contributor

The reason this is a bit difficult is that we download the kubelet's config from the api server. Doing this properly requires a bootstrap mode for the kubelet: basically, run with the last (or no) configuration, then download the configuration and restart. The kubelet runs embedded in the same process as the k3s agent, so a restart means we'd have to re-exec ourselves, as the kubelet can't just be restarted in memory. This all gets a bit messy.

@carlosrmendes What static pods are you running that can't be done with a daemonset? I personally haven't found any great use cases for static pods beyond bootstrapping k8s itself.
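
As a rough illustration of that bootstrap-and-reexec flow (not anything k3s implements; the config URL, paths, and the use of a standalone kubelet binary in place of the embedded one are all assumptions):

```bash
#!/bin/sh
# Hypothetical sketch of the bootstrap mode described above.
CONFIG=/var/lib/agent/kubelet-config.yaml

# 1. Start the kubelet with the last cached configuration, so static pods
#    can come up even while the server is unreachable. (A real bootstrap
#    mode would also need to handle the no-config-yet case.)
kubelet --config "$CONFIG" &
KUBELET_PID=$!

# 2. Keep retrying the server; once fresh configuration can be downloaded,
#    swap it in and re-exec this script so the kubelet restarts with it,
#    mirroring the re-exec the embedded kubelet would require.
until curl -ksf --max-time 5 \
      "https://my-server:6443/v1-k3s/config" -o "$CONFIG.new"; do
  sleep 10
done
if ! cmp -s "$CONFIG" "$CONFIG.new"; then
  mv "$CONFIG.new" "$CONFIG"
  kill "$KUBELET_PID"
  exec "$0" "$@"
fi
wait "$KUBELET_PID"
```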

@ibuildthecloud
Contributor

I do think "agent should start without server connectivity" is a reasonable request. If the server goes down and you then restart an agent, it shouldn't be blocked. Supporting static pods on agent-only nodes will be tricky, but supporting them on the server is probably feasible, as this is how rke2 works.

@carlosrmendes
Author

My use case is running specific pods (workloads) on some agent nodes, even when those nodes are offline and disconnected from the master. I don't want DaemonSets, because I want to create/schedule specific pods on specific agent nodes, and besides, to start the pods of a DaemonSet on an agent node that is offline or disconnected from the master, the agent node must have a connection to the api-server (or the node must be seen as Ready by the api-server). With static pods that is not necessary; a running kubelet is all that's needed to start static pods on an offline node.
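
For comparison, the DaemonSet approach being ruled out would look roughly like this (names are illustrative); it does pin the pod to one node, but the kubelet still only starts it once the node can reach the api-server:

```bash
# Illustrative only: pinning a DaemonSet to a single node via a label.
kubectl label node my-edge-node workload=local-app

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: local-app
spec:
  selector:
    matchLabels:
      app: local-app
  template:
    metadata:
      labels:
        app: local-app
    spec:
      nodeSelector:
        workload: local-app
      containers:
      - name: local-app
        image: nginx:alpine
EOF
```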

@tdbs

tdbs commented May 5, 2020

I have to agree, this is an issue. I would expect the "lightweight Kubernetes" to work like Kubernetes, but in this instance it does things in a way that keeps me from using the Kubernetes instructions on static pods.

@stangerm2

stangerm2 commented Jan 6, 2021

Want to +1 this.

K3s's homepage calls it a certified Kubernetes distribution built for IoT & Edge computing, but IoT and edge computing aren't a data center. Edge devices almost exclusively have intermittent connectivity. I want pods to run at the IoT and edge layer even if my thing isn't connected, because unlike a web app, my container/pod is still doing something productive even when it's not online.

Static pods without a control plane are a great way to get managed apps on top of firmware. The kubelet also supports this behavior without issue; it's just really hard to justify putting 200 MB of binaries on an embedded device. K3s would be an amazing solution there if this issue weren't present.

As a device engineer, I'd like to see K3s support offline static manifests so I can use it to run edge apps as pods and manage them as I see fit: via manifest updates when devices are online, and running the last configured manifest when they aren't.

@stale

stale bot commented Jul 30, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Jul 30, 2021
@tdbs

tdbs commented Jul 30, 2021

Ignoring an issue doesn't solve it.

@stale stale bot removed the status/stale label Jul 30, 2021
@cwayne18
Collaborator

Hi @tdbs, the reason for marking things stale is not to ignore them, but rather to give us a better idea of what is still an issue. By commenting here, you've ensured that this issue is no longer marked as stale and remains open. Thank you!

@stale

stale bot commented Jan 26, 2022

(Same automated stale notice as above.)

@stale stale bot added the status/stale label Jan 26, 2022
@sloveridge

sloveridge commented Jan 30, 2022

Would anyone be able to confirm whether this functionality will be added to k3s, or whether I should look to RKE2 for this?

For us, being able to roll out k8s at the edge is essential.

@stale stale bot removed the status/stale label Jan 30, 2022
@brandond
Contributor

brandond commented Feb 2, 2022

We don't currently have any plans to support operating agents without a server. Note that (as far as I know) RKE2 would suffer a similar limitation.

Our usual edge use case involves multiple self-contained clusters, one in each potentially isolatable location, managed by a multi-cluster management product (Rancher/Fleet/etc), as opposed to worker nodes that attempt to operate while detached from the control plane. Kubernetes is not really designed for offline node operation.

@sloveridge

Hi @brandond

The use case we are looking at is similar ("multiple self-contained clusters in each potentially isolatable location"), except that some of the nodes are connected to physical systems or user interfaces. This requires static pods, as the pods providing the UI or the physical-system communication must run on the related node. The functionality I am looking for is just K3s behaving the way k8s is meant to behave with static pods, per the k8s docs:

"Static Pods are managed directly by the kubelet daemon on a specific node, without the API server observing them. Unlike Pods that are managed by the control plane (for example, a Deployment); instead, the kubelet watches each static Pod (and restarts it if it fails)."

Regarding RKE2 support, I am referring to this issue (rancher/rke2#251), which, although it has been pushed back a couple of times, seems to have been accepted as something that will be done. Since you also work on RKE2, do you know if that is likely to happen?

To summarise the use case: in edge deployments it is often required that a node run a specific service, because of physical input occurring at that node or something directly connected to it. Static pods are part of k8s to support this configuration, so the kubelet not starting static pods without communication to the control plane is not an acceptable scenario.

If I am missing something here please let me know.

Thank you for your time.

@brandond
Contributor

brandond commented Feb 3, 2022

The core issue that makes this difficult is that the agent generates the kubelet config using information pulled down from the server. This includes cluster configuration, certificates and keys, apiserver addresses, etc. While this is all written out to disk every startup and could potentially be used on subsequent startups if the server is unavailable, we don't have any logic to do that at the moment. We'd essentially need to start the kubelet using an untrusted existing configuration, and then restart it later once a server becomes available and the configuration has been updated. While this is all theoretically doable, it's not currently something that's prioritized on our product roadmap.

Edit: I just realized I retyped basically the exact same thing that was said at #1686 (comment)
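
Concretely, the fallback being described might look something like this sketch (the /ping check, file names, and use of a standalone kubelet as a stand-in for the embedded one are assumptions; k3s implements none of this today):

```bash
#!/bin/sh
# Hypothetical offline-fallback startup for a k3s agent node.
SERVER="https://my-server:6443"
AGENT_DIR=/var/lib/rancher/k3s/agent   # where the agent persists its state

if curl -ksf --max-time 5 "$SERVER/ping" >/dev/null; then
  # Online: normal path. k3s pulls certificates, keys, and apiserver
  # addresses from the server and writes them under $AGENT_DIR.
  exec k3s agent --server "$SERVER" --token-file /etc/k3s/token
elif [ -f "$AGENT_DIR/kubelet.kubeconfig" ]; then
  # Offline, but a previous run left configuration on disk: start the
  # kubelet from that untrusted existing configuration so static pods
  # come up, and restart it later once a server becomes available.
  exec kubelet --kubeconfig "$AGENT_DIR/kubelet.kubeconfig" \
       --pod-manifest-path /etc/k3s/static-pods
else
  echo "offline with no cached agent configuration; cannot start" >&2
  exit 1
fi
```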

@sloveridge

I guess that is the trade-off of everything running in one process with k3s.

It is helpful to know it is unlikely to come from the core team in the future. I will think through some workarounds at the application layer.

@stale

stale bot commented Aug 3, 2022

(Same automated stale notice as above.)
