Random K3s node gets marked as "NotReady" at random intervals for a random amount of time. #6178
Comments
You have reduced the values of these settings way below their defaults. I'm guessing you've been reading the conversation in #1264 (comment)? I don't believe that Kubernetes itself is tested with the intervals and timeouts turned down that low. You're probably going to need to tune those to something that is stable for your environment. Also, if you are customizing the controller-manager args on one server, you should customize them on all of them. The same goes for kubelet args: if you want the same values on all the nodes in your cluster, you must set them on each node manually. Customized component args are not automatically shared across cluster nodes.
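For reference, a minimal sketch of how those component args could be kept consistent through /etc/rancher/k3s/config.yaml on every server (agents only need the kubelet portion); the values shown here are assumptions matching the upstream Kubernetes defaults, not tuned recommendations:

# /etc/rancher/k3s/config.yaml (same file on every server)
kube-controller-manager-arg:
  - "node-monitor-period=5s"           # how often the controller-manager checks node status
  - "node-monitor-grace-period=40s"    # how long without updates before a node is marked NotReady/Unknown
kubelet-arg:
  - "node-status-update-frequency=10s" # how often the kubelet posts its status

The usual guidance is to keep node-monitor-grace-period several times larger than node-status-update-frequency, so a couple of missed status updates don't flip the node.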
Thanks, that fixed this issue, but now nodes never get marked as Unknown after being shut down? I changed those parameters to this:
The shutdown is from within Proxmox, so I'm guessing it's not really graceful on the K8s side of things. Before, they would get marked Unknown after a bit of time, but now they're just stuck on NotReady. Pods thus don't get evicted and are stuck on a node that no longer exists. For example, kwn-0 (one of the agents) is shut down; here is the output of
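For context on the eviction question: pods are only evicted from a NotReady/Unknown node once the node-lifecycle controller applies a NoExecute taint, and the default pod tolerations wait about five minutes (300s) before acting. A quick way to check whether that taint ever shows up on the stopped node (pod name is a placeholder):

# does the stopped node carry node.kubernetes.io/not-ready or node.kubernetes.io/unreachable with NoExecute?
kubectl describe node kwn-0 | grep -A2 Taints

# what eviction tolerations does one of the stuck pods have? (default tolerationSeconds is 300)
kubectl get pod <pod-name> -o jsonpath='{.spec.tolerations}'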
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.
Environmental Info:
K3s Version:
k3s version v1.24.4+k3s1 (c3f830e)
go version go1.18.1
Node(s) CPU architecture, OS, and Version:
Debian 11 Proxmox VM created using the Generic cloud image with 2GB RAM and 2 CPU cores, running on an x86 CPU.
Linux kcp-1 5.10.0-18-cloud-amd64 #1 SMP Debian 5.10.140-1 (2022-09-02) x86_64 GNU/Linux
Cluster Configuration:
3 servers, 3 agents, made with these commands:
Where $ENDPOINT is the address of my HAProxy instance, which sits outside the cluster to load balance all the nodes, and is automatically updated once a new node is added/removed.
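A hypothetical sketch of what such a setup usually looks like (not the exact commands used here), with $ENDPOINT pointing at the HAProxy address:

# first server bootstraps the embedded etcd cluster
curl -sfL https://get.k3s.io | sh -s - server --cluster-init --tls-san $ENDPOINT

# remaining servers join through the load balancer
curl -sfL https://get.k3s.io | K3S_TOKEN=<token> sh -s - server --server https://$ENDPOINT:6443 --tls-san $ENDPOINT

# agents join the same way
curl -sfL https://get.k3s.io | K3S_TOKEN=<token> sh -s - agent --server https://$ENDPOINT:6443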
Describe the bug:
Upon creation, random nodes get marked as "NotReady" at random intervals for a random amount of time (never too long, mostly between 1 and 7 seconds). This persists until all nodes are shut down and started up again, which I figured out by rebooting the Proxmox node.
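One way to catch the flapping with timestamps (assuming kubectl is pointed at the cluster) is to watch the node list in a loop:

# print the node list once a second with a timestamp, so the NotReady windows are visible
while true; do date; kubectl get nodes; sleep 1; done

# or stream status changes as they happen
kubectl get nodes --watch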
Steps To Reproduce:
Run kubectl get nodes a few times.

Expected behavior:
Nodes are marked as "Ready" since nothing external is running on them and the VMs are all healthy.
Actual behavior:
Nodes are marked as "NotReady" at random intervals for a random amount of time, despite nothing external running on them and all the VMs being healthy.
Additional context / logs:
The output of journalctl -u k3s on kcp-1 (the second k3s server) is in the attached file issue.txt.
kubectl get node kcp-1 -o yaml says that "Kubelet stopped posting node status."

Any help is much appreciated.
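For reference, the same condition can be pulled without dumping the whole object, assuming standard kubectl jsonpath support:

# show only the Ready condition, including lastHeartbeatTime, reason, and message
kubectl get node kcp-1 -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'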