"k0s start" on worker node fails with "Error: service in failed state" #1638
Hi Team, I am also facing this issue while executing "k0s start" on the worker node. Can you please provide some pointers?
The error message stems from the fact that the k0s systemd service is in a failed state. The k0s start command looks up the installed service and checks its status to verify that it's actually installed; that's where the "failed" error slips through. k0s should probably treat that case a bit differently in the start/stop/delete subcommands. Can you try to start k0s manually via systemctl? Does it work then?
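For reference, a quick way to do that on the worker (a sketch, assuming the default unit name k0sworker that k0s install worker creates, as also reported later in this thread):

```sh
# Inspect why the unit is in a failed state
sudo systemctl status k0sworker.service

# Clear the failed state and try starting the unit directly
sudo systemctl reset-failed k0sworker.service
sudo systemctl start k0sworker.service
```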
Apparently, in my case an extra newline character had accidentally been added to the token file on the worker nodes, which prevented them from joining the controller. After I fixed this, it works now.
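A minimal sketch of how to spot and strip such a stray newline (assuming the token file lives at /tmp/worker-join-token, as in the reproduction steps below):

```sh
# `cat -A` makes line endings visible as "$"; a trailing "$" on an extra line
# indicates a stray newline at the end of the token
cat -A /tmp/worker-join-token

# Rewrite the file without any trailing newline (command substitution strips it)
printf '%s' "$(cat /tmp/worker-join-token)" > /tmp/worker-join-token
```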
Thanks for the feedback. Even if the root cause in your case was a configuration error, I'll be reopening this, since there's definitely something to be improved here on the k0s side.
What if we check the service state in the start/stop/delete subcommands?
@twz123, thank you for reopening the ticket. As @sebthom said, I checked my token file and there was no extra character added, yet I am still getting the same issue. FYI, I am working with an AWS x64 Ubuntu instance as the controller node and another AWS x64 Ubuntu instance as the worker node. Also, as mentioned above, I checked the systemctl list on the worker node, and the k0sworker.service was failing. I restarted the service manually using systemctl, but that did not resolve the issue.
As the service is failing, the logs probably contain some hints as to why it fails. Check the service logs, e.g. via journalctl.
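On a systemd-based worker, that would look something like this (the unit name k0sworker is the default created by k0s install worker):

```sh
# Follow the k0s worker service logs live
sudo journalctl -u k0sworker -f

# Or dump everything logged since the last boot
sudo journalctl -u k0sworker -b --no-pager
```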
Hi Team, I again followed the manual installation for k0s. I successfully created the controller node on an AWS Ubuntu instance and created another worker node on another AWS instance, using the join token created on the controller node. FYI: I needed to include the '--enable-worker' flag when installing the controller. When I list the nodes on the controller, only the controller node shows up.
This is OK as per the documents, but my understanding says that the worker node should also appear in the list once it has joined. May I know whether I am correct in my understanding, or is the above output the expected behavior?
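For comparison, a healthy two-node setup would look roughly like this (a hypothetical output; node names and ages are made up, and the +k0s version suffix is what k0s-built kubelets report):

```sh
$ sudo k0s kubectl get nodes
NAME           STATUS   ROLES    AGE   VERSION
controller-0   Ready    <none>   10m   v1.23.5+k0s
worker-0       Ready    <none>   5m    v1.23.5+k0s
```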
If you have another instance running with the worker role, it should indeed show up in that list once it has joined. I would advise checking the logs on the worker node, e.g. with journalctl as shown above. As this is AWS infra, are the security groups configured in a way that allows the two nodes to properly connect with each other? I'd start by enabling full allow within the SG that both nodes are in.
I even tried the k0s installation on local servers. The result is the same as on AWS. The worker logs show:

```
Jul 27 11:39:58 ip-172-31-19-8 k0s[8785]: time="2022-07-27 11:39:58" level=warning msg="failed to get initial kubelet config with join token: failed to get kubelet config from API: Get \"https://172.31.46.24:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-default-1.24\": dial tcp 172.31.46.24:6443: i/o timeout"
```

This means the workers were able to get the controller's IP, but could not connect to it, or could not read some config information. I checked that my k0s.yml file already has the public IP of my controller, and since I am now working on local servers, security groups are no longer an issue. Can you please guide me on what to check next?
So clearly the worker node cannot connect to the controller's IP. A few things I'd check (see the sketch after this list):

- Can you reach the controller's API port (6443) from the worker at all, e.g. with curl?
- Does the join token actually carry the controller address you expect? It is base64-encoded, so it can be decoded and inspected.
- Is a firewall on either node blocking the traffic?
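A minimal sketch of those checks, assuming the controller IP from the logs above (172.31.46.24), the token file path from the reproduction steps, and that the join token is a base64-encoded, gzip-compressed kubeconfig:

```sh
# 1. Is the API port reachable at all? Any HTTPS response (even 401) means it's open.
curl -vk https://172.31.46.24:6443/version

# 2. Which server address is embedded in the join token?
base64 -d /tmp/worker-join-token | gunzip | grep 'server:'

# 3. Raw TCP reachability check with netcat
nc -vz 172.31.46.24 6443
```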
Thank you for the suggestions. I curled the controller from the worker and got the following result:

Connection failed.

Also, decoding the join token showed the correct IP address of the controller. To rule out general network issues, I tried running an nginx service on the controller machine on port 80 and curled the controller on port 80 (curl http://IP:80) from the worker machine. That connection is successful; I'm not sure why it's failing with k0s. I am reading this document and found that we need to configure the firewall to allow outbound access on ports 6443 and 8132. I did that, but nothing really changed.
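For reference, a typical way to open those ports on Ubuntu would be via ufw (a sketch; that ufw is the firewall in use here is an assumption):

```sh
# Allow the Kubernetes API (6443) and konnectivity (8132) ports
sudo ufw allow 6443/tcp
sudo ufw allow 8132/tcp
sudo ufw reload
```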
So, from the worker, you can reach the controller on port 80 but not on port 6443?
I just ran into a similar problem: I rebooted all my worker nodes at the same time (to see what would happen in case there is some kind of failure). Each worker is now stuck in a failed state.
I also tried to restart the service manually, without success.
The join token file is empty.
But I am also not sure why it would need the join token after a (simple) reboot? |
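One way to see which token file the service expects, and to confirm it is in fact empty (a sketch; the unit name and file path are the defaults from earlier in this thread):

```sh
# Show the generated unit, including the --token-file argument k0s was installed with
systemctl cat k0sworker

# A byte count of zero confirms the token file is empty
wc -c /tmp/worker-join-token
```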
Managed to recover with a manual re-install of the worker service.
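A plausible re-install sequence under that assumption (a sketch; the exact command used above was not stated, and k0s reset removes the installed service and local state):

```sh
sudo k0s stop
sudo k0s reset                                               # remove the failed service and local state
sudo k0s install worker --token-file /tmp/worker-join-token  # re-register with a fresh join token
sudo k0s start
```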
Version
v1.23.5+k0s.0
Platform
What happened?
I followed the multi node setup described at https://docs.k0sproject.io/v1.23.5+k0s.0/k0s-multi-node/
Setup of the controller node worked as described. However, when trying to start a worker node, the following error message appears without any further information: "Error: service in failed state".
Steps to reproduce
```sh
sudo su - root
curl -sSLf https://get.k0s.sh | sh
k0s install worker --token-file /tmp/worker-join-token
k0s start
```
Expected behavior
The worker node joins the controller, and the k0s service on the worker node starts without failures.
Actual behavior
Starting k0s on the worker node fails with "Error: service in failed state".
Screenshots and logs
No response
Additional context