[Bug]: "waiting for the k3s server to start" #1148

Closed
janhaa opened this issue Jan 4, 2024 · 14 comments
Labels
bug Something isn't working

Comments

@janhaa

janhaa commented Jan 4, 2024

Description

EDIT: This is actually a duplicate, see: #1145 (comment)

Unfortunately, provisioning the servers with terraform apply fails:

module.kube-hetzner.null_resource.first_control_plane: Still creating... [2m20s elapsed]
module.kube-hetzner.null_resource.first_control_plane (remote-exec): Job for k3s.service failed because the control process exited with error code.
module.kube-hetzner.null_resource.first_control_plane (remote-exec): See "systemctl status k3s.service" and "journalctl -xeu k3s.service" for details.
module.kube-hetzner.null_resource.first_control_plane (remote-exec): Waiting for the k3s server to start...
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.first_control_plane,
│   on .terraform/modules/kube-hetzner/init.tf line 73, in resource "null_resource" "first_control_plane":
│   73:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_2100530671.sh": Process exited with status 124
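
A note on the status 124: that is the exit code GNU `timeout` returns when it has to stop a command that ran past its limit. That would fit a provisioner script that wraps its "Waiting for the k3s server to start..." loop in `timeout`, though that is an assumption about the module's script, not something the log shows. A minimal sketch of the behaviour, with `wait_with_limit` as a hypothetical helper name:

```shell
# Hypothetical helper: run a command under a time limit and report
# whether it finished or was cut off by GNU timeout (exit status 124).
wait_with_limit() {
  local limit
  limit="$1"
  shift
  timeout "$limit" "$@"
  local rc
  rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "timed out after ${limit}s"
  else
    echo "finished with status ${rc}"
  fi
  return "$rc"
}

# Stand-in for a wait loop that never succeeds within the limit.
wait_with_limit 1 sleep 5 || true
```

So the 124 only says the wait ran out of time; the actual cause has to come from the journal on the node.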

Investigating the control plane's journal yields:

Jan 04 20:10:19 k3s-control-plane-ads systemd[1]: Starting Lightweight Kubernetes...
Jan 04 20:10:19 k3s-control-plane-ads sh[1982]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Jan 04 20:10:19 k3s-control-plane-ads (k3s)[1988]: k3s.service: Failed to locate executable /usr/local/bin/k3s: Permission denied
Jan 04 20:10:19 k3s-control-plane-ads (k3s)[1988]: k3s.service: Failed at step EXEC spawning /usr/local/bin/k3s: Permission denied
Jan 04 20:10:19 k3s-control-plane-ads systemd[1]: k3s.service: Main process exited, code=exited, status=203/EXEC
Jan 04 20:10:19 k3s-control-plane-ads systemd[1]: k3s.service: Failed with result 'exit-code'.

Although the ownership looks correct:

k3s-control-plane-ads:~ # stat -c "%U %G" /usr/local/bin/k3s
root root

Running it manually works fine:

k3s-control-plane-ads:~ # /usr/local/bin/k3s server
INFO[0000] Starting k3s v1.28.5+k3s1 (5b2d1271)
INFO[0000] Managed etcd cluster initializing
...

Thank you a lot for your efforts!

Kube.tf file

I only modified the nodepool settings:  

  control_plane_nodepools = [
    {
      name        = "control-plane",
      server_type = "cax11",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 3
      # swap_size   = "2G" # remember to add the suffix, examples: 512M, 1G
      # zram_size   = "2G" # remember to add the suffix, examples: 512M, 1G
      # kubelet_args = ["kube-reserved=cpu=250m,memory=1500Mi,ephemeral-storage=1Gi", "system-reserved=cpu=250m,memory=300Mi"]

      # Enable automatic backups via Hetzner (default: false)
      # backups = true
    }
  ]

  agent_nodepools = [
    {
      name        = "agent-medium",
      server_type = "cax21",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 2
      # swap_size   = "2G" # remember to add the suffix, examples: 512M, 1G
      # zram_size   = "2G" # remember to add the suffix, examples: 512M, 1G
      # kubelet_args = ["kube-reserved=cpu=50m,memory=300Mi,ephemeral-storage=1Gi", "system-reserved=cpu=250m,memory=300Mi"]

      # Enable automatic backups via Hetzner (default: false)
      # backups = true
    }
  ]

Screenshots

No response

Platform

WSL

@janhaa janhaa added the bug Something isn't working label Jan 4, 2024
@janhaa
Author

janhaa commented Jan 4, 2024

Some digging with the help of almighty ChatGPT revealed an issue related to SELinux.

k3s-control-plane-1-myr:~ # sudo ausearch -m AVC -ts recent | grep k3s
type=AVC msg=audit(1704401173.178:542): avc:  denied  { execute } for  pid=2234 comm="(k3s)" name="k3s" dev="sda3" ino=279 scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
type=AVC msg=audit(1704401178.471:544): avc:  denied  { execute } for  pid=2251 comm="(k3s)" name="k3s" dev="sda3" ino=279 scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
type=AVC msg=audit(1704401183.721:546): avc:  denied  { execute } for  pid=2264 comm="(k3s)" name="k3s" dev="sda3" ino=279 scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
...

Running sudo restorecon -v /usr/local/bin/k3s allowed me to get past the issue on this control plane...
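
The denials above explain why systemd fails even though the file is owned by root: the process running as `init_t` is denied `execute` on a file whose target context has the type `user_tmp_t`. Here is a small sketch for pulling the source and target types out of such lines. `avc_types` is a hypothetical helper name, and it assumes the `scontext=`/`tcontext=` layout shown in the output above:

```shell
# Hypothetical helper: print "<source type> -> <target type>" for each
# AVC line on stdin. The type is the third colon-separated field of a context.
avc_types() {
  sed -n 's/.*scontext=[^:]*:[^:]*:\([^:]*\):.*tcontext=[^:]*:[^:]*:\([^:]*\):.*/\1 -> \2/p'
}

# Sample line copied from the ausearch output above.
sample='type=AVC msg=audit(1704401173.178:542): avc:  denied  { execute } for  pid=2234 comm="(k3s)" name="k3s" dev="sda3" ino=279 scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0'
printf '%s\n' "$sample" | avc_types

# On a node, the same filter could be fed from the audit log:
#   ausearch -m AVC -ts recent | grep k3s | avc_types | sort | uniq -c
```

restorecon resets the file's label to whatever the loaded policy defines for that path, which is why it clears the denial.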

@janhaa
Author

janhaa commented Jan 4, 2024

After running sudo restorecon -v /usr/local/bin/k3s on all machines, the deployment works!
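
For anyone repeating this across a cluster, a rough sketch of the loop. The node addresses are placeholders (documentation-range IPs), `relabel_cmd` is a hypothetical helper, and it assumes root SSH access to each node. By default it only prints the commands so they can be reviewed first:

```shell
# Placeholder addresses; replace with your own node IPs or hostnames.
NODES="203.0.113.10 203.0.113.11 203.0.113.12"
K3S_BIN="/usr/local/bin/k3s"

# Hypothetical helper: build the ssh command line for one node.
relabel_cmd() {
  printf 'ssh root@%s restorecon -v %s\n' "$1" "$K3S_BIN"
}

for node in $NODES; do
  if [ "${APPLY:-0}" = "1" ]; then
    # Actually relabel the binary on the node.
    ssh "root@${node}" "restorecon -v ${K3S_BIN}"
  else
    # Dry run: show what would be executed.
    relabel_cmd "$node"
  fi
done
```

Setting `APPLY=1` in the environment switches from the dry run to running the command on each node.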

@Wayneoween

I'm observing the same issue. Fixing this once might be fine, but I presume the issue will come up again if there is an automated upgrade of a node?

@CroutonDigital

Today 2 k3s nodes went to status NotReady, and a reboot did not help.
I rolled the system back to a snapshot from 1 day ago using snapper rollback. After restarting, the k3s nodes came back to status Ready.

I then rebuilt the SUSE MicroOS image and tried to add a new k3s node, but it failed with the same errors:

module.kube-hetzner.null_resource.agents["2-2-bots-large"]: Still creating... [2m10s elapsed]
module.kube-hetzner.null_resource.agents["2-2-bots-large"] (remote-exec): Waiting for the k3s agent to start...
module.kube-hetzner.null_resource.agents["2-2-bots-large"] (remote-exec): Waiting for the k3s agent to start...
module.kube-hetzner.null_resource.agents["2-2-bots-large"]: Still creating... [2m20s elapsed]
╷
│ Error: remote-exec provisioner error
│ 
│   with module.kube-hetzner.null_resource.agents["2-2-bots-large"],
│   on .terraform/modules/kube-hetzner/agents.tf line 107, in resource "null_resource" "agents":
│  107:   provisioner "remote-exec" {
│ 
│ error executing "/tmp/terraform_1588448047.sh": Process exited with status 124

How can I add a new node to k3s?

@CroutonDigital

When I connect to the VM:

h-k3s-test-bots-large-wto:~ # journalctl -xeu k3s-agent
░░ The error number returned by this process is ERRNO.
Jan 05 07:49:17 h-k3s-test-bots-large-wto (k3s)[3475]: k3s-agent.service: Failed at step EXEC spawning /usr/local/bin/k3s: Permission denied
░░ Subject: Process /usr/local/bin/k3s could not be executed
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ 
░░ The process /usr/local/bin/k3s could not be executed and failed.
░░ 
░░ The error number returned by this process is ERRNO.
Jan 05 07:49:17 h-k3s-test-bots-large-wto systemd[1]: k3s-agent.service: Main process exited, code=exited, status=203/EXEC

PS: The autoscaler created 6 new VMs, and I don't see them in k3s )))))

@CroutonDigital

restorecon -v /usr/local/bin/k3s helped, too

@janhaa
Author

janhaa commented Jan 5, 2024

See also for a possible workaround: #1145 (comment)

@Silvest89
Contributor

See also for a possible workaround: #1145 (comment)

@mysticaltech What do you think of this issue and the workaround?

@mysticaltech
Collaborator

@Silvest89 I think the workaround is safe to apply just after setup. I will introduce it right away, and will also update the k3s selinux package.

@mysticaltech
Collaborator

@janhaa @CroutonDigital This is fixed in v2.11.4, please upgrade to it with terraform init -upgrade.

@CroutonDigital

Thank you! All worked fine

@Taronyuu
Sponsor

Taronyuu commented Jan 9, 2024

@mysticaltech I just ran into this issue while updating my cluster, remembered this issue and upgraded right away. All solved now. Just wanted to thank you for your effort 🙏🏻

@jimping

jimping commented Feb 12, 2024

I am getting the same error.
Newest version, Mac, fresh install, unchanged config (except the hcloud token).

@mysticaltech
Collaborator

@jimping Please open a new issue with all the details to reproduce.
