k3s won't start due to segv/Segmentation fault #1215
Comments
I have more or less exactly the same issue on a bunch of Rock64 boards. Out of ten, I made three masters and the rest workers. When I went to bed last night all looked well:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
When I got up this morning, k3s was just segfaulting at start:
Dec 19 09:26:55 localhost systemd[1]: Starting Lightweight Kubernetes...
Could you please start k3s with debug logs? Otherwise it's hard to guess what's happening. You can pass
to the server, so for example:
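The diagnostics being asked for in this thread can be sketched concretely. Assuming the flag meant is `--debug` (it appears verbatim in a later comment in this thread), the collection steps would be something like the following; the commands are echoed rather than executed here so the snippet is self-contained:

```shell
# Sketch of the debug-log collection requested above. Assumes the elided flag
# is --debug, as shown in a later comment. Echoed, not run, so this works
# without k3s installed.
echo "k3s --debug server"                    # run the server with debug logging
echo "systemctl status k3s.service"          # current unit state
echo "journalctl -u k3s.service --no-pager"  # full service log, no pager
```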
systemctl status k3s.service:
journalctl -xe:
Mine is no more helpful either:
[INFO] Finding latest release
Can you please also provide
Mine just repeats itself:
Dec 19 09:57:51 rock64-1 systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Of course, see here:
This seems to be the same as #1181.
I'm pretty sure this has to do with dqlite. I was digging around on my other nodes, which ended up failing too, and I saw a reference somewhere in the logs saying something similar to "unable to elect master". I recently abandoned an LXD Raspberry Pi cluster for exactly the same reason: even if I just rebooted a single node on the LXD cluster, the whole thing would break because dqlite didn't seem to get its act together. Anyway, I switched to an external Postgres DB yesterday and everything has been working just fine since.
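For anyone wanting to try the same workaround, k3s accepts an external datastore via its `--datastore-endpoint` flag. A minimal sketch, with entirely hypothetical credentials, host, and database name:

```shell
# Build a Postgres datastore endpoint for k3s (all values hypothetical;
# substitute your own DB user, password, and host).
DB_USER="k3s"
DB_PASS="changeme"
DB_HOST="10.0.0.5"
ENDPOINT="postgres://${DB_USER}:${DB_PASS}@${DB_HOST}:5432/kubernetes"
echo "$ENDPOINT"
# The server would then be started with (not executed here):
#   k3s server --datastore-endpoint="$ENDPOINT"
```

Using Postgres instead of the embedded dqlite sidesteps the leader-election behavior this commenter describes, at the cost of running and securing a separate database.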
I was able to reproduce the issue, but unfortunately I had trouble compiling k3s on my OrangePi when I was trying to fix it :( So I'll have to back off from this; hopefully someone from Rancher can take it from here.
I also couldn't find where "CREATE /registry/health" happens, which pops up just before the panic; maybe it would tell a bit more. P.S. Anyway, here are the debug logs:
Running into the same issue on three different k3s cluster masters, all set up with:
$ /usr/local/bin/k3s --debug server --cluster-init --tls-san $SERVER_IP --no-deploy traefik
DEBU[0000] Asset dir /var/lib/rancher/k3s/data/HASH
DEBU[0000] Running /var/lib/rancher/k3s/data/HASH/bin/k3s-server [/usr/local/bin/k3s --debug server --cluster-init --tls-san $SERVER_IP --no-deploy traefik]
INFO[2020-02-05T19:34:01.279721010+01:00] Starting k3s v1.17.2+k3s1 (cdab19b0)
Segmentation fault
strace: [..]
futex(0x6d87010, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x6d86f10, FUTEX_WAKE_PRIVATE, 1) = 1
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
recvfrom(5, {{len=20, type=NLMSG_DONE, flags=NLM_F_MULTI, seq=1, pid=32278}, 0}, 4096, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, [112->12]) = 20
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
close(5) = 0
getuid() = 0
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
openat(AT_FDCWD, "/etc//localtime", O_RDONLY) = 5
read(5, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\t\0\0\0\t\0\0\0\0"..., 4096) = 2335
read(5, "", 4096) = 0
close(5) = 0
write(2, "\33[36mINFO\33[0m[2020-02-05T19:38:4"..., 97INFO[2020-02-05T19:38:45.395473030+01:00] Starting k3s v1.17.2+k3s1 (cdab19b0)
) = 97
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1], NULL, 8) = 0
futex(0xc0004712c8, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1], NULL, 8) = 0
futex(0xc0004712c8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x6d88228, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0xc000186848, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x6d88228, FUTEX_WAIT_PRIVATE, 0, NULL) = ?
+++ killed by SIGSEGV +++
Segmentation fault
What platform (architecture) is this on?
Debian 9 and 10, latest patch level, x86-64.
I've seen this error when using a cluster with dqlite on RPi 4 and x86-64. I don't have logs because I removed k3s and will install again with an external DB.
Same issue on RHEL7 amd64, v1.17.3+k3s1. SEGV after "Cluster bootstrap already complete".
The issue still exists on Debian 10 amd64 with
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.
We had built a cluster with 3 master nodes and a bunch of worker nodes. Overnight, 2 of the masters died and didn't come back up. Trying to start k3s.service via systemctl (Debian 10) is to no avail; the process gets killed immediately.
Calling the k3s binary directly does basically the same.
The cluster was built by running:
curl -sfL https://get.k3s.io | K3S_TOKEN=<TOKEN> INSTALL_K3S_EXEC="server --no-deploy=traefik,local-storage,servicelb --flannel-backend=wireguard --cluster-init" sh -
We can't get the cluster healthy again, because these two masters won't start k3s. Any idea how to fix this?
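For anyone trying to reproduce this setup directly, the install script passes the `INSTALL_K3S_EXEC` string through as the arguments of the installed service's `k3s` invocation, so the curl-pipe-sh line above boils down to a server started roughly like this (a sketch, echoed rather than executed; the token is supplied separately via `K3S_TOKEN`):

```shell
# Rough equivalent of the service's ExecStart resulting from the install
# command above (echoed, not run; requires k3s to actually execute).
echo 'k3s server --no-deploy=traefik,local-storage,servicelb --flannel-backend=wireguard --cluster-init'
```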