
k3s won't start due to segv/Segmentation fault #1215

Closed
aberfeldy opened this issue Dec 18, 2019 · 18 comments

@aberfeldy

We had built a cluster with 3 master nodes and a bunch of worker nodes. Overnight, 2 of the masters died and didn't come back up. Trying to start k3s.service via systemctl (Debian 10) is to no avail; the process gets killed immediately.

Dec 18 17:33:15 master-3 k3s[5645]: time="2019-12-18T17:33:15.329364828+01:00" level=info msg="Starting k3s v1.0.0 (18bd921c)"
Dec 18 17:33:15 master-3 k3s[5645]: time="2019-12-18T17:33:15.329884440+01:00" level=info msg="Cluster bootstrap already complete"
Dec 18 17:33:17 master-3 systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
Dec 18 17:33:17 master-3 systemd[1]: k3s.service: Failed with result 'signal'.
Dec 18 17:33:17 master-3 systemd[1]: Failed to start Lightweight Kubernetes.

Calling the k3s binary directly does basically the same thing.

I1218 17:36:26.952042    6205 interface.go:384] Looking for default routes with IPv4 addresses
I1218 17:36:26.952132    6205 interface.go:392] Default route transits interface "eth0"
I1218 17:36:26.952262    6205 interface.go:196] Interface eth0 is up
I1218 17:36:26.952327    6205 interface.go:244] Interface "eth0" has 3 addresses :[88.xx.xx.xx/32 2a01:xxx:xxx:352c::1/64 fe80::xx:xx:xx:b26b/64].
I1218 17:36:26.952359    6205 interface.go:211] Checking addr  88.xx.xx.xx/32.
I1218 17:36:26.952369    6205 interface.go:218] IP found 88.xx.xx.xx
I1218 17:36:26.952384    6205 interface.go:250] Found valid IPv4 address 88.xx.xx.xx for interface "eth0".
I1218 17:36:26.952392    6205 interface.go:398] Found active IP 88.xx.xx.xx
I1218 17:36:26.952418    6205 services.go:45] Setting service IP to "10.43.0.1" (read-write).
INFO[2019-12-18T17:36:26.952441448+01:00] Starting k3s v1.0.0 (18bd921c)
I1218 17:36:26.973940    6205 services.go:45] Setting service IP to "10.43.0.1" (read-write).
I1218 17:36:26.975450    6205 interface.go:384] Looking for default routes with IPv4 addresses
I1218 17:36:26.975470    6205 interface.go:392] Default route transits interface "eth0"
I1218 17:36:26.975540    6205 interface.go:196] Interface eth0 is up
I1218 17:36:26.975593    6205 interface.go:244] Interface "eth0" has 3 addresses :[88.xx.xx.xx/32 2a01:xxx:xxx:352c::1/64 fe80::xx:xx:xx:b26b/64].
I1218 17:36:26.975617    6205 interface.go:211] Checking addr  88.xx.xx.xx/32.
I1218 17:36:26.975627    6205 interface.go:218] IP found 88.xx.xx.xx/
I1218 17:36:26.975640    6205 interface.go:250] Found valid IPv4 address 88.xx.xx.xx for interface "eth0".
I1218 17:36:26.975656    6205 interface.go:398] Found active IP 88.xx.xx.xx
Segmentation fault

The cluster was built by running:
curl -sfL https://get.k3s.io | K3S_TOKEN=<TOKEN> INSTALL_K3S_EXEC="server --no-deploy=traefik,local-storage,servicelb --flannel-backend=wireguard --cluster-init" sh -

We can't get the cluster healthy again because these two masters won't start k3s anymore. Any idea how to fix this?

@PBXForums

I have more or less exactly the same issue on a bunch of Rock64s. Out of 10, I made three masters and the rest workers.

When I went to bed last night all looked well.

NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
rock64-10 Ready 2m32s v1.16.3-k3s.2 192.168.1.39 Debian GNU/Linux 10 (buster) 5.3.11-rockchip64 containerd://1.3.0-k3s.4
rock64-1 Ready master 64m v1.16.3-k3s.2 192.168.1.30 Debian GNU/Linux 10 (buster) 5.3.11-rockchip64 containerd://1.3.0-k3s.4
rock64-2 Ready master 54m v1.16.3-k3s.2 192.168.1.31 Debian GNU/Linux 10 (buster) 5.3.11-rockchip64 containerd://1.3.0-k3s.4
rock64-3 Ready master 16m v1.16.3-k3s.2 192.168.1.32 Debian GNU/Linux 10 (buster) 5.3.11-rockchip64 containerd://1.3.0-k3s.4
rock64-4 Ready 9m4s v1.16.3-k3s.2 192.168.1.33 Debian GNU/Linux 10 (buster) 5.3.11-rockchip64 containerd://1.3.0-k3s.4
rock64-5 Ready 8m32s v1.16.3-k3s.2 192.168.1.34 Debian GNU/Linux 10 (buster) 5.3.11-rockchip64 containerd://1.3.0-k3s.4
rock64-6 Ready 8m7s v1.16.3-k3s.2 192.168.1.35 Debian GNU/Linux 10 (buster) 5.3.11-rockchip64 containerd://1.3.0-k3s.4
rock64-7 Ready 3m27s v1.16.3-k3s.2 192.168.1.36 Debian GNU/Linux 10 (buster) 5.3.11-rockchip64 containerd://1.3.0-k3s.4
rock64-8 Ready 3m25s v1.16.3-k3s.2 192.168.1.37 Debian GNU/Linux 10 (buster) 5.3.11-rockchip64 containerd://1.3.0-k3s.4
rock64-9 Ready 3m6s v1.16.3-k3s.2 192.168.1.38 Debian GNU/Linux 10 (buster) 5.3.11-rockchip64 containerd://1.3.0-k3s.4

When I got up this morning, k3s was just segfaulting at startup:

Dec 19 09:26:55 localhost systemd[1]: Starting Lightweight Kubernetes...
Dec 19 09:26:56 localhost k3s[2597]: time="2019-12-19T09:26:56.915987995Z" level=info msg="Starting k3s v1.0.0 (18bd921)"
Dec 19 09:26:57 localhost systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
Dec 19 09:26:57 localhost systemd[1]: k3s.service: Failed with result 'signal'.
Dec 19 09:26:57 localhost systemd[1]: Failed to start Lightweight Kubernetes.

@DavidZisky

Could you please start k3s with debug logs? Otherwise, it's hard to guess what's happening. You can pass

-v 3

to the server, so for example:

curl -sfL https://get.k3s.io | K3S_TOKEN=<TOKEN> INSTALL_K3S_EXEC="server -v 3 --no-deploy=traefik,local-storage,servicelb --flannel-backend=wireguard --cluster-init" sh -

@aberfeldy
Author

[INFO]  Finding latest release
[INFO]  Using v1.0.0 as release
[INFO]  Downloading hash https://github.com/rancher/k3s/releases/download/v1.0.0/sha256sum-amd64.txt
[INFO]  Skipping binary downloaded, installed k3s matches hash
[INFO]  Skipping /usr/local/bin/kubectl symlink to k3s, already exists
[INFO]  Skipping /usr/local/bin/crictl symlink to k3s, already exists
[INFO]  Skipping /usr/local/bin/ctr symlink to k3s, already exists
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s
Job for k3s.service failed because a fatal signal was delivered to the control process.
See "systemctl status k3s.service" and "journalctl -xe" for details.

systemctl status k3s.service:

● k3s.service - Lightweight Kubernetes
   Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: signal) since Thu 2019-12-19 11:26:17 CET; 3s ago
     Docs: https://k3s.io
  Process: 32762 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 32763 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 32764 ExecStart=/usr/local/bin/k3s server -v 3 --no-deploy=traefik,local-storage,servicelb --flannel-backend=wireguard --server https://195.201.223.208:6443
 Main PID: 32764 (code=killed, signal=SEGV)

journalctl -xe:

Dec 19 11:26:44 master-3 systemd[1]: Stopped Lightweight Kubernetes.
-- Subject: A stop job for unit k3s.service has finished
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A stop job for unit k3s.service has finished.
--
-- The job identifier is 869556 and the job result is done.
Dec 19 11:26:44 master-3 systemd[1]: Starting Lightweight Kubernetes...
-- Subject: A start job for unit k3s.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit k3s.service has begun execution.
--
-- The job identifier is 869556.
Dec 19 11:26:44 master-3 k3s[386]: time="2019-12-19T11:26:44.335172229+01:00" level=info msg="Starting k3s v1.0.0 (18bd921c)"
Dec 19 11:26:44 master-3 k3s[386]: time="2019-12-19T11:26:44.335378931+01:00" level=info msg="Cluster bootstrap already complete"
Dec 19 11:26:44 master-3 sshd[340]: Failed password for root from 222.186.173.215 port 61980 ssh2
Dec 19 11:26:46 master-3 systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStart= process belonging to unit k3s.service has exited.
--
-- The process' exit code is 'killed' and its exit status is 11.
Dec 19 11:26:46 master-3 systemd[1]: k3s.service: Failed with result 'signal'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit k3s.service has entered the 'failed' state with result 'signal'.
Dec 19 11:26:46 master-3 systemd[1]: Failed to start Lightweight Kubernetes.
-- Subject: A start job for unit k3s.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit k3s.service has finished with a failure.
--
-- The job identifier is 869556 and the job result is failed.

@PBXForums

Mine isn't any more helpful either:

[INFO] Finding latest release
[INFO] Using v1.0.0 as release
[INFO] Downloading hash https://github.com/rancher/k3s/releases/download/v1.0.0/sha256sum-arm64.txt
[INFO] Skipping binary downloaded, installed k3s matches hash
[INFO] Skipping /usr/local/bin/kubectl symlink to k3s, already exists
[INFO] Skipping /usr/local/bin/crictl symlink to k3s, already exists
[INFO] Skipping /usr/local/bin/ctr symlink to k3s, already exists
[INFO] Creating killall script /usr/local/bin/k3s-killall.sh
[INFO] Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO] env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO] systemd: Creating service file /etc/systemd/system/k3s.service
[INFO] systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO] systemd: Starting k3s
Job for k3s.service failed because a fatal signal was delivered to the control process.
See "systemctl status k3s.service" and "journalctl -xe" for details.
root@rock64-1:~# tail /var/log/syslog
Dec 19 10:31:25 localhost systemd[1]: Stopped Lightweight Kubernetes.
Dec 19 10:31:25 localhost systemd[1]: Starting Lightweight Kubernetes...
Dec 19 10:31:26 localhost k3s[6774]: time="2019-12-19T10:31:26.698005505Z" level=info msg="Starting k3s v1.0.0 (18bd921)"
Dec 19 10:31:27 localhost systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
Dec 19 10:31:27 localhost systemd[1]: k3s.service: Failed with result 'signal'.
Dec 19 10:31:27 localhost systemd[1]: Failed to start Lightweight Kubernetes.
Dec 19 10:31:32 localhost systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Dec 19 10:31:32 localhost systemd[1]: k3s.service: Scheduled restart job, restart counter is at 5.
Dec 19 10:31:32 localhost systemd[1]: Stopped Lightweight Kubernetes.
Dec 19 10:31:32 localhost systemd[1]: Starting Lightweight Kubernetes...

@DavidZisky

Can you please also provide journalctl -u k3s? It will be a bit easier to follow.

@PBXForums

Mine just repeats itself:

Dec 19 09:57:51 rock64-1 systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Dec 19 09:57:51 rock64-1 systemd[1]: k3s.service: Scheduled restart job, restart counter is at 5.
Dec 19 09:57:51 rock64-1 systemd[1]: Stopped Lightweight Kubernetes.
Dec 19 09:57:51 rock64-1 systemd[1]: Starting Lightweight Kubernetes...
Dec 19 09:57:52 rock64-1 k3s[1398]: time="2019-12-19T09:57:52.227235578Z" level=info msg="Starting k3s v1.0.0 (18bd921)"
Dec 19 09:57:52 rock64-1 systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
Dec 19 09:57:52 rock64-1 systemd[1]: k3s.service: Failed with result 'signal'.
Dec 19 09:57:52 rock64-1 systemd[1]: Failed to start Lightweight Kubernetes.
Dec 19 09:57:58 rock64-1 systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Dec 19 09:57:58 rock64-1 systemd[1]: k3s.service: Scheduled restart job, restart counter is at 6.
Dec 19 09:57:58 rock64-1 systemd[1]: Stopped Lightweight Kubernetes.
Dec 19 09:57:58 rock64-1 systemd[1]: Starting Lightweight Kubernetes...
Dec 19 09:57:59 rock64-1 k3s[1415]: time="2019-12-19T09:57:59.221760455Z" level=info msg="Starting k3s v1.0.0 (18bd921)"
Dec 19 09:57:59 rock64-1 systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
Dec 19 09:57:59 rock64-1 systemd[1]: k3s.service: Failed with result 'signal'.
Dec 19 09:57:59 rock64-1 systemd[1]: Failed to start Lightweight Kubernetes.

@aberfeldy
Author

Of course, see here:

Dec 19 11:35:50 master-3 systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
Dec 19 11:35:50 master-3 systemd[1]: k3s.service: Failed with result 'signal'.
Dec 19 11:35:50 master-3 systemd[1]: Failed to start Lightweight Kubernetes.
Dec 19 11:35:55 master-3 systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Dec 19 11:35:55 master-3 systemd[1]: k3s.service: Scheduled restart job, restart counter is at 89.
Dec 19 11:35:55 master-3 systemd[1]: Stopped Lightweight Kubernetes.
Dec 19 11:35:55 master-3 systemd[1]: Starting Lightweight Kubernetes...
Dec 19 11:35:55 master-3 k3s[2032]: time="2019-12-19T11:35:55.352137616+01:00" level=info msg="Starting k3s v1.0.0 (18bd921c)"
Dec 19 11:35:55 master-3 k3s[2032]: time="2019-12-19T11:35:55.352358530+01:00" level=info msg="Cluster bootstrap already complete"
Dec 19 11:35:57 master-3 systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
Dec 19 11:35:57 master-3 systemd[1]: k3s.service: Failed with result 'signal'.
Dec 19 11:35:57 master-3 systemd[1]: Failed to start Lightweight Kubernetes.
Dec 19 11:36:02 master-3 systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Dec 19 11:36:02 master-3 systemd[1]: k3s.service: Scheduled restart job, restart counter is at 90.
Dec 19 11:36:02 master-3 systemd[1]: Stopped Lightweight Kubernetes.
Dec 19 11:36:02 master-3 systemd[1]: Starting Lightweight Kubernetes...
Dec 19 11:36:02 master-3 k3s[2052]: time="2019-12-19T11:36:02.351682007+01:00" level=info msg="Starting k3s v1.0.0 (18bd921c)"
Dec 19 11:36:02 master-3 k3s[2052]: time="2019-12-19T11:36:02.352019332+01:00" level=info msg="Cluster bootstrap already complete"

@vFondevilla

This seems to be the same as #1181

@PBXForums

I'm pretty sure this has to do with dqlite. I was digging around on my other nodes, which ended up failing too, and there was a reference somewhere in the logs saying something similar to "unable to elect master". I recently abandoned an LXD Raspberry Pi cluster for exactly the same reason. Even if I just rebooted a single node of the LXD cluster, the whole thing would break because dqlite couldn't get its act together.

Anyway, I switched to an external Postgres DB yesterday and everything has been working just fine since.
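For reference, a rough sketch of that switch using k3s's documented --datastore-endpoint flag (the Postgres host, credentials and database name below are placeholders, not values from this thread):

# install the server against an external Postgres datastore instead of the embedded dqlite store
curl -sfL https://get.k3s.io | K3S_TOKEN=<TOKEN> INSTALL_K3S_EXEC="server --no-deploy=traefik,local-storage,servicelb --datastore-endpoint=postgres://username:password@hostname:5432/k3s" sh -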

@DavidZisky

DavidZisky commented Dec 22, 2019

I was able to reproduce the issue, but unfortunately I ran into problems compiling k3s on my OrangePi while trying to fix it :( So I'll have to back off from this; hopefully someone from Rancher can follow along. The issue seems to lie in Canonical's go-dqlite library.

The panic: runtime error: makeslice: len out of range happens when calling the getBlob func.

But I also couldn't find where the "CREATE /registry/health" happens, which pops up just before the panic; maybe that would tell a bit more.

P.S.
So something is trying to create a slice with a length outside the supported range for a byte slice, hence the "len out of range", but I can't find what. I don't have a Raspberry Pi and my OrangePi has a 32-bit CPU so I can't test, but maybe the Raspberry Pi CPU is 64-bit for instructions while pointers and the int/uint types are 32-bit?
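One quick, untested way to check that theory would be to compare the kernel architecture with the bitness of the installed k3s binary, since a 32-bit build is where Go's int drops to 32 bits:

# illustrative commands, not output from this thread
uname -m                   # e.g. aarch64 (64-bit kernel) vs. armv7l (32-bit)
file /usr/local/bin/k3s    # "ELF 32-bit LSB executable, ARM" would indicate a 32-bit build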

Anyway, here are the debug logs:

root@orangepilite:~# k3s --debug server
DEBU[0000] Asset dir /var/lib/rancher/k3s/data/182bf1607a98af006c64bf65c7e0aeaa6fef00309ac072b56edef511f34d2ac4 
DEBU[0000] Running /var/lib/rancher/k3s/data/182bf1607a98af006c64bf65c7e0aeaa6fef00309ac072b56edef511f34d2ac4/bin/k3s-server [k3s --debug server] 
INFO[2019-12-22T10:27:05.883284862Z] Starting k3s v1.0.1 (e94a3c60)               
INFO[2019-12-22T10:27:06.322994959Z] Testing connection to peers [192.168.1.16:6443] 
DEBU[2019-12-22T10:27:06.324675242Z] connected address=192.168.1.16:6443 attempt=0 
DEBU[2019-12-22T10:27:06.329403269Z] connected address=192.168.1.16:6443 attempt=0 
DEBU[2019-12-22T10:27:06.361818933Z] connected address=192.168.1.16:6443 attempt=0 
DEBU[2019-12-22T10:27:06.367374769Z] CREATE /registry/health, size=17, lease=0 => rev=1, err=<nil> 
panic: runtime error: makeslice: len out of range

goroutine 1 [running]:
github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol.(*Message).getBlob(0x70d88bc, 0x0, 0x585d720, 0x7673b10)
	/go/src/github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol/message.go:356 +0x3c
github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol.(*Rows).Next(0x760c3d4, 0x72507e0, 0xb, 0xb, 0x7bf5c, 0x978ca8)
	/go/src/github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol/message.go:557 +0x2ec
github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/driver.(*Rows).Next(0x760c3c0, 0x72507e0, 0xb, 0xb, 0x0, 0x0)
	/go/src/github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/driver/driver.go:585 +0x40
database/sql.(*Rows).nextLocked(0x7b4e1e0, 0x970000)
	/usr/local/go/src/database/sql/sql.go:2767 +0xb4
database/sql.(*Rows).Next.func1()
	/usr/local/go/src/database/sql/sql.go:2745 +0x2c
database/sql.withLock(0x3795d08, 0x7b4e1f8, 0x7b2104c)
	/usr/local/go/src/database/sql/sql.go:3184 +0x60
database/sql.(*Rows).Next(0x7b4e1e0, 0x7673b00)
	/usr/local/go/src/database/sql/sql.go:2744 +0x78
github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog.RowsToEvents(0x7b4e1e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog/sql.go:221 +0xc8
github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog.(*SQLLog).List(0x760c270, 0x37bbfd8, 0x7165c80, 0x2f04e21, 0x10, 0x0, 0x0, 0x1, 0x0, 0x0, ...)
	/go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog/sql.go:188 +0xe0
github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured.(*LogStructured).get(0x73f0170, 0x37bbfd8, 0x7165c80, 0x2f04e21, 0x10, 0x0, 0x0, 0x7a36001, 0x70d8400, 0xa83bd4, ...)
	/go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/logstructured.go:55 +0x80
github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured.(*LogStructured).Create(0x73f0170, 0x37bbfd8, 0x7165c80, 0x2f04e21, 0x10, 0x76945a0, 0x11, 0x11, 0x0, 0x0, ...)
	/go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/logstructured.go:88 +0xfc
github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured.(*LogStructured).Start(0x73f0170, 0x37bbfd8, 0x7165c80, 0x6, 0x781a059)
	/go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/logstructured.go:36 +0xd0
github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/endpoint.Listen(0x37bbfd8, 0x7165c80, 0x0, 0x0, 0x0, 0x781a050, 0x48, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/endpoint/endpoint.go:58 +0xe4
github.com/rancher/k3s/pkg/cluster.(*Cluster).startStorage(0x70e3040, 0x37bbfd8, 0x7165c80, 0x0, 0x0)
	/go/src/github.com/rancher/k3s/pkg/cluster/cluster.go:62 +0x60
github.com/rancher/k3s/pkg/cluster.(*Cluster).Start(0x70e3040, 0x37bbfd8, 0x7165c80, 0x0, 0x0)
	/go/src/github.com/rancher/k3s/pkg/cluster/cluster.go:53 +0xa8
github.com/rancher/k3s/pkg/daemons/control.prepare(0x37bbfd8, 0x7165c80, 0x79ed204, 0x7482c60, 0x5847fe8, 0x1a)
	/go/src/github.com/rancher/k3s/pkg/daemons/control/server.go:337 +0xfa8
github.com/rancher/k3s/pkg/daemons/control.Server(0x37bbfd8, 0x7165c80, 0x79ed204, 0x37bbfd8, 0x7165c80)
	/go/src/github.com/rancher/k3s/pkg/daemons/control/server.go:83 +0x168
github.com/rancher/k3s/pkg/server.StartServer(0x37bbfd8, 0x7165c80, 0x79ed200, 0x7165c80, 0x2)
	/go/src/github.com/rancher/k3s/pkg/server/server.go:51 +0x70
github.com/rancher/k3s/pkg/cli/server.run(0x730bb80, 0x5848a48, 0x1786c8, 0x37b5698)
	/go/src/github.com/rancher/k3s/pkg/cli/server/server.go:173 +0xc58
github.com/rancher/k3s/pkg/cli/server.Run(0x730bb80, 0x5604a10, 0x0)
	/go/src/github.com/rancher/k3s/pkg/cli/server/server.go:35 +0x44
github.com/rancher/k3s/vendor/github.com/urfave/cli.HandleAction(0x29d5a98, 0x30a7bb0, 0x730bb80, 0x730bb80, 0x0)
	/go/src/github.com/rancher/k3s/vendor/github.com/urfave/cli/app.go:514 +0xac
github.com/rancher/k3s/vendor/github.com/urfave/cli.Command.Run(0x2eea252, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2f1866f, 0x15, 0x7896e40, ...)
	/go/src/github.com/rancher/k3s/vendor/github.com/urfave/cli/command.go:171 +0x370
github.com/rancher/k3s/vendor/github.com/urfave/cli.(*App).Run(0x731a700, 0x700c0a0, 0x3, 0x4, 0x0, 0x0)
	/go/src/github.com/rancher/k3s/vendor/github.com/urfave/cli/app.go:265 +0x510
main.main()
	/go/src/github.com/rancher/k3s/cmd/server/main.go:46 +0x2ec

@leonklingele

Running into the same issue on three different k3s cluster masters (all set up with k3sup install --ip $SERVER_IP --sudo=false --cluster --k3s-extra-args '--no-deploy traefik').

$ /usr/local/bin/k3s --debug server --cluster-init --tls-san $SERVER_IP --no-deploy traefik
DEBU[0000] Asset dir /var/lib/rancher/k3s/data/HASH
DEBU[0000] Running /var/lib/rancher/k3s/data/HASH/bin/k3s-server [/usr/local/bin/k3s --debug server --cluster-init --tls-san $SERVER_IP --no-deploy traefik]
INFO[2020-02-05T19:34:01.279721010+01:00] Starting k3s v1.17.2+k3s1 (cdab19b0)
Segmentation fault

strace:

[..]
futex(0x6d87010, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x6d86f10, FUTEX_WAKE_PRIVATE, 1) = 1
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
recvfrom(5, {{len=20, type=NLMSG_DONE, flags=NLM_F_MULTI, seq=1, pid=32278}, 0}, 4096, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, [112->12]) = 20
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
close(5)                                = 0
getuid()                                = 0
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
openat(AT_FDCWD, "/etc//localtime", O_RDONLY) = 5
read(5, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\t\0\0\0\t\0\0\0\0"..., 4096) = 2335
read(5, "", 4096)                       = 0
close(5)                                = 0
write(2, "\33[36mINFO\33[0m[2020-02-05T19:38:4"..., 97INFO[2020-02-05T19:38:45.395473030+01:00] Starting k3s v1.17.2+k3s1 (cdab19b0)
) = 97
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1], NULL, 8) = 0
futex(0xc0004712c8, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1], NULL, 8) = 0
futex(0xc0004712c8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x6d88228, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0xc000186848, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x6d88228, FUTEX_WAIT_PRIVATE, 0, NULL) = ?
+++ killed by SIGSEGV +++
Segmentation fault

@brandond
Contributor

brandond commented Feb 5, 2020

What platform (architecture) is this on?

@leonklingele

Debian 9 and 10, latest patch level, x86-64.

@pauloalima

I've seen this error when using a cluster with dqlite on RPi 4 and on x86-64. I don't have logs because I removed k3s and will reinstall with an external DB.

@leonklingele

v1.17.3+k3s1 unfortunately didn't fix the issue for me :(

@amiga23

amiga23 commented Mar 22, 2020

Same issue on RHEL7 amd64 v1.17.3+k3s1. SEGV after "Cluster bootstrap already complete".

@rungej

rungej commented Apr 23, 2020

The issue still exists on Debian 10 amd64 with v1.17.4+k3s1 in a multi-server environment with the embedded dqlite DB.

@stale

stale bot commented Jul 31, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Jul 31, 2021
@stale stale bot closed this as completed Aug 14, 2021