--cluster-reset renders entire cluster hosed #5070
Comments
This is a bug in etcd 3.5.0; the most recent round of k3s releases updated to etcd 3.5.1, which includes a fix for this issue. Please upgrade and try again. Duplicate of #3997.
I am curious why just rebooting your nodes causes the cluster to break, though. Are your servers getting different addresses from DHCP when they come back up?
No, they are all static IPs.
Me too...
Upgrade first and see what happens. If you still run into issues with nodes not rejoining after a reboot, post the logs from that failure.
The originally rebooted node still fails to connect, and its log basically just repeats the same messages.
Just went through the whole thing again; here's where things get interesting:
From one of the other nodes, I think this is the main issue?
It seems that etcd may need longer to transfer bigger snapshots, which would explain why ~3 weeks in my environment is "long enough."
@brandond this smells suspiciously like a timeout somewhere. Is there any way that you know of to configure it?
Aha, there are no snapshots in the snapshot directory for the leader 🤔 so it has nothing to transfer. The original node that (re)bootstrapped the cluster, however, does have snapshots...
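(For reference: one quick way to confirm whether a member has any raft snapshots on disk is to list the .snap files under its etcd data directory. A minimal Go sketch follows; the data-dir path is an assumption based on the default k3s embedded etcd layout and may differ on your setup.)

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	// Assumed default raft snapshot directory for k3s' embedded etcd;
	// adjust if k3s was started with a non-default --data-dir.
	snapDir := "/var/lib/rancher/k3s/server/db/etcd/member/snap"

	entries, err := os.ReadDir(snapDir)
	if err != nil {
		log.Fatalf("cannot read %s: %v", snapDir, err)
	}

	found := 0
	for _, e := range entries {
		if !strings.HasSuffix(e.Name(), ".snap") {
			continue // only raft snapshot files are of interest here
		}
		info, err := e.Info()
		if err != nil {
			continue
		}
		fmt.Printf("%s\t%d bytes\t%s\n", e.Name(), info.Size(), info.ModTime())
		found++
	}
	if found == 0 {
		fmt.Println("no raft snapshots on disk for this member")
	}
}
```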
This is all internal etcd stuff that we don't manage too closely. I believe etcd takes an internal snapshot to bring the newly joined member up to speed... that said, I've never seen an I/O timeout while the member is streaming the initial snapshot from the peer:

Feb 03 21:58:41 capital k3s[6596]: {"level":"info","ts":"2022-02-03T21:58:41.982+0100","caller":"rafthttp/stream.go:274","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","local-member-id":"64f8579255e91227","remote-peer-id":"689da5af1f9c0825"}
Feb 03 21:58:41 capital k3s[6596]: {"level":"info","ts":"2022-02-03T21:58:41.986+0100","caller":"etcdserver/server.go:767","msg":"initialized peer connections; fast-forwarding election ticks","local-member-id":"64f8579255e91227","forward-ticks":8,"forward-duration":"4s","election-ticks":10,"election-timeout":"5s","active-remote-members":2}
Feb 03 21:58:46 capital k3s[6596]: {"level":"warn","ts":"2022-02-03T21:58:46.964+0100","caller":"rafthttp/http.go:271","msg":"failed to save incoming database snapshot","local-member-id":"64f8579255e91227","remote-snapshot-sender-id":"689da5af1f9c0825","incoming-snapshot-index":11225279,"error":"read tcp 192.168.100.3:2380->192.168.100.2:25862: i/o timeout"}

Can you attach logs from all three nodes, covering the time period where you're restarting the node that fails to come back up? Is there anything unique about your configuration? Odd network topology, slow disks, unusual k3s CLI flags?
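(As an aside, one way to compare what each member reports about its own state, such as DB size, raft index, and current leader, is with a small etcd client program. The sketch below is only illustrative: the endpoint and certificate paths are assumptions about a typical k3s embedded etcd setup and should be checked against the actual installation.)

```go
package main

import (
	"context"
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"log"
	"os"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Assumed locations of the embedded etcd client certs in a k3s install;
	// verify these paths before relying on them.
	const (
		caFile   = "/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt"
		certFile = "/var/lib/rancher/k3s/server/tls/etcd/client.crt"
		keyFile  = "/var/lib/rancher/k3s/server/tls/etcd/client.key"
		endpoint = "https://127.0.0.1:2379" // assumed local client endpoint
	)

	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		log.Fatal(err)
	}
	caPEM, err := os.ReadFile(caFile)
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{endpoint},
		DialTimeout: 5 * time.Second,
		TLS:         &tls.Config{Certificates: []tls.Certificate{cert}, RootCAs: pool},
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Member list as this node sees it.
	members, err := cli.MemberList(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range members.Members {
		fmt.Printf("member %x name=%s peerURLs=%v\n", m.ID, m.Name, m.PeerURLs)
	}

	// Local endpoint status: version, DB size, raft index, and leader ID.
	st, err := cli.Status(ctx, endpoint)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("version=%s dbSize=%d raftIndex=%d leader=%x\n",
		st.Version, st.DbSize, st.RaftIndex, st.Leader)
}
```

Running something like this on each server and comparing raftIndex and dbSize can show whether one member is far behind the others, which is what a failed snapshot transfer would leave behind.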
I'm able to reproduce it roughly 1 in 3 times, so I can most likely get the first and second node in the cluster, but so far I haven't been lucky enough to get all three.
I'll get you some logs tomorrow. With all the failing services trying to anti-affinity themselves, it's pretty noisy, so I'll see if I can get them scaled to 0 for some nice clean logs.
The only thing I can think of is that it's bare-metal with a VLAN between the nodes. Here are the k3s flags:
Both logs can be found in this gist. "master" node was started with:
Second node was started with:
Looking at the logs, it seems like there is (maybe) an etcd bug?
and then on the receiving end:
Those sizes are vastly different...
That does appear to be a bug in etcd; one node reports expecting a much smaller snapshot than the other node is prepared to send. Are these nodes both the same architecture? I can't imagine that would be it, but there must be something unique about your environment since I haven't run into this anywhere else and can't reproduce it on my end.
Yeah, they're the same architecture; in fact they're almost exactly the same, with the only difference being the brand of network card. I've tried doing a manual restore using etcdctl and the disaster recovery instructions; it ended up taking a slightly different path to the same result. Anyway, I did discover this in the logs when the server comes online:
So maybe there is some v2 data in the snapshot which is causing an issue, and maybe that's also why you can't replicate it? It doesn't explain why I see this every three weeks... but maybe every few weeks I just run into a different problem :)
Hmm, there could also be some interesting bugs in how etcd calculates sizes, since it up-casts to uint64, does some operations, and (I don't know Go too well) then casts back to an int, which I assume could result in an overflow or some other shenanigans under specific conditions. Granted, the limit is hard-coded at 512 MB, which should never get near the edge of an int... Anyway, I don't know what to do except convert to a single-node cluster with two workers, or rebuild the cluster from scratch.
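(To make the conversion concern concrete, here is a tiny illustration, not etcd's actual code: in Go, converting a uint64 back to int neither panics nor saturates, it just reinterprets the bits, so on a 64-bit platform a value above math.MaxInt64 comes back negative.)

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// uint64 -> int conversion in Go reinterprets the value rather than erroring.
	// On a 64-bit platform, anything above math.MaxInt64 wraps to a negative int.
	big := uint64(math.MaxInt64) + 1

	fmt.Printf("as uint64: %d\n", big)      // 9223372036854775808
	fmt.Printf("as int:    %d\n", int(big)) // -9223372036854775808

	// For scale: the hard-coded snapshot limit mentioned above is 512 MB,
	// nowhere near that boundary.
	const limit = 512 * 1024 * 1024
	fmt.Printf("512 MB limit: %d bytes\n", limit)
}
```

So, as noted above, a plain conversion overflow is unlikely at these sizes; the mismatch in reported snapshot sizes presumably has some other cause.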
Wireshark to the rescue. I noticed a number of retransmissions and other shenanigans on the link, so I switched the cluster to use the external IP address and everything started working correctly. I don't think this is directly a k3s or etcd bug, but maybe it could be more resilient. Thanks for helping me @brandond!
Environmental Info:
K3s Version:
k3s version v1.22.5+k3s1 (405bf79)
go version go1.16.10
Node(s) CPU architecture, OS, and Version:
Cluster Configuration:
3 servers
Describe the bug:
If a cluster is up for more than approximately 3 weeks, rebooting a node will cause that node to be unable to rejoin the cluster. The last few times this happened, I just rebuilt from scratch. This time, I decided to try and use --cluster-reset to salvage the cluster. However, after it runs, it tells me to start as normal, so I run systemctl start k3s and it fails with:
Steps To Reproduce:
systemctl stop k3s on the now drained node
k3s --cluster-reset on the remaining node
Expected behavior:
A cluster should not fail just because it has been running for a few weeks...
Actual behavior:
Any 3-node cluster will fall apart if a node is rebooted after the cluster has been running for more than a few weeks.
Additional context / logs:
I'm aware of etcd quorum, and as mentioned, this is not an issue if the cluster has only been running for a "short" period of time; the cluster handles it all gracefully. The node being rebooted is not the leader and no voting occurs during the reboot, so the cluster should continue running just fine. I don't believe quorum is the cause of this issue.
Backporting