This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
k3s cluster is not able to start after etcd db size reached 2.2GB #7293
Comments
The etcd cluster tries to start with the new size, but it fails with the error messages below:
PROD-K3S[root@node-003 ~]$ journalctl -u k3s -f
Apr 17 16:02:56 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:02:56.115+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"a106f287be777c56","rtt":"0s","error":"dial tcp 10.10.10.2:2380: connect: connection refused"}
Apr 17 16:02:56 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:02:56.115+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"a106f287be777c56","rtt":"0s","error":"dial tcp 10.10.10.2:2380: connect: connection refused"}
Apr 17 16:02:56 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:02:56.666+0200","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000aa7340/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: authentication handshake failed: context deadline exceeded\""}
Apr 17 16:02:57 node-003 k3s[3805670]: time="2023-04-17T16:02:57+02:00" level=info msg="Waiting for API server to become available"
Apr 17 16:02:57 node-003 k3s[3805670]: time="2023-04-17T16:02:57+02:00" level=info msg="Waiting for etcd server to become available"
Apr 17 16:02:57 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:02:57.085+0200","caller":"etcdserver/server.go:2065","msg":"failed to publish local member to cluster through raft","local-member-id":"7e7eca77c4f3cf9e","local-member-attributes":"{Name:node-003-984f7e02 ClientURLs:[https://10.10.10.3:2379]}","request-path":"/0/members/7e7eca77c4f3cf9e/attributes","publish-timeout":"15s","error":"etcdserver: request timed out"}
Apr 17 16:02:57 node-003 k3s[3805670]: time="2023-04-17T16:02:57+02:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Apr 17 16:02:58 node-003 k3s[3805670]: time="2023-04-17T16:02:58+02:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
Apr 17 16:02:58 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:02:58.118+0200","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000aa7340/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Apr 17 16:02:58 node-003 k3s[3805670]: time="2023-04-17T16:02:58+02:00" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:01.115+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"ec632d3e343f6707","rtt":"0s","error":"dial tcp 10.10.10.1:2380: connect: connection refused"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:01.115+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"a106f287be777c56","rtt":"0s","error":"dial tcp 10.10.10.2:2380: connect: connection refused"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:01.115+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"ec632d3e343f6707","rtt":"0s","error":"dial tcp 10.10.10.1:2380: connect: connection refused"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:01.115+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"a106f287be777c56","rtt":"0s","error":"dial tcp 10.10.10.2:2380: connect: connection refused"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e is starting a new election at term 59"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e became pre-candidate at term 59"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e received MsgPreVoteResp from 7e7eca77c4f3cf9e at term 59"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e [logterm: 59, index: 16630] sent MsgPreVote request to 25d08fcc10b19806 at term 59"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e [logterm: 59, index: 16630] sent MsgPreVote request to a106f287be777c56 at term 59"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e [logterm: 59, index: 16630] sent MsgPreVote request to ec632d3e343f6707 at term 59"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e received MsgPreVoteResp from 25d08fcc10b19806 at term 59"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e has received 2 MsgPreVoteResp votes and 0 vote rejections"}
Apr 17 16:03:02 node-003 k3s[3805670]: time="2023-04-17T16:03:02+02:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Apr 17 16:03:03 node-003 k3s[3805670]: time="2023-04-17T16:03:03+02:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
Apr 17 16:03:04 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:04.107+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e [logterm: 59, index: 16630, vote: 7e7eca77c4f3cf9e] cast MsgPreVote for 25d08fcc10b19806 [logterm: 59, index: 16630] at term 59"}
Apr 17 16:03:06 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:06.116+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"a106f287be777c56","rtt":"0s","error":"dial tcp 10.10.10.2:2380: connect: connection refused"}
Apr 17 16:03:06 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:06.116+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"ec632d3e343f6707","rtt":"0s","error":"dial tcp 10.10.10.1:2380: connect: connection refused"}
Apr 17 16:03:06 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:06.116+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"a106f287be777c56","rtt":"0s","error":"dial tcp 10.10.10.2:2380: connect: connection refused"}
Apr 17 16:03:06 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:06.116+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"ec632d3e343f6707","rtt":"0s","error":"dial tcp 10.10.10.1:2380: connect: connection refused"}
Apr 17 16:03:07 node-003 k3s[3805670]: time="2023-04-17T16:03:07+02:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.025+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e is starting a new election at term 59"}
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.025+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e became pre-candidate at term 59"}
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.025+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e received MsgPreVoteResp from 7e7eca77c4f3cf9e at term 59"}
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.025+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e [logterm: 59, index: 16630] sent MsgPreVote request to 25d08fcc10b19806 at term 59"}
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.025+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e [logterm: 59, index: 16630] sent MsgPreVote request to a106f287be777c56 at term 59"}
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.025+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e [logterm: 59, index: 16630] sent MsgPreVote request to ec632d3e343f6707 at term 59"}
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.026+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e received MsgPreVoteResp from 25d08fcc10b19806 at term 59"}
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.026+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e has received 2 MsgPreVoteResp votes and 0 vote rejections"}
Apr 17 16:03:08 node-003 k3s[3805670]: time="2023-04-17T16:03:08+02:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
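The log above repeats the same "connection refused" warnings for several peers. A minimal sketch for condensing such output into a per-peer count is shown below; the function name `summarize_refused` is hypothetical, and the pattern assumes the `dial tcp <ip>:<port>: connect: connection refused` wording seen in these log lines.

```shell
# Count "connection refused" errors per peer address in k3s/etcd journal output.
# Reads log lines on stdin; summarize_refused is a hypothetical helper name.
summarize_refused() {
  grep -o 'dial tcp [0-9.]*:[0-9]*: connect: connection refused' \
    | awk '{print $3}' \
    | sed 's/:$//' \
    | sort \
    | uniq -c
}

# Possible usage on a node (assumes the unit is named k3s, as in this report):
#   journalctl -u k3s --no-pager | summarize_refused
```

Each output line is a count followed by the peer address, which makes it easier to see which members are not listening on the peer port.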
Environmental Info:
K3s Version:
v1.25.4+k3s1
Node(s) CPU architecture, OS, and Version:
Linux aokn-nlam-003 5.15.0-5.76.5.1.el9uek.x86_64 #2 SMP Fri Dec 9 18:37:36 PST 2022 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
Cluster consists of 5 master nodes.
Describe the bug:
![image](https://user-images.githubusercontent.com/76592232/232400275-6e01d5a4-06b7-48ef-b9c5-a9942735ec1e.png)
Two days ago, I found that the k3s service on all nodes was in "activating" status. I quickly checked the logs with the "journalctl -u k3s" command, and it was clear that the etcd db size had exceeded the default quota.
I tried to increase the db size limit by adding '--etcd-arg=quota-backend-bytes=6442450944' to the /etc/systemd/system/k3s.service file. I believe etcd started with the new setting, but the k3s service still fails to start.
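As a sanity check on the value passed to quota-backend-bytes, the sketch below converts GiB to bytes. It assumes etcd's default backend quota of 2 GiB, which is consistent with the failure appearing at roughly 2.2GB; the helper name `gib_to_bytes` is hypothetical.

```shell
# Convert a whole number of GiB to bytes (hypothetical helper).
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Assumed etcd default backend quota: 2 GiB.
default_quota=$(gib_to_bytes 2)
# Value used in the k3s.service file above: 6 GiB.
new_quota=$(gib_to_bytes 6)

echo "default quota: ${default_quota} bytes"
echo "new quota:     ${new_quota} bytes"
```

The second value matches the 6442450944 used in the flag, so the setting itself corresponds to 6 GiB.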
Steps To Reproduce:
Expected behavior:
I expect k3s to run after setting a new value for quota-backend-bytes.
Actual behavior:
The journalctl log says there is no space in etcd, and the k3s service is not able to start.
Additional context / logs: