This issue was moved to a discussion.

k3s cluster is not able to start after etcd db size reached 2.2GB #7293

Closed
batulziiy opened this issue Apr 17, 2023 · 1 comment

Environmental Info:
K3s Version:
v1.25.4+k3s1

Node(s) CPU architecture, OS, and Version:
Linux aokn-nlam-003 5.15.0-5.76.5.1.el9uek.x86_64 #2 SMP Fri Dec 9 18:37:36 PST 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
Cluster consists of 5 master nodes.

Describe the bug:
Two days ago, I found that the k3s service on all nodes was stuck in the "activating" state. I checked the logs with "journalctl -u k3s", and it was clear that the etcd db size had exceeded the default quota.
I tried to raise the quota by adding '--etcd-arg=quota-backend-bytes=6442450944' to the /etc/systemd/system/k3s.service file. I believe etcd started with the new setting, but the k3s service still fails to start.
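
For reference, the value passed above works out to 6 GiB. The short shell check below shows the arithmetic. The 2 GiB figure is etcd's documented default backend quota when quota-backend-bytes is not set, and the helper name gib_to_bytes is only illustrative; neither comes from this report.

```shell
# Convert a size in GiB to the byte count expected by quota-backend-bytes.
# gib_to_bytes is a hypothetical helper name, not part of k3s or etcd.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 2   # prints 2147483648 (etcd's default quota)
gib_to_bytes 6   # prints 6442450944 (the value used in this report)
```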

Steps To Reproduce:

  1. Deploy a k3s cluster with at least 3 server nodes running etcd
  2. Fill the etcd db until it reaches about 2GB
  3. The k3s service fails to start
  4. Add the extra argument --etcd-arg=quota-backend-bytes=6442450944 to /etc/systemd/system/k3s.service
  5. systemctl restart k3s
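
Steps 4 and 5 can be sanity-checked with a small sketch like the one below. It only verifies that the unit file mentions the flag. The helper name unit_has_quota_arg is hypothetical. The reminder about daemon-reload reflects general systemd behavior (an edited unit file is not re-read until a reload) and is not something this report confirms was or was not done.

```shell
# Return success if the given unit file passes a quota-backend-bytes etcd arg.
# unit_has_quota_arg is a hypothetical helper name used for illustration.
unit_has_quota_arg() {
  grep -Eq -- "--etcd-arg=quota-backend-bytes=[0-9]+" "$1"
}

UNIT=/etc/systemd/system/k3s.service
if [ -f "$UNIT" ] && unit_has_quota_arg "$UNIT"; then
  echo "quota-backend-bytes is set in $UNIT"
  # systemd re-reads an edited unit file only after a reload, so the
  # restart in step 5 would normally be preceded by:
  #   systemctl daemon-reload
  #   systemctl restart k3s
else
  echo "quota-backend-bytes not found in $UNIT"
fi
```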

Expected behavior:
I expect k3s to start after a new value of quota-backend-bytes is set.

Actual behavior:
The journalctl log says there is no space left in etcd, and the k3s service is not able to start.

Additional context / logs:


batulziiy commented Apr 17, 2023

The etcd cluster tries to start with the new size, but it fails with the error messages below:

PROD-K3S[root@node-003 ~]$journalctl -u k3s -f
Apr 17 16:02:56 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:02:56.115+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"a106f287be777c56","rtt":"0s","error":"dial tcp 10.10.10.2:2380: connect: connection refused"}
Apr 17 16:02:56 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:02:56.115+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"a106f287be777c56","rtt":"0s","error":"dial tcp 10.10.10.2:2380: connect: connection refused"}
Apr 17 16:02:56 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:02:56.666+0200","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000aa7340/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: authentication handshake failed: context deadline exceeded\""}
Apr 17 16:02:57 node-003 k3s[3805670]: time="2023-04-17T16:02:57+02:00" level=info msg="Waiting for API server to become available"
Apr 17 16:02:57 node-003 k3s[3805670]: time="2023-04-17T16:02:57+02:00" level=info msg="Waiting for etcd server to become available"
Apr 17 16:02:57 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:02:57.085+0200","caller":"etcdserver/server.go:2065","msg":"failed to publish local member to cluster through raft","local-member-id":"7e7eca77c4f3cf9e","local-member-attributes":"{Name:node-003-984f7e02 ClientURLs:[https://10.10.10.3:2379]}","request-path":"/0/members/7e7eca77c4f3cf9e/attributes","publish-timeout":"15s","error":"etcdserver: request timed out"}
Apr 17 16:02:57 node-003 k3s[3805670]: time="2023-04-17T16:02:57+02:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Apr 17 16:02:58 node-003 k3s[3805670]: time="2023-04-17T16:02:58+02:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
Apr 17 16:02:58 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:02:58.118+0200","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000aa7340/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Apr 17 16:02:58 node-003 k3s[3805670]: time="2023-04-17T16:02:58+02:00" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:01.115+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"ec632d3e343f6707","rtt":"0s","error":"dial tcp 10.10.10.1:2380: connect: connection refused"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:01.115+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"a106f287be777c56","rtt":"0s","error":"dial tcp 10.10.10.2:2380: connect: connection refused"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:01.115+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"ec632d3e343f6707","rtt":"0s","error":"dial tcp 10.10.10.1:2380: connect: connection refused"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:01.115+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"a106f287be777c56","rtt":"0s","error":"dial tcp 10.10.10.2:2380: connect: connection refused"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e is starting a new election at term 59"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e became pre-candidate at term 59"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e received MsgPreVoteResp from 7e7eca77c4f3cf9e at term 59"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e [logterm: 59, index: 16630] sent MsgPreVote request to 25d08fcc10b19806 at term 59"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e [logterm: 59, index: 16630] sent MsgPreVote request to a106f287be777c56 at term 59"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e [logterm: 59, index: 16630] sent MsgPreVote request to ec632d3e343f6707 at term 59"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e received MsgPreVoteResp from 25d08fcc10b19806 at term 59"}
Apr 17 16:03:01 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:01.525+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e has received 2 MsgPreVoteResp votes and 0 vote rejections"}
Apr 17 16:03:02 node-003 k3s[3805670]: time="2023-04-17T16:03:02+02:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Apr 17 16:03:03 node-003 k3s[3805670]: time="2023-04-17T16:03:03+02:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
Apr 17 16:03:04 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:04.107+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e [logterm: 59, index: 16630, vote: 7e7eca77c4f3cf9e] cast MsgPreVote for 25d08fcc10b19806 [logterm: 59, index: 16630] at term 59"}
Apr 17 16:03:06 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:06.116+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"a106f287be777c56","rtt":"0s","error":"dial tcp 10.10.10.2:2380: connect: connection refused"}
Apr 17 16:03:06 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:06.116+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"ec632d3e343f6707","rtt":"0s","error":"dial tcp 10.10.10.1:2380: connect: connection refused"}
Apr 17 16:03:06 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:06.116+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"a106f287be777c56","rtt":"0s","error":"dial tcp 10.10.10.2:2380: connect: connection refused"}
Apr 17 16:03:06 node-003 k3s[3805670]: {"level":"warn","ts":"2023-04-17T16:03:06.116+0200","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"ec632d3e343f6707","rtt":"0s","error":"dial tcp 10.10.10.1:2380: connect: connection refused"}
Apr 17 16:03:07 node-003 k3s[3805670]: time="2023-04-17T16:03:07+02:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.025+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e is starting a new election at term 59"}
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.025+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e became pre-candidate at term 59"}
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.025+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e received MsgPreVoteResp from 7e7eca77c4f3cf9e at term 59"}
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.025+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e [logterm: 59, index: 16630] sent MsgPreVote request to 25d08fcc10b19806 at term 59"}
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.025+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e [logterm: 59, index: 16630] sent MsgPreVote request to a106f287be777c56 at term 59"}
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.025+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e [logterm: 59, index: 16630] sent MsgPreVote request to ec632d3e343f6707 at term 59"}
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.026+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e received MsgPreVoteResp from 25d08fcc10b19806 at term 59"}
Apr 17 16:03:08 node-003 k3s[3805670]: {"level":"info","ts":"2023-04-17T16:03:08.026+0200","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e7eca77c4f3cf9e has received 2 MsgPreVoteResp votes and 0 vote rejections"}
Apr 17 16:03:08 node-003 k3s[3805670]: time="2023-04-17T16:03:08+02:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
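
One way to read the log above: node-003 reports "has received 2 MsgPreVoteResp votes" while connections to 10.10.10.1 and 10.10.10.2 are refused. Raft needs a majority of voting members to win an election, so the sketch below compares those 2 votes with the majority size. The member count is an assumption: the log shows pre-vote requests to three peers (at least four members), and the report mentions 5 master nodes. The helper name quorum is illustrative only.

```shell
# Majority (quorum) size for a Raft/etcd cluster with N voting members.
# quorum is a hypothetical helper name used for illustration.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

votes=2   # "has received 2 MsgPreVoteResp votes" in the log above
for members in 4 5; do   # assumed member counts, see the note above
  need=$(quorum "$members")
  if [ "$votes" -ge "$need" ]; then
    echo "members=$members quorum=$need votes=$votes -> election can proceed"
  else
    echo "members=$members quorum=$need votes=$votes -> no quorum"
  fi
done
```

Under either assumed member count the majority is 3, so on this reading node-003 cannot elect a leader until at least one more etcd member becomes reachable on port 2380.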

@k3s-io k3s-io locked and limited conversation to collaborators Apr 17, 2023
@brandond brandond converted this issue into discussion #7299 Apr 17, 2023
