
k3s log fill up my disk in short time #7128

Open
liyimeng opened this issue Mar 21, 2023 · 5 comments
Labels
kind/bug Something isn't working

Comments

@liyimeng
Contributor

Environmental Info:
K3s Version: 1.25.6

Node(s) CPU architecture, OS, and Version:

x86_64, ubuntu 22.04
Cluster Configuration:

3 servers, 3 nodes
Describe the bug:

I had a freshly installed cluster running for 3-4 days when, suddenly, one of the masters got its disk filled up by k3s-service.log,
which keeps printing

msg="Failed to test temporary data store connection: failed to dial endpoint http://127.0.0.1:2399 with maintenance client: context canceled"

Millions of lines of this text make k3s-service.log grow to hundreds of GB in a couple of hours.

Steps To Reproduce:

  • Installed K3s: install 1.25.6

Expected behavior:

Cluster nodes keep running stably.

Actual behavior:

One of the masters gets filled up with massive log output, which eventually kills the node.

Additional context / logs:

It keeps printing

msg="Failed to test temporary data store connection: failed to dial endpoint http://127.0.0.1:2399 with maintenance client: context canceled"
@brandond
Member

You'll need to provide more than just the one repeating log message. Can you go back in the logs to just before that message started repeating, or perhaps just stop k3s, clean up the logs, and then start it again so that you can get the logs from the beginning of startup onwards?

You might also confirm that nothing else obvious has gone wrong with this host, such as running out of disk space.
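For reference, one way to capture such a startup log on an OpenRC-managed host; a minimal sketch, assuming the k3s-service name and log path that appear later in this thread, so adjust to the actual setup:

rc-service k3s-service stop               # stop the supervised k3s server
truncate -s 0 /var/log/k3s-service.log    # discard the oversized old log
rc-service k3s-service start              # restart and begin a fresh log
tail -f /var/log/k3s-service.log          # follow the log from startup onwards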

@liyimeng
Contributor Author

@brandond Thanks for the attention! Yes, I know the log provides no clue here. By the time I saw the issue, the log file was 400GB+, making it impossible to see the beginning of the log. I restarted the service to collect the logs again, but the problem was gone once I did so, so I lost the chance to collect a meaningful log. Is this because something is going wrong with the embedded etcd?

Btw, my friend said he experienced the same thing on 1.23.10. Rebooting the node made the problem go away.

I will try to collect a meaningful log when it occurs again.

@liyimeng
Contributor Author

liyimeng commented Mar 27, 2023

It is happening again. I observe that more than one k3s server instance is running on the node, even though I have stopped the k3s-service.

ps -ef | grep  server | grep k3s
root     11974     1 99 17:54 ?        00:11:03 /sbin/k3s server
root     15326     1 99 16:15 ?        03:30:54 /sbin/k3s server
root     27884     1 47 16:14 ?        00:50:24 /sbin/k3s server
root     32143     1 99 17:50 ?        00:18:19 /sbin/k3s server

My system uses OpenRC to start the service. On a normal node, I have:

ps -ef | grep  server | grep k3s
root     37587     1  0 13:49 ?        00:00:00 supervise-daemon k3s-service --start --stdout /var/log/k3s-service.log --stderr /var/log/k3s-service.log --pidfile /var/run/k3s-service.pid --respawn-delay 5 --respawn-max 0 /sbin/k3s -- server --disable servicelb --server https://kubernetes --node-external-ip 172.27.13.170 --protect-kernel-defaults=true --secrets-encryption=true --kube-apiserver-arg=audit-policy-file=/var/lib/rancher/k3s/server/audit.yaml --kube-apiserver-arg=audit-log-path=/var/lib/rancher/k3s/server/audit/audit.log --kube-apiserver-arg=audit-log-maxage=30 --kube-apiserver-arg=audit-log-maxbackup=10 --kube-apiserver-arg=audit-log-maxsize=100 --kube-apiserver-arg=request-timeout=300s --kube-apiserver-arg=service-account-lookup=true --kube-apiserver-arg=enable-admission-plugins=NodeRestriction,PodSecurity,NamespaceLifecycle,ServiceAccount --kube-apiserver-arg=feature-gates=MemoryQoS=true,PodSecurity=true --kube-controller-manager-arg=terminated-pod-gc-threshold=10 --kube-controller-manager-arg=use-service-account-credentials=true --kubelet-arg=streaming-connection-idle-timeout=5m --kubelet-arg=make-iptables-util-chains=true --node-label k3os.io/mode=local --node-label k3os.io/version=0404260
root     37588 37587 27 13:49 ?        01:09:30 /sbin/k3s server

For some reason, the k3s-service script does not actually kill the '/sbin/k3s server' processes; the leftover processes conflict with each other and race to write into the log file, accumulating GBs of logs in a couple of minutes.

@brandond Is there any chance we can improve create_openrc_service_file() in install.sh to make it more robust and prevent this situation from happening?
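A minimal sketch of what such a guard could look like in the generated OpenRC init script; the start_pre hook and the pkill pattern are illustrative assumptions, not the actual contents produced by create_openrc_service_file():

#!/sbin/openrc-run
# Illustrative fragment only; the real script generated by install.sh also
# sets command, command_args, supervisor, pidfile, and log redirection.

start_pre() {
    # Make sure no leftover "k3s server" process from a previous run is still
    # alive before supervise-daemon launches a new one, so two instances never
    # race to write the same log file.
    pkill -f '^/sbin/k3s server' >/dev/null 2>&1 || true
}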

liyimeng added a commit to liyimeng/k3s that referenced this issue Mar 27, 2023
Before starting up new k3s instance, make sure old ones are gone. 
To fix issue  k3s-io#7128
liyimeng added a commit to liyimeng/k3s that referenced this issue Mar 28, 2023
Before starting up new k3s instance, make sure old ones are gone.
To fix issue  k3s-io#7128

Signed-off-by: Liyi Meng <meng.mobile@gmail.com>
@caroline-suse-rancher caroline-suse-rancher added the kind/bug Something isn't working label Apr 18, 2023
@caroline-suse-rancher caroline-suse-rancher added this to the v1.28.3+k3s1 milestone Oct 11, 2023
@caroline-suse-rancher caroline-suse-rancher removed this from the v1.28.3+k3s1 milestone Nov 14, 2023
@caroline-suse-rancher
Contributor

@liyimeng is this still an issue for you? I see the open PR, but it's been some time without an update. Thanks!

@liyimeng
Contributor Author

liyimeng commented Jan 6, 2024

@caroline-suse-rancher Thanks for your attention! I have been using the solution in the PR to solve this problem. So far so good. Not sure if it can help others.

@stale stale bot removed the status/stale label Jan 6, 2024