--log-file-max-size not getting honored by kubelet #86984

Closed
marosset opened this issue Jan 8, 2020 · 23 comments
Labels
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/bug: Categorizes an issue or PR as related to a bug.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • sig/node: Categorizes an issue or PR as relevant to SIG Node.
  • triage/needs-information: Indicates an issue needs more information in order to work on it.
  • wg/structured-logging: Categorizes an issue or PR as relevant to WG Structured Logging.

Comments

@marosset
Contributor

marosset commented Jan 8, 2020

What happened:
I configured some nodes in a cluster to log to files and also specified --log-file-max-size. The log files grew significantly larger than the size I specified.

What you expected to happen:
Log files would rotate or truncate once their size exceeded the value specified with --log-file-max-size.

How to reproduce it (as minimally and precisely as possible):

I was able to repro this on both Linux and Windows nodes by setting --log-dir, --logtostderr=false, --alsologtostderr, and --log-file-max-size as kubelet parameters.

In both cases I set --log-file-max-size=2 (the documentation says the flag is of type uint and in megabytes)

After running some work, I observed the files mapped to kubelet.INFO / kubelet.exe.INFO growing well past 2 MB.

on linux:
image

on windows:
image

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.17.0
  • Cloud provider or hardware configuration: azure
  • OS (e.g: cat /etc/os-release): Ubuntu 16.04.6 LTS and Windows Server 2019
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@marosset marosset added the kind/bug Categorizes issue or PR as related to a bug. label Jan 8, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 8, 2020
@marosset
Contributor Author

marosset commented Jan 8, 2020

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 8, 2020
@BenTheElder
Member

this is probably a klog bug
cc @yuwenma xref kubernetes/klog#56

@zhouya0
Contributor

zhouya0 commented Jan 10, 2020

I'm willing to help with this, but can you show me your kubelet config file?

@marosset
Contributor Author

marosset commented Jan 10, 2020

The tools I'm using for deployment still use kubelet args for most settings.
This is what the config looks like for my linux cluster. Let me know if you'd like to see the config for the windows cluster too.

kubelet.service

[Unit]
Description=Kubelet
ConditionPathExists=/usr/local/bin/kubelet


[Service]
Restart=always
EnvironmentFile=/etc/default/kubelet
SuccessExitStatus=143
ExecStartPre=/bin/bash /opt/azure/containers/kubelet.sh
ExecStartPre=/bin/mkdir -p /var/lib/kubelet
ExecStartPre=/bin/mkdir -p /var/lib/cni
ExecStartPre=/bin/bash -c "if [ $(mount | grep \"/var/lib/kubelet\" | wc -l) -le 0 ] ; then /bin/mount --bind /var/lib/kubelet /var/lib/kubelet ; fi"
ExecStartPre=/bin/mount --make-shared /var/lib/kubelet


ExecStartPre=/sbin/sysctl -w net.ipv4.tcp_retries2=8
ExecStartPre=/sbin/sysctl -w net.core.somaxconn=16384
ExecStartPre=/sbin/sysctl -w net.ipv4.tcp_max_syn_backlog=16384
ExecStartPre=/sbin/sysctl -w net.core.message_cost=40
ExecStartPre=/sbin/sysctl -w net.core.message_burst=80

ExecStartPre=/bin/bash -c "if [ $(nproc) -gt 8 ]; then /sbin/sysctl -w net.ipv4.neigh.default.gc_thresh1=4096; fi"
ExecStartPre=/bin/bash -c "if [ $(nproc) -gt 8 ]; then /sbin/sysctl -w net.ipv4.neigh.default.gc_thresh2=8192; fi"
ExecStartPre=/bin/bash -c "if [ $(nproc) -gt 8 ]; then /sbin/sysctl -w net.ipv4.neigh.default.gc_thresh3=16384; fi"

ExecStartPre=-/sbin/ebtables -t nat --list
ExecStartPre=-/sbin/iptables -t nat --numeric --list
ExecStart=/usr/local/bin/kubelet \
        --enable-server \
        --node-labels="${KUBELET_NODE_LABELS}" \
        --v=2  \
        --volume-plugin-dir=/etc/kubernetes/volumeplugins \
        $KUBELET_CONFIG \
        $KUBELET_REGISTER_NODE $KUBELET_REGISTER_WITH_TAINTS

[Install]
WantedBy=multi-user.target

/etc/default/kubelet

KUBELET_CONFIG=--address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --azure-container-registry-config=/etc/kubernetes/azure.json --cgroups-per-qos=true --client-ca-file=/etc/kubernetes/certs/ca.crt --cloud-config=/etc/kubernetes/azure.json --cloud-provider=azure --cluster-dns=10.0.0.10 --cluster-domain=cluster.local --enforce-node-allocatable=pods --event-qps=0 --eviction-hard=memory.available<750Mi,nodefs.available<10%,nodefs.inodesFree<5% --feature-gates=PodPriority=true,RotateKubeletServerCertificate=true --image-gc-high-threshold=85 --image-gc-low-threshold=80 --image-pull-progress-deadline=30m --keep-terminated-pod-volumes=false --kubeconfig=/var/lib/kubelet/kubeconfig --log-dir=/var/log/kubelet --logtostderr=false --alsologtostderr --log-file-max-size=5 --max-pods=30 --network-plugin=cni --node-status-update-frequency=10s --non-masquerade-cidr=10.240.0.0/12 --pod-infra-container-image=mcr.microsoft.com/k8s/core/pause:1.2.0 --pod-manifest-path=/etc/kubernetes/manifests --pod-max-pids=-1 --protect-kernel-defaults=true --read-only-port=0 --rotate-certificates=true --streaming-connection-idle-timeout=4h --tls-cert-file=/etc/kubernetes/certs/kubeletserver.crt --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256 --tls-private-key-file=/etc/kubernetes/certs/kubeletserver.key
KUBELET_IMAGE=k8s.gcr.io/
KUBELET_REGISTER_SCHEDULABLE=true

KUBELET_NODE_LABELS=kubernetes.azure.com/role=agent,agentpool=agentlinux,storageprofile=managed,storagetier=Premium_LRS,kubernetes.azure.com/cluster=kubelogs-lx

and /var/lib/kubelet/kubeconfig

apiVersion: v1
kind: Config
clusters:
- name: localcluster
  cluster:
    certificate-authority: /etc/kubernetes/certs/ca.crt
    server: https://10.255.255.5:443
users:
- name: client
  user:
    client-certificate: /etc/kubernetes/certs/client.crt
    client-key: /etc/kubernetes/certs/client.key
contexts:

@zhouya0
Contributor

zhouya0 commented Jan 13, 2020

In brief, --log-file-max-size only takes effect for the specific file set with --log-file.

The relevant logic in klog is:

func CalculateMaxSize() uint64 {
	if logging.logFile != "" {
		if logging.logFileMaxSizeMB == 0 {
			// If logFileMaxSizeMB is zero, we don't have limitations on the log size.
			return math.MaxUint64
		}
		// Flag logFileMaxSizeMB is in MB for user convenience.
		return logging.logFileMaxSizeMB * 1024 * 1024
	}
	// If "log_file" flag is not specified, the target file (sb.file) will be cleaned up when reaches a fixed size.
	return MaxSize
}

This means that if --log-file is not specified, klog ignores --log-file-max-size and uses its default maximum size (MaxSize): 1800 MB.

@marosset
Contributor Author

I see - let me try that out.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 12, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 12, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ehashman
Member

/remove-lifecycle rotten
/reopen

@k8s-ci-robot k8s-ci-robot reopened this Jun 24, 2021
@k8s-ci-robot
Contributor

@ehashman: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jun 24, 2021
@ehashman
Member

/triage needs-information

Can someone please reproduce this?

I think there are a few things we should do to follow up:

  1. Validate if --log-file-max-size is set but --log-file is not set, and print a warning that the first flag will be ignored.
  2. Improve documentation for these flags to indicate they both need to be used together.
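
The first follow-up could look roughly like the sketch below. It uses only Go's standard `flag` package and is not the kubelet's or klog's actual flag handling: `newFlagSet`, `checkLogFlags`, the warning text, and the flag defaults shown are assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
)

// newFlagSet registers the three logging flags discussed in this issue.
// The defaults and help strings are illustrative assumptions.
func newFlagSet() *flag.FlagSet {
	fs := flag.NewFlagSet("kubelet-sketch", flag.ContinueOnError)
	fs.String("log-dir", "", "if non-empty, write log files in this directory")
	fs.String("log-file", "", "if non-empty, use this log file")
	fs.Uint64("log-file-max-size", 1800, "maximum log file size in megabytes")
	return fs
}

// checkLogFlags returns a warning when --log-file-max-size was given
// explicitly but --log-file is empty, and an empty string otherwise.
func checkLogFlags(fs *flag.FlagSet) string {
	sizeSet := false
	// Visit only walks flags that were actually set on the command line.
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "log-file-max-size" {
			sizeSet = true
		}
	})
	logFile := fs.Lookup("log-file")
	if sizeSet && (logFile == nil || logFile.Value.String() == "") {
		return "warning: --log-file-max-size is ignored unless --log-file is also set"
	}
	return ""
}

func main() {
	fs := newFlagSet()
	// Arguments similar to the ones in the original report.
	args := []string{"--log-dir=/var/log/kubelet", "--log-file-max-size=2"}
	if err := fs.Parse(args); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	if w := checkLogFlags(fs); w != "" {
		fmt.Println(w)
	}
}
```

Checking whether the flag was explicitly set, rather than comparing its value against zero, avoids warning on every start when the size flag has a non-zero default.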

I know there's also some discussion of deprecating these flags.

/wg structured-logging

@k8s-ci-robot k8s-ci-robot added triage/needs-information Indicates an issue needs more information in order to work on it. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging. labels Jun 24, 2021
@ehashman
Member

/help

@k8s-ci-robot
Contributor

@ehashman:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jun 24, 2021
@yashikabadaya

/assign

@n4j
Member

n4j commented Jul 9, 2021

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 9, 2021
@n4j n4j added this to Triaged in SIG Node Bugs Jul 9, 2021
@yashikabadaya yashikabadaya removed their assignment Jul 20, 2021
@serathius serathius moved this from Assigned to Accepted in WG Structured Logging - Enhancement work Jul 22, 2021
@ehashman ehashman moved this from Triaged to Needs Information in SIG Node Bugs Aug 5, 2021
@ehashman
Member

/remove-triage accepted

This still needs a reproducer

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Sep 14, 2021
@k8s-ci-robot
Contributor

@marosset: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pohly
Contributor

pohly commented Sep 16, 2021

The plan indeed is to deprecate --log-file-max-size and --log-file, see https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components

@ehashman
Member

/close

Since we are deprecating this flag, I think there's no point in fixing any bugs here.

@k8s-ci-robot
Contributor

@ehashman: Closing this issue.

In response to this:

/close

Since we are deprecating this flag, I think there's no point in fixing any bugs here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

WG Structured Logging - Enhancement work automation moved this from Accepted to Done Sep 22, 2021
SIG Node Bugs automation moved this from Needs Information to Done Sep 22, 2021
@marosset
Contributor Author

@ehashman Thanks for the updates and closing this issue!
(and sorry I have been slow to respond on this issue)

9 participants