Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: kubelet panic & crash if --config-dir is used #124796

Open
hegerdes opened this issue May 10, 2024 · 12 comments · Fixed by kubernetes/website#46360
Open

bug: kubelet panic & crash if --config-dir is used #124796

hegerdes opened this issue May 10, 2024 · 12 comments · Fixed by kubernetes/website#46360
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@hegerdes
Copy link

hegerdes commented May 10, 2024

What happened?

I created a v1.30.0 k8s cluster with kubeadm and created a drop-in directory for kubelet configuration under /etc/kubernetes/kubelet.conf.d. The normal conf file created by kubeadm is in /var/lib/kubelet/config.yaml

According to the docs the kubelet should merge all configs in specified order and start. But the kubelet crashes under two setups:

Setup 1:

The /etc/kubernetes/kubelet.conf.d/20-kubelet.conf is completely empty the kubelet crashes with:

>kubelet --config-dir /etc/kubernetes/kubelet.conf.d --config /var/lib/kubelet/config.yaml
E0510 13:47:28.749138    9050 run.go:74] "command failed" err="failed to merge kubelet configs: failed to walk through kubelet dropin directory \"/etc/kubernetes/kubelet.conf.d\": failed to load kubelet dropin file, path: /etc/kubernetes/kubelet.conf.d/20-kubelet.conf, error: kubelet config file \"/etc/kubernetes/kubelet.conf.d/20-kubelet.conf\" was empty"

If there is just one space or an newline in that file it crashes with:

>kubelet --config-dir /etc/kubernetes/kubelet.conf.d --config /var/lib/kubelet/config.yaml
E0510 13:49:26.243517    9062 run.go:74] "command failed" err=<
        failed to merge kubelet configs: failed to walk through kubelet dropin directory "/etc/kubernetes/kubelet.conf.d": failed to load kubelet dropin file, path: /etc/kubernetes/kubelet.conf.d/20-kubelet.conf, error: Object 'Kind' is missing in '
        '

Even when the docs say:

These files may contain partial configurations and might not be valid config files by themselves. Validation is only performed on the final resulting configuration structure stored internally in the kubelet.

If there is nothing or only whitspaces in /etc/kubernetes/kubelet.conf.d the merged config should not be invalid and not every file in /etc/kubernetes/kubelet.conf.d must contain a kind property.

Setup 2:
The /etc/kubernetes/kubelet.conf.d/20-kubelet.conf is the same as the /var/lib/kubelet/config.yaml the kubelet panics and crashes with:

> kubelet --config-dir /etc/kubernetes/kubelet.conf.d --config /var/lib/kubelet/config.yaml
panic: non-positive interval for NewTicker

goroutine 1 [running]:
time.NewTicker(0xc00098f7a0?)
        time/tick.go:22 +0xe5
k8s.io/klog/v2/internal/clock.RealClock.NewTicker(...)
        k8s.io/klog/v2@v2.120.1/internal/clock/clock.go:111
k8s.io/klog/v2.(*flushDaemon).run(0xc000229650, 0x0)
        k8s.io/klog/v2@v2.120.1/klog.go:1169 +0x10f
k8s.io/klog/v2.StartFlushDaemon(0x0)
        k8s.io/klog/v2@v2.120.1/klog.go:1220 +0x36
k8s.io/component-base/logs/api/v1.apply(0xc0008c8a38, 0x0, {0x7f1ede53b2f0, 0xc000557130})
        k8s.io/component-base/logs/api/v1/options.go:285 +0x890
k8s.io/component-base/logs/api/v1.validateAndApply(0xc0008c8a38, 0x0, {0x7f1ede53b2f0, 0xc000557130}, 0x4?)
        k8s.io/component-base/logs/api/v1/options.go:132 +0x71
k8s.io/component-base/logs/api/v1.ValidateAndApplyAsField(...)
        k8s.io/component-base/logs/api/v1/options.go:124
k8s.io/kubernetes/cmd/kubelet/app.NewKubeletCommand.func1(0xc0000d8908, {0xc000100060, 0x4, 0x4})
        k8s.io/kubernetes/cmd/kubelet/app/server.go:238 +0x4bc
github.com/spf13/cobra.(*Command).execute(0xc0000d8908, {0xc000100060, 0x4, 0x4})
        github.com/spf13/cobra@v1.7.0/command.go:940 +0x882
github.com/spf13/cobra.(*Command).ExecuteC(0xc0000d8908)
        github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.7.0/command.go:992
k8s.io/component-base/cli.run(0xc0000d8908)
        k8s.io/component-base/cli/run.go:146 +0x290
k8s.io/component-base/cli.Run(0xc0000061c0?)
        k8s.io/component-base/cli/run.go:46 +0x17
main.main()
        k8s.io/kubernetes/cmd/kubelet/kubelet.go:36 +0x18

Using the same config only via --config works.

What did you expect to happen?

The kubelet should start with empty files in /etc/kubernetes/kubelet.conf.d or if the files in /var/lib/kubelet/config.yaml and /etc/kubernetes/kubelet.conf.d are the same.

If this is not the expected usage, the docs have to be adjusted.

How can we reproduce it (as minimally and precisely as possible)?

Create a cluster with kubeadm v1.30 with kublet config in /var/lib/kubelet/config.yaml and /etc/kubernetes/kubelet.conf.d or setup kubelet by hand.

# kubeadm config
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration

nodeRegistration:
  kubeletExtraArgs:
    config: /var/lib/kubelet/config.yaml
    config-dir: /etc/kubernetes/kubelet.conf.d

Anything else we need to know?

The kublet conf api version is apiVersion: kubelet.config.k8s.io/v1beta1 and is was with and without setting KUBELET_CONFIG_DROPIN_DIR_ALPHA

Functionality was introduced in #119390

Kubernetes version

$ kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"30", GitVersion:"v1.30.0", GitCommit:"7c48c2bd72b9bf5c44d21d7338cc7bea77d0ad2a", GitTreeState:"clean", BuildDate:"2024-04-17T17:34:08Z", GoVersion:"go1.22.2", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Hetzner/None

OS version

# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ uname -a
Linux controlplane-node-amd64-vcxkzu 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

kubeadm

Container runtime (CRI) and version (if applicable)

containerd github.com/containerd/containerd v1.7.16

Related plugins (CNI, CSI, ...) and versions (if applicable)

not relevant
@hegerdes hegerdes added the kind/bug Categorizes issue or PR as related to a bug. label May 10, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 10, 2024
@ffromani
Copy link
Contributor

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 10, 2024
@carlory
Copy link
Member

carlory commented May 11, 2024

I think it is a doc issue for step 1.

From KEP, it said,

If there are any issues with the drop-ins (e.g. formatting errors), the error will be reported in the same way as a misconfigured kubelet.conf file

@hegerdes
Copy link
Author

I think it is a doc issue for step 1.

From KEP, it said,

If there are any issues with the drop-ins (e.g. formatting errors), the error will be reported in the same way as a misconfigured kubelet.conf file

Just to be sure I recreated the setup given in the KEP. It did not work. Every file in the drop-ins seems to be required to have a kind (which suggest the validation is not just done at the end) and empty files produce an error.

There seems to be something wrong while merging the files.

@saschagrunert
Copy link
Member

@haircommander
Copy link
Contributor

yeah the documentation doesn't match the behavior. I think we need to keep the behavior this way, where the kubelet fails on any invalid configuration file, because an admin's intent may not be fulfilled in the resulting configuration if we skip the invalid configs and continue the config resolution without taking the funky one into account. @sohankunkerkar do you have bandwidth to update the docs?

@sohankunkerkar
Copy link
Contributor

yeah the documentation doesn't match the behavior. I think we need to keep the behavior this way, where the kubelet fails on any invalid configuration file, because an admin's intent may not be fulfilled in the resulting configuration if we skip the invalid configs and continue the config resolution without taking the funky one into account. @sohankunkerkar do you have bandwidth to update the docs?

+1 to what @haircommander said. Let me go ahead and update the docs.

@ffromani
Copy link
Contributor

/triage accepted

per #124796 (comment) and #124796 (comment)

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 14, 2024
@ffromani ffromani moved this from Triage to Triaged in SIG Node Bugs May 14, 2024
@ffromani
Copy link
Contributor

I think in the case/step 2 described above the kubelet should preferably abort like it does in the case/step1 rather than crash with a stacktrace. Perhaps a stretch goal?

SIG Node Bugs automation moved this from Triaged to Done May 21, 2024
@hegerdes
Copy link
Author

hegerdes commented May 21, 2024

Nice, thanks for updating the docs so fast.

Just wanted to check in what's up with problem 2 described in Setup 2. Should this be reopened or is a new Issue needed?

@carlory
Copy link
Member

carlory commented May 21, 2024

/reopen

For problem 2 described in Setup 2

@k8s-ci-robot k8s-ci-robot reopened this May 21, 2024
@k8s-ci-robot
Copy link
Contributor

@carlory: Reopened this issue.

In response to this:

/reopen

For problem 2 described in Setup 2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

SIG Node Bugs automation moved this from Done to Triage May 21, 2024
@haircommander haircommander moved this from Triage to Triaged in SIG Node Bugs May 22, 2024
@haircommander
Copy link
Contributor

/assign @sohankunkerkar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging a pull request may close this issue.

7 participants