kubelet fails to start when set --protect-kernel-defaults=true. #66241

Closed
haoqing0110 opened this issue Jul 16, 2018 · 5 comments
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@haoqing0110

haoqing0110 commented Jul 16, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

What happened:

  1. Set --protect-kernel-defaults=true in kubelet.service, then reload and restart kubelet.service. It runs fine.
  2. Restart the host. kubelet.service fails to start. Below is the log from journalctl -u kubelet:
Jul 16 13:39:43 hostname hyperkube[779]: F0716 13:39:43.007658     779 kubelet.go:1335] Failed to start ContainerManager [Invalid kernel flag: kernel/panic_on_oops, expected value: 1, actual value: 0, Invalid kernel flag: vm/overcommit_memory, expected value: 1, actual value: 0, Invalid kernel flag: kernel/panic, expected value: 10, actual value: 0]
Jul 16 13:39:43 hostname systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jul 16 13:39:43 hostname systemd[1]: kubelet.service: Unit entered failed state.
Jul 16 13:39:43 hostname systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jul 16 13:39:53 hostname systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jul 16 13:39:53 hostname systemd[1]: Stopped Kubelet Service.
# systemctl status kubelet.service
● kubelet.service - Kubelet Service
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Mon 2018-07-16 13:39:53 UTC; 6s ago
     Docs: https://github.com/kubernetes/kubernetes
  Process: 1567 ExecStart=/opt/kubernetes/hyperkube kubelet --protect-kernel-defaults=true (code=exited, status=255)
 Main PID: 1567 (code=exited, status=255)

Jul 16 13:39:53 hostname systemd[1]: kubelet.service: Unit entered failed state.
Jul 16 13:39:53 hostname systemd[1]: kubelet.service: Failed with result 'exit-code'.
  3. Remove --protect-kernel-defaults=true and restart the host. kubelet.service then runs successfully.

Why does this happen? Below is my kubelet.service file:

[Unit]
Description=Kubelet Service
Documentation=https://github.com/kubernetes/kubernetes

[Service]
EnvironmentFile=-/etc/environment
ExecStart=/opt/kubernetes/hyperkube kubelet \
  --protect-kernel-defaults=true

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

What you expected to happen:
After adding --protect-kernel-defaults=true and restarting the host, kubelet.service should start and run.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
    1.11
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.1 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
UBUNTU_CODENAME=xenial
  • Kernel (e.g. uname -a):
# uname -a
Linux hostname 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Others:
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 16, 2018
@krmayankk

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 17, 2018
@liqlin2015

Please close this issue: --protect-kernel-defaults=true requires some system configuration before it will work.

@jnummelin
Contributor

Can't find any docs on what "some system configuration" means in this case, anyone care to share anything on that?

NOT setting it fails one of the CIS security benchmark tests, so it would be nice to be able to set it on a cluster.

@jtackaberry

Can't find any docs on what "some system configuration" means in this case, anyone care to share anything on that?

Fortunately the log line itself makes this clear:

Failed to start ContainerManager [Invalid kernel flag: kernel/panic_on_oops, expected value: 1, actual value: 0, Invalid kernel flag: vm/overcommit_memory, expected value: 1, actual value: 0, Invalid kernel flag: kernel/panic, expected value: 10, actual value: 0]

So set those sysctls to the expected values and make them persistent:

cat > /etc/sysctl.d/90-kubelet.conf << EOF
vm.overcommit_memory=1
kernel.panic=10
kernel.panic_on_oops=1
EOF
sysctl -p /etc/sysctl.d/90-kubelet.conf
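
To confirm the values stick after a reboot, a small check along these lines may help. This is only a sketch: the function name check_kubelet_sysctls and its optional root-directory argument are made up for illustration, and the three expected values are taken straight from the log line above (other kubelet versions may check a different set).

```shell
# Hypothetical helper, not part of kubelet: compare the three kernel flags
# named in the log line against the values kubelet reported as expected.
# The optional first argument is the sysctl root (defaults to /proc/sys).
check_kubelet_sysctls() {
  root="${1:-/proc/sys}"
  rc=0
  for pair in vm/overcommit_memory=1 kernel/panic=10 kernel/panic_on_oops=1; do
    key="${pair%%=*}"    # e.g. kernel/panic
    want="${pair#*=}"    # e.g. 10
    got="$(cat "$root/$key" 2>/dev/null)"
    if [ "$got" != "$want" ]; then
      echo "MISMATCH $key: expected $want, actual ${got:-unreadable}"
      rc=1
    fi
  done
  return $rc
}
```

Running check_kubelet_sysctls with no argument after a reboot should print nothing and return 0 if the file in /etc/sysctl.d was applied; any MISMATCH line points at a flag kubelet would still reject.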

@max-lobur

EKS AMI fix: awslabs/amazon-eks-ami#392
