Failed to start ContainerManager failed to initialise top level QOS containers #43856

Closed
zetaab opened this Issue Mar 30, 2017 · 34 comments

zetaab (Member) commented Mar 30, 2017

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:36:33Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6+", GitVersion:"v1.6.0-1+c0b74ebf3ce26e-dirty", GitCommit:"c0b74ebf3ce26e46b5397ffb5ce71cd02f951130", GitTreeState:"dirty", BuildDate:"2017-03-30T08:19:21Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: openstack
  • OS (e.g. from /etc/os-release): NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
  • Kernel (e.g. uname -a): Linux kube-dev-master-2-0 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: ansible
  • Others:

What happened: When creating a new cluster with Kubernetes 1.6.0, two of the three masters always go into NotReady state.

kubectl get nodes
NAME                  STATUS     AGE   VERSION
kube-dev-master-1-0   Ready      2m    v1.6.0
kube-dev-master-1-1   NotReady   2m    v1.6.0
kube-dev-master-2-0   NotReady   2m    v1.6.0
kube-dev-node-1-0     Ready      1m    v1.6.0
kube-dev-node-2-0     Ready      1m    v1.6.0

The kubelet log is spamming:
Mar 30 15:20:44 centos7 kubelet: I0330 12:20:44.297149 8724 kubelet.go:1752] skipping pod synchronization - [Failed to start ContainerManager failed to initialise top level QOS containers: failed to create top level Burstable QOS cgroup : Unit kubepods-burstable.slice already exists.]

What you expected to happen: I expect the masters to join the cluster.

How to reproduce it (as minimally and precisely as possible): kubelet args

ExecStart=/usr/bin/kubelet \
  --kubeconfig=/tmp/kubeconfig \
  --require-kubeconfig \
  --register-node=true \
  --hostname-override=kube-dev-master-2-0 \
  --allow-privileged=true \
  --cgroup-driver=systemd \
  --cluster-dns=10.254.0.253 \
  --cluster-domain=cluster.local \
  --pod-manifest-path=/etc/kubernetes/manifests \
  --v=4 \
  --cloud-provider=openstack \
  --cloud-config=/etc/kubernetes/cloud-config

After restarting the machines everything works normally, but a restart after installation was not needed before 1.6.0.

youngdev commented Mar 30, 2017

Seeing the same issue on Red Hat Enterprise Linux Server 7.3
Kernel: 3.10.0-514.10.2.el7.x86_64 (Same as OP, so I'm suspecting the kernel might have something to do with it)

A reboot of the machine, however, seems to fix it.

rootsongjc (Member) commented Apr 1, 2017

+1 Same issue on CentOS 7.2.1511

Apr 01 14:24:08 sz-pg-oam-docker-test-001.tendcloud.com kubelet[103932]: I0401 14:24:08.359839  103932 kubelet.go:1752] skipping pod synchronization - [Failed to start ContainerManager failed to initialise top level QOS containers: failed to create top level Burstable QOS cgroup : Unit kubepods-burstable.slice already exists.]

After restarting the host, it works.

xialonglee (Contributor) commented Apr 10, 2017

+1 This issue occasionally appears on CentOS 7.2,
and a reboot can fix it.
It never happened with Kubernetes v1.5.
It seems this message

 kubelet[9752]: I0410 08:04:30.122802    9752 kubelet.go:1752] skipping pod synchronization - [Failed to start ContainerManager failed to initialise top level QOS containers: failed to create top level Burstable QOS cgroup : Unit kubepods-burstable.slice already exists.]

is printed from the code introduced in #41833, so I'm tagging @sjenning @vishh @derekwaynecarr; sorry for the trouble.

sjenning (Contributor) commented Apr 10, 2017

Yes, I've seen this before if the kubelet doesn't clean up completely.

QoS level cgroups were enabled by default in 1.6 which explains why this may have not been observed earlier.

The QoS cgroups are created as transient systemd slices when using the systemd cgroup driver. If those slices already exist, the container manager currently has an issue with that.

As I recall, a workaround that doesn't involve rebooting is systemctl stop kubepods-burstable.slice. I believe this deletes the slice and allows the kubelet to recreate it.

@derekwaynecarr can we add code to check if the QoS level slices are already running and just move along in that case?
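
A sketch of that non-reboot workaround, extended to cover all of the QoS slices (slice names assume the default cgroup root used by the systemd driver; adjust if a custom root is configured):

```bash
# Stop any lingering transient kubepods slices so the kubelet can recreate
# them on its next start; then restart the kubelet.
for slice in kubepods-burstable.slice kubepods-besteffort.slice kubepods.slice; do
    if systemctl list-units --all --no-legend "$slice" | grep -q "$slice"; then
        systemctl stop "$slice"
    fi
done
systemctl restart kubelet
```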

vishh (Member) commented Apr 11, 2017

@sjenning AFAIK, the cgroups driver is idempotent around QoS cgroup creation. Systemd driver should behave the same way.

derekwaynecarr (Member) commented Apr 11, 2017

@sjenning -- yes, we should have an exist check.

sjenning (Contributor) commented Apr 11, 2017

It seems the check for whether the cgroup already exists is in place:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/qos_container_manager_linux.go#L108

So something is off here. Apparently, the absence of the cgroup does not imply the absence of the transient systemd slice.
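
On an affected node, that mismatch can be observed by comparing the cgroup filesystem with systemd's view of the unit. A rough sketch, assuming the systemd cgroup driver and the default kubepods hierarchy (paths may differ per configuration):

```bash
# The cgroup directory for the burstable slice may already be gone...
ls -d /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice 2>/dev/null \
    || echo "cgroup directory absent"

# ...while systemd may still consider the transient unit loaded.
systemctl list-units --all --no-legend 'kubepods-burstable.slice'
```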

derekwaynecarr (Member) commented Apr 11, 2017

@zetaab @youngdev @rootsongjc @xialonglee -- can you let me know the version of systemd used on the machines that have the problem (i.e. rpm -qa | grep systemd)? Thanks!

xialonglee (Contributor) commented Apr 12, 2017

@derekwaynecarr

# rpm -qa | grep systemd
systemd-sysv-219-30.el7.x86_64
systemd-219-30.el7.x86_64
systemd-libs-219-30.el7.x86_64
oci-systemd-hook-0.1.4-4.git41491a3.el7.x86_64

And sometimes I get this log from the kubelet:
Unit kubepods-besteffort.slice already exists
which is different from Unit kubepods-burstable.slice already exists. It can't be solved through
systemctl stop kubepods-besteffort.slice; only a reboot can fix it.

I am not sure, but when I:

  1. deploy an HA (multi-master) k8s cluster,
  2. run several static pods on a node via the kubelet,
  3. uninstall the k8s cluster (deleting all service files, like kubelet.service),
  4. reset the systemd manager (systemctl daemon-reload, reset-failed, daemon-reexec),

and then reinstall the k8s cluster, the kubepods-*.slice already exists problem seems more likely to happen.
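
The uninstall/reset part of that sequence, written out as commands (a sketch of steps 3 and 4 above; the unit and file names are examples and will differ per installation):

```bash
# 3. Uninstall the cluster: stop the kubelet and remove its service file.
systemctl stop kubelet
rm -f /etc/systemd/system/kubelet.service

# 4. Reset the systemd manager state before reinstalling.
systemctl daemon-reload
systemctl reset-failed
systemctl daemon-reexec
```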

youngdev commented Apr 12, 2017

You are really looking to stop anything matching kubepod*.slice, which is how I resolved my issue:

Try this:

for i in $(/usr/bin/systemctl list-unit-files --no-legend --no-pager -l | grep --color=never -o .*.slice | grep kubepod);do systemctl stop $i;done

xialonglee (Contributor) commented Apr 13, 2017

@youngdev
Your method solves the problem in many cases, but I ran into a situation today:

skipping pod synchronization - [Failed to start ContainerManager Unit kubepods.slice already exists.]

and

systemctl list-unit-files --no-legend --no-pager -l | grep --color=never -o .*.slice
-.slice
machine.slice
system.slice
user.slice

so there is no unit with the kubepods prefix to stop, and the problem cannot be solved that way.

I decided to add the kubelet args --cgroups-per-qos=false --enforce-node-allocatable= to disable the QoS cgroup feature until this problem is solved.

ReSearchITEng commented Apr 13, 2017

Thanks @youngdev; after stopping all the slices, the node registered successfully.
I can confirm the issue is most likely due to reinstalling the kube* components a few times.

xialonglee (Contributor) commented Apr 18, 2017

I think I might have found the cause of the problem.
I checked my Docker configuration and found that native.cgroupdriver=systemd was set in the Docker init args.
After disabling that Docker arg, everything runs well.
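
A quick way to check whether Docker and the kubelet agree on the cgroup driver (a sketch; it assumes a Docker version that reports a "Cgroup Driver" line in docker info and a kubelet started with an explicit --cgroup-driver flag):

```bash
# Compare the cgroup driver Docker reports with the one passed to the kubelet.
docker info 2>/dev/null | grep -i 'cgroup driver'
ps -o args= -C kubelet | tr ' ' '\n' | grep -- '--cgroup-driver'
```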

rootsongjc added a commit to rootsongjc/follow-me-install-kubernetes-cluster that referenced this issue Apr 18, 2017

rootsongjc added a commit to rootsongjc/kubernetes-handbook that referenced this issue Apr 18, 2017

ReSearchITEng commented Apr 19, 2017

@rootsongjc - instead of changing the docker service file to fit the kubelet, isn't it better to fix the kubelet?

FYI, for a fully working k8s 1.6.1 with flannel, one may look at the kubeadm ansible playbooks.
Tested on CentOS/RHEL. Preparations have started for Debian-based distros as well (e.g. Ubuntu), but those might need some refining.

https://github.com/ReSearchITEng/kubeadm-playbook/blob/master/README.md

PS: work based on sjenning/kubeadm-playbook - Many thanks @sjenning

rootsongjc (Member) commented Apr 19, 2017

@ReSearchITEng I have no better solution for now.
These two methods work for me:

  • Reboot.
  • Add the --native.cgroupdriver=systemd in your docker.service file and make sure it is the same as the kubelet cgroup driver.
    Maybe you can also try this, as @youngdev mentioned above:
for i in $(systemctl list-unit-files --no-legend --no-pager -l | grep --color=never -o .*.slice | grep kubepod);
do systemctl stop $i;
done

ReSearchITEng commented Apr 20, 2017

@rootsongjc the ansible code provided in the link above (kubeadm-playbook) does both things:

  1. updates kubelet.service by adding "--native.cgroupdriver=systemd" (this way it will match the default/existing docker.service)
  2. it also implements the systemctl stop *.slice workaround.

I test it very often, and it works perfectly.

rootsongjc added a commit to rootsongjc/follow-me-install-kubernetes-cluster that referenced this issue Apr 21, 2017

jfpucheu commented Apr 21, 2017

Hello,

I got the same problem on CentOS Linux release 7.3.1611 (Core).

My workaround is to add:

ExecStopPost=/bin/bash -c "for i in $(systemctl list-unit-files --no-legend --no-pager -l | grep --color=never -o .*.slice | grep kubepod); do systemctl stop $i; done"

in /etc/systemd/system/kubelet.service.d/kubelet.conf

Jeff
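
Written out as a complete drop-in file, that workaround might look like the sketch below; the ExecStopPost line is taken verbatim from the comment above, and the surrounding [Service] section and daemon-reload are the usual drop-in boilerplate:

```bash
mkdir -p /etc/systemd/system/kubelet.service.d
cat > /etc/systemd/system/kubelet.service.d/kubelet.conf <<'EOF'
[Service]
ExecStopPost=/bin/bash -c "for i in $(systemctl list-unit-files --no-legend --no-pager -l | grep --color=never -o .*.slice | grep kubepod); do systemctl stop $i; done"
EOF
systemctl daemon-reload
```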

sjenning (Contributor) commented Apr 25, 2017

/assign

derekwaynecarr (Member) commented Apr 25, 2017

I debugged this further today.

I apologize for being dense on this issue.

I had previously fixed this in opencontainers/runc; see opencontainers/runc#1124.

But the code we vendor in kube is missing this fix:
https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/opencontainers/runc/libcontainer/cgroups/systemd/apply_systemd.go#L285

It should be a simple godep update.

Apologies to all.

@sjenning -- any chance you can bump the godep and get a cherry pick?
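
For anyone checking whether a given kubernetes tree already carries that runc fix, one rough test (assuming the fix is the "unit already exists" tolerance from opencontainers/runc#1124, which matches on systemd's UnitExists D-Bus error) is to grep the vendored file linked above:

```bash
# From the root of a kubernetes checkout: a match suggests the vendored runc
# already ignores "already exists" when starting transient units.
grep -n 'UnitExists' \
    vendor/github.com/opencontainers/runc/libcontainer/cgroups/systemd/apply_systemd.go
```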

derekwaynecarr (Member) commented Apr 25, 2017

@sjenning -- I started a PR locally to bump the dep, but ran out of time today. If you can bump it today, that is fine; otherwise I will finish tomorrow.

sjenning (Contributor) commented Apr 26, 2017

@derekwaynecarr attempting now

sjenning (Contributor) commented Apr 26, 2017

Why you do this to me?!! 😞
opencontainers/runc#1375

pkg/kubelet/cm/cgroup_manager_linux.go:323: cannot use *resourceConfig.Memory (type int64) as type uint64 in assignment
pkg/kubelet/cm/cgroup_manager_linux.go:326: cannot use *resourceConfig.CpuShares (type int64) as type uint64 in assignment
pkg/kubelet/cm/cgroup_manager_linux.go:332: cannot use *resourceConfig.CpuPeriod (type int64) as type uint64 in assignment
pkg/kubelet/cm/container_manager_linux.go:416: cannot use memoryLimit (type int64) as type uint64 in field value
pkg/kubelet/cm/container_manager_linux.go:417: constant -1 overflows uint64

djsly (Contributor) commented Jun 23, 2017

Getting hit by this issue on CentOS 7.3, fresh OS, fresh install of 1.6.6. I have set the following kubelet flag: --cgroup-driver=systemd
Looks like the PR was not cherry-picked; any way we could get this into 1.6.7?

sjenning (Contributor) commented Jun 24, 2017

@djsly it looks like 1.6 was using the same runc version as 1.7 before this PR. I should be able to pick it if it gets approved.

@derekwaynecarr @vishh what do you think?

djsly (Contributor) commented Jun 27, 2017

@sjenning that would be appreciated! Currently our automated pipeline is failing at a 50% rate due to this issue.

k8s-merge-robot added a commit that referenced this issue Jun 27, 2017

Merge pull request #48117 from sjenning/bump-runc-1.6
Automatic merge from submit-queue

bump runc to d223e2a

cherry-pick #44940

by user request #43856 (comment)

@derekwaynecarr @djsly @vishh 

```release-note
Bump runc to v1.0.0-rc2-49-gd223e2a - fixes `failed to initialise top level QOS containers` kubelet error.
```

ReSearchITEng commented Jun 29, 2017

I faced this and other issues. I have collected solutions for all of them under: https://github.com/ReSearchITEng/kubeadm-playbook/blob/master/reset.yml

For example, for this defect (#43856):

for i in $(/usr/bin/systemctl list-unit-files --no-legend --no-pager -l | grep --color=never -o kube.*\.slice );do echo $i; systemctl stop $i ; done

For defect #39557:

rm -rf /var/lib/cni/ /var/lib/kubelet/* /etc/cni/ ; ip link delete cni0 ; ip link delete flannel.1; ip link delete weave

trunet commented Jul 14, 2017

@djsly did you manage to make this work?

I'm also running 1.7 with --cgroup-driver=systemd on RHEL 7.3, and during the kubelet restart needed to pick up new certificates every week, it fails and shows "NotReady" in the node list.

Deleting the slices also didn't work. My only option is to stop kubelet, stop docker, start docker, remove all stopped containers, and start kubelet again; then it starts OK.

djsly (Contributor) commented Jul 14, 2017

trunet commented Jul 14, 2017

Unfortunately I can't because we're using some 1.7 new features. Will this fix be ported to 1.7?

djsly (Contributor) commented Jul 14, 2017

Hmm, I guess you might be getting affected by a new issue?

a92007b

Looks like 1.7.0 was the first branch to receive the runc bump; 1.6 was cherry-picked after.

trunet commented Jul 14, 2017

Hmm, interesting. I think this is a new bug; I will open a new issue about it.

jbunce commented Aug 31, 2017

Deploying via Kubespray I ran into a similar problem: kubernetes-sigs/kubespray#1207 (comment)

cdrage added a commit to cdrage/container-pipeline-service that referenced this issue Feb 26, 2018

Allow ability to run Docker-in-Docker-in-OpenShift for testing
A bit of a complicated PR, but essentially, after much trial-and-error,
this is required in order to get OpenShift running within
Docker-in-Docker within a CentOS7 container.

Essentially, what happens is that OpenShift tries to start the kubelet but
is unable to, due to cgroup access within the CentOS7 container host.

See issue: kubernetes/kubernetes#43856

In the end, it is instead required to:

 1. Generate the default configuration for the all-in-one OpenShift
 origin container
 2. Edit node-config.yaml to ignore QoS containers
 3. Start the OpenShift origin container.

This will only occur if the `deployment` value is set to `test`

Another note is that `selinux` has been added as a tag to "Set SELinux
context for openshift shared dirs" due to selinux not being deployed /
used on CentOS7 containers. This is added so we can use --skip-tag
"selinux" if we are deciding to deploy to CentOS7 containers.

cdrage added the same commit to cdrage/container-pipeline-service, referencing this issue again on Mar 5, Mar 6, Mar 7, and Apr 4, 2018 (same commit message as above).

mjaow commented May 30, 2018

@xialonglee
you can try systemctl -a --no-legend --no-pager -l | grep --color=never -o .*.slice | grep kube and stop all of these slices.
