
systemd cgroup scope somehow gets left behind not allowing containers to start #7015

Closed
ibuildthecloud opened this Issue Jul 14, 2014 · 42 comments

@ibuildthecloud
Contributor

ibuildthecloud commented Jul 14, 2014

I ran into a situation in which I got:

[error] container.go:468 Error running container: Unit docker-d3ca1668bff4e74d22113886ba1433ecae920ece02c76ce1a9344409b68903bc.scope already exists.

It seems the scope did not get cleaned up by systemd. The scope file was still present in /run/systemd/system/... but systemd-cgls did not show it. In the end I had to delete the container because I couldn't start it.

The way I got into this situation: I had a container that would die on startup, and a script that was auto-restarting it. So it was starting this container every 5 seconds, and it would die every 5 seconds. Eventually it must have hit some race condition, and from then on it always failed to start in Docker because of this cgroup issue.

This is systemd 212 and docker 1.0.1 running CoreOS 367.1.0

@ibuildthecloud

Contributor

ibuildthecloud commented Jul 14, 2014

Upon further inspection I found the following cgroups still exist.

/sys/fs/cgroup/blkio/system.slice/docker-d3ca1668bff4e74d22113886ba1433ecae920ece02c76ce1a9344409b68903bc.scope
/sys/fs/cgroup/memory/system.slice/docker-d3ca1668bff4e74d22113886ba1433ecae920ece02c76ce1a9344409b68903bc.scope
/sys/fs/cgroup/cpu,cpuacct/system.slice/docker-d3ca1668bff4e74d22113886ba1433ecae920ece02c76ce1a9344409b68903bc.scope
/sys/fs/cgroup/systemd/system.slice/docker-d3ca1668bff4e74d22113886ba1433ecae920ece02c76ce1a9344409b68903bc.scope

For all cgroups, cgroup.procs is empty. I suspect this is a systemd issue.
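
For anyone hitting this, the leftover directories can be enumerated with a short script. This is a minimal sketch assuming the cgroup v1 layout under /sys/fs/cgroup shown above; the helper name and its optional root argument are illustrative, not part of Docker or systemd:

```shell
# List leftover docker-*.scope cgroup directories and report whether their
# cgroup.procs file is empty (i.e. the cgroup is orphaned, as described above).
list_docker_scope_cgroups() {
    root="${1:-/sys/fs/cgroup}"    # cgroup v1 mount point (assumption)
    for d in "$root"/*/system.slice/docker-*.scope; do
        [ -d "$d" ] || continue    # the glob may match nothing
        if [ -s "$d/cgroup.procs" ]; then
            echo "busy: $d"
        else
            echo "orphaned: $d"    # empty cgroup.procs, matching this report
        fi
    done
}

list_docker_scope_cgroups
```

An empty cgroup.procs means no process is left in the cgroup, so a directory reported as orphaned is exactly the leftover state described above.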

@crosbymichael

Member

crosbymichael commented Jul 14, 2014

Yes, this is a systemd cgroup issue

@ibuildthecloud

Contributor

ibuildthecloud commented Jul 22, 2014

Looking into this, the Docker systemd cgroup code is cleaning up properly: it deletes all of the cgroups it created itself, i.e. the ones systemd doesn't manage. The issue seems to be that the cgroups systemd does manage are supposed to be deleted automatically, and they are not. So this still looks like a systemd-specific issue.

Since systemd is out there in the wild and this issue exists, it seems as though Docker should reuse the cgroups if they already exist, and not fail. This does mean that systemd will keep orphaning cgroups, but that will have to be addressed separately.

I will try to look into what the fix for this may be.

@joschi127

joschi127 commented Jul 31, 2014

I have this issue when using Manjaro Linux as the host system. When stopping a container, the cgroup cleanup does not seem to work; the following .scope folders still exist after stopping the container:

/sys/fs/cgroup/blkio/system.slice/docker-663a7122c20d75e6c34b94bac7dafa144ac06e6ce1d9c09c71d6dbb4fe94be4c.scope
/sys/fs/cgroup/memory/system.slice/docker-663a7122c20d75e6c34b94bac7dafa144ac06e6ce1d9c09c71d6dbb4fe94be4c.scope
/sys/fs/cgroup/cpu,cpuacct/system.slice/docker-663a7122c20d75e6c34b94bac7dafa144ac06e6ce1d9c09c71d6dbb4fe94be4c.scope
/sys/fs/cgroup/systemd/system.slice/docker-663a7122c20d75e6c34b94bac7dafa144ac06e6ce1d9c09c71d6dbb4fe94be4c.scope

As far as I can see this happens every time for me, not only from time to time.

@djmaze

Contributor

djmaze commented Aug 6, 2014

Same problem on Arch Linux. Is there a way to fix this state, apart from rebooting?

@joschi127

joschi127 commented Aug 6, 2014

@djmaze For me the only real fix is rebooting. As a workaround, you can save the container to an image like this:

docker commit containername imagename

Then you can remove the old container and run a new container from the saved image:

docker rm containername
docker run --name containername [...] -t imagename [...]

@mastercactapus

mastercactapus commented Aug 18, 2014

workaround in the meantime:

# systemctl stop docker-<FULL_CONTAINER_ID>.scope
systemctl stop docker-663a7122c20d75e6c34b94bac7dafa144ac06e6ce1d9c09c71d6dbb4fe94be4c.scope
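
To clean up every leftover scope in one pass instead of copying IDs by hand, the stop can be looped over the matching unit names. This is a sketch assuming a systemd host and root privileges; stop_leftover_scopes and extract_units are hypothetical helper names, not existing commands:

```shell
# Stop all leftover docker scope units (run as root on the affected host).
extract_units() {
    awk '{print $1}'    # first column of `systemctl list-units` is the unit name
}

stop_leftover_scopes() {
    systemctl list-units --no-legend 'docker-*.scope' \
        | extract_units \
        | while read -r unit; do
              systemctl stop "$unit"
          done
}
```

Nothing runs until stop_leftover_scopes is called, so the functions can be pasted safely and invoked once the orphaned scopes have been confirmed.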

SergeiDiachenko added a commit to smilart/smilart.os.distribution that referenced this issue Sep 4, 2014

@gegere

gegere commented Dec 9, 2014

@mastercactapus great, thank you for this post it solved my issue, for now!

@guilhermebr

guilhermebr commented Dec 12, 2014

Hi, this happened many times.

core@coreos03 ~ $ docker start containername
Error response from daemon: Cannot start container containername: Unit docker-93408c6f3c1bb4a944b04d39bb06509aa1b5159f715d3ce157b34a4874cb5109.scope already exists.

$ docker --version
Docker version 1.1.2, build d84a070

@gegere

gegere commented Dec 12, 2014

@guilhermebr ok... run the following command.

$ systemctl stop docker-93408c6f3c1bb4a944b04d39bb06509aa1b5159f715d3ce157b34a4874cb5109.scope

@defunctzombie

defunctzombie commented Jan 4, 2015

@crosbymichael I am still seeing this issue come up on one of my machines

Docker version 1.3.2, build 50b8feb
core@localtunnel ~ $ docker restart localtunnel
Error response from daemon: Cannot restart container localtunnel: Unit docker-9e57a7bb9f04302a5cc19f85536fff4b82d12fb4a735a45870689d680c78132c.scope already exists.
2015/01/04 19:14:31 Error: failed to restart one or more containers

I have the container launched with --restart=always and --net host. It seems that sometimes it crashes in a bad way that prevents Docker from restarting it automatically, causing downtime for the service until I go in and apply the fix manually.
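
A defensive wrapper can reduce this kind of manual intervention: if docker start fails with the "scope already exists" error, stop the stale scope and retry once. This is only a sketch; start_with_scope_cleanup is a hypothetical helper, and it assumes stopping the scope is safe because its cgroup.procs is empty, as reported earlier in this thread:

```shell
# Start a container; on the "Unit docker-<id>.scope already exists" error,
# stop the stale scope unit and retry the start once.
start_with_scope_cleanup() {
    name="$1"
    err=$(mktemp)
    if ! docker start "$name" 2>"$err"; then
        if grep -q 'scope already exists' "$err"; then
            cid=$(docker inspect --format '{{.Id}}' "$name")  # full 64-char ID
            systemctl stop "docker-${cid}.scope"              # clear the stale scope
            docker start "$name"                              # retry once
        fi
    fi
    rm -f "$err"
}
```

The retry is deliberately single-shot: if the scope cannot be stopped, the original error surfaces again on the second start instead of looping.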

@LK4D4

Contributor

LK4D4 commented Jan 4, 2015

I have this error on master too.

@jokesterfr

jokesterfr commented Feb 6, 2015

Same issue here on Arch Linux with systemd. A systemctl stop docker-xxxxxxxxxx does the trick, but it's not easy to figure this out if you are a complete beginner.

@LK4D4

Contributor

LK4D4 commented Feb 6, 2015

I'm not sure why we closed this, because it's still an issue.
ping @crosbymichael

@Seth-Miller

Seth-Miller commented Feb 8, 2015

I just hit this issue with a brand new installation of Oracle Linux 7.0.

Here is the docker version I am running.
[root@localhost ~]# docker version
Client version: 1.3.3
Client API version: 1.15
Go version (client): go1.3.3
Git commit (client): 4e9bbfa/1.3.3
OS/Arch (client): linux/amd64
Server version: 1.3.3
Server API version: 1.15
Go version (server): go1.3.3
Git commit (server): 4e9bbfa/1.3.3

@agusgr

agusgr commented Feb 17, 2015

Hi all,

The same issue in a new installation of CoreOS 557.2.0 (current stable channel):

CoreOS stable (557.2.0)
core@core-01 ~ $ docker version
Client version: 1.4.1
Client API version: 1.16
Go version (client): go1.3.3
Git commit (client): 5bc2ff8-dirty
OS/Arch (client): linux/amd64
Server version: 1.4.1
Server API version: 1.16
Go version (server): go1.3.3
Git commit (server): 5bc2ff8-dirty
core@core-01 ~ $

@defunctzombie

defunctzombie commented Feb 17, 2015

@LK4D4 @crosbymichael what is it going to take to re-open this? This sounds like too hard an issue for just anyone to fix in a PR. Maybe someone with knowledge of the matter can take a quick look? Restart functionality is untrustworthy as a result.

@stefwalter

stefwalter commented Feb 19, 2015

In the Cockpit project our integration tests hit this race (?) issue routinely. For example, running on Fedora 21, docker-io-1.4.1-8.fc21.x86_64

@Malet

This comment has been minimized.

Malet commented Feb 19, 2015

Also coming across this on CoreOS beta (584.0.0) with

core@localhost ~ $ docker version
Client version: 1.4.1
Client API version: 1.16
Go version (client): go1.3.3
Git commit (client): 5bc2ff8-dirty
OS/Arch (client): linux/amd64
Server version: 1.4.1
Server API version: 1.16
Go version (server): go1.3.3
Git commit (server): 5bc2ff8-dirty

Does not happen for all containers, and can be brought back to a working state using:

sudo systemctl stop docker-<container_hash>.scope

@Hokutosei

Hokutosei commented Feb 24, 2015

systemctl stop docker-*

did the trick

@wptad

wptad commented Feb 26, 2015

This helped for me, thanks.

systemctl stop docker-*

@dreibh

dreibh commented Feb 26, 2015

The issue looks very similar to the bug reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1066926 . I have an issue with left-behind scope files when destroying LXC containers via libvirt (I am not using Docker).

@dreibh

dreibh commented Feb 26, 2015

In my case, the problem is with Fedora 20 and libvirt-1.2.11.

@fxposter

fxposter commented Mar 17, 2015

Debian 8
Linux hostname 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt2-1 (2014-12-08) x86_64 GNU/Linux
Client version: 1.5.0
Client API version: 1.17
Go version (client): go1.4.1
Git commit (client): a8a31ef
OS/Arch (client): linux/amd64
Server version: 1.5.0
Server API version: 1.17
Go version (server): go1.4.1
Git commit (server): a8a31ef

Same issue...

@agusgr

agusgr commented Apr 7, 2015

Hi all,

In case this helps someone: after upgrading my environment, I can avoid the error on docker restart.
My new environment:

CoreOS stable (607.0.0)

core@core-01 ~ $ docker version
Client version: 1.5.0
Client API version: 1.17
Go version (client): go1.3.3
Git commit (client): a8a31ef-dirty
OS/Arch (client): linux/amd64
Server version: 1.5.0
Server API version: 1.17
Go version (server): go1.3.3
Git commit (server): a8a31ef-dirty

The bad news is that now I get an error on docker stop:

Apr 07 16:38:41 core-01 systemd[1]: xx.service: main process exited, code=exited, status=143/n/a
Apr 07 16:38:41 core-01 bash[6595]: Failed to stop docker-345029b9c94588f154a9e517e0b480f18c6488f42c72d84fcc04f6f476d51406.scope: Interactive authentication required.
Apr 07 16:38:41 core-01 systemd[1]: xx.service: control process exited, code=exited status=1

I will continue looking for some light on this topic and keep you posted

Agus

@LK4D4

Contributor

LK4D4 commented Apr 7, 2015

ping @philips
Can you imagine where the "Interactive authentication required." error could come from?

@philips

Contributor

philips commented Apr 7, 2015

@LK4D4 Looking at the systemd code, it is around PolicyKit. I am going to have to dig into it; very bizarre.

@LK4D4

Contributor

LK4D4 commented Apr 7, 2015

@philips Thank you! Looks really weird.

@philips

Contributor

philips commented Apr 7, 2015

@agusgr I filed an issue on the CoreOS bug tracker so we can hunt this down and not make more noise on this already confusing issue. Can you follow up over there with what you were doing to cause this: coreos/bugs#321

@defunctzombie

defunctzombie commented Apr 17, 2015

Can this issue be reopened? Plenty of people have stated it is still a problem for them, and until it can be verified as fixed, every currently deployed system continues to suffer from this. My own services fail to restart daily and require manual intervention as a result.

@LK4D4

Contributor

LK4D4 commented Apr 17, 2015

@defunctzombie Let's create a new one, then. I actually can't reproduce it anymore.

@defunctzombie

defunctzombie commented Apr 17, 2015

@LK4D4 which version of docker stopped failing for you?

@onorua

onorua commented May 19, 2015

Client version: 1.6.2
Client API version: 1.18
Go version (client): go1.4.2
Git commit (client): 7c8fca2
OS/Arch (client): linux/amd64
Server version: 1.6.2
Server API version: 1.18
Go version (server): go1.4.2
Git commit (server): 7c8fca2
OS/Arch (server): linux/amd64

Same issue:
docker start 374af5c9a5c5
Error response from daemon: Cannot start container 374af5c9a5c5: [8] System error: Unit docker-374af5c9a5c57e749c0ed137455a67e1006a7fbeb603adccaac8d33a7ae8fb9b.scope already exists.
FATA[0000] Error: failed to start one or more containers

@runcom

Member

runcom commented Jun 3, 2015

1.7.0-RC1: just hit this on Ubuntu 15.04 with systemd.
The only workaround for me is to systemctl stop the scope unit.

@runcom

Member

runcom commented Jun 3, 2015

ping @LK4D4

@harobed

harobed commented Jun 7, 2015

Same issue with Debian Jessie

@LK4D4

Contributor

LK4D4 commented Jun 8, 2015

@runcom I propose creating a new issue with a reproduction case.

@runcom

Member

runcom commented Jun 8, 2015

@LK4D4 I'm trying to reproduce this again, but it's intermittent (it happens to me when using Compose).

@thaJeztah

Member

thaJeztah commented Jun 8, 2015

A new issue has been created to track this in #13802.

Please continue the discussion on that issue if you're still encountering this.

@lucaspottersky

lucaspottersky commented Nov 27, 2015

It's just happened to me.

Docker version 1.7.0, build 0baf609
Lubuntu/Ubuntu 15.04

@thaJeztah

Member

thaJeztah commented Nov 27, 2015

@lucaspottersky please see my comment above yours
