systemd cgroup scope somehow gets left behind not allowing containers to start #7015
Upon further inspection I found the following cgroups still exist. For all of them, cgroup.procs is empty. I suspect this is a systemd issue.

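For anyone wanting to check their own system for the state described above, here is a small sketch (not from the thread; the function name and the `root` argument are assumptions) that walks a cgroup hierarchy and reports directories whose cgroup.procs file lists no PIDs, i.e. candidates for leftover scopes:

```python
import os

def empty_cgroups(root):
    """Yield cgroup directories under `root` whose cgroup.procs file
    exists but contains no PIDs (the cgroup is present but unused)."""
    for dirpath, dirnames, filenames in os.walk(root):
        if "cgroup.procs" in filenames:
            with open(os.path.join(dirpath, "cgroup.procs")) as f:
                if not f.read().strip():
                    yield dirpath

# On a real system you would pass a controller root such as
# "/sys/fs/cgroup/blkio/system.slice" (assumed path, v1 hierarchy).
```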
Yes, this is a systemd cgroup issue.

Look around this part for a fix: https://github.com/docker/libcontainer/blob/master/cgroups/systemd/apply_systemd.go#L323

Looking into this, the docker systemd cgroup code is cleaning up properly: it deletes all the cgroups it created that systemd doesn't support. The issue seems to be that the cgroups systemd manages are supposed to be deleted automatically, and they are not. So this still looks like a systemd-specific issue. Since that systemd is out there in the wild and this issue exists, docker should probably use the cgroups if they already exist, and not fail. This does mean that systemd will be orphaning cgroups, but that will have to be addressed separately. I will try to look into what the fix may be for this.

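The suggestion above, that docker "use the cgroups if they already exist, and not fail", amounts to treating "already exists" as success when creating the cgroup directory. A minimal illustration of that idea in Python (the real libcontainer code is Go; this function name is invented):

```python
import errno
import os

def ensure_cgroup(path):
    """Create a cgroup directory, treating 'already exists' as success
    instead of an error, so a leftover scope does not block startup."""
    try:
        os.mkdir(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise
    return path
```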
I have this issue when using Manjaro Linux as the host system. When stopping a container, the cgroup cleanup does not seem to work; the following .scope directory still exists after stopping the container: /sys/fs/cgroup/blkio/system.slice/docker-663a7122c20d75e6c34b94bac7dafa144ac06e6ce1d9c09c71d6dbb4fe94be4c.scope. As far as I can see this happens every time for me, not only from time to time.

Same problem on Arch Linux. Is there a way to fix this state, apart from rebooting?

@djmaze For me the only real fix is rebooting. As a workaround you can save the container to an image like this: Then you can remove the old container and run a new container from the saved image:

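The actual commands were stripped from the comment above when it was captured; the usual commit-and-recreate workaround can be sketched as follows (a reconstruction with invented names, not the commenter's original commands):

```python
def commit_workaround(container, image_tag):
    """Return the docker command sequence for the save/remove/recreate
    workaround: commit the stuck container to an image, remove the
    container, then start a fresh container from the saved image."""
    return [
        "docker commit {0} {1}".format(container, image_tag),
        "docker rm {0}".format(container),
        "docker run {0}".format(image_tag),
    ]
```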
workaround in the meantime:
# systemctl stop docker-FULL_CONTAINER_ID.scope
systemctl stop docker-663a7122c20d75e6c34b94bac7dafa144ac06e6ce1d9c09c71d6dbb4fe94be4c.scope

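The unit name in that workaround is derived mechanically from the full container ID. A small helper showing the pattern (the function name is invented; the docker-<ID>.scope naming matches the scopes seen throughout this thread):

```python
def scope_unit(container_id):
    """Return the systemd scope unit name docker uses for a container,
    plus the systemctl command that stops (and removes) it."""
    unit = "docker-{0}.scope".format(container_id)
    return unit, "systemctl stop {0}".format(unit)
```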
coreos/fleet#608 moby/moby#7015 This doesn't fix it in docker 1.1.2!

@mastercactapus Great, thank you for this post; it solved my issue, for now!

Hi, this has happened many times.
core@coreos03 ~ $ docker start containername
$ docker --version

@guilhermebr ok... run the following command.

@crosbymichael I am still seeing this issue come up on one of my machines. I have the container launched with

I have this error on master too.

Same issue here on archlinux with systemd. A

I'm not sure why we closed this, because it's still an issue.

I just hit this issue with a brand new installation of Oracle Linux 7.0. Here is the docker version I am running.

Hi all, the same issue in a new installation of CoreOS 557.2.0 (current stable channel): CoreOS stable (557.2.0)

@LK4D4 @crosbymichael What is it going to take to re-open this? This sounds like too hard an issue for just anyone to fix in a PR. Maybe someone with knowledge of the matter can take a quick look? Restart functionality is untrustworthy as a result.

In the Cockpit project our integration tests hit this race (?) issue routinely. For example, running on Fedora 21, docker-io-1.4.1-8.fc21.x86_64

Also coming across this on [...]. It does not happen for all containers, and can be brought back to a working state using:
did the trick

Helps for me, thanks.

The issue looks very similar to the bug reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1066926. I have an issue with left-behind scope files when destroying LXC containers via libvirt (I am not using Docker).

In my case, the problem is with Fedora 20 and libvirt-1.2.11.

Debian 8: same issue...

Hi all, only if this can help someone: after upgrading my environment I can avoid the error on docker restart.
CoreOS stable (607.0.0)
core@core-01 ~ $ docker version
The bad news is that now I get an error on docker stop:
Apr 07 16:38:41 core-01 systemd[1]: xx.service: main process exited, code=exited, status=143/n/a
I will continue looking for some light on this topic and keep you posted.
Agus

ping @philips

@LK4D4 Looking at the systemd code, it is around the policy kit. I am going to have to dig into it; very bizarre.

@philips Thank you! Looks really weird.

@agusgr I filed an issue on the CoreOS bug tracker so we can hunt this down and not make more noise on this already confusing issue. Can you follow up over there with what you were doing to cause this: coreos/bugs#321

Can this issue be reopened? Plenty of people have stated it is still a problem for them, and until it can be verified as fixed, every currently deployed system continues to suffer from it. My own services fail to restart daily and require manual intervention as a result.

@defunctzombie Let's create a new one then. I actually can't reproduce it anymore.

@LK4D4 Which version of docker stopped failing for you?

Client version: 1.6.2
Same issue:

1.7.0-RC1: just hit this on Ubuntu 15.04 with systemd.

ping @LK4D4

Same issue with Debian Jessie.

@runcom I propose creating a new issue with a reproduction case there.

@LK4D4 I'm trying to reproduce this again but it's intermittent (it happens to me when using compose).

A new issue has been created to track this in #13802. Please continue the discussion on that issue if you're still encountering this.

It's just happened to me.

@lucaspottersky please see my comment above yours.

I ran into a situation in which I got:

[error] container.go:468 Error running container: Unit docker-d3ca1668bff4e74d22113886ba1433ecae920ece02c76ce1a9344409b68903bc.scope already exists.

It seems the scope did not get cleaned up by systemd. The scope file was still present in /run/systemd/system/... but systemd-cgls did not show it. In the end I had to delete the container because I couldn't start it.

The way I got into this situation: I had a container that would die on startup, and a script that was auto-restarting it. So it was starting this container every 5 seconds, and it would die every 5 seconds. Eventually it must have hit some race condition, and then it just started to always fail to start in docker because of this cgroup issue.

This is systemd 212 and docker 1.0.1 running on CoreOS 367.1.0.
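The mismatch described above (a scope unit file still on disk under /run/systemd/system while systemd-cgls no longer shows the unit) boils down to a set difference. A sketch of that check (the inputs would come from listing that directory and parsing systemd-cgls output; the function name is invented):

```python
def stale_scopes(unit_files, live_units):
    """Return scope units that still have a unit file on disk but are no
    longer known to systemd -- the 'already exists' culprits."""
    return sorted(set(unit_files) - set(live_units))
```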