0.7.0 fails to remove containers #2714
Did you switch drivers from aufs to devicemapper manually without removing /var/lib/docker? |
Not that I'm aware of. How would I find out? |
As a note, I have had the exact same problem.
I rebooted the host OS and the problem disappeared. It happened after a |
I have the same problem, and it appears on docker kill as well as docker stop. As far as I can tell, the issue is that the container stays mounted, and on delete the driver fails to unmount it (it's debatable whose responsibility that is: rm or kill/stop). Indeed, the problem is fixed after a restart, because everything is unmounted and nothing is left in a locked state. |
I am encountering this with 0.7.1 also |
Hrm, and switching to the device mapper backend doesn't really help either. Got this just now:
|
@crosbymichael It seems like this isn't just about aufs. devicemapper is getting similar errors. #2714 (comment) |
I'm getting this still on 0.7.3 using devicemapper
However, the problem seems to resolve itself if you restart the docker server. If it happens again, I'll try running |
I have the same problem.
$ docker version
Client version: 0.7.3
Go version (client): go1.2
Git commit (client): 8502ad4
Server version: 0.7.3
Git commit (server): 8502ad4
Go version (server): go1.2
Last stable version: 0.7.3
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
538ab4938d5d        3c23bb541f74        /bin/sh -c apt-get -   12 minutes ago      Exit 100                                agitated_einstein
bdfbff084c4d        3c23bb541f74        /bin/sh -c apt-get u   14 minutes ago      Exit 0                                  sharp_torvalds
95cea6012869        6c5a63de23d9        /bin/sh -c echo 'for   14 minutes ago      Exit 0                                  romantic_lovelace
$ mount | grep 538ab4938d5d
/dev/mapper/docker-8:3-2569260-538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278 on /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278 type ext4 (rw,relatime,discard,stripe=16,data=ordered)
/dev/root on /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278/rootfs/.dockerinit type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/root on /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278/rootfs/.dockerenv type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/root on /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278/rootfs/etc/resolv.conf type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/root on /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278/rootfs/etc/hostname type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/root on /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278/rootfs/etc/hosts type ext4 (rw,relatime,errors=remount-ro,data=ordered)
# lsof /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278
lsof: WARNING: can't stat() ext4 file system /opt/docker/devicemapper/mnt/95cea6012869809320920019f2a2732165915281b79538a84f3ee3adddcbc783/rootfs/.dockerinit (deleted)
Output information may be incomplete.
lsof: WARNING: can't stat() ext4 file system /opt/docker/devicemapper/mnt/bdfbff084c4d96b6817eb7ccb812a608e4a6a45cb4c06d423e26364b45b59c97/rootfs/.dockerinit (deleted)
Output information may be incomplete.
lsof: WARNING: can't stat() ext4 file system /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278/rootfs/.dockerinit (deleted)
Output information may be incomplete.
# ls -l /opt/docker/devicemapper/mnt/95cea6012869809320920019f2a2732165915281b79538a84f3ee3adddcbc783/rootfs/.dockerinit
-rwx------ 0 root root 14406593 Jan 4 21:05 /opt/docker/devicemapper/mnt/95cea6012869809320920019f2a2732165915281b79538a84f3ee3adddcbc783/rootfs/.dockerinit* |
Restarting the daemon does not solve the problem. |
Same problem:
Restarting the host doesn't solve the problem. Then I run |
+1.
|
same here
|
Same problem here with 0.7.5 (or just the umount -l on the FS). The whole question is why some filesystems end up in the "/rootfs/.dockerinit\040(deleted)" state. |
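For what it's worth, that umount -l workaround can be scripted. A minimal sketch, assuming the default /var/lib/docker base directory (adjust the path if you moved it, e.g. to /opt/docker):

# Lazily unmount anything still mounted under the docker base directory.
# "umount -l" detaches the mount now and finishes cleanup once nothing
# uses it anymore.
grep ' /var/lib/docker' /proc/mounts | awk '{print $2}' | sort -r |
while read -r mnt; do
  umount -l "$mnt"
done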
I can confirm that this is an issue on 0.7.5 |
I don't know if it is related, but /home is a mount point. Could container mount points sitting on a symlink to a mount be the cause? |
I'm already using a different base directory. The problem may appear when the docker daemon is restarted without properly stopping containers... something goes wrong somewhere in the stop/start sequence of a docker restart... |
+1 I've got three containers on my devicemapper machine now that I can't remove, because their devices fail to be removed in devicemapper (and none of them are even mounted in /proc/mounts). Also, nothing in dmesg, and the only daemon output is highly cryptic and not very helpful:
|
+1 @vjeantet setting the docker base directory in |
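For reference, on docker of this vintage the base ("graph") directory is set with the daemon's -g flag. A sketch, assuming Ubuntu's /etc/default/docker init configuration and an example path:

# /etc/default/docker (Ubuntu packaging):
DOCKER_OPTS="-g /opt/docker"

# equivalent when starting the daemon by hand (pre-1.0 CLI):
docker -d -g /opt/docker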
+1 seen this as well, quite easy to repro, recommending people only use aufs for now |
As a workaround I managed to successfully remove a container stuck in this fashion by renaming the offending DM device (using dmsetup). You need to use the full DM id of the device, which is at the end of the error (e.g. docker-8:9-7880790-bc945261c1f97e7145604a4248e2c84535fb204c8e214fa394448e0b2dcd064a). The stuck device also disappeared on reboot. This was achieved after much messing about with dmsetup, so it's plausible something I did in between was also required. YMMV, but it worked for me. Edit: Needed to restart docker and run wipe_table too. |
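A sketch of that workaround as described; the device name is the example id from the comment above, the rename target is arbitrary, and since the author notes much trial and error, the exact required sequence is uncertain:

# Rename the stuck device out of docker's way, using the full DM id
# taken from the end of the error message:
dmsetup rename docker-8:9-7880790-bc945261c1f97e7145604a4248e2c84535fb204c8e214fa394448e0b2dcd064a stuck0

# Per the edit above, a docker restart and a table wipe were also needed:
service docker restart      # or however your init system restarts docker
dmsetup wipe_table stuck0   # replaces the device's table with an error target

# The container should now be removable:
docker rm "$CONTAINER_ID"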
... same problem with
lsoave@basenode:~$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
53a9a8c4e29c        8dbd9e392a96        bash                17 minutes ago      Exit 0                                  thirsty_davinci
lsoave@basenode:~$ docker rm 53a9a8c4e29c
Error: Cannot destroy container 53a9a8c4e29c: Driver aufs failed to remove root filesystem 53a9a8c4e29c2c99fdd8d5355833f07eca69cbfbefcd02915e267517111fbde8: device or resource busy
2014/01/19 20:38:50 Error: failed to remove one or more containers
lsoave@basenode:~$
by re-booting the host and running
lsoave@basenode:~$ uname -a
Linux basenode 3.11.0-15-generic #23-Ubuntu SMP Mon Dec 9 18:17:04 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
lsoave@basenode:~$ docker -v
Docker version 0.7.6, build bc3b2ec |
Happened again today; the machine went into suspend with docker containers running but did not come out of suspend cleanly and needed a reboot. Upon reboot, the DM device for one of the containers that had been running was stuck.
Running docker 0.7.4 build 010d74e |
@mikesimons ... do you remember the operational flow which brought you to the failure? |
Solved, at least in my case; look for yourself if you don't have the same cause. I had created a MySQL server container myself, which worked fine. As I was puzzled about the size of the containers, I decided to create new MySQL containers based on the work of somebody else. This was indeed very interesting, as I found that size may differ substantially even when the Dockerfile looks similar or even identical. For example, my first one is nearly 700 MB:
The container based on dhrp/mysql is nearly half the size of mine, and it works equally well:
The 2nd example produced the above-mentioned error; I'll get to that in just a second. When I tried to repeat my findings today, I got a much larger size with exactly the same Dockerfile, seemingly without reason:
It was no problem to remove this container as well. The next example introduced the problem, based on the 2nd result of my search (https://index.docker.io/search?q=mysql): brice/mysql. As I had enhanced his approach, I couldn't see right away where the problem was, but diligent tracking down finally showed that in this case the offender was the command
in the Dockerfile, which I had given no thought to. Both directories exist in the container:
But not on the host:
The VOLUME directive ties the volume of the container to the corresponding volume of the host (or rather the other way around). Docker should throw an error if the directory does not exist on the host; by design it will "create" the directory in the container if it does not exist. Unfortunately I'm not able to write a patch, but I'm sure many of you can. |
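To make the pattern concrete, a hypothetical sketch; the actual paths from the brice/mysql Dockerfile aren't preserved in this comment, so the directories below are assumptions:

# Build an image whose Dockerfile declares volumes that exist only
# inside the image, not on the host:
cat > Dockerfile <<'EOF'
FROM ubuntu:12.04
RUN apt-get update && apt-get install -y mysql-server
# VOLUME "creates" these directories in the container if they are
# missing; nothing corresponding has to exist on the host, and no
# error is raised.
VOLUME ["/var/lib/mysql", "/var/log/mysql"]
CMD ["mysqld_safe"]
EOF
docker build -t test/mysql .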
@kklepper, I think I'm misunderstanding the relationship between the issue and your post. From what I read, your "issue" was that you overlooked the behaviour of the VOLUME directive, but the issue at hand is that |
Sorry for the confusion, I should have clarified that the VOLUME error caused the docker rm error, exactly as reported above. I found this thread because I searched for exactly this error message. Obviously nobody had been able to track down the conditions yet. |
@kklepper thanks for the detailed report. Can you post on this board the Dockerfile which produces the fault, please? I was looking for you on the public index but no
kklepper/Ms        latest   33280c9a70a7   5 days ago     695.7 MB
kklepper/mysqld    latest   49223549bf47   24 hours ago   359.8 MB
kklepper/mysqlda   latest   6162b0c95e8c   2 hours ago    374.4 MB
I'd like to run a test of my own on what you're saying, because in my understanding of docker's Moreover, unfortunately, on my side I cannot remember the things I did to get to the point where I received that That's why I was asking @mikesimons for the steps he went through. |
@vieux – I'm not sure what you mean to ask. This is the default docker install (see [1]). If you are looking to reproduce this, I'd recommend running the script [2] provided by @lcarstensen above on an Ubuntu 12.04 VM. [1] docker daemon is run as:
with more system info:
|
With 0.11.1-4 from koji on RHEL 6.5 with native (not LXC) and selinux enabled I haven't been able to reproduce docker rm issues, either with my script or without, over the last week. Using the native execution driver seems like the key on RHEL. |
fwiw, we are using the native execution driver:
|
Could you explain what you mean by "drop the private on /var/lib/docker" and "make /var/lib/docker --private in the daemon"? Just as you've observed, I'm seeing aufs mnt directories appearing, but in the mount namespaces of several processes.
@srid At startup docker effectively does mount --make-rprivate /var/lib/docker to work around some efficiency problems. Without this, every mount (from e.g. a container start) is broadcast to all sub-namespaces (including all other containers), which leads to O(n^2) slowness. The problem with it being private is that unmounts in the global namespace are not sent to the containers either. This causes EBUSY issues like the above if something creates a new mount namespace and doesn't unmount the uninteresting parts of the hierarchy. |
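A sketch of the propagation semantics being described, not docker's actual code; the paths and the sleeping process are purely illustrative:

# What docker effectively does at startup, so container mounts stop
# broadcasting into every other mount namespace:
mount --make-rprivate /var/lib/docker

# Any process that now creates its own mount namespace keeps private
# copies of the mounts that existed at that moment:
unshare -m sleep 1000 &

# Because /var/lib/docker is private, a later umount in the global
# namespace is not propagated into that namespace; the stale copy keeps
# the device busy (EBUSY) until the process exits or unmounts it.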
ran into this in our prod today. docker was 0.9.1 |
@cywjackson can you tell us which mountpoints are still there before running your |
Hey @vieux, unfortunately I can't now, since we've resolved our issue. But I think it was pretty much all the containers and graphs (and maybe the aufs?)... If it helps, our problem was probably triggered by running an older docker version to begin with (0.7.6?), after which chef-client was run and updated the version to > 0.9. During the process it restarted the daemon but probably not the containers, with the result that the filesystem looked as if it was still in use and the containers were still in the "running" state. We did examine the config.json in those containers' paths and it had the Running flag set to true (simply updating that to false didn't resolve the problem, though). I guess for us, we should have stopped the containers before running chef-client. |
Does anybody have an easy way to reproduce? |
@vieux I can reproduce with latest docker
But if I do
It could remove stopped containers |
@vieux I still have this problem with Docker version 1.1.1 and can reproduce it. It only happens when one or more ports are published to the host; if no ports are published, forced remove works (see the repro sketch after this comment). How to reproduce: the Docker install is fresh, no image pulled and no other container run previously.
I simply chose Could somebody please try to reproduce the problem with the same setup as mine? Thanks. Environment: Ubuntu 14.04 running with Vagrant/VirtualBox on OS X; the exact image is ubuntu/trusty64. Docker v1.1.1 is installed through the official Docker repository:
Docker Info
Docker log
|
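A sketch of the repro described in the previous comment, with assumptions: the image name and port numbers are placeholders, since the ones actually used aren't preserved here:

# Fresh daemon: no images pulled, no containers run previously.
docker run -d -p 8080:80 --name web nginx   # publish a port to the host
docker rm -f web
# Reported result: the forced remove fails with a "device is busy"-style
# error when a port was published; without -p it succeeds.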
I could reproduce this on devicemapper too with latest docker on CentOS 6.5 |
I'm seeing this issue (removing a container with ports bound giving a "Device is Busy" error) in my own tests. The weirdest thing is, it seems that it actually does remove the container, just after it prints this error and crashes the script running the |
Think I've seen similar behavior sometimes, with the same |
Same device is busy error. Ubuntu trusty, docker 1.1.2, defaults. docker version
docker info
/var/log/upstart/docker.log
The server returns 500s, breaking scripts, but it looks like in these cases the container actually is removed. This container does have ports exposed to the host as well as bind-mounted volumes (it's from These errors aren't new; I've seen them since the early days at varying rates, but it's kind of silly to have to hack around them all the time. @thaJeztah is this what you were seeing? |
@howitzers yes! Exactly those kinds of messages; and your example contains the Also, I think (at least some of) my containers were started/created using fig as well. Not sure if this is the cause of the problem, but just adding that info for my situation as well. @bfirsh? Thanks! |
Fig just uses the plain docker remote API with no funny stuff, so the naughty behavior is definitely docker's business, not fig's. What fig does do, though, is a lot of container recreation and deletion, making hitting this more likely (it's got a very race-condition flavor to it). Poked around a little more, and this only happens after creating/removing a lot of containers (I usually hit this error under my own stress tests). @cywjackson's unmount suggestion does not fix it, nor does a reboot. What does fix it for me is a total wipe of /var/lib/docker, so it looks like some resource or timing issue with a lot of containers. In any case, a convenient way to repro this is to just loop |
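In that spirit, a loop sketch under stated assumptions (busybox as a small image; the exact flags used above aren't preserved):

# Churn containers until the race appears:
while true; do
  id=$(docker run -d busybox sh -c 'sleep 1')
  docker rm -f "$id" || break   # stop at the first "device busy" failure
done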
I'm thinking about closing this issue because it has been open for too long and has become some sort of catch-all for any type of error remotely related to a failed rm. I think we will be better able to debug and fix new issues, reported from new docker versions, if we have separate issues opened that are current and easier to read. Any reason why we should keep this open right now? |
While it's definitely true that this has been open for too long and become a catch-all for lots of different bugs (holy cow, 60 participants), I think it should be left open (possibly with a name change) to address this specific current issue (where |
I split out the specific 1.1.2 "failed to remove root filesystem" case to a new issue with a backref, which might be easier to track. Can close either that one or this one, but there's definitely a specific open issue here, between myself, stuartpb, and thaJeztah. |
@crosbymichael (and indeed, wow, 59 others) I agree on closing this; the title has become outdated and it is collecting similar, but unrelated, issues (guilty myself, I think). If the original If this issue is closed, please add a clear comment to explain why, and point people to the right issues as well. |
I'm closing and locking this issue right now. I agree that it's become a catch all for any failure to remove a container in any conditions. Please try to find existing issues related to the problem you're running into with the specific backend (btrfs, devicemapper, aufs) and comment there. If there are no existing issues which seem to be about the same problem using the same storage backend, please open a new issue. |
Script started on Fri 15 Nov 2013 04:28:56 PM UTC
root@thewordnerd:~# uname -a
Linux thewordnerd.info 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
root@thewordnerd:~# docker version
Client version: 0.7.0-rc5
Go version (client): go1.2rc4
Git commit (client): 0c38f86-dirty
Server version: 0.7.0-rc5
Git commit (server): 0c38f86-dirty
Go version (server): go1.2rc4
Last stable version: 0.6.6, please update docker
root@thewordnerd:~# docker rm `docker ps -a -q`
Error: Cannot destroy container ba8a9ec006c8: Driver devicemapper failed to remove root filesystem ba8a9ec006c8e38154bd697b3ab4810ddb5fe477ed1cfb48ac3bd604a5a59495: Error running removeDevice
Error: Cannot destroy container d2f56763e65a: Driver devicemapper failed to remove root filesystem d2f56763e65a66ffccb3137017dddad745e921f4bdaa084f6b4a0d6407ec030a: Error running removeDevice
Error: Cannot destroy container c22980febe50: Driver devicemapper failed to remove root filesystem
...