0.7.0 fails to remove containers #2714

Closed
ndarilek opened this issue Nov 15, 2013 · 133 comments

Comments

@ndarilek
Contributor

@ndarilek ndarilek commented Nov 15, 2013

Script started on Fri 15 Nov 2013 04:28:56 PM UTC
root@thewordnerd:~# uname -a
Linux thewordnerd.info 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
root@thewordnerd:~# docker version
Client version: 0.7.0-rc5
Go version (client): go1.2rc4
Git commit (client): 0c38f86-dirty
Server version: 0.7.0-rc5
Git commit (server): 0c38f86-dirty
Go version (server): go1.2rc4
Last stable version: 0.6.6, please update docker
root@thewordnerd:~# docker rm `docker ps -a -q`
Error: Cannot destroy container ba8a9ec006c8: Driver devicemapper failed to remove root filesystem ba8a9ec006c8e38154bd697b3ab4810ddb5fe477ed1cfb48ac3bd604a5a59495: Error running removeDevice
Error: Cannot destroy container d2f56763e65a: Driver devicemapper failed to remove root filesystem d2f56763e65a66ffccb3137017dddad745e921f4bdaa084f6b4a0d6407ec030a: Error running removeDevice
Error: Cannot destroy container c22980febe50: Driver devicemapper failed to remove root filesystem
...

@crosbymichael
Contributor

@crosbymichael crosbymichael commented Nov 15, 2013

Did you switch drivers from aufs to devicemapper manually without removing /var/lib/docker?

@ndarilek
Contributor Author

@ndarilek ndarilek commented Nov 15, 2013

Not that I'm aware of. How would I find out?
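
One way to check which storage driver the daemon is currently using is to read the driver line from docker info. This is a minimal sketch, assuming the output carries either a "Driver:" line or a "Storage Driver:" line (both forms appear in docker info output elsewhere in this thread). The function name storage_driver is hypothetical, and this only shows the current driver, not whether it was switched in the past.

```shell
# storage_driver: read `docker info` output on stdin and print the name of
# the storage driver. Matches "Driver: x" and "Storage Driver: x" at the
# start of a line, so "Execution Driver: ..." is not picked up.
storage_driver() {
    sed -n -e 's/^Storage Driver: *//p' -e 's/^Driver: *//p' | head -n 1
}

# Usage (needs a running daemon):
#   docker info 2>/dev/null | storage_driver
```

If the printed driver is devicemapper while /var/lib/docker still contains an aufs directory from an earlier run, that would be consistent with a driver switch on an existing data directory.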

@PierreR

@PierreR PierreR commented Dec 3, 2013

As a note I have had the exact same problem.

docker version
Client version: 0.7.0
Go version (client): go1.2rc5
Git commit (client): 0d078b6
Server version: 0.7.0
Git commit (server): 0d078b6
Go version (server): go1.2rc5
Last stable version: 0.7.0

I have rebooted the host OS and the problem disappeared. It has happened after a docker kill or docker stop (I don't remember) on the container.

@ghristov

@ghristov ghristov commented Dec 9, 2013

I have the same problem, and it appears after docker kill as well as after docker stop. As far as I can tell, the container's filesystem is still mounted when it is deleted, and the driver does not unmount it. Which command is at fault depends on whose responsibility the unmount is (rm or kill/stop).

Indeed, the problem goes away after a restart because everything is unmounted and nothing is left locked.

@philips
Contributor

@philips philips commented Dec 12, 2013

I am encountering this with 0.7.1 also

@philips
Contributor

@philips philips commented Dec 13, 2013

Hrm, and switching to the device mapper backend doesn't really help either. Got this just now:

Error: Cannot destroy container keystone-1: Driver devicemapper failed to remove root filesystem 1d42834e2e806e0fd0ab0351ae504ec9a98e0a74be337fc2158a516ec8d6f36b: Error running removeDevice

@philips
Contributor

@philips philips commented Dec 13, 2013

@crosbymichael It seems like this isn't just about aufs. devicemapper is getting similar errors. #2714 (comment)

@zhemao

@zhemao zhemao commented Jan 4, 2014

I'm getting this still on 0.7.3 using devicemapper

Client version: 0.7.3
Go version (client): go1.2
Git commit (client): 8502ad4
Server version: 0.7.3
Git commit (server): 8502ad4
Go version (server): go1.2
Last stable version: 0.7.3

However, the problem seems to resolve itself if you restart the docker server. If it happens again, I'll try running lsof on the mount to see what process is causing it to be busy.

@Chris00
Contributor

@Chris00 Chris00 commented Jan 4, 2014

I have the same problem.

$ docker version
Client version: 0.7.3
Go version (client): go1.2
Git commit (client): 8502ad4
Server version: 0.7.3
Git commit (server): 8502ad4
Go version (server): go1.2
Last stable version: 0.7.3
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
538ab4938d5d        3c23bb541f74        /bin/sh -c apt-get -   12 minutes ago      Exit 100                                agitated_einstein   
bdfbff084c4d        3c23bb541f74        /bin/sh -c apt-get u   14 minutes ago      Exit 0                                  sharp_torvalds      
95cea6012869        6c5a63de23d9        /bin/sh -c echo 'for   14 minutes ago      Exit 0                                  romantic_lovelace 
$  mount|grep 538ab4938d5d
/dev/mapper/docker-8:3-2569260-538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278 on /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278 type ext4 (rw,relatime,discard,stripe=16,data=ordered)
/dev/root on /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278/rootfs/.dockerinit type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/root on /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278/rootfs/.dockerenv type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/root on /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278/rootfs/etc/resolv.conf type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/root on /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278/rootfs/etc/hostname type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/root on /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278/rootfs/etc/hosts type ext4 (rw,relatime,errors=remount-ro,data=ordered)
# lsof /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278
lsof: WARNING: can't stat() ext4 file system /opt/docker/devicemapper/mnt/95cea6012869809320920019f2a2732165915281b79538a84f3ee3adddcbc783/rootfs/.dockerinit (deleted)
      Output information may be incomplete.
lsof: WARNING: can't stat() ext4 file system /opt/docker/devicemapper/mnt/bdfbff084c4d96b6817eb7ccb812a608e4a6a45cb4c06d423e26364b45b59c97/rootfs/.dockerinit (deleted)
      Output information may be incomplete.
lsof: WARNING: can't stat() ext4 file system /opt/docker/devicemapper/mnt/538ab4938d5d0f2e4ccb66b1410b57c8923fd7881551e365ffc612fe629ac278/rootfs/.dockerinit (deleted)
      Output information may be incomplete.
# ls -l /opt/docker/devicemapper/mnt/95cea6012869809320920019f2a2732165915281b79538a84f3ee3adddcbc783/rootfs/.dockerinit
-rwx------ 0 root root 14406593 Jan  4 21:05 /opt/docker/devicemapper/mnt/95cea6012869809320920019f2a2732165915281b79538a84f3ee3adddcbc783/rootfs/.dockerinit*

@Chris00
Contributor

@Chris00 Chris00 commented Jan 4, 2014

Restarting the daemon does not solve the problem.

@limboy

@limboy limboy commented Jan 5, 2014

same problem:

limboy@gintama:~$ docker ps -a 
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
a7760911ecac        ubuntu:12.04        bash                About an hour ago   Exit 137                                backstabbing_mccarthy   

limboy@gintama:~$ docker rm a77
Error: Cannot destroy container a77: Driver devicemapper failed to remove root filesystem a7760911ecacb93b1c530d6a0bde4deeb79ef0cbf901488cb55df2f2ca02207a: device or resource busy
2014/01/05 16:04:21 Error: failed to remove one or more containers

limboy@gintama:~$ docker info
Containers: 1
Images: 5
Driver: devicemapper
 Pool Name: docker-202:0-93718-pool
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 1079.8 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 1.3 Mb
 Metadata Space Total: 2048.0 Mb
WARNING: No memory limit support
WARNING: No swap limit support

Restarting the host doesn't solve the problem.

Then, when I run docker run -i ubuntu bash, it doesn't enter interactive mode; the terminal just stays blank.

@ptmt

@ptmt ptmt commented Jan 7, 2014

+1.

$ docker version
Client version: 0.7.3
Go version (client): go1.2
Git commit (client): 8502ad4
Server version: 0.7.3
Git commit (server): 8502ad4
Go version (server): go1.2
Last stable version: 0.7.3

$ docker rm d33
2014/01/07 05:55:57 DELETE /v1.8/containers/d33
[error] mount.go:11 [warning]: couldn't run auplink before unmount: exit status 116
[error] api.go:1062 Error: Cannot destroy container d33: Driver aufs failed to remove root filesystem d3312bcdeb7dc241d4870100beadfe94d6884904229cc50d66aacd66ab16e064: stale NFS file handle
[error] api.go:87 HTTP Error: statusCode=500 Cannot destroy container d33: Driver aufs failed to remove root filesystem d3312bcdeb7dc241d4870100beadfe94d6884904229cc50d66aacd66ab16e064: stale NFS file handle
Error: Cannot destroy container d33: Driver aufs failed to remove root filesystem d3312bcdeb7dc241d4870100beadfe94d6884904229cc50d66aacd66ab16e064: stale NFS file handle
2014/01/07 05:55:57 Error: failed to remove one or more containers

@vjeantet

@vjeantet vjeantet commented Jan 11, 2014

same here

Client version: 0.7.5
Go version (client): go1.2
Git commit (client): c348c04
Server version: 0.7.5
Git commit (server): c348c04
Go version (server): go1.2
Last stable version: 0.7.5

$ docker rm 9f017e610f24
2014/01/11 23:03:11 DELETE /v1.8/containers/9f017e610f24
[error] api.go:1064 Error: Cannot destroy container 9f017e610f24: Driver devicemapper failed to remove root filesystem 9f017e610f2401541558a93b5c3beafc2e20586c766dfe49e521bcdf878ebe3a: device or resource busy
[error] api.go:87 HTTP Error: statusCode=500 Cannot destroy container 9f017e610f24: Driver devicemapper failed to remove root filesystem 9f017e610f2401541558a93b5c3beafc2e20586c766dfe49e521bcdf878ebe3a: device or resource busy
Error: Cannot destroy container 9f017e610f24: Driver devicemapper failed to remove root filesystem 9f017e610f2401541558a93b5c3beafc2e20586c766dfe49e521bcdf878ebe3a: device or resource busy
2014/01/11 23:03:11 Error: failed to remove one or more containers

@LordFPL

@LordFPL LordFPL commented Jan 15, 2014

Same problem here with 0.7.5.
"Resolved" with a lazy umount:
for fs in $(cat /proc/mounts | grep -F '.dockerinit\040(deleted)' | awk '{print $2}' | sed 's|/rootfs/\.dockerinit\\040(deleted)||g'); do umount -l "$fs"; done

(or just the umount -l on the FS)

The real question is why some filesystems end up in the "/rootfs/.dockerinit\040(deleted)" state.
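
To see which container mounts such a cleanup would touch before unmounting anything, the filtering step can be split out and run as a dry run. This is a sketch only: the function name stale_container_mounts is hypothetical, and it assumes the stale entries look like the ones described above, with the space before "(deleted)" written as the literal characters \040 in /proc/mounts.

```shell
# stale_container_mounts: read /proc/mounts-style lines on stdin and print
# the container mount root for every entry whose mount point ends in
# "/rootfs/.dockerinit\040(deleted)".
stale_container_mounts() {
    grep -F '/rootfs/.dockerinit\040(deleted)' |
        awk '{print $2}' |
        sed 's|/rootfs/\.dockerinit\\040(deleted)$||' |
        sort -u
}

# Dry run: print the umount commands instead of executing them.
#   stale_container_mounts < /proc/mounts | while read -r fs; do
#       echo umount -l "$fs"
#   done
```

Dropping the echo turns the dry run into the actual lazy unmount, which needs root.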

@joelmoss

@joelmoss joelmoss commented Jan 15, 2014

I can confirm that this is an issue on 0.7.5

@vjeantet

@vjeantet vjeantet commented Jan 15, 2014

I don't know if it is related, but:
My docker data was in /var/lib/docker, which was a symlink to /home/docker.

/home is a mount point.

Could container mount points under a symlink to a mount be the cause?
Since I told docker to use /home/docker instead of /var/lib/docker, I haven't had this issue anymore.
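
For anyone who wants to check whether they are in the same situation, a quick test is whether the docker data directory is itself a symlink. This is a rough sketch: the function name is_symlinked_root is hypothetical, and it only looks at the last path component, so a symlink higher up in the path is not detected.

```shell
# is_symlinked_root: succeed if the given docker data directory is a
# symbolic link, and print where it resolves to. Defaults to
# /var/lib/docker when no argument is given.
is_symlinked_root() {
    root="${1:-/var/lib/docker}"
    if [ -L "$root" ]; then
        echo "$root -> $(readlink -f "$root")"
        return 0
    fi
    return 1
}

# Usage:
#   is_symlinked_root /var/lib/docker && echo "docker root is a symlink"
```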

@LordFPL

@LordFPL LordFPL commented Jan 15, 2014

I'm already using a different base directory. The problem may arise when the docker daemon is restarted without stopping the containers properly... something goes wrong somewhere in the daemon's stop/start sequence...

@tianon
Member

@tianon tianon commented Jan 16, 2014

+1 I've got three containers on my devicemapper machine now that I can't remove because their devices fail to be removed in devicemapper (and none of them are even mounted in /proc/mounts)

Also, nothing in dmesg, and the only useful daemon output is highly cryptic and not very helpful:

[debug] deviceset.go:358 libdevmapper(3): ioctl/libdm-iface.c:1768 (-1) device-mapper: remove ioctl on docker-8:3-43647873-f4985ed89768280bb537b88d9d779699c6858c45217742ea5a598d6db95abb31 failed: Device or resource busy
[debug] devmapper.go:495 [devmapper] removeDevice END
[debug] deviceset.go:574 Error removing device: Error running removeDevice
[error] api.go:1064 Error: Cannot destroy container f4985ed89768: Driver devicemapper failed to remove root filesystem f4985ed89768280bb537b88d9d779699c6858c45217742ea5a598d6db95abb31: Error running removeDevice
[error] api.go:87 HTTP Error: statusCode=500 Cannot destroy container f4985ed89768: Driver devicemapper failed to remove root filesystem f4985ed89768280bb537b88d9d779699c6858c45217742ea5a598d6db95abb31: Error running removeDevice

@mriehl

@mriehl mriehl commented Jan 16, 2014

+1 @vjeantet setting the docker base directory in /etc/default/docker instead of using a symlinked /var/lib/docker fixed these problems for me.

@SamSaffron

@SamSaffron SamSaffron commented Jan 17, 2014

+1 seen this as well, quite easy to repro, recommending people only use aufs for now

@mikesimons

@mikesimons mikesimons commented Jan 19, 2014

As a workaround I managed to successfully remove a container stuck in this fashion by renaming the offending DM device (using dmsetup rename), executing dmsetup wipe_table <stuck_id>, restarting docker and re-running docker rm.

You need to use the full DM id of the device, which is at the end of the error (e.g. docker-8:9-7880790-bc945261c1f97e7145604a4248e2c84535fb204c8e214fa394448e0b2dcd064a).

The stuck device also disappeared on reboot.

This was achieved after much messing about with dmsetup so it's plausible something I did in between was also required. YMMV but it worked for me.

Edit: Needed to restart docker and run wipe_table too
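
Getting the full DM id out of the daemon log by hand is error-prone, so here is a small sketch that extracts it. The function name dm_device_from_log is hypothetical, and the pattern assumes device names of the form docker-<n>:<n>-<n>-<64 hex digits>, which is what the ids quoted in this thread look like.

```shell
# dm_device_from_log: read daemon log lines on stdin and print each
# distinct device-mapper device name that matches the assumed pattern
# docker-<n>:<n>-<n>-<64 hex digits>.
dm_device_from_log() {
    grep -oE 'docker-[0-9]+:[0-9]+-[0-9]+-[0-9a-f]{64}' | sort -u
}

# Usage (the log path depends on the distribution):
#   dm_device_from_log < /var/log/upstart/docker.log
```

The printed name can be passed to dmsetup info to check the device's open count before trying anything more invasive.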

@lgs

@lgs lgs commented Jan 19, 2014

... same problem with Docker version 0.7.6, build bc3b2ec

lsoave@basenode:~$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
53a9a8c4e29c        8dbd9e392a96        bash                17 minutes ago      Exit 0                                  thirsty_davinci     
lsoave@basenode:~$ docker rm 53a9a8c4e29c
Error: Cannot destroy container 53a9a8c4e29c: Driver aufs failed to remove root filesystem 53a9a8c4e29c2c99fdd8d5355833f07eca69cbfbefcd02915e267517111fbde8: device or resource busy
2014/01/19 20:38:50 Error: failed to remove one or more containers
lsoave@basenode:~$ 

by re-booting the host and running docker rm 53a9a8c4e29c again it works. My env:

lsoave@basenode:~$ uname -a
Linux basenode 3.11.0-15-generic #23-Ubuntu SMP Mon Dec 9 18:17:04 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
lsoave@basenode:~$ docker -v
Docker version 0.7.6, build bc3b2ec

@mikesimons

@mikesimons mikesimons commented Jan 21, 2014

Happened again today; machine went in to suspend with docker containers running but did not come out of suspend cleanly. Needed a reboot.

Upon reboot the DM device for one of the containers that was running was stuck.

> uname -a
Linux mv 3.9.9-1-ARCH #1 SMP PREEMPT Wed Jul 3 22:45:16 CEST 2013 x86_64 GNU/Linux

Running docker 0.7.4 build 010d74e

@lgs

@lgs lgs commented Jan 21, 2014

@mikesimons ... do you remember the sequence of operations that led to the failure?

@kklepper

@kklepper kklepper commented Jan 21, 2014

Solved

At least in my case -- check for yourself whether you have the same cause.

I had created a MySQL server container myself, which worked fine. As I was puzzled about the size of the containers, I decided to create new MySQL containers based on the work of somebody else.

This was indeed very interesting, as I found that size may differ substantially even when the Dockerfile looks similar or even identical. For example, my first has nearly 700 MB:

kklepper/Ms           latest              33280c9a70a7        5 days ago          695.7 MB

The container based on dhrp/mysql is nearly half the size of mine, and it works equally well:

kklepper/mysqld       latest              49223549bf47        24 hours ago        359.8 MB

The 2nd example produced the above-mentioned error, I'll get to that in just a second.

When I tried to repeat my findings today, I got a lot more size with exactly the same Dockerfile, seemingly without reason:

kklepper/mysqlda      latest              6162b0c95e8c        2 hours ago         374.4 MB

It was no problem to remove this container as well.

The next example introduced the problem, based on the 2nd result of my search https://index.docker.io/search?q=mysql: brice/mysql

Because I had extended his approach, I couldn't see right away where the problem was, but careful tracking finally showed that in this case the culprit was the command

VOLUME ["/var/lib/mysql", "/var/log/mysql"]

in the Dockerfile, which I had given no thought to.

Both directories exist in the container:

root@mysql:/# ls /var/lib/mysql
debian-5.5.flag  ib_logfile0  ib_logfile1  ibdata1  mysql  performance_schema  test  voxx_biz_db1
root@mysql:/# ls /var/log/mysql
error.log

But not in the host:

vagrant@precise64:~$ ls /var/lib/mysql
ls: cannot access /var/lib/mysql: No such file or directory
vagrant@precise64:~$ ls /var/log/mysql
ls: cannot access /var/log/mysql: No such file or directory

The VOLUME directive ties the volume of the container to the corresponding volume of the host (or rather the other way around).

Docker should throw an error if the directory does not exist in the host; by design it will "create" the directory in the container if it does not exist.

Unfortunately I'm not able to write a patch, but I'm sure many of you can.

@pwaller
Contributor

@pwaller pwaller commented Jan 21, 2014

@kklepper, I'm misunderstanding the relationship between the issue and your post. From what I read your "issue" was that you overlooked the behaviour of the VOLUME directive, but the issue at hand is that docker rm won't actually remove a stopped container in some circumstances, so I don't see in any sense how this issue is solved?

@kklepper

@kklepper kklepper commented Jan 21, 2014

Sorry for the confusion, I should have clarified that the VOLUME error caused the docker rm error, exactly as reported above. I found this thread because I searched for exactly this error message. Obviously nobody was able to track the conditions down yet.

@lgs

@lgs lgs commented Jan 21, 2014

@kklepper thanks for the detailed report.

Can you post the Dockerfile that produces the fault in question, please?

I looked for you on the public index, but no kklepper user was found there, so I have no way to reproduce your containers:

kklepper/Ms           latest              33280c9a70a7        5 days ago          695.7 MB
kklepper/mysqld       latest              49223549bf47        24 hours ago        359.8 MB
kklepper/mysqlda      latest              6162b0c95e8c        2 hours ago         374.4 MB

I'd like to test what you're saying on my own, because in my understanding of docker's VOLUME, it shouldn't need the same path on the host side. They should just be mount points.

Unfortunately, I also cannot remember what I did to get to the point where I received the Cannot destroy container ... error I mentioned before.

That's why I was asking @mikesimons for the steps he went through.

@srid
Contributor

@srid srid commented May 22, 2014

@vieux – I’m not sure what you mean to ask.

This is the default docker install (apt-get install lxc-docker-0.11.1)[1], and everything generally works fine (creating and running containers) … except a certain number of ‘docker rm’ operations fail.

If you are looking to reproduce this, I’d recommend running the script[2] provided by @lcarstensen above on an Ubuntu 12.04 VM.


[1] docker daemon is run as: /usr/bin/docker -d -D -s aufs -H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock and here’s the output of docker version:

Client version: 0.11.1
Client API version: 1.11
Go version (client): go1.2.1
Git commit (client): fb99f99
Server version: 0.11.1
Server API version: 1.11
Git commit (server): fb99f99
Go version (server): go1.2.1
Last stable version: 0.11.1

with more system info:

$ uname -a
Linux stackato-jgz6 3.11.0-20-generic #35~precise1-Ubuntu SMP Fri May 2 21:32:55 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="12.04.4 LTS, Precise Pangolin"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu precise (12.04.4 LTS)"
VERSION_ID="12.04" 

[2] https://gist.github.com/lcarstensen/10513578

@lcarstensen

@lcarstensen lcarstensen commented May 22, 2014

With 0.11.1-4 from koji on RHEL 6.5 with native (not LXC) and selinux enabled I haven't been able to reproduce docker rm issues, either with my script or without, over the last week.  Using the native execution driver seems like the key on RHEL.

@srid
Contributor

@srid srid commented May 22, 2014

fwiw, we are using the native execution driver:

$ docker info
Containers: 9
Images: 132
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 152
Execution Driver: native-0.2
Kernel Version: 3.11.0-20-generic
Debug mode (server): true
Debug mode (client): false
Fds: 109
Goroutines: 163
EventsListeners: 2
Init Path: /usr/bin/docker
$

@srid
Contributor

@srid srid commented May 22, 2014

@alexlarsson -

Maybe we can drop the private on /var/lib/docker once we start mounting container roots inside the container namespace, as we then won't have any long-running mounts visible in the root namespace (only short-lived ones).

could you explain what you mean by "drop the private on /var/lib/docker" and "make /var/lib/docker --private in the daemon"?

just as you've observed, i'm seeing aufs mnt directories appearing but in the mount namespace of several processes.

@alexlarsson
Contributor

@alexlarsson alexlarsson commented May 22, 2014

@srid At startup docker effectively does mount --make-rprivate /var/lib/docker to work around some efficiency problems. Without this, every mount (from e.g. a container start) is broadcast to all sub-namespaces (including all other containers), which leads to O(n^2) slowness.

The problem with it being private is that unmounts in the global namespace are not sent to the containers either, which causes EBUSY issues like the above if something creates a new mount namespace and doesn't unmount uninteresting parts of the hierarchy.
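
One way to look for such leftover mounts is to scan each process's own mount table for the container ID. This is a sketch under assumptions: the function name mount_holders is hypothetical, the second argument exists only so the function can be pointed at a fake proc tree for testing, and reading other processes' mount tables normally requires root.

```shell
# mount_holders: print the PID of every process whose mount table
# (<proc_root>/<pid>/mounts) still lists a path containing the given
# container ID. proc_root defaults to /proc.
mount_holders() {
    id="$1"
    proc_root="${2:-/proc}"
    for f in "$proc_root"/[0-9]*/mounts; do
        # Skip processes that exited or whose table is not readable.
        [ -r "$f" ] || continue
        if grep -qF "$id" "$f" 2>/dev/null; then
            pid="${f%/mounts}"
            echo "${pid##*/}"
        fi
    done
}

# Usage:
#   mount_holders 859796f54423
```

Any PID printed for a container that is already stopped belongs to a process whose namespace still holds the mount, which would match the EBUSY behaviour described above.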

@cywjackson

@cywjackson cywjackson commented May 22, 2014

ran into this in our prod today. docker was 0.9.1
this solution allowed me to resolve the problem without restarting the host :)
so for those who are interested:
https://coderwall.com/p/h24pgw

@vieux
Contributor

@vieux vieux commented May 27, 2014

@cywjackson can you tell us which mountpoints are still there before running your umount-all command?

@cywjackson

@cywjackson cywjackson commented May 28, 2014

hey @vieux, unfortunately I can't now, since we've resolved our issue. But I think it was pretty much all the containers and graphs (and maybe the aufs?) ... If it helps, our problem was probably triggered by running an older docker version to begin with (0.76_?); then chef-client was run and updated the version to > 0.9_. During the process it restarted the daemon but probably not the containers, so the filesystem looked as if it was still in use and the containers were still in the "running" state. We did examine the config.json in those containers' paths and it had the Running flag set to true. (Simply changing that to false didn't resolve the problem, though.)

i guess for us, we should have stopped the containers first before running the chef-client.

@vieux
Contributor

@vieux vieux commented May 29, 2014

Does anybody have an easy way to reproduce this?

@vieux vieux removed this from the 1.0 milestone Jun 3, 2014
@nickleefly

@nickleefly nickleefly commented Jun 12, 2014

@vieux I can reproduce with latest docker

$ boot2docker status
running

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
ubuntu              latest              ad892dd21d60        7 days ago          275.5 MB
busybox             ubuntu-14.04        37fca75d01ff        7 days ago          5.609 MB
busybox             ubuntu-12.04        fd5373b3d938        7 days ago          5.455 MB
busybox             latest              a9eb17255234        7 days ago          2.433 MB

# run a few times
$ docker run  a9eb17255234 echo Hello world

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                      PORTS               NAMES
32a3a2a21df0        busybox:latest      echo hello world    35 minutes ago      Exited (0) 35 minutes ago                       evil_mayer
bec3a46051f0        busybox:latest      echo hello world    35 minutes ago      Exited (0) 35 minutes ago                       hopeful_mclean
274ac126eb1f        busybox:latest      echo hello world    35 minutes ago      Exited (0) 35 minutes ago                       sharp_ptolemy

$ docker info
Containers: 3
Images: 12
Storage Driver: aufs
 Root Dir: /mnt/sda1/var/lib/docker/aufs
 Dirs: 18
Execution Driver: native-0.2
Kernel Version: 3.14.1-tinycore64
Debug mode (server): true
Debug mode (client): false
Fds: 11
Goroutines: 10
EventsListeners: 0
Init Path: /usr/local/bin/docker
Username: nickleefly
Registry: [https://index.docker.io/v1/]

$ docker version
Client version: 1.0.0
Client API version: 1.12
Go version (client): go1.2.1
Git commit (client): 63fe64c
Server version: 1.0.0
Server API version: 1.12
Go version (server): go1.2.1
Git commit (server): 63fe64c

$ docker ps -a | grep Exit | awk '{print $1}' | sudo xargs docker rm
dial unix /var/run/docker.sock: no such file or directory
dial unix /var/run/docker.sock: no such file or directory
dial unix /var/run/docker.sock: no such file or directory
2014/06/11 22:34:25 Error: failed to remove one or more containers

But if I do

docker rm containerID

it does remove the stopped containers.

@geku

@geku geku commented Jul 12, 2014

@vieux I still have this problem with version Docker 1.1.1 and can reproduce it. It only happens when one or more ports are published to the host. If no ports are published forced remove works.

How to reproduce

Docker install is fresh: no image pulled and no other container run previously.

$ docker pull tutum/redis
$ docker run -d -p 6379 tutum/redis
$ docker ps
CONTAINER ID        IMAGE                COMMAND             CREATED             STATUS              PORTS                     NAMES
5ffa59ef0879        tutum/redis:latest   /run.sh             2 seconds ago       Up 1 seconds        0.0.0.0:49153->6379/tcp   compassionate_ardinghelli
$ docker rm -f 5ffa59ef0879
Error response from daemon: Cannot destroy container 5ffa59ef0879: Driver devicemapper failed to remove root filesystem 5ffa59ef08796a2526e2d7b7c2a980f30f37f2216112cc764725d2c99a9aa6d5: Device is Busy
2014/07/12 12:40:12 Error: failed to remove one or more containers

I simply chose tutum/redis as it is a simple daemon, but had the problem with the ubuntu:14.04 image as well. As far as I remember I didn't have the problem with Docker 0.9.

Could somebody please try to reproduce the problem with the same setup as mine, thanks.
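
To check whether the containers that fail to remove are the ones with published ports, the docker ps output can be filtered for rows with a host-to-container mapping. This is a rough sketch: the function name containers_with_published_ports is hypothetical, and it simply looks for "->" anywhere in a row, so a command that happens to contain "->" would produce a false positive.

```shell
# containers_with_published_ports: read `docker ps` output on stdin and
# print the container ID (first column) of every row that contains a
# port mapping such as 0.0.0.0:49153->6379/tcp. The header row is skipped.
containers_with_published_ports() {
    awk 'NR > 1 && /->/ {print $1}'
}

# Usage:
#   docker ps | containers_with_published_ports
```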

Environment:

Ubuntu 14.04 running with Vagrant/VirtualBox on OSX, the exact image is ubuntu/trusty64. Docker v1.1.1 is installed through official Docker repository:

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9
sudo sh -c "echo deb http://get.docker.io/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
sudo apt-get update
sudo apt-get install -y lxc-docker

Docker Info

$ docker info
Containers: 0
Images: 16
Storage Driver: devicemapper
 Pool Name: docker-8:1-140092-pool
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 676.6 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 1.3 Mb
 Metadata Space Total: 2048.0 Mb
Execution Driver: native-0.2
Kernel Version: 3.13.0-24-generic
WARNING: No swap limit support

Docker log /var/log/upstart/docker.log

[bd9f5304.initserver()] Creating pidfile
[bd9f5304.initserver()] Setting up signal traps
[bd9f5304] -job initserver() = OK (0)
[bd9f5304] +job acceptconnections()
[bd9f5304] -job acceptconnections() = OK (0)
2014/07/12 12:34:28 GET /v1.13/containers/json
[bd9f5304] +job containers()
[bd9f5304] -job containers() = OK (0)
2014/07/12 12:34:36 POST /images/create?fromImage=tutum%2Fredis&tag=
[bd9f5304] +job pull(tutum/redis, )
[bd9f5304] -job pull(tutum/redis, ) = OK (0)
2014/07/12 12:39:29 POST /v1.13/containers/create
[bd9f5304] +job create()
[bd9f5304] -job create() = OK (0)
2014/07/12 12:39:30 POST /v1.13/containers/5ffa59ef08796a2526e2d7b7c2a980f30f37f2216112cc764725d2c99a9aa6d5/start
[bd9f5304] +job start(5ffa59ef08796a2526e2d7b7c2a980f30f37f2216112cc764725d2c99a9aa6d5)
[bd9f5304] +job allocate_interface(5ffa59ef08796a2526e2d7b7c2a980f30f37f2216112cc764725d2c99a9aa6d5)
[bd9f5304] -job allocate_interface(5ffa59ef08796a2526e2d7b7c2a980f30f37f2216112cc764725d2c99a9aa6d5) = OK (0)
[bd9f5304] +job allocate_port(5ffa59ef08796a2526e2d7b7c2a980f30f37f2216112cc764725d2c99a9aa6d5)
[bd9f5304] -job allocate_port(5ffa59ef08796a2526e2d7b7c2a980f30f37f2216112cc764725d2c99a9aa6d5) = OK (0)
[bd9f5304] -job start(5ffa59ef08796a2526e2d7b7c2a980f30f37f2216112cc764725d2c99a9aa6d5) = OK (0)
2014/07/12 12:39:31 GET /v1.13/containers/json
[bd9f5304] +job containers()
[bd9f5304] -job containers() = OK (0)
2014/07/12 12:39:54 GET /v1.13/containers/json
[bd9f5304] +job containers()
[bd9f5304] -job containers() = OK (0)
2014/07/12 12:40:01 DELETE /v1.13/containers/5ffa59ef0879?force=1
[bd9f5304] +job container_delete(5ffa59ef0879)
[bd9f5304] +job release_interface(5ffa59ef08796a2526e2d7b7c2a980f30f37f2216112cc764725d2c99a9aa6d5)
2014/07/12 12:40:01 Stopping proxy on tcp/[::]:49153 for tcp/172.17.0.2:6379 (accept tcp [::]:49153: use of closed network connection)
[bd9f5304] -job release_interface(5ffa59ef08796a2526e2d7b7c2a980f30f37f2216112cc764725d2c99a9aa6d5) = OK (0)
Cannot destroy container 5ffa59ef0879: Driver devicemapper failed to remove root filesystem 5ffa59ef08796a2526e2d7b7c2a980f30f37f2216112cc764725d2c99a9aa6d5: Device is Busy
[bd9f5304] -job container_delete(5ffa59ef0879) = ERR (1)
[error] server.go:1048 Error making handler: Cannot destroy container 5ffa59ef0879: Driver devicemapper failed to remove root filesystem 5ffa59ef08796a2526e2d7b7c2a980f30f37f2216112cc764725d2c99a9aa6d5: Device is Busy
[error] server.go:90 HTTP Error: statusCode=500 Cannot destroy container 5ffa59ef0879: Driver devicemapper failed to remove root filesystem 5ffa59ef08796a2526e2d7b7c2a980f30f37f2216112cc764725d2c99a9aa6d5: Device is Busy

@tiborvass
Collaborator

@tiborvass tiborvass commented Jul 15, 2014

I could reproduce this on devicemapper too with latest docker on CentOS 6.5

@stuartpb

@stuartpb stuartpb commented Aug 16, 2014

I'm seeing this issue (removing a container with ports bound giving a "Device is Busy" error) in my own tests.

The weirdest thing is, it seems that it actually does remove the container, just after it prints this error and crashes the script running the docker rm.

@thaJeztah
Member

@thaJeztah thaJeztah commented Aug 16, 2014

I think I've seen similar behavior sometimes, with the same Device is busy error on "/some/path/[container-id]-removed", and I was unable to find any directory with a -removed suffix (it might have been -deleted). I will have to check whether I saved those messages somewhere.

@howitzers

@howitzers howitzers commented Aug 19, 2014

Same device is busy error. Ubuntu trusty, docker 1.1.2, defaults.

docker version

Client version: 1.1.2
Client API version: 1.13
Go version (client): go1.2.1
Git commit (client): d84a070
Server version: 1.1.2
Server API version: 1.13
Go version (server): go1.2.1
Git commit (server): d84a070

docker info

Containers: 91
Images: 181
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Dirs: 3851
Execution Driver: native-0.2
Kernel Version: 3.13.0-34-generic

/var/log/upstart/docker.log

2014/08/19 15:30:41 DELETE /v1.13/containers/859796f54423
[069e87c2] +job container_delete(859796f54423)
Cannot destroy container 859796f54423: Driver aufs failed to remove root filesystem 859796f544232c45fc7c086f8e20fa38ed79689be5256235696c091bc88f8b11: rename /var/lib/docker/aufs/mnt/859796f544232c45fc7c086f8e20fa38ed79689be5256235696c091bc88f8b11 /var/lib/docker/aufs/mnt/859796f544232c45fc7c086f8e20fa38ed79689be5256235696c091bc88f8b11-removing: device or resource busy
[069e87c2] -job container_delete(859796f54423) = ERR (1)
[error] server.go:1048 Error making handler: Cannot destroy container 859796f54423: Driver aufs failed to remove root filesystem 859796f544232c45fc7c086f8e20fa38ed79689be5256235696c091bc88f8b11: rename /var/lib/docker/aufs/mnt/859796f544232c45fc7c086f8e20fa38ed79689be5256235696c091bc88f8b11 /var/lib/docker/aufs/mnt/859796f544232c45fc7c086f8e20fa38ed79689be5256235696c091bc88f8b11-removing: device or resource busy
[error] server.go:90 HTTP Error: statusCode=500 Cannot destroy container 859796f54423: Driver aufs failed to remove root filesystem 859796f544232c45fc7c086f8e20fa38ed79689be5256235696c091bc88f8b11: rename /var/lib/docker/aufs/mnt/859796f544232c45fc7c086f8e20fa38ed79689be5256235696c091bc88f8b11 /var/lib/docker/aufs/mnt/859796f544232c45fc7c086f8e20fa38ed79689be5256235696c091bc88f8b11-removing: device or resource busy
2014/08/19 15:30:45 DELETE /v1.13/containers/859796f54423
[069e87c2] +job container_delete(859796f54423)

The server returns a 500, which breaks scripts, but it looks like in these cases the container actually is removed.

This container does have ports exposed to the host as well as bind-mounted volumes (it's from fig up); similar cases were mentioned upthread and it might be relevant.

These errors aren't new; I've seen them since the early days at varying rates, but it's kind of silly to have to hack around them all the time.
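
For illustration, the kind of hack I mean looks roughly like the sketch below: try `docker rm`, and if it reports an error, check with `docker inspect` whether the container is actually still there. The function names and the injectable `runner` parameter are made up for this example, and it assumes a non-zero exit from `docker inspect` means the container is gone (it could also mean the daemon is unreachable, so treat this as a sketch, not a fix).

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the command's exit status.
Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status, discarding its output."""
    return subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def remove_container(container_id: str, runner: Runner = run) -> bool:
    """Return True if the container appears to be gone after `docker rm`."""
    if runner(["docker", "rm", container_id]) == 0:
        return True
    # `docker rm` reported an error. Assumption: if `docker inspect` now
    # fails too, the container was removed despite the error.
    return runner(["docker", "inspect", container_id]) != 0
```

A script would then call `remove_container("859796f54423")` and only abort when it returns False.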

@thaJeztah is this what you were seeing?

Member

@thaJeztah thaJeztah commented Aug 19, 2014

@howitzers yes! Exactly those kinds of messages; and your example contains the -removing suffix (which I incorrectly remembered as -removed).

Also, I think (at least some of) my containers were started/created using fig as well. Not sure if this is the cause of this problem, but just to add that info for my situation as well. @bfirsh ?

Thanks!


@howitzers howitzers commented Aug 19, 2014

Fig just uses the plain docker remote API with no funny stuff, so the naughty behavior is definitely docker's business, not fig's. What fig does do, though, is a lot of container recreate and delete, making hitting this more likely (it's got a very race condition flavor to it).

Poked around a little more and this only happens after creating/removing a lot of containers (I usually hit this error under my own stress tests). @cywjackson's unmount suggestion does not fix it, nor does a reboot.

What does fix it for me is a total wipe of /var/lib/docker, so it looks like some resource or timing issue with a lot of containers.

In any case, a convenient way to repro this is to just loop fig up -d with a few host-bound services in the fig.yml. You'll eventually error out when the removes start returning 500s in the aufs driver, as above.
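
A rough sketch of such a repro loop, in case it helps anyone else trigger it. It assumes `fig up -d` exits non-zero once a remove fails; `loop_until_failure`, `max_runs`, and the injectable `runner` are names invented for this example.

```python
import subprocess
from typing import Callable, Optional, Sequence


def run(cmd: Sequence[str]) -> int:
    """Run a command, letting its output through, and return its exit status."""
    return subprocess.run(cmd).returncode


def loop_until_failure(
    cmd: Sequence[str],
    max_runs: int,
    runner: Callable[[Sequence[str]], int] = run,
) -> Optional[int]:
    """Run cmd up to max_runs times.

    Returns the 1-based iteration on which the command first failed,
    or None if every run succeeded.
    """
    for attempt in range(1, max_runs + 1):
        if runner(cmd) != 0:
            return attempt
    return None
```

For example, `loop_until_failure(["fig", "up", "-d"], 500)` run from the project directory; the iteration count it returns gives a rough idea of how many recreate cycles it takes before the removes start failing.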

Contributor

@crosbymichael crosbymichael commented Aug 19, 2014

I'm thinking about closing this issue because it has been open for too long and has become some sort of catch-all for any type of error remotely related to a failed rm.

I think we will be better able to debug and fix new issues reported from new docker versions if we have separate issues opened that are current and easier to read. Any reason why we should keep this open right now?


@stuartpb stuartpb commented Aug 19, 2014

While it's definitely true that this has been open for too long and become a catch-all for lots of different bugs (holy cow, 60 participants), I think it should be left open (possibly with a name change) to address this specific current issue (where docker rm specifically fails with Cannot destroy container abcd123456: Driver devicemapper failed to remove root filesystem abcd123456796a2526e2d7b7c2a980f30f37f2216112cc764725d2c99a9aa6d5: Device is Busy), and then close it once that specific race is resolved.


@howitzers howitzers commented Aug 19, 2014

I split out the specific 1.1.2 "failed to remove root filesystem" case to a new issue with a backref, which might be easier to track.

Can close either that one or this one, but there's definitely a specific open issue here, between myself, stuartpb, and thaJeztah.

Member

@thaJeztah thaJeztah commented Aug 20, 2014

@crosbymichael (and indeed, wow, 59 others) I agree on closing this; the title has become outdated and it is collecting similar, but unrelated, issues (guilty myself I think).

If the original devicemapper failed to remove root filesystem issue still exists (can anybody confirm?) I think a new issue should be created for that with a clear title and a link to the other issue so that people are guided to the right one.

Please, if this issue is closed, add a clear comment explaining why and point people to the right issues as well.

Contributor

@unclejack unclejack commented Aug 20, 2014

I'm closing and locking this issue right now. I agree that it's become a catch-all for any failure to remove a container under any conditions.

Please try to find existing issues related to the problem you're running into with the specific backend (btrfs, devicemapper, aufs) and comment there. If there are no existing issues which seem to be about the same problem using the same storage backend, please open a new issue.
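
If you're unsure which backend a failure belongs to, the driver name is in the error message itself. A minimal sketch for pulling it out, assuming the "Cannot destroy container ...: Driver ... failed to remove root filesystem ..." format seen in the logs in this thread; `RemoveFailure` and `parse_remove_failure` are hypothetical names, not part of docker.

```python
import re
from typing import NamedTuple, Optional


class RemoveFailure(NamedTuple):
    container: str  # short container ID from the message
    driver: str     # storage backend, e.g. "aufs" or "devicemapper"
    reason: str     # trailing error text reported by the driver


# Assumed message format, taken from the daemon logs quoted in this thread.
_PATTERN = re.compile(
    r"Cannot destroy container (?P<container>[0-9a-f]+): "
    r"Driver (?P<driver>\w+) failed to remove root filesystem "
    r"[0-9a-f]+: (?P<reason>.*)$"
)


def parse_remove_failure(line: str) -> Optional[RemoveFailure]:
    """Extract container, driver, and reason from a log line, or None."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return RemoveFailure(
        match["container"], match["driver"], match["reason"].strip()
    )
```

Feeding it a line from the daemon log gives the `driver` field to search existing issues by.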

@unclejack unclejack closed this Aug 20, 2014
@moby moby locked and limited conversation to collaborators Aug 20, 2014