error removing container (1.10, 1.11/master) with AUFS #21704

Closed
vikstrous opened this issue Mar 31, 2016 · 79 comments

Comments

Contributor

@vikstrous vikstrous commented Mar 31, 2016

I've been seeing this error in our integration tests a lot recently:

Error response from daemon: 500 Internal Server Error: Driver aufs failed to remove root filesystem 36382c720964b0560df5fb858af8197169ee4eb399906c0e65c4ca85d795941e: rename /var/lib/docker/aufs/mnt/e7d36cc07ee4aad50f61259bea24876cc925f3c417b6d5ea9c2c1b055d243c82 /var/lib/docker/aufs/mnt/e7d36cc07ee4aad50f61259bea24876cc925f3c417b6d5ea9c2c1b055d243c82-removing: device or resource busy

This happens when a container is being removed and causes our tests to fail. I've seen it only on aufs so far.
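The rename in the error message suggests a two-step delete: the directory is first renamed to a `-removing` sibling and then the renamed tree is deleted. Below is a minimal Python sketch of that pattern, for illustration only. It is not Docker's Go implementation, and the name `atomic_remove` is hypothetical. On Linux, rename(2) can fail with EBUSY when the directory is still in use, for example as a mount point, which matches the "device or resource busy" text above.

```python
import errno
import os
import shutil


def atomic_remove(path):
    """Rename `path` to `path + "-removing"`, then delete the renamed tree.

    Returns True on success and False if the rename failed with EBUSY.
    Any other error is re-raised.
    """
    tmp = path + "-removing"
    try:
        os.rename(path, tmp)
    except OSError as e:
        if e.errno == errno.EBUSY:
            # Something (for example a mount) still references the directory.
            return False
        raise
    shutil.rmtree(tmp)
    return True
```

A caller could retry after a short delay when the function returns False, although that only helps if whatever holds the directory eventually lets go of it.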

Output of docker version:

$ docker version
Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.3-cs2
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   f02424d
 Built:        Thu Mar 17 21:52:14 2016
 OS/Arch:      linux/amd64
$ docker version
Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.0-dev
 API version:  1.24
 Go version:   go1.5.3
 Git commit:   dd94c88
 Built:        Thu Mar 31 21:32:39 2016
 OS/Arch:      linux/amd64

Additional environment details (AWS, VirtualBox, physical, etc.):
This is happening on AWS with AUFS

Steps to reproduce the issue:
unknown

Describe the results you received:
500 error from the daemon

Describe the results you expected:
the container should be removed without an error

Additional information you deem important (e.g. issue happens only occasionally):
It happens less than half of the time

Member

@thaJeztah thaJeztah commented Mar 31, 2016

Can you give the output of docker info as well?

Contributor Author

@vikstrous vikstrous commented Apr 1, 2016

$ docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.11.0-dev
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 0
 Dirperm1 Supported: false
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 3.13.0-53-generic
Operating System: Ubuntu 14.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.305 GiB
Name: jenkins-dtr-integration-2023
ID: A64M:T365:F3GT:OPMD:H3YO:AQFH:65Y6:H2YZ:PHGN:4KZI:2BC5:ISLE
Username: dockerbuildbot
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Labels:
 provider=amazonec2
Collaborator

@cpuguy83 cpuguy83 commented Apr 1, 2016

How recent is your master? I think we fixed this.

Member

@thaJeztah thaJeztah commented Apr 1, 2016

@cpuguy83 looks like he's on dd94c88

Member

@thaJeztah thaJeztah commented Apr 1, 2016

Is this a duplicate of #21111 and #21101 ?

Member

@thaJeztah thaJeztah commented Apr 1, 2016

Oh, and #17902

Member

@thaJeztah thaJeztah commented Apr 1, 2016

ping @anusha-ragunathan would you be able to look into this? I linked various related / similar issues above

Contributor Author

@vikstrous vikstrous commented Apr 2, 2016

I think the daemon logs from one successful run of our integration tests and one run that caused this error will be helpful, but I'm not sure if I can share them publicly here. If you have access, check these out:

success: https://ci.qa.aws.dckr.io/job/dtr-deploy/2749/artifact/integration/results/docker.log
failure: https://ci.qa.aws.dckr.io/job/dtr-deploy/2753/artifact/integration/results/docker.log

They are not exactly from the same PR, but they are very similar.

There is a potentially relevant one earlier in the logs:


INFO[0058] Failed to send signal 15 to the process, force killing
ERRO[0058] Handler for POST /v1.15/containers/2b0a7117aff26868e3f0cfaa29c60146199bc435a388ff79025a7ef951479410/stop returned error: Cannot stop container 2b0a7117aff26868e3f0cfaa29c60146199bc435a388ff79025a7ef951479410: Cannot kill container 2b0a7117aff26868e3f0cfaa29c60146199bc435a388ff79025a7ef951479410: rpc error: code = 2 desc = "no such process"

This is the complete error at the time of the failed container delete:


ERRO[0148] Error removing mounted layer e8b084c4ad5b491c20d610842ac96d22c457440418ebfc6a6c941d837ecdce72: rename /var/lib/docker/aufs/diff/e46c976939ee6366109ffb8bb95b09ed0ddd5f0c08f100040ed1abc656317c82 /var/lib/docker/aufs/diff/e46c976939ee6366109ffb8bb95b09ed0ddd5f0c08f100040ed1abc656317c82-removing: device or resource busy
ERRO[0148] Handler for DELETE /v1.15/containers/e8b084c4ad5b491c20d610842ac96d22c457440418ebfc6a6c941d837ecdce72 returned error: Driver aufs failed to remove root filesystem e8b084c4ad5b491c20d610842ac96d22c457440418ebfc6a6c941d837ecdce72: rename /var/lib/docker/aufs/diff/e46c976939ee6366109ffb8bb95b09ed0ddd5f0c08f100040ed1abc656317c82 /var/lib/docker/aufs/diff/e46c976939ee6366109ffb8bb95b09ed0ddd5f0c08f100040ed1abc656317c82-removing: device or resource busy
ERRO[0148] Handler for GET /v1.15/containers/e8b084c4ad5b491c20d610842ac96d22c457440418ebfc6a6c941d837ecdce72/json returned error: No such container: e8b084c4ad5b491c20d610842ac96d22c457440418ebfc6a6c941d837ecdce72
ERRO[0148] Handler for DELETE /v1.15/containers/155af974b06e6c09e0f59594812e4e0139e2a5f63a2fe22ca6e9693dccb4491f returned error: Unable to remove filesystem for 155af974b06e6c09e0f59594812e4e0139e2a5f63a2fe22ca6e9693dccb4491f: remove /var/lib/docker/containers/155af974b06e6c09e0f59594812e4e0139e2a5f63a2fe22ca6e9693dccb4491f/shm: device or resource busy

It's interesting that the same error log appears when we restart the daemon earlier in the test.

Contributor Author

@vikstrous vikstrous commented Apr 2, 2016

If I had to guess, I'd say there are left over processes referencing the same layers from when the daemon tried to restart and failed to properly kill them.

Contributor

@anusha-ragunathan anusha-ragunathan commented Apr 6, 2016

@vikstrous: I cannot access the Jenkins logs. Can you create a gist of the logs? I tried a quick test of creating and removing containers in a loop of 15 (not concurrent) on AUFS and didn't observe this issue. Is there a deterministic way to repro the issue?

Can you confirm that the containers start successfully? If yes, then a couple of things to proceed on:

  • Was there another concurrent request to stop the container? That would result in a race, and the rename in the context of the second request would error out. You can check whether the corresponding diff path exists. If it doesn't exist, then it's most likely a race.
  • In 1.11, we recently changed the way reference counts work in aufs (and other graph drivers). If you can run some instrumented builds, then I can send over a docker binary to debug this more.
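The first check above can be scripted. This is a minimal Python sketch that assumes only the layout visible in the error messages (`<root>/diff/<id>` and `<root>/diff/<id>-removing`). `layer_state` is a hypothetical helper and not part of Docker.

```python
import os


def layer_state(aufs_root, layer_id):
    """Report which of the layer's diff paths exist under `aufs_root`."""
    diff = os.path.join(aufs_root, "diff", layer_id)
    removing = diff + "-removing"
    if os.path.isdir(diff):
        return "present"   # the diff directory is still in place
    if os.path.isdir(removing):
        return "removing"  # renamed, but the renamed tree was not deleted
    return "gone"          # neither path exists; possibly removed by another request
```

For example, `layer_state("/var/lib/docker/aufs", "<layer id from the error>")` returning `"gone"` right after a failed delete would be consistent with the race described above.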
Member

@thaJeztah thaJeztah commented Apr 6, 2016

@vikstrous @anusha-ragunathan please post it on slack if the jenkins log contains information that should not be shared publicly 👍

Contributor Author

@vikstrous vikstrous commented Apr 6, 2016

I haven't seen this bug since last time I posted in this thread. It's possible that it was fixed. I'll update you if I see it again.


@FelikZ FelikZ commented Apr 7, 2016

I have a similar issue; please have a look.

test.yml:

version: "2"
services:
    browser:
        image: elgalu/selenium:2.53.0e
        ports:
            - "5920:25900"
            # - "4444:24444"
        # volumes:
        #     - "/dev/shm:/dev/shm"
        environment:
          - "VNC_PASSWORD=test"
          - "FIREFOX=false"
          - "CHROME=true"
        networks:
            my-net:
                aliases:
                  - browser
networks:
  my-net:
    driver: bridge

Stdout:

$ docker-compose --version
docker-compose version 1.6.2, build 4d72027
$ docker --version
Docker version 1.10.3, build 20f81dd
$ docker info
Containers: 3
 Running: 1
 Paused: 0
 Stopped: 2
Images: 142
Server Version: 1.10.3
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 245
 Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 3.13.0-85-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.58 GiB
Name: tk9
ID: CYDT:VFSD:2M77:W5P7:OQUD:J6G7:EWQR:KJWR:SOUX:JCLZ:2SBG:J7GX
WARNING: No swap limit support
$ docker-compose -f test.yml up -d
Creating network "homelocal_my-net" with driver "bridge"
Creating homelocal_browser_1
$ docker-compose -f test.yml stop
Stopping homelocal_browser_1 ... 

ERROR: for homelocal_browser_1  ('Connection aborted.', BadStatusLine("''",)) 
ERROR: Couldn't connect to Docker daemon at http+docker://localunixsocket - is it running?

If it's at a non-standard location, specify the URL with the DOCKER_HOST environment variable.
$ docker-compose -f test.yml rm -f
Going to remove homelocal_browser_1
Removing homelocal_browser_1 ... error

ERROR: for homelocal_browser_1  Driver aufs failed to remove root filesystem 0e6e88bcc931eb13e141ac871b4ba965d01aae880a20255a5e974f15dff40b0e: rename /var/lib/docker/aufs/mnt/d4e6ee5ebd3ac40e256afa4492451e25cbea87f5041a1dce0bec7a302f41cc45 /var/lib/docker/aufs/mnt/d4e6ee5ebd3ac40e256afa4492451e25cbea87f5041a1dce0bec7a302f41cc45-removing: device or resource busy 
$ docker-compose -f test.yml rm -f
Going to remove homelocal_browser_1
Removing homelocal_browser_1 ... error

@FelikZ FelikZ commented Apr 7, 2016

And this is probably related as well: #21845


@FelikZ FelikZ commented Apr 7, 2016

@cpuguy83 looks like it is not related to aufs...

Trying to solve this, I switched to overlayfs and see the same picture:

Error response from daemon: Driver overlay failed to remove root filesystem 8b21bec99eccde191ca98e944003274c5b45bbf6f1e4cc08560c0e454e5d3719: readdirent: no such file or directory
$ docker info
Containers: 4
 Running: 0
 Paused: 0
 Stopped: 4
Images: 44
Server Version: 1.10.3
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: null host bridge
Kernel Version: 3.19.0-58-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.58 GiB
Name: tk9
ID: CYDT:VFSD:2M77:W5P7:OQUD:J6G7:EWQR:KJWR:SOUX:JCLZ:2SBG:J7GX
WARNING: No swap limit support
Contributor

@anusha-ragunathan anusha-ragunathan commented Apr 7, 2016

@FelikZ : Can you upgrade to docker-engine 1.11 rc4 and try again?

Member

@thaJeztah thaJeztah commented Apr 18, 2016

ping @FelikZ do you still see this on 1.11.0?


@ncadou ncadou commented May 24, 2016

Seeing this frequently on different machines, all on 1.11.1 plus aufs. Most are on 14.04 LTS (3.13 kernel).

Contributor

@jbeda jbeda commented May 29, 2016

I just saw this when trying to start a container using the gcplogs log driver; the container wasn't able to launch successfully.

docker run -d --name my-container --log-driver=gcplogs --log-opt gcp-log-cmd=true [...]
38e6a733b02a825dc97208ee2436d31353480bfe31cfa4799cd48203d746fe6e
docker: Error response from daemon: Failed to initialize logging driver: unable to connect or authenticate with Google Cloud Logging: googleapi: Error 403: Google Cloud Logging API has not been used in project 784782548624 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/logging/overview?project=784782548624 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry., forbidden.
$ docker rm my-container
Error response from daemon: Driver aufs failed to remove root filesystem fd5e668e8cd14c1a7a2405b26ea3d75bdfd9f25525447b20f6b400eff02a7a23: rename /var/lib/docker/aufs/diff/ae3e5b240821fe702b68544188663ef092bf173b33e495678800c6c3ea498f92 /var/lib/docker/aufs/diff/ae3e5b240821fe702b68544188663ef092bf173b33e495678800c6c3ea498f92-removing: device or resource busy
# docker info
Containers: 9
 Running: 5
 Paused: 0
 Stopped: 4
Images: 382
Server Version: 1.11.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 609
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 3.16.0-0.bpo.4-amd64
Operating System: Debian GNU/Linux 7 (wheezy)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 6.338 GiB
Name: web
ID: IXJA:3H47:WFQC:GZPE:3WTJ:5W4P:LEKY:OCCF:CDXQ:IDV7:O23X:K7HQ
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No oom kill disable support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support

I couldn't find reference to that filesystem or any of those directories when groveling around in /proc or via lsof.


@ensilon ensilon commented Jun 24, 2016

I've seen this several times recently also.

# docker info
Containers: 5
 Running: 3
 Paused: 0
 Stopped: 2
Images: 12
Server Version: 1.11.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 106
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: null host bridge
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 7.849 GiB
Name: docker-internal-01
ID: Z3B7:D5KD:QYKT:YMH3:V57J:MLIK:6HWR:XG3Q:3WR6:RWJV:YOZW:6LNG
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No oom kill disable support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support


@servomac servomac commented Jun 28, 2016

Same here with Ubuntu 14.04

tpiza@neptune:~$ docker info
Containers: 31
 Running: 27
 Paused: 0
 Stopped: 4
Images: 28
Server Version: 1.11.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 207
 Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: host bridge null
Kernel Version: 3.13.0-24-generic
Operating System: Ubuntu 14.04 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 31.12 GiB
Name: neptune.placeholder.lan
ID: RVOT:4V5S:Q7KF:DK7A:OKO7:EFAM:RAO4:6ZLF:4OJE:33GM:TQNY:YRYS
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
WARNING: No kernel memory limit support

@lazize lazize commented Jul 4, 2017

I am facing a similar issue; I don't know if it is related or not.

I have one container, created from a service, that is dead, but I can't remove it. I tried removing it by name and by ID. I have already tried to stop and remove it.

That service doesn't exist anymore; I already removed it. I have restarted the Docker service and even the computer, and neither helped to remove it.

Are there any manual steps to remove it from the system?

:~$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
78b0dcaffa89        ubuntu:latest       "bash -c 'while tr..."   31 hours ago        Dead                                    leo.1.bkbjt6w08vgeo39rt1nmi7ock

:~$ docker stop 78b0dcaffa89
78b0dcaffa89

:~$ docker rm --force 78b0dcaffa89
Error response from daemon: driver "aufs" failed to remove root filesystem for 78b0dcaffa89ac1e532748d44c9b2f57b940def0e34f1f0d26bf7ea1a10c222b: no such file or directory

:~$ docker stop leo.1.bkbjt6w08vgeo39rt1nmi7ock
leo.1.bkbjt6w08vgeo39rt1nmi7ock

:~$ docker rm --force leo.1.bkbjt6w08vgeo39rt1nmi7ock
Error response from daemon: driver "aufs" failed to remove root filesystem for 78b0dcaffa89ac1e532748d44c9b2f57b940def0e34f1f0d26bf7ea1a10c222b: no such file or directory

:~$ sudo find /var/lib/docker -name "78b0dcaffa89ac1e532748d44c9b2f57b940def0e34f1f0d26bf7ea1a10c222b"

:~$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE               PORTS

@Puneeth-n Puneeth-n commented Jul 4, 2017

@lazize restart the docker daemon to remove the dead container.

Also, if you can, change the storage driver to overlay2; it's way better than aufs.


@lazize lazize commented Jul 4, 2017

@Puneeth-n I already did that. I even restarted my computer, but the container is still there.

Do you have some link to instruct me how to change to overlay2?

EDIT: I changed to overlay2. After restarting the service, the dead container is gone. So are all my images, but I can pull them again.


@Puneeth-n Puneeth-n commented Jul 4, 2017

@lazize yes, you lose everything because the storage driver is different.

Member

@thaJeztah thaJeztah commented Jul 4, 2017

@lazize can you open a new issue with details? Docker 17.06 has some changes related to removal of containers; if you're running Docker 17.06, we may have to look into that (e.g. ignoring errors where the container's file system was already removed).


@lazize lazize commented Jul 5, 2017

@thaJeztah As I changed to overlay2, everything was gone, so I don't have the "dead" container anymore.
I can definitely open a new issue with my Docker installation information and details, but I will not be able to test solutions, at least not until I face the problem again. Do you believe it would help anyway? I am using 17.06-ce.

Member

@thaJeztah thaJeztah commented Jul 5, 2017

@lazize if you still have logs from around the time it happened, that could be welcome (be sure to check them for confidential information)


@lazize lazize commented Jul 5, 2017

@thaJeztah Unfortunately systemd wasn't configured to persist logs, which means I only have logs from this morning's boot. I have changed it to persist logs now, so if it happens again I will be able to help much more. Sorry about that!

Member

@thaJeztah thaJeztah commented Jul 5, 2017

No worries, thanks!


@padeyoung padeyoung commented Oct 10, 2017

I am not a linux expert by any means. I have installed docker-ce in order to use a container called elabftw. It does not start and the error is the one continually referenced in this thread. I am su throughout. I am using 17.09.0-ce, there are in fact dead containers, and I tried service docker restart but the problem persists. A reboot did not fix the problem. This is debian8 jessie amd64. Some captures are included below. Thanks for any help anyone can give me.

FROM THE END OF THE DOCKER INSTALLATION

pauldeyoung@local-root-analysis-3:~$ sudo docker run hello-world
[sudo] password for pauldeyoung:

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:

  1. The Docker client contacted the Docker daemon.
  2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
  3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
  4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
https://cloud.docker.com/

For more examples and ideas, visit:
https://docs.docker.com/engine/userguide/

TRYING TO START THE CONTAINER
root@local-root-analysis-3:/home/pauldeyoung# elabctl start
elabctl © 2017 Nicolas CARPi - https://www.elabftw.net
Version: 0.6.2
Using configuration file: /etc/elabftw.yml

Removing mysql
ERROR: driver "aufs" failed to remove root filesystem for 4528a3598c4d6b846099a9e00e413cb7f29b1778d1dc73b434caeb8407f1b41f: could not remove diff path for id 1db9ace3915e17dc1a4753b7368598a4d0aacbde37e2f01c5589d3a3ea52f851: error preparing atomic delete: rename /var/lib/docker/aufs/diff/1db9ace3915e17dc1a4753b7368598a4d0aacbde37e2f01c5589d3a3ea52f851 /var/lib/docker/aufs/diff/1db9ace3915e17dc1a4753b7368598a4d0aacbde37e2f01c5589d3a3ea52f851-removing: device or resource busy

INFO ABOUT INSTALLED VERSION OF DOCKER

root@local-root-analysis-3:/home/pauldeyoung# docker version
Client:
Version: 17.09.0-ce
API version: 1.32
Go version: go1.8.3
Git commit: afdb6d4
Built: Tue Sep 26 22:40:46 2017
OS/Arch: linux/amd64

Server:
Version: 17.09.0-ce
API version: 1.32 (minimum version 1.12)
Go version: go1.8.3
Git commit: afdb6d4
Built: Tue Sep 26 22:39:27 2017
OS/Arch: linux/amd64
Experimental: false

TRYING A DOCKER RESTART AND GETTING SAME ERROR

root@local-root-analysis-3:/home/pauldeyoung# service docker restart
root@local-root-analysis-3:/home/pauldeyoung# elabctl start
elabctl © 2017 Nicolas CARPi - https://www.elabftw.net
Version: 0.6.2
Using configuration file: /etc/elabftw.yml

Removing mysql
ERROR: driver "aufs" failed to remove root filesystem for 4528a3598c4d6b846099a9e00e413cb7f29b1778d1dc73b434caeb8407f1b41f: could not remove diff path for id 1db9ace3915e17dc1a4753b7368598a4d0aacbde37e2f01c5589d3a3ea52f851: error preparing atomic delete: rename /var/lib/docker/aufs/diff/1db9ace3915e17dc1a4753b7368598a4d0aacbde37e2f01c5589d3a3ea52f851 /var/lib/docker/aufs/diff/1db9ace3915e17dc1a4753b7368598a4d0aacbde37e2f01c5589d3a3ea52f851-removing: device or resource busy

RUNNING A COMMAND FROM THE THREAD TO DIAGNOSE DOCKER ISSUE

root@local-root-analysis-3:/home/pauldeyoung# docker ps -a --filter status="dead"
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
4528a3598c4d        mysql:5.7           "docker-entrypoint..."   3 days ago          Dead                                    mysql
388e2c3a5b0a        elabftw/elabimg     "/run.sh"                4 days ago          Dead                                    elabftw
a424097a27fb        mysql:5.7           "docker-entrypoint..."   4 days ago          Dead                                    a424097a27fb_mysql


@mjmunger mjmunger commented Feb 15, 2018

This is supremely annoying. I have the same issue confirmed on:

Docker version 17.06.2-ce, build cec0b72
Distributor ID: Debian
Description: Debian GNU/Linux 8.9 (jessie)
Release: 8.9
Codename: jessie

Completely stopped all containers and the docker service trying to release this directory. No joy.

The only odd thing is that the directory in question (/var/lib/docker/aufs/diff/1db9ace3915e17dc1a4753b7368598a4d0aacbde37e2f01c5589d3a3ea52f851) contains mostly empty directories. The mysqld directory that was in it had a user:group of 999:docker.
As root, I am able to change to this directory and then delete the entire contents of it, but I cannot delete or rename the directory itself.

It appears that the aufs driver is what has a hold of it. Can't rmmod aufs because it's in use by PID #2. Tried rmmod -f (at my own peril) and got a seg fault.

My only recourse at the moment is to reboot when it happens.

Prior to using docker, I rebooted my Linux workstation once a year. It now has to be rebooted with the frequency of a Windows XP machine.

Please fix this or post a workaround.

Something, somewhere is not properly closing a file handle to this directory.


@mjmunger mjmunger commented Feb 15, 2018

...if changing to a different file driver would help, I'm open to that.

Collaborator

@cpuguy83 cpuguy83 commented Feb 15, 2018

This should be fixed on master, and I am working on backports for Docker 17.12.
But without knowing exactly what is running that's holding onto the reference it's hard to tell for sure.

Typically what happens here is a mount has leaked into another namespace (could be a container, or even another service started by systemd).

Collaborator

@cpuguy83 cpuguy83 commented Feb 15, 2018

btw, you can use this to sniff out what's holding onto the mount reference: https://github.com/rhvgoyal/misc/blob/master/find-busy-mnt.sh

Collaborator

@cpuguy83 cpuguy83 commented Feb 15, 2018

That one is for devmapper, but can be modified for aufs.
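A mount that has leaked into another namespace is listed in that process's `/proc/<pid>/mountinfo`, so one way to look for the holder is to scan those files for the layer ID. The following is a minimal Python sketch of that idea. It is not a port of find-busy-mnt.sh, and `mounts_matching` and `find_holders` are hypothetical names. Reading other processes' mountinfo generally requires root.

```python
import os


def mounts_matching(mountinfo_text, needle):
    """Return mount points in one mountinfo dump whose path contains `needle`."""
    hits = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # Field 5 (index 4) of /proc/<pid>/mountinfo is the mount point.
        if len(fields) > 4 and needle in fields[4]:
            hits.append(fields[4])
    return hits


def find_holders(needle, proc="/proc"):
    """Map PID -> matching mount points for every readable process."""
    holders = {}
    for name in os.listdir(proc):
        if not name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        path = os.path.join(proc, name, "mountinfo")
        try:
            with open(path) as f:
                hits = mounts_matching(f.read(), needle)
        except OSError:
            continue  # the process exited, or we lack permission
        if hits:
            holders[int(name)] = hits
    return holders
```

Calling `find_holders("<layer id from the error>")` as root returns a dictionary of PIDs that still see a mount whose path contains that ID; an empty result means no readable process lists such a mount.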


@mjmunger mjmunger commented Feb 15, 2018

"This should be fixed on master and am working on backports for docker 17.12."
What do I need to do to get to master? This is how we install docker currently.

Re: find-busy-mnt.sh
I'll use that next time this fails, and report back if there is anything useful. lsof was not helpful.

Collaborator

@cpuguy83 cpuguy83 commented Feb 15, 2018

And actually looking closer, should work for any graphdriver that does mounts.

Collaborator

@cpuguy83 cpuguy83 commented Feb 15, 2018

"What do I need to do to get to master?"

You can grab a nightly static binary from https://master.dockerproject.org/ and replace dockerd with it... but I wouldn't do this in anything but a test environment.

Member

@thaJeztah thaJeztah commented Feb 15, 2018

Not really announced yet, but we also now have nightly builds in our apt/yum repository, e.g.: https://download.docker.com/linux/ubuntu/dists/xenial/pool/nightly/


@kleptog kleptog commented Feb 21, 2018

FWIW, we switched to overlay2 everywhere and never had any (graphdriver) issues since. aufs just doesn't appear to work very well (also solves the core-dumps-in-images issue).

Collaborator

@cpuguy83 cpuguy83 commented Jul 7, 2018

Closing because this is fixed in 17.12.1 and 18.03+

@cpuguy83 cpuguy83 closed this Jul 7, 2018

@moritz moritz commented Jul 23, 2018

I still get this error occasionally, on Debian 8.11 with kernel 3.16.0-6-amd64 and docker-ce 18.06.0ce3 obtained from https://download.docker.com/linux/debian/.

@cpuguy83 can you please reopen?

Collaborator

@cpuguy83 cpuguy83 commented Jul 31, 2018

@moritz What's the exact error you received and how did you get it? Do you have daemon logs for this time period?


@ushuz ushuz commented Jul 11, 2019

@cpuguy83 I'm having this error with docker run on 18.06.1

time="2019-07-11T16:01:05+08:00" level=error msg="Error waiting for container: container 408746bd1a8ab81d5824a39dca5db819025fc6b54d4f34a3251464cdad42c12a: driver \"aufs\" failed to remove root filesystem: could not remove diff path for id 0b808bfc9d5d68e74bbc5fa952906afc44074c43224846390e8c8011d0b7825b: error preparing atomic delete: rename /data/docker/aufs/diff/0b808bfc9d5d68e74bbc5fa952906afc44074c43224846390e8c8011d0b7825b /data/docker/aufs/diff/0b808bfc9d5d68e74bbc5fa952906afc44074c43224846390e8c8011d0b7825b-removing: device or resource busy"

Corresponding daemon log:

time="2019-07-11T16:01:04.926716219+08:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
time="2019-07-11T16:01:05.928152001+08:00" level=error msg="Error removing mounted layer 408746bd1a8ab81d5824a39dca5db819025fc6b54d4f34a3251464cdad42c12a: could not remove diff path for id 0b808bfc9d5d68e74bbc5fa952906afc44074c43224846390e8c8011d0b7825b: error preparing atomic delete: rename /data/docker/aufs/diff/0b808bfc9d5d68e74bbc5fa952906afc44074c43224846390e8c8011d0b7825b /data/docker/aufs/diff/0b808bfc9d5d68e74bbc5fa952906afc44074c43224846390e8c8011d0b7825b-removing: device or resource busy"
time="2019-07-11T16:01:05.928300323+08:00" level=error msg="error removing container" container=408746bd1a8ab81d5824a39dca5db819025fc6b54d4f34a3251464cdad42c12a error="container 408746bd1a8ab81d5824a39dca5db819025fc6b54d4f34a3251464cdad42c12a: driver \"aufs\" failed to remove root filesystem: could not remove diff path for id 0b808bfc9d5d68e74bbc5fa952906afc44074c43224846390e8c8011d0b7825b: error preparing atomic delete: rename /data/docker/aufs/diff/0b808bfc9d5d68e74bbc5fa952906afc44074c43224846390e8c8011d0b7825b /data/docker/aufs/diff/0b808bfc9d5d68e74bbc5fa952906afc44074c43224846390e8c8011d0b7825b-removing: device or resource busy"
time="2019-07-11T16:01:08.136034325+08:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

docker version

$ docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:24:58 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:23:24 2018
  OS/Arch:          linux/amd64
  Experimental:     false

docker info

$ docker info
Containers: 18
 Running: 8
 Paused: 0
 Stopped: 10
Images: 36
Server Version: 18.06.1-ce
Storage Driver: aufs
 Root Dir: /data/docker/aufs
 Backing Filesystem: extfs
 Dirs: 167
 Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 5e8e0171f9f9df0308ef497f668194a0a348e7a7 (expected: 69663f0bd4b60df09991c08812a60108003fa340)
init version: fec3683
Security Options:
 apparmor
Kernel Version: 3.13.0-119-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 31.42GiB
Name:
ID: GTGZ:A2H6:HKSC:VHAP:BAAN:HKXS:TEEV:3MVS:7ANQ:PKL7:62EZ:GGDD
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Registry Mirrors:
Live Restore Enabled: false

WARNING: No swap limit support
Member

@thaJeztah thaJeztah commented Jul 11, 2019

Note that Docker 18.06 reached EOL and is no longer maintained; in addition Ubuntu 14.04 reached EOL on April 30, so I'd recommend upgrading both.

Docker 18.06.1 uses a version of runc that has a critical vulnerability (CVE-2019-5736), which has been addressed in Docker 18.06.3, which ships with a patched version of runc. However, running that version requires you to upgrade to a 4.x kernel.

If possible, I'd recommend upgrading, and consider using the overlay2 storage driver (which is now the default for all distros); switching storage driver will cause you to no longer have access to existing local images and containers, so you may have to push them to a registry first.
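For reference, the storage driver is selected with the `storage-driver` key in the daemon configuration file, commonly `/etc/docker/daemon.json` on Linux. Here is a minimal Python sketch that sets that key while keeping any other settings. `set_storage_driver` is a hypothetical helper, the path is an assumption about your installation, and the daemon has to be restarted before the change takes effect.

```python
import json
import os


def set_storage_driver(config_path, driver="overlay2"):
    """Set "storage-driver" in a daemon.json file, keeping other keys."""
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    config["storage-driver"] = driver
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
        f.write("\n")
    return config
```

As noted above, images and containers created under the previous driver are no longer visible after the switch, so push anything you need to a registry first.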
