Containers exiting with code 127, device-mapper device busy #8176

Closed
mikejholly opened this issue Sep 23, 2014 · 21 comments

@mikejholly

We're running into a pretty critical issue with our Docker container setup. Some containers appear to exit arbitrarily with exit code 127. That code doesn't make sense, since the containers were running fine beforehand; from what I understand, 127 indicates that the command was not found.
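For reference, the 127 convention is easy to reproduce in any POSIX shell. A command the shell cannot find exits with 127, while a process killed by a signal is reported as 128 plus the signal number (137 for SIGKILL, which is the signal the kernel OOM killer sends). This is a minimal sketch of the shell convention only. It makes no claim about how the Docker 1.2 daemon computes the status it reports, and the command name used is deliberately fake.

```shell
#!/bin/sh
# Status of a command the shell cannot find: 127 by POSIX convention.
status_not_found() {
  sh -c 'definitely_not_a_real_command_xyz' 2>/dev/null
  echo "$?"
}

# Status of a child shell that kills itself with SIGKILL (signal 9):
# the parent shell reports 128 + 9 = 137. Some shells also print a
# "Killed" notice on stderr; that does not affect the value printed.
status_sigkill() {
  sh -c 'kill -9 $$' 2>/dev/null
  echo "$?"
}

echo "not found: $(status_not_found)"   # not found: 127
echo "SIGKILL:   $(status_sigkill)"     # SIGKILL:   137
```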

When we try to restart or rm the containers in question, we see the errors below. The two problems appear to be related, since we only see these errors with the containers that exited with 127.

Please note we are running Docker with the flag -g /ebs/docker to use EBS for the Docker root. We're doing this because our images are quite large and they quickly outgrow the standard EC2 volumes.

When I try to restart the container:

$ docker restart e2b28c2732ae
Error response from daemon: Cannot restart container e2b28c2732ae: Error getting container e2b28c2732ae212469df0d8f8d1d964d96549e0ea9fa5c795c9b42fb832ac9e8 from driver devicemapper: Error mounting '/dev/mapper/docker-202:16-2621442-e2b28c2732ae212469df0d8f8d1d964d96549e0ea9fa5c795c9b42fb832ac9e8' on '/ebs/docker/devicemapper/mnt/e2b28c2732ae212469df0d8f8d1d964d96549e0ea9fa5c795c9b42fb832ac9e8': device or resource busy
2014/09/23 02:04:26 Error: failed to restart one or more containers

When I try to remove the container:

$ docker rm e55
Error response from daemon: Cannot destroy container e55d46116006: Driver devicemapper failed to remove root filesystem e55d46116006b7232ae99b2aa8937b0e47f974df01cabd6cad0f277df194df6f: Device is Busy

But it seems like the container is actually removed:

$ docker rm e2b28c2732ae
Error response from daemon: No such container: e2b28c2732ae
2014/09/23 02:05:19 Error: failed to remove one or more containers

Other info:

DOCKER_OPTS="-g /ebs/docker -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"
$ uname -a
Linux ip-10-0-2-128 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ docker version
Client version: 1.2.0
Client API version: 1.14
Go version (client): go1.3.1
Git commit (client): fa7b24f
OS/Arch (client): linux/amd64
Server version: 1.2.0
Server API version: 1.14
Go version (server): go1.3.1
Git commit (server): fa7b24f
$ docker -D info
Containers: 78
Images: 1022
Storage Driver: devicemapper
 Pool Name: docker-202:16-2621442-pool
 Pool Blocksize: 64 Kb
 Data file: /ebs/docker/devicemapper/devicemapper/data
 Metadata file: /ebs/docker/devicemapper/devicemapper/metadata
 Data Space Used: 15093.7 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 47.7 Mb
 Metadata Space Total: 2048.0 Mb
Execution Driver: native-0.2
Kernel Version: 3.13.0-29-generic
Operating System: Ubuntu 14.04.1 LTS
Debug mode (server): false
Debug mode (client): true
Fds: 109
Goroutines: 66
EventsListeners: 0
Init Path: /usr/bin/docker
WARNING: No swap limit support

@vbatts
Contributor

vbatts commented Sep 23, 2014

@mikejholly so these are running on AWS? Can you provide the type of the root fs and snippets of the daemon log where this occurs (hopefully with -D debugging enabled)?

@mikejholly
Author

@vbatts Thanks for your reply.

$ cat /etc/fstab
LABEL=cloudimg-rootfs   /    ext4   defaults,discard    0 0
/dev/xvdb       /ebs   ext4    defaults,nofail        0       2

There is no log file at /var/log/docker.log. Is that because of the custom data directory set with the -g flag? I don't see a flag for the log file location here: http://docs.docker.com/reference/commandline/cli/#daemon

@vbatts
Contributor

vbatts commented Sep 23, 2014

@mikejholly on ubuntu, your log is likely /var/log/upstart/docker.io.log
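To capture the debug output requested above, the daemon has to run with the global -D flag. The sketch below shows the DOCKER_OPTS line from the original report with -D added, plus a small helper that pulls the busy-device lines out of a daemon log. Which file the init job reads DOCKER_OPTS from depends on the package and is not stated in this thread, and `busy_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: the DOCKER_OPTS line from the original report with the global
# -D (debug) flag added. The file this lives in is an assumption left to
# the reader; it depends on how Docker was packaged.
DOCKER_OPTS="-D -g /ebs/docker -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"

# After restarting the daemon, print EBUSY-related lines (with line
# numbers) from a daemon log. The default path is the upstart log
# suggested for Ubuntu; pass another path as $1 if yours differs.
busy_lines() {
  log="${1:-/var/log/upstart/docker.io.log}"
  grep -n -i -e 'device is busy' -e 'device or resource busy' "$log"
}

# Usage: busy_lines                    (default upstart log)
#        busy_lines /var/log/docker
```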

@mikejholly
Author

@vbatts thanks for the tip.

Here's some daemon output from when I try to rm containers in this state. I'll see if I can find more debug info.

[info] DELETE /v1.14/containers/924e565031bb
[7d1ca966] +job delete(924e565031bb)
Cannot destroy container 924e565031bb: Driver devicemapper failed to remove root filesystem 924e565031bb884e6177ed9fbfa94fd978710d956dc1d0ce5f8f7f6467789978: Device is Busy
[7d1ca966] -job delete(924e565031bb) = ERR (1)
[error] server.go:1062 Handler for DELETE /containers/{name:.*} returned error: Cannot destroy container 924e565031bb: Driver devicemapper failed to remove root filesystem 924e565031bb884e6177ed9fbfa94fd978710d956dc1d0ce5f8f7f6467789978: Device is Busy
[error] server.go:91 HTTP Error: statusCode=500 Cannot destroy container 924e565031bb: Driver devicemapper failed to remove root filesystem 924e565031bb884e6177ed9fbfa94fd978710d956dc1d0ce5f8f7f6467789978: Device is Busy
[info] DELETE /v1.14/containers/8005060fe2b2
[7d1ca966] +job delete(8005060fe2b2)
Cannot destroy container 8005060fe2b2: Driver devicemapper failed to remove root filesystem 8005060fe2b2881bece92c94b86d09d03133f1c054ebbbc9b44d184182d5f0aa: Device is Busy
[7d1ca966] -job delete(8005060fe2b2) = ERR (1)
[error] server.go:1062 Handler for DELETE /containers/{name:.*} returned error: Cannot destroy container 8005060fe2b2: Driver devicemapper failed to remove root filesystem 8005060fe2b2881bece92c94b86d09d03133f1c054ebbbc9b44d184182d5f0aa: Device is Busy
[error] server.go:91 HTTP Error: statusCode=500 Cannot destroy container 8005060fe2b2: Driver devicemapper failed to remove root filesystem 8005060fe2b2881bece92c94b86d09d03133f1c054ebbbc9b44d184182d5f0aa: Device is Busy

@mikejholly
Author

@vbatts It seems like the containers that exited with 127 were killed because the host ran out of memory. I had swap disabled. Could that be the problem? How does Docker behave when a container's processes are killed?
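One way to check whether the kernel OOM killer was involved is to search the kernel log for its messages. Kernels of this era typically log lines such as "Out of memory: Kill process ..." and "Killed process ...". The filter below is a minimal sketch that assumes those message formats; `oom_lines` is a hypothetical helper name, and reading `dmesg` may require root.

```shell
#!/bin/sh
# Filter kernel log text on stdin for OOM-killer messages.
# Assumes the "Out of memory: Kill process" / "Killed process" wording.
oom_lines() {
  grep -i -E 'out of memory|killed process'
}

# Usage:
#   dmesg | oom_lines
```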

@vbatts
Contributor

vbatts commented Sep 24, 2014

@mikejholly I'm not sure that running out of memory is the cause. I've just written an application that uses up all the system memory, and with swap turned off, existing containers keep running and new containers still start, just slowly. I'll keep digging.

@mikejholly
Author

@vbatts Hmm. Ok thanks. Need anything more from me?

@vbatts
Contributor

vbatts commented Sep 24, 2014

@mikejholly a solution!? :-)
Not really, unless you have concise steps to reproduce the 127 error.

@public

public commented Sep 28, 2014

@vbatts There's a script to reproduce this in #8189

@vbatts
Contributor

vbatts commented Sep 28, 2014

Thanks. I had found that when @jessfraz closed it.
I have been using it, though mine fails around the 100th container.
Frustratingly, while debugging I added logic to get more information about the situation, and now it no longer fails on this issue. Although I'm certain my debugging did not fix it, it has sent me down a rabbit hole in the devicemapper code.

@public

public commented Sep 28, 2014

We hit this issue very regularly in our CI after we switched to device-mapper, often within an hour of the machine coming up. We ran about 30 concurrent containers on the host. I think we saw some dmesg output about some kind of internal device-mapper table being full? Sorry I don't have a proper trace for you :)

It was so bad that we've given up on that backend and gone back to AUFS, which is slower but at least seems to be reliable.

@ghost

ghost commented Oct 9, 2014

UPDATE: Despite the error, it does now appear to remove the container (it wasn't before).

Getting the same issue here; I hope these details help with troubleshooting!

> docker rm -f 672c1482bf99
Error response from daemon: Cannot destroy container 672c1482bf99: Driver devicemapper failed to remove root filesystem 672c1482bf99727d9fb2437f6d39c3ab4f702034e4dd11c68d4ef811a559ed3a: Device is Busy

Running on an Amazon CentOS EC2 instance:

> uname -r  
3.4.103-76.114.amzn1.x86_64
> docker version
Client version: 1.2.0
Client API version: 1.14
Go version (client): go1.3.1
Git commit (client): fa7b24f
OS/Arch (client): linux/amd64
Server version: 1.2.0
Server API version: 1.14
Go version (server): go1.3.1
Git commit (server): fa7b24f
> docker info
Containers: 4
Images: 7
Storage Driver: devicemapper
 Pool Name: docker-202:1-280651-pool
 Pool Blocksize: 64 Kb
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 1555.8 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 1.8 Mb
 Metadata Space Total: 2048.0 Mb
Execution Driver: native-0.2
Kernel Version: 3.4.103-76.114.amzn1.x86_64
Operating System: <unknown>
Debug mode (server): true
Debug mode (client): false
Fds: 11
Goroutines: 11
EventsListeners: 0
Init Path: /usr/bin/docker
WARNING: No memory limit support
WARNING: No swap limit support

Finally, here are the logs from /var/log/docker for the docker rm -f:

[debug] server.go:1036 Calling DELETE /containers/{name:.*}
[info] DELETE /v1.14/containers/672c1482bf99?force=1
[b9415f80] +job delete(672c1482bf99)
[debug] deviceset.go:281 activateDeviceIfNeeded(672c1482bf99727d9fb2437f6d39c3ab4f702034e4dd11c68d4ef811a559ed3a)
--- {{ REPEATED ABOUT 100 TIMES }} --
[debug] devmapper.go:545 [devmapper] removeDevice START
[debug] deviceset.go:451 libdevmapper(3): ioctl/libdm-iface.c:1768 (-1) device-mapper: remove ioctl on docker-202:1-280651-672c1482bf99727d9fb2437f6d39c3ab4f702034e4dd11c68d4ef811a559ed3a failed: Device or resource busy
[debug] devmapper.go:554 [devmapper] removeDevice END
--- {{ REPEATED ABOUT 100 TIMES }} --
[debug] deviceset.go:719 Error removing device: Device is Busy
Cannot destroy container 672c1482bf99: Driver devicemapper failed to remove root filesystem 672c1482bf99727d9fb2437f6d39c3ab4f702034e4dd11c68d4ef811a559ed3a: Device is Busy
[b9415f80] -job delete(672c1482bf99) = ERR (1)
[error] server.go:1062 Handler for DELETE /containers/{name:.*} returned error: Cannot destroy container 672c1482bf99: Driver devicemapper failed to remove root filesystem 672c1482bf99727d9fb2437f6d39c3ab4f702034e4dd11c68d4ef811a559ed3a: Device is Busy
[error] server.go:91 HTTP Error: statusCode=500 Cannot destroy container 672c1482bf99: Driver devicemapper failed to remove root filesystem 672c1482bf99727d9fb2437f6d39c3ab4f702034e4dd11c68d4ef811a559ed3a: Device is Busy

@vbatts
Contributor

vbatts commented Oct 9, 2014

@kailosbryan a couple of points. Do you have a kernel newer than 3.4 available? That surprises me. Also, I suspect that "device or resource busy" (errno EBUSY) is a different issue from the original "127 error".

It would be great if they're related, because I'm on the hunt to put an end to the EBUSY issue.
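EBUSY from devicemapper generally means something still holds the device, for example a mount of the container's root filesystem that lingers in another process's mount namespace. One common way to look for such holders is to search every process's /proc/PID/mountinfo for the container ID. The sketch below does that under that assumption; it is a diagnostic only, not necessarily the approach taken in the write-up linked later in this thread, and `mount_holders` is a hypothetical helper name.

```shell
#!/bin/sh
# List PIDs whose mount table still mentions a given container ID.
# proc_root defaults to /proc and is a parameter only so it can be tested.
mount_holders() {
  id="$1"
  proc_root="${2:-/proc}"
  for f in "$proc_root"/[0-9]*/mountinfo; do
    # Skip unreadable entries (and the literal pattern if nothing matched).
    [ -r "$f" ] || continue
    if grep -q -F -- "$id" "$f" 2>/dev/null; then
      pid="${f#"$proc_root"/}"
      echo "${pid%/mountinfo}"
    fi
  done
}

# Usage (as root), with the full container ID from the error message:
#   mount_holders 672c1482bf99727d9fb2437f6d39c3ab4f702034e4dd11c68d4ef811a559ed3a
```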

@ghost

ghost commented Oct 9, 2014

@vbatts I don't have a newer kernel at the moment. I just tried to remove Docker completely and reinstall it from a package. I wasn't able to delete the /var/lib/docker folder manually, so I'm wondering whether I have some disk issues:

rm: cannot remove `docker/devicemapper/mnt/672c1482bf99727d9fb2437f6d39c3ab4f702034e4dd11c68d4ef811a559ed3a/rootfs/usr/sbin/tracepath': Input/output error

I'm bringing up my pre-Docker AMI to start from scratch and see where that gets me. I'll post if that resolves this issue.

@vbatts
Contributor

vbatts commented Oct 9, 2014

There is another issue that is tracking I/O issues on AWS.

@vbatts
Contributor

vbatts commented Nov 4, 2014

@mikejholly I've just done a quick write-up on a solution for this type of issue. Could you review it and check whether it fixes the issue for you?
http://blog.hashbangbash.com/2014/11/docker-devicemapper-fix-for-device-or-resource-busy-ebusy/

@neurogenesis

Also seeing "device busy" errors on rm of a Docker container (AWS + Ubuntu 14.04 LTS). @vbatts, let me know if you'd like additional log output, versions, etc. From your blog, it looks like you have a pretty good idea of what's causing this. I'll try the workaround until a more permanent solution is in place.

@gdm85
Contributor

gdm85 commented Nov 8, 2014

Is it possible that this happens because container-related operations do not execute synchronously?

I originally attributed the issue to that, although I have not verified it in the code.
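If the failures are indeed transient (for instance, if an operation is still in flight when removal is attempted, as suggested above), retrying after a short delay is a common stopgap. This is a minimal, hypothetical sketch: `retry` is not a Docker command, and retrying will not help if the device is held by a leaked mount.

```shell
#!/bin/sh
# Retry a command up to ATTEMPTS times, sleeping DELAY seconds between tries.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while :; do
    "$@" && return 0
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    sleep "$delay"
    i=$((i + 1))
  done
}

# Example: retry 5 2 docker rm 672c1482bf99
```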

@erks

erks commented Nov 24, 2014

I have this exact problem on CentOS 6.5 + Docker 1.2.0 + devicemapper. Although the containers seem to disappear after the errors, they reappear when I restart the Docker daemon.

$ sudo docker ps -a
CONTAINER ID        IMAGE                  COMMAND             CREATED             STATUS                         PORTS                           NAMES
85cad8920094        network/mysql:latest   "/run.sh"           2 days ago          Exited (-127) 52 minutes ago   192.168.50.50:13306->3306/tcp   distracted_engelbart
$ sudo docker rm -f 85cad8920094
Error response from daemon: Cannot destroy container 85cad8920094: Driver devicemapper failed to remove root filesystem 85cad8920094287e8e18db2b5f6e5600aa71de171e235ca22e1f11d52c4a6ca5: Device is Busy
2014/11/24 00:53:01 Error: failed to remove one or more containers
$ sudo docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
$ sudo /sbin/service docker restart
Stopping docker:                                           [  OK  ]
Starting docker:                                       [  OK  ]
$ sudo docker ps -a
CONTAINER ID        IMAGE                  COMMAND             CREATED             STATUS                         PORTS                           NAMES
85cad8920094        network/mysql:latest   "/run.sh"           2 days ago          Exited (-127) 54 minutes ago   192.168.50.50:13306->3306/tcp   trusting_einstein

@jamshid
Contributor

jamshid commented Dec 4, 2014

In case anyone else is looking for http://blog.hashbangbash.com/2014/11/docker-devicemapper-fix-for-device-or-resource-busy-ebusy/, which now returns Not Found, it seems to be at http://blog.hashbangbash.com/?p=1281.

Ugh, Docker on CentOS 6 + device-mapper is a world of pain. If it's better on CentOS 7, it seems Docker should explicitly recommend that people not use CentOS 6.

@vbatts
Contributor

vbatts commented Jan 7, 2015

going to close this as a duplicate of #5684 (comment)
