Containers exiting with code 127, device-mapper device busy #8176
Comments
@mikejholly so these are running on AWS? Can you provide the type of the root fs and snippets of the daemon log where this occurs (hopefully with
@vbatts Thanks for your reply.

$ cat /etc/fstab
LABEL=cloudimg-rootfs / ext4 defaults,discard 0 0
/dev/xvdb /ebs ext4 defaults,nofail 0 2

My log file doesn't exist at
@mikejholly on Ubuntu, your log is likely
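For anyone else trying to locate the daemon log on a host of this era, a quick sketch of the usual candidates (an assumption based on common defaults, not something stated in this thread; on systemd hosts, `journalctl -u docker` is the place to look instead):

```shell
#!/bin/sh
# Check the common Docker 1.x daemon log locations (assumption: an
# Upstart-based Ubuntu 14.04 host; adjust for your distro).
for f in /var/log/upstart/docker.log /var/log/docker.log /var/log/syslog; do
  if [ -r "$f" ]; then
    echo "possible daemon log: $f"
  fi
done
```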
@vbatts thanks for the tip. Here's a bit of output from when I attempt to rm apps in this state. I'll see if I can find more debug info.

[info] DELETE /v1.14/containers/924e565031bb
[7d1ca966] +job delete(924e565031bb)
Cannot destroy container 924e565031bb: Driver devicemapper failed to remove root filesystem 924e565031bb884e6177ed9fbfa94fd978710d956dc1d0ce5f8f7f6467789978: Device is Busy
[7d1ca966] -job delete(924e565031bb) = ERR (1)
[error] server.go:1062 Handler for DELETE /containers/{name:.*} returned error: Cannot destroy container 924e565031bb: Driver devicemapper failed to remove root filesystem 924e565031bb884e6177ed9fbfa94fd978710d956dc1d0ce5f8f7f6467789978: Device is Busy
[error] server.go:91 HTTP Error: statusCode=500 Cannot destroy container 924e565031bb: Driver devicemapper failed to remove root filesystem 924e565031bb884e6177ed9fbfa94fd978710d956dc1d0ce5f8f7f6467789978: Device is Busy
[info] DELETE /v1.14/containers/8005060fe2b2
[7d1ca966] +job delete(8005060fe2b2)
Cannot destroy container 8005060fe2b2: Driver devicemapper failed to remove root filesystem 8005060fe2b2881bece92c94b86d09d03133f1c054ebbbc9b44d184182d5f0aa: Device is Busy
[7d1ca966] -job delete(8005060fe2b2) = ERR (1)
[error] server.go:1062 Handler for DELETE /containers/{name:.*} returned error: Cannot destroy container 8005060fe2b2: Driver devicemapper failed to remove root filesystem 8005060fe2b2881bece92c94b86d09d03133f1c054ebbbc9b44d184182d5f0aa: Device is Busy
[error] server.go:91 HTTP Error: statusCode=500 Cannot destroy container 8005060fe2b2: Driver devicemapper failed to remove root filesystem 8005060fe2b2881bece92c94b86d09d03133f1c054ebbbc9b44d184182d5f0aa: Device is Busy
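One way to see what is holding the device busy (a diagnostic sketch, not an official fix): look for processes whose mount namespace still references the container's root filesystem. The container ID below is the one from the delete errors above.

```shell
#!/bin/sh
# Container ID taken from the "Device is Busy" error above; substitute your own.
CID=924e565031bb884e6177ed9fbfa94fd978710d956dc1d0ce5f8f7f6467789978

# List processes whose mount namespace still references the container's
# devicemapper mount -- a common way for rm to hit EBUSY. No output means
# no process in a visible namespace is pinning the mount.
grep -l "$CID" /proc/*/mounts 2>/dev/null || true
```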
@vbatts It seems like the 127 containers were killed due to lack of available memory. I had swap disabled. Does it sound like that could be the problem? How does docker behave when processes are killed?
@mikejholly i'm not sure that running out of memory is it. I've just written an application to use up all the system memory, and with swap turned off existing containers keep running, and new containers are started, just slowly. I'll keep digging.
@vbatts Hmm. Ok, thanks. Need anything more from me?
@mikejholly a solution!? :-)
Thanks. I had found that by the time @jessfraz closed it.
We were getting this issue very regularly in our CI when we switched to device-mapper, often within an hour of the machine coming up. We ran about 30 concurrent containers on the host. I think we had some dmesg output related to some kind of internal device mapper table being full? Sorry I don't have a proper trace for you :) It was so bad we've given up on that backend and gone back to AUFS these days, which is slower but at least seems to be reliable.
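For reference, the storage backend the comment above describes switching is selected with the daemon's `--storage-driver` flag. A config sketch, assuming Ubuntu's /etc/default/docker (note that each driver keeps a separate image store, so images created under the old driver won't be visible after a switch):

```shell
# /etc/default/docker -- pin the graph driver explicitly rather than
# letting the daemon auto-detect one (aufs here, per the comment above).
# Each driver stores images separately, so switching hides existing ones.
DOCKER_OPTS="--storage-driver=aufs"
```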
UPDATE: Despite the error, it does appear to be removing the container now (it wasn't before).

Getting the same issue here; I hope this detail helps troubleshoot! Running on an Amazon CentOS EC2 instance:

Finally, here are the logs from
@kailosbryan a couple of points. Do you have a newer kernel available than 3.4? That surprises me. Also, I'm thinking that "device or resource busy" (which is errno EBUSY) is a different issue than the original "127 error". It would be great if they're related, because I'm on the hunt to put an end to the EBUSY issue.
@vbatts I don't have a newer kernel at the moment. I just tried to destroy docker and reinstall it from a package. I wasn't able to delete the /var/lib/docker folder manually, so I'm wondering if I have some disk issues:
I'm bringing my pre-docker AMI online and going to try starting from scratch and see where that gets me. I'll post if that resolves this issue.
There is another issue that is tracking I/O issues on AWS.
@mikejholly i've just done a quick write-up on a solution for this type of issue. Could you review it and check whether it fixes the issue for you?
also seeing issues with device busy on rm of docker container (AWS + Ubuntu 14 LTS). @vbatts, let me know if you'd like additional log output / versions / etc. From your blog, it looks like you have a pretty good idea of what's causing this. Will try the workaround until a more permanent solution is in place.
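For anyone else hunting for that workaround: as I understand it (an assumption; the thread itself doesn't spell it out), the idea is to keep the daemon's mounts out of other processes' mount namespaces, so a stray namespace can't hold a container's devicemapper device busy. A sketch:

```shell
# Sketch of the mount-namespace workaround as I understand it (assumption:
# this matches the blog post's approach; paths and flags are illustrative
# for the Docker 1.x daemon).
#
# Init-script style: start the daemon in a private mount namespace so
# container mounts don't leak into other processes and pin the device.
exec unshare -m -- /usr/bin/docker -d -g /ebs/docker

# systemd-style equivalent: add the following to the [Service] section of
# the docker unit, then reload and restart the daemon:
#   MountFlags=slave
```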
Is it possible that this happens because container-related operations do not execute synchronously? I originally ascribed the issue to that, although I have not verified it in the code.
I have this exact problem on centos 6.5 + docker 1.2.0 + devicemapper. Although the containers seem to disappear after the errors, if I restart the docker daemon, they will reappear.
In case anyone else is looking for the now-404 http://blog.hashbangbash.com/2014/11/docker-devicemapper-fix-for-device-or-resource-busy-ebusy/, it seems to be at http://blog.hashbangbash.com/?p=1281. Ugh, docker on CentOS 6 + device-mapper is a world of pain. If it's better in CentOS 7, it seems docker should explicitly recommend people not use CentOS 6.
going to close this as a duplicate of #5684 (comment) |
We're running into a pretty critical issue with our Docker container setup. Some containers appear to be arbitrarily exiting with 127 exit codes. The 127 exit code doesn't make sense since the containers appear to be running fine beforehand. From what I understand this exit code indicates that the command was not found.
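To confirm what that status means in isolation (this is generic POSIX shell behavior, not Docker-specific): a shell exits with 127 when the command it was asked to run does not exist, which is why a missing binary inside a container surfaces as exit code 127.

```shell
#!/bin/sh
# POSIX shells return status 127 for "command not found"; errors are
# silenced here so only the status is shown.
sh -c 'definitely_not_a_real_command' 2>/dev/null
echo "exit status: $?"    # prints: exit status: 127
```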
When we try to restart or rm the containers in question we see the errors below. The two problems appear to be related since we only see the errors with the 127'd containers.
Please note we are running Docker with the flag -g /ebs/docker to use EBS for the Docker root. We're doing this because our images are quite large and they quickly outgrow the standard EC2 volumes.

When I try to restart the container:
When I try to remove the container:
But it seems like the container is actually removed:
Other info:
DOCKER_OPTS="-g /ebs/docker -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"
$ uname -a
Linux ip-10-0-2-128 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux