
Removing docker container causing lock up #3610

Closed
benbjohnson opened this issue Jan 15, 2014 · 18 comments

Comments

@benbjohnson

Note: This has only occurred once but I thought I'd report it in case others experience the same issue.

I tried to run docker kill against a container and it wouldn't stop, so I tried docker stop, but that didn't work either. Finally, I tried running docker rm, which reported that it couldn't remove a running container; soon after that, the whole SSH session froze. At that point the box was inaccessible via any new SSH connections.
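
A sketch of the sequence described above, with a placeholder container ID (the exact error text varies by Docker version):

    $ docker kill <container-id>   # container kept running
    $ docker stop <container-id>   # still running
    $ docker rm <container-id>     # error: cannot remove a running container
    # ...and shortly after this, the SSH session froze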

Here's a log from the box:

https://gist.github.com/snormore/20c261a4166d8049e238

And the docker version and info:

$ docker version
Client version: 0.7.5
Go version (client): go1.2
Git commit (client): c348c04

$ docker info
Containers: 5
Images: 27
Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 37

/cc: @snormore

@burke
Contributor

burke commented Jan 16, 2014

@graemej and I just had a similar issue. We were working on a new Dockerfile:

Step 14 : ADD datadog_send_event /usr/local/bin/datadog_send_event
 ---> Using cache
 ---> 1187fdb27fd1
Step 15 : RUN sysctl kernel.msgmax | awk '{ if ($3 < 1048576)  { print "kernel.msgmax must be at least 1048576, but is " $3 ; exit 1 } }'
 ---> Running in ba02cf799603
2014/01/16 19:11:09 unexpected EOF
make: *** [build] Error 1

This was immediately after restarting dockerd. After this, we ran docker ps, which hung. We could not establish new SSH connections to the box. Presumably we have to reboot.

@lgs

lgs commented Jan 22, 2014

... it seems this problem is spreading and has many facets:

https://github.com/search?q=%22remove+containers%22+docker&type=Issues&ref=searchresults

@benders

benders commented Jan 24, 2014

This may be the same underlying issue as the problem @poe and I reported in #3744. The signature of that issue is that we see "BUG: soft lockup" in the Ubuntu kern.log. In our case the system crash was generally preceded by restarting the docker daemon to try to resolve some other issue (stuck image pull, etc.). We have been aggressively trying to isolate the issue, but so far we don't have much more to report.

It would be useful to know if others experiencing host crashes are also seeing the "soft lockup" message and what kernel version and hardware it is occurring on.
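
For anyone comparing notes, a minimal sketch of the checks being asked for here (the log path assumes Ubuntu; dmesg works anywhere, if the box is still responsive):

    grep -i "soft lockup" /var/log/kern.log   # the crash signature from #3744
    dmesg | grep -i "soft lockup"             # kernel ring buffer
    uname -r                                  # kernel version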

@crosbymichael
Contributor

Could you all post your kernel version, OS, docker info, and docker version?

@graemej

graemej commented Mar 28, 2014

The environment that @burke and I are using is:

uname -a

Linux hostname 3.8.0-35-generic #50~precise1-Ubuntu SMP Wed Dec 4 17:25:51 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/issue

Ubuntu 12.04.3 LTS \n \l

docker info

Containers: 37
Images: 355
Driver: aufs
 Root Dir: /u/lib/docker/aufs
 Dirs: 442
WARNING: No swap limit support

docker version

Client version: 0.8.0
Go version (client): go1.2
Git commit (client): cc3a8c8
Server version: 0.8.0
Git commit (server): cc3a8c8
Go version (server): go1.2

@rlpowell

I have a docker container that cannot be removed:

rlpowell@ip-10-0-1-16> sudo docker ps -a
CONTAINER ID        IMAGE                  COMMAND             CREATED             STATUS                     PORTS               NAMES
76863037856d        rlpowell/packer:base   /sbin/docker.init   14 hours ago        Exited (-1) 14 hours ago                       jovial_morse
rlpowell@ip-10-0-1-16>

When I try, this happens in messages:

May 23 23:44:49 ip-10-0-1-16 docker: 2014/05/23 23:44:49 GET /v1.11/containers/json?all=1
May 23 23:44:49 ip-10-0-1-16 docker: [ead23f20] +job containers()
May 23 23:44:49 ip-10-0-1-16 docker: [ead23f20] -job containers() = OK (0)
May 23 23:45:29 ip-10-0-1-16 docker: 2014/05/23 23:45:29 DELETE /v1.11/containers/512a3c4e707d
May 23 23:45:29 ip-10-0-1-16 docker: [ead23f20] +job container_delete(512a3c4e707d)
May 23 23:45:29 ip-10-0-1-16 kernel: [  775.076989] bio: create slab <bio-2> at 2
May 23 23:45:31 ip-10-0-1-16 systemd-udevd: inotify_add_watch(7, /dev/dm-3, 10) failed: No such file or directory

And then the machine totally, irrevocably hangs. Completely, with no response of any kind; SSH sessions won't even disconnect.

Note that the machine running docker is an EC2 VM.

It may come back later, but I've waited more than an hour.

rlpowell@ip-10-0-1-16> sudo docker version
Client version: 0.11.1
Client API version: 1.11
Go version (client): go1.2.1
Git commit (client): fb99f99/0.11.1
Server version: 0.11.1
Server API version: 1.11
Git commit (server): fb99f99/0.11.1
Go version (server): go1.2.1
Last stable version: 0.11.1
rlpowell@ip-10-0-1-16> sudo docker info
Containers: 1
Images: 8
Storage Driver: devicemapper
 Pool Name: docker-253:0-33556480-pool
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 15140.5 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 8.1 Mb
 Metadata Space Total: 2048.0 Mb
Execution Driver: native-0.2
Kernel Version: 3.11.10-301.fc20.x86_64
rlpowell@ip-10-0-1-16>

@rlpowell

So, uhh, now that I've given my report, how do I get around this? I literally can't remove this container and I very much want to; is there a way for me to remove it manually?

@rlpowell

I fixed it like so, just in case it helps anybody else:

  sudo mv /tmp/packer-* /var/tmp

  sudo sh -c "mv --backup=t /var/lib/docker/containers/76863037856d0e16051baf71b301f8ee2b634016975bd3d2a14e0b43068d3836 /var/lib/docker/devicemapper/mnt/76863037856d0e16051baf71b301f8ee2b634016975bd3d2a14e0b43068d3836* /var/tmp"

  sudo service docker restart

(that first line is only relevant if you're using packer, and probably isn't essential even then)
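
Generalized, the workaround above amounts to moving the stuck container's state out from under the daemon and restarting it. A sketch assuming the devicemapper driver and the default /var/lib/docker root (substitute the full ID of your stuck container; the sh -c is needed so the glob expands with root's permissions):

    CONTAINER_ID=<full-64-char-container-id>
    sudo mv --backup=t /var/lib/docker/containers/$CONTAINER_ID /var/tmp
    sudo sh -c "mv --backup=t /var/lib/docker/devicemapper/mnt/$CONTAINER_ID* /var/tmp"
    sudo service docker restart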

@rlpowell

So it just happened again. Here's output from my packer/docker run:

    docker: Notice: /Stage[main]/Users/Users::User[chris]/Users::Homedirsetup[chris]/Exec[expand home chris]/returns: executed successfully
    docker: Notice: /Stage[main]/Auth/Exec[fix auth-ac1]/returns: executed successfully
    docker: Notice: /Stage[main]/Users/Users::User[emilio]/Users::Homedirsetup[emilio]/Exec[expand home emilio]/returns: executed successfully
    docker: Notice: /Stage[main]/Users/Users::User[geoff]/User[geoff]/password: created password
    docker: Notice: /Stage[main]/Users/Users::User[geoff]/Users::Homedirsetup[geoff]/Exec[expand home geoff]/returns: executed successfully
    docker: Notice: /Stage[main]/Httpd/Service[httpd]/ensure: ensure changed 'stopped' to 'running'
    docker: Info: /Service[httpd]: Unscheduling refresh on Service[httpd]
    docker: Notice: Finished catalog run in 46.22 seconds
==> docker: Exporting the container
==> docker: Error exporting: exit status 1
==> docker: Stderr: 2014/05/24 13:07:29 write /dev/stdout: no space left on device
==> docker:
==> docker: Killing the container: d61ba6c897b6130bf531bd3f8c3a37230e3d75751541c0068afd51e1ddf01ada
Build 'docker' errored: Error exporting: exit status 1
Stderr: 2014/05/24 13:07:29 write /dev/stdout: no space left on device


==> Some builds didn't complete successfully and had errors:
--> docker: Error exporting: exit status 1
Stderr: 2014/05/24 13:07:29 write /dev/stdout: no space left on device


==> Builds finished but no artifacts were created.

I have no idea what it's talking about; /dev/stdout isn't something that can run out of space. The smallest free partition at the time had 2.3GiB free, /var/lib/docker had 69GiB free, and /tmp/ had 2.5GiB free.
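
For what it's worth, a quick sketch for checking the usual suspects behind "no space left on device" (inode exhaustion produces the same error, and on devicemapper the pool reported by docker info can fill up independently of any host filesystem):

    df -h /var/lib/docker /tmp        # free space on the relevant mounts
    df -i /var/lib/docker             # inode usage; 100% here gives the same error
    sudo docker info | grep 'Space'   # devicemapper pool data/metadata usage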

And now trying to rm that container is hanging the entire VM again.

@rlpowell

This has happened several more times. -_-

@rlpowell

Additional information:

I think, but am not sure, that this started happening after I moved /var/lib/docker to NFS-based storage. So if docker relies on NFS operations, that may be relevant.
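
A quick sketch for checking whether the Docker root is actually on NFS:

    df -T /var/lib/docker        # the Type column will show nfs/nfs4
    findmnt -T /var/lib/docker   # alternative: shows the backing source and fstype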

My basic process here is to create my "base" image, and then use the base image to make my "tomcat" and "mysql" images. Then I run a mysql and a tomcat, then I shut them all down, then I rm everything.

The rm that hangs the kernel usually seems to involve the base image that was used to make the tomcat image. I don't know if that matters, but I thought I'd mention it.

@crosbymichael
Contributor

@rlpowell I don't think we support the docker root on NFS. Aufs and other CoW filesystems do not play well on NFS.

@unclejack
Contributor

This issue was opened a very long time ago and many Docker versions have been released since.
The latest occurrence of this issue seems to have something to do with Docker being used on NFS, which is an unsupported setup.

If you run into this issue with a supported setup and an up-to-date version of Docker, please comment and the issue will be reopened.

@gravyboat

gravyboat commented Nov 10, 2015

@unclejack I'm getting this on CentOS 7 over at DigitalOcean (This does not happen locally on my Ubuntu 14 machine running Docker version 1.8.3, build f4bf5c7):

[root@server-test ~]# docker run -d -p 80:80 -v /var/run/docker.sock:/tmp/docker.sock:ro jwilder/nginx-proxy
65641cb2e0f2365eb139df9d11a6d558521c304bfad1520a408cac240ae8db12
[root@server-test ~]# docker ps -a
CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS              PORTS                         NAMES
65641cb2e0f2        jwilder/nginx-proxy   "/app/docker-entrypoi"   4 seconds ago       Up 3 seconds        0.0.0.0:80->80/tcp, 443/tcp   prickly_yonath
[root@server-test ~]# docker halt 656
docker: 'halt' is not a docker command.
See 'docker --help'.
[root@server-test ~]# docker stop 656
656
[root@server-test ~]# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[root@server-test ~]# docker ps -a
CONTAINER ID        IMAGE                 COMMAND                  CREATED              STATUS                     PORTS               NAMES
65641cb2e0f2        jwilder/nginx-proxy   "/app/docker-entrypoi"   About a minute ago   Exited (2) 4 seconds ago                       prickly_yonath
[root@server-test ~]# docker rm 656

After this the box crashes and I have to restart it. This is running Docker version 1.9.0, build 76d6bc9.

These are the logs from the container on reboot:

[root@server-test ~]# docker logs 656
forego     | starting nginx.1 on port 5000
forego     | starting dockergen.1 on port 5100
dockergen.1 | 2015/11/10 18:56:37 Generated '/etc/nginx/conf.d/default.conf' from 1 containers
dockergen.1 | 2015/11/10 18:56:37 Watching docker events

@FlorinAsavoaie
Contributor

I got the same issue, but it seems to have something to do with the way storage is handled automatically on RHEL boxes:

If you don't have enough LVM or unpartitioned space, docker-storage-setup (which seems to run automatically if /var/lib/docker does not exist) will create some weird setup (I'm guessing some files in /var/lib/docker that get mounted as loop devices and stuck into the device mapper, or something like this).

Once I added a secondary disk to the VM, added DEVS="/dev/sdb" to /etc/sysconfig/docker-storage-setup, stopped docker, ran docker-storage-setup, and then started the docker service again, everything was peachy because it uses the thin LVM allocation method.
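
As a concrete sketch of those steps (the device name /dev/sdb comes from this report; substitute your own spare disk, and note that docker-storage-setup will take the device over for LVM):

    echo 'DEVS="/dev/sdb"' | sudo tee -a /etc/sysconfig/docker-storage-setup
    sudo systemctl stop docker
    sudo docker-storage-setup      # creates a thin LVM pool on /dev/sdb
    sudo systemctl start docker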

@gravyboat

@FlorinAsavoaie Hmm, that's very interesting. I'll be interested to see what some of the Docker team says. This is a pretty big issue for me in a cloud environment where I'm spinning systems up and down. I don't really want to have to add disks to the system.

@FlorinAsavoaie
Contributor

You don't have to add disks, just make sure you have enough space in the LVM volume groups so that Docker can use its default, production-ready storage setup for RHEL, which is thin-allocated LVM.
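
A minimal sketch for checking that free space, assuming the standard LVM tools:

    sudo vgs   # the VFree column shows unallocated space per volume group
    sudo pvs   # the physical volumes backing each group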

@thaJeztah
Member

The default device mapper configuration uses loopback devices and is not intended for production use. We recently added more documentation about picking and setting up storage drivers; see http://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/, and this page describes device mapper: http://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/
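
One way to tell the two setups apart: on a loop-backed configuration, docker info lists plain files under /var/lib/docker/devicemapper as the data and metadata files (as in the output earlier in this thread), while a thin-pool setup points at real block devices. A sketch (exact field labels vary by Docker version):

    docker info | grep -iE 'data file|metadata file|pool name'
    # loopback (not for production):
    #   Data file: /var/lib/docker/devicemapper/devicemapper/data
    # LVM thin pool (recommended):
    #   Data file: /dev/<volume-group>/<thin-pool-name>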
