
Removing docker container causing lock up #3610

Closed
benbjohnson opened this issue Jan 15, 2014 · 18 comments

Comments

@benbjohnson

Note: This has only occurred once but I thought I'd report it in case others experience the same issue.

I tried to run docker kill against a container and it wouldn't stop, so I tried docker stop, but that didn't work either. Finally, I tried running docker rm, which reported that it couldn't remove a running container; soon after that, the whole SSH session froze. At that point the box was inaccessible via any new SSH connections.
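
A sketch of the sequence described above, with a placeholder container ID (the exact error text varies by Docker version):

    $ docker kill <container-id>   # container kept running
    $ docker stop <container-id>   # still running
    $ docker rm <container-id>     # error: cannot remove a running container
    # ...and shortly after this, the SSH session froze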

Here's a log from the box:

https://gist.github.com/snormore/20c261a4166d8049e238

And the docker version and info:

$ docker version
Client version: 0.7.5
Go version (client): go1.2
Git commit (client): c348c04

$ docker info
Containers: 5
Images: 27
Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 37

/cc: @snormore

@burke
Contributor

burke commented Jan 16, 2014

@graemej and I just had a similar issue. We were working on a new Dockerfile:

Step 14 : ADD datadog_send_event /usr/local/bin/datadog_send_event
 ---> Using cache
 ---> 1187fdb27fd1
Step 15 : RUN sysctl kernel.msgmax | awk '{ if ($3 < 1048576)  { print "kernel.msgmax must be at least 1048576, but is " $3 ; exit 1 } }'
 ---> Running in ba02cf799603
2014/01/16 19:11:09 unexpected EOF
make: *** [build] Error 1

This was immediately after restarting dockerd. After this, we ran docker ps, which hung. We could not establish new SSH connections to the box. Presumably we have to reboot.

@lgs

lgs commented Jan 22, 2014

... it seems this problem is spreading and has many facets:

https://github.com/search?q=%22remove+containers%22+docker&type=Issues&ref=searchresults

@benders

benders commented Jan 24, 2014

This may be the same underlying issue as the problem @poe and I reported in #3744. The signature of that issue is that we see "BUG: soft lockup" in the Ubuntu kern.log. In our case the system crash was generally preceded by restarting the docker daemon to try to resolve some other issue (stuck image pull, etc.). We have been aggressively trying to isolate the issue, but so far we don't have much more to report.

It would be useful to know if others experiencing host crashes are also seeing the "soft lockup" message and what kernel version and hardware it is occurring on.
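
For anyone comparing notes, a minimal sketch of the checks being asked for here (the log path assumes Ubuntu; dmesg works anywhere, if the box is still responsive):

    grep -i "soft lockup" /var/log/kern.log   # the crash signature from #3744
    dmesg | grep -i "soft lockup"             # kernel ring buffer
    uname -r                                  # kernel version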

@crosbymichael
Contributor

Could you all post your kernel version, OS, docker info, and docker version?

@graemej

graemej commented Mar 28, 2014

The environment that @burke and I are using is:

uname -a

Linux hostname 3.8.0-35-generic #50~precise1-Ubuntu SMP Wed Dec 4 17:25:51 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/issue

Ubuntu 12.04.3 LTS \n \l

docker info

Containers: 37
Images: 355
Driver: aufs
 Root Dir: /u/lib/docker/aufs
 Dirs: 442
WARNING: No swap limit support

docker version

Client version: 0.8.0
Go version (client): go1.2
Git commit (client): cc3a8c8
Server version: 0.8.0
Git commit (server): cc3a8c8
Go version (server): go1.2

@rlpowell

I have a docker container that cannot be removed:

rlpowell@ip-10-0-1-16> sudo docker ps -a
CONTAINER ID        IMAGE                  COMMAND             CREATED             STATUS                     PORTS               NAMES
76863037856d        rlpowell/packer:base   /sbin/docker.init   14 hours ago        Exited (-1) 14 hours ago                       jovial_morse
rlpowell@ip-10-0-1-16>

When I try, this happens in messages:

May 23 23:44:49 ip-10-0-1-16 docker: 2014/05/23 23:44:49 GET /v1.11/containers/json?all=1
May 23 23:44:49 ip-10-0-1-16 docker: [ead23f20] +job containers()
May 23 23:44:49 ip-10-0-1-16 docker: [ead23f20] -job containers() = OK (0)
May 23 23:45:29 ip-10-0-1-16 docker: 2014/05/23 23:45:29 DELETE /v1.11/containers/512a3c4e707d
May 23 23:45:29 ip-10-0-1-16 docker: [ead23f20] +job container_delete(512a3c4e707d)
May 23 23:45:29 ip-10-0-1-16 kernel: [  775.076989] bio: create slab <bio-2> at 2
May 23 23:45:31 ip-10-0-1-16 systemd-udevd: inotify_add_watch(7, /dev/dm-3, 10) failed: No such file or directory

And then the machine totally, irrevocably hangs. Completely, with no response of any kind; SSH sessions won't even disconnect.

Note that the machine running docker is an EC2 VM.

It may come back later, but I've waited more than an hour.

rlpowell@ip-10-0-1-16> sudo docker version
Client version: 0.11.1
Client API version: 1.11
Go version (client): go1.2.1
Git commit (client): fb99f99/0.11.1
Server version: 0.11.1
Server API version: 1.11
Git commit (server): fb99f99/0.11.1
Go version (server): go1.2.1
Last stable version: 0.11.1
rlpowell@ip-10-0-1-16> sudo docker info
Containers: 1
Images: 8
Storage Driver: devicemapper
 Pool Name: docker-253:0-33556480-pool
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 15140.5 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 8.1 Mb
 Metadata Space Total: 2048.0 Mb
Execution Driver: native-0.2
Kernel Version: 3.11.10-301.fc20.x86_64
rlpowell@ip-10-0-1-16>

@rlpowell

So, uhh, now that I've given my report, how do I get around this? I literally can't remove this container and I very much want to; is there a way for me to remove it manually?

@rlpowell

I fixed it like so, just in case it helps anybody else:

  sudo mv /tmp/packer-* /var/tmp

  sudo sh -c "mv --backup=t /var/lib/docker/containers/76863037856d0e16051baf71b301f8ee2b634016975bd3d2a14e0b43068d3836 /var/lib/docker/devicemapper/mnt/76863037856d0e16051baf71b301f8ee2b634016975bd3d2a14e0b43068d3836* /var/tmp"

  sudo service docker restart

(that first line is only relevant if you're using packer, and probably isn't essential even then)
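
Generalized, the workaround above amounts to moving the stuck container's state out from under the daemon and restarting it. A sketch assuming the devicemapper driver and the default /var/lib/docker root (substitute the full ID of your stuck container; the sh -c is needed so the glob expands with root's permissions):

    CONTAINER_ID=<full-64-char-container-id>
    sudo mv --backup=t /var/lib/docker/containers/$CONTAINER_ID /var/tmp
    sudo sh -c "mv --backup=t /var/lib/docker/devicemapper/mnt/$CONTAINER_ID* /var/tmp"
    sudo service docker restart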

@rlpowell

So it just happened again. Here's output from my packer/docker run:

    docker: Notice: /Stage[main]/Users/Users::User[chris]/Users::Homedirsetup[chris]/Exec[expand home chris]/returns: executed successfully
    docker: Notice: /Stage[main]/Auth/Exec[fix auth-ac1]/returns: executed successfully
    docker: Notice: /Stage[main]/Users/Users::User[emilio]/Users::Homedirsetup[emilio]/Exec[expand home emilio]/returns: executed successfully
    docker: Notice: /Stage[main]/Users/Users::User[geoff]/User[geoff]/password: created password
    docker: Notice: /Stage[main]/Users/Users::User[geoff]/Users::Homedirsetup[geoff]/Exec[expand home geoff]/returns: executed successfully
    docker: Notice: /Stage[main]/Httpd/Service[httpd]/ensure: ensure changed 'stopped' to 'running'
    docker: Info: /Service[httpd]: Unscheduling refresh on Service[httpd]
    docker: Notice: Finished catalog run in 46.22 seconds
==> docker: Exporting the container
==> docker: Error exporting: exit status 1
==> docker: Stderr: 2014/05/24 13:07:29 write /dev/stdout: no space left on device
==> docker:
==> docker: Killing the container: d61ba6c897b6130bf531bd3f8c3a37230e3d75751541c0068afd51e1ddf01ada
Build 'docker' errored: Error exporting: exit status 1
Stderr: 2014/05/24 13:07:29 write /dev/stdout: no space left on device


==> Some builds didn't complete successfully and had errors:
--> docker: Error exporting: exit status 1
Stderr: 2014/05/24 13:07:29 write /dev/stdout: no space left on device


==> Builds finished but no artifacts were created.

I have no idea what it's talking about; /dev/stdout isn't something that can run out of space. The smallest free partition at the time had 2.3GiB free, /var/lib/docker had 69GiB free, and /tmp/ had 2.5GiB free.
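
For what it's worth, a quick sketch for checking the usual suspects behind "no space left on device" (inode exhaustion produces the same error, and on devicemapper the pool reported by docker info can fill up independently of any host filesystem):

    df -h /var/lib/docker /tmp        # free space on the relevant mounts
    df -i /var/lib/docker             # inode usage; 100% here gives the same error
    sudo docker info | grep 'Space'   # devicemapper pool data/metadata usage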

And now trying to rm that container is hanging the entire VM again.

@rlpowell

This has happened several more times. -_-

@rlpowell

Additional information:

I think, but am not sure, that this started happening after I moved /var/lib/docker to NFS-based storage. So if docker relies on NFS operations, that may be relevant.
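
A quick sketch for checking whether the Docker root is actually on NFS:

    df -T /var/lib/docker        # the Type column will show nfs/nfs4
    findmnt -T /var/lib/docker   # alternative: shows the backing source and fstype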

My basic process here is to create my "base" image, and then use the base image to make my "tomcat" and "mysql" images. Then I run a mysql and a tomcat, then I shut them all down, then I rm everything.

The rm that hangs the kernel usually seems to involve the base image that was used to make the tomcat image. I don't know if that matters, but I thought I'd mention it.

@crosbymichael
Contributor

@rlpowell I don't think we support the docker root on NFS. Aufs and other CoW filesystems do not play well on NFS.

@unclejack
Contributor

This issue was opened a very long time ago and many Docker versions have been released since.
The latest occurrence of this issue seems to have something to do with Docker being used on NFS, which is an unsupported setup.

If you run into this issue with a supported setup and an up-to-date version of Docker, please comment and the issue will be reopened.

@gravyboat

gravyboat commented Nov 10, 2015

@unclejack I'm getting this on CentOS 7 over at DigitalOcean (This does not happen locally on my Ubuntu 14 machine running Docker version 1.8.3, build f4bf5c7):

[root@server-test ~]# docker run -d -p 80:80 -v /var/run/docker.sock:/tmp/docker.sock:ro jwilder/nginx-proxy
65641cb2e0f2365eb139df9d11a6d558521c304bfad1520a408cac240ae8db12
[root@server-test ~]# docker ps -a
CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS              PORTS                         NAMES
65641cb2e0f2        jwilder/nginx-proxy   "/app/docker-entrypoi"   4 seconds ago       Up 3 seconds        0.0.0.0:80->80/tcp, 443/tcp   prickly_yonath
[root@server-test ~]# docker halt 656
docker: 'halt' is not a docker command.
See 'docker --help'.
[root@server-test ~]# docker stop 656
656
[root@server-test ~]# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[root@server-test ~]# docker ps -a
CONTAINER ID        IMAGE                 COMMAND                  CREATED              STATUS                     PORTS               NAMES
65641cb2e0f2        jwilder/nginx-proxy   "/app/docker-entrypoi"   About a minute ago   Exited (2) 4 seconds ago                       prickly_yonath
[root@server-test ~]# docker rm 656

After this the box crashes and I have to restart it. This is running Docker version 1.9.0, build 76d6bc9.

These are the logs from the container on reboot:

[root@server-test ~]# docker logs 656
forego     | starting nginx.1 on port 5000
forego     | starting dockergen.1 on port 5100
dockergen.1 | 2015/11/10 18:56:37 Generated '/etc/nginx/conf.d/default.conf' from 1 containers
dockergen.1 | 2015/11/10 18:56:37 Watching docker events

@FlorinAsavoaie
Contributor

I got the same issue, but it seems to have something to do with the way storage is handled automatically on RHEL boxes:

If you don't have enough LVM or unpartitioned space, docker-storage-setup (which seems to run automatically if /var/lib/docker does not exist) will create some weird setup (I'm guessing some files in /var/lib/docker that get mounted as loop devices and stuck into the device mapper, or something like this).

Once I added a secondary disk to the VM, added DEVS="/dev/sdb" to /etc/sysconfig/docker-storage-setup, stopped docker, ran docker-storage-setup, and then started the docker service again, everything was peachy because it uses the thin LVM allocation method.
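
As a concrete sketch of those steps (the device name /dev/sdb comes from this report; substitute your own spare disk, and note that docker-storage-setup will take the device over for LVM):

    echo 'DEVS="/dev/sdb"' | sudo tee -a /etc/sysconfig/docker-storage-setup
    sudo systemctl stop docker
    sudo docker-storage-setup      # creates a thin LVM pool on /dev/sdb
    sudo systemctl start docker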

@gravyboat

@FlorinAsavoaie Hmm, that's very interesting. I'll be interested to see what some of the Docker team says. This is a pretty big issue for me in a cloud environment where I'm spinning systems up and down. I don't really want to have to add disks to the system.

@FlorinAsavoaie
Contributor

You don't have to add disks, just make sure you have enough space in the LVM volume groups so that Docker can use its default, production-ready storage setup for RHEL, which is thin-allocated LVM.
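
A minimal sketch for checking that free space, assuming the standard LVM tools:

    sudo vgs   # the VFree column shows unallocated space per volume group
    sudo pvs   # the physical volumes backing each group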

@thaJeztah
Member

The default device mapper configuration uses loopback devices and is not intended for production use. We recently added more documentation about picking and setting up storage drivers; see http://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/, and this page describes device mapper: http://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/
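
One way to tell the two setups apart: on a loop-backed configuration, docker info lists plain files under /var/lib/docker/devicemapper as the data and metadata files (as in the output earlier in this thread), while a thin-pool setup points at real block devices. A sketch (exact field labels vary by Docker version):

    docker info | grep -iE 'data file|metadata file|pool name'
    # loopback (not for production):
    #   Data file: /var/lib/docker/devicemapper/devicemapper/data
    # LVM thin pool (recommended):
    #   Data file: /dev/<volume-group>/<thin-pool-name>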
