can't create unix socket /var/run/docker.sock: is a directory #30348

Open
stuszynski opened this issue Jan 22, 2017 · 33 comments

@stuszynski

stuszynski commented Jan 22, 2017

Description

Steps to reproduce the issue:

  1. Restart Docker

Describe the results you received:

Docker can't boot up after a restart. An error from journal:

dockerd[30701]: time="2017-01-22T08:38:55.077780858Z" level=fatal msg="can't create unix socket /var/run/docker.sock: is a directory"

At this point /var/run/docker.sock is indeed a directory. (wut?)

$ ls -lah /var/run/docker.sock
total 0
drwxr-xr-x  2 root root   40 Jan 22 08:37 .
drwxr-xr-x 30 root root 1.2K Jan 22 08:38 ..
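
As a quick way to confirm the same diagnosis on another host, here is a minimal sketch of a helper that reports what sits at the socket path. The function name `sock_state` is hypothetical and is not part of Docker or of this report.

```shell
# Report what currently sits at a socket path.
# Prints one of: socket, directory, missing, other.
# sock_state is a hypothetical helper name, not a Docker command.
sock_state() {
    path=$1
    if [ -S "$path" ]; then
        echo socket
    elif [ -d "$path" ]; then
        echo directory
    elif [ ! -e "$path" ]; then
        echo missing
    else
        echo other
    fi
}
```

Running `sock_state /var/run/docker.sock` on an affected host should print `directory`, which corresponds to the `ls` output above.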

Describe the results you expected:

To restart Docker without an error

Additional information you deem important (e.g. issue happens only occasionally):

This happens from time to time when the dockerd daemon restarts. After I removed the directory manually, Docker started normally and created the socket, but after several restarts the issue came back.

Output of docker version:

Client:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Wed Jan 11 00:23:16 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Wed Jan 11 00:23:16 2017
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 42
 Running: 20
 Paused: 0
 Stopped: 22
Images: 14
Server Version: 1.12.6
Storage Driver: overlay
 Backing Filesystem: xfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay bridge null host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-57-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.795 GiB
Name: <node-name>
ID: <id>
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

AWS EC2 instance

@thaJeztah
Member

Hm, what happened here is that when bind-mounting files or directories from the host, the host path is automatically created by docker if it doesn't exist (we tried deprecating that behavior, but there are many people relying on this; see #21666, and issues linked from that). If the path (/var/run/docker.sock in this case) does not exist, docker assumes it must be a directory, so creates a directory /var/run/docker.sock and bind-mounts that into the container. As an alternative, you could bind-mount /var/run into the container instead of the docker.sock.

I'm not sure though how the container / bind-mount could be created before the daemon was "up" (and the socket created).

ping @cpuguy83 any idea?
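
To make the sequence described above concrete, here is a toy model in shell. It is only an illustration of the described behaviour and is not Docker's code; both function names are hypothetical.

```shell
# Toy model, NOT Docker's implementation: a bind-mount source that
# does not exist yet is created as a directory.
model_bind_source() {
    src=$1
    if [ ! -e "$src" ]; then
        mkdir -p "$src"
    fi
}

# Toy model of the daemon start-up step: a socket cannot be created
# where a directory already exists, so start-up fails.
model_daemon_listen() {
    sock=$1
    if [ -d "$sock" ]; then
        echo "can't create unix socket $sock: is a directory" >&2
        return 1
    fi
    return 0
}
```

Calling `model_bind_source` on a scratch path before `model_daemon_listen` reproduces the failure order reported in this issue: the container side creates the directory first, and the daemon side then refuses to start.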

@cpuguy83
Member

It's a race condition.
The http server is spun up separately from the daemon.

@ReSearchITEng

We faced the same:

Feb 24 13:59:05 gitlabpet forward-journal[19091]: ......time="2017-02-24T13:59:05.589011765+02:00" level=error msg="Error unmounting device f6f1daa50eb8a24b36093101ea2e83392467e905aee88c4a69f56771acab2c6a: UnmountDevice: device not-mounted id f6f1daa50eb8a24b36093101ea2e83392467e905aee88c4a69f56771acab2c6a"
Feb 24 13:59:05 gitlabpet forward-journal[19091]: time="2017-02-24T13:59:05.681093487+02:00" level=warning msg="Auto-creating non-existant volume host path /var/run/docker.sock, this is deprecated and will be removed soon"
Feb 24 13:59:05 gitlabpet forward-journal[19091]: time="2017-02-24T13:59:05.984713491+02:00" level=warning msg="exit status 1"
Feb 24 13:59:06 gitlabpet forward-journal[19091]: time="2017-02-24T13:59:06.176895311+02:00" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/26d235f17c94fc1208e2707c5476de248306493cb20a8f7c2692141ddc9d7c60/shm: invalid argument\nfailed to umount /var/lib/docker/containers/26d235f17c94fc1208e2707c5476de248306493cb20a8f7c2692141ddc9d7c60/mqueue: invalid argument"
Feb 24 13:59:06 gitlabpet forward-journal[19091]: time="2017-02-24T13:59:06.176976428+02:00" level=error msg="Error unmounting device 26d235f17c94fc1208e2707c5476de248306493cb20a8f7c2692141ddc9d7c60: UnmountDevice: device not-mounted id 26d235f17c94fc1208e2707c5476de248306493cb20a8f7c2692141ddc9d7c60"
Feb 24 13:59:06 gitlabpet forward-journal[19091]: time="2017-02-24T13:59:06.177169216+02:00" level=error msg="Failed to start container 26d235f17c94fc1208e2707c5476de248306493cb20a8f7c2692141ddc9d7c60: [8] System error: not a directory"
Feb 24 13:59:06 gitlabpet forward-journal[19091]:
Feb 24 13:59:06 gitlabpet forward-journal[19091]: time="2017-02-24T13:59:06.486911942+02:00" level=info msg="Loading containers: done."
Feb 24 13:59:06 gitlabpet forward-journal[19091]: time="2017-02-24T13:59:06.486957473+02:00" level=fatal msg="is a directory"
Feb 24 13:59:06 gitlabpet systemd[1]: Started Docker Application Container Engine.

We tried rm -rf /var/run/docker.sock, but the same thing happened again when the daemon started.
In the end, we had to run rm -rf /var/run/docker.sock /var/lib/docker/containers.
FYI, it happened while running: "docker exec -it gitlab-runner gitlab-runner register"
(in parallel we were registering with RH Satellite, but that should not be related)

@dattatrayakumbhar
Contributor

I have also observed this on my setup.
As a possible solution, we could do the following:

  1. This issue usually happens when a container bind-mounts a file rather than a directory (for reasons such as large files in the directory, or security)
    and the file is not present on the host, so it gets created as a directory.
    We could provide an option that states whether the path being mounted is a file or a directory:
    docker run -v /var/docker/docker.sock:/var/docker/docker.sock:ro:file busybox sh
    By default, the type would be directory.

    When the type is file and the path does not exist on the host, it would not be created on the host. [We would neither create the file nor create a directory in its place.]
    For directories, the path would still be created, as in docker's previous behaviour.

@thaJeztah @cpuguy83 @mlaventure @stuszynski

@thaJeztah
Member

Not creating the /var/run/docker.sock directory will still (probably) result in the container failing to start. We want to avoid adding more options to the shorthand -v syntax, as it's already quite overloaded, but the advanced --mount syntax (as is currently present for docker service) will find its way into docker run, and won't automatically create host directories.

For this case (see my earlier comment #30348 (comment)), would bind-mounting /var/run instead of /var/run/docker.sock help?
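
One way to avoid asking docker to mount a path that is not there yet is to check the host path before starting the container. The sketch below assumes a docker release in which `docker run` accepts `--mount`; the helper name `docker_sock_mount_arg` is hypothetical.

```shell
# Print a --mount argument for the docker socket, but only when the
# host path really is a socket. Returns 1 otherwise, so the caller
# never asks docker to bind-mount a missing path.
# docker_sock_mount_arg is a hypothetical helper name.
docker_sock_mount_arg() {
    sock=${1:-/var/run/docker.sock}
    if [ ! -S "$sock" ]; then
        echo "refusing to mount $sock: not a socket" >&2
        return 1
    fi
    printf '%s\n' "--mount type=bind,source=$sock,target=/var/run/docker.sock"
}

# Example use (left unquoted on purpose so the argument splits in two):
#   arg=$(docker_sock_mount_arg) && docker run $arg busybox true
```

This only guards container starts that go through the helper. Containers restarted by the daemon itself (for example through a restart policy) bypass it.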

@xbglowx

xbglowx commented Mar 14, 2017

I am also hitting this while upgrading from 1.12 to 17.03. As a workaround, I added some code to docker's upstart init to remove the directory:

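pre-start script
    # Use the default socket path unless DOCKER_OPTS sets -H/--host.
    if ! printf "%s" "$DOCKER_OPTS" | grep -qE -e '-H|--host'; then
            DOCKER_SOCKET=/var/run/docker.sock
    else
            DOCKER_SOCKET=$(printf "%s" "$DOCKER_OPTS" | grep -oP -e '(-H|--host)\W*unix://\K(\S+)' | sed 1q)
    fi
    # Remove a stray (empty) directory left at the socket path.
    if [ -d "$DOCKER_SOCKET" ]; then
            rmdir "$DOCKER_SOCKET"
    fi
end script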

@stuszynski
Author

Hi. It's been a while since I posted this issue. In the meantime, we decided to use a systemd.socket to manage a /var/run/docker.sock for the docker.service and we didn't have any incident since.

@xbglowx

xbglowx commented Mar 15, 2017

Disregard my upstart init mod. After talking more about this with @thaJeztah, I learned that bad things can happen with that approach. Instead, I am going to make sure that any container that needs access to the docker socket bind-mounts /var/run instead of /var/run/docker.sock.

@thaJeztah
Member

The script is probably "ok" to fix the daemon not starting, but won't prevent containers from running into issues. I haven't tried @stuszynski's approach, perhaps it's worth a try if you're running systemd.

@piontec

piontec commented Mar 29, 2017

I can confirm the same problem with both 1.12.3 and 17.03. We already use systemd's socket activation for docker, but it doesn't help. We do have containers bind-mounting /var/run/docker.sock, so it matches exactly what @thaJeztah wrote. @cpuguy83, can't you start the HTTP server first, before any container starts, but return "503 Service Unavailable" until it's ready?

@leodotcloud

I ran into the same issue this morning.

docker version
Client:
 Version:      17.03.1-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Fri Mar 24 00:40:33 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.1-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Fri Mar 24 00:40:33 2017
 OS/Arch:      linux/amd64
 Experimental: false

We do have containers with /var/run/docker.sock mounted.

All I did was run service docker restart, after which the following error messages appeared in the logs.

tail -f /var/log/upstart/docker.log
can't create unix socket /var/run/docker.sock: is a directory
/var/run/docker.sock is up
can't create unix socket /var/run/docker.sock: is a directory
/var/run/docker.sock is up
can't create unix socket /var/run/docker.sock: is a directory
/var/run/docker.sock is up
can't create unix socket /var/run/docker.sock: is a directory

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:        14.04
Codename:       trusty

Manually removing the directory was the workaround for me.

rm -rf /var/run/docker.sock

clnperez pushed a commit to clnperez/moby that referenced this issue Jun 7, 2017
Don't create source directory while the daemon is being shutdown, fix moby#30348
@thaJeztah
Member

Note that the pull request that closed this issue (#33330) only prevents one possible reason a directory is created when bind-mounting /var/run/docker.sock. Bind-mounting individual files can still be troublesome, and there are other possible situations where /var/run/docker.sock is not yet available the moment a container starts (thus leading to a directory being created). #33330 should help prevent this, though.

@sudo-bmitch

I suspect most, myself included, are seeing the race condition on startup with containers that have a restart policy, rather than on shutdown as PR #33330 solves. Is there another issue to follow, should this one be reopened, or should a new issue be created for the other scenarios?

@thaJeztah
Member

There are probably other issues mentioning this one, but I can reopen for now. The remaining problem is that containers are started before the API is up (so the socket is not there yet), in which case starting the container creates the directory. Possibly @piontec's proposal (#30348 (comment)) should be investigated, if someone is interested in looking into that possibility.

@thaJeztah thaJeztah reopened this Jun 7, 2017
seemethere pushed a commit to seemethere/docker-ce that referenced this issue Jun 9, 2017
… #30348

If a container mounts the socket the daemon is listening on into
the container while the daemon is being shut down, the socket will
not exist on the host, so the daemon will assume it's a directory
and create it on the host. This prevents the daemon from starting
next time.

fix issue moby/moby#30348

To reproduce this issue, you can add the following code

```
--- a/daemon/oci_linux.go
+++ b/daemon/oci_linux.go
@@ -8,6 +8,7 @@ import (
        "sort"
        "strconv"
        "strings"
+       "time"

        "github.com/Sirupsen/logrus"
        "github.com/docker/docker/container"
@@ -666,7 +667,8 @@ func (daemon *Daemon) createSpec(c *container.Container) (*libcontainerd.Spec, e
        if err := daemon.setupIpcDirs(c); err != nil {
                return nil, err
        }
-
+       fmt.Printf("===please stop the daemon===\n")
+       time.Sleep(time.Second * 2)
        ms, err := daemon.setupMounts(c)
        if err != nil {
                return nil, err

```

step1 run a container which has `--restart always` and `-v /var/run/docker.sock:/sock`
```
$ docker run -ti --restart always -v /var/run/docker.sock:/sock busybox
/ #

```
step2 exit the container
```
/ # exit
```
and kill the daemon when you see
```
===please stop the daemon===
```
in the daemon log

The daemon can't restart and fails with `can't create unix socket /var/run/docker.sock: is a directory`.

Signed-off-by: Lei Jitang <leijitang@huawei.com>

(cherry picked from commit 7318eba)

Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>

@ptitjes

ptitjes commented Jun 18, 2017

Is there any workaround for this that I can use in the meantime?
I tried manually removing the docker.sock directory, but every time I run a container with the socket bind-mount it recreates the directory... How can I force the socket creation?
(Disclaimer: I'm new to docker.)

@jkremser

I saw the same issue, and rebooting Linux fixed it for me. Removing /var/run/docker.sock didn't work for me.

andrewhsu pushed a commit to docker/docker-ce that referenced this issue Jun 24, 2017
@VTTR

VTTR commented Mar 16, 2018

had the same issue on Docker version 17.06.2-ee-6, build e75fdb8
rm -rf /var/run/docker.sock worked for me

onap-github pushed a commit to onap/doc that referenced this issue Apr 3, 2019
* Update docs/submodules/oom/offline-installer.git from branch 'master'
  to 05b9001fa01b3f1076a5a21f063ca40421a66333
  - Merge "Improving docker restart handler"
  - Improving docker restart handler
    
    There is a bug in docker which leads to not properly
    shutdown service preventing subsequent startup.
    moby/moby#30348
    This commit is preventing this problem to appear.
    
    Change-Id: I29505610bd9954af01d73264e5414fdb2b9ac99d
    Issue-ID: OOM-1735
    Signed-off-by: Michal Ptacek <m.ptacek@partner.samsung.com>
@torwag

torwag commented Sep 9, 2019

I am still facing this problem, and I am surprised there seems to be no solution yet (or I could not find it). This always catches me cold after an "almost finished, quickly do a reboot" maintenance session...

@stefanlasiewski

@torwag I felt like this has been fixed. I haven't seen it for probably 18 months. Are you running a current version of Docker?

@torwag

torwag commented Sep 12, 2019

@stefanlasiewski yes I am, and I still face the problem. @kylefransham your workaround seems to work, but it comes with a drawback: many of the containers that need the docker.sock also have the path set somewhere in their own config, so those configs have to be changed too.
It really seems to be a race condition. Booting from an SSD might not help here, as the boot itself is quite fast and things tend to get started quickly. I also couldn't find a way to add a delay or to control the start-up order of the containers.

@jiuchen1986

We have been facing the same issue recently with docker-ce 18.09.6. We offer a kubernetes cluster with containerd as the runtime rather than docker, and before we set up docker on the host, users deploy pods that mount a hostPath volume of /var/run/docker.sock for docker-in-container use cases. We can manually remove the directory on the host and bring up the docker daemon. However, I wonder whether there is still a risk of re-triggering the issue.

@cpuguy83
Member

@jiuchen1986 Absolutely the issue can be re-triggered.
For this not to cause an issue, kubernetes would need to use the "mounts" API, which does not auto-create a directory for non-existent host paths... the flip side of this is that if the socket doesn't exist, the request will error out.

silvin-lubecki pushed a commit to silvin-lubecki/docker-ce that referenced this issue Feb 3, 2020
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this issue Feb 3, 2020
@torwag

torwag commented Feb 28, 2020

Today I figured out that I had two odd settings in my system, which I was not aware of.

  1. /var/run was not (as usual) a symlink to /run
  2. I did not enable docker.socket but directly started docker.service

The second point might contribute to the race condition. If docker.socket is enabled, systemd will take care of the socket even before dockerd and any container are active. Thus, the socket is already there and there is no chance that a container creates a folder instead.

systemctl enable docker.socket
systemctl start docker.socket
systemctl enable docker.service
systemctl start docker.service

The standard service and socket unit files are set up so that docker.service gets started AFTER docker.socket.

The first point came to light because the docker.socket file pointed to /run/docker.sock, which is fine if /var/run is just a symlink to /run. If it is not, dockerd and all the containers looking for /var/run/docker.sock (the standard path) will not find it.

I stopped all services, renamed the original folder
mv /var/run/ /var/run_delete_me
and symlinked

cd /var/
ln -s ../run

After a reboot all dynamic content was placed into /run and I could delete

rm -rf /var/run_delete_me

Now I have no problem with the socket creation anymore at boot time. Hope this helps others as well.
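
To check the first point before renaming anything, a small test like the following can help. It is a minimal sketch; the function name `is_symlink_to` is hypothetical, and it only compares where the two paths resolve to.

```shell
# Succeeds when the first path is a symlink that resolves to the
# same directory as the second path.
# is_symlink_to is a hypothetical helper name.
is_symlink_to() {
    link=$1
    target=$2
    [ -L "$link" ] || return 1
    [ -d "$target" ] || return 1
    # Compare the physical locations after following the symlink.
    [ "$(cd "$link" && pwd -P)" = "$(cd "$target" && pwd -P)" ]
}
```

For example, `is_symlink_to /var/run /run && echo ok` prints `ok` on a host where /var/run is a symlink to /run and prints nothing on a host with a separate /var/run directory.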

silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this issue Mar 10, 2020
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this issue Mar 16, 2020
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this issue Mar 23, 2020
@loretoparisi

loretoparisi commented Jun 12, 2020

@thaJeztah I have tried to

rm -rf /var/run/docker.sock
rm -rf /var/lib/docker/containers

but I get

rm: cannot remove '/var/lib/docker/containers/05f727bf4221a0391e4a6cbb241d74683b44ed680fe4137b9fb18b48e2e11b45/shm': Device or resource busy
rm: cannot remove '/var/lib/docker/containers/6bf6a35be947717e8bd44431e913cc1e87d50f1f35f1a9226c28185ca5ca62c7/shm': Device or resource busy
rm: cannot remove '/var/lib/docker/containers/9a33205820543629a2dcbad48d4b2b7dbb0147de220ad869667985c782008c0f/shm': Device or resource busy
rm: cannot remove '/var/lib/docker/containers/d9269ae82916af6fbce72637dc545d756664f18eeb1ff06ff863bbdabb229222/shm': Device or resource busy

@thaJeztah
Member

You can't remove those if docker is running, or if containers are running (which could be the case if the daemon has live-restore enabled). I'd also not recommend removing only the /containers subdirectory, because that could mean that other files inside /var/lib/docker that keep track of state no longer match what's there.

To fix the issue with /var/run/docker.sock if you ended up in a situation where it was created as a directory, first stop docker (sudo systemctl stop docker) if it's running, then run sudo rm -r /var/run/docker.sock. If you want to remove _all_ docker things (containers, images, volumes, networks, etc.), you could run sudo systemctl stop docker, then sudo rm -rf /var/lib/docker. But (as said) that removes all your docker data.
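
The socket clean-up step can be wrapped in a small guard so that it only acts when the path really is a stray directory. This is a sketch, not an official procedure: it swaps `rm -r` for `rmdir`, which refuses to delete anything but an empty directory, and the function name `remove_stray_sock_dir` is hypothetical.

```shell
# Remove a stray directory that occupies the socket path.
# Uses rmdir, so it only ever deletes an EMPTY directory and
# leaves real sockets, files, and symlinks untouched.
# remove_stray_sock_dir is a hypothetical helper name.
remove_stray_sock_dir() {
    sock=${1:-/var/run/docker.sock}
    if [ -d "$sock" ] && [ ! -L "$sock" ]; then
        rmdir "$sock"
    fi
}
```

It would be run as root after docker has been stopped and before it is started again.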

@loretoparisi

loretoparisi commented Jun 17, 2020

@thaJeztah Thank you, I think I have identified the cause. The problem seems to be this

$ sudo -i service docker status
docker start/running, process 10847
$ docker info
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

So if I try docker-compose ...up -d I get the error:

$ docker-compose -f docker-compose.yml up -d
ERROR: Couldn't connect to Docker daemon at http+docker://localhost - is it running?

then if I try

$ sudo dockerd
INFO[0000] libcontainerd: previous instance of containerd still alive (3097) 
WARN[0000] failed to rename /mnt/docker/tmp for background deletion: %!s(<nil>). Deleting synchronously 
INFO[0000] [graphdriver] using prior storage driver: overlay2 
WARN[0000] libcontainerd: unknown container 22731aae271f9e380def120714ff025df434c96ab189221bdb93241b59fff077 
WARN[0000] libcontainerd: unknown container 22731aae271f9e380def120714ff025df434c96ab189221bdb93241b59fff077 
Error starting daemon: error while opening volume store metadata database: timeout

So I tried to clean up a bit any previous instances:

ps axf | grep docker | grep -v grep | awk '{print "kill -9 " $1}' | sudo sh 

and now I get

$ sudo dockerd
INFO[0000] libcontainerd: new containerd process, pid: 25085 
WARN[0001] failed to rename /mnt/docker/tmp for background deletion: %!s(<nil>). Deleting synchronously 
INFO[0001] [graphdriver] using prior storage driver: overlay2 
INFO[0001] Graph migration to content-addressability took 0.00 seconds 
WARN[0001] Your kernel does not support cgroup rt period 
WARN[0001] Your kernel does not support cgroup rt runtime 
INFO[0001] Loading containers: start.                   
ERRO[0001] Failed to load container 05f727bf4221a0391e4a6cbb241d74683b44ed680fe4137b9fb18b48e2e11b45: open /mnt/docker/containers/05f727bf4221a0391e4a6cbb241d74683b44ed680fe4137b9fb18b48e2e11b45/config.v2.json: no such file or directory 
...
ERRO[0002] get ubuntu_fs-17368c5e: no such volume       
ERRO[0002] get ubuntu_fs-17368c5e: no such volume       
ERRO[0002] get ubuntu_fs-17368c5e: no such volume       
INFO[0002] Loading containers: done.                    
INFO[0002] Daemon has completed initialization          
INFO[0002] Docker daemon                                 commit=89658be graphdriver=overlay2 version=17.05.0-ce
INFO[0002] API listen on /var/run/docker.sock  

Now docker info is ok

$ docker info
Containers: 9
 Running: 1
 Paused: 0
 Stopped: 8
Images: 219
Server Version: 17.05.0-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: efs local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init

but still docker-compose up -d hangs.

@thaJeztah
Member

> but still docker-compose up -d hangs.

Does it work if you run with sudo? A similar error may also be printed if compose doesn't have permissions for the docker socket.

What version of compose are you running, and what version do you have specified in your docker-compose file? I see you're running a really old version of docker (17.05) that reached EOL three years ago; I'd highly recommend upgrading to a more current version, as that version has known, unpatched vulnerabilities.

@lifeofguenter

I got hit with this issue while upgrading from 20.10.8 to 20.10.9 - so during a routine update.

The only container that mounted /var/run/docker.sock was the datadog-agent. For now I am going to remove that container again, as it's not worth the trouble.

Not sure how to solve this, but it seems strange that the containers in "restart-mode" would start before the docker-daemon is listening to the socket?

Maybe something can be built for docker to wait for that prior to launching containers? Or add the workaround mentioned by @stuszynski in the upstream packaging :)

@thaJeztah
Member

> it seems strange that the containers in "restart-mode" would start before the docker-daemon is listening to the socket?

The daemon and API are separate bits in the code; the daemon may be up, but the API not yet listening. There are also scenarios (e.g. the live-restore option) where containers can continue running during daemon restarts.

> Or add the workaround mentioned by @stuszynski in the upstream packaging :)

The systemd socket approach (#30348 (comment)) is already in use in all current versions of docker: https://github.com/docker/docker-ce-packaging/blob/a5db88ae1a64189e79d97f780f91e5c852d0ef3f/systemd/docker.service#L6-L13

The default is for the docker daemon to use -H fd:// (which is the file-descriptor of the socket that's created by systemd before the docker service starts).

There may be one fix for the systemd unit file related to this, that hasn't shipped yet; docker/docker-ce-packaging#575 (not sure if it addresses this particular issue, but might help)

@PythonCoderAS

I just ran into this issue today, seems that I had a container use the /var/run/docker.sock file as well. Are there any plans to fix this?

@match919

match919 commented Dec 1, 2022

> Hm, what happened here is that when bind-mounting files or directories from the host, the host path is automatically created by docker if it doesn't exist (we tried deprecating that behavior, but there are many people relying on this; see #21666, and issues linked from that). If the path (/var/run/docker.sock in this case) does not exist, docker assumes it must be a directory, so creates a directory /var/run/docker.sock and bind-mounts that into the container. As an alternative, you could bind-mount /var/run into the container instead of the docker.sock.
>
> I'm not sure though how the container / bind-mount could be created before the daemon was "up" (and the socket created).
>
> ping @cpuguy83 any idea?

When I mount /var/run instead of /var/run/docker.sock, the docker commands work in the container; however, this variable no longer appears to give the container's host IP: host.docker.internal
But when I mount the docker.sock file again, that variable is set properly to the IP of the host where the container resides.

@anonhostpi

anonhostpi commented Aug 9, 2023

Relevant StackOverflow answer that can land you here on this bug page: https://stackoverflow.com/a/62209937/10534510
