Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker Daemon Stuck in Activating State #41266

Open
MarthinusKruger opened this issue Jul 27, 2020 · 8 comments
Open

Docker Daemon Stuck in Activating State #41266

MarthinusKruger opened this issue Jul 27, 2020 · 8 comments

Comments

@MarthinusKruger
Copy link

Description
When starting up Docker daemon (dockerd process) it just hangs indefinitely. I have tried uninstalling and re-install all the Docker packages, but this did not make any difference. When I did the uninstall I completely removed the /var/lib/docker so that there are no Docker objects to reference (simulate a clean install). When trying to execute any CLI commands (such as docker system info or docker version) it also just hangs indefinitely.

The command that I have also tried running by hand:
/usr/bin/dockerd -H unix:// --debug

Attached is the strace output of the above, as well as the Go stack dump.

goroutine-stacks-2020-07-27T061805Z.log
dockerd_strace.log

Package versions:

containerd.io                         1.2.13-2
docker-ce                             5:19.03.12~3-0~debian-stretch
docker-ce-cli                         5:19.03.12~3-0~debian-stretch

Steps to reproduce the issue:
1.
2.
3.

Describe the results you received:

The full output I get from the dockerd process is:

time="2020-07-27T06:10:12.846139718Z" level=info msg="parsed scheme: \"unix\"" module=grpc
time="2020-07-27T06:10:12.846159061Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
time="2020-07-27T06:10:12.846183587Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0  <nil>}] <nil>}" module=g
time="2020-07-27T06:10:12.846216219Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
time="2020-07-27T06:10:12.849226465Z" level=info msg="parsed scheme: \"unix\"" module=grpc
time="2020-07-27T06:10:12.849259129Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
time="2020-07-27T06:10:12.849301266Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0  <nil>}] <nil>}" module=g
time="2020-07-27T06:10:12.849322655Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
time="2020-07-27T06:10:12.850486196Z" level=debug msg="Using default logging driver json-file"
time="2020-07-27T06:10:12.850581692Z" level=debug msg="[graphdriver] priority list: [btrfs zfs overlay2 aufs overlay devicemapper vfs]"
time="2020-07-27T06:10:12.852610887Z" level=debug msg="processing event stream" module=libcontainerd namespace=plugins.moby
time="2020-07-27T06:10:12.874835068Z" level=debug msg="backingFs=extfs, projectQuotaSupported=false, indexOff=\"\"" storage-driver=overlay2
time="2020-07-27T06:10:12.874891505Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
time="2020-07-27T06:10:12.874912894Z" level=debug msg="Initialized graph driver overlay2"
time="2020-07-27T06:10:12.879519240Z" level=warning msg="Your kernel does not support swap memory limit"
time="2020-07-27T06:10:12.879549732Z" level=warning msg="Your kernel does not support cgroup rt period"
time="2020-07-27T06:10:12.879560474Z" level=warning msg="Your kernel does not support cgroup rt runtime"
time="2020-07-27T06:10:12.879797240Z" level=debug msg="Max Concurrent Downloads: 3"
time="2020-07-27T06:10:12.879840018Z" level=debug msg="Max Concurrent Uploads: 5"
time="2020-07-27T06:10:12.879860026Z" level=info msg="Loading containers: start."
time="2020-07-27T06:10:12.879939431Z" level=debug msg="Option Experimental: false"
time="2020-07-27T06:10:12.879957567Z" level=debug msg="Option DefaultDriver: bridge"
time="2020-07-27T06:10:12.879971501Z" level=debug msg="Option DefaultNetwork: bridge"
time="2020-07-27T06:10:12.879983213Z" level=debug msg="Network Control Plane MTU: 1500"
time="2020-07-27T06:10:12.880058411Z" level=debug msg="processing event stream" module=libcontainerd namespace=moby
## STUCK - NO FURTHER OUTPUT ##

No mention that the docker.sock is up and running.

Describe the results you expected:

I am expecting the following output:

time="2020-07-23T12:36:22.460761175+02:00" level=warning msg="Your kernel does not support cgroup rt period"
time="2020-07-23T12:36:22.461220454+02:00" level=warning msg="Your kernel does not support cgroup rt runtime"
time="2020-07-23T12:36:22.461868836+02:00" level=info msg="Loading containers: start."
time="2020-07-23T12:36:24.305786140+02:00" level=info msg="Removing stale sandbox 11a3431395c194695d25ea9aab8f252f9f75fffb3d
time="2020-07-23T12:36:24.315296774+02:00" level=warning msg="Error (Unable to complete atomic operation, key modified) dele
time="2020-07-23T12:36:26.510183475+02:00" level=info msg="Loading containers: done."
time="2020-07-23T12:36:26.560471549+02:00" level=info msg="Docker daemon" commit=48a66213fe graphdriver(s)=overlay2 version=
time="2020-07-23T12:36:26.570024021+02:00" level=info msg="Daemon has completed initialization"
time="2020-07-23T12:36:26.705631188+02:00" level=info msg="API listen on /var/run/docker.sock"

From the above it passed the Loading containers: start step and continued until API listen on....

Additional information you deem important (e.g. issue happens only occasionally):

Kernel and OS version:
Linux developer-tools 4.9.0-12-amd64 #1 SMP Debian 4.9.210-1+deb9u1 (2020-06-07) x86_64 GNU/Linux

Output of docker version:

As per the issue, I can't get the output as the dockerd process hasn't started completely.

N/A

Output of docker info:

As per the issue, I can't get the output as the dockerd process hasn't started completely.

N/A

Additional environment details (AWS, VirtualBox, physical, etc.):

This is a VM instance on the Google Cloud Platform.

@thaJeztah
Copy link
Member

Haven't looked closely yet, but I see some entries about nvidia hooks in the stace output; did you previously have nvidia-docker installed?

newfstatat(AT_FDCWD, "/usr/local/sbin/nvidia-container-runtime-hook", 0xc0007962a8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/nvidia-container-runtime-hook", 0xc000796378, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/sbin/nvidia-container-runtime-hook", 0xc000796448, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/nvidia-container-runtime-hook", 0xc000796518, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sbin/nvidia-container-runtime-hook", 0xc0007965e8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/bin/nvidia-container-runtime-hook", 0xc0007966b8, 0) = -1 ENOENT (No such file or directory)

Could be a red herring, but just in case

@MarthinusKruger
Copy link
Author

Haven't looked closely yet, but I see some entries about nvidia hooks in the stace output; did you previously have nvidia-docker installed?

newfstatat(AT_FDCWD, "/usr/local/sbin/nvidia-container-runtime-hook", 0xc0007962a8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/nvidia-container-runtime-hook", 0xc000796378, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/sbin/nvidia-container-runtime-hook", 0xc000796448, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/nvidia-container-runtime-hook", 0xc000796518, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sbin/nvidia-container-runtime-hook", 0xc0007965e8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/bin/nvidia-container-runtime-hook", 0xc0007966b8, 0) = -1 ENOENT (No such file or directory)

Could be a red herring, but just in case

I haven't installed it no. Not sure if it might be seen as a package dependency? But I doubt it.

@thaJeztah
Copy link
Member

Might be a red herring then; possibly there's code to detect if it's installed

@akerouanton
Copy link
Member

@thaJeztah The stacktrace for goroutine 1 (executing daemon.NewDaemon()) looks like this:

goroutine 1 [chan receive, 5 minutes]:
github.com/docker/docker/vendor/github.com/godbus/dbus.(*Object).Call(...)
...
github.com/docker/docker/vendor/github.com/docker/libnetwork/iptables.FirewalldInit(...)
github.com/docker/docker/vendor/github.com/docker/libnetwork/iptables.initFirewalld()
github.com/docker/docker/vendor/github.com/docker/libnetwork/iptables.initDependencies()
...
github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/bridge.Init(...)
github.com/docker/docker/vendor/github.com/docker/libnetwork/drvregistry.(*DrvRegistry).AddDriver(...)
...
github.com/docker/docker/daemon.(*Daemon).initNetworkController(...)
...
github.com/docker/docker/daemon.NewDaemon(...)
...

(*Object).Call() seems to be a blocking call. And it looks like goroutine 102 is stuck while waiting for a dbus connection:

goroutine 102 [IO wait, 5 minutes]:
internal/poll.runtime_pollWait(...)
internal/poll.(*pollDesc).wait(...)
internal/poll.(*pollDesc).waitRead(...)
created by github.com/docker/docker/vendor/github.com/godbus/dbus.(*Conn).Auth
        /go/src/github.com/docker/docker/vendor/github.com/godbus/dbus/auth.go:118 +0x623

I think that's why the daemon is stuck in init phase. @MarthinusKruger Do you have dbus installed on your server?

@MarthinusKruger
Copy link
Author

@thaJeztah The stacktrace for goroutine 1 (executing daemon.NewDaemon()) looks like this:

goroutine 1 [chan receive, 5 minutes]:
github.com/docker/docker/vendor/github.com/godbus/dbus.(*Object).Call(...)
...
github.com/docker/docker/vendor/github.com/docker/libnetwork/iptables.FirewalldInit(...)
github.com/docker/docker/vendor/github.com/docker/libnetwork/iptables.initFirewalld()
github.com/docker/docker/vendor/github.com/docker/libnetwork/iptables.initDependencies()
...
github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/bridge.Init(...)
github.com/docker/docker/vendor/github.com/docker/libnetwork/drvregistry.(*DrvRegistry).AddDriver(...)
...
github.com/docker/docker/daemon.(*Daemon).initNetworkController(...)
...
github.com/docker/docker/daemon.NewDaemon(...)
...

(*Object).Call() seems to be a blocking call. And it looks like goroutine 102 is stuck while waiting for a dbus connection:

goroutine 102 [IO wait, 5 minutes]:
internal/poll.runtime_pollWait(...)
internal/poll.(*pollDesc).wait(...)
internal/poll.(*pollDesc).waitRead(...)
created by github.com/docker/docker/vendor/github.com/godbus/dbus.(*Conn).Auth
        /go/src/github.com/docker/docker/vendor/github.com/godbus/dbus/auth.go:118 +0x623

I think that's why the daemon is stuck in init phase. @MarthinusKruger Do you have dbus installed on your server?

It is installed on the server:

dpkg -l | grep dbus
dbus                                  1.10.28-0+deb9u1

ps -ef | grep dbus | grep -v grep
message+   559     1  0 08:51 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation

@akerouanton
Copy link
Member

akerouanton commented Jul 27, 2020

It is installed on the server

Is there any particular reason to have installed it? Otherwise I'd recommend to uninstall.

@MarthinusKruger
Copy link
Author

It is installed on the server

Is there any particular reason to have installed it? Otherwise I'd recommend to uninstall.

I'm not quite sure, I will confirm with our OS team and if we can uninstall it I will do so and try to start dockerd again and revert back with the new findings and logs.

@MarthinusKruger
Copy link
Author

It is installed on the server

Is there any particular reason to have installed it? Otherwise I'd recommend to uninstall.

I'm not quite sure, I will confirm with our OS team and if we can uninstall it I will do so and try to start dockerd again and revert back with the new findings and logs.

Just to update on this issue. It seems that when we restarted the dbus service the dockerd process could start up. But we started seeing some other OS-related issues (as well as issues with Docker containers starting up) so we will just go ahead and do a clean install again.

Thanks @akerouanton and @thaJeztah for your assistance! It really helped point us into the right direction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants