upstart: docker started event fires before /var/run/docker.sock exists #6647

Closed
jperville opened this issue Jun 24, 2014 · 11 comments · Fixed by #9287

Comments

@jperville
Contributor

I am running lxc-docker version 1.0.1 on an ubuntu 14.04 amd64 host.

My upstart scripts randomly (but very often) fail to restart their containers after booting the docker host. The following message can be found in the upstart log:

Cannot connect to the Docker daemon. Is 'docker -d' running on this host?
2014/06/24 15:39:22 Error: failed to start one or more containers
Cannot connect to the Docker daemon. Is 'docker -d' running on this host?
2014/06/24 15:39:22 Error: failed to wait one or more containers

This issue should have been fixed by #4168, which made docker listen on its socket as early as possible.

Is there a way to have the docker upstart script wait for the socket to appear before emitting the started event that the container services rely on?

Reproducing the issue

Boot a VM from the following Vagrantfile:

Vagrant.configure('2') do |config|

  config.vm.define :ubuntu1404 do |ubuntu1404|
    ubuntu1404.vm.box      = 'opscode-ubuntu-14.04'
    ubuntu1404.vm.box_url  = 'http://opscode-vm-bento.s3.amazonaws.com/vagrant/virtualbox/opscode_ubuntu-14.04_chef-provisionerless.box'
    ubuntu1404.vm.hostname = "docker-ubuntu-1404"
  end

  config.vm.provider 'virtualbox' do |vbox|
    vbox.customize ['modifyvm', :id, '--memory', 1024]
  end
end

Then vagrant ssh into the VM and install Docker with: wget -qO- https://get.docker.io/ubuntu/ | sudo bash -s

Create the following upstart service, save it as /etc/init/docker-socket.conf:

# vi /etc/init/docker-socket.conf 
description "Reproduces docker socket issue"
author "Julien Pervillé"

start on filesystem and started docker
stop on runlevel [!2345]

script
  echo "`date '+%Y/%m/%d %H:%M:%S'` NOW `ls -l /var/run/docker.sock 2>&1`"
end script

pre-start script
  # Log whether the docker socket exists when pre-start runs.
  echo "`date '+%Y/%m/%d %H:%M:%S'` PRE `ls -l /var/run/docker.sock 2>&1`"
end script

Run shutdown -r now, log back into the VM and check the contents of /var/log/upstart/docker-socket.log:

vagrant@docker-ubuntu-1404:~$ sudo cat /var/log/upstart/docker-socket.log
2014/06/24 17:06:56 PRE ls: cannot access /var/run/docker.sock: No such file or directory
2014/06/24 17:06:56 NOW ls: cannot access /var/run/docker.sock: No such file or directory

If the docker socket is not present when this test upstart service runs, then container services that start on started docker will also fail, because the docker client cannot communicate with the docker daemon listening on that socket.

@tiborvass
Contributor

ping @tianon ?

@tianon
Member

tianon commented Jun 24, 2014

The only way I know of that we could really fix this would be having Docker itself daemonize after it creates the socket, which currently isn't possible in Go. There might be some upstart-specific signal magic we could do, but I'm not fluent enough in upstart magic to say for sure there.

@jperville
Contributor Author

I see several ugly workarounds:

  • have my upstart services sleep for a second in 'pre-start'; this should be enough for most cases
  • have my upstart services wait for the socket to be available in pre-start (e.g. with a loop or inotifywait)
  • wrap /usr/bin/docker to retry with exponential backoff when the action is start, run, build, etc. (see the sketch after this list)

The latter workaround would probably be the most efficient if the retry code were included in the docker binary itself and active for client commands only. For now, I will experiment with the first workaround and submit a pull request to bflad/chef-docker.
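
For illustration, the third workaround could look roughly like the sketch below: a hypothetical wrapper installed ahead of the real client on the PATH. The wrapper path, the list of retried actions, the retry count and the delays are all illustrative here, not anything the docker packages ship.

#!/bin/sh
# Hypothetical /usr/local/bin/docker wrapper around the real client:
# retry with exponential backoff so that early "docker start/run/build"
# calls survive a daemon that is not yet listening on its socket.
REAL_DOCKER=/usr/bin/docker

case "$1" in
  start|run|build|wait) ;;            # retry these client actions
  *) exec "$REAL_DOCKER" "$@" ;;      # pass everything else straight through
esac

delay=1
status=1
for attempt in 1 2 3 4 5; do
  "$REAL_DOCKER" "$@" && exit 0
  status=$?
  echo "docker exited with $status; retrying in ${delay}s (attempt $attempt/5)" >&2
  sleep "$delay"
  delay=$((delay * 2))
done
exit "$status"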

@jperville
Contributor Author

The workaround of waiting for the socket to be available in 'pre-start' seems to do the trick. See below for a sample service with the workaround.

description "Docker service for my-volatile-app"
author "Docker Chef Cookbook"

start on filesystem and started docker
stop on runlevel [!2345]

# We don't want to TERM the `docker wait` process so we fake the signal 
# we send to it. The pre-stop script issues the `docker stop` command
# which causes the `docker wait` process to exit
kill signal CONT

# Due to a bug in upstart we need to set the modes we consider
# successful exits: https://bugs.launchpad.net/upstart/+bug/568288
normal exit 0 CONT

respawn

exec /usr/bin/docker wait my-volatile-app

pre-start script
  # Wait for docker to finish starting up first.
  FILE=/var/run/docker.sock
  while [ ! -e $FILE ] ; do
    inotifywait -t 2 -e create $(dirname $FILE)
  done
  /usr/bin/docker start my-volatile-app || true
end script

pre-stop script
  /usr/bin/docker stop -t 60 my-volatile-app
end script
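
Assuming the job above is saved as /etc/init/my-volatile-app.conf (the filename is hypothetical and must match the service name used below), it can be exercised by hand:

# names are the hypothetical ones from the example job above
sudo start my-volatile-app     # normally "started docker" triggers this
sudo status my-volatile-app    # should report start/running while `docker wait` blocks
sudo tail /var/log/upstart/my-volatile-app.log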

@tianon
Member

tianon commented Jun 25, 2014

We actually used to document the inotifywait workaround specifically to solve this issue, but then the Docker daemon started listening on the socket earlier so we removed it. Maybe we should add it back until we can daemonize or figure out another solution?

/cc @crosbymichael

@jperville
Contributor Author

Yes, I found the older documentation with that workaround. Even though the daemon now listens on the socket earlier, it is still not early enough for upstart scripts (at least in those scripts we can work around the issue). Also, I encountered the bug quite a lot while trying to reproduce #6673 (basically by spamming sudo service docker restart && docker ps in a shell).

@bflad
Contributor

bflad commented Jun 26, 2014

I took a deeper look into some Upstart documentation and still didn't see anything that might help. Please let me know if you need additional help testing, but I'm +1 for inotifywait (either in the docker Upstart or in dependent containers) as a workaround for now.

@tianon
Member

tianon commented Jul 1, 2014

/cc @SvenDowideit what do you think? (since you were involved in the previous removal of the inotifywait hacks)

@SvenDowideit
Contributor

@tianon - isn't this an @alexlarsson type thing?

@tianon
Member

tianon commented Jul 1, 2014

Naw, Upstart is an Ubuntu thing. ;)

@jpanganiban

👍 on this.

drothlis added a commit to drothlis/docker that referenced this issue Dec 16, 2014
Fixes moby#6647: Other upstart jobs that depend on docker by specifying
"start on started docker" would often start before the docker daemon was
ready, so they'd fail with "Cannot connect to the Docker daemon" or
"dial unix /var/run/docker.sock: no such file or directory".

This is because "docker -d" doesn't daemonize, it runs in the
foreground, so upstart can't know when the daemon is ready to receive
incoming connections. (Traditionally, a daemon will create all necessary
sockets and then fork to signal that it's ready; according to @tianon
this "isn't possible in Go"[1]. See also [2].)

Presumably this isn't a problem with systemd init with its socket
activation. The SysV init scripts may or may not suffer from this
problem but I have no motivation to fix them.

This commit adds a "post-start" stanza to the upstart configuration
that waits for the socket to be available. Upstart won't emit the
"started" event until the "post-start" script completes.[3]

Note that the system administrator might have specified a different path
for the socket, or a tcp socket instead, by customising
/etc/default/docker. In that case we don't try to figure out what the
new socket is, but at least we don't wait in vain for
/var/run/docker.sock to appear.

If the main script (`docker -d`) fails to start, the `initctl status
$UPSTART_JOB | grep -q "stop/"` line ensures that we don't loop forever.
I stole this idea from Steve Langasek.[4]

If for some reason we *still* end up in an infinite loop --I guess
`docker -d` must have hung-- then at least we'll be able to see the
"Waiting for /var/run/docker.sock" debug output in
/var/log/upstart/docker.log.

I considered using inotifywait instead of sleep, but it isn't worth
the complexity & the extra dependency.

[1] moby#6647 (comment)
[2] https://code.google.com/p/go/issues/detail?id=227
[3] http://upstart.ubuntu.com/cookbook/#post-start
[4] https://lists.ubuntu.com/archives/upstart-devel/2013-April/002492.html

Signed-off-by: David Röthlisberger <david@rothlis.net>
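
For reference, a post-start stanza along the lines described in the commit message might look roughly like the sketch below (this is not necessarily the exact code merged in #9287, and the sleep interval is illustrative):

post-start script
  # Wait for the daemon's unix socket before upstart emits "started docker",
  # so jobs using "start on started docker" can connect right away. If
  # /etc/default/docker points the daemon at a different -H endpoint, it
  # would be safer to skip this wait entirely, as the commit message notes.
  while [ ! -e /var/run/docker.sock ]; do
    # Give up if the main "docker -d" process has already failed.
    initctl status $UPSTART_JOB | grep -q "stop/" && exit 1
    echo "Waiting for /var/run/docker.sock"
    sleep 0.1
  done
end script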