upstart: Don't emit "started" event until docker.sock is available #9287

drothlis · 2014-11-22T10:17:46Z

Fixes #6647: Other upstart jobs that depend on docker by specifying
"start on started docker" would often start before the docker daemon was
ready, so they'd fail with "Cannot connect to the Docker daemon" or
"dial unix /var/run/docker.sock: no such file or directory".

This is because "docker -d" doesn't daemonize, it runs in the
foreground, so upstart can't know when the daemon is ready to receive
incoming connections. (Traditionally, a daemon will create all necessary
sockets and then fork to signal that it's ready; according to @tianon
this "isn't possible in Go"[1]. See also [2].)

Presumably this isn't a problem with systemd init with its socket
activation. The SysV init scripts may or may not suffer from this
problem but I have no motivation to fix them.

This commit adds a "post-start" stanza to the upstart configuration
that waits for the socket to be available. Upstart won't emit the
"started" event until the "post-start" script completes.[3]

Note that the system administrator might have specified a different path
for the socket, or a tcp socket instead, by customising
/etc/default/docker. In that case we don't try to figure out what the
new socket is, but at least we don't wait in vain for
/var/run/docker.sock to appear. If for some unforeseen reason we do end
up in an infinite loop, you'll be able to see the "Waiting for
/var/run/docker.sock" debug output in /var/log/upstart/docker.log.

I considered using inotifywait instead of sleep, but it isn't worth
the complexity & the extra dependency.

[1] #6647 (comment)
[2] https://code.google.com/p/go/issues/detail?id=227
[3] http://upstart.ubuntu.com/cookbook/#post-start

Signed-off-by: David Röthlisberger david@rothlis.net

drothlis · 2014-11-22T10:21:19Z

@tianon and @jfrazelle (from docker/contrib/init/upstart/MAINTAINERS) might want to look at this.

jessfraz · 2014-11-22T17:24:11Z

cool will try this out on my ubuntu machine with another job calling start on started docker

jessfraz · 2014-11-24T20:19:48Z

works perfectly...
@tianon if you want a test file just to save you two secs, I used

start on started docker

pre-start script
    docker pull debian:jessie
end script

got a socket error with current upstart:

2014/11/24 15:17:59 Post http:///var/run/docker.sock/v1.15/images/create?fromImage=debian%3Ajessie: dial unix /var/run/docker.sock: no such file or directory
root@ubuntu-testing:~# cat /etc/init/dtest.conf

and successfully got pull log with this patch.

jessfraz · 2014-11-24T20:37:50Z

thanks for the fix!

tianon · 2014-11-25T22:00:49Z

I'm a little bit confused because we've had stuff like this before, but ultimately removed it thanks to #4168. I guess we just don't start listening on the socket early enough in the Daemon startup code?

tianon · 2014-11-25T22:02:02Z

See especially all the referenced issues/PRs starting at #4168 (reference)

drothlis · 2014-11-26T08:58:20Z

@tianon: If the socket is created by the docker daemon itself (as opposed to injected by the process manager à la systemd) then there is always going to be a race condition, no matter how early the daemon creates the socket. It goes something like this:

a. upstart execs docker -d
b. upstart emits "started docker"
c. docker creates the socket
d. docker starts processing connections on the socket

"b" and "c" can happen in either order (it's a race condition).

If another upstart job starts after "b" but before "c", we have the defect that this pull request fixes.

#4168 seems to solve the problem where the other upstart job starts after "c" (but before "d").

Now I'm not discounting the possibility that there's a regression that causes "c" to happen later than it used to. However no one has any evidence for that. And even if it were the case, and it were fixed, the race condition that this pull request fixes still exists.

Note also that docker has never actually had this fix before; what you removed in #4799 was an example upstart configuration file in your documentation. Furthermore it was a workaround that needed to be added to every downstream upstart job that wants to start a docker container. This pull request adds a fix to the actual upstart configuration for the docker daemon that you ship in your official deb packages. Also note that the implementation in this pull request doesn't suffer from any of the following drawbacks of the example configuration removed in #4799:

New dependency on inotifywait.
Infinite loop if the system administrator overrides the socket in /etc/default/docker.
Unnecessary 2 second delay if the docker daemon creates the socket after the call to [ ! -e $FILE ] but before inotifywait -e create.

jessfraz · 2014-12-16T20:19:07Z

@tianon wdyt, should we merge, I think even tho this is looping, it is better than inotifywait

tianon · 2014-12-16T20:29:00Z

Yeah, I guess that's fair. The only comment I'd have left is that I'd prefer we add a loop counter of some kind so that this doesn't go infinite if Docker fails to start. 😄

drothlis · 2014-12-16T20:57:42Z

add a loop counter

Sure thing, I can do that. What's your preferred workflow: A separate commit, or can I squash and force-push to this PR?

tianon · 2014-12-16T20:58:38Z

Squash/amend and force push is perfect! :)

@tianon

Fixes moby#6647: Other upstart jobs that depend on docker by specifying "start on started docker" would often start before the docker daemon was ready, so they'd fail with "Cannot connect to the Docker daemon" or "dial unix /var/run/docker.sock: no such file or directory". This is because "docker -d" doesn't daemonize, it runs in the foreground, so upstart can't know when the daemon is ready to receive incoming connections. (Traditionally, a daemon will create all necessary sockets and then fork to signal that it's ready; according to @tianon this "isn't possible in Go"[1]. See also [2].) Presumably this isn't a problem with systemd init with its socket activation. The SysV init scripts may or may not suffer from this problem but I have no motivation to fix them. This commit adds a "post-start" stanza to the upstart configuration that waits for the socket to be available. Upstart won't emit the "started" event until the "post-start" script completes.[3] Note that the system administrator might have specified a different path for the socket, or a tcp socket instead, by customising /etc/default/docker. In that case we don't try to figure out what the new socket is, but at least we don't wait in vain for /var/run/docker.sock to appear. If the main script (`docker -d`) fails to start, the `initctl status $UPSTART_JOB | grep -q "stop/"` line ensures that we don't loop forever. I stole this idea from Steve Langasek.[4] If for some reason we *still* end up in an infinite loop --I guess `docker -d` must have hung-- then at least we'll be able to see the "Waiting for /var/run/docker.sock" debug output in /var/log/upstart/docker.log. I considered using inotifywait instead of sleep, but it isn't worth the complexity & the extra dependency. [1] moby#6647 (comment) [2] https://code.google.com/p/go/issues/detail?id=227 [3] http://upstart.ubuntu.com/cookbook/#post-start [4] https://lists.ubuntu.com/archives/upstart-devel/2013-April/002492.html Signed-off-by: David Röthlisberger <david@rothlis.net>

drothlis · 2014-12-16T21:31:08Z

I've added a check so that the post-start script doesn't loop forever if the
main upstart script (docker -d) fails. I squashed into my previous commit
(and rebased onto the current master) and force pushed. The diff is:

@@ -49,6 +49,7 @@ post-start script
        fi
        if ! printf "%s" "$DOCKER_OPTS" | grep -qE -e '-H|--host'; then
                while ! [ -e /var/run/docker.sock ]; do
+                       initctl status $UPSTART_JOB | grep -q "stop/" && exit 1
                        echo "Waiting for /var/run/docker.sock"
                        sleep 0.1
                done

I stole this idea from Steve Langasek:
https://lists.ubuntu.com/archives/upstart-devel/2013-April/002492.html
It works well.

tianon · 2014-12-16T22:03:51Z

Ah, Steve's a solid reference, especially for this kind of thing. Thanks for sharing the source of the wisdom, too! 👍

LGTM

jessfraz · 2014-12-16T22:14:58Z

awesome sauce LGTM

upstart: Don't emit "started" event until docker.sock is available

jperville · 2014-12-17T01:42:04Z

Nice work @drothlis . I wish my github notifications worked on my original issue, I only saw your PR after it got merged. Got no chance to help testing.

drothlis force-pushed the fix-upstart-race-condition branch from 41a4db4 to f42c0a5 Compare December 16, 2014 21:30

jessfraz pushed a commit that referenced this pull request Dec 16, 2014

Merge pull request #9287 from drothlis/fix-upstart-race-condition

ad8096a

upstart: Don't emit "started" event until docker.sock is available

jessfraz merged commit ad8096a into moby:master Dec 16, 2014

SvenDowideit mentioned this pull request Jan 27, 2015

Post 1.4.1 docs update 2 #9739

Merged

hacfi mentioned this pull request Mar 5, 2015

Docker 1.5.0 deb package installation fails sometimes #11167

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

upstart: Don't emit "started" event until docker.sock is available #9287

upstart: Don't emit "started" event until docker.sock is available #9287

drothlis commented Nov 22, 2014

drothlis commented Nov 22, 2014

jessfraz commented Nov 22, 2014

jessfraz commented Nov 24, 2014

jessfraz commented Nov 24, 2014

tianon commented Nov 25, 2014

tianon commented Nov 25, 2014

drothlis commented Nov 26, 2014

jessfraz commented Dec 16, 2014

tianon commented Dec 16, 2014

drothlis commented Dec 16, 2014

tianon commented Dec 16, 2014

drothlis commented Dec 16, 2014

tianon commented Dec 16, 2014

jessfraz commented Dec 16, 2014

jperville commented Dec 17, 2014

upstart: Don't emit "started" event until docker.sock is available #9287

upstart: Don't emit "started" event until docker.sock is available #9287

Conversation

drothlis commented Nov 22, 2014

drothlis commented Nov 22, 2014

jessfraz commented Nov 22, 2014

jessfraz commented Nov 24, 2014

jessfraz commented Nov 24, 2014

tianon commented Nov 25, 2014

tianon commented Nov 25, 2014

drothlis commented Nov 26, 2014

jessfraz commented Dec 16, 2014

tianon commented Dec 16, 2014

drothlis commented Dec 16, 2014

tianon commented Dec 16, 2014

drothlis commented Dec 16, 2014

tianon commented Dec 16, 2014

jessfraz commented Dec 16, 2014

jperville commented Dec 17, 2014