Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upstart: Don't emit "started" event until docker.sock is available #9287

Merged
merged 1 commit into from
Dec 16, 2014

Conversation

drothlis
Copy link
Contributor

Fixes #6647: Other upstart jobs that depend on docker by specifying
"start on started docker" would often start before the docker daemon was
ready, so they'd fail with "Cannot connect to the Docker daemon" or
"dial unix /var/run/docker.sock: no such file or directory".

This is because "docker -d" doesn't daemonize, it runs in the
foreground, so upstart can't know when the daemon is ready to receive
incoming connections. (Traditionally, a daemon will create all necessary
sockets and then fork to signal that it's ready; according to @tianon
this "isn't possible in Go"[1]. See also [2].)

Presumably this isn't a problem with systemd init with its socket
activation. The SysV init scripts may or may not suffer from this
problem but I have no motivation to fix them.

This commit adds a "post-start" stanza to the upstart configuration
that waits for the socket to be available. Upstart won't emit the
"started" event until the "post-start" script completes.[3]

Note that the system administrator might have specified a different path
for the socket, or a tcp socket instead, by customising
/etc/default/docker. In that case we don't try to figure out what the
new socket is, but at least we don't wait in vain for
/var/run/docker.sock to appear. If for some unforeseen reason we do end
up in an infinite loop, you'll be able to see the "Waiting for
/var/run/docker.sock" debug output in /var/log/upstart/docker.log.

I considered using inotifywait instead of sleep, but it isn't worth
the complexity & the extra dependency.

[1] #6647 (comment)
[2] https://code.google.com/p/go/issues/detail?id=227
[3] http://upstart.ubuntu.com/cookbook/#post-start

Signed-off-by: David Röthlisberger david@rothlis.net

@drothlis
Copy link
Contributor Author

@tianon and @jfrazelle (from docker/contrib/init/upstart/MAINTAINERS) might want to look at this.

@jessfraz
Copy link
Contributor

cool will try this out on my ubuntu machine with another job calling start on started docker

@jessfraz
Copy link
Contributor

works perfectly...
@tianon if you want a test file just to save you two secs, I used

start on started docker

pre-start script
    docker pull debian:jessie
end script

got a socket error with current upstart:

2014/11/24 15:17:59 Post http:///var/run/docker.sock/v1.15/images/create?fromImage=debian%3Ajessie: dial unix /var/run/docker.sock: no such file or directory
root@ubuntu-testing:~# cat /etc/init/dtest.conf 

and successfully got pull log with this patch.

@jessfraz
Copy link
Contributor

thanks for the fix!

@tianon
Copy link
Member

tianon commented Nov 25, 2014

I'm a little bit confused because we've had stuff like this before, but ultimately removed it thanks to #4168. I guess we just don't start listening on the socket early enough in the Daemon startup code?

@tianon
Copy link
Member

tianon commented Nov 25, 2014

See especially all the referenced issues/PRs starting at #4168 (reference)

@drothlis
Copy link
Contributor Author

@tianon: If the socket is created by the docker daemon itself (as opposed to injected by the process manager à la systemd) then there is always going to be a race condition, no matter how early the daemon creates the socket. It goes something like this:

a. upstart execs docker -d
b. upstart emits "started docker"
c. docker creates the socket
d. docker starts processing connections on the socket

"b" and "c" can happen in either order (it's a race condition).

If another upstart job starts after "b" but before "c", we have the defect that this pull request fixes.

#4168 seems to solve the problem where the other upstart job starts after "c" (but before "d").

Now I'm not discounting the possibility that there's a regression that causes "c" to happen later than it used to. However no one has any evidence for that. And even if it were the case, and it were fixed, the race condition that this pull request fixes still exists.

Note also that docker has never actually had this fix before; what you removed in #4799 was an example upstart configuration file in your documentation. Furthermore it was a workaround that needed to be added to every downstream upstart job that wants to start a docker container. This pull request adds a fix to the actual upstart configuration for the docker daemon that you ship in your official deb packages. Also note that the implementation in this pull request doesn't suffer from any of the following drawbacks of the example configuration removed in #4799:

  • New dependency on inotifywait.
  • Infinite loop if the system administrator overrides the socket in /etc/default/docker.
  • Unnecessary 2 second delay if the docker daemon creates the socket after the call to [ ! -e $FILE ] but before inotifywait -e create.

@jessfraz
Copy link
Contributor

@tianon wdyt, should we merge, I think even tho this is looping, it is better than inotifywait

@tianon
Copy link
Member

tianon commented Dec 16, 2014

Yeah, I guess that's fair. The only comment I'd have left is that I'd prefer we add a loop counter of some kind so that this doesn't go infinite if Docker fails to start. 😄

@drothlis
Copy link
Contributor Author

add a loop counter

Sure thing, I can do that. What's your preferred workflow: A separate commit, or can I squash and force-push to this PR?

@tianon
Copy link
Member

tianon commented Dec 16, 2014

Squash/amend and force push is perfect! :)

Fixes moby#6647: Other upstart jobs that depend on docker by specifying
"start on started docker" would often start before the docker daemon was
ready, so they'd fail with "Cannot connect to the Docker daemon" or
"dial unix /var/run/docker.sock: no such file or directory".

This is because "docker -d" doesn't daemonize, it runs in the
foreground, so upstart can't know when the daemon is ready to receive
incoming connections. (Traditionally, a daemon will create all necessary
sockets and then fork to signal that it's ready; according to @tianon
this "isn't possible in Go"[1]. See also [2].)

Presumably this isn't a problem with systemd init with its socket
activation. The SysV init scripts may or may not suffer from this
problem but I have no motivation to fix them.

This commit adds a "post-start" stanza to the upstart configuration
that waits for the socket to be available. Upstart won't emit the
"started" event until the "post-start" script completes.[3]

Note that the system administrator might have specified a different path
for the socket, or a tcp socket instead, by customising
/etc/default/docker. In that case we don't try to figure out what the
new socket is, but at least we don't wait in vain for
/var/run/docker.sock to appear.

If the main script (`docker -d`) fails to start, the `initctl status
$UPSTART_JOB | grep -q "stop/"` line ensures that we don't loop forever.
I stole this idea from Steve Langasek.[4]

If for some reason we *still* end up in an infinite loop --I guess
`docker -d` must have hung-- then at least we'll be able to see the
"Waiting for /var/run/docker.sock" debug output in
/var/log/upstart/docker.log.

I considered using inotifywait instead of sleep, but it isn't worth
the complexity & the extra dependency.

[1] moby#6647 (comment)
[2] https://code.google.com/p/go/issues/detail?id=227
[3] http://upstart.ubuntu.com/cookbook/#post-start
[4] https://lists.ubuntu.com/archives/upstart-devel/2013-April/002492.html

Signed-off-by: David Röthlisberger <david@rothlis.net>
@drothlis
Copy link
Contributor Author

I've added a check so that the post-start script doesn't loop forever if the
main upstart script (docker -d) fails. I squashed into my previous commit
(and rebased onto the current master) and force pushed. The diff is:

@@ -49,6 +49,7 @@ post-start script
        fi
        if ! printf "%s" "$DOCKER_OPTS" | grep -qE -e '-H|--host'; then
                while ! [ -e /var/run/docker.sock ]; do
+                       initctl status $UPSTART_JOB | grep -q "stop/" && exit 1
                        echo "Waiting for /var/run/docker.sock"
                        sleep 0.1
                done

I stole this idea from Steve Langasek:
https://lists.ubuntu.com/archives/upstart-devel/2013-April/002492.html
It works well.

@tianon
Copy link
Member

tianon commented Dec 16, 2014

Ah, Steve's a solid reference, especially for this kind of thing. Thanks for sharing the source of the wisdom, too! 👍

LGTM

@jessfraz
Copy link
Contributor

awesome sauce LGTM

jessfraz pushed a commit that referenced this pull request Dec 16, 2014
upstart: Don't emit "started" event until docker.sock is available
@jessfraz jessfraz merged commit ad8096a into moby:master Dec 16, 2014
@jperville
Copy link
Contributor

Nice work @drothlis . I wish my github notifications worked on my original issue, I only saw your PR after it got merged. Got no chance to help testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

upstart: docker started event fires before /var/run/docker.sock exists
4 participants