-
Notifications
You must be signed in to change notification settings - Fork 18.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
upstart: Don't emit "started" event until docker.sock is available #9287
Conversation
@tianon and @jfrazelle (from docker/contrib/init/upstart/MAINTAINERS) might want to look at this. |
cool will try this out on my ubuntu machine with another job calling start on started docker |
works perfectly... start on started docker
pre-start script
docker pull debian:jessie
end script got a socket error with current upstart:
and successfully got pull log with this patch. |
thanks for the fix! |
I'm a little bit confused because we've had stuff like this before, but ultimately removed it thanks to #4168. I guess we just don't start listening on the socket early enough in the Daemon startup code? |
See especially all the referenced issues/PRs starting at #4168 (reference) |
@tianon: If the socket is created by the docker daemon itself (as opposed to injected by the process manager à la systemd) then there is always going to be a race condition, no matter how early the daemon creates the socket. It goes something like this: a. upstart execs "b" and "c" can happen in either order (it's a race condition). If another upstart job starts after "b" but before "c", we have the defect that this pull request fixes. #4168 seems to solve the problem where the other upstart job starts after "c" (but before "d"). Now I'm not discounting the possibility that there's a regression that causes "c" to happen later than it used to. However no one has any evidence for that. And even if it were the case, and it were fixed, the race condition that this pull request fixes still exists. Note also that docker has never actually had this fix before; what you removed in #4799 was an example upstart configuration file in your documentation. Furthermore it was a workaround that needed to be added to every downstream upstart job that wants to start a docker container. This pull request adds a fix to the actual upstart configuration for the docker daemon that you ship in your official deb packages. Also note that the implementation in this pull request doesn't suffer from any of the following drawbacks of the example configuration removed in #4799:
|
@tianon wdyt, should we merge, I think even tho this is looping, it is better than |
Yeah, I guess that's fair. The only comment I'd have left is that I'd prefer we add a loop counter of some kind so that this doesn't go infinite if Docker fails to start. 😄 |
Sure thing, I can do that. What's your preferred workflow: A separate commit, or can I squash and force-push to this PR? |
Squash/amend and force push is perfect! :) |
Fixes moby#6647: Other upstart jobs that depend on docker by specifying "start on started docker" would often start before the docker daemon was ready, so they'd fail with "Cannot connect to the Docker daemon" or "dial unix /var/run/docker.sock: no such file or directory". This is because "docker -d" doesn't daemonize, it runs in the foreground, so upstart can't know when the daemon is ready to receive incoming connections. (Traditionally, a daemon will create all necessary sockets and then fork to signal that it's ready; according to @tianon this "isn't possible in Go"[1]. See also [2].) Presumably this isn't a problem with systemd init with its socket activation. The SysV init scripts may or may not suffer from this problem but I have no motivation to fix them. This commit adds a "post-start" stanza to the upstart configuration that waits for the socket to be available. Upstart won't emit the "started" event until the "post-start" script completes.[3] Note that the system administrator might have specified a different path for the socket, or a tcp socket instead, by customising /etc/default/docker. In that case we don't try to figure out what the new socket is, but at least we don't wait in vain for /var/run/docker.sock to appear. If the main script (`docker -d`) fails to start, the `initctl status $UPSTART_JOB | grep -q "stop/"` line ensures that we don't loop forever. I stole this idea from Steve Langasek.[4] If for some reason we *still* end up in an infinite loop --I guess `docker -d` must have hung-- then at least we'll be able to see the "Waiting for /var/run/docker.sock" debug output in /var/log/upstart/docker.log. I considered using inotifywait instead of sleep, but it isn't worth the complexity & the extra dependency. [1] moby#6647 (comment) [2] https://code.google.com/p/go/issues/detail?id=227 [3] http://upstart.ubuntu.com/cookbook/#post-start [4] https://lists.ubuntu.com/archives/upstart-devel/2013-April/002492.html Signed-off-by: David Röthlisberger <david@rothlis.net>
41a4db4
to
f42c0a5
Compare
I've added a check so that the post-start script doesn't loop forever if the
I stole this idea from Steve Langasek: |
Ah, Steve's a solid reference, especially for this kind of thing. Thanks for sharing the source of the wisdom, too! 👍 LGTM |
awesome sauce LGTM |
upstart: Don't emit "started" event until docker.sock is available
Nice work @drothlis . I wish my github notifications worked on my original issue, I only saw your PR after it got merged. Got no chance to help testing. |
Fixes #6647: Other upstart jobs that depend on docker by specifying
"start on started docker" would often start before the docker daemon was
ready, so they'd fail with "Cannot connect to the Docker daemon" or
"dial unix /var/run/docker.sock: no such file or directory".
This is because "docker -d" doesn't daemonize, it runs in the
foreground, so upstart can't know when the daemon is ready to receive
incoming connections. (Traditionally, a daemon will create all necessary
sockets and then fork to signal that it's ready; according to @tianon
this "isn't possible in Go"[1]. See also [2].)
Presumably this isn't a problem with systemd init with its socket
activation. The SysV init scripts may or may not suffer from this
problem but I have no motivation to fix them.
This commit adds a "post-start" stanza to the upstart configuration
that waits for the socket to be available. Upstart won't emit the
"started" event until the "post-start" script completes.[3]
Note that the system administrator might have specified a different path
for the socket, or a tcp socket instead, by customising
/etc/default/docker. In that case we don't try to figure out what the
new socket is, but at least we don't wait in vain for
/var/run/docker.sock to appear. If for some unforeseen reason we do end
up in an infinite loop, you'll be able to see the "Waiting for
/var/run/docker.sock" debug output in /var/log/upstart/docker.log.
I considered using inotifywait instead of sleep, but it isn't worth
the complexity & the extra dependency.
[1] #6647 (comment)
[2] https://code.google.com/p/go/issues/detail?id=227
[3] http://upstart.ubuntu.com/cookbook/#post-start
Signed-off-by: David Röthlisberger david@rothlis.net