New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto restart containers based on restart policy #7414

Merged
merged 18 commits into from Aug 13, 2014

Conversation

Projects
None yet
@crosbymichael
Member

crosbymichael commented Aug 5, 2014

This implements the proposal in #7226

Closes #7226
Closes #7521

Restart Policies

Using the --restart flag on docker run you can specify a restart policy for
how a container should or should not be restarted on exit.

no - Do not restart the container when it exits.

on-failure - Restart the container only if it exits with a non zero exit status.

always - Always restart the container regardless of the exit status.

You can also specify the maximum amount of times docker will try to restart the
container when using the on-failure policy. The default is that docker will try forever to restart the container.

$ sudo docker run --restart=always redis

This will run the redis container with a restart policy of always so that if
the container exits, docker will restart it.

$ sudo docker run --restart=on-failure:10 redis

This will run the redis container with a restart policy of on-failure and a
maximum restart count of 10. If the redis container exits with a non-zero exit
status more than 10 times in a row docker will abort trying to restart the container.

@crosbymichael

This comment has been minimized.

Member

crosbymichael commented Aug 5, 2014

I need some help writing the tests because I'm not sure of the best way to grep for running containers and the data. I guess I can write some unit tests and then use inspect in the integration tests.

@crosbymichael

This comment has been minimized.

Member

crosbymichael commented Aug 5, 2014

The best part about this feature being in docker is that we can restart your processes and your IP, Ports, etc will not change. We also skip the unmount/mount of the rootfs so it's much quicker than if an external process supervisor is taking care of things. It has to totally stop and deallocate any resources then start/mount everything back again. With this approach it's just rerunning the processes with the rootfs and networking all setup

@SvenDowideit

This comment has been minimized.

Contributor

SvenDowideit commented Aug 5, 2014

Docs LGTM - though some needs to be added to the man pages too.

** no ** - Do not restart the container when it exits.
** on-failure ** - Restart the container only if it exits with a non zero exit status.

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014

Contributor

status should be code.

** on-failure ** - Restart the container only if it exits with a non zero exit status.
** always ** - Always restart the container reguardless of the exit status.

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014

Contributor

status == code

This comment has been minimized.

@crosbymichael

crosbymichael Aug 5, 2014

Member

Not really, I believe exit status is the correct term to use here because of signals

** always ** - Always restart the container reguardless of the exit status.
You can also specify the maximum amount of times docker will try to restart the

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014

Contributor

Docker

** always ** - Always restart the container reguardless of the exit status.
You can also specify the maximum amount of times docker will try to restart the
container when using the ** on-failure ** policy. The default is that docker will try forever to restart the container.

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014

Contributor

Docker

$ sudo docker run --restart=always redis
This will run the redis container with a restart policy of ** always ** so that if
the container exits, docker will restart it.

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014

Contributor

Docker

$ sudo docker run --restart=always redis
This will run the redis container with a restart policy of ** always ** so that if

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014

Contributor

redis

$ sudo docker run --restart=on-failure:10 redis
This will run the redis container with a restart policy of ** on-failure ** and a

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014

Contributor

redis

$ sudo docker run --restart=on-failure:10 redis
This will run the redis container with a restart policy of ** on-failure ** and a
maximum restart count of 10. If the redis container exits with a non-zero exit

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014

Contributor

redis

This will run the redis container with a restart policy of ** on-failure ** and a
maximum restart count of 10. If the redis container exits with a non-zero exit
status more than 10 times in a row docker will abort trying to restart the container.

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014

Contributor

status == code

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014

Contributor

Docker

@jamtur01

This comment has been minimized.

Contributor

jamtur01 commented Aug 5, 2014

Some comments on the Docs.

@crosbymichael

This comment has been minimized.

Member

crosbymichael commented Aug 6, 2014

I updated the PR to only restart containers at daemon load when the policy is always. This is inline with the other refactoring and is straight forward with no crazy race hacks or anything.

@progrium

This comment has been minimized.

Contributor

progrium commented Aug 7, 2014

Yes yes yes yes.

@thockin

This comment has been minimized.

Contributor

thockin commented Aug 7, 2014

To be clear, a container with a policy of "on-failure" will NOT be
restarted when the daemon dies?

If that's the case, then we still need to wrap the whole thing in a restart
loop, which is unfortunate. Or am I misunderstanding?

On Wed, Aug 6, 2014 at 11:09 AM, Michael Crosby notifications@github.com
wrote:

I updated the PR to only restart containers at daemon load when the policy
is always. This is inline with the other refactoring and is straight
forward with no crazy race hacks or anything.

Reply to this email directly or view it on GitHub
#7414 (comment).

@crosbymichael

This comment has been minimized.

Member

crosbymichael commented Aug 7, 2014

@thockin yes that is correct. Is there a use case that if the daemon dies or is stopped cleanly that you would want container's marked as on-failure to restart?

@thockin

This comment has been minimized.

Contributor

thockin commented Aug 7, 2014

If the user has a container that they asked to run "until success" (which
is just another way to say restart on failure), then this behavior does not
fulfill the contract. Presumably their container is resilient to being
restarted, so what reason do we have for not doing what the user asked?

They asked us to run until they "succeed", but we didn't. And worse, we
failed to liv eup to our promise for no reason that has ANYTHING to do with
the user. That's going to be a bad experience.

On Wed, Aug 6, 2014 at 8:28 PM, Michael Crosby notifications@github.com
wrote:

@thockin https://github.com/thockin yes that is correct. Is there a use
case that if the daemon dies or is stopped cleanly that you would want
container's marked as on-failure to restart?

Reply to this email directly or view it on GitHub
#7414 (comment).

@crosbymichael

This comment has been minimized.

Member

crosbymichael commented Aug 7, 2014

@thockin Err, what happens if the container has run and exited successfully. Then someone restarts the docker daemon?

@crosbymichael

This comment has been minimized.

Member

crosbymichael commented Aug 7, 2014

@thockin I edited my last post really quick because I had an additional question

@thockin

This comment has been minimized.

Contributor

thockin commented Aug 7, 2014

If the container has run and exited successfully, then it is done and must not be restarted (or else we violate the contract in the opposite direction than before). Docker should be able to GUARANTEE receipt of the termination status. If it can't, that should be fixed.

There's still a bit of a race around what happens if the docker daemon dies with signal 9 - any running containers actually keep running and docker, upon restart, is not clever enough to find them. This should be fixed, too, but is a small enough problem to not hold up general restarts, IMO.

I can talk more about how we have solved some of these guarantees internally, if it helps, but we've made some different design decisions.

@crosbymichael

This comment has been minimized.

Member

crosbymichael commented Aug 7, 2014

@thockin sound fair, it was the daemon getting SIGKILL'd that I was worried about.

Anyways, I updated the PR to restart containers on load if they have a policy of on-failure and the exit status was non-zero.

@thockin

This comment has been minimized.

Contributor

thockin commented Aug 7, 2014

Awesome. Glad you could be convinced.

On Wed, Aug 6, 2014 at 9:17 PM, Michael Crosby notifications@github.com
wrote:

@thockin https://github.com/thockin sound fair, it was the daemon
getting SIGKILL'd that I was worried about.

Anyways, I updated the PR to restart containers on load if they have a
policy of on-failure and the exit status was non-zero.

Reply to this email directly or view it on GitHub
#7414 (comment).

@crosbymichael

This comment has been minimized.

Member

crosbymichael commented Aug 11, 2014

Ok, so I updated this PR to have a new state of Restarting. We need this new state so that when you run docker ps when a container is restarting it shows up in the list as... restarting. Because we use an increasing time increment between restarts, a container could be waiting on it's next restart where it's in a little bit of a limbo. The user may see that this container has been failing to start and issue a stop to the container so that it does not try again.

You can test this by running a container with a restart policy of always or on-failure.

docker run -d --restart always busybox false

Without the restarting state there would be no way to stop the container without stopping the daemon. With the restarting state you can see in docker ps that the container is restarting, get the id or name of the container then run docker stop <id> to ensure that the container does not try to restart again.

@vieux

This comment has been minimized.

Collaborator

vieux commented Aug 11, 2014

Works great just one comment, when you stop/kill a container in the restarting state. It was to wait the end of the timeout to actually stops.

@crosbymichael

This comment has been minimized.

Member

crosbymichael commented Aug 11, 2014

@vieux

Check out the last commit and please test again. This should be fixed and you should not see a pause after asking docker to stop your container.

@vieux

This comment has been minimized.

Collaborator

vieux commented Aug 12, 2014

LGTM

crosbymichael added some commits Aug 11, 2014

Add Restarting state when docker is handling the restart of containers
Signed-off-by: Michael Crosby <michael@docker.com>
Honor the restarting state in Stop
Signed-off-by: Michael Crosby <michael@docker.com>
Improve wait during restart
We need to do this so that when a user asks docker to stop the container
and it is currently in the restart loop we don't want to have to wait
for the duration of the restart time increment before ack. the stop.

Signed-off-by: Michael Crosby <michael@docker.com>
Reguardless of success reset time increment
Reset the time increment if the container's execution time is greater
than 10s or else as a container runs and is restarted the time will grow
overtime.

Signed-off-by: Michael Crosby <michael@docker.com>
Rebased changes to return on first start's error
Signed-off-by: Michael Crosby <michael@docker.com>
Update based on comments from the code review
Signed-off-by: Michael Crosby <michael@docker.com>
Deprecate --restart on the daemon
Signed-off-by: Michael Crosby <michael@docker.com>
@crosbymichael

This comment has been minimized.

Member

crosbymichael commented Aug 13, 2014

rebased again

crosbymichael added a commit that referenced this pull request Aug 13, 2014

Merge pull request #7414 from crosbymichael/auto-restart
Auto restart containers based on restart policy

@crosbymichael crosbymichael merged commit b95f6c1 into moby:master Aug 13, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details

@crosbymichael crosbymichael deleted the crosbymichael:auto-restart branch Aug 13, 2014

fsouza added a commit to fsouza/go-dockerclient that referenced this pull request Aug 14, 2014

@prologic

This comment has been minimized.

Contributor

prologic commented Aug 14, 2014

This is amazing guys :)

@discordianfish

This comment has been minimized.

Contributor

discordianfish commented Aug 15, 2014

Yay! Really, really happy about this guys. The feature I missed the most :)

@dexterbt1

This comment has been minimized.

dexterbt1 commented Aug 15, 2014

Wow. Can't wait to test this out in the next release.

@borjaburgos

This comment has been minimized.

borjaburgos commented Aug 15, 2014

Happy to see Auto-restart / Crash-recovery making it to the core. I look forward to testing it out!

@gorsuch

This comment has been minimized.

Contributor

gorsuch commented Aug 20, 2014

<3

@WIZARD-CXY

This comment has been minimized.

WIZARD-CXY commented Nov 28, 2016

@crosbymichael @vieux
I am wondering why you set 10s as the reset time.
If we have a container which runs this script:
#!/bin/bash

sleep 11
exit 1

This container always restart with no delay....
I just uses a similar container that uses 10s to init the server, but fails afterwards, it gets restarted every time without delay..... And I believe the document not mention this 10s stuff.

@thaJeztah

This comment has been minimized.

Member

thaJeztah commented Nov 28, 2016

@WIZARD-CXY please don't use a very old, merged PR to report issues. Open a new issue instead with your use case, and/or an issue to report possible enhancements to the documentation.

@WIZARD-CXY

This comment has been minimized.

WIZARD-CXY commented Nov 29, 2016

@thaJeztah sorry, I just use 1.10.3 and use git blame to find this pr from the code. I'm not quite sure if this still happens in the upstream.Maybe I will open a new issue later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment