Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto restart containers based on restart policy #7414

Merged
merged 18 commits into from Aug 13, 2014

Conversation

@crosbymichael
Copy link
Contributor

@crosbymichael crosbymichael commented Aug 5, 2014

This implements the proposal in #7226

Closes #7226
Closes #7521

Restart Policies

Using the --restart flag on docker run you can specify a restart policy for
how a container should or should not be restarted on exit.

no - Do not restart the container when it exits.

on-failure - Restart the container only if it exits with a non zero exit status.

always - Always restart the container regardless of the exit status.

You can also specify the maximum amount of times docker will try to restart the
container when using the on-failure policy. The default is that docker will try forever to restart the container.

$ sudo docker run --restart=always redis

This will run the redis container with a restart policy of always so that if
the container exits, docker will restart it.

$ sudo docker run --restart=on-failure:10 redis

This will run the redis container with a restart policy of on-failure and a
maximum restart count of 10. If the redis container exits with a non-zero exit
status more than 10 times in a row docker will abort trying to restart the container.

@crosbymichael
Copy link
Contributor Author

@crosbymichael crosbymichael commented Aug 5, 2014

I need some help writing the tests because I'm not sure of the best way to grep for running containers and the data. I guess I can write some unit tests and then use inspect in the integration tests.

@crosbymichael
Copy link
Contributor Author

@crosbymichael crosbymichael commented Aug 5, 2014

The best part about this feature being in docker is that we can restart your processes and your IP, Ports, etc will not change. We also skip the unmount/mount of the rootfs so it's much quicker than if an external process supervisor is taking care of things. It has to totally stop and deallocate any resources then start/mount everything back again. With this approach it's just rerunning the processes with the rootfs and networking all setup

@SvenDowideit
Copy link
Contributor

@SvenDowideit SvenDowideit commented Aug 5, 2014

Docs LGTM - though some needs to be added to the man pages too.


** no ** - Do not restart the container when it exits.

** on-failure ** - Restart the container only if it exits with a non zero exit status.

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014
Contributor

status should be code.


** on-failure ** - Restart the container only if it exits with a non zero exit status.

** always ** - Always restart the container reguardless of the exit status.

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014
Contributor

status == code

This comment has been minimized.

@crosbymichael

crosbymichael Aug 5, 2014
Author Contributor

Not really, I believe exit status is the correct term to use here because of signals


** always ** - Always restart the container reguardless of the exit status.

You can also specify the maximum amount of times docker will try to restart the

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014
Contributor

Docker

** always ** - Always restart the container reguardless of the exit status.

You can also specify the maximum amount of times docker will try to restart the
container when using the ** on-failure ** policy. The default is that docker will try forever to restart the container.

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014
Contributor

Docker

$ sudo docker run --restart=always redis

This will run the redis container with a restart policy of ** always ** so that if
the container exits, docker will restart it.

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014
Contributor

Docker


$ sudo docker run --restart=always redis

This will run the redis container with a restart policy of ** always ** so that if

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014
Contributor

redis


$ sudo docker run --restart=on-failure:10 redis

This will run the redis container with a restart policy of ** on-failure ** and a

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014
Contributor

redis

$ sudo docker run --restart=on-failure:10 redis

This will run the redis container with a restart policy of ** on-failure ** and a
maximum restart count of 10. If the redis container exits with a non-zero exit

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014
Contributor

redis


This will run the redis container with a restart policy of ** on-failure ** and a
maximum restart count of 10. If the redis container exits with a non-zero exit
status more than 10 times in a row docker will abort trying to restart the container.

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014
Contributor

status == code

This comment has been minimized.

@jamtur01

jamtur01 Aug 5, 2014
Contributor

Docker

@jamtur01
Copy link
Contributor

@jamtur01 jamtur01 commented Aug 5, 2014

Some comments on the Docs.

@crosbymichael
Copy link
Contributor Author

@crosbymichael crosbymichael commented Aug 6, 2014

I updated the PR to only restart containers at daemon load when the policy is always. This is inline with the other refactoring and is straight forward with no crazy race hacks or anything.

@progrium
Copy link
Contributor

@progrium progrium commented Aug 7, 2014

Yes yes yes yes.

@thockin
Copy link
Contributor

@thockin thockin commented Aug 7, 2014

To be clear, a container with a policy of "on-failure" will NOT be
restarted when the daemon dies?

If that's the case, then we still need to wrap the whole thing in a restart
loop, which is unfortunate. Or am I misunderstanding?

On Wed, Aug 6, 2014 at 11:09 AM, Michael Crosby notifications@github.com
wrote:

I updated the PR to only restart containers at daemon load when the policy
is always. This is inline with the other refactoring and is straight
forward with no crazy race hacks or anything.

Reply to this email directly or view it on GitHub
#7414 (comment).

@crosbymichael
Copy link
Contributor Author

@crosbymichael crosbymichael commented Aug 7, 2014

@thockin yes that is correct. Is there a use case that if the daemon dies or is stopped cleanly that you would want container's marked as on-failure to restart?

@thockin
Copy link
Contributor

@thockin thockin commented Aug 7, 2014

If the user has a container that they asked to run "until success" (which
is just another way to say restart on failure), then this behavior does not
fulfill the contract. Presumably their container is resilient to being
restarted, so what reason do we have for not doing what the user asked?

They asked us to run until they "succeed", but we didn't. And worse, we
failed to liv eup to our promise for no reason that has ANYTHING to do with
the user. That's going to be a bad experience.

On Wed, Aug 6, 2014 at 8:28 PM, Michael Crosby notifications@github.com
wrote:

@thockin https://github.com/thockin yes that is correct. Is there a use
case that if the daemon dies or is stopped cleanly that you would want
container's marked as on-failure to restart?

Reply to this email directly or view it on GitHub
#7414 (comment).

@crosbymichael
Copy link
Contributor Author

@crosbymichael crosbymichael commented Aug 7, 2014

@thockin Err, what happens if the container has run and exited successfully. Then someone restarts the docker daemon?

@crosbymichael
Copy link
Contributor Author

@crosbymichael crosbymichael commented Aug 7, 2014

@thockin I edited my last post really quick because I had an additional question

@thockin
Copy link
Contributor

@thockin thockin commented Aug 7, 2014

If the container has run and exited successfully, then it is done and must not be restarted (or else we violate the contract in the opposite direction than before). Docker should be able to GUARANTEE receipt of the termination status. If it can't, that should be fixed.

There's still a bit of a race around what happens if the docker daemon dies with signal 9 - any running containers actually keep running and docker, upon restart, is not clever enough to find them. This should be fixed, too, but is a small enough problem to not hold up general restarts, IMO.

I can talk more about how we have solved some of these guarantees internally, if it helps, but we've made some different design decisions.

@crosbymichael
Copy link
Contributor Author

@crosbymichael crosbymichael commented Aug 7, 2014

@thockin sound fair, it was the daemon getting SIGKILL'd that I was worried about.

Anyways, I updated the PR to restart containers on load if they have a policy of on-failure and the exit status was non-zero.

@thockin
Copy link
Contributor

@thockin thockin commented Aug 7, 2014

Awesome. Glad you could be convinced.

On Wed, Aug 6, 2014 at 9:17 PM, Michael Crosby notifications@github.com
wrote:

@thockin https://github.com/thockin sound fair, it was the daemon
getting SIGKILL'd that I was worried about.

Anyways, I updated the PR to restart containers on load if they have a
policy of on-failure and the exit status was non-zero.

Reply to this email directly or view it on GitHub
#7414 (comment).

@crosbymichael
Copy link
Contributor Author

@crosbymichael crosbymichael commented Aug 11, 2014

Ok, so I updated this PR to have a new state of Restarting. We need this new state so that when you run docker ps when a container is restarting it shows up in the list as... restarting. Because we use an increasing time increment between restarts, a container could be waiting on it's next restart where it's in a little bit of a limbo. The user may see that this container has been failing to start and issue a stop to the container so that it does not try again.

You can test this by running a container with a restart policy of always or on-failure.

docker run -d --restart always busybox false

Without the restarting state there would be no way to stop the container without stopping the daemon. With the restarting state you can see in docker ps that the container is restarting, get the id or name of the container then run docker stop <id> to ensure that the container does not try to restart again.

@vieux
Copy link
Contributor

@vieux vieux commented Aug 11, 2014

Works great just one comment, when you stop/kill a container in the restarting state. It was to wait the end of the timeout to actually stops.

@crosbymichael
Copy link
Contributor Author

@crosbymichael crosbymichael commented Aug 11, 2014

@vieux

Check out the last commit and please test again. This should be fixed and you should not see a pause after asking docker to stop your container.

@vieux
Copy link
Contributor

@vieux vieux commented Aug 12, 2014

LGTM

Signed-off-by: Michael Crosby <michael@docker.com>
We need to do this so that when a user asks docker to stop the container
and it is currently in the restart loop we don't want to have to wait
for the duration of the restart time increment before ack. the stop.

Signed-off-by: Michael Crosby <michael@docker.com>
Reset the time increment if the container's execution time is greater
than 10s or else as a container runs and is restarted the time will grow
overtime.

Signed-off-by: Michael Crosby <michael@docker.com>
Signed-off-by: Michael Crosby <michael@docker.com>
Signed-off-by: Michael Crosby <michael@docker.com>
Signed-off-by: Michael Crosby <michael@docker.com>
@crosbymichael
Copy link
Contributor Author

@crosbymichael crosbymichael commented Aug 13, 2014

rebased again

crosbymichael added a commit that referenced this pull request Aug 13, 2014
Auto restart containers based on restart policy
@crosbymichael crosbymichael merged commit b95f6c1 into moby:master Aug 13, 2014
1 check passed
1 check passed
@crosbymichael
continuous-integration/travis-ci The Travis CI build passed
Details
@crosbymichael crosbymichael deleted the crosbymichael:auto-restart branch Aug 13, 2014
fsouza added a commit to fsouza/go-dockerclient that referenced this pull request Aug 14, 2014
@prologic
Copy link
Contributor

@prologic prologic commented Aug 14, 2014

This is amazing guys :)

@discordianfish
Copy link
Contributor

@discordianfish discordianfish commented Aug 15, 2014

Yay! Really, really happy about this guys. The feature I missed the most :)

@dexterbt1
Copy link

@dexterbt1 dexterbt1 commented Aug 15, 2014

Wow. Can't wait to test this out in the next release.

@borjaburgos
Copy link

@borjaburgos borjaburgos commented Aug 15, 2014

Happy to see Auto-restart / Crash-recovery making it to the core. I look forward to testing it out!

@gorsuch
Copy link
Contributor

@gorsuch gorsuch commented Aug 20, 2014

<3

@WIZARD-CXY
Copy link

@WIZARD-CXY WIZARD-CXY commented Nov 28, 2016

@crosbymichael @vieux
I am wondering why you set 10s as the reset time.
If we have a container which runs this script:
#!/bin/bash

sleep 11
exit 1

This container always restart with no delay....
I just uses a similar container that uses 10s to init the server, but fails afterwards, it gets restarted every time without delay..... And I believe the document not mention this 10s stuff.

@thaJeztah
Copy link
Member

@thaJeztah thaJeztah commented Nov 28, 2016

@WIZARD-CXY please don't use a very old, merged PR to report issues. Open a new issue instead with your use case, and/or an issue to report possible enhancements to the documentation.

@WIZARD-CXY
Copy link

@WIZARD-CXY WIZARD-CXY commented Nov 29, 2016

@thaJeztah sorry, I just use 1.10.3 and use git blame to find this pr from the code. I'm not quite sure if this still happens in the upstream.Maybe I will open a new issue later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.