New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-restart processes #26

Closed
shykes opened this Issue Mar 9, 2013 · 52 comments

Comments

Projects
None yet
@shykes
Copy link
Collaborator

shykes commented Mar 9, 2013

Docker should be capable of restarting its child processes when they exit or are killed. This behavior should be optional: sometimes you don't want the process to be restarted (singleton jobs, interactive shells...)

@shykes

This comment has been minimized.

Copy link
Collaborator

shykes commented Jul 26, 2013

The current -r flag is insufficient for a production setup. Docker should out of the box auto-detect which containers need to be restarted, without having to start it in a special "recovery" mode.

@shykes

This comment has been minimized.

Copy link
Collaborator

shykes commented Sep 7, 2013

This is scheduled for 0.7.

@unclejack

This comment has been minimized.

Copy link
Contributor

unclejack commented Nov 2, 2013

@vieux Isn't this fixed by #1832?

@bfirsh

This comment has been minimized.

Copy link
Contributor

bfirsh commented Nov 2, 2013

@unclejack As far as I know, #1832 is for starting containers that were previously running when starting the Docker daemon. This issue is about restarting containers when they quit.

crosbymichael referenced this issue in crosbymichael/docker Nov 14, 2013

@discordianfish

This comment has been minimized.

Copy link
Contributor

discordianfish commented Jan 22, 2014

👍
I spend some thoughts around that and I think docker should restart a container at least if the command returned != 0.
To avoid crash looping, I would suggest to add a restart counter with some limit. This limit should be shown in the ps output and exposed via the api so it can be used for monitoring.

@crosbymichael

This comment has been minimized.

Copy link
Member

crosbymichael commented Jan 22, 2014

@sreuter

This comment has been minimized.

Copy link

sreuter commented Jan 25, 2014

+1,000 @discordianfish ... This would would be much more convenient than using supervisor for doing the job!

@shykes

This comment has been minimized.

Copy link
Collaborator

shykes commented Jan 25, 2014

@discordianfish one thing I was wondering is how to prevent that for one shot commands that should not be restarted (and perhaps have side effects which even make it harmful to restart them).

On Sat, Jan 25, 2014 at 9:08 AM, sreuter notifications@github.com wrote:

+1,000 @discordianfish ... This would would be much more convenient than using supervisor for doing the job!

Reply to this email directly or view it on GitHub:
#26 (comment)

@bfirsh

This comment has been minimized.

Copy link
Contributor

bfirsh commented Jan 25, 2014

Some possible scenarios:

  1. running a one-off command in the foreground
  2. running a long-running command in the foreground
  3. running a one-off command as a daemon
  4. running a long-running command as a daemon

For (1), you almost definitely don't want it to auto-restart. For (2) you probably don't want to - you are probably running the command inside another process manager such as supervisord or upstart which will manage that for you.

For (3) you might want it to auto restart if it failed, but you would want it to be opt-in so you can confirm that the command is safe to run twice. For (4) you probably want it to auto-restart.

It feels like it should be an optional switch to docker run. I don't think it should be the default because there's no way to tell if a daemon is a long running process or not, and restarting a one-off process could potentially be dangerous. It probably shouldn't be tied to the -d flag because you might want to use it either in the foreground or as a daemon.

@discordianfish

This comment has been minimized.

Copy link
Contributor

discordianfish commented Jan 27, 2014

I think we have the following options:

  1. restart every command returning != 0
  2. restart every command returning != 0 if running in background
  3. restart every command if flag X is given (maybe have another flag to restart only if returning !=0 )

If we want to prevent on shot commands in the background from being restarted by default even if they return != 0, we need go for 3).
I don't like that very much since usually you have long running jobs that should be restarted and forgetting about that option could be harmful as well. I guess the question is what's more likely:

  • People forgetting to explicitly make their containers restart and harm caused by them being down or
  • People forgetting that by default every command which exits != 0 will get restarted and harm caused by that

But either way this would be a big improvement.

@bfirsh

This comment has been minimized.

Copy link
Contributor

bfirsh commented Jan 27, 2014

My mental model of using docker run without -d is as if I were just running a command in a shell. Automatically restarting would certainly confuse that understanding. These sorts of one-off commands are often repetitively run by a person at a command line and make no sense to automatically restart. They are going to forget to pass a flag.

Automatically restarting when running with -d and exit code != 0 could work. Though it'd need an extra flag to disable that behaviour.

It's also worth thinking about how containers automatically restarting will affect the initial experience of using Docker ("why does docker run ubuntu echo hello world never stop?"). Even for exit code != 0 ("why does my process infinitely print out errors?").

@crosbymichael

This comment has been minimized.

Copy link
Member

crosbymichael commented Jan 27, 2014

The problem is that on the server side, where we do the restart, we don't know if you passed -d on the cli or not. Attach/detach is a cli only feature and does not change anything on the server side. It all looks the same.

If we were to say only restart processes that are meant to be demonized and returns with a non zero exit code, and that the user did not request to be stopped/killed/whatever, we would have to send a flag to the daemon saying that this container is meant to be attached or not.

Then we have the problem that you run a container without -d and detach and expect it to keep running and get restarted if it dies.

My vote is to have an --autorestart flag, similar to how you have to tell supervisor to autorestart your process if it dies, to tell docker to manage the container.

The default value for this will need to be decided, i think it should be false. I don't think it is a big deal to force a user to specify that they want this container to be auto restarted because it will reduce a lot more issue than if this were true you don't know why this container keeps on living forever.

@smarterclayton

This comment has been minimized.

Copy link
Contributor

smarterclayton commented Jan 27, 2014

For auto restarting, I have to wonder whether relying on upstart/systemd to manage this wouldn't be a better option, and a docker command like "docker install -as <system.unit.filename>" wouldn't end up being the better path. Deciding to auto restart feels more like a system service option than a command line option - after all, isn't this what these system processes were designed to do?

@crosbymichael

This comment has been minimized.

Copy link
Member

crosbymichael commented Jan 28, 2014

@smarterclayton We already have the ability to let process managers monitor the process. It was part of the host integration PR.

I am all for keeping docker as small as possible and integrating with external tools, loggers, process monitors, but I also can see the need for docker to provide good defaults and a feature set to users.

As for external process managers I'm using supervisor just fine and we also have a script to auto generate init and systemd configs in the contrib part of the repo.

[program:skydock]
command=docker start -a skydock
autostart=true
autorestart=true
stdout_logfile=/var/log/docker/skydock.log
redirect_stderr=true
numprocs=1
@smarterclayton

This comment has been minimized.

Copy link
Contributor

smarterclayton commented Jan 28, 2014

I guess I was thinking along a different line - allowing the client or daemon to translate an image definition directly into a systemd unit, such that at "start" time of the service docker is used solely to initialize the container namespace, but not be resident or active for the remainder of the service call. Some of the complexity I'm thinking of is systemd socket activation where docker acting as a network proxy adds a layer of indirection that isn't truly necessary, or needing the daemon to be active all the time (vs being only running on demand to create / commit containers).

@mbonano

This comment has been minimized.

Copy link

mbonano commented Feb 5, 2014

I'd like to +1 Michael Crosby's suggestion of adding an --autorestart option to the run command.

I currently use supervisor to monitor a container and restart that container if an exit code other than 0 is returned. I accomplish this by creating the container (docker run -n test-container some-image), stopping the container (docker stop test-container) and then starting the container via supervisor:

[program:test-container]
command=docker start -a test-container
autostart=true
autorestart=unexpected

The initial work of starting and stopping the container is the necessary evil that accompanies the use of a process manager in conjunction with docker. While this solutions "works" I would argue that "container management" is a sufficiently-unique animal that merits its own set of tooling.

The ideal solution would be the ability to create a container that auto-restarts. The container should be able to auto-restart if an exit code other than 0 is supplied as follows:

docker run -autorestart unexpected -name test-container my-image

The container should be able to always auto-restart by supplying "-autorestart true", and the container should default to "-autorestart false". This would, not only eliminate the need to introduce a process manager for container management, it would also be a much more efficient and elegant solution.

@crosbymichael

This comment has been minimized.

Copy link
Member

crosbymichael commented Feb 5, 2014

@mbonano

If docker were to do this for simple cases I think that flag would work but I really am in favor of letting a supervisor do this because that is what they are made for.

@discordianfish

This comment has been minimized.

Copy link
Contributor

discordianfish commented Feb 5, 2014

@crosbymichael To me docker is (beside other things) a container supervisor. If we do things like restarting containers after reboot, we should also restart them on containers crashes.

Beside that, I imaging the docker api to be the only interaction point with a system:
Bootstrapping a system and installing docker is trivial and can be easily reproduced ( -> all system are the same). But managing supervisor config (container dependent state outside of a container) on a per host basis (on this host, those supervisor configs need to be written) causes exactly that configuration management nightmare I hope to escape :)

@mbonano

This comment has been minimized.

Copy link

mbonano commented Feb 5, 2014

As docker evolves, I believe container management will become an increasing concern, especially if functionality like cluster support is to ever exist in the technology. The day where a cluster of containers running across several docker host machines is supported, the supervisor solution becomes completely unmanageable. Until then, I will have to write a home-grown container management solution and hope that the tooling one day renders my solution obsolete.

@crosbymichael

This comment has been minimized.

Copy link
Member

crosbymichael commented Feb 18, 2014

Any proposal for a flag name? --autorestart ?

@crosbymichael

This comment has been minimized.

Copy link
Member

crosbymichael commented Jun 24, 2014

@discordianfish I think the best thing would be to write up a spec on how the runtime should work and what operations it will have. Let me look at my gists and see if I can find my notes

@bgrant0607

This comment has been minimized.

Copy link

bgrant0607 commented Jun 24, 2014

I agree with those advocating 3 restart policies: always (services), on failure (batch tasks), never (tests, commands). More complex logic (e.g., backoff, giving up after N tries, killing/restarting groups of containers) should be punted to a higher layer.

crosbymichael added a commit that referenced this issue Jul 8, 2014

@jpanganiban

This comment has been minimized.

Copy link

jpanganiban commented Jul 18, 2014

I'm working with Ubuntu 14.04 and Docker 1.1.1 hosted on AWS.

I have the --restart=false flag in my daemon (configured under /etc/defaults/docker)

DOCKER_OPTS="--restart=false ${DOCKER_OPTS}"

I then ran this command

docker run --rm=true --name test ubuntu:14.04 /bin/sh -c "while true; do echo hello world; sleep 1; done"

and then issued a reboot from the AWS panel. And then ran $ docker ps -a and have this:

bde1fa320748        ubuntu:14.04        /bin/sh -c 'while tr   7 minutes ago       Exited (0) 2 minutes ago                       test

My questions are:

  1. Should the container have persisted after the restart (even with the --restart=false flag)? What's the expected behavior for this?
  2. @crosbymichael: I'm also using supervisor to manage my containers. I usually delete containers that exited and spin-off a new one based on an image that's why I have a docker run <image> <command> instead of a docker start -a <container> in my command parameter. Is this a bad approach I'm doing? Right now, it's not working for me since the containers still persist even after a reboot.
@mkb

This comment has been minimized.

Copy link

mkb commented Jul 25, 2014

Should process monitoring and restarting be part of Docker's duties or does that mean reinventing the wheel? Might it be better to have Docker play nicely with existing tools like Monit/Systemd/God/etc rather than creating one more way to do things?

@crosbymichael

This comment has been minimized.

Copy link
Member

crosbymichael commented Jul 25, 2014

Closing this as we are doing a proposal for auto restart in #7226

dm0- pushed a commit to dm0-/docker that referenced this issue Sep 21, 2016

tiborvass pushed a commit to tiborvass/docker that referenced this issue Jul 20, 2018

Merge pull request moby#26 from thaJeztah/18.06-backport-fix_TestExte…
…rnalGraphDriver_pull

[18.06] Fix flaky TestExternalGraphDriver/pull test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment