
Propose better way to run docker from a unit file #6791

Closed
ibuildthecloud opened this issue Jul 1, 2014 · 34 comments

@ibuildthecloud
Contributor

commented Jul 1, 2014

Systemd does a lot of stuff. Docker does a lot of stuff. That stuff may or may not overlap. I don't really care. I just need to solve one very specific problem. I just need a sane way to launch Docker containers in a systemd environment as a system service. As it stands today, the only way I know how is to do docker start -a or docker run ... without -d. Then dockerd launches the container in the background and systemd essentially monitors the docker client. Two problems with this. First, whether or not the docker client is running says very little about whether the actual container is running. Second, I'm left with a rather large docker run process in memory that's not providing much value except to stream stdout/stderr to journald.

So I hacked up the script below to see if it was possible to make things better (the script itself is just a dirty hack). You don't really need to read it; just skip down and I'll explain what it does.

#!/bin/bash
set -e

# Launch the container via the docker client, capture the container ID,
# then look up the PID of the container's init process.
ID=$(/usr/bin/docker "$@")
PID=$(docker inspect -f '{{.State.Pid}}' "$ID")

declare -A SRC DEST

# SRC: the cgroups docker placed the container into.
for line in $(grep slice /proc/$PID/cgroup); do
        IFS=: read _ NAME LOC <<< "$line"
        SRC[${NAME##name=}]=$LOC
done

# DEST: the cgroups this script (i.e. the systemd unit) is running in.
for line in $(grep slice /proc/$$/cgroup); do
        IFS=: read _ NAME LOC <<< "$line"
        DEST[${NAME##name=}]=$LOC
done

# For each subsystem, create a child cgroup under the unit's cgroup and
# move the container's processes into it.
for type in "${!SRC[@]}"; do
        from=/sys/fs/cgroup/${type}${SRC[$type]}
        to=/sys/fs/cgroup/$type/"${DEST[$type]}"/$(basename "${SRC[$type]}")

        echo "$from" "=>" "$to"
        mkdir -p "$to"
        for p in $(<"$from"/cgroup.procs); do
                echo "$p" > "$to"/cgroup.procs
        done
done

# Record the container's PID so systemd can track it via PIDFile=.
echo $PID > /var/run/test.pid

Then I wrote the following unit file:

[Unit]
Description=My Service
After=docker.service
Requires=docker.service

[Service]
ExecStart=/opt/bin/docker-wrapper.sh run -d busybox /bin/sh -c "while true; do echo Hello World; sleep 1; done"
Type=forking
PIDFile=/var/run/test.pid

[Install]
WantedBy=multi-user.target

So what this does (and I know it's a hack, but I wanted to see if my proposal has any chance of working) is: after the container is launched, I look up the PID of the container and all of its cgroups. I then create child cgroups of the systemd cgroups and move the PIDs from the original cgroups into those children. After that is done, I write the PID of the container to a file. I end up with the systemd cgroup as the parent and the container's cgroup as a child under it, looking something like this:

  ├─test.service
  │ └─docker-8a0ff7503e0fca4f44d48f76a24cbcae82079818e3ad4d0d707ccf5765698184.scope
  │   ├─19103 /bin/sh -c while true; do echo Hello World; sleep 1; done
  │   └─19169 sleep 1

Also, since I told systemd to use a PIDFile, systemd is monitoring PID 1 of the container, because that's the PID I wrote to the file. So now if I do either docker stop or systemctl stop, things just work (at least they seem to), and I don't have a useless docker client hanging around in memory. Now if you look at the script, you'll notice I'm just moving the PIDs, not the settings, so yes, it's a total hack that defeats the purpose of the original cgroups, but that's not the point right now.

Here's what I propose to make systemd and docker integration a tad bit better. When you want to run docker in a systemd unit, you run docker run/start --yo-dawg-use-my-cgroups-as-your-parent ..., which reads the client's current /proc/$$/cgroup and passes it to dockerd. Dockerd then creates its cgroups as children of the cgroups passed in, if the subsystem exists. I think this means we could remove the systemd cgroup code and just use the cgroupfs-based code (though docker would still have to write to the name=systemd hierarchy). Now systemd can set up the parent cgroups however it wishes, and Docker can set up the child cgroups however it wishes.
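
For illustration, here is a minimal sketch of what the client side of such a flag could do; the flag itself and the plumbing to dockerd are hypothetical, and only the /proc/self/cgroup parsing mirrors the wrapper script above:

#!/bin/bash
# Hypothetical sketch: collect the calling unit's own cgroup paths so they
# could be handed to dockerd as the parent for the container's cgroups.
declare -A PARENT
while IFS=: read -r _ name path; do
        [ -n "$name" ] || continue   # skip the unnamed (cgroup v2) line, if any
        PARENT[${name##name=}]=$path
done < /proc/self/cgroup

# e.g. PARENT[systemd] would be something like /system.slice/test.service
for subsys in "${!PARENT[@]}"; do
        echo "$subsys => ${PARENT[$subsys]}"
done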

Is this the best solution? Probably not. But it seems a lot better than what we have today and it solves a current pain point.

Is this just plain stupid, or has it already been thought of and shot down?

@ibuildthecloud ibuildthecloud changed the title Slightly better systemd integration Propose better way to run docker from a unit file Jul 2, 2014

@vbatts

Contributor

commented Jul 11, 2014

From the [significant] discussion around systemd unit files in the contributor meeting yesterday (https://botbot.me/freenode/docker-dev/msg/17771621/): the example unit file is @crosbymichael's https://github.com/crosbymichael/.dotfiles/blob/master/systemd/redis.service


@ibuildthecloud

Contributor Author

commented Oct 14, 2014

FYI, for anybody who stumbles upon this issue. I created https://github.com/ibuildthecloud/systemd-docker as an attempt to address the issues between docker and systemd.
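
For anyone wondering how such a wrapper is wired into a unit, a rough sketch follows; the install path, image, and exact flags here are illustrative assumptions rather than quotes from the project's README, so check the README for the supported options:

[Unit]
Description=nginx via systemd-docker (illustrative)
After=docker.service
Requires=docker.service

[Service]
# The wrapper notifies systemd once the container is actually up and moves
# the container into the unit's cgroup, so systemd tracks the right PID.
Type=notify
NotifyAccess=all
ExecStart=/opt/bin/systemd-docker run --rm --name %n nginx
Restart=always

[Install]
WantedBy=multi-user.target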

@pikeas

commented Jan 6, 2015

Any new thoughts/movement on this?

@tbatchelli

commented Jan 22, 2015

I have been using @ibuildthecloud's systemd-docker and the combo is a killer. It would be better if the issues it addresses were dealt with by docker itself.

@larsks

Contributor

commented Jan 29, 2015

This issue is hardly specific to systemd. It affects any environment in which someone wants to reliably start and monitor a container, which would include just about any non-SysV init system (systemd, upstart, runit, daemontools, launchd).

@ibukanov

commented Jan 31, 2015

A simpler solution than using @ibuildthecloud's systemd-docker is to start the docker container in the background in ExecStartPre via run -d container or start container, and then use ExecStart=/usr/bin/docker logs -f container. This way systemd, before starting any dependent units, waits until docker run -d or docker start returns, which happens only once the container has started. The logs command then sends the initial startup logs to systemd and the journal, and continues to do so as new logs arrive, until the container stops.

With this approach one also needs to put -/usr/bin/docker stop container in both ExecStop and ExecStopPost. The latter ensures that if /usr/bin/docker logs dies before the container terminates, systemd still stops the container. Note that using only ExecStopPost without ExecStop means the termination logs will not reach the journal, because systemctl stop will kill the logs command before ExecStopPost stops the container.
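
Putting that together, a unit following this pattern might look roughly like the following (the container and image names are placeholders):

[Unit]
Description=container1 (illustrative)
After=docker.service
Requires=docker.service

[Service]
# Start the container detached; systemd waits for this to finish before
# starting dependent units.
ExecStartPre=/usr/bin/docker run -d --name container1 some-image
# Stream the container's output into the journal for the unit's lifetime.
ExecStart=/usr/bin/docker logs -f container1
ExecStop=-/usr/bin/docker stop container1
# Repeated in ExecStopPost so the container is stopped even if `docker logs`
# died before the container did.
ExecStopPost=-/usr/bin/docker stop container1

[Install]
WantedBy=multi-user.target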

@stuartpb

commented Jun 8, 2015

As noted in ibuildthecloud/systemd-docker#25, #10427 helps with this.

@geekpete

commented Oct 7, 2015

Not that it's going to be the init system of the future, but Upstart worked quite well for controlling docker containers, for the most part, with a simple config file per service that did everything you'd want.

hrldcpr added a commit to heatseeknyc/relay that referenced this issue Nov 6, 2015

use `docker run -d` in systemd units so that they're not considered started until the container is actually running, otherwise dependent services may try to start too soon 😭 - see moby/moby#6791 (comment)


@berglh

commented Feb 4, 2017

Now that dockerd can be restarted with the --live-restore option, containers keep running across a daemon restart, but if you started them through systemd, the docker client in the unit exits because the daemon is unavailable while dockerd restarts.

Even in @ibukanov's example above, if the docker daemon restarts, the docker client will fail to connect to the daemon to get the logs and the systemd unit will fail. Sure, it might restart, but my goal is to have the container continue running while being managed by systemd. Yes, the unit should require the docker daemon for startup, but once it's running, I want systemd to track the PID of the process launched by the container.

If I have Restart=no set, the container will keep running, but logging of the docker client to journalctl stops and the systemd unit ends up in a failed state. If the unit is set to Restart=on-failure, then the unit restarts and either fails to start because the container is already running, or you have to force stop/remove old containers to avoid start-up problems using ExecStartPre=-/usr/bin/docker rm -f container.

This problem with systemd effectively stops you from making any decent use of the --live-restore option when managing containers with unit files. I've tried looking at --cgroup-parent and using the systemd cgroup driver, but I have yet to see how that solves my problem. Sure, systemd is aware of the cgroup, but it's not tracking the PID of the container, only the PID of the docker client that was used to launch it.

I may be misunderstanding this behaviour in general, and there may be some way of structuring ExecStartPre, ExecStart, ExecStop and ExecStopPost to get the desired result. I'm going to read through @ibuildthecloud's solution and see if I can come up with something less convoluted, but as far as I can see the issue still stands.

@ibukanov

commented Feb 4, 2017

@berglh At this point I have long given up trying to integrate docker with systemd. It just does not work due to their very different approaches. So with docker I stick to its native commands and use no unit files. In practice, any dependency problem between containers can be solved with a shell script running in the container that just waits until the condition is met before starting the main application. Surprisingly this makes the whole setup much more robust, and I have no problems with docker daemon restarts, as it nicely restarts all my containers.
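
A minimal sketch of such a wait-then-start entrypoint, assuming the dependency is a TCP service and that nc is available in the image (the host, port and wrapped command are placeholders):

#!/bin/sh
# Illustrative container entrypoint: wait for a dependency, then exec the app.
DEP_HOST=${DEP_HOST:-db}
DEP_PORT=${DEP_PORT:-5432}

until nc -z "$DEP_HOST" "$DEP_PORT"; do
        echo "waiting for $DEP_HOST:$DEP_PORT..."
        sleep 1
done

exec "$@"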

If systemd integration and unit files are a must, consider using runc, not docker itself, to run docker containers.

@berglh

commented Feb 10, 2017

@ibukanov I'll check out runc for sure, but I'm currently using fleet and etcd on Oracle Enterprise Linux. Considering fleet is no longer going to be officially supported by CoreOS, maybe I'm better off moving to Kubernetes or OpenShift. The thing is that fleet is such a simple and straightforward concept (scheduling unit files) that it's been attractive for the particular cluster I'm managing. Regardless, I'll probably have to move away from fleet in the long run.

@dashesy

commented Jun 22, 2017

@ibukanov rkt can run docker images as-is, and works well with systemd.


@mathstuf

commented Oct 24, 2017

Note that rkt currently requires that images be pushed to a registry, so running local images isn't going to work out of the box. See rkt/rkt#2392.

@aholbreich

commented Apr 18, 2018

It's April 2018. Is there any best practice to start containerized services with systemd?

If not, what again are the benefits of starting a docker container as:

ExecStartPre=/usr/bin/docker run -d --name container1 some-image
ExecStart=/usr/bin/docker logs -f container1

instead of

ExecStart=/usr/bin/docker run --name container1 some-image

?

@dashesy

commented Apr 18, 2018

@aholbreich the best way is to use rkt; sorry, but docker does not play well with systemd. Unfortunately rkt is not popular.

@mwpastore

commented Apr 18, 2018

@aholbreich The former works well with SystemD; the latter does not. In order to use a docker run command as your ExecStart=, you have to use a wrapper like ibuildthecloud/systemd-docker. Either way, if you use SystemD, you can't use --live-restore, as @berglh documented above.

@aholbreich

commented Apr 19, 2018

@mwpastore OK, I understand the wrapper and --live-restore.
"The former works well with SystemD; the latter does not." Can you elaborate on that?
If I see it correctly, this:

ExecStartPre=/usr/bin/docker run -d --name container1 some-image
ExecStart=/usr/bin/docker logs -f container1

is also not a real enabler of --live-restore, is it? So is there any advantage in these lines?

@dashesy I will consider rkt some day, but it's out of scope for now (for many reasons).

@mwpastore

commented Apr 19, 2018

@aholbreich This works with SystemD, but does not enable --live-restore:

ExecStartPre=/usr/bin/docker run -d --name container1 some-image
ExecStart=/usr/bin/docker logs -f container1

This does not work with SystemD; you need a wrapper, and even with a wrapper, it does not enable --live-restore:

ExecStart=/usr/bin/docker run --name container1 some-image

@aholbreich

commented Apr 19, 2018

This does not work with SystemD

Of course it works:
ExecStart=/usr/bin/docker run --name container1 some-image

@mwpastore

commented Apr 19, 2018

@aholbreich

Of course it works:

Please re-read the details of this issue and ibuildthecloud/systemd-docker#readme, and you will clearly see that—while SystemD does launch the process using that syntax—there's much more to it than that.

@aholbreich

commented Apr 20, 2018

I did. The initial problem is that systemd monitors the docker client and not the container.
How is this better in that case? I don't see it; in every line the docker client is used.

ExecStartPre=/usr/bin/docker run -d --name container1 some-image
ExecStart=/usr/bin/docker logs -f container1

@ibukanov

commented Apr 20, 2018

@aholbreich If the docker client dies, with just ExecStart=/usr/bin/docker run systemd considers the unit failed even though the container is in fact still running.

@aholbreich

commented Apr 20, 2018

@ibukanov OK, I believe you and will try it. But it's strange that if the "docker client dies", this command should keep working:

ExecStart=/usr/bin/docker logs -f container1

It's still the docker client, isn't it? And if it dies, it also doesn't kill the container and leaves the unit in the wrong state. So why does it work in this case?

@dashesy

commented Apr 20, 2018

I used this to tell systemd when the client dies:

#!/bin/bash
# Run a command inside an existing container via `docker exec` and forward
# TERM/INT from systemd to the processes inside the container.

function docker_cleanup {
    # Terminate the process group recorded in $PIDFILE inside the container.
    docker exec $IMAGE bash -c "if [ -f $PIDFILE ]; then kill -TERM -\$(cat $PIDFILE); rm $PIDFILE; fi"
}

IMAGE=$1                        # container name/id to exec into
PIDFILE=/tmp/docker-exec-$$     # pidfile path *inside* the container
shift
trap 'kill $PID; docker_cleanup $IMAGE $PIDFILE' TERM INT
# Record the exec'd shell's PID inside the container, then exec the command.
docker exec $IMAGE bash -c "echo \"\$\$\" > $PIDFILE; exec $*" &
PID=$!
wait $PID              # interrupted if a signal arrives; the trap cleans up
trap - TERM INT
wait $PID              # reap the background docker exec after cleanup

One big problem with -d is that the logs will not go to journald.

@ibukanov

commented Apr 20, 2018

@aholbreich See my comments above with ExecStop/ExecStopPost, which ensure that the container stops when the client dies.

But these days, if I ever need to start a docker container from a systemd unit, I create the container outside the systemd scripts, in a provisioning script, via docker create --restart=unless-stopped --log-driver=journald ... and use something like:

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=docker start mycontainer
ExecStop=docker stop mycontainer

This avoids running a useless docker client, delegates restarting a failed container to Docker, and still allows logging to journald while letting systemd start/stop the container to satisfy dependencies. The drawback is that stopping the container via a manual docker stop will not be reflected in systemd, but depending on the deployment that can even be useful for debugging, etc.
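
For completeness, the one-time provisioning step described above would be something along these lines (the container name and image are placeholders):

# Illustrative provisioning command, run once outside of systemd:
docker create --restart=unless-stopped --log-driver=journald \
    --name mycontainer some-image:latest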

@aholbreich

commented Apr 21, 2018

OK, with all the drawbacks of the proposed workarounds, I'm going to continue using the direct form:

ExecStart=/usr/bin/docker run --name container1 some-image

I think docker or systemd (probably systemd) should improve this in the future.
But for now, since this hasn't caused me any problems so far, I don't see any reason to overcomplicate things.

@ubergesundheit

commented Jul 12, 2018

Sorry if I'm asking an unrelated question; if that's the case I'll happily delete my comment and ask it somewhere more appropriate!

I assume this issue also applies to running containers via docker-compose? I sense docker-compose just amplifies everything by bringing another layer between the container and systemd?

@mathstuf

commented Aug 9, 2018

Yes, the problem is that docker run isn't much more than a fancy communication over a socket to the docker.service process. The way systemd works, it assumes that the process under ExecStart is the service that is running. This isn't the way Docker works and neither project is very likely to change anything (IMO, there's nothing in systemd to "fix" and Docker doesn't want to have code which would make systemd understand what's going on). In the long run, using rkt (or at least a container runtime that behaves more…normally) is the better choice.

@Gigadoc2

commented Aug 9, 2018

@ubergesundheit Yes, compose is already handling multiple "services", so running compose as a systemd service adds this conceptual mismatch on top of the other mentioned problems when running docker containers as systemd services.

FWIW, I think there are two main problems:

The first, lesser one is that both systemd and docker want to manage cgroups. I can't really fault systemd for managing the cgroups, as it needs to do so in order to provide the supervision capabilities I expect from a modern service manager. However, I also recognize that docker wants to do more with cgroups than systemd's API might allow it to.

The systemd cgroup driver is (was?) Docker's solution for people who are willing to give up a bit of cgroup-related functionality in exchange for better integration of docker and systemd (with docker "controlling" systemd in this scenario). But as docker favors the cgroupfs driver (and I'm not even sure the systemd driver is still available in current upstream docker), most systems will have docker and systemd managing cgroups in parallel. This currently kind-of works, but I believe it won't anymore with the unified cgroup hierarchy. A proper solution might be to Delegate= a cgroup subtree to docker, but that probably requires a few changes to docker (changes of the sort that docker devs might be opposed to).
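
For reference, the delegation mentioned above would amount to a drop-in along these lines; whether Docker could actually make use of a delegated subtree is exactly the open question:

# /etc/systemd/system/docker.service.d/delegate.conf (illustrative drop-in)
[Service]
# Hand this unit's cgroup subtree over to the service for self-management.
Delegate=yes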

But the other, currently much bigger problem is that docker is designed to be a service manager similar to systemd, yet does not provide a superset of systemd's features (and is only capable of managing containers). Docker combines (at least) a service manager, a container runtime and a package manager (these components may have been split at a technical level, but from an operational perspective everything is still controlled by the one docker daemon). As the container runtime and the service manager parts are inextricably linked, it's by design pretty much impossible to (cleanly and elegantly) run docker "below" another service manager.

That in itself is actually not a problem: As long as the cgroup-problem above is solved, you can just run docker on your systemd-based system and use docker-commands instead of systemctl. I think that all the people happily using docker are doing just that. The "problem" here really just is that systemd is - at least in some aspects - a better service manager than docker. For one thing, systemd can manage regular processes and ones started through a pure container runtime (rkt, podman, just runc), so one can express dependencies between containers and regular processes - not possible for docker. And even if we consider "pure container systems", where there are no dependencies between system services and containers, some of us still prefer the dependency management of systemd (for example, I really prefer waiting for a service to declare readiness itself instead of pulling in an external "check script"). Also, I really like socket activation, and I think containers would especially profit from that.

So, there are multiple possible solutions to this problem, but they depend on how you think a system should be managed (or whether there even is a problem at all):

One approach would be to remove the service-manager part from docker. That is, I believe, what Red Hat is trying to do with their docker fork Podman (or CoreOS with rkt), albeit more for the sake of integrating with Kubernetes than with systemd. That is my favored approach, and I would use Podman were it not for the shortcomings of CNI (but that is really off-topic).

Another "solution" would be to just drop the notion that docker can be used with non-container software, or outside of a "dedicated container-server" scenario. Docker as it currently is already works well within that context, when you use systemd to just get docker up and running and only use docker from that point on. Though I'd personally like more dependency management than what docker+compose currently offer.

Theoretically, one could also try to extend Docker's service-manager part so it can manage non-containers as well. However, docker would then need to completely control systemd, which is possible but would add way too much complexity and maintenance effort. And by abstracting systemd's interface one would probably lose some of its features (just like docker loses features by going through systemd's cgroup interface).

This got way longer than I intended, but that is my layman's assessment of why there won't be a proper way to run docker from a unit file unless there is some significant change to Docker's design. There might also be something systemd can do (I'm thinking of some extended interface to be "aware" of container runtimes and get supervision data from them), but in any case not without changes to docker as well.

@cpuguy83

Contributor

commented Aug 9, 2018

So, the original issue is that docker run (or similar client strategies) is tied to the lifecycle of another daemon, which makes it difficult to manage with systemd. This is just the design of Docker and it is unlikely to change.

containerd is much better suited for this, since the client is not tied to the lifecycle of another daemon.
So containerd's ctr utility should generally satisfy what's needed here, or a custom client can be written if it doesn't do exactly what you want.
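
For the curious, the basic ctr flow looks roughly like this (the image reference and container name are only illustrative, and this is not a complete systemd recipe):

ctr images pull docker.io/library/redis:alpine
ctr run --rm docker.io/library/redis:alpine my-redis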

Another possible approach:
In the upcoming containerd 1.2 release there is also a new version of the containerd shim API (v2); a new shim could be created that just defers all management to systemd. Note that such a shim does not exist today, nor have I actually messed around with the idea, but it is certainly a possibility.

In any case, docker/moby is not the right place for this, and containerd is very well suited for exactly this use case. As such I am going to, respectfully, close this issue, as it is no longer relevant unless Moby is massively redesigned (in which case it would be something new anyway; containerd does this today).

Thanks all for your interest, feel free to ping me on slack if you have any questions/concerns about this. 🙇 👼

@billmetangmo

commented Feb 21, 2019

I know the discussion is closed, but I ran into this issue and want to share with posterity a snippet of the solution using runc, as suggested by @cpuguy83.

OS version

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.4 LTS
Release:        16.04
Codename:       xenial

Docker version


Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:17:20 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:15:30 2018
  OS/Arch:      linux/amd64
  Experimental: false

A solution using runC

No need to install runC separately (at least for the version above on my OS): installing docker comes with a docker-runc executable.

runc version 1.0.0-rc5
commit: 4fc53a81fb7c994640722ac585fa9ca548971871
spec: 1.0.0

For docker-runc to run, you have to provide it two things: a folder named rootfs, which contains an export of the docker container you want to launch with runC, and a config.json file, which represents all the arguments you would give to the docker engine with docker run, but in the OCI spec format.
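
One hedged way to produce those two pieces from an existing image (the paths, names and image below are placeholders):

mkdir -p /opt/myservice/rootfs && cd /opt/myservice
# Export a throwaway container's filesystem as the rootfs
cid=$(docker create some-image)
docker export "$cid" | tar -C rootfs -xf -
docker rm "$cid"
# Generate a default OCI config.json to edit by hand
docker-runc spec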

Here are links that helped me do so:

After creating your rootfs directory and config.json, you can create your systemd configuration based on my template (it works like a charm for me):

[Unit]
Description=<name> Container
After=docker.service
Requires=docker.service

[Service]
Type=forking
Restart=always
RestartSec=5s
WorkingDirectory=<the directory where rootfs and config.json are>
ExecStart=/usr/bin/docker-runc run <name> --detach
ExecStop=/usr/bin/docker-runc delete --force <name>

[Install]
WantedBy=multi-user.target

Thanks !

@daurnimator

commented Feb 22, 2019

a folder name rootfs which contains an export of the docker container you want to launch with runC

You could probably use docker save | tar -x to create that in an ExecStartPre
