Systemd does a lot of stuff. Docker does a lot of stuff. That stuff may or may not overlap. I don't really care. I just need to solve one very specific problem. I just need a sane way to launch Docker containers in a systemd environment as a system service. As it stands today, the only way I know how is to do docker start -a or docker run ... without -d. Then dockerd launches the container in the background and systemd essentially monitors the docker client. Two problems with this. First, whether or not the docker client is running says very little about whether the actual container is running. Second, I'm left with a rather large docker run process in memory that's not providing much value except to stream stdout/stderr to journald.
So I hacked up the below script to make things better, or really just to see if it was possible to make things better since the script is just a dirty hack. You don't really need to read the script, just skip down and I'll explain what it does.
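#!/bin/bash
set -e

# Start the container; "$@" must include -d so that docker prints the container ID.
ID=$(/usr/bin/docker "$@")
# PID of the container's main process, as seen from the host.
PID=$(docker inspect -f '{{.State.Pid}}' "$ID")

declare -A SRC DEST

# Cgroups of the container (SRC) and of this script, i.e. the systemd unit (DEST).
while IFS=: read -r _ NAME LOC; do
    SRC[${NAME##name=}]=$LOC
done < <(grep slice "/proc/$PID/cgroup")
while IFS=: read -r _ NAME LOC; do
    DEST[${NAME##name=}]=$LOC
done < <(grep slice "/proc/$$/cgroup")

for type in "${!SRC[@]}"; do
    from=/sys/fs/cgroup/${type}${SRC[$type]}
    to=/sys/fs/cgroup/${type}${DEST[$type]}/$(basename "${SRC[$type]}")
    echo "$from => $to"
    mkdir -p "$to"
    # Move every process into the new child cgroup; cgroup settings are not copied.
    for p in $(<"$from/cgroup.procs"); do
        echo "$p" > "$to/cgroup.procs"
    done
done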
echo $PID > /var/run/test.pid
Then I wrote the following unit file:
[Unit]
Description=My Service
After=docker.service
Requires=docker.service
[Service]
ExecStart=/opt/bin/docker-wrapper.sh run -d busybox /bin/sh -c "while true; do echo Hello World; sleep 1; done"
Type=forking
PIDFile=/var/run/test.pid
[Install]
WantedBy=multi-user.target
So what this does (and I know it's a hack, but I wanted to see if my proposal has any chance of working) is that after the container is launched, I look up the PID of the container and all of its cgroups. I then create child cgroups of the systemd cgroups and move the PIDs from the original cgroups to the systemd child cgroups. After that is done, I write the PID of the container to a file. I end up with the systemd cgroups as the parent and a child cgroup under each of them. It looks something like this:
├─test.service
│ └─docker-8a0ff7503e0fca4f44d48f76a24cbcae82079818e3ad4d0d707ccf5765698184.scope
│ ├─19103 /bin/sh -c while true; do echo Hello World; sleep 1; done
│ └─19169 sleep 1
Also, since I told systemd to use a PIDFile, systemd is monitoring PID 1 of the container, because that is the PID I wrote to the file. So now if I do either docker stop or systemctl stop, things just work (at least they seem to), and I don't have a useless docker client hanging around in memory. Now if you look at the script, you'll notice I'm just moving the PIDs, not the settings, so yeah, total hack that defeats the purpose of the original cgroup, but that's not the point right now.
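To make the path juggling concrete, here is a minimal sketch of just the string handling the hack relies on, with no cgroup filesystem access. The function names parse_cgroup_line and child_cgroup_path are made up for illustration, and the paths assume a cgroup v1 layout mounted at /sys/fs/cgroup.

```shell
#!/bin/bash
# Hypothetical helpers, for illustration only; they touch no real cgroups.

# Split one line of /proc/<pid>/cgroup ("id:hierarchy:path") into
# "hierarchy path", dropping the "name=" prefix of named hierarchies.
parse_cgroup_line() {
    local name loc
    IFS=: read -r _ name loc <<< "$1"
    printf '%s %s\n' "${name#name=}" "$loc"
}

# Compute where the container's cgroup would be re-created under the unit.
#   $1 = hierarchy, $2 = container's cgroup path, $3 = unit's cgroup path
child_cgroup_path() {
    local type=$1 src=$2 dest=$3
    printf '/sys/fs/cgroup/%s%s/%s\n' "$type" "$dest" "$(basename "$src")"
}
```

For example, child_cgroup_path systemd /system.slice/docker-8a0f.scope /system.slice/test.service gives the docker scope nested under test.service, which matches the tree above.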
Here's what I propose to make systemd and docker integration a tad bit better. When you want to run docker in a systemd unit, you run docker run/start --yo-dawg-use-my-cgroups-as-your-parent ..., which will read the current /proc/$$/cgroup of the client and pass it to dockerd. Dockerd now just creates its cgroups as children of the cgroups passed in, if the subsystem exists. I think this means we can remove the systemd cgroup code and just use the cgroup fs based code (but docker will still have to write to the name=systemd fs). So now systemd can set up the parent cgroups however it wishes and Docker can set up the child cgroups however it wishes.
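Roughly, the two halves of that proposal could look like the sketch below. Nothing here is existing Docker behaviour: the function names cgroup_parents and container_cgroups, the "hierarchy=path" wire format and the docker-<id>.scope naming are all assumptions for illustration.

```shell
#!/bin/bash
# Sketch only; every name and format here is an assumption.

# Client side: turn a cgroup file (default: our own) into
# "hierarchy=path" pairs that could be handed to dockerd.
cgroup_parents() {
    local name loc
    while IFS=: read -r _ name loc; do
        [ -n "$name" ] || continue   # skip the unified (v2) "0::/path" entry
        printf '%s=%s\n' "${name#name=}" "$loc"
    done < "${1:-/proc/self/cgroup}"
}

# Daemon side: read those pairs on stdin and print the child cgroup to
# create for container $1, but only for hierarchies that exist under the
# cgroup root $2 (default /sys/fs/cgroup).
container_cgroups() {
    local id=$1 root=${2:-/sys/fs/cgroup} type parent
    while IFS='=' read -r type parent; do
        [ -d "$root/$type" ] || continue
        printf '%s/%s%s/docker-%s.scope\n' "$root" "$type" "$parent" "$id"
    done
}
```

Piping one into the other, as in cgroup_parents | container_cgroups "$ID", prints one path per mounted hierarchy. The daemon would then mkdir each path and apply its own limits there, leaving the parent's settings to systemd.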
Is this the best solution? Probably not. But it seems a lot better than what we have today and it solves a current pain point.
Is this just plain stupid, or has it already been thought of and shot down?