Description
Each process started by docker exec and each health-check probe has PPID=0, same as the container's init process. When such a process exits, all of its child processes are reparented to the pid namespace's (i.e. the container's) PID 1. If the container's PID 1 does not reap zombie children, they will start to pile up. Exhausting the process table is not an uncommon issue in the community: it happens when a container whose PID 1 does not reap zombies is health-checked with a probe command that forks. Using programs incapable of reaping zombies as a container PID 1 is tragically commonplace, so as tempting as it is to merely declare that docker exec and health checks on containers without an appropriate PID 1 be used at their own risk, we would not be doing our users any favours. We could potentially do more to reap zombies of execs.
We could perhaps spawn each exec and health probe as a supervised process tree implicitly under docker-init -sg so that the foreground process group is killed (and zombies reaped) when the exec process exits. Any daemonized (read: double-forked) processes not part of the foreground process group would become orphaned and reparented to container PID 1 when the docker-init subreaper exits. Alternatively, we could find or build a subreaper program (or extend tini, a.k.a. docker-init) which waits for all children to exit before exiting. Daemonized processes spawned from an exec would block such a subreaper from exiting, but the subreaper would be able to reap zombie children the daemonized process failed to reap when the daemonized process does exit.
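To make the second alternative concrete, here is a minimal sketch of a subreaper that waits for all of its descendants before exiting. It is not how tini or docker-init works today; the helper names `become_subreaper` and `run_and_reap` are hypothetical, and the sketch assumes Linux, where `PR_SET_CHILD_SUBREAPER` (value 36 in `<linux/prctl.h>`) is reached through `prctl` via `ctypes`. On other platforms the `prctl` call is skipped and only direct children are reaped.

```python
import ctypes
import os
import sys

PR_SET_CHILD_SUBREAPER = 36  # from <linux/prctl.h>; Linux-only


def become_subreaper() -> bool:
    """Ask the kernel to reparent orphaned descendants to this process.

    Returns False when prctl is unavailable or the call fails.
    """
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        return libc.prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == 0
    except (OSError, AttributeError):
        return False


def run_and_reap(argv):
    """Run argv, then keep reaping until no descendants remain.

    Returns (exit status of argv, number of other processes reaped).
    A daemonized descendant that never exits blocks this loop forever,
    which is the trade-off described above.
    """
    become_subreaper()
    main_pid = os.fork()
    if main_pid == 0:
        # Child: replace ourselves with the requested command.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    main_status = 0
    reaped_others = 0
    while True:
        try:
            pid, status = os.waitpid(-1, 0)
        except ChildProcessError:
            break  # nothing left to wait for
        if pid == main_pid:
            if os.WIFEXITED(status):
                main_status = os.WEXITSTATUS(status)
            else:
                main_status = 128 + os.WTERMSIG(status)
        else:
            reaped_others += 1  # an orphan that was reparented to us
    return main_status, reaped_others


if __name__ == "__main__" and len(sys.argv) > 1:
    exit_status, _ = run_and_reap(sys.argv[1:])
    sys.exit(exit_status)
```

Saved under a hypothetical name such as `reaper.py`, it would wrap a probe as `python3 reaper.py <probe command>`. Because the loop only ends once `waitpid` reports no remaining children, zombies left behind by the probe are collected here and never reach the container's PID 1.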
Given that the child processes of an exec were not forked from a (child of) PID 1, it is arguably just a quirk of the existing implementation that orphaned children of an exec could be waited on by PID 1. It is possible, albeit unlikely, that someone would be broken by the behaviour change of adding implicit subreapers to execs. I expect the amount of breakage would be minimal so long as processes could daemonize themselves such that they are not killed when their exec'ed ancestor exits. We might be able to get away with unconditionally running execs and health probes under a subreaper!