Telegraf docker image should use init system as an entrypoint #66

Closed
smolse opened this issue Mar 1, 2017 · 5 comments
@smolse

smolse commented Mar 1, 2017

Telegraf's exec plugin kills timed-out scripts with SIGKILL. If a child process of the script is still running at that moment, it is orphaned and reparented to the container's PID 1. Without proper handling, that process becomes a zombie once it exits. The Telegraf image should use an init system (e.g. dumb-init) as its entrypoint so that zombie processes are reaped; otherwise they accumulate.
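The reaping that an init such as dumb-init provides boils down to a non-blocking `waitpid` loop. Below is a minimal Python sketch assuming Linux/POSIX; `reap_zombies` is a hypothetical helper, not dumb-init's source. A real init also forwards signals to its child and typically runs a loop like this whenever SIGCHLD arrives.

```python
import os


def reap_zombies() -> list[int]:
    """Collect every already-exited child without blocking; return their PIDs.

    Hypothetical helper sketching the core of an init's reaping loop.
    Signal forwarding, which a real init also does, is omitted.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG makes the call return immediately.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

A process running as PID 1 (or as a subreaper) that calls this regularly would collect the orphans described above, since they are reparented to it.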

Here is a simple script to reproduce the issue: https://github.com/smolse/telegraf-zombies/blob/master/reproduce_zombies.sh
Output of the script:

$ bash reproduce_zombies.sh 
Pulling telegraf docker image...
Starting telegraf container...
Container has started, zombie processes will be checked every 30 sec...
2 zombie processes have been found
5 zombie processes have been found
8 zombie processes have been found
11 zombie processes have been found
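The counts above come from the linked script, whose exact method is not shown here. A comparable check, assuming a Linux `/proc` filesystem, counts processes whose state field reads `Z`; `count_zombies` is a hypothetical helper, not part of that script.

```python
import os


def count_zombies(proc_root: str = "/proc") -> int:
    """Count processes whose state in /proc/<pid>/stat is 'Z' (zombie)."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat"), "rb") as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Field 2 is the command name in parentheses and may itself contain
        # spaces or ')', so the state is the first field after the last ')'.
        state = stat.rsplit(b")", 1)[1].split()[0]
        if state == b"Z":
            count += 1
    return count
```

Inside a container, `/proc` normally reflects the container's own PID namespace, so the count covers only the container's processes.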
@goller
Contributor

goller commented May 16, 2017

Hi @smolse, we could certainly use an init. However, here is the issue to track in Telegraf: influxdata/telegraf#2526

@jsternberg
Contributor

I also believe this should be handled by Telegraf instead. Telegraf should start each child process in its own process group and then kill the whole process group instead of only the parent.
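A minimal Python sketch of that idea, assuming POSIX: start the script as the leader of a new process group and, on timeout, send SIGKILL to the whole group rather than only to the script's PID. `run_with_timeout` is a hypothetical helper, not Telegraf's code; in Go the analogous pieces would be `Setpgid` in `syscall.SysProcAttr` and `syscall.Kill` with a negated PID.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd: list[str], timeout: float) -> tuple[int, bytes]:
    """Run cmd; on timeout, SIGKILL its whole process group, not just cmd."""
    # start_new_session=True runs setsid() in the child, making it the leader
    # of a new session and process group whose ID equals its PID.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, start_new_session=True)
    try:
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # killpg(pgid, sig) is equivalent to kill(-pgid, sig): it signals every
        # process still in the group, including the script's background children.
        os.killpg(proc.pid, signal.SIGKILL)
        # Collect any remaining output and reap the script itself.
        out, _ = proc.communicate()
    return proc.returncode, out
```

The group has to be created when the script starts: by the time the timeout fires, the script's children already exist and inherit its group.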

While an init system would fix this, I think it acts as a band-aid.

@danielnelson
Contributor

I left myself a note on the Telegraf exec kill issue to investigate whether we ought to signal the process group or just the parent. This is unrelated to the question of an init system and won't fully deal with orphans.
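One way signalling the group falls short (our illustration, not spelled out in the thread): a descendant that calls `setsid()` leaves the group, survives the group kill, and is orphaned anyway. A Linux/POSIX Python sketch; `spawn_escaping_grandchild` is a hypothetical helper.

```python
import os
import time


def spawn_escaping_grandchild() -> tuple[int, int]:
    """Fork a child that leads its own process group, plus a grandchild that
    calls setsid() to leave that group. Returns (child_pid, grandchild_pid)."""
    r, w = os.pipe()
    child = os.fork()
    if child == 0:
        try:
            os.setpgid(0, 0)  # child becomes leader of a new process group
            if os.fork() == 0:
                os.setsid()  # grandchild moves to a new session and group
                # Report our PID only after setsid(), so the caller cannot
                # signal the group before the grandchild has left it.
                os.write(w, str(os.getpid()).encode())
            time.sleep(30)
        finally:
            os._exit(0)  # never return into the caller's code from a fork
    os.close(w)
    grandchild = int(os.read(r, 32))
    os.close(r)
    return child, grandchild
```

Programs that daemonize themselves typically call `setsid()`, so processes like this are exactly what a group kill misses.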

IMO it's not a process's job to reap orphans unless it is meant to be PID 1, but if it's not much code we can add it anyway to make things work better in Docker. I opened an issue for this as well.
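For a process that is not PID 1, Linux 3.4 and later let it opt into receiving orphaned descendants with `prctl(PR_SET_CHILD_SUBREAPER)`; reaping them is then an ordinary `waitpid`. A sketch via `ctypes`, assuming Linux with a dynamically linked libc; the value 36 comes from `<linux/prctl.h>`, and `become_subreaper` and `orphan_then_reap` are hypothetical helpers. In a container where Telegraf is already PID 1, only the reaping half would apply.

```python
import ctypes
import os
import subprocess

PR_SET_CHILD_SUBREAPER = 36  # from <linux/prctl.h>, Linux >= 3.4


def become_subreaper() -> None:
    """Ask the kernel to reparent orphaned descendants to this process."""
    libc = ctypes.CDLL(None, use_errno=True)
    rc = libc.prctl(PR_SET_CHILD_SUBREAPER, ctypes.c_ulong(1),
                    ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def orphan_then_reap() -> int:
    """Start a shell that backgrounds `sleep` and exits at once, orphaning
    the sleep; as subreaper we inherit it and reap it ourselves."""
    become_subreaper()
    out = subprocess.run(["sh", "-c", "sleep 0.2 >/dev/null & echo $!"],
                         capture_output=True, check=True).stdout
    orphan = int(out)
    # Without the subreaper flag this would raise ChildProcessError, because
    # the orphan would have been reparented to PID 1 instead of to us.
    pid, _status = os.waitpid(orphan, 0)
    return pid
```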

@danielnelson
Contributor

Docker 1.13.0 added an --init option that reaps child processes, so I suggest using it. Telegraf will of course also improve the way it kills subprocesses.

@jsternberg
Contributor

I think some alternatives have been pointed out, and since using an init system like systemd in Docker is not recommended, I'm going to close this.

5 participants