rrdtool making lots of zombie processes #443

Closed
JohnFlowerful opened this Issue Feb 5, 2015 · 27 comments

JohnFlowerful commented Feb 5, 2015

Lots (about 15000) of these were created:

nginx 351 0.0 0.0 0 0 ? Z 08:18 0:00 [rrdtool]

How would I go about debugging this? Obviously it's a graph at fault, but I have no idea how to tell which one(s)...

Member

paulgear commented Feb 5, 2015

Hi @JohnFlowerful - The best way to track that down at this stage is to catch the processes while they're running. If you can have a script running in the background that checks for running rrdtool processes and runs lsof on them, then browse around several different graphs, you might be able to narrow it down. If that doesn't give us any joy, we'll have to look into logging all rrdtool requests to a file or something like that.
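A minimal sketch of the kind of background watcher described above, assuming Linux with procps `ps` and `lsof` installed. The function names (`parse_ps`, `split_by_state`, `watch`) are hypothetical and not part of LibreNMS. Live rrdtool processes are passed to `lsof -p` so the open `.rrd` files hint at which graph spawned them; zombies have no open files left, so only their parent PID is reported.

```python
import subprocess
import time


def parse_ps(ps_output, name="rrdtool"):
    """Return (pid, ppid, stat) tuples for processes whose command is `name`.

    Expects output in the shape of `ps -e -o pid= -o ppid= -o stat= -o comm=`.
    """
    matches = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, ppid, stat, comm = fields
        # A zombie may be listed as "rrdtool <defunct>", so compare the first word only.
        if comm.split()[0] == name:
            matches.append((int(pid), int(ppid), stat))
    return matches


def split_by_state(procs):
    """Split parsed processes into (running, zombies) based on the stat column."""
    running = [p for p in procs if not p[2].startswith("Z")]
    zombies = [p for p in procs if p[2].startswith("Z")]
    return running, zombies


def watch(rounds=100, interval=0.2):
    """Poll the process table `rounds` times and report on rrdtool processes."""
    for _ in range(rounds):
        ps = subprocess.run(
            ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "stat=", "-o", "comm="],
            capture_output=True, text=True,
        )
        running, zombies = split_by_state(parse_ps(ps.stdout))
        for pid, ppid, _ in running:
            # The process may exit before lsof runs; empty output is expected then.
            lsof = subprocess.run(["lsof", "-p", str(pid)],
                                  capture_output=True, text=True)
            print(f"rrdtool {pid} (parent {ppid}):\n{lsof.stdout}")
        for pid, ppid, _ in zombies:
            print(f"zombie rrdtool {pid}, parent {ppid}")
        time.sleep(interval)
```

Calling `watch()` in one terminal while browsing graphs in another would be one way to use it; rrdtool calls are short-lived, so a small `interval` is probably needed to catch them.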

JohnFlowerful commented Feb 6, 2015

Every graph is creating a zombie process. Restarting php-fpm clears them all, so it appears to be something with my webserver setup.
I'll leave my configs here. If anyone sees anything, do let me know. I use Gentoo, kernel version 3.16.5. nginx and php are installed via Portage.

nginx (version 1.7.8) with the following modules enabled (full list here):
NGINX_MODULES_HTTP="access auth_basic autoindex browser charset empty_gif fastcgi geo gzip limit_conn limit_req map memcached mp4 proxy referer rewrite scgi slowfs_cache split_clients ssi upstream_ip_hash userid uwsgi slowfs_cache stub_status spdy"
nginx configuration for librenms: http://sprunge.us/PjjS

php (version 5.6.5) with these flags flicked: http://sprunge.us/efAS
php.ini: http://sprunge.us/YNiJ
php-fpm.conf: http://sprunge.us/GeUX

Member

f0o commented Feb 6, 2015

Good morning fellow gentoo'nian :)
I'll have a look at my setup when I get home and report back on what's different.

I've been running it for a while now, so the versions are unlikely to match. Nevertheless, that shouldn't really cause the zombies!

Cheers

Member

f0o commented Feb 6, 2015

@JohnFlowerful can you paste me your specific USE flags for nginx, php and php-fpm please?

Trying to reproduce your setup with a clean Gentoo-VM :)

PS: Also Profile please! I just used hardened-amd64 (default for my stage3's)

JohnFlowerful commented Feb 6, 2015

nginx's package.use entry is simply www-servers/nginx ssl ipv6, with the modules in make.conf:

NGINX_MODULES_HTTP="access auth_basic autoindex browser charset empty_gif fastcgi geo gzip limit_conn limit_req map memcached mp4 proxy referer rewrite scgi slowfs_cache split_clients ssi upstream_ip_hash userid uwsgi slowfs_cache stub_status spdy"

php's (php-fpm) uses:

dev-lang/php fpm mysql mysqli curl gd cgi hash crypt sqlite tidy xmlrpc xslt zip exif xpm wddx intl pdo soap unicode

With this in make.conf:

PHP_INI_VERSION="production"
PHP_TARGETS="php5-6"

Both are ~arch. In package.accept_keywords:

www-servers/nginx ~amd64
dev-lang/php ~amd64

I use the default Gentoo profile for my system: default/linux/amd64/13.0.

Member

f0o commented Feb 7, 2015

Nevermind my other post, early morning here without coffee - portage is compiling

//Edit:
Any particular reason why you aren't using the default versions? (php-5.5.21 and nginx-1.7.6)

JohnFlowerful commented Feb 7, 2015

No real reason, no.
I still have my config files for php 5.5.x, so I can easily switch back if that is needed. There shouldn't be any problems downgrading nginx either.

Member

f0o commented Feb 7, 2015

Ok I can reproduce the zombies.

Downgrading php doesn't resolve this for me. I suspect the bug is in php-fpm.
I've tested php 5.6.5 and 5.5.21.
I've also tested rrdtool 1.5.0 and 1.4.9.

My production system runs php-5.4.36, rrdtool-1.4.7, nginx-1.2.1. (I know it's old..)

Member

f0o commented Feb 7, 2015

Just downgraded to php-5.4.36 and the bug is resolved.
Unless you really depend on php 5.5 or 5.6, I'd suggest you go down to 5.4 for now.

I will keep poking around; maybe I'll find a proper solution using a shell_exec wrapper of some sort...

JohnFlowerful commented Feb 7, 2015

I only require a minimum php-fpm of 5.3, so switching to 5.4.37 has resolved the issue.

Member

f0o commented Feb 8, 2015

Please keep this issue open until we have a future-proof solution. php 5.4 is going EOL soon, so we need to tackle this.

Thanks for noticing this by the way! :)

Member

f0o commented Feb 8, 2015

Quick update:
I just deployed Debian jessie, which comes with php-5.6.5, and I can reproduce the zombies there too.

I noticed that proc_close is returning NULL and so isn't freeing the process.
I'll file a bug at php.net later today after some more research.
It seems proc_close doesn't work with variables that are passed by reference.
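For background on why an unreleased process handle leaves `[rrdtool]` entries in state `Z`: a child that has exited stays in the process table until its parent collects the exit status, which is what PHP's `proc_close()` normally does by waiting on the process. The sketch below is a Linux-only Python illustration of that mechanism, not LibreNMS or PHP code; `proc_state` and `demo` are hypothetical names, and it assumes `/proc` is mounted and SIGCHLD has its default disposition.

```python
import subprocess
import sys
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def demo():
    """Start a short-lived child, observe it as a zombie, then reap it."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])

    # Let the child exit WITHOUT calling wait(): it should turn into a zombie.
    deadline = time.time() + 5
    while proc_state(child.pid) != "Z" and time.time() < deadline:
        time.sleep(0.05)
    before = proc_state(child.pid)

    # Reaping the child (the step that a failing proc_close would skip)
    # removes the entry from the process table.
    child.wait()
    after = proc_state(child.pid)
    return before, after
```

Under those assumptions, `demo()` should report the child in state `Z` before the `wait()` call and no process entry afterwards. This is consistent with the earlier observation that restarting php-fpm clears the zombies: once the parent worker exits, the orphans are reaped by init.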

Member

f0o commented Feb 8, 2015

@JohnFlowerful do you still have php-5.6.5?
I think I found a solution that I'd like to have verified from your end.

Change line 63 of includes/rrdtool.inc.php
From:
function rrdtool_pipe_close(&$rrd_process, &$rrd_pipes)
To:
function rrdtool_pipe_close($rrd_process, &$rrd_pipes)

(Remove & before $rrd_process)

//EDIT: since you're on Gentoo, you should be able to run eselect php set fpm php5.6, unless you've already unmerged it

JohnFlowerful commented Feb 8, 2015

I do. The wonders of eselect~

That solution doesn't work for me. Graphs still create zombie processes when using php-5.6.5.

Member

f0o commented Feb 8, 2015

OK, too bad. I'm poking around together with the php-devs...
They can't explain the bug either, as the code for proc_* didn't change between 5.4 and 5.6...

I'll probably ping you later again with another attempt ;)

Member

f0o commented Feb 8, 2015

Member

f0o commented Feb 14, 2015

So it doesn't seem to be reliably reproducible, and the php-devs are unclear on how this is happening.
I will poke around php 5.6.5 on my test Gentoo and see what I can do to work around it.

@paulgear can you tag this as wontfix for now please?

laf added the Wontfix label Feb 14, 2015

Member

laf commented Mar 22, 2015

@f0o Any reason why we can't introduce your fix from earlier (dropping the &)?

Member

laf commented Apr 9, 2015

Hey @f0o

Can we look at PRing your fix? Seems to have helped a couple of installs out now.

Contributor

Tatermen commented Apr 24, 2015

I was getting zombie processes on Apache and PHP 5.6.1 (no php-fpm, OpenSuSE 13.2). @f0o 's fix worked for me.

JohnFlowerful commented Apr 24, 2015

The fix still doesn't work for me. Same setup as listed above, but with the php versions 5.5.24 and 5.6.8. php version 5.4.40 works fine.

adammmmm commented Apr 26, 2015

The fix also works on ubuntu 14.04.2 Apache, php5.5.x

Member

f0o commented May 11, 2015

I can PR the rather trivial fix, but I wouldn't call it a fix... It's just yet another oddity introduced by distributions' package maintainers that results in very odd reference handling.

PR inbound.

Member

laf commented May 13, 2015

I know this wasn't the fix, but is it worth closing this now as it's an upstream issue?

Member

paulgear commented May 13, 2015

I don't think there's any need to close things that are still known issues. We can filter it out by label if we don't want to see it, but this is useful documentation for people who are encountering the bug.

Member

laf commented May 13, 2015

People will still have to search to find this as it goes down the list so it will still be available :)

Also, it's technically fixed in our code base, so users won't see the issue unless they aren't running an up-to-date install.

Member

laf commented May 19, 2015

I've added an FAQ explaining this a bit so we don't need to keep this issue open.

laf closed this May 19, 2015

lock bot locked as resolved and limited conversation to collaborators May 22, 2018
