Zombies left behind #21

Closed
kensanata opened this issue Jun 1, 2016 · 7 comments
@kensanata kensanata commented Jun 1, 2016

I recently upgraded my installation (Toadfarm using Mojolicious::Plugin::CGI to run a CGI script via a code reference) and noticed that my site was suddenly leaving zombies behind. Here's a Munin graph showing the problem. I upgraded to the latest version, then I noticed the problem, then I had Monit restart Toadfarm whenever there were more than 250 child processes, and then I started looking for the last revision that worked. 0.26 works fine; 0.27 starts leaving zombies behind.

[Image: Munin graph]

Do you have any idea what could be going wrong? Is there something I need to pay attention to?

My toadfarm setup:

#!/usr/bin/env perl
use Toadfarm -init;

$ENV{PATH} .= ":/usr/local/bin";

my $farm = '/home/alex/farm';

logging {
  combined => 1,
  file     => "$farm/farm.log",
  level    => "error",
};

mount "$farm/face/face.pl" => {
  mount_point => '/face',
};

mount "$farm/halberdsnhelmets/halberdsnhelmets.pl" => {
  mount_point => '/halberdsnhelmets',
};

mount "$farm/alexschroeder.pl" => {
  "Host" => qr{^(www\.)?alexschroeder\.ch:8080$},
  mount_point => '/wiki',
};

plugin "Toadfarm::Plugin::AccessLog";

start; # needs to be at the last line

The app I care about, $farm/alexschroeder.pl:

#! /usr/bin/env perl

use Mojolicious::Lite;

plugin CGI => {
  support_semicolon_in_query_string => 1,
};

plugin CGI => {
  route => '/',
  script => '/home/alex/farm/wiki.pl', # ~/farm/wiki.pl
  run => \&OddMuse::DoWikiRequest,
  before => sub {
    my $c = shift;
    $OddMuse::RunCGI = 0;
    $OddMuse::DataDir = '/home/alex/alexschroeder';
    require '/home/alex/farm/wiki.pl' unless defined &OddMuse::DoWikiRequest;
  },
  env => {},
  errlog => '/home/alex/farm/alexschroeder.log', # path to where STDERR from cgi script goes
};

app->start;

If it's nothing obvious, I'll have to try and write a smaller test case, I guess.

@kensanata kensanata commented Jun 3, 2016

I wrote a test that does the following:

  1. start Hypnotoad with an app using the CGI Plugin with a run code reference
  2. make 20 requests, sleeping 1s between requests
  3. wait for these processes to get reaped

If this doesn't happen within 20 seconds, the test fails.
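
The waiting step could look roughly like the sketch below. It is not the actual t/zombies.t (which, going by the modules mentioned later in this thread, uses Proc::ProcessTable); count_zombies and wait_for_reaper are hypothetical helper names, and parsing ps output is an assumption chosen to keep the sketch free of non-core modules.

```perl
use strict;
use warnings;

# Hypothetical helper: count zombie processes whose parent is $ppid.
# Assumes a ps that understands "-A -o ppid= -o stat=" (Linux procps, BSD, macOS).
sub count_zombies {
  my ($ppid) = @_;
  my @lines = `ps -A -o ppid= -o stat=`;
  return scalar grep {
    my ($p, $s) = split ' ';
    defined $s && $p == $ppid && $s =~ /^Z/;    # state "Z" marks a zombie
  } @lines;
}

# Hypothetical helper: call $counter once per second until it returns 0
# or $timeout seconds have passed. Returns (remaining, seconds waited).
sub wait_for_reaper {
  my ($counter, $timeout) = @_;
  my $waited = 0;
  my $left   = $counter->();
  while ($left > 0 and $waited < $timeout) {
    sleep 1;
    $waited++;
    $left = $counter->();
  }
  return ($left, $waited);
}
```

A test would then call wait_for_reaper(sub { count_zombies($server_pid) }, 20), where $server_pid is the PID of the process that forked the CGI children, and fail if the first return value is not 0.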

On my system:

alex@kallobombus:~/src/mojolicious-plugin-cgi$ perl -Ilib t/zombies.t
[Fri Jun  3 12:55:13 2016] [info] Listening at "http://127.0.0.1:56500"
ok 1 - PID 29634 found
ok 2 - right status
ok 3 - right content
# Hammering the server with 20 requests
# Waiting for the reaper
not ok 4 - No zombies left after 20 seconds
#   Failed test 'No zombies left after 20 seconds'
#   at t/zombies.t line 82.
#          got: '18'
#     expected: '0'
ok 5 - 29634 is terminated
1..5
# Looks like you failed 1 test of 5.

Example top output while it runs, sorted by state:

top - 12:55:32 up 148 days, 13:17,  2 users,  load average: 0.35, 0.32, 0.60
Tasks:  61 total,   1 running,  42 sleeping,   0 stopped,  18 zombie
%Cpu(s):  4.0 us,  1.0 sy,  0.0 ni, 95.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   4194304 total,  1078900 used,  3115404 free,        0 buffers
KiB Swap:  1048576 total,    10476 used,  1038100 free,   698812 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND                     
29633 alex      20   0     0    0    0 Z   0.0  0.0   0:00.52 /tmp/TSl67AmHZM             
29637 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29638 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29639 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29641 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29642 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29643 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29645 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29647 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29649 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29651 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29652 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29653 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29654 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29655 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29657 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29658 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             
29659 alex      20   0     0    0    0 Z   0.0  0.0   0:00.00 /tmp/TSl67AmHZM             

And here's how the test succeeds for 0.26:

alex@kallobombus:~/src/mojolicious-plugin-cgi$ git checkout 0.26 lib/Mojolicious/Plugin/CGI.pm
alex@kallobombus:~/src/mojolicious-plugin-cgi$ perl -Ilib t/zombies.t
[Fri Jun  3 13:07:47 2016] [info] Listening at "http://127.0.0.1:41785"
ok 1 - PID 30373 found
ok 2 - right status
ok 3 - right content
# Hammering the server with 20 requests
# Waiting for the reaper
ok 4 - No zombies left after 1 seconds
ok 5 - 30373 is terminated
1..5

@kensanata kensanata commented Jun 3, 2016

I'll need to get a more recent version of Perl up on this Debian Wheezy system.

This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-gnu-thread-multi
(with 91 registered patches, see perl -V for more detail)

I cannot reproduce the bug on my Mac running a much more recent version.

This is perl 5, version 22, subversion 0 (v5.22.0) built for darwin-2level

@kensanata kensanata commented Jun 3, 2016

The bug seems to be unrelated to the Perl version. I installed perlbrew, switched to 5.25.1, installed Mojolicious 6.62, Mojolicious::Plugin::CGI 0.32, IO::Pipely 0.005, and some libraries I need for the test (File::Which, Proc::ProcessTable) and I still get the problem.

alex@kallobombus:~/src/mojolicious-plugin-cgi$ git checkout 0.26 lib/Mojolicious/Plugin/CGI.pm
alex@kallobombus:~/src/mojolicious-plugin-cgi$ perl -Ilib t/zombies.t
[Fri Jun  3 21:50:27 2016] [info] Listening at "http://127.0.0.1:58127"
ok 1 - PID 31581 found
ok 2 - right status
ok 3 - right content
# Hammering the server with 20 requests
# Waiting for the reaper
ok 4 - No zombies left after 1 seconds
ok 5 - 31581 is terminated
1..5
alex@kallobombus:~/src/mojolicious-plugin-cgi$ git checkout master lib/Mojolicious/Plugin/CGI.pm
alex@kallobombus:~/src/mojolicious-plugin-cgi$ perl -Ilib t/zombies.t
[Fri Jun  3 21:51:03 2016] [info] Listening at "http://127.0.0.1:42949"
ok 1 - PID 31613 found
ok 2 - right status
ok 3 - right content
# Hammering the server with 20 requests
# Waiting for the reaper
not ok 4 - No zombies left after 21 seconds
#   Failed test 'No zombies left after 21 seconds'
#   at t/zombies.t line 83.
#          got: '19'
#     expected: '0'
ok 5 - 31613 is terminated
1..5
# Looks like you failed 1 test of 5.

@kensanata kensanata commented Jun 3, 2016

Judging from some debug statements, for every invocation of the script _waitpids is called 50 times for the same pid, and the call to waitpid $pid, WNOHANG fails every time.
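
For reference, a non-blocking waitpid in Perl has three possible outcomes, and telling them apart helps when reading such debug output. The sketch below is not the plugin's _waitpids; try_reap is a hypothetical helper that only illustrates the documented return values of the waitpid builtin.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical helper (not part of Mojolicious::Plugin::CGI):
# try to reap $pid without blocking and report what happened.
sub try_reap {
  my ($pid) = @_;
  my $res = waitpid $pid, WNOHANG;
  return 'reaped'  if $res == $pid;    # child had exited and is now collected
  return 'running' if $res == 0;       # child exists but has not exited yet
  return 'gone';                       # -1: no such child (e.g. already reaped)
}
```

A zombie persists only as long as the parent never reaches the 'reaped' case for that pid. Note also that waitpid returns -1 when SIGCHLD is set to 'IGNORE', because the system then reaps children automatically.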

@jhthorsen jhthorsen self-assigned this Jun 4, 2016
@jhthorsen jhthorsen added the bug label Jun 4, 2016

@jhthorsen jhthorsen commented Jun 4, 2016

@kensanata: Thanks for all the details! Unfortunately I won't have much time to hack on this in the next week...

Could you see if #20 helps out?

@kensanata kensanata commented Jun 5, 2016

Indeed it does! I don't quite understand what it does. For now I just cherry-picked 0b594d8 and it seems to work:

alex@kallobombus:~/src/mojolicious-plugin-cgi$ perl -Ilib t/zombies.t
[Sun Jun  5 11:01:19 2016] [info] Listening at "http://127.0.0.1:50876"
ok 1 - PID 24117 found
ok 2 - right status
ok 3 - right content
# Hammering the server with 20 requests
# Waiting for the reaper
ok 4 - No zombies left after 1 seconds
ok 5 - 24117 is terminated
1..5

jhthorsen added a commit that referenced this issue Jun 7, 2016
 - Fix zombies left behind #20 #21 #22

@jhthorsen jhthorsen commented Jun 7, 2016

Cool! So I hope the same goes for #22, which is part of 0.33.

@jhthorsen jhthorsen closed this Jun 7, 2016