Skip to content

fix of wait_all_children. If one or more childs will die you can get an ... #8

Closed
wants to merge 1 commit into from

2 participants

@Perlover

No description provided.

@Perlover

Merged to #9

@Perlover Perlover closed this Jun 10, 2013
@kazuho kazuho reopened this Apr 9, 2014
@kazuho
Owner
kazuho commented Apr 9, 2014

IMO this should not be an issue since the exits (or dies) of child processes are reported via SIGCHLD to the master process.

@kazuho kazuho closed this May 12, 2014
@Perlover

Please reopen this ticket
I think you make mistake

My patch works only with existing processes
Your circle will make infinite loop if some process died BECAUSE you don't clear pid from worker_pids array. If there will be pid of not existing process there will be infinite loop.
Now your code has only empty sub {} for $SIG{CHLD} or 'DEFULT' value

Please check up!
I made patch because i got this bug in my production server. Now i use my patched version of v0.12 and i am afraid use your because i think this bug is there.

I think you should use my patch for grepping only existing process pids or should have other code for $SIG{CHLD} - there should be some reaped code (wait & deleting $self->{worker_pids}{$pid} reaped process)

Thanks

@kazuho
Owner
kazuho commented Jul 14, 2014

Would you mind creating a test code that reproduces the problem?

I am sorry but I do not understand why you think that the current implementation is wrong.

Your circle will make infinite loop if some process died BECAUSE you don't clear pid from worker_pids array.

No. pid is always cleared in line 198 for every worker process that exits (either normally or abnormally), and thus the function would not loop infinitely.

My understanding is that POSIX-based systems are required to return the exits of all the child processes to the parent, and the design of Parallel::Prefork relies on the premise. Without such assumption it is impossible to maintain a designated number of worker processes.

Now your code has only empty sub {} for $SIG{CHLD} or 'DEFULT' value.

Having an empty sub is the right way.

POSIX does not guarantee that SIGCHLD is fired for every exit of the child process (i.e. the number of deaths reported by the signal might be less than the number of the processes that actually died; see wait). So we need to collect the child processes using wait(2) (or waitpid(2)).

OTOH we need to set the signal handler to something other than SIG_IGN; setting it to SIG_IGN means that the exits of each child process would not be reported to the master process through wait(2).

These two reasons together explains why setting $SIG{CHLD} to an empty function is necessary.

@kazuho
Owner
kazuho commented Jul 14, 2014

PS. The behavior of POSIX based-systems regarding how child exits can be received by calling wait(2) seems to be best defined in _Exit.

Process termination caused by any reason shall have the following consequences:
(snip)

  • If the parent process of the calling process is executing a wait(), waitid(), or waitpid(), and has neither set its SA_NOCLDWAIT flag nor set SIGCHLD to SIG_IGN, it shall be notified of termination of the calling process and the low-order eight bits (that is, bits 0377) of status shall be made available to it. If the parent is not waiting, the child's status shall be made available to it when the parent subsequently executes wait(), waitid(), or waitpid().

  • The semantics of the waitid() function shall be equivalent to wait().

  • If the parent process of the calling process is not executing a wait(), waitid(), or waitpid(), and has neither set its SA_NOCLDWAIT flag nor set SIGCHLD to SIG_IGN, the calling process shall be transformed into a zombie process. A zombie process is an inactive process and it shall be deleted at some later time when its parent process executes wait(), waitid(), or waitpid().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.