First SIGCHLD handler (#5) #9

gaborcsardi · 2017-03-10T21:33:21Z

Minimalistic, but it already gets rid of the zombie processes.

Currently it interferes with parallel's fork clusters, because neither packages keep the other's signal handler. We would need to work out some cooperative solution, which would need to go in base R, to avoid this. E.g. storing a list of handlers for the various signals and add API to R to install/uninstall them.

You can merge this if you want, and then we can discuss improvements, especially about storing the exit status, so that one can query it from R.

Minimalistic, but it already gets rid of the zombie processes.

gaborcsardi · 2017-03-10T21:34:16Z

Forgot to refer to #5.

jeroen · 2017-03-11T12:47:07Z

Thanks. I'm not too experience with implementing signal handling. Some questions:

What if the encapsulating system already uses signal handlers? For example RApache or the CRAN check machinery use signal handlers to detect when R has crashed. Will overriding the signal handler in R affect the behavior of R's parent system?
AFAIK, Linux propagates each signal to the entire process group. So the process might receive signals from another process that is not it's child? Is it harmless to call waitpid() in this case?

@s-u

gaborcsardi · 2017-03-11T13:03:55Z

What if the encapsulating system already uses signal handlers? For example RApache or the CRAN check machinery use signal handlers to detect when R has crashed. Will overriding the signal handler in R affect the behavior of R's parent system?

Yes, this overwrites the other SIGCHLD signal handlers that were created in the same process. Rapache does not bother with SIGCHLD, apparently: https://github.com/jeffreyhorner/rapache/search?utf8=%E2%9C%93&q=SIGCHLD

It will interfere with parallel, though. Maybe we can work around that by keeping the signal handler installed by parallel. Maybe this is possible. But even if it is not possible, an interference with parallel's fork clusters is still better than the current zombie army.

AFAIK, Linux propagates each signal to the entire process group. So the process might receive signals from another process that is not it's child? Is it harmless to call waitpid() in this case?

I am not sure what other processes might be in the same process group. But anyway, yes, waitpid is harmless. The WNOHANG ensures that if it does not happen to find a child process that has just exited, then it will return immediately.

Actually, I just realized that I can implement all this in another package, it does not have to be in sys. So this is not a blocker for me. OTOH, to reliably get the exit status on windows requires modifications in sys I think, so it would be still better to have the whole thing in sys.

jeroen · 2017-03-11T13:27:08Z

RApache wraps apache2 prefork() mechanism for lanching R workers, which might use SIGCHLD?

Do you understand why the PR is failing unit tests? It seems to affect the error handling in R? https://travis-ci.org/jeroenooms/sys/builds/209904805

It would be nice to get a clean solution in sys. But the current solution does not allow you to get the exit status anymore, correct? I suppose sys_sigchld_callback() would need to store wstat in some global table that maps each pid to the wstat exit code, so that we check the process status and exit code from R?

jeroen · 2017-03-11T13:52:12Z

Ah I see why it fails now. Currently exec_wait uses waitpid() internally to get the exit code for the process, but you're already doing that in the SIGCHLD handler now. So after that we can no longer get the exit code manually.

jeroen · 2017-03-11T14:52:43Z

Also if the purpose is only to prevent zombies at this point, perhaps we can use SA_NOCLDWAIT?

jeroen · 2017-03-11T16:21:39Z

Merged via 28cb317. Still not 100% sure though this is final, seems to have a lot of side effects...

gaborcsardi · 2017-03-11T21:23:37Z

What side effects?

jeroen · 2017-03-12T11:33:24Z

If you add a printf to the signal handler, you see that it gets triggered a lot for processes that were not launched by sys. Especially rstudio seems to launch lot of processes (git, pandoc, etc). Rstudio will not be able to get the exit code for these procs because our handler cleans them as soon as the child dies.

Also this excerpt from NEWS.rd makes me worried that we might be breaking other packages depend on a SIGCHLD.

  \R{} is now trying harder to not cleanup child processes
  that were not spawned by \code{mcparallel()} on platforms that
  provide information about the source process of the \code{SIGCHLD}
  signal. This allows 3rd party libraries to manage the exit status
  of children that they spawn without \R{} interfering.

Would there be some trick to mark a process when we launch it in sys, and only waitpid() on it when it was launched by sys?

jeroen · 2017-03-12T11:49:16Z

Moving this to a new issue: #10

First SIGCHLD handler

783fced

Minimalistic, but it already gets rid of the zombie processes.

gaborcsardi changed the title ~~First SIGCHLD handler~~ First SIGCHLD handler (#5) Mar 10, 2017

jeroen closed this Mar 11, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

First SIGCHLD handler (#5) #9

First SIGCHLD handler (#5) #9

gaborcsardi commented Mar 10, 2017

gaborcsardi commented Mar 10, 2017

jeroen commented Mar 11, 2017

gaborcsardi commented Mar 11, 2017 •

edited

Loading

jeroen commented Mar 11, 2017

jeroen commented Mar 11, 2017 •

edited

Loading

jeroen commented Mar 11, 2017

jeroen commented Mar 11, 2017

gaborcsardi commented Mar 11, 2017

jeroen commented Mar 12, 2017 •

edited

Loading

jeroen commented Mar 12, 2017

First SIGCHLD handler (#5) #9

First SIGCHLD handler (#5) #9

Conversation

gaborcsardi commented Mar 10, 2017

gaborcsardi commented Mar 10, 2017

jeroen commented Mar 11, 2017

gaborcsardi commented Mar 11, 2017 • edited Loading

jeroen commented Mar 11, 2017

jeroen commented Mar 11, 2017 • edited Loading

jeroen commented Mar 11, 2017

jeroen commented Mar 11, 2017

gaborcsardi commented Mar 11, 2017

jeroen commented Mar 12, 2017 • edited Loading

jeroen commented Mar 12, 2017

gaborcsardi commented Mar 11, 2017 •

edited

Loading

jeroen commented Mar 11, 2017 •

edited

Loading

jeroen commented Mar 12, 2017 •

edited

Loading