Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When killing the child process, kill all the descendants as well #170

Closed
wants to merge 8 commits into from

Conversation

tomsotte
Copy link
Contributor

@tomsotte tomsotte commented Jan 20, 2019

This is an attempt to fix the problem described by the issue #96.
If a child process is killed, the descendents of that child process won't be killed as well. This happens on Linux but not on Windows 1.
The adopted solution is the "PID range hack" 2 that uses the detached mode for spawning a process and then kills that child process by killing the PID group, using process.kill(-pid), effectively killing all the descendents.

Implementation

  • added an internal option killByPid as a reminder for the spawned child process that it will be detached and to kill it by PID
  • expanded and moved to a separate function the routine to kill the spawned process to killSpawned
  • the ChildProcess#kill method of the spawned child process will be replaced by the killSpawned routine, to kill by pid if necessary
  • the killSpawned routine signals that the child process has been killed, if it has been killed by the pid

I checked and all the tests pass.

This implementation also considers the issue #115 and shouldn't interfere with the detached/cleanup fix.

Footnotes

  1. https://nodejs.org/api/child_process.html#child_process_subprocess_kill_signal

  2. https://azimi.me/2014/12/31/kill-child_process-node-js.html

This is an attempt to fix the problem described by the issue sindresorhus#96.
If a child process is killed, the descendents of that child process won't be killed as well. This happens on Linux but not on Windows [^1].
The adopted solution is the "PID range hack" [^2] that uses the `detached` mode for spawning a process and then kills that child process by killing the PID group, using `process.kill(-pid)`, effectively killing all the descendents.

*Implementation*

- added an internal option `killByPid` as a remained for the spawned child process that it will be `detached` and to kill it by PID
- expanded and moved to a separate function the routine to kill the spawned process to `killSpawned`
- the `ChildProcess#kill` method of the spawned child process will be replaced by the `killSpawned` routine, to kill by pid if necessary
- the `killSpawned` routine signals that the child process has been killed, if it has been killed by the pid

I checked and all the tests pass.

This implementation also considers the issue sindresorhus#115 and shouldn't interfere with the detached/cleanup fix.

[^1]: https://nodejs.org/api/child_process.html#child_process_subprocess_kill_signal
[^2]: https://azimi.me/2014/12/31/kill-child_process-node-js.html
index.js Outdated
spawned._kill(signal);
} else {
// Kills the spawned process and its descendents using the pid range hack
// Source: https://azimi.me/2014/12/31/kill-child_process-node-js.html
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading the article, it looks like the PID trick only works when using {detached: true} which we cannot assume.

@sindresorhus
Copy link
Owner

This will needs tests.

@ehmicky ehmicky changed the title When killing the child process, kill all the descendents as well. Closes #96 When killing the child process, kill all the descendants as well. Closes #96 Mar 14, 2019
@tomsotte tomsotte changed the title When killing the child process, kill all the descendants as well. Closes #96 When killing the child process, kill all the descendants as well Mar 21, 2019
@tomsotte
Copy link
Contributor Author

tomsotte commented Mar 21, 2019

The PID hack requires detached: true. My implementation tries to use it subtly but I understand it might not be the best solution.

I'm looking into simonepri/pidtree, a package similar to ps-tree suggested by the issue related to this PR. As written in the README, pidtree should be "faster, safer and provides sub-children results".

The basic implementation that comes to mind should go something like this: if it's Linux then use pidtree to collect all the PIDs related to this child_process and terminate all.
This should account for Windows, which does not need this kind treatment, since by default it should kill all descendants.
I'll try to see if macOS has the problem, but I won't be able to test on it.

@tomsotte
Copy link
Contributor Author

Which one of these implementation is better?

  1. Enhancing the kill method to kill all the descendants; the method will retain compatibility with child_process#kill and keep its sync nature; it would be more consistent with Windows and so execa would be more cross-platform; this implementation requires a function to gather all the descendants PIDs in a sync way, but could kill them later on (to not impact on performance);
  2. Adding a new method killTree that would ensure killing all the descendants; execa would remain retro-compatible, cancel and kill methods wouldn't be touched; if implemented, should cancel use this method to ensure killing all children, like it would happen on Windows?
  3. Changing the kill method to kill all the descendants; it would ditch it's compatibility with child_process#kill; I can implement it this way right now but this probably would require re-adapting some tests.

For the solution 1. pidtree and ps-tree are async by callbacks, which means this would require a new library to gather all the children PIDs in a sync way. Otherwise, if we kill the spawned process before collecting the children PIDs they would change their parent PID. All the impl. to gather the children PIDs use the parent PID, but if it changes it would be impossible to find them.

@sindresorhus
Copy link
Owner

Number 2, but .kill() and .cancel() should also kill descendants if the killTree option is set. The killTree option should mean that no matter how the process is killed, the descendants should also be killed.

@sindresorhus
Copy link
Owner

@ehmicky Any opinions on this?

@ehmicky
Copy link
Collaborator

ehmicky commented Mar 25, 2019

I am wondering if the original problem was framed correctly because of two reasons:
a) Despite its name, kill() can also be used for IPC/messaging, not only terminating processes.
b) Some of the discussion might relate to failure termination, not only explicit signals termination / kill(). For example when a process is terminated because of some syntax error or segmentation fault: should child processes (even detached) be terminated as well or not? This question seems related to me.

I.e. for me the question is about failure termination, either external (signals / kill()) or internal (process abort, etc.). In that case what we would need to do instead would be killing descendants if the process failed, which would happen before makeError().

@tomsotte
Copy link
Contributor Author

tomsotte commented Mar 25, 2019

I am wondering if the original problem was framed correctly because of two reasons:
a) Despite its name, kill() can also be used for IPC/messaging, not only terminating processes.

True, so it would be better to leave that method alone and add a new one.

b) Some of the discussion might relate to failure termination, not only explicit signals termination / kill(). For example when a process is terminated because of some syntax error or segmentation fault: should child processes (even detached) be terminated as well or not? This question seems related to me.

By a quick test I've made on Linux, to sum up:

  • a process terminated because of syntax errors doesn't leave orphan child processes, even if they are ran detached
  • process.exit() or uncaught exception after a timeout do leave orphan child processes
  • also, ChildProcess#kill() do not terminate its descendants (but it does on Windows)

I.e. for me the question is about failure termination, either external (signals / kill()) or internal (process abort, etc.). In that case what we would need to do instead would be killing descendants if the process failed, which would happen before makeError().

On a practical note, on Linux, after a process has failed it leaves orphaned processes which can't be tracked as their PPID (which is the process PID) changes. We can't ensure killing the descendants once a process has failed.

I'm also wondering if this problem could and should be addressed by execa or not. There're other libs out there that already deals with killing the child processes tree. I believe that, in the form of a new method killTree, it is an enhancement if we treat it as "terminate the process and all it's descendants", but with other signals things could get complicated.

As a spec to follow we could compare what Windows and what Linux do and don't do well in terms of sending signals and terminating the process and it's descedants.

@ehmicky
Copy link
Collaborator

ehmicky commented May 13, 2019

On a practical note, on Linux, after a process has failed it leaves orphaned processes which can't be tracked as their PPID (which is the process PID) changes. We can't ensure killing the descendants once a process has failed.

Yes you're right. I just checked it on Ubuntu and when a process is terminated, its still-running child processes become orphan, i.e. their ppid switch from the PID of the original parent process to the PID of /lib/systemd/systemd --user.

It does not seem to be possible to know the original parent after it has exited. Which means it can only be done while the parent is still running, which we cannot guarantee on signal termination like SIGKILL. So ideally we would make the cleanup option deep, but it seems like we cannot.

Based on that I agree with @tomsotte that the best thing to do would be to limit this PR to an explicit method (like killTree([signal])) that looks up descendants by PID/PPID and send a signal to all of them. What do you think @tomsotte?

Note: it would be nice also if that method could re-use the extra kill() options added by #208 (not merged yet).

@tomsotte
Copy link
Contributor Author

tomsotte commented May 13, 2019

For the killTree method implementation I'd use node-tree-kill under the hood. It seemed a good candidate when I researched a while ago. I'll assess this later on.

As @sindresorhus said:

.kill() and .cancel() should also kill descendants if the killTree option is set. The killTree option should mean that no matter how the process is killed, the descendants should also be killed.

What do we do about those two other ways of killing the process? Should they remain intact? Should they honour the option killTree?

Note: it would be nice also if that method could re-use the extra kill() options added by #208 (not merged yet).

Yes, I might look into adding this as well.

@ehmicky
Copy link
Collaborator

ehmicky commented May 13, 2019

About using an option: I think we should either use an option with kill() and cancel(), or use a separate killTree() method, but not both, as it complicates the API (for the users) and the implementation (for us). Either works for me.

@tomsotte
Copy link
Contributor Author

You're right on that.
I've implemented the killTree method with node-tree-kill and made a simple test as well. I'll wait for #208 to be finalized before trying to add the retry options.

I'm also thinking that this addition will require a different library, something like pidtree. This is because I need to collect all the PIDs in the tree, because the parent process may have been killed before its descendants and I wouldn't be able to find them again via the parent PID.

@ehmicky
Copy link
Collaborator

ehmicky commented May 15, 2019

Instead of having two different strategies:

  1. kill processes by PPID (with node-tree-kill) when no retry option is used.
  2. find processes by PPID (with pidtree) then kill them by PID (with process.kill()) when retry option is used.

Wouldn't it be simpler to use only 2), whether retry is used or not?

node-tree-kill actually uses that second strategy under the hood for Mac and Linux. On Windows it uses taskkill /pid PID /T /F though, which is faster since it uses a single call, but I'm wondering if it's worth the extra implementation complexity?

@tomsotte
Copy link
Contributor Author

Correct, it would be best to just use the 2) implementation as you said.

The method might use taskkill with the tree option for Windows, while on Linux and Mac it would run the whole tree of descendants processes and process.kill one by one.
This should cover the case where the parent is dead but some descendant is still alive, provided that we collected the PIDs tree of the parent process.

I was also looking at fkill but it spawns the kill process for each PID to kill. I'm worried about performance of spawning multiple processes and I'm not sure if it's worth using it for a whole tree of processes to kill, instead of using a more low-level process.kill.
I'll try to find some references to this and maybe find a way to benchmark it. In the meantime a first implementation might just use what I said above.

@ehmicky
Copy link
Collaborator

ehmicky commented May 15, 2019

I agree. I actually thought of fkill as well, but I also think spawning a child process for each kill is... an overkill :)

@ehmicky
Copy link
Collaborator

ehmicky commented Jun 18, 2019

I've been thinking about this PR and I actually think the following implementation might work better, let me know what you think:

const childProcess = execa('echo');
childProcess.kill('SIGTERM', {deep: true});
childProcess.kill('SIGTERM', {deep: true, forceKillAfterTimeout: 5000});

deep (default: false) would call process.kill(signal) on all descendants.

When it comes to finding all descendants, I am actually wondering whether pidtree is a good idea. The library looks good but it does not look maintained anymore, which is a problem.

I am also starting to think that this feature belongs to a separate repository. Or sending a PR to any repository that already does something similar. This could be useful outside of execa and is more about signal sending than process execution.

@tomsotte
Copy link
Contributor Author

tomsotte commented Jun 20, 2019

I am also starting to think that this feature belongs to a separate repository. Or sending a PR to any repository that already does something similar. This could be useful outside of execa and is more about signal sending than process execution.

The very reason I've started to look into this issue, that this PR is trying to fix, was that I was upset that on Windows the ChildProcess kill method would kill all the descendants processes and on *nix it wasn't.
On that note, yes, I agree that kill would have looked better than having a new killTree method, but I also think that it's best if kill stays in sync with the original ChildProcess implementation.

I agree with you that execa may not be the best place to add this feature.
I'll try to improve the PR on fkill instead.

EDIT: If you feel the same we can close this PR. Thanks in advance for your time spent on this and as always I've appreciated your mentoring and your insights :)

@ehmicky
Copy link
Collaborator

ehmicky commented Jun 20, 2019

Yes I agree with you. I suggested using a deep option for kill() because we are already monkey-patching it and adding a options object (to force kill the child process after 5 seconds). However a separate killTree() method works as well for me, if you think it fits better. We just need to make sure the "force killing" feature can be combined with the "kill all descendants" feature.

I also agree that this is a useful feature but I do think the following might be better:

  • implementing in a different library (such as fkill). Hopefully without spawning one child process for each child process to kill.
  • then opening a new PR to integrate that library with execa.

What do you think?

@sindresorhus
Copy link
Owner

execa is a heavy enough dependency as it is. I don’t think we should add any more large dependencies here.

@ehmicky
Copy link
Collaborator

ehmicky commented Jun 20, 2019

@sindresorhus do you think we should implement this feature directly inside execa? It might involve quite some logic and will also most likely bring some dependencies (such as finding processes by PPID).

A third solution would be to separate the logic into a new library for this specific purpose (hence smaller).

@sindresorhus
Copy link
Owner

I think we should open an issue on Node.js about providing this built-in.

@sindresorhus
Copy link
Owner

Closing per previous discussion. Would be great if someone could open that Node.js issue and comment a link to it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants