child_process: some "termination signals" don't behave as expected #3222

Closed
coltrane opened this Issue May 5, 2012 · 7 comments

4 participants

@coltrane

Please see this test case.
And the results and analysis in the attached comment

The issue applies to v0.6.x, as well as v0.7.x, but the test only runs correctly in >= v0.7.x

Here's a summary:

The test creates a child process, waits for it to be alive, and then kills it with a given signal. That procedure is repeated once for each posix signal that carries a default-action of "terminate process". (POSIX signals are listed here)

The test's goal is to verify that, as the child dies, the parent:

  • receives the proper events in the proper order ('disconnect', followed by 'exit')
  • that the exit event receives the expected arguments. When a child "exits abnormally" (a.k.a. death-by-signal), the exit handler should receive the arguments: [code=null, signal=<signal-name>].

The test reveals several unexpected results:

  1. For two signals -- SIGINT, and SIGTERM -- the exit event receives incorrect arguments. In both cases, it is called as though the child "exited normally": [ code=1, signal=null ]. I assume that this is a bug (and it's the main reason for filing this issue).

  2. SIGPOLL is undefined on OS-X (replaced by SIGENT). On Ubuntu, SIGPOLL exists but it maps to SIGIO. So, we have cross-platform madness. I'm guessing that this isn't a big deal.

  3. SIGPIPE doesn't cause the child to terminate at all. Turns out that SIGPIPE is trapped by nodejs (src/node.cc:2723). I assume this is due to "The special problem of SIGPIPE" (see: deps/uv/src/unix/ev/ev.3). So this is probably fine as-is too.

In addition to fixing #(1), maybe we should add a short "signals" section to the child_process docs (maybe under the .kill() function). There, we can document that SIGPOLL, and SIGPIPE have node-specific behavior. SIGUSR1 (node debugger) should be mentioned there too.

@bnoordhuis
Node.js Foundation member

For two signals -- SIGINT, and SIGTERM -- the exit event receives incorrect arguments. In both cases, it is called as though the child "exited normally": [ code=1, signal=null ].

That's because it does. Node shuts down gracefully(ish) on SIGINT and SIGTERM.

SIGUSR1 (node debugger) should be mentioned there too.

It's documented in doc/api/debugger.markdown.

I appreciate that you took the time to write a test and bug report but I don't really see an issue here. Signals are a very UNIX-y thing - most of the signals you mention have no Windows counterpart and the few that do are not really signals but emulated for the sake of backwards compatibility.

@coltrane

Node shuts down gracefully(ish) on SIGINT and SIGTERM.

Yes, I saw that, but shouldn't it preserve the signal when it exits? See "Handlers That Terminate the Process" (glibc docs):

The cleanest way for a handler to terminate the process is to raise the 
same signal that ran the handler in the first place.

A code example is shown on that same page. Also relevant are: "Termination Signals", and "Termination Internals"; and an example of a bash script that that illustrates the problem: see this gist.

Signals are a very UNIX-y thing - most of the signals you mention have no Windows counterpart and the few that do are not really signals but emulated for the sake of backwards compatibility.

Signals are a "very UNIX-y thing" -- they are a fundamental part of UNIX's process management strategy, they are key to the parent/child process relationship, as well as the notion of "process groups", they make shell scripts and "pipelines" behave in a sane manageable way, ... and the list goes on.

I don't think that node's Windows support justifies incorrect behavior on UNIX.

Preserving the signal on exit is not a difficult change. If it's something that you'd consider, I'll be glad to submit a patch.

@bnoordhuis
Node.js Foundation member

I don't know, your example seems a little contrived - you normally only care whether or not the process completed successfully. There's also nothing stopping a script from exiting with a status code that could be interpreted as having received a signal (128 + signo) or simply installing its own signal handlers and ignoring everything you send it.

It's trivial to re-raise the signal in the default signal handler but that will break scripts that rely on node's current behavior. I don't want to do that unless there is a very compelling reason to. So far I'm not seeing it.

@coltrane

The compelling reason (imho) is this: As it stands today, there is no way for a parent process to distinguish whether node was killed (i.e. SIGINT/SIGTERM), or whether it "exited normally" (i.e. userland called process.exit(1)). For programs that need to make that distinction, this is a real problem.

"Re-raising" the signals is an easy fix for this.

...but that will break scripts that rely on node's current behavior.

  • Programs that genuinely don't care about the distinction between signal-exit, and "normal" exit, should not be impacted by the change at all. An example is upstart (via the "respawn" stanza).

  • The only programs that would be affected, are those that specifically look for child processes to _exit(1) as an indicator of SIGINT/SIGTERM. I'm unable to come up with an example. This is node-specific behavior, so any example would likely be a node-specific process monitor or something similar. In any case, this is a brittle design that can be "broken" with a single line of userland code (process.exit(1)). Wouldn't these programs also benefit from being able to catch those signals explicitly (via WIFSIGNALED/WTERMSIG)?

your example seems a little contrived - you normally only care whether or not the process completed successfully.

I've simplified the script example (here). Hopefully, it's now clear that I'm not doing anything "contrived" -- just running node and checking for a successful exit code. Here's an article that discusses shells and SIGINT in depth, and which explains why my example behaves as it does.

There's also nothing stopping a script from exiting with a status code that could be interpreted as having received a signal (128 + signo)

This doesn't work. Exiting a child process with a code of 130, for example, will not cause the child to report (WIFSIGNALED(status) && WTERMSIG(status) == 2) in the waiting parent process. There is simply no way to simulate a termination-due-to-signal.

The proposed change adds needed functionality that is not otherwise available. No existing functionality is removed. It makes node behave in a more "standard" way (on UNIX platforms). The minor break in external "api" would be consistent with a new sub-major version release (eg. 0.8.x).

Seems to me like it's worth considering.

@bnoordhuis
Node.js Foundation member

We're entering the v0.8 feature freeze so if this change gets landed (which is a big if in itself), it won't be user-visible until v0.10.

@bminer

The behavior of the 'exit' event is a bit weird. The Node child_process documentation states:

Event: 'exit'
code - Number the exit code, if it exited normally.
signal - String the signal passed to kill the child process, if it was killed by the parent.
...
If the process terminated due to receipt of a signal, signal is the string name of the signal, otherwise null.

Okay... so the way I read that is... if Node sends a SIGINT or SIGTERM to a child process, the exit event should contain this signal. I'm with @coltrane on this one. This needs to be fixed, or the documentation needs to be updated.

@bnoordhuis - Just to be clear. This is a bug, not a feature request.

@chrisdickinson

Tested the attached script on node v0.11, and SIG{TERM,KILL} behaved as expected on OSX. Closing for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment