Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

child_process.spawn: does not throw error when uv_spawn() fails #33566

Open
jgehrcke opened this issue May 26, 2020 · 9 comments
Open

child_process.spawn: does not throw error when uv_spawn() fails #33566

jgehrcke opened this issue May 26, 2020 · 9 comments
Labels
child_process

Comments

@jgehrcke
Copy link
Contributor

jgehrcke commented May 26, 2020

Calling child_process.spawn() with for example a non-existing path does not throw an error.

I believe that any synchronously available error happening during child_process.spawn() should throw an error right away so that handling of common errors is straight-forward, encouraged, and not subject to race conditions. Exposing those errors synchronously that are known synchronously will make it easier for programmers to write NodeJS programs with robust child process management.

child_process.spawn() runs the system calls required for creating a new process from an executable file in the context of the current event loop. This might block the event loop for a little bit. "Blocking the event loop for a little bit" might be considered as a downside (discussed here: #21234) but the upside is that errors could be handled synchronously! That's good, we only need to do it :-).

Naturally, there are some expected errors that can be thrown by the system call used for starting a process from an executable (exec()-like calls on Unix), such as:

  • the path provided to the executable is invalid: ENOENT on Linux
  • lack of privileges to run the executable: EACCES on Linux

NodeJS' child_process.spawn uses uv_spawn() under the hood which should indeed synchronously report about these errors, see see http://docs.libuv.org/en/v1.x/process.html#c.uv_spawn. Quotes:

If the process is successfully spawned, this function will return 0. Otherwise, the negative error code corresponding to the reason it couldn’t spawn is returned.

Possible reasons for failing to spawn would include (but not be limited to) the file to execute not existing, not having permissions to use the setuid or setgid specified, or not having enough memory to allocate for the new process.

The specific proposal is to throw an error immediately when uv_spawn() fails.

Let me know what you think!

Related discussions:

@bnoordhuis bnoordhuis added the child_process label May 26, 2020
@bnoordhuis
Copy link
Member

bnoordhuis commented May 26, 2020

The rule of thumb in Node.js core is that logic errors (application bugs, like passing a string where a number is expected) are rejected with synchronous exceptions while runtime errors are emitted asynchronously. Trying to spawn an executable that isn't there falls squarely in the latter category.

It's a rule of thumb, not a hard rule, but it's not something we deviate from unless there's very good reason. If the current situation is practically unworkable, that would be a very good reason, but I don't think that applies here.

Your proposal also breaks backwards compatibility rather big time. That raises the bar even higher to the point that I think this proposal is practically DOA. There's simply not enough payoff vs. possible ecosystem fallout.

@jgehrcke
Copy link
Contributor Author

jgehrcke commented May 27, 2020

Thanks for the feedback!

The rule of thumb in Node.js core is that logic errors (application bugs, like passing a string where a number is expected) are rejected with synchronous exceptions while runtime errors are emitted asynchronously.

Thanks for sharing that perspective! I have a bit of a hard time to be convinced by this :-). Clarification added to the bottom of this comment. I am sure you have thought a lot about this principle and this is probably an oversimplified version of it.

In the NodeJS API/stdlib as far as I understand there are obviously asynchronous parts of that API and then there are obviously synchronous parts of that API.

For the synchronous parts I intuitively expect synchronously known errors to be thrown synchronously. Especially when they are about interaction with the operating system (system call errors).

Will fs.readFileSync() throw errors asynchronously? I don't know this while I am writing this question. I went into docs and couldn't find this documented right away. So I tried it, and the answer is "no", at least for ENOENT:

> fs.readFileSync('not')
Uncaught Error: ENOENT: no such file or directory, open 'not'
    at Object.openSync (fs.js:457:3)
    at Object.readFileSync (fs.js:359:35) {
  errno: -2,
  syscall: 'open',
  code: 'ENOENT',
  path: 'not'
}

This is behaving as I would expect.

I expected the same for child_process.spawn().

I can accept the fact that it does not behave so and can try to work with that. Sure :).

But then at some point I concluded that it's also important to provide feedback about how I feel about that when writing actual programs. The result is that bug report. A non-ideal bug report of course! The backwards compatibility argument obviously kills the idea, that's clear :).

It's pretty deeply surprising, from the point of view of a person who has done quite a bit of systems programming and tries do implement robust error handling.

Of course I understand that something like this cannot be changed w/o breaking code that previously worked. I hope that the result from this discussion is a mixture of

  • a small idea contributing to a plan for the far future
  • a piece of documentation showing a recipe for how to handle errors around creating and managing child processes in a robust way.

My point of view is the point of view of a more or less experienced programmer who cares about error handling when doing "system programming" (when using system calls, the operating system API). I know what can go wrong, and I would like to handle some of these situations. In NodeJS I more often than not wonder: how? Coming in with positive intentions, asking "how do I do this?" -- my conclusion after a few months is that error handling is sometimes much harder than it should be, and of course this also has to do with documentation.

About "logic errors" and "runtime errors":

  • I think of "runtime errors" to mainly be JavaScript code execution errors, as of bad code. But I think this is what you tried to cover with "logic errors".
  • I think of "system interaction errors" (errors when using the operating system API, i.e. system calls) as what I think you were referring to with "runtime errors".

Thanks @bnoordhuis let me know what you think, but I'd also understand if you'd like to engage into more productive, smaller-scoped discussions instead :-).

@sam-github
Copy link
Contributor

sam-github commented May 27, 2020

re: logic vs runtime (aka programmer error vs operational error): see https://www.joyent.com/node-js/production/design/errors for discussion

@jgehrcke
Copy link
Contributor Author

jgehrcke commented May 27, 2020

re: logic vs runtime (aka programmer error vs operational error): see https://www.joyent.com/node-js/production/design/errors for discussion

Thanks for chiming in! Just want to say that I am a strong advocate of the "programmer error vs operational error" differentiation, and derived reasoning based on this classification. I discovered and read the linked page long before I started working with NodeJS and it's really a decent write-up!

But did Ben's "logic vs runtime" really refer to precisely that distinction? Anyway. I think the direct comparison between fs.readFileSync() and child_process.spawn() could be a bit of a more productive line of discussion here.

@sam-github
Copy link
Contributor

sam-github commented May 27, 2020

But did Ben's "logic vs runtime" really refer to precisely that distinction?

Yes, a file not present is a runtime error.

I think the direct comparison between fs.readFileSync() and child_process.spawn() could be a bit of a more productive line of discussion here.

One is sync, the other is not. One does not accept a callback or return an event emitter, so it obviously throws, because no other error signalling mechanism exists. The other returns an event emitter, so is clearly async, and has an async error signalling mechanism (emitting an error event).

spawn is consistent with general practice in Node.js. There may be areas of inconsistency in the API surface, but these two very different APIs aren't examples of that.

@jgehrcke
Copy link
Contributor Author

jgehrcke commented May 27, 2020

One is sync, the other is not.

I can totally see what you mean with that and where you are coming from, but I'd say it's more like a goal, but not exactly the reality.

I hope you can agree that in practice a lot of child_process.spawn() is happening synchronously.

This is what #14917 (titled spawn() is not asynchronous ...) really is about and which is why I linked it in the first post above.

Some error information is available synchronously, just not being used in that moment, and this is really where I am coming from :).

In most mental models for managing child processes (be it in the mental model of shell primitives or within e.g. CPython's subprocess concept or in Go) there is both, a synchronous phase to it (during spawn, involving the system calls to spawn/start the new process -- usually fork/exec on Unix, CreateProcess on win32) and an asynchronous phase (after these system calls completed, after having let go, after having given up control, and where, conceptually, a new system call is necessary to detect any subsequent problem from the parent's point of view).

I'd argue that this two-phase concept is the established concept and that it is innate to the nature of launching and controlling child processes. libuv seems to agree (exposing error detail synchronously during uv_spawn()). NodeJS could have embraced that, too, by throwing synchronously available errors during spawn immediately, and saved programmers a little bit of pain.

@sam-github
Copy link
Contributor

sam-github commented May 27, 2020

Having to guess which errors are thrown, and which errors are delivered by callback/event listener is a whole lot of pain, and historically, a source of consistent complaint. Which is why we are where we are: that only things that are clearly "bugs" (type mismatches, nulls instead of values, etc.) are thrown, so bug-free code has only to attempt to get errors in one place, and thown errors can be allowed to terminate the process. Its more common to hear the request that NOTHING get thrown, but that historically has also caused problems, because typos in the code then get sent as errors... and either ignored, or logged and ignored (irresponsibly, of course), and then users come back and ask why the obviously buggy code didn't cause an error to be thrown.

In short, there are tradeoffs, but the current balance has worked itself out over years, a semver-major destabillization of that balance is not likely, in my personal opinion.

Btw, as far as a systems programmer's expectation of "spawn" goes, fork+exec is inherently async on Unixen, the exec system call fails in the child process, not the parent. IIRC, uv goes through quite some effort to block the parent process until the exec has suceeded in the child in order to get more windowsy spawn semantics (by doing a blocking read on a fd that was duped to the child and marked close-on-exec, a nice bit of system programming). There is a performance cost to this forced synchronization, but apparently not enough node.js developers are doing heavy enough spawning to notice the event loop blocks briefly on spawn.

@jgehrcke
Copy link
Contributor Author

jgehrcke commented May 28, 2020

[...] and thown errors can be allowed to terminate the process. Its more common to hear the request that NOTHING get thrown, but that historically has also caused problems, [...]

a semver-major destabillization of that balance is not likely

What I begin to appreciate more is that NodeJS is rather special instead of general purpose -- coming from the "I am here to build long-running network servers!" world where people don't want "fail fast (globally)", but instead would often want a best effort, by having even unexpected (and erroneously/irresponsibly unhanded) errors affect as little context as possible.

This can make certain "systems programming" tasks a little more difficult to achieve in a robust, predictable way than in other programming environments. But as you say, there are trade-offs.

the exec system call fails in the child process, not the parent. IIRC, uv goes through quite some effort to block the parent process until the exec has suceeded in the child [...]

Yeah sure, all this effort to provide synchronous error information to the caller of uv_spawn()... :-).

Thank you @sam-github for taking the time to engage in this discussion. Really appreciate it. Thanks for providing a little bit of historical context. There is an evolutionary path to everything and it often is important to consider, indeed :).

I think an actionable item from this discussion is for us to develop and share/document decent recipes for how to do sane error handling around child_process.spawn(). It's clearly non-trivial!

@gireeshpunathil
Copy link
Member

gireeshpunathil commented Oct 15, 2020

looks like this discussion is stalled.

@jgehrcke - I can fully understand your expectations. I want to bring in a new perspective in expectation of converging the discussion:

  • the span API (like many other APIs) is an abstraction around OS syscalls fork and exec (in unix)
  • this means there is a boundary at which the the control is passed to the syscall(s).
  • My view of the design is: all errors in the validation steps within the wrapper are thrown back in-line, while issues occuring once the control has gone to the syscalls, are classified as runtime errors.

in this specific case as you describe:

  • suppose we check the path of the target executable
  • (may be through a stat call or so)
  • and found to be good and valid and accessible
  • and we progress further to invoke the syscall
  • there can be race conditions wherein the path is now unavailable!

so my way of thinking is: to leave it to the syscall, and post process errors, if any. thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
child_process
Projects
None yet
Development

No branches or pull requests

4 participants