Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add input option to child_process.spawn() #48786

Closed
ehmicky opened this issue Jul 15, 2023 · 16 comments
Closed

Add input option to child_process.spawn() #48786

ehmicky opened this issue Jul 15, 2023 · 16 comments
Labels
feature request Issues that request new features to be added to Node.js.

Comments

@ehmicky
Copy link

ehmicky commented Jul 15, 2023

What is the problem this feature will solve?

Users often need a child process's stdin to be some input as a string or buffer.
This is a use case common enough that Bash has a built-in operator for it: <<<.

At the moment, child_process.spawn() stdio[0] does not have an easy way to pass some input as a string or buffer.

What is the feature you are proposing to solve the problem?

child_process.spawnSync() implements an input option to solve this problem. For feature parity, this could be ported to child_process.spawn() as well.

What alternatives have you considered?

This issue originated in Execa: sindresorhus/execa#474

We've been considering many ways to implement this at the userland level, but they all come with challenges.

Readable.from()

Converting the string or buffer to a stream using something like Readable.from() does not work because stdio[0] requires any stream to have an underlying file descriptor.

stream.pipe(childProcess.stdin) or childProcess.stdin.end(string)

Calling childProcess.stdin.end(string) right after child_process.spawn() does not work when the process is very fast. If the process already exited, this results in an EPIPE error (see #40085).

Temporary file or socket

Creating a temporary file or socket, then passing it to stdio[0] requires more complex and slower logic to be implemented for what appears to be a simple use case. With --experimental-permissions, this also requires additional permissions.

Additional process

One could spawn a process that just outputs the string or buffer, then pipe its stdout to the other process's stdin. However, doing so in a way that is both fast and cross-OS is not straightforward.

@ehmicky ehmicky added the feature request Issues that request new features to be added to Node.js. label Jul 15, 2023
@mscdex
Copy link
Contributor

mscdex commented Jul 15, 2023

#48553

@ehmicky
Copy link
Author

ehmicky commented Jul 15, 2023

Thanks @mscdex, and sorry for not adding that issue in the original message.

There is also the underlying issue at #25231. However, I was struck by the discussion straying from being purely technical, or even being on-topic for some of the comments. As @benjamingr put it in that issue:

I'm not too happy with how discussion went down here - I feel like the idea proposed by @sindresorhus was shot down very quickly without first discussing alternatives and options.
I feel like valid and relevant technical points were raised but also some less constructive arguments. I'd love to see some constructive discussion on the pros and cons.

Some of the discussion was about whether this feature belonged in core Node.js or userland. As someone who tried implementing it in Execa, at the userland level, I think this is rather hard to do so in a way that is stable, fast and cross-platform.

I am also hoping to highlight with this issue some of those problems. For example, childProcess.stdin.end() is not an option due to the EPIPE problem (#40085).

As you put it in the PR:

I think people who are using the synchronous methods are using them for the synchronous behavior, not because of conveniences missing from the asynchronous versions.

Which I also completely agree with. Therefore, if the input option is useful in a synchronous context, it seems to make sense to provide with feature parity in an asynchronous context. At a high-level, the purpose of the input option is rather orthogonal to whether the execution is sync or async. The reason why this is currently only available in a synchronous context seems to be mostly implementation-related.

@mscdex
Copy link
Contributor

mscdex commented Jul 16, 2023

At a high-level, the purpose of the input option is rather orthogonal to whether the execution is sync or async.

Not really. spawnSync() waits until the process exits, so you'd have no opportunity to write to stdin if there was no input option, that's the reason why it exists. spawn() doesn't need input because it is async and therefore you can simply proc.stdin.write().

@benjamingr
Copy link
Member

Can you explain why:

const child = cp.exec(...);
child.stdin.write(...);

Doesn't work? The linked issue seems to be related to timing but I'm not sure I fully understand - does the above code have a potential race condition where it can cause an EPIPE and an input option won't? If so - can you post a repro?

@ehmicky
Copy link
Author

ehmicky commented Jul 16, 2023

spawn() doesn't need input because it is async and therefore you can simply proc.stdin.write().

Doesn't work? The linked issue seems to be related to timing but I'm not sure I fully understand - does the above code have a potential race condition where it can cause an EPIPE and an input option won't? If so - can you post a repro?

By the time, child.stdin.write() is executed, the child process might already have ended, if it is a fast process. This results in an EPIPE error.

const child = spawn('echo')
child.stdin.write('test')
node example.js
node:events:490
      throw er; // Unhandled 'error' event
      ^

Error: write EPIPE
    at __node_internal_captureLargerStackTrace (node:internal/errors:496:5)
    at __node_internal_errnoException (node:internal/errors:623:12)
    at afterWriteDispatched (node:internal/stream_base_commons:160:15)
    at writeGeneric (node:internal/stream_base_commons:151:3)
    at Socket._writeGeneric (node:net:949:11)
    at Socket._write (node:net:961:8)
    at writeOrBuffer (node:internal/streams/writable:399:12)
    at _write (node:internal/streams/writable:340:10)
    at Writable.write (node:internal/streams/writable:344:10)
    at file:///home/ether/Desktop/a.js:4:20
    at ModuleJob.run (node:internal/modules/esm/module_job:192:25)
    at async DefaultModuleLoader.import (node:internal/modules/esm/loader:228:24)
    at async loadESM (node:internal/process/esm_loader:40:7)
    at async handleMainPromise (node:internal/modules/run_main:66:12)
Emitted 'error' event on Socket instance at:
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  errno: -32,
  code: 'EPIPE',
  syscall: 'write'
}

Node.js v20.4.0

@benjamingr
Copy link
Member

I'm having a hard time reproducing this?

bgruenbaum@XLOs-MacBook-Pro 1 % node -e 'c = require("child_process").spawn("echo"); console.log(c.stdin.write("foo"));'
true

I'm on mac on v20.4.0 (since that's what you used). Ran it 1000 times in a loop and no repro.

@ehmicky
Copy link
Author

ehmicky commented Jul 16, 2023

Thanks for trying to run it @benjamingr.

I am on Node 20.4.0, but I use Ubuntu 23.04, not macOS. I am not sure whether the difference is OS-related (e.g. between the Linux and macOS syscalls) or hardware-related.

Running the same command line as you:

$ node -e 'c = require("child_process").spawn("echo"); console.log(c.stdin.write("foo"));'
false
node:events:490
      throw er; // Unhandled 'error' event
      ^

Error: write EPIPE
...
Node.js v20.4.0

There is definitely a timing dimension to this problem. For example, if I make the command slower by adding a long argument to process, it stops failing. By tweaking the size of the argument, I can obtain commands which either succeed or fail half of the time, such as (on my machine):

$ node -e 'c = require("child_process").spawn("bash", ["-c", "echo " + "a".repeat(27000)]); console.log(c.stdin.write("foo"));'
true

$ node -e 'c = require("child_process").spawn("bash", ["-c", "echo " + "a".repeat(27000)]); console.log(c.stdin.write("foo"));'
false
node:events:490
      throw er; // Unhandled 'error' event
      ^

Error: write EPIPE

$ node -e 'c = require("child_process").spawn("bash", ["-c", "echo " + "a".repeat(27000)]); console.log(c.stdin.write("foo"));'
true

I agree with you that an input option with child_process.spawn() (unlike spawnSync() as pointed out by @mscdex) is not needed if users can just use child.stdin.write(). However, this assumes child.stdin.write() always writes to stdin before the child process starts consuming it, and that does not seem to always be the case.

@benjamingr
Copy link
Member

benjamingr commented Jul 16, 2023

Nice, I can't repro this on macOS but I can confirm this reproduces on an Ubuntu machine.

@mscdex what do you think? I think we should either allow input or fix the timing issue in child.stdin.write.

cc @nodejs/child_process - the ask here is for:

const child = exec("whatever");
child.stdin.write("something");

To not EPIPE on short commands by either accepting an input argument in exec or making sure child.stdin.write in the above case succeeds.

I'm not sure if to add the confirmed bug label I'd like someone from @nodejs/child_process to weigh in.

@bnoordhuis
Copy link
Member

If short-lived means something like /bin/false, then the race is intrinsic and not specific to node: the child disappears before the parent gets a chance to write to the pipe that connects them.

Adding an input argument won't fix that. Suppressing EPIPE means suppressing a genuine error.

@benjamingr
Copy link
Member

@bnoordhuis was hoping you'd show up, care to educate me on why?

For example if I do something like false <<< "hello" the shell(?) handles the case stdin may be closed when you start it?

i.e. there is nothing "special" about shells and there is no system call for "make a process with a preset stdin with that data in it" or some such and it's all multiple call APIs (e.g one to fork and one to set stdin).

@bnoordhuis
Copy link
Member

[..] the shell(?) handles the case stdin may be closed when you start it?

Pretty much. Shells do a lot of bookkeeping.

Plus, and unlike node, most programs simply terminate on SIGPIPE and therefore never generate EPIPE errors.

@ehmicky
Copy link
Author

ehmicky commented Jul 17, 2023

Thanks @bnoordhuis for giving that additional knowledge about how this works at the OS level. 👍

there is no system call for "make a process with a preset stdin with that data in it" or some such and it's all multiple call APIs (e.g one to fork and one to set stdin)

From looking at the code, it does seem to be as you describe @benjamingr. I might be incorrect but I believe the underlying syscall on Unix being used might be execve through exec, which only specifies the program, arguments and environment variables. Once spawned, libuv seems to pipe the correct file descriptors passed to the stdio option.

r = uv_spawn(uv_loop_, &uv_process_, &uv_process_options_);

https://github.com/libuv/libuv/blob/d09441ca03e399fe641da88624c8ea0476967187/src/unix/process.c#L956

https://github.com/libuv/libuv/blob/d09441ca03e399fe641da88624c8ea0476967187/src/unix/process.c#L401

From that perspective, I understand what you mean @bnoordhuis that it is not Node-specific, this is just how things work as the OS level (at least on Unix).

So the problem described above should only happen when either:

  • the child process does not use stdin, but the parent tries to pipe to it, which the parent process arguably should not do. Since the child process is not using stdin, it's not waiting for it and might exit before the parent writes to stdin. That's the examples we used above: false <<<"hello" and spawn('echo').stdin.write('test').
  • the child process uses stdin but does not wait the file descriptor to be closed before exiting, which the child process arguably should not do

In both cases, the current behavior of throwing an EPIPE would then be the correct behavior, to highlight the programming error.

Am I understanding this correctly? If I do, then that means calling child.stdin.write() after spawn() does seem to be the best solution, which is what the open PR at #48553 is currently implementing, and what Execa is doing through the input option. While I personally think it'd be a nice addition which would provide some feature parity with spawnSync(), I can see how others might prefer this to be implemented at the userland level instead.

@bnoordhuis
Copy link
Member

the child process does not use stdin, but the parent tries to pipe to it, which the parent process arguably should not do

Correct.

the child process uses stdin but does not wait the file descriptor to be closed before exiting, which the child process arguably should not do

That's context-dependent. In cat /dev/urandom | head -c42, head reads 42 bytes and exits. The next time cat tries to write, it's killed by a SIGPIPE signal. That's normal and how it's supposed to work.

Programs that want to keep running (like node) suppress the SIGPIPE and deal with the EPIPE error from the read() or write() system call. Node, being a runtime, bubbles it up to you, the programmer.

@benjamingr
Copy link
Member

In that case, if we can't provide input in a way to users that actually gives that input to the child process "as it spawns" and avoids the race condition since there is no syscall that does that - I vote we don't add input: since it's exactly as powerful as child.stdin.write(...) but appears to give more guarantees which we can't do.

@ehmicky
Copy link
Author

ehmicky commented Jul 18, 2023

Thanks for the additional insights @bnoordhuis, that was super helpful.
I agree with you @benjamingr, so I'm going to go ahead and close this issue. I think it might also be useful to update the PR at #48553 accordingly, since it implements the input option using child.stdin.write().
Thanks again for your guidance on this!

@xmedeko
Copy link

xmedeko commented Jan 12, 2024

I disagree than attempt to write stdin to the finished process is a programming bug always. E.g. you may start a program, which accepts stdin usually, but under some conditions (e.g. mistake in the program config file) it's hard or not possible to check these conditions before the process starts. So, a workaround is to listen to error event and handle the exception:

const child = exec("whatever");
child.stdin.once("error", err => { 
    // Handle the EPIPE error.
});
child.stdin.write("something");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js.
Projects
Status: Not planned
Development

No branches or pull requests

5 participants