Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

input option does not work with fast processes #474

Closed
ehmicky opened this issue Sep 11, 2021 · 16 comments
Closed

input option does not work with fast processes #474

ehmicky opened this issue Sep 11, 2021 · 16 comments
Labels

Comments

@ehmicky
Copy link
Collaborator

ehmicky commented Sep 11, 2021

This can be reproduced with:

execa('echo', { input: 'test' })
node:internal/process/promises:246
          triggerUncaughtException(err, true /* fromPromise */);
          ^

Error: write EPIPE
    at afterWriteDispatched (node:internal/stream_base_commons:164:15)
    at writeGeneric (node:internal/stream_base_commons:155:3)
    at Socket._writeGeneric (node:net:780:11)
    at Socket._write (node:net:792:8)
    at writeOrBuffer (node:internal/streams/writable:389:12)
    at _write (node:internal/streams/writable:330:10)
    at Socket.Writable.end (node:internal/streams/writable:609:17)
    at Socket.end (node:net:594:31)
    at handleInput (/home/ether/code/nve/node_modules/execa/lib/stream.js:17:17)
    at execa (/home/ether/code/nve/node_modules/execa/index.js:156:2) {
  errno: -32,
  code: 'EPIPE',
  syscall: 'write'
}

This happens when the child process is too fast to finish. When this happens, Execa writes to the process stdin while it has already ended, which creates an EPIPE error.

Discussion about this happening at nodejs/node#40085

@nickroberts
Copy link

I am getting this error as well, from execa >= 2. I have tried with 0.x, 1.x, and it seems to work as expected. Once I install execa@2, it fails with the command not found.

@ehmicky ehmicky changed the title stdin option does not work with Node >=16.7.0 stdin option does not work with fast processes Oct 4, 2021
@ehmicky
Copy link
Collaborator Author

ehmicky commented Oct 4, 2021

@nickroberts I have updated my original message. The problem is actually not related to Node.js versions. Instead, this is happening when the process execute too fast.

A discussion is currently happening about a way to fix this at nodejs/node#40085

@jkopanko-threathub
Copy link

The issue over at nodejs/node#40085 was closed and considered not a bug, as it can be handled by the application.

Will execa handle it internally or is it left to be handled by the user?

@ehmicky ehmicky changed the title stdin option does not work with fast processes input option does not work with fast processes May 23, 2023
@ehmicky
Copy link
Collaborator Author

ehmicky commented May 23, 2023

Yes, it is not a bug from Node.js. Namely, the input option is implemented by writing to the child process sdin after it's been spawned. Doing so creates a timing issue, since the child process might have already exited, hence the EPIPE error.

This is a bug from Execa that we should ideally fix.

I'm having some issue finding a reliable way to detect whether stdin is still writable.

Instead, I'm thinking the proper way would be to create a stream from the input string, then pass this to the stdin option. What do you think @sindresorhus?

@sindresorhus
Copy link
Owner

Instead, I'm thinking the proper way would be to create a stream from the input string, then pass this to the stdin option. What do you think @sindresorhus?

Yeah, that sounds good to me. You could just use stream.Readable.from().

@ehmicky
Copy link
Collaborator Author

ehmicky commented Jul 12, 2023

I took a stab at it, but realized the following problem: child_process.spawn() stdio option only allows streams that have an underlying file descriptor.

object: Share a readable or writable stream that refers to a tty, file, socket, or a pipe with the child process. The stream's underlying file descriptor is duplicated in the child process to the fd that corresponds to the index in the stdio array. The stream must have an underlying descriptor (file streams do not until the 'open' event has occurred).

This is confirmed in the code:

https://github.com/nodejs/node/blob/b76862df0ac124f7f683dd51efd56215336d1e31/lib/internal/child_process.js#L1055-1077

https://github.com/nodejs/node/blob/b76862df0ac124f7f683dd51efd56215336d1e31/lib/internal/child_process.js#L340-347

This results in:

TypeError [ERR_INVALID_ARG_VALUE]: The argument 'stdio' is invalid. Received Readable

Strings and buffers are allowed, but only with spawnSync().

I feel like this means we would need to write (then cleanup) a temporary file. Touching the filesystem seems a bit too much just to solve this bug. What are your thoughts on this @sindresorhus? Do you have any other idea on how to solve this problem?

@sindresorhus
Copy link
Owner

It says it accepts a socket too, so you could potentially create that:

const net = require('net');
const { spawn } = require('child_process');

const server = net.createServer(function (socket) {
  // Spawn your child process here and use the socket as a stdio stream
  const child = spawn('command', [], {
    stdio: ['pipe', socket, socket]
  });

  // Write to the child's stdin
  child.stdin.write('Hello, child process!\n');

  // Print the child's stdout/stderr
  child.stdout.on('data', (data) => {
    console.log(`stdout: ${data}`);
  });
});

server.listen(function () {
  const { port } = server.address();

  // Connect to the server socket
  const socket = net.connect(port);

  // Write to the server socket (writes to the child's stdin)
  socket.write('Hello, server socket!\n');

  // Print the server socket's output (the child's stdout/stderr)
  socket.on('data', (data) => {
    console.log(`socket: ${data}`);
  });
});

ChatGPT

@sindresorhus
Copy link
Owner

Otherwise, I don't think it's too bad to create a temporary file either.

@sindresorhus
Copy link
Owner

Could we maybe simply detect EPIPE and ignore it if it's in the first second (or less) of the process execution or something?

@sindresorhus
Copy link
Owner

Maybe we could wait for the spawn event before writing?

@ehmicky
Copy link
Collaborator Author

ehmicky commented Jul 15, 2023

Thanks for all those different ideas! :)

It says it accepts a socket too, so you could potentially create that:

Otherwise, I don't think it's too bad to create a temporary file either.

Using a socket is a creative solution, although using a temporary file might have the same result, but be more straightforward to implement.

Could we maybe simply detect EPIPE and ignore it if it's in the first second (or less) of the process execution or something?

I am concerned about the following issues:

  • We would need to report to users that the stdin that they specified never got passed to the child process, because it might have impacted its execution. Maybe creating a special error for this case might be more proper?
  • The specific time threshold might be somewhat arbitrary and dependent on the machine's speed.
  • Can EPIPE be emitted for other reasons in the very beginning of a process execution? For example, when passing to stdin a stream that has already ended.

Maybe we could wait for the spawn event before writing?

I am not sure this would work, since it would only delay writing to stdin further, increasing the chances of the process having ended when stdin is written to. At the moment, we are calling stdin.write() almost right after child_process.spawn(), in the same microtask. I am also wondering about the risk of writing to stdin after the child process's logic has already consumed it?


Based on this, I am feeling like using a temporary file might be the "cleanest" (as in "bug-proof") way to implement this?

@sindresorhus
Copy link
Owner

I agree. A temporary file seems like the safest as there are a lot of unknowns with the other solutions.

@ehmicky
Copy link
Collaborator Author

ehmicky commented Jul 15, 2023

I took another stab at it, but encountered the following problem, which applies to both the temporary file and the socket solution: execa() returns the childProcess synchronously. child_process.spawn() is called and its return value is returned right away. The temporary file needs to be created before child_process.spawn() since it is passed as the stdin option.

We could use sync I/O to create the temporary file, but this would make execa() hold up the microtask much longer than before. Some people create hundreds of processes in a row, making this even more noticeable. Async I/O could also be used through the usage of a separate method like:

import { execa, createInput } from 'execa'

const input = await createInput('test')
await execa('command', { input })

🤔

@ehmicky
Copy link
Collaborator Author

ehmicky commented Jul 15, 2023

Another potential idea: calling a second child process that just outputs the string, then pipe its stdout to the first child process's stdin. This would remove the need for filesystem/network I/O. This would also work in terms of sync/async, because child_process.spawn() returns the child process right away. Something like:

const childProcess = execa('printf', ['test'])
await execa('command', { stdin: childProcess.stdout })

Also, with the new Node.js permissions system, Execa would not require any additional --allow-fs-read/--allow-fs-write permission, only the --allow-child-process permission currently used.

One problem with this approach is that the first child process would also need to be "handled": errors, exit, etc.
This might end up being not so straightforward to implement correctly. 🤔

Another problem is to find a command which echoes a string (without appending a newline) that is cross-OS. For example, printf is not cross-OS. We could use different commands based on process.platform, although we might miss some edge cases (some people have weird OS setups). node -e 'process.stdout.write("test")' could be used, but it would be much slower. On my machine:

$ time for i in {1..1000}; do node -e 'process.stdout.write("test")'; done > /dev/null
real	0m18.207s
user	0m11.914s
sys	0m6.739s

$ time for i in {1..1000}; do printf test; done > /dev/null
real	0m0.004s
user	0m0.004s
sys	0m0.000s

As a side note, this is a very interesting problem to solve. :)

@ehmicky
Copy link
Collaborator Author

ehmicky commented Jul 15, 2023

I am starting to wonder whether this is something that should be fixed by Node.js core, so I opened nodejs/node#48786 to port the input option from child_process.spawnSync() to child_process.spawn().

@ehmicky
Copy link
Collaborator Author

ehmicky commented Jul 18, 2023

The discussion in core Node.js was very insightful.

The first thing is: at the syscall level (using Linux as an example), exec spawns the program with only its arguments and environment variables. Any stdin redirection/piping only happens after that syscall succeeded. There is no way to specify some input to stdin as the program is spawning, it has to be done after that syscall, which is what libuv is doing with the stdio option.

So, calling childProcess.stdin.write() right after child_process.spawn() (which Execa currently does) is the "fastest" way to pipe to stdin. Passing a stream to stdio[0] does not pipe to stdin any sooner. In both cases, it's done right after spawning.


The second thing is: EPIPE can happen in several situations such as:

  • the child process does not use stdin, but the parent tries to pipe to it, which the parent process arguably should not do. Since the child process is not using stdin, it's not waiting for it and might exit before the parent writes to stdin. That's the examples we used above with echo.
  • the child process uses stdin but does not wait the file descriptor to be closed before exiting, which the child process arguably should not do (except in the following case).
  • the child process uses stdin but up to a point. For example head in cat /dev/urandom | head -c42. Once it's done consuming it, the child process just exits, which makes the other process get a SIGPIPE, which terminates it. In shells, this is the normal way for pipelines like this to terminate, and it does not report any error nor change the exit code. In Node.js, SIGPIPE becomes an EPIPE exception, which lets the programmer decide on how to handle it.

In all those cases, propagating EPIPE is the expected behavior as it as it indicates either a programming error, or an intentional termination.


As a conclusion, the current behavior is the proper one, and I think we should close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants