
Consider implementing a duplex mode #143

Closed
wtgtybhertgeghgtwtg opened this issue Jul 6, 2018 · 13 comments · Fixed by #912
Labels
enhancement · 🎁 Rewarded on Issuehunt · help wanted

Comments

wtgtybhertgeghgtwtg commented Jul 6, 2018

Basically, a way to make a predictable duplex stream from stdin and stdout (or stderr), so you can use a child process as a transform stream. It'd emit an error if the process errors, and it'd kill the process if the stream is destroyed.

const execa = require('execa');

myStream.pipe(execa.duplex('myFile', ['my', 'arguments'])).pipe(myOtherStream);

It's not terribly difficult to do something like this with execa as it is now:

const execa = require('execa');

execa.stdout('myFile', ['my', 'arguments'], {input: myStream}).pipe(myOtherStream);

but this can behave in unexpected ways.

// The file name is deliberately misspelled: the resulting ENOENT
// is swallowed, and the stream completes without an error.
execa.stdout('myFiel', ['my', 'arguments'], {input: myStream});

const processStream = execa.stdout('myFile', ['my', 'arguments'], {input: myStream});
// `stdout` is destroyed, but the process doesn't die.
processStream.destroy();

IssueHunt Summary

ehmicky has been rewarded.

Backers (total: $70.00)

@sindresorhus changed the title "Consider implementing a duplex mode." → "Consider implementing a duplex mode" Jul 6, 2018
@sindresorhus (Owner)

I like it. I could see myself using this in some places.

I would name it duplexStream() instead of duplex(). I think that's clearer.

@IssueHuntBot

@IssueHunt has funded $70.00 to this issue.


dflupu commented Mar 25, 2019

I was looking into working on this, but I ran into a small issue.

If this syntax were to work
myStream.pipe(execa.duplex('myFile', ['my', 'arguments'])).pipe(myOtherStream);

we'd have to return a duplex stream:

	new stream.Duplex({
		write(chunk, encoding, callback) {
			spawned.stdin.write(chunk, encoding, callback);
		},
		read() {
			// Push whatever stdout has buffered, but never push(null):
			// nothing here knows when the stream is finished.
			const chunk = spawned.stdout.read();
			if (chunk !== null) {
				this.push(chunk);
			}
		}
	});

but we can't close this stream once we're done writing the stdin data. How do we work around that?

@sindresorhus (Owner)

Maybe use a stream.PassThrough. I haven't really looked into this issue closely though.

@issuehunt-oss bot added the 💵 Funded on Issuehunt label May 10, 2019
sloonz commented May 15, 2020

Trying to work on it too. Some questions:

  • Should a non-zero exit code emit an error event on the duplex?
  • If the user calls duplex.destroy(), should the process be killed, or do we let it run and possibly fail due to a broken pipe?

sloonz commented May 15, 2020

A rough, mostly untested first try:

sloonz@9708d7c

@sloonz sloonz mentioned this issue May 15, 2020
ehmicky (Collaborator) commented Dec 14, 2023

I am wondering whether the new pipe methods might already fix the underlying problem of this issue.

// stdout -> stdin
await execa(...).pipeStdout(execa(...))

// stdout -> stream
await execa(...).pipeStdout(otherStream)

// stdout -> file
await execa(...).pipeStdout('path');

// stderr -> stdin/stream/file
await execa(...).pipeStderr(...);

// stdout + stderr -> stdin/stream/file
await execa(...).pipeAll(...);

// stream -> stdin
await execa(..., {stdin: stream})

// file -> stdin
await execa(..., {inputFile: 'path'})

// string -> stdin
await execa(..., {input: 'value'})

So it is actually quite easy now to pipe multiple processes:

const { stdout: topTenFilenames } = await execa('ls', ['exampleDir'])
  .pipeStdout(execa('sort'))
  .pipeStdout(execa('head', ['-n', '10']))

@sindresorhus (Owner)

If anyone wants to work on this, see the feedback in: #424

ehmicky (Collaborator) commented Mar 17, 2024

It's taken me quite some time, but I finally got this working. I have a PR at #912 which implements this, together with .readable() and .writable() methods.

Motivation

The purpose is to use a subprocess as a stream, to pass it to APIs which accept streams. This contrasts with:

  • The std* stream options, which do the opposite, i.e. passing a stream to a subprocess
  • The .pipe() method, which only focuses on subprocess piping, and does it well

Readable vs Writable vs Duplex

Users might want to only use the readable/writable side of the subprocess, but still convert it to a stream. Using subprocess.stdin or subprocess.stdout works in some cases, but not if the user wants to:

  • Wait for the subprocess to complete
  • Propagate any error from the subprocess to the stream, and the other way around

If a user wants a read-only stream, it is improper to return a Duplex with an already ended writable side. Same for write-only streams. That's because the Node.js stream API has some subtle logic meant for Duplex only. For example, finished(duplex) (unlike finished(readable) and finished(writable)) does not necessarily wait for the close event and might be missing errors thrown during stream._destroy(). That seems like a minor thing, but because finished() is used everywhere by the Node.js stream API, this leads to many subtle bugs when using a Duplex as if it were a Readable or Writable.

I actually started implementing this as a single .duplex() method, but it became quite clear the proper way to do this while respecting the Node.js streams API's inner workings was to expose 3 different methods: readable, writable, duplex.

const streamOne = subprocess.readable()
const streamTwo = subprocess.writable()
const streamThree = subprocess.duplex()

This also means implementing this as a PassThrough does not work, since those are duplexes. That was my first implementation attempt, and I had to backtrack from that approach.

This also means Duplex.from() cannot be used since it returns duplexes. I also made an implementation attempt with this, and it just did not work well.

File descriptors

Users should be able to choose the file descriptor. My PR does it by re-using the from and to options from the .pipe() method. It works for any file descriptor, not only stdin/stdout/stderr. all can be specified to read from both stdout and stderr. For example:

const stream = subprocess.readable({from: 'stderr'})

Method vs property

Exposing this as a property (as I previously suggested in #591) does not work because:

  • This does not allow choosing the file descriptor (as described above)
  • If the user does not use that property, the stream will still pull data and propagate errors, which might break things for existing users, and result in a waste of resources

So it needs to be called on-demand via a method.

Error propagation

The stream awaits the subprocess completion. If the subprocess fails, an error event is emitted on the stream. This means users do not need to await subprocess anymore as long as they await the stream (using methods like text(), finished(), pipeline(), etc.).

In the other direction, the subprocess does not wait for the stream: there is no need to. However, if the subprocess takes some input, its stdin needs to be ended either through the stream or through subprocess.stdin.

If the stream errors and/or is destroyed, it destroys subprocess.stdin or subprocess.stdout|stderr. It does not terminate the subprocess via a signal. Doing so would be improper as it prevents graceful exit. Just like we do with the .pipe() method, we follow the way shells behave in that matter, which is to just close the subprocess' stdin or stdout or let the subprocess end on its own. If the subprocess uses stdin, ending stdin will make the subprocess end gracefully. If the subprocess writes to stdout, ending stdout will create EPIPE errors, making the subprocess end gracefully.

For duplexes, we purposely do not propagate errors between the readable and the writable side. There is no reason to, and it creates race conditions and subtle bugs.

Multiple consumers

Using subprocess.readable() should not cause the eventual result.stdout or result.stderr to become empty. This was initially the case due to get-stream not supporting multiple consumers, so this required some major changes to that library.

Also, the PR allows for calling subprocess.readable(), subprocess.writable() and subprocess.duplex() more than once, on either the same file descriptor or different ones.

Hard parts

Solving this issue was very difficult. I tried many approaches and was close to giving up a few times, especially with tests that failed in CI but not locally due to hard-to-debug race conditions. I am pretty confident with the current PR though as it has lots of automated tests covering edge cases.

Some of the things that made this hard to implement:

  • Cannot re-use Duplex.from(), PassThrough, piping. In other words, custom Readable/Writable/Duplex classes must be defined with their own implementation of _write/_read/_final/_destroy.
  • Needs to work in object mode, since subprocess.std* can now be in object mode when the user applies transforms
  • Needs to respect stream highWaterMark and buffering behavior
  • .readable() must not buffer too much, and must automatically pause/resume subprocess.stdout when its buffer gets full
  • Needs to work with the autoDestroy option, which tends to propagate state when not wanted
  • subprocess.stdin, subprocess.stdout and subprocess.stderr are weird. By default, those 3 are actually duplex TCP sockets. subprocess.stdin is writable and not readable, but subprocess.stdout/subprocess.stderr are both writable and readable.

Web streams

My PR is implementing this with Node.js streams. All of this work should be done for web streams too. I have opened #913 to track this.

I believe we should let users choose whether they want a Node.js stream or a web stream. While Node.js streams are legacy, most Node modules still only accept them. Also, from having worked with both a lot, I expect an implementation based on web streams to be more complicated, slower and less stable. Finally, there are a few things that only Node.js streams can do, such as having multiple readers at once, which might be the only option in specific cases.

That being said, we should definitely implement this for web streams too.

Method names

Because of the above, my PR uses the following names, which are the same names as the stream class itself and the TypeScript type:

  • .readable(), .writable(), .duplex(): Node.js streams
  • .readableStream(), .writableStream(), .transformStream(): web streams

@sindresorhus (Owner)

This is impressive research and work. I'm pretty sure I would have given up long before you. 👏

@sindresorhus (Owner)

I wonder if there is anything Node.js could improve that would have made this work easier. If so, definitely open some Node.js issues. For example, a version of Duplex.from() that can be reused.

ehmicky (Collaborator) commented Mar 17, 2024

One of the issues with Duplex.from() is that it is not designed to create read-only or write-only streams. It is possible to do it, but the result is still a Duplex under the hood, which creates a few issues. But I think that's by design.

Also, it propagates state changes between the write and read side. In most cases this is good, but it was a real problem for this specific issue.

The problem with streams is that they combine (by their own nature) multiple things that lead to complexity: performance-sensitive, memory-sensitive, statefulness, async, extendable by users. Then this issue combines it with OS processes, which are complex too. :)

issuehunt-oss bot commented Mar 18, 2024

@sindresorhus has rewarded $63.00 to @ehmicky. See it on IssueHunt

  • 💰 Total deposit: $70.00
  • 🎉 Repository reward (0%): $0.00
  • 🔧 Service fee (10%): $7.00

@issuehunt-oss bot added the 🎁 Rewarded on Issuehunt label and removed the 💵 Funded on Issuehunt label Mar 18, 2024