
push or pull? #15

Closed
dominictarr opened this issue Sep 10, 2012 · 9 comments

@dominictarr

The core idea of this module is to make streaming a pull operation, rather than a push operation.
However, the current implementation of pipe (specifically, the internal function flow) mungs it back into a push operation.

Consider the metaphor of physical pipes carrying water. Naively, you can get the water to move by increasing the pressure at one end, like a garden hose, or by decreasing the pressure at the other end, like with a drinking straw.

You could consider the old stream api to be a "SquirtStream" -- it is the pressure at the readable end that starts pushing data through the pipeline. In spirit, I think the new interface wants to be a "SuckStream" -- it is the reader that pulls data through.

Currently, ReadableStream#pipe is in a weird middle ground.

When piped, each segment is forced to accept data until it returns false;
i.e. piping causes each individual segment to start pulling.
This is not completely unreasonable -- after all, that is how your esophagus and your veins work.
So, this is like a "SwallowStream".

So what I propose is to allow the writable stream to implement flow (one could default to dest.flow || flow, of course).
Then it would be possible to construct lazy pipelines where source.read() is never called until dest.read() is called:

readable.pipe(through).pipe(through2)

readable.read() is never called until through2.read() is called.

Currently, one chunk is read from readable, but through.write(data) === false.
Then that chunk is drained when through2 reads, so through reads again.
So readable has read two chunks, when actually neither of those reads was necessary yet.

Indeed, no reads should be performed until through2.pipe(writable).

But what if it could work more like this:

Through.prototype.read = function () {
  return this._source.read()
}
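
To make that concrete, here is a minimal standalone sketch (hypothetical fromArray/through helpers, not the streams2 API) of how such a pull chain could behave: piping only wires the segments together, and nothing moves until the tail is asked for data.

// Hypothetical pull-style source: read() hands out one item at a time.
function fromArray (items) {
  let i = 0
  return {
    pipe (dest) { dest._source = this; return dest },
    read () { return i < items.length ? items[i++] : null }
  }
}

// Hypothetical pull-style through: read() pulls from upstream on demand.
function through (transform) {
  return {
    _source: null,
    pipe (dest) { dest._source = this; return dest },
    read () {
      const chunk = this._source.read()
      return chunk === null ? null : transform(chunk)
    }
  }
}

// Wiring the pipeline performs no reads at all...
const tail = fromArray([1, 2, 3]).pipe(through(x => x * 2)).pipe(through(x => x + 1))
// ...until the last segment is asked for data, which sucks one chunk through:
console.log(tail.read()) // 3  (1 * 2 + 1)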

Also, it would enable writable streams to make use of the read(bytes) api, and read only 4 bytes, for example,
while keeping the composable api of a pipeable stream!
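
For example, a hypothetical sketch, assuming the pull-style read() above accepted a byte count the way Readable#read(size) does:

// Hypothetical sink for a length-prefixed protocol: pull exactly what is needed.
function readFrame (source) {
  const header = source.read(4)               // only 4 bytes are pulled upstream
  if (header === null) return null
  return source.read(header.readUInt32BE(0))  // then exactly the body
}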

I'm not sure exactly how this would best be implemented, but I do feel that a SuckStream is much more conceptually simple than a SwallowStream.

@rvagg rvagg closed this as completed Dec 31, 2014
@richardscarrott

richardscarrott commented Jan 9, 2020

Years old I know, but I'm guessing this idea was never implemented? I really would like to see the ability to compose Readable and Transform streams, e.g.

const pipeline = readable.pipe(transform1).pipe(transform2); // Do not start reading from `readable` yet.
pipeline.pipe(writable); // Start reading now.

This would then allow sequential pipelines, e.g.

import merge from 'merge2';

const pipeline1 = readable1.pipe(transform1).pipe(transform2); // Do not start reading from `readable1` yet.
const pipeline2 = readable2.pipe(transform1).pipe(transform2);

merge(pipeline1, pipeline2).pipe(writable); // merge runs pipeline1, *then* runs pipeline2

I wonder if there's a package out there which helps with this?

@dominictarr
Author

I gave up on node streams (years ago) but you may care to check out https://github.com/pull-stream/
(and the more recent https://github.com/pull-stream/), both of which enable you to do stuff like this. (pull-stream has a large ecosystem, https://github.com/pull-stream/, but push-stream has more low-level optimization; both are faster than node streams, though.)
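
For a flavour of the pull-stream style, a minimal sketch using the pull/values/map/collect helpers from the pull-stream package: a pipeline is just composed functions, and nothing is read until the sink starts pulling.

const pull = require('pull-stream')

pull(
  pull.values([1, 2, 3]),          // source
  pull.map(n => n * 2),            // through
  pull.collect((err, results) => { // sink: this is what actually pulls
    if (err) throw err
    console.log(results) // [2, 4, 6]
  })
)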

@mikeal

mikeal commented Jan 9, 2020

@dominictarr I think your links got messed up because you just linked to the same org 3 times and I think you meant to point at specific repos ;)

I also “gave up on streams” but I'm using async iterators now instead. This is the easiest path to compatibility since Node.js streams are already valid async iterators, and they are properly in the language now, which means things like error handling are native and far less problematic. Here's a great list of repos for working with async iterators: https://github.com/alanshaw/it-awesome
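
As a rough illustration of that style (file name hypothetical): a transform becomes an async generator, Node readables can be consumed directly with for await, and errors surface as ordinary exceptions.

const fs = require('fs')

// A transform step is just an async generator.
async function * upperCase (source) {
  for await (const chunk of source) {
    yield chunk.toString().toUpperCase()
  }
}

async function main () {
  try {
    // Node.js streams are already valid async iterators.
    for await (const chunk of upperCase(fs.createReadStream('./input.txt'))) {
      process.stdout.write(chunk)
    }
  } catch (err) {
    // Stream errors propagate here natively; no 'error' listeners needed.
    console.error(err)
  }
}

main()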

@rvagg
Member

rvagg commented Jan 10, 2020

^ I use a mix of both async iterators and standard streams depending on use-case. You can even turn the async iterator into a pull-style interface (chunk = await readNextChunk() style): https://github.com/rvagg/js-datastore-car/blob/1c423e07fa1b16b0e20bbca3836d26b10503afea/car.js#L39-L50
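
A rough sketch of that pattern (not the linked js-datastore-car code; names hypothetical): grab the stream's async iterator once, then expose a pull-style function.

function pullify (stream) {
  const iterator = stream[Symbol.asyncIterator]()
  return async function readNextChunk () {
    const { done, value } = await iterator.next()
    return done ? null : value // null once the stream is exhausted
  }
}

// const readNextChunk = pullify(fs.createReadStream('./some-file'))
// const chunk = await readNextChunk()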

Like Mikeal suggests, my suspicion is that async iterators, once unflagged as experimental, will be the primary way that most of us interact with streams in the future.

Also worth noting in this ancient thread that there are some newer streams APIs that make working with them a bit nicer: stream.pipeline() and stream.finished() (these first two can be promisified too), plus the inverse of a stream-as-async-iterator: stream.Readable.from().
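
A small sketch of those APIs (output path hypothetical), promisifying via util.promisify as noted; stream.finished() can be promisified and awaited the same way.

const fs = require('fs')
const stream = require('stream')
const { promisify } = require('util')

const pipeline = promisify(stream.pipeline)

async function main () {
  // stream.Readable.from(): the other direction -- an (async) iterable in, a Readable out.
  const source = stream.Readable.from(['a', 'b', 'c'])

  // stream.pipeline(): wires the streams together and tears everything down on error.
  await pipeline(source, fs.createWriteStream('./out.txt'))
}

main().catch(console.error)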

@dominictarr
Author

dominictarr commented Jan 10, 2020 via email

@mikeal

mikeal commented Jan 10, 2020 via email

@dominictarr
Author

> since it’s syntax any penalties can be optimized away

But not by node.js developers, only by JavaScript engine developers... and they are not exactly gonna jump to optimize a thing just because node.js developers want to be lazy. Also, some things are harder or easier to optimize than others. Can it be optimized fully away? Unless someone can give me a convincing story that it can be, I'm gonna default to assuming it can't.

@mcollina
Member

From my tests async iteration is not a bottleneck when processing a node stream.
If you consider:

const fs = require('fs')
const stream = fs.createReadStream('./foobar.txt')

// (inside an async function or an ES module)
for await (const chunk of stream) {
  // do something with chunk
}

the cost associated with processing the iterator is currently non-measurable.

@richardscarrott

richardscarrott commented Jan 12, 2020

Yeah, the whole npm ecosystem for streams is eerily quiet these days -- many stream packages haven't been touched in years -- so I had thought perhaps people have moved on to iterators. I've not spent much time with them tbh, but I guess they support buffering and handle back-pressure?

Do Node streams offer anything which cannot be done with async iterators?

I just discovered highlandjs, which is super nice; it's basically a pull-based rxjs and has first-class support for Node streams, handling back-pressure correctly. It's been really nice to have a complete API in a single package rather than trying to cobble together 100s of Node stream packages with various incompatibilities; highly recommend!
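
For anyone curious, a small sketch of the highland style (file names hypothetical): wrap a Node readable, transform lazily, and pipe back out, with back-pressure handled by the library.

const fs = require('fs')
const _ = require('highland')

_(fs.createReadStream('./input.txt', 'utf8')) // wrap a Node readable (as strings)
  .split()                                    // split into lines
  .map(line => line.toUpperCase())            // lazy transform
  .pipe(fs.createWriteStream('./out.txt'))    // back out to a Node writable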
