Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider using different streams as the basis #146

Closed
phated opened this issue Jan 14, 2016 · 21 comments
Closed

Consider using different streams as the basis #146

phated opened this issue Jan 14, 2016 · 21 comments
Milestone

Comments

@phated
Copy link
Member

phated commented Jan 14, 2016

After all the trouble we have been dealing with using node streams (error propagation, lost events, etc), I think it might be time to consider using different stream module as the underlying streams for this module. Quite some time ago, https://github.com/caolan/highland was big news, maybe we should review it and see how it handles some of our problems.

@yocontra
Copy link
Member

the browser Stream spec has error propagation IIRC

@phated
Copy link
Member Author

phated commented Jan 14, 2016

@contra yeah, I had been discussing the whatwg stream spec with chris - not sure where in the process it is and if anyone has a solid implementation yet.

@phated
Copy link
Member Author

phated commented Jan 14, 2016

@contra looks like there is a reference implementation polyfill at https://github.com/whatwg/streams/tree/master/reference-implementation

@erikkemperman
Copy link
Member

Not sure if it would help with the trouble you've been seeing -- but @piranna pointed to fstream and that looks nice in that it integrates stat.

@mikeal
Copy link

mikeal commented Jan 26, 2016

AFAIK the Stream spec's error propogation comes from being based on Promises. Without getting into too much depth, Promise error propogation has... issues.

The initial WHATWG Stream spec was based on a lot of feedback from the Node.js Stream implementation and fixing what we got wrong. Since then it's taken a lot of turns and I'm not sure how many of them were to keep spec people and Promise advocates happy and how many were real improvements.

@phated
Copy link
Member Author

phated commented Jan 26, 2016

@mikeal the node stream error implementation is much worse than promises and the unhandledRejection handler in new nodes would be a really nice thing for us to hook into. We have been dealing with a ton of issues that I'm trying to work through with @chrisdickinson and he said that WHATWG Streams actually solve some of them (sometimes-writeable/sometimes-transform streams).

@chrisdickinson
Copy link

The initial WHATWG Stream spec was based on a lot of feedback from the Node.js Stream implementation and fixing what we got wrong. Since then it's taken a lot of turns and I'm not sure how many of them were to keep spec people and Promise advocates happy and how many were real improvements.

I've been keeping track of the WHATWG stream changes — most of them are well-grounded. The biggest concern I have with them right now is that, last I checked, the byte reader stuff was up in the air, though that may have since changed.

AFAIK the Stream spec's error propogation comes from being based on Promises. Without getting into too much depth, Promise error propogation has... issues.

As someone who was formerly very anti Promise, I'd be very interested in having a conversation with you about this in a separate venue :)

@mikeal
Copy link

mikeal commented Jan 26, 2016

So, unhandledRejection solves the problem of errors going unhandled but the really hard part about propogation is what you do during splits. This happens in promises and in streams (when streams are piped to multiple outputs) and I've never seen a good solution to this. You either pick a direction to propogate, potentially not sending to the side that has a handler, or you potentially hit two handlers on each side of the split. We've been trying to figure this out since the first standardized Stream implementation landed in Node.js and haven't found a more reliable pattern than requiring people to handle errors in the stream that generates it.

@mikeal
Copy link

mikeal commented Jan 26, 2016

As someone who was formerly very anti Promise, I'd be very interested in having a conversation with you about this in a separate venue :)

Sure :)

For the record, I don't really care about Promises. I don't hate them, and I'm not in love with them, I think they are fine for people that prefer them but I know from experience that many people find them a bigger conceptual barrier than callbacks.

However, I do dislike the proliferation of Promises through the specification process. I can't blame the spec authors though, there isn't a standardized way of doing something in the future or even providing callbacks and event handlers and it probably sucks to open up those parts of their spec to endless bike shedding. But this also means that Promises are showing up in all kinds of places they don't need to be, and I mean this not as a matter of preference but on a purely technical basis. It means that Promises are mostly complicating APIs that would otherwise be much simpler because there just isn't an alternative in the spec world while in the real world we have the Node.js standard callback API, EventEmitters, etc.

@chrisdickinson
Copy link

@mikeal: Thanks for your response — this is interesting! I'll respond to the Promise part of the conversation on ➡️ nodejs/readable-stream#181 so as not to fill up the vinyl-fs tracker.

@domenic
Copy link

domenic commented Jan 26, 2016

I of course disagree with all of @mikeal's characterizations here. Promises are being used appropriately, and have no issues with error propagation. But I'm not really interested in debating that, just wanted to counter some of the FUD.

the byte reader stuff was up in the air, though that may have since changed.

It's getting really really nice. Zero-copy everywhere and much less awkward than a separate type. Watch whatwg/streams#418 for the details.

@contra looks like there is a reference implementation polyfill at https://github.com/whatwg/streams/tree/master/reference-implementation

There's a couple issues I'd warn against.

  • The big one: writable streams and transform streams are still due for a rewrite. See the warning at https://streams.spec.whatwg.org/#ws. Basically, readable streams have gotten a lot of feedback to make them better, and we need to port those innovations over to writable streams. And then, we need to figure out the implications for transform streams (which are writable + readable together).
  • The smaller one: the reference implementation is meant as a reference implementation, not a usable polyfill. It isn't tweaked for performance and some things like the queue tracking is pretty bad for performance. I'd love to collaborate with someone who works on a polyfill; there is a big test suite you can use.

@phated
Copy link
Member Author

phated commented Jan 26, 2016

@domenic this wouldn't end up in gulp 4 but is forward-looking towards gulp 5. The reference implementation might be fine to prototype against; however, do you know of any polyfill authors that are actively working on one? I remember seeing a bunch of people jump in like a year ago but haven't heard anything since.

@domenic
Copy link

domenic commented Jan 26, 2016

@phated I haven't seen any in active maintenance, no. But, forking from the reference implementation and fixing the worst-performing aspects/de-Traceurizing would probably be a few hours work.

@calvinmetcalf
Copy link

what about just using pump (or something similar) instead of .pipe as the default method of combining streams in the next version of gulp? It would have the advantage of keeping compatibility with current node streams in use.

@mcollina
Copy link
Contributor

The reason why I use gulp is because it is based around node streams. It is its greater feature for me, because it makes this compatible with the greater node ecosystem. I also use the vynil-fs in some other ways, and I have created awesome things with it.

I know and have experienced all the problems with Streams, including the difficulty in explaining them to newbies. I think we should fix the problems in node streams, because it will benefit the whole node community.

@phated what are the problems you are experiencing? how can we improve the gulp user experience by fixing streams?

@phated
Copy link
Member Author

phated commented Jan 27, 2016

@mcollina I believe the latest saga is documented through issues and PRs:

  • Using vinyl-fs apart from gulp wasn't working correctly in a lot of scenarios because in gulp, we expect people to return the stream so we can sink it; however, using vinyl-fs outside of that and ending on a vfs.dest() would cause the highWaterMark to be hit and the stream would stop processing and this was caused by us having vfs.dest() to sometimes behave as a Transform stream and sometimes as a Writeable stream - vfs-dest doesnt end? #120
  • The solution, after extensive discussion with @chrisdickinson, was to sink the stream we return, which would guarantee we always pipe to a Writeable stream of some sort, but it shouldn't have effected other streams being piped to because it flows at the slowest processing speed - sink the stream, this avoids highWaterMark being hit - add test #126
  • However, this caused a bunch of stuff to break because piping to that sink put the stream into flowing mode and caused the readable event to stop being emitted - Sink should respect a readable event on the destination stream #142
  • It turned out that duplexer2 (used by stream-combiner2) listens for that readable event so you can't pass flowing streams to either of those libraries.
  • The solution to this was to add a whole bunch of event listeners to add and remove the sink when something else registered for readable or data events - Allow things to listen for data or readable events and avoid sinking in those cases #145
  • I forget the thing @chrisdickinson said that WHATWG Streams has that would have made this simpler (in theory) but I think it was some sort of interface.

We've also been the biggest proponent of better error management in node streams and were pushing for something like @chrisdickinson's original error management PR on node core. We don't want people to have to use gulp-plumber in their pipelines and with the amount of people using it, I think error propagation should probably be the default behavior. We also need some sort of error recovery so the stream doesn't get torn down on error.

@dominictarr
Copy link

If you are considering looking at a different stream api, I strongly recommend you have a look at pull-stream

pull-streams can do all the things that node-streams can do, but are dramatically less code. you don't even need a library to implement a pull-stream (it's just a couple of functions). The source and the sink both have a chance to apply back pressure, and errors propagate. If a source errors, that is pretty much like a normal end condition, but if a sink errors that propagates back up and aborts the source.

When you need to interact with the regular node ecosystem, there is stream-to-pull-stream and pull-stream-to-stream modules for interop.

back pressure in pull-streams is much easier to reason about, so you can use use FRP style things like pull.take(5) which reads 5 things from the source and then aborts it, but since it also have does have fully fledged back pressure you can use it for IO too.

@darsain
Copy link

darsain commented Oct 8, 2016

@dominictarr I actually played with the idea of adding pull-streams to gulp recently. They are amazing and so simple, but they are a bit too simple :) The issue is that they are not easily identifiable. Gulp checks what is returned from the task, and behaves accordingly, but pull streams are basically just plain functions. Source is a function with 2 arguments, sink is a function with 1 argument, and through can't be checked at all since it appears as a sink. That dives below the line I considered acceptable when duck typing.

Wouldn't you consider adding a flag to the spec? Something like stream._isPullStream = true? And maybe another one with the type of the stream? Or maybe combine them into one truthy flag stream._isPullStream = 'sink', although that feels weird.

Anyway, my point is, pull streams are a type of async interface that just needs to be reliably identifiable, otherwise tools like gulp can't really use them.

@dominictarr
Copy link

thank you! hmm, I actually had that in early version of pull-stream, (back when there was a .pipe() function, but that is removed now).

Since pull-streams are just a function, and you don't have to inherit from anything, or depend on any specific module to create a pull-stream, it would mean that every pull-stream module would need to a flag. That is not a reasonable change.

On the other hand, if gulp adopted a convention for accepting pull-streams which gulp modules could follow, which could be returning a pull-stream with a flag for gulp to see, that would be gulp's business.

If you do that it shouldn't start with _ because that conventionally means "private" but or "internal" but I'd argue it's a public api.

@ivan-kleshnin
Copy link

ivan-kleshnin commented Apr 13, 2017

Pull-streams are too good for committee people.
No classes, composable API, code is unnecessary readable... boring.
Just a historical record after reading this crap.

@phated
Copy link
Member Author

phated commented May 14, 2020

With the awesome work on https://github.com/mafintosh/streamx - we will be switching to that project for our streams implementation. It fixes a lot of our issues and aims to be compatible with node core streams. I'm going to open a new issue for the switch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests