Streams pain points #89

Closed
chrisdickinson opened this issue Dec 5, 2014 · 66 comments

@chrisdickinson

commented Dec 5, 2014

This is a place to enumerate pain points / problems with streams as they exist presently. Please focus on problems, clarifying comments and questions, or statements of commiseration for specific problems. Proposed solutions should be posted as separate issues that link back to this issue. Proposed solutions should have enough example / backing code to have an informed discussion about the tradeoffs, but are not expected to be production ready / fully working.

If the pain point is "rip out streams from core" -- that can be discussed over here. :)

I'll start:

  • Every stream cares about byte-level buffering. This necessitates an "objectMode" flag, which, when omitted, causes all sorts of gotchas for stream users. I am of the opinion that this mechanism for determining when and when not to buffer causes more problems than it's worth and should be replaced. Streams' awareness of their contents reduces their utility.
  • Resource-backed streams (every stream in Node core) have to implement their own semantics for destroy, close, etc; and they have to ensure that their sequence of events matches all others over time.
  • null is a reserved value in streams -- streams cannot meaningfully pass this value along without bugs. This causes problems for folks migrating from earlier versions of streams, and in situations where null is "in-alphabet."
  • The ergonomics of the streams API lead to the prevalence of packages like through and through2 (which by itself is fine), but it would be nice if the APIs were directly usable "out of the box", so to speak.
  • There's no spec for streams, so it's hard to build alternatives that are guaranteed to work.
@tracker1


commented Dec 5, 2014

1 - bringing in the functionality of event-stream into core streams would be really nice (which includes through2 and the like).

2 - being able to easily create a generic stream that you can write to, that passes through would also be nice.

3 - a first-class error method (Stream.prototype.error) would be helpful for binding/passing, vs stream.emit.bind(stream, 'error')

4 - having methods bound to the context internally would be a bonus as well... stream.write.bind(stream) vs simply passing stream.write;


The following pseudo-example illustrates what I mean here, and something that would be closer to what I'd like to see.

mylib.foo = function getDataRowStream(query, params) {
  var stream = Stream.create(true /*object mode*/);  // 2 above, simple create of through stream
  queryDataAndPipeToStream(query, params, stream);
  return stream;
}

function queryDataAndPipeToStream(query, params, stream) {
  getDatabaseConnection()
    .then(function(conn){
      var req = new dal.Request(conn);
      req.stream = true;
      req.on('row', stream.write /*methods bound to stream instance*/)
      req.on('error', stream.error /*emits 'error' event*/)
      req.on('complete', stream.end);
      req.query(query, params)
    })
    .catch(stream.error);
}
@chrisdickinson


commented Dec 5, 2014

(EDIT: this was originally quoting the comment above, which was since edited)

I think the disconnect for me is that going from something like creating a connection to a database (where a promise is a great fit) to something where a stream makes sense (returning rows of data) is a bit weird.

I might be misinterpreting this quote, so forgive me if I'm just blathering here:

Promises are great for representing single async operations that happen exactly once, pass or fail -- they are a chainable primitive for those sorts of operations. Node's primitive for that class of operation is the Node-style callback -- which is, notably, not inherently chainable. Streams (collectively including readables, writables, duplexes, and transforms) represent a series of zero or more data events over time, that may eventually end or fail.

@chrisdickinson


commented Dec 5, 2014

Some more pain points:

  1. There are two .pipe functions -- Stream.prototype.pipe and ReadableStream.prototype.pipe. This is confusing for end-users and causes bugs.
  2. Streams are based on event emitters, but listening to certain events (readable and data) has implicit side effects.
  3. Streams documentation is currently a mix of reference, implementation guide, usage guide, and advanced usage -- it's hard to follow and link into from other docs, or consume as a single document.
@aredridel


commented Dec 5, 2014

Those last bits are SO TRUE to my experience.

@tracker1


commented Dec 5, 2014

@chrisdickinson I removed the part you quoted... I was merely expressing a thought that it would be nice if there were a cleaner way to chain against a Promise that resolved to a Stream. It's not in any spec, but if a Promise were also an EventEmitter that emitted against the resolved item, it could double as a ReadableStream...


Another minor pain point is that Writable streams don't emit end; they emit finish ... which means when my final chain goes from a through stream to a writable io stream, my end/finish event handler needs to change. It would be nice if writable streams emitted end as well as finish after the final flushed output.

@jonathanong


commented Dec 5, 2014

a lot of my grievances are already issues in node: https://github.com/joyent/node/issues/created_by/jonathanong

a major issue for web framework maintainers are leaks: https://github.com/jshttp/on-finished#example. requiring us to use https://github.com/stream-utils/destroy and https://github.com/jshttp/on-finished is a symptom of a broken stream implementation, imo.

but personally, i'd rather have readable-stream be separated from core, have all the stream development progress be moved into the WHATWG Stream specification, and allow users to use that as the stream implementation (assuming it fixes these bugs).

@mjackson


commented Dec 8, 2014

There's no spec for streams

This is definitely the main pain point for me. That's it. If there were a formal streams spec that everyone could run their own little implementations on to test for conformance that would fix a LOT of issues.

@chrisdickinson I'm sure you've seen https://github.com/promises-aplus/promises-tests. I'm tempted to do something like that for streams. Think it would help us all get on the same page?

A first pass could just spec out the existing implementation in node, trying to catch as many corner cases as possible. Then, anyone who wanted could make a competing streams implementation, similar to how when.js, rsvp, bluebird, etc. compete on speed and features.

@chrisdickinson


commented Dec 8, 2014

re: @mjackson:

This is definitely the main pain point for me. That's it. If there were a formal streams spec that everyone could run their own little implementations on to test for conformance that would fix a LOT of issues.

The other side of this is: if we spec streams as they exist currently, the difficulty of changing/improving streams goes up. Are we comfortable with how streams currently work, and are we willing to support that for the long haul? I agree that the eventual goal of streams should be a spec, but on the other hand, I feel like committing to what we have now may be premature.

EDIT: That said, I'd totally welcome a stream spec suite so we can see clearly what behaviors we're dealing with!


re: @jonathanong:

but personally, i'd rather have readable-stream be separated from core, have all the stream development progress be moved into the WHATWG Stream specification, and allow users to use that as the stream implementation (assuming it fixes these bugs).

The first part of that is under debate over here; I've made a big list of concerns with that approach as well. Re: the second half, the problem with using the WHATWG spec is that it's built around a core primitive that Node doesn't use -- namely, Promises.

Going through your linked issues, it seems like the lion's share of problems have to do with resource or handle-backed streams and how they interact (or, rather, don't interact as expected) with the pipe mechanism -- I totally agree that that behavior needs to be shored up; maybe as something that builds on top of vanilla streams, rather than baked into the base Readable and Writable classes.

@jmar777


commented Dec 8, 2014

Copying these over from nodejs/roadmap#1:

  1. The lack of error-forwarding via pipe() removes much of its elegance. The docs like to make things look simple, as in this example:
var r = fs.createReadStream('file.txt');
var z = zlib.createGzip();
var w = fs.createWriteStream('file.txt.gz');
r.pipe(z).pipe(w);

Beautiful, except you can't just add a single error handler on the end... you need one for each stream. And you could actually get errors from all three of them, so you can't just forward them to a callback or anything sane like that... it just gets gnarly way too fast.

  2. This is much broader, but the whole readable/writable dichotomy is confusing in a way that spiders out into virtually all stream implementations too. I think there are enough examples out there of "read only" stream APIs that make a compelling enough case for that (I even think @creationix experimented with wrapping node streams like that at one point).

  3. Object mode. I sort of get why they ended up in there, but if there's a specific point in time where streams jumped the shark...

I don't think any of those are actionable so long as node compatibility is a requirement, but I guess that's a different topic.

@gaearon


commented Dec 8, 2014

Coming from an Rx background, I was unpleasantly surprised by how unintuitive piping is. Most examples of downloading a file (arguably the simplest use case for streams and piping) that I found in blogs and on StackOverflow would get error handling, or flushing, or the finish/close/disposal sequence wrong. Seriously. (Oh noes.)

IMO Node's pipe is somewhat like jQuery's “promise”-mutating then in this respect: broken by design due to absence of ownership and reader/writer separation in the API. In contrast, Rx Observables separate consuming from creating, like real Promises do, so there is no room for the whole class of mistakes.

Rx also has first-class disposables, as in you have to return a “disposable” from Observable constructor, and there are several built-in utility disposables for simpler composition.

I can be wrong though! Pardon me if I'm missing something obvious about how streams should be implemented.

@sonewman


commented Dec 9, 2014

I think the problem with error forwarding is that if you had a long chain of piped streams, it would be hard to determine where the error actually resides and deal with it in an appropriate manner. As far as I know, the idea of streams (though similar in nature to a promise, with unlimited resolves of data before a final end resolve, if you squish them into a promise paradigm) is more about each individual stream taking ownership of its responsibilities. That actually gives the implementer much more control, albeit with extra code and edge cases which they themselves are required to take responsibility for handling.

@sonewman


commented Dec 9, 2014

Also, I get the point about the end event on the readable and the finish event on a writable, but what about transform streams? The finish event gets called when .end is called on the writable side, as it should, even if a stream consuming the readable end has not finished reading it. Only once this is done will it emit the end event. I think it is important to differentiate between the two events, even though it might seem a little confusing semantically.

@joepie91


commented Dec 9, 2014

From an application developer point of view, these are my main concerns with streams2:

  1. There is a pipe event on Writable streams, but not on Readable streams. It can be useful to be notified when something starts being piped elsewhere, and it's not obvious how to be made aware of this. Overriding _read or pipe methods is a hack at best, especially with third-party streams.
  2. Apparently it's only possible to set objectMode true for either both input and output, or neither. To set either of the two to object mode, but not the other, you need a custom constructor. This should really be easier to do.

Aside from those two points, I haven't really found any issues with streams so far, from my point of view. I'm actually finding them to be rather practical to work with.

@sonewman


commented Dec 9, 2014

Having a pipe event on a duplex/transform (readable & writable) stream would be confusing. How would you know if it was piping or being piped if you had the same event?

Overriding the pipe method is not too difficult, if you absolutely needed that:

var pipe = Readable.prototype.pipe
Readable.prototype.pipe = function (dest, options) {
  this.destinations = this.destinations || []
  this.destinations.push(dest)

  dest.on('finish', function () {
    var i = this.destinations.indexOf(dest)
    ~i && this.destinations.splice(i, 1)
  }.bind(this))

  return pipe.call(this, dest, options)
}

With regards to question 2, if you needed to pipe a transform object stream into a buffer there is no reason you can't do this in your transform:

var Transform = require('stream').Transform
var inherits = require('util').inherits

function ObjToBufferStream() {
  Transform.call(this, { objectMode: true })
}
inherits(ObjToBufferStream, Transform)

ObjToBufferStream.prototype._transform = function (data, enc, next) {
  try {
     next(null, new Buffer(JSON.stringify(data)))
  } catch (err) {
     next(err)
  }
}

var objToBufferStream = new ObjToBufferStream()
objToBufferStream.pipe(process.stdout)

objToBufferStream.end({ obj: true })
@sonewman


commented Dec 10, 2014

@chrisdickinson I am curious, after hearing the discussion in the TC about potentially moving streams development out of core and introducing it as a bundled dependency.

What sort of API would you give streams, if you could start from scratch? Would you go for a promised backed approach such as @domenic's https://github.com/whatwg/streams approach?

@aredridel


commented Dec 10, 2014

You can in fact set objectMode for readable vs writable, but it's ugly: this._readableState.objectMode = true

@sonewman


commented Dec 10, 2014

@aredridel this is true, albeit the opposite of my example.

But this would only really matter for a duplex stream; in any case you are still going to need to implement the internal _read() method and use push() to accommodate the format of your data.

If you were implementing a transform stream, and it was just set to objectMode, that doesn't mean you can't write buffers to it. I think determining objectMode for one side or another is a non-issue.

No matter what is implemented, the hardest part is the conversion between formats (e.g. buffers to objects), as data across the wire is not always deterministic and could very well run over the boundaries of a chunk.

This is an issue no matter what encoding you have, unless you are using only JavaScript objects directly (and then you can't really manage memory effectively).

My example is contrived but the only important thing is the part:

next(null, new Buffer(JSON.stringify(data)))

all the rest is purely contextual.

Personally, when I see properties like _readableState, _writableState, _transformState I consider them implementation detail and not API, unless documented otherwise (such as in the case of the _read, _write, _transform methods for the streams inheritance API).

As far as I am concerned, technically implementation detail is free to change without breaking semver!

@bodokaiser


commented Dec 14, 2014

While implementing a WebSocket parser using stream.Transform I always wished for some feature where you could reliably push data back to read it the next time. This comes in handy when you need four bytes in a row to extract data, so that you don't need to manage the buffer yourself. I think #unshift was once intended to do this but it never worked for me.

Besides that, I would like to have streams working much more as expected.

@sonewman


commented Dec 14, 2014

@bodokaiser This is an interesting idea... so you are describing a way of putting some data back onto the write buffer and applying back pressure. Unshift would not act in that way, because it is on the Readable side so it would only put it on the start of the readableState buffer.

This is definitely food for thought!

@bodokaiser


commented Dec 15, 2014

@sonewman yes, indeed. I always wondered why this method was missing, as streams should be chunk-able.

@sonewman


commented Dec 19, 2014

I think this would be tricky to implement with what we have currently; for the purposes of this explanation I will call this method this.pushBack (this being the stream.Writable of course).

We currently create a new WriteReq when adding chunks to the _writableState buffer, and this contains the encoding and the writecb supplied to .write(data, encoding, function () { /* this callback */ }). The question is: if we were to call this.pushBack(chunk), would the callback fire? Because technically some of the data would have been handled.

This could lead to confusion (and inconsistent behaviour) if it was decided that callback should be called multiple times, or alternatively if it was only called on the first handle and was not for the subsequent chunk.

The other problem is that when a chunk is handled, it transfers the encoding and callback onto the _writableState before the _write method is called. When we call next in _write or _transform it eventually calls state.onwrite, which in turn calls the callback associated with that chunk either on the next tick or immediately (if the writable end is being uncorked and the buffer cleared, or it's being written to and there is anything previously backed up in the buffer).

(stream.Transform calls this via callback inception, in case following the source causes mind bending! 😄)

The above scenario would mean that this.pushBack() would need to be called synchronously within the _write or _writev method. Otherwise the encoding and callback could be lost when the handling of the next chunk happens synchronously, or before the point in the next tick at which the callback was added (which would likely occur if we used process.nextTick in _write or _writev and then called next); this could cause some very strange behaviour.

I think we would need to decide if this functionality was definitely something that we really want to add to stream.Writable as it would most likely depend on some larger internal restructuring. In addition we would need to watch closely to ensure that engineering this does not affect the streams performance.

@stken2050


commented Dec 31, 2014

A Happy New Year to all!

@sonewman

Thanks a lot, very informative.
I think I should consider this issue more deeply myself.
Since I have not yet worked through this problem, your immutable/FRP perspective on my idea is truly valuable to me.

I will keep thinking about this, but so far: if there is a silver bullet it would be FRP, and if there is no such thing, all we can do is settle on a sweet spot.

Regards.

@stken2050


commented Dec 31, 2014

About Immutable data for chunk.

How about this?

Add a time-stamp to every chunk of data.
Then the stream data becomes immutable.

stream.read(now), or something like that.
I understand the current Node stream employs a pull model.
Pull data on a time-stamp basis; then again, it's functional and referentially transparent.

The stream state buffer (_readableState.buffer or _writableState.buffer) could never be immutable, because its contents are forever changing, unless data is never appended to it.

Since every chunk of data is immutable with its time-stamp, the stream state buffer also becomes immutable with the time-stamps.
It's only a matter of which part of the immutable data the buffer contains.

@sonewman


commented Jan 1, 2015

@kenokabe Happy New Year to you too.

I think this suggestion would be good for a certain use case. The problem is that .read(n) already has a valuable purpose. This dictates the number of bytes to be read from the underlying buffer. This involves some quite complex semantics by constructing the relevant "chunk" (of size n) from the internal array of Buffers. This chunk is always constructed from the front of the array of Buffers.

By allowing data to be plucked out of the array of Buffers at random points, it would leave strange chunks of data in the array. This would inevitably require some kind of clean-up or flushing.

This is not to say that you cannot override the .read() method in an inherited implementation. But getting a relevant chunk based on a timestamp would not be useful in the use case of core.

By using a timestamp, components interacting with the stream would need to know the specific timestamp of the chunk to access it (I don't know how this would work).

Another issue with this is that the timestamp of the chunk would only be relevant to the stream which added it, and to the Buffer instance the timestamp was added to.

Would this change when moving the buffer bytes around?

Although bytes in a buffer always refer to the same data in memory regardless of which Buffer instance they live in, we would be adding the timestamp to the instance itself, not to the bytes of data, so the timestamps would easily be lost when buffers are .slice()d.

This is the problem with the low-level data that core streams deal with, in comparison to the high-level (more naïve) object streams often found in user-land, since those usually deal in a consistent chunk (an object) of data.

@stken2050


commented Jan 5, 2015

@sonewman

Thanks for your comment.

One more issue, which I think is strongly related to this topic, is
Callbacks are imperative, promises are functional: Node’s biggest missed opportunity

With this definition in place, I want to address what I consider to be the biggest design mistake committed by Node.js: the decision, made quite early in its life, to prefer callback-based APIs to promise-based ones.

Writing correct concurrent programs basically comes down to achieving as much concurrent work as you can while making sure operations still happen in the correct order. Although JavaScript is single-threaded, we still get race conditions due to asynchrony: any action that involves I/O can yield CPU time to other actions while it waits for callbacks. Multiple concurrent actions can access the same in-memory data, or carry out overlapping sequences of commands against a database or the DOM. As I hope to show in this article, promises provide a way to describe problems using interdependencies between values, like in Excel, so that your tools can correctly optimize the solution for you, instead of you having to figure out control flow for yourself.
I hope to dismiss the misunderstanding that promises are about having cleaner syntax for callback-based async work. They are about modeling your problem in a fundamentally different way; they go deeper than syntax and actually change the way you solve problems at a semantic level.

I'm sorry that I haven't yet integrated these problems with the time-stamp idea and a promise-based implementation in my head, but I think it's better to share this block of ideas.

Regards.

@joepie91


commented Jan 6, 2015

Another point that I'd completely forgotten about: there should probably be some way to have a consistent (optional) seek implementation. For things like streaming out files from an archive it's essential to have seek functionality, but this is currently not supported in streams as far as I can tell.

Sure, you could have some module-specific extension to a stream to support seeking, but that would defeat the point of having a standard and interoperable stream interface.

@sonewman


commented Jan 6, 2015

I'm not really sure I understand what you mean by 'seeking'. Do you mean seeking for a file, or seeking through a file?

Implementation of both is perfectly possible with the current streams implementation.

You would just need to create a stream which does the desired seeking.
If one doesn't already exist, see npmjs.com/search?q=stream+seek.

There is a lot of risk of feature / scope creep for me to think this would be a good idea to be in core.
(Can it do this too..? and this...? How about this other thing..?)

If you are seeking the presence of a sequence of text or bytes across chunk boundaries, there are options for this.

I have written a module called byte-matcher, which facilitates buffer pattern matching using a naïve matching technique. I have also experimented with Boyer-Moore-Horspool in JavaScript, but have yet to test it on a significant amount of data for this algorithm to show a faster benchmark result.
(Maybe this could give you some inspiration)

In short there are options out there that are relatively simple to learn and implement.

There is also an open issue (#95) and PR (#160) for providing an indexOf method on the Buffer.

Obviously implementing parsing over chunk boundaries still requires considerations.

@joepie91


commented Jan 6, 2015

@sonewman I'm referring to seeking in the sense of reading particular sections of a file using an existing descriptor (or, in this case, stream) -- not just for filesystem files, but for any kind of stream that supports this or could reasonably abstract it. Again, I am aware that it is possible to implement this, but it is not part of a standardized API (like the readable event or push method are, for example), so it's not possible to have a standardized implementation right now.

For the specific example of a filesystem stream, as far as I understand it all current implementations do in-memory buffering of the entire stream (therefore defeating the point), and the only non-buffering approach I am aware of is to reuse a file descriptor when creating a new stream for every seek (and using the offset functionality), but I don't think this is a technically supported usecase and I'd expect it to break at some point.

An example use case I was trying to work on recently was a vinyl stream from an archive; it would be an object stream, one object for each file in the archive, with the 'contents' of each of those files being a stream itself. To accomplish this without having to open an fd for every file in the archive individually, I'd have to somehow be able to have an offset linked to each of the 'substreams', referring to a specific section in the archive file, all using the same fd. That doesn't appear to be 'officially' possible with the current streams API.

@nfroidure


commented Jan 6, 2015

@joepie91 it looks like fs.createReadStream takes a "fd" option in http://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options

Wouldn't it work for the case you mention?

@sonewman


commented Jan 6, 2015

@nfroidure 👍

There is also creationix/seekable by @creationix, which allows you to seek through a stream in a similar way.

@sonewman


commented Jan 6, 2015

Also:

An example use case I was trying to work on recently was a vinyl stream from an archive; it would be an object stream, one object for each file in the archive, with the 'contents' of each of those files being a stream itself. To accomplish this without having to open an fd for every file in the archive individually, I'd have to somehow be able to have an offset linked to each of the 'substreams', referring to a specific section in the archive file, all using the same fd. That doesn't appear to be 'officially' possible with the current streams API.

That is a very specific use-case. I don't think the stream itself needs additional API to deal with this. There is absolutely nothing to stop you adding additional methods to your stream implementation to facilitate this functionality.

Are you using streams 2 or 3?

@joepie91


commented Jan 7, 2015

@nfroidure Yes, this seems like it could potentially work - however, this approach does not seem to be officially supported as far as I can tell (considering concurrent streams that are all created from the same fd), and I'm not really comfortable relying on an undocumented implementation detail.

@sonewman seekable doesn't appear to actually implement seeking, it just emulates it. As far as I can tell, on every 'seek back' it would reread the entire stream up to the desired point, which is unacceptable from a disk I/O point of view.

As for my usecase; the specific usecase I listed was just a single specific example to illustrate the problem. The same problem would apply to any archive-like format - Git packs, WARC files, tarballs, map/asset download streams for games, and so on, really anything that contains more than one (potentially large) file.

I am aware that I could just implement my own methods, but as I stated before, this would not allow for standardization - everybody would likely implement seek methods with a different name and/or signature, which kind of defeats the point of standardized streams.

I'm not expecting a seeking implementation for streams in general, just an API standardization - a standardized method name and signature that can optionally be implemented by a stream, and perhaps an implementation of that method for the standard library (eg. fs read streams).

I'm currently using streams2, I think; whatever version is included in Node v0.10.

@sonewman


commented Jan 7, 2015

How would the seeking work with a stream of binary data? In order to seek back or forward you would essentially need the data in memory, unless you are using an underlying source which is managing the seeking (such as in the case of the file-system).

Again with the idea of adding standardised API methods for certain operations, we would be in danger of adding methods for many different use-cases. I guess this is so that other libraries could utilise certain methods on the nested streams in a generic way.

What would other modules necessarily use this for?

Is the use-case for a gulp pipeline?

@bjouhier

Contributor

commented Jan 7, 2015

@joepie91 seeking (at least unbounded backwards) is fundamentally incompatible with streaming.

Another instance where the device/stream distinction helps: seeking and other operations (like sorting) are perfectly valid on certain classes of devices but not on streams.

@creationix

Contributor

commented Jan 7, 2015

Correct, seekable will start over and then skip (throw away) data from the start up to the desired point. The assumption there is that all you have is a stream primitive that must start at the beginning and only go in one direction.

If the underlying primitive stream was able to start at arbitrary point (which I believe node's fs read stream has), then it could just throw away the old stream and create a new one that starts at the right place. It could even get fancy and keep a list of open streams at different points if the use case is random access that jumps back and forth.

Other than an optional start offset for streams, I can't think of many other simple additions to the API that would help with random seeking.

@loveencounterflow


commented Jan 18, 2015

I've already commented elsewhere; let me add a few points for streams:

  • Streams are event-based, which some people see as a mistake; i'd say as soon as someone publishes a working, reasonable streamish library built solely on NodeJS-style callbacks, that would fully support the point. There is already ez-streams, but unfortunately it's written using streamline.js, which produces horribly convoluted, indigestible JS source. Still, that's technically plain JS.

NodeJS events as such have some flaws, including that

  • valid events for a given Emitter are not enumerable,
  • you get no errors when binding to a nonsense event,
  • there's no standard way to listen to all events.
  • There's something like a spec, but there's no (standard, at least) way to submit a given library to a test suite that checks what features are available, how they perform in terms of time, space, and correctness, and where the library's API differs from the standard or conventions. I tried to do part of that with jseq; with that tool, you can easily throw in a library or function that purports to do deep equality checking, and jseq will then run a number of tests and show a comparative table of implementational correctness.

BTW NodeJS' assert.deepEqual performs appallingly badly when run through jseq, and when you go and complain, the devs will tell you: oh no, it's according to the CommonJS spec. Trouble is, that spec is seriously broken, so putting any spec above sound reasoning is stupid. The wisdom of sound reasoning and practical experience should inform a spec; if that wisdom turns out to be faulty, or the experience anecdotal or biased, then the spec should be changed accordingly (and that's exactly what we're talking about when it comes to NodeJS streams: they should evolve, because they still have so many problems).

@sonewman

Contributor

commented Jan 18, 2015

@loveencounterflow there is no doubt that streams will evolve. But there are a lot of constraints which prevent us from just rewriting a new implementation from scratch. In addition there are a lot of things which streams handle, which are overlooked in most criticisms. They also have a very specific and optimised use case in node/io.js core and the same is true of EventEmitters.

The above list of EventEmitter observations (also listed in your comment iojs/io.js/issues/188#issuecomment-70349136) are all suggested features with valid use cases, all of which would add additional overhead. I'm not sure of the benefit of throwing errors when listening to events which are not intended for use: that would require a list of events to be predefined in every EventEmitter, and in a lot of cases you want to bind to events before they are emitted.

In any event, with well unit-tested code, surely developer errors such as event name typos should be covered, right?

True, binding to all/any events could be useful in some use cases, but realistically how often is that? Is there a tangible instance where core would benefit from this functionality? There are easily a multitude of ways this issue could be solved as a user-land module.

In terms of assert.deepEqual, I am not really sure how this is a pain point of streams.

BTW NodeJS' assert.deepEqual performs appallingly bad when run through jseq and when you go and complain the devs will tell you Oh no it's according to the CommonJS spec.

I am not really sure who you are referring to. node/io.js only loosely implements the CommonJS spec anyway (since, as far as I am aware, there is no concept of module.exports = function () { } in CommonJS).

Constructively what is it that you want to see done about the issues you describe above?

@loveencounterflow


commented Jan 18, 2015

@sonewman

a lot of constraints which prevent us from just rewriting a new implementation from scratch. In addition there are a lot of things which streams handle, which are overlooked in most criticisms. They also have a very specific and optimised use case in node/io.js core and the same is true of EventEmitters.

I understand the constraints; sure nothing would be gained by breaking everything just to improve some perceived non-optimal detail. That much is clear.

I also fully agree that streams as they stand are already great; i've rewritten my codebase of around 6000 LOC in a very short time (1000 LOC / month), and 98% of those lines have a .pipe call. It's orderly, it works, it's great. I'm so convinced that streams as they are are already great that i put up with a steep learning curve, and i feel it's worthwhile. That's beyond question.

Still, there are hints that event emitters may not be such a great idea to use for streams; at least one person, @bjouhier, has repeatedly argued in favor of callbacks, and as far as i can see he may very well be right.

Worse, there are many complaints that streams are too complex and too hard to get right. That we now have three competing versions in core that may or may not fall back to an earlier version of behavior does not make things simpler. I'm not saying i could have done anything better—which i doubt—but it should still be stated. If it's impossible to make streams easier/simpler, then we've already reached an optimum and should leave it at that—after all, once at the summit, the only way is down.

I'm not sure on the benefit of throwing errors when listening to events which are not intended for use. That would require a list of events to be predefined in every EventEmitter, because in a lot of cases you want to bind to events before they are emitted.

I think i can make a very clear statement here: generally, JS has an issue with throwing clear and early errors. When you write x = {...}; delete x.f; x.f() you get "undefined is not a function", which is nonsensical: we already knew that undefined can't be called. The error comes too late (the real error lies in accessing an undefined attribute) and with a strange message. We can't change that anymore, but when introducing new things like EventEmitters, we should strive to do better. To me, binding to a non-existent, never-about-to-happen x.on( 'finished', ... ) is just as questionable as trying to call undefined. One may argue that at the moment we bind to an event we cannot possibly know what events the emitter will emit in the future, which is true; but by the same token we could (i think: should) make it a convention that a check is done the moment the x.on method is called. Fail early and tell everyone, that is. Again, i'm not sure whether it makes sense to change that bit now, but i still think it's a flaw.

binding to all/any event could be useful in some use-cases. But realistically how often is that? Is there a tangible instance where core would benefit from this functionality? Because there is easily a multitude of ways this issue could be solved as user-land module.

Hard to answer the 'realistic', 'how often' and 'benefit' parts. The general idea is that you want code that is easily instrumentable, because that makes it a better target for all kinds of diagnostic tools, and those are important and all too often a late afterthought. Given that event emitters encourage writing less cohesive code (which may or may not be a good thing, but sure is interesting), the need for simple debugging a la 'show me what you emit, when, and in what order' is there. And yes, when i find a great userland tool that allows me to just require it and f.on( '*', ... ), i'll just shut up on this one.

As for some general statements about the issues with EventEmitters pls. read #188, esp. "Truth 1: A callback API is easy to document, an evented one is not.", "Truth 2: It is very easy to scaffold an evented APIs around a callback API. The reverse is more difficult.". This is not about crying out loud and wanting to change everything in core, this is about candid assessment.

In terms of assert.deepEquals i am not really sure how this is a pain point of streams.

Maybe i just should've linked to my concerns i stated elsewhere; the point i'm making is "let's do a spec and a test suite to evaluate libraries against that spec, and then run that suite against implementations so we can make justified statements about their respective merits and demerits", and what that test suite could look like is exemplified by jseq (which deals with assert.deepEqual among other things).

node/io.js only loosely implements the CommonJS spec anyway

Right, but that doesn't mean you don't get rebuked by someone because CommonJS; for example, someone answered a complaint about assert.deepEqual by saying:

assert module has stability 5 - locked and is implemented upon CommonJS standard: http://wiki.commonjs.org/wiki/Unit_Testing/1.0 . If you have any ideas - please try submitting them to CommonJS guys first!

and closed the issue - effectively telling people "shut up, it's not our fault" (another dev re-opened the issue the same day). The message amounts to: "this is stability 5 - locked, so we're not changing it; you can make a change request on another site, but we're still on status 'locked' over here, so we won't change anything".

Let's not go down that way—to first change CommonJS and then change NodeJS/io.js just means 'not going to happen'; both groups have valid concerns to refuse changing even broken stuff.

To me the point here is that the deepEquals case concerns questions of governance as much as it is about technical details, and it's the governance parts that are worth thinking about when it comes to the future of streams and the role of specs.

Constructively what is it that you want to see done about the issues you describe above?

That's a hard question!

One thing i support is #97 where a special interest group of sorts is formed to talk about streams. That's good. Reading the list of group members alone instills a great deal of hope this thing is going in the right direction.

Maybe i can answer the question with what i'm personally going to do next, and that's examining a number of stream implementations and addons for their usability for my concrete use cases. Maybe one of the results will be a summary of what modules are available and what their specifics are, and that in itself will help me using streams in a better way.

@tracker1


commented Jan 19, 2015

Another comment on streams... and it may just be my own understanding, but when streaming objects for processing and an error is raised, the entire pipe chain stops, which may not be the desired behavior.

@aredridel

Contributor

commented Jan 21, 2015

There is no event that is guaranteed to fire before 'data' or 'readable' events; 'readable' is not guaranteed to fire at all in a readable stream.

@sonewman

Contributor

commented Jan 21, 2015

@tracker1 It is true that if a stream errors, then anything piping to it will unpipe, which then causes nothing downstream to receive anything. This is a contentious issue, though, since you could argue that if a stream has emitted an error, it is in a bad state, and therefore you should not continue writing to it.

@aredridel I am not sure what event you would expect to receive before data or readable. There is certainly not going to be a readable event triggered before there is data to read; that would defeat the point.

@tracker1


commented Jan 22, 2015

@sonewman I think it should be a state/property of the stream that determines this, like object mode. If you're processing a file for importing and a single line is messed up, there are lots of cases where the rest of the file should still be processed...

As it stands, I wind up swallowing errors, passing a state object through the stream with an error property, explicitly checking that, and then having the error handler at the end of the pipe chain... it'd be nice if the chain could simply keep working and bubble the errors downward.

@sonewman

Contributor

commented Jan 23, 2015

@tracker1 that is really specific to a parser, and not every parser either. Exactly how would you expect it to flow downstream, and how would you manage that? It essentially adds an additional error-stream paradigm. In terms of using an event, there is nothing stopping you from using a different arbitrary string to signify this event instead.

@chrisdickinson

Contributor Author

commented Feb 4, 2015

This issue has been a great catalyst for discussion, but I think the time has come for it to be closed. The @iojs/streams team has been formed, and the input we've collected here will inform future improvements to the streams subsystem. That said, if you've got issues with how streams work in iojs, ideas on how you'd like them to improve, or are interested in joining the streams team, head over to the readable-stream repo. Thanks all!

@ORESoftware

Contributor

commented Jan 15, 2017

@aredridel I saw your comment about callbacks in readables - I have always been confused about why the read method on readables does not use a callback, whereas the write method on writables has one. Just for kicks I started writing an alternative stream API, and it's based on having the read method take a callback, like you mention - assuming that's what you were getting at. Is there any reason why read doesn't take a callback in the current Node streams API? It makes no sense to me.

@calvinmetcalf

Member

commented Jan 16, 2017

@ORESoftware noms might be what you're looking for.

@ORESoftware

Contributor

commented Jan 16, 2017

@calvinmetcalf thanks, will check it out. After much research, I think the standard API will work; it's just not very intuitive or consistent. You can do this:

const stream = require('stream');

function getReadableStream(fn) {
    return new stream.Readable({
        objectMode: true,
        read: function (n) {
            fn(n, (err, chunk) => {
                // this.push is only called after the callback has fired
                if (err) return this.destroy(err);
                this.push(chunk);
            });
        }
    });
}

but I wish they had designed the API like the following instead;
it would have been so much more intuitive:

function getReadableStream(fn) {
    return new stream.Readable({
        objectMode: true,
        read: (n, cb) => {  // <<<< read should have taken a callback, but it doesn't!
            fn(n, (err, chunk) => {
                cb(err, chunk);  // whatever, you get the idea
            });
        }
    });
}

After many, many weeks of subpar understanding, I think I get it, but you
never know when another curveball will get thrown at you in Node land.
IMO streams should always have been some userland thing...

However, core could add a callback to readable.read(), so maybe one day
that will happen without breaking backwards compatibility...
