New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplex stream support for separate readableHighWaterMark and writableHighWaterMark #14555

Closed
guymguym opened this Issue Jul 31, 2017 · 8 comments

Comments

Projects
None yet
5 participants
@guymguym
Contributor

guymguym commented Jul 31, 2017

  • Version: 8.2.1
  • Platform: Darwin
  • Subsystem: stream

This need comes from the practice to use transform streams to perform pipeline processing with backpressure, in order to process incoming binary data, by splitting it to intermediate chunk objects. In such cases the transform stream will accept buffers as input (readableObjectMode: true) and push out objects (writableObjectMode: true), or vice versa.

While the objectMode flag supports separation between readable and writable, the highWaterMark option is unified between the stream roles, which doesn't allow to set the internal buffer size units with respect to the stream type. It seems that optional support for readableHighWaterMark and writableHighWaterMark is a natural complementary option to the separated readableObjectMode and writableObjectMode options.

If PR's are welcome I can code the same handling for in the ctors of Readable and Writable and add to docs.

LMK what you think,
Thanks!

References:
https://nodejs.org/en/docs/guides/backpressuring-in-streams/
https://github.com/nodejs/node/blob/master/lib/_stream_readable.js#L69

@Fishrock123

This comment has been minimized.

Show comment
Hide comment
@Fishrock123

Fishrock123 Jul 31, 2017

Member

@guymguym could you describe a bit why it may be desirable to have these separately?

cc @nodejs/streams

Member

Fishrock123 commented Jul 31, 2017

@guymguym could you describe a bit why it may be desirable to have these separately?

cc @nodejs/streams

@addaleax

This comment has been minimized.

Show comment
Hide comment
@addaleax

addaleax Jul 31, 2017

Member

I’m actually a bit surprised this didn’t already exist, so 👍 from me.

Also, just for clarity, this should be implemented on the Duplex level rather than on Transform.

@Fishrock123 One problem I could imagine is that high/low water marks can’t really ever have an identical and meaningful value if one side is in object mode and the other isn’t.

Member

addaleax commented Jul 31, 2017

I’m actually a bit surprised this didn’t already exist, so 👍 from me.

Also, just for clarity, this should be implemented on the Duplex level rather than on Transform.

@Fishrock123 One problem I could imagine is that high/low water marks can’t really ever have an identical and meaningful value if one side is in object mode and the other isn’t.

@guymguym

This comment has been minimized.

Show comment
Hide comment
@guymguym

guymguym Jul 31, 2017

Contributor

@Fishrock123 this occurs anytime you mix between object modes on the same transform stream, in that case the highWaterMark units is not suitable to measure the buffer size for one of the modes.

Here's an example to illustrate the issue:

fs.createReadStream('large-file-with-json-per-line')
  .pipe(new stream.Transform({
    readableObjectMode: false, // reading in buffers
    writableObjectMode: true, // writing out json decoded objects
    highWaterMark: ??? // bytes or objects???
    transform(buf, enc, callback) {
      // split pending data to lines ...
      lines.forEach(line => this.push(JSON.parse(line)));
      callback();
    }
  })
  .pipe(new stream.Transform({
    objectMode: true,
    transform(obj, enc, callback) {
      rest_api(obj).then(reply => callback(null, reply), err => callback(err));
    }
  })
  .pipe(new stream.Transform({
    readableObjectMode: true, // reading in objects
    writableObjectMode: false, // writing out buffers
    highWaterMark: ??? // bytes or objects???
    transform(obj, enc, callback) {
      callback(null, JSON.stringify(obj) + '\n');
    }
  })
  .pipe(fs.createWriteStream('output-file'));
Contributor

guymguym commented Jul 31, 2017

@Fishrock123 this occurs anytime you mix between object modes on the same transform stream, in that case the highWaterMark units is not suitable to measure the buffer size for one of the modes.

Here's an example to illustrate the issue:

fs.createReadStream('large-file-with-json-per-line')
  .pipe(new stream.Transform({
    readableObjectMode: false, // reading in buffers
    writableObjectMode: true, // writing out json decoded objects
    highWaterMark: ??? // bytes or objects???
    transform(buf, enc, callback) {
      // split pending data to lines ...
      lines.forEach(line => this.push(JSON.parse(line)));
      callback();
    }
  })
  .pipe(new stream.Transform({
    objectMode: true,
    transform(obj, enc, callback) {
      rest_api(obj).then(reply => callback(null, reply), err => callback(err));
    }
  })
  .pipe(new stream.Transform({
    readableObjectMode: true, // reading in objects
    writableObjectMode: false, // writing out buffers
    highWaterMark: ??? // bytes or objects???
    transform(obj, enc, callback) {
      callback(null, JSON.stringify(obj) + '\n');
    }
  })
  .pipe(fs.createWriteStream('output-file'));
@guymguym

This comment has been minimized.

Show comment
Hide comment
@guymguym

guymguym Jul 31, 2017

Contributor

@addaleax You are right of course, this behavior is inherited from Duplex.

However it makes less sense for Duplex streams compared to Transform, so I just chose to describe it from the usage point of view rather from the impl pov.

This is currently implemented inside the ReadableState and WritableState ctors:
https://github.com/nodejs/node/blob/master/lib/_stream_readable.js#L69
https://github.com/nodejs/node/blob/master/lib/_stream_writable.js#L49

Contributor

guymguym commented Jul 31, 2017

@addaleax You are right of course, this behavior is inherited from Duplex.

However it makes less sense for Duplex streams compared to Transform, so I just chose to describe it from the usage point of view rather from the impl pov.

This is currently implemented inside the ReadableState and WritableState ctors:
https://github.com/nodejs/node/blob/master/lib/_stream_readable.js#L69
https://github.com/nodejs/node/blob/master/lib/_stream_writable.js#L49

@guymguym

This comment has been minimized.

Show comment
Hide comment
@guymguym

guymguym Jul 31, 2017

Contributor

When I think about it, this is also relevant for transform streams of the same objectMode, as the input and output might still have different proportions. For example think of a transformer that pushes two lines for every input line...

Contributor

guymguym commented Jul 31, 2017

When I think about it, this is also relevant for transform streams of the same objectMode, as the input and output might still have different proportions. For example think of a transformer that pushes two lines for every input line...

@Fishrock123 Fishrock123 changed the title from Transform stream support for separate readableHighWaterMark and writableHighWaterMark to Duplex stream support for separate readableHighWaterMark and writableHighWaterMark Jul 31, 2017

@mcollina

This comment has been minimized.

Show comment
Hide comment
@mcollina

mcollina Aug 1, 2017

Member

I'm definitely 👍 on this. Would you like to assemble a PR?

Member

mcollina commented Aug 1, 2017

I'm definitely 👍 on this. Would you like to assemble a PR?

@guymguym

This comment has been minimized.

Show comment
Hide comment
@guymguym

guymguym Aug 2, 2017

Contributor

sure. will do.

Contributor

guymguym commented Aug 2, 2017

sure. will do.

@guymguym

This comment has been minimized.

Show comment
Hide comment
@guymguym

guymguym Aug 4, 2017

Contributor

PR is up. It's my first PR for nodejs so let me know if I missed any guidelines, or you prefer different code for any reason, and I would be happy to make changes.

Contributor

guymguym commented Aug 4, 2017

PR is up. It's my first PR for nodejs so let me know if I missed any guidelines, or you prefer different code for any reason, and I would be happy to make changes.

guymguym added a commit to guymguym/node that referenced this issue Aug 5, 2017

guymguym added a commit to guymguym/node that referenced this issue Aug 7, 2017

@mcollina mcollina closed this in c3c045a Aug 8, 2017

addaleax added a commit that referenced this issue Aug 10, 2017

stream: support readable/writableHWM for Duplex
This commits adds support for readableHighWaterMark and
writableHighWaterMark in Duplex stream, so that they can be set without
accessing the internal state.

Fixes: #14555
PR-URL: #14636
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

icarter09 added a commit to icarter09/node that referenced this issue Aug 12, 2017

stream: support readable/writableHWM for Duplex
This commits adds support for readableHighWaterMark and
writableHighWaterMark in Duplex stream, so that they can be set without
accessing the internal state.

Fixes: nodejs#14555
PR-URL: nodejs#14636
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment