Transforms that convert one chunk into many chunks may stop reading when readableHighWaterMark exceeded #49938

@IanAtOcucom

Description

Version

18.18.0

Platform

Microsoft Windows NT 10.0.19045.0 x64

Subsystem

stream

What steps will reproduce the bug?

I am only mostly certain this isn't caused by a misreading of the API documentation, but:

Suppose a Transform's writable side accepts a byte stream and one incoming "chunk" can be converted into multiple output "chunks" (that is, one call to _transform() results in many calls to push()). If push() returns false because of back-pressure, 'drain' is never emitted on the stream, because the stream has yet to finish handling the incoming chunk. This shows up especially when using for-await-of to consume the stream.

(based on this discussion: nodejs/help#2695)


const { Transform } = require('node:stream');

/*
 * Transform a "bytestream" into objects (this example uses strings).
 * An incoming byte stream chunk may contain 0..n objects.
 * For each _transform() call, split the chunk into its outputs and
 * attempt to push them to the readable side.
 */
class TransformStream extends Transform {
  _decode(chunk,pos,callback) {
    if (pos >= 0 && pos < chunk.length) {
      let end    = chunk.indexOf(':',pos);
      end    = (end < 0 && pos < chunk.length) ? chunk.length : end;
      let substr = chunk.substring(pos,end);

      let noPressure = this.push(substr);

      if (noPressure) { //finish with the chunk or continue processing it
        if (end >= chunk.length) {
          return callback();
        } else {
          setImmediate(()=>{this._decode(chunk,end+1,callback)});
        }
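      } else {
        // Back-pressure: wait for 'drain' before continuing with this chunk.
        // This is where the stream stalls, because 'drain' never fires here.
        this.once('drain',() => {
          this._decode(chunk,end+1,callback);
        });
      }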
    } else {
      return callback();
    }
  }
  _transform(chunk,encoding,callback) {
    console.log("transforming chunk",chunk);
    this._decode(chunk,0,callback);
  }
}

async function wait() {
  return new Promise(resolve => {
    setTimeout(() => resolve(), 1000);
  });
}
(async () => {
  const transformStream = new TransformStream({objectMode: true, readableHighWaterMark: 1});

  transformStream.write('hello1:hello2:hello3:hello4:hello5');
  transformStream.write('hello6:hello7:hello8:hello9:hello10');
  transformStream.write('hello11:hello12:hello13:hello14:hello15');
  transformStream.end();

  for await (const text of transformStream) {
    await wait();
    console.error(text);
  }
})();

How often does it reproduce? Is there a required condition?

We have a case where a single large input chunk gets turned into a lot of outputs, and it would be more memory-efficient to pause the input stream than to buffer the output objects. But with the highWaterMark option set, the Transform stream stalls once calls to push() have filled the buffer.

What is the expected behavior? Why is that the expected behavior?

There should be a way for an "amplifying" Transform or Duplex to pause reads from its input side until writes to its output side have been consumed. There should be a 'readableDrained' event of some kind to show that the buffer is empty on the output end.
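For comparison, here is a minimal sketch of the same one-to-many split written as an async generator wrapped with Readable.from(). It is not a fix for Transform. It only illustrates the pull-driven behavior being asked for: the generator is advanced when the readable side wants more data, so the source is not read far ahead of the consumer. The names splitOnColon and createSplitStream are hypothetical and used only for illustration.

```javascript
const { Readable } = require('node:stream');

// Hypothetical helper: turns each incoming chunk into 0..n output strings.
// The generator only advances when the consumer pulls the next value.
async function* splitOnColon(source) {
  for await (const chunk of source) {
    for (const part of String(chunk).split(':')) {
      yield part;
    }
  }
}

// Hypothetical helper: wraps the generator in a Readable with a small buffer.
function createSplitStream(source) {
  return Readable.from(splitOnColon(source), {
    objectMode: true,
    highWaterMark: 1,
  });
}
```

A consumer could then iterate `createSplitStream(input)` with for-await-of, where `input` is any async iterable of string chunks, such as another Readable.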

What do you see instead?

Reading from the Transform stops once the highWaterMark is reached, and the callback()s from _transform() stay pending.

There seems to be no way to detect that the output readable side is drained, because the 'drain' event only fires for the input writable stream.

While it IS possible to peek at the "private" _readableState and check that its buffer.length < highWaterMark and then call push() again, this isn't exactly a publicly documented "API".

Additional information

Example of peeking at private values:

let interval = setInterval(() => {
  if (this._readableState.buffer.length < this._readableState.highWaterMark) {
    clearInterval(interval);
    this._decode(chunk,end+1,callback);
  }
},1);
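The same polling check can probably be written against the public readableLength and readableHighWaterMark properties instead of _readableState. The sketch below assumes that approach. It is still a polling workaround, not the 'readableDrained' event requested above, and the names whenReadableHasRoom and SplitOnColon are hypothetical.

```javascript
const { Transform } = require('node:stream');

// Hypothetical helper: resolves once the readable buffer is below its
// high-water mark (or the stream has been destroyed), checked by polling.
function whenReadableHasRoom(stream, pollMs = 1) {
  return new Promise((resolve) => {
    const timer = setInterval(() => {
      if (stream.destroyed ||
          stream.readableLength < stream.readableHighWaterMark) {
        clearInterval(timer);
        resolve();
      }
    }, pollMs);
  });
}

// Hypothetical "amplifying" Transform: one input chunk, many pushes.
class SplitOnColon extends Transform {
  _transform(chunk, encoding, callback) {
    const parts = String(chunk).split(':');
    (async () => {
      for (const part of parts) {
        if (this.destroyed) break;
        // push() returning false signals back-pressure on the readable side.
        if (!this.push(part)) {
          await whenReadableHasRoom(this);
        }
      }
    })().then(() => callback(), callback);
  }
}
```

With `new SplitOnColon({ objectMode: true, readableHighWaterMark: 1 })`, the callback for each written chunk is only invoked after its parts have been pushed and the buffer has had room again, so the writable side is held back while the consumer catches up.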
