
Respecting encoding? #14

Closed
kirbysayshi opened this issue Apr 8, 2014 · 17 comments

@kirbysayshi

Shouldn't enc in the example below be 'utf8'? In node 0.10.26 it's 'buffer':

var fs = require('fs');
var through2 = require('through2');
fs.createReadStream('./file.json', { encoding: 'utf8' })
  .pipe(through2(function(chunk, enc, cb) {
    console.log('chunk', chunk, 'enc', enc);
    this.push(chunk);
    cb();
  }))

Is it expected that transforms are doing string conversions? Am I doing something wrong?

@rvagg
Owner

rvagg commented Apr 23, 2014

That's a good question actually... try through2({ encoding: 'utf8' }, function (chunk, enc, cb) { ... instead, to explicitly set the encoding on your transform.

@kirbysayshi
Author

So I gave that a try, still got a buffer:

var fs = require('fs');
var through2 = require('through2');
fs.createReadStream('./file.json', { encoding: 'utf8' })
  .pipe(through2({ encoding: 'utf8' }, function(chunk, enc, cb) {
    console.log('chunk', chunk, 'enc', enc);
    this.push(chunk);
    cb();
  }))
$ node stream_test.js
chunk <Buffer 7b 0a 20 20 22 6e 61 6d 65 22 3a 20 22 61 70 70 2d 74 72 61 69 6e 69 6e 67 2d 30 31 22 2c 0a 20 20 22 76 65 72 73 69 6f 6e 22 3a 20 22 30 2e 30 2e 30 22 ...> enc buffer

@rvagg
Owner

rvagg commented Apr 26, 2014

@Raynos help! I don't have time to dig into this at the moment; any chance you understand it off the top of your head?

@laurelnaiad

You might try setting {decodeStrings: false} on the through2 options (as they should pass through to the transform stream constructor). http://nodejs.org/api/stream.html#stream_new_stream_transform_options

edit: curiosity got me -- yes, that works.
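
For reference, here is the OP's snippet with that option applied (a sketch of the same pipeline, based on what's described above):

var fs = require('fs');
var through2 = require('through2');

fs.createReadStream('./file.json', { encoding: 'utf8' })
  .pipe(through2({ decodeStrings: false }, function (chunk, enc, cb) {
    // With decodeStrings: false the writable side no longer re-encodes
    // the incoming string to a Buffer, so chunk arrives as a utf8 string.
    console.log('chunk', chunk, 'enc', enc);
    this.push(chunk);
    cb();
  }));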

@kirbysayshi
Author

@stu-salsbury Thanks! I suppose it works this way so that a transform's creator can configure what type of data it transforms. Still a little weird, though, that encoding: 'utf8' doesn't force decodeStrings to false. Although perhaps encoding is passed to the readable side of the transform, and not the writable side.

@laurelnaiad

np -- I've been spending more time than I ever intended reading and rereading that page lately!

@alessioalex

Note to myself: should try to figure out how to fix this.

@alessioalex

OK, so I think I've got to the bottom of it:

https://github.com/isaacs/readable-stream/blob/master/lib/_stream_writable.js#L233-L240

The thing is that there's an option called decodeStrings which defaults to true, even when encoding is set to utf8. By passing decodeStrings: false as an option to through2, the encoding will be respected.
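
The logic at those lines is essentially this (a paraphrased sketch of readable-stream's decodeChunk, not the verbatim source):

function decodeChunk(state, chunk, encoding) {
  // decodeStrings defaults to true, so incoming strings are converted
  // back to Buffers here regardless of the stream's encoding option.
  if (!state.objectMode &&
      state.decodeStrings !== false &&
      typeof chunk === 'string') {
    chunk = new Buffer(chunk, encoding);
  }
  return chunk;
}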

@Raynos

Raynos commented May 22, 2014

Seems like a bug.

@Raynos

Raynos commented May 22, 2014

We shouldn't encode a buffer as utf8 just to decode it back to a buffer.

@laurelnaiad

Why should through2 not respect the decoding defaults of node? I think it's working as it should be designed...

That is to say, I don't think through2 should make assumptions about encoding or decoding; node's Transform stream defaults should be through2's defaults. Both decodeStrings and encoding are valid Transform stream options that the user of through2 can set according to their intent, since through2 rightly passes the options in its parameter through to the Transform stream constructor (sketched below).

If the thinking is that node's Transform stream defaults are the bug, then I don't think altering the defaults in through2 is an appropriate way of expressing that.
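
To be concrete about the pass-through point: through2 behaves roughly like this with its options (a simplified paraphrase, not its actual source):

var Transform = require('stream').Transform;

// Options are handed straight to the Transform constructor, so any
// valid Transform option (decodeStrings, encoding, objectMode, ...)
// applies; through2 itself adds no encoding policy of its own.
function through(options, transformFn) {
  var t = new Transform(options);
  t._transform = transformFn;
  return t;
}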

@Raynos

Raynos commented May 22, 2014

@stu-salsbury agreed. This sounds like a node core bug.

@laurelnaiad

Quoting from the docs: "encoding String If specified, then buffers will be decoded to strings using the specified encoding." So this says that if a buffer comes in, it will be treated as if it is encoded a certain way... if the incoming data is already a string, as in the example above, this option has no effect. The point here is that you aren't ever actually decoding a buffer in through2 if you get an incoming string, and decodeStrings = false is telling the transform stream not to decode the string back into a buffer. I agree that this is a strange choice, though perhaps it's meant to encourage buffer usage and hence drive performance? I dunno. Somebody went to the trouble of documenting that decodeStrings defaults to true, so it seems intentional.

EDIT: I wonder if this is a backward compatibility issue for node, more than a choice that anyone would support in a greenfield design.
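
To make the two directions concrete, here is a small sketch (assumed behavior, per the docs quoted above): encoding affects what comes out of the readable side, while decodeStrings affects what the transform function sees on the writable side.

var through2 = require('through2');

var t = through2({ encoding: 'utf8' }, function (chunk, enc, cb) {
  // decodeStrings still defaults to true, so the string written below
  // is re-encoded to a Buffer before it reaches this function.
  console.log('inside:', Buffer.isBuffer(chunk)); // true
  this.push(chunk);
  cb();
});

t.on('data', function (data) {
  // encoding: 'utf8' applies on the way out: buffers pushed by the
  // transform are decoded to strings for 'data' listeners.
  console.log('outside:', typeof data); // 'string'
});

t.write('hello');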

@TakenPilot
Contributor

If this is still an open issue, what would be needed to resolve it?

If the start of the stream is defined somewhere else (in another function, another class, etc.), it may not be clear what encoding the person using through2 should expect. Example:

//this is really annoying
someObject.getStream().pipe(through2(function (chunk, enc, cb) {
  switch (enc) {
    case 'utf8':
      // handle a utf8 string...
      break;
    case 'base64':
      // handle a base64 string...
      break;
    // etc...
  }
  cb(null, chunk); // pass the chunk through
}))

Therefore, there is a big advantage to being able to set the encoding as an option of through2, and also a big advantage in being able to leave it unset when you don't want to. Having the setting optional, and close to the function that uses the data, is very useful.

So is this really an issue?

@heroboy

heroboy commented Jan 18, 2016

Hi, my English is bad, so I will express myself in code.

var d = through(opts, transform);
// Then we have three questions:
1. d.write(buf);              // is buf a Buffer or a String?
2. function transform(chunk); // is chunk a Buffer or a String?
3. d.on("data", data => { }); // is data a Buffer or a String?
// And the answers:
if (buf is Buffer)
{
    chunk is Buffer;
}
else // buf is String
{
    if (opts.decodeStrings == false)
        chunk is String;
    else
        chunk is Buffer;
}
if (opts.encoding)
    data is String;
else
    data is Buffer;

So if you want your transform function to process strings but the incoming chunks are Buffers, you need to do something like this:

through({encoding: "utf8"})                // convert Buffer to String (buf => data)
    .pipe(through({decodeStrings: false},  // prevent converting the String back to a Buffer (buf => chunk)
        function(chunk, enc, cb){}         // chunk is a string
    ));

These relationships are so complex that I think through2 should make them easier.
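
A runnable version of that two-stage pipeline might look like this (a sketch, assuming a local ./file.json as in the original example):

var fs = require('fs');
var through2 = require('through2');

fs.createReadStream('./file.json')
  // Stage 1: encoding: 'utf8' makes this stream's readable side emit strings.
  .pipe(through2({ encoding: 'utf8' }, function (chunk, enc, cb) {
    this.push(chunk);
    cb();
  }))
  // Stage 2: decodeStrings: false stops this stream's writable side from
  // re-encoding those strings back into Buffers.
  .pipe(through2({ decodeStrings: false }, function (chunk, enc, cb) {
    console.log(typeof chunk); // 'string'
    this.push(chunk);
    cb();
  }));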

@evg-zhabotinsky

I have a habit of trying to get to the bottom of things too often, and I found that this behavior actually boils down to the following:

  • In Node, non-object streams do not have an encoding; semantically they are byte streams. I/O operations have an encoding, and streams only provide defaults that can be overridden per operation. If a read has no encoding, it produces a Buffer instead of a string, and a write ignores the encoding for Buffers.

  • .pipe() seems to treat all streams as object streams, i.e. it reads/writes whole strings with the stream's default encoding. For example, here is a utf8 -> utf16le converter:

    process.stdin.setEncoding('utf8');
    process.stdout.setDefaultEncoding('utf16le');
    process.stdin.pipe(process.stdout);
    
  • There are writableObjectMode=true and readableObjectMode=true options (and objectMode=true, which enables both). They enable object mode for the write and read sides of a transform stream, respectively.

  • decodeStrings=false is a performance hack that only disables automatic string decoding on the writable side, for when it is not needed. It looks an awful lot like object mode, but semantically it is not.

Considering all of the above, the observed behavior is not a problem (at least not with through2 or transform streams). What the OP tried to build (or at least ended up with) is an object-input stream (I don't know about the output side), so he should have used writableObjectMode=true, as sketched below. In his code, the previous stream in the pipeline is responsible for decoding and slicing up the data; the transform stream gets finished objects.

The real problem is that I got all of the above from experimenting and reading library code, not from the documentation.
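
For completeness, the writableObjectMode approach might look like this (a sketch, assuming a through2/readable-stream version that honors the option):

var fs = require('fs');
var through2 = require('through2');

fs.createReadStream('./file.json', { encoding: 'utf8' })
  .pipe(through2({ writableObjectMode: true }, function (chunk, enc, cb) {
    // In object mode the writable side takes values as-is, so the
    // strings produced by the read stream arrive untouched.
    console.log(typeof chunk); // 'string'
    this.push(chunk);
    cb();
  }));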

@hollowdoor

@evg-zhabotinsky Knowing that, maybe something like this should be done:

var t2 = new DestroyableTransform(options)
// Right after construction.
// Maybe a more precise condition would be better:
// if (!!options.decodeStrings && options.encoding !== 'buffer') {
if (options.encoding !== 'buffer') {

    t2.once('pipe', function (src) {
        // This is emitted synchronously in the src.pipe method.
        if (src._readableState.encoding !== 'buffer') {
            t2._writableState.objectMode = true;
        }
    });
}

But that might cause problems for people who expect buffers all the time because of how the current streams work.
