Commit
Do not mix the encoding option with piping.
Showing 1 changed file with 9 additions and 1 deletion.
6a98b9e
What is the reason for not mixing encoding with piping?
Most streams assume they are getting buffers; setting the encoding will make them receive encoded strings instead.
I understand; however, what if a stream is expecting encoded strings? For example, I've run into an issue where not being able to set the encoding on an http stream causes the stream to split chunks in between multi-byte UTF-8 characters at random points. Being able to set the encoding prevents this from happening. You can set the encoding in the core http and fs modules, which forces the underlying mechanism to emit only complete UTF-8 characters.
IMO, if you explicitly set the encoding on your source stream, the onus is on the developer to make sure the pipeline can handle that data format and is smart enough to distinguish between and handle strings and buffers.

I came across this issue with request while researching streaming JSON parsers. My initial assumption was that it was an error on their part in not handling UTF-8 correctly, but after thinking about it some more, I think both sides should give a best effort at passing complete UTF-8 characters around if that is specified with the encoding option. What are your thoughts?

If you'd like to see a demo of this bug, I already have a repo set up and recently modified it to reflect some of the changes I found today w/r/t request. You can check it out here: jlank/streaming-jsonparsers-utf8-bug-demo. Sorry for the large size; clarinet has a lot of test data in samples/.
Actually, not setting an encoding is the best way to, eventually, deal with multi-byte characters. What you want to do is call `setEncoding` on the final stream that is receiving the data. Since we only disable this when piping to another stream, we assume that the destination will want to do the encoding itself or pass the encoding off to another library.
There was a much bigger discussion about `data` event mutation, and I came down on the same side as @isaacs: streams should emit buffers and not other objects. A stream that is being piped to should, IMO, not call `setEncoding` on its input stream inside a `pipe` event; if it wants to handle multi-byte characters, it should either do the internal buffering correctly or pipe the input to another stream internally that handles the encoding for it. If someone writes a stream that doesn't handle buffers correctly, that is definitely their bug, since buffers are the default `data` payload from any stream, and not all streams offer a `setEncoding` method.
Ok cool. So if I understand you correctly, the node dogma on this issue is (and sorry I missed it; I tried to find a previous discussion on this via the Google group but came up empty) that it is the responsibility of the recipient/mutator of a stream to appropriately parse and encode the data coming into it. I can agree with that. Are you aware of any modules that do this for UTF-8? I was about to write one for fun and try to drop it into `jsonparse` and `clarinet` and see what those guys thought of it. Maybe it can be a drop-in `setEncoding` method for streaming modules that don't implement it.
Well, calling it node dogma is overstepping a little; `data` mutation in general is a fairly contentious issue, and many prominent stream writers like @dominictarr are in favor of mutating `data` into all kinds of things, including parsed JSON objects. One thing I think all stream writers would agree on, though, is that a writable stream must handle buffers and should not rely on `setEncoding`, because there are plenty of readable streams that don't implement it.

Streams in 0.10 have far more functionality for stuff like this; you should be able to get the StringDecoder from isaacs/readable-stream and use it.
I'll check out StringDecoder. I appreciate the feedback! Very helpful and informative.
The StringDecoder is in core: `require('string_decoder').StringDecoder`.
@isaacs, and how do you get at it from the readable-stream module?