
Commit

Do not mix encoding option with piping.
mikeal committed Feb 11, 2011
1 parent d67a041 commit 6a98b9e
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion main.js
@@ -198,7 +198,14 @@ Request.prototype.request = function () {
       return; // Ignore the rest of the response
     } else {
       options._redirectsFollowed = 0;
-      if (options.encoding) response.setEncoding(options.encoding);
+
+      if (options.encoding) {
+        if (options.dest) {
+          console.error("Ignoring encoding parameter as this stream is being piped to another stream which makes the encoding option invalid.");
+        } else {
+          response.setEncoding(options.encoding);
+        }
+      }
       if (options.dest) {
         response.pipe(options.dest);
         if (options.onResponse) options.onResponse(null, response);
@@ -258,6 +265,7 @@ Request.prototype.resume = function () {
 
 
 
+
 function request (options, callback) {
   if (callback) options.callback = callback;
   var r = new Request(options);

9 comments on commit 6a98b9e

@jlank commented on 6a98b9e Mar 12, 2013

What is the reason for not mixing encoding with piping?

@mikeal (Member, Author) commented on 6a98b9e Mar 12, 2013

Most streams assume they are getting Buffers; setting an encoding makes the 'data' chunks encoded strings instead.

@jlank commented on 6a98b9e Mar 12, 2013

I understand; however, what if a stream is expecting encoded strings? For example, I've run into an issue where not being able to set the encoding on an http stream causes the stream to chunk in between multi-byte utf8 characters at random points. Being able to set the encoding prevents this from happening. You can set the encoding in the core http and fs modules, which forces the underlying mechanism to emit only complete utf8 characters.

IMO, if you explicitly set the encoding on your source stream, the onus is on the developer to make sure the pipeline can handle that data format and is smart enough to distinguish between strings and buffers. I came across this issue with request while researching streaming JSON parsers. My initial assumption was that it was an error on their part in not handling utf8 correctly, but after thinking about it some more, I think both sides should make a best effort at passing complete utf8 characters around IF that is specified with the encoding option. What are your thoughts?

If you'd like to see a demo of this bug, I already have a repo set up and recently modified it to reflect some of the changes I found today w/r/t request. You can check it out here: jlank / streaming-jsonparsers-utf8-bug-demo. Sorry for the large size; clarinet has a lot of test data in samples/.
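The failure mode described above can be reproduced without HTTP at all. This is an illustrative sketch, not request code:

```javascript
// Emulate a chunk boundary landing inside a multi-byte character.
// 'é' is two bytes in UTF-8: 0xC3 0xA9.
const bytes = Buffer.from('é', 'utf8');
const chunkA = bytes.slice(0, 1); // first byte of the character
const chunkB = bytes.slice(1);    // second byte

// Decoding each chunk independently mangles the split character
// into U+FFFD replacement characters:
const naive = chunkA.toString('utf8') + chunkB.toString('utf8');
console.log(naive === 'é'); // false

// Decoding the reassembled bytes works fine:
console.log(Buffer.concat([chunkA, chunkB]).toString('utf8') === 'é'); // true
```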

@mikeal (Member, Author) commented on 6a98b9e Mar 12, 2013

Actually, not setting an encoding is the best way to deal with multi-byte characters. What you want to do is setEncoding on the final stream that is receiving the data. Since we only disable this when piping to another stream, we assume that stream will either want to do the encoding itself or pass it off to another library.

There was a much bigger discussion about data event mutation, and I came down on the same side as @isaacs: streams should emit buffers and not other objects. A stream that is being piped to should, IMO, not setEncoding on its input stream inside a pipe event; if it wants to handle multi-byte data it should either do the internal buffering correctly or pipe to another stream internally that handles the encoding for it.

If someone writes a stream that doesn't handle buffers correctly, that is definitely their bug, since buffers are the default data from any stream and not all streams offer a setEncoding method.

@jlank commented on 6a98b9e Mar 12, 2013

Ok, cool. So if I understand you correctly, the node dogma on this issue (and sorry I missed it; I tried to find a previous discussion via the google group but came up empty) is that it is the responsibility of the recipient/mutator of a stream to appropriately parse and encode the data coming into it. I can agree with that. Are you aware of any existing modules that do this for utf8? I was about to write one for fun and try to drop it into jsonparse and clarinet to see what those guys think of it. Maybe it could be a drop-in setEncoding method for streaming modules that don't implement it.

@mikeal (Member, Author) commented on 6a98b9e Mar 12, 2013

Well, calling it node dogma is overstepping a little; data mutation in general is a fairly contentious issue, and many prominent stream writers like @dominictarr are in favor of mutating data into all kinds of things, including parsed JSON objects.

One thing I think all stream writers would agree on, though, is that a writable stream must handle buffers and should not rely on setEncoding, because there are plenty of readable streams that don't implement it.

Streams in 0.10 have far more functionality for stuff like this; you should be able to get the StringDecoder from isaacs/readable-stream and use it.

@jlank commented on 6a98b9e Mar 12, 2013

I'll check out StringDecoder, I appreciate the feedback! Very helpful and informative.

@isaacs (Contributor) commented on 6a98b9e Mar 12, 2013

The StringDecoder is in core. require('string_decoder').StringDecoder

@mikeal (Member, Author) commented on 6a98b9e Mar 12, 2013

@isaacs and how do you get at it from the readable-stream module?
