child_process: fix sending utf-8 to child process #5016
bnoordhuis wants to merge 3 commits into nodejs:v0.10
Would there be a good reason to use a StringDecoder? Otherwise LGTM.
I'm not sure. The reason it doesn't use one now is simplicity and performance. Using a StringDecoder sounds reasonable because you'd expect partial character sequences to happen sooner or later, when the pipe fills up. But that must have been possible before 96a314b too, when ASCII was passed through as is, and I don't remember any bug reports about that. I'll see if I can create a test case where that's an issue before I land this PR.
Yes, it needs to use a StringDecoder. In theory, it would not be an issue with ascii, because they're always a single byte. It's very unlikely that a pipe to a separate process on the same machine would ever get its chunks split, but it can conceivably happen.
Well, what I mean is that if the issue exists, v0.8 is affected too because the 'ascii' encoding isn't actually ASCII, it's 8 bits. It lets UTF-8 through unmolested so, in theory, a character sequence could get split over read() / write() syscalls. IOW, not using a StringDecoder is arguably wrong but it's not a regression. I guess the thing to do here is to benchmark the impact of a StringDecoder and decide if that's something we can live with.
No, it's not a regression, but it is certainly a bug.
Ben is right, this bug is also in v0.8 (gist).
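To make the split-sequence concern in the comments above concrete, here is a minimal sketch. It is not the code from this PR: it cuts a three-byte UTF-8 character in two by hand, standing in for a read() that returns a partial chunk, and the variable names are illustrative assumptions.

```javascript
const { StringDecoder } = require('string_decoder');

// The euro sign encodes to three bytes in UTF-8 (e2 82 ac).
const euro = Buffer.from('€', 'utf8');

// Hypothetical chunk boundary: pretend the pipe delivered the bytes in two reads.
const firstChunk = euro.subarray(0, 1);
const secondChunk = euro.subarray(1);

// Decoding each chunk on its own yields replacement characters, not the euro sign.
const naive = firstChunk.toString('utf8') + secondChunk.toString('utf8');

// A StringDecoder holds back the incomplete sequence until the rest arrives.
const decoder = new StringDecoder('utf8');
const decoded =
  decoder.write(firstChunk) + decoder.write(secondChunk) + decoder.end();

console.log(naive === '€');   // false
console.log(decoded === '€'); // true
```

Whether a real pipe between two local processes ever splits a chunk at such a boundary is a separate question; the sketch only shows what happens to the decoded string if it does.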
In process#send() and child_process.ChildProcess#send(), use 'utf8' as the encoding instead of 'ascii' because 'ascii' mutilates non-ASCII input. It worked by accident in v0.8 but not in v0.10 because the high bits are now stripped when converting Buffers to ASCII strings. See commit 96a314b for details. Fixes nodejs#4999 and nodejs#5011.
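The encoding difference described in this commit message can be seen with a plain Buffer round trip. The snippet below is a hedged illustration on a current Node.js build, not the patched child_process code, and its names are assumptions made for the example.

```javascript
// U+00E9 takes two bytes in UTF-8 (c3 a9), so it is non-ASCII input.
const original = 'é';
const wire = Buffer.from(original, 'utf8');

// Decoding the same bytes with each encoding.
const asAscii = wire.toString('ascii');
const asUtf8 = wire.toString('utf8');

console.log(asUtf8 === original);  // true
console.log(asAscii === original); // false
```

Only the 'utf8' decode returns the original string, which is why the send() paths need 'utf8' for messages that may contain non-ASCII characters.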
Handle partial character sequences correctly by using a StringDecoder.
@isaacs @TooTallNate Re-review please. I've added a string decoder and a benchmark and - much to my surprise - there seems to be no appreciable performance impact. That's nice for a change unless it means child process I/O was dog slow to start with. :-/
Yeah, StringDecoder is pretty efficient, and no new test failures as a result of this. LGTM.
Reviewer: @isaacs or @TooTallNate