unicode (multibyte) characters are corrupted #21

Closed
lfdoherty opened this issue Jan 9, 2015 · 4 comments

@lfdoherty

Looking at the code, it's not safe to decode buffers to strings except at separator boundaries: a buffer may end part-way through a multi-byte character, hence the issue.
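To illustrate the concern, here is a minimal sketch that does not use line-reader's actual code. The chunk boundary and the names `chunkA`, `chunkB`, `perChunk` and `joined` are made up for the example. It compares decoding each chunk on its own with decoding the joined bytes once.

```javascript
// Sketch only: a hypothetical chunk boundary that falls inside one character.
const euro = Buffer.from('€', 'utf8'); // three bytes: E2 82 AC

const chunkA = euro.subarray(0, 2); // E2 82 (incomplete sequence)
const chunkB = euro.subarray(2);    // AC    (stray continuation byte)

// Unsafe: each chunk is decoded in isolation, so the partial
// sequences come out as U+FFFD replacement characters.
const perChunk = chunkA.toString('utf8') + chunkB.toString('utf8');

// Safe: join the raw bytes first, then decode once.
const joined = Buffer.concat([chunkA, chunkB]).toString('utf8');

console.log(perChunk === '€'); // false
console.log(joined === '€');   // true
```

The same reasoning is why decoding only at separator boundaries is safe for UTF-8: a newline byte can never appear inside a multi-byte sequence.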

@scharf

scharf commented Jan 28, 2015

That's what I thought too. I wanted to create a failing test and tried this (based on the StringDecoder API example):

var StringDecoder = require('string_decoder').StringDecoder;
var decoder = new StringDecoder('utf8');

var cent = new Buffer([0xC2, 0xA2]);
console.log(decoder.write(cent));

var euro = new Buffer([0xE2, 0x82, 0xAC]);
console.log(decoder.write(euro));

console.log('v1='+decoder.write(new Buffer([0xE2, 0x82])) + decoder.write(new Buffer([0xAC])));
console.log('v2='+decoder.write(new Buffer([0xE2])) + decoder.write(new Buffer([0x82, 0xAC])));

If you look at the output, there seems to be no problem.

@lfdoherty
Author

I think you're right. I reran this against the library I used to replace it (line-by-line), and eventually determined that the corrupt characters appear when reader.nextLine is called more than once before it returns.

@jedwards1211
Contributor

@scharf Huh. I take this to mean that JS strings can store part of a multi-byte character?

@jedwards1211
Contributor

So the following couldn't occur with line-reader's buffering, but this doesn't work:

var v2 = decoder.write(new Buffer([0xE2])) + decoder.write(new Buffer([0x82])) + decoder.write(new Buffer([0xAC]));
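For anyone re-checking this, here is a small sketch of the same three single-byte writes with each intermediate return value kept, so you can see what the decoder hands back at every step. The names `parts` and `v3` are just for this example. The Node.js StringDecoder docs describe `write()` as holding incomplete bytes in an internal buffer, so on a current Node.js release I'd expect the first two writes to return empty strings and the third to return the whole character. The Node version in use when this was reported may have behaved differently.

```javascript
const { StringDecoder } = require('string_decoder');
const decoder = new StringDecoder('utf8');

// Feed the euro sign (E2 82 AC) one byte at a time and keep each result.
const parts = [
  decoder.write(Buffer.from([0xE2])),
  decoder.write(Buffer.from([0x82])),
  decoder.write(Buffer.from([0xAC])),
];

// end() flushes anything still buffered (nothing, if the character completed).
const v3 = parts.join('') + decoder.end();

console.log(JSON.stringify(parts));
console.log('v3=' + v3);
```

If that holds, the returned strings never contain part of a character: the incomplete bytes stay inside the decoder until the sequence is complete.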
