Can't convert Buffer into String correctly with binary encoding #12908

Closed
bennesp opened this issue May 8, 2017 · 6 comments
Labels: buffer (Issues and PRs related to the buffer subsystem), question (Issues that look for answers)


@bennesp

bennesp commented May 8, 2017

Hello, I was working with binary files and Buffer, and encountered this problem.

Code:

const b = Buffer.from([0xca, 0xc5, 0x0e]);
console.log(b.toString('binary'));
console.log(b);

I would expect as output something like:

ÊÅ^N

which is exactly the 3 bytes 0xca, 0xc5, 0x0e
Instead of this I got the equivalent of:

\xc3\x8a\xc3\x85\x0e

Is it the expected behavior? If so, how can I obtain the string "\xca\xc5\x0e" from the Buffer?

  • Version: 7.9.0
  • Platform: Mac OS X Sierra
@addaleax added the buffer and question labels on May 8, 2017
@addaleax
Member

addaleax commented May 8, 2017

> I would expect as output something like: ÊÅ^N

That is the output that you get, though: The string ÊÅ^N – encoded as UTF-8.

> which is exactly the 3 bytes 0xca, 0xc5, 0x0e

Yes, that’s ÊÅ^N encoded as ISO-8859-1 (which is known as latin1 in Node v6+, and binary before).

> Is it the expected behavior?

Yes. You’re using console.log() to print the string to the terminal, which uses process.stdout.write() in its implementation; and unless you call process.stdout.setDefaultEncoding(…), the default encoding for stdout will be UTF-8 (because, practically speaking, that’s what terminal emulators support).

So, what’s going on is that Node takes your Buffer, decodes it to a string as if it were ISO-8859-1-encoded text (because you said so), then re-encodes that string into a Buffer as UTF-8 (because it has to encode it somehow to write it to stdout, and UTF-8 is the most sensible choice), and then sends that Buffer to the terminal (where, in all likelihood, it gets decoded as UTF-8 again).
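
A minimal sketch of that round trip, assuming the default UTF-8 encoding on stdout. The variable names `decoded` and `reencoded` are illustrative only; the second step is done by hand here to mimic what writing the string to stdout would do:

```javascript
// The three bytes from the original example.
const b = Buffer.from([0xca, 0xc5, 0x0e]);

// Step 1: decode the bytes as ISO-8859-1 (latin1). Each byte maps to
// exactly one character, so the string has three code units.
const decoded = b.toString('latin1');
console.log(decoded.length); // 3

// Step 2: re-encode the string as UTF-8, as happens when it is written
// to stdout. Characters above 0x7f take two bytes each in UTF-8.
const reencoded = Buffer.from(decoded, 'utf8');
console.log(reencoded); // <Buffer c3 8a c3 85 0e>
```

The five bytes printed at the end are the `\xc3\x8a\xc3\x85\x0e` sequence reported in the question.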

> If so, how can I obtain the string "\xca\xc5\x0e" from the Buffer?

b.toString('binary') (or its equivalent, b.toString('latin1')) is actually the right answer – you can verify that by running b.toString('binary') === "\xca\xc5\x0e" on your example, which returns true as expected.
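
As a quick check, the comparison can be run without involving the terminal at all. This is only a sketch; the names `s`, `matches`, and `roundTrip` are illustrative:

```javascript
const b = Buffer.from([0xca, 0xc5, 0x0e]);

// Decode with latin1 ('binary' is an alias for the same encoding).
const s = b.toString('latin1');

// The in-memory string is exactly the three expected characters.
const matches = s === '\xca\xc5\x0e';
console.log(matches); // true

// Encoding the string back with latin1 recovers the original bytes,
// so nothing was lost; only the UTF-8 write to stdout adds bytes.
const roundTrip = Buffer.from(s, 'latin1');
console.log(roundTrip.equals(b)); // true
```

If the goal is to emit the raw bytes themselves, passing the Buffer directly to process.stdout.write(b) should avoid the string re-encoding step, though how the terminal renders those bytes depends on its own settings.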

Let me know if this doesn’t help!

@bennesp
Author

bennesp commented May 8, 2017

Thank you so much for the useful and complete explanation, you saved my day!

@addaleax
Member

addaleax commented May 8, 2017

Okay, I’m closing this as an answered question then – feel free to ask any follow-up questions, here or on https://github.com/nodejs/help. :)

@addaleax closed this as completed on May 8, 2017
@dustinmichels

dustinmichels commented Jul 5, 2018

Hi there! Sorry but I'm still a bit confused.

In the nodejs docs I see this example:

const buf1 = Buffer.from('this is a tést');
const buf2 = Buffer.from('7468697320697320612074c3a97374', 'hex');

console.log(buf1.toString());
// Prints: this is a tést
console.log(buf2.toString());
// Prints: this is a tést

Why can't I interact with a binary representation the same way I can with hex?

For instance:

const buf3 = Buffer.from('11101...0100', 'binary') // '...' just used to truncate
buf3.toString()
// Prints '11101...0100', not 'this is a tést' (which I want!)

Furthermore:

buf1.toString('hex')
// Prints: 7468697320697320612074c3a97374

buf1.toString('binary')
// Prints: 'this is a tést', not '11101...0100' (which I want!)

Thank you very much for your help.

@richardlau
Member

@dustinmichels binary is an alias for latin1: https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings
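
Since 'binary' here means latin1 rather than base 2, a string of 0s and 1s has to be converted by hand. A minimal sketch follows, assuming the bit string is a whole number of 8-bit groups; `bitsToBuffer` and `bufferToBits` are hypothetical helpers, not part of the Buffer API:

```javascript
// Hypothetical helper: parse a string of '0'/'1' characters into a Buffer.
function bitsToBuffer(bits) {
  if (bits.length % 8 !== 0 || /[^01]/.test(bits)) {
    throw new TypeError('expected a string of 0s and 1s whose length is a multiple of 8');
  }
  const bytes = [];
  for (let i = 0; i < bits.length; i += 8) {
    // Each group of 8 characters is one byte written in base 2.
    bytes.push(parseInt(bits.slice(i, i + 8), 2));
  }
  return Buffer.from(bytes);
}

// Hypothetical helper: render every byte as 8 zero-padded base-2 digits.
function bufferToBits(buf) {
  return Array.from(buf, (byte) => byte.toString(2).padStart(8, '0')).join('');
}

const buf1 = Buffer.from('this is a tést');
const bits = bufferToBits(buf1);
console.log(bits.slice(0, 8)); // 01110100 (0x74, the letter "t")

const back = bitsToBuffer(bits).toString();
console.log(back); // this is a tést
```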

@dustinmichels

I see. Thank you.
