Converting binary data to UTF-8 changes the data #3674

@MatteoT9890

Description

Version

14.14.0

Platform

Linux matt 4.15.0-163-generic #171-Ubuntu SMP Fri Nov 5 11:55:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

const fs = require("fs")

let buffer = fs.readFileSync("file.pdf") // Raw bytes of the PDF
let byteLength = Buffer.byteLength(buffer) // Valid PDF, length: 1440177
let utf8String = buffer.toString("utf8") // Decode the bytes as UTF-8
let bufferAgain = Buffer.from(utf8String, "utf8") // Bad PDF, length: 2551916
buffer.equals(bufferAgain) // Gives false
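
The same effect can probably be shown without the PDF. The sketch below is a hypothetical, file-free variant of the snippet above: the six byte values are made up for illustration (the ASCII text "%PDF" followed by 0xff and 0xfe, two bytes that can never appear in valid UTF-8). It assumes Node's documented behavior of replacing undecodable bytes with U+FFFD when decoding as UTF-8.

```javascript
// Illustrative input, not taken from the real file:
// "%PDF" in ASCII, then two bytes that are never valid in UTF-8.
const original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0xff, 0xfe]);

// Decoding replaces each undecodable byte with U+FFFD (the replacement character).
const decoded = original.toString("utf8");

// Encoding U+FFFD back to UTF-8 takes 3 bytes per character, so the buffer grows.
const reencoded = Buffer.from(decoded, "utf8");

console.log(original.length);            // 6
console.log(reencoded.length);           // 10
console.log(original.equals(reencoded)); // false
```

Each one-byte invalid sequence comes back as three bytes, which would be consistent with the larger length seen for the re-encoded PDF.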

How often does it reproduce? Is there a required condition?

Always with PDF files

What is the expected behavior?

The buffer of the PDF file should convert to a UTF-8 string and then convert back into the same initial buffer.

What do you see instead?

The two buffers, one before the conversion and one after it, are different.
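
For comparison, here is a minimal sketch of which string encodings round-trip arbitrary bytes. The `roundTrip` helper and the 256-byte test input are assumptions made for illustration, not part of the original report. It uses only `Buffer.from` and `buf.toString` with the built-in `"utf8"`, `"latin1"`, `"base64"`, and `"hex"` encodings.

```javascript
// Hypothetical helper: decode bytes to a string, then encode the string back.
function roundTrip(bytes, encoding) {
  return Buffer.from(bytes.toString(encoding), encoding);
}

// Illustrative input covering every possible byte value (0x00 to 0xff).
const allBytes = Buffer.from(Array.from({ length: 256 }, (_, i) => i));

// true means the bytes survived the string round trip unchanged.
const survives = {
  utf8: allBytes.equals(roundTrip(allBytes, "utf8")),
  latin1: allBytes.equals(roundTrip(allBytes, "latin1")),
  base64: allBytes.equals(roundTrip(allBytes, "base64")),
  hex: allBytes.equals(roundTrip(allBytes, "hex")),
};

console.log(survives);
// { utf8: false, latin1: true, base64: true, hex: true }
```

If the PDF has to travel as a string, `"base64"` or `"hex"` may be the safer choice; `"utf8"` only round-trips input that is already valid UTF-8 text.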

Additional information

The PDF file can be downloaded from this url.

After downloading, rename it to "file.pdf" in order to run the provided snippet.
