String encoding/decoding round trip is wrong for multibyte UTF-8 codepoints #1
tkaitchuck added a commit to tkaitchuck/bincode that referenced this issue on Jan 14, 2020: Fork repo and make basic changes.
Strings are encoded with a prefix of their byte length, but the decoder takes that length and reads that many chars (aka Unicode codepoints/Unicode scalar values). E.g. `å` is 2 bytes in UTF-8, so encoding `"å"` will write `[2, 0xXX, 0xYY]` (I don't know the exact encoding of å), and then decoding will first read the 2 bytes of the `å` and then try to read another codepoint from the stream, even though there's no string data left.
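
To make the mismatch concrete, here is a minimal Rust sketch of the behaviour described above. It is not this repository's code: `encode_str`, `decode_by_chars`, `decode_by_bytes`, and `read_char` are hypothetical names, and the single-byte length prefix is a simplification chosen for illustration. For reference, `å` is U+00E5, which UTF-8 encodes as the two bytes `0xC3 0xA5`.

```rust
use std::io::{self, Read};

/// Hypothetical encoder: one length byte (the UTF-8 *byte* length), then the bytes.
fn encode_str(s: &str) -> Vec<u8> {
    assert!(s.len() < 256, "sketch only handles short strings");
    let mut out = vec![s.len() as u8];
    out.extend_from_slice(s.as_bytes());
    out
}

/// Reads exactly one UTF-8 encoded scalar value from the input.
fn read_char(input: &mut &[u8]) -> io::Result<char> {
    let mut buf = [0u8; 4];
    input.read_exact(&mut buf[..1])?;
    // The lead byte determines how many bytes the scalar value occupies.
    let width = match buf[0] {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF7 => 4,
        _ => return Err(io::Error::new(io::ErrorKind::InvalidData, "invalid UTF-8 lead byte")),
    };
    input.read_exact(&mut buf[1..width])?;
    let s = std::str::from_utf8(&buf[..width])
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok(s.chars().next().unwrap())
}

/// Buggy decoder: treats the byte-length prefix as a count of chars.
fn decode_by_chars(mut input: &[u8]) -> io::Result<String> {
    let mut len = [0u8; 1];
    input.read_exact(&mut len)?;
    let mut s = String::new();
    for _ in 0..len[0] {
        s.push(read_char(&mut input)?);
    }
    Ok(s)
}

/// Consistent decoder: reads exactly `len` bytes, then validates them as UTF-8.
fn decode_by_bytes(mut input: &[u8]) -> io::Result<String> {
    let mut len = [0u8; 1];
    input.read_exact(&mut len)?;
    let mut bytes = vec![0u8; len[0] as usize];
    input.read_exact(&mut bytes)?;
    String::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() {
    let encoded = encode_str("å");
    assert_eq!(encoded, vec![2u8, 0xC3, 0xA5]);
    // The char-counting decoder consumes both bytes for the first char,
    // then fails trying to read a second char from an empty stream.
    assert!(decode_by_chars(&encoded).is_err());
    // Reading by byte length round-trips correctly.
    assert_eq!(decode_by_bytes(&encoded).unwrap(), "å");
}
```

ASCII-only strings round-trip through both decoders, because byte length and char count coincide for them. That is probably why the problem only shows up with multibyte codepoints. Either side could be changed to agree with the other; the sketch makes the decoder read bytes, since that matches what the encoder already writes.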