String encoding/decoding round trip is wrong for multibyte UTF-8 codepoints #1

Closed
huonw opened this issue Sep 30, 2014 · 1 comment

@huonw huonw commented Sep 30, 2014

Strings are encoded with a prefix of their byte length, but the decoder takes that length and reads that many chars (i.e. Unicode codepoints/Unicode scalar values) instead of that many bytes. For example, å is 2 bytes in UTF-8 (0xC3 0xA5), so encoding "å" will write [2, 0xC3, 0xA5]. Decoding will then read the 2 bytes of the å as the first codepoint and try to read a second codepoint from the stream, even though there's no string data left.
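
The mismatch described above can be reproduced in a few lines. This is only a minimal sketch, not the library's actual code: the one-byte length prefix and the function names `encode`, `decode_as_chars`, and `decode_as_bytes` are assumptions made up for illustration, and the real wire format may differ.

```rust
/// Hypothetical simplified format: a one-byte length prefix holding the
/// string's BYTE length, followed by its UTF-8 bytes.
fn encode(s: &str) -> Vec<u8> {
    assert!(s.len() < 256, "sketch only handles short strings");
    let mut out = vec![s.len() as u8];
    out.extend_from_slice(s.as_bytes());
    out
}

/// Decoder with the reported bug: it interprets the prefix as a count of
/// chars and tries to read that many codepoints.
fn decode_as_chars(buf: &[u8]) -> Option<String> {
    let (&len, rest) = buf.split_first()?;
    let text = std::str::from_utf8(rest).ok()?;
    let mut chars = text.chars();
    let mut out = String::new();
    for _ in 0..len {
        // For multibyte input there are fewer chars than bytes, so this
        // runs out of data and returns None.
        out.push(chars.next()?);
    }
    Some(out)
}

/// Decoder that matches the encoder: it interprets the prefix as a count
/// of bytes and validates exactly that slice as UTF-8.
fn decode_as_bytes(buf: &[u8]) -> Option<String> {
    let (&len, rest) = buf.split_first()?;
    let bytes = rest.get(..len as usize)?;
    String::from_utf8(bytes.to_vec()).ok()
}
```

With these definitions, an ASCII string such as "abc" round-trips through either decoder because each char is one byte, which is likely why the problem only shows up with multibyte codepoints. For "å", `decode_as_chars` fails after the first codepoint while `decode_as_bytes` returns the original string.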

@TyOverby TyOverby added the bug label Oct 2, 2014
@TyOverby TyOverby self-assigned this Oct 2, 2014

@TyOverby TyOverby commented Oct 2, 2014

Fixed in 79bb788

@huonw: thanks for the bug report!

@TyOverby TyOverby closed this Oct 2, 2014
tkaitchuck added a commit to tkaitchuck/bincode that referenced this issue Jan 14, 2020
Fork repo and make basic changes.