Question about char encoding #63

Closed
sffc opened this issue Jun 21, 2022 · 3 comments

@sffc

sffc commented Jun 21, 2022

According to the new wire format spec:

A char will be encoded in UTF-8 form, and encoded as a string.

I was curious about the choice of this encoding versus a varint(u32) encoding, which seems like it would be more compact on average.
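
To make the size comparison concrete, here is a rough sketch that computes how many bytes a `char` would take under each scheme. It assumes the varint is a little-endian base-128 (LEB128-style) integer and that the string form is a varint length prefix followed by the UTF-8 bytes. The function names are hypothetical and not part of postcard.

```rust
/// Bytes needed for a base-128 varint holding `value`
/// (7 payload bits per byte; assumed LEB128-style layout).
fn varint_len(value: u32) -> usize {
    let mut len = 1;
    let mut rest = value >> 7;
    while rest != 0 {
        len += 1;
        rest >>= 7;
    }
    len
}

/// Size of a char under the "UTF-8, encoded as a string" scheme:
/// a varint length prefix plus the UTF-8 bytes themselves.
fn string_encoded_len(c: char) -> usize {
    let utf8 = c.len_utf8();
    varint_len(utf8 as u32) + utf8
}

/// Size of a char if its scalar value were written as varint(u32).
fn varint_encoded_len(c: char) -> usize {
    varint_len(c as u32)
}
```

Under these assumptions an ASCII char takes 2 bytes as a string and 1 byte as a varint, and a char outside the Basic Multilingual Plane takes 5 bytes as a string and 3 as a varint.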

@jamesmunns
Owner

That's a good question!

The answer is probably "I didn't think of that"! Originally, I think I might have sent the whole [u8; 4], and at the time when I switched to "as a string" encoding, varints were limited to enum discriminants and slice lengths. It didn't occur to me to switch it to a varint(u32) when I was doing the "varint everything" rework.

It certainly would be possible to make a VarintChar wrapper type that does this behavior. That being said, I don't find many people using char in practice. Most people are already using String/&str types, even for single characters.
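
As an illustration of what such a wrapper might look like, here is a minimal sketch that leaves out the serde integration a real version would need. `VarintChar`, `encode`, and `decode` are hypothetical names, and the varint layout is assumed to be little-endian base-128. None of this is existing postcard API.

```rust
/// Hypothetical wrapper that writes a char's scalar value as a varint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct VarintChar(char);

impl VarintChar {
    /// Append the scalar value to `out`, 7 bits per byte, low bits first.
    fn encode(self, out: &mut Vec<u8>) {
        let mut value = self.0 as u32;
        loop {
            let byte = (value & 0x7F) as u8;
            value >>= 7;
            if value == 0 {
                // Last byte: continuation bit stays clear.
                out.push(byte);
                return;
            }
            out.push(byte | 0x80);
        }
    }

    /// Decode from the front of `input`, returning the char and the
    /// unread remainder. Fails on truncated input, on values that do
    /// not fit a u32, and on values that are not Unicode scalar values.
    fn decode(input: &[u8]) -> Option<(Self, &[u8])> {
        let mut value: u32 = 0;
        for (i, &byte) in input.iter().enumerate().take(5) {
            let bits = (byte & 0x7F) as u32;
            // A fifth byte may only carry the top 4 bits of a u32.
            if i == 4 && bits > 0x0F {
                return None;
            }
            value |= bits << (7 * i);
            if byte & 0x80 == 0 {
                let c = char::from_u32(value)?;
                return Some((VarintChar(c), &input[i + 1..]));
            }
        }
        None
    }
}
```

The decoder goes through `char::from_u32`, so surrogates and values above U+10FFFF are rejected instead of producing an invalid `char`.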

Closing this as resolved, but feel free to re-open if there are any follow-ups!

@finnbear

I was just running some benchmarks and noticed that this crate not only uses UTF-8 char encoding but also stores an additional "length" field that counts the bytes in the char. The length is implicit in the first UTF-8 byte, so it could be omitted. Chars are probably rare, but this could halve the encoded size of common chars.
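
A small sketch of why the prefix is redundant: the first byte of a well-formed UTF-8 sequence already determines how many bytes follow, so a decoder can read a char with no length field. The function names below are hypothetical; this illustrates the idea and is not how postcard decodes today.

```rust
/// Length in bytes of a UTF-8 sequence, derived from its first byte.
/// Returns None for continuation bytes and for bytes that can never
/// start a well-formed sequence (0x80..=0xC1 and 0xF5..=0xFF).
fn utf8_len_from_first_byte(first: u8) -> Option<usize> {
    match first {
        0x00..=0x7F => Some(1),
        0xC2..=0xDF => Some(2),
        0xE0..=0xEF => Some(3),
        0xF0..=0xF4 => Some(4),
        _ => None,
    }
}

/// Read one char from the front of `input` without a length prefix,
/// returning the char and the unread remainder.
fn read_raw_utf8_char(input: &[u8]) -> Option<(char, &[u8])> {
    let len = utf8_len_from_first_byte(*input.first()?)?;
    let bytes = input.get(..len)?;
    // from_utf8 validates the continuation bytes as well.
    let c = std::str::from_utf8(bytes).ok()?.chars().next()?;
    Some((c, &input[len..]))
}
```

For an ASCII char this reads exactly one byte, compared with two bytes when a one-byte length prefix precedes it.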

@jamesmunns
Owner

jamesmunns commented Apr 17, 2023

Hey @finnbear, congrats on releasing bitcode :)

There are definitely better ways to encode char - but I've stabilized the wire format for postcard 1.0. I'm going to open a new issue to track this for a (someday) postcard 2.0 wire format, so it doesn't get dropped there.

I think either varint(u32) or "just put the raw utf8 bytes which are already basically a compatible varint" are both viable options.

edit: opened as #101.
