First-class string type in serialization specification #13

Closed
pgriess opened this Issue · 7 comments

6 participants

@pgriess

Packing all strings as raw byte arrays makes it very difficult to figure out how to unpack them correctly. In particular, it is impossible to know what encoding was used when encoding the string as a sequence of bytes. To address this, it would be nice to have a first-class MSGPACK_OBJECT_STRING type with a mandatory encoding (say, UTF-8).
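A minimal Python sketch of the ambiguity described above, not tied to any MessagePack library: the same raw bytes decode without error under two different encodings, but yield different text. The sample string is an arbitrary choice for illustration.

```python
# The sender encodes a string to bytes; a raw payload carries no encoding tag.
payload = "café".encode("utf-8")  # b'caf\xc3\xa9'

# A receiver that assumes UTF-8 recovers the original text.
as_utf8 = payload.decode("utf-8")

# A receiver that assumes Latin-1 also decodes without error,
# but ends up with different text.
as_latin1 = payload.decode("latin-1")

print(as_utf8)    # café
print(as_latin1)  # cafÃ©
```

Neither decode call fails, so the receiver has no signal that it guessed wrong.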

@aaronblohowiak

any news on this?

@andrewschaaf

Packing all strings as raw byte arrays makes it very easy to figure out how to unpack them correctly: as Buffers/byte[]s/....

Libraries could have

  • unpackRaw (doin' it rite)
  • unpackUnicode (attempting to autodetect encodings, starting with UTF-8)

instead of unpack
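A rough sketch of what that split could look like in Python. The names `unpack_raw` and `unpack_unicode` and the candidate list are hypothetical, chosen only to mirror the suggestion above; they are not the API of any existing MessagePack library, and both functions operate on a payload that has already been extracted from a message.

```python
def unpack_raw(data: bytes) -> bytes:
    """Hypothetical helper: hand back the raw payload untouched."""
    return bytes(data)


def unpack_unicode(data: bytes, candidates=("utf-8", "utf-16")) -> str:
    """Hypothetical helper: try each candidate encoding in order, UTF-8 first.

    The candidate list is an assumption for illustration. This is a guess,
    not a guarantee: bytes can be valid in an encoding the sender did not use.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("payload is not valid text in any candidate encoding")


print(unpack_raw(b"\xff\x00"))
print(unpack_unicode("héllo".encode("utf-8")))
print(unpack_unicode("héllo".encode("utf-16")))
```

Here the caller chooses between bytes and text explicitly, instead of a single `unpack` deciding on its behalf.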

@tracker1

@andrewschaaf The issue is more about cross-system messages. For example, one system may use UTF-16 as its native in-memory string representation while another uses UTF-8. Since UTF-8 is usually the most efficient, it would make sense to have a string type that is always UTF-8 encoded, without a BOM.
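As a sketch of that cross-system case in Python: a sender whose native form is UTF-16 converts to BOM-less UTF-8 for the wire, and the receiver decodes with the agreed encoding. The sample text and variable names are assumptions for illustration; Python's `"utf-8"` codec does not emit a BOM (the `"utf-8-sig"` codec does).

```python
text = "hello, wörld"

# What a UTF-16 system might hold in memory (little-endian, no BOM).
utf16_in_memory = text.encode("utf-16-le")

# What would go on the wire if the format mandated UTF-8 without a BOM.
wire = text.encode("utf-8")
has_bom = wire.startswith(b"\xef\xbb\xbf")

# The receiver decodes with the agreed encoding, whatever its native form is.
received = wire.decode("utf-8")

print(len(utf16_in_memory), len(wire))  # 24 13
print(has_bom)                          # False
print(received == text)                 # True
```

For mostly-ASCII text like this, the UTF-8 form is roughly half the size of the UTF-16 form.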

@tracker1

For that matter, you could just put the UTF-8 encoded byte order mark (BOM) at the beginning of your raw data. When reading it back out, you'll "know" that it's a UTF-8 string.
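A small Python sketch of this BOM-tagging convention. `tag_utf8` and `read_raw` are hypothetical helper names; `codecs.BOM_UTF8` is the standard-library constant for the bytes `EF BB BF`.

```python
import codecs


def tag_utf8(text: str) -> bytes:
    """Hypothetical helper: prefix the UTF-8 bytes with the UTF-8 BOM."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


def read_raw(data: bytes):
    """Hypothetical helper: treat BOM-prefixed payloads as UTF-8 text.

    Anything without the prefix is returned as plain bytes.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data


print(read_raw(tag_utf8("héllo")))  # héllo
print(read_raw(b"\x00\x01"))        # b'\x00\x01'
```

The convention is only a heuristic: binary data that happens to begin with `EF BB BF` would be mistaken for text, and every sender and receiver has to agree on it out of band.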

@cabo

The discussion of this issue is just exploding in #121

(And I'll plug http://tools.ietf.org/html/draft-bormann-apparea-bpack here, too.)

@cabo

Well, it seems we are continuing the technical discussion in #128 today.

@kuenishi closed this