First-class string type in serialization specification #13

Closed
pgriess opened this Issue Jul 23, 2010 · 7 comments

6 participants

pgriess commented Jul 23, 2010

Packing all strings as raw byte arrays makes it very difficult to figure out how to unpack them correctly. In particular, it is impossible to know what encoding was used when encoding the string as a sequence of bytes. To address this, it would be nice to have a first-class MSGPACK_OBJECT_STRING type with a mandatory encoding (say, UTF-8).
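Here is a minimal Python sketch of what the proposal amounts to. It assumes a toy wire format: a 1-byte type tag, then a 4-byte big-endian length, then the payload. The tag values TAG_RAW and TAG_STRING and the functions pack/unpack are hypothetical names for this illustration only. They are not MessagePack's real type codes or any library's API. Because the type itself fixes the encoding, the reader never has to guess.

```python
import struct

# Hypothetical tag bytes for this sketch only; NOT MessagePack's real type codes.
TAG_RAW = 0x01     # opaque bytes, encoding unknown
TAG_STRING = 0x02  # text, always UTF-8 on the wire

def pack(value):
    """Pack str as a tagged UTF-8 string and bytes as tagged raw data."""
    if isinstance(value, str):
        tag, payload = TAG_STRING, value.encode("utf-8")
    elif isinstance(value, (bytes, bytearray)):
        tag, payload = TAG_RAW, bytes(value)
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    # 1-byte tag, 4-byte big-endian length (no padding with ">"), then the payload.
    return struct.pack(">BI", tag, len(payload)) + payload

def unpack(data):
    """Return str for TAG_STRING (decoded as UTF-8) and bytes for TAG_RAW."""
    tag, length = struct.unpack_from(">BI", data)
    payload = data[5:5 + length]
    if tag == TAG_STRING:
        # The encoding is fixed by the type, so there is nothing to guess.
        return payload.decode("utf-8")
    if tag == TAG_RAW:
        return payload
    raise ValueError(f"unknown tag: {tag:#x}")
```

For example, unpack(pack("é")) returns the str "é", while unpack(pack(b"\xc3\xa9")) returns bytes, even though both messages carry the identical payload bytes C3 A9.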

aaronblohowiak commented May 25, 2011

any news on this?

andrewschaaf commented Aug 31, 2011

Packing all strings as raw byte arrays makes it very easy to figure out how to unpack them correctly: as Buffers/byte[]s/....

Libraries could have

  • unpackRaw (doin' it rite)
  • unpackUnicode (attempting to autodetect encodings, starting with UTF-8)

instead of unpack
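The two entry points suggested above might look roughly like this in Python. unpack_raw and unpack_unicode are hypothetical helpers that take an already-extracted raw payload. They are not functions of any real MessagePack library. The detection order (BOM first, then strict UTF-8, then Latin-1) is an assumption chosen for illustration.

```python
import codecs

def unpack_raw(payload: bytes) -> bytes:
    """Hypothetical 'unpackRaw': hand the raw bytes back untouched."""
    return bytes(payload)

def unpack_unicode(payload: bytes) -> str:
    """Hypothetical 'unpackUnicode': guess the encoding, trying UTF-8 first.

    Assumed detection order, for illustration only:
    1. a byte order mark, if present, decides the encoding;
    2. otherwise try strict UTF-8;
    3. otherwise fall back to Latin-1, which accepts any byte sequence.
    """
    if payload.startswith(codecs.BOM_UTF8):
        return payload[len(codecs.BOM_UTF8):].decode("utf-8")
    if payload.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and strips it.
        return payload.decode("utf-16")
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        return payload.decode("latin-1")
```

Because of the Latin-1 fallback, unpack_unicode never raises, but it can silently return the wrong text when its guess is wrong.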

tracker1 commented Jul 26, 2012

@andrewschaaf The issue is more about dealing with cross-system messages. For example, one system may have a native in-memory representation of strings as UTF-16, while another may use UTF-8. Since UTF-8 is usually the most efficient, it would make sense to have a string type that is always UTF-8 encoded, without a BOM.
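Here is a small Python sketch of the "always UTF-8, no BOM" rule. Each side converts from its native string representation at the boundary. to_wire and from_wire are hypothetical helper names. The size comparison shows why "usually" is the right word: UTF-8 is more compact for ASCII-heavy text, but UTF-16 can be smaller for some CJK text.

```python
def to_wire(text: str) -> bytes:
    """Normalize text to UTF-8 without a BOM before it goes into a message."""
    return text.encode("utf-8")  # Python's "utf-8" codec never writes a BOM

def from_wire(payload: bytes) -> str:
    """Every receiver decodes the same way, whatever its native string format."""
    return payload.decode("utf-8")

# Encoded sizes in bytes: (UTF-8, UTF-16 little-endian without BOM).
for sample in ["hello", "\u65e5\u672c"]:  # "hello" and "日本"
    print(len(sample.encode("utf-8")), len(sample.encode("utf-16-le")))
# 5 10
# 6 4
```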

tracker1 commented Sep 14, 2012

For that matter, you could just put the UTF-8-encoded Byte Order Mark (BOM) at the beginning of your raw data; when reading it back out, you'll "know" that it's a UTF-8 string.
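The BOM-marker convention could be sketched in Python like this. pack_marked and unpack_marked are hypothetical helper names. codecs.BOM_UTF8 is the standard library's constant for the bytes EF BB BF.

```python
import codecs

def pack_marked(text: str) -> bytes:
    """Prefix UTF-8 text with the UTF-8 BOM (EF BB BF) as an in-band marker."""
    return codecs.BOM_UTF8 + text.encode("utf-8")

def unpack_marked(payload: bytes):
    """Return str if the raw payload carries the UTF-8 BOM, else the raw bytes."""
    if payload.startswith(codecs.BOM_UTF8):
        # Strip the 3-byte marker and decode the rest as UTF-8.
        return payload[len(codecs.BOM_UTF8):].decode("utf-8")
    return payload
```

This is only a convention layered on raw data. It costs 3 bytes per string, and a binary blob that happens to start with EF BB BF would be misread as text.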

cabo commented Feb 20, 2013

The discussion of this issue is just exploding in #121

(And I'll plug http://tools.ietf.org/html/draft-bormann-apparea-bpack here, too.)

cabo commented Feb 24, 2013

Well, it seems we are continuing the technical discussion in #128 today.

kuenishi (Member) commented Aug 17, 2013

kuenishi closed this Aug 17, 2013
