Unpack strings to ASCII-8Bit, not UTF-8 #40

Closed
paddor opened this issue Oct 4, 2014 · 9 comments

Comments

@paddor

paddor commented Oct 4, 2014

MessagePack doesn't know the encoding of any String in a message serialized with MessagePack. Some of the Strings might be byte arrays (since there's no real distinction between Strings and byte arrays in MessagePack), and those will cause problems later when the wrongly-encoded Strings are used where a correct encoding is expected.

I suggest you set the encoding of unpacked Strings to ASCII-8BIT (Encoding::ASCII_8BIT). Users of this library who expect a String (not a byte array) will have to use String#force_encoding, because only they know which unpacked Strings have which encoding.
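
A minimal sketch of what this would look like on the receiving side, using only core String methods and no msgpack call. The payload and the assumption that the sender used ISO-8859-1 are made up for illustration:

```ruby
# Hypothetical payload: the bytes of "caf\xE9" (ISO-8859-1), handed back
# by an unpacker that makes no claim about their encoding.
raw = "caf\xE9".b          # String#b returns a copy tagged ASCII-8BIT

raw.encoding               # Encoding::ASCII_8BIT (alias Encoding::BINARY)

# Only the receiver knows the real encoding, so the receiver relabels the
# bytes. force_encoding changes the tag, never the bytes themselves.
latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)
latin1.valid_encoding?     # true

# Once the source encoding is known, transcoding to UTF-8 is safe.
utf8 = latin1.encode(Encoding::UTF_8)
utf8.bytes                 # [0x63, 0x61, 0x66, 0xC3, 0xA9]
```

The point of the binary tag is that nothing downstream can mistake the bytes for text until someone who knows the encoding says so.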

@funny-falcon
Contributor

But they can call #force_encoding on UTF-8 strings too, if they know that the string is in a different encoding.
So what's the difference?

@paddor
Author

paddor commented Feb 3, 2015

Yes they can, of course, after getting an invalid byte sequence exception. Since msgpack-ruby really can't know a String's encoding, I still suggest using the most basic encoding possible, which is Encoding::ASCII_8BIT.
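
To illustrate the failure mode, here is a small sketch in plain Ruby (no msgpack involved). The byte values are arbitrary and only stand in for a binary payload that was tagged UTF-8:

```ruby
# Hypothetical binary payload that has been mislabelled as UTF-8.
mislabelled = "\xFF\xFE\x00\x01".b.force_encoding(Encoding::UTF_8)

# Nothing fails yet; the string merely carries a wrong tag.
mislabelled.valid_encoding?    # false: 0xFF never occurs in valid UTF-8

# The error only surfaces later, when some code treats the bytes as text,
# for example by transcoding them.
error = begin
  mislabelled.encode(Encoding::UTF_16LE)
  nil
rescue Encoding::InvalidByteSequenceError => e
  e
end

# The same bytes tagged ASCII-8BIT are always "valid", because the binary
# encoding makes no claim about what the bytes mean.
binary = mislabelled.b
binary.valid_encoding?         # true
```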

@iconara
Member

iconara commented Feb 3, 2015

#44 presents another solution, and the one I chose for the JRuby implementation. The solution proposed here would have been the right one with the first version of the MessagePack spec, but since the updated version and the addition of the UTF-8 type there's actually a way to tell the difference.

@paddor would you agree that #44 solves your problem? Can we close this issue in favour of it?

@paddor
Author

paddor commented Feb 3, 2015

I guess you meant #45? Yeah, that would definitely be a very nice solution to my problem.

@iconara
Member

iconara commented Feb 3, 2015

Yes, #45, sorry. That’s good. Now, if only I can get you to switch to JRuby, your problem will be solved…

@paddor
Author

paddor commented Feb 4, 2015

Why switch to JRuby? Does that pull request only work for JRuby or what? And don't worry, my problem is solved/worked around already. Just an inconvenience I noticed and thought could be improved. Pretty sure the 'bin' type in MsgPack didn't exist yet back then.

@iconara
Member

iconara commented Feb 4, 2015

I just meant that the semantics proposed in #44/#45 are already implemented in msgpack-0.5.10-java, but not in the MRI release of the same version.

@paddor
Author

paddor commented Feb 4, 2015

Oh okay. No need to rush with that.

@tagomoris tagomoris reopened this May 26, 2015
@tagomoris
Member

v0.6.0 has been released, which supports mapping between the bin type and ASCII-8BIT strings.
I believe that it solved this issue.
Please reopen if any problems exist.
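
For readers unfamiliar with the str/bin distinction, the idea can be sketched with a toy decoder. This is not msgpack-ruby's implementation; `decode_str_or_bin` is a hypothetical helper that handles only three formats from the MessagePack spec (fixstr, str 8, bin 8) and assumes well-formed input:

```ruby
# Hypothetical helper, not part of msgpack-ruby: decodes a single
# fixstr, str 8, or bin 8 value and tags the result by its wire type.
def decode_str_or_bin(bytes)
  data = bytes.b
  header = data.getbyte(0)
  raise ArgumentError, "empty input" if header.nil?

  case header
  when 0xa0..0xbf   # fixstr: length is in the low 5 bits of the header
    data.byteslice(1, header & 0x1f).force_encoding(Encoding::UTF_8)
  when 0xd9         # str 8: one length byte follows the header
    data.byteslice(2, data.getbyte(1)).force_encoding(Encoding::UTF_8)
  when 0xc4         # bin 8: one length byte, payload stays ASCII-8BIT
    data.byteslice(2, data.getbyte(1))
  else
    raise ArgumentError, format("unsupported format byte 0x%02x", header)
  end
end

text = decode_str_or_bin("\xa2hi")            # fixstr holding "hi"
blob = decode_str_or_bin("\xc4\x02\xff\xfe")  # bin 8 holding two raw bytes

text.encoding   # Encoding::UTF_8
blob.encoding   # Encoding::ASCII_8BIT
```

With the type carried on the wire, the receiver no longer has to guess which values are text and which are raw bytes.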
