utf8 problems with msgpack? #15
Comments
I have similar problems: to_msgpack() -> redis -> MessagePack.unpack(data) leads to UTF-8 errors. It seems to happen with hashes, and this is what the unpacked data looks like: (ruby 1.9.3 preview 1)
Did either of you have any luck figuring this out? We're having a similar issue.
Nope, never got to the bottom of it.
I ran into the same issue with to_msgpack -> redis -> MessagePack.unpack. I tracked it down to a single UTF-8 character. Forcing ASCII-8BIT encoding before deserialization seems to fix the problem.
Experienced the same problem. The "force_encoding" solution described by @sgtFloyd fixed it!
The redis-rb gem forces the Redis response encoding to Encoding::default_external in Redis::Connection::CommandHelper. This is logical: the string comes from an external I/O stream, so it gets the default external encoding, which in most setups is UTF-8. The normal case of setting/getting UTF-8 encoded strings in Redis works as expected. But MessagePack is a binary serialization format and expects to unpack from a raw binary string, so you need to force the string you get from redis-rb into binary (or ASCII-8BIT, as @sgtFloyd suggested above) before calling MessagePack.unpack.
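A minimal sketch of that workaround, using only core Ruby so it runs without redis-rb or the msgpack gem. The helper name `as_binary` and the constant `PACKED` are made up for illustration, and the Redis reply is simulated by tagging MessagePack bytes as UTF-8 by hand; it is not taken from either library's code.

```ruby
# Bytes of a MessagePack-encoded {"a" => "é"}:
# fixmap(1), fixstr "a", fixstr "é" (two UTF-8 bytes).
PACKED = "\x81\xA1a\xA2\xC3\xA9".b

# Hypothetical helper: return a copy of the reply retagged as binary.
# force_encoding only changes the encoding label, never the bytes.
def as_binary(reply)
  reply.dup.force_encoding(Encoding::BINARY)
end

# Simulate a client that tags every reply with a text encoding (UTF-8 here).
reply = PACKED.dup.force_encoding(Encoding::UTF_8)

puts reply.valid_encoding?                  # false: 0x81 cannot start a UTF-8 sequence
binary = as_binary(reply)
puts binary.encoding == Encoding::BINARY    # true
puts binary.bytes == PACKED.bytes           # true: payload is untouched

# With the msgpack gem installed, the unpack call would then be:
#   MessagePack.unpack(binary)
```

Because only the encoding label changes, the same retagging can be applied to any reply that is known to hold a MessagePack payload, without touching ordinary text values.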
I think the MessagePack.unpack method itself should perform this force_encoding in a future version, but for now we have to do it ourselves.
As each language implementation was separated, please open another issue at each repository if this is still problematic. Thank you.
A bit of a shot in the dark, but has anyone come across problems with utf8 + msgpack? I'm using the Ruby bindings. I logged ~500 GB of data in zmpac format (stream + zlib), in ~200 MB chunks (~1 GB uncompressed). I'm now trying to read the data back and running into parse errors on random files.
Haven't had much luck tracking down the culprit so far, but if I sysread the file 1024 bytes at a time and parse out the messages, then dump the buffer once the error is thrown, I see Chinese characters, etc.
Same behavior under 1.8 and under 1.9. Any suggestions for how to recover this data, and/or any other tips?