Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bin 8 bit support for Ruby ASCII-8BIT data type #45

Merged
merged 5 commits into from
May 18, 2015

Conversation

mattheworiordan
Copy link
Contributor

This Pull Request adds support for the Msgpack bin data type. If you refer to http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/, you will see that the convention for Ruby binary data is to use the ASCII-8BIT encoding. The Ruby MsgPack gem will now detect ASCII-8BIT when encoding and encode as bin, and will return ASCII-8BIT strings if the data type is bin.

This PR fixes issue #44, and also resolves a few small other issues such as adding support for str 8 data type that was missing.

@mattheworiordan
Copy link
Contributor Author

Please note my build failed, however that is largely because the master branch is already failing, see https://travis-ci.org/msgpack/msgpack-ruby/jobs/37151501. I personally would drop support for Ruby 1.8.7 or release a separate PR to address adding 1.8.7 support removing all support for encoding.

mattheworiordan added a commit to ably-forks/msgpack-ruby that referenced this pull request Dec 3, 2014
@mattheworiordan
Copy link
Contributor Author

@iconara and @frsyuki is there any reason this isn't going to be merged in? If not, I will have to create a fork of this gem with a new namespace such as MessagePack5 so that it does not conflict with this namespace. It has been well over a month that this PR has been open and it's a real blocker for us as binary support is needed.

@iconara
Copy link
Member

iconara commented Feb 12, 2015

I'm all for it, but I'm the JRuby guy so I'll have to leave the final say to @frsyuki.

If anything I would have liked to see tests for what happens when a string is not UTF-8 nor ASCII-8BIT. I might be wrong, but it looks like all non-ASCII-8BIT strings are written as raw strings, and from my understanding of the spec that is not right. Raw strings should be UTF-8, so non-UTF-8 strings need to be transcoded before they are packed. Otherwise we're back to the same problem as the old MessagePack spec: you had no idea what encoding strings were in.

Here's how the JRuby implementation handles this case: https://github.com/msgpack/msgpack-ruby/blob/master/ext/java/org/msgpack/jruby/Encoder.java#L194-L196, there's also a (maybe cryptically described) test for it: https://github.com/msgpack/msgpack-ruby/blob/master/spec/jruby/msgpack_spec.rb#L131-L134

@mattheworiordan
Copy link
Contributor Author

Thanks for the quick reply @iconara.
Sure, it wouldn't be difficult to add that support, although equally Binary support is still better than the current state of affairs. Do you have any idea if @frsyuki will ever consider this change, because if not, I am going to start an new fork as I need MsgPack support in my client library yet if I'm using a fork and the person using my library is already using MsgPack, it will prevent anyone else from my using my library unfortunately. Any ideas on what I can do?

@nathany
Copy link

nathany commented May 15, 2015

@frsyuki Any feedback on this? I'm happy to use the msgpack-ably fork, but it would be nice if bin was supported in the official msgpack gem.

@mattheworiordan
Copy link
Contributor Author

@nathany I am probably going to create a MsgPackV5 fork that uses a different namespace as this fork is not ideal as the namespace would conflict if someone used both the old & forked MsgPack gems. However, if this is not a problem for you (it won't effect most people) then please do use my fork, I hope it helps.

@frsyuki
Copy link
Member

frsyuki commented May 18, 2015

Thank you for this pull-request.
As @iconara mentions, transcoding would be necessary if string is encoded in non-UTF8 encodings. Implementation of is_byte_array calls rb_eval_string and rb_funcall but it may impact performance significantly.
I created #62.

@tagomoris tagomoris merged commit 95eeb0d into msgpack:master May 18, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants