-
Notifications
You must be signed in to change notification settings - Fork 15.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide a Ruby extension. #121
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project, in which case you'll need to sign a Contributor License Agreement (CLA) at https://cla.developers.google.com/. If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check the information on your CLA or see this help article on setting the email on your git commits. Once you've done that, please reply here to let us know. If you signed the CLA as a corporation, please let us know the company's name. |
Ping @googlebot: made my Google org membership public, retry? |
@@ -236,7 +236,16 @@ python_EXTRA_DIST= \ | |||
python/setup.py \ | |||
python/mox.py \ | |||
python/stubout.py \ | |||
python/README.txt | |||
python/README.txt \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ruby files need to be listed in a separate ruby_EXTRA_DIST variable and added to all_EXTRA_DIST (see line 250). Also please add ruby to line 26 in https://github.com/google/protobuf/blob/master/configure.ac#L26
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
Ping @googlebot, CLA fixed now? |
CLAs look good, thanks! |
Glad to see this happening! Will it work on JRuby? cc @headius |
f4ee2a7
to
4bf2296
Compare
@tamird, we haven't looked into JRuby yet. Is this an use case you'd be interested in seeing supported? We'd be happy to take PRs to add support or fix any issues related to JRuby if so. (Cursory searching indicates JRuby does support C-based extensions, so maybe this won't be so bad?) |
@cfallin JRuby originally merged a GSoC project that implemented parts of the MRI C extension API, but it was never "Production Ready™" and crashed quite frequently. They subsequently removed it. MRI C extensions are incompatible with JRuby. |
@cfallin unfortunately, I believe JRuby's c-ext support is DOA. This is a really important use case for Square, because we are heavy users of JRuby. The VM-agnostic way forward is probably to use https://github.com/ffi/ffi There's another aspect to consider: Ruby C-extensions preclude many Ruby programmers from making meaningful contributions, since Ruby programmers tend not to be C-literate. |
This adds a Ruby extension in ruby/ that is based on the 'upb' library (now included as a submodule), and adds support for Ruby code generation to the protoc compiler.
JRuby issues should be investigated further, but let's merge this for now. |
@haberman merging this breaks JRuby support |
@tarcieri How can this break JRuby support if there was never JRuby support to begin with? |
@tarcieri, could you provide some more detail? We didn't previously have JRuby support. Do you mean this conflicts with some third-party JRuby extension? If so, how (namespace conflict, ...)? |
The problems were outlined above: you have switched from a pure Ruby implementation to one which uses the MRI-specific C extension API. Gems that use the C extension API are specifically incompatible with JRuby. Using the MRI C extension API is, IMO, a bad idea in general. You should be using FFI. |
Apparently I had this confused with: https://github.com/localshred/protobuf That said, my concerns still stand re: using the C extension API. This is not a portable implementation. |
@tarcieri, this is actually the first Ruby extension to exist in the google/protobuf repository, native-C or otherwise; unclear what you mean by "switched"? We wrote an extension from scratch, and we chose to write it in C for performance reasons. If JRuby compatibility is a strong need then we're happy to look into other ways of doing things as well, but this isn't a regression w.r.t. features/support. |
@cfallin please see my previous comment. In order to be a portable extension, this entire patch needs to be rewritten with FFI. |
And to reiterate, when I say "support JRuby", I really mean "support any Ruby interpreter besides MRI". Various implementations (e.g. Rubinius) have various degrees of support for the MRI C extension API, but FFI is the official way to produce native extensions that are portable across many Ruby implementations. |
@tarcieri Thanks for alerting us to the JRuby issue. We certainly want to be as portable as possible, all else being equal. Protocol Buffers present somewhat unique optimization challenges. When data is being parsed, the library needs to set a bunch of Ruby object members very rapidly. If the data is being parsed with a C library (which is required for speed), this means crossing the C/Ruby barrier a lot. This is very analogous to the performance challenges of implementing JSON with Ruby. None of the fast JSON libraries for Ruby (JRuby or MRI) use FFI, from what I can tell. They all (JSON, Oj, jrjackson) use native APIs, either MRI/C or JRuby/Java. If you look at yajl-ffi (the one Ruby JSON library I could find that does use FFI), it is about 6x slower than its non-FFI counterparts. See: https://github.com/dgraham/yajl-ffi#user-content-alternatives Using FFI also means that you can trigger SIGSEGV from Ruby. I still think it's worthwhile to support JRuby, whether through Ruby-FFI or by writing Java directly to bind to the protobuf-Java library. My point is just that it's not as simple as: "we need to rewrite this with FFI." FFI has plusses but also significant minuses compared to what we just merged. |
I would suggest extracting the data in a single native call, then constructing a new object with all of its members already mapped in via FFI.
This is a non-argument. It only means you relocate the unsafe code from C to Ruby, which IMO is a plus as you're dealing with one language at the boundary, not two.
The only alternative is a completely different implementation for JRuby versus other implementations. |
In what format? Whatever format you choose, now you have to parse it again from Ruby. The performance will still suffer. I would again point out that no existing JSON library with reasonable performance does what you are suggesting.
Yes, but it also means that unsafe APIs are now exposed to Ruby. Without FFI, a bug in Ruby can't (generally) SIGSEGV the interpreter. Once FFI is loaded, that assurance is gone. It's a difference worth mentioning.
Yes, though it could share code with proto2-Java. |
An
I think you have a lot to learn about FFI
Unsafe C code will always expose the same vulnerabilities to Ruby vicariously through unsafe C code. By even attempting to interact with native code, in any form, whatsoever, you are introducing potential unsafety into your program. There is no way around this. Unsafety in any form is unsafe and can be introduced via misusing FFI or writing bad C code. The net effect is the same. The only difference is someone trying to understand the unsafety must now focus on your Ruby code, your C extension, and the target library instead of just Ruby FFI code and the target library. Using a C extension complects the problem.
I've written Ruby libraries that provide both C backends + Java backends as well as FFI gems that bind to native code with a single library, e.g.: Trying to implement the same API on both C and Java backends is an incredibly difficult undertaking that requires expert understanding of both CRuby and JRuby semantics. I would strongly recommend against it if possible. In my experience in practice, it results in ongoing incompatibilities between the different backends. FFI is the best path to getting consistent semantics across Ruby implementations. |
By looking at https://github.com/dgraham/yajl-ffi/blob/master/lib/yajl/ffi/tasks/benchmark.rake and https://github.com/dgraham/yajl-ffi/blob/d0a04a0a15536482f92d3ac56879b4711346b9e1/lib/yajl/ffi/parser.rb#L34 they use pure ruby to read the file while the native c ruby ext reads it in c apparently https://github.com/brianmario/yajl-ruby/blob/229e0f7e74d282e901db2b0a380374c8db462a12/ext/yajl/yajl_ext.c#L494. It looks like thats the performance bottleneck. When writing a ffi binding you want to stay out of ruby scope as long as remotely possible and only hand over the results to ruby with one single ffi call. |
It seems you have hit the merge button on thousands upon thousands of lines of C code, submitted 4 days ago, without a single comment, in the presence of outstanding complaints to the contrary. What guarantee to security conscious users have that this code is safe? As a security practitioner this is an unconscionable act, and unless this situation is remedied I will actively seek to discourage anyone from using this code until it is better reviewed. |
Wow, I think that might be a record for the shortest amount of time between meeting someone and having them accuse me of an "unconscionable act." I'm Josh, by the way. Nice to meet you. This code has been under review internally (inside Google) for the last month, with a total of 176 comments going back and forth between Chris and his reviewers (mainly me). We have moved our open-source repository to GitHub recently but are still doing most of our code reviews internally for the moment. This may change soon but I'm not sure. On top of that, we have plans to get feedback from the Google security team, which could lead to further changes. I would appreciate it if you would approach your comments more respectfully. |
Easier said than done. :) My previous message left out a lot of the details and logical steps I was relying on to reach my conclusion. I should explain this in a bit more detail. Also I've just done a deep dive into the Ruby-FFI source code, so I can talk more specifically in terms of Ruby-FFI concepts and implementation details rather than relying on my general knowledge of VM's and FFI's as I was before. The tricky part with Protocol Buffers (and JSON) is that the result is not a single object but rather a tree of objects. The top-level message can contain pointers to submessages and arrays, which can contain pointers to other submessages. This poses a significant memory management challenge. Unlike the use case for
If I define But that still leaves the memory management problem. How will the message tree (including submessages) get freed? I want them to get freed when they are no longer reachable.
|
Using something like FFI::ManagedStruct also has the problem that it relies on ObjectSpace.define_finalizer, which is the only way for Ruby-FFI to instruct the GC what to do once a Object is no longer in use. ObjectSpace.define_finalizer is up to 20 times slower than releasing the memory manually in my own tests. Made a gist which shows the diffrence it can make |
The ext is extremely large, and I would not want to maintain it. However, if we did a JRuby version, I assume we'd be able to use a Java library to do the heavy lifting, right? Does the Java lib produce byte[] or String? If it is the latter...that is a shame, and performance is already suffering. If it is byte[] the JRuby ext should be able to beat the MRI ext easily. We could also just write JRuby support calling the Java lib directly. I am very disappointed that in a month of internal review nobody thought to consider JRuby support and nobody had any concern about completely shutting JRuby users out of this library. |
require "rake/testtask" | ||
|
||
spec = Gem::Specification.new do |s| | ||
s.name = "protobuf" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name collides with already existing and maintained since 2011 project - https://github.com/localshred/protobuf. Can we change it to google-protobuf
for example and host on rubygems.org?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the heads-up. I think Chris will follow up with a PR to rename this to something that doesn't collide with pre-existing work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, PR coming for this.
We want our work to be as widely usable and as portable as possible, and there was no intent to exclude JRuby. We certainly would have made one library that could support all Ruby implementations if there was a practical way of doing this. But our benchmarks and preliminary investigations indicated that:
However, we would be happy to see JRuby support. We probably won't be able to implement this ourselves for now (we have lots of languages and environments that we're working on spinning up), but the good news is that a JRuby version of this would probably be a lot smaller and simpler than the MRI extension we have here. I would love to have a more extended conversation about how this could work. One of the main reasons the MRI extension is complicated is that we can't just wrap a C or C++ representation due to memory management mismatches between Ruby and C/C++. C/C++ don't use a GC and Ruby does. This means we have to duplicate a lot of functionality from protobuf-C++. However JRuby wrapping protobuf-java could potentially be a much more straightforward wrapping, because both Java and JRuby use the JVM's GC. So writing protobuf support for JRuby could end up being a relatively thin and straightforward wrapper around protobuf-java. I don't know enough of the details of JRuby to say for sure. But I would encourage you or any other JRuby expert to look into this a bit more and see if this would work. I will support you in this however I can. |
protobuf support for JRuby could probably be written entirely in Ruby, using protobuf-java directly. I'd be surprised if it wasn't as fast or faster than MRI. |
I too am very unhappy with the lack of JRuby support coming out of the gate. I understand a lot of people use MRI but just as many of us only use JRuby in production, I was super excited until everybody started mentioning it was built as a C-Ext... then my dreams crashed (not really but some will know what it's like to be super excited and then be let down.) |
I started looking into a pure-Ruby wrapper around protobuf-java, but I'm not familiar with that API and the names don't appear to match between the Ruby API and the Java API. Pulling in the Java API is as simple as doing |
FWIW my pure ruby implementation is not "unusable slow", able to do rpc round trips (w/ db access) in under 25ms. https://GitHub.com/localshred/protobuf </shameless_plug> |
upb_path = File.absolute_path(File.dirname($0)) + "/../../../upb" | ||
libs = ["upb_pic", "upb.pb_pic", "upb.json_pic"] | ||
system("cd #{upb_path}; make " + libs.map{|l| "lib/lib#{l}.a"}.join(" ")) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to check the exit code of this? $?.success?
@localshred Pure Ruby parsers like yours work great for small-ish payloads. It's only when the payload grows larger, or parsing/serialization is happening in a tight loop, that the difference becomes noticeable. Here are the benchmark numbers we got from our early experiments. This is for decoding a protobuf over and over:
The pure-Ruby decoders for protobufs are 10-20x slower than the built-in Ruby JSON module (depending on how you measure). While this doesn't matter for small payloads, we didn't want users to experience this slowdown for larger payloads. Our C-based extension is much faster than the pure-Ruby implementations (I don't have specific numbers on this yet, but we'll publish them when we have them.). |
Your JRuby-based implementation is the worst one...since it doesn't exist. Working code is better than nothing. |
I also assume you tested ruby protobuf on MRI. MRI does no optimization of code, no JIT, and has a relatively weak GC. On JRuby, the numbers would almost certainly be a lot better. |
@haberman I see what you're saying. I'm not trying to argue that ruby is faster than C ;). My only point is that "unusably" is hyperbole and doesn't apply to all orgs or stacks. Also agree with @headius that you didn't run my code on JRuby, which is significantly faster than MRI. All in all though, love that you guys are putting ruby support as a first-class priority. I'm assuming ruby RPC support is not present, is that true (sorry, haven't looked at the full diff)? Assume that would have to go in the Protobuf-Socket-RPC lib. |
By rpc support do you mean parsing out and providing classes for services with methods? I don't believe protocol buffers enforce any sort of transport requirements. |
I have to agree that unusable is hyperbole unless it's tested in multiple ways: many large payloads, many small payloads, small large payloads and small small payloads. For us we normally send tons of small payloads vs huge payloads and at the end of the day a Ruby implementation could probably reliably handle many small payloads at once (no I have not tested, but this is an assumption based on my past experience.) |
Sorry if "unusable" was too strong of a word. I didn't mean to suggest that no one could benefit from a slower implementation. Plenty of people have small or medium data sets and get lots of benefits from pure-Ruby parsing. For our purposes, it's important that people see good performance even with larger data sets. Parsing a protobuf from Ruby should be at least as fast as parsing JSON from Ruby, and ideally faster. Protocol Buffers have had a pure Python implementation for a while, and lots of people (both inside and outside Google) have complained about its performance limitations. We are trying to offer an implementation that will satisfy even performance-sensitive users. I want to reiterate that we would love to see JRuby support, and it sounds like there are several interested parties who could help to make this happen. I think that a JRuby library that wraps protobuf-java, but offers the same API as the MRI Ruby extension you see here, would be fantastic to see. |
Changed C API to only define structs, a table, and a few minimal inline function.
Changed C API to only define structs, a table, and a few minimal inline function.
No description provided.