Add proto2 default support to Ruby gem #3606

zanker · 2017-09-07T16:11:47Z

(Sorry this is going to be long!)

I'm happy to split the PR up into multiple, but wasn't really sure what your preferences were on this. I intentionally didn't update the proto generator, since this only supports MRI and not the Java side too. I can add the Java one into this PR, or do that after.

One of the main problems with this I ran into was API compatibility. You need to have the syntax set before you do anything, because things like defaults or maps are set when messages are added initially, and a lot of it is easier to gate at add time rather than doing post add validation checks (like is done with enums).

I opted to make FileDescriptor an optional argument you pass to DescriptorPool, and that gets automatically added to anything you attach to it after. If you don't pass a FileDescriptor, it sets it to proto3 as that's the current syntax that's supported when generating so it maintains backwards compatibility.

It means Descriptor and EnumDescriptor end up with breaking changes because they require a FileDescriptor passed. Are instantiating these directly supported APIs? Same with FieldDescriptor, ideally it would be a required init so things like FieldDescriptor_default_set and FieldDescriptor_default can be gated by syntax. As it stands, it seems you can't access the parent from FieldDescriptor until after it's been added.

For setting defaults for enums, you have to use an int, because we can't guarantee an enum is added to the pool before we set our default. The only way I can see supporting adding-by-name is if we do it after the pool is initialized, and we add enums first, but I didn't want to bypass UPB for only default setting. Might be something that UPB3 makes easier @haberman?

While right now, you can only set default for add_field, I made it a hash rather than a single argument to give us the ability to add the other extension type fields without having to make more API changes.

The way hasbits are handled feels a little janky, but I was trying to reuse the existing UPB MSG_WRITERS rather than just copy/pasting the code and storing the offsets as-is. One thing @nerdrew noticed is that upb_msg_has seems to have a bug? It does DEREF(msg, hasbit / 8, char) | (1 << (hasbit % 8)) which will always end up being true, looks like this should be an AND not an OR? The method is never used that I could tell.

For hasbits, I made it so we always use them (even in proto3), but it gates them behind the upb_fielddef_haspresence so under proto3 it's only using them for sub messages. One semantic I'm not sure how you want to handle is unsetting a field though.

I added a clear_<field> method which unsets the hasbit, but it means doing <field>=nil doesn't unset it. If I remember right, in Java this also doesn't (you have to call clear explicitly). One option is to make it so <field>=nil is equivalent to clear_<field>. That feels like the better way of handling it in Ruby, but wanted to get your opinion on it.

For tests, I just added a basic_proto2.rb one. It looks like the proto2 tests for the other libraries are handled via compatibility_tests and copying the old version to preserve it historically. Since there wasn't a prior version that supported proto2, I just opted to inline another test. I can follow that pattern and have a fake v2 version under it if you want.

(cc @nerdrew @djudd-stripe)

grpc-kokoro · 2017-09-07T16:11:48Z

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

grpc-kokoro · 2017-09-07T16:11:48Z

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

bazel-io · 2017-09-07T16:11:48Z

Can one of the admins verify this patch?

googlebot · 2017-09-07T16:11:49Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again.

zanker · 2017-09-07T16:13:03Z

ruby/ext/google/protobuf_c/defs.c

+  if (argc > 1) {
+    rb_raise(rb_eArgError, "Expected 0 or 1 argument, got %d", argc);
+  } else if (argc == 1) {
+    if (CLASS_OF(argv[0]) != cFileDescriptor) {


Something @eam pointed out, should this be CLASS_OF(argv[0]) != cFileDescriptor or should it be calling the Ruby rb_obj_is_kind_of or rb_obj_is_instance_of instead? I was mostly mirroring the existing code you had.

zanker · 2017-09-07T16:23:12Z

Oops, forgot to get myself added to our corporate CLA. Will get that done today.

zanker · 2017-09-07T18:01:04Z

I should be listed as authorized under Square's CLA with Google now.

googlebot · 2017-09-07T18:01:07Z

CLAs look good, thanks!

haberman · 2017-09-20T23:03:42Z

ok to test

haberman · 2017-09-20T23:12:22Z

Thanks for the PR! Did you forget some changes to ruby_generator.cc?

zanker · 2017-09-25T17:05:51Z

Sorry about the delay, was out for a few days last week.

I wasn't able to find the file to generate those in, I added setting of the syntax to the generator. For setting defaults, should I just mimic what it does now? Attempt to generate a proto2 file, and if it uses something unsupported like extensions then error?

haberman · 2017-10-09T23:24:51Z

src/google/protobuf/compiler/ruby/ruby_generator.cc

@@ -451,8 +466,18 @@ bool GenerateFile(const FileDescriptor* file, io::Printer* printer,
    }
  }

+  printer->Print("Google::Protobuf::DescriptorPool.generated_pool.build(\n");
+  printer->Indent();
+  printer->Print("Google::Protobuf::FileDescriptor.new.tap do |f|\n");


Maybe the syntax could be a parameter to the constructor? The syntax is a fundamental property of a file, and in an actual .proto file it has to be the first statement.

Making the syntax a constructor parameter and then defining all messages/enums inside the block makes sense to me.

If we are planning to go farther down the DSL route, we'll be wanting to add the ability to specify options of files, messages, enums, etc. To do that we'll need to mirror the structure of actual proto descriptors as closely as possible. Files contain messages, and every message belongs to a file.

Seems reasonable, but could you clarify what you mean by "constructor parameter", so something like (super rough):

require 'other_pb.rb' Google::Protobuf::FileDescriptor.start do syntax :proto2 option :java_package, "com.example.foo" pool.build do add_message ... end end

Is what you're talking about?

I was thinking more along the lines of:

Google::Protobuf::DescriptorPool.generated_pool.build do add_file(syntax => :proto2) do add_message ... end end

I am not exactly sure yet how we should specify options. The problem is that options can get really tricky. You can define custom options that use your own types. But even weirder, you can use the option on the same type that is defining the option:

syntax = "proto2"; import "google/protobuf/descriptor.proto"; package pkg; message M { // This will become an option on message M, but its value is also // an instance of the M message. This is really tricky because it // implies that you need to be able to create an M message while // you are defining it. // // The protobuf library generally deals with this by encoding options // as serialized protos, and only parsing them once the message // types are defined. option (pkg.m) = {val: 999}; optional int32 val = 1; } extend google.protobuf.MessageOptions { optional M m = 1234567; }

Whatever path we go down with options, I want to make sure it can accommodate the trickiest cases, like the above. This is one of the reasons the DSL worries me: it's easy to design yourself into a corner if you're not careful.

Gotcha, that is fun to deal with. Do you have an example somewhere of what it looks like in another language?

You can try compiling my snippet above into other languages to see how it is handled. Python might be a good model since it's similar to Ruby. Python defines the types, then registers extensions (including custom options), then sets options again. The options are specified as a serialized binary string, which will find any extensions that are registered:

_M = _descriptor.Descriptor( name='M', full_name='pkg.M', filename=None, file=DESCRIPTOR, containing_type=None, fields=[ _descriptor.FieldDescriptor( name='val', full_name='pkg.M.val', index=0, number=1, type=5, cpp_type=1, label=1, has_default_value=False, default_value=0, message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, options=None), ], extensions=[ ], nested_types=[], enum_types=[ ], options=_descriptor._ParseOptions(descriptor_pb2.MessageOptions(), _b('\272\350\332\004\003\010\347\007')), is_extendable=False, syntax='proto2', extension_ranges=[], oneofs=[ ], serialized_start=53, serialized_end=79, ) DESCRIPTOR.message_types_by_name['M'] = _M DESCRIPTOR.extensions_by_name['m'] = m M = _reflection.GeneratedProtocolMessageType('M', (_message.Message,), dict( DESCRIPTOR = _M, __module__ = 'test_pb2' # @@protoc_insertion_point(class_scope:pkg.M) )) _sym_db.RegisterMessage(M) m.message_type = _M google_dot_protobuf_dot_descriptor__pb2.MessageOptions.RegisterExtension(m) _M.has_options = True _M._options = _descriptor._ParseOptions(descriptor_pb2.MessageOptions(), _b('\272\350\332\004\003\010\347\007'))

This approach could work for Ruby. We should freeze the descriptor after we set options, so users don't try to add or mutate options on the descriptor after it has been built.

haberman · 2017-10-09T23:26:08Z

Sorry about the delay from me. There are some merge conflicts to resolve, but overall I think this is a promising approach. I think the biggest thing to nail down is how to express this in the DSL. I think we should mirror the structure of proto descriptors as closely as possible.

zanker · 2017-11-20T19:21:34Z

Sorry for the delay, been on vacations. I left Square, and won't be able to continue the work on this PR. When I talked to @nerdrew last, he was going to potentially take a look at it, but not sure if that's still the case. Sorry about that, thanks for your help on the PR.

If someone else ends up wanting to take it over the finish line, I'm happy to help walk them through the PR and how things are organized.

djudd-stripe · 2017-11-20T19:37:39Z

FWIW - I am unlikely to be able to take this up; we have a workaround which involves parsing protoc's descriptor_set_out to get annotation values, and so I can't justify a lot of time on it. Also my C is pretty rusty. Happy to provide more details on our workaround to anyone interested, though.

djudd-stripe · 2017-11-20T19:37:57Z

Also, @zanker thanks for updating us!

hrsht · 2018-06-13T22:54:39Z

@haberman I am going to take over this work of adding proto2 support for Ruby from @zanker and hopefully see to completion. I plan on adding support for extensions and options in the coming weeks.

Meanwhile, do you think this PR is more or less ready to be merged for basic proto2 syntax support (except of course for changes in ruby_generator.cc to not error out for proto2 files)? If so, I can send out a new PR with all of the changes here, plus a couple bug fixes and the required changes in ruby_generator.cc. Please let me know.

Thanks!

cc: @Yurokle

zanker · 2018-06-13T23:21:48Z

🎉🎉

…

Sent from my iPhone

On Jun 13, 2018, at 15:55, Harshit Chopra ***@***.***> wrote: @haberman I am going to take over this work of adding proto2 support for Ruby from @zanker and hopefully see to completion. I plan on adding support for extensions and options in the coming weeks. Meanwhile, do you think this PR is more or less ready to be merged for basic proto2 syntax support (except of course for changes in ruby_generator.cc to not error out for proto2 files)? If so, I can send out a new PR with all of the changes here, plus a couple bug fixes and the required changes in ruby_generator.cc. Please let me know. Thanks! cc: @Yurokle — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

hrsht · 2018-06-15T19:21:39Z

ruby/ext/google/protobuf_c/storage.c

@@ -447,6 +428,22 @@ MessageLayout* create_layout(const upb_msgdef* msgdef) {

  layout->fields = ALLOC_N(MessageField, nfields);

+  size_t hasbit = sizeof(void*) * 8;


@zanker do you remember the reason for setting this to sizeof(void*) * 8 and not 0?

I thought it had to do with pointer alignment, but apparently past me was smart and wrote a helpful comment. From encode_decode.c

// The offset we pass to UPB points to the start of the Message, rather than the start of where our data is stored. // Which works out to the size of a pointer. Since we can't pass arbitrary pointers to UPB, we bump it by the size*8 // which lets us reuse the UPB writers for primitives. int32_t hasbit = hasbit_off == MESSAGE_FIELD_NO_HASBIT ? -1 : hasbit_off + sizeof(void*) * 8;

The general gist of the presence checking, is that it's a bit hackish because I'm trying to reuse as much of the existing UPB code as possible. There's definitely nicer ways to do it, but you end up reimplementing more code.

Well, starting at sizeof(void*) * 8 sets the first hasbit at offset 64, which will allocate the 9th byte in the data part of MessageHeader. So effectively, there is no difference whether you start from 0, or 64 or any other value.

As for the code

int32_t hasbit = hasbit_off == MESSAGE_FIELD_NO_HASBIT ? -1 : hasbit_off + sizeof(void*) * 8;

it should be sizeof(MessageHeader) instead of sizeof(void*) as you want the offset to the data part of the message type and skip the number of bytes needed for MessageHeader

If you're able to get it to work by simplifying it, then I would definitely do so. You should also add some tests that encode and decode a message then check the has_foo? fields to verify. The main thing, is that it's all setup to align to the offsets used by: https://github.com/zanker/protobuf-1/blob/af38326c9a7ceab44b148a80ed8e1e734821733f/ruby/ext/google/protobuf_c/upb.c#L3282, not by MessageLayout.

Ahh.. I see. Looks like that has changed in the recent version of UPB. https://github.com/google/protobuf/blob/master/ruby/ext/google/protobuf_c/upb.c#L4311

Ah! Awesome. That should make life easier then.

It looks like the version of UPB got upgraded with the unknown field change, and so you now have things like upb_sethasbit and upb_readhasbit. That should make it a lot easier and let you clean up the code. A lot of the issues I had were that I had to reimplement them to match what UPB did.

Don't know if I can use upb_sethasbit or upb_readhasbit without switching to using UPB for message layout and the in-memory representation of a message. And that to me sounds like more work that this PR is intended for.

Between, @haberman is the eventual plan to use UPB for message layout and the in-memory representation?

hrsht · 2018-06-21T18:01:59Z

@haberman ping!

xfxyjwf · 2018-06-21T19:46:59Z

@hrsht Can you create a new issue and/or pull request for this?

hrsht · 2018-06-22T17:17:16Z

@xfxyjwf created #4816. Thanks!

googlebot added the cla: no label Sep 7, 2017

zanker commented Sep 7, 2017

View reviewed changes

zanker force-pushed the zanker/syntax2 branch from a578920 to b9ccb98 Compare September 7, 2017 16:18

googlebot added cla: yes and removed cla: no labels Sep 7, 2017

xfxyjwf requested review from haberman and TeBoring September 7, 2017 18:22

zanker force-pushed the zanker/syntax2 branch 4 times, most recently from 1bc43f3 to 4b57eca Compare September 15, 2017 15:48

Zachary Anker added 4 commits September 25, 2017 08:55

Added support for proto2 syntax in addition to proto3 syntax

e42bde3

Include empty map/repeated fields as being default values

d391deb

Rename field descriptor has to has? and add clearing

bbb6e86

Set FileDescriptor for generated Ruby protos

af38326

zanker force-pushed the zanker/syntax2 branch from 4b57eca to af38326 Compare September 25, 2017 17:06

haberman reviewed Oct 9, 2017

View reviewed changes

zanker closed this Nov 20, 2017

zanker mentioned this pull request Feb 1, 2018

No way to use custom options in Ruby #1198

Closed

xfxyjwf added ruby proto2 labels Jun 13, 2018

hrsht reviewed Jun 15, 2018

View reviewed changes

hrsht mentioned this pull request Jun 22, 2018

Basic Proto2 support for Ruby gem #4816

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add proto2 default support to Ruby gem #3606

Add proto2 default support to Ruby gem #3606

zanker commented Sep 7, 2017 •

edited

grpc-kokoro commented Sep 7, 2017

grpc-kokoro commented Sep 7, 2017

bazel-io commented Sep 7, 2017

googlebot commented Sep 7, 2017

zanker Sep 7, 2017 •

edited

zanker commented Sep 7, 2017

zanker commented Sep 7, 2017

googlebot commented Sep 7, 2017

haberman commented Sep 20, 2017

haberman commented Sep 20, 2017

zanker commented Sep 25, 2017

haberman Oct 9, 2017

zanker Oct 9, 2017 •

edited

haberman Oct 10, 2017

zanker Oct 10, 2017

haberman Oct 10, 2017

haberman commented Oct 9, 2017

zanker commented Nov 20, 2017

djudd-stripe commented Nov 20, 2017

djudd-stripe commented Nov 20, 2017

hrsht commented Jun 13, 2018

zanker commented Jun 13, 2018 via email

hrsht Jun 15, 2018

zanker Jun 15, 2018

hrsht Jun 15, 2018

zanker Jun 15, 2018

hrsht Jun 19, 2018

zanker Jun 19, 2018

hrsht Jun 21, 2018

hrsht commented Jun 21, 2018

xfxyjwf commented Jun 21, 2018

hrsht commented Jun 22, 2018

		@@ -447,6 +428,22 @@ MessageLayout* create_layout(const upb_msgdef* msgdef) {

		layout->fields = ALLOC_N(MessageField, nfields);

		size_t hasbit = sizeof(void) 8;

Add proto2 default support to Ruby gem #3606

Add proto2 default support to Ruby gem #3606

Conversation

zanker commented Sep 7, 2017 • edited

grpc-kokoro commented Sep 7, 2017

grpc-kokoro commented Sep 7, 2017

bazel-io commented Sep 7, 2017

googlebot commented Sep 7, 2017

zanker Sep 7, 2017 • edited

Choose a reason for hiding this comment

zanker commented Sep 7, 2017

zanker commented Sep 7, 2017

googlebot commented Sep 7, 2017

haberman commented Sep 20, 2017

haberman commented Sep 20, 2017

zanker commented Sep 25, 2017

Choose a reason for hiding this comment

zanker Oct 9, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

haberman commented Oct 9, 2017

zanker commented Nov 20, 2017

djudd-stripe commented Nov 20, 2017

djudd-stripe commented Nov 20, 2017

hrsht commented Jun 13, 2018

zanker commented Jun 13, 2018 via email

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hrsht commented Jun 21, 2018

xfxyjwf commented Jun 21, 2018

hrsht commented Jun 22, 2018

zanker commented Sep 7, 2017 •

edited

zanker Sep 7, 2017 •

edited

zanker Oct 9, 2017 •

edited