Add proto2 default support to Ruby gem #3606
Conversation
Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.
Can one of the admins verify this patch?

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g. …).
```c
if (argc > 1) {
  rb_raise(rb_eArgError, "Expected 0 or 1 argument, got %d", argc);
} else if (argc == 1) {
  if (CLASS_OF(argv[0]) != cFileDescriptor) {
```
Something @eam pointed out: should this be `CLASS_OF(argv[0]) != cFileDescriptor`, or should it be calling the Ruby `rb_obj_is_kind_of` or `rb_obj_is_instance_of` instead? I was mostly mirroring the existing code you had.
Force-pushed from a578920 to b9ccb98.
Oops, forgot to get myself added to our corporate CLA. Will get that done today.

I should be listed as authorized under Square's CLA with Google now.

CLAs look good, thanks!
Force-pushed from 1bc43f3 to 4b57eca.
ok to test
Thanks for the PR! Did you forget some changes to …?
Sorry about the delay, I was out for a few days last week. I wasn't able to find the file to generate those in; I added setting of the syntax to the generator. For setting defaults, should I just mimic what it does now? Attempt to generate a proto2 file, and if it uses something unsupported like extensions, then error?
Force-pushed from 4b57eca to af38326.
```diff
@@ -451,8 +466,18 @@ bool GenerateFile(const FileDescriptor* file, io::Printer* printer,
    }
  }

  printer->Print("Google::Protobuf::DescriptorPool.generated_pool.build(\n");
  printer->Indent();
  printer->Print("Google::Protobuf::FileDescriptor.new.tap do |f|\n");
```
Maybe the syntax could be a parameter to the constructor? The syntax is a fundamental property of a file, and in an actual .proto file it has to be the first statement.
Making the syntax a constructor parameter and then defining all messages/enums inside the block makes sense to me.
If we are planning to go further down the DSL route, we'll want to add the ability to specify options on files, messages, enums, etc. To do that, we'll need to mirror the structure of actual proto descriptors as closely as possible: files contain messages, and every message belongs to a file.
Seems reasonable, but could you clarify what you mean by "constructor parameter"? So something like (super rough):

```ruby
require 'other_pb.rb'

Google::Protobuf::FileDescriptor.start do
  syntax :proto2
  option :java_package, "com.example.foo"

  pool.build do
    add_message ...
  end
end
```

Is that what you're talking about?
I was thinking more along the lines of:

```ruby
Google::Protobuf::DescriptorPool.generated_pool.build do
  add_file(syntax: :proto2) do
    add_message ...
  end
end
```
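To make the shape of this proposal concrete, here is a minimal, self-contained sketch of that builder pattern. `PoolBuilder`, `FileBuilder`, and the way `add_file`/`add_message` are implemented here are illustrative stand-ins, not the actual gem API:

```ruby
# Stand-in for the file-level builder: receives calls made inside the
# add_file block and records messages on the enclosing file.
class FileBuilder
  def initialize(file)
    @file = file
  end

  def add_message(name)
    @file[:messages] << name
  end
end

# Stand-in for the pool builder: syntax is fixed at file-creation time,
# and messages are defined inside the block, mirroring the idea that
# "files contain messages, and every message belongs to a file".
class PoolBuilder
  attr_reader :files

  def initialize
    @files = []
  end

  def add_file(syntax:, &block)
    file = { syntax: syntax, messages: [] }
    @files << file
    FileBuilder.new(file).instance_eval(&block) if block
    file
  end
end

pool = PoolBuilder.new
pool.add_file(syntax: :proto2) do
  add_message "pkg.M"
end

pool.files.first[:syntax]    # => :proto2
pool.files.first[:messages]  # => ["pkg.M"]
```

The key design property is that the syntax is supplied before any message can be added, which is exactly the ordering constraint the real `.proto` format enforces.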
I am not exactly sure yet how we should specify options. The problem is that options can get really tricky. You can define custom options that use your own types. But even weirder, you can use the option on the same type that is defining the option:
syntax = "proto2";
import "google/protobuf/descriptor.proto";
package pkg;
message M {
// This will become an option on message M, but its value is also
// an instance of the M message. This is really tricky because it
// implies that you need to be able to create an M message while
// you are defining it.
//
// The protobuf library generally deals with this by encoding options
// as serialized protos, and only parsing them once the message
// types are defined.
option (pkg.m) = {val: 999};
optional int32 val = 1;
}
extend google.protobuf.MessageOptions {
optional M m = 1234567;
}
Whatever path we go down with options, I want to make sure it can accommodate the trickiest cases, like the above. This is one of the reasons the DSL worries me: it's easy to design yourself into a corner if you're not careful.
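The "encode options as serialized protos, parse them later" approach mentioned above can be sketched in miniature. `DeferredOptions` and the decoder lambda below are illustrative only; a real implementation would decode with the descriptor pool once all types and extensions exist:

```ruby
# Phase 1 records options as opaque byte strings while types are still
# being defined; phase 2 decodes them once every type exists.
class DeferredOptions
  def initialize
    @raw = {}      # target name => serialized option bytes
    @decoded = {}  # target name => parsed options
  end

  # Phase 1: just capture the bytes, no parsing yet.
  def record(target, bytes)
    @raw[target] = bytes
  end

  # Phase 2: decode everything with a parser that can now see all
  # registered types and extensions.
  def resolve!(decoder)
    @raw.each { |target, bytes| @decoded[target] = decoder.call(bytes) }
    @raw.clear
  end

  def [](target)
    @decoded[target]
  end
end

opts = DeferredOptions.new
# 0x08 = tag for field 1 (varint); 0xE7 0x07 = varint encoding of 999,
# i.e. the wire bytes for {val: 999} in the example above.
opts.record("pkg.M", "\x08\xE7\x07".b)
opts.resolve!(->(bytes) { bytes.bytes })  # stand-in "parser"
opts["pkg.M"]  # => [8, 231, 7]
```

This two-phase structure is what lets the self-referential case work: by the time any option bytes are parsed, the message type they refer to is fully defined.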
Gotcha, that is fun to deal with. Do you have an example somewhere of what it looks like in another language?
You can try compiling my snippet above into other languages to see how it is handled. Python might be a good model since it's similar to Ruby. Python defines the types, then registers extensions (including custom options), then sets options again. The options are specified as a serialized binary string, which will find any extensions that are registered:
```python
_M = _descriptor.Descriptor(
  name='M',
  full_name='pkg.M',
  filename=None,
  file=DESCRIPTOR,
  containing_type=None,
  fields=[
    _descriptor.FieldDescriptor(
      name='val', full_name='pkg.M.val', index=0,
      number=1, type=5, cpp_type=1, label=1,
      has_default_value=False, default_value=0,
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      options=None),
  ],
  extensions=[
  ],
  nested_types=[],
  enum_types=[
  ],
  options=_descriptor._ParseOptions(descriptor_pb2.MessageOptions(), _b('\272\350\332\004\003\010\347\007')),
  is_extendable=False,
  syntax='proto2',
  extension_ranges=[],
  oneofs=[
  ],
  serialized_start=53,
  serialized_end=79,
)

DESCRIPTOR.message_types_by_name['M'] = _M
DESCRIPTOR.extensions_by_name['m'] = m

M = _reflection.GeneratedProtocolMessageType('M', (_message.Message,), dict(
  DESCRIPTOR = _M,
  __module__ = 'test_pb2'
  # @@protoc_insertion_point(class_scope:pkg.M)
  ))
_sym_db.RegisterMessage(M)

m.message_type = _M
google_dot_protobuf_dot_descriptor__pb2.MessageOptions.RegisterExtension(m)

_M.has_options = True
_M._options = _descriptor._ParseOptions(descriptor_pb2.MessageOptions(), _b('\272\350\332\004\003\010\347\007'))
```
This approach could work for Ruby. We should freeze the descriptor after we set options, so users don't try to add or mutate options on the descriptor after it has been built.
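The freeze-after-build idea can be sketched with Ruby's built-in `Object#freeze`. `FrozenDescriptor` and `finish_build!` below are hypothetical stand-ins, not the gem's `Descriptor` class:

```ruby
# Options are applied exactly once at the end of the build; afterwards
# both the descriptor and its options hash reject all mutation.
class FrozenDescriptor
  attr_reader :name, :options

  def initialize(name)
    @name = name
    @options = {}
  end

  def finish_build!(options)
    @options = @options.merge(options).freeze  # seal the options hash
    freeze                                     # seal the descriptor itself
    self
  end
end

d = FrozenDescriptor.new("pkg.M").finish_build!(java_package: "com.example.foo")
d.options[:java_package]  # => "com.example.foo"
d.frozen?                 # => true

begin
  d.options[:late] = true  # raises FrozenError: options are sealed
rescue FrozenError
  # mutation after the build is rejected, as desired
end
```

Note that freezing the descriptor object alone is not enough: the options hash must be frozen separately, since `freeze` is shallow.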
Sorry about the delay from me. There are some merge conflicts to resolve, but overall I think this is a promising approach. The biggest thing to nail down is how to express this in the DSL; I think we should mirror the structure of proto descriptors as closely as possible.
Sorry for the delay, I've been on vacation. I left Square, and won't be able to continue the work on this PR. When I talked to @nerdrew last, he was going to potentially take a look at it, but I'm not sure if that's still the case. Sorry about that, and thanks for your help on the PR. If someone else ends up wanting to take it over the finish line, I'm happy to help walk them through the PR and how things are organized.
FWIW, I am unlikely to be able to take this up; we have a workaround that involves parsing protoc's descriptor_set_out to get annotation values, so I can't justify a lot of time on it. Also, my C is pretty rusty. Happy to provide more details on our workaround to anyone interested, though.
Also, @zanker thanks for updating us! |
@haberman I am going to take over this work of adding proto2 support for Ruby from @zanker and hopefully see it to completion. I plan on adding support for extensions and options in the coming weeks. Meanwhile, do you think this PR is more or less ready to be merged for basic proto2 syntax support (except, of course, for changes in `ruby_generator.cc` to not error out for proto2 files)? If so, I can send out a new PR with all of the changes here, plus a couple bug fixes and the required changes in `ruby_generator.cc`. Please let me know. Thanks! cc: @Yurokle
🎉🎉
```diff
@@ -447,6 +428,22 @@ MessageLayout* create_layout(const upb_msgdef* msgdef) {

  layout->fields = ALLOC_N(MessageField, nfields);

  size_t hasbit = sizeof(void*) * 8;
```
@zanker do you remember the reason for setting this to `sizeof(void*) * 8` and not 0?
I thought it had to do with pointer alignment, but apparently past me was smart and wrote a helpful comment. From encode_decode.c:

```c
// The offset we pass to UPB points to the start of the Message, rather than
// the start of where our data is stored, which works out to the size of a
// pointer. Since we can't pass arbitrary pointers to UPB, we bump it by the
// size * 8, which lets us reuse the UPB writers for primitives.
int32_t hasbit = hasbit_off == MESSAGE_FIELD_NO_HASBIT ? -1 : hasbit_off + sizeof(void*) * 8;
```

The general gist of the presence checking is that it's a bit hackish, because I'm trying to reuse as much of the existing UPB code as possible. There are definitely nicer ways to do it, but you end up reimplementing more code.
Well, starting at `sizeof(void*) * 8` sets the first hasbit at offset 64, which will allocate the 9th byte in the data part of `MessageHeader`. So effectively, there is no difference whether you start from 0, 64, or any other value.

As for the code

```c
int32_t hasbit = hasbit_off == MESSAGE_FIELD_NO_HASBIT ? -1 : hasbit_off + sizeof(void*) * 8;
```

it should be `sizeof(MessageHeader)` instead of `sizeof(void*)`, as you want the offset to the data part of the message type and to skip the number of bytes needed for `MessageHeader`.
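The arithmetic behind this exchange can be made explicit. A hasbit `h` occupies bit `h % 8` of byte `h / 8`, so starting the first hasbit at `sizeof(void*) * 8` (64 on a 64-bit platform, an assumption of this sketch) simply places it at byte 8 of the layout, immediately past an 8-byte pointer:

```ruby
POINTER_SIZE = 8  # bytes; assumes a 64-bit platform

# Map a hasbit index to [byte offset, bit index within that byte].
def hasbit_location(hasbit)
  [hasbit / 8, hasbit % 8]
end

hasbit_location(0)                 # => [0, 0]  starting from zero
hasbit_location(POINTER_SIZE * 8)  # => [8, 0]  starting from sizeof(void*) * 8
```

Since the divide/modulo pair is applied uniformly to every hasbit, any constant starting offset works, which is the point being made above.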
If you're able to get it to work by simplifying it, then I would definitely do so. You should also add some tests that encode and decode a message, then check the `has_foo?` fields to verify. The main thing is that it's all set up to align to the offsets used by https://github.com/zanker/protobuf-1/blob/af38326c9a7ceab44b148a80ed8e1e734821733f/ruby/ext/google/protobuf_c/upb.c#L3282, not by `MessageLayout`.
Ahh, I see. Looks like that has changed in the recent version of UPB: https://github.com/google/protobuf/blob/master/ruby/ext/google/protobuf_c/upb.c#L4311
Ah! Awesome. That should make life easier then.

It looks like the version of UPB got upgraded with the unknown field change, so you now have things like `upb_sethasbit` and `upb_readhasbit`. That should make it a lot easier and let you clean up the code. A lot of the issues I had were that I had to reimplement them to match what UPB did.
I don't know if I can use `upb_sethasbit` or `upb_readhasbit` without switching to using UPB for message layout and the in-memory representation of a message. And that, to me, sounds like more work than this PR is intended for.

By the way, @haberman, is the eventual plan to use UPB for message layout and the in-memory representation?
@haberman ping!
@hrsht Can you create a new issue and/or pull request for this?
(Sorry, this is going to be long!)

I'm happy to split the PR up into multiple PRs, but wasn't really sure what your preferences were on this. I intentionally didn't update the proto generator, since this only supports MRI and not the Java side too. I can add the Java one into this PR, or do that after.

One of the main problems I ran into was API compatibility. You need to have the syntax set before you do anything, because things like defaults or maps are set when messages are added initially, and a lot of it is easier to gate at add time rather than doing post-add validation checks (like is done with enums).

I opted to make `FileDescriptor` an optional argument you pass to `DescriptorPool`, and that gets automatically added to anything you attach to the pool afterwards. If you don't pass a `FileDescriptor`, it sets the syntax to proto3, as that's the only syntax currently supported when generating, so it maintains backwards compatibility.

It means `Descriptor` and `EnumDescriptor` end up with breaking changes, because they require a `FileDescriptor` to be passed. Is instantiating these directly a supported API? Same with `FieldDescriptor`: ideally it would be a required init argument so things like `FieldDescriptor_default_set` and `FieldDescriptor_default` can be gated by syntax. As it stands, it seems you can't access the parent from `FieldDescriptor` until after it's been added.

For setting defaults for enums, you have to use an int, because we can't guarantee an enum is added to the pool before we set our default. The only way I can see supporting adding-by-name is if we do it after the pool is initialized and we add enums first, but I didn't want to bypass UPB for only default setting. Might be something that UPB3 makes easier, @haberman?

While right now you can only set `default` for `add_field`, I made it a hash rather than a single argument to give us the ability to add the other extension type fields without having to make more API changes.

The way hasbits are handled feels a little janky, but I was trying to reuse the existing UPB MSG_WRITERS rather than just copy/pasting the code and storing the offsets as-is. One thing @nerdrew noticed is that `upb_msg_has` seems to have a bug? It does `DEREF(msg, hasbit / 8, char) | (1 << (hasbit % 8))`, which will always end up being true; it looks like this should be an AND, not an OR. The method is never used, as far as I could tell.

For hasbits, I made it so we always use them (even in proto3), but they're gated behind `upb_fielddef_haspresence`, so under proto3 they're only used for submessages. One semantic I'm not sure how you want to handle is unsetting a field, though.

I added a `clear_<field>` method which unsets the hasbit, but it means doing `<field> = nil` doesn't unset it. If I remember right, Java also doesn't unset it on assignment (you have to call clear explicitly). One option is to make `<field> = nil` equivalent to `clear_<field>`. That feels like the better way of handling it in Ruby, but I wanted to get your opinion on it.

For tests, I just added a `basic_proto2.rb` one. It looks like the proto2 tests for the other libraries are handled via compatibility_tests, copying the old version to preserve it historically. Since there wasn't a prior version that supported proto2, I just opted to inline another test. I can follow that pattern and have a fake v2 version under it if you want.

(cc @nerdrew @djudd-stripe)