-
Notifications
You must be signed in to change notification settings - Fork 1.4k
feat(bigquery/storage/managedwriter/adapt): add schema -> proto support #4375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(bigquery/storage/managedwriter/adapt): add schema -> proto support #4375
Conversation
TODO: this doesn't deal with nested messages yet.
storagepb.TableFieldSchema_NUMERIC: ".google.protobuf.BytesValue", | ||
storagepb.TableFieldSchema_STRING: ".google.protobuf.StringValue", | ||
storagepb.TableFieldSchema_TIME: ".google.protobuf.Int64Value", | ||
storagepb.TableFieldSchema_TIMESTAMP: ".google.protobuf.Int64Value", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a Timestamp type in proto3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/protocolbuffers/protobuf/tree/master/src/google/protobuf, but this comes down to what the backend accepts for each column type. If the backend will convert timestamp protos, we can use them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we haven't supported it yet.
}, nil | ||
} | ||
// For NULLABLE, we use the wrapper types. | ||
return &descriptorpb.FieldDescriptorProto{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we default nullable to wrapper? Should we annotate the field with use_defaults instead?
https://screenshot.googleplex.com/3kHa8QbkhmgFsuj
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hesitant to include a bunch of internal type annotations for driving behavior here.
-
We have no stable source of truth for them. They're published within the zetasql project, but that project doesn't make any guarantees about stability and has no GA release, so the route to a GA launch is...probably problematic.
-
The reason we're using proto3 semantics is external users live in a proto3 ecosystem. In that world, the wrapper types are how to properly communicate nulls. I don't know that it matters, but type_annotations.proto is still proto2 based, so it's possible it introduces other issues as a side effect. We could consider revisiting this route down the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think frontend supports wrapper to nullable field conversion (we probably should). So currently if you convert these to wrapper, they will try map themselves back as a struct. In order to support such converter, the field needs to be annotated as is_wrapper. I understand that zetasql is not usable, but we can open backdoors for your self-defined annotation. Introducing the default behavior of mapping wrapper to nullable field would be a breaking change now.
Note that our API actually accepts proto2 instead of proto3 as the schema descriptor, inside of the converter, we look at the default_value field, if it is set then we set the default value, if not set, then we set null. You can surely set it on the ProtoSchema explicitly without having to introduce this mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding correctly, you're making users choose between being unable to send nulls, or being unable to send default values (empty string, 0, etc) when communicating the ProtoSchema?
Do you want me to create an issue for supporting wrapper types on the internal converter, or is there one already? An advantage of using the well known wrapper types is to avoid the need for special annotations, since both the client and backend have the definitions in place as part of the whole protocol buffer ecosystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to support the wrapper, we would need the is_wrapper annotation, otherwise it is a breaking change to existing conversions.
As to your first question, we don't have to make the choice if we are using proto2 (yeah, going back to that question). If it is proto 3 then there seems to be no other choices.
you can create an issue to support the wrapper, you are even welcomed to just go ahead and fix it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created b/193064992 for the wrapper issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the final solution would be for zetasql to be published and everything conform with the current googlesql specification (instead of creating our own wheels). If that is not possible, adding some fields to ProtoSchema would be fine, less ideal since it would be per schema instead of per msg/field option.
Since you are applying this to all nullable fields, it would be better for the backend to first support it before you add the wrapper conversion, otherwise, the library would be unusable...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a toggle to suppress generation of wrapper types, so we can revisit this once we have a path forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also looking at the is_wrapper
annotation you mentioned (googlesql MessageOption extension). I don't think that's going to be the right way to do this, as the expectation is that you "own" the message and can annotate it appropriately. These types are provided as part of the standard protocol buffer definitions, so adding annotations seems dodgy. I could see adding a FieldOption extension where you effectively say "the message in this field is a wrapper", but the "this message is a wrapper" annotation seems mismatched for standard types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the annotation should belong to the field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@shollyman Could you support |
This PR adds the ability to generate proto message definitions dynamically based on a table's schema. The new BQ write API communicates data solely through protocol buffers , so this helps enable the various use cases where users show up without a proto message predefined/compiled.
With this, a user can use the dynamic definitions to construct and serialize messages suitable for use in the write client. It also enables other features like json -> dynamic proto -> serialized data.
Towards: #4366
More background: The underlying API expects users to communicate proto schema (in the form of a DescriptorProto). Then we can append rows via streaming RPC where the backend acknowledges the writes.