-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
oneof: generate better code #168
Comments
So each of the Value types will have implemented methods like so func (v Value_Str) MarshalTo([]byte) (int, error) {
...
} Looks pretty slick to me :) Maybe change the example a little
Just so we can discriminate between the message struct and oneof value struct. |
Right :) The last example is a bit different, it will require a Value struct to be separated from OneValue/TwoValue interfaces. In the first example it is possible to inline Value struct serialization into each oneof sub type MarshalTo. |
I don't know why protobuf spec decided to not just make oneof a type of message, but here we are. |
We should link this issue #106 |
@awalterschulze How hard it would be to implement this? It seems like it will add at least two additional parameters for generators (for both examples to work), and some check to detect if oneof is an only field in the message. Is there something else? @stevvooe Maybe you have some other ideas to incorporate? |
@tamird what do you think? I think we could have an extension. gogoproto.onlyoneof This extension will give an error when applied to a message with more than one oneof or in fact any other fields that are not in the one oneof. |
Its not little effort, basically all the plugins are going to be touched and the proto/encode.go and proto/decode.go files are also going to see some action. |
At a glance, I don't see the point. This change would not improve performance (since you still need an interface and so you're allocating, like it or not), and it doesn't really improve ergonomics either (if you want to improve ergonomics, a better alternative is to bring This proposal would make the generated code slightly prettier, but who cares? |
Ok and there we have the added restriction of the types of the fields that have to be different and then we are back at #103 which I think is probably still the easiest to implement. |
@tamird I thought that this fork is all about making the generated code faster AND prettier at the same time :) &RequestUnion_EndTransaction{&EndTransactionRequest{}} When I propose to do just this: EndTransactionRequest{} You may see two things here:
Even further, if performance of both cases is the same, why not to use more concise variant? Instead of generating setter/getter pair you can directly set or get your interface type. |
Making the code prettier seems to be a nice-to-have, and in this case, it's going to be a lot of work. Generating convenient setters and getters is going to be a much easier solution to the ergonomics problem, and addressing the inner allocation can also be done without nearly as much rejiggering as you're suggesting. |
Ah, that upstream issue already exists in this repo as well: #106 |
Speed is gained by being able to use the struct that you want (pretty code) to use, since then you don't have to copy between the struct you want to use and the one you have to use. But it will definitely be much less work to add some methods and even change the message pointer thing. |
I will try to implement it regardless of the amount of work needed - it's will make things much easier, as it seems to cover all three issues discussed. But as the getter/setter approach can be done significantly faster we could do it as a separate extension at the first place. What do you think? Or it may wait some time for me to implement another approach? |
@dennwc I haven't thought this one through. Under usage of In general, my approach here would be to work it back from the desired usage. As far as I see it, the problems come down from the following:
In both cases, it is clear to me that the issue is the intermediary types, both for usability and performance.
There are some details to work out but this the direction I would explore. I am not sure about #103. Restrictions that affect data structure design for the target generation language are problematic. @tamird In general, I disagree with your dismissal of aesthetics and ergonomics. Ergonomics/usability/aesthetics are a fantastic tool when the performant, correctness and pretty overlap. ;)
@tamird You are not alone in receiving this style of response. |
This change would not reduce allocations; the thing you're setting is an interface type, so it's incurring an allocation whether it's a value or a pointer. Implementing interfaces on values is less type safe than implementing them on pointers, though.
Are they problematic for other reasons? What other reasons? I think this is probably exactly the reason for this API being what it is. Your suggestions for working around that are probably doable, but I'd be against that level of complexity in the interface. Also, I haven't dismissed ergonomics, only suggested an alternative. But yes, I stand by my rejection of prettification of generated code. |
With a value literal that gets immediately set to a field, we avoid the allocation (I may be wrong). Is there not an extra indirection when the union values have pointers?
As I said, I haven't really thought this through, so I am not sure. Indeed, this is probably the reasoning behind solution arrived at. Another alternative may be to generate helper functions that look up the correct type for literals: &Foo{
OneOfField: OneOfFieldValue(&AnotherType{}),
} That solves the literal problem. Perhaps, we can solve the type switch problem similarly:
The value and unpack functions just get generated to assist in hiding the interim type.
We also end up generating fewer new types. In cases where external types are, the user simply has to cast. Yes, it is complex in that we have to handle package internal and external types differently, but we already have some complexity. From this perspective, I agree that my suggestion is non-ideal.
The most common reaction I get when trying to convince others to adopt protobufs is that it is "ugly". It was my original reaction, as well. It is not rational or logical, but it is a hurdle. |
This is correct, but irrelevant. You cannot set a value to a field here because you need polymorphism.
Are you talking about the inner pointers or the outer? If you're talking about the inner then I agree, and that is #106. |
The concrete union type does not need to be a pointer and provides polymorphism. I think we're on the same page, however. |
So it seems that in all your examples there are some common themes.
With a message conforming to:
Sorry this might be a bit of tangent on your discussion, but to me this makes the options we have clearer. |
@awalterschulze Part of the problem here (and with my suggestions) is that After thinking about this, I'm wondering if, perhaps, the current solution is fine but we just need better naming generation and documentation around they map. Let's take the following: message Foo {
oneof value {
string a = 1;
string b = 2;
}
} The above generates the following: type Foo struct {
// Types that are valid to be assigned to Value:
// *Foo_A
// *Foo_B
Value isFoo_Value `protobuf_oneof:"value"`
}
func (m *Foo) Reset() { *m = Foo{} }
func (m *Foo) String() string { return proto.CompactTextString(m) }
func (*Foo) ProtoMessage() {}
type isFoo_Value interface {
isFoo_Value()
}
type Foo_A struct {
A string `protobuf:"bytes,1,opt,name=a,proto3,oneof"`
}
type Foo_B struct {
B string `protobuf:"bytes,2,opt,name=b,proto3,oneof"`
}
func (*Foo_A) isFoo_Value() {}
func (*Foo_B) isFoo_Value() {} We end up getting Furthermore, I think the original proposal from @dennwc, of using type declarations and casts, is more than sufficient to remove the pointer lookup situation. We still need to generate a type for each field name, but it will remove some overhead of having to do struct literals and have an extra indirection. |
Yes FooValue would be a much better naming scheme. Interfaces do add another pointer, but is one of the few ways of mapping the oneof union typing without breaking anything or making any assumptions. @DenWC solution requires assumptions 1 and 2. Namely:
Personally I find myself having assumptions 1 and 3 typically, but its then easy to get to 1,2 and 3 by simply adding the extra fields in a wrapping message. So yes oneof is not a type union, but what I am asking is whether that is the typical usecase.
|
Here is another solution that only makes assumption 3 in the Set methods. message Foo {
oneof value {
string a = 1;
string b = 2;
}
oneof value2 {
string c = 3;
string d = 4;
}
} type Foo struct {
Value Value
Value2 Value2
}
type Value struct {
A *string
B *string
}
func (this *Foo) GetValue() interface{} {
if this.A != nil {
return *this.A
}
if this.B != nil {
return *this.B
}
return nil
}
func (this *Foo) SetValue(interface{}) {
... type switch
}
type Value2 struct {
C *string
D *string
}
func (this *Foo) GetValue2() interface{} {
if this.C != nil {
return *this.C
}
if this.D != nil {
return *this.D
}
return nil
}
func (this *Foo) SetValue2(interface{}) {
... type switch
} |
I think It's a bit confusing, because in this case you can actually set both fields. The second point is that GetValue will return string type for both A and B field and it cannot be checked whatever it comes from field A or B. This might be safer: type Foo struct {
Value Value
Value2 Value2
}
type Value interface{
isFooValue()
}
type FooValueA string
func (FooValueA) isFooValue(){}
type FooValueB string
func (FooValueB) isFooValue(){}
// same for Value2 |
Ok sorry my bug. message Foo {
oneof value {
string a = 1;
int64 b = 2;
}
oneof value2 {
string c = 3;
int64 d = 4;
}
} type Foo struct {
Value Value
Value2 Value2
}
type Value struct {
A *string
B *int64
}
func (this *Foo) GetValue() interface{} {
if this.A != nil {
return *this.A
}
if this.B != nil {
return *this.B
}
return nil
}
func (this *Foo) SetValue(interface{}) {
... type switch
}
type Value2 struct {
C *string
D *int64
}
func (this *Foo) GetValue2() interface{} {
if this.C != nil {
return *this.C
}
if this.D != nil {
return *this.D
}
return nil
}
func (this *Foo) SetValue2(interface{}) {
... type switch
} |
Yes you can set both values, but this can give an error when marshaling. |
Would it be faster? To give a context, we have a use case for goprotobuf when generated structures are exposed as part of our API. In this case other devs not familiar with protobuf will see both fields in oneof structure and they might try to set each of them. The problem is that the code will actually compile, but will lead to a runtime error later. I think this should be checked in compile time with the union interface. |
I am not particularly sure if it would be faster, probably not. Good to know about your use case, this is what I am after. |
oneof is part of protoc 2.6.1 I don't see how the extra constraint of having the field named as 'value' gives us anything in return. Would #106 also solve your problem? |
Having the field be named "value" isn't a required constraint, but just a hint that we can use to tell the Protoc compiler to produce the following, instead of what's written in #168, when all three conditions are met. (Well, it's a good constraint because it lets the coder opt into the following behavior)
#106 would solve the potential memory/GC performance issue, which maybe is 50% of the problem. The other problem is that the indirection is arguably needless, and it creates ugly code. |
I agree the current implementation is slow and ugly. |
I'm working on it right now. There are few problems, mostly reflect-related (jsonpb, GetProperties, etc). Will see if they can be omitted. |
Excited to see progress :) |
I'd love to see some progress on this. One big problem (perhaps already mentioned in this thread, I didn't read all the comments super carefully) with the current approach is that there's no easy way to wrap or unwrap a In order to get at the wrapped value, you have to write a big switch like the commit that @tamird mentions, and be careful to keep it in sync with the generated types. You can probably hack up some generic reflection-based get/set functions, but it's not going to be pretty, or fast, or user-friendly. I really like the approach suggested by the OP, and hope this is what gets implemented. |
Sorry, was a bit distracted by work matters. I will continue to work on this soon. |
@atombender Who or what is OP? @dennwc I am still excited to see what comes out. These things are typically more work than they seem so good luck. |
OP is "original poster", meaning @dennwc. |
What's the status, @dennwc, and what approach are you taking? |
A lot of learning has happened since my comment since, and I've come to realize that what we really need is actually protobuf4. Here's our approach: https://github.com/tendermint/go-amino Amino removes "oneof" and supports something called "interfaces" natively. It also has a ton of other features. Please check it out, we're looking for constructive feedback (or support)! |
@jaekwon Looks promising, and I wish this could be used as the direction for Protobuf4, but you're going to be fighting an uphill battle here. Much of the point of using Protobuf is the industry support for it, not just the technical aspects. I, and the companies I work for, would never adopt your project, simply because it doesn't have a major player behind it that can ensure that there's wide language support, continued maintenance, evangelism, and so on. |
@jaekwon If you do make a new serialization format, please make sure to investigate faster varint encodings. There are quite a few variants and they have interesting tradeoffs. |
For example |
…chResponse All Requests and Responses pass through RequestUnion/ResponseUnion structs when they are added to BatchRequests/BatchResponses. In order to ensure that only one Request type can be assigned to one of these RequestUnion or ResponseUnion structs, we currently use gogoproto's approach to tagged unions: the `gogoproto.onlyone` option. This option was introduced before proto3. Proto3 then added the `oneof` option, which for all intents and purposes addresses the same issue: https://developers.google.com/protocol-buffers/docs/proto#oneof. However, there is one major difference between the two options, which is in their generated code. `gogoproto.onlyone` will generate a single flat struct with pointers to each possible variant type. `oneof` will generate a union interface and an interface "wrapper" struct for each variant type. The effect of this is that `onlyone` will generate code that looks like this: ``` type Union struct { Variant1 *Variant1Type Variant2 *Variant2Type ... } ``` While `oneof` will generate code the looks like this: ``` type Union struct { Value isUnion_Value } type isUnion_Value interface { ... } type Union_Variant1 struct { Variant1 *Variant1Type } type Union_Variant2 struct { Variant2 *Variant2Type } ``` There are pretty obvious tradeoffs to each. For one, `oneof` introduces an extra layer of indirection, which forces an extra allocation. It also doesn't generate particularly useful setters and getters. On the other hand, `onlyone` creates a large struct that grows linearly with the number of variants. Neither approach is ideal, and there has been **A LOT** of discussion on this: - golang/protobuf#78 - golang/protobuf#283 - gogo/protobuf#103 - gogo/protobuf#168 Clearly neither approach is ideal, ergonomically or with regard to performance. However, over time, the tradeoff has been getting worse for us and its time we consider switching over to `oneof` in `RequestUnion` and `ResponseUnion`. These structs have gotten huge as more and more request variants have been added: `RequestUnion` has grown to 328 bytes and `ResponseUnion` has grown to 320 bytes. It has gotten to the point where the wasted space is non-negligible. This change switches over to `oneof` to shrink these union structs down to more manageable sizes (16 bytes). The downside of this is that in reducing the struct size we end up introducing an extra allocation. This isn't great, but we can avoid the extra allocation in some places (like `BatchRequest.CreateReply`) by grouping the allocation with that of the Request/Response itself. We've seen previous cases like cockroachdb#4216 where adding in an extra allocation/indirection is a net-win if it reduces a commonly used struct's size significantly. The other downside to this change is that the ergonomics of `oneof` aren't quite as nice as `gogo.onlyone`. Specifically, `gogo.onlyone` generates getters and setters called `GetValue` and `SetValue` that provide access to the wrapped `interface{}`, which we can assert to a `Request`. `oneof` doesn't provide such facilities. This was the cause of a lot of the discussions linked above. While this isn't ideal, I think we've waited long enough (~3 years) for a resolution on those discussions. For now, we'll just generate the getters and setters ourselves. This change demonstrated about a 5% improvement when running kv95 on my local laptop. When run on a three-node GCE cluster (4 vCPUs), the improvements were less pronounced but still present. kv95 showed a throughput improvement of 2.4%. Running kv100 showed an even more dramatic improvement of 18% on the GCE cluster. I think this is because kv100 is essentially a hot loop where all reads miss because the cluster remains empty, so it's the best case for this change. Still, the impact was shocking. Release note (performance improvement): Reduce the memory size of commonly used Request and Response objects.
…chResponse All Requests and Responses pass through RequestUnion/ResponseUnion structs when they are added to BatchRequests/BatchResponses. In order to ensure that only one Request type can be assigned to one of these RequestUnion or ResponseUnion structs, we currently use gogoproto's approach to tagged unions: the `gogoproto.onlyone` option. This option was introduced before proto3. Proto3 then added the `oneof` option, which for all intents and purposes addresses the same issue: https://developers.google.com/protocol-buffers/docs/proto#oneof. However, there is one major difference between the two options, which is in their generated code. `gogoproto.onlyone` will generate a single flat struct with pointers to each possible variant type. `oneof` will generate a union interface and an interface "wrapper" struct for each variant type. The effect of this is that `onlyone` will generate code that looks like this: ``` type Union struct { Variant1 *Variant1Type Variant2 *Variant2Type ... } ``` While `oneof` will generate code the looks like this: ``` type Union struct { Value isUnion_Value } type isUnion_Value interface { ... } type Union_Variant1 struct { Variant1 *Variant1Type } type Union_Variant2 struct { Variant2 *Variant2Type } ``` There are pretty obvious tradeoffs to each. For one, `oneof` introduces an extra layer of indirection, which forces an extra allocation. It also doesn't generate particularly useful setters and getters. On the other hand, `onlyone` creates a large struct that grows linearly with the number of variants. Neither approach is ideal, and there has been **A LOT** of discussion on this: - golang/protobuf#78 - golang/protobuf#283 - gogo/protobuf#103 - gogo/protobuf#168 Clearly neither approach is ideal, ergonomically or with regard to performance. However, over time, the tradeoff has been getting worse for us and its time we consider switching over to `oneof` in `RequestUnion` and `ResponseUnion`. These structs have gotten huge as more and more request variants have been added: `RequestUnion` has grown to 328 bytes and `ResponseUnion` has grown to 320 bytes. It has gotten to the point where the wasted space is non-negligible. This change switches over to `oneof` to shrink these union structs down to more manageable sizes (16 bytes). The downside of this is that in reducing the struct size we end up introducing an extra allocation. This isn't great, but we can avoid the extra allocation in some places (like `BatchRequest.CreateReply`) by grouping the allocation with that of the Request/Response itself. We've seen previous cases like cockroachdb#4216 where adding in an extra allocation/indirection is a net-win if it reduces a commonly used struct's size significantly. The other downside to this change is that the ergonomics of `oneof` aren't quite as nice as `gogo.onlyone`. Specifically, `gogo.onlyone` generates getters and setters called `GetValue` and `SetValue` that provide access to the wrapped `interface{}`, which we can assert to a `Request`. `oneof` doesn't provide such facilities. This was the cause of a lot of the discussions linked above. While this isn't ideal, I think we've waited long enough (~3 years) for a resolution on those discussions. For now, we'll just generate the getters and setters ourselves. This change demonstrated about a 5% improvement when running kv95 on my local laptop. When run on a three-node GCE cluster (4 vCPUs), the improvements were less pronounced but still present. kv95 showed a throughput improvement of 2.4%. Running kv100 showed an even more dramatic improvement of 18% on the GCE cluster. I think this is because kv100 is essentially a hot loop where all reads miss because the cluster remains empty, so it's the best case for this change. Still, the impact was shocking. Release note (performance improvement): Reduce the memory size of commonly used Request and Response objects.
27112: roachpb: replace `gogoproto.onlyone` with `oneof` in BatchRequest/BatchResponse r=nvanbenschoten a=nvanbenschoten All Requests and Responses pass through RequestUnion/ResponseUnion structs when they are added to BatchRequests/BatchResponses. In order to ensure that only one Request type can be assigned to one of these RequestUnion or ResponseUnion structs, we currently use gogoproto's approach to tagged unions: the `gogoproto.onlyone` option. This option was introduced before proto3. Proto3 then added the `oneof` option, which for all intents and purposes addresses the same issue: https://developers.google.com/protocol-buffers/docs/proto#oneof. However, there is one major difference between the two options, which is in their generated code. `gogoproto.onlyone` will generate a single flat struct with pointers to each possible variant type. `oneof` will generate a union interface and an interface "wrapper" struct for each variant type. The effect of this is that `onlyone` will generate code that looks like this: ``` type Union struct { Variant1 *Variant1Type Variant2 *Variant2Type ... } ``` While `oneof` will generate code the looks like this: ``` type Union struct { Value isUnion_Value } type isUnion_Value interface { ... } type Union_Variant1 struct { Variant1 *Variant1Type } type Union_Variant2 struct { Variant2 *Variant2Type } ``` There are pretty obvious tradeoffs to each. For one, `oneof` introduces an extra layer of indirection, which forces an extra allocation. It also doesn't generate particularly useful setters and getters. On the other hand, `onlyone` creates a large struct that grows linearly with the number of variants. Neither approach is great, and there has been **A LOT** of discussion on this: - golang/protobuf#78 - golang/protobuf#283 - gogo/protobuf#103 - gogo/protobuf#168 Clearly neither approach is ideal, ergonomically or with regard to performance. However, over time, the tradeoff has been getting worse for us and it's time we consider switching over to `oneof` in `RequestUnion` and `ResponseUnion`. These structs have gotten huge as more and more request variants have been added: `RequestUnion` has grown to **328 bytes** and `ResponseUnion` has grown to **320 bytes**. It has gotten to the point where the wasted space is non-negligible. This change switches over to `oneof` to shrink these union structs down to more manageable sizes (16 bytes each). The downside of this is that in reducing the struct size we end up introducing an extra allocation. This isn't great, but we can avoid the extra allocation in some places (like `BatchRequest.CreateReply`) by grouping the allocation with that of the Request/Response itself. We've seen previous cases like #4216 where adding in an extra allocation/indirection is a net-win if it reduces a commonly used struct's size significantly. The other downside to this change is that the ergonomics of `oneof` aren't quite as nice as `gogo.onlyone`. Specifically, `gogo.onlyone` generates getters and setters called `GetValue` and `SetValue` that provide access to the wrapped `interface{}`, which we can assert to a `Request`. `oneof` doesn't provide such facilities. This was the cause of a lot of the discussions linked above. While it we be nice for this to be resolved upstream, I think we've waited long enough (~3 years) for a resolution to those discussions. For now, we'll just generate the getters and setters ourselves. This change demonstrated about a **5%** improvement when running kv95 on my local laptop. When run on a three-node GCE cluster (4 vCPUs), the improvements were less pronounced but still present. kv95 showed a throughput improvement of **2.4%**. Running kv100 showed a much more dramatic improvement of **18%** on the three-node GCE cluster. I think this is because kv100 is essentially a hot loop where all reads miss because the cluster remains empty, so it's the best-case scenario for this change. Still, the impact was shocking. Release note (performance improvement): Reduce the memory size of commonly used Request and Response objects. 27114: opt/sql: fix explain analyze missing option r=asubiotto a=asubiotto ConstructExplain previously ignored the ANALYZE option so any EXPLAIN ANALYZE statement would result in execution as an EXPLAIN (DISTSQL) statement. The ANALYZE option is now observed in ConstructExplain. Additionally, the stmtType field from the explainDistSQLNode has been removed because it was not necessary and it was unclear how to pass this from the `execFactory`. Release note: None 27116: Makefile: learn that roachtest depends on optimizer-generated code r=benesch a=benesch As mentioned in cd4415c, the Makefile will one day be smart enough to deduce this on its own, but for now it's simpler to explicitly list the commands that require generated code. Note that the simple but coarse solution of assuming that all commands depend on generated code is inviable as some of these commands are used to generate the code in the first place. Release note: None 27119: storage: extract replica unlinking into store method r=tschottdorf a=benesch Extract some code that was duplicated in three places into a dedicated helper method. Prerequisite for #27061. Release note: None Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Alfonso Subiotto Marqués <alfonso@cockroachlabs.com> Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
I found that oneof is useful to represent some value interface on the Go side. For example:
This will generate the following code:
Is it possible to make oneof to generate less structures for this case? For example:
The text was updated successfully, but these errors were encountered: