oneof: generate better code #168

dennwc · 2016-04-20T09:51:59Z

I found that oneof is useful to represent some value interface on the Go side. For example:

message Value {
  oneof value {
    string str = 1;
    int64  int = 2;
    double float_ = 3;
    bool boolean = 4;
    Timestamp time = 5;
  }
}

This will generate the following code:

type Value struct {
    // Types that are valid to be assigned to Value:
    //  *Value_Str
    //  *Value_Int
    //  *Value_Float_
    //  *Value_Boolean
    //  *Value_Time
    Value isValue_Value `protobuf_oneof:"value"`
}

type isValue_Value interface {
    isValue_Value()
    MarshalTo([]byte) (int, error)
    ProtoSize() int
}

type Value_Str struct {
    Str string `protobuf:"bytes,2,opt,name=str,proto3,oneof"`
}
type Value_Int struct {
    Int int64 `protobuf:"varint,7,opt,name=int,proto3,oneof"`
}
type Value_Float_ struct {
    Float_ float64 `protobuf:"fixed64,8,opt,name=float_,proto3,oneof"`
}
type Value_Boolean struct {
    Boolean bool `protobuf:"varint,9,opt,name=boolean,proto3,oneof"`
}
type Value_Time struct {
    Time *Timestamp `protobuf:"bytes,10,opt,name=time,oneof"`
}

Is it possible to make oneof to generate less structures for this case? For example:

// No need to generate isValue_Value interface if struct
// has only one field - the Value is an interface itself.
type Value interface {
    isValue_Value()
    MarshalTo([]byte) (int, error)
    ProtoSize() int
}
// Value_Str can be replaced with custom type instead of the struct.
// This type will have protobuf field id embedded in the serialization code.
type Value_Str string

// Value_Time can be a custom type the same way as Value_Str is.
// Serialization code will write appropriate field id and call MarshalTo on
// original Timestamp object.
type Value_Time Timestamp

awalterschulze · 2016-04-20T11:48:27Z

So each of the Value types will have implemented methods like so

func (v Value_Str) MarshalTo([]byte) (int, error) {
   ...
}

Looks pretty slick to me :)

Maybe change the example a little

message Value {
  oneof onevalue {
    string str = 1;
    int64  int = 2;
    double float_ = 3;
    bool boolean = 4;
    Timestamp time = 5;
  }
  string anotherstr = 6;
  oneof twovalue {
     string str2 = 7;
     int64  int2 = 8;
  }
}

Just so we can discriminate between the message struct and oneof value struct.

dennwc · 2016-04-20T12:29:20Z

Right :) The last example is a bit different, it will require a Value struct to be separated from OneValue/TwoValue interfaces. In the first example it is possible to inline Value struct serialization into each oneof sub type MarshalTo.

awalterschulze · 2016-04-20T12:37:46Z

I don't know why protobuf spec decided to not just make oneof a type of message, but here we are.

awalterschulze · 2016-04-20T13:04:50Z

We should link this issue #106

dennwc · 2016-04-20T13:08:20Z

@awalterschulze How hard it would be to implement this? It seems like it will add at least two additional parameters for generators (for both examples to work), and some check to detect if oneof is an only field in the message. Is there something else?

@stevvooe Maybe you have some other ideas to incorporate?

awalterschulze · 2016-04-20T13:12:04Z

@tamird what do you think?

I think we could have an extension. gogoproto.onlyoneof

This extension will give an error when applied to a message with more than one oneof or in fact any other fields that are not in the one oneof.

awalterschulze · 2016-04-20T13:13:34Z

Its not little effort, basically all the plugins are going to be touched and the proto/encode.go and proto/decode.go files are also going to see some action.

tamird · 2016-04-20T14:03:36Z

At a glance, I don't see the point. This change would not improve performance (since you still need an interface and so you're allocating, like it or not), and it doesn't really improve ergonomics either (if you want to improve ergonomics, a better alternative is to bring gogoproto.onlyone-style getters to oneof, see https://github.com/tamird/cockroach/commit/5abcff4cc8e00de49e3abe06753754fae7f28fa3#diff-060e95a3a2839243f16cfc13837a5913R221).

This proposal would make the generated code slightly prettier, but who cares?

awalterschulze · 2016-04-20T14:12:14Z

Ok and there we have the added restriction of the types of the fields that have to be different and then we are back at #103 which I think is probably still the easiest to implement.

dennwc · 2016-04-20T15:02:24Z

@tamird I thought that this fork is all about making the generated code faster AND prettier at the same time :)
To point the difference, you are doing this:

&RequestUnion_EndTransaction{&EndTransactionRequest{}}

When I propose to do just this:

EndTransactionRequest{}

You may see two things here:

You are generating two separate structures instead of one. This will make some small performance impact in any case.
They are pointers, but you actually may not want them to be pointers in oneof - if field in oneof exists with a certain type it must be valid or it doesn't make sense.

Even further, if performance of both cases is the same, why not to use more concise variant? Instead of generating setter/getter pair you can directly set or get your interface type.

tamird · 2016-04-20T15:08:59Z

OK, but this is negligible.
This is a good point; we can be smarter about the oneof's inner type. I filed this upstream last year but I got the usual "fuck you" response: oneof: generated code uses pointers for non-primitive types golang/protobuf#78.

Making the code prettier seems to be a nice-to-have, and in this case, it's going to be a lot of work. Generating convenient setters and getters is going to be a much easier solution to the ergonomics problem, and addressing the inner allocation can also be done without nearly as much rejiggering as you're suggesting.

tamird · 2016-04-20T15:10:09Z

Ah, that upstream issue already exists in this repo as well: #106

awalterschulze · 2016-04-20T15:41:48Z

Speed is gained by being able to use the struct that you want (pretty code) to use, since then you don't have to copy between the struct you want to use and the one you have to use.

But it will definitely be much less work to add some methods and even change the message pointer thing.

dennwc · 2016-04-20T16:01:19Z

I will try to implement it regardless of the amount of work needed - it's will make things much easier, as it seems to cover all three issues discussed.

But as the getter/setter approach can be done significantly faster we could do it as a separate extension at the first place.

What do you think? Or it may wait some time for me to implement another approach?

stevvooe · 2016-04-20T20:57:46Z

@dennwc I haven't thought this one through. Under usage of oneof, there have been complaints of verbosity, but I have not yet "internalized" these problems to provide usable feedback.

In general, my approach here would be to work it back from the desired usage.

As far as I see it, the problems come down from the following:

Literals using oneof fields are verbose and error prone. For example, we have this:

&Foo{
   OneOfField: &OneOfField_SomeNameForThatType{
     SomeType: ActualType{},
   }
}

Unpacking requires the usage of special types for a switch or getter+nil checks:

switch v := msg.OneOfField.(type) {
case *OneOfField_SomeNameForThatType:
  // further unpacking is required here
  v.ActualThingYouAreInterestedIn
/* ... */
}

In both cases, it is clear to me that the issue is the intermediary types, both for usability and performance.

I would suggest getting rid of the pointers. Most of the time, these just contain another pointer. Our unpacking code becomes this:
```
switch v := msg.Field.(type) {
case OneOfField_SomeNameForThatType:
  // further unpacking is required here
  v.ActualThingYouAreInterestedIn
/* ... */
}
```
This is small, but it may have an allocation and performance impact.
The next thing I would look at is getting rid of the actual types by looking at the target pack/unpack code for the field. For literals, we want this:
```
&Foo{
   OneOfField: &ActualType{},
}
```
And for unpacking, we want this:
```
switch v := msg.OneOfField.(type) {
case *ActualType:
}
```
The only way I can think of to do this is add an interface to the generated ActualType. Imagine having interface { oneOfOneOfFieldUnion() }. This works very well for cases where all the types of a generated field may be together. It becomes problematic with types that are generated outside of the current package (which are problematic for other reasons anyways).

That said, we could generate these methods for the local types and create holders for the non-local types. Better yet, we only need to do a holder for each type, not each occurrence of referencing that type in a oneof. For example, we might generate the following for an int64:
```
type Int64 int64

func (Int64) oneOfOneOfFieldUnion()
```
So, for primitives, we only require a cast. What about full structs? We can probably suffer a cast, as well, as long as we regenerate the other methods to satisfy marshaling interfaces. Alternatively, the Cast method can be used to get to the actual serialization methods:
```
type SomeMessageLocal package.SomeMessage

func (SomeMessageLocal) oneOfOneOfFieldUnion()
// regenerate any other methods that are lost by type alias

func (m *SomeMessageLocal) Cast() *package.SomeMessage { // get back original 
  return (*package.SomeMessage)(m)
}
```
There may be some cost here, since casting large structures may cost cycles or allocations if pointers are not used.

This gives us nice literals and type switching, at the cost of a few casts. For common cases, where messages are all declared in the same file, there is no cost:
```
&Foo{
   OneOfField: &ActualType{}, // declared in file
}

&Foo{
   OneOfField: Int64(42), // primitive cast
}

&Foo{
   OneOfField: SomeMessageLocal(package.SomeMessage), 
}
```
Type switches become nice, as well:
```
switch v := msg.OneOfField.(type) {
case *ActualType:
case Int64:
case *SomeMessageLocal:
  // must cast here
}
```

There are some details to work out but this the direction I would explore.

I am not sure about #103. Restrictions that affect data structure design for the target generation language are problematic.

@tamird In general, I disagree with your dismissal of aesthetics and ergonomics. Ergonomics/usability/aesthetics are a fantastic tool when the performant, correctness and pretty overlap. ;)

This is a good point; we can be smarter about the oneof's inner type. I filed this upstream last year but I got the usual "fuck you" response: oneof: generated code uses pointers for non-primitive types golang/protobuf#78.

@tamird You are not alone in receiving this style of response.

tamird · 2016-04-20T21:03:44Z

I would suggest getting rid of the pointers. Most of the time, these just contain another pointer....This is small, but it may have an allocation and performance impact.

This change would not reduce allocations; the thing you're setting is an interface type, so it's incurring an allocation whether it's a value or a pointer. Implementing interfaces on values is less type safe than implementing them on pointers, though.

It becomes problematic with types that are generated outside of the current package (which are problematic for other reasons anyways).

Are they problematic for other reasons? What other reasons? I think this is probably exactly the reason for this API being what it is.

Your suggestions for working around that are probably doable, but I'd be against that level of complexity in the interface. Also, I haven't dismissed ergonomics, only suggested an alternative. But yes, I stand by my rejection of prettification of generated code.

stevvooe · 2016-04-20T21:20:06Z

This change would not reduce allocations; the thing you're setting is an interface type, so it's incurring an allocation whether it's a value or a pointer.

With a value literal that gets immediately set to a field, we avoid the allocation (I may be wrong). Is there not an extra indirection when the union values have pointers?

Are they problematic for other reasons? What other reasons? I think this is probably exactly the reason for this API being what it is.

As I said, I haven't really thought this through, so I am not sure. Indeed, this is probably the reasoning behind solution arrived at.

Another alternative may be to generate helper functions that look up the correct type for literals:

&Foo{
  OneOfField: OneOfFieldValue(&AnotherType{}),
}

That solves the literal problem. Perhaps, we can solve the type switch problem similarly:

switch v := UnpackOneOfField(v).(type) {
case *AnotherType:
case int64:
case *package.SomeMessage:
}

The value and unpack functions just get generated to assist in hiding the interim type.

Your suggestions for working around that are probably doable, but I'd be against that level of complexity in the interface.

We also end up generating fewer new types. In cases where external types are, the user simply has to cast. Yes, it is complex in that we have to handle package internal and external types differently, but we already have some complexity.

From this perspective, I agree that my suggestion is non-ideal.

But yes, I stand by my rejection of prettification of generated code.

The most common reaction I get when trying to convince others to adopt protobufs is that it is "ugly". It was my original reaction, as well. It is not rational or logical, but it is a hurdle.

tamird · 2016-04-20T21:22:18Z

With a value literal that gets immediately set to a field, we avoid the allocation (I may be wrong).

This is correct, but irrelevant. You cannot set a value to a field here because you need polymorphism.

Is there not an extra indirection when the union values have pointers?

Are you talking about the inner pointers or the outer? If you're talking about the inner then I agree, and that is #106.

stevvooe · 2016-04-20T21:25:45Z

@tamird

This is correct, but irrelevant. You cannot set a value to a field here because you need polymorphism.

The concrete union type does not need to be a pointer and provides polymorphism.

I think we're on the same page, however.

awalterschulze · 2016-04-21T05:46:28Z

So it seems that in all your examples there are some common themes.
I want to know whether these are the use cases we are trying to cover.
Because each of them bring more possible optimizations or prettifications.

Each oneof is the only oneof in a message.
This oneof contains the only fields that are in the message.
Each of the fields have a different type.

With a message conforming to:

all 3 we can bring back onlyone, but for oneof
1 and 2 we and do @dennwc original suggestion
none: the inner pointer thing can be solved for the general case
or we could do something else.

Sorry this might be a bit of tangent on your discussion, but to me this makes the options we have clearer.
People can enable the new oneof extension when they are allowed to and errors can easily be detected by the generator.

stevvooe · 2016-04-21T20:16:29Z

@awalterschulze Part of the problem here (and with my suggestions) is that oneof is not actually a type union. It is a set of mutually exclusive fields. With Go's generation, we go down the path of trying to use types to do this, when every other implementation is using tagged union. If we continue down the path of using types here, and seek these optimizations, the usage is not going to match the abstraction.

After thinking about this, I'm wondering if, perhaps, the current solution is fine but we just need better naming generation and documentation around they map. Let's take the following:

message Foo {
  oneof value {
    string a = 1;
    string b = 2;
  }
}

The above generates the following:

type Foo struct {
    // Types that are valid to be assigned to Value:
    //  *Foo_A
    //  *Foo_B
    Value isFoo_Value `protobuf_oneof:"value"`
}

func (m *Foo) Reset()         { *m = Foo{} }
func (m *Foo) String() string { return proto.CompactTextString(m) }
func (*Foo) ProtoMessage()    {}

type isFoo_Value interface {
    isFoo_Value()
}

type Foo_A struct {
    A string `protobuf:"bytes,1,opt,name=a,proto3,oneof"`
}
type Foo_B struct {
    B string `protobuf:"bytes,2,opt,name=b,proto3,oneof"`
}

func (*Foo_A) isFoo_Value() {} 
func (*Foo_B) isFoo_Value() {}

We end up getting Foo_A and Foo_B, with no mention of the target of where that might land. If an API user is asking which field to set for Value, simple code completion is completely obtuse. If these were generated as FooValue_A and FooValue_B, or something like that, the usage would be obvious.

Furthermore, I think the original proposal from @dennwc, of using type declarations and casts, is more than sufficient to remove the pointer lookup situation. We still need to generate a type for each field name, but it will remove some overhead of having to do struct literals and have an extra indirection.

awalterschulze · 2016-04-22T08:07:07Z

Yes FooValue would be a much better naming scheme.

Interfaces do add another pointer, but is one of the few ways of mapping the oneof union typing without breaking anything or making any assumptions.

@DenWC solution requires assumptions 1 and 2. Namely:

Each oneof is the only oneof in a message.
This oneof contains the only fields that are in the message.

Personally I find myself having assumptions 1 and 3 typically, but its then easy to get to 1,2 and 3 by simply adding the extra fields in a wrapping message.

So yes oneof is not a type union, but what I am asking is whether that is the typical usecase.
If it is the typical usecase to have assumptions 1, 2 and 3 satisfied then

why don't we make the typical usecase nice.
else
fair enough, we can at least elliminate the message pointers
and can we do something else

awalterschulze · 2016-04-22T08:12:12Z

Here is another solution that only makes assumption 3 in the Set methods.

message Foo {
  oneof value {
    string a = 1;
    string b = 2;
  }
  oneof value2 {
    string c = 3;
    string d = 4;
  }
}

type Foo struct {
   Value Value
   Value2 Value2
}

type Value struct {
   A *string
   B *string
}

func (this *Foo) GetValue() interface{} {
  if this.A != nil {
    return *this.A
  }
  if this.B != nil {
    return *this.B
  }
  return nil
}

func (this *Foo) SetValue(interface{}) {
  ... type switch
}

type Value2 struct {
   C *string
   D *string
}

func (this *Foo) GetValue2() interface{} {
  if this.C != nil {
    return *this.C
  }
  if this.D != nil {
    return *this.D
  }
  return nil
}

func (this *Foo) SetValue2(interface{}) {
  ... type switch
}

dennwc · 2016-04-22T08:19:34Z

I think It's a bit confusing, because in this case you can actually set both fields.

The second point is that GetValue will return string type for both A and B field and it cannot be checked whatever it comes from field A or B.

This might be safer:

type Foo struct {
   Value Value
   Value2 Value2
}

type Value interface{
 isFooValue()
}

type FooValueA string
func (FooValueA) isFooValue(){}

type FooValueB string
func (FooValueB) isFooValue(){}

// same for Value2

awalterschulze · 2016-04-22T08:22:17Z

Ok sorry my bug.

message Foo {
  oneof value {
    string a = 1;
    int64 b = 2;
  }
  oneof value2 {
    string c = 3;
    int64 d = 4;
  }
}

type Foo struct {
   Value Value
   Value2 Value2
}

type Value struct {
   A *string
   B *int64
}

func (this *Foo) GetValue() interface{} {
  if this.A != nil {
    return *this.A
  }
  if this.B != nil {
    return *this.B
  }
  return nil
}

func (this *Foo) SetValue(interface{}) {
  ... type switch
}

type Value2 struct {
   C *string
   D *int64
}

func (this *Foo) GetValue2() interface{} {
  if this.C != nil {
    return *this.C
  }
  if this.D != nil {
    return *this.D
  }
  return nil
}

func (this *Foo) SetValue2(interface{}) {
  ... type switch
}

awalterschulze · 2016-04-22T08:22:34Z

Yes you can set both values, but this can give an error when marshaling.
But you definitely have a point.

dennwc · 2016-04-22T08:36:09Z

Would it be faster?

To give a context, we have a use case for goprotobuf when generated structures are exposed as part of our API. In this case other devs not familiar with protobuf will see both fields in oneof structure and they might try to set each of them. The problem is that the code will actually compile, but will lead to a runtime error later. I think this should be checked in compile time with the union interface.

awalterschulze · 2016-04-22T08:47:35Z

I am not particularly sure if it would be faster, probably not.

Good to know about your use case, this is what I am after.

awalterschulze · 2016-05-15T07:12:28Z

oneof is part of protoc 2.6.1

I don't see how the extra constraint of having the field named as 'value' gives us anything in return.

Would #106 also solve your problem?
Since then there is only one pointer.

jaekwon · 2016-08-19T22:39:27Z

Having the field be named "value" isn't a required constraint, but just a hint that we can use to tell the Protoc compiler to produce the following, instead of what's written in #168, when all three conditions are met. (Well, it's a good constraint because it lets the coder opt into the following behavior)

type Foo interface {
  isFooValue()
}

type Foo_A string
func (Foo_A) isFooValue(){}

type Foo_B string
func (Foo_B) isFooValue(){}

// we're assuming there is no Value2 field, as per the 3 conditions I noted

#106 would solve the potential memory/GC performance issue, which maybe is 50% of the problem. The other problem is that the indirection is arguably needless, and it creates ugly code.

awalterschulze · 2016-08-21T11:28:00Z

I agree the current implementation is slow and ugly.

dennwc · 2016-08-22T14:05:41Z

I'm working on it right now. There are few problems, mostly reflect-related (jsonpb, GetProperties, etc). Will see if they can be omitted.

awalterschulze · 2016-08-24T17:20:37Z

Excited to see progress :)

atombender · 2016-11-14T02:51:27Z

I'd love to see some progress on this. One big problem (perhaps already mentioned in this thread, I didn't read all the comments super carefully) with the current approach is that there's no easy way to wrap or unwrap a oneof, because every "wrapper" is its own type.

In order to get at the wrapped value, you have to write a big switch like the commit that @tamird mentions, and be careful to keep it in sync with the generated types. You can probably hack up some generic reflection-based get/set functions, but it's not going to be pretty, or fast, or user-friendly.

I really like the approach suggested by the OP, and hope this is what gets implemented.

dennwc · 2016-11-14T12:22:50Z

Sorry, was a bit distracted by work matters.

I will continue to work on this soon.

awalterschulze · 2016-11-20T11:54:01Z

@atombender Who or what is OP?

@dennwc I am still excited to see what comes out. These things are typically more work than they seem so good luck.

atombender · 2016-11-20T16:41:29Z

OP is "original poster", meaning @dennwc.

jaekwon · 2017-01-10T21:00:34Z

What's the status, @dennwc, and what approach are you taking?

jaekwon · 2018-03-12T15:25:15Z

A lot of learning has happened since my comment since, and I've come to realize that what we really need is actually protobuf4.

Here's our approach: https://github.com/tendermint/go-amino

Amino removes "oneof" and supports something called "interfaces" natively. It also has a ton of other features. Please check it out, we're looking for constructive feedback (or support)!

atombender · 2018-03-12T16:38:37Z

@jaekwon Looks promising, and I wish this could be used as the direction for Protobuf4, but you're going to be fighting an uphill battle here. Much of the point of using Protobuf is the industry support for it, not just the technical aspects. I, and the companies I work for, would never adopt your project, simply because it doesn't have a major player behind it that can ensure that there's wide language support, continued maintenance, evangelism, and so on.

awalterschulze · 2018-03-17T12:04:30Z

@jaekwon If you do make a new serialization format, please make sure to investigate faster varint encodings. There are quite a few variants and they have interesting tradeoffs.

awalterschulze · 2018-03-17T12:06:43Z

For example
https://github.com/pascaldekloe/flit

…chResponse All Requests and Responses pass through RequestUnion/ResponseUnion structs when they are added to BatchRequests/BatchResponses. In order to ensure that only one Request type can be assigned to one of these RequestUnion or ResponseUnion structs, we currently use gogoproto's approach to tagged unions: the `gogoproto.onlyone` option. This option was introduced before proto3. Proto3 then added the `oneof` option, which for all intents and purposes addresses the same issue: https://developers.google.com/protocol-buffers/docs/proto#oneof. However, there is one major difference between the two options, which is in their generated code. `gogoproto.onlyone` will generate a single flat struct with pointers to each possible variant type. `oneof` will generate a union interface and an interface "wrapper" struct for each variant type. The effect of this is that `onlyone` will generate code that looks like this: ``` type Union struct { Variant1 *Variant1Type Variant2 *Variant2Type ... } ``` While `oneof` will generate code the looks like this: ``` type Union struct { Value isUnion_Value } type isUnion_Value interface { ... } type Union_Variant1 struct { Variant1 *Variant1Type } type Union_Variant2 struct { Variant2 *Variant2Type } ``` There are pretty obvious tradeoffs to each. For one, `oneof` introduces an extra layer of indirection, which forces an extra allocation. It also doesn't generate particularly useful setters and getters. On the other hand, `onlyone` creates a large struct that grows linearly with the number of variants. Neither approach is ideal, and there has been **A LOT** of discussion on this: - golang/protobuf#78 - golang/protobuf#283 - gogo/protobuf#103 - gogo/protobuf#168 Clearly neither approach is ideal, ergonomically or with regard to performance. However, over time, the tradeoff has been getting worse for us and its time we consider switching over to `oneof` in `RequestUnion` and `ResponseUnion`. These structs have gotten huge as more and more request variants have been added: `RequestUnion` has grown to 328 bytes and `ResponseUnion` has grown to 320 bytes. It has gotten to the point where the wasted space is non-negligible. This change switches over to `oneof` to shrink these union structs down to more manageable sizes (16 bytes). The downside of this is that in reducing the struct size we end up introducing an extra allocation. This isn't great, but we can avoid the extra allocation in some places (like `BatchRequest.CreateReply`) by grouping the allocation with that of the Request/Response itself. We've seen previous cases like cockroachdb#4216 where adding in an extra allocation/indirection is a net-win if it reduces a commonly used struct's size significantly. The other downside to this change is that the ergonomics of `oneof` aren't quite as nice as `gogo.onlyone`. Specifically, `gogo.onlyone` generates getters and setters called `GetValue` and `SetValue` that provide access to the wrapped `interface{}`, which we can assert to a `Request`. `oneof` doesn't provide such facilities. This was the cause of a lot of the discussions linked above. While this isn't ideal, I think we've waited long enough (~3 years) for a resolution on those discussions. For now, we'll just generate the getters and setters ourselves. This change demonstrated about a 5% improvement when running kv95 on my local laptop. When run on a three-node GCE cluster (4 vCPUs), the improvements were less pronounced but still present. kv95 showed a throughput improvement of 2.4%. Running kv100 showed an even more dramatic improvement of 18% on the GCE cluster. I think this is because kv100 is essentially a hot loop where all reads miss because the cluster remains empty, so it's the best case for this change. Still, the impact was shocking. Release note (performance improvement): Reduce the memory size of commonly used Request and Response objects.

27112: roachpb: replace `gogoproto.onlyone` with `oneof` in BatchRequest/BatchResponse r=nvanbenschoten a=nvanbenschoten All Requests and Responses pass through RequestUnion/ResponseUnion structs when they are added to BatchRequests/BatchResponses. In order to ensure that only one Request type can be assigned to one of these RequestUnion or ResponseUnion structs, we currently use gogoproto's approach to tagged unions: the `gogoproto.onlyone` option. This option was introduced before proto3. Proto3 then added the `oneof` option, which for all intents and purposes addresses the same issue: https://developers.google.com/protocol-buffers/docs/proto#oneof. However, there is one major difference between the two options, which is in their generated code. `gogoproto.onlyone` will generate a single flat struct with pointers to each possible variant type. `oneof` will generate a union interface and an interface "wrapper" struct for each variant type. The effect of this is that `onlyone` will generate code that looks like this: ``` type Union struct { Variant1 *Variant1Type Variant2 *Variant2Type ... } ``` While `oneof` will generate code the looks like this: ``` type Union struct { Value isUnion_Value } type isUnion_Value interface { ... } type Union_Variant1 struct { Variant1 *Variant1Type } type Union_Variant2 struct { Variant2 *Variant2Type } ``` There are pretty obvious tradeoffs to each. For one, `oneof` introduces an extra layer of indirection, which forces an extra allocation. It also doesn't generate particularly useful setters and getters. On the other hand, `onlyone` creates a large struct that grows linearly with the number of variants. Neither approach is great, and there has been **A LOT** of discussion on this: - golang/protobuf#78 - golang/protobuf#283 - gogo/protobuf#103 - gogo/protobuf#168 Clearly neither approach is ideal, ergonomically or with regard to performance. However, over time, the tradeoff has been getting worse for us and it's time we consider switching over to `oneof` in `RequestUnion` and `ResponseUnion`. These structs have gotten huge as more and more request variants have been added: `RequestUnion` has grown to **328 bytes** and `ResponseUnion` has grown to **320 bytes**. It has gotten to the point where the wasted space is non-negligible. This change switches over to `oneof` to shrink these union structs down to more manageable sizes (16 bytes each). The downside of this is that in reducing the struct size we end up introducing an extra allocation. This isn't great, but we can avoid the extra allocation in some places (like `BatchRequest.CreateReply`) by grouping the allocation with that of the Request/Response itself. We've seen previous cases like #4216 where adding in an extra allocation/indirection is a net-win if it reduces a commonly used struct's size significantly. The other downside to this change is that the ergonomics of `oneof` aren't quite as nice as `gogo.onlyone`. Specifically, `gogo.onlyone` generates getters and setters called `GetValue` and `SetValue` that provide access to the wrapped `interface{}`, which we can assert to a `Request`. `oneof` doesn't provide such facilities. This was the cause of a lot of the discussions linked above. While it we be nice for this to be resolved upstream, I think we've waited long enough (~3 years) for a resolution to those discussions. For now, we'll just generate the getters and setters ourselves. This change demonstrated about a **5%** improvement when running kv95 on my local laptop. When run on a three-node GCE cluster (4 vCPUs), the improvements were less pronounced but still present. kv95 showed a throughput improvement of **2.4%**. Running kv100 showed a much more dramatic improvement of **18%** on the three-node GCE cluster. I think this is because kv100 is essentially a hot loop where all reads miss because the cluster remains empty, so it's the best-case scenario for this change. Still, the impact was shocking. Release note (performance improvement): Reduce the memory size of commonly used Request and Response objects. 27114: opt/sql: fix explain analyze missing option r=asubiotto a=asubiotto ConstructExplain previously ignored the ANALYZE option so any EXPLAIN ANALYZE statement would result in execution as an EXPLAIN (DISTSQL) statement. The ANALYZE option is now observed in ConstructExplain. Additionally, the stmtType field from the explainDistSQLNode has been removed because it was not necessary and it was unclear how to pass this from the `execFactory`. Release note: None 27116: Makefile: learn that roachtest depends on optimizer-generated code r=benesch a=benesch As mentioned in cd4415c, the Makefile will one day be smart enough to deduce this on its own, but for now it's simpler to explicitly list the commands that require generated code. Note that the simple but coarse solution of assuming that all commands depend on generated code is inviable as some of these commands are used to generate the code in the first place. Release note: None 27119: storage: extract replica unlinking into store method r=tschottdorf a=benesch Extract some code that was duplicated in three places into a dedicated helper method. Prerequisite for #27061. Release note: None Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Alfonso Subiotto Marqués <alfonso@cockroachlabs.com> Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>

tamird mentioned this issue Apr 20, 2016

Vote For New Feature: generate getter for oneof when no type repeats #103

Closed

stevvooe mentioned this issue Apr 20, 2016

gostyle: provide option to generate golint compatible protobufs #149

Open

9 tasks

awalterschulze added the question label Aug 21, 2016

This was referenced Nov 6, 2016

optimization: messages in a oneof wrapping struct do not need to be pointers #106

Closed

Feature request: Extension to create setter for oneof #226

Closed

awalterschulze mentioned this issue Feb 22, 2018

protoc-gen-go: generate Setters for oneof fields golang/protobuf#283

Open

yurishkuro mentioned this issue Jun 6, 2018

Replace domain model with Protobuf/gogo-generated model jaegertracing/jaeger#856

Merged

3 tasks

nvanbenschoten mentioned this issue Jul 2, 2018

roachpb: replace gogoproto.onlyone with oneof in BatchRequest/BatchResponse cockroachdb/cockroach#27112

Merged

guevara mentioned this issue Jul 31, 2019

For Sum Types: Golang's multiple return parameters are overrated guevara/read-it-later#4895

Open

ysimonson mentioned this issue Sep 11, 2019

Use protobuf's oneof where appropriate pachyderm/pachyderm#3859

Closed

guevara mentioned this issue Nov 7, 2019

For Sum Types: Golang's multiple return parameters are overrated guevara/read-it-later#5646

Open

phemmer mentioned this issue Jan 29, 2020

Oneof still generates a getter when explicitly disabled #662

Open

oneof: generate better code #168

oneof: generate better code #168

Comments

dennwc commented Apr 20, 2016

awalterschulze commented Apr 20, 2016

dennwc commented Apr 20, 2016

awalterschulze commented Apr 20, 2016

awalterschulze commented Apr 20, 2016

dennwc commented Apr 20, 2016

awalterschulze commented Apr 20, 2016

awalterschulze commented Apr 20, 2016

tamird commented Apr 20, 2016 • edited Loading

awalterschulze commented Apr 20, 2016 • edited Loading

dennwc commented Apr 20, 2016 • edited Loading

tamird commented Apr 20, 2016

tamird commented Apr 20, 2016

awalterschulze commented Apr 20, 2016

dennwc commented Apr 20, 2016

stevvooe commented Apr 20, 2016

tamird commented Apr 20, 2016

stevvooe commented Apr 20, 2016

tamird commented Apr 20, 2016

stevvooe commented Apr 20, 2016

awalterschulze commented Apr 21, 2016 • edited Loading

stevvooe commented Apr 21, 2016

awalterschulze commented Apr 22, 2016

awalterschulze commented Apr 22, 2016 • edited Loading

dennwc commented Apr 22, 2016

awalterschulze commented Apr 22, 2016

awalterschulze commented Apr 22, 2016 • edited Loading

dennwc commented Apr 22, 2016 • edited Loading

awalterschulze commented Apr 22, 2016

awalterschulze commented May 15, 2016

jaekwon commented Aug 19, 2016

awalterschulze commented Aug 21, 2016

dennwc commented Aug 22, 2016

awalterschulze commented Aug 24, 2016

atombender commented Nov 14, 2016

dennwc commented Nov 14, 2016

awalterschulze commented Nov 20, 2016

atombender commented Nov 20, 2016

jaekwon commented Jan 10, 2017

jaekwon commented Mar 12, 2018 • edited Loading

atombender commented Mar 12, 2018 • edited Loading

awalterschulze commented Mar 17, 2018

awalterschulze commented Mar 17, 2018

tamird commented Apr 20, 2016 •

edited

Loading

awalterschulze commented Apr 20, 2016 •

edited

Loading

dennwc commented Apr 20, 2016 •

edited

Loading

awalterschulze commented Apr 21, 2016 •

edited

Loading

awalterschulze commented Apr 22, 2016 •

edited

Loading

awalterschulze commented Apr 22, 2016 •

edited

Loading

dennwc commented Apr 22, 2016 •

edited

Loading

jaekwon commented Mar 12, 2018 •

edited

Loading

atombender commented Mar 12, 2018 •

edited

Loading