oneof: generate better code #168

Open
dennwc opened this issue Apr 20, 2016 · 43 comments

@dennwc
Contributor

dennwc commented Apr 20, 2016

I found that oneof is useful to represent some value interface on the Go side. For example:

message Value {
  oneof value {
    string str = 1;
    int64  int = 2;
    double float_ = 3;
    bool boolean = 4;
    Timestamp time = 5;
  }
}

This will generate the following code:

type Value struct {
    // Types that are valid to be assigned to Value:
    //  *Value_Str
    //  *Value_Int
    //  *Value_Float_
    //  *Value_Boolean
    //  *Value_Time
    Value isValue_Value `protobuf_oneof:"value"`
}

type isValue_Value interface {
    isValue_Value()
    MarshalTo([]byte) (int, error)
    ProtoSize() int
}

type Value_Str struct {
    Str string `protobuf:"bytes,1,opt,name=str,proto3,oneof"`
}
type Value_Int struct {
    Int int64 `protobuf:"varint,2,opt,name=int,proto3,oneof"`
}
type Value_Float_ struct {
    Float_ float64 `protobuf:"fixed64,3,opt,name=float_,proto3,oneof"`
}
type Value_Boolean struct {
    Boolean bool `protobuf:"varint,4,opt,name=boolean,proto3,oneof"`
}
type Value_Time struct {
    Time *Timestamp `protobuf:"bytes,5,opt,name=time,oneof"`
}

Is it possible to make oneof generate fewer structures for this case? For example:

// No need to generate isValue_Value interface if struct
// has only one field - the Value is an interface itself.
type Value interface {
    isValue_Value()
    MarshalTo([]byte) (int, error)
    ProtoSize() int
}
// Value_Str can be replaced with custom type instead of the struct.
// This type will have protobuf field id embedded in the serialization code.
type Value_Str string

// Value_Time can be a custom type the same way as Value_Str is.
// Serialization code will write appropriate field id and call MarshalTo on
// original Timestamp object.
type Value_Time Timestamp
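
A minimal sketch of how this proposal might look from the user's side, assuming the generator emitted named types instead of wrapper structs. `Timestamp` is a stand-in struct, `Describe` is a hypothetical helper, and the marshaling methods are left out; none of this is existing generator output.

```go
package main

import "fmt"

// Timestamp is a stand-in for the message type referenced by the oneof.
type Timestamp struct {
	Seconds int64
}

// Value is the union interface itself; the unexported method keeps the
// set of variants closed to this package.
type Value interface {
	isValue_Value()
}

// Each variant is a named type instead of a one-field wrapper struct.
type Value_Str string
type Value_Int int64
type Value_Time Timestamp

func (Value_Str) isValue_Value()  {}
func (Value_Int) isValue_Value()  {}
func (Value_Time) isValue_Value() {}

// Describe (hypothetical helper) unpacks a Value with one type switch and
// a plain conversion back to the underlying type.
func Describe(v Value) string {
	switch x := v.(type) {
	case Value_Str:
		return "str:" + string(x)
	case Value_Int:
		return fmt.Sprintf("int:%d", int64(x))
	case Value_Time:
		return fmt.Sprintf("time:%d", Timestamp(x).Seconds)
	default:
		return "unset"
	}
}

func main() {
	fmt.Println(Describe(Value_Str("hello")))
	fmt.Println(Describe(Value_Int(42)))
	fmt.Println(Describe(Value_Time{Seconds: 7}))
	fmt.Println(Describe(nil))
}
```

Setting a field is then a plain conversion such as `Value_Str("hello")`, with no intermediate struct literal.
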
@awalterschulze
Member

So each of the Value types will have implemented methods like so

func (v Value_Str) MarshalTo([]byte) (int, error) {
   ...
}

Looks pretty slick to me :)

Maybe change the example a little

message Value {
  oneof onevalue {
    string str = 1;
    int64  int = 2;
    double float_ = 3;
    bool boolean = 4;
    Timestamp time = 5;
  }
  string anotherstr = 6;
  oneof twovalue {
     string str2 = 7;
     int64  int2 = 8;
  }
}

Just so we can discriminate between the message struct and oneof value struct.

@dennwc
Contributor Author

dennwc commented Apr 20, 2016

Right :) The last example is a bit different, it will require a Value struct to be separated from OneValue/TwoValue interfaces. In the first example it is possible to inline Value struct serialization into each oneof sub type MarshalTo.

@awalterschulze
Member

I don't know why protobuf spec decided to not just make oneof a type of message, but here we are.

@awalterschulze
Member

We should link this issue #106

@dennwc
Contributor Author

dennwc commented Apr 20, 2016

@awalterschulze How hard would it be to implement this? It seems like it will add at least two additional parameters for generators (for both examples to work), and some check to detect if the oneof is the only field in the message. Is there something else?

@stevvooe Maybe you have some other ideas to incorporate?

@awalterschulze
Member

@tamird what do you think?

I think we could have an extension. gogoproto.onlyoneof

This extension will give an error when applied to a message with more than one oneof or in fact any other fields that are not in the one oneof.

@awalterschulze
Member

It's not a small effort; basically all the plugins are going to be touched, and the proto/encode.go and proto/decode.go files are also going to see some action.

@tamird
Contributor

tamird commented Apr 20, 2016

At a glance, I don't see the point. This change would not improve performance (since you still need an interface and so you're allocating, like it or not), and it doesn't really improve ergonomics either (if you want to improve ergonomics, a better alternative is to bring gogoproto.onlyone-style getters to oneof, see https://github.com/tamird/cockroach/commit/5abcff4cc8e00de49e3abe06753754fae7f28fa3#diff-060e95a3a2839243f16cfc13837a5913R221).

This proposal would make the generated code slightly prettier, but who cares?

@awalterschulze
Member

awalterschulze commented Apr 20, 2016

OK, and there we have the added restriction that the field types have to be different, and then we are back at #103, which I think is probably still the easiest to implement.

@dennwc
Contributor Author

dennwc commented Apr 20, 2016

@tamird I thought that this fork is all about making the generated code faster AND prettier at the same time :)
To point out the difference, you are doing this:

&RequestUnion_EndTransaction{&EndTransactionRequest{}}

Whereas I propose to do just this:

EndTransactionRequest{}

You may see two things here:

  1. You are generating two separate structures instead of one. This will have a small performance impact in any case.
  2. They are pointers, but you may not actually want them to be pointers in a oneof: if a field in a oneof exists with a certain type, it must be valid, or it doesn't make sense.

Even further, if the performance of both cases is the same, why not use the more concise variant? Instead of generating a setter/getter pair, you can directly set or get your interface type.

@tamird
Contributor

tamird commented Apr 20, 2016

  1. OK, but this is negligible.
  2. This is a good point; we can be smarter about the oneof's inner type. I filed this upstream last year but I got the usual "fuck you" response: oneof: generated code uses pointers for non-primitive types golang/protobuf#78.

Making the code prettier seems to be a nice-to-have, and in this case, it's going to be a lot of work. Generating convenient setters and getters is going to be a much easier solution to the ergonomics problem, and addressing the inner allocation can also be done without nearly as much rejiggering as you're suggesting.
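
A rough sketch of what such convenience getters and setters could look like when written over the wrapper types that oneof generates today. The `Request` interface, `GetRequest`, the `Commit` field, and the `GetInner`/`SetInner` names are assumptions made for illustration, not generated API.

```go
package main

import "fmt"

// Request is a hypothetical interface shared by all variants.
type Request interface {
	Method() string
}

// GetRequest and the Commit field are hypothetical.
type GetRequest struct{ Key string }
type EndTransactionRequest struct{ Commit bool }

func (*GetRequest) Method() string            { return "Get" }
func (*EndTransactionRequest) Method() string { return "EndTransaction" }

// The shape below mirrors what oneof generates today.
type isRequestUnion_Value interface{ isRequestUnion_Value() }

type RequestUnion_Get struct{ Get *GetRequest }
type RequestUnion_EndTransaction struct{ EndTransaction *EndTransactionRequest }

func (*RequestUnion_Get) isRequestUnion_Value()            {}
func (*RequestUnion_EndTransaction) isRequestUnion_Value() {}

type RequestUnion struct {
	Value isRequestUnion_Value
}

// GetInner hides the wrapper types behind a single accessor.
func (ru RequestUnion) GetInner() Request {
	switch t := ru.Value.(type) {
	case *RequestUnion_Get:
		return t.Get
	case *RequestUnion_EndTransaction:
		return t.EndTransaction
	}
	return nil
}

// SetInner picks the wrapper for r and reports false for unknown types.
func (ru *RequestUnion) SetInner(r Request) bool {
	switch t := r.(type) {
	case *GetRequest:
		ru.Value = &RequestUnion_Get{Get: t}
	case *EndTransactionRequest:
		ru.Value = &RequestUnion_EndTransaction{EndTransaction: t}
	default:
		return false
	}
	return true
}

func main() {
	var ru RequestUnion
	ru.SetInner(&EndTransactionRequest{Commit: true})
	fmt.Println(ru.GetInner().Method())
}
```

The wrapper structs still exist, so this addresses only the ergonomics, not the extra allocation.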

@tamird
Contributor

tamird commented Apr 20, 2016

Ah, that upstream issue already exists in this repo as well: #106

@awalterschulze
Member

Speed is gained by being able to use the struct that you want (pretty code) to use, since then you don't have to copy between the struct you want to use and the one you have to use.

But it will definitely be much less work to add some methods and even change the message pointer thing.

@dennwc
Contributor Author

dennwc commented Apr 20, 2016

I will try to implement it regardless of the amount of work needed. It will make things much easier, as it seems to cover all three issues discussed.

But since the getter/setter approach can be done significantly faster, we could do it first as a separate extension.

What do you think? Or can it wait a while for me to implement the other approach?

@stevvooe
Contributor

@dennwc I haven't thought this one through. Under usage of oneof, there have been complaints of verbosity, but I have not yet "internalized" these problems to provide usable feedback.

In general, my approach here would be to work it back from the desired usage.

As far as I see it, the problems come down to the following:

  1. Literals using oneof fields are verbose and error prone. For example, we have this:

    &Foo{
       OneOfField: &OneOfField_SomeNameForThatType{
         SomeType: ActualType{},
       },
    }
  2. Unpacking requires the usage of special types for a switch or getter+nil checks:

    switch v := msg.OneOfField.(type) {
    case *OneOfField_SomeNameForThatType:
      // further unpacking is required here
      v.ActualThingYouAreInterestedIn
    /* ... */
    }

In both cases, it is clear to me that the issue is the intermediary types, both for usability and performance.

  1. I would suggest getting rid of the pointers. Most of the time, these just contain another pointer. Our unpacking code becomes this:

    switch v := msg.Field.(type) {
    case OneOfField_SomeNameForThatType:
      // further unpacking is required here
      v.ActualThingYouAreInterestedIn
    /* ... */
    }

    This is small, but it may have an allocation and performance impact.

  2. The next thing I would look at is getting rid of the actual types by looking at the target pack/unpack code for the field. For literals, we want this:

    &Foo{
       OneOfField: &ActualType{},
    }

    And for unpacking, we want this:

    switch v := msg.OneOfField.(type) {
    case *ActualType:
    }

    The only way I can think of to do this is add an interface to the generated ActualType. Imagine having interface { oneOfOneOfFieldUnion() }. This works very well for cases where all the types of a generated field may be together. It becomes problematic with types that are generated outside of the current package (which are problematic for other reasons anyways).

    That said, we could generate these methods for the local types and create holders for the non-local types. Better yet, we only need to do a holder for each type, not each occurrence of referencing that type in a oneof. For example, we might generate the following for an int64:

    type Int64 int64
    
    func (Int64) oneOfOneOfFieldUnion() {}

    So, for primitives, we only require a cast. What about full structs? We can probably suffer a cast, as well, as long as we regenerate the other methods to satisfy marshaling interfaces. Alternatively, the Cast method can be used to get to the actual serialization methods:

    type SomeMessageLocal package.SomeMessage
    
    func (SomeMessageLocal) oneOfOneOfFieldUnion() {}
    // regenerate any other methods that are lost by type alias
    
    func (m *SomeMessageLocal) Cast() *package.SomeMessage { // get back original 
      return (*package.SomeMessage)(m)
    }

    There may be some cost here, since casting large structures may cost cycles or allocations if pointers are not used.

    This gives us nice literals and type switching, at the cost of a few casts. For common cases, where messages are all declared in the same file, there is no cost:

    &Foo{
       OneOfField: &ActualType{}, // declared in file
    }
    
    &Foo{
       OneOfField: Int64(42), // primitive cast
    }
    
    &Foo{
       OneOfField: (*SomeMessageLocal)(&package.SomeMessage{}), // cast from the external type
    }

    Type switches become nice, as well:

    switch v := msg.OneOfField.(type) {
    case *ActualType:
    case Int64:
    case *SomeMessageLocal:
      // must cast here
    }

There are some details to work out, but this is the direction I would explore.

I am not sure about #103. Restrictions that affect data structure design for the target generation language are problematic.


@tamird In general, I disagree with your dismissal of aesthetics and ergonomics. Ergonomics/usability/aesthetics are a fantastic tool when performance, correctness and prettiness overlap. ;)

> 2. This is a good point; we can be smarter about the oneof's inner type. I filed this upstream last year but I got the usual "fuck you" response: oneof: generated code uses pointers for non-primitive types golang/protobuf#78.

@tamird You are not alone in receiving this style of response.

@tamird
Contributor

tamird commented Apr 20, 2016

> I would suggest getting rid of the pointers. Most of the time, these just contain another pointer.... This is small, but it may have an allocation and performance impact.

This change would not reduce allocations; the thing you're setting is an interface type, so it's incurring an allocation whether it's a value or a pointer. Implementing interfaces on values is less type safe than implementing them on pointers, though.
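
One possible reading of the type-safety remark, sketched with hypothetical types: a method declared on a value receiver is in the method set of both `T` and `*T`, so either form can be stored in the union and a type switch has to handle both. A pointer receiver admits only `*T`.

```go
package main

import "fmt"

type union interface{ isUnion() }

// ByValue implements union with a value receiver, so both ByValue and
// *ByValue satisfy the interface.
type ByValue struct{ N int }

func (ByValue) isUnion() {}

// ByPointer implements union with a pointer receiver, so only *ByPointer
// satisfies the interface.
type ByPointer struct{ N int }

func (*ByPointer) isUnion() {}

// classify reports the dynamic type stored in the union.
func classify(u union) string {
	switch u.(type) {
	case ByValue:
		return "ByValue"
	case *ByValue:
		return "*ByValue"
	case *ByPointer:
		return "*ByPointer"
	}
	return "unknown"
}

func main() {
	fmt.Println(classify(ByValue{N: 1}))
	fmt.Println(classify(&ByValue{N: 1}))
	fmt.Println(classify(&ByPointer{N: 1}))
	// var _ union = ByPointer{} // would not compile: isUnion has a pointer receiver
}
```

The sketch says nothing about allocations; it only shows that the value-receiver variant has two representations a consumer must expect.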

> It becomes problematic with types that are generated outside of the current package (which are problematic for other reasons anyways).

Are they problematic for other reasons? What other reasons? I think this is probably exactly the reason for this API being what it is.

Your suggestions for working around that are probably doable, but I'd be against that level of complexity in the interface. Also, I haven't dismissed ergonomics, only suggested an alternative. But yes, I stand by my rejection of prettification of generated code.

@stevvooe
Contributor

> This change would not reduce allocations; the thing you're setting is an interface type, so it's incurring an allocation whether it's a value or a pointer.

With a value literal that gets immediately set to a field, we avoid the allocation (I may be wrong). Is there not an extra indirection when the union values have pointers?

> Are they problematic for other reasons? What other reasons? I think this is probably exactly the reason for this API being what it is.

As I said, I haven't really thought this through, so I am not sure. Indeed, this is probably the reasoning behind the solution arrived at.

Another alternative may be to generate helper functions that look up the correct type for literals:

&Foo{
  OneOfField: OneOfFieldValue(&AnotherType{}),
}

That solves the literal problem. Perhaps, we can solve the type switch problem similarly:

switch v := UnpackOneOfField(msg.OneOfField).(type) {
case *AnotherType:
case int64:
case *package.SomeMessage:
}

The value and unpack functions just get generated to assist in hiding the interim type.
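
A minimal sketch of such a helper pair, assuming a message `Foo` with two variants. The wrapper names `Foo_Another` and `Foo_Count` and their fields are hypothetical; only the helper names `OneOfFieldValue` and `UnpackOneOfField` come from the snippets above.

```go
package main

import "fmt"

type AnotherType struct{ Name string }

// Wrapper types in the style of today's generated code (names assumed).
type isFoo_OneOfField interface{ isFoo_OneOfField() }

type Foo_Another struct{ Another *AnotherType }
type Foo_Count struct{ Count int64 }

func (*Foo_Another) isFoo_OneOfField() {}
func (*Foo_Count) isFoo_OneOfField()   {}

type Foo struct {
	OneOfField isFoo_OneOfField
}

// OneOfFieldValue looks up the wrapper for v; it returns nil when v has a
// type that the oneof does not contain.
func OneOfFieldValue(v interface{}) isFoo_OneOfField {
	switch x := v.(type) {
	case *AnotherType:
		return &Foo_Another{Another: x}
	case int64:
		return &Foo_Count{Count: x}
	}
	return nil
}

// UnpackOneOfField strips the wrapper and returns the inner value.
func UnpackOneOfField(f isFoo_OneOfField) interface{} {
	switch x := f.(type) {
	case *Foo_Another:
		return x.Another
	case *Foo_Count:
		return x.Count
	}
	return nil
}

func main() {
	msg := &Foo{OneOfField: OneOfFieldValue(int64(42))}
	switch v := UnpackOneOfField(msg.OneOfField).(type) {
	case *AnotherType:
		fmt.Println("another:", v.Name)
	case int64:
		fmt.Println("count:", v)
	}
}
```

Because the lookup is keyed on the Go type, this only works when every field in the oneof has a distinct type, and a mismatch surfaces at run time rather than at compile time.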

> Your suggestions for working around that are probably doable, but I'd be against that level of complexity in the interface.

We also end up generating fewer new types. In cases where external types are involved, the user simply has to cast. Yes, it is complex in that we have to handle package internal and external types differently, but we already have some complexity.

From this perspective, I agree that my suggestion is non-ideal.

> But yes, I stand by my rejection of prettification of generated code.

The most common reaction I get when trying to convince others to adopt protobufs is that it is "ugly". It was my original reaction, as well. It is not rational or logical, but it is a hurdle.

@tamird
Contributor

tamird commented Apr 20, 2016

> With a value literal that gets immediately set to a field, we avoid the allocation (I may be wrong).

This is correct, but irrelevant. You cannot set a value to a field here because you need polymorphism.

> Is there not an extra indirection when the union values have pointers?

Are you talking about the inner pointers or the outer? If you're talking about the inner then I agree, and that is #106.

@stevvooe
Contributor

@tamird

> This is correct, but irrelevant. You cannot set a value to a field here because you need polymorphism.

The concrete union type does not need to be a pointer and provides polymorphism.

I think we're on the same page, however.

@awalterschulze
Member

awalterschulze commented Apr 21, 2016

So it seems that all your examples share some common themes.
I want to know whether these are the use cases we are trying to cover,
because each of them brings more possible optimizations or prettifications.

  1. Each oneof is the only oneof in a message.
  2. This oneof contains the only fields that are in the message.
  3. Each of the fields has a different type.

With a message conforming to:

  • all 3: we can bring back onlyone, but for oneof
  • 1 and 2: we can do @dennwc's original suggestion
  • none: the inner pointer thing can be solved for the general case,
    or we could do something else.

Sorry, this might be a bit of a tangent to your discussion, but to me this makes the options we have clearer.
People can enable the new oneof extension when they are allowed to and errors can easily be detected by the generator.

@stevvooe
Contributor

@awalterschulze Part of the problem here (and with my suggestions) is that oneof is not actually a type union. It is a set of mutually exclusive fields. With Go's generation, we go down the path of trying to use types to do this, when every other implementation is using a tagged union. If we continue down the path of using types here, and seek these optimizations, the usage is not going to match the abstraction.

After thinking about this, I'm wondering if, perhaps, the current solution is fine but we just need better naming generation and documentation around how they map. Let's take the following:

message Foo {
  oneof value {
    string a = 1;
    string b = 2;
  }
}

The above generates the following:

type Foo struct {
    // Types that are valid to be assigned to Value:
    //  *Foo_A
    //  *Foo_B
    Value isFoo_Value `protobuf_oneof:"value"`
}

func (m *Foo) Reset()         { *m = Foo{} }
func (m *Foo) String() string { return proto.CompactTextString(m) }
func (*Foo) ProtoMessage()    {}

type isFoo_Value interface {
    isFoo_Value()
}

type Foo_A struct {
    A string `protobuf:"bytes,1,opt,name=a,proto3,oneof"`
}
type Foo_B struct {
    B string `protobuf:"bytes,2,opt,name=b,proto3,oneof"`
}

func (*Foo_A) isFoo_Value() {} 
func (*Foo_B) isFoo_Value() {} 

We end up getting Foo_A and Foo_B, with no mention of the target of where that might land. If an API user is asking which field to set for Value, simple code completion is completely obtuse. If these were generated as FooValue_A and FooValue_B, or something like that, the usage would be obvious.

Furthermore, I think the original proposal from @dennwc, of using type declarations and casts, is more than sufficient to remove the pointer lookup situation. We still need to generate a type for each field name, but it will remove some overhead of having to do struct literals and have an extra indirection.

@awalterschulze
Member

Yes FooValue would be a much better naming scheme.

Interfaces do add another pointer, but they are one of the few ways of mapping the oneof union typing without breaking anything or making any assumptions.

@dennwc's solution requires assumptions 1 and 2. Namely:

  • Each oneof is the only oneof in a message.
  • This oneof contains the only fields that are in the message.

Personally I typically find myself with assumptions 1 and 3, but it's then easy to get to 1, 2 and 3 by simply adding the extra fields in a wrapping message.

So yes, oneof is not a type union, but what I am asking is whether that is the typical use case.
If the typical use case satisfies assumptions 1, 2 and 3, then

  • why don't we make the typical use case nice?

Otherwise

  • fair enough, we can at least eliminate the message pointers
  • and can we do something else?

@awalterschulze
Member

awalterschulze commented Apr 22, 2016

Here is another solution that only makes assumption 3 in the Set methods.

message Foo {
  oneof value {
    string a = 1;
    string b = 2;
  }
  oneof value2 {
    string c = 3;
    string d = 4;
  }
}

type Foo struct {
   Value Value
   Value2 Value2
}

type Value struct {
   A *string
   B *string
}

func (this *Foo) GetValue() interface{} {
  if this.Value.A != nil {
    return *this.Value.A
  }
  if this.Value.B != nil {
    return *this.Value.B
  }
  return nil
}

func (this *Foo) SetValue(v interface{}) {
  // ... type switch on v
}

type Value2 struct {
   C *string
   D *string
}

func (this *Foo) GetValue2() interface{} {
  if this.Value2.C != nil {
    return *this.Value2.C
  }
  if this.Value2.D != nil {
    return *this.Value2.D
  }
  return nil
}

func (this *Foo) SetValue2(v interface{}) {
  // ... type switch on v
}

@dennwc
Contributor Author

dennwc commented Apr 22, 2016

I think it's a bit confusing, because in this case you can actually set both fields.

The second point is that GetValue will return a string for both the A and B fields, so the caller cannot tell whether it came from field A or B.

This might be safer:

type Foo struct {
   Value Value
   Value2 Value2
}

type Value interface{
 isFooValue()
}

type FooValueA string
func (FooValueA) isFooValue(){}

type FooValueB string
func (FooValueB) isFooValue(){}

// same for Value2

@awalterschulze
Member

Ok sorry my bug.

message Foo {
  oneof value {
    string a = 1;
    int64 b = 2;
  }
  oneof value2 {
    string c = 3;
    int64 d = 4;
  }
}

type Foo struct {
   Value Value
   Value2 Value2
}

type Value struct {
   A *string
   B *int64
}

func (this *Foo) GetValue() interface{} {
  if this.Value.A != nil {
    return *this.Value.A
  }
  if this.Value.B != nil {
    return *this.Value.B
  }
  return nil
}

func (this *Foo) SetValue(v interface{}) {
  // ... type switch on v
}

type Value2 struct {
   C *string
   D *int64
}

func (this *Foo) GetValue2() interface{} {
  if this.Value2.C != nil {
    return *this.Value2.C
  }
  if this.Value2.D != nil {
    return *this.Value2.D
  }
  return nil
}

func (this *Foo) SetValue2(v interface{}) {
  // ... type switch on v
}

@awalterschulze
Member

awalterschulze commented Apr 22, 2016

Yes you can set both values, but this can give an error when marshaling.
But you definitely have a point.

@dennwc
Contributor Author

dennwc commented Apr 22, 2016

Would it be faster?

To give some context, we have a use case for goprotobuf where generated structures are exposed as part of our API. In this case other devs not familiar with protobuf will see both fields in the oneof structure and might try to set each of them. The problem is that the code will actually compile, but will lead to a runtime error later. I think this should be checked at compile time with the union interface.
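
A small sketch of the difference, using hypothetical types. With the flat struct, setting two fields compiles and can only be rejected by a runtime check (here a made-up `Validate` method); with a union interface there is a single slot, so a second assignment simply replaces the first.

```go
package main

import (
	"errors"
	"fmt"
)

// FlatValue mimics the struct-of-pointers layout: nothing stops a caller
// from setting both fields.
type FlatValue struct {
	A *string
	B *int64
}

var errMultipleSet = errors.New("oneof: more than one field set")

// Validate is a hypothetical runtime check of the "at most one" rule.
func (v FlatValue) Validate() error {
	if v.A != nil && v.B != nil {
		return errMultipleSet
	}
	return nil
}

// UnionValue mimics the interface layout: a field of this type holds at
// most one variant at a time.
type UnionValue interface{ isUnionValue() }

type UnionA string
type UnionB int64

func (UnionA) isUnionValue() {}
func (UnionB) isUnionValue() {}

type Foo struct {
	Value UnionValue
}

func main() {
	s, n := "x", int64(1)
	bad := FlatValue{A: &s, B: &n} // compiles, fails only at run time
	fmt.Println(bad.Validate())

	foo := Foo{Value: UnionA("x")}
	foo.Value = UnionB(1) // replaces the previous variant
	fmt.Printf("%T\n", foo.Value)
}
```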

@awalterschulze
Member

I am not particularly sure if it would be faster, probably not.

Good to know about your use case, this is what I am after.

@awalterschulze
Member

oneof is part of protoc 2.6.1

I don't see how the extra constraint of having the field named as 'value' gives us anything in return.

Would #106 also solve your problem?
Since then there is only one pointer.

@jaekwon

jaekwon commented Aug 19, 2016

Having the field be named "value" isn't a required constraint, but just a hint that we can use to tell the Protoc compiler to produce the following, instead of what's written in #168, when all three conditions are met. (Well, it's a good constraint because it lets the coder opt into the following behavior)

type Foo interface {
  isFooValue()
}

type Foo_A string
func (Foo_A) isFooValue(){}

type Foo_B string
func (Foo_B) isFooValue(){}

// we're assuming there is no Value2 field, as per the 3 conditions I noted

#106 would solve the potential memory/GC performance issue, which maybe is 50% of the problem. The other problem is that the indirection is arguably needless, and it creates ugly code.

@awalterschulze
Copy link
Member

I agree the current implementation is slow and ugly.

@dennwc
Contributor Author

dennwc commented Aug 22, 2016

I'm working on it right now. There are a few problems, mostly reflect-related (jsonpb, GetProperties, etc.). I will see if they can be avoided.

@awalterschulze
Member

Excited to see progress :)

@atombender
Contributor

I'd love to see some progress on this. One big problem (perhaps already mentioned in this thread, I didn't read all the comments super carefully) with the current approach is that there's no easy way to wrap or unwrap a oneof, because every "wrapper" is its own type.

In order to get at the wrapped value, you have to write a big switch like the commit that @tamird mentions, and be careful to keep it in sync with the generated types. You can probably hack up some generic reflection-based get/set functions, but it's not going to be pretty, or fast, or user-friendly.
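
For illustration, a generic reflection-based unwrap might look like the sketch below. It assumes every wrapper is a pointer to a struct with exactly one exported field, which matches the generated shapes shown earlier in this thread; `Unwrap` and the wrapper definitions here are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// Wrapper types in the style of the generated code (simplified, no tags).
type Foo_A struct{ A string }
type Foo_B struct{ B int64 }

// Unwrap returns the single field of a oneof wrapper. It reports false
// when the argument is not a non-nil pointer to a one-field struct.
// The field must be exported, otherwise Interface would panic.
func Unwrap(wrapper interface{}) (interface{}, bool) {
	v := reflect.ValueOf(wrapper)
	if v.Kind() != reflect.Ptr || v.IsNil() {
		return nil, false
	}
	v = v.Elem()
	if v.Kind() != reflect.Struct || v.NumField() != 1 {
		return nil, false
	}
	return v.Field(0).Interface(), true
}

func main() {
	inner, ok := Unwrap(&Foo_B{B: 42})
	fmt.Println(inner, ok)
}
```

This avoids the hand-maintained switch, but it gives up static typing and pays the cost of reflection on every call.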

I really like the approach suggested by the OP, and hope this is what gets implemented.

@dennwc
Contributor Author

dennwc commented Nov 14, 2016

Sorry, was a bit distracted by work matters.

I will continue to work on this soon.

@awalterschulze
Member

@atombender Who or what is OP?

@dennwc I am still excited to see what comes out. These things are typically more work than they seem so good luck.

@atombender
Contributor

OP is "original poster", meaning @dennwc.

@jaekwon

jaekwon commented Jan 10, 2017

What's the status, @dennwc, and what approach are you taking?

@jaekwon

jaekwon commented Mar 12, 2018

A lot of learning has happened since my last comment, and I've come to realize that what we really need is actually protobuf4.

Here's our approach: https://github.com/tendermint/go-amino

Amino removes "oneof" and supports something called "interfaces" natively. It also has a ton of other features. Please check it out, we're looking for constructive feedback (or support)!

@atombender
Contributor

atombender commented Mar 12, 2018

@jaekwon Looks promising, and I wish this could be used as the direction for Protobuf4, but you're going to be fighting an uphill battle here. Much of the point of using Protobuf is the industry support for it, not just the technical aspects. I, and the companies I work for, would never adopt your project, simply because it doesn't have a major player behind it that can ensure that there's wide language support, continued maintenance, evangelism, and so on.

@awalterschulze
Member

@jaekwon If you do make a new serialization format, please make sure to investigate faster varint encodings. There are quite a few variants and they have interesting tradeoffs.

@awalterschulze
Member

For example
https://github.com/pascaldekloe/flit

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 3, 2018
…chResponse

All Requests and Responses pass through RequestUnion/ResponseUnion structs
when they are added to BatchRequests/BatchResponses. In order to ensure
that only one Request type can be assigned to one of these RequestUnion
or ResponseUnion structs, we currently use gogoproto's approach to tagged
unions: the `gogoproto.onlyone` option.

This option was introduced before proto3. Proto3
then added the `oneof` option, which for all intents and purposes addresses
the same issue: https://developers.google.com/protocol-buffers/docs/proto#oneof.
However, there is one major difference between the two options, which
is in their generated code. `gogoproto.onlyone` will generate
a single flat struct with pointers to each possible variant type.
`oneof` will generate a union interface and an interface "wrapper"
struct for each variant type. The effect of this is that `onlyone`
will generate code that looks like this:

```
type Union struct {
    Variant1 *Variant1Type
    Variant2 *Variant2Type
    ...
}
```

While `oneof` will generate code that looks like this:

```
type Union struct {
    Value isUnion_Value
}

type isUnion_Value interface {
    ...
}

type Union_Variant1 struct {
    Variant1 *Variant1Type
}

type Union_Variant2 struct {
    Variant2 *Variant2Type
}
```

There are pretty obvious tradeoffs to each. For one, `oneof` introduces an
extra layer of indirection, which forces an extra allocation. It also doesn't
generate particularly useful setters and getters. On the other hand, `onlyone`
creates a large struct that grows linearly with the number of variants.
Neither approach is ideal, and there has been **A LOT** of discussion on this:
- golang/protobuf#78
- golang/protobuf#283
- gogo/protobuf#103
- gogo/protobuf#168

Clearly neither approach is ideal, ergonomically or with regard to performance.
However, over time, the tradeoff has been getting worse for us and it's time we
consider switching over to `oneof` in `RequestUnion` and `ResponseUnion`. These
structs have gotten huge as more and more request variants have been added:
`RequestUnion` has grown to 328 bytes and `ResponseUnion` has grown to 320 bytes.
It has gotten to the point where the wasted space is non-negligible.

This change switches over to `oneof` to shrink these union structs down to more
manageable sizes (16 bytes). The downside of this is that in reducing the struct
size we end up introducing an extra allocation. This isn't great, but we can avoid
the extra allocation in some places (like `BatchRequest.CreateReply`) by grouping
the allocation with that of the Request/Response itself. We've seen previous cases
like cockroachdb#4216 where adding in an extra allocation/indirection is a net-win if it
reduces a commonly used struct's size significantly.

The other downside to this change is that the ergonomics of `oneof` aren't quite
as nice as `gogo.onlyone`. Specifically, `gogo.onlyone` generates getters and
setters called `GetValue` and `SetValue` that provide access to the wrapped
`interface{}`, which we can assert to a `Request`. `oneof` doesn't provide
such facilities. This was the cause of a lot of the discussions linked above.
While this isn't ideal, I think we've waited long enough (~3 years) for a
resolution on those discussions. For now, we'll just generate the getters
and setters ourselves.

This change demonstrated about a 5% improvement when running kv95 on my local
laptop. When run on a three-node GCE cluster (4 vCPUs), the improvements were
less pronounced but still present. kv95 showed a throughput improvement of 2.4%.
Running kv100 showed an even more dramatic improvement of 18% on the GCE cluster.
I think this is because kv100 is essentially a hot loop where all reads miss
because the cluster remains empty, so it's the best case for this change. Still,
the impact was shocking.

Release note (performance improvement): Reduce the memory size of commonly used
Request and Response objects.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 3, 2018
…chResponse

All Requests and Responses pass through RequestUnion/ResponseUnion structs
when they are added to BatchRequests/BatchResponses. In order to ensure
that only one Request type can be assigned to one of these RequestUnion
or ResponseUnion structs, we currently use gogoproto's approach to tagged
unions: the `gogoproto.onlyone` option.

This option was introduced before proto3. Proto3
then added the `oneof` option, which for all intents and purposes addresses
the same issue: https://developers.google.com/protocol-buffers/docs/proto#oneof.
However, there is one major difference between the two options, which
is in their generated code. `gogoproto.onlyone` will generate
a single flat struct with pointers to each possible variant type.
`oneof` will generate a union interface and an interface "wrapper"
struct for each variant type. The effect of this is that `onlyone`
will generate code that looks like this:

```
type Union struct {
    Variant1 *Variant1Type
    Variant2 *Variant2Type
    ...
}
```

While `oneof` will generate code the looks like this:

```
type Union struct {
    Value isUnion_Value
}

type isUnion_Value interface {
    ...
}

type Union_Variant1 struct {
    Variant1 *Variant1Type
}

type Union_Variant2 struct {
    Variant2 *Variant2Type
}
```

There are pretty obvious tradeoffs to each. For one, `oneof` introduces an
extra layer of indirection, which forces an extra allocation. It also doesn't
generate particularly useful setters and getters. On the other hand, `onlyone`
creates a large struct that grows linearly with the number of variants.
Neither approach is ideal, and there has been **A LOT** of discussion on this:
- golang/protobuf#78
- golang/protobuf#283
- gogo/protobuf#103
- gogo/protobuf#168

Clearly neither approach is ideal, ergonomically or with regard to performance.
However, over time, the tradeoff has been getting worse for us and its time we
consider switching over to `oneof` in `RequestUnion` and `ResponseUnion`. These
structs have gotten huge as more and more request variants have been added:
`RequestUnion` has grown to 328 bytes and `ResponseUnion` has grown to 320 bytes.
It has gotten to the point where the wasted space is non-negligible.

This change switches over to `oneof` to shrink these union structs down to more
manageable sizes (16 bytes). The downside of this is that in reducing the struct
size we end up introducing an extra allocation. This isn't great, but we can avoid
the extra allocation in some places (like `BatchRequest.CreateReply`) by grouping
the allocation with that of the Request/Response itself. We've seen previous cases
like cockroachdb#4216 where adding in an extra allocation/indirection is a net-win if it
reduces a commonly used struct's size significantly.
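
One way to group the wrapper allocation with the Response allocation is to place both in a single helper struct and allocate that once. This is a sketch under assumptions: `getAlloc` and `newGetReply` are hypothetical names, and the types only mirror the shape of `oneof` output rather than the real generated code.

```go
package main

import "fmt"

// GetResponse is a hypothetical response payload.
type GetResponse struct{ Value []byte }

// The marker interface and wrapper struct mirror the shape of oneof output.
type isResponseUnion_Value interface{ isResponseUnion_Value() }

type ResponseUnion_Get struct{ Get *GetResponse }

func (*ResponseUnion_Get) isResponseUnion_Value() {}

type ResponseUnion struct{ Value isResponseUnion_Value }

// getAlloc holds the wrapper and the payload side by side so that a
// single heap allocation covers both.
type getAlloc struct {
	union ResponseUnion_Get
	resp  GetResponse
}

// newGetReply builds a ResponseUnion whose wrapper and payload share
// one allocation.
func newGetReply() ResponseUnion {
	a := new(getAlloc)    // one allocation for wrapper + payload
	a.union.Get = &a.resp // the wrapper points into the same block
	return ResponseUnion{Value: &a.union}
}

func main() {
	r := newGetReply()
	get := r.Value.(*ResponseUnion_Get).Get
	get.Value = []byte("v")
	fmt.Printf("%s\n", get.Value)
}
```

The cost of this layout is that the wrapper and the payload are kept alive together, since both live in the same heap object.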

The other downside to this change is that the ergonomics of `oneof` aren't quite
as nice as `gogoproto.onlyone`. Specifically, `gogoproto.onlyone` generates getters and
setters called `GetValue` and `SetValue` that provide access to the wrapped
`interface{}`, which we can assert to a `Request`. `oneof` doesn't provide
such facilities. This was the cause of a lot of the discussions linked above.
While this isn't ideal, I think we've waited long enough (~3 years) for a
resolution on those discussions. For now, we'll just generate the getters
and setters ourselves.
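
Hand-written accessors of this kind could look roughly like the sketch below. The names `GetInner` and `SetInner`, the `Request` interface, and the two variants are assumptions made for illustration; the real accessors would need one case per variant.

```go
package main

import "fmt"

// Request is a hypothetical common interface for all variants.
type Request interface{ Method() string }

type GetRequest struct{}

func (*GetRequest) Method() string { return "Get" }

type PutRequest struct{}

func (*PutRequest) Method() string { return "Put" }

// The types below mirror the shape of oneof output.
type isRequestUnion_Value interface{ isRequestUnion_Value() }

type RequestUnion_Get struct{ Get *GetRequest }
type RequestUnion_Put struct{ Put *PutRequest }

func (*RequestUnion_Get) isRequestUnion_Value() {}
func (*RequestUnion_Put) isRequestUnion_Value() {}

type RequestUnion struct{ Value isRequestUnion_Value }

// GetInner unwraps the variant and returns it as a Request.
// It returns nil when no variant is set.
func (ru RequestUnion) GetInner() Request {
	switch t := ru.Value.(type) {
	case *RequestUnion_Get:
		return t.Get
	case *RequestUnion_Put:
		return t.Put
	default:
		return nil
	}
}

// SetInner wraps r in the matching variant struct.
// It reports false when r is not a known variant type.
func (ru *RequestUnion) SetInner(r Request) bool {
	switch t := r.(type) {
	case *GetRequest:
		ru.Value = &RequestUnion_Get{Get: t}
	case *PutRequest:
		ru.Value = &RequestUnion_Put{Put: t}
	default:
		return false
	}
	return true
}

func main() {
	var ru RequestUnion
	ru.SetInner(&PutRequest{})
	fmt.Println(ru.GetInner().Method())
}
```

Because both switches must list every variant, accessors like these are a natural fit for code generation rather than manual upkeep.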

This change demonstrated about a 5% improvement when running kv95 on my local
laptop. When run on a three-node GCE cluster (4 vCPUs), the improvements were
less pronounced but still present. kv95 showed a throughput improvement of 2.4%.
Running kv100 showed an even more dramatic improvement of 18% on the GCE cluster.
I think this is because kv100 is essentially a hot loop where all reads miss
because the cluster remains empty, so it's the best case for this change. Still,
the impact was shocking.

Release note (performance improvement): Reduce the memory size of commonly used
Request and Response objects.
craig bot pushed a commit to cockroachdb/cockroach that referenced this issue Jul 3, 2018
27112: roachpb: replace `gogoproto.onlyone` with `oneof` in BatchRequest/BatchResponse r=nvanbenschoten a=nvanbenschoten

All Requests and Responses pass through RequestUnion/ResponseUnion structs
when they are added to BatchRequests/BatchResponses. In order to ensure
that only one Request type can be assigned to one of these RequestUnion
or ResponseUnion structs, we currently use gogoproto's approach to tagged
unions: the `gogoproto.onlyone` option.

This option was introduced before proto3. Proto3
then added the `oneof` option, which for all intents and purposes addresses
the same issue: https://developers.google.com/protocol-buffers/docs/proto#oneof.
However, there is one major difference between the two options, which
is in their generated code. `gogoproto.onlyone` will generate
a single flat struct with pointers to each possible variant type.
`oneof` will generate a union interface and an interface "wrapper"
struct for each variant type. The effect of this is that `onlyone`
will generate code that looks like this:

```
type Union struct {
    Variant1 *Variant1Type
    Variant2 *Variant2Type
    ...
}
```

While `oneof` will generate code that looks like this:

```
type Union struct {
    Value isUnion_Value
}

type isUnion_Value interface {
    ...
}

type Union_Variant1 struct {
    Variant1 *Variant1Type
}

type Union_Variant2 struct {
    Variant2 *Variant2Type
}
```

There are pretty obvious tradeoffs to each. For one, `oneof` introduces an
extra layer of indirection, which forces an extra allocation. It also doesn't
generate particularly useful setters and getters. On the other hand, `onlyone`
creates a large struct that grows linearly with the number of variants.
Neither approach is great, and there has been **A LOT** of discussion on this:
- golang/protobuf#78
- golang/protobuf#283
- gogo/protobuf#103
- gogo/protobuf#168

Clearly neither approach is ideal, ergonomically or with regard to performance.
However, over time, the tradeoff has been getting worse for us and it's time we
consider switching over to `oneof` in `RequestUnion` and `ResponseUnion`. These
structs have gotten huge as more and more request variants have been added:
`RequestUnion` has grown to **328 bytes** and `ResponseUnion` has grown to **320 bytes**.
It has gotten to the point where the wasted space is non-negligible.

This change switches over to `oneof` to shrink these union structs down to more
manageable sizes (16 bytes each). The downside of this is that in reducing the struct
size we end up introducing an extra allocation. This isn't great, but we can avoid
the extra allocation in some places (like `BatchRequest.CreateReply`) by grouping
the allocation with that of the Request/Response itself. We've seen previous cases
like #4216 where adding in an extra allocation/indirection is a net-win if it
reduces a commonly used struct's size significantly.

The other downside to this change is that the ergonomics of `oneof` aren't quite
as nice as `gogoproto.onlyone`. Specifically, `gogoproto.onlyone` generates getters and
setters called `GetValue` and `SetValue` that provide access to the wrapped
`interface{}`, which we can assert to a `Request`. `oneof` doesn't provide
such facilities. This was the cause of a lot of the discussions linked above.
While it would be nice for this to be resolved upstream, I think we've waited long
enough (~3 years) for a resolution to those discussions. For now, we'll just
generate the getters and setters ourselves.

This change demonstrated about a **5%** improvement when running kv95 on my local
laptop. When run on a three-node GCE cluster (4 vCPUs), the improvements were
less pronounced but still present. kv95 showed a throughput improvement of **2.4%**.
Running kv100 showed a much more dramatic improvement of **18%** on the three-node
GCE cluster. I think this is because kv100 is essentially a hot loop where all reads miss
because the cluster remains empty, so it's the best-case scenario for this change. Still,
the impact was shocking.

Release note (performance improvement): Reduce the memory size of commonly used
Request and Response objects.

27114: opt/sql: fix explain analyze missing option r=asubiotto a=asubiotto

ConstructExplain previously ignored the ANALYZE option so any EXPLAIN
ANALYZE statement would result in execution as an EXPLAIN (DISTSQL)
statement. The ANALYZE option is now observed in ConstructExplain.

Additionally, the stmtType field from the explainDistSQLNode has been
removed because it was not necessary and it was unclear how to pass this
from the `execFactory`.

Release note: None

27116: Makefile: learn that roachtest depends on optimizer-generated code r=benesch a=benesch

As mentioned in cd4415c, the Makefile will one day be smart enough to
deduce this on its own, but for now it's simpler to explicitly list the
commands that require generated code. Note that the simple but coarse
solution of assuming that all commands depend on generated code is
inviable as some of these commands are used to generate the code in the
first place.

Release note: None

27119: storage: extract replica unlinking into store method r=tschottdorf a=benesch

Extract some code that was duplicated in three places into a dedicated
helper method. Prerequisite for #27061.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Alfonso Subiotto Marqués <alfonso@cockroachlabs.com>
Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>