proto: remove or improve registration #268

awalterschulze · 2016-12-17T10:31:35Z

We started a discussion here
cf10ca0#commitcomment-20160444

The text was updated successfully, but these errors were encountered:

This provides a more reasonable API for obtaining a FileDescriptorProto and DescriptorProto for a given proto.Message — a process that is currently possible (but undocumented) using the public proto API. The major use case for obtaining a DescriptorProto is to decode custom message options, which are encoded as extensions on the MessageOptions submessage. (See https://developers.google.com/protocol-buffers/docs/proto#customoptions.) Fixes #179. PiperOrigin-RevId: 139356528

awalterschulze · 2016-12-17T10:32:51Z

I think Registering is breaking people's setups and I am wondering why this is needed in the first place

awalterschulze · 2016-12-17T10:33:25Z

@bcmills says

proto.MessageType and proto.MessageName require it. So does extension support, and ptypes.DynamicAny, and there may be other cases I'm not thinking of at the moment.

awalterschulze · 2016-12-17T10:33:57Z

Also to be concrete here one place it is breaking people's setups
#178 (comment)

awalterschulze · 2016-12-17T10:34:51Z

Register fixes #179 only because ExtensionsMap was removed, which was another controversial move.

awalterschulze · 2016-12-17T10:35:19Z

proto.MessageType and proto.MessageName are nice features, but not really necessary.
Extension support: ok yes maybe, but sacrificing some safety can also be used to get rid of Register.
How is register needed for DynamicAny?

bcmills · 2016-12-19T20:42:29Z

Register fixes #179 only because ExtensionsMap was removed,

Register has ~nothing to do with #179. #179 requires having a copy of the generated Descriptor proto package; the only impact that registration has on that is to require the generated Descriptor package to be the same one that other uses of Descriptor depend on. (But that's true in general anyway: within a given binary, there should be one and only one copy of each package needed. To do otherwise leads to bloated binaries.)

bcmills · 2016-12-19T20:47:52Z

proto.MessageType and proto.MessageName are nice features, but not really necessary.

This is a Chesterton's Fence. If you want to remove these methods, you should start by understanding what purpose they serve (or served). "Not really necessary" merits further explanation.

Extension support: ok yes maybe, but sacrificing some safety can also be used to get rid of Register.

Extensions are already error-prone; sacrificing safety is unlikely to help most users. But can you be more specific about the tradeoff you're proposing? What kinds of errors would it fail to detect?

How is register needed for DynamicAny?

DynamicAny requires the ability to map from a fully-qualified message name to a Go type. That is, it requires proto.MessageType.

awalterschulze · 2016-12-26T10:47:54Z

How does having one copy of the generated Descriptor work with go vendoring?

My statement about not really necessary was a little strong, sorry. Thank you for explaining DynamicAny's usage of of proto.MessageType.

I have not really thought it through, but don't you think there could be a way to do Extensions without Register, since the user is already passing in the E_generatedextensionstruct

Ok I didn't know about the implementation of DynamicAny using proto.MessageType. Another way might be doing something similar to how extensions work. Ask the user to pass in the correct type to unmarshal Any into. But that will probably be a big change.

bcmills · 2017-01-10T21:18:51Z

How does having one copy of the generated Descriptor work with go vendoring?

I honestly don't know. The proto package was designed well before vendoring support and in the context of predominantly "monorepo" usage, and I personally haven't had much exposure to Go vendoring tools.

I have not really thought it through, but don't you think there could be a way to do Extensions without Register, since the user is already passing in the E_generatedextensionstruct

Yeah, it does seem possible to do extensions without Register. Thinking about it some more it's likely more of an issue for the (Google-internal) "weak" field feature (the one corresponding to FieldOptions.weak in the descriptor), which is a precursor to extensions. So it might be possible to remove registration from the open-source version of extensions.

That still leaves the problem of DynamicAny, though. Given AnyMessageName and UnmarshalAny, it's not obvious to me whether DynamicAny is actually all that useful, but removing it would definitely not be backward-compatible.

rogeralsing · 2017-01-16T23:33:22Z

I'm not sure I get the context here, but using Any types are not supported in all languages, so having the ability to simulate the same behavior using an envelope message with typename and messagedata, you still need to peek into the registered types if doing cross platform integration.

awalterschulze · 2017-01-17T17:58:23Z

Thanks for your really cool response @bcmills
I really appreciate it :)

I wonder whether:

MarshalAny could have taken the messageName as input.
UnmarshalAny could have taken in a proto.Message not of type DynamicAny and not do any name checking.

Then it doesn't seem like Any needs to call "home" to the proto package

bcmills · 2017-01-17T19:25:20Z

I wonder whether:

MarshalAny could have taken the messageName as input.

Possible but error-prone. The message name is proto-path-qualified, so it isn't derivable from the Go name for the type; without registration there's no easy way to detect typo'd names.

However, MarshalAny would be easy to implement (with its current signature) without registration. Generated message types already include a method to access the file and message descriptors, which contain the path and name information. So MarshalAny is (fortunately) not a problem.

UnmarshalAny could have taken in a proto.Message not of type DynamicAny and not do any name checking.

It already supports that usage, and the name checking could be implemented without registration using the same technique as for MarshalAny.

The major use-case that DynamicAny addresses is for programs that use type-switches. With DynamicAny, one can write this code:

var ext ptypes.DynamicAny
if err := ptypes.UnmarshalAny(a, &ext); err := nil {
  …
}
switch m := ext.Message.(type) {
case *somepb.SomeProtoExtension:
  …
case *otherpb.SomeOtherExtension:
  …
default:
  …
}

and the compilation process will ensure that the extensions are valid types and the (implict) names are correct. Without DynamicAny, the function becomes:

switch ptypes.AnyMessageName(a) {
case "some.proto.path.SomeProtoExtension":
  m := new(somepb.SomeProtoExtension)
  if err := ptypes.UnmarshalAny(a, m); err != nil {
    …
  }
  …

case "other.proto.path.SomeOtherExtension":
  m := new(otherpb.SomeOtherExtension)
  if err := ptypes.UnmarshalAny(a, m); err != nil {
    …
  }
  …

default:
  …
}

and now we've lost the compile-time checking on the paths: we'll have to rely on test coverage to avoid typos in the strings. (We've also introduced a bunch of redundancy in the user code: the message types already know their paths, so why should the user repeat them?)

Or, we could avoid DynamicAny by trying each possibility in sequence:

var (
  someExt somepb.SomeProtoExtension
  otherExt otherpb.SomeOtherExtension
)
switch {
case ptypes.UnmarshalAny(a, &someExt) == nil:
  …

case ptypes.UnmarshalAny(a, &otherExt) == nil:
  …

default:
  …
}

but that's O(N) for what ought to be an O(1) switch and also requires fairly sophisticated compiler optimizations to avoid wasting memory. (In my experience, one should not expect such optimizations from the gc compiler.)

There is an equivalent (and somewhat more subtle) use-case for programs which use reflection to process protocol messages using a sort of plugin architecture: currently, a package can use DynamicAny even for types that package doesn't depend on or know about. Such a package could use reflection to implement some generic operation on common fields or nested messages.

awalterschulze · 2017-01-18T11:28:48Z

I assume the name is the same name that is generated as a parameter to proto.RegisterType. This means that this could also be generated as a method on each type.
So then we chould have

switch ptypes.AnyMessageName(a) {
case (somepb.SomeProtoExtension{}).Name():
  m := new(somepb.SomeProtoExtension)
  if err := ptypes.UnmarshalAny(a, m); err != nil {
    …
  }
  …

case (otherpb.SomeOtherExtension{}).Name():
  m := new(otherpb.SomeOtherExtension)
  if err := ptypes.UnmarshalAny(a, m); err != nil {
    …
  }
  …

default:
  …
}

This is certainly not as simple as the type switch

awalterschulze · 2017-01-18T11:30:52Z

I don't really understand how the plugin usecase would work exactly, but I can understand the need for it, since I am pretty sure I have needed this in the past. I planned to do this with extensions instead of any, but go was not supporting plugins yet.

awalterschulze · 2017-01-18T11:32:11Z

Trying to unmarshal each message in succession is also not an option, its slow and error prone, since protos are so backwards and forwards compatible.

awalterschulze · 2017-09-02T09:12:37Z

We now have a resolver for Any #410 , which means there is one less reason to Register in the first place :)

dsnet · 2017-12-06T02:27:53Z

I'm in the process of thinking over the Go API for protobufs. I'm noting this here for future reference. A mechanism for registration makes the assumption that there is a one-to-one mapping between Go types and protobuf message types.

However, my observation is that this not always true. The following proto.Message types exist. Some internally to Google, and some externally. But the general idea of each exists in both places in some form:

There are generated protos, for which we commonly assume a one-to-one relation between Go types and protobuf message types.
There are raw message types (similar to json.RawMessage), which are effectively []byte containing the encoded the form of a proto message. By their very nature, they are a single Go type that can represent every possible protobuf message type.
There are also dynamic message types, which are proto.Message implementations that can dynamically represent any proto message type at runtime based on some proto description generated by the user. As such, they are also a single Go type that can represent every possible protobuf message type.
The existence of the previous three implies that there are more than one Go type that can represent a protobuf message type.

This seems to be evidence for a n-to-m relationship between Go types and protobuf message types, further indicating that registration needs to be rethought. The discussion that has occured thus far in this issue is also valuable.

I have absolutely not fully thought through what would happen if we removed registration.

awalterschulze · 2017-12-07T11:36:47Z

If I were to redo protobufs for go, which I would love to do, if I was given the time to do it.
I would love to try and think if I could remove all dependencies on the proto library.
If I couldn't do that I would try to inject Unmarshal with the Any and Extensions that could be used in unmarshaling and eliminate reasons for registering.

rogeralsing · 2017-12-07T11:42:43Z

Proto.Actor https://github.com/AsynkronIT/protoactor-go is heavily dependent on registered types.
We do not use 'Any' as that is too verbose for us, instead we do smart batching of messages and optimize our own lookup of what types are used in each "batch".
https://github.com/AsynkronIT/protoactor-go/blob/dev/remote/endpoint_writer.go#L111

e.g. if there are 50 messages of the same type being sent, the type name will only occur once in the batch lookup.

Redesigning the lookup / type information too much would break our project.
(and in extension Oracle Fn Project)

HakShak · 2018-03-24T20:32:29Z

I think that we have run into this, but not 100% sure it's relevant. In our case, we are wrapping concrete events within a generic event via Any. We tried to use DynamicAny, but we are missing something. Here's both:

package util

import (
	"reflect"

	"github.com/golang/protobuf/proto"
	"github.com/golang/protobuf/ptypes"
	"github.com/golang/protobuf/ptypes/any"
)

// UnmarshalPayload uses the internal proto.protoTypes registry to instance a concrete type
func UnmarshalPayload(payload *any.Any) (interface{}, error) {
	// Get the message name which is compatible with the golang proto.protoType registry
	payloadMessageName, err := ptypes.AnyMessageName(payload)
	if err != nil {
		return nil, err
	}

	// Get the type from the registry
	payloadMessageType := proto.MessageType(payloadMessageName)

	// .Elem() gives us the value of the struct type instead of a pointer
	// In other words: thing := MyStruct{}, without Elem() it would be: var thing *MyStruct
	// .Interface() is just so we can type assert when passing into UnmarshalAny
	payloadInstance := reflect.New(payloadMessageType.Elem()).Interface()

	err = ptypes.UnmarshalAny(payload, payloadInstance.(proto.Message))
	if err != nil {
		return nil, err
	}

	return payloadInstance, nil
}

// This should be what we use, but seems broken with errors like:
// mismatched message type: got "monolith.Event" want ""
func unmarshalPayload(payload *any.Any) (*ptypes.DynamicAny, error) {
	var result ptypes.DynamicAny
	err := ptypes.UnmarshalAny(payload, result)
	if err != nil {
		return nil, err
	}

	return &result, nil
}

jhump · 2018-03-26T20:23:56Z

#567 was closed as a duplicate of this issue. But the conversation up to this point seems to focus more on message registration, like for use with de-serializing Any messages. There is, of course, more to the existing registration than just messages: enum values, extensions, and files.

In particular, #567 described the problems/fragility inherent with registering files solely by name (FileDescriptorProto.Name to be more specific).

(I'm adding this comment to make sure that addressing the brittle file registration mechanism is included in whatever improvements come out of this issue.)

ejona86 · 2019-08-08T20:39:43Z

We discussed the .proto file name piece @jhump brought up in grpc/grpc#19832. The long and short of it is that protos should have a canonical name, otherwise things break in general. protocolbuffers/protobuf#6492 is tracking the issue in protobuf to make the situation more clear to users and to potentially detect when things are awry. It also references an issue where C++ breaks when imports are non-canonical.

dsnet · 2019-08-08T20:42:35Z

@ejona86 I agree. The v2 implementation will be moving towards a world where the set of protobuf declarations (i.e., enums, messages, extensions, etc) linked into a binary must be unique. The current situation where half the register functions fail loudly, and the other half silently fail is not okay.

jhump · 2019-08-08T23:56:28Z

I guess this is not a problem in Java because the descriptors are loaded (e.g. cross-linked with their dependencies) in static initializers, so if there is a problem linking descriptors, the message classes cannot even be loaded. Maven and gradle plugins for protos also make this much less error-prone than how other repos (such as Go repos) tend to handle managing their proto dependencies.

So I guess this will largely be "fixed" by v2 of the go proto APIs since it looks like they also construct and link descriptors during an init() function. Basically, the runtime will perform the necessary validation during initialization (so things like server reflection will be correct because invalid registrations would have caused the program to fail to start).

However, be prepared for the blast radius once v2 is released/GA: there's a good chance it will be a source of new issues filed and an inhibitor to projects upgrading to v2 because they may have dependencies that have unlinkable descriptors. They've basically been able to "get away with it" up to now because there was nothing in package initialization that was checking the validity of the compiled descriptors.

dsnet · 2019-08-09T02:46:52Z

However, be prepared for the blast radius once v2 is released/GA: there's a good chance it will be a source of new issues filed and an inhibitor to projects upgrading to v2 because they may have dependencies that have unlinkable descriptors.

I agree. For that reason, we'll be easing users into it, but the end goal is to enforce strict unique-ness in the protobuf namespace.

Today (in v1), the behavior is that two of the register functions panic on conflicts, two of them log warnings to stderr on conflicts, and two of them silently allow the conflict. This situation is inconsistent and confusing.

For the initial release of v2, all registration functions will log warnings to stderr on any conflicts. The warning will 1) warn users that a future release will panic on conflicts, and 2) point users to documentation on how to resolve it.

For a later minor release of v2, all registration functions will panic on any conflicts. We may provide an opt-in to go back to the behavior of making them warnings, but that opt-out will not be around forever.

dsnet · 2020-03-03T05:21:57Z

While developing the new protobuf implementation, we considered removing global registration entirely (the Java approach), but decided that it wasn't feasible if we want to a straightforward way for existing code to migrate. As such, we are double-downing on the concept of a global registry. However, almost every top-level function that consults the global registry can have a custom resolver passed in. The v1.20.0 release implements these details.

bcmills added the question label Jan 16, 2017

awalterschulze mentioned this issue Sep 2, 2017

support custom URL resolver when marshaling/unmarshaling Any messages to/from JSON #410

Merged

bcmills mentioned this issue Sep 8, 2017

proposal: reflect: add GetTypeByName function golang/go#21754

Closed

jellevandenhooff mentioned this issue Feb 21, 2018

compatibility with golang/protobuf gogo/protobuf#386

Open

dsnet changed the title ~~Why Register in the first place?~~ proto: remove or improve registration Mar 9, 2018

dsnet added enhancement and removed question labels Mar 9, 2018

dsnet mentioned this issue Mar 14, 2018

proto: panic: duplicate enum registered: google.protobuf.FieldDescriptorProto_Type #178

Closed

johanbrandhorst mentioned this issue Mar 23, 2018

status: Allow external packages to produce status-compatible errors grpc/grpc-go#1927

Merged

dsnet mentioned this issue Mar 26, 2018

File descriptor registration is very error-prone #567

Closed

dsnet mentioned this issue Mar 30, 2018

proto: ensure consistency of behavior in Register* functions #575

Closed

dsnet mentioned this issue May 31, 2018

proto: support looking up by package #623

Closed

dsnet added the api-v2 label Aug 2, 2018

dsnet mentioned this issue Aug 20, 2018

proto: should not print to fmt or log #667

Closed

dsnet added pending-v2 and removed enhancement labels Jul 10, 2019

dsnet added this to the v2 release milestone Aug 21, 2019

muvaf mentioned this issue Dec 10, 2019

Importing kustomize causes duplicate proto registration errors kubernetes-sigs/kustomize#1915

Closed

dsnet closed this as completed Mar 3, 2020

golang locked as resolved and limited conversation to collaborators Jun 25, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proto: remove or improve registration #268

proto: remove or improve registration #268

awalterschulze commented Dec 17, 2016

awalterschulze commented Dec 17, 2016

awalterschulze commented Dec 17, 2016

awalterschulze commented Dec 17, 2016

awalterschulze commented Dec 17, 2016

awalterschulze commented Dec 17, 2016

bcmills commented Dec 19, 2016

bcmills commented Dec 19, 2016

awalterschulze commented Dec 26, 2016

bcmills commented Jan 10, 2017

rogeralsing commented Jan 16, 2017

awalterschulze commented Jan 17, 2017

bcmills commented Jan 17, 2017 •

edited

Loading

awalterschulze commented Jan 18, 2017

awalterschulze commented Jan 18, 2017

awalterschulze commented Jan 18, 2017

awalterschulze commented Sep 2, 2017

dsnet commented Dec 6, 2017

awalterschulze commented Dec 7, 2017

rogeralsing commented Dec 7, 2017 •

edited

Loading

HakShak commented Mar 24, 2018

jhump commented Mar 26, 2018 •

edited

Loading

ejona86 commented Aug 8, 2019

dsnet commented Aug 8, 2019

jhump commented Aug 8, 2019

dsnet commented Aug 9, 2019

dsnet commented Mar 3, 2020

proto: remove or improve registration #268

proto: remove or improve registration #268

Comments

awalterschulze commented Dec 17, 2016

awalterschulze commented Dec 17, 2016

awalterschulze commented Dec 17, 2016

awalterschulze commented Dec 17, 2016

awalterschulze commented Dec 17, 2016

awalterschulze commented Dec 17, 2016

bcmills commented Dec 19, 2016

bcmills commented Dec 19, 2016

awalterschulze commented Dec 26, 2016

bcmills commented Jan 10, 2017

rogeralsing commented Jan 16, 2017

awalterschulze commented Jan 17, 2017

bcmills commented Jan 17, 2017 • edited Loading

awalterschulze commented Jan 18, 2017

awalterschulze commented Jan 18, 2017

awalterschulze commented Jan 18, 2017

awalterschulze commented Sep 2, 2017

dsnet commented Dec 6, 2017

awalterschulze commented Dec 7, 2017

rogeralsing commented Dec 7, 2017 • edited Loading

HakShak commented Mar 24, 2018

jhump commented Mar 26, 2018 • edited Loading

ejona86 commented Aug 8, 2019

dsnet commented Aug 8, 2019

jhump commented Aug 8, 2019

dsnet commented Aug 9, 2019

dsnet commented Mar 3, 2020

bcmills commented Jan 17, 2017 •

edited

Loading

rogeralsing commented Dec 7, 2017 •

edited

Loading

jhump commented Mar 26, 2018 •

edited

Loading