Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proto: remove or improve registration #268

Closed
awalterschulze opened this issue Dec 17, 2016 · 26 comments
Closed

proto: remove or improve registration #268

awalterschulze opened this issue Dec 17, 2016 · 26 comments

Comments

@awalterschulze
Copy link

We started a discussion here
cf10ca0#commitcomment-20160444

awalterschulze referenced this issue Dec 17, 2016
This provides a more reasonable API for obtaining a FileDescriptorProto and
DescriptorProto for a given proto.Message — a process that is currently possible
(but undocumented) using the public proto API.

The major use case for obtaining a DescriptorProto is to decode custom message
options, which are encoded as extensions on the MessageOptions submessage.
(See https://developers.google.com/protocol-buffers/docs/proto#customoptions.)

Fixes #179.
PiperOrigin-RevId: 139356528
@awalterschulze
Copy link
Author

I think Registering is breaking people's setups and I am wondering why this is needed in the first place

@awalterschulze
Copy link
Author

@bcmills says

proto.MessageType and proto.MessageName require it. So does extension support, and ptypes.DynamicAny, and there may be other cases I'm not thinking of at the moment.

@awalterschulze
Copy link
Author

Also to be concrete here one place it is breaking people's setups
#178 (comment)

@awalterschulze
Copy link
Author

Register fixes #179 only because ExtensionsMap was removed, which was another controversial move.

@awalterschulze
Copy link
Author

proto.MessageType and proto.MessageName are nice features, but not really necessary.
Extension support: ok yes maybe, but sacrificing some safety can also be used to get rid of Register.
How is register needed for DynamicAny?

@bcmills
Copy link
Contributor

bcmills commented Dec 19, 2016

Register fixes #179 only because ExtensionsMap was removed,

Register has ~nothing to do with #179. #179 requires having a copy of the generated Descriptor proto package; the only impact that registration has on that is to require the generated Descriptor package to be the same one that other uses of Descriptor depend on. (But that's true in general anyway: within a given binary, there should be one and only one copy of each package needed. To do otherwise leads to bloated binaries.)

@bcmills
Copy link
Contributor

bcmills commented Dec 19, 2016

proto.MessageType and proto.MessageName are nice features, but not really necessary.

This is a Chesterton's Fence. If you want to remove these methods, you should start by understanding what purpose they serve (or served). "Not really necessary" merits further explanation.

Extension support: ok yes maybe, but sacrificing some safety can also be used to get rid of Register.

Extensions are already error-prone; sacrificing safety is unlikely to help most users. But can you be more specific about the tradeoff you're proposing? What kinds of errors would it fail to detect?

How is register needed for DynamicAny?

DynamicAny requires the ability to map from a fully-qualified message name to a Go type. That is, it requires proto.MessageType.

@awalterschulze
Copy link
Author

How does having one copy of the generated Descriptor work with go vendoring?

My statement about not really necessary was a little strong, sorry. Thank you for explaining DynamicAny's usage of of proto.MessageType.

I have not really thought it through, but don't you think there could be a way to do Extensions without Register, since the user is already passing in the E_generatedextensionstruct

Ok I didn't know about the implementation of DynamicAny using proto.MessageType. Another way might be doing something similar to how extensions work. Ask the user to pass in the correct type to unmarshal Any into. But that will probably be a big change.

@bcmills
Copy link
Contributor

bcmills commented Jan 10, 2017

How does having one copy of the generated Descriptor work with go vendoring?

I honestly don't know. The proto package was designed well before vendoring support and in the context of predominantly "monorepo" usage, and I personally haven't had much exposure to Go vendoring tools.

I have not really thought it through, but don't you think there could be a way to do Extensions without Register, since the user is already passing in the E_generatedextensionstruct

Yeah, it does seem possible to do extensions without Register. Thinking about it some more it's likely more of an issue for the (Google-internal) "weak" field feature (the one corresponding to FieldOptions.weak in the descriptor), which is a precursor to extensions. So it might be possible to remove registration from the open-source version of extensions.


That still leaves the problem of DynamicAny, though. Given AnyMessageName and UnmarshalAny, it's not obvious to me whether DynamicAny is actually all that useful, but removing it would definitely not be backward-compatible.

@rogeralsing
Copy link

I'm not sure I get the context here, but using Any types are not supported in all languages, so having the ability to simulate the same behavior using an envelope message with typename and messagedata, you still need to peek into the registered types if doing cross platform integration.

@awalterschulze
Copy link
Author

Thanks for your really cool response @bcmills
I really appreciate it :)

I wonder whether:

  • MarshalAny could have taken the messageName as input.
  • UnmarshalAny could have taken in a proto.Message not of type DynamicAny and not do any name checking.

Then it doesn't seem like Any needs to call "home" to the proto package

@bcmills
Copy link
Contributor

bcmills commented Jan 17, 2017

I wonder whether:

  • MarshalAny could have taken the messageName as input.

Possible but error-prone. The message name is proto-path-qualified, so it isn't derivable from the Go name for the type; without registration there's no easy way to detect typo'd names.

However, MarshalAny would be easy to implement (with its current signature) without registration. Generated message types already include a method to access the file and message descriptors, which contain the path and name information. So MarshalAny is (fortunately) not a problem.

  • UnmarshalAny could have taken in a proto.Message not of type DynamicAny and not do any name checking.

It already supports that usage, and the name checking could be implemented without registration using the same technique as for MarshalAny.


The major use-case that DynamicAny addresses is for programs that use type-switches. With DynamicAny, one can write this code:

var ext ptypes.DynamicAny
if err := ptypes.UnmarshalAny(a, &ext); err := nil {
  …
}
switch m := ext.Message.(type) {
case *somepb.SomeProtoExtension:
  …
case *otherpb.SomeOtherExtension:
  …
default:
  …
}

and the compilation process will ensure that the extensions are valid types and the (implict) names are correct. Without DynamicAny, the function becomes:

switch ptypes.AnyMessageName(a) {
case "some.proto.path.SomeProtoExtension":
  m := new(somepb.SomeProtoExtension)
  if err := ptypes.UnmarshalAny(a, m); err != nil {
    …
  }
  …

case "other.proto.path.SomeOtherExtension":
  m := new(otherpb.SomeOtherExtension)
  if err := ptypes.UnmarshalAny(a, m); err != nil {
    …
  }
  …

default:
  …
}

and now we've lost the compile-time checking on the paths: we'll have to rely on test coverage to avoid typos in the strings. (We've also introduced a bunch of redundancy in the user code: the message types already know their paths, so why should the user repeat them?)

Or, we could avoid DynamicAny by trying each possibility in sequence:

var (
  someExt somepb.SomeProtoExtension
  otherExt otherpb.SomeOtherExtension
)
switch {
case ptypes.UnmarshalAny(a, &someExt) == nil:
  …

case ptypes.UnmarshalAny(a, &otherExt) == nil:
  …

default:
  …
}

but that's O(N) for what ought to be an O(1) switch and also requires fairly sophisticated compiler optimizations to avoid wasting memory. (In my experience, one should not expect such optimizations from the gc compiler.)

There is an equivalent (and somewhat more subtle) use-case for programs which use reflection to process protocol messages using a sort of plugin architecture: currently, a package can use DynamicAny even for types that package doesn't depend on or know about. Such a package could use reflection to implement some generic operation on common fields or nested messages.

@awalterschulze
Copy link
Author

I assume the name is the same name that is generated as a parameter to proto.RegisterType. This means that this could also be generated as a method on each type.
So then we chould have

switch ptypes.AnyMessageName(a) {
case (somepb.SomeProtoExtension{}).Name():
  m := new(somepb.SomeProtoExtension)
  if err := ptypes.UnmarshalAny(a, m); err != nil {
    …
  }
  …

case (otherpb.SomeOtherExtension{}).Name():
  m := new(otherpb.SomeOtherExtension)
  if err := ptypes.UnmarshalAny(a, m); err != nil {
    …
  }
  …

default:
  …
}

This is certainly not as simple as the type switch

@awalterschulze
Copy link
Author

I don't really understand how the plugin usecase would work exactly, but I can understand the need for it, since I am pretty sure I have needed this in the past. I planned to do this with extensions instead of any, but go was not supporting plugins yet.

@awalterschulze
Copy link
Author

Trying to unmarshal each message in succession is also not an option, its slow and error prone, since protos are so backwards and forwards compatible.

@awalterschulze
Copy link
Author

We now have a resolver for Any #410 , which means there is one less reason to Register in the first place :)

@dsnet
Copy link
Member

dsnet commented Dec 6, 2017

I'm in the process of thinking over the Go API for protobufs. I'm noting this here for future reference. A mechanism for registration makes the assumption that there is a one-to-one mapping between Go types and protobuf message types.

However, my observation is that this not always true. The following proto.Message types exist. Some internally to Google, and some externally. But the general idea of each exists in both places in some form:

  • There are generated protos, for which we commonly assume a one-to-one relation between Go types and protobuf message types.
  • There are raw message types (similar to json.RawMessage), which are effectively []byte containing the encoded the form of a proto message. By their very nature, they are a single Go type that can represent every possible protobuf message type.
  • There are also dynamic message types, which are proto.Message implementations that can dynamically represent any proto message type at runtime based on some proto description generated by the user. As such, they are also a single Go type that can represent every possible protobuf message type.
  • The existence of the previous three implies that there are more than one Go type that can represent a protobuf message type.

This seems to be evidence for a n-to-m relationship between Go types and protobuf message types, further indicating that registration needs to be rethought. The discussion that has occured thus far in this issue is also valuable.

I have absolutely not fully thought through what would happen if we removed registration.

@awalterschulze
Copy link
Author

If I were to redo protobufs for go, which I would love to do, if I was given the time to do it.
I would love to try and think if I could remove all dependencies on the proto library.
If I couldn't do that I would try to inject Unmarshal with the Any and Extensions that could be used in unmarshaling and eliminate reasons for registering.

@rogeralsing
Copy link

rogeralsing commented Dec 7, 2017

Proto.Actor https://github.com/AsynkronIT/protoactor-go is heavily dependent on registered types.
We do not use 'Any' as that is too verbose for us, instead we do smart batching of messages and optimize our own lookup of what types are used in each "batch".
https://github.com/AsynkronIT/protoactor-go/blob/dev/remote/endpoint_writer.go#L111

e.g. if there are 50 messages of the same type being sent, the type name will only occur once in the batch lookup.

Redesigning the lookup / type information too much would break our project.
(and in extension Oracle Fn Project)

@HakShak
Copy link

HakShak commented Mar 24, 2018

I think that we have run into this, but not 100% sure it's relevant. In our case, we are wrapping concrete events within a generic event via Any. We tried to use DynamicAny, but we are missing something. Here's both:

package util

import (
	"reflect"

	"github.com/golang/protobuf/proto"
	"github.com/golang/protobuf/ptypes"
	"github.com/golang/protobuf/ptypes/any"
)

// UnmarshalPayload uses the internal proto.protoTypes registry to instance a concrete type
func UnmarshalPayload(payload *any.Any) (interface{}, error) {
	// Get the message name which is compatible with the golang proto.protoType registry
	payloadMessageName, err := ptypes.AnyMessageName(payload)
	if err != nil {
		return nil, err
	}

	// Get the type from the registry
	payloadMessageType := proto.MessageType(payloadMessageName)

	// .Elem() gives us the value of the struct type instead of a pointer
	// In other words: thing := MyStruct{}, without Elem() it would be: var thing *MyStruct
	// .Interface() is just so we can type assert when passing into UnmarshalAny
	payloadInstance := reflect.New(payloadMessageType.Elem()).Interface()

	err = ptypes.UnmarshalAny(payload, payloadInstance.(proto.Message))
	if err != nil {
		return nil, err
	}

	return payloadInstance, nil
}

// This should be what we use, but seems broken with errors like:
// mismatched message type: got "monolith.Event" want ""
func unmarshalPayload(payload *any.Any) (*ptypes.DynamicAny, error) {
	var result ptypes.DynamicAny
	err := ptypes.UnmarshalAny(payload, result)
	if err != nil {
		return nil, err
	}

	return &result, nil
}

@jhump
Copy link
Contributor

jhump commented Mar 26, 2018

#567 was closed as a duplicate of this issue. But the conversation up to this point seems to focus more on message registration, like for use with de-serializing Any messages. There is, of course, more to the existing registration than just messages: enum values, extensions, and files.

In particular, #567 described the problems/fragility inherent with registering files solely by name (FileDescriptorProto.Name to be more specific).

(I'm adding this comment to make sure that addressing the brittle file registration mechanism is included in whatever improvements come out of this issue.)

@ejona86
Copy link

ejona86 commented Aug 8, 2019

We discussed the .proto file name piece @jhump brought up in grpc/grpc#19832. The long and short of it is that protos should have a canonical name, otherwise things break in general. protocolbuffers/protobuf#6492 is tracking the issue in protobuf to make the situation more clear to users and to potentially detect when things are awry. It also references an issue where C++ breaks when imports are non-canonical.

@dsnet
Copy link
Member

dsnet commented Aug 8, 2019

@ejona86 I agree. The v2 implementation will be moving towards a world where the set of protobuf declarations (i.e., enums, messages, extensions, etc) linked into a binary must be unique. The current situation where half the register functions fail loudly, and the other half silently fail is not okay.

@jhump
Copy link
Contributor

jhump commented Aug 8, 2019

I guess this is not a problem in Java because the descriptors are loaded (e.g. cross-linked with their dependencies) in static initializers, so if there is a problem linking descriptors, the message classes cannot even be loaded. Maven and gradle plugins for protos also make this much less error-prone than how other repos (such as Go repos) tend to handle managing their proto dependencies.

So I guess this will largely be "fixed" by v2 of the go proto APIs since it looks like they also construct and link descriptors during an init() function. Basically, the runtime will perform the necessary validation during initialization (so things like server reflection will be correct because invalid registrations would have caused the program to fail to start).

However, be prepared for the blast radius once v2 is released/GA: there's a good chance it will be a source of new issues filed and an inhibitor to projects upgrading to v2 because they may have dependencies that have unlinkable descriptors. They've basically been able to "get away with it" up to now because there was nothing in package initialization that was checking the validity of the compiled descriptors.

@dsnet
Copy link
Member

dsnet commented Aug 9, 2019

However, be prepared for the blast radius once v2 is released/GA: there's a good chance it will be a source of new issues filed and an inhibitor to projects upgrading to v2 because they may have dependencies that have unlinkable descriptors.

I agree. For that reason, we'll be easing users into it, but the end goal is to enforce strict unique-ness in the protobuf namespace.

Today (in v1), the behavior is that two of the register functions panic on conflicts, two of them log warnings to stderr on conflicts, and two of them silently allow the conflict. This situation is inconsistent and confusing.

For the initial release of v2, all registration functions will log warnings to stderr on any conflicts. The warning will 1) warn users that a future release will panic on conflicts, and 2) point users to documentation on how to resolve it.

For a later minor release of v2, all registration functions will panic on any conflicts. We may provide an opt-in to go back to the behavior of making them warnings, but that opt-out will not be around forever.

@dsnet
Copy link
Member

dsnet commented Mar 3, 2020

While developing the new protobuf implementation, we considered removing global registration entirely (the Java approach), but decided that it wasn't feasible if we want to a straightforward way for existing code to migrate. As such, we are double-downing on the concept of a global registry. However, almost every top-level function that consults the global registry can have a custom resolver passed in. The v1.20.0 release implements these details.

@dsnet dsnet closed this as completed Mar 3, 2020
@golang golang locked as resolved and limited conversation to collaborators Jun 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants