Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File descriptor registration is very error-prone #567

Closed
jhump opened this issue Mar 22, 2018 · 1 comment
Closed

File descriptor registration is very error-prone #567

jhump opened this issue Mar 22, 2018 · 1 comment

Comments

@jhump
Copy link
Contributor

jhump commented Mar 22, 2018

File descriptors are registered by name:

https://github.com/golang/protobuf/blob/master/ptypes/any/any.pb.go#L162

func init() { proto.RegisterFile("google/protobuf/any.proto", fileDescriptor0) }

The name that is used is the one given to protoc, including any path provided to protoc, when the generated code was created.

Reflection tools (like protoreflect and service reflection) rely on this nominal registration to gather the transitive dependencies for a file in order to construct a fully linked data structure.

This breaks down if the name given to protoc when a file is compiled differs from how that same file is referenced by import statements in other files.

This happens to be the root cause for #298, too. (And also for fullstorydev/grpcurl#22.)

In addition to failures in reflection operations, there is another issue this leads to: accidental conflicts. It is currently possible for two files to be named foo.proto, but live in different Go packages. However, if they were both referenced as merely foo.proto in the protoc command-line and are both linked into the same Go program or test, they will not both be registered correctly. Package init will not panic either -- the last one to be registered will win. This can lead to confusing behavior as trying to reflect into one of the file's contents will return the completely wrong information. (In the short term, perhaps proto.RegisterFile should panic on duplicate, like proto.RegisterType does?)


Example

This example is taken from the bug report for fullstorydev/grpcurl#22.

Consider the file location.proto, compiled into location.pb.go, which is registered with the name "location.proto":
https://github.com/jamisonhyatt/grpc-multi-pkg-protos/blob/d0a70c3f9f1b30322386bd768353bd22242d53c0/pkg/external/location/location.pb.go#L227

Elsewhere, a file named mobile_svc.proto tries to import it. But the import path given to protoc differs from when location.proto itself was compiled. So to make it work, the import statement in that other file looks like so:

import "location/location.proto";

https://github.com/jamisonhyatt/grpc-multi-pkg-protos/blob/d0a70c3f9f1b30322386bd768353bd22242d53c0/protos/mobile_svc/mobile_svc.proto#L6

All of the code is correctly generated and seems to work just fine. However, when reflection tools are trying to gather the dependencies for mobile_svc.proto, they will encounter an error because they will be unable to load location/location.proto using proto.FileDescriptor because it was registered as just location.proto.


Less error-prone linking

First, observe that there is already an implicit constraint that, even if files from different directories map to the same Go package (using go_path option), they can't both have the same file name. For example, if foo/bar/baz.proto and frobnitz/baz.proto both indicated github.com/foo/bar as their Go package, they would clobber each other as protoc-gen-go would try to generate github.com/foo/bar/baz.pb.go for both.

This is actually true in other languages supported by protoc, too, such as Java. If two files with the same name but in different locations map to the same Java package, their generated code will conflict.

So that means within a Go package we can rely on the distinctness of only the proto file's base name. That means that, theoretically, we should not have to care about the path used to refer to it, either in the protoc invocation or in an import statement.

This would enable code generation to register files using their base name and their Go package path, ignoring other path information that may be present in FileDescriptorProto.Name. This registration would record not only the file descriptor but also the "names" (Go package path plus base file name) of imported files, since inspecting FileDescriptorProto.Dependencies no longer suffices: you must know the target file's Go path, too. So once you had a reference to one file, you could reliably crawl that file's entire dependency graph. The existing Descriptor methods for enums and messages would also need to provide this extra information about imports.

This would also fix the accidental collision issue: since the two files necessarily have different Go paths, they cannot collide with this scheme.

The "entry points", for attaining a descriptor, will mostly be via messages and enums. But this suggestion does make entry via filename harder: you don't just have to know the proto's file name to ask for it but also its Go package path. This may not necessarily be a bad thing. But it does beg (IMO) for another entry point that is file-based. (Heck, service-based descriptor access, from the grpc plugin for protoc-gen-go, would be great, too.)

What I mean by a file-based entry point is an exported variable or function for acquiring a file descriptor. If we transform the proto file's base name into a Go symbol, we could access the descriptor (and name information for its imports) via an exported package function:

// for foo.proto, in package bar
fileDescInfo := bar.FileDescriptor_foo_proto()

Even better would be if this returned not just the descriptor and "names" of the imports, but the entire transitive dependency graph:

type FileDescriptorInfo struct {
  // Bytes for the file descriptor
  FileDesc     []byte
  // Map of imported file name to corresponding descriptor info
  Dependencies map[string]FileDescriptorInfo
}

// Example generated code for foo.proto
var fileDescriptor_foo_proto FileDescriptorInfo

func init() {
  fileDescriptor_foo_proto = FileDescriptorInfo{
    FileDesc:     fileDescriptor0, // existing descriptor bytes var
    Dependencies: map[string]FileDescriptorInfo{
      "some/path/bar.proto":       somepath.FileDescriptor_bar_proto(),
      "some/other/path/baz.proto": otherpath.FileDescriptor_baz_proto(),
    },
  }
}

func FileDescriptor_foo_proto() FileDescriptorInfo {
  return fileDescriptor_foo_proto
}

That way, we have descriptor information that is effectively verified by the compiler and linker (due to the calls to FileDescriptor_* during package init). So if the generated code compiles, reflection is guaranteed to work. (FWIW, this is more like how Java descriptor access works.)

@dsnet
Copy link
Member

dsnet commented Mar 26, 2018

Thank you for this excellent write up. I'm going to close this as a duplicate of #268, which targets a wholesale re-thinking of registration. Your report here is very helpful.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants