Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python_out files seem to forget about --proto_path #4614

Closed
bwo opened this issue May 11, 2018 · 11 comments
Closed

python_out files seem to forget about --proto_path #4614

bwo opened this issue May 11, 2018 · 11 comments

Comments

@bwo
Copy link

bwo commented May 11, 2018

Here's what I mean.

suppose you have this directory structure:

(tmp) ~/tmp 12:53:53 $ tree .
.
├── x
│   └── pb
│       ├── setup.cfg
│       ├── setup.py
│       └── tmp
│           ├── __init__.py
│           └── x
│               ├── __init__.py
│               └── pb
│                   ├── __init__.py
│                   ├── x_pb2.py
│                   └── x.proto
└── y
    └── pb
        ├── setup.cfg
        ├── setup.py
        └── tmp
            ├── __init__.py
            └── y
                ├── __init__.py
                └── pb
                    ├── __init__.py
                    ├── y_pb2.py
                    └── y.proto

10 directories, 14 files

The contents of the .proto files are as follows:

(tmp) ~/tmp 12:54:50 $ cat x/pb/tmp/x/pb/x.proto
syntax = "proto3";
package tmp.x.pb;

message Thing {
    string t = 1;
}
(tmp) ~/tmp 12:55:26 $ cat y/pb/tmp/y/pb/y.proto
syntax = "proto3";
package tmp.y.pb;
import "tmp/x/pb/x.proto";

message Thing2 {
    tmp.x.pb.Thing t = 1;
}
(tmp) Ben-Wolfson-Tower ~/tmp 12:55:34 $

The reason that y.proto imports x.proto with the path "tmp/x/pb/x.proto" is that when both of these packages are installed, x_pb2 will be importable as tmp.x.pb.x_pb2, and (to my astonishment) the only way to get protoc to generate such an import path in python is to import it with that path in the proto file. This means that we generate the *_pb2.py files with some --proto_path shenanigans:

(tmp) ~/tmp 12:55:34 $ cd x
(tmp) ~/tmp/x 12:57:54 $ find . -name '*.proto' | xargs protoc --python_out=. --proto_path=.
(tmp) ~/tmp/x 12:58:01 $ cd ../y
(tmp) ~/tmp/y 12:58:04 $ find . -name '*.proto' | xargs protoc --python_out=. --proto_path=$HOME/tmp/x/pb --proto_path=.

This works, in the sense that protoc doesn't complain about anything, it finds all the necessary files, and it generates Python code. But in another, more accurate, sense, it does not work:

(tmp) Ben-Wolfson-Tower ~/tmp/y 12:58:09 $ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tmp.x.pb.x_pb2
>>> import tmp.y.pb.y_pb2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/benwolfson/tmp/y/pb/tmp/y/pb/y_pb2.py", line 25, in <module>
    dependencies=[tmp_dot_x_dot_pb_dot_x__pb2.DESCRIPTOR,])
  File "/usr/lib/python2.7/dist-packages/google/protobuf/descriptor.py", line 829, in __new__
    return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "pb/tmp/y/pb/y.proto":
  pb/tmp/y/pb/y.proto: Import "tmp/x/pb/x.proto" has not been loaded.
  tmp.y.pb.Thing2.t: "tmp.x.pb.Thing" seems to be defined in "pb/tmp/x/pb/x.proto", which is not imported by "pb/tmp/y/pb/y.proto".  To use it here, please add the necessary import.

>>>

That is, even though the protoc compiler made use of the --proto_path options in order to find and generate its output, the output it generated is not aware of the paths provided: it's precisely because pb/tmp/y/pb/y.proto doesn't import pb/tmp/x/pb/x.proto, but rather simply tmp/x/pb/x.proto, that we passed the --proto_path option to the compiler!

@TeBoring TeBoring added this to To Do in Weekly Fixit via automation May 14, 2018
@anandolee
Copy link
Contributor

I think we do not have plan to support relative imports. See #1491 for more discussion

@bwo
Copy link
Author

bwo commented May 21, 2018

I don't see the connection to relative imports. I have only, and only wish to have, absolute imports; looking at that ticket, it seems like prima facie to be a different issue. There isn't a problem with Python finding the module I want to import; IMO the problem is with protoc's codegen.

@bwo
Copy link
Author

bwo commented May 21, 2018

That is, when I run this command:

(tmp) ~/tmp/y 12:58:04 $ find . -name '*.proto' | xargs protoc --python_out=. --proto_path=$HOME/tmp/x/pb --proto_path=.

and protoc encounters the line import "tmp/x/pb/x.proto";, and it looks for the path /tmp/x/pb/x.proto relative to each specified --proto_path, it finds it in exactly one, and presumably can know relative to which such path it found the specified import, namely, it was relative to $HOME/tmp/x/pb. Nevertheless, it writes out to the generated Python file data that (AFAICT, I'm not an expert on the internals here by any means) seems to suggest/require that protoc have found tmp/x/pb/x.proto somewhere else.

The failure happens in this instantiation, in y_pb2.py:

DESCRIPTOR = _descriptor.FileDescriptor(
  name='pb/tmp/y/pb/y.proto',
  package='tmp.y.pb',
  syntax='proto3',
  serialized_pb=_b('\n\x13pb/tmp/y/pb/y.proto\x12\x08tmp.y.pb\x1a\x10tmp/x/pb/x.proto\"$\n\x06Thing2\x12\x1a\n\x01t\x18\x01 \x01(\x0b\x32\x0f.tmp.x.pb.Thingb\x06proto3')
  ,
  dependencies=[tmp_dot_x_dot_pb_dot_x__pb2.DESCRIPTOR,])

By this point, we have already executed the line from tmp.x.pb import x_pb2 as tmp_dot_x_dot_pb_dot_x__pb2 and Python's import machinery has (clearly) found the y_pb2 module. So it doesn't seem, as I said, to be a matter of (Python's) relative imports. It does seem to be a matter of protoc's relative imports, but those do seem to be intended to work.

@anandolee
Copy link
Contributor

anandolee commented Jun 21, 2018

Why not use --python_out for your x.proto to produce x_pb2.py under y dictionary?

If your .proto files are under different path, use --proto_path to find the import files and --python_out to make sure the generated _pb2 files are under same path

Plus we will appreciate if you can change your structure to less complicated graph next time like:

├── x
│   └── pb
│         ├── __init__.py
│         ├── x_pb2.py
│         └── x.proto
└── y
    └── pb
           ├── __init__.py
           ├── y_pb2.py
           └── y.proto

@bwo
Copy link
Author

bwo commented Jun 22, 2018

Well, for one thing:

(tmp) Ben-Wolfson-Tower ~/tmp/x 04:59:41 $ find . -name '*.proto' | xargs protoc --python_out=../y/pb/tmp/y/pb/ --proto_path=.
(tmp) Ben-Wolfson-Tower ~/tmp/x 04:59:42 $ cd ../y
(tmp) Ben-Wolfson-Tower ~/tmp/y 05:00:18 $  find . -name '*.proto' | xargs protoc --python_out=. --proto_path=$HOME/tmp/x/pb --proto_path=.
(tmp) Ben-Wolfson-Tower ~/tmp/y 05:00:38 $ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tmp.y.pb.y_pb2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/benwolfson/tmp/y/pb/tmp/y/pb/y_pb2.py", line 16, in <module>
    from tmp.x.pb import x_pb2 as tmp_dot_x_dot_pb_dot_x__pb2
ImportError: cannot import name x_pb2
>>>

(You might just ask "why not have x.proto right alongside y.proto?". I do have reasons that I think are sound for this layout of files, but I also think that it's a bit of a distraction as to why I want to have my proto files arranged as I do.)

And even if it did work, it would require copying the generated file into every directory that might want to refer to it, which I don't think is either practical or good design.

Possibly, of course, you had something else in mind. If so, I would appreciate it if you could tell me what it was that you did have in mind, rather than suggesting that I do something unspecified, and leaving me to discover whether or not the thing I think you might have meant works. The directory structure I've given is not that hard to reproduce, nor is it unrealistic. I can even give you a tarball containing all the files, including the setup.pys, which would enable you to create a virtualenv and install the packages and everything.

@anandolee
Copy link
Contributor

Can you double check what file do you get after :
$ cd $HOME/tmp/y/
$ protoc --python_out=. --proto_path=$HOME/tmp/x/pb --proto_path=. pb/tmp/y/pb.y.proto

This should produce a y_pb2.py under $HOME/tmp/y/ not $HOME/tmp/y/pb/tmp/y

Please understand we always ask users try to minimize how to reproduce the error to help us better understanding what the real issue is.
And we ask why user request a feature to measure if it is widely needed or just for specific project, also try to provide work around if possible.

@bwo
Copy link
Author

bwo commented Jun 23, 2018

(tmp) Ben-Wolfson-Tower ~/tmp/y 05:21:42 $ pwd
/home/benwolfson/tmp/y
(tmp) Ben-Wolfson-Tower ~/tmp/y 05:21:44 $ ls
pb
(tmp) Ben-Wolfson-Tower ~/tmp/y 05:21:44 $ protoc --python_out=. --proto_path=$HOME/tmp/x/pb --proto_path=. pb/tmp/y/pb/y.proto
(tmp) Ben-Wolfson-Tower ~/tmp/y 05:21:52 $ ls
pb
(tmp) Ben-Wolfson-Tower ~/tmp/y 05:21:54 $ ls pb
setup.cfg  setup.py  tmp
(tmp) Ben-Wolfson-Tower ~/tmp/y 05:21:55 $ ls pb/tmp
__init__.py  y
(tmp) Ben-Wolfson-Tower ~/tmp/y 05:21:56 $ ls pb/tmp/y
__init__.py  __init__.pyc  pb
(tmp) Ben-Wolfson-Tower ~/tmp/y 05:21:58 $ ls pb/tmp/y/pb
__init__.py  y_pb2.py  y.proto
(tmp) Ben-Wolfson-Tower ~/tmp/y 05:22:00 $

@bwo
Copy link
Author

bwo commented Jun 23, 2018

Anyway, here's a tarball. If you unpack it and source setup.sh in the folder, helpfully named tmp, that it will create, you'll get (assuming you have virtualenvwrapper available) a virtualenv also with the wonderful name tmp, in which the two packages tmp.y.pb and tmp.x.pb are installed.

tmp.tar.gz

@gcarling
Copy link

@bwo @anandolee I am running into this issue as well and am curious if there is any known workaround. I added --proto_path to my compile step due to a folder reorganization and now I'm getting the "has not been loaded." error even though my protos compile fine. Definitely seems like a bug in the compiler if it compiles without error but the code then doesn't run, though I could be misunderstanding how this works.

@bwo
Copy link
Author

bwo commented Feb 21, 2019

We're literally using sed to munge import paths after protoc emits the wrong ones.

@zhangskz
Copy link
Member

zhangskz commented Sep 1, 2022

Closing this to dedupe against #10329 which tracks the same issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Weekly Fixit
  
To Do
Development

No branches or pull requests

6 participants