
python: use relative imports in generated modules #1491

Closed · little-dude opened this issue May 5, 2016 · 107 comments
@little-dude commented May 5, 2016

I have a package foo that looks like this:

.
├── data
│   ├── a.proto
│   └── b.proto
└── generated
    ├── a_pb2.py
    ├── b_pb2.py
    └── __init__.py
# a.proto
package foo;
# b.proto
import "a.proto";

package foo;

Generate the code: protoc -I ./data --python_out=generated data/a.proto data/b.proto
Here is the failure:

Python 3.5.1 (default, Mar  3 2016, 09:29:07) 
[GCC 5.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from generated import b_pb2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/corentih/repro/generated/b_pb2.py", line 16, in <module>
    import a_pb2
ImportError: No module named 'a_pb2'

This is because the generated code looks like this:

import a_pb2

If the import were relative, it would work:

from . import a_pb2
@little-dude changed the title from "use relative imports in generated modules" to "python: use relative imports in generated modules" on May 6, 2016
@goldenbull commented May 10, 2016

I have exactly the same problem; I hope it gets fixed.

little-dude added a commit to little-dude/protobuf that referenced this issue May 10, 2016
@little-dude (Author) commented May 10, 2016

@goldenbull I submitted a fix; let's see if it makes it through. I'm just not sure: are there cases where we don't want relative imports?

@goldenbull commented May 11, 2016

@little-dude what if a_pb2.py is generated into a different folder than b_pb2.py?

@little-dude (Author) commented May 11, 2016

Could you provide a small example of what you're thinking about, so that I can try it with my change?

@goldenbull commented May 11, 2016

.
├── proto
│   ├── a.proto
│   └── b.proto
├── pkg_a
│   ├── a_pb2.py
│   └── __init__.py
└── pkg_b
     ├── b_pb2.py
     └── __init__.py

Maybe this is not a good case; I don't have enough knowledge about how protobuf and Python handle imports.

@little-dude (Author) commented May 11, 2016

I don't think this is actually possible, because the generated modules follow the hierarchy of the proto files.
However, we could imagine the following:

.
└── data
    ├── a.proto
    ├── b.proto
    └── sub
        ├── c.proto
        └── sub
             └── d.proto

with the following:

# a.proto
package foo;
import "b.proto";
import "sub/c.proto";
import "sub/sub/d.proto";

# b.proto
package foo;
import "sub/c.proto";
import "sub/sub/d.proto";

# sub/c.proto
package foo;
import "sub/d.proto";

# sub/sub/d.proto
package foo;

We generate the code with:

protoc -I data -I data/sub -I data/sub/sub --python_out=generated data/a.proto data/b.proto data/sub/c.proto data/sub/sub/d.proto

which generates the following:

.
└── generated
    ├── a_pb2.py
    ├── b_pb2.py
    └── sub
        ├── c_pb2.py
        └── sub
            └── d_pb2.py

But this is a more complex case than what I am trying to fix.

Edit: I'm not even sure this is a valid case, but here is the error I'm getting with the master branch (4c6259b):

In [1]: from generated import a_pb2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-f28bccc761b6> in <module>()
----> 1 from generated import a_pb2

/home/corentih/repro/generated/a_pb2.py in <module>()
     14 
     15 
---> 16 import b_pb2 as b__pb2
     17 from sub import c_pb2 as sub_dot_c__pb2
     18 from sub.sub import d_pb2 as sub_dot_sub_dot_d__pb2

/home/corentih/repro/generated/b_pb2.py in <module>()
     14 
     15 
---> 16 from sub import c_pb2 as sub_dot_c__pb2
     17 from sub.sub import d_pb2 as sub_dot_sub_dot_d__pb2
     18 

/home/corentih/repro/generated/sub/c_pb2.py in <module>()
     14 
     15 
---> 16 from sub import d_pb2 as sub_dot_d__pb2
     17 
     18 

/home/corentih/repro/generated/sub/sub/d_pb2.py in <module>()
     20   package='foo',
     21   syntax='proto2',
---> 22   serialized_pb=_b('\n\x0fsub/sub/d.proto\x12\x03\x66oo')
     23 )
     24 _sym_db.RegisterFileDescriptor(DESCRIPTOR)

TypeError: __init__() got an unexpected keyword argument 'syntax'

@little-dude (Author) commented Jun 1, 2016

@haberman is there any chance for this to be fixed before the next release? It's quite limiting for Python 3.

@drats commented Jun 17, 2016

I have exactly the same problem (using protobuf v3beta3): the generated imports do not conform to PEP 328 (finalized in 2004), restated in the Python docs at https://docs.python.org/3/tutorial/modules.html#intra-package-references. The twelve-year-old specification is enforced in Python 3, so generated protobufs are unusable without further modification.

@asgoel commented Jul 15, 2016

any updates on this?

@haberman (Contributor) commented Jul 29, 2016

Yikes, sorry for the slow reply on this.

Wouldn't relative imports break the case where you are importing protos from a different pip package?

For example, the well-known types come from the google-protobuf pip package. If we merge a change to use relative imports, imports of google/protobuf/timestamp.proto (for example) would be broken.

@drats commented Aug 8, 2016

@haberman This bug has to do with protos importing protos in the same package, even in the same directory. The compiler turns these into implicit relative imports, which are invalid under Python 3, so the generated code cannot execute at all. I don't see how you can get away without using relative imports in this case. I've had to manually edit the compiler-generated pb2.py files to get them to work at all.

@oc243 commented Oct 22, 2016

+1 for fixing this bug. It's stopping me from migrating from Python 2 to Python 3.

@brynmathias commented Nov 15, 2016

+1 for fixing as well

@27359794 commented Nov 18, 2016

+1, as far as I can tell this completely prevents proto imports in Python 3. Seems extremely worrying that this isn't fixed.

EDIT: this is not quite right, see my comment below.

@ylwu commented Nov 18, 2016

+1 for fixing

@xfxyjwf (Contributor) commented Nov 18, 2016

I believe protobuf is working as intended in this case. The Python package generated for a .proto file mirrors exactly the relative path of the .proto file itself. For example, if you have a .proto file "proto/a.proto", the generated Python code must be "proto/a_pb2.py" and it must be in the "proto" package. In @little-dude's example, if you want the generated code in the "generated" package, the .proto files themselves must be put in the "generated" directory:

└── generated
    ├── a.proto
    ├── b.proto

with protoc invoked as:

$ protoc --python_out=. generated/a.proto generated/b.proto

This way, the output will have the correct import statements (it will be "import generated.a_pb2" rather than "import a_pb2").

Using relative imports only solves the problem when all generated py code is put in the same directory. That's not the case when you import protos from other projects though (e.g., use protobuf's well-known types). It will likely break more than it fixes.

@haberman (Contributor) commented Nov 19, 2016

I am confused by the claims that this is totally broken in Python 3. We have Python 3 tests (and have for a while) that are passing AFAIK. Why would Python 3 require relative imports?

@27359794 commented Nov 21, 2016

The issue that I'm having, and that I believe others are having, is that the proto statement import "foo.proto" compiles into the Python 3 code import foo_pb2. However, implicit relative imports were disabled in Python 3, so relative imports must be of the form from . import foo_pb2. Manually changing the generated code to this form after proto compilation fixes the issue.

There are already multiple existing issues concerning this problem, and it first seems to have been recognised in 2014 (!!!): #90, #762, #881, #957

@27359794 commented Nov 25, 2016

I read a bit more about Python 3's import rules and I think I can give a better explanation.

In Python 3, the syntax import foo imports from the directory of the top-level script (or the current working directory in an interactive session), from $PYTHONPATH, or from an installation-dependent default. So if you compile proto/foo.proto to gen/foo_pb2.py, the syntax import foo_pb2 works only if the interpreter starts in gen/ or if you placed gen/ on your Python path.

If you are compiling protos as part of a Python package (which is the case in most non-trivial Python projects), the interpreter's current working directory is the directory of your main module (suppose the directory is mypackage/), and modules in the package must either use fully-qualified absolute imports (e.g. import mypackage.gen.foo_pb2) or relative imports (e.g. from .gen import foo_pb2).

In Python 2, a module inside gen/ could do import foo_pb2 and this would import mypackage.gen.foo_pb2 into its namespace, regardless of the current working directory. This is an implicit relative import.

In Python 3, implicit relative imports don't exist and import foo_pb2 will not find foo_pb2.py, even if the module importing foo_pb2 is inside gen/. This is the issue that people are complaining about in the thread.


The root of this problem seems to be that import "foo.proto"; needs to compile into from <absolute or relative package path> import foo_pb2 when the proto is inside a package, and import foo_pb2 otherwise. Neither syntax will work in both scenarios.

The proto compiler ignores the package name in the proto file and only observes the directory structure of the proto files, so if you want the from <path> import foo_pb2 output you need to place your protos in a directory structure mirroring the Python structure. For instance, if you have the following directory structure and you set the proto path to proto_files/ and python_out to mypackage/proto/, the correct import line is generated, but the compiled Python is put in the wrong directory.

Pre-compilation:

proto_files/
  mypackage/
    proto/
      foo.proto  # import "mypackage/proto/bar.proto";
      bar.proto
mypackage/
  qux/
    mymodule.py  # import mypackage.proto.foo_pb2
  proto/

Post-compilation:

proto_files/
  mypackage/
    proto/
      foo.proto  # import "mypackage/proto/bar.proto";
      bar.proto
mypackage/
  qux/
    mymodule.py  # import mypackage.proto.foo_pb2
  proto/
    mypackage/
      proto/
        foo_pb2.py  # from mypackage.proto import bar_pb2 (the import we want! but file should be in ../../)
        bar_pb2.py

This is close to the desired result, but not quite it, because now the absolute reference to the compiled file is mypackage.proto.mypackage.proto.foo_pb2 rather than mypackage.proto.foo_pb2.

In this instance you can actually get it to produce the right output by specifying the Python output path mypackage/. Here, the compiler detects that it doesn't need to create mypackage/proto because it already exists, and it just plops the generated files in that directory. However, this doesn't play nicely when the project directory structure makes use of symlinks, e.g. if mypackage/proto is a symlink to somewhere else and you actually want to dump the compiled protos there instead.

I think the 'correct' fix is to make use of the proto package rather than the location of the proto in the directory structure.

@haberman (Contributor) commented Nov 30, 2016

@DanGoldbach Thanks very much for all of the detail. I think a lot of the confusion here has been a result of not fully explaining all of the background and assumptions we are making. The fuller description really helps clarify things.

Let me first respond to this:

I think the 'correct' fix is to make use of the proto package rather than the location of the proto in the directory structure.

Can you be more specific about exactly what fix you are proposing? An example would help.

One thing people seem to want, but that doesn't seem to work in practice, is that a message like this:

package foo.bar;

message M {}

...can be imported like this in Python:

from foo.bar import M

That is a very natural thing to want, but doesn't work out, as I described here: grpc/grpc#2010 (comment)

Overall, your directory structure appears to be more complicated than what we generally do at Google (which is the environment where all this behavior was designed/evolved). At Google we generally have a single directory structure for all source files, including protos. So we would anticipate something more like this:

Pre-compilation:

mypackage/
  foo.proto  # import "mypackage/bar.proto";
  bar.proto
  qux/
    mymodule.py  # import mypackage.foo_pb2

Post-compilation:

mypackage/
  foo.proto  # import "mypackage/bar.proto";
  foo_pb2.py # import mypackage.bar_pb2
  bar.proto
  bar_pb2.py
  qux/
    mymodule.py  # import mypackage.foo_pb2

Because protobuf thinks in terms of this single, flat namespace, we get a little confused when people talk about needing relative imports. I haven't wrapped my head around why they are necessary. Why doesn't the scheme I outlined above work for your use case?

@27359794 commented Nov 30, 2016

Thanks, I understand much better now.

Can you be more specific about exactly what fix you are proposing?

I meant that it would be nice if the compiled proto module hierarchy mirrored the package hierarchy specified in the proto source file. As you pointed out in the grpc thread, this isn't feasible right now. Maybe in the future, the one-to-one restriction between proto sources and gens can be relaxed.

It sounds like protos work best when the generated files compile to the same directory as the source files, as per your example. Our directory structure has a separate build/ directory for generated code which isn't indexed by source control.

/build/  # generated code directory
  proto/
    # compiled protos go here
/python/  # parent directory for python projects
  my_python_pkg/  # root of this python package
    proto -> /build/proto/  # symlink to compiled proto dir
    main.py  # import my_python_pkg.proto.compiled_proto_pb2

We explicitly keep generated and source files separate, so your scheme doesn't suit our current repo layout.

We would also like the option of using those protos in multiple distinct Python packages in the future, so generating compiled protos into one particular Python package isn't ideal. At Google this isn't an issue because IIRC the entire repo acts like one massive Python package and blaze provides you with the bits of the repo that you need.

I think we'll get around this by either adding the compiled proto directory to our Python path or by writing a build command to manually edit the imports in the generated protos to be package-relative imports.
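
For reference, a minimal sketch of the first option (directory names are hypothetical and follow the layout above):

# my_python_pkg/main.py
import os
import sys

# The generated *_pb2 modules import each other with bare absolute imports
# ("import bar_pb2"), so the directory that contains them must itself be on
# sys.path before any of them is imported.
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "proto"))

import compiled_proto_pb2  # now resolves, as do its imports of sibling modules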

Hopefully this helps other people reading the thread.

@haberman (Contributor) commented Nov 30, 2016

Cool, glad we're getting closer to understanding the problems.

Maybe in the future, the one-to-one restriction between proto sources and gens can be relaxed.

I think this would be difficult to do. Right now we guarantee that:

$ protoc --python_out=. foo.proto bar.proto

...is equivalent to:

$ protoc --python_out=. foo.proto
$ protoc --python_out=. bar.proto

This is important because it's what allows the build to be parallelized. At Google we have thousands of .proto files (maybe even tens or hundreds of thousands of files, haven't checked lately) that all need to be compiled for a given build. It's not practical to do one big protoc run for all of them.

It's also not practical to try and ensure that all .proto files with a given (protobuf) package get compiled together. Protobuf doesn't require the file/directory to match the package, so .proto files for package foo could exist literally anywhere in the whole repo. So we have to allow that two different protoc runs will both contain messages for the same package.

So with these constraints we're a bit stuck. It leads to the conclusion that we can't have more than one .proto file put symbols into the same Python module, because the two protoc runs would overwrite the same output file.

We would also like the option of using those protos in multiple distinct Python packages in the future, so generating compiled protos into one particular Python package isn't ideal.

Usually for this case we would put the generated code for those protos into a package that multiple other packages can use. Isn't that usually the solution when you want to share code?

If you have foo/bar.proto that you want to share across multiple packages, can't you put it in a package such that anyone from any package can import it as foo.bar_pb2?

@27359794 commented Nov 30, 2016

I hadn't considered the constraints placed on the proto compiler by Google's scale and parallelism requirements, but that makes sense.

I guess I can compile the protos into their own proto-only package in build/ and then import that package from wherever I need it. I think you still need to add the parent of that package to the Python path.

@hindman commented Jan 10, 2017

@DanGoldbach Thanks for your example -- it helped me solve a problem.

I think your example works as desired if you run protoc like this:

protoc --python_out . --proto_path proto_files proto_files/mypackage/proto/*.proto

It generates correct import lines and places the _pb2.py files in the correct location.
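
For readers following along: under the layout from the earlier example, that command should produce (a sketch inferred from the import line shown in that example):

mypackage/
  proto/
    foo_pb2.py  # from mypackage.proto import bar_pb2 as ...
    bar_pb2.py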

@Teivaz commented Jan 21, 2021

This issue has been active since 2016. Is there any more evidence needed that there is a problem that needs solving?

@wojciechrauk commented Jan 23, 2021

Yes, the problem still exists (using grpcio-tools==1.35.0)

@aggieNick02 commented Jan 26, 2021

Count me as frustrated with this issue too. Like @W35170 above, I'm generating both C++ and Python code. I only have one proto file, but I'm still getting bitten by this because I'm using gRPC. Playing "fun" "games" with the directory structure got me the Python code I wanted, but broke my C++ code.

I still don't have my head wrapped around this enough to be confident that something in protobuf needs to be fixed. But the large amount of activity and confusion on this issue means this is way harder than it should be, whether because of a bug or documentation/explanation that isn't coming across well to most.

@fsufitch commented Feb 3, 2021

+1 person frustrated with this. This is a case where code built for Python 2 has different behavior in Python 3, causing breakage in some of the simplest use cases. For folks still confused as to what's going on, I've included a breakdown below. I also have a repo that demonstrates the issue and a few workarounds until a fix comes out.

So what's the problem?

TLDR: protoc --python_out=OUTPUT_DIR produces code that is broken at runtime unless OUTPUT_DIR is in sys.path or PYTHONPATH.

The Protobuf compiler plugin for Python was designed for Python 2's import mechanics. Different behavior under Python 3 is causing trouble. The key problematic generated code is of the shape:

import foo as foo_2

In Python 2, this had two interpretations:

  1. If the current file has a sibling called foo.py, then load it and expose its namespace as the variable foo_2. (Relative import)
  2. Iterate through the paths in sys.path and look for a fitting import path called foo. If it's found, load it and expose its namespace as the variable foo_2. (Absolute import)

In Python 3, the same code no longer performs relative imports. It does not check siblings and then fall back on sys.path; it consults sys.path directly. Relative imports need to be explicit, using dot notation (e.g. from . import foo as foo_2), and don't fall back on the system path.

This is a problem for Protobuf because a very simple source structure like this:

- main.py         # contains: from pb_sources import foo_pb2
- pb_sources/
  |-- foo.proto   # contains: import "bar.proto"
  |-- bar.proto
- pb_generated/
  |-- foo_pb2.py
  |-- bar_pb2.py

Which is compiled with a command like this:

protoc -I pb_sources/ --python_out pb_generated/ pb_sources/foo.proto pb_sources/bar.proto

Results in foo_pb2.py containing this line:

import bar_pb2 as bar__pb2

This parallels the exact description of the difference in import mechanics between Python 2 and 3. The import fails because the code appears to have been designed assuming that the import could be relative; that assumption held in Python 2, but it no longer holds in Python 3.

Should protoc just generate relative imports to fix this?

No! The proposed fix in the OP is not workable. It would work fine when the .proto files are siblings, but would otherwise create a giant mess -- especially when protoc has -I specified multiple times.

What is the real solution?

There are a couple solutions that could work and not cause a mess, though I do not presume to know which is best:

  1. Actually use the package metadata in the .proto files, some other Python-specific field, or even a CLI argument (--python_import_prefix?) to define what the correct absolute import path should be. That value in the above example would be pb_generated, so the generated files can contain import pb_generated.bar_pb2 as bar__pb2.

  2. A more clever solution using Python 3's package loading mechanics (while simultaneously not requiring a package name like the prior solution) probably exists, but I do not personally see how. Someone more competent than me may know better.

  3. A refactor/rework of how import statements are generated in the Python plugin, so broken imports are not generated.

  4. Document this quirk (the output directory needing to be in PYTHONPATH or sys.path) in the tutorial or reference guide. This is the least the Protobuf team can do.

@tpboudreau commented Feb 3, 2021

@fsufitch -- this is a good summary. I submitted a patch that added a CLI argument similar to your suggestion (I called it python option replace_import_package) here: #7470. It seemed like a lightweight approach for fixing some of the more straightforward use cases, but it hasn't gotten a lot of traction, unfortunately.

@averhagen commented May 14, 2021

Not sure if this helps, but I stumbled upon this problem while struggling to get my Python gRPC file to import from the correct package. The solution I found was adding a "package_name=" prefix to my proto_path. Example:

poetry run python -m grpc_tools.protoc --proto_path recommendations=./protobufs --python_out=./ --grpc_python_out=./ ./protobufs/recommendations.proto

@gnossen commented Jul 9, 2021

@haberman From my perspective as a maintainer of gRPC Python, this is the hardest and most fundamental problem the users of gRPC python in OSS face. Absolutely no one gets around this problem without using sys.path.append, hacking together a sed script operating on their _pb2.py files, or just manually editing their generated code.

IMO, the best approach is a new option for .proto files allowing the user to specify the proper import path. At the very least though, we should bless the sys.path.append workaround as the recommended method with some documentation.

What can we do to get the ball rolling on this?

@boukeversteegh commented Jul 17, 2021

If you're interested: at https://github.com/danielgtaylor/python-betterproto/tree/release-v2.0.0b1 (release v2) we have a working implementation that compiles proto to Python with a package structure that matches the protobuf packages.

Compiled files correctly import dependencies using relative paths and users can import the files using the generated package name, without hacking the path.

Please have a look by trying to compile some protobuf files with betterproto 2 to see if you can learn anything from the approach.

It took a lot of effort and a ton of unit and integration tests to get right, but it's possible.
See here for a list of issues that came up while writing this:
https://github.com/danielgtaylor/python-betterproto/issues?q=is%3Aissue+milestone%3A%22Better+Imports%22+is%3Aclosed

@haberman (Contributor) commented Jul 20, 2021

@boukeversteegh I tried betterproto to see how it would solve the problem I described in #1491 (comment)

I created these input files:

// test1.proto
syntax = "proto3";

package hello;

// Greeting represents a message you can tell a user.
message Greeting {
    string message = 1;
}
// test2.proto
syntax = "proto3";

package hello;

// Greeting represents a message you can tell a user.
message Greeting2 {
    string message = 1;
}

I found that compiling them together works as expected:

$ protoc -I . --python_betterproto_out=lib test1.proto test2.proto
Writing __init__.py
Writing hello.py
$ grep Greeting lib/hello.py
class Greeting(betterproto.Message):
    """Greeting represents a message you can tell a user."""
class Greeting2(betterproto.Message):
    """Greeting represents a message you can tell a user."""

However if I compile them separately, one overwrites the other:

$ protoc -I . --python_betterproto_out=lib test1.proto
Writing __init__.py
Writing hello.py
$ protoc -I . --python_betterproto_out=lib test2.proto
Writing __init__.py
Writing hello.py
$ grep Greeting lib/hello.py 
class Greeting2(betterproto.Message):
    """Greeting represents a message you can tell a user."""

From this I conclude that betterproto is susceptible to the problem I described in #1491 (comment). It cannot support parallel compilation.

This is the fundamental issue preventing us from turning package statements into Python module names.

@boukeversteegh commented Jul 21, 2021

@boukeversteegh I tried betterproto to see how it would solve the problem I described in #1491 (comment)

This is the fundamental issue preventing us from turning package statements into Python module names.

Thank you for trying it out. Indeed, parallel compilation is something we haven't looked into at betterproto.

  • I think you may have tried out v1 of betterproto. V2 does things slightly differently, namely it generates an actual package for each protobuf package, such that the output ends up in hello/__init__.py.

The "distributivity" problem of protoc(A+B) == protoc(A) + protoc(B) is probably fundamentally unsolvable whenever the outputs are not completely isolated (non-overlapping filenames), since as far as I know protoc can only overwrite files, and there is no built-in mechanism to append to existing files. If this is not the case, there may be possibilities.

However, if we think about it more practically, the problem is not distributivity; we just want parallelization to speed up the compilation. Therefore, the problem might as well be protoc(A+B) == merge(protoc2(A), protoc2(B)). We are looking for a function protoc2 (potentially protoc2 == protoc, or protoc2 == protoc --some-parameter) whose outputs can be combined by a function merge, such that the result is equivalent to compiling all files together.

As long as we can avoid overwriting during compilation, and we then merge the two compiled outputs, the problem is solvable.

  • For example, for a package called foobar, with two files a.proto and b.proto, instead of generating a single foobar/__init__.py file, you could generate foobar/a.py and foobar/b.py, and during the merge phase, you could expose the classes of a.py and b.py in foobar/__init__.py.

It's a rough idea and I haven't analyzed it deeply, but if parallelization is the main goal, I feel it should be possible in theory.
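
As a tiny illustration of that idea for the foobar example (purely illustrative, not betterproto's actual output):

# foobar/a.py and foobar/b.py are generated independently, one protoc action
# per .proto file, so parallel actions never write the same output file.

# foobar/__init__.py -- written by the separate merge phase; it only
# re-exports, so it can be regenerated whenever the set of files changes.
from .a import *  # messages compiled from a.proto
from .b import *  # messages compiled from b.proto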

@haberman (Contributor) commented Jul 21, 2021

However, if we think about it more practically, the problem is not distributivity, we just want parallelization to speed up the compilation.

Build systems generally require that every output file is produced by exactly one action. Whether it is Bazel, make, CMake, etc, all of these will parallelize the build. There is no way for two actions to both write/modify foobar/__init__.py safely.

If the "merge phase" was its own action that depended on both a.proto and b.proto, that would be fine. But in general there is no way of doing this. a.proto and b.proto can be anywhere in the repo, and may not know anything about each other. For example, with Bazel they could come from two different proto_library() rules.

The rule that generates foobar/__init__.py would need to have global knowledge of every possible proto file that uses package hello. But this is generally impossible. This is especially true when you consider dependencies. What if a.proto is in your project and b.proto is in a project you depend on? It would be impossible to compile two projects independently.

One approach that could be possible is to have a.proto generate _a_pb2.py (like protobuf does now), but as an internal-only module that is not meant to be imported by users. Then you could also generate hello/Greeting.py that exposes the Greeting class from _a_pb2.py. This isn't as pretty, since the full name of the class would be hello.Greeting.Greeting instead of just hello.Greeting, but it would be unique, as we are guaranteed that no two protobuf files will declare the same fully-qualified message name. So the hello/Greeting.py filename would be unique. But this is ugly enough that I'm guessing it's less desirable overall than what we have now.

@DmitriGekhtman commented Jul 22, 2021

Are the workarounds to this issue documented somewhere? There's been quite a bit of discussion since this issue was closed -- it's hard to understand what's going on.

@StephenCarboni commented Sep 5, 2021

Why not just output all generated Python code to one module? Avoid this issue entirely.

@VeNoMouS commented Oct 17, 2021

4 years... and still not resolved :|

@wonderbeyond commented Nov 8, 2021

5 years passed...

@cpcloud commented Nov 24, 2021

I've started working on a tool, https://github.com/cpcloud/protoletariat, that aims to solve the problem of Python imports outside of the protobuf repository, since it appears this is unlikely to be fixed here any time soon.

Please try it out, report issues, or otherwise get involved. There's a PyPI package, as well as a Docker image that you can pull from ghcr.io.

Update: there's now a conda package available: conda install -c conda-forge protoletariat

@parksj10 commented Dec 16, 2021

A GitHub Actions hack, where $target is the name of the module ($reponame is the folder you're editing the code in, searched recursively here):

find $REPOPATH/$reponame/ -type f -name "*.py" -print0 | xargs -0 sed -i -e 's, import '"$target"'_pb2, from . import '"$target"'_pb2, g'

@cpcloud commented Dec 16, 2021

This will only work for proto files that live alongside each other in the same directory. Any from imports will be replaced by broken code using this hack.
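
For anyone tempted by the rewrite route anyway, here is a minimal sketch that only rewrites an import when the target module actually lives next to the file being edited (a rough illustration under that assumption, not protoletariat):

# fix_imports.py -- rewrite "import x_pb2 as y" into "from . import x_pb2 as y",
# but only when x_pb2.py is a sibling of the file being rewritten.
import pathlib
import re

GEN_DIR = pathlib.Path("pb_generated")  # hypothetical output directory

IMPORT_RE = re.compile(r"^import (\w+_pb2)( as \w+)?$", re.MULTILINE)

for py_file in GEN_DIR.rglob("*_pb2*.py"):
    def fix(match):
        module, alias = match.group(1), match.group(2) or ""
        # Only rewrite when the imported module lives in the same directory;
        # anything else (e.g. the well-known types) is left untouched.
        if (py_file.parent / (module + ".py")).exists():
            return "from . import " + module + alias
        return match.group(0)

    py_file.write_text(IMPORT_RE.sub(fix, py_file.read_text()))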

@parksj10 commented Dec 16, 2021

Yep, thanks for pointing it out!

YikSanChan added a commit to oom-ai/oomstore that referenced this issue Dec 20, 2021