Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python generated file compiled with Cython causes error with protobuf 3.2.0 #2896

Closed
arthur-tacca opened this issue Mar 22, 2017 · 8 comments
Assignees
Labels

Comments

@arthur-tacca
Copy link

arthur-tacca commented Mar 22, 2017

When a _pb2 is compiled with Cython, and the C++ backend of the Python library is turned on (as it now is on Linux for the PyPI deployed protobuf, as of 3.2.0), instantiating any message object gives an error. I am using Python 3.5. Here's an example of the error:

Traceback (most recent call last):
  File "bug.py", line 1, in <module>
    from addressbook_pb2 import Person
  File "addressbook_pb2.py", line 34, in init addressbook_pb2 (addressbook_pb2.c:1997)
    _descriptor.EnumValueDescriptor(
  File "/home/arthur/envs/default/lib/python3.5/site-packages/google/protobuf/descriptor.py", line 649, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors should not be created directly, but only retrieved from their parent.

The immediate cause of the error is _CheckCalledFromGeneratedFile in google/protobuf/pyext/message.cc, which calls _CalledFromGeneratedFile in google/protobuf/pyext/descriptor.cc. I haven't looked any further than this, but my suspicion is raised by the strcmp against "_pb2.py" at the end of this function. I think that for Cython the filename would end with "_pb2.c", but I'm not sure.

Just to be completely explicit, here are steps to reproduce:

  1. Create a new Python virtual environment with the relevant packages (I am just using grpio-tools for easy access to protoc):
pip install grpcio-tools==1.1.0 protobuf==3.2.0 Cython==0.25.2
  1. Generate a pb2:
wget https://raw.githubusercontent.com/google/protobuf/master/examples/addressbook.proto
python -m grpc_tools.protoc -I. addressbook.proto --python_out=.
  1. Cythonise and compile this file. The easiest way to do this is with a little setup.py:
from distutils.core import setup
from Cython.Build import cythonize
setup(name="bug", ext_modules = cythonize("addressbook_pb2.py"))

Then run: python setup.py build_ext --inplace

  1. Try to instantiate a message:
python -c 'from addressbook_pb2 import Person; p = Person()'
@arthur-tacca
Copy link
Author

I added some trace to _CalledFromGeneratedFile to see how the stack differs when using Cython. It turns out there's no entry at all for the Cython function (at least not one related to the filename). Here's what it looks like normally:

_CalledFromGeneratedFile(0)
  0: filename: /home/arthur/src/proto_tmp/addressbook_pb2.py
  1: filename: <frozen importlib._bootstrap>
  2: filename: <frozen importlib._bootstrap_external>
  3: filename: <frozen importlib._bootstrap>
  4: filename: <frozen importlib._bootstrap>
  5: filename: <frozen importlib._bootstrap>
  6: filename: bug.py
  7: frame is NULL

Here is what it looks like when addressbook_pb2 has been Cythonized:

_CalledFromGeneratedFile(0)
  0: filename: <frozen importlib._bootstrap>
  1: filename: <frozen importlib._bootstrap_external>
  2: filename: <frozen importlib._bootstrap>
  3: filename: <frozen importlib._bootstrap>
  4: filename: <frozen importlib._bootstrap>
  5: filename: <frozen importlib._bootstrap>
  6: filename: bug.py
  7: frame is NULL

I also forced _CalledFromGeneratedFile to return true and did a basic test of functionality: I made a Person and added some phone numbers, serialized to bytes, then deserialized to a new object. This all worked fine.

One option of course is to turn off this check, as is already done for PyPy. An argument in favor of this is that the docs for PyFrameObject say that "the fields of this type are subject to change at any time", but this function reaches into them gratuitously. A compromise might be to look at the module filename, and if it ends with ".py" check further that it ends in "_pb2.py", but if not just pass the check.

@haberman
Copy link
Member

If this check can't be implemented with Cython, feel free to disable it for that environment.

I'm not aware of anyone using protobuf with Cython! Will be interested to hear more about how well it works (or doesn't) on this issue.

@arthur-tacca
Copy link
Author

I'm not Cythoning the protobuf library, only the generated user _pb2.py files. This means there can't be a static check (something like #ifdef CYTHON) because it's impossible to tell when the protobuf library is built whether it's going to be used with Cythoned generated files. The closest I can think of to a run time check is the compromise I mentioned at the end of my second comment. I can't do that (maintainably) on my end because it requires changing the protobuf library and recompiling, whereas I'd prefer everyone to keep installing it just by using pip.

The pure-Python versions of Protobuf seem to work fine with Cython, and as I mentioned in my second comment a brief check suggests the C++-based version works fine too. I can't tell you if it works any better i.e. faster because we're not really Cythoning it for that... We just have some _pb2 files as part of larger packages where other modules definitely do benefit from Cython, and it would be a huge administrative pain to run Cython on some modules and not others (essentially I think we'd have to separate them out in to separate packages). I expect the fancy descriptor stuff that generated pb2s use are quite opaque to Cython, and even if they used normal Python methods instead they'd probably also need some static type hints (i.e. cdefs) to get much benefit.

@arthur-tacca
Copy link
Author

Just for reference, the issue where Linux wheels were enabled is #2623 (but don't get me wrong, it's great that they are).

@anandolee
Copy link
Contributor

anandolee commented Jan 30, 2018

Pure python does not do the check because it "is somewhat difficult to implement correctly in PyPy". I don't think we have enough reasons to disable it in cpp implementation only for Cython.

arthur-tacca, have your checked the issue if it is raised by the strcmp against "_pb2.py" at the end of this function? If so, Your other suggestion make sense to me:
look at the module filename, and if it ends with ".py" check further that it ends in "_pb2.py", but if not just pass the check. Are you able to make a PR?

@anandolee anandolee self-assigned this Jan 30, 2018
@arthur-tacca
Copy link
Author

@anandolee Yes that fixes it for me. Although the change is tiny I don't think I can make a pull request because I'm not sure I can sign the CLI. I could test if someone else is prepared to make a pull request? Perhaps @BigQuant who opened #4231?

@anandolee
Copy link
Contributor

I can make a PR. Will be great if you can help test and review

@filiplindqvist-tink
Copy link

Just a note / learning.

After seeing the same message in our code.

Traceback (most recent call last):
  File "./backend/src/tests/service_test.py", line 5, in <module>
    from src.service.external.models_service import MyServiceMock
  File "./backend/src/external/models_service.py", line 12, in <module>
    from src.service.external.models.pb2 import (
  File "./backend/src/external/models/pb2.py", line 42, in <module>
    type=None),
  File "./pip_deps/pypi__protobuf/google/protobuf/descriptor.py", line 733, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors should not be created directly, but only retrieved from their parent.

The problem was that the file was named pb2.py, which apperently is not allowed. After rename this worked as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Development

No branches or pull requests

4 participants