
How should protobuf Python be packaged to coexist with other packages in the google namespace? #1296

Closed
haberman opened this issue Mar 5, 2016 · 33 comments

Comments

@haberman
Member

haberman commented Mar 5, 2016

I'm creating this as a master issue to track this long-standing problem. Several people have encountered this problem, but no one seems to know the correct way to solve it.

Here are several issues reporting that installing the protobuf package makes other Python packages under the google namespace inaccessible, because protobuf has no __init__.py. All of these issues were reported against version <= 3.0.0b2: #713 #859 #1153 #1272

These issues were supposed to be fixed by this PR: #1259 That PR was integrated into a new release on PyPI, 3.0.0b2.post1 (https://pypi.python.org/pypi/protobuf/3.0.0b2.post1).

However that PR caused a different bug: #1294 googleapis/google-cloud-python#1570

We need a solution to this problem, but finding one that works has proven elusive. I would very much appreciate anyone who can propose a fix they are confident will resolve all of these problems.

@awan1

awan1 commented Mar 5, 2016

Currently I've hacked this to work locally by going into the site-packages/google folder in my virtualenv (introduced by pip install protobuf==3.0.0b2.post1) and adding a symlink to the appengine folder in my GAE SDK install. This will work in general for any package that gets shadowed by the google top-level module.

This is very far from optimal and I hope we can work towards a solution!

craigcitro added a commit to craigcitro/protobuf that referenced this issue Mar 5, 2016
Improves protocolbuffers#1296.

The problem: in the previous patch, we tweaked the __init__.py files to use
namespaces, but no longer declared ourselves as a namespace package. The
second half was unwise.

Note that this only comes up when installing protobuf alongside another
package that also installs into the google namespace; as of right now, the
only PyPI package that does is googleapis-common-protos, though the GAE SDK
also uses google.appengine. Installing either or both of those alongside this
package now works.

The case that still remains is the upgrade path, which is also what worried me
in protocolbuffers#713. It seems that if protobuf 2.6.1 is installed, there's no way to
safely upgrade that to work with a newer protobuf. However, `pip uninstall` &&
`pip install` does the trick.
@theacodes

Every Python package that wants to use the google.* namespace needs to do the following:

Add google/__init__.py:

# this is a namespace package
try:
    import pkg_resources
    pkg_resources.declare_namespace(__name__)
except ImportError:
    import pkgutil
    __path__ = pkgutil.extend_path(__path__, __name__)

(pkg_resources is preferred because

Add namespace_packages to setup.py:

setup(
    name='protobuf',
    namespace_packages=['google'],
    ...
)

If any one package doesn't do this and is installed alongside the others, it's very likely to completely break the installation.

We should get a list of all known packages and ensure they conform to this.
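The cooperation described above can be sketched end-to-end with a synthetic namespace. This is a minimal, self-contained demo of the pkgutil-style mechanism; the names demo_ns, protodemo, and gaedemo are hypothetical stand-ins for google, protobuf, and appengine, and the two temp directories stand in for two separate install locations.

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()
site_a = os.path.join(root, "site_a")  # stand-in for protobuf's install location
site_b = os.path.join(root, "site_b")  # stand-in for the GAE SDK's location

for site, sub in ((site_a, "protodemo"), (site_b, "gaedemo")):
    pkg = os.path.join(site, "demo_ns", sub)
    os.makedirs(pkg)
    # Each distribution ships the same namespace declaration in its copy
    # of the top-level __init__.py:
    with open(os.path.join(site, "demo_ns", "__init__.py"), "w") as f:
        f.write("import pkgutil\n"
                "__path__ = pkgutil.extend_path(__path__, __name__)\n")
    # The subpackage itself is an ordinary package:
    open(os.path.join(pkg, "__init__.py"), "w").close()

sys.path[:0] = [site_a, site_b]

import demo_ns.protodemo   # resolved under site_a
import demo_ns.gaedemo     # resolved under site_b, via the shared namespace
```

After the first `import demo_ns`, extend_path has merged both install locations into `demo_ns.__path__`, so subpackages from either location import cleanly. If either copy of the top-level `__init__.py` omitted the declaration (as old protobuf did), whichever install came first on sys.path would shadow the other.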

@craigcitro
Contributor

With the other PR merged, we're now at least back to a reasonable working state. (Famous last words?) I think the pending things we should do before closing out this issue include:

  1. As Jon said, making sure we have a complete list of packages we expect to populate the google namespace, and making sure they're all playing nice (e.g. using the setup Jon described).
  2. We need to deal with an upgrade from protobuf 2.6.1 to protobuf 3.0.0 (once we have an official release). I realized that an easy-but-unfortunate answer is to force users to uninstall the old protobuf (thereby removing the unwanted google directory in site-packages) before installing something in the 3.x series; we could even enforce this in setup.py, I think?
  3. Add some tests that include installing protobuf alongside other google-namespace-using packages, and potentially upgrade tests for whatever paths we support. I don't think these necessarily need to be part of the usual Travis run; is there a hook for system/release tests?

Also, @jonparrott -- what was the rest of the sentence you started with "pkg_resources is preferred because"? 😉 In particular, I'd always just used pkgutil, based on this section in the PEP, but I'm happy to learn more about this.

@theacodes

I got some conflicting information here. ¯\_(ツ)_/¯

pkgutil is part of the standard library and is fine for this particular use case as far as I can tell.
pkg_resources is part of setuptools and pre-dates pkgutil being in the standard library, so it can be helpful in cases where you're supporting interpreters that don't have pkgutil. It also supports some more fancy features like adding zips and such, but again, not horribly applicable here.

I think what I've recommended above is fine, and it is what's used in other common namespace packages (namely zope). @tseaver might be able to shed some insight there as well.

@tseaver
Contributor

tseaver commented Mar 6, 2016

FWIW, packages in the zope namespace used to do the conditional import dance outlined above; they now just assume `pkg_resources` is available (it comes with setuptools, and everything in the Python web world expects setuptools / pip to be available):

__import__('pkg_resources').declare_namespace(__name__)

@theacodes

@craigcitro for the in-place upgrade, you can actually check in setup.py if an old version is installed and issue a warning saying to remove then install instead of upgrade. I've seen another package do this.
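A minimal sketch of that setup.py guard, assuming pkg_resources is available at install time. The function name and the error message are illustrative, not anything protobuf actually shipped; the idea is just to refuse an in-place upgrade over a pre-namespace 2.x install.

```python
import sys


def check_for_old_protobuf():
    """Abort if a pre-namespace (2.x) protobuf is already installed.

    Upgrading in place would leave a stale, non-namespace google/
    directory in site-packages that shadows sibling packages.
    """
    try:
        import pkg_resources
    except ImportError:
        return  # no setuptools available; nothing we can inspect
    try:
        dist = pkg_resources.get_distribution("protobuf")
    except pkg_resources.DistributionNotFound:
        return  # fresh install; nothing to warn about
    if dist.version.startswith("2."):
        sys.exit(
            "An old (non-namespace) protobuf %s is installed.\n"
            "Please run 'pip uninstall protobuf' first, then reinstall."
            % dist.version
        )
```

The guard would be called at the top of setup.py, before setup() runs. As noted later in this thread, this only helps for source installs; wheel installs never execute setup.py.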

@craigcitro
Contributor

@jonparrott sweet -- if there's prior art, I think that's the right fix for the upgrade path. @haberman does that sound reasonable to you?

@lovetox

lovetox commented Aug 16, 2016

So what's the solution now? If I install protobuf via pip, no __init__.py file is placed inside the google folder; how can I import protobuf?

@theacodes

theacodes commented Aug 16, 2016

The presence of a google/__init__.py doesn't determine whether it's a namespace package; the google.pth file in site-packages determines it.

@haberman
Member Author

@craigcitro If there is a possible solution to this problem by all means let's move forward with it!

@tseaver
Contributor

tseaver commented Aug 23, 2016

AFAIK, the current version of protobuf installs cleanly in the google namespace under current versions of pip and setuptools. The .pth files of the various packages which install in that namespace collaborate to ensure that the google package imports cleanly, as do its subpackages.

$ $VENV/bin/pip freeze
-f file:///home/tseaver/.pip/wheels
enum34==1.1.6
future==0.15.2
futures==3.0.5
gax-google-logging-v2==0.8.1
gax-google-pubsub-v1==0.8.1
gcloud==0.18.0
google-gax==0.12.5
googleapis-common-protos==1.2.0
grpc-google-logging-v2==0.8.1
grpc-google-pubsub-v1==0.8.1
grpcio==1.0.0rc2
httplib2==0.9.2
nose==1.3.7
nose-exclude==0.4.1
oauth2client==3.0.0
ply==3.8
protobuf==3.0.0
pyasn1==0.1.9
pyasn1-modules==0.0.8
rsa==3.4.2
six==1.10.0
$ for sub in $(ls $VENV/lib/python2.7/site-packages/google); do
    echo $sub
    $VENV/bin/python -c "import google.$sub"
done
api
cloud
gax
logging
longrunning
protobuf
pubsub
rpc
type

@glyph

glyph commented Aug 23, 2016

Should this issue still be open? It seems like the usage of namespace_packages is the correct answer, and the referenced bugs are all fixed.

@craigcitro
Contributor

@glyph @haberman It's maybe reasonable to close this issue; however, the work around the upgrade path from this comment still remains. Do we want to just move that to a new issue?

In particular, we need to:

  1. Spit out a warning and bail in setup.py if there's an old (non-namespace) version of protobuf installed.
  2. Write a test.

@glyph

glyph commented Aug 23, 2016

@craigcitro - Bailing in setup.py will rarely help. Since protobuf is distributed as a wheel, no code from protobuf is run at installation time; it's just unpacked by pip into the appropriate location.

@craigcitro
Contributor

@glyph Yep, that's bad. 😉 So that means that if we wanted to do this, we'd have to basically detect it at runtime?

@haberman Given that 3.0.0 is out, should we wait and see how much of a problem this is before we write some code we'll regret?
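A rough sketch of what such a runtime check might look like. The heuristic here (no __file__ at all means a PEP 420 namespace; more than one __path__ entry means a pkgutil/pkg_resources-style namespace) is an assumption of mine and would misclassify a cooperating namespace that happens to have only one install location, so it is illustrative only.

```python
import warnings


def looks_like_namespace_package(mod):
    """Heuristically guess whether an imported top-level package is a
    cooperating namespace package rather than a plain package that would
    shadow sibling distributions."""
    # PEP 420 implicit namespace packages have no __file__.
    if getattr(mod, "__file__", None) is None:
        return True
    # pkgutil/pkg_resources-style namespaces usually accumulate one
    # __path__ entry per install location.
    return len(list(getattr(mod, "__path__", []))) > 1


def warn_if_shadowing(mod):
    """Emit a warning if `mod` looks like a plain (shadowing) package."""
    if not looks_like_namespace_package(mod):
        warnings.warn(
            "%r looks like a plain package; sibling distributions in the "
            "same namespace may be shadowed." % mod.__name__
        )
```

A library could run `warn_if_shadowing(google)` on first import to surface the problem early, at the cost of a small import-time check.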

@glyph

glyph commented Aug 23, 2016

@glyph Yep, that's bad. 😉

No, it's better than bad, it's good :). Trust me - as somebody who has spent roughly a decade of my life debugging weird setup.py interactions, the occasional corner case like this is a minuscule cost to pay :).

In other words: the small detriment to you is that you can't run code in setup.py, but the tremendous benefit to you is that nobody else can run code in setup.py either :).

So that means that if we wanted to do this, we'd have to basically detect it at runtime?

It'd probably be better and more helpful to contribute upstream to something like pip to generalize the detection and resolution of non-cooperative namespace packages. That would have a much broader impact, and would be a lot easier to make reliable. Or possibly delete old versions of gcloud-python from PyPI so that folks have to upgrade. (You could do new point releases that fix nothing but the packaging.)

@craigcitro
Contributor

@glyph

No, it's better than bad, it's good :).

Oh, sorry, I was unclear there -- I meant it was bad in the context of trying to detect this issue at install time. I totally agree that it's great for everyone's sanity; I have many fewer setup.py scars than you, I'm sure, but I've got plenty of my own. 😉

It'd probably be better and more helpful to contribute upstream to something like pip to generalize the detection and resolution of non-cooperative namespace packages.

Definitely -- but unfortunately I don't have time for that at the moment. For now, we're going to just wait and see if this causes us friction, but if it does, that's probably the best way to invest our time wrt fixing it.

Or possibly delete old versions of gcloud-python from PyPI so that folks have to upgrade.

Well, in this case, protobuf was the non-namespace package, though maybe gcloud-python picked it up as a dependency. Lots of packages depend on 2.6.1, so I don't think removing it would be a net win.

@theacodes

theacodes commented Dec 1, 2016 via email

@craigcitro
Contributor

@jonparrott is there already an issue filed against the GAE side to fix that? (Or should we file one?)

@theacodes

@craigcitro

It is known

@haberman
Member Author

I'm going to close this issue; please re-open it if this requires any specific action from our side.

benley added a commit to benley/protobuf that referenced this issue Jul 31, 2017
Related to protocolbuffers#1296: The issue seems to be fixed for consumers of the
python protobuf package from pypi, but not for anyone getting it from
here as a Bazel remote repository.
jpopelka added a commit to jpopelka/fabric8-analytics-jobs that referenced this issue Jul 31, 2017
Protobuf needs to be installed from source to workaround
protocolbuffers/protobuf#1296
moggers87 added a commit to moggers87/subscribae that referenced this issue Aug 6, 2017
This could happen with any package in the `google` namespace, but
protobuf is the only one I've ever seen in the wild

See protocolbuffers/protobuf#1296 for details
@ShahinNamin

ShahinNamin commented Nov 15, 2017

Hi,
I think I am encountering the same issue. I am using cloudstorage (GoogleAppEngineCloudStorageClient) in the same project with tensorflow and tensorflow-transform in the flexible App Engine environment, but importing cloudstorage results in the error: from google.appengine.api import app_identity -> ImportError: No module named appengine.api.

As far as I can tell, google protobuf is installed in conjunction with tensorflow-transform as a dependency.

I (tried to) follow the workarounds suggested, but couldn't manage to get the project working. Is it possible for someone to write a clean workaround for this so that it can be followed easily?

@craigcitro
Contributor

@ShahinNamin new versions of all the libraries involved should be well behaved. What does pip freeze say for your setup?

@ShahinNamin

It has installed protobuf 1.3.4, google cloud storage 1.4.0, and tensorflow 1.4.0. I removed tensorflow and protobuf and the problem persists, so it seems it's not a protobuf issue.
The same error can be replicated by adding GoogleAppEngineCloudStorageClient v1.9.22.1 as a requirement and trying to import cloudstorage.

@craigcitro
Contributor

@ShahinNamin I would recommend trying to come up with a minimal repro against current versions of libraries, and reporting a new issue with the library or libraries that cause conflict; this particular issue is long fixed. 😁

(Protobuf is at 3.4.0.)

@ChuckNXP

ChuckNXP commented Dec 3, 2017

I can't believe I found this:
With the current release (3.1.0.post1), we had fun issues where import google.appengine would fail, but import google.protobuf; import google.appengine would work.
I don't even use protobuf, so it is greyed out in PyCharm, but the error:
ImportError: No module named appengine.ext
goes away!

@fengli79

fengli79 commented Jan 18, 2018

I get the same issue (#1296 (comment)) when using protobuf 3.5.1.
The protobuf is installed in:

site-packages/
    google/
        protobuf/
            __init__.py

There's no "__init__.py" installed under the google package, and I have to import the google.protobuf package to make other packages under the google namespace available.

@fengli79

@haberman I'd like to re-open this, but seems I don't have permission to do that. Could you help?

@craigcitro
Contributor

Please don't reopen a two-year-old bug. Just open a new one and reference this.

akesandgren added a commit to akesandgren/easybuild-easyblocks that referenced this issue Aug 11, 2018
This might really be a bug in protobuf-python,
protocolbuffers/protobuf#1296
dlj-NaN added a commit to dlj-NaN/protobuf that referenced this issue Sep 21, 2020
In protocolbuffers#713 and protocolbuffers#1296, the `google` package in protobuf sources was found
to cause conflicts with other Google projects, because it was not
properly configured as a namespace package [1]. The initial fix in
786f80f addressed part of the issue, and protocolbuffers#1298 fixed the rest.

However, 786f80f (the initial fix) also made `google.protobuf` and
`google.protobuf.pyext` into namespace packages. This was not correct:
they are both regular, non-namespace, sub-subpackages.

However (still), the follow-up protocolbuffers#1298 did not nominate them as
namespace packages, so the namespace registration behavior has
remained, but without benefit.

This change removes the unnecessary namespace registration, which has
substantial overhead, thus reducing startup time substantially when
using protobufs.

Because this change affects the import internals, quantifying the
overhead requires a full tear-down/start-up of the Python interpreter.
So, to capture the full cost for every run, I measured the time to
launching a _fresh_ Python instance in a subprocess, varying the
imports and code under test. In other words, I used `timeit` to
measure the time to launch a _fresh_ Python subprocess which actually
performs the imports.

* Reference: normal Python startup (i.e., don't import protobuf at all).
  ```
   % python3 -m timeit -s 'import subprocess' -r 3 -n 10 'subprocess.call(["python3", "-c", "pass"])'
  10 loops, best of 3: 27.1 msec per loop
  ```

* Baseline: cost to import `google.protobuf.descriptor`, with
  extraneous namespace packages.
  ```
  % python3 -m timeit -s 'import subprocess' -r 3 -n 10 'subprocess.call(["python3", "-c", "import google.protobuf.descriptor"])'
  10 loops, best of 3: 133 msec per loop
  ```

* This change: cost to import `google.protobuf.descriptor`, without
  extraneous namespace packages.
  ```
  % python3 -m timeit -s 'import subprocess' -r 3 -n 10 'subprocess.call(["python3", "-c", "import google.protobuf.descriptor"])'
  10 loops, best of 3: 43.1 msec per loop
  ```

[1]:  https://packaging.python.org/guides/packaging-namespace-packages/
dlj-NaN added a commit that referenced this issue Sep 23, 2020