Stop requiring __file__ in Python packages #69

Open
indygreg opened this issue Jul 2, 2019 · 17 comments
Labels
compatibility For compatibility issues with 3rd party Python projects

Comments

@indygreg
Owner

indygreg commented Jul 2, 2019

The Problem

Many Python modules and scripts use __file__ to derive the filesystem path to the current file.

As documented at https://docs.python.org/3/reference/datamodel.html (search for __file__), __file__ is optional (the __file__ attribute may be missing for certain types of modules).

However, because Python has traditionally relied on filesystem-based imports and hasn't had a stable story around non-module resource handling, __file__ is almost always defined and has been used to locate and load files next to Python source files for seemingly forever. This is arguably tolerable. But reliance on __file__ undermines tools, like PyOxidizer, which don't import Python modules from the filesystem. This in turn constrains the flexibility and utility of the larger Python ecosystem.

The Solution

Python code should be rewritten to not assume the existence of __file__. By doing so, Python code will be more compatible with more Python execution environments (such as PyOxidizer), and this benefits the overall Python ecosystem.

Instructions for writing portable Python code that doesn't rely on __file__ can be found at https://pyoxidizer.readthedocs.io/en/latest/packaging_pitfalls.html#reliance-on-file.
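As a minimal sketch of what such portable code can look like, the example below reads a data file through importlib.resources instead of building a path from __file__. It assumes Python 3.9+ (for importlib.resources.files); the package name demo_pkg_for_resources and the helper names are hypothetical and exist only for this illustration. Whether this works in a given environment still depends on the loader supporting resource access.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def read_package_text(package: str, resource: str) -> str:
    """Read a text resource via the import system, without touching __file__."""
    # files() asks the package's loader for its resources, so this code does
    # not depend on __file__ being set (assumes Python 3.9+ and a loader
    # that supports resource access).
    return (
        importlib.resources.files(package)
        .joinpath(resource)
        .read_text(encoding="utf-8")
    )


def demo() -> str:
    """Build a throwaway package on disk and read its data file."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical package name, used only for this demonstration.
        pkg_dir = Path(tmp) / "demo_pkg_for_resources"
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("", encoding="utf-8")
        (pkg_dir / "greeting.txt").write_text("hello", encoding="utf-8")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            return read_package_text("demo_pkg_for_resources", "greeting.txt")
        finally:
            # Undo the import-path changes made for the demonstration.
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg_for_resources", None)
```

Calling demo() returns the contents of greeting.txt. The same read_package_text call would replace the common open(os.path.join(os.path.dirname(__file__), ...)) pattern inside a real package.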

This Issue

This issue can serve as a focal point for tracking and coordinating Python packages and tools which currently rely on __file__ but shouldn't. If you file a GitHub issue against a project that relies on __file__, you can reference this issue by typing indygreg/PyOxidizer#69 and provide Python project maintainers with enough context to make informed decisions about the use of __file__ in their projects.

@QuantumChamploo

I am having a similar issue, just when I am trying to import a package, for example numpy. So I am not sure how to change my code not to depend on __file__, as the only line of code I have is "import numpy".

I have this issue when I use the repl mode, the eval mode, and when I try to run a script.

indygreg added a commit that referenced this issue Jul 11, 2019
The warning instructs them to look for more info at #69.
indygreg added the compatibility label Jul 11, 2019
gnprice added a commit to gnprice/hello-pyoxidizer that referenced this issue Jul 16, 2019
I chose the library for this demo like so:
* Went to https://pypi.org/ and looked at "Trending projects".
* Tried each one, using example code from its README.
* Went with the first one that worked.

In the sample that happened to give me today, this was the 4th
on the list:

* `cpp-demangle` was first, and failed at `pip install` time
  with an error about `setuptools_rust`.  I think it just doesn't
  build from source in a stock environment; I get the same error
  with `pip install .`, in a fresh venv, after cloning the source.

* `pygrok` failed at import time, trying to use `__file__`.
  See indygreg/PyOxidizer#69.

* `python-whois` failed at import time: imports `past`, which
  imports `lib2to3`, which uses `__file__`.

* `area` works.
@Alex-Mann

Alex-Mann commented Aug 9, 2019

Is there a recommended strategy for patching these modules locally to try and get them to work with PyOxidizer? Currently I am unable to find a way to build when I use a module that contains the __file__ variable.

I have set up a venv, downloaded all the packages that I am using locally, and have tried to monkey patch the __file__ variables where I am hitting errors, but it seems like pyoxidizer run is using a cached version of the modules somewhere (unless they get pulled directly from pip each time), since the same error is still showing even though I have removed the __file__ variables.

Is it possible to somehow search and replace the __file__ variable when it is found? I've only just started using this module, but it seems like many Python packages are probably using it as well. Expecting them to accept pull requests / updates to deprecate this in the near future doesn't really seem all that feasible.

@Alex-Mann

Ok, after doing some more digging around the repo, I found the solution offered for the black example. I didn't see this earlier so I didn't know it existed. The package I'm using is GitPython, so what I did was add:

[[embedded_python_config]]
sys_paths = ["$ORIGIN/lib"]

[[packaging_rule]]
type = "pip-install-simple"
package = "gitpython"
install_location = "app-relative:lib"

Now, the python application that I am developing lives in a virtualenv, so originally I was just using that virtualenv to dictate what packages needed to be rolled up. Is there an equivalent command for install_location = "app-relative:lib" and sys_paths = ["$ORIGIN/lib"] to get this to work with a virtualenv?

@n8henrie

n8henrie commented Sep 4, 2019

it seems like pyoxidizer run is using a cached version of the modules somewhere

Try making a minor (e.g. whitespace) change to the toml file.

@n8henrie

n8henrie commented Sep 4, 2019

Running a file with python -i and looking through its globals, I noted that __spec__.origin is the same as __file__, but when I made a test package that prints its value into a PyOxidizer executable, it turns into None. (At least the file runs.)
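A small sketch of how this kind of inspection can be done defensively, so that a missing __file__ or a None origin is reported rather than raising. The helper name describe_module_location is hypothetical. The built-in sys module is used because it has no __file__ even in a stock CPython interpreter.

```python
import sys
import types
from typing import Dict, Optional


def describe_module_location(module: types.ModuleType) -> Dict[str, Optional[str]]:
    """Report where a module claims to live, tolerating missing attributes."""
    spec = getattr(module, "__spec__", None)
    return {
        "name": module.__name__,
        # __file__ is optional, so never access it as a plain attribute.
        "file": getattr(module, "__file__", None),
        # __spec__ itself may be None, and its origin may be None as well.
        "origin": getattr(spec, "origin", None),
    }


# A built-in module: no __file__, and origin is the string "built-in".
builtin_info = describe_module_location(sys)

# A module created purely in memory: no __file__ and no spec at all.
memory_info = describe_module_location(types.ModuleType("in_memory_demo"))
```

Printing these dictionaries from inside a packaged executable is one way to see which of the two attributes, if any, the importer populates.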

@naufraghi
Contributor

naufraghi commented Oct 9, 2019

Having a similar problem, PyInstaller uses this workaround:

A Python error trace will point to the source file from which the archive entry was created (the __file__ attribute from the time the .pyc was compiled, captured and saved in the archive). This will not tell your user anything useful, but if they send you a Python error trace, you can make sense of it.

Because, in the user's mind, if a tool cannot package a script that worked when not packaged, it is the tool's fault and not a case of "that library is accessing an optional attribute without checking".

@jayvdb
Contributor

jayvdb commented Oct 9, 2019

It would be nice if there was the ability to specify in the toml how __file__ should be handled, choosing between various workarounds that other similar tools use, and ideally choose the workaround on a per-package level.

Workaround strategies would include:

  1. use garbage (this would be enough for pytz, c.f. "Accessing __file__ from certifi and pytz" #91, which falls back to pkg_resources, but that depends on fixing "importing pkg_resources fails" #134)
  2. use built name (i.e. the exe name; this is good enough for packages that use __file__ as part of some printout, which might be enough for sentry/raven, c.f. "import raven fails" #63). I believe this is sort-of what Nuitka does, at least for its compiled modules
  3. use source filename (e.g. like PyInstaller)

Likely others exist too.
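The three strategies above can be sketched in plain Python as follows. This is purely illustrative: apply_file_workaround, the strategy names, and the placeholder format are all hypothetical, and nothing here is an existing PyOxidizer option or API.

```python
import sys
import types
from typing import Optional


def apply_file_workaround(
    module: types.ModuleType,
    strategy: str,
    source_path: Optional[str] = None,
) -> Optional[str]:
    """Give a module a stand-in __file__ if it does not already have one."""
    existing = getattr(module, "__file__", None)
    if existing is not None:
        return existing  # Nothing to do: the module already has a path.

    if strategy == "garbage":
        # Strategy 1: an obviously fake value that cannot be opened.
        value = "<no-file:%s>" % module.__name__
    elif strategy == "executable":
        # Strategy 2: the name of the built executable.
        value = sys.executable
    elif strategy == "source":
        # Strategy 3: the original source filename recorded at build time.
        if source_path is None:
            raise ValueError("the 'source' strategy needs a source_path")
        value = source_path
    else:
        raise ValueError("unknown strategy: %r" % strategy)

    module.__file__ = value
    return value


# Usage on an in-memory module that starts without __file__.
demo_module = types.ModuleType("demo_module")
placeholder = apply_file_workaround(demo_module, "garbage")
```

A per-package configuration would then amount to a mapping from package name to one of these strategy names.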

@jayvdb
Contributor

jayvdb commented Oct 10, 2019

For any maintainer of a package which has been directed to read this issue for PyOxidizer compatibility, if your use of __file__ is to load package data, the following alternative ways to load data are also not supported:

It would be good to know what standardised method of loading package data does work.

(update: #53 suggests that importlib.resources and its backport importlib_resources are supposed to work. I saw elsewhere that Greg raised https://bugs.python.org/issue36128 about that. And #128 suggests the support is currently buggy.)

@eldipa

eldipa commented Jun 14, 2020

For future reference, here is the pinned link to the "reliance on __file__" document (the doc is missing in v0.7.0).

jstasiak added a commit to netaddr/netaddr that referenced this issue Jun 15, 2020
Assuming __file__ exists breaks packaging netaddr with PyOxidizer[1].

Initially I wanted to use pkgutil.get_data()[2] as it's been in the Python
standard library since Python 2.6 but it always reads and returns the
whole resource and I decided I don't like reading the whole out.txt and
iab.txt in OUI and IAB constructors just to read a few small bits of data.
importlib.resources provides an API[3] that should in the usual
cases[4] avoid loading whole resources into memory. Maybe IAB and OUI
constructors shouldn't read files and the API needs to be rethought but
that's an issue for another day.

Granted, this is a tradeoff, as we have a dependency now which means
slightly more network traffic and slightly more complexity, but all
things considered I think that's better than the alternative.

(Ironically this introduces a new piece of code using __file__ but this
should be benign as it's not in the code that'll be present in
a PyOxidizer-produced binary. It's necessary to have that setup.py hack
as netaddr can't be imported without importlib_resources or
importlib.resources present now so it can't be unconditionally imported
from setup.py where importlib_resources may not be installed yet).

Fixes GH-188.

[1] indygreg/PyOxidizer#69
[2] https://docs.python.org/2/library/pkgutil.html#pkgutil.get_data
[3] https://docs.python.org/3.9/library/importlib.html#importlib.resources.open_binary
[4] https://gitlab.com/python-devs/importlib_resources/-/blob/2707fb7384e76cda715de14bea5956339969950f/importlib_resources/_py3.py#L24
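The tradeoff described in this commit message can be illustrated with a rough sketch. pkgutil.get_data() returns the entire resource as bytes, while a stream opened through importlib.resources can be scanned line by line and abandoned early. This is not netaddr's actual code: the package name demo_registry_pkg, the file registry.txt, and the helper find_line are hypothetical, and importlib.resources.files assumes Python 3.9+.

```python
import importlib
import importlib.resources
import pkgutil
import sys
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def find_line(package: str, resource: str, prefix: bytes) -> Optional[bytes]:
    """Scan a binary resource line by line and stop at the first match."""
    target = importlib.resources.files(package).joinpath(resource)
    with target.open("rb") as stream:
        for line in stream:
            if line.startswith(prefix):
                return line.rstrip(b"\r\n")
    return None


def demo() -> Tuple[int, Optional[bytes]]:
    """Compare a whole-resource read with a streamed lookup."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical package and data file, created only for this demo.
        pkg_dir = Path(tmp) / "demo_registry_pkg"
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("", encoding="utf-8")
        (pkg_dir / "registry.txt").write_bytes(
            b"00-00-01 alpha\n00-00-02 beta\n"
        )
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            # pkgutil.get_data always loads the full resource into memory.
            whole = pkgutil.get_data("demo_registry_pkg", "registry.txt")
            # The streamed version only reads as far as it needs to.
            match = find_line("demo_registry_pkg", "registry.txt", b"00-00-02")
            return len(whole or b""), match
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_registry_pkg", None)
```

Whether the streamed read actually avoids buffering the whole resource depends on the loader; for ordinary filesystem packages it is a regular file object.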
jstasiak added a commit to netaddr/netaddr that referenced this issue Jun 17, 2020
@seghier

seghier commented Aug 13, 2020

Why does it have a lot of problems? It needs installation of tons of programs that don't work properly.

@dmwyatt

dmwyatt commented Oct 8, 2020

So, I think I'll put this here even though I think it also goes in #73.

certifi implemented what was supposedly a fix for __file__ issues here. I know @indygreg prompted them about this a bit in this issue.

But, I don't think the fix works for pyoxidizer.

I'm still getting an error that I think is basically related to the __file__ issues they had originally.

File "<stdin>", line 1, in <module>
  File "main", line 8, in <module>
  File "httpx", line 2, in <module>
  File "httpx._api", line 3, in <module>
  File "httpx._client", line 11, in <module>
  File "httpx._config", line 54, in <module>
  File "httpx._config", line 59, in SSLConfig
  File "certifi.core", line 37, in where
  File "contextlib", line 112, in __enter__
  File "importlib.resources", line 196, in path
  File "pathlib", line 1022, in __new__
  File "pathlib", line 669, in _from_parts
  File "pathlib", line 653, in _parse_args
TypeError: expected str, bytes or os.PathLike object, not NoneType

I don't understand the intricacies of importlib.resources well enough to understand what is happening here, but somehow package.__spec__.origin is None.

Is this a pyoxidizer problem or a problem with certifi's usage of importlib.resources?
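The last frames of the traceback are consistent with a pathlib.Path being constructed from None. A hedged sketch of a guard against that situation is shown below; origin_directory is a hypothetical helper, not part of certifi, importlib.resources, or PyOxidizer, and it only illustrates checking the module spec before treating its origin as a filesystem path.

```python
import importlib.machinery
import sys
import types
from pathlib import Path
from typing import Optional


def origin_directory(module: types.ModuleType) -> Optional[Path]:
    """Return the directory a module was loaded from, or None if unknown."""
    spec = getattr(module, "__spec__", None)
    # has_location is False for built-in and in-memory modules, whose origin
    # (if set at all) is not a real filesystem path.
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    return Path(spec.origin).parent


# Simulate a module imported from memory: its spec has no origin.
in_memory = types.ModuleType("in_memory_pkg")
in_memory.__spec__ = importlib.machinery.ModuleSpec("in_memory_pkg", loader=None)

no_dir_for_memory = origin_directory(in_memory)  # None, instead of a TypeError
no_dir_for_builtin = origin_directory(sys)       # None for a built-in module
```

Code that gets None back from such a check would need to fall back to reading the resource contents directly rather than asking for a path.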

@manugarri

Same issue here, can't really import anything beyond stdlib because of the error TypeError: expected str, bytes or os.PathLike object, not NoneType

@ofek
Sponsor

ofek commented Jan 5, 2022

The new guide is here: https://pyoxidizer.readthedocs.io/en/latest/oxidized_importer_resource_files.html#porting-code-to-modern-resources-apis
