/usr/bin/python3: can't find '__main__' module in 'blah.pex' #958

Closed
dgkatz opened this issue Apr 17, 2020 · 20 comments

@dgkatz

dgkatz commented Apr 17, 2020

I have a PEX file which I build via the following command:
pex -v -r requirements.txt -c gunicorn -D . -o blah.pex
This constructs a PEX, installing the requirements and all files in the project, and sets the entry point to gunicorn.

The build process works just fine, but when it comes time to run the PEX, I sometimes get the following error, depending on where I run it (Ubuntu VM, local Mac, or Ubuntu Docker):
/usr/bin/python3: can't find '__main__' module in 'blah.pex'

When I unzip the PEX, I do see a __main__.py file in there, so I'm not sure what the problem is.

Has anyone experienced this error? Any ideas on what the problem is?

@tentwelfths

Okay, I recently had this problem as well, and it turns out that Python doesn't know how to handle .pex files that are larger than 2 GB in size; that is what gives you the incredibly useful "Can't find __main__" error.
So check the size of your PEX. If it is over 2 GB, it's time to go on a dependency diet or find other ways to package and deploy your project.
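
For anyone starting that dependency diet, it helps to see what is actually taking up the space. Here is a minimal sketch using only the standard library zipfile module. The function name size_by_prefix and the choice to group by the first two path components are illustrative assumptions, not anything Pex provides, and the in-memory archive is just a stand-in for a real PEX file:

```python
import io
import zipfile
from collections import Counter


def size_by_prefix(pex, depth=2):
    """Sum uncompressed bytes per leading path prefix of a zip archive.

    `pex` may be a filename or a seekable binary file object.
    Returns (prefix, total_bytes) pairs, largest first.
    """
    totals = Counter()
    with zipfile.ZipFile(pex) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            prefix = "/".join(info.filename.split("/")[:depth])
            totals[prefix] += info.file_size
    return totals.most_common()


# Tiny in-memory stand-in for a real PEX file (hypothetical contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(".deps/big-1.0.whl/big/data.py", "#\n" * 1000)
    zf.writestr(".deps/small-1.0.whl/small/__init__.py", "x = 1\n")
    zf.writestr("__main__.py", "print('hi')\n")

for prefix, size in size_by_prefix(buf):
    print(f"{size:>8}  {prefix}")
```

Pointing size_by_prefix at the path of an actual .pex file should give a quick ranking of which top-level entries are worth trimming first.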

@jsirois
Member

jsirois commented Jan 7, 2021

Sorry to see this so late; thanks for the bump with data, @tentwelfths. Perhaps you could try the next Pex release or Pex master, which now supports a --venv mode. If you build your PEX file using that flag, then when it is run the PEX file will unpack itself into a virtual environment under ~/.pex/venvs and re-execute from there. The upshot is that the PEX runs just like any other Python application, and size limits on the PEX zip file, etc. are all sidestepped.

@jsirois
Member

jsirois commented Jan 7, 2021

Specifically, --venv mode was added in #1153. Going to grab that PR link, though, reminded me that the existing --unzip mode should provide the same remedy in this case. Perhaps you could try that as well, or instead?

@jsirois
Member

jsirois commented Jan 7, 2021

@tentwelfths I repro, although I get a different error message:

$ rm -rf big* && mkdir big && yes "#" | head -n 2000000000 > big/data.py && echo "import data; print(data.__file__)" > big/exe.py && ls -lh big/
total 3.8G
-rw-r--r-- 1 jsirois jsirois 3.8G Jan  7 11:17 data.py
-rw-r--r-- 1 jsirois jsirois   34 Jan  7 11:17 exe.py
$ pex -D big/ --entry-point exe -obig.pex && ls -lh big.pex
-rwxr-xr-x 1 jsirois jsirois 4.1M Jan  7 11:18 big.pex
$ ./big.pex 
Traceback (most recent call last):
  File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/pex.py", line 487, in execute
  File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/pex.py", line 404, in _wrap_coverage
  File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/pex.py", line 435, in _wrap_profiling
  File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/pex.py", line 543, in _execute
  File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/pex.py", line 645, in execute_entry
  File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/.bootstrap/pex/pex.py", line 653, in execute_module
  File "/usr/lib/python3.9/runpy.py", line 213, in run_module
    return _run_code(code, {}, init_globals, run_name, mod_spec)
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/jsirois/dev/pantsbuild/jsirois-pex/big.pex/exe.py", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 982, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 925, in _find_spec
  File "<frozen importlib._bootstrap_external>", line 1349, in find_spec
  File "<frozen importlib._bootstrap_external>", line 1323, in _get_spec
  File "<frozen importlib._bootstrap_external>", line 1304, in _legacy_get_spec
  File "<frozen importlib._bootstrap>", line 423, in spec_from_loader
  File "<frozen importlib._bootstrap_external>", line 656, in spec_from_file_location
  File "<frozen zipimport>", line 191, in get_filename
  File "<frozen zipimport>", line 709, in _get_module_code
  File "<frozen zipimport>", line 560, in _get_data
OSError: zipimport: can't read data

But the --unzip remedy works. Here I simply use the runtime equivalent, PEX_UNZIP (see pex --help-variables), instead of rebuilding the PEX file with --unzip. The first run is slow while the initial unzip happens; runs after that are fastish:

$ time PEX_UNZIP=1 ./big.pex 
/home/jsirois/.pex/unzipped_pexes/d5e0ee5a82eafd6fe49ccc04bc54f8ec86a7218c/data.py

real	0m58.322s
user	0m46.461s
sys	0m4.554s
$ time PEX_UNZIP=1 ./big.pex 
/home/jsirois/.pex/unzipped_pexes/d5e0ee5a82eafd6fe49ccc04bc54f8ec86a7218c/data.py

real	0m0.584s
user	0m0.448s
sys	0m0.034s

@stuhood

stuhood commented Jan 8, 2021

@jsirois
Member

jsirois commented Jan 8, 2021

That's the default for all versions of Python that Pex supports, save 2.7. In other words, building @tentwelfths's PEX file would have failed in the first place under 2.7, implying they used 3.x. It seems like the issue is not in the zipfile stdlib but in the CPython C code that implements zipimport. To illustrate, changing my repro above a bit, I get:

$ python2.7 -mpex -D big/ --entry-point exe -obig.pex --venv && ls -lh big.pex
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/jsirois/dev/pantsbuild/pex/pex/__main__.py", line 8, in <module>
    __name__ == "__main__" and pex.main()
  File "pex/bin/pex.py", line 1048, in main
    deterministic_timestamp=not options.use_system_time,
  File "pex/pex_builder.py", line 561, in build
    self._chroot.zip(tmp_zip, mode="a", deterministic_timestamp=deterministic_timestamp)
  File "pex/common.py", line 666, in zip
    write_entry(f)
  File "pex/common.py", line 646, in write_entry
    zf.writestr(zip_entry.info, zip_entry.data)
  File "/usr/lib/python2.7/zipfile.py", line 1257, in writestr
    self._writecheck(zinfo)
  File "/usr/lib/python2.7/zipfile.py", line 1137, in _writecheck
    " would require ZIP64 extensions")
zipfile.LargeZipFile: Filesize would require ZIP64 extensions

@jsirois
Member

jsirois commented Jan 8, 2021

I'm going to guess the CPython code doesn't re-implement zip but links ziplib and that's where the variance @dgkatz noted comes from. Some versions of ziplib support this and some don't ... that seems likely anyhow.

@jsirois
Member

jsirois commented Jan 8, 2021

Eek - I was living a fantasy. The zipimport lib is written in Python and does not track the zipfile stdlib: https://github.com/python/cpython/blob/3.9/Lib/zipimport.py#L360 - it just gets this wrong and assumes 32 bit straight up. So this does not explain why the PEX in @dgkatz's original post works in some places and not others, but we never had clinching evidence that the @tentwelfths issue was exactly the @dgkatz issue anyhow.
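
To make the "assumes 32 bit" point concrete, here is a rough sketch of what the classic end-of-central-directory record at the tail of a zip looks like. It is not zipimport's actual code; the helper names read_eocd and is_saturated are made up for illustration. The record layout (22 bytes, a 16-bit entry count, 32-bit central directory size and offset) comes from the zip format, which marks fields that overflow into the ZIP64 records with all-ones values. A reader that trusts only these fields cannot locate the real central directory of a ZIP64 archive:

```python
import io
import struct
import zipfile

# Classic (non-ZIP64) end-of-central-directory record: 22 bytes, little-endian.
# signature, disk no., disk with CD, entries on disk, total entries,
# CD size, CD offset, comment length
EOCD_FORMAT = "<4s4H2LH"
EOCD_SIGNATURE = b"PK\x05\x06"


def read_eocd(data):
    """Decode the classic end-of-central-directory record from raw zip bytes.

    Simplification: takes the last occurrence of the signature, which could
    be fooled by an archive comment that happens to contain it.
    """
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    fields = struct.unpack_from(EOCD_FORMAT, data, pos)
    _sig, _disk, _cd_disk, _entries_here, entries, cd_size, cd_offset, _comment = fields
    return {"entries": entries, "cd_size": cd_size, "cd_offset": cd_offset}


def is_saturated(eocd):
    """True if any field holds the all-ones marker that defers to ZIP64 records."""
    return (
        eocd["entries"] == 0xFFFF
        or eocd["cd_size"] == 0xFFFFFFFF
        or eocd["cd_offset"] == 0xFFFFFFFF
    )


# Small in-memory archive standing in for a PEX file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("__main__.py", "print('hello')\n")
    zf.writestr("data.py", "#\n")

eocd = read_eocd(buf.getvalue())
print(eocd, "saturated:", is_saturated(eocd))
```

For a small archive like this one the fields hold real values; once an archive needs ZIP64, the true numbers live in the separate ZIP64 end-of-central-directory record instead.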

@jsirois
Member

jsirois commented Jan 8, 2021

And .. that zipimport shortcoming is documented here: https://bugs.python.org/issue32959

@rom1504

rom1504 commented Feb 17, 2022

I'm trying to put torch in a PEX and hitting this problem, since torch is 3 GB unzipped and a bit more than 2 GB zipped.
Any suggestions here, or any way to split a PEX in two, maybe?

@jsirois
Member

jsirois commented Feb 17, 2022

@rom1504 it depends what you want to do with the PEX. Since you'll already have to split it in 2 and have more than 1 file to ship around, perhaps you're ok with --layout packed or --layout loose? If your Pex doesn't have that feature please try using a newer version of Pex, reading the --layout help and giving it a whirl. You'll get a directory instead of a zip though.

There were a few hot fixes after the initial release (https://github.com/pantsbuild/pex/releases/tag/v2.1.48), but it's been stable for a while now and is used, for example, by stable Pants for all the internal PEXes it creates, to improve caching characteristics.

For example:

$ pex example ansicolors requests -o zipapp.pex
$ ls -lh zipapp.pex 
-rwxr-xr-x 1 jsirois jsirois 1.4M Feb 17 14:13 zipapp.pex
$ ./zipapp.pex -c 'import colors; print(colors.__file__)'
/home/jsirois/.pex/installed_wheels/f25c1d6c49102373d349f5f8f1cddc41ce409e15/ansicolors-1.1.8-py2.py3-none-any.whl/colors/__init__.py

$ pex --layout packed example ansicolors requests -o packed.pex
$ tree -ah packed.pex/
[4.0K]  packed.pex/
├── [409K]  .bootstrap
├── [4.0K]  .deps
│   ├── [ 21K]  ansicolors-1.1.8-py2.py3-none-any.whl
│   ├── [149K]  certifi-2021.10.8-py2.py3-none-any.whl
│   ├── [ 82K]  charset_normalizer-2.0.12-py3-none-any.whl
│   ├── [1.8K]  example-0.1.0-py3-none-any.whl
│   ├── [ 82K]  idna-3.3-py3-none-any.whl
│   ├── [197K]  requests-2.27.1-py2.py3-none-any.whl
│   ├── [ 46K]  six-1.16.0-py2.py3-none-any.whl
│   └── [354K]  urllib3-1.26.8-py2.py3-none-any.whl
├── [2.7K]  __main__.py
└── [1.2K]  PEX-INFO

1 directory, 11 files
$ packed.pex/__main__.py -c 'import colors; print(colors.__file__)'
/home/jsirois/.pex/installed_wheels/f25c1d6c49102373d349f5f8f1cddc41ce409e15/ansicolors-1.1.8-py2.py3-none-any.whl/colors/__init__.py
$ python packed.pex/ -c 'import colors; print(colors.__file__)'
/home/jsirois/.pex/installed_wheels/f25c1d6c49102373d349f5f8f1cddc41ce409e15/ansicolors-1.1.8-py2.py3-none-any.whl/colors/__init__.py
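
Since a packed PEX keeps one wheel file per dependency under .deps (as in the tree output above), it is easy to see which dependency dominates before deciding how to ship things. A small sketch, assuming only that directory layout; the helper name wheel_sizes is hypothetical and the temporary directory is a fake stand-in for real pex output:

```python
import tempfile
from pathlib import Path


def wheel_sizes(packed_pex):
    """Return (size_in_bytes, wheel_name) pairs for a packed PEX, largest first."""
    deps = Path(packed_pex) / ".deps"
    sizes = [(p.stat().st_size, p.name) for p in deps.glob("*.whl") if p.is_file()]
    return sorted(sizes, reverse=True)


# Fake packed layout for demonstration; real wheels would come from `pex --layout packed`.
with tempfile.TemporaryDirectory() as tmp:
    deps_dir = Path(tmp, "packed.pex", ".deps")
    deps_dir.mkdir(parents=True)
    (deps_dir / "requests-2.27.1-py2.py3-none-any.whl").write_bytes(b"\0" * 2048)
    (deps_dir / "ansicolors-1.1.8-py2.py3-none-any.whl").write_bytes(b"\0" * 512)
    for size, name in wheel_sizes(Path(tmp, "packed.pex")):
        print(f"{size:>8}  {name}")
```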

@rom1504

rom1504 commented Feb 17, 2022

thanks!
my use case is https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html#using-pex
I will give that --layout packed mode a try

@jsirois
Member

jsirois commented Feb 17, 2022

Ok, I have not used Spark before, but I'd guess you want the following config tweaking their example:

export PYSPARK_DRIVER_PYTHON=python  # Do not set in cluster modes.
export PYSPARK_PYTHON=./pyspark_pex_env.pex/__main__.py
spark-submit --files $(find pyspark_pex_env.pex -type f | xargs | tr ' ' ',') app.py
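
The find | xargs | tr pipeline splits on whitespace, so it would mangle any path that contains a space. If that is a concern, the same comma-separated list can be built in Python. This is only a sketch: the function name spark_files_arg is made up, it still breaks on paths containing commas, and whether Spark preserves the directory structure of files passed this way is not something verified here:

```python
import tempfile
from pathlib import Path


def spark_files_arg(packed_pex):
    """Comma-join every regular file under a packed PEX directory."""
    root = Path(packed_pex)
    return ",".join(sorted(str(p) for p in root.rglob("*") if p.is_file()))


# Fake packed PEX directory for demonstration purposes only.
with tempfile.TemporaryDirectory() as tmp:
    pex_dir = Path(tmp, "packed.pex")
    (pex_dir / ".deps").mkdir(parents=True)
    (pex_dir / ".deps" / "example-0.1.0-py3-none-any.whl").write_bytes(b"")
    (pex_dir / "__main__.py").write_text("")
    (pex_dir / "PEX-INFO").write_text("{}")
    print(spark_files_arg(pex_dir))
```

The resulting string could then be passed as the value of spark-submit's --files option, for example through subprocess with an argument list rather than a shell string.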

@rom1504

rom1504 commented Feb 17, 2022

I confirm the --layout packed mode worked to produce this .pex folder (containing one wheel file per dependency), and I was able to run the __main__.py file locally with it.
This file layout should allow me to distribute this in two archives, one with the torch dependency and one with everything else, which is convenient.
Next I will try to use this with PySpark. I'll report back here on whether it works as expected, but it seems it will.

@rom1504

rom1504 commented Feb 18, 2022

It's indeed working well for my use case, thanks for the suggestion!

It seems this new format would be well suited to splitting the build process into several parts and building each part independently, maybe even in parallel. That would speed up PEX generation, which can take several minutes today.
Did you ever consider that option? I guess maybe Pants is doing that?

@jsirois
Member

jsirois commented Feb 18, 2022

@rom1504 Pex does already build in parallel using your number of cores by default (see --help for --jobs). The build process looks like:

  1. Single subprocess: pip download (this performs the resolve and downloads wheels and sdists)
  2. --jobs number of sub-processes: pip wheel (this builds any downloaded sdists into wheels)
  3. --jobs number of sub-processes: pip install (this installs each wheel in its own directory)

The only remaining non-parallelized aspect of the PEX build for packed layout is zipping up all the individual wheel install chroots from step 3. That is done in serial:
https://github.com/pantsbuild/pex/blob/7a6e9a46c7e4fc67c6d3f1a0fc19d5b204d5ee81/pex/pex_builder.py#L696-L719
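
Purely as a thought experiment, that last serial step is the kind of work that can be fanned out, since each install directory is zipped independently of the others. The sketch below is not Pex's implementation: zip_dir and zip_dirs_in_parallel are hypothetical helpers, and threads are used on the assumption that zlib does most of its compression work without holding the GIL:

```python
import tempfile
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def zip_dir(src, dest):
    """Zip every regular file under `src` into `dest`, using relative names."""
    with zipfile.ZipFile(dest, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(src).as_posix())
    return dest


def zip_dirs_in_parallel(srcs, out_dir, jobs):
    """Zip each directory in `srcs` to out_dir/<name>.zip using `jobs` threads."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [pool.submit(zip_dir, src, out_dir / f"{src.name}.zip") for src in srcs]
        # Collect in submission order so the output lines up with `srcs`.
        return [future.result() for future in futures]


# Fake install directories for demonstration purposes only.
with tempfile.TemporaryDirectory() as tmp:
    chroots = []
    for name in ("ansicolors", "requests"):
        chroot = Path(tmp, "installed", name)
        chroot.mkdir(parents=True)
        (chroot / "module.py").write_text("# placeholder\n")
        chroots.append(chroot)
    for archive in zip_dirs_in_parallel(chroots, Path(tmp, "zipped"), jobs=2):
        print(archive.name)
```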

@jsirois
Member

jsirois commented Feb 18, 2022

@dgkatz - finally looping back. Was your issue related to the huge PEX issue @tentwelfths and @rom1504 encountered (>2GB PEX)? If so, I'd like to close this issue since @rom1504 confirms the --layout packed Pex option is a viable workaround.

@tentwelfths hopefully --layout packed gives you an escape hatch too when you simply can't pare down dependencies.

@rahul-theorem

rahul-theorem commented Jun 8, 2022

I just encountered a similar issue with a PEX file that was ~700 MB (lower than the 2 GB reported earlier in this issue). Using --layout packed resolved this for me as well. (We also resolved the problem by removing some unnecessary files that were making their way into the original PEX file.)

jcuquemelle added a commit to criteo/cluster-pack that referenced this issue Aug 23, 2022
pex-tool/pex#958

This allows seamlessly use the env with archive distribution system,
e.g. spark.yarn.dist.archive
@jsirois
Member

jsirois commented Aug 14, 2024

@rahul-theorem there are (at least) 2 axes which can cause a zip to use ZIP64 extensions: size and entry count. If there are >2^16 files in the zip, it will use ZIP64 extensions and won't boot under Python<3.13.
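
Both axes can be checked up front with the zipfile module, which reads ZIP64 archives fine. A rough sketch follows: the name zip64_reasons is made up, the thresholds are an assumption based on the limits CPython's own zipfile writer uses (worth re-checking against your interpreter), and the check is a heuristic because the central directory's own size and offset can also force ZIP64:

```python
import io
import zipfile

# Assumed thresholds, mirroring CPython's zipfile writer.
MAX_ENTRIES = (1 << 16) - 1  # more entries than this requires ZIP64
MAX_SIZE = (1 << 31) - 1     # sizes or offsets beyond this require ZIP64


def zip64_reasons(pex):
    """List likely reasons a zip needs ZIP64 extensions (empty list if none found).

    `pex` may be a filename or a seekable binary file object.
    """
    reasons = []
    with zipfile.ZipFile(pex) as zf:
        infos = zf.infolist()
    if len(infos) > MAX_ENTRIES:
        reasons.append(f"entry count {len(infos)} > {MAX_ENTRIES}")
    for info in infos:
        if max(info.file_size, info.compress_size, info.header_offset) > MAX_SIZE:
            reasons.append(f"{info.filename} exceeds the size/offset limit")
            break  # one example entry is enough
    return reasons


# Small in-memory archive standing in for a PEX file; it needs no ZIP64.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("__main__.py", "print('hello')\n")

print(zip64_reasons(buf) or "no ZIP64 triggers found")
```

Running something like this against a built PEX zip before shipping it gives an early hint about whether it would fail to boot on an older interpreter.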

@jsirois jsirois self-assigned this Aug 14, 2024
@jsirois
Member

jsirois commented Aug 14, 2024

I'm going to close this as an answered question. In the meantime two things have improved:

  1. Pex now warns when the final PEX zip requires ZIP64 extensions: Add --check support for zipapps. #2253
  2. Python 3.13 finally handles ZIP64 in zipimporter and can boot these beasts: https://docs.python.org/3.13/whatsnew/3.13.html#zipimport
