Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discard header files (and static libraries on Windows) of torch #666

Merged
merged 9 commits into from
Dec 4, 2023

Conversation

CarlGao4
Copy link
Contributor

@CarlGao4 CarlGao4 commented Nov 27, 2023

I've read #531 and found a new solution.
We know thousands of headers are included in the wheel of torch, which are provided for situations like libraries which would be compiled when being installed as they should use the same version of CUDA, because none of source codes of PyTorch is included. So those header files can be simply filtered to reduce file amount in the packed application (so it can be moved or copied easier)

I still encountered a problem, though. After modifying these, tests on Linux will fail:

Click to expand
----------------------------- Captured stdout call -----------------------------
------- Starting build. -------
------- Build finished, now running executable. -------
[6545]  RUNNING:  '/tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/dist/test_source/test_source' , args:  ['./test_source']
----------------------------- Captured stderr call -----------------------------
------- Starting build. -------
------- Build finished, now running executable. -------
[6545] PyInstaller Bootloader 6.x
[6545] LOADER: executable is /tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/dist/test_source/test_source
[6545] LOADER: _MEIPASS2 is not set
[6545] LOADER: archivename is /tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/dist/test_source/test_source
[6545] LOADER: Cookie found at offset 0x14FC749
[6545] LOADER: No need to extract files to run; setting up environment and restarting bootloader...
[6545] LOADER: LD_LIBRARY_PATH_ORIG=/opt/hostedtoolcache/Python/3.11.6/x64/lib
[6545] LOADER: LD_LIBRARY_PATH=/tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/dist/test_source/_internal:/opt/hostedtoolcache/Python/3.11.6/x64/lib
[6545] PyInstaller Bootloader 6.x
[6545] LOADER: executable is /tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/dist/test_source/test_source
[6545] LOADER: _MEIPASS2 is /tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/dist/test_source/_internal
[6545] LOADER: archivename is /tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/dist/test_source/test_source
[6545] LOADER: Cookie found at offset 0x14FC749
[6545] LOADER: Already in the child - running user's code.
[6545] LOADER: Python library: /tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/dist/test_source/_internal/libpython3.11.so.1.0
[6545] LOADER: Loaded functions from Python library.
[6545] LOADER: Pre-initializing embedded python interpreter...
[6545] LOADER: Creating PyConfig structure...
[6545] LOADER: Initializing interpreter configuration...
[6545] LOADER: Setting program name...
[6545] LOADER: Setting python home path...
[6545] LOADER: Setting module search paths...
[6545] LOADER: Setting sys.argv...
[6545] LOADER: Applying run-time options...
[6545] LOADER: Starting embedded python interpreter...
[6545] LOADER: setting sys._MEIPASS
[6545] LOADER: importing modules from CArchive
[6545] LOADER: extracted struct
[6545] LOADER: running unmarshalled code object for struct...
[6545] LOADER: extracted pyimod01_archive
[6545] LOADER: running unmarshalled code object for pyimod01_archive...
[6545] LOADER: extracted pyimod02_importers
[6545] LOADER: running unmarshalled code object for pyimod02_importers...
[6545] LOADER: extracted pyimod03_ctypes
[6545] LOADER: running unmarshalled code object for pyimod03_ctypes...
[6545] LOADER: Installing PYZ archive with Python modules.
[6545] LOADER: PYZ archive: PYZ-02.pyz
[6545] LOADER: Running pyiboot01_bootstrap.py
[6545] LOADER: Running pyi_rth_inspect.py
[6545] LOADER: Running pyi_rth_pkgutil.py
[6545] LOADER: Running pyi_rth_multiprocessing.py
[6545] LOADER: Running pyi_rth_pkgres.py
[6545] LOADER: Running pyi_rth_setuptools.py
[6545] LOADER: Running test_source.py
Traceback (most recent call last):
  File "PyInstaller/loader/pyimod03_ctypes.py", line 53, in __init__
  File "ctypes/__init__.py", line 376, in __init__
OSError: /tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/dist/test_source/_internal/torch/lib/libtorch_global_deps.so: cannot open shared object file: No such file or directory

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test_source.py", line 3, in <module>
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "PyInstaller/loader/pyimod02_importers.py", line 419, in exec_module
  File "torch/__init__.py", line 234, in <module>
  File "torch/__init__.py", line 193, in _load_global_deps
  File "torch/__init__.py", line 174, in _load_global_deps
  File "PyInstaller/loader/pyimod03_ctypes.py", line 55, in __init__
pyimod03_ctypes.install.<locals>.PyInstallerImportError: Failed to load dynlib/dll '/tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/dist/test_source/_internal/torch/lib/libtorch_global_deps.so'. Most likely this dynlib/dll was not found when the application was frozen.
[6545] Failed to execute script 'test_source' due to unhandled exception!
[6545] LOADER: ERROR.
[6545] LOADER: Manually flushing stdout and stderr
[6545] LOADER: Cleaning up Python interpreter.
------------------------------ Captured log call -------------------------------
INFO     PyInstaller.configure:configure.py:104 UPX is available but is disabled on non-Windows due to known compatibility problems.
INFO     PyInstaller.__main__:__main__.py:184 PyInstaller: 6.2.0
INFO     PyInstaller.__main__:__main__.py:185 Python: 3.11.6
INFO     PyInstaller.__main__:__main__.py:186 Platform: Linux-6.2.0-1016-azure-x86_64-with-glibc2.35
INFO     PyInstaller.__main__:__main__.py:63 wrote /tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/test_source.spec
INFO     PyInstaller.building.build_main:build_main.py:[411](https://github.com/CarlGao4/pyinstaller-hooks-contrib/actions/runs/7006213955/job/19057606366#step:7:412) Extending PYTHONPATH with paths
['/tmp/pytest-of-runner/pytest-0/test_torch_onedir_0',
 '/home/runner/work/pyinstaller-hooks-contrib/pyinstaller-hooks-contrib/src/_pyinstaller_hooks_contrib/tests/modules']
INFO     PyInstaller.building.datastruct:datastruct.py:169 checking Analysis
INFO     PyInstaller.building.datastruct:datastruct.py:173 Building Analysis because Analysis-02.toc is non existent
INFO     PyInstaller.depend.analysis:analysis.py:919 Reusing cached module dependency graph...
INFO     PyInstaller.depend.analysis:analysis.py:121 Caching module graph hooks...
INFO     PyInstaller.building.build_main:build_main.py:574 Running Analysis Analysis-02.toc
INFO     PyInstaller.building.build_main:build_main.py:577 Looking for Python shared library...
INFO     PyInstaller.building.build_main:build_main.py:582 Using Python shared library: /opt/hostedtoolcache/Python/3.11.6/x64/lib/libpython3.11.so.1.0
INFO     PyInstaller.building.build_main:build_main.py:607 Analyzing /tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/test_source.py
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-torch.py' from '/home/runner/work/pyinstaller-hooks-contrib/pyinstaller-hooks-contrib/src/_pyinstaller_hooks_contrib/hooks/stdhooks'...
INFO     PyInstaller.utils.hooks:hook-torch.py:24 hook-torch: raising recursion limit to 5000
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-platform.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-packaging.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-pkg_resources.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-xml.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-sysconfig.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-difflib.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-multiprocessing.util.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-sympy.py' from '/home/runner/work/pyinstaller-hooks-contrib/pyinstaller-hooks-contrib/src/_pyinstaller_hooks_contrib/hooks/stdhooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-py.py' from '/home/runner/work/pyinstaller-hooks-contrib/pyinstaller-hooks-contrib/src/_pyinstaller_hooks_contrib/hooks/stdhooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-pytest.py' from '/home/runner/work/pyinstaller-hooks-contrib/pyinstaller-hooks-contrib/src/_pyinstaller_hooks_contrib/hooks/stdhooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-xml.dom.domreg.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-setuptools.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.analysis:analysis.py:[459](https://github.com/CarlGao4/pyinstaller-hooks-contrib/actions/runs/7006213955/job/19057606366#step:7:460) Processing pre-safe import module hook distutils from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks/pre_safe_import_module/hook-distutils.py'.
INFO     PyInstaller.depend.analysis:analysis.py:502 Processing pre-find module path hook distutils from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks/pre_find_module_path/hook-distutils.py'.
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-distutils.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-distutils.util.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-psutil.py' from '/home/runner/work/pyinstaller-hooks-contrib/pyinstaller-hooks-contrib/src/_pyinstaller_hooks_contrib/hooks/stdhooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-xml.etree.cElementTree.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.analysis:analysis.py:459 Processing pre-safe import module hook win32com from '/home/runner/work/pyinstaller-hooks-contrib/pyinstaller-hooks-contrib/src/_pyinstaller_hooks_contrib/hooks/pre_safe_import_module/hook-win32com.py'.
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-sqlite3.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.analysis:analysis.py:304 Processing module hooks...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-pickle.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-setuptools.msvc.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-setuptools._distutils.command.check.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-heapq.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-encodings.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.building.build_main:build_main.py:660 Looking for ctypes DLLs
WARNING  PyInstaller.depend.utils:utils.py:298 Library msvcp140.dll required via ctypes not found
WARNING  PyInstaller.depend.utils:utils.py:298 Library vcruntime140.dll required via ctypes not found
WARNING  PyInstaller.depend.utils:utils.py:298 Library vcruntime140_1.dll required via ctypes not found
WARNING  PyInstaller.depend.utils:utils.py:110 Ignoring /usr/lib64/libgomp.so.1 imported from /opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/torch/_inductor/codecache.py - only basenames are supported with ctypes imports!
INFO     PyInstaller.depend.analysis:analysis.py:687 Analyzing run-time hooks ...
INFO     PyInstaller.depend.analysis:analysis.py:707 Including run-time hook '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks/rthooks/pyi_rth_inspect.py'
INFO     PyInstaller.depend.analysis:analysis.py:707 Including run-time hook '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks/rthooks/pyi_rth_pkgutil.py'
INFO     PyInstaller.depend.analysis:analysis.py:502 Processing pre-find module path hook _pyi_rth_utils from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks/pre_find_module_path/hook-_pyi_rth_utils.py'.
INFO     PyInstaller.depend.imphook:imphook.py:381 Loading module hook 'hook-_pyi_rth_utils.py' from '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks'...
INFO     PyInstaller.depend.analysis:analysis.py:707 Including run-time hook '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks/rthooks/pyi_rth_multiprocessing.py'
INFO     PyInstaller.depend.analysis:analysis.py:707 Including run-time hook '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks/rthooks/pyi_rth_pkgres.py'
INFO     PyInstaller.depend.analysis:analysis.py:707 Including run-time hook '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/hooks/rthooks/pyi_rth_setuptools.py'
INFO     PyInstaller.building.build_main:build_main.py:799 Looking for dynamic libraries
WARNING  PyInstaller.depend.bindepend:bindepend.py:217 Library not found: could not resolve 'libnvfuser_codegen.so', dependency of '/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/torch/bin/nvfuser_tests'.
INFO     PyInstaller.building.build_main:build_main.py:887 Warnings written to /tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/build/test_source/warn-test_source.txt
INFO     PyInstaller.building.build_main:build_main.py:896 Graph cross-reference written to /tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/build/test_source/xref-test_source.html
INFO     PyInstaller.building.datastruct:datastruct.py:169 checking PYZ
INFO     PyInstaller.building.datastruct:datastruct.py:173 Building PYZ because PYZ-02.toc is non existent
INFO     PyInstaller.building.api:api.py:130 Building PYZ (ZlibArchive) /tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/build/test_source/PYZ-02.pyz
INFO     PyInstaller.building.api:api.py:151 Building PYZ (ZlibArchive) /tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/build/test_source/PYZ-02.pyz completed successfully.
INFO     PyInstaller.building.datastruct:datastruct.py:169 checking PKG
INFO     PyInstaller.building.datastruct:datastruct.py:173 Building PKG because PKG-02.toc is non existent
INFO     PyInstaller.building.api:api.py:256 Building PKG (CArchive) test_source.pkg
INFO     PyInstaller.building.api:api.py:331 Building PKG (CArchive) test_source.pkg completed successfully.
INFO     PyInstaller.building.api:api.py:706 Bootloader /opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/PyInstaller/bootloader/Linux-64bit-intel/run_d
INFO     PyInstaller.building.datastruct:datastruct.py:169 checking EXE
INFO     PyInstaller.building.datastruct:datastruct.py:173 Building EXE because EXE-02.toc is non existent
INFO     PyInstaller.building.api:api.py:717 Building EXE from EXE-02.toc
INFO     PyInstaller.building.api:api.py:730 Copying bootloader EXE to /tmp/pytest-of-runner/pytest-0/test_torch_onedir_0/build/test_source/test_source
INFO     PyInstaller.building.api:api.py:788 Appending PKG archive to custom ELF section in EXE
INFO     PyInstaller.building.api:api.py:858 Building EXE from EXE-02.toc completed successfully.
INFO     PyInstaller.building.datastruct:datastruct.py:169 checking COLLECT
INFO     PyInstaller.building.datastruct:datastruct.py:173 Building COLLECT because COLLECT-01.toc is non existent
INFO     PyInstaller.building.api:api.py:1097 Building COLLECT COLLECT-01.toc
INFO     PyInstaller.building.api:api.py:1162 Building COLLECT COLLECT-01.toc completed successfully.

@CarlGao4 CarlGao4 marked this pull request as ready for review November 27, 2023 14:54
@rokm
Copy link
Member

rokm commented Nov 29, 2023

I still encountered a problem, though. After modifying these, tests on Linux will fail:

Yeah,

datas = [(get_package_paths('torch')[1], "torch")]

and

datas = collect_data_files("torch", excludes=["**/*.h", "**/*.hpp", "**/*.cuh",
                                              "**/*.lib"], include_py_files=True)

are not really the same.

The first tells PyInstaller to simply collect the whole package directory and treat is as data files (there's shared libraries and extensions in there, but in PyInstaller 6.x those get sorted out by automatic binary-vs-data reclassification). The second tells PyInstaller to collect just data files, so shared libs are not necessarily collected (I think it depends on the name).

That datas = [(get_package_paths('torch')[1], "torch")] should be replaced by a combination of:

  • datas = collect_data_files('torch', ...) without include_py_files=True. Maybe check if there are any data files that need to be collected from the torch package at all. The include_py_files=True should not be used for collection of source .py files anymore; that should be done via combination of collect_submodules and module collection mode.
  • binaries = collect_dynamic_libs('torch') to collect bundled shared libraries
  • hiddenimports = collect_submodules('torch') to collect any lazy imports within the package
  • module_collection_mode = 'pyz+py' to collect the modules both as byte-compiled .pyc modules into PYZ archive, and as .py source files (which are required for JIT)

But if we want the hook to be compatible with older PyInstaller versions, this should be done only for PyInstaller >= 6.0, while for older versions, the old approach should be used.


That said, are you sure that headers and .lib files are not required by torch JIT compiler? Why do torch wheels ship them in the first place?

@CarlGao4
Copy link
Contributor Author

CarlGao4 commented Nov 30, 2023

I've tried wrapping a function using @torch.jit.script, and the program runs well without the include folder and static libs. I have used libraries whose wheels are not provided, as it requires the same CUDA version as PyTorch, which made it impossible for them to build wheels. So the library should be compiled each time it is installed. It used torch API and was written in C++, so they included PyTorch headers in their codes. That's why PyTorch added their headers (and static libraries) to their wheel in my opinion. Then I guess that we can just discard the headers in our packed applications. I've tested the new hook on Windows and macOS (You can download them from https://github.com/CarlGao4/Demucs-Gui/releases), and my applications ran well.


Added: I didn't try packing an application that requires JIT. I just deleted the static libraries and headers then ran the codes. Besides, this is only tested on Windows.

@CarlGao4
Copy link
Contributor Author

CarlGao4 commented Dec 2, 2023

@rokm Do you have any idea? Or maybe we can enable this only on 6.x and only Windows and macOS?

@rokm
Copy link
Member

rokm commented Dec 2, 2023

@rokm Do you have any idea? Or maybe we can enable this only on 6.x and only Windows and macOS?

If you're asking about the failing linux test - you probably need to add collect_dynamic_libs like I wrote earlier.

I'll look at refactoring the hook, though. It's been on my TODO list for a while, albeit not exactly a high-priority thing...

@CarlGao4
Copy link
Contributor Author

CarlGao4 commented Dec 3, 2023

@rokm I've fixed it with collect_dynamic_libs which collects all python files using ALL_SUFFIXES (in fact, Cython library extensions on Windows, macOS and Linux are pyd, so and so, dynamic libraries on those are dll, dylib and so, collect_data_files will leave out so on both macOS and Linux, so dylib will be collected on macOS but not on Linux). Now it collects all files except .h, .hpp, .cuh, .lib and .pyc

@rokm
Copy link
Member

rokm commented Dec 3, 2023

Strictly speaking, the output of collect_dynamic_libs should be assigned to binaries, not datas. Also, this part should not concern itself with extension modules, as those should be discovered via import analysis (and if not, hiddenimports and/or collect_submodules is needed).

And as I wrote earlier, collect_data_files should be called without include_py_files - source .py files should be collected via module collection setting (plus collect_submodules, if necessary - this will also ensure that all submodules are properly scanned for imports).

Maybe one more thing to consider - do we need to collect the data files that make it through the filter at all? On linux, most remaining files seem to be .pyi ones (which might be needed if there is a lazy-loader involved somewhere), .cmake files (which are likely unnecessary), a torch/utils/benchmark/utils/valgrind_wrapper/timer_callgrind_template.cpp (unnecesary?), torch/utils/model_dump/preact.mjs (?). Because if we need only .pyi files, it might be better to just use include patterns rather than exclude ones...

@CarlGao4
Copy link
Contributor Author

CarlGao4 commented Dec 3, 2023

.pyi is unnecessary, too, I think
But I'm still doubting using a "white list" instead of "black list", so this hook will be unlikely to fail with future PyTorch updates

Maybe a better solution could be like this?

datas, binaries, hiddenimports = collect_all('torch', exclude_datas=["**/*.h", "**/*.hpp", "**/*.cuh", "**/*.lib", "**/*.pyi"])

@rokm
Copy link
Member

rokm commented Dec 3, 2023

.pyi is unnecessary, too, I think But I'm still doubting using a "white list" instead of "black list", so this hook will be unlikely to fail with future PyTorch updates

Maybe a better solution could be like this?

datas, binaries, hiddenimports = collect_all('torch', exclude_datas=["**/*.h", "**/*.hpp", "**/*.cuh", "**/*.lib", "**/*.pyi"])

I explicitly don't want to see collect_all in the hooks that we maintain. Same goes for collect_data with include_py_files (which is what collect_all does).

@CarlGao4
Copy link
Contributor Author

CarlGao4 commented Dec 3, 2023

Then what about using three separated calls one by one?

@rokm
Copy link
Member

rokm commented Dec 3, 2023

Then what about using three separated calls one by one?

Yes, that's what I originally described in #666 (comment)

@CarlGao4
Copy link
Contributor Author

CarlGao4 commented Dec 3, 2023

Then what about using three separated calls one by one?

Yes, that's what I originally described in #666 (comment)

Sorry for making you explain again...

@rokm
Copy link
Member

rokm commented Dec 3, 2023

Then what about using three separated calls one by one?

Yes, that's what I originally described in #666 (comment)

Sorry for making you explain again...

No problem, it's good to see that someone else is willing to take a stab at modernizing that hook.

@CarlGao4
Copy link
Contributor Author

CarlGao4 commented Dec 3, 2023

I'd like to ask one thing: how does hiddenimports work? I realized that the JIT requires those .py files and using this method may cause it fails to work


Update:

from PyInstaller.utils.hooks import logger, collect_data_files, is_module_satisfies, collect_dynamic_libs, collect_submodules

datas = collect_data_files("torch", excludes=["**/*.h", "**/*.hpp", "**/*.cuh",
                                              "**/*.lib", "**/*.cpp", "**/*.pyi", "**/*.cmake"])
binaries = collect_dynamic_libs("torch")
hiddenimports = collect_submodules("torch")

# With torch 2.0.0, PyInstaller's modulegraph analysis hits the recursion limit.
# So, unless the user has already done so, increase it automatically.
if is_module_satisfies('torch >= 2.0.0'):
    import sys
    new_limit = 5000
    if sys.getrecursionlimit() < new_limit:
        logger.info("hook-torch: raising recursion limit to %d", new_limit)
        sys.setrecursionlimit(new_limit)

I've tried this, all .py files are not copied and JIT doesn't work

@rokm
Copy link
Member

rokm commented Dec 3, 2023

I'd like to ask one thing: how does hiddenimports work? I realized that the JIT requires those .py files and using this method may cause it fails to work

Hidden imports by itself won't solve the .py files problem.

That's why you also need module_collection_mode = 'pyz+py', which will ensure that modules are collected both as byte-compiled modules in PYZ archive (PyInstaller default) and the source .py files. But this applies only to modules that acually go through PyInstaller's analysis (so hiddenimports = collect_submodules() might be necessary).

See https://pyinstaller.org/en/stable/hooks.html#hook-global-variables

@rokm
Copy link
Member

rokm commented Dec 3, 2023

I still suspect we could do without the collect_data_files() at all now, but we can leave that for another time.

These new modifications, however, implicitly rely on functionality that's available only in recent PyInstaller versions; the module collection mode requires PyInstaller >= 5.3, and the shared lib parent directory preservation works across all OSes only as of PyInstaller 6.0. So I think we should put this new code block under is_module_satisfies('PyInstaller >= 6.0') and leave the old approach for older PyInstaller versions. And we can remove it only after we are fully ready to drop support for PyInstaller < 6.x in these hooks.

Also, don't forget to add a news fragment for the change log (something along the lines of "Modernize the torch for hook and reduce the amount of unnecessarily collected data files (header files and static libraries). Requires PyInstaller >= 6.0."), and then we should be ready to merge.

I see you already ran oneshot test for the latest torch version. Can you do the same for last couple of major releases, the ensure we have not missed anything (and preferably also include corresponding torchvision version)? Otherwise, I'll try running these tomorrow myself.

@CarlGao4
Copy link
Contributor Author

CarlGao4 commented Dec 4, 2023

image
That's strange. This only happens on torch 1.12.1

@rokm
Copy link
Member

rokm commented Dec 4, 2023

image That's strange. This only happens on torch 1.12.1

Yeah, I can reproduce this locally. It is probably due to the fact that we are analyzing all collected modules (like we should), and somehow we trigger the hook for six even though six is not available (which should not happen). I'll dig deeper to see what's going on and what needs to be fixed (and where).

@CarlGao4
Copy link
Contributor Author

CarlGao4 commented Dec 4, 2023

https://github.com/CarlGao4/pyinstaller-hooks-contrib/actions/runs/7085858732

It's fixed after I added six as a dependency

@rokm
Copy link
Member

rokm commented Dec 4, 2023

https://github.com/CarlGao4/pyinstaller-hooks-contrib/actions/runs/7085858732

It's fixed after I added six as a dependency

That's a way to work around it, yeah. But ultimately this is a bug that should be fixed (presumably) in PyInstaller's modulegraph. It's not your problem to fix, though.


One thing I've noticed when investigating the above issue is that we now generate missing-hidden-import warnings, although those modules do seem to be collected in the end:

11392 WARNING: Hidden import "torch.distributed._sharded_tensor._ops" not found!
11392 WARNING: Hidden import "torch.distributed._sharded_tensor._ops._common" not found!
11392 WARNING: Hidden import "torch.distributed._sharded_tensor._ops.binary_cmp" not found!
11392 WARNING: Hidden import "torch.distributed._sharded_tensor._ops.chunk" not found!
11392 WARNING: Hidden import "torch.distributed._sharded_tensor._ops.elementwise_ops" not found!
11392 WARNING: Hidden import "torch.distributed._sharded_tensor._ops.init" not found!
11392 WARNING: Hidden import "torch.distributed._sharded_tensor._ops.math_ops" not found!
11393 WARNING: Hidden import "torch.distributed._sharded_tensor._ops.matrix_ops" not found!
11393 WARNING: Hidden import "torch.distributed._sharded_tensor._ops.tensor_ops" not found!
11393 WARNING: Hidden import "torch.distributed._sharded_tensor.api" not found!
11393 WARNING: Hidden import "torch.distributed._sharded_tensor.metadata" not found!
11393 WARNING: Hidden import "torch.distributed._sharded_tensor.reshard" not found!
11393 WARNING: Hidden import "torch.distributed._sharded_tensor.shard" not found!
11393 WARNING: Hidden import "torch.distributed._sharded_tensor.utils" not found!
11394 WARNING: Hidden import "torch.distributed._sharding_spec._internals" not found!
11394 WARNING: Hidden import "torch.distributed._sharding_spec.api" not found!
11394 WARNING: Hidden import "torch.distributed._sharding_spec.chunk_sharding_spec" not found!
11394 WARNING: Hidden import "torch.distributed._sharding_spec.chunk_sharding_spec_ops" not found!
11394 WARNING: Hidden import "torch.distributed._sharding_spec.chunk_sharding_spec_ops._common" not found!
11394 WARNING: Hidden import "torch.distributed._sharding_spec.chunk_sharding_spec_ops.embedding" not found!
11394 WARNING: Hidden import "torch.distributed._sharding_spec.chunk_sharding_spec_ops.embedding_bag" not found!
11395 WARNING: Hidden import "torch.distributed._sharding_spec.chunk_sharding_spec_ops.linear" not found!
11395 WARNING: Hidden import "torch.distributed._sharding_spec.chunk_sharding_spec_ops.math_ops" not found!
11395 WARNING: Hidden import "torch.distributed._sharding_spec.chunk_sharding_spec_ops.matrix_ops" not found!
11395 WARNING: Hidden import "torch.distributed._sharding_spec.chunk_sharding_spec_ops.softmax" not found!

Can you add a warn_on_missing_hiddenimports = False to PyInstaller >= 6.0 codepath, so that those will be suppresed, similarly to what we do in tensorflow hook:

# Suppress warnings for missing hidden imports generated by this hook.
# Requires PyInstaller > 5.1 (with pyinstaller/pyinstaller#6914 merged); no-op otherwise.
warn_on_missing_hiddenimports = False

news/666.update.rst Outdated Show resolved Hide resolved
@rokm
Copy link
Member

rokm commented Dec 4, 2023

Nice work @CarlGao4, thank you!

@bwoodsend
Copy link
Member

I have no idea what's going on here so if you think this ready Rok then just go ahead and hit the merge button. My vote would be arbitrary at this point.

@rokm rokm merged commit 2b0f62f into pyinstaller:master Dec 4, 2023
2 checks passed
github-actions bot pushed a commit to wxx9248/CIS-Game-Project-2023W that referenced this pull request Dec 20, 2023
…23.11 (#82)

Bumps
[pyinstaller-hooks-contrib](https://github.com/pyinstaller/pyinstaller-hooks-contrib)
from 2023.10 to 2023.11.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pyinstaller/pyinstaller-hooks-contrib/releases">pyinstaller-hooks-contrib's
releases</a>.</em></p>
<blockquote>
<h2>2023.11</h2>
<p>Please see the <a
href="https://www.github.com/pyinstaller/pyinstaller-hooks-contrib/tree/master/CHANGELOG.rst">changelog</a>
for more details</p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/pyinstaller/pyinstaller-hooks-contrib/blob/master/CHANGELOG.rst">pyinstaller-hooks-contrib's
changelog</a>.</em></p>
<blockquote>
<h2>2023.11 (2023-12-20)</h2>
<p>New hooks</p>
<pre><code>
* Add a hook for ``freetype`` that collects the shared library that is
bundled with ``freetype-py`` PyPI wheels.
(`[#674](pyinstaller/pyinstaller-hooks-contrib#674)

&lt;https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/674&gt;`_)
* Add a hook for ``z3c.rml`` that collects the required subset of
Bitstream
Vera TTF fonts from the ``reportlab`` package.
(`[#674](pyinstaller/pyinstaller-hooks-contrib#674)

&lt;https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/674&gt;`_)
* Add hook for ``eth_rlp``.
(`[#672](pyinstaller/pyinstaller-hooks-contrib#672)

&lt;https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/672&gt;`_)
* Add hook for ``eth_typing`` which requires its package metadata.
(`[#656](pyinstaller/pyinstaller-hooks-contrib#656)

&lt;https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/656&gt;`_)
* Add hook for ``eth_utils`` to collect its embedded JSON files.
(`[#656](pyinstaller/pyinstaller-hooks-contrib#656)

&lt;https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/656&gt;`_)
* Add hook for ``rlp``.
(`[#672](pyinstaller/pyinstaller-hooks-contrib#672)

&lt;https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/672&gt;`_)
* Add hook for ``sspilib`` that collects submodules of ``sspilib.raw``,
most of which are cythonized extensions.
(`[#669](pyinstaller/pyinstaller-hooks-contrib#669)

&lt;https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/669&gt;`_)
<p>Updated hooks
</code></pre></p>
<ul>
<li>Modernize the hook for <code>torch</code> and reduce the amount of
unnecessarily
collected data files (header files and static libraries). Requires
PyInstaller &gt;= 6.0.
(<code>[#666](pyinstaller/pyinstaller-hooks-contrib#666)
&lt;https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/666&gt;</code>_)</li>
<li>Update <code>pyarrow</code> hook to collect all of the package's
submodules.
(<code>[#662](pyinstaller/pyinstaller-hooks-contrib#662)
&lt;https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/662&gt;</code>_)</li>
<li>Update <code>rtree</code> hook for compatibility with <code>Rtree
&gt;= 1.1.0</code>.
(<code>[#657](pyinstaller/pyinstaller-hooks-contrib#657)
&lt;https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/657&gt;</code>_)</li>
<li>Update <code>sudachipy</code> hook for <code>sudachipy</code> 0.6.8.
(<code>[#673](pyinstaller/pyinstaller-hooks-contrib#673)
&lt;https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/673&gt;</code>_)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/pyinstaller/pyinstaller-hooks-contrib/commit/3bd34d400863c8b0466bf1e84a57b6814e13476e"><code>3bd34d4</code></a>
Release v2023.11</li>
<li><a
href="https://github.com/pyinstaller/pyinstaller-hooks-contrib/commit/11dff78740e2dc7b6685388c70c8e9ba0e07a03a"><code>11dff78</code></a>
hooks: add hook for freetype</li>
<li><a
href="https://github.com/pyinstaller/pyinstaller-hooks-contrib/commit/7916492b83f98674529594083e2a6ba1e294ccbd"><code>7916492</code></a>
hooks: add hook for z3c.rml</li>
<li><a
href="https://github.com/pyinstaller/pyinstaller-hooks-contrib/commit/937ca15b5f4d17ed0421e4a3fe2bcafd234954a9"><code>937ca15</code></a>
Scheduled weekly dependency update for week 51 (<a
href="https://redirect.github.com/pyinstaller/pyinstaller-hooks-contrib/issues/671">#671</a>)</li>
<li><a
href="https://github.com/pyinstaller/pyinstaller-hooks-contrib/commit/6a303f6c0bd7c2c51adf0418ed0cef19d647bb76"><code>6a303f6</code></a>
tests: fix deprecation warnings in test_sudachipy</li>
<li><a
href="https://github.com/pyinstaller/pyinstaller-hooks-contrib/commit/6cb34e6a78b8a1368297cbef80bf96605cafe15b"><code>6cb34e6</code></a>
hooks: sudachipy: add hiddenimports for v0.6.8</li>
<li><a
href="https://github.com/pyinstaller/pyinstaller-hooks-contrib/commit/97800abfb16f3e4a54bcc38f752b924a9e93d9ba"><code>97800ab</code></a>
tests: repin sudachipy to latest version</li>
<li><a
href="https://github.com/pyinstaller/pyinstaller-hooks-contrib/commit/23cd820137a8905b018cc1c56d4e2d0a347bd9f4"><code>23cd820</code></a>
ci: install sudachidict packages on sudachipy update</li>
<li><a
href="https://github.com/pyinstaller/pyinstaller-hooks-contrib/commit/f8aaa39b4dac28b8da62f3b2f5633425ac5fd153"><code>f8aaa39</code></a>
tests: fix dependencies of sudachipy test</li>
<li><a
href="https://github.com/pyinstaller/pyinstaller-hooks-contrib/commit/f9c592dfcd7491cf6a6d33adde89fdf413d0914f"><code>f9c592d</code></a>
hooks: add hook for eth_rlp</li>
<li>Additional commits viewable in <a
href="https://github.com/pyinstaller/pyinstaller-hooks-contrib/compare/2023.10...2023.11">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pyinstaller-hooks-contrib&package-manager=pip&previous-version=2023.10&new-version=2023.11)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants