Add bunch of hooks for contemporary ML stuff #676

Merged
merged 35 commits into from
Dec 23, 2023
ecade84
tests: create separate test file for torch and its libraries
rokm Dec 12, 2023
84bf7f1
tests: improve the enforcement of onedir-only tests for torch
rokm Dec 12, 2023
e2105a4
tests: torchvision: add test that uses torchscript
rokm Dec 12, 2023
f428a74
hooks: torchvision: collect source .py files
rokm Dec 12, 2023
5bf7e4f
hooks: torch: explicitly collect versioned .so files
rokm Dec 12, 2023
c901c07
hooks: torch: add support for external nvidia.* packages on linux
rokm Dec 12, 2023
d6a66a0
hooks: add hooks for subpackages of nvidia package
rokm Dec 12, 2023
a9389f1
tests: add test for torchaudio that uses torchscript
rokm Dec 13, 2023
ae2b12b
hooks: add hook for torchaudio
rokm Dec 13, 2023
061efe9
tests: add test for torchtext that uses torchscript
rokm Dec 13, 2023
dd3bed6
hooks: add hook for torchtext
rokm Dec 13, 2023
293a458
tests: move tensorflow tests to their own file
rokm Dec 13, 2023
ceff7b1
tests: add tests for transformers package
rokm Dec 13, 2023
663df32
hooks: add hook for transformers
rokm Dec 13, 2023
fc744e9
hooks: add a hook for fastai
rokm Dec 14, 2023
bbb9872
hooks: torchvision: ensure torchvision.io.image works
rokm Dec 14, 2023
030e9bb
hooks: add hook for timm (Hugging face torch image models)
rokm Dec 14, 2023
64751d9
tests: add test for lightning
rokm Dec 14, 2023
aa493db
hooks: add hook for lightning
rokm Dec 16, 2023
a81ba9a
hooks: add hooks for bitsandbytes and triton
rokm Dec 16, 2023
7f57989
hooks: add hook for linear_operator
rokm Dec 16, 2023
85bf2ce
tests: add test for gpytorch
rokm Dec 16, 2023
368fc11
hooks: add hook for fvcore.nn
rokm Dec 16, 2023
5b51e85
hooks: add hook for detectron2
rokm Dec 16, 2023
4ba8a67
hooks: add hook for Hugging Face datasets
rokm Dec 16, 2023
fac988f
tests: add a basic test for Hugging Face accelerate
rokm Dec 17, 2023
4319060
hook: tensorflow: reformat line wrapping
rokm Dec 17, 2023
2b4eeae
hook: tensorflow: revise the _pywrap_tensorflow_internal hack
rokm Dec 17, 2023
8103da5
hooks: tensorflow: collect sources for tensorflow.python.autograph
rokm Dec 17, 2023
c85cc87
Add _pyinstaller_hooks_contrib.compat module
rokm Dec 17, 2023
6ef9304
hooks: tensorflow: rework the tensorflow version check
rokm Dec 17, 2023
39f7dee
hooks: tensorflow: generate hidden imports for nvidia.* modules
rokm Dec 20, 2023
cce1021
pytest: consolidate pytest.ini into setup.cfg
rokm Dec 20, 2023
0772f24
tests: add multiprocessing.freeze_support() call to lightning test
rokm Dec 20, 2023
5ec43b8
hooks: tensorflow: collect plugins from tensorflow-plugins
rokm Dec 21, 2023
2 changes: 2 additions & 0 deletions news/676.new.1.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Add hook for ``torchaudio`` that collects dynamically-loaded extensions,
as well as source .py files for TorchScript/JIT.
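A note on why this and several of the following hooks collect source .py files: TorchScript works from the source text of the functions it compiles, and that text is looked up through Python's `inspect` machinery. When only byte-compiled code is available (as for modules stored in the PYZ archive), the lookup fails. The stdlib-only sketch below reproduces that failure mode without torch; the function name `scale`, the module name `frozen_sketch`, and the file name `<no-source-on-disk>` are made up for illustration.

```python
import inspect

# Hypothetical function, compiled from a string under a made-up file name so that
# no source file exists on disk (similar in spirit to byte-code-only collection).
SOURCE = "def scale(x):\n    return 2 * x\n"
namespace = {"__name__": "frozen_sketch"}
exec(compile(SOURCE, "<no-source-on-disk>", "exec"), namespace)
scale = namespace["scale"]


def source_available(func):
    """Return True if inspect can recover the source text of func."""
    try:
        inspect.getsource(func)
    except (OSError, TypeError):
        return False
    return True


print(scale(21))                # the function itself runs fine
print(source_available(scale))  # but its source text cannot be recovered
```

Setting `module_collection_mode = 'pyz+py'` in a hook sidesteps this by placing the .py files next to the byte-compiled modules, so the lookup has something to find.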
2 changes: 2 additions & 0 deletions news/676.new.10.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Add hook for ``fvcore.nn`` to collect its source .py files for
TorchScript/JIT.
2 changes: 2 additions & 0 deletions news/676.new.11.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Add hook for ``detectron2`` to collect its source .py files for
TorchScript/JIT.
2 changes: 2 additions & 0 deletions news/676.new.12.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Add hook for Hugging Face ``datasets`` to collect its source .py files for
TorchScript/JIT.
2 changes: 2 additions & 0 deletions news/676.new.2.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Add hook for ``torchtext`` that collects dynamically-loaded extensions,
as well as source .py files for TorchScript/JIT.
6 changes: 6 additions & 0 deletions news/676.new.3.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
Add hook for Hugging Face ``transformers``. The hook attempts to
automatically collect the metadata of all dependencies (as declared
in the ``deps`` dictionary in the ``transformers.dependency_versions_table``
module), so that dependencies that are available at build time are also
visible to ``transformers`` at run time. The hook also collects source .py
files, as some of the package's functionality uses TorchScript/JIT.
1 change: 1 addition & 0 deletions news/676.new.4.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add hook for ``fastai`` to collect its source .py files for TorchScript/JIT.
2 changes: 2 additions & 0 deletions news/676.new.5.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Add hook for ``torchvision.io.image`` to ensure that the dynamically-loaded
extension required by this module is collected.
2 changes: 2 additions & 0 deletions news/676.new.6.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Add hook for ``timm`` (Hugging Face PyTorch Image Models) to collect its
source .py files for TorchScript/JIT.
2 changes: 2 additions & 0 deletions news/676.new.7.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Add hook for ``lightning`` (PyTorch Lightning) to ensure that its
``version.info`` data file is collected.
7 changes: 7 additions & 0 deletions news/676.new.8.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
Add hooks for ``bitsandbytes``, and its dependency ``triton``. Both
packages have dynamically-loaded extension libraries that need to be
collected, and both require collection of source .py files for
(``triton``'s) JIT module. Some submodules of ``triton`` need to be
collected only as source .py files (bypassing PYZ archive), because the
code naively assumes that ``__file__`` attribute points to the source
.py file.
2 changes: 2 additions & 0 deletions news/676.new.9.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Add hook for ``linear_operator`` to collect its source .py files for
TorchScript/JIT.
2 changes: 2 additions & 0 deletions news/676.new.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Add hooks for ``nvidia.*`` packages, which provide a way of installing
CUDA via PyPI wheels (e.g., ``nvidia-cuda-runtime-cu12``).
2 changes: 2 additions & 0 deletions news/676.update.1.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
(Linux) Update ``torch`` hook to explicitly collect versioned .so files
in the new PyInstaller >= 6.0 codepath.
4 changes: 4 additions & 0 deletions news/676.update.2.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
(Linux) Extend ``torch`` hook to automatically collect CUDA libraries
distributed via ``nvidia-*`` packages (such as ``nvidia-cuda-runtime-cu12``)
if they are specified among the requirements in the ``torch`` distribution's
metadata.
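One way such automatic collection could work is to scan the requirement strings in the distribution's metadata for `nvidia-*` entries and map each one to an importable `nvidia.*` package name. The sketch below does this on plain strings. The helper name `infer_nvidia_packages`, the regular expression, and the name-mapping rule (drop the `-cuNN` suffix, replace dashes with underscores) are assumptions for illustration, not the hook's actual code; the pinned versions in `example` are sample data.

```python
import re

# Matches the distribution name at the start of a requirement string, e.g.
# 'nvidia-cuda-runtime-cu12==12.1.105; platform_system == "Linux"'.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def infer_nvidia_packages(requirements):
    """Map nvidia-* requirement strings to assumed nvidia.* package names."""
    packages = []
    for requirement in requirements:
        match = _NAME_RE.match(requirement)
        if match is None:
            continue
        name = match.group(1).lower().replace("_", "-")
        if not name.startswith("nvidia-"):
            continue
        # Assumed convention: drop the 'nvidia-' prefix and a trailing '-cuNN' suffix.
        stem = re.sub(r"-cu\d+$", "", name[len("nvidia-"):])
        packages.append("nvidia." + stem.replace("-", "_"))
    return sorted(set(packages))


example = [
    'nvidia-cuda-runtime-cu12==12.1.105; platform_system == "Linux"',
    "nvidia-cudnn-cu12==8.9.2.26",
    "numpy",
]
print(infer_nvidia_packages(example))
```

In a real hook, the requirement strings would come from the installed distribution, for example via `importlib.metadata.requires("torch")`, rather than from a hard-coded list.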
8 changes: 8 additions & 0 deletions news/676.update.3.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
(Linux) Remove the ``tensorflow.python._pywrap_tensorflow_internal``
hack in the ``tensorflow`` hook (i.e., adding it to excluded modules
to avoid duplication) when using PyInstaller >= 6.0, where the
duplication issue is alleviated thanks to the binary dependency analysis
preserving the parent directory layout of discovered/collected shared
libraries. This should fix the problem with ``tensorflow`` builds where
the ``_pywrap_tensorflow_internal`` module is not used as a shared
library, as seen in ``tensorflow`` builds for Raspberry Pi.
3 changes: 3 additions & 0 deletions news/676.update.4.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Update ``tensorflow`` hook to collect source .py files for
``tensorflow.python.autograph`` in order to silence a run-time warning
about AutoGraph not being available.
5 changes: 5 additions & 0 deletions news/676.update.5.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Update ``tensorflow`` hook to attempt to resolve the top-level distribution
name and infer the package version from it, in order to improve version
handling when the "top-level" ``tensorflow`` dist is not installed (for
example, user installs only ``tensorflow-intel`` or ``tensorflow-macos``)
or has a different name (e.g., ``tf-nightly``).
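The fallback described in this entry can be sketched with the standard library alone: try a list of candidate distribution names and use the first one that is installed. The helper name `first_installed_version` and the order of the candidate list are assumptions for illustration; the hook's actual resolution logic may differ.

```python
from importlib import metadata


def first_installed_version(candidates):
    """Return (name, version) for the first installed distribution, or None."""
    for name in candidates:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


# Candidate names taken from the news entry above; the order is an assumption.
TENSORFLOW_DISTS = ["tensorflow", "tensorflow-intel", "tensorflow-macos", "tf-nightly"]

print(first_installed_version(TENSORFLOW_DISTS))
```

Returning `None` instead of raising lets the caller decide how to proceed when no matching distribution is installed.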
4 changes: 4 additions & 0 deletions news/676.update.6.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
(Linux) Extend ``tensorflow`` hook to automatically collect CUDA libraries
distributed via ``nvidia-*`` packages (such as ``nvidia-cuda-runtime-cu12``)
if they are specified among the requirements in the ``tensorflow``
distribution's metadata.
5 changes: 5 additions & 0 deletions news/676.update.7.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Extend ``tensorflow`` hook to collect plugins installed in the
``tensorflow-plugins`` directory/package. Have the run-time ``tensorflow``
hook provide an override for ``site.getsitepackages()`` that allows us
to work around a broken module file location check and trick ``tensorflow``
into loading the collected plugins.
2 changes: 2 additions & 0 deletions news/676.update.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Update ``torchvision`` hook to collect source .py files for TorchScript/JIT
(requires PyInstaller >= 5.3 to take effect).
8 changes: 0 additions & 8 deletions pytest.ini

This file was deleted.

15 changes: 15 additions & 0 deletions setup.cfg
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,10 @@ package_dir=
    =src
packages=find:
python_requires = >=3.7
install_requires =
    setuptools >= 42.0.0
    importlib_metadata >= 4.6 ; python_version < "3.10"
    packaging >= 22.0

[options.packages.find]
where=src
Expand All @@ -43,3 +47,14 @@ where=src
# E265 - block comment should start with '# '
extend-ignore = E265
max-line-length=120

[tool:pytest]
# Display summary info for (s)kipped, (x)failed, (X)passed, (f)ailed and (E)rrored tests
# Skip doctest text files
addopts=--maxfail=3 -m "not slow" -v -rsxXfE --doctest-glob=

markers =
    darwin: only runs on macOS
    linux: only runs on GNU/Linux
    win32: only runs on Windows
    slow: Long tests are disabled by default. Re-enable with -m slow
42 changes: 42 additions & 0 deletions src/_pyinstaller_hooks_contrib/compat.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
# ------------------------------------------------------------------
# Copyright (c) 2023 PyInstaller Development Team.
#
# This file is distributed under the terms of the GNU General Public
# License (version 2.0 or later).
#
# The full license is available in LICENSE.GPL.txt, distributed with
# this software.
#
# SPDX-License-Identifier: GPL-2.0-or-later
# ------------------------------------------------------------------

import sys

from PyInstaller.utils.hooks import is_module_satisfies


if is_module_satisfies("PyInstaller >= 6.0"):
    # PyInstaller >= 6.0 imports importlib_metadata in its compat module
    from PyInstaller.compat import importlib_metadata
else:
    # Older PyInstaller version - duplicate logic from PyInstaller 6.0
    class ImportlibMetadataError(SystemExit):
        def __init__(self):
            super().__init__(
                "pyinstaller-hooks-contrib requires importlib.metadata from python >= 3.10 stdlib or "
                "importlib_metadata from importlib-metadata >= 4.6"
            )

    if sys.version_info >= (3, 10):
        import importlib.metadata as importlib_metadata
    else:
        try:
            import importlib_metadata
        except ImportError as e:
            raise ImportlibMetadataError() from e

        import packaging.version  # For importlib_metadata version check

        # Validate the version
        if packaging.version.parse(importlib_metadata.version("importlib-metadata")) < packaging.version.parse("4.6"):
            raise ImportlibMetadataError()
50 changes: 42 additions & 8 deletions src/_pyinstaller_hooks_contrib/hooks/rthooks/pyi_rth_tensorflow.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,11 +9,45 @@
# SPDX-License-Identifier: Apache-2.0
#-----------------------------------------------------------------------------

# `tensorflow` versions prior to 2.3.0 attempt to use `site.USER_SITE` in path/string manipulation functions.
# As frozen application runs with disabled `site`, the value of this variable is `None`, and causes path/string
# manipulation functions to raise an error. As a work-around, we set `site.USER_SITE` to an empty string, which is
# also what the fake `site` module available in PyInstaller prior to v5.5 did.
import site

if site.USER_SITE is None:
site.USER_SITE = ''
def _pyi_rthook():
    import sys

    # `tensorflow` versions prior to 2.3.0 attempt to use `site.USER_SITE` in path/string manipulation functions.
    # As frozen application runs with disabled `site`, the value of this variable is `None`, and causes path/string
    # manipulation functions to raise an error. As a work-around, we set `site.USER_SITE` to an empty string, which is
    # also what the fake `site` module available in PyInstaller prior to v5.5 did.
    import site

    if site.USER_SITE is None:
        site.USER_SITE = ''

    # The issue described above with site.USER_SITE being None has largely been resolved in contemporary `tensorflow`
    # versions, which now check that `site.ENABLE_USER_SITE` is set and that `site.USER_SITE` is not None before
    # trying to use it.
    #
    # However, `tensorflow` will attempt to search and load its plugins only if it believes that it is running from
    # "a pip-based installation" - if the package's location is rooted in one of the "site-packages" directories. See
    # https://github.com/tensorflow/tensorflow/blob/6887368d6d46223f460358323c4b76d61d1558a8/tensorflow/api_template.__init__.py#L110C76-L156
    # Unfortunately, they "cleverly" infer the module's location via `inspect.getfile(inspect.currentframe())`, which
    # in the frozen application returns anonymized relative source file name (`tensorflow/__init__.py`) - so we need one
    # of the "site directories" to be just "tensorflow" (to fool the `_running_from_pip_package()` check), and we also
    # need `sys._MEIPASS` to be among them (to load the plugins from the actual `sys._MEIPASS/tensorflow-plugins`).
    # Therefore, we monkey-patch `site.getsitepackages` to add those two entries to the list of "site directories".

    # Use a `None` default so that the `is not None` check below is meaningful when `site` lacks `getsitepackages`.
    _orig_getsitepackages = getattr(site, 'getsitepackages', None)

    def _pyi_getsitepackages():
        return [
            sys._MEIPASS,
            "tensorflow",
            *(_orig_getsitepackages() if _orig_getsitepackages is not None else []),
        ]

    site.getsitepackages = _pyi_getsitepackages

    # NOTE: instead of the above override, we could also set TF_PLUGGABLE_DEVICE_LIBRARY_PATH, but that works only
    # for tensorflow >= 2.12.


_pyi_rthook()
del _pyi_rthook
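The `site.getsitepackages` override in this run-time hook can be exercised outside a frozen application with a small stand-in. In the sketch below, `install_override` and its `bundle_dir` argument are hypothetical: `bundle_dir` plays the role of `sys._MEIPASS`, which exists only inside a PyInstaller-frozen process, and the restore function is added purely so the experiment leaves `site` untouched.

```python
import site


def install_override(bundle_dir):
    """Patch site.getsitepackages to prepend bundle_dir and 'tensorflow'.

    Returns a function that restores the previous state.
    """
    had_original = hasattr(site, "getsitepackages")
    original = getattr(site, "getsitepackages", None)

    def patched():
        # Fall back to an empty list if the original function is unavailable.
        extra = original() if original is not None else []
        return [bundle_dir, "tensorflow", *extra]

    site.getsitepackages = patched

    def restore():
        if had_original:
            site.getsitepackages = original
        else:
            del site.getsitepackages

    return restore


restore = install_override("/tmp/bundle")  # hypothetical stand-in for sys._MEIPASS
paths = site.getsitepackages()
restore()
print(paths[:2])
```

The two prepended entries serve different purposes, as the hook's comments explain: the bare `"tensorflow"` entry satisfies the location check, while the bundle directory is where the collected `tensorflow-plugins` actually live.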
23 changes: 23 additions & 0 deletions src/_pyinstaller_hooks_contrib/hooks/stdhooks/hook-bitsandbytes.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
# ------------------------------------------------------------------
# Copyright (c) 2023 PyInstaller Development Team.
#
# This file is distributed under the terms of the GNU General Public
# License (version 2.0 or later).
#
# The full license is available in LICENSE.GPL.txt, distributed with
# this software.
#
# SPDX-License-Identifier: GPL-2.0-or-later
# ---------------------------------------------------

from PyInstaller.utils.hooks import collect_dynamic_libs

# bitsandbytes contains several extensions for CPU and different CUDA versions: libbitsandbytes_cpu.so,
# libbitsandbytes_cuda110_nocublaslt.so, libbitsandbytes_cuda110.so, etc. At build-time, we could query the
# `bitsandbytes.cextension.setup` and its `binary_name` attribute for the extension that is in use. However, if the
# build system does not have CUDA available, this would automatically mean that we will not collect any of the CUDA
# libs. So for now, we collect them all.
binaries = collect_dynamic_libs("bitsandbytes")

# bitsandbytes uses triton's JIT module, which requires access to source .py files.
module_collection_mode = 'pyz+py'
14 changes: 14 additions & 0 deletions src/_pyinstaller_hooks_contrib/hooks/stdhooks/hook-datasets.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# ------------------------------------------------------------------
# Copyright (c) 2023 PyInstaller Development Team.
#
# This file is distributed under the terms of the GNU General Public
# License (version 2.0 or later).
#
# The full license is available in LICENSE.GPL.txt, distributed with
# this software.
#
# SPDX-License-Identifier: GPL-2.0-or-later
# ------------------------------------------------------------------

# Collect source .py files for JIT/torchscript. Requires PyInstaller >= 5.3, no-op in older versions.
module_collection_mode = 'pyz+py'
14 changes: 14 additions & 0 deletions src/_pyinstaller_hooks_contrib/hooks/stdhooks/hook-detectron2.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# ------------------------------------------------------------------
# Copyright (c) 2023 PyInstaller Development Team.
#
# This file is distributed under the terms of the GNU General Public
# License (version 2.0 or later).
#
# The full license is available in LICENSE.GPL.txt, distributed with
# this software.
#
# SPDX-License-Identifier: GPL-2.0-or-later
# ------------------------------------------------------------------

# Collect source .py files for JIT/torchscript. Requires PyInstaller >= 5.3, no-op in older versions.
module_collection_mode = 'pyz+py'
14 changes: 14 additions & 0 deletions src/_pyinstaller_hooks_contrib/hooks/stdhooks/hook-fastai.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# ------------------------------------------------------------------
# Copyright (c) 2023 PyInstaller Development Team.
#
# This file is distributed under the terms of the GNU General Public
# License (version 2.0 or later).
#
# The full license is available in LICENSE.GPL.txt, distributed with
# this software.
#
# SPDX-License-Identifier: GPL-2.0-or-later
# ------------------------------------------------------------------

# Collect source .py files for JIT/torchscript. Requires PyInstaller >= 5.3, no-op in older versions.
module_collection_mode = 'pyz+py'
14 changes: 14 additions & 0 deletions src/_pyinstaller_hooks_contrib/hooks/stdhooks/hook-fvcore.nn.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# ------------------------------------------------------------------
# Copyright (c) 2023 PyInstaller Development Team.
#
# This file is distributed under the terms of the GNU General Public
# License (version 2.0 or later).
#
# The full license is available in LICENSE.GPL.txt, distributed with
# this software.
#
# SPDX-License-Identifier: GPL-2.0-or-later
# ------------------------------------------------------------------

# Collect source .py files for JIT/torchscript. Requires PyInstaller >= 5.3, no-op in older versions.
module_collection_mode = 'pyz+py'
21 changes: 21 additions & 0 deletions src/_pyinstaller_hooks_contrib/hooks/stdhooks/hook-lightning.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# ------------------------------------------------------------------
# Copyright (c) 2023 PyInstaller Development Team.
#
# This file is distributed under the terms of the GNU General Public
# License (version 2.0 or later).
#
# The full license is available in LICENSE.GPL.txt, distributed with
# this software.
#
# SPDX-License-Identifier: GPL-2.0-or-later
# ------------------------------------------------------------------

from PyInstaller.utils.hooks import collect_data_files

# Collect version.info (which is read during package import at run-time). Avoid collecting data from `lightning.app`,
# which likely does not work with PyInstaller without additional tricks (if we need to collect that data, it should
# be done in a separate `lightning.app` hook).
datas = collect_data_files(
    'lightning',
    includes=['version.info'],
)
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# ------------------------------------------------------------------
# Copyright (c) 2023 PyInstaller Development Team.
#
# This file is distributed under the terms of the GNU General Public
# License (version 2.0 or later).
#
# The full license is available in LICENSE.GPL.txt, distributed with
# this software.
#
# SPDX-License-Identifier: GPL-2.0-or-later
# ---------------------------------------------------

# Collect source .py files for JIT/torchscript. Requires PyInstaller >= 5.3, no-op in older versions.
module_collection_mode = 'pyz+py'
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
# ------------------------------------------------------------------
# Copyright (c) 2023 PyInstaller Development Team.
#
# This file is distributed under the terms of the GNU General Public
# License (version 2.0 or later).
#
# The full license is available in LICENSE.GPL.txt, distributed with
# this software.
#
# SPDX-License-Identifier: GPL-2.0-or-later
# ------------------------------------------------------------------

from _pyinstaller_hooks_contrib.hooks.utils.nvidia_cuda import collect_nvidia_cuda_binaries

binaries = collect_nvidia_cuda_binaries(__file__)
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
# ------------------------------------------------------------------
# Copyright (c) 2023 PyInstaller Development Team.
#
# This file is distributed under the terms of the GNU General Public
# License (version 2.0 or later).
#
# The full license is available in LICENSE.GPL.txt, distributed with
# this software.
#
# SPDX-License-Identifier: GPL-2.0-or-later
# ------------------------------------------------------------------

from _pyinstaller_hooks_contrib.hooks.utils.nvidia_cuda import collect_nvidia_cuda_binaries

binaries = collect_nvidia_cuda_binaries(__file__)
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
# ------------------------------------------------------------------
# Copyright (c) 2023 PyInstaller Development Team.
#
# This file is distributed under the terms of the GNU General Public
# License (version 2.0 or later).
#
# The full license is available in LICENSE.GPL.txt, distributed with
# this software.
#
# SPDX-License-Identifier: GPL-2.0-or-later
# ------------------------------------------------------------------

from PyInstaller.utils.hooks import collect_data_files
from _pyinstaller_hooks_contrib.hooks.utils.nvidia_cuda import collect_nvidia_cuda_binaries

# Ensures that versioned .so files are collected
binaries = collect_nvidia_cuda_binaries(__file__)

# Collect additional resources:
# - ptxas executable (which strictly speaking, should be collected as a binary)
# - nvvm/libdevice/libdevice.10.bc file
# - C headers; assuming ptxas requires them - if that is not the case, we could filter them out.
datas = collect_data_files('nvidia.cuda_nvcc')