Add bunch of hooks for contemporary ML stuff #676

Merged: 35 commits, Dec 23, 2023

Commits on Dec 21, 2023

  1. tests: create separate test file for torch and its libraries

    Create a separate `test_pytorch.py` test for `torch` and its
    associated libraries. Move existing tests to the new file.
    rokm committed Dec 21, 2023
    ecade84
  2. tests: improve the enforcement of onedir-only tests for torch

    Explicitly force the `pyi_builder` into onedir-only mode instead of
    skipping onefile tests. Reduces number of reported tests.
    rokm committed Dec 21, 2023
    84bf7f1
  3. tests: torchvision: add test that uses torchscript

    Add a test that uses `torchvision.datasets`, `torchvision.transforms`,
    and torchscript. The latter demonstrates the need for collecting
    source .py files from `torchvision`.
    rokm committed Dec 21, 2023
    e2105a4
  4. hooks: torchvision: collect source .py files

    Rename `hook-torchvision.ops.py` to `hook-torchvision.py`, and
    add `module_collection_mode = 'pyz+py'` to collect source .py
    files for torch JIT/torchscript.
    rokm committed Dec 21, 2023
    f428a74
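
As an illustration of the setting named in the commit above, here is a minimal sketch of what such a hook file might contain. Only the `module_collection_mode` variable and its value come from the commit message; the real `hook-torchvision.py` may contain more and is not reproduced here.

```python
# hook-torchvision.py (sketch; anything beyond this one variable is assumed)

# 'pyz+py' asks PyInstaller to keep byte-compiled modules in the PYZ archive
# and to additionally collect the matching source .py files as data files,
# so that code needing source access at run-time (such as TorchScript)
# can find them.
module_collection_mode = 'pyz+py'
```
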
  5. hooks: torch: explicitly collect versioned .so files

    When collecting binaries in the PyInstaller >= 6.0 codepath,
    explicitly collect versioned .so files by adding '*.so.*' to the
    list of search patterns passed to `collect_dynamic_libs`.
    
    This is a precaution in case any of those libraries happens to be
    dynamically loaded at run-time.
    rokm committed Dec 21, 2023
    5bf7e4f
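
The need for the extra pattern can be seen with plain glob matching. The sketch below uses the standard library `fnmatch` module rather than `collect_dynamic_libs` itself, whose internal matching may differ; the file names are made up for illustration.

```python
from fnmatch import fnmatchcase


def matching_libs(filenames, patterns):
    """Return the file names that match at least one glob pattern."""
    return [
        name for name in filenames
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Hypothetical directory listing, for illustration only.
names = ['libexample.so', 'libexample.so.8', 'README.txt']

# A plain '*.so' pattern misses the versioned file name...
unversioned_only = matching_libs(names, ['*.so'])

# ...while adding '*.so.*' picks it up as well.
with_versioned = matching_libs(names, ['*.so', '*.so.*'])
```
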
  6. hooks: torch: add support for external nvidia.* packages on linux

    The contemporary PyPI torch wheels for linux use CUDA libraries
    that are installed via `nvidia-*` packages. Therefore, attempt to
    convert the `nvidia-*` requirements from the `torch` metadata
    into hidden imports. This way, we can provide hooks for `nvidia.*`
    packages that collect the shared libs, in case any of them are
    dynamically loaded (which currently seems to be the case with
    some of the shared libraries from `nvidia.cudnn`).
    rokm committed Dec 21, 2023
    c901c07
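
A rough sketch of the requirement-to-hidden-import conversion described above, using only the standard library. The helper name `nvidia_dist_to_module`, the version numbers, and the rule of dropping a trailing `-cuNN` tag are assumptions made for illustration; the actual hook code may work differently.

```python
import re


def nvidia_dist_to_module(requirement):
    """Map an `nvidia-*` requirement string to an `nvidia.*` module name.

    Returns None for requirements that are not `nvidia-*` packages.
    """
    # Leading distribution name, before any version specifier or marker.
    match = re.match(r'\s*([A-Za-z0-9][A-Za-z0-9._-]*)', requirement)
    if match is None:
        return None
    name = match.group(1).lower()
    if not name.startswith('nvidia-'):
        return None
    # Drop the 'nvidia-' prefix and a trailing CUDA tag such as '-cu12'.
    stem = re.sub(r'-cu\d+$', '', name[len('nvidia-'):])
    return 'nvidia.' + stem.replace('-', '_')


# Hypothetical requirement strings, shaped like entries in package metadata.
requirements = [
    'numpy>=1.20',
    'nvidia-cudnn-cu12==1.2.3; platform_system == "Linux"',
    'nvidia-cuda-runtime-cu12==1.2.3',
]

hiddenimports = sorted(
    module for module in map(nvidia_dist_to_module, requirements) if module
)
```
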

Commits on Dec 22, 2023

  1. hooks: add hooks for subpackages of nvidia package

    On Linux, NVIDIA CUDA 11.x and 12.x shared libraries can be
    installed via PyPI wheels (e.g., `nvidia-cuda-runtime-cu12`,
    `nvidia-cudnn-cu12`, `nvidia-cublas-cu12`). These all
    provide sub-packages under `nvidia` top level package (e.g.,
    `nvidia.cuda_runtime`, `nvidia.cudnn`, `nvidia.cublas`).
    
    Add hooks for these sub-packages that ensure that all shared
    libraries from the sub-packages `lib` directory are collected,
    in case they are dynamically loaded.
    
    For example, `torch` PyPI wheels for linux do not bundle CUDA
    inside the `torch/lib` (whereas the wheels from their own
    server, built with "non-default" CUDA versions bundle them),
    and dynamically load `libcudnn_ops_infer.so.8` from
    `nvidia.cudnn.lib`.
    rokm committed Dec 22, 2023
    d6a66a0
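
To make the idea of collecting everything from a sub-package's `lib` directory concrete, here is a standard-library sketch that builds `(source_path, dest_dir)` pairs in the shape PyInstaller hooks use for `binaries`. The helper `collect_lib_dir` is hypothetical; the real hooks presumably rely on PyInstaller's own utilities.

```python
from pathlib import Path


def collect_lib_dir(package_dir, dest_prefix):
    """Return (source_path, dest_dir) pairs for shared libraries found in
    `<package_dir>/lib`."""
    lib_dir = Path(package_dir) / 'lib'
    binaries = []
    if lib_dir.is_dir():
        for entry in sorted(lib_dir.iterdir()):
            # '.so' among the suffixes matches both 'libfoo.so' and
            # versioned names such as 'libfoo.so.8'.
            if entry.is_file() and '.so' in entry.suffixes:
                binaries.append((str(entry), f'{dest_prefix}/lib'))
    return binaries
```

For a sub-package such as `nvidia.cudnn`, the call might look like `collect_lib_dir(package_dir, 'nvidia/cudnn')`, where `package_dir` is the installed location of the sub-package.
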
  2. tests: add test for torchaudio that uses torchscript

    Add a test for torchaudio that uses a transform to resample a
    signal. The test shows the need to collect binaries from the
    torchaudio package, and, due to use of torchscript, also shows
    the need to collect source .py files.
    rokm committed Dec 22, 2023
    a9389f1
  3. hooks: add hook for torchaudio

    Add hook for torchaudio that collects binaries and ensures that
    source .py files are collected.
    rokm committed Dec 22, 2023
    ae2b12b
  4. tests: add test for torchtext that uses torchscript

    Add a test for torchtext that uses a tokenization transform from
    Berta Encoder. The test shows the need to collect binaries from the
    torchtext package, and, due to use of torchscript, also shows the
    need to collect source .py files.
    
    We perform only tokenization part of processing, in order to avoid
    having to download the whole model (~240 MB).
    rokm committed Dec 22, 2023
    061efe9
  5. hooks: add hook for torchtext

    Add hook for torchtext that collects binaries and ensures that
    source .py files are collected.
    rokm committed Dec 22, 2023
    dd3bed6
  6. tests: move tensorflow tests to their own file

    Move tensorflow tests into a separate `test_tensorflow` file.
    Improve the `tensorflow_onedir_only` test mark (force `pyi_builder`
    to generate only the onedir case instead of skipping the onefile
    case).
    rokm committed Dec 22, 2023
    293a458
  7. tests: add tests for transformers package

    Add a basic `transformers` pipeline test. Add a basic import test
    for `transformers.DebertaModel`, which shows that we need to collect
    source .py files for TorchScript.
    rokm committed Dec 22, 2023
    ceff7b1
  8. hooks: add hook for transformers

    Add hook for Hugging Face `transformers` package.
    
    Attempt to automatically collect metadata for all of package's
    dependencies (as declared in `deps` dictionary in the
    `transformers.dependency_versions_table` module).
    
    Collect source .py files as some of the functionality uses
    TorchScript.
    rokm committed Dec 22, 2023
    663df32
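
One way to picture the automatic metadata collection is the following standard-library sketch: walk the dependency names and keep only those whose distribution metadata is actually installed. The `deps` dictionary below is a made-up stand-in for `transformers.dependency_versions_table.deps`, and `installed_dists` is a hypothetical helper; in a real hook, each surviving name would presumably be handed to PyInstaller's `copy_metadata`.

```python
import importlib.metadata


def installed_dists(dependency_names):
    """Return the subset of names for which distribution metadata exists."""
    found = []
    for name in dependency_names:
        try:
            importlib.metadata.distribution(name)
        except importlib.metadata.PackageNotFoundError:
            # Optional dependency that is not installed; skip it.
            continue
        found.append(name)
    return found


# Hypothetical stand-in for the real dependency table (name -> requirement).
deps = {'no-such-dist-for-demo': 'no-such-dist-for-demo>=1.0'}

# Iterating a dict yields its keys, i.e. the dependency names.
to_collect = installed_dists(deps)
```
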
  9. hooks: add a hook for fastai

    Add a hook for fastai, which ensures that the package's source .py
    files are collected, as they are required for TorchScript.
    
    Add a test based on the "building models from tabular data" example
    from https://docs.fast.ai/quick_start.html, which demonstrates
    the need for the source .py files.
    rokm committed Dec 22, 2023
    fc744e9
  10. hooks: torchvision: ensure torchvision.io.image works

    `torchvision.io.image` attempts to dynamically load
    `torchvision.image` extension, so add a hook that ensures the
    latter is collected.
    
    Add a basic image encoding/decoding test.
    rokm committed Dec 22, 2023
    bbb9872
  11. hooks: add hook for timm (Hugging Face torch image models)

    Add hook for timm, which ensures that the package's source .py
    files are collected, as they are required for TorchScript.
    
    Add a basic model listing and creation test.
    rokm committed Dec 22, 2023
    030e9bb
  12. tests: add test for lightning

    Add test for `lightning`, based on their "Getting started" example
    with an autoencoder trained on the MNIST dataset. Requires
    `torchvision` for the dataset.
    rokm committed Dec 22, 2023
    64751d9
  13. hooks: add hook for lightning

    Add hook for (PyTorch) `lightning`. Currently, the main functionality
    is to ensure that the `version.info` file from the package is
    collected.
    
    We do not collect source .py files, as it seems that even if
    `lightning.LightningModule.to_torchscript()` is used, it requires
    the source where the model inheriting from `lightning.LightningModule`
    is defined, rather than `lightning`'s own sources. We can always
    add source .py files collection later, if it proves to be necessary.
    rokm committed Dec 22, 2023
    aa493db
  14. hooks: add hooks for bitsandbytes and triton

    Add hooks for `bitsandbytes`, and its dependency `triton`. Both
    packages have dynamically-loaded extension libraries, and both
    require collection of source .py files for (`triton`'s) JIT module.
    
    With `triton`, some of the submodules need to be collected only as
    source .py files (no PYZ), because the code naively tries to read
    the files pointed to by `__file__` attribute under assumption that
    they are source .py files. So we must not collect these modules
    into the PYZ.
    rokm committed Dec 22, 2023
    a81ba9a
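
PyInstaller hooks can also give `module_collection_mode` as a dictionary, which is one way the source-only requirement for selected `triton` submodules could be expressed. This is a sketch only: `triton.some_submodule` is a placeholder name, not an actual module taken from the hook.

```python
# Sketch of per-module collection modes; the submodule name is a placeholder.
module_collection_mode = {
    # Package-wide setting: byte-code in the PYZ archive plus source .py files.
    'triton': 'pyz+py',
    # Hypothetical submodule that reads the file behind its `__file__`:
    # collect it only as a source .py file, and keep it out of the PYZ.
    'triton.some_submodule': 'py',
}
```
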
  15. hooks: add hook for linear_operator

    Add hook and basic test for `linear_operator`. The package uses
    torchscript/JIT, so we need to collect its source .py files.
    rokm committed Dec 22, 2023
    7f57989
  16. tests: add test for gpytorch

    Add a basic test for GPyTorch, based on their "Simple GP Regression"
    example.
    rokm committed Dec 22, 2023
    85bf2ce
  17. hooks: add hook for fvcore.nn

    Add hook for `fvcore.nn` to collect its source .py files for
    torchscript/JIT.
    
    Add a basic import test that demonstrates the need for that.
    rokm committed Dec 22, 2023
    368fc11
  18. hooks: add hook for detectron2

    Add hook for `detectron2` to collect its source .py files for
    torchscript/JIT.
    
    Add a basic import test that demonstrates the need for that.
    rokm committed Dec 22, 2023
    5b51e85
  19. hooks: add hook for Hugging Face datasets

    Add hook for `datasets` to collect its source .py files for
    torchscript/JIT.
    
    Add a basic dataset loading test that demonstrates the need for that.
    rokm committed Dec 22, 2023
    4ba8a67
  20. tests: add a basic test for Hugging Face accelerate

    Add basic test for Hugging Face `accelerate`; demonstrates that
    for the tested subset of functionality, no hook is necessary.
    rokm committed Dec 22, 2023
    fac988f
  21. hook: tensorflow: reformat line wrapping

    Use 120-character lines to reduce amount of line wrapping and make
    the hook easier to read.
    rokm committed Dec 22, 2023
    4319060
  22. hook: tensorflow: revise the _pywrap_tensorflow_internal hack

    Remove the `tensorflow.python._pywrap_tensorflow_internal` hack
    (adding it to excluded modules to avoid duplication) for
    PyInstaller >= 6.0, where the issue is alleviated thanks to
    the binary dependency analysis preserving the parent directory
    layout of discovered/collected shared libraries.
    
    This should fix the problem with `tensorflow` builds where the
    `_pywrap_tensorflow_internal` module is not used as a shared
    library, as seen in `tensorflow` builds for Raspberry Pi:
    pyinstaller#121
    rokm committed Dec 22, 2023
    2b4eeae
  23. hooks: tensorflow: collect sources for tensorflow.python.autograph

    Collect sources for `tensorflow.python.autograph` to avoid
    run-time warning about AutoGraph being unavailable. Not sure if
    we need to collect sources for other parts of `tensorflow`, though;
    if that proves to be the case, we can adjust the collection mode
    later.
    rokm committed Dec 22, 2023
    8103da5
  24. Add _pyinstaller_hooks_contrib.compat module

    Add `_pyinstaller_hooks_contrib.compat` module to hide the gory
    details of stdlib `importlib.metadata` vs. `importlib_metadata`.
    Import the `importlib_metadata` from `PyInstaller.compat` if
    available (PyInstaller >= 6.0), otherwise duplicate the logic.
    
    Copy `importlib_metadata` and `packaging` requirements from
    PyInstaller to pyinstaller-hooks-contrib.
    rokm committed Dec 22, 2023
    c85cc87
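
The import logic described in this commit could look roughly like the sketch below. The import from `PyInstaller.compat` follows the commit message; the fallback branch, including the Python 3.10 cut-off and the final fallback to the stdlib module, is an assumption about how the duplicated logic might be written.

```python
import sys

try:
    # PyInstaller >= 6.0 exposes the resolved module via its compat layer.
    from PyInstaller.compat import importlib_metadata
except ImportError:
    # Assumed fallback: prefer the stdlib module on recent Python versions
    # and the `importlib_metadata` backport on older ones.
    if sys.version_info >= (3, 10):
        import importlib.metadata as importlib_metadata
    else:
        try:
            import importlib_metadata
        except ImportError:
            import importlib.metadata as importlib_metadata
```

Hooks can then use a single name, `importlib_metadata`, regardless of which implementation was picked.
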
  25. hooks: tensorflow: rework the tensorflow version check

    Determine tensorflow's dist name using a list of potential
    candidates, and if a dist is found, retrieve the version from
    it. Otherwise, fall back to reading `tensorflow.__version__`.
    
    The `tensorflow` package is available in several variants, and
    sometimes the `tensorflow` dist installs a separate dist (e.g.,
    `tensorflow-intel` on Windows); but the user can install this
    separate dist without installing the "top-level" `tensorflow`
    one. In PyInstaller v5 and earlier, the `is_module_satisfies`
    fell back to querying `tensorflow.__version__` if the dist could
    not be found - in v6, the implementation of `is_module_satisfies`
    (or rather, `check_requirement`) checks only the metadata.
    
    As an added bonus, the direct version comparisons are nicer to
    read than the comparisons against `tf_pre_1_15_0`, `tf_post_1_15_0`,
    etc.
    rokm committed Dec 22, 2023
    6ef9304
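
The lookup strategy from this commit (try candidate dist names, then fall back) can be sketched with the standard library alone. `find_dist_version` is a hypothetical helper; of the candidate names, only `tensorflow` and `tensorflow-intel` appear in the commit message, and the real hook presumably checks more.

```python
import importlib.metadata


def find_dist_version(candidates, fallback=None):
    """Return (dist_name, version) for the first installed candidate.

    If no candidate is installed, return (None, fallback()) when a fallback
    callable is given, and (None, None) otherwise.
    """
    for name in candidates:
        try:
            return name, importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            continue
    return None, (fallback() if fallback is not None else None)


# Demonstration with a name that is not expected to be installed; the
# fallback stands in for reading `tensorflow.__version__`.
dist_name, version = find_dist_version(
    ['no-such-dist-for-demo'], fallback=lambda: '0.0.0'
)
```

In the hook's setting, the call might be `find_dist_version(['tensorflow', 'tensorflow-intel'], fallback=...)`, with the fallback importing `tensorflow` and returning its `__version__`.
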
  26. hooks: tensorflow: generate hidden imports for nvidia.* modules

    Linux builds of `tensorflow` can install CUDA via nvidia-* packages
    that are enabled by `and-cuda` extra marker. So parse the dist
    metadata for requirements, and turn the `nvidia-*` requirements
    into `nvidia.*` hidden imports.
    
    Consolidate the shared code for converting an `nvidia-*` dist name
    into an `nvidia.*` module name into a utility function, and use it
    in both `torch` and `tensorflow` hooks.
    rokm committed Dec 22, 2023
    39f7dee
  27. pytest: consolidate pytest.ini into setup.cfg

    Consolidate `pytest` configuration from `pytest.ini` into
    `setup.cfg`, to match what we have in the main PyInstaller
    repository.
    
    Add -v, -rsxXfE, and --doctest-glob= to test flags. The
    addition of -v ensures that in manual (local) pytest runs,
    the test names are displayed as they are run (the CI
    workflows seem to explicitly specify -v as part of their
    pytest commands).
    rokm committed Dec 22, 2023
    cce1021
  28. tests: add multiprocessing.freeze_support() call to lightning test

    When running the `test_lightning_mnist_autoencoder` under arm64
    macOS, `multiprocessing` seems to be activated at some point,
    and the test gets stuck due to lack of
    `multiprocessing.freeze_support` call.
    rokm committed Dec 22, 2023
    0772f24
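
For reference, the usual placement of the `multiprocessing.freeze_support()` call in a program's entry point is sketched below. The body of `main()` is a trivial stand-in and is not taken from the lightning test.

```python
import multiprocessing


def main():
    # Stand-in for the real program; code here may start worker processes.
    return sum(range(10))


if __name__ == '__main__':
    # Call this first thing in the entry point of a frozen application, so
    # that a re-launched worker executable is handled here instead of running
    # the whole program again. It has no effect when the program is not frozen.
    multiprocessing.freeze_support()
    main()
```
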
  29. hooks: tensorflow: collect plugins from tensorflow-plugins

    Have the `tensorflow` standard hook collect binaries from the
    `tensorflow-plugins` package; this contains plugins for tensorflow's
    pluggable device architecture (such as `tensorflow-metal` for macOS
    and `tensorflow-directml-plugin` for Windows).
    
    Have the `tensorflow` run-time hook override the `site.getsitepackages()`
    with custom implementation that allows us to trick `tensorflow` into
    loading the plugins.
    rokm committed Dec 22, 2023
    5ec43b8
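
The run-time override mentioned in this commit might be approximated as follows. This is a sketch under assumptions: `override_getsitepackages` is a hypothetical helper, and pointing `site.getsitepackages()` at the bundle directory (`sys._MEIPASS` in a PyInstaller-frozen application) is a guess at the general idea, not the actual run-time hook.

```python
import site
import sys


def override_getsitepackages(bundle_dir):
    """Make `site.getsitepackages()` report only the given directory."""
    def _getsitepackages(prefixes=None):
        # Ignore the prefixes and always point callers at the bundle.
        return [bundle_dir]

    site.getsitepackages = _getsitepackages


# In a frozen application `sys._MEIPASS` is the bundle directory; outside
# one, fall back to the current prefix so that the sketch still runs.
override_getsitepackages(getattr(sys, '_MEIPASS', sys.prefix))
```
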