Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up bindepend and building codebase #7784

Merged
merged 30 commits into from Jul 30, 2023

Conversation

rokm
Copy link
Member

@rokm rokm commented Jul 23, 2023

This cleanup was initially aimed at PyInstaller.depend.bindepend, and attempting to sort out getImports, selectImports, Dependencies, getfullnameof. These are now reworked into low-level get_imports and high-level binary_dependency_analysis, and the name resolving functionality has been consolidated into resolve_library_path (with compatibility aliases findLibrary and findSystemLibrary pointing to it, because some hooks in the contrib repository are making use of that).

The low-level get_imports and its platform-specific implementations do not generate missing-lib warnings anymore; instead, they return list of tuples in form of (ref_name, lib_path), where ref_name is the referenced dependency name (a base-name, or something like @rpath/dependency.dylib) and lib_name is resolved library full-path, if resolved (and None otherwise). The missing-lib warnings are now handled by the caller (e.g., the high-level binary_dependency_analysis), who has more information available to decide whether to emit the warning or not.

Further clean up spread to other parts of the codebase, and involves removal of:

  1. support for analyzing, collecting, and processing WinSxS assemblies. These are mostly a thing of the past (last VC runtime that used WinSxS was VC9 from VS 2008, used to build Python 2.7 and 3.3). Considering the amount of (poorly-tested) code required to support this, and considering that mangling of collected assembly manifests went unnoticed for a rather long time (Error in Microsoft.VC90.CRT.manifest file generates side by side issue #6509), I'm opting for removal of this feature. If someone requires collection of binaries compiled with VS2008 and requires the VC9 runtime to be automatically collected, they should stick to PyInstaller 5.x series.

  2. support for zipped eggs. Similar to the above, the related codepaths are slowly bit-rotting due to lack of testing, both on our CI and in the wild. In addition, remove the attempt at collecting data files from egg-installed packages (zipped or non-zipped); this should be handled by hooks, regardless of how the package is installed, in order to have consistent and predictable behavior.

The removal of WinSxS processing simplifies the processing of collected binaries, and allows us to replace PyInstaller.utils.win32.winmanifest with a simplified version that just applies the uac_admin and uac_uiaccess settings and ensures that MS Common-Controls dependency is listed among dependencies. This is done by searching and modifying the corresponding DOM nodes in the manifest, which means that any other elements in user-provided manifest are preserved (as opposed to old approach that parsed the manifest into internal structures, made modifications, and generated output XML, thus removing any elements that it did not recognize, even if they were valid).

The command-line options --win-private-assemblies and --win-no-prefer-redirects have been removed; the corresponding arguments to Analysis have been deprecated and raise error if not set to True (the old default), and they have been removed from .spec template. The option for external executable manifest in onedir mode has also been removed; --no-embed-manifest command line option is gone, and corresponding embed_manifest argument to EXE has been deprecated to raise an error if not set to True (the old default).

As part of the cleanup, binary dependency analysis does not search sys.path anymore, so we can close #5560. In addition, it does not use deprecated distutils.sysconfig module, which should prevent an issue with tensorflow and its plugins; hence closes #7347.

Closes #6509 as collection from WinSxS is not supported anymore - that particular issue was likely "fixed" by c5a64a0, but only because it accidentally disables the processing of manifests unless caching is required (i.e., upx or strip is enabled)...

rokm added 2 commits July 23, 2023 12:47
Fix RemovedCipherFeatureError to display the whole error message
instead of just the passed argument.
Remove the deprecated and ignored `key` argument from `main`.
@rokm rokm force-pushed the bindepend-cleanup branch 2 times, most recently from d34f707 to 3056181 Compare July 23, 2023 20:10
rokm added 5 commits July 24, 2023 13:02
Remove support for generating external (side-by-side) manifest
for the Windows executable. This has been available for onedir
builds, but is now broken due to relocation of all bundle contents
into sub-directory. Instead of trying to fix it, simply drop the
option; if anyone needs to post-process the manifest, they can
used 3rd party tools to extract and re-embed it.
… to EXE

Defer building of the default executable manifest from analysis stage
to the EXE instantiation. Up until now, the default manifest was
built in the analysis stage, because scanning of the python executable
for its dependencies also added potential WinSxS dependencies to the
generated manifest. This "dependency" manifest was then picked up by
EXE and used to generate the final version (with applied uac_admin
and uac_uiaccess options).

However, contemporary python versions have no WinSxS dependencies
(python 2.7 and <= 3.3 used to reference VC9 runtime this way)
except for MS CommonControls, which we can reference in our default
manifest template, without having to rely on the results of the
said analysis.

So, move all executable-manifest-related processing into EXE.
Remove support for analysis and collection of WinSxS assemblies.
The amount of code involved is disproportionate to the chance of
encountering a binary making use of WinSxS (and the required
assembly not being available on the target system) in contemporary
python and python packages (e.g., VC9 runtime from VS2008 was the
last VC runtime that made use of WinSxS; and was used to build
python 2.7 and python <= 3.3).
Remove helpers for dealing with zipped eggs, as part of wider
clean-up that will remove support for zipped eggs.
With support for zipped eggs being removed, there is no need to
check whether the path to otherwise non-existant file happens to
point inside of the egg.

Also remove the `is_path_to_egg` from `depend.utils`.
@rokm rokm marked this pull request as ready for review July 29, 2023 17:01
@rokm rokm requested a review from bwoodsend July 29, 2023 17:02
news/7784.breaking.4.rst Show resolved Hide resolved
rokm added 17 commits July 29, 2023 21:29
Remove the `building.toc_conversion` module and its
`DependencyProcessor` class. The main purpose of this class was
aggregation of `binaries` and `datas` TOC lists returned from
the hooks, with additional collection of files from eggs.

In contemporary python environments, however, that extra data
collection should be handled by hooks, which should work with
non-zipped eggs as well as wheel-installed packages. So remove
the `DependencyProcessor` and replace it with two helper methods
on `PyiModuleGraph` that return `binaries` and `datas` TOC lists
for entries returned by hooks.

This removes the support for zipped eggs from PyInstaller;
the `zipfiles` and `zipped_data` TOC lists of Analysis should now
be always empty.
The `zipfiles` and `zipped_data` TOC lists are now always empty,
so remove them from the spec templates, and remove their references
from the docs (mostly from examples of spec files).

They can still be passed to the corresponding targets, so existing
.spec files are not affected.
In binary-vs-data-reclassification tests, adjust the indices of
`binaries` and `datas` TOCs in the serialized `Analysis` data
struct; this is needed because elements were removed from the
object's guts.

Fix `test_egg_unzipped`. The test expects the data file from
egg to be automatically collected, but it should in fact be
collected via hook. So add the said hook.

Remove the `test_egg_zipped`, as zipped eggs are not supported
anymore.

Remove the egg-based variant of tests for `pkg_resources` provider.
As we do not support zipped eggs anymore, drop the (zipped) egg
variant of pkg_resources provider tests.
Rename the `checkCache` helper to `process_collected_binary`, and
clean up naming of internal variables. Also use `exist_ok=True`
with `os.makedirs` instead of explicitly checking if the directory
exists.
Remove special error message about installing python-dev package if
pyconfig.h cannot be found. Even when we were trying to collect that
file, the general-purpose `format_binaries_and_datas` helper was
not the best place for such a message...
Reorder the functions in `depend.bindepend` module, and group them
by functionality (high-level import analysis, low-level import
analysis, library full-path resolution, etc.).

This commit should make it easier to track the changes made in
functions themselves in the subsequent commits.
Rename getImports* functions to get_imports*, and clean them up.
During analysis, avoid performing separate binary dependency
analysis on python executable, and always use the
`get_python_library_path` helper from `bindepend` instead.

This ensures that the same search logic is always employed in
both analysis and later on in EXE/PKG construction, where we
search for python shared library again using `get_python_library_path`
(which is necessary due to inability to pass the value from Analysis
to EXE without breaking existing .spec design).

Also perform the second `get_python_library_path` in EXE and store
the result, which can be used both in `EXE.assemble` (in macOS
codepath) as well as passed on to PKG, to avoid having to call it
there again.
Sort the obtained imports and print them on separate lines to make
them easier to read.
Rework the _get_imports_macholib helper, especially the part
handling run paths. Instead of trying to manually substitute
@loader_path and @executable_path in an @rpath, record the
run path entries as they are; when prefixing the library path
with run-paths to obtain candidate paths, have `dyld_find`
from `macholib` deal with @loader_path and @executable_path
prefix in the resulting library candidate path.
Up until now, `get_imports` (and its predecessor) had inconsistent
behavior w.r.t. resolution of full paths to the referenced
libraries. On Unix-like systems, path resolution is performed by
ldd, so the corresponding codepath always returned full paths.
On macOS, referenced libraries were explicitly resolved in
post-processing codepath. On Windows, no resolution is attempted.

Rework the get_imports and its helpers to always (attempt to)
resolve the library paths. In addition, return (name, fullpath)
tuples, where name is the referenced library name (which,
depending on the platform, may be just a basename or not).
The platform-specific behavior w.r.t referenced name is allowed
in order to give caller more context when library cannot be
resolved (e.g., a system-provided library on macOS). This will
also allow us to move the warning messages out of the get_imports
and let the caller handle them as neccessary.

The `get_imports` helper for linux (based on `ldd`) and macOS
(based on `macholib`) now fall back to searching the supplied
search paths (if any) when the base platform-specific resolution
fails. This will provide a mechanism for suppressing
missing-library warnings in cases libraries cannot be resolved
due to missing rpath but could be found in a search path
provided by the corresponding hook.

The library path resolution helpers have been consolidated and
cleaned up. The high-level function is now called `resolve_library_path`
(formerly `findSystemLibrary`), and the low-level helper for Unix
(formerly called `findLibrary`) has been renamed to
`_resolve_library_path_unix`. Compatibility aliases for
`findSystemLibrary` and `findLibrary` are made (both pointing to
high-level `resolve_library_path`) due their use in contributed
hooks repository. The Windows-specific `getfullnameof` helper has
been split into generic `_resolve_library_path_in_search_paths`
(which is also used by `get_imports` helpers for search path
fallback), while the Windows-specific search paths are now handled
in Windows-specific codepath of `resolve_library_path`.
Consolidate the `Dependencies` and `selectImports` helper functions
into single `binary_dependency_analysis` function, and get rid of
global `seen` variable.
Replace the manifest parser (and processor) in winmanifest module
with a simplified version.

The former parser/processor included code for WinSxS assembly
processing, look-up and binding redirection; with support for
WinSxS processing dropped from PyInstaller, this is not needed
anymore.

If provided with manifest XML, the former parser attempted to
load it into its structures, and then write them back into XML
when embedding the manifest into the executable. Due to processing
only a subset of manifest XML schema, this inherently meant that
elements unknown to the parser were lost in the process. Which in
turn negated main reason for providing external manifest (aside
from being able to set custom application name and version).

The new manifest processing code encodes the default manifest as
an XML, making it easier to read and edit. When creating application
manifest, the base XML (the template or externally-provided XML)
is processed just to:

* write uac_admin and uac_uiaccess options into corresponding
  attributes of the <requestedExecutionLevel> element (creating it
  and its parents if necessary)

* ensure that <dependency> element exists for MS Common-Controls
  dependent assembly, and that the attributes of corresponding
  <assemblyIdentity> element are set to the expected values
  (this is required by windowed bootloader for the unhandled
  exception dialog).

All other parts of the manifest XML are left untouched.

The new manifest processing does not require writing manifest
to the disk any more, as all processing is done in-memory.

The winmanifest module can now be imported on all platforms, as
`winresource` module is imported only within the functions that
read and write the manifest from/to the executable.
Having icon group with ID 0 seems to abort the enumeration of
resources using `winresource.GetResources` helper with
`win32ctypes.pywin32.pywintypes.error: (15106, 'EnumResourceNamesW',
'User stopped resource enumeration.')`. Presumably this is because
the 0 ID evaluates to falsely value somewhere in the underyling
win32-ctypes code.

Even if this is underlying win32-ctypes problem, Windows executables
seem to have their icon group IDs start with 1, so keep in line
with that.
Move the code for embedding or copying win32 resources to a
separate helper function to make code in `assemble` easier to
follow. Improve resource parameters validation and error messages,
and raise exceptions on errors instead of swallowing them.
Turn the function-local makeabs() helper function (used for icons)
into a static method called _makeabs(), so it can be reused in
several other places where relative paths need to be anchored to
the spec file location.
Reduce the amount of functions to the ones that are actually
used, and clean up the names and implementation.
rokm added 6 commits July 29, 2023 21:29
As the missing-library warnings are now handled by the high-level
binary dependency analysis function, we can be a bit smarter about
them, and suppress the warnings for dependencies that happen to
be collected anyway.

This might happen either because the library was collected by some
other mechanism (for example, via hook, or supplied by the user),
or because it was discovered during the analysis of another binary
(which, for example, had properly set run-paths on Linux/macOS or
was located next to that other analyzed binary on Windows).

Therefore, we defer the missing-library warnings until after binary
dependency analysis is complete; at that point, we can compare the
basenames of unresolved dependencies to basenames of discovered
(collected) binaries for improved warning suppression.
The global missing-lib warning suppression for libshiboken should now
be redundant, thanks to the improved missing-lib warning suppression
from the preceding commit.
On Windows, have `resolve_library_path` first process the supplied
search paths, and then fall back to searching only locations found
in the `PATH` environment variable.

I.e., change the old behavior for Windows library path resolution
using the former `getfullnameof`, where following the supplied
search paths, we would search all locations in `sys.path`, then
attempt to search pywin32's `pywin32_system32` directory, then
search Windows' system directories, and then locations in `PATH`.

We do not search `sys.path` anymore, because doing so may lead to
corner-case issues when system-installed programs modify PYTHONPATH
to point to directories that happen to contain program's bundled
DLLs (pyinstaller#5560).

Adding `pywin32_system32` directory to search paths is now handled
further up the call chain, in `find_binary_dependencies` helper
of the `building.build_main` module, where other search paths are
set as well.

The search for `pywin32_system32` directory (and other `pywin32`
directories) is now done by treating the directory as a namespace
package and looking up its path, which should allow us to find
it without assuming the prefix path of the `site-packages` directory.
This also removes the use of deprecated `distutils.sysconfig` and its
`get_python_lib` function, which was triggerring pyinstaller#7347.

Also note that `PATH` should already contain Windows' system
directories, so we can avoid explicitly searching paths returned by
`winutils.get_system_path()`.
And remove unnecessary intermediate variable in `get_system_path`.
There is little point in trying to work around broken pywin32
installations on our side; if pythoncom/pywintypes cannot be
imported in the given python environment as-is, the user cannot
run and test their unfrozen program, either.

Remove the `PyInstaller.utils.hooks.win32` module and move its
only function, `get_pywin32_module_file_attribute` into
`PyInstaller.utils.hooks`. The two hooks in contributed hooks
repository that make use of this function are importing it from
this package, anyway. Reduce the implementation into simple
`get_module_attribute(name, '__file__')` call.
Remove part of the code that extends sys.path with contents of the
`os.path.join(sys._MEIPASS, 'eggs')`, as we should not be collecting
eggs there anymore.
@rokm rokm merged commit 43b502c into pyinstaller:develop Jul 30, 2023
18 checks passed
@rokm rokm deleted the bindepend-cleanup branch July 30, 2023 16:04
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
2 participants