Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let there be symbolic links #7619

Merged
merged 36 commits into from Jul 1, 2023
Merged

Conversation

rokm
Copy link
Member

@rokm rokm commented May 11, 2023

Act I: Let there be symbolic links!

The initial series of commits in this PR implements support for creation of symbolic links - either at build-time (in COLLECT or BUNDLE for onedir builds) or at run-time (created by bootloader from PKG for onefile builds).

This allows us to preserve collected symbolic links, which up until now have always been collected as duplicated hard copies of the underlying file. If we collect both a file and a (relative) symbolic link that points to it, and if the relative relationship between the two is preserved when collected, we now preserve the symbolic link.

On one hand, this should help reducing the size of the frozen application in cases when many symbolic links are present. Which admittedly does not happen often with PyPI wheels that do not support symbolic links and thus ship their shared libraries as a single file. On the other hand, Anaconda-installed libraries on Linux and macOS typically include multiple versioned symbolic links for each shared library; normally, all libraries should be linked against the fully-versioned file (which is the actual copy of the file), but sometimes this is not the case, and we collect the file and a symbolic link (and collecting them as duplicated hard-copies results in a crash on macOS; #5710). And sometimes, we have a hook that collects all shared libraries from Anaconda package and its dependencies (as indicated by package's metadata) - this is the case for numpy (our copy of the hook is here, and it is the same in the version that we submitted to upstream).

So this fixes #5710.

Ability to create symbolic links also allows us to start preserving the parent directory structure of shared libraries that are discovered via binary dependency analysis on non-Windows platforms. I.e., instead of automatically collecting the shared library into top-level application directory, we keep it in its original directory structure. We already implemented this for Windows in #7028, but no additional considerations were needed there because packages can dynamically modify DLL search paths. On Linux and macOS, this is not possible, and the assumption is that the shared libraries are located in top-level directory - we can now satisfy this requirement by creating symbolic link in the top-level directory that points to the collected shared library. This avoids duplication issues and related crashes like the one discussed in pyinstaller/pyinstaller-hooks-contrib#375.

Preserving parent directories of collected shared libraries also allows us to start preserving the Qt5 / Qt6 .framework bundles that are shipped with macOS PySide2/PySide6/PyQt5/PyQt6 PyPI wheels (and also with Homebrew Qt installations) - we just need to automatically collect the corresponding Info.plist file, and we've covered most of the cases (and collection of extra files from QtWebEngineCore.framework is handled by the corresponding hook).

This removes the need for the horrible work-around that we had for QtWebEngine, and fixes a series of associated issues.

QtWebEngine now works in onefile builds on macOS (fixes #4361).

The regular (POSIX) onedir programs using QtWebEngine on macOS are not mis-identified as QtWebEngineCore in the application's menu bar (fixes #5409).

The duplication of QtWebEngine files on macOS is now gone, so we can also close #6896.

Preservation of QtWebEngineCore.framework fixes the QtWebEngine sanboxing in Qt6 (PySide6 and PyQt6), which we had to "temporarily" disable in #6903. This should take care of qutebrowser/qutebrowser#7278 without requiring extra post-processing from qutebrowser/qutebrowser#7500.

Preservation of .framework bundles will likely also fix the crashes we've seen with the latest release of PyQt6 on our CI (which I reproduced locally, and seem related to missing Info.plist).

Act II: Let there be reorganized macOS .app bundles!

Preservation of parent directory structure for collected shared libraries, for all its benefits, comes with a price - on macOS, python packages often place their bundled dylibs into directories whose name contains a dot: numpy/.dylibs, scipy/.dylibs, and so on.

In macOS .app bundle, these directories need to be placed in location that expects binaries (Contents/MacOS or Contents/Frameworks), and these disallow dots in directory names, except for nested .framework bundles.

This makes for a good opportunity to evaluate our current content placing and relocation effort when generating macOS .bundles, and its limitations:

  • we had to exempt source .py files (data files) from being relocated to Contents/Resources in order to avoid issues with source-only packages that expect to find their extensions next to the .py files (specifically, the source-only loader from cv2 - see Error "ImportError: ERROR: recursion is detected during loading of "cv2" binary extensions. Check OpenCV installation." with Pyinstaller #7128, and consequently building: BUNDLE: exempt collected .py/.pyc files from relocation #7180). As pointed out by @phw in building: BUNDLE: exempt collected .py/.pyc files from relocation #7180 (comment), having data files in Contents/MacOS causes signature data for those files to be stored in their extended attributes. These might be lost when the bundle transferred using a method that does not support extended attributes, which in turn invalidates the signature on the .app bundle.

  • the contents of PySide2, PySide6, PyQt5, and PyQt6 are also completely exempted from relocation. This again means that we end up with data files in Contents/MacOS (e.g., the translations directory), but there is another problem - the mixed-content qml directory that (at least with Qt5) contains directories with dot in the name (QtQuick.2, QtQuick/Controls.2, etc), which cause codesign errors. While moving the qml directory to Contents/Resources might hide the immediate code-signing problems, the QML component plugins (binaries) need to be signed manually, otherwise notarization later (rightfully) complains about missing signatures.

With that in mind, the content relocation mechanism for macOS .app bundles has been completely redesigned. We now strictly place all data files in directory structure rooted in Contents/Resources, and all binaries (and nested .framework bundles) in directory structure rooted in Contents/Frameworks (in initial implementation Contents/MacOS was used, but one of late-stage commits changes this and switches sys._MEIPASS to Contents/Frameworks).

In contrast to old implementation that linked only contents from Contents/Resources back to Contents/MacOS, we now cross-link the contents between both directory structures, so that both give illusion of original mixed-content directories. This fixes the problem with cv2 loader from #7128, whose .py file now sees the extension binary (or rather a symbolic link to it) next to itself. Similarly, this allows us to split the contents of Qt's qml directory between Contents/Resources and Contents/Frameworks (and perform relocation on whole PySide2, PySide6, PyQt5 or PyQt6 directory and its contents) - we do need to extend QML_IMPORT_PATH to contain both paths, but that seems to do the trick with QML tests that we have.

For directories with dot in name we apply the following work-around: we rename the directory (replacing the dot with custom pattern, which at the moment is __dot__), and then create a symbolic link under original name that points to the actual (renamed) directory.

For gory details, see the comment in actual implementation in PyInstaller/building/osx.py.

The resulting .app bundle structure should address the outstanding compliance issues w.r.t. code-signing, and should make it easier to properly sign PySide or PyQt based application.

Closes #6612.
Closes #7123.
Closes #7191.

Act III: Let binaries and data files be automatically (re)classified!

At the end of the day, we can only properly place binaries and data files into their corresponding directory structures in a macOS .app bundle if they have correct typecodes - i.e., data files are classified as DATA, and binaries are classified as BINARY or EXTENSION.

Unfortunately, the TOC entries coming from the .spec file (passed to Analysis via binaries and datas arguments) and coming from hooks (via eponymous global hook variables) are not necessarily properly classified:

  • the user might pass a mixed-content directory via --add-data
  • collect_data_files hook utility function might be called with include_py_files, which will happily collect binaries whose suffices match those from importlib.machinery.all_suffixes()
  • a hook might collect binaries into datas instead of binaries

Therefore, the analysis now performs automatic (re)classification of files passed from .spec file and hooks, using platform-specific mechanism:

  • Windows: attempt to open the file using pefile
  • macOS: attempt to open the file using macholib
  • Linux: see if objdump -a recognizes the file
  • other platforms: no reclassification is not performed

In addition to improving situation w.r.t. macOS .app bundles, automatic (re)classification also ensures that all collected binaries undergo binary dependency analysis and any other platform-specific processing.

It might also be useful if we ever want to start code-signing binaries on Windows, or if/when we implement rpath-based library search paths on linux.

Supersedes and closes #7690.

@rokm rokm force-pushed the let-there-be-symlinks-final branch 2 times, most recently from 9b47930 to 67de771 Compare May 20, 2023 17:58
@rokm rokm force-pushed the let-there-be-symlinks-final branch 2 times, most recently from 25289c0 to fc273d3 Compare May 31, 2023 10:59
@rokm rokm force-pushed the let-there-be-symlinks-final branch from fc273d3 to 71e40ef Compare June 13, 2023 09:43
@rokm rokm changed the title Let there be symlinks Let there be symbolic links Jun 13, 2023
@rokm
Copy link
Member Author

rokm commented Jun 13, 2023

Since we have limited means of testing macOS changes (especially w.r.t. signing and notarization), I would ask our macOS users to give this PR a spin if they can, so we get some feedback ahead of actually merging the changes.

Note that bootloader needs to be rebuilt:

PYINSTALLER_COMPILE_BOOTLOADER=1 pip install git+https://github.com/rokm/pyinstaller.git@let-there-be-symlinks-final

@Safihre @phw @The-Compiler @Shirk @nyavramov

@rokm rokm requested a review from bwoodsend June 13, 2023 12:05
@rokm rokm marked this pull request as ready for review June 13, 2023 12:05
@phw
Copy link

phw commented Jun 13, 2023

@rokm This looks very promising. I'll test this on macOS. But might take me a few days before I get to it.

@Safihre
Copy link
Contributor

Safihre commented Jun 13, 2023

I quickly threw it into our build pipeline. At the end of the build we try to run the binary and see if it starts, but unfortunately it fails to start without logging the start failure to output.
Will have to pull out my mac to debug. I can do that tomorrow.

@HealsCodes
Copy link

Checked against a very simple PySide6 (Qt for Python) application; seems like the generated loader is still looking for the Python3 library in the wrong directory. (This is even before applying any codesigning)

./dist/demo.app/Contents/MacOS/demo
[5642] Error loading Python lib '/Users/xxx/Projects/pyinstaller-test/dist/demo.app/Contents/MacOS/Python3': dlopen:
dlopen(/Users/xxx/Projects/pyinstaller-test/dist/demo.app/Contents/MacOS/Python3, 0x000A): tried:
'/Users/xxx/Projects/pyinstaller-test/dist/demo.app/Contents/MacOS/Python3' (no such file),
'/System/Volumes/Preboot/Cryptexes/OS/Users/xxx/Projects/pyinstaller-test/dist/demo.app/Contents/MacOS/Python3' (no such file),
'/Users/xxx/Projects/pyinstaller-test/dist/demo.app/Contents/MacOS/Python3' (no such file)

Python3 was correctly bundled but now resides in @executable_path/../Frameworks:

MacOS % ls -1 ../Frameworks
PySide6
Python3
QtCore
QtNetwork
base_library.zip
demo
lib-dynload
libpyside6.abi3.6.5.dylib
libshiboken6.abi3.6.5.dylib
shiboken6

@rokm
Copy link
Member Author

rokm commented Jun 13, 2023

@Safihre The errors in your (verbose) build log suggest that you are anchoring your paths to the executable. Which does not work anymore, because sys._MEIPASS has been relocated to Contents/Frameworks. So sys._MEIPASS != os.path.dirname(sys.executable) anymore (although in your case, you are using sys.argv[0] instead of sys.executable, but it's likely the same issue):
https://github.com/sabnzbd/sabnzbd/blob/96f0743ce5b5cf2faa8ec4b40c7c6cab512199a4/SABnzbd.py#L968-L969
and
https://github.com/sabnzbd/sabnzbd/blob/96f0743ce5b5cf2faa8ec4b40c7c6cab512199a4/SABnzbd.py#L862

@rokm
Copy link
Member Author

rokm commented Jun 13, 2023

Checked against a very simple PySide6 (Qt for Python) application; seems like the generated loader is still looking for the Python3 library in the wrong directory. (This is even before applying any codesigning)

./dist/demo.app/Contents/MacOS/demo
[5642] Error loading Python lib '/Users/xxx/Projects/pyinstaller-test/dist/demo.app/Contents/MacOS/Python3': dlopen:
dlopen(/Users/xxx/Projects/pyinstaller-test/dist/demo.app/Contents/MacOS/Python3, 0x000A): tried:
'/Users/xxx/Projects/pyinstaller-test/dist/demo.app/Contents/MacOS/Python3' (no such file),
'/System/Volumes/Preboot/Cryptexes/OS/Users/xxx/Projects/pyinstaller-test/dist/demo.app/Contents/MacOS/Python3' (no such file),
'/Users/xxx/Projects/pyinstaller-test/dist/demo.app/Contents/MacOS/Python3' (no such file)

Python3 was correctly bundled but now resides in @executable_path/../Frameworks:

MacOS % ls -1 ../Frameworks
PySide6
Python3
QtCore
QtNetwork
base_library.zip
demo
lib-dynload
libpyside6.abi3.6.5.dylib
libshiboken6.abi3.6.5.dylib
shiboken6

My guess would be you did not rebuild the bootloader; this is necessary for sys._MEIPASS to be set to Contents/Frameworks.

@HealsCodes
Copy link

My guess would be you did not rebuild the bootloader; this is necessary for sys._MEIPASS to be set to Contents/Frameworks.

That was indeed the issue, rebuilding the bootloader (fun experience without vagrant) resulted in a working executable bundle that I could also successfully codesign with our production certificate and entitlements.

PyInstaller/building/osx.py Outdated Show resolved Hide resolved
@bwoodsend
Copy link
Member

Reckon it's worth holding this one off until v6? I'd quite like to get one release containing #7670 out before we start doing scary stuff like this and #2702.

@rokm
Copy link
Member Author

rokm commented Jun 14, 2023

Reckon it's worth holding this one off until v6? I'd quite like to get one release containing #7670 out before we start doing scary stuff like this and #2702.

I'm fine with this not going into the next release or two (whatever we decide to version them), if you want to get python 3.12 support out first (that's assuming we won't be waiting for final release in October).

And it should definitely go in the same release as the regular onedir build reorganization, so that all breakage related to sys._MEIPASS != os.path.dirname(sys.executable) happens at once.

@rokm rokm force-pushed the let-there-be-symlinks-final branch from eb6e396 to f442d41 Compare June 18, 2023 13:10
@rokm rokm force-pushed the let-there-be-symlinks-final branch from f442d41 to 7ff2848 Compare June 18, 2023 23:44
@Safihre
Copy link
Contributor

Safihre commented Jun 19, 2023

@Safihre The errors in your (verbose) build log suggest that you are anchoring your paths to the executable. Which does not work anymore, because sys._MEIPASS has been relocated to Contents/Frameworks. So sys._MEIPASS != os.path.dirname(sys.executable) anymore (although in your case, you are using sys.argv[0] instead of sys.executable, but it's likely the same issue): https://github.com/sabnzbd/sabnzbd/blob/96f0743ce5b5cf2faa8ec4b40c7c6cab512199a4/SABnzbd.py#L968-L969 and https://github.com/sabnzbd/sabnzbd/blob/96f0743ce5b5cf2faa8ec4b40c7c6cab512199a4/SABnzbd.py#L862

@rokm Is this also something that will change on other platforms? E.g. I only need to make exception for macOS binary when determining the location?
I guess from a PyInstaller point it makes sense that these things are re-located, but from an application point it complicates things that locations depend on being frozen or not.

@rokm
Copy link
Member Author

rokm commented Jun 19, 2023

@Safihre The errors in your (verbose) build log suggest that you are anchoring your paths to the executable. Which does not work anymore, because sys._MEIPASS has been relocated to Contents/Frameworks. So sys._MEIPASS != os.path.dirname(sys.executable) anymore (although in your case, you are using sys.argv[0] instead of sys.executable, but it's likely the same issue): https://github.com/sabnzbd/sabnzbd/blob/96f0743ce5b5cf2faa8ec4b40c7c6cab512199a4/SABnzbd.py#L968-L969 and https://github.com/sabnzbd/sabnzbd/blob/96f0743ce5b5cf2faa8ec4b40c7c6cab512199a4/SABnzbd.py#L862

@rokm Is this also something that will change on other platforms? E.g. I only need to make exception for macOS binary when determining the location?

The plan is to also change the structure of "regular" onedir builds (on all OSes), so that all the contents except the executable are moved into a sub-directory, and have sys._MEIPASS point there. See #7713.

I guess from a PyInstaller point it makes sense that these things are re-located, but from an application point it complicates things that locations depend on being frozen or not.

Anchoring paths to the os.path.dirname(__file__) should work regardless of being frozen or not, and regardless of being frozen in onedir or onefile mode (or the .app bundle).

Split the parent directory creation into separate helper that we
can reuse elsewhere.

Use low-level helpers for file and directory manipulation
(checking whether path exists, creating a directory, etc.)
to reduce amount of OS-specific code in higher-level helpers.
rokm added 20 commits July 1, 2023 00:24
We do not need to override QTWEBENGINEPROCESS_PATH anymore, as
the helper executable's location is now correctly inferred from
the shared library location within the .framework bundle.

And with PySide6/PyQt6 we do not need to disable sandboxing anymore,
either - it should work with the properly preserved
QtWebEngineCore.framework bundle layout.
When we are collecting a shared library into top-level directory
and this shared library comes from a .framework bundle, re-create
the .framework bundle directory structure in the top-level
directory, and create symlink to library for backward compatibility.

The Info.plist should then also be automatically collected as
part of previously-added collection mechanism.

This functionality aims to further preserve .framework bundle
structures, this time in cases when they are collected from
system-wide library directory instead of from a package; for
example, when collecting Qt .framework bundles from Homebrew
installation.
With Homebrew python and Qt, we are collecting the Qt libraries
from system-wide installation, into top-level application
directory (while preserving the .framework bundle directories).

So the extra helper files from QtWebEngineCore.framework bundle
need to be collected into QtWebEngineCore.framework directory in
top-level application directory instead of the sub-directory
in PySide/PyQt package directory.
When collecting the `Info.plist file`, we need to collect the
original file from `Versions/<version>/Resources`, rather than
from symlinked `Resources` in the top-level directory. Otherwise
`codesign` verification reports `embedded framework contains
modified or invalid version`.

We also need to (re)create the symlink `Versions/Current` pointing
to `Versions/<version>` directory from which we collected the
binary. Framework bundles shipped with python packages may not
include this, but it is required by `codesign`, so we always
manually create this symlink (i.e., do not attempt to collect it
even if it is present).

Neither signing nor verification with `codesign` seem to require
us to symlink `Versions/Current/Resources` and `Versions/Current/<binary>`
to the top-level directory, so for now, we do not do that.
Add optional strict verification of the generated bundle w.r.t.
codesign requirements. It can be enabled by setting the
`PYINSTALLER_VERIFY_BUNDLE_SIGNATURE` environment variable to
a value different from 0, and is meant to supplement the existing
`PYINSTALLER_STRICT_BUNDLE_CODESIGN_ERROR`.

The first step is verification with `codesign --verify`; it seems
that bundles can fail verification even though preceding signing
command succeeded without complaints.

The second step is scanning the extended attributes of all files
collected in the bundle, to see if any of them store the
code-signing data. This indicates that the files were not relocated
properly (e.g., data files being located outside of the
`Contents/Resources` directory) and risk losing the codesign
data when bundle is transferred using methods that do not preserve
the extended attributes.
Enable `PYINSTALLER_STRICT_BUNDLE_CODESIGN_ERROR` and
`PYINSTALLER_VERIFY_BUNDLE_SIGNATURE`, which will force us to
mop up the codesign-related issues, such as dots in the directory
names and Qt data files not being relocated to `Contents/Resources`.
While the well-formed versioned .framework bundles should contain
their `Info.plist` in the `Versions/<version>/Resources` directory,
we are likely to come across the bundles that contain `Info.plist`
in the top-level `Resources` directory (while the binary is in the
versioned directory). This seems to be the case with contemporary
`PySide2`, `PyQt5`, and `PyQt6`.

Accommodate such cases, but ensure that the file is collected into
versioned `Resource` directory, which seems to be a requirement
for code-signing.
Redesign and rewrite the resource relocation mechanism used to divert
data files into `Contents/Resources` directory while keeping the
binaries in `Contents/MacOS`.

The relocation is now performed as a pre-processing step on the input
TOC, and produces the "final" TOC, which is then used by the `assemble`
method to preform the actual collection. The separation of relocation
step from the collection step makes it easier to make adjustments to
the relocation implementation. At the same time, it significantly
simplifies the collection itself, with the code there now being very
similar to what we have in the COLLECT target.

The relocation mechanism has been completely redesigned. We now
relocate ALL data files to `Contents/Resources`. That includes the
source  .py and byte-compiled .pyc files (which were exempted from
relocation in # in an attempt to fix `cv2` loader scripts expecting
to find the extension binary next to source .py file). It also includes
data files from the PySide2/PySide6/PyQt5/PyQt6 which have been
explicitly excluded from relocation up until now.

The nested .framework bundles are treated as monolihic BINARY
entities, and thus kept in `Contents/MacOS` (i.e., the DATA entries
from within such .framework bundles are not relocated).

While previously only data files (and data-only directories) have
been symlinked from `Contents/Resources` back to `Contents/MacOS`,
we now automatically cross-link all files between the two directory
structures in an attempt to maintain the illusion of mixed-content
directories at either location. Data-only and binary-only directories
are cross-linked at directory level, while files in mixed-content
directories are cross-linked at individual file level.

An automatic work-around for directories in `Contents/MacOS` that
have dot in their name is implemented; the directory is created
with a modified name, and a symbolic link pointing to it is created
under the original name.

A series of tests has been added to verify that the basic rules of
relocation.
As part of post-processing of collected files from .framework bundles,
re-create the symlinks to the content within the `Versions/Current`
(which itself is a re-created symlink to the `Versions/<version>`)
from the top-level .framework directory: the binary file, the
Resources directory, and auxiliary directories, such as Helpers.

While this is a generic solution, it is primarily aimed at
seamlessly accommodating the QtWebEngine in contemporary PySide/PyQt,
where the QtWebEngineProcess helper process is sought in the
top-level Helpers directory and the resources are sought in the
top-level Resources directory. On the other hand, we need to collect
these directories in the `Versions/<version>` directory to be
compliant with codesign...
The Resources and Helpers directory that we collect from the
QtWebEngineCore.framework need to be collected in the versioned
directory (Versions/<version>/Helpers and Versions/<version>/Resources)
in order to be compliant with codesign, so adjust the paths
accordingly.

The .framework bundles shipped with contemporary PyPI PyQt/PySide wheels
are naturally not well-formed, so we need to fall-back to collecting
stuff from top-level .framework directory. But collect into versioned
directory to be compliant with codesign.

The helper process and resources are actually sought in the top-level
.framework directory (which is why everything except the codesign
worked so far) - this is now handled by symlinks to the top-level
directory that are automatically created by .framework post-processing
code in the `PyInstaller.utils.osx.collect_files_from_framework_bundles`
helper function (see preceding commit).
Instead of using system-provided `/usr/bin/xattr` utility to list
the extended attributes on collected files, add an optional
dependency on `xattr` package and use `xattr.listxattr` function.

Turns out that system-provided `/usr/bin/xattr` is in fact just an
entry-point provided by the `xattr` package installed in the
system-provided python, which on macOS <= 11 is still python2.
Therefore, trying to run it in a sub-process from python3 might
lead to weird compatibility issues to due python3 site-packages
leaking into the python2 process, as seen on our CI:

```
 output: Traceback (most recent call last):
  File "/usr/bin/xattr", line 8, in <module>
    from pkg_resources import load_entry_point
  File "/Users/runner/hostedtoolcache/Python/3.10.11/x64/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1423
    local = f"sanitized.{_safe_segment(rest)}".strip(".")
                                              ^
SyntaxError: invalid syntax
```
To satisfy codesign requirements, we are forced to split the
collected `qml` directory into two parts; one that keeps only
binaries (rooted in the `Contents/MacOS`) and one that keeps
only data files (rooted in the `Contents/Resources`), with files
from one directory tree being symlinked to the other to mantinain
illusion of a single mixed-content directory.

As Qt seems to compute the identifier of its QML components based
on location of the `qmldir` file w.r.t. the registered QML import
paths, we need to register both paths, because the `qmldir` file
for a component could be reached via either directory tree.
When assembling macOS .app bundle, put the dylibs and nested
.framework bundles into `Contents/Frameworks` directory, so that
only the program executable is left in the `Contents/MacOS`.

This requires us to relocate `sys._MEIPASS` from `Contents/MacOS`
(= parent of the executable directory) to `Contents/Frameworks`
in lieu of having to cross-link everything back (which would
defeat the purpose of the relocation in the first place).

A side effect is that `sys._MEIPASS` does not correspond to
`os.path.dirname(sys.executable)` in onedir .app bundles anymore.
But that never held for onefile builds (regular or .app bundles),
and may not hold for regular onedir builds in the near future.
Perform automatic BINARY vs. DATA (re)classification for TOC
entries received from user via Analysis' input arguments and
from hooks. These entries might have incorrect typecode due to
various reasons: user passing binaries as datas or vice versa, a
hook brute-force collecting a directory as datas even though it
also contains binaries, or due to incorrect classification done
by our hook utilities.

Automatic (re)classification helps ensure that *all* binaries
undergo proper binary dependency analysis.

Furthermore, in generated macOS .app bundles, we rely on proper
data/binary typecodes to put the files into their corresponding
directory structure (`Contents/Resources` vs `Contents/MacOS` or
`Contents/Frameworks`) to satisfy code-signing requirements.

The classification is currently implemented for:
 - Windows: attempt to open the file using `pefile`
 - macOS: attempt to open the file using `macholib`
 - Linux: see if `objdump -a` recognizes the file

On these platforms, the TOC entry's typecode is automatically
corrected and the entry is moved to the correct TOC list (`binaries`
or `datas`). On other platforms, no changes are made.
When creating parent directory structure for collected files
in COLLECT and BUNDLE, avoid explicitly checking whether the
directory already exists. Instead, call `os.makedirs` with
`exists_ok=True` and catch the `FileExistsError`.
The .app bundle's executable's location should be always
discoverable via `sys.executable`, so there is no reason to
cross-link it into `Contents/Frameworks` and `Contents/Resources`.

By not cross-liking the executable, we also prevent potential
conflicts between the executable (i.e., program/app bundle name)
and an eponymous package (from which we collect binaries or data
files), which arise in scenario when same base name is used for
the entry-point script (`something.py`) and a package (`something`).
The crash caused by missing Info.plist should be solved now that we
properly preserve the Qt .framework bundles.
If we have a symbolic link chain with intermediate links that we
do not collect, attempt to rewrite the links to "jump over" the
missing ones.

This attempts to mitigate potential issues under Homebrew python,
where shared libraries for `wxwidgets` have the following layout:
 * libwx_baseu-3.2.0.2.1.dylib
 * libwx_baseu-3.2.0.dylib -> libwx_baseu-3.2.0.2.1.dylib
 * libwx_baseu-3.2.dylib -> libwx_baseu-3.2.0.dylib

If the other collected binaries reference only the two of them,
`libwx_baseu-3.2.0.2.1.dylib` and `libwx_baseu-3.2.dylib`, we
end up missing the intermediate link, and the final link ends
up as a hard copy, causing issues due to duplication.

Therefore, if we do not seem to be collecting the file referenced
by a symlink, but the referenced file itself is a symlink, follow
it again and see if we happen to be collecting that file (and if
that file happens to be a symlink, follow it again...).
Rework the symbolic link check to handle the problem with linked
parent directory, as observed on macOS:
 * /usr/local/Cellar/wxwidgets/3.2.2.1_1/lib/libwx_baseu-3.2.0.2.1.dylib
 * /usr/local/Cellar/wxwidgets/3.2.2.1_1/lib/libwx_baseu-3.2.0.dylib -> libwx_baseu-3.2.0.2.1.dylib
 * /usr/local/Cellar/wxwidgets/3.2.2.1_1/lib/libwx_baseu-3.2.dylib -> libwx_baseu-3.2.0.dylib
and
 * /usr/local/opt/wxwidgets/lib/libwx_baseu-3.2.0.2.1.dylib
 * /usr/local/opt/wxwidgets/lib/libwx_baseu-3.2.0.dylib -> libwx_baseu-3.2.0.2.1.dylib
 * /usr/local/opt/wxwidgets/lib/libwx_baseu-3.2.dylib -> libwx_baseu-3.2.0.dylib
which are actually the same, because
 * /usr/local/opt/wxwidgets -> ../Cellar/wxwidgets/3.2.2.1_1

Other binaries end up referencing
`/usr/local/opt/wxwidgets/lib/libwx_baseu-3.2.dylib` and
`/usr/local/Cellar/wxwidgets/3.2.2.1_1/lib/libwx_baseu-3.2.0.2.1.dylib`.

So in addition the the problem with chained links and missing
intermediate link, we also need to be able to handle the situation
where the files appear to be originating from different locations
due to the linking of parent directories.
@rokm rokm force-pushed the let-there-be-symlinks-final branch from 099cec8 to d011333 Compare June 30, 2023 22:25
@rokm
Copy link
Member Author

rokm commented Jul 1, 2023

The last two commits rework the symlink preservation logic a bit, so that we can now cope with Homebrew situation in #7738.

@rokm rokm merged commit c342dac into pyinstaller:develop Jul 1, 2023
18 checks passed
@rokm rokm deleted the let-there-be-symlinks-final branch July 1, 2023 10:33
toofar added a commit to qutebrowser/qutebrowser that referenced this pull request Jul 27, 2023
PyInstaller has [recently][symlinks] landed a change that should restore
the webengine sandbox on macOS builds. With a bit more testing we are
hoping that we can go ahead releasing builds based on that
in-development PyInstaller codebase even before they've made a release.

This commit pins our pyinstaller dependency to be the latest commit on
their develop branch.

[symlinks]: pyinstaller/pyinstaller#7619
toofar added a commit to qutebrowser/qutebrowser that referenced this pull request Jul 27, 2023
In the past various workarounds have been put in place to move/copy/symlink
files in the macOS app build to make PyQt work and let us sign it.

As of pyinstaller/pyinstaller#7619 our downstream
patching should not be required.

The application seems to run fine.
The app size is 155 MB.
Signing is still to be verified.
toofar added a commit to qutebrowser/qutebrowser that referenced this pull request Jul 30, 2023
In the past various workarounds have been put in place to move/copy/symlink
files in the macOS app build to make PyQt work and let us sign it.

As of pyinstaller/pyinstaller#7619 our downstream
patching should not be required.

The application seems to run fine.
The app size is 155 MB.
Signing is still to be verified.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 30, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.