New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let there be symbolic links #7619
Conversation
9b47930
to
67de771
Compare
25289c0
to
fc273d3
Compare
fc273d3
to
71e40ef
Compare
Since we have limited means of testing macOS changes (especially w.r.t. signing and notarization), I would ask our macOS users to give this PR a spin if they can, so we get some feedback ahead of actually merging the changes. Note that bootloader needs to be rebuilt:
|
@rokm This looks very promising. I'll test this on macOS. But might take me a few days before I get to it. |
I quickly threw it into our build pipeline. At the end of the build we try to run the binary and see if it starts, but unfortunately it fails to start without logging the start failure to output. |
Checked against a very simple PySide6 (Qt for Python) application; seems like the generated loader is still looking for the Python3 library in the wrong directory. (This is even before applying any codesigning)
Python3 was correctly bundled but now resides in @executable_path/../Frameworks:
|
@Safihre The errors in your (verbose) build log suggest that you are anchoring your paths to the executable. Which does not work anymore, because |
My guess would be you did not rebuild the bootloader; this is necessary for |
That was indeed the issue, rebuilding the bootloader (fun experience without vagrant) resulted in a working executable bundle that I could also successfully codesign with our production certificate and entitlements. |
71e40ef
to
b1cdbf1
Compare
I'm fine with this not going into the next release or two (whatever we decide to version them), if you want to get python 3.12 support out first (that's assuming we won't be waiting for final release in October). And it should definitely go in the same release as the regular onedir build reorganization, so that all breakage related to |
eb6e396
to
f442d41
Compare
f442d41
to
7ff2848
Compare
@rokm Is this also something that will change on other platforms? E.g. I only need to make exception for macOS binary when determining the location? |
The plan is to also change the structure of "regular"
Anchoring paths to the |
7ff2848
to
099cec8
Compare
Split the parent directory creation into separate helper that we can reuse elsewhere. Use low-level helpers for file and directory manipulation (checking whether path exists, creating a directory, etc.) to reduce amount of OS-specific code in higher-level helpers.
We do not need to override QTWEBENGINEPROCESS_PATH anymore, as the helper executable's location is now correctly inferred from the shared library location within the .framework bundle. And with PySide6/PyQt6 we do not need to disable sandboxing anymore, either - it should work with the properly preserved QtWebEngineCore.framework bundle layout.
These should work now.
When we are collecting a shared library into top-level directory and this shared library comes from a .framework bundle, re-create the .framework bundle directory structure in the top-level directory, and create symlink to library for backward compatibility. The Info.plist should then also be automatically collected as part of previously-added collection mechanism. This functionality aims to further preserve .framework bundle structures, this time in cases when they are collected from system-wide library directory instead of from a package; for example, when collecting Qt .framework bundles from Homebrew installation.
With Homebrew python and Qt, we are collecting the Qt libraries from system-wide installation, into top-level application directory (while preserving the .framework bundle directories). So the extra helper files from QtWebEngineCore.framework bundle need to be collected into QtWebEngineCore.framework directory in top-level application directory instead of the sub-directory in PySide/PyQt package directory.
When collecting the `Info.plist file`, we need to collect the original file from `Versions/<version>/Resources`, rather than from symlinked `Resources` in the top-level directory. Otherwise `codesign` verification reports `embedded framework contains modified or invalid version`. We also need to (re)create the symlink `Versions/Current` pointing to `Versions/<version>` directory from which we collected the binary. Framework bundles shipped with python packages may not include this, but it is required by `codesign`, so we always manually create this symlink (i.e., do not attempt to collect it even if it is present). Neither signing nor verification with `codesign` seem to require us to symlink `Versions/Current/Resources` and `Versions/Current/<binary>` to the top-level directory, so for now, we do not do that.
Add optional strict verification of the generated bundle w.r.t. codesign requirements. It can be enabled by setting the `PYINSTALLER_VERIFY_BUNDLE_SIGNATURE` environment variable to a value different from 0, and is meant to supplement the existing `PYINSTALLER_STRICT_BUNDLE_CODESIGN_ERROR`. The first step is verification with `codesign --verify`; it seems that bundles can fail verification even though preceding signing command succeeded without complaints. The second step is scanning the extended attributes of all files collected in the bundle, to see if any of them store the code-signing data. This indicates that the files were not relocated properly (e.g., data files being located outside of the `Contents/Resources` directory) and risk losing the codesign data when bundle is transferred using methods that do not preserve the extended attributes.
Enable `PYINSTALLER_STRICT_BUNDLE_CODESIGN_ERROR` and `PYINSTALLER_VERIFY_BUNDLE_SIGNATURE`, which will force us to mop up the codesign-related issues, such as dots in the directory names and Qt data files not being relocated to `Contents/Resources`.
While the well-formed versioned .framework bundles should contain their `Info.plist` in the `Versions/<version>/Resources` directory, we are likely to come across the bundles that contain `Info.plist` in the top-level `Resources` directory (while the binary is in the versioned directory). This seems to be the case with contemporary `PySide2`, `PyQt5`, and `PyQt6`. Accommodate such cases, but ensure that the file is collected into versioned `Resource` directory, which seems to be a requirement for code-signing.
Redesign and rewrite the resource relocation mechanism used to divert data files into `Contents/Resources` directory while keeping the binaries in `Contents/MacOS`. The relocation is now performed as a pre-processing step on the input TOC, and produces the "final" TOC, which is then used by the `assemble` method to preform the actual collection. The separation of relocation step from the collection step makes it easier to make adjustments to the relocation implementation. At the same time, it significantly simplifies the collection itself, with the code there now being very similar to what we have in the COLLECT target. The relocation mechanism has been completely redesigned. We now relocate ALL data files to `Contents/Resources`. That includes the source .py and byte-compiled .pyc files (which were exempted from relocation in # in an attempt to fix `cv2` loader scripts expecting to find the extension binary next to source .py file). It also includes data files from the PySide2/PySide6/PyQt5/PyQt6 which have been explicitly excluded from relocation up until now. The nested .framework bundles are treated as monolihic BINARY entities, and thus kept in `Contents/MacOS` (i.e., the DATA entries from within such .framework bundles are not relocated). While previously only data files (and data-only directories) have been symlinked from `Contents/Resources` back to `Contents/MacOS`, we now automatically cross-link all files between the two directory structures in an attempt to maintain the illusion of mixed-content directories at either location. Data-only and binary-only directories are cross-linked at directory level, while files in mixed-content directories are cross-linked at individual file level. An automatic work-around for directories in `Contents/MacOS` that have dot in their name is implemented; the directory is created with a modified name, and a symbolic link pointing to it is created under the original name. A series of tests has been added to verify that the basic rules of relocation.
As part of post-processing of collected files from .framework bundles, re-create the symlinks to the content within the `Versions/Current` (which itself is a re-created symlink to the `Versions/<version>`) from the top-level .framework directory: the binary file, the Resources directory, and auxiliary directories, such as Helpers. While this is a generic solution, it is primarily aimed at seamlessly accommodating the QtWebEngine in contemporary PySide/PyQt, where the QtWebEngineProcess helper process is sought in the top-level Helpers directory and the resources are sought in the top-level Resources directory. On the other hand, we need to collect these directories in the `Versions/<version>` directory to be compliant with codesign...
The Resources and Helpers directory that we collect from the QtWebEngineCore.framework need to be collected in the versioned directory (Versions/<version>/Helpers and Versions/<version>/Resources) in order to be compliant with codesign, so adjust the paths accordingly. The .framework bundles shipped with contemporary PyPI PyQt/PySide wheels are naturally not well-formed, so we need to fall-back to collecting stuff from top-level .framework directory. But collect into versioned directory to be compliant with codesign. The helper process and resources are actually sought in the top-level .framework directory (which is why everything except the codesign worked so far) - this is now handled by symlinks to the top-level directory that are automatically created by .framework post-processing code in the `PyInstaller.utils.osx.collect_files_from_framework_bundles` helper function (see preceding commit).
Instead of using system-provided `/usr/bin/xattr` utility to list the extended attributes on collected files, add an optional dependency on `xattr` package and use `xattr.listxattr` function. Turns out that system-provided `/usr/bin/xattr` is in fact just an entry-point provided by the `xattr` package installed in the system-provided python, which on macOS <= 11 is still python2. Therefore, trying to run it in a sub-process from python3 might lead to weird compatibility issues to due python3 site-packages leaking into the python2 process, as seen on our CI: ``` output: Traceback (most recent call last): File "/usr/bin/xattr", line 8, in <module> from pkg_resources import load_entry_point File "/Users/runner/hostedtoolcache/Python/3.10.11/x64/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1423 local = f"sanitized.{_safe_segment(rest)}".strip(".") ^ SyntaxError: invalid syntax ```
To satisfy codesign requirements, we are forced to split the collected `qml` directory into two parts; one that keeps only binaries (rooted in the `Contents/MacOS`) and one that keeps only data files (rooted in the `Contents/Resources`), with files from one directory tree being symlinked to the other to mantinain illusion of a single mixed-content directory. As Qt seems to compute the identifier of its QML components based on location of the `qmldir` file w.r.t. the registered QML import paths, we need to register both paths, because the `qmldir` file for a component could be reached via either directory tree.
When assembling macOS .app bundle, put the dylibs and nested .framework bundles into `Contents/Frameworks` directory, so that only the program executable is left in the `Contents/MacOS`. This requires us to relocate `sys._MEIPASS` from `Contents/MacOS` (= parent of the executable directory) to `Contents/Frameworks` in lieu of having to cross-link everything back (which would defeat the purpose of the relocation in the first place). A side effect is that `sys._MEIPASS` does not correspond to `os.path.dirname(sys.executable)` in onedir .app bundles anymore. But that never held for onefile builds (regular or .app bundles), and may not hold for regular onedir builds in the near future.
Perform automatic BINARY vs. DATA (re)classification for TOC entries received from user via Analysis' input arguments and from hooks. These entries might have incorrect typecode due to various reasons: user passing binaries as datas or vice versa, a hook brute-force collecting a directory as datas even though it also contains binaries, or due to incorrect classification done by our hook utilities. Automatic (re)classification helps ensure that *all* binaries undergo proper binary dependency analysis. Furthermore, in generated macOS .app bundles, we rely on proper data/binary typecodes to put the files into their corresponding directory structure (`Contents/Resources` vs `Contents/MacOS` or `Contents/Frameworks`) to satisfy code-signing requirements. The classification is currently implemented for: - Windows: attempt to open the file using `pefile` - macOS: attempt to open the file using `macholib` - Linux: see if `objdump -a` recognizes the file On these platforms, the TOC entry's typecode is automatically corrected and the entry is moved to the correct TOC list (`binaries` or `datas`). On other platforms, no changes are made.
When creating parent directory structure for collected files in COLLECT and BUNDLE, avoid explicitly checking whether the directory already exists. Instead, call `os.makedirs` with `exists_ok=True` and catch the `FileExistsError`.
The .app bundle's executable's location should be always discoverable via `sys.executable`, so there is no reason to cross-link it into `Contents/Frameworks` and `Contents/Resources`. By not cross-liking the executable, we also prevent potential conflicts between the executable (i.e., program/app bundle name) and an eponymous package (from which we collect binaries or data files), which arise in scenario when same base name is used for the entry-point script (`something.py`) and a package (`something`).
The crash caused by missing Info.plist should be solved now that we properly preserve the Qt .framework bundles.
If we have a symbolic link chain with intermediate links that we do not collect, attempt to rewrite the links to "jump over" the missing ones. This attempts to mitigate potential issues under Homebrew python, where shared libraries for `wxwidgets` have the following layout: * libwx_baseu-3.2.0.2.1.dylib * libwx_baseu-3.2.0.dylib -> libwx_baseu-3.2.0.2.1.dylib * libwx_baseu-3.2.dylib -> libwx_baseu-3.2.0.dylib If the other collected binaries reference only the two of them, `libwx_baseu-3.2.0.2.1.dylib` and `libwx_baseu-3.2.dylib`, we end up missing the intermediate link, and the final link ends up as a hard copy, causing issues due to duplication. Therefore, if we do not seem to be collecting the file referenced by a symlink, but the referenced file itself is a symlink, follow it again and see if we happen to be collecting that file (and if that file happens to be a symlink, follow it again...).
Rework the symbolic link check to handle the problem with linked parent directory, as observed on macOS: * /usr/local/Cellar/wxwidgets/3.2.2.1_1/lib/libwx_baseu-3.2.0.2.1.dylib * /usr/local/Cellar/wxwidgets/3.2.2.1_1/lib/libwx_baseu-3.2.0.dylib -> libwx_baseu-3.2.0.2.1.dylib * /usr/local/Cellar/wxwidgets/3.2.2.1_1/lib/libwx_baseu-3.2.dylib -> libwx_baseu-3.2.0.dylib and * /usr/local/opt/wxwidgets/lib/libwx_baseu-3.2.0.2.1.dylib * /usr/local/opt/wxwidgets/lib/libwx_baseu-3.2.0.dylib -> libwx_baseu-3.2.0.2.1.dylib * /usr/local/opt/wxwidgets/lib/libwx_baseu-3.2.dylib -> libwx_baseu-3.2.0.dylib which are actually the same, because * /usr/local/opt/wxwidgets -> ../Cellar/wxwidgets/3.2.2.1_1 Other binaries end up referencing `/usr/local/opt/wxwidgets/lib/libwx_baseu-3.2.dylib` and `/usr/local/Cellar/wxwidgets/3.2.2.1_1/lib/libwx_baseu-3.2.0.2.1.dylib`. So in addition the the problem with chained links and missing intermediate link, we also need to be able to handle the situation where the files appear to be originating from different locations due to the linking of parent directories.
099cec8
to
d011333
Compare
The last two commits rework the symlink preservation logic a bit, so that we can now cope with Homebrew situation in #7738. |
PyInstaller has [recently][symlinks] landed a change that should restore the webengine sandbox on macOS builds. With a bit more testing we are hoping that we can go ahead releasing builds based on that in-development PyInstaller codebase even before they've made a release. This commit pins our pyinstaller dependency to be the latest commit on their develop branch. [symlinks]: pyinstaller/pyinstaller#7619
In the past various workarounds have been put in place to move/copy/symlink files in the macOS app build to make PyQt work and let us sign it. As of pyinstaller/pyinstaller#7619 our downstream patching should not be required. The application seems to run fine. The app size is 155 MB. Signing is still to be verified.
In the past various workarounds have been put in place to move/copy/symlink files in the macOS app build to make PyQt work and let us sign it. As of pyinstaller/pyinstaller#7619 our downstream patching should not be required. The application seems to run fine. The app size is 155 MB. Signing is still to be verified.
Act I: Let there be symbolic links!
The initial series of commits in this PR implements support for creation of symbolic links - either at build-time (in
COLLECT
orBUNDLE
foronedir
builds) or at run-time (created by bootloader fromPKG
foronefile
builds).This allows us to preserve collected symbolic links, which up until now have always been collected as duplicated hard copies of the underlying file. If we collect both a file and a (relative) symbolic link that points to it, and if the relative relationship between the two is preserved when collected, we now preserve the symbolic link.
On one hand, this should help reducing the size of the frozen application in cases when many symbolic links are present. Which admittedly does not happen often with PyPI wheels that do not support symbolic links and thus ship their shared libraries as a single file. On the other hand, Anaconda-installed libraries on Linux and macOS typically include multiple versioned symbolic links for each shared library; normally, all libraries should be linked against the fully-versioned file (which is the actual copy of the file), but sometimes this is not the case, and we collect the file and a symbolic link (and collecting them as duplicated hard-copies results in a crash on macOS; #5710). And sometimes, we have a hook that collects all shared libraries from Anaconda package and its dependencies (as indicated by package's metadata) - this is the case for
numpy
(our copy of the hook is here, and it is the same in the version that we submitted to upstream).So this fixes #5710.
Ability to create symbolic links also allows us to start preserving the parent directory structure of shared libraries that are discovered via binary dependency analysis on non-Windows platforms. I.e., instead of automatically collecting the shared library into top-level application directory, we keep it in its original directory structure. We already implemented this for Windows in #7028, but no additional considerations were needed there because packages can dynamically modify DLL search paths. On Linux and macOS, this is not possible, and the assumption is that the shared libraries are located in top-level directory - we can now satisfy this requirement by creating symbolic link in the top-level directory that points to the collected shared library. This avoids duplication issues and related crashes like the one discussed in pyinstaller/pyinstaller-hooks-contrib#375.
Preserving parent directories of collected shared libraries also allows us to start preserving the Qt5 / Qt6 .framework bundles that are shipped with macOS
PySide2
/PySide6
/PyQt5
/PyQt6
PyPI wheels (and also with Homebrew Qt installations) - we just need to automatically collect the correspondingInfo.plist
file, and we've covered most of the cases (and collection of extra files fromQtWebEngineCore.framework
is handled by the corresponding hook).This removes the need for the horrible work-around that we had for
QtWebEngine
, and fixes a series of associated issues.QtWebEngine
now works inonefile
builds on macOS (fixes #4361).The regular (POSIX)
onedir
programs usingQtWebEngine
on macOS are not mis-identified asQtWebEngineCore
in the application's menu bar (fixes #5409).The duplication of QtWebEngine files on macOS is now gone, so we can also close #6896.
Preservation of
QtWebEngineCore.framework
fixes theQtWebEngine
sanboxing in Qt6 (PySide6
andPyQt6
), which we had to "temporarily" disable in #6903. This should take care of qutebrowser/qutebrowser#7278 without requiring extra post-processing from qutebrowser/qutebrowser#7500.Preservation of .framework bundles will likely also fix the crashes we've seen with the latest release of
PyQt6
on our CI (which I reproduced locally, and seem related to missingInfo.plist
).Act II: Let there be reorganized macOS .app bundles!
Preservation of parent directory structure for collected shared libraries, for all its benefits, comes with a price - on macOS, python packages often place their bundled dylibs into directories whose name contains a dot:
numpy/.dylibs
,scipy/.dylibs
, and so on.In macOS .app bundle, these directories need to be placed in location that expects binaries (
Contents/MacOS
orContents/Frameworks
), and these disallow dots in directory names, except for nested .framework bundles.This makes for a good opportunity to evaluate our current content placing and relocation effort when generating macOS .bundles, and its limitations:
we had to exempt source .py files (data files) from being relocated to
Contents/Resources
in order to avoid issues with source-only packages that expect to find their extensions next to the .py files (specifically, the source-only loader fromcv2
- see Error "ImportError: ERROR: recursion is detected during loading of "cv2" binary extensions. Check OpenCV installation." with Pyinstaller #7128, and consequently building: BUNDLE: exempt collected .py/.pyc files from relocation #7180). As pointed out by @phw in building: BUNDLE: exempt collected .py/.pyc files from relocation #7180 (comment), having data files inContents/MacOS
causes signature data for those files to be stored in their extended attributes. These might be lost when the bundle transferred using a method that does not support extended attributes, which in turn invalidates the signature on the .app bundle.the contents of
PySide2
,PySide6
,PyQt5
, andPyQt6
are also completely exempted from relocation. This again means that we end up with data files inContents/MacOS
(e.g., thetranslations
directory), but there is another problem - the mixed-contentqml
directory that (at least with Qt5) contains directories with dot in the name (QtQuick.2
,QtQuick/Controls.2
, etc), which causecodesign
errors. While moving theqml
directory toContents/Resources
might hide the immediate code-signing problems, the QML component plugins (binaries) need to be signed manually, otherwise notarization later (rightfully) complains about missing signatures.With that in mind, the content relocation mechanism for macOS .app bundles has been completely redesigned. We now strictly place all data files in directory structure rooted in
Contents/Resources
, and all binaries (and nested .framework bundles) in directory structure rooted inContents/Frameworks
(in initial implementationContents/MacOS
was used, but one of late-stage commits changes this and switchessys._MEIPASS
toContents/Frameworks
).In contrast to old implementation that linked only contents from
Contents/Resources
back toContents/MacOS
, we now cross-link the contents between both directory structures, so that both give illusion of original mixed-content directories. This fixes the problem withcv2
loader from #7128, whose .py file now sees the extension binary (or rather a symbolic link to it) next to itself. Similarly, this allows us to split the contents of Qt'sqml
directory betweenContents/Resources
andContents/Frameworks
(and perform relocation on wholePySide2
,PySide6
,PyQt5
orPyQt6
directory and its contents) - we do need to extendQML_IMPORT_PATH
to contain both paths, but that seems to do the trick with QML tests that we have.For directories with dot in name we apply the following work-around: we rename the directory (replacing the dot with custom pattern, which at the moment is
__dot__
), and then create a symbolic link under original name that points to the actual (renamed) directory.For gory details, see the comment in actual implementation in
PyInstaller/building/osx.py
.The resulting .app bundle structure should address the outstanding compliance issues w.r.t. code-signing, and should make it easier to properly sign PySide or PyQt based application.
Closes #6612.
Closes #7123.
Closes #7191.
Act III: Let binaries and data files be automatically (re)classified!
At the end of the day, we can only properly place binaries and data files into their corresponding directory structures in a macOS .app bundle if they have correct typecodes - i.e., data files are classified as DATA, and binaries are classified as BINARY or EXTENSION.
Unfortunately, the TOC entries coming from the .spec file (passed to
Analysis
viabinaries
anddatas
arguments) and coming from hooks (via eponymous global hook variables) are not necessarily properly classified:--add-data
collect_data_files
hook utility function might be called withinclude_py_files
, which will happily collect binaries whose suffices match those fromimportlib.machinery.all_suffixes()
Therefore, the analysis now performs automatic (re)classification of files passed from .spec file and hooks, using platform-specific mechanism:
pefile
macholib
objdump -a
recognizes the fileIn addition to improving situation w.r.t. macOS .app bundles, automatic (re)classification also ensures that all collected binaries undergo binary dependency analysis and any other platform-specific processing.
It might also be useful if we ever want to start code-signing binaries on Windows, or if/when we implement rpath-based library search paths on linux.
Supersedes and closes #7690.