
building: macOS: limit binaries' architecture validation to extensions #6587

Merged: 3 commits into pyinstaller:develop from macos-arch-support-revision, Feb 18, 2022

Conversation

@rokm (Member) commented Feb 13, 2022

As demonstrated by scipy 1.8.0, multi-arch universal2 extensions may have their individual arch slices linked against distinct single-arch thin shared libraries.

Such thin shared libraries fail the current strict architecture validation, either by virtue of being single-arch (whereas the target architecture is universal2) or by virtue of at least one of them being of incompatible architecture (e.g., an arm64 thin shared library is flagged as incompatible for an x86_64 target, and an x86_64 thin shared library is flagged as incompatible for an arm64 target).

Therefore, limit the architecture validation to Python extension modules only, which do need to be fully compatible with the target arch (at least until someone decides to ship distinct, arch-specific modules; if that happens, we can probably prevent the collection of incompatible modules via hooks). The extension validation should hopefully still catch attempts at using incompatible single-arch packages when trying to build a universal2 application, or a single-arch application for an architecture different from the running one.
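
A minimal sketch of the gating described above (not PyInstaller's actual implementation; the helper names, the typecode-based dispatch, and the error handling are illustrative assumptions):

# Sketch only: gate the strict arch check on the TOC typecode, so that plain
# BINARY entries (shared libraries) are exempt while EXTENSION entries are not.
from macholib.MachO import MachO

# Standard Mach-O CPU type values for x86_64 and arm64 slices.
_CPUTYPE_NAMES = {0x01000007: 'x86_64', 0x0100000C: 'arm64'}

def binary_architectures(filename):
    # Collect the arch of every slice in the (possibly fat) Mach-O file.
    return {_CPUTYPE_NAMES.get(hdr.header.cputype) for hdr in MachO(filename).headers}

def validate_architecture(toc_typecode, filename, target_arch):
    # Only Python extension modules must fully match the target arch; plain
    # shared libraries may legitimately be single-arch (as in scipy 1.8.0).
    if toc_typecode != 'EXTENSION':
        return
    required = {'x86_64', 'arm64'} if target_arch == 'universal2' else {target_arch}
    missing = required - binary_architectures(filename)
    if missing:
        raise SystemError(f"{filename} is missing arch(es) {missing} for target {target_arch}!")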

@rokm (Member Author) commented Feb 13, 2022

(Not so) fun fact: despite scipy having universal2 wheels, it is still impossible to create universal2 frozen applications or "cross-compile" for the non-running arch due to numpy wheels still being strictly single-arch. But at least it offers a good way to (manually) test that our arch validation of extension modules manages to catch such cases.

@rokm (Member Author) commented Feb 13, 2022

Also, this PR does not address the underlying issue, which is that both single-arch x86_64 and arm64 thin shared libraries are collected based on the analysis of multi-arch universal2 extensions. So an x86_64 build with scipy 1.8.0 will have the collected universal2 extensions trimmed down to x86_64, but the arm64 shared libs from the original analysis will also be collected.

While we could try to explicitly ignore shared libraries of incompatible architecture at the collection stage, this poses a problem conceptually similar to the original validation issue: we cannot be certain that the binary should not be collected. For example, a package (universal2 or arm64) might ship an x86_64-only helper program that is executed via subprocess regardless of the arch.

So I think the proper approach is to also add a target_arch argument to Analysis (and bind the --target-arch command-line switch to that one instead of EXE()'s target_arch, and fix the spec template to pass a.target_arch to EXE's target_arch), and then propagate it all the way down to bindepend.getImports_macho and skip the Mach-O headers that do not match the target arch(s). But the affected parts of the code have diverged between develop and v4, so I think it's better to add that in separate PR(s).
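
A hypothetical sketch of that header filtering (names are made up for illustration; the eventual implementation would live in the import-analysis code):

# Sketch of the proposed slice filtering during import analysis: only the
# Mach-O headers matching the requested target arch(es) are processed, so
# libraries referenced only by a foreign-arch slice are never collected.
from macholib.MachO import MachO

_ARCH_CPUTYPES = {
    'x86_64': {0x01000007},
    'arm64': {0x0100000C},
    'universal2': {0x01000007, 0x0100000C},
}

def iter_relevant_headers(filename, target_arch=None):
    for header in MachO(filename).headers:
        # No target arch (current behaviour) means every slice is processed.
        if target_arch is None or header.header.cputype in _ARCH_CPUTYPES[target_arch]:
            yield header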

@bwoodsend (Member) commented:

> (Not so) fun fact: despite scipy having universal2 wheels, it is still impossible to create universal2 frozen applications or "cross-compile" for the non-running arch due to numpy wheels still being strictly single-arch. But at least it offers a good way to (manually) test that our arch validation of extension modules manages to catch such cases.

The whole point of PyPA adopting universal2 instead of single arch was so that py2app (and, by extension, us) could produce universal2 Python applications. So in that case, SciPy's universal2 binary soup is not accomplishing anything.

@bwoodsend (Member) commented:

I can't say I'm comfortable with not validating shared libraries. Anything brew provides is single arch. Would it make sense to only skip validation for shared libraries provided by Python packages?

I'm honestly tempted to scrap universal2 support completely and just tell users to petition Apple to provide M1 support for VMware, so that CI/CD on M1 is actually possible and we can forget this fat-binary cross-compile palaver.

@rokm (Member Author) commented Feb 14, 2022

> I can't say I'm comfortable with not validating shared libraries. Anything brew provides is single arch.

I think in this regard, brew is the least of our problems, precisely because it is predictably single-arch. Meaning that if someone tries to create a universal2 or an arm64 frozen application under x86_64, they should encounter an error on the first Python-provided extension module that gets collected (_struct, for example).
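
For reference, one way to manually inspect what the validation would see (assumes macholib is installed; a single-arch brew Python shows exactly one slice):

# Manual check of the running interpreter's _struct extension module.
import _struct
from macholib.MachO import MachO

cputypes = {0x01000007: 'x86_64', 0x0100000C: 'arm64'}
print([cputypes.get(h.header.cputype, hex(h.header.cputype))
       for h in MachO(_struct.__file__).headers])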

> Would it make sense to only skip validation for shared libraries provided by Python packages?

We could do something like that - did you have any specific mechanism in mind, or just checking whether the shared library comes from site-packages (i.e., validate it if it is an extension or does not come from site-packages)?

> I'm honestly tempted to scrap universal2 support completely and just tell users to petition Apple to provide M1 support for VMware, so that CI/CD on M1 is actually possible and we can forget this fat-binary cross-compile palaver.

I wouldn't object to that. The less I have to deal with macOS and its oddities, the better... :)

@bwoodsend (Member) commented:

> Meaning that if someone tries to create a universal2 or an arm64 frozen application under x86_64, they should encounter an error on the first Python-provided extension module that gets collected (_struct, for example).

Only if they install Python from brew too, right? If they use a universal2 Python but collect libcairo.dylib from wherever brew puts it, they won't know it's broken until some disgruntled M1 user tells them so.

> We could do something like that - did you have any specific mechanism in mind, or just checking whether the shared library comes from site-packages (i.e., validate it if it is an extension or does not come from site-packages)?

I'd probably go for a big set() of os.path.normpath()-ed filenames from importlib.metadata.

import os
import importlib.metadata as importlib_metadata  # or the importlib_metadata backport

_py_dist_files = set()
for distribution in importlib_metadata.distributions():
    # distribution.files may be None if the RECORD metadata is missing.
    for file in distribution.files or []:
        _py_dist_files.add(os.path.normpath(os.path.realpath(file.locate())))


def is_python_distribution_file(path):
    # True if the resolved path belongs to an installed Python distribution.
    return os.path.normpath(os.path.realpath(path)) in _py_dist_files
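
Illustrative usage (the brew path is hypothetical; assumes numpy was pip-installed into the environment):

print(is_python_distribution_file("/usr/local/lib/libcairo.dylib"))  # False: brew dylib
import numpy
print(is_python_distribution_file(numpy.__file__))                   # True: wheel file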

> I'm honestly tempted to scrap universal2 support completely and just tell users to petition Apple to provide M1 support for VMware, so that CI/CD on M1 is actually possible and we can forget this fat-binary cross-compile palaver.

> I wouldn't object to that. The less I have to deal with macOS and its oddities, the better... :)

@Legorooj ?

@rokm (Member Author) commented Feb 15, 2022

> I'd probably go for a big set() of os.path.normpath()-ed filenames from importlib.metadata. [...]

Do we have any idea what the performance implications are (e.g., in a conda environment with an assortment of deep learning and GUI frameworks installed)?

@bwoodsend (Member) commented:

> Do we have any idea what the performance implications are (e.g., in a conda environment with an assortment of deep learning and GUI frameworks installed)?

Hmm, it takes about 2 seconds to crunch through the ~200 packages I have in my default Python environment. I believe that full Conda ships about 250 by default, so it shouldn't be much worse, especially compared to the several minutes PyInstaller would likely take to do everything else in such a large environment.
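
For reference, a quick way to reproduce that measurement in any environment (numbers will obviously vary):

# Time the metadata scan in the current environment.
import os
import time
import importlib.metadata

start = time.perf_counter()
files = {
    os.path.normpath(os.path.realpath(f.locate()))
    for dist in importlib.metadata.distributions()
    for f in (dist.files or [])
}
print(f"indexed {len(files)} files in {time.perf_counter() - start:.2f} s")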

@Legorooj (Member) commented:

I'm fairly clueless when it comes to macOS's many architectures; I trust you guys.

@bwoodsend (Member) left a comment:

Hmm, let's just go for it. I'm rather low on patience for Apple's self-inflicted quirks.

@rokm (Member Author) commented Feb 18, 2022

> Hmm, let's just go for it. I'm rather low on patience for Apple's self-inflicted quirks.

OK. We can revisit this and tighten the check later, if necessary (FWIW, the is_python_distribution_file approach seems reasonable to me).

@rokm merged commit 9db19c9 into pyinstaller:develop on Feb 18, 2022
@rokm deleted the macos-arch-support-revision branch on Feb 18, 2022
@rokm added the "backported" (Backported to v4 branch) label on Feb 18, 2022
@github-actions bot locked the conversation as resolved and limited it to collaborators on Nov 16, 2022