Reducing and removing runtime binding errors for CPython extensions on macOS #103306

@rickmark

Feature or enhancement

Utilize -bundle_loader to make C extensions to CPython more reliable.

Pitch

Today the majority of CPython extensions on macOS are built with lazily, dynamically bound symbols (-undefined dynamic_lookup). This is because the linker would otherwise fail for a great many inputs, which in turn is because distutils and other build tools are not using the macOS system linker as intended. When linking a "bundle" (a specialized type of dylib), one can and should pass an executable "bundle loader" (-bundle_loader) that defines all the symbols expected to be ambient before the bundle is loaded. This allows the linker to correctly validate the closure of all symbols when producing the linked product.

Because the python binary can be built against a shared library, the binary would also need to "re-export" all symbols from that shared library. This would make the python executable suitable for use as the bundle loader during extension linking. The -undefined switch can then be removed, and the linker will give genuinely useful information about the extension's ability to execute at build time, for example after an ABI change, rather than breaking at load time.
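To make the difference between the two link modes concrete, here is a minimal sketch of how a build tool might choose its linker arguments. The function name `extension_link_args`, the interpreter path, and the `spam.o`/`spam.so` file names are hypothetical and only for illustration; the flags themselves (`-bundle`, `-bundle_loader`, `-undefined dynamic_lookup`) are the ones discussed above. It is not how distutils is actually structured.

```python
from typing import List


def extension_link_args(python_exe: str, strict: bool = True) -> List[str]:
    """Return linker arguments for building an extension as a Mach-O bundle.

    strict=True  -> ask the linker to resolve undefined symbols against
                    `python_exe` at link time (-bundle_loader).
    strict=False -> defer every undefined symbol to load time
                    (-undefined dynamic_lookup), as is common today.
    """
    args = ["-bundle"]
    if strict:
        args += ["-bundle_loader", python_exe]
    else:
        args += ["-undefined", "dynamic_lookup"]
    return args


if __name__ == "__main__":
    # Hypothetical interpreter path and object file, for illustration only.
    exe = "/usr/local/bin/python3"
    print("cc", *extension_link_args(exe, strict=True), "spam.o", "-o", "spam.so")
    print("cc", *extension_link_args(exe, strict=False), "spam.o", "-o", "spam.so")
```

With the strict variant, a symbol that the interpreter no longer exports would be expected to show up as a link error instead of a failure at import time.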

As far as I can tell, there is no downside to "re-exporting" shared library symbols in the python binary, other than a negligible increase in binary size for the defined symbols.

From my current understanding, LLVM's lld will only accept a "bundle" or an "executable" as the bundle loader. This means we cannot simply pass in the shared library as it currently stands. An alternative to re-exporting the symbols would be to create a "dummy" bundle that exports all the same symbols as the shared library. This dummy bundle can be used as the ABI definition on macOS and should always exactly match the headers used by the extension.

It may be even better to influence lld (and possibly ld64) to accept tbd (text-based stub) files or some other textual symbol format, so as not to have to worry about the dummy bundle or re-exporting. After all, the linker only uses the exported symbols of the executable to verify that all symbols are in fact defined, not to bind them; the loader is not recorded as a load command, as that would create a circular dependency. The -undefined switch is too coarse-grained: if it is on, you have no verification that all symbols are defined.
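The "verify, don't bind" idea can be sketched as a plain set difference over symbol names. The example below assumes you have already captured text listings of the extension's undefined symbols and the loader's exported symbols (for instance from a tool like `nm`); the exact output layout varies between tools, so the parser only takes the last whitespace-separated field of each line. All function names are hypothetical, and `_PyFoo_Removed` is a made-up symbol standing in for an API that no longer exists.

```python
from typing import Iterable, List, Set


def symbol_names(listing: str) -> Set[str]:
    """Collect symbol names from a text listing, one symbol per line.

    Only the last whitespace-separated field is used, so both bare names
    and "address type name" style lines are handled.
    """
    names = set()
    for line in listing.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[-1])
    return names


def missing_symbols(
    ext_undefined: str,
    loader_exported: str,
    also_available: Iterable[str] = (),
) -> List[str]:
    """Return the extension's undefined symbols that nothing provides.

    `also_available` is for symbols supplied by other libraries the
    extension links against (e.g. the C library), not by the loader.
    """
    needed = symbol_names(ext_undefined)
    provided = symbol_names(loader_exported) | set(also_available)
    return sorted(needed - provided)


if __name__ == "__main__":
    # Illustrative listings; addresses and the removed symbol are invented.
    undefined = "    U _PyLong_FromLong\n    U _PyFoo_Removed\n    U _malloc\n"
    exported = (
        "0000000100003f00 T _PyLong_FromLong\n"
        "0000000100004010 T _PyErr_SetString\n"
    )
    print(missing_symbols(undefined, exported, also_available=["_malloc"]))
    # prints ['_PyFoo_Removed']
```

This is only the checking half of what the linker does with a bundle loader; it says nothing about binding, which still happens at load time.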

A simple proof of concept can be generated to create the dummy bundle and to use it in conjunction with the building of included core modules.

Labels: OS-mac; build (The build process and cross-build); type-feature (A feature request or enhancement)