removing PYTHONHOME dependency from a static libpython? #57

Open
mvphilip opened this issue Aug 28, 2020 · 4 comments

Comments

@mvphilip

Hello, first, thank you for a great package. I spent quite a bit of time trying, by trial and error, to do what python-build-standalone already does well.

My question: I have a simple C program that is essentially the example from the Python documentation on how to embed libpython. I've managed to compile it into a fully static executable. Since you've already solved module importing for a Python executable with PyOxidizer, do you have any thoughts or suggestions for the embedded-libpython side? I mean beyond the trivial mechanism of storing the files in the executable and extracting them to local disk or shared memory (shm) when needed. I do not want to modify the Python source if I can help it, as that would make any method highly dependent on the specific Python version. But perhaps modifying Python's module initialization to load from memory instead of from files is the easiest way forward.

Any feedback would be highly appreciated.

Thank you,

-mp

@mvphilip
Author

I would also like to ask whether you would be interested in how I managed to build a static executable from the python-build-standalone output. In addition, I'm working on reducing the size of the PYTHONHOME dependency for my embedded case, and I may be able to provide a post-build script that generates a minimal "install" directory for static embedded builds as well.

Thank you,

-mp

@indygreg
Owner

For importing modules from memory, have you seen oxidized_importer? https://gregoryszorc.com/blog/2020/05/10/using-rust-to-power-python-importing-with-oxidized_importer/

As for compiling the output of python-build-standalone, the process varies depending on the platform, programming language, and python-build-standalone distribution used.

On Linux, if you want a fully static executable, you'll need to use the musl libc distribution. You should be able to link the python/install/lib/libpython3.8.a static library into your executable and have a fully embedded Python. If you are OK having a dependency on e.g. libc.so.6, you can use the non-musl Linux distribution. I just noticed that this distribution only ships a dynamically linked libpython. It could be useful to distribute a statically linked version as well so it can be linked more easily. But on Linux, a static archive is essentially just a bundle (much like a tar archive) of the constituent object files, and the python-build-standalone distribution already contains these .o files and annotates them in the PYTHON.json file. So you could statically link by feeding all those .o files into your link command, or you could run ar to bundle all those .o files into a .a.

It's a legitimate feature request to improve the documentation here. It would also be a legitimate request to ship scripts in the distribution to help with this, e.g. you could run a scripts/create-static-archive.py script to produce a static library from the appropriate .o files.

@mvphilip
Author

Thank you for the quick reply. Perhaps I wasn't clear in my question. I already have a fully static executable on Linux, built by linking to the static libraries after python-build-standalone is run. The linking is fine. My issue is the ancillary files that Python requires even when embedded. For example, the .py files of the encodings package appear to be required by Py_Initialize(). This has nothing to do with C library linking. Python seems to need PYTHONHOME defined and the files laid out in a particular directory structure.

In the docs/status.rst I see you've also run into this issue:

```
test_executable_without_cwd (test.test_subprocess.ProcessTestCaseNoPoll) ... Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: initfsencoding: Unable to get the locale encoding
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00007fd77c231740 (most recent call first):
FAIL
```

This is exactly my problem. It goes away if I specify a PYTHONHOME env var to python-build-standalone's build path.

Thank you for oxidized_importer. I believe I can use it to find the minimum set of files required by libpython, and then put together a temporary file store or something similar so that Py_Initialize() can run.

I will see what I can do about a script to produce a static archive. In the meantime, in case someone stumbles across this, here is a snippet that should help you compile.

Thank you again,

-mp

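To compile, use the following command. The library order is deliberate because the archives depend on one another. PYTHON_DIST_PATH is the directory where you extracted the python-build-standalone tar file. I have not tested this with Python 3.8, but the ordering should be the same. Note that in 3.8 the library and include directory drop the "m" suffix (libpython3.8.a, include/python3.8).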

```sh
x86_64-linux-musl-g++ -static test.cxx \
    -I"$PYTHON_DIST_PATH/python/install/include/python3.7m" \
    "$PYTHON_DIST_PATH/python/install/lib/libpython3.7m.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libuuid.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libpanelw.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libdb.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libsqlite3.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libbz2.a" \
    "$PYTHON_DIST_PATH/python/build/lib/liblzma.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libz.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libreadline.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libffi.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libtk8.6.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libtcl8.6.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libX11.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libxcb.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libXau.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libncursesw.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libtls.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libssl.a" \
    "$PYTHON_DIST_PATH/python/build/lib/libcrypto.a" \
    -lpthread -ldl -lutil -lm
```

And here is a sample test.cxx to check that the build above works:

```cpp
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <stdio.h>
#include <stdlib.h>  // exit(); Python.h also includes this, but be explicit

int main(int argc, char* argv[])
{
    wchar_t* program = Py_DecodeLocale(argv[0], NULL);
    if (program == NULL) {
        fprintf(stderr, "Fatal error: cannot decode argv[0]\n");
        exit(1);
    }
    Py_SetProgramName(program); /* optional but recommended */
    Py_Initialize();
    PyRun_SimpleString("from time import time,ctime\n"
                       "print('Today is', ctime(time()))\n");
    if (Py_FinalizeEx() < 0) {
        exit(120);
    }
    PyMem_RawFree(program);
    return 0;
}
```

@indygreg
Owner

The overarching issue here is that the Python runtime can't find the .py files constituting the Python standard library. That's likely because the default search path compiled into the binary reflects the build environment instead of the run-time layout.
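
To make the search path follow the run-time layout, a program can derive the standard library's location from its own executable path. The following is a minimal sketch in plain C. It assumes Linux (/proc/self/exe) and a hypothetical layout in which the standard library ships at a fixed path relative to the binary. The names stdlib_dir_from_exe and current_exe are illustrative and are not part of any Python API:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write "<directory of exe>/<rel>" into out (capacity n).
 * `rel` encodes an assumed layout, e.g. "../lib/python3.8".
 * Returns 0 on success, -1 if the result does not fit. */
static int stdlib_dir_from_exe(const char *exe, const char *rel,
                               char *out, size_t n)
{
    assert(exe != NULL && rel != NULL && out != NULL);
    const char *slash = strrchr(exe, '/');
    int written;
    if (slash != NULL) {
        /* Keep everything before the last '/', i.e. the directory. */
        written = snprintf(out, n, "%.*s/%s", (int)(slash - exe), exe, rel);
    } else {
        /* Bare program name: assume the current directory. */
        written = snprintf(out, n, "./%s", rel);
    }
    return (written < 0 || (size_t)written >= n) ? -1 : 0;
}

/* Linux-only: resolve the running binary through /proc/self/exe.
 * readlink() does not NUL-terminate, so leave room for the terminator. */
static int current_exe(char *out, size_t n)
{
    assert(out != NULL && n >= 2);
    ssize_t len = readlink("/proc/self/exe", out, n - 1);
    if (len < 0 || (size_t)len >= n - 1) {
        return -1; /* error, or the path was possibly truncated */
    }
    out[len] = '\0';
    return 0;
}
```

For example, for an executable at /opt/app/bin/myprog with `rel` set to "../lib/python3.8", the helper produces "/opt/app/bin/../lib/python3.8". That directory is what you would then hand to Python's path configuration APIs.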

(import encodings is usually the first import serviced by the Python interpreter during its initialization that is also backed by a .py/.pyc file - so if you see a failure to import encodings it almost always means the Python standard library couldn't be located.)

You have a few options for this.

One is to call a Python C API to set the path of the Python standard library based on the executable path. You could use Py_SetPythonHome() in place of an environment variable. Or if you are targeting Python 3.8+, I highly recommend using the new initialization APIs introduced in that version: https://docs.python.org/3/c-api/init_config.html. You can set module_search_paths to an explicit list of paths to use for sys.path upon interpreter initialization. See https://docs.python.org/3/c-api/init_config.html#c.Py_InitializeFromConfig for an example.

Modules like encodings are imported as part of interpreter startup via Py_Initialize(). So you are limited to using the Python module importers that are part of the standard library to service imports during Py_Initialize(). However, there are crafty ways around this and these workarounds are part of what makes PyOxidizer able to run fully statically linked binaries with no .py file dependencies and no run-time file extraction to the filesystem!

The solution is described in writing at https://github.com/indygreg/PyOxidizer/blob/dc74b305a6f0fe763d04881d3771e484bc4d9659/pyembed/src/technotes.rs and in code in the pyembed Rust crate. The relevant code begins at https://github.com/indygreg/PyOxidizer/blob/dc74b305a6f0fe763d04881d3771e484bc4d9659/pyembed/src/interpreter.rs#L336. See also https://github.com/indygreg/PyOxidizer/blob/dc74b305a6f0fe763d04881d3771e484bc4d9659/pyembed/src/importer.rs#L1501. The short version is we use the multi-phase initialization API introduced in Python 3.8. After the 1st phase, we construct an OxidizedImporter instance and register it on sys.meta_path so it can service imports [from memory]. We then proceed with the rest of interpreter initialization. Since the meta-path importer is registered, it is called to service import encodings and other .py-implemented modules during interpreter startup. These modules are retrieved from memory (if configured in that mode) and everything just works.

If you are using Python 3.7, the hacks to make this all work are a lot uglier, but achievable. I highly recommend targeting Python 3.8. If you want to use Python 3.7, I can send you links to the relevant code. I think PyOxidizer Git commit ca9a4bb0856a029bc05e00532c8e5f3651496dd6 is the final one before we dropped Python 3.7 support and switched to the 3.8+ APIs.
