PYTHONPATH is not picked up in isolated mode even if use_environment is 1 #101471

Closed
bazsi opened this issue Jan 31, 2023 · 8 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@bazsi

bazsi commented Jan 31, 2023

Bug report

I am embedding Python into a larger application (https://github.com/syslog-ng/syslog-ng)

While updating to Python 3.x, we started using the new PyConfig-based initialization. It worked nicely with Python 3.10 but is broken with 3.11.1. It might be our fault, but I have now spent the better part of a day figuring out how things are intended to work, reading both the documentation and the related source code.

When implementing our PyConfig-based initialization, I chose the "isolated" mode, as I don't want Python interfering with our signal handlers and would in general like to be as detached from the system as possible.

We ship our own "glue" modules written in Python, and we want their location added to sys.path. We also use a virtualenv at runtime, which we activate ourselves. The glue modules and the virtualenv live in separate locations:

Glue modules: /usr/lib/syslog-ng/python/
Virtualenv: /var/lib/syslog-ng/python-venv/

This is how the interpreter is initialized:

```c
static gboolean
_py_configure_interpreter(gboolean use_virtualenv)
{
  PyConfig config;
  PyConfig_InitIsolatedConfig(&config);

  gboolean success = use_virtualenv ? _py_configure_virtualenv_python(&config)
                     : _py_configure_system_python(&config);
  if (success)
    {
      Py_InitializeFromConfig(&config);
      PyConfig_Clear(&config);
      return TRUE;
    }
  return FALSE;
}
```

_py_configure_virtualenv_python() in turn initializes config.pythonpath_env and config.argv.

The problem:

Starting with 3.11, the value of pythonpath_env is not picked up, even if I set it explicitly. The likely reason is that the Python path calculation was reimplemented from C to Python, in Modules/getpath.py.

It seems that the new implementation does not pick this value up if use_environment is FALSE. This probably worked earlier because use_environment only controlled the getenv() call; since I set pythonpath_env explicitly, the earlier path calculation picked it up anyway.

I can understand that even the path calculation logic is now guarded by use_environment, so I tried setting use_environment to TRUE in my _py_configure_interpreter() function. This did not help, probably because at some point isolated being TRUE resets use_environment to FALSE, even though I set it to TRUE in my own code.

```
Breakpoint 1, config_init_import (config=0x7ffff6a971a0 <_PyRuntime+59904>, compute_path_config=1) at ../Python/initconfig.c:2074
2074	in ../Python/initconfig.c
(gdb) bt
#0  config_init_import (config=0x7ffff6a971a0 <_PyRuntime+59904>, compute_path_config=1) at ../Python/initconfig.c:2074
#1  0x00007ffff65faec1 in _PyConfig_InitImportConfig (config=<optimized out>) at ../Python/initconfig.c:2108
#2  init_interp_main (tstate=tstate@entry=0x7ffff6ab1158 <_PyRuntime+166328>) at ../Python/pylifecycle.c:1117
#3  0x00007ffff66005a3 in pyinit_main (tstate=0x7ffff6ab1158 <_PyRuntime+166328>) at ../Python/pylifecycle.c:1230
#4  Py_InitializeFromConfig (config=<optimized out>) at ../Python/pylifecycle.c:1261
#5  Py_InitializeFromConfig (config=<optimized out>) at ../Python/pylifecycle.c:1239
#6  0x00007ffff70a9821 in _py_configure_interpreter (use_virtualenv=1) at /source/modules/python/python-main.c:515
#7  0x00007ffff70a98b4 in _py_init_interpreter (use_virtualenv=1) at /source/modules/python/python-main.c:593
#8  0x00007ffff70a992c in python_module_init (context=0x555555578ba0, args=0x0) at /source/modules/python/python-plugin.c:77
#9  0x00007ffff7eec245 in plugin_load_module (context=0x555555578ba0, module_name=0x5555555b1ea0 "mod-python", args=0x0) at /source/lib/plugin.c:405
#10 0x00007ffff7eebfa0 in plugin_find (context=0x555555578ba0, plugin_type=1, plugin_name=0x555555576590 "python") at /source/lib/plugin.c:337
#11 0x00007ffff7ec2c22 in cfg_find_plugin (cfg=0x555555578b90, plugin_type=1, plugin_name=0x555555576590 "python") at /source/lib/cfg.c:239
#12 0x00007ffff7efc7c5 in main_parse (lexer=0x555555610d50, dummy=0x7fffffffe868, arg=0x0) at /source/lib/cfg-grammar.y:514
#13 0x00007ffff7ec9013 in cfg_parser_parse (self=0x7ffff7fbb040 <main_parser>, lexer=0x555555610d50, instance=0x7fffffffe868, arg=0x0) at /source/lib/cfg-parser.c:408
#14 0x00007ffff7ec39d5 in cfg_run_parser (self=0x555555578b90, lexer=0x555555610d50, parser=0x7ffff7fbb040 <main_parser>, result=0x7fffffffe868, arg=0x0) at /source/lib/cfg.c:596
#15 0x00007ffff7ec3ceb in cfg_read_config (self=0x555555578b90, fname=0x55555555b800 "etc/syslog-ng-python.conf", preprocess_into=0x0) at /source/lib/cfg.c:683
#16 0x00007ffff7ee343e in main_loop_read_and_init_config (self=0x7ffff7fbe660 <main_loop>) at /source/lib/mainloop.c:618
#17 0x00005555555569e3 in main (argc=1, argv=0x7fffffffea28) at /source/syslog-ng/main.c:307
(gdb) p config
$1 = (PyConfig *) 0x7ffff6a971a0 <_PyRuntime+59904>
(gdb) p config.use_environment
$2 = 0
(gdb) p *config
$5 = {_config_init = 3, isolated = 1, use_environment = 0, dev_mode = 0, install_signal_handlers = 0, use_hash_seed = 0, hash_seed = 0, faulthandler = 0, tracemalloc = 0, import_time = 0,
  code_debug_ranges = 1, show_ref_count = 0, dump_refs = 0, dump_refs_file = 0x0, malloc_stats = 0, filesystem_encoding = 0x5555555ed400 L"ANSI_X3.4-1968",
  filesystem_errors = 0x555555564e80 L"surrogateescape", pycache_prefix = 0x0, parse_argv = 0, orig_argv = {length = 1, items = 0x555555606e10}, argv = {length = 1,
    items = 0x555555582c10}, xoptions = {length = 0, items = 0x0}, warnoptions = {length = 0, items = 0x0}, site_import = 1, bytes_warning = 0, warn_default_encoding = 0, inspect = 0,
  interactive = 0, optimization_level = 0, parser_debug = 0, write_bytecode = 1, verbose = 0, quiet = 0, user_site_directory = 0, configure_c_stdio = 0, buffered_stdio = 1,
  stdio_encoding = 0x5555555f09b0 L"ANSI_X3.4-1968", stdio_errors = 0x55555560bab0 L"surrogateescape", check_hash_pycs_mode = 0x55555560a8e0 L"default", use_frozen_modules = 1,
  safe_path = 1, pathconfig_warnings = 0, program_name = 0x0, pythonpath_env = 0x555555586d40 L"/install/etc/python:/install/lib/syslog-ng/python", home = 0x0, platlibdir = 0x0,
  module_search_paths_set = 0, module_search_paths = {length = 0, items = 0x0}, stdlib_dir = 0x0, executable = 0x0, base_executable = 0x0, prefix = 0x0, base_prefix = 0x0,
  exec_prefix = 0x0, base_exec_prefix = 0x0, skip_source_first_line = 0, run_command = 0x0, run_module = 0x0, run_filename = 0x0, _install_importlib = 1, _init_main = 1,
  _isolated_interpreter = 0, _is_python_build = 0}
```

As you can see, use_environment is FALSE, even though I originally set it to TRUE.

As a workaround, I switched to the "normal" Python config and reset most config values to match those of the isolated config:

```diff
@@ -492,8 +504,14 @@ static gboolean
 _py_configure_interpreter(gboolean use_virtualenv)
 {
   PyConfig config;
-  PyConfig_InitIsolatedConfig(&config);
-
+  PyConfig_InitPythonConfig(&config);
+
+  /* to pick up PYTHONPATH */
+  config.configure_c_stdio = 0;
+  config.install_signal_handlers = 0;
+  config.parse_argv = 0;
+  config.pathconfig_warnings = 0;
+  config.user_site_directory = 0;
   gboolean success = use_virtualenv ? _py_configure_virtualenv_python(&config)
                      : _py_configure_system_python(&config);
   if (success)
```

This seems to work now, but I have a feeling this behaviour is not entirely intentional.

Your environment

I am doing all the above in a Debian testing based container:

```
(dbld)bazsi@bzorp:/install$ dpkg -l python\* | grep ii
ii  python3                3.11.1-2      amd64        interactive high-level object-oriented language (default python3 version)
ii  python3-dbg            3.11.1-2      amd64        debug build of the Python 3 Interpreter (version 3.11)
ii  python3-dev            3.11.1-2      amd64        header files and a static library for Python (default)
ii  python3-distutils      3.10.8-1      all          distutils package for Python 3.x
ii  python3-lib2to3        3.10.8-1      all          Interactive high-level object-oriented language (lib2to3)
ii  python3-minimal        3.11.1-2      amd64        minimal subset of the Python language (default python3 version)
ii  python3-pip            22.3.1+dfsg-2 all          Python package installer
ii  python3-pip-whl        22.3.1+dfsg-2 all          Python package installer (pip wheel)
ii  python3-pkg-resources  65.6.3-1      all          Package Discovery and Resource Access using pkg_resources
ii  python3-ply            3.11-5        all          Lex and Yacc implementation for Python3
ii  python3-setuptools     65.6.3-1      all          Python3 Distutils Enhancements
ii  python3-setuptools-whl 65.6.3-1      all          Python Distutils Enhancements (wheel package)
ii  python3-venv           3.11.1-2      amd64        venv module for python3 (default python3 version)
ii  python3-wheel          0.38.4-1      all          built-package format for Python
ii  python3.11             3.11.1-2      amd64        Interactive high-level object-oriented language (version 3.11)
ii  python3.11-dbg         3.11.1-2      amd64        Debug Build of the Python Interpreter (version 3.11)
ii  python3.11-dev         3.11.1-2      amd64        Header files and a static library for Python (v3.11)
ii  python3.11-minimal     3.11.1-2      amd64        Minimal subset of the Python language (version 3.11)
ii  python3.11-venv        3.11.1-2      amd64        Interactive high-level object-oriented language (pyvenv binary, version 3.11)
```
@bazsi bazsi added the type-bug An unexpected behavior, bug, or error label Jan 31, 2023
@bazsi
Author

bazsi commented Jan 31, 2023

This might be the culprit, though I am not sure: the _PyConfig_Read() function in initconfig.c (lines 2940-2945):

```c
assert(config->isolated >= 0);
if (config->isolated) {
    config->safe_path = 1;
    config->use_environment = 0;
    config->user_site_directory = 0;
}
```

My breakpoint on this line triggers even when I start with config.use_environment == 1:

```
Breakpoint 1, _PyConfig_Read (config=0x7fffffffbd20, compute_path_config=0) at ../Python/initconfig.c:2942
2942 ../Python/initconfig.c: No such file or directory.
(gdb) c
```

bazsi added a commit to bazsi/syslog-ng that referenced this issue Jan 31, 2023
This is an attempt to work around
python/cpython#101471 until a more permanent
solution is found.

Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
@zooba
Member

zooba commented Feb 3, 2023

This behaviour is intentional. The documentation is likely unclear, but the isolated flag overrules use_environment (and the others you noticed).

If you want to pick up paths from the environment, don't use an isolated config, or (much much better), read them yourself and add them to the search path directly. You might even consider using a different variable for your app, to avoid being broken by users who set PYTHONPATH for other purposes.
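A minimal sketch of the "read them yourself" half of this suggestion, assuming a POSIX-style ':'-separated value. split_search_path() and free_search_path() are hypothetical helpers (not part of the Python C API), and an application-specific variable such as SYSLOG_NG_PYTHONPATH is only an example name. The resulting directories would still have to be added to sys.path by the embedding code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Release an array returned by split_search_path(). */
void
free_search_path(char **entries)
{
  if (!entries)
    return;
  for (size_t i = 0; entries[i]; i++)
    free(entries[i]);
  free(entries);
}

/* Split a ':'-separated list of directories into a NULL-terminated array of
 * newly allocated strings. A NULL or empty value yields an empty array, and
 * empty components (as in "a::b") are skipped. Returns NULL only if memory
 * allocation fails. */
char **
split_search_path(const char *value)
{
  size_t n_colons = 0;
  if (value)
    for (const char *p = value; *p; p++)
      if (*p == ':')
        n_colons++;

  /* At most n_colons + 1 components, plus the terminating NULL;
   * calloc() leaves every unused slot NULL. */
  char **entries = calloc(n_colons + 2, sizeof(char *));
  if (!entries || !value)
    return entries;

  size_t count = 0;
  const char *start = value;
  for (;;)
    {
      const char *end = strchr(start, ':');
      size_t len = end ? (size_t) (end - start) : strlen(start);
      if (len > 0)
        {
          char *entry = malloc(len + 1);
          if (!entry)
            {
              free_search_path(entries);
              return NULL;
            }
          memcpy(entry, start, len);
          entry[len] = '\0';
          assert(count <= n_colons); /* never writes past the last slot */
          entries[count++] = entry;
        }
      if (!end)
        break;
      start = end + 1;
    }
  return entries;
}
```

Called as split_search_path(getenv("SYSLOG_NG_PYTHONPATH")), an unset variable simply yields an empty list. Skipping empty components is a choice made for this sketch.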

@zooba zooba closed this as completed Feb 3, 2023
@bazsi
Author

bazsi commented Feb 3, 2023 via email

@zooba
Member

zooba commented Feb 6, 2023

There aren't really any "standard" directories once you're embedding the interpreter - it's all up to you at that point. So make sure you include the directory that has the standard library (putting the stdlib into a zip file is often a good idea, too), and the directory that has the standard library extension modules, and also any other directories you want to reference.
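To illustrate what "the directory that has the standard library" and "the directory that has the standard library extension modules" look like on Linux, here is a sketch that derives the three stdlib entries seen in the sys.path listings in this thread (the pythonXY.zip file, the pythonX.Y directory, and its lib-dynload subdirectory) from an installation prefix. It assumes the Debian-style POSIX layout with platlibdir "lib"; stdlib_search_path() and the STDLIB_PATH_* constants are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define STDLIB_PATH_COUNT 3   /* zip file, stdlib dir, lib-dynload */
#define STDLIB_PATH_MAX   512 /* buffer size for each entry */

/* Fill out[] with the standard-library entries of a POSIX-layout CPython
 * installed under prefix, in the order they appear on sys.path:
 *   <prefix>/lib/python<major><minor>.zip
 *   <prefix>/lib/python<major>.<minor>
 *   <prefix>/lib/python<major>.<minor>/lib-dynload
 * Assumes platlibdir is "lib"; other layouts need a different template.
 * Returns 0 on success, -1 if an entry does not fit in STDLIB_PATH_MAX. */
int
stdlib_search_path(const char *prefix, int major, int minor,
                   char out[STDLIB_PATH_COUNT][STDLIB_PATH_MAX])
{
  assert(prefix != NULL);

  /* Drop one trailing slash so "/usr/" and "/usr" (and "/") give clean paths. */
  size_t len = strlen(prefix);
  if (len > 0 && prefix[len - 1] == '/')
    len--;
  int plen = (int) len;

  int n[STDLIB_PATH_COUNT];
  n[0] = snprintf(out[0], STDLIB_PATH_MAX, "%.*s/lib/python%d%d.zip",
                  plen, prefix, major, minor);
  n[1] = snprintf(out[1], STDLIB_PATH_MAX, "%.*s/lib/python%d.%d",
                  plen, prefix, major, minor);
  n[2] = snprintf(out[2], STDLIB_PATH_MAX, "%.*s/lib/python%d.%d/lib-dynload",
                  plen, prefix, major, minor);

  for (int i = 0; i < STDLIB_PATH_COUNT; i++)
    if (n[i] < 0 || n[i] >= STDLIB_PATH_MAX)
      return -1; /* encoding error or truncated output */
  return 0;
}
```

For example, stdlib_search_path("/usr", 3, 11, dirs) fills dirs with /usr/lib/python311.zip, /usr/lib/python3.11 and /usr/lib/python3.11/lib-dynload, matching the isolated-mode listing from the embedding application.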

(Personally if I had my way, I'd pull all the "standard" path configuration out of the DLL and put it in python.exe, since it really does relate more to the entry point than to the runtime. But we're well past the point where that would be a compatible change, unfortunately.)

@bazsi
Author

bazsi commented Feb 6, 2023

In my embedded use-case I want to allow the use of the Python stdlib.

I don't want to replicate what is in Modules/getpath.py to achieve that.

@zooba
Member

zooba commented Feb 6, 2023

If you installed Python as part of your app (which is what we call "embedding"), then you should know the directory already. It's going to be relative to your app, at a location you get to decide. Just hard code it in relative to your install dir.

If you're using a separate install of Python, then you may as well just assume the locations relative to the install location, since you already have to assume a lot of other stuff about it to load and initialise the DLL directly. Including the entire runtime as part of your application is much safer.

@bazsi
Author

bazsi commented Feb 6, 2023

> If you installed Python as part of your app (which is what we call "embedding"), then you should know the directory already. It's going to be relative to your app, at a location you get to decide. Just hard code it in relative to your install dir.
>
> If you're using a separate install of Python, then you may as well just assume the locations relative to the install location, since you already have to assume a lot of other stuff about it to load and initialise the DLL directly. Including the entire runtime as part of your application is much safer.

I am using the system Python on a Linux distribution, and I don't know where libpython.so resides; I just link against it with -lpython3.10, as specified by the python-3.10-embed.pc pkg-config file (which our configure script pulls in).

```
$ cat /usr/lib/x86_64-linux-gnu/pkgconfig/python-3.10-embed.pc
# See: man pkg-config
prefix=/usr
exec_prefix=${prefix}
libdir=/usr/lib/x86_64-linux-gnu
includedir=${prefix}/include

Name: Python
Description: Embed Python into an application
Requires:
Version: 3.10
Libs.private: -lcrypt -ldl  -lm
Libs: -L${libdir} -lpython3.10
Cflags: -I${includedir}/python3.10
```

Although the pkg-config file specifies prefix and exec_prefix, using them would start me down the path of reimplementing getpath.py, which takes those values and calculates sys.path.

If I start python3 without any environment variables set, I get this sys.path:

```
$ python3
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '/usr/local/lib/python3.10/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.10/dist-packages']
>>> 
```

If I start it using -I:
```
$ python3 -I
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '/usr/local/lib/python3.10/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.10/dist-packages']
>>> 
```

When I run this inside my embedding application in isolated mode, sys.path contains the following (this is 3.11; the 3.10 examples above were from my workstation):

```
['/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', '/install/var/python-venv/lib/python3.11/site-packages']
```

The virtualenv was activated; it still has the directory for the system packages. How do I come up with the rest of that list?

Thanks for your help.

@zooba
Member

zooba commented Feb 6, 2023

I know a lot less about embedding on Linux, but I recognise that embedding the whole runtime is more difficult than on Windows.

You might be best off just adding the few calls to read PYTHONPATH manually and insert it into sys.path before running any of your user code.
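A sketch of the ordering step of that approach, under stated assumptions: after Py_InitializeFromConfig() succeeds, the embedding code would fetch sys.path with PySys_GetObject("path") (a borrowed reference), convert each new directory with PyUnicode_DecodeFSDefault(), and insert it with PyList_Insert(), which does not steal the reference, so each string needs a Py_DECREF() afterwards. The pure-C helper below, plan_sys_path_prepend() (a hypothetical name), only decides which entries to insert: directories already on sys.path or repeated in the input are skipped, and the i-th survivor belongs at index i, so the entries keep their relative order and stay ahead of the standard library, as PYTHONPATH entries normally do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if dir equals one of the first n entries of list; the scan also
 * stops at a NULL entry, so n may be SIZE_MAX for a NULL-terminated list. */
static int
list_contains(const char *const *list, size_t n, const char *dir)
{
  for (size_t i = 0; i < n && list[i]; i++)
    if (strcmp(list[i], dir) == 0)
      return 1;
  return 0;
}

/* Plan which extra directories to prepend to sys.path.
 * existing: current sys.path entries, NULL-terminated.
 * extra:    directories to add, NULL-terminated, in priority order.
 * out:      receives the entries to insert; needs room for as many entries
 *           as extra has. The i-th entry belongs at sys.path index i.
 * Returns the number of entries written to out. */
size_t
plan_sys_path_prepend(const char *const *existing, const char *const *extra,
                      const char **out)
{
  assert(existing != NULL && extra != NULL && out != NULL);

  size_t count = 0;
  for (size_t i = 0; extra[i]; i++)
    {
      /* Skip directories already on sys.path or already planned. */
      if (list_contains(existing, SIZE_MAX, extra[i])
          || list_contains(out, count, extra[i]))
        continue;
      out[count++] = extra[i];
    }
  return count;
}
```

With the pythonpath_env value from the gdb dump in the original report, /install/etc/python and /install/lib/syslog-ng/python would be planned for indices 0 and 1.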

Genfood pushed a commit to Genfood/syslog-ng that referenced this issue Jun 14, 2023
This is an attempt to work around
python/cpython#101471 until a more permanent
solution is found.

Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
2 participants