Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subprocesses are not fully isolated when --extend-pythonpath is used #138

Closed
itamarst opened this issue Mar 23, 2020 · 7 comments · Fixed by #150
Closed

Subprocesses are not fully isolated when --extend-pythonpath is used #138

itamarst opened this issue Mar 23, 2020 · 7 comments · Fixed by #150

Comments

@itamarst
Copy link
Contributor

itamarst commented Mar 23, 2020

It is useful to be able to launch subprocesses with the exact same PYTHONPATH as the parent
process, but it doesn't seem possible with shiv at the moment.

For example, if I have a virtualenv, and I either activate it or set PATH to use it, Python subprocesses will import from the virtualenv.

Shiv's --extend-pythonpath is close to that behavior, but not sufficient. Specifically, when the option is enabled, PYTHONPATH for subprocesses is extended (

# add our site-packages to the environment, if requested
if env.extend_pythonpath:
_extend_python_path(os.environ, sys.path[index:])
). That is, if there's already Python package X in the pre-existing PYTHONPATH, it will be used instead of X from the shiv.

The parent process, in contrast, adds the shiv site-packages to the start of sys.path:

# reorder to place our site-packages before any others found
sys.path = sys.path[:index] + sys.path[length:] + sys.path[index:length]

I can think of two ways to address this:

  1. Change --extend-pythonpath to match shiv's internal behavior. This is not backwards-compatible, however.
  2. Add a new option (--override-pythonpath maybe?). Potentially this would also involve deprecating the old option, but perhaps some people have a reason for that variant too.

If you have a preference, I can submit a PR with your preferred option.

@itamarst
Copy link
Contributor Author

itamarst commented Apr 1, 2020

Just checking again; do you (the maintainers) have a preference?

@lorencarvalho
Copy link
Contributor

Hi @itamarst,

Sorry for the delay in getting back to you! Times are strange.

The reason that I implemented PYTHONPATH handling for shiv as an extension was to ensure that the behavior was consistent between the calling program and the subprocess.


Consider this example:

$ ls
setup.py  show_path.py

setup.py:

from setuptools import setup

setup(
    py_modules=["show_path"],
)

show_path.py:

import sys

def main():
    print()
    for index, element in enumerate(sys.path):
        print(f"{index}: {element}")

if __name__ == "__main__":
    main()

If I create a pyz using shiv, this program will simply print it's sys.path and exit:

$ shiv -e show_path.main -o example.pyz . --quiet
$ ./example.pyz

0: ./example.pyz
1: /export/apps/python/3.7.7/lib/python37.zip
2: /export/apps/python/3.7.7/lib/python3.7
3: /export/apps/python/3.7.7/lib/python3.7/lib-dynload
4: /Users/lcarvalh/.shiv/example_78f13afe9894f0980c9df4b020d2eb4760ca75b3ab5fbd7589a61a149801f703/site-packages
5: /export/apps/python/3.7.7/lib/python3.7/site-packages

As you can see, a pyz created with shiv inserts it's site-packages onto sys.path before any other site-packages or dist-packages. Not first, just before any other {site,dist}-packages (this is important later). Let's see how it behaves with PYTHONPATH:

$ shiv -e show_path.main -o example.pyz . --quiet --extend-pythonpath
$ PYTHONPATH=/some/path ./example.pyz

0: ./example.pyz
1: /some/path
2: /export/apps/python/3.7.7/lib/python37.zip
3: /export/apps/python/3.7.7/lib/python3.7
4: /export/apps/python/3.7.7/lib/python3.7/lib-dynload
5: /Users/lcarvalh/.shiv/example_a9ec8bd694bd688c0bcbaa8c1620f0667b88c5fe78e2bd8efaa0bc485fad1042/site-packages
6: /export/apps/python/3.7.7/lib/python3.7/site-packages

The PYTHONPATH variable is 2nd, this is standard Python interpreter behavior, and the frozen environment's site-packages are inserted before the interpreter's.
But, if our PYTHONPATH variable contains a site-packages or dist-packages directory, that changes the outcome:

$ PYTHONPATH=/uh/oh/site-packages ./example.pyz

0: ./example.pyz
1: /Users/lcarvalh/.shiv/example_123d9c51ad18ec10647814ac3c2bed077f07753c94a9c7d1b8fa90cc9b29a283/site-packages
2: /uh/oh/site-packages
3: /export/apps/python/3.7.7/lib/python37.zip
4: /export/apps/python/3.7.7/lib/python3.7
5: /export/apps/python/3.7.7/lib/python3.7/lib-dynload
6: /export/apps/python/3.7.7/lib/python3.7/site-packages

As you can see, now our frozen environment comes before the PYTHONPATH's site-packages, ensuring that the frozen dependencies are searched before any others.


Now, if we add a new module to our package named parent.py, we can observe how sys.path is resolved when the additional Python process is spawned from a pyz:

parent.py:

import subprocess, sys

def main():
    subprocess.run(f"{sys.executable} show_path.py".split()).stdout

Normal pyz creation (note we changing the entry-point of the pyz):

$ shiv -e parent.main -o example.pyz . --quiet
$ ./example.pyz

0: /private/var/folders/4y/x7m68gms1cjcc8tpjlxcwz5h0005tq/T/tmp.TvGEFzcP
1: /export/apps/python/3.7.7/lib/python37.zip
2: /export/apps/python/3.7.7/lib/python3.7
3: /export/apps/python/3.7.7/lib/python3.7/lib-dynload
4: /export/apps/python/3.7.7/lib/python3.7/site-packages

So, pretty similar output to before except that, as expected, the child process doesn't have the frozen environment from the pyz in it's sys.path! This was the whole motivation behind adding the -E --extend-pythonpath argument, so let's see what changes if we add it:

$ shiv -e parent.main -o example.pyz . --quiet --extend-pythonpath
$ ./example.pyz

0: /private/var/folders/4y/x7m68gms1cjcc8tpjlxcwz5h0005tq/T/tmp.TvGEFzcP
1: /export/apps/python/3.7.7/lib/python3.7/site-packages
2: /Users/lcarvalh/.shiv/example_123d9c51ad18ec10647814ac3c2bed077f07753c94a9c7d1b8fa90cc9b29a283/site-packages
3: /export/apps/python/3.7.7/lib/python37.zip
4: /export/apps/python/3.7.7/lib/python3.7
5: /export/apps/python/3.7.7/lib/python3.7/lib-dynload

🚨
I suspect this is what you are unhappy with, and frankly I am too.

If we are adding the frozen environment's site-packages to PYTHONPATH, it should be searched first. IMO shiv is currently violating principle of least surprise, and that's a bug we ought to fix.

I believe the behavior that you desire can be accomplished via https://github.com/linkedin/shiv/compare/itamarst-pythonpath-fixes which just re-orders the PYTHONPATH handling logic to ensure that the entire manipulated sys.path is added to PYTHONPATH, not just the slice from our site-packages index (though looking at the code now, sys.path[index+1:] might suffice?).

We have to use either the whole sys.path or a slice from it rather than the singular site_packages variable because the call to site.addsitedir can add additional items to sys.path via .pth files.

Here's how it behaves with that diff:

$ shiv -e parent.main -o example.pyz . --quiet --extend-pythonpath
$ ./example.pyz

0: /private/var/folders/4y/x7m68gms1cjcc8tpjlxcwz5h0005tq/T/tmp.TvGEFzcP
1: /private/var/folders/4y/x7m68gms1cjcc8tpjlxcwz5h0005tq/T/tmp.TvGEFzcP/example.pyz
2: /export/apps/python/3.7.7/lib/python37.zip
3: /export/apps/python/3.7.7/lib/python3.7
4: /export/apps/python/3.7.7/lib/python3.7/lib-dynload
5: /Users/lcarvalh/.shiv/example_4b13b6efb441c79de0fc36ffb7fcc2fb0c8ca7a99c92d65315e93f2c69f78c8d/site-packages
6: /export/apps/python/3.7.7/lib/python3.7/site-packages

So now the frozen environment's site-packages is on sys.path before any others. I want to make sure that's the case if any PYTHONPATH site-packages are present too:

$ PYTHONPATH=/path/to/site-packages ./example.pyz

0: /private/var/folders/4y/x7m68gms1cjcc8tpjlxcwz5h0005tq/T/tmp.TvGEFzcP
1: /private/var/folders/4y/x7m68gms1cjcc8tpjlxcwz5h0005tq/T/tmp.TvGEFzcP/example.pyz
2: /Users/lcarvalh/.shiv/example_4b13b6efb441c79de0fc36ffb7fcc2fb0c8ca7a99c92d65315e93f2c69f78c8d/site-packages
3: /path/to/site-packages
4: /export/apps/python/3.7.7/lib/python37.zip
5: /export/apps/python/3.7.7/lib/python3.7
6: /export/apps/python/3.7.7/lib/python3.7/lib-dynload
7: /export/apps/python/3.7.7/lib/python3.7/site-packages

That also looks good to me, what do you think?
My only hesitation now is that it feels a little haphazard, especially these last two examples. While it now correctly places the frozen environment's site-packages, the rest of the elements are being shifted around in an inconsistent way that could cause problems.

@itamarst
Copy link
Contributor Author

itamarst commented Apr 3, 2020

Thanks for the thorough response, and for propsing a fix! And no need to apologize for late reply.

The proposed fix does seem like it makes subprocesses have a more consistent-with-parent-PYTHONPATH, yes.

Perhaps unrelated, or perhaps a way to make the last example at least less haphazard: the fact that path order depends on the name of the directory ("site-packages") makes me feel a little uneasy. What were the circumstances that motivated this particular logic? Virtualenvs, for example, don't modify the PYTHONPATH env variable at all, they work by having a different sys.prefix.

@lorencarvalho
Copy link
Contributor

Hi again @itamarst,

Perhaps unrelated, or perhaps a way to make the last example at least less haphazard: the fact that path order depends on the name of the directory ("site-packages") makes me feel a little uneasy. What were the circumstances that motivated this particular logic? Virtualenvs, for example, don't modify the PYTHONPATH env variable at all, they work by having a different sys.prefix.

Putting the dependencies managed by shiv before any others in sys.path is essential to its functionality. Virtualenvs have a different sys.prefix as a side-effect of how they are created (by copying/linking the interpreter binary into the venv directory), whereas pyz files don't include an interpreter, they rely on a shebang.

So, the way we ensure that the dependencies that you include in your pyz are the ones that are always chosen is by extending Python's search path (sys.path) using the standard-library blessed way of adding additional site directories, using the site.addsitedir function [code].

site.addsitedir is a nice little function in that it adds the directory to sys.path and it evaluates any .pth files, but adding always appends, so if you have a third-party library installed into your global site-packages (via sudo pip install or even pip install --user) it will take precedence over what's in the pyz, unless we re-order sys.path and put our site-packages first. Hopefully that explains the rationale.

@itamarst
Copy link
Contributor Author

itamarst commented Apr 6, 2020

I may not have been clear, sorry. The specific thing that bothers me a little is the logic described by your statement "if our PYTHONPATH variable contains a site-packages or dist-packages directory, that changes the outcome". That is, directories with particular names have special logic (somewhere).

@lorencarvalho
Copy link
Contributor

Oh sure, but that's not something that we specifically look for in PYTHONPATH, we always dice up the sys.path to ensure that our dependencies come first [code]. The interpreter will have already included PYTHONPATH items on sys.path by the time it gets to shiv's bootstrapping code, so the usual sys.path manipulation handles the special site-packages/dist-packages detection logic. We just want to make sure that PYTHONPATH cargoes the shiv-manipulated sys.path items to child processes as well.

I think the lack of clarity is that we are using PYTHONPATH as a means of informing subprocesses about the dependencies included by shiv, while simultaneously trying to avoid breaking assumptions about standard PYTHONPATH usage. Does that make sense?

@itamarst
Copy link
Contributor Author

itamarst commented Apr 6, 2020

def _first_sitedir_index():
just... makes me a little uncomfortable since it's based on name. But also this part of Python is full of awful edge cases, and I don't have a particularly strong feeling about it, so don't want to waste any more of your time on it :)

Back to the issue at hand, your proposed patch seems reasonable, if you want to merge it. Thanks for working on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants