Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing info about currently executing shiv to child subprocesses #40

Closed
kennydo opened this issue Jun 21, 2018 · 10 comments
Closed

Passing info about currently executing shiv to child subprocesses #40

kennydo opened this issue Jun 21, 2018 · 10 comments

Comments

@kennydo
Copy link

kennydo commented Jun 21, 2018

I have a particular python script that uses subprocess to create child processes that execute another script. What is the recommended way of passing down the info about the shiv's site directory down to the child?

As an example, suppose I made a shiv like this:

$ shiv requests -o foo.shiv

I then make a parent.py that creates a child process:

import os
import subprocess
import sys

print("Parent sys.executable:", sys.executable)
print("Parent sys.path:", sys.path)
print("Parent sys.argv", sys.argv)
print("Parent PYTHONPATH", os.environ.get("PYTHONPATH"))

child = subprocess.run([
    sys.executable,
    'child.py',
],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
)

print(child)

And then I make a child.py that makes use of requests, like so:

import requests

print(requests)

When I run it, I get output that looks like this:

$ ./foo.shiv parent.py
Parent sys.executable: /Users/kedo/.pyenv/versions/3.6.5/bin/python
Parent sys.path: ['./foo.shiv', '/Users/kedo/.pyenv/versions/3.6.5/lib/python36.zip', '/Users/kedo/.pyenv/versions/3.6.5/lib/python3.6', '/Users/kedo/.pyenv/versions/3.6.5/lib/python3.6/lib-dynload', '/Users/kedo/.local/lib/python3.6/site-packages', '/Users/kedo/.pyenv/versions/3.6.5/lib/python3.6/site-packages', '/Users/kedo/Workspace/shiv/src', '/Users/kedo/.pyenv/versions/3.6.5/lib/python3.6/site-packages/importlib_resources-0.8-py3.6.egg', '/Users/kedo/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg', '/Users/kedo/.pyenv/versions/3.6.5/lib/python3.6/site-packages/wheel-0.31.1-py3.6.egg', '/Users/kedo/.shiv/foo_0550dd79-b895-48c0-91fa-11306e18f9cd/site-packages']
Parent sys.argv ['parent.py']
Parent PYTHONPATH None
CompletedProcess(args=['/Users/kedo/.pyenv/versions/3.6.5/bin/python', 'child.py'], returncode=1, stdout=b'', stderr=b'Traceback (most recent call last):\n  File "child.py", line 1, in <module>\n    import requests\nModuleNotFoundError: No module named \'requests\'\n')

Is there a non-hacky way for parent to know that it's running via a shiv, and that it needs to pass down info about the shiv to the child? Could we set some sort of environment variable in https://github.com/linkedin/shiv/blob/master/src/shiv/bootstrap/__init__.py#L86 that parent.py could check and use?

@lorencarvalho
Copy link
Contributor

Is there a non-hacky way

Not sure about non-hacky, but I can make your example use case by simply swapping sys.executable for sys.path[0]. Only because the first path element is the shiv executable itself. That's not behavior I'd be comfortable relying on in any sort of production capacity, but if you had a foolproof way of accessing the path to the pyz/shiv file itself you could simply use that.

@kennydo
Copy link
Author

kennydo commented Jun 21, 2018

Yea, relying on sys.path[0] is not ideal.

We currently use symlinks to make a certain path point to the latest shiv to use, but it would be nice to be able to move the shiv around (or have shivs in different locations) without having to update parent.py's code to look in the new location.

What I'd like is something like this: https://github.com/linkedin/shiv/compare/master...kennydo:kedo-set-env-var?expand=1. So parent.py can 1) know that it's being run with a shiv and 2) be able to instantiate a child process using the same shiv (since it resolves the filename), all while 3) not hard-coding the path of the shiv.

@lorencarvalho
Copy link
Contributor

@kennydo what about using the shiv as both your script's shebang?

darwin ~ ❱❱❱ shiv -o ~/bin/requests.shiv requests -q
darwin ~ ❱❱❱ cat parent.py
#!/usr/bin/env requests.shiv

import subprocess

child = subprocess.run([
    './child.py',
],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
)

print(child)
darwin ~ ❱❱❱ cat child.py
#!/usr/bin/env requests.shiv

import requests

print(requests)
darwin ~ ❱❱❱ ./parent.py
CompletedProcess(args=['./child.py'], returncode=0, stdout=b"<module 'requests' from '/Users/lcarvalh/.shiv/requests_b97d7357-468b-4b32-a657-2d373c9dda86/site-packages/requests/__init__.py'>\n", stderr=b'')

@kennydo
Copy link
Author

kennydo commented Jun 26, 2018

That approach works, but still requires the child to know the name of the shiv. I think it'd be great if the child file didn't have to know that.

In the meantime, I'm making do with having the parent look at sys.path and then passing down possible shiv extraction directories into the child via PYTHONPATH. It's hacky, but works!

@pfmoore
Copy link
Contributor

pfmoore commented Oct 30, 2018

As a suggestion for a more formal fix, how about this? We add an extra argument to shiv, something like --add-to-pythonpath, and if that option is present, the bootstrap code appends the shiv's site-packages to the PYTHONPATH environment variable before calling the entry point.

This works in my use case (creating a zipapp for pew), at least based on a simple test. I've suggested making the PYTHONPATH modification opt-in at the time the zipapp is created, because for most cases it's not needed, and modifying the environment potentially has other effects, so it should be the user's choice.

If this seems like a reasonable approach, I can create a PR for it.

@lorencarvalho
Copy link
Contributor

@pfmoore sounds great! can we call it --extend-pythonpath? looking forward to a PR

@pfmoore
Copy link
Contributor

pfmoore commented Oct 30, 2018

Sounds good. I'll pull something together tonight.

@pfmoore
Copy link
Contributor

pfmoore commented Oct 30, 2018

@sixninetynine What's the policy on tests? The best way to test the new code would probably be to refactor bootstrap() a bit to make it more testable. But that would end up being extra code changes. The alternative would be to do an integration-style test and build a zipapp that reports $PYTHONPATH and check it includes the value we expect. Do you have a preference?

@lorencarvalho
Copy link
Contributor

@pfmoore yeah, I intend to refactor bootstrap out a bit (for the reason you cite as well as to inch closer to closing #36) so for now a straight integ test or augmenting this integration test should be enough :)

@lorencarvalho
Copy link
Contributor

fixed in #76

thanks @pfmoore !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants