Allow running scripts with dependencies using pipx #913

Closed
pfmoore opened this issue Nov 21, 2022 · 23 comments · Fixed by #916
Labels
enhancement New feature or request

Comments

@pfmoore
Member

pfmoore commented Nov 21, 2022

How would this feature be useful?
At the moment, to run a Python script that depends on 3rd party packages, it is necessary to manually create a virtual environment, populate it with the dependencies, and then run the script. Afterwards you have to delete the environment (or keep it if you think you're likely to re-run the script, and remember to delete it later).

pipx run allows you to run packages in a temporary venv with all dependencies set up, but this doesn't extend to scripts. It can run a script, but it doesn't use any of the venv mechanisms for that: it just runs the script with the system Python interpreter.

Describe the solution you'd like
If a script needs dependencies, use a temporary virtualenv, the same as for packages, and run the script in that.

To determine if a script needs dependencies, a very simplistic approach would be to check if --pip-args was specified. So invocation would be something like:

pipx run --pip-args numpy file:myscript.py

Even better would be to parse the script, looking for an embedded list of dependencies to install. This would require agreeing on a format for the list. My initial proposal would be something simple, like:

# Requirements:
# <Specifier>
# <Specifier>
<blank line>

The specifiers could be any requirement spec acceptable to pip, so numpy, or click>=7.0, or even a URL.

Describe alternatives you've considered
The pip-run command offers similar functionality, but it recreates the environment every time, which makes for very slow runtimes. It also has (IMO) a clumsy UI for specifying the packages to install and the script invocation.

Possible Issues
Script dependencies could potentially change more frequently than package dependencies do. So there's a potential for the cached environment to no longer match the declared requirements. This is also a problem with a package, though, so it's possibly something we can live with. Maybe having an option to re-create the cached environment would be enough to alleviate this issue?
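One way to picture the stale-cache concern: if the cached environment were keyed on the declared requirements, editing the list would select a fresh environment rather than reusing an outdated one. The following is a minimal sketch under that assumption. The helper name `env_cache_key` is hypothetical, and nothing here is taken from pipx's actual caching logic.

```python
import hashlib


def env_cache_key(script_path: str, requirements: list[str]) -> str:
    """Return a stable key that changes when the script path or requirements change.

    Hypothetical helper for illustration only; not pipx's implementation.
    """
    # Sort so that reordering the same requirements does not invalidate the cache.
    normalized = "\n".join(sorted(req.strip() for req in requirements))
    payload = f"{script_path}\n{normalized}".encode("utf-8")
    # A short prefix of the digest is enough to name a cache directory.
    return hashlib.sha256(payload).hexdigest()[:16]
```

With a key like this, an explicit "re-create the environment" option would only be needed when the resolved versions should be refreshed, not when the requirement list itself changes.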

Implementation
I would be happy to create a PR implementing this feature, if the view is that it would be a good idea.

dukecat0 added the enhancement label Nov 21, 2022
@uranusjr
Member

Similar request in the past: #562. If we’re doing that I think a comment (or maybe docstring?) would be better than a global __requires__.

@pfmoore
Member Author

pfmoore commented Nov 22, 2022

Thanks for the link. I did a search but didn't spot that request. I agree a comment is better than __requires__, or anything that affects runtime, as the data is static and should be clearly identifiable as so in the source code.

I'll start work on a PR for this, as the comments on #562 seemed favourable.

@agoose77

agoose77 commented Nov 22, 2022

I wrote a reply to @pfmoore in jaraco/pip-run#52 (comment), but I'm following up here so that I can track this discussion :)

In that comment, I suggested making a plugin system for pipx, so that this would generalise beyond a single Python script with a particular metadata format.

For example, Conda used to support (at least, from my reading around; I never used it) specifying package dependencies inside a Jupyter Notebook. This meant that Conda could provision an environment in which to run the notebook from the notebook itself. It would be fun to support that with pipx, e.g.

pipx run-script /tmp/app.ipynb -- some-args

The plugin responsible for integrating this with pipx would build a wheel in which the app entrypoint equates to jupyter run <NOTEBOOK>.

Let me be clear, I'm not saying that everyone should go around creating notebook binary applications. But rather, Hatch's plugin system has been fantastically useful as a package author, and I can see how generalising it here would be handy.

@petsuter

(PEP 484 defines a # type: ignore comment, and mypy also knows about e.g. # mypy: ignore-errors comments and lines combined with # noqa comments for linters:

import PIL # type: ignore # noqa

Maybe something like this would also be good here?

import PIL # type: ignore # noqa # pipx: Pillow>=9.0.0

)

@pfmoore
Member Author

pfmoore commented Nov 22, 2022

In that comment, I suggested making a plugin system for pipx, so that this would generalise beyond a single Python script with a particular metadata format.

Personally, I think this should start small - let's not over-engineer things to start with. A plugin system would be a far bigger change, and should probably tie in to much more than just this command (plugins for venv creation, for installing dependencies, for Python interpreter discovery, ...) I'm not at all sure anything like that is needed in pipx, and I don't want to start down that route with this change.

Making this functionality plugin-based can be done as a follow-up PR [1] if people are interested.

PEP 484 defines a # type: ignore comment, and mypy also knows about e.g. # mypy: ignore-errors comments and lines combined with # noqa comments for linters

I personally hate that style. It would also be incredibly complex to parse (particularly in combination with the other annotations) and would mean that the information could be scattered throughout the code (imports can go anywhere). It would also suggest that a construct like

if WINDOWS:
    import colorama # pipx: colorama

would only install colorama on Windows, which isn't something we can implement.

So no, I'm a strong -1 on this idea.

Footnotes

  1. By someone else - I don't have any particular interest in doing it myself.

@agoose77

agoose77 commented Nov 22, 2022

Personally, I think this should start small - let's not over-engineer things to start with. A plugin system would be a far bigger change, and should probably tie in to much more than just this command (plugins for venv creation, for installing dependencies, for Python interpreter discovery, ...) I'm not at all sure anything like that is needed in pipx, and I don't want to start down that route with this change.

Right, there are two sides to this; the end user experience, and the means of actually delivering something from a development perspective. With that in mind

It would also be incredibly complex to parse (particularly in combination with the other annotations)

Agreed. I think using comments would be a better solution than requiring the definition to be valid Python, though. It would be nice if tools that are provisioning the environment didn't need to evaluate Python code (even via literal_eval). Although we're currently talking about this in the context of pipx, it would be beneficial to tooling to avoid that hard dependency unless we require it. I'd prefer something like TOML (as it's in the stdlib, and is what we use for pyproject.toml), e.g.

# dependencies = ["numpy"]
import numpy

I note that your original comment suggests a custom syntax, which I'm also neutral on. The benefit of using TOML is that we don't need any additional parsing logic beyond

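import tomllib


def read_dependencies(path):
    """Parse the leading comment block of a script as TOML and return its dependencies."""
    leading_comments = []
    with open(path, encoding="utf-8") as file:
        for line in file:
            # Skip a shebang that appears before any metadata comment.
            if not leading_comments and line.startswith("#!"):
                continue
            # The metadata block ends at the first non-comment line.
            if not line.startswith("#"):
                break
            leading_comments.append(line.lstrip("#").strip())
    try:
        metadata = tomllib.loads("\n".join(leading_comments))
    except tomllib.TOMLDecodeError:
        # The leading comments are not valid TOML, so there is no metadata.
        return []
    return metadata.get("dependencies", [])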

Another thought that I've had is that there could be scope for this to work with shebangs, e.g.

#!/usr/bin/env pipx-run
# dependencies = ["numpy"]
import numpy

This would probably require a distinct entrypoint in order to pass the appropriate arguments to pipx, but it would be fantastic to be able to implement executables from Python scripts like this.
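To make the "distinct entrypoint" idea concrete, here is a rough sketch of what a `pipx-run` wrapper might do: take the script path the kernel appends to the shebang command and hand it to `pipx run`. The names `build_pipx_command` and `main` are hypothetical, and this assumes `pipx` is on `PATH` and accepts a plain script path, which later comments in this thread report for newer versions.

```python
import os
import sys


def build_pipx_command(argv: list[str]) -> list[str]:
    """Translate 'pipx-run SCRIPT [ARGS...]' into a 'pipx run' invocation.

    Hypothetical helper; assumes 'pipx run SCRIPT' accepts a plain file path.
    """
    if len(argv) < 2:
        raise SystemExit("usage: pipx-run SCRIPT [ARGS...]")
    script, *script_args = argv[1:]
    return ["pipx", "run", script, *script_args]


def main() -> None:
    """Entry point that would be registered as the 'pipx-run' console script."""
    command = build_pipx_command(sys.argv)
    # Replace the current process so exit codes and signals pass straight through.
    os.execvp(command[0], command)
```

Replacing the process with `os.execvp` rather than spawning a child keeps the wrapper transparent to whatever launched the script.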

@petsuter

petsuter commented Nov 22, 2022

Parsing: I assumed re.findall(r'#\s*pipx:([^#\n]*)', source) would be good enough, but have not really thought about it and am probably wrong.
I assumed one would be free to place the comment wherever, so if one prefers them all at the top one can do that; if sometimes one prefers them bundled with the import that seemed really cool too for simple scripts.

Good point about the OS check. Using the normal environment markers would seem reasonable:

if WINDOWS:
    import colorama # pipx: colorama==2.1; platform_system == "Windows"

But was just an idea. Keeping things simple for sure is a good approach in general. 👍
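For reference, the inline-comment scan discussed here could look roughly like the sketch below. The function name `find_inline_requirements` is hypothetical. Note that the capture group has to cover the whole specifier (the `*` sits inside the parentheses), and that a plain regex scan like this would also match `# pipx:` text inside string literals.

```python
import re

# Capture everything after '# pipx:' up to the next '#' or the end of the line.
PIPX_COMMENT = re.compile(r"#\s*pipx:\s*([^#\n]*)")


def find_inline_requirements(source: str) -> list[str]:
    """Return the requirement specifiers found in '# pipx:' comments, in order."""
    return [spec.strip() for spec in PIPX_COMMENT.findall(source) if spec.strip()]
```

The scan has no view of the surrounding control flow, so an `if WINDOWS:` guard is invisible to it; only an environment marker inside the specifier itself can make a requirement conditional.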

@pfmoore
Member Author

pfmoore commented Nov 22, 2022

I think using comments would be a better solution than requiring the definition to be valid Python, though.

I think we're talking past each other here. What I am proposing (see my original post) is to recognise a block of comment lines in the source code. The first line must be # Requirements: and subsequent lines must be a hash, whitespace, and a requirement specifier, as defined by PEP 508. The requirements block is terminated by a blank line (or probably any line that doesn't start with a hash, I see little point in rejecting something "obviously" valid just because it misses out a blank line).

The requirement specifiers will be passed to pip to install.
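As an illustration of how little parsing the proposed block needs, here is a minimal sketch using only `str` methods. The function name `parse_requirements_block` is hypothetical, and this is not the code from the eventual PR; it simply follows the rules described above (a `# Requirements:` line, then comment lines, ended by a blank line or any non-comment line).

```python
def parse_requirements_block(source: str) -> list[str]:
    """Collect specifiers from the first '# Requirements:' comment block."""
    requirements = []
    in_block = False
    for line in source.splitlines():
        stripped = line.rstrip()
        if not in_block:
            # The block opens with a line reading exactly '# Requirements:'.
            if stripped == "# Requirements:":
                in_block = True
            continue
        # A blank line, or any line that is not a comment, ends the block.
        if not stripped.startswith("#"):
            break
        spec = stripped[1:].strip()
        if spec:
            requirements.append(spec)
    return requirements
```

The returned strings are left unvalidated here; checking them against PEP 508 before passing them to pip would be a separate step.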

There is no standard for embedding dependency specifications in a Python source code file. If there were, I'd expect pipx to follow that. But in the absence of a standard, this is purely a syntax recognised by pipx, and while I'm happy to make it easy for other tools to parse, that's as far as it seems worth going to me.

I will note that pip-run defines embedded requirements using a __requires__ variable in the code. But I don't think that's a good design, and indeed the pip-run maintainer is considering changing it, in this issue. Note that my proposed syntax here is essentially one of the proposed options in that issue, so we could be interoperable with pip-run if they make that change.

@agoose77

I think we're talking past each other here

Probably, seems to happen fairly easily on these kinds of forums :)

I see little point in rejecting something "obviously" valid just because it misses out a blank line.

Agreed; permissive where sensible.

There is no standard for embedding dependency specifications in a Python source code file. If there were, I'd expect pipx to follow that. But in the absence of a standard, this is purely a syntax recognised by pipx, and while I'm happy to make it easy for other tools to parse, that's as far as it seems worth going to me.

Right, and I agree with the spirit of this :) I'm suggesting that imposing a constraint like "this needs to be valid markup for X" where "X" already has a parser means that one can be very explicit about what is supported. If we don't need to impose our own format, using something like TOML, YAML (no stdlib parser), JSON, or INI would mean that we don't make it harder for other tools to read this block without good reason.

If pip-run is not considering this syntax, I might make a case for it there, too :)

@pfmoore
Member Author

pfmoore commented Nov 22, 2022

I'm suggesting that imposing a constraint like "this needs to be valid markup for X" where "X" already has a parser means that one can be very explicit about what is supported.

OK, I see what you're getting at. But we don't need anything more than a list of strings here. And we have to remove the # from the start of each line anyway, as we want the data to be a valid Python comment block. So we do some pre-processing and end up with a list of lines. Why parse that list further just to get a list of lines anyway? I'm going to say YAGNI, for now at least. If a use case comes up that needs something more complicated, we can revisit the question.

I'll be frank here - reading the requirements from the script is the least of the difficulties here, so I really don't want to spend much time debating a syntax. I'm going to go with what I stated above for now. We can experiment with alternative syntaxes in future PRs - or if my initial approach turns out to be too simplistic.

@agoose77

I'll be frank here - reading the requirements from the script is the least of the difficulties here, so I really don't want to spend much time debating a syntax. I'm going to go with what I stated above for now. We can experiment with alternative syntaxes in future PRs - or if my initial approach turns out to be too simplistic.

Life's too short to beat around the bush ;) It was fun discussing this, good luck with your efforts!

@pfmoore
Member Author

pfmoore commented Nov 22, 2022

Thanks for your comments, they made me think about some issues that I might not have spotted otherwise (validating requirements) so the PR will definitely be better for this conversation!

@brettcannon
Member

I think Paul's suggestion is good enough as any language can parse the proposal out via a regex and/or some simple string searching, none of which is Python-specific. Effectively if Paul's proposal can be parsed out with nothing more than the re module and/or methods on str we are probably safe that any tool interested will be able to read the format without issue. I also agree that doing yet another format to parse isn't worth it as this isn't exactly going to be an expensive operation when the actual installation is going to be the costly thing to do. Plus we want it easy to write w/o tool validation support as this will be embedded in comments.

@agoose77

Yes, I see the arguments against scope creep. I was thinking about embedding a subset of pyproject.toml in order to get the spec for free, and to make use of some of the extra fields. Initially I was thinking about entry-points (for multiple executables per script) and requires-python, though on reflection I think only the latter is really useful.

Given that I can name only one (two?) additional metadata fields that are truly useful, I don't think I can make a strong case for embedding pyproject.toml, and there are obvious downsides (toml-in-comments would be slightly harder to write than a custom spec like Paul suggests).

I ended up making a PoC of what I was thinking of here. It's terrible code (a real hack job), but for me it cemented the benefit of being able to write #!/usr/bin/env pipx-run as a shebang.

I'm quite excited about this; pipx is such a useful tool, and there's definitely a gap here to be filled.

@pfmoore
Member Author

pfmoore commented Nov 23, 2022

I have a working proof of concept. There's one significant issue I need to resolve before submitting an initial PR, which is that pipx assumes that an environment will have a "main package" (this is checked in _validate_before_write when writing the pipx metadata file). For the case of an environment supporting a script, this clearly won't be the case. For now, I've hacked it by arbitrarily saying the first requirement is the "main package", but that's just to check my logic.

I think I need to refactor to allow (temporary) environments without a main package, but I'd appreciate any insights the @pypa/pipx-committers (or anyone else!) might have on how deeply embedded the assumption is that an environment has a main package. I'll do the research myself, but any pointers on where to look for potential problems would be very helpful!

@brettcannon
Member

FYI I'm watching this closely as I have been toying with the idea of a py-run for the Python Launcher for Unix that essentially mimics pipx run (mainly so you don't have to worry about the interpreter under pipx disappearing, by implementing it in Rust), and I would want to support this use case as well.

@dbohdan

dbohdan commented Jul 16, 2023

You may be wondering, like I did, what shebang line to use in a pipx script. Here are two options. If your system supports env -S (GNU env(1), FreeBSD), the line is straightforward.

#! /usr/bin/env -S pipx run

If your system doesn't support env -S, you can use a form of "exec magic".

#! /bin/sh
"exec" "/usr/bin/env" "python3" "-m" "pipx" "run" "$0"

(The space after #! is a stylistic choice. It is entirely optional.)

@Flimm

Flimm commented Dec 12, 2023

You may be wondering, like I did, what shebang line to use in a pipx script. Here are two options. If your system supports env -S (GNU env(1), FreeBSD), the line is straightforward.

#! /usr/bin/env -S pipx run

That's handy, I didn't know about env's -S or --split-string option!

If the script is named example.py, the command that needs to be run is:

pipx run file:example.py

So the shebang that you suggested wouldn't be enough, you would still need to add the file: prefix somehow.

@dbohdan

dbohdan commented Dec 12, 2023

Both shebang lines work for me with pipx 1.3.3. You can test them using the following examples. The file: prefix is optional. It prevents pipx from running a console script from the corresponding PyPI package if the file you have asked to run doesn't exist.

#! /usr/bin/env -S pipx run
print("Hey")

#! /bin/sh
"exec" "/usr/bin/env" "python3" "-m" "pipx" "run" "$0"
print("Hey")

@Flimm

Flimm commented Dec 13, 2023

I was running pipx 1.2.0 (from Ubuntu 23.10's repositories). It looks like the newer versions of pipx do not require the file: prefix, which means the proposed shebang does work when used with newer versions of pipx. Thanks!

@dbohdan

dbohdan commented Dec 13, 2023

Oh, I see. I didn't realize this was the case because I didn't try to run scripts with pipx before #916. You're welcome!

@Flimm

Flimm commented Sep 17, 2024

For those who found this issue in their favourite search engine: PEP 723 was accepted as final, and the Python Packaging User Guide now documents the official spec for inline script metadata. It's slightly different from the initial syntax in the first post of this GitHub issue. It looks like this:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "requests<3",
#   "rich",
# ]
# ///

import requests
from rich.pretty import pprint

The newer versions of pipx support this metadata. You can run pipx run script.py and it will automatically install the dependencies for the script in a virtual environment. This is documented in the pipx documentation here: https://pipx.pypa.io/stable/examples/#pipx-run-examples

gaborbernat reopened this Sep 17, 2024
@Flimm

Flimm commented Sep 18, 2024

@gaborbernat I'm not sure why this GitHub issue was reopened. I only meant to provide a news update for those who came across this issue; I didn't mean for anyone to reopen it.
