Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running Python code in experimental_shell_command requires building a PEX and specifying an appropriate interpreter as a tool #17405

Closed
huonw opened this issue Oct 31, 2022 · 15 comments · Fixed by #17716
Assignees
Labels

Comments

@huonw
Copy link
Contributor

huonw commented Oct 31, 2022

Describe the bug
When experimental_shell_command needs to run Python code from the repo, it seems the best way to do that is to depend on a PEX and then specify a compatible Python version as a tool (and bash too). This seems... unwieldy, and the second part in particular seems like it's liable to end up with 'works on my machine' problems, if developers have different PATH configurations.

Example: https://gist.github.com/huonw/47bc63951eac7a05a3a3442843f040a9

git clone https://gist.github.com/47bc63951eac7a05a3a3442843f040a9.git
cd 47bc63951eac7a05a3a3442843f040a9
./pants run //:print

This would be better if (numbers corresponding to BUG comments):

  1. (convenience) the shell command could just depend on script.py (and its venv/other dependencies) directly, similar to how ./pants run script.py works mostly the same as ./pants run //:pex. This would avoid needing another PEX binary that gets unnecessarily explicitly packaged on ./pants package ::.
  2. (reliability/system dependencies) we didn't have to specify both a compatible Python and Bash in tools
  3. (reliability/system dependencies) related to that, if I use interpreter_constraints = ["CPython==3.7.*"] (and tools=["python3.7", "bash"]), running ./pants run script.py works fine, but neither ./pants run //:pex nor ./pants run //:print do.
    • I have 3.7.13 installed via pyenv, but not on my path by default (i.e. running python3.7 hits the pyenv shim, and gets an error like The `python3.7' command exists in these Python versions: 3.7.13).
    • Running pyenv global 3.7.13 or pyenv shell 3.7.13 first, and then rerunning ./pants run //:print works fine, but would presumably break anything using other versions...

Key files for posterity/convenience:

# script.py
print("hello")
# BUILD
python_sources(name="py")
pex_binary(name="pex", entry_point="script.py")

experimental_shell_command(
    name="shell",
    command="{chroot}/pex.pex > output.txt",
    tools=["python3.9", "bash"], # BUG 2
    outputs=["output.txt"],
    dependencies=[":pex"], # BUG 1
)

experimental_run_shell_command(
    name="print", command="cat {chroot}/output.txt", dependencies=[":shell"]
)
# pants.toml
[GLOBAL]
pants_version = "2.15.0a0"

backend_packages = [
  "pants.backend.shell",
  "pants.backend.python",
]

[python]
interpreter_constraints = ["CPython==3.9.*"] # BUG 3

[anonymous-telemetry]
enabled = false

Pants version
2.15.0a0

OS
macOS

Additional info

This is related to https://pantsbuild.slack.com/archives/C046T6T9U/p1666315386420799, and has the same context as #17345: we're using experimental_shell_command for ad hoc code generation, where we execute part of our API code to generate a schema file.

@huonw huonw added the bug label Oct 31, 2022
@benjyw
Copy link
Sponsor Contributor

benjyw commented Oct 31, 2022

TO clarify, what's the use-case for running python code via experimental_shell_command rather than ./pants run path/to/main.py or ./pants run path/to:target ?

@huonw
Copy link
Contributor Author

huonw commented Oct 31, 2022

We have API servers written in Python, and the code (implicitly) defines the schema for the APIs, and the libraries we use (FastAPI for REST and Strawberry for GraphQL) can generate a static schema file (e.g. OpenAPI for REST). These schemas are then used to generate clients.

We could have manual scripts to do the codegen, like the below, but it'd be far better for it to all run via pants with each step cacheable etc.

./pants run path/to/schema-generator.py > schema.json
./pants run path/to/client-generator.js < schema.json

(The "additional info" section of #17345 has a fleshed out example of this.)

A few additional/tangential points:

  • The overall question here applies to TS/JS too: the client generators will run via Node, and thus we'd want to be able to run them in the context of a Node "venv" (i.e. with an appropriate node_modules/ directory). Preferably, for both Python and Node, we could use Pants' builtin dependency/interpreter management, and also use ad-hoc scripts (more accessible, especially to non-Python/backend devs) rather than needing to write a plugin for every different codegen task.
  • Strawberry's GraphQL codegen can actually operate via a tool strawberry provides python -m strawberry export-schema path.to.module:app (https://strawberry.rocks/docs/guides/schema-export), it'd be nifty to support running external tools from deps, but for now we're happy to write an in-repo wrapper.

@chrisjrn
Copy link
Contributor

chrisjrn commented Nov 2, 2022

@huonw Hi! Just trying to wrap my head around this one. I think the solutions for the Python case and the JS case are going to be different, just by virtue of the fact that we already have a good working model for caching Python code, and none at all for JS (yet).

For the Python case, it looks to me like there's a few things that matter here:

  • Needing to be able to handle dependencies for this piece of Python code
  • Needing it to be cacheable, (i.e. have it run roughly as hermetically as other Python tools)
  • Possibly being able to run -m module entry points

Have I missed anything?

Am I right in thinking that the issue with PEX is not that a PEX gets built, it's just that you have to string together these steps manually?

@huonw
Copy link
Contributor Author

huonw commented Nov 2, 2022

Okay, let's focus on Python for now. 👍

Needing to be able to handle dependencies for this piece of Python code

Yeah, assuming you're explicitly thinking of the interpreter version as a dependency too.

Needing it to be cacheable, (i.e. have it run roughly as hermetically as other Python tools)

I guess that's an overall desire yeah, but (other than #17345) IME experimental_shell_command already works well in this respect.

Possibly being able to run -m module entry points

Yeah in theory, but slightly lower priority for me (writing wrapper scripts is fine for now).

Am I right in thinking that the issue with PEX is not that a PEX gets built, it's just that you have to string together these steps manually?

It's a bit of both. There's the inconvenience of doing it manually but also it's slightly annoying to have an explicit PEX that gets built as part of ./pants package ::.

In particular, the PEX file is purely for in-codebase use, whereas ./pants package :: feels like it's more about constructing release artefacts (or similar). I think having the explicit PEX ends up having suboptimal speed/cache/network-usage. That is, if we're using a remote cache with cache_content_behavior = "validate" (or "defer") and the code inputs to the codegen step haven't changed, I think the PEX file won't be created or downloaded for the codegen step (just the codegen output, if that is required), but a ./pants package :: will have to explicitly materialise the PEX into dist/ and thus spend the time downloading it.

Have I missed anything?

I think that about covers it. Thanks for working through my rambling description 😅

@chrisjrn
Copy link
Contributor

chrisjrn commented Nov 3, 2022

You're definitely right about the behaviour of the codegen step vs a ./pants package :: command. I think there's also just needing to be aware of that particular implementation detail of how Pants is able to run given Python "scripts" in its sandbox. It shouldn't really need to be something you're aware of.

With respect to the JS side of things, we're still trying to wrap our heads around what handling node_modules directories looks like in the context of Pants' process isolation model. I'm not certain the current experimental_shell_command approach is entirely the right design yet.

@thejcannon
Copy link
Member

thejcannon commented Nov 7, 2022

FWIW This wouldn't be the case if you ran a pyoxidizer_binary which has Python embedded 😉

@chrisjrn
Copy link
Contributor

chrisjrn commented Nov 7, 2022

@thejcannon sure, if you can tolerate the 30-60 seconds build time

@thejcannon
Copy link
Member

Just trying to point out alternatives 😄

@huonw
Copy link
Contributor Author

huonw commented Nov 8, 2022

Thanks for the tip! If we have devs often tripping over Python version issues, I'll strongly consider pyoxidizer_binary, but it seems to be... significantly more inconvenient, AFAICT:

  1. much slower time-to-run for an incremental change (e.g. 30s even for a minimal example in our repo)
  2. requires python_distributions throughout the codebase (we don't currently need any)
  3. just switches the pex_binary target to pyoxidizer_binary

(And points 2 and 3 still hit unnecessary builds during ./pants package, I think.)

@thejcannon
Copy link
Member

Yeah I understand better, the direction this ticket is heading is for a per-language equivalent of experimental_shell_command. 👍

@stuhood
Copy link
Sponsor Member

stuhood commented Nov 8, 2022

Yeah I understand better, the direction this ticket is heading is for a per-language equivalent of experimental_shell_command. 👍

Or potentially plugin fields for experimental_shell_command which can be used to add the runtimes for various languages.

@thejcannon
Copy link
Member

At the very least, let's all agree experimental_shell_command needs a new name 😅

@stuhood
Copy link
Sponsor Member

stuhood commented Nov 8, 2022

Yeah I understand better, the direction this ticket is heading is for a per-language equivalent of experimental_shell_command. 👍

Or potentially plugin fields for experimental_shell_command which can be used to add the runtimes for various languages.

Concretely, this might look like:

experimental_shell_command(
  command="$NPM install && $PYTHON do-thing",
  npm_version=..,
  python_interpreter_constraints=..,
)

... with those fields being added by plugins, and with useful help strings which describe what environment variable to use to reference the binaries, etc.

@chrisjrn
Copy link
Contributor

#17680 and follow-up work will have some impact on this

@chrisjrn
Copy link
Contributor

chrisjrn commented Dec 9, 2022

@huonw See #17716 for some draft code that may well solve these problems. Would love your feedback.

chrisjrn pushed a commit that referenced this issue Dec 15, 2022
…box for side-effects (#17716)

This adds `experimental_run_in_sandbox`, which allows any target implementing `RunFieldSet`/returning `RunRequest` to be run in the sandbox with its execution dependencies.

Amongst other things, this saves needing to explicitly declare a `pex_binary` in order to run a `python_source`, however, other languages aren't as complete right now (though we can work on that).

Tests to follow once the API is bedded down: right now, this is _highly_ experimental, and mostly piggybacks on `experimental_shell_command` infrastructure.

Closes #17405.


Co-authored-by: Huon Wilson <wilson.huon@gmail.com>
Co-authored-by: Joshua Cannon <joshdcannon@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants