Running Python code in `experimental_shell_command` requires building a PEX and specifying an appropriate interpreter as a tool #17405

huonw · 2022-10-31T06:21:00Z

Describe the bug
When experimental_shell_command needs to run Python code from the repo, it seems the best way to do that is to depend on a PEX and then specify a compatible Python version as a tool (and bash too). This seems... unwieldy, and the second part in particular seems like it's liable to end up with 'works on my machine' problems, if developers have different PATH configurations.

Example: https://gist.github.com/huonw/47bc63951eac7a05a3a3442843f040a9

git clone https://gist.github.com/47bc63951eac7a05a3a3442843f040a9.git
cd 47bc63951eac7a05a3a3442843f040a9
./pants run //:print

This would be better if (numbers corresponding to BUG comments):

1. (convenience) the shell command could just depend on script.py (and its venv/other dependencies) directly, similar to how ./pants run script.py works mostly the same as ./pants run //:pex. This would avoid needing another PEX binary that gets unnecessarily explicitly packaged on ./pants package ::.
2. (reliability/system dependencies) we didn't have to specify both a compatible Python and Bash in tools
3. (reliability/system dependencies) related to that, if I use interpreter_constraints = ["CPython==3.7.*"] (and tools=["python3.7", "bash"]), running ./pants run script.py works fine, but neither ./pants run //:pex nor ./pants run //:print do.
   - I have 3.7.13 installed via pyenv, but not on my path by default (i.e. running python3.7 hits the pyenv shim, and gets an error like The `python3.7' command exists in these Python versions: 3.7.13).
   - Running pyenv global 3.7.13 or pyenv shell 3.7.13 first, and then rerunning ./pants run //:print works fine, but would presumably break anything using other versions...
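
To illustrate why the last point bites, here is a minimal sketch (plain Python, not Pants code) of a name-based lookup, assuming tools are resolved by name on a search path such as PATH. The helper names `resolve_tool` and `interpreter_version` are hypothetical. A pyenv shim for an unselected version is found by name but fails when run, so locating the binary is not enough:

```python
import shutil
import subprocess
import sys
from typing import Optional, Tuple


def resolve_tool(name: str, search_path: Optional[str] = None) -> Optional[str]:
    """Return the first executable called `name` on `search_path` (defaults to $PATH)."""
    return shutil.which(name, path=search_path)


def interpreter_version(executable: str) -> Optional[Tuple[int, int]]:
    """Run `executable` and return its (major, minor) version, or None if it cannot run."""
    probe = "import sys; print(sys.version_info[0], sys.version_info[1])"
    try:
        result = subprocess.run(
            [executable, "-c", probe], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        # e.g. a pyenv shim whose version is installed but not currently selected
        return None
    major, minor = result.stdout.split()
    return int(major), int(minor)


if __name__ == "__main__":
    print("this interpreter:", interpreter_version(sys.executable))
    for tool in ("python3.7", "python3.9", "bash"):
        path = resolve_tool(tool)
        version = interpreter_version(path) if path and tool.startswith("python") else None
        print(f"{tool}: path={path} version={version}")
```

Two developers with different PATH or pyenv settings can get different results from the same script, which is the 'works on my machine' risk described above.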

Key files for posterity/convenience:

# script.py
print("hello")

# BUILD
python_sources(name="py")
pex_binary(name="pex", entry_point="script.py")

experimental_shell_command(
    name="shell",
    command="{chroot}/pex.pex > output.txt",
    tools=["python3.9", "bash"], # BUG 2
    outputs=["output.txt"],
    dependencies=[":pex"], # BUG 1
)

experimental_run_shell_command(
    name="print", command="cat {chroot}/output.txt", dependencies=[":shell"]
)

# pants.toml
[GLOBAL]
pants_version = "2.15.0a0"

backend_packages = [
  "pants.backend.shell",
  "pants.backend.python",
]

[python]
interpreter_constraints = ["CPython==3.9.*"] # BUG 3

[anonymous-telemetry]
enabled = false

Pants version
2.15.0a0

OS
macOS

Additional info

This is related to https://pantsbuild.slack.com/archives/C046T6T9U/p1666315386420799, and has the same context as #17345: we're using experimental_shell_command for ad hoc code generation, where we execute part of our API code to generate a schema file.


benjyw · 2022-10-31T16:47:34Z

To clarify, what's the use-case for running Python code via experimental_shell_command rather than ./pants run path/to/main.py or ./pants run path/to:target?

huonw · 2022-10-31T22:26:03Z

We have API servers written in Python, and the code (implicitly) defines the schema for the APIs, and the libraries we use (FastAPI for REST and Strawberry for GraphQL) can generate a static schema file (e.g. OpenAPI for REST). These schemas are then used to generate clients.

We could have manual scripts to do the codegen, like the below, but it'd be far better for it to all run via pants with each step cacheable etc.

./pants run path/to/schema-generator.py > schema.json
./pants run path/to/client-generator.js < schema.json

(The "additional info" section of #17345 has a fleshed out example of this.)
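
For concreteness, the schema-generator step could be an in-repo script along these lines. This is a sketch under assumptions: `write_schema` and `FakeApp` are hypothetical names, and `FakeApp` stands in for the real FastAPI application (whose `openapi()` method returns the schema as a dict) so that the example runs without FastAPI installed. Sorting keys keeps the output byte-for-byte stable for identical inputs, which should be friendlier to caching.

```python
import json
import sys
from typing import Any, Dict, TextIO


class FakeApp:
    """Stand-in for a FastAPI app; only the `openapi()` method is used."""

    def openapi(self) -> Dict[str, Any]:
        return {
            "openapi": "3.0.2",
            "info": {"title": "example", "version": "0.1.0"},
            "paths": {},
        }


def write_schema(app: Any, out: TextIO) -> None:
    """Serialise the app's OpenAPI schema as deterministic JSON."""
    json.dump(app.openapi(), out, indent=2, sort_keys=True)
    out.write("\n")


if __name__ == "__main__":
    # A real script would import the actual application object instead of FakeApp.
    write_schema(FakeApp(), sys.stdout)
```

Redirecting the script's stdout to schema.json gives the file that the client generator consumes in the second step.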

A few additional/tangential points:

The overall question here applies to TS/JS too: the client generators will run via Node, and thus we'd want to be able to run them in the context of a Node "venv" (i.e. with an appropriate node_modules/ directory). Preferably, for both Python and Node, we could use Pants' builtin dependency/interpreter management, and also use ad-hoc scripts (more accessible, especially to non-Python/backend devs) rather than needing to write a plugin for every different codegen task.
Strawberry's GraphQL codegen can actually operate via a tool that Strawberry itself provides, python -m strawberry export-schema path.to.module:app (https://strawberry.rocks/docs/guides/schema-export). It'd be nifty to support running external tools from deps, but for now we're happy to write an in-repo wrapper.
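
Such an in-repo wrapper can be quite small. The sketch below assumes the tool is importable in the script's environment and exposes a `python -m` entry point. `run_module_as_main` is a hypothetical helper name, and the script simply forwards whatever module name and arguments it is given:

```python
import runpy
import sys
from typing import List


def run_module_as_main(module: str, args: List[str]) -> int:
    """Run the equivalent of `python -m module args...` in-process; return its exit code."""
    saved_argv = sys.argv
    sys.argv = [module, *args]
    try:
        runpy.run_module(module, run_name="__main__", alter_sys=True)
    except SystemExit as exc:
        # Map sys.exit() arguments onto a conventional integer exit code.
        if exc.code is None:
            return 0
        return exc.code if isinstance(exc.code, int) else 1
    finally:
        sys.argv = saved_argv
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: wrapper.py MODULE [ARGS...]
    sys.exit(run_module_as_main(sys.argv[1], sys.argv[2:]))
```

Invoked as wrapper.py strawberry export-schema path.to.module:app, this would behave like the python -m command above, while remaining an ordinary in-repo script that Pants can run with its usual dependency handling.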

chrisjrn · 2022-11-02T17:09:17Z

@huonw Hi! Just trying to wrap my head around this one. I think the solutions for the Python case and the JS case are going to be different, just by virtue of the fact that we already have a good working model for caching Python code, and none at all for JS (yet).

For the Python case, it looks to me like there's a few things that matter here:

Needing to be able to handle dependencies for this piece of Python code
Needing it to be cacheable, (i.e. have it run roughly as hermetically as other Python tools)
Possibly being able to run -m module entry points

Have I missed anything?

Am I right in thinking that the issue with PEX is not that a PEX gets built, it's just that you have to string together these steps manually?

huonw · 2022-11-02T23:06:48Z

Okay, let's focus on Python for now. 👍

Needing to be able to handle dependencies for this piece of Python code

Yeah, assuming you're explicitly thinking of the interpreter version as a dependency too.

Needing it to be cacheable, (i.e. have it run roughly as hermetically as other Python tools)

I guess that's an overall desire yeah, but (other than #17345) IME experimental_shell_command already works well in this respect.

Possibly being able to run -m module entry points

Yeah in theory, but slightly lower priority for me (writing wrapper scripts is fine for now).

Am I right in thinking that the issue with PEX is not that a PEX gets built, it's just that you have to string together these steps manually?

It's a bit of both. There's the inconvenience of doing it manually but also it's slightly annoying to have an explicit PEX that gets built as part of ./pants package ::.

In particular, the PEX file is purely for in-codebase use, whereas ./pants package :: feels like it's more about constructing release artefacts (or similar). I think having the explicit PEX ends up having suboptimal speed/cache/network-usage. That is, if we're using a remote cache with cache_content_behavior = "validate" (or "defer") and the code inputs to the codegen step haven't changed, I think the PEX file won't be created or downloaded for the codegen step (just the codegen output, if that is required), but a ./pants package :: will have to explicitly materialise the PEX into dist/ and thus spend the time downloading it.

Have I missed anything?

I think that about covers it. Thanks for working through my rambling description 😅

chrisjrn · 2022-11-03T17:16:37Z

You're definitely right about the behaviour of the codegen step vs a ./pants package :: command. I think there's also the problem of having to know about that particular implementation detail, i.e. how Pants manages to run given Python "scripts" in its sandbox. It shouldn't really need to be something you're aware of.

With respect to the JS side of things, we're still trying to wrap our heads around what handling node_modules directories looks like in the context of Pants' process isolation model. I'm not certain the current experimental_shell_command approach is entirely the right design yet.

thejcannon · 2022-11-07T16:25:10Z

FWIW This wouldn't be the case if you ran a pyoxidizer_binary which has Python embedded 😉

chrisjrn · 2022-11-07T16:26:19Z

@thejcannon sure, if you can tolerate the 30-60 seconds build time

thejcannon · 2022-11-07T16:28:26Z

Just trying to point out alternatives 😄

huonw · 2022-11-08T03:29:01Z

Thanks for the tip! If we have devs often tripping over Python version issues, I'll strongly consider pyoxidizer_binary, but it seems to be... significantly more inconvenient, AFAICT:

much slower time-to-run for an incremental change (e.g. 30s even for a minimal example in our repo)
requires python_distributions throughout the codebase (we don't currently need any)
just switches the pex_binary target to pyoxidizer_binary

(And points 2 and 3 still hit unnecessary builds during ./pants package, I think.)

thejcannon · 2022-11-08T17:08:46Z

Yeah I understand better, the direction this ticket is heading is for a per-language equivalent of experimental_shell_command. 👍

stuhood · 2022-11-08T17:10:04Z

Yeah I understand better, the direction this ticket is heading is for a per-language equivalent of experimental_shell_command. 👍

Or potentially plugin fields for experimental_shell_command which can be used to add the runtimes for various languages.

thejcannon · 2022-11-08T17:10:42Z

At the very least, let's all agree experimental_shell_command needs a new name 😅

stuhood · 2022-11-08T20:58:49Z

Yeah I understand better, the direction this ticket is heading is for a per-language equivalent of experimental_shell_command. 👍

Or potentially plugin fields for experimental_shell_command which can be used to add the runtimes for various languages.

Concretely, this might look like:

experimental_shell_command(
  command="$NPM install && $PYTHON do-thing",
  npm_version=..,
  python_interpreter_constraints=..,
)

... with those fields being added by plugins, and with useful help strings which describe what environment variable to use to reference the binaries, etc.
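
To make the environment-variable idea concrete, here is a small sketch outside of Pants: something resolves each runtime to a path, exports it under an agreed name, and the shell command refers to it by that name. `run_with_runtimes` is a hypothetical helper, and using the current interpreter for `PYTHON` is purely for illustration. How a plugin would actually resolve the binaries is not specified in this thread.

```python
import os
import shutil
import subprocess
import sys
from typing import Dict


def run_with_runtimes(command: str, runtimes: Dict[str, str]) -> str:
    """Run `command` via sh with each runtime exposed as an environment variable."""
    shell = shutil.which("sh")
    if shell is None:
        raise RuntimeError("no POSIX shell found on PATH")
    # Inheriting os.environ keeps the sketch simple; a sandbox would pass far less.
    env = {**os.environ, **runtimes}
    result = subprocess.run(
        [shell, "-c", command], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # $PYTHON is expanded by the shell, so the command never hard-codes "python3.9".
    output = run_with_runtimes('"$PYTHON" -c "print(6 * 7)"', {"PYTHON": sys.executable})
    print(output, end="")
```

The point of the indirection is that the command string stays the same on every machine, and only the value bound to the variable changes.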

chrisjrn · 2022-11-29T22:45:43Z

#17680 and follow-up work will have some impact on this

chrisjrn · 2022-12-09T19:45:31Z

@huonw See #17716 for some draft code that may well solve these problems. Would love your feedback.

…box for side-effects (#17716)

This adds `experimental_run_in_sandbox`, which allows any target implementing `RunFieldSet`/returning `RunRequest` to be run in the sandbox with its execution dependencies. Amongst other things, this saves needing to explicitly declare a `pex_binary` in order to run a `python_source`; however, other languages aren't as complete right now (though we can work on that).

Tests to follow once the API is bedded down: right now, this is _highly_ experimental, and mostly piggybacks on `experimental_shell_command` infrastructure.

Closes #17405.

Co-authored-by: Huon Wilson <wilson.huon@gmail.com>
Co-authored-by: Joshua Cannon <joshdcannon@gmail.com>

huonw added the bug label Oct 31, 2022

huonw mentioned this issue Oct 31, 2022

Better support for tools with built-in file-watching/hot-reload, for faster dev loops #17414

Open

benjyw assigned chrisjrn Nov 1, 2022

chrisjrn mentioned this issue Nov 22, 2022

experimental_shell_command stabilisation tracking ticket #17616

Closed

This was referenced Dec 6, 2022

Allow using a shell command/adhoc tool as part of any goal (check, lint, fmt, fix, deploy, ...) #17729

Open

experimental_shell_command substitutes {chroot} but does not set $CHROOT (nor have docs) #17404

Open

chrisjrn mentioned this issue Dec 9, 2022

Allow targets that implement RunFieldSet to be executed in the sandbox for side-effects #17716

Merged

chrisjrn closed this as completed in #17716 Dec 15, 2022
