Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

system_binary default fingerprinting behaviour (run without args) fails for many programs and is hard to debug #19013

Open
huonw opened this issue May 16, 2023 · 3 comments
Labels
backend: Shell Shell backend-related issues bug

Comments

@huonw
Copy link
Contributor

huonw commented May 16, 2023

Describe the bug

system_binary is nifty in 2.16, and it supports running the underlying program to 'fingerprint' it. For the simple case of no explicit fingerprint_... args, this translates to running it without any args, e.g. system_binary(name="git", binary_name="git") will run git.

This sort of invocation returns non-zero (fails) for many programs, like git, npm, even sed on some systems. The pants output doesn't make it at all obvious what the problem is.

Reproducer:

cd $(mktemp -d)

cat > pants.toml <<EOF
[GLOBAL]
pants_version = "2.16.0rc2"
backend_packages = ["pants.backend.experimental.adhoc"]

[anonymous-telemetry]
enabled = false
EOF

cat > BUILD <<EOF
system_binary(name="git-fingerprintless", binary_name="git")
adhoc_tool(name="fingerprintless", runnable=":git-fingerprintless", args=["--version"])

system_binary(name="git-fingerprinted", binary_name="git", fingerprint_args=["--version"])
adhoc_tool(name="fingerprinted", runnable=":git-fingerprinted", args=["--version"])
EOF

echo "Shell finds git at: $(which git)"

# OK: this works
pants export-codegen :fingerprinted

# BUG: this fails non-obviously
pants export-codegen :fingerprintless

Lightly edited output of bash ./script.sh

Shell finds git at: /opt/homebrew/bin/git


12:46:46.55 [INFO] Initializing scheduler...
12:46:46.73 [INFO] Scheduler initialized.
12:46:46.97 [INFO] Starting: Testing candidate for `git` at `/usr/bin/git`
12:46:46.97 [INFO] Starting: Testing candidate for `git` at `/opt/homebrew/bin/git`
12:46:47.00 [INFO] Completed: Testing candidate for `git` at `/opt/homebrew/bin/git`
12:46:47.00 [INFO] Completed: Testing candidate for `git` at `/usr/bin/git`
12:46:47.00 [INFO] Starting: Running the `adhoc_tool` at //:fingerprinted
12:46:47.01 [INFO] Completed: Running the `adhoc_tool` at //:fingerprinted
12:46:47.02 [INFO] Writing generated files to dist/codegen


12:46:47.44 [INFO] Starting: Testing candidate for `git` at `/usr/bin/git`
12:46:47.44 [INFO] Starting: Testing candidate for `git` at `/opt/homebrew/bin/git`
12:46:47.45 [INFO] Completed: Testing candidate for `git` at `/opt/homebrew/bin/git`
12:46:47.45 [INFO] Completed: Testing candidate for `git` at `/usr/bin/git`
12:46:47.45 [ERROR] 1 Exception encountered:

Engine traceback:
  in `export-codegen` goal

ValueError: Could not find a binary with name `git`. The following paths were searched: /usr/bin, /bin, /usr/local/bin, /opt/homebrew/bin.

This is confusing for users as the output explicitly says "could not find a binary", despite them several of them clearly existing. In addition, the docs are somewhat ambiguous, potentially suggesting fingerprinting may only happen if fingerprint is specified:

Pants will search for binaries with name binary_name in the search paths provided, as well as default search paths. If fingerprint is specified, each binary that is located will be executed with the arguments from fingerprint_args. Any binaries whose output does not match the pattern will be excluded.

Some (not mutually exclusive) options for improving this:

  1. tweak the failure message to be something like "Could not find an appropriate binary with name git" or something
  2. if all candidates are rejected, include the reason for failure for each candidate that exists and was rejected (e.g. exit code and/or command output), and maybe a suggestion related to tweaking the fingerprint... config
  3. tweak the behaviour of the fingerprinting itself e.g.
    1. don't do any fingerprinting when there's no fingerprint args (just test for file existence)
    2. have those args be required, e.g. instead of the implicit fingerprint_args=[], the user has to write that

The backend in question is still experimental.

Pants version
2.16.0rc2

OS
macOS

Additional info
Encountered in practice in https://pantsbuild.slack.com/archives/C046T6T9U/p1684197510244099 (git) and 329da41 (sed)

@huonw huonw added bug backend: Shell Shell backend-related issues labels May 16, 2023
@Geethree
Copy link
Contributor

This is me too with git and npm. Failure was entirely non obvious as which … obviously worked. Saving the sandbox with —keep-sandbox=always and looking at what the __run.sh script was doing allowed me to figure out that it was failing with a non-zero exit code.

@Geethree
Copy link
Contributor

Actually I can't get this to work..

system_binary(
    name="npm",
    binary_name="npm",
    fingerprint=r"9.2.0",
    fingerprint_args=["--version"],
)
cat /tmp/pants-sandbox-MjbXO2/__run.sh 
#!/bin/bash
# This command line should execute the same process as pants did internally.
export 
cd /tmp/pants-sandbox-MjbXO2
/usr/bin/npm --version
pants run teletom/frontend:npm --keep-sandboxes=always
17:52:33.49 [INFO] Preserving local process execution dir /tmp/pants-sandbox-ewKBkO for Testing candidate for `npm` at `/bin/npm`
17:52:33.49 [INFO] Preserving local process execution dir /tmp/pants-sandbox-MjbXO2 for Testing candidate for `npm` at `/usr/bin/npm`
17:52:33.55 [INFO] Completed: Testing candidate for `npm` at `/bin/npm`
17:52:33.55 [INFO] Completed: Testing candidate for `npm` at `/usr/bin/npm`
17:52:33.55 [ERROR] 1 Exception encountered:

Engine traceback:
  in `run` goal

ValueError: Could not find a binary with name `npm` with output matching `9.2.0` when run with arguments `--version`. The following paths were searched: /usr/bin, /bin, /usr/local/bin, /opt/homebrew/bin.
bash /tmp/pants-sandbox-MjbXO2/__run.sh | tail
declare -x XDG_RUNTIME_DIR="/run/user/1000"
declare -x XDG_SEAT="seat0"
declare -x XDG_SESSION_CLASS="user"
declare -x XDG_SESSION_DESKTOP="i3"
declare -x XDG_SESSION_ID="3"
declare -x XDG_SESSION_TYPE="x11"
declare -x XDG_VTNR="2"
declare -x XMODIFIERS="@im=ibus"
declare -x _="/usr/bin/bash"
9.2.0

echo $?
0

@ubmarco
Copy link

ubmarco commented Jan 5, 2024

I second this. I was trying to use

system_binary(
    name="drawio",
    binary_name="drawio",
    fingerprint_args=["--version"],
)

which leads to the __run.sh

#!/usr/bin/env bash
# This command line should execute the same process as pants did internally.
cd /tmp/pants-sandbox-lpxUQZ
env -i  /opt/drawio/drawio --version

When executed manually in the sandbox, it reveals the problem

[97799:0105/212113.684144:ERROR:ozone_platform_x11.cc(240)] Missing X server or $DISPLAY
[97799:0105/212113.684192:ERROR:env.cc(257)] The platform failed to initialize.  Exiting.
[1]    97799 segmentation fault (core dumped)  env -i DISPLAY=0:0 drawio --version

Drawio relies on the DISPLAY variable which is lost after executing with env -i.
Forwarding this output would be really helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: Shell Shell backend-related issues bug
Projects
None yet
Development

No branches or pull requests

3 participants