Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Locate bash before using it to locate BinaryPaths for other tools #10858

Merged
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
25 changes: 18 additions & 7 deletions src/python/pants/engine/process.py
Expand Up @@ -30,6 +30,9 @@
logger = logging.getLogger(__name__)


BASH_SEARCH_PATHS = ("/usr/bin", "/bin", "/usr/local/bin")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please update archive.py to use this value, rather than duplicating? Will keep the surface area smaller.

Maybe rename to something more generic, although I can't think of a good name for the binaries that we expect to be installed in /bin et al.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The archive.py search path arguably should be configurable (as commented there), since it's used for multiple tools. This one on the other hand does not seem like it should be, necessarily, as we're using it to bootstrap.



@dataclass(frozen=True)
class ProductDescription:
value: str
Expand Down Expand Up @@ -482,15 +485,26 @@ def __init__(

@rule(desc="Find binary path", level=LogLevel.DEBUG)
async def find_binary(request: BinaryPathRequest) -> BinaryPaths:
# TODO(John Sirois): Replace this script with a statically linked native binary so we don't
# If we are not already locating bash, recurse to locate bash to use it as an absolute path in
# our shebang. This avoids mixing locations that we would search for bash into the search paths
# of the request we are servicing.
# TODO(#10769): Replace this script with a statically linked native binary so we don't
# depend on either /bin/bash being available on the Process host.
if request.binary_name == "bash":
shebang = "#!/usr/bin/env bash"
else:
bash_request = BinaryPathRequest(binary_name="bash", search_path=BASH_SEARCH_PATHS)
bash_paths = await Get(BinaryPaths, BinaryPathRequest, bash_request)
if not bash_paths.first_path:
raise BinaryNotFoundError(bash_request, rationale="use it to locate other executables")
shebang = f"#!{bash_paths.first_path.path}"

# Note: the backslash after the """ marker ensures that the shebang is at the start of the
# script file. Many OSs will not see the shebang if there is intervening whitespace.
script_path = "./script.sh"
stuhood marked this conversation as resolved.
Show resolved Hide resolved
script_content = dedent(
"""\
#!/usr/bin/env bash
f"""\
{shebang}

set -euo pipefail

Expand All @@ -508,7 +522,7 @@ async def find_binary(request: BinaryPathRequest) -> BinaryPaths:

search_path = create_path_env_var(request.search_path)
result = await Get(
FallibleProcessResult,
ProcessResult,
stuhood marked this conversation as resolved.
Show resolved Hide resolved
# We use a volatile process to force re-run since any binary found on the host system today
# could be gone tomorrow. Ideally we'd only do this for local processes since all known
# remoting configurations include a static container image as part of their cache key which
Expand All @@ -526,9 +540,6 @@ async def find_binary(request: BinaryPathRequest) -> BinaryPaths:
)

binary_paths = BinaryPaths(binary_name=request.binary_name)
if result.exit_code != 0:
return binary_paths

found_paths = result.stdout.decode().splitlines()
if not request.test:
return dataclasses.replace(binary_paths, paths=[BinaryPath(path) for path in found_paths])
Expand Down
28 changes: 27 additions & 1 deletion src/python/pants/engine/process_test.py
Expand Up @@ -10,18 +10,29 @@
from pants.engine.fs import CreateDigest, Digest, DigestContents, FileContent, PathGlobs, Snapshot
from pants.engine.internals.scheduler import ExecutionError
from pants.engine.process import (
BinaryPathRequest,
BinaryPaths,
FallibleProcessResult,
InteractiveProcess,
Process,
ProcessExecutionFailure,
ProcessResult,
)
from pants.engine.rules import Get, rule
from pants.testutil.rule_runner import QueryRule
from pants.testutil.pants_integration_test import setup_tmpdir
from pants.testutil.rule_runner import QueryRule, RuleRunner
from pants.testutil.test_base import TestBase
from pants.util.contextutil import temporary_dir


def process_rule_runner() -> RuleRunner:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be annotated with @pytest.fixture to ensure that each unit test gets a fresh instance. Also we've been calling it rule_runner for consistency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, if we expect multiple rule runners in the file with different setups, you can call it find_binary_rule_runner.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'm not a huge fan of the fixtures. I don't see a great reason to do:

def test_my_stuff(rule_runner: RuleRunner) -> None:
    ..

rather than:

def test_my_stuff() -> None:
    rule_runner = my_rule_runner()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's totally valid to create a RuleRunner() inline. I do that in some places. But imo, it should either be inline, or it should be a fixture. Right now is a mix of having the function declared as top-level.

So, I'm not a huge fan of the fixtures. I don't see a great reason

Conformity with the greater Python ecosystem, which makes heavy use of fixtures. Our Plugin API is now aligned with the conventions.

We also use pre-built fixtures in other places too now like caplog, and I think it's good to be consistent with those.

Finally, it's important that plugin authors don't try using one rule runner for multiple tests, such as a class var or global consant. The fixture is meant to emphasize that you want a new instance every test.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fixture is meant to emphasize that you want a new instance every test.

Unless you accidentally put @pytest.fixture(scope=module) on it, which would get you a reused one.

I don't feel strongly about it. But particularly if we're discouraging the reuse of fixtures, they don't seem to buy much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, I don't care that much either way, so long as we're consistent in plugin docs and we feel confident that users won't accidentally create one instance for >1 test.

I'm curious what John and Benjy think.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched it so that we're consistent for now.

return RuleRunner(
rules=[
QueryRule(BinaryPaths, [BinaryPathRequest]),
],
)


@dataclass(frozen=True)
class Concatted:
value: str
Expand Down Expand Up @@ -442,3 +453,18 @@ def test_running_interactive_process_in_workspace_cannot_have_input_files() -> N
mock_digest = Digest("fake", 1)
with pytest.raises(ValueError):
InteractiveProcess(argv=["/bin/echo"], input_digest=mock_digest, run_in_workspace=True)


def test_find_binary_on_path_without_bash() -> None:
stuhood marked this conversation as resolved.
Show resolved Hide resolved
# Test that locating a binary on a PATH which does not include bash works (by recursing to
# locate bash first).
binary_name = "mybin"
binary_dir = "bin"
with setup_tmpdir({f"{binary_dir}/{binary_name}": "this just needs to exist"}) as tmpdir:
binary_dir_abs = os.path.join(os.getcwd(), tmpdir, binary_dir)
search_path = [binary_dir_abs]
binary_paths = process_rule_runner().request(
BinaryPaths, [BinaryPathRequest(binary_name=binary_name, search_path=search_path)]
)
assert binary_paths.first_path is not None
assert binary_paths.first_path.path == os.path.join(binary_dir_abs, binary_name)