extra_env_vars field for system_binary #20374

Draft: wants to merge 7 commits into main

ubmarco commented Jan 7, 2024

Fixes #20373

ubmarco marked this pull request as draft January 7, 2024 21:12

Sponsor Contributor

benjyw left a comment

Nice so far! See my one comment about the proposed semantics (the "fingerprinting" is more of a detail than most users will understand or care about).

As for testing: it should be easy to test by running the system binary `env` with some contrived env var and checking that the value shows up in its output!
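The testing idea above, stripped down to plain Python: layer a contrived variable over the current environment, run `env`, and look for the value in its output. This is a minimal standalone sketch, not a Pants test; it assumes a POSIX `env` binary on PATH, and `run_with_extra_env` and `PANTS_TEST_CONTRIVED_VAR` are hypothetical names chosen for illustration. A real test would drive the `system_binary` target through Pants' own test harness instead of `subprocess`.

```python
import os
import subprocess


def run_with_extra_env(argv, extra_env):
    """Run argv with extra_env layered over the current environment; return stdout."""
    env = dict(os.environ)
    env.update(extra_env)  # the contrived variables take precedence
    result = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # A contrived name that should never be set by accident.
    out = run_with_extra_env(["env"], {"PANTS_TEST_CONTRIVED_VAR": "sentinel-value"})
    assert "PANTS_TEST_CONTRIVED_VAR=sentinel-value" in out.splitlines()
    print("contrived env var reached the binary")
```

Using a sentinel name and value keeps the assertion from passing by coincidence.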

help = help_text(
"""
Additional environment variables to provide to the system binary during fingerprinting.
This has no effect on the execution of the binary in the scope of an `adhoc_tool` or
Sponsor Contributor

Hmm, I think these env vars should always be set, not just for fingerprinting (although the ad-hoc tool values should override them if specified). In your case it would mean you could set DISPLAY in just one place.

Author

ubmarco Jan 9, 2024

Thanks for the feedback. I need some clarification.

  1. Let's assume an adhoc_tool has multiple runnable_dependencies, and one of them is a system_binary that sets extra_env_vars. That means the adhoc_tool invocation would set the vars for the whole invocation, i.e. for the runnable and all other binaries. Is this a safe assumption?
  2. If yes, I would combine the extra_env_vars of all runnable_dependencies. These may conflict, e.g. one binary takes a variable over from the Pants environment while another defines it explicitly. How should this be handled?
  3. If, however, the adhoc_tool itself defines extra_env_vars, I would only consider that field.
  4. Is shell_command also affected?

I think it comes down to what we think the user wants.
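To make the conflict in point 2 concrete: assuming the usual Pants convention for extra_env_vars entries (`NAME=value` sets an explicit value, a bare `NAME` copies the value from Pants' own environment), each binary's entries resolve to a dict independently, and two dicts can disagree on the same key. The sketch below is illustrative only; `resolve_env_entries` is a hypothetical helper, and skipping a bare name that is unset in the Pants environment is a choice made here, not necessarily what Pants does.

```python
from typing import Dict, Iterable, Mapping


def resolve_env_entries(entries: Iterable[str], pants_env: Mapping[str, str]) -> Dict[str, str]:
    """Resolve extra_env_vars-style entries into concrete name/value pairs."""
    resolved: Dict[str, str] = {}
    for entry in entries:
        name, sep, value = entry.partition("=")  # split on the first "=" only
        if sep:
            # "NAME=value": explicit value (value may itself contain "=").
            resolved[name] = value
        elif name in pants_env:
            # Bare "NAME": take the value over from Pants' own environment.
            resolved[name] = pants_env[name]
        # Bare name not set in Pants' environment: skipped in this sketch.
    return resolved
```

For example, `resolve_env_entries(["DISPLAY"], {"DISPLAY": ":0"})` yields `{"DISPLAY": ":0"}`, while another binary's `["DISPLAY=:99"]` yields `{"DISPLAY": ":99"}`; combining the two requires a precedence rule.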

Author

I thought about that again and I think you're right.
If a variable is needed for fingerprinting, it is almost certainly also needed for running the tool. So I will merge the env var requests of all binaries.

Sponsor Contributor

Yeah, I think you should merge them in order of specificity, so that the more specific target overrides the less specific one.

Sponsor Contributor

So this documentation needs updating.

Author

ubmarco commented Jan 10, 2024

I implemented the extra_env_vars inheritance feature.
Conflicts are resolved as follows:

  • adhoc_tool -> extra_env_vars always wins over adhoc_tool -> runnable_dependencies -> extra_env_vars
  • adhoc_tool -> runnable_dependencies -> extra_env_vars are injected in the order in which they appear in the BUILD file, so the last one wins

There are, however, no warnings when a conflict occurs. Is that needed?
A test case is still missing.
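The two conflict rules above amount to applying dict updates from least to most specific. A minimal sketch, assuming each target's extra_env_vars have already been resolved to plain dicts; `merge_env_by_specificity` is a hypothetical name, not a function in the Pants code base.

```python
from typing import Dict, Mapping, Sequence


def merge_env_by_specificity(
    runnable_dependency_envs: Sequence[Mapping[str, str]],
    adhoc_tool_env: Mapping[str, str],
) -> Dict[str, str]:
    """Merge env dicts so that later / more specific sources win."""
    merged: Dict[str, str] = {}
    # Less specific: runnable_dependencies in BUILD-file order, so the last one wins.
    for dep_env in runnable_dependency_envs:
        merged.update(dep_env)
    # Most specific: the adhoc_tool's own extra_env_vars override everything above.
    merged.update(adhoc_tool_env)
    return merged
```

With dependencies `[{"FOO": "a", "X": "1"}, {"FOO": "b"}]` and an adhoc_tool env of `{"X": "2"}`, the result is `{"FOO": "b", "X": "2"}`. Conflicts are resolved silently, which matches the view that an override is usually intentional.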

Sponsor Contributor

benjyw commented Jan 10, 2024

I implemented the extra_env_vars inheritance feature. Conflicts are resolved as follows:

  • adhoc_tool -> extra_env_vars always wins over adhoc_tool -> runnable_dependencies -> extra_env_vars
  • adhoc_tool -> runnable_dependencies -> extra_env_vars are injected in the order as they appear in the BUILD file, so last wins

This is what I meant by order of specificity, yes.

There are, however, no warnings in case a conflict appears. Is that needed?

I don't think so; you likely want to override on purpose.

A test case is still missing.

Sponsor Contributor

benjyw left a comment

Thanks! See comments inline.

Going to add @chrisjrn to the reviewers, since he has a better handle on the env vars stuff here than I do.

@@ -87,6 +92,9 @@ async def _find_binary(
env.update(**(rds.extra_env or {}))
append_only_caches = rds.append_only_caches
immutable_input_digests = rds.immutable_input_digests
if extra_env_vars:
    extra_env = await Get(EnvironmentVars, EnvironmentVarsRequest(extra_env_vars))
    env.update(extra_env)
Sponsor Contributor

This could stomp the PATH or _PANTS_SHIM_ROOT, no? Seems like the update order should be reversed.
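The concern is that with `dict.update`, whatever is applied last wins. A minimal sketch of the reversed order, assuming the user-supplied and reserved entries are available as separate dicts; `build_env`, `user_env` and `reserved_env` are hypothetical names, and the PATH value is purely illustrative.

```python
from typing import Dict, Mapping


def build_env(user_env: Mapping[str, str], reserved_env: Mapping[str, str]) -> Dict[str, str]:
    """Start from user-supplied vars, then apply reserved vars so they cannot be stomped."""
    env = dict(user_env)
    env.update(reserved_env)  # reserved entries (e.g. PATH, _PANTS_SHIM_ROOT) win
    return env
```

For example, `build_env({"PATH": "/tmp/evil", "FOO": "BAR"}, {"PATH": "/usr/bin", "_PANTS_SHIM_ROOT": "{chroot}"})` keeps `PATH` as `/usr/bin` while still passing `FOO` through. The trade-off is that a user can no longer override a reserved key even deliberately; that would need explicit handling.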

Author

I reversed the order


@@ -266,14 +277,16 @@ async def _resolve_runnable_dependencies(
shim_digest_path = f"_runnable_dependency_shims_{shim_digest.fingerprint}"
immutable_input_digests = {shim_digest_path: shim_digest}
_safe_update(immutable_input_digests, merged_extras.immutable_input_digests)
environment = {"_PANTS_SHIM_ROOT": "{chroot}"}
environment.update(runnable_extra_env_vars)
Sponsor Contributor

We should switch the update order here; we never want to stomp _PANTS_SHIM_ROOT (although it's highly unlikely).

Author

Reversed the order

benjyw requested a review from chrisjrn January 10, 2024 18:50

Author

ubmarco commented Jan 15, 2024

I addressed the comments. Unfortunately I feel a bit lost on the implementation of the overall feature.
This simple example is still not covered:

system_binary(
    name="printenv",
    binary_name="printenv",
    extra_env_vars=["FOO=BAR"],
)

adhoc_tool(
    name="adhoc",
    runnable=":printenv",
    log_output=True,
    stdout="stdout",
    # extra_env_vars=["FOO=BAR"],
)

Then running it as ./pants_from_sources.sh export-codegen docs:adhoc does not show the FOO variable. In this case the system_binary is used directly as the runnable rather than as one of the runnable_dependencies. Shell commands also need to be handled. I'm having a hard time figuring out where best to put the code to avoid duplication. Understanding the rule execution order is also difficult, as there is no call tree in the debugger. I read the docs but don't feel at home yet :-/ Is it possible to generate a (maybe graphical) rule execution tree?

Any advice?

ubmarco requested a review from benjyw January 15, 2024 23:42

Sponsor Contributor

benjyw commented Jan 16, 2024


Hmm, I'll dive a little deeper, but I would like to hear from @chrisjrn, who has the most context here.

Contributor

chrisjrn

I'm a bit rusty on this code and would gladly do a pairing session (ask me on Slack!), but my recollection is that the env for the system_binary target eventually ends up getting set in _runnable_dependency_shim and I don't think I see anything set there. Have you caught anything in that code path?
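If the env has to reach the binary through a generated shim, one common way for a shell shim to carry variables is to export them before exec'ing the real binary. The sketch below is hypothetical: `make_shim_script` is not Pants' `_runnable_dependency_shim`, and it assumes the variable names are valid shell identifiers (values are quoted with `shlex.quote`, names are not).

```python
import shlex
from typing import Mapping


def make_shim_script(binary_path: str, extra_env: Mapping[str, str]) -> str:
    """Return a bash script that exports extra_env and then execs binary_path."""
    lines = ["#!/usr/bin/env bash"]
    for name, value in sorted(extra_env.items()):
        # shlex.quote keeps values containing spaces or shell metacharacters intact.
        lines.append(f"export {name}={shlex.quote(value)}")
    # exec replaces the shim process; "$@" forwards the original arguments unchanged.
    lines.append(f'exec {shlex.quote(binary_path)} "$@"')
    return "\n".join(lines) + "\n"
```

Because the variables live in the shim, they apply whether the binary is invoked as the runnable or by another tool through one of the runnable_dependencies.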

Contributor

huonw commented Mar 7, 2024

Heya, just checking in. What's the status of this PR? Is it ready for a re-review or is it waiting on something else?

Author

ubmarco commented Mar 13, 2024

What works:

  • extra_env_vars for system_binary are used for fingerprinting and when the system_binary is used as one of the runnable_dependencies (my use case)

Pending issues:

  • extra_env_vars do not work if the system_binary is used as runnable directly in the scope of an adhoc_tool.
  • test cases are missing
  • shell scripts may also use the system_binary

The last time I worked on the PR, I felt a little lost in the code base and ran out of time.

Successfully merging this pull request may close these issues:

extra_env_vars for system_binary

5 participants