Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for SSH-related environment variables when executing git as part of the Changed Subsystem #20027

Open
andreaimprovised opened this issue Oct 12, 2023 · 1 comment

Comments

@andreaimprovised
Copy link
Contributor

andreaimprovised commented Oct 12, 2023

Is your feature request related to a problem? Please describe.

Yes.

The Changed Subsystem powers --changed-since advanced target selection and runs commands like git.

When pants runs git commands like diff and merge-base, it can effectively result in fetch under the hood in certain scenarios, e.g. treeless or blobless clones that can be preferable in CI.

If the the git url for the repo is an SSH-based one, e.g. git@github.com:... git will in turn rely on ssh. In turn, ssh-agent is commonly used for passwordless mediation of private keys for communication. This is mediated through environment variables such as SSH_AUTH_SOCK.

When pants runs, it typically runs commands in a daemonized environment, pantsd.

When pantsd is spawned, it is run in a hermetic environment with a limited set of environment variables preserved from the user's shell:

_PRESERVED_ENV_VARS = [
# Controls backtrace behavior for rust code.
"RUST_BACKTRACE",
# The environment variables consumed by the `bollard` crate as of
# https://github.com/fussybeaver/bollard/commit/a12c6b21b737e5ea9e6efe5f0128d02dc594f9aa
"DOCKER_HOST",
"DOCKER_CONFIG",
"DOCKER_CERT_PATH",
# Environment variables consumed (indirectly) by the `docker_credential` crate as of
# https://github.com/keirlawson/docker_credential/commit/0c42d0f3c76a7d5f699d4d1e8b9747f799cf6116
"HOME",
"PATH",
"USER",
]

When pantsd runs git, it does so in an unsandboxed manner, effectively running with the same hermetic environment variables as pantsd:

class GitBinary(BinaryPath):
def _invoke_unsandboxed(self, cmd: list[str]) -> str:
"""Invoke the given git command, _without_ the sandboxing provided by the `Process` API.
This API is for internal use only: users should prefer to consume methods of the
`GitWorktree` class.
"""
cmd = [self.path, *cmd]
self._log_call(cmd)
try:
process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
except OSError as e:
# Binary DNE or is not executable
cmd_str = " ".join(cmd)
raise GitBinaryException(f"Failed to execute command {cmd_str}: {e!r}")
out, err = process.communicate()
self._check_result(cmd, process.returncode, err.decode())
return out.decode().strip()

As a result, SSH_AUTH_SOCK cannot be set. This in turn can make git commands fail when they result in an internal fetch over SSH with a permission denied error.

Describe the solution you'd like

I have no particular opinion on how to manage this situation.

As noted by @stuhood in https://pantsbuild.slack.com/archives/C0D7TNJHL/p1684784924506069, auth-related environment variables seem to be a thematic problem for pantsd.

However, the Changed subsystem could be modified to run git inside a sandbox, and then provide a --changed-env-vars capability, or perhaps at least a --subprocess-environment-env-vars capability.

Describe alternatives you've considered

Workarounds are as follows:

  1. Use --no-pantsd. Because git is run in an unsandboxed manner, all SSH-related environment variables in the user's shell will be available.
  2. Anticipate the git commands resulting from --changed-since=$GITREF and manually run git diff --name-only $GITREF...HEAD in the shell environment where SSH-related variables are present. This will pre-fetch all relevant blobs and trees so the later call by pants won't need to fetch.
  3. Cloning with the relevant ssh environment variables bundled with the ssh command appears to work need: git clone -c core.sshCommand="env SSH_AUTH_SOCK=$SSH_AUTH_SOCK ssh" <git_url>. git config core.sshCommand "env SSH_AUTH_SOCK=$SSH_AUTH_SOCK ssh" is equivalent, post-clone config update.

Additional context

This was originally identified here: https://pantsbuild.slack.com/archives/C046T6T9U/p1697052406274539

@benjyw
Copy link
Sponsor Contributor

benjyw commented Oct 12, 2023

Sorry you've hit this annoying issue.

Any sensible person reading the docs would expect subprocess_environment options to handle this. But they don't because those docs are a lie, and that subsystem is only used in practice for python-running processes... This is a glaring error.

Probably we should make SubprocessEnvironment apply for all processes.

benjyw pushed a commit that referenced this issue Oct 13, 2023
Adds documentation to resolve #20026.

I opted for a standalone quote section and gave a nod to
the #20027 workaround.
WorkerPants pushed a commit that referenced this issue Oct 13, 2023
Adds documentation to resolve #20026.

I opted for a standalone quote section and gave a nod to
the #20027 workaround.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants