safeoutputs generate_git_patch: Strategy 3 picks the alphabetically-first remote ref as the base, producing huge patches in side-repo-ops setups

🤖 _This issue has been generated by Claude Code._

## Context

In a "side-repo-ops" pattern, a workflow lives in a *workflow* repository (`acme/agentic-workflows`) but operates on a separate *target* repository (`acme/target-repo`) checked out as a subdirectory of `GITHUB_WORKSPACE`. The agent writes commits into the target checkout and then calls `safeoutputs.create_pull_request`. The safe-outputs server runs `generate_git_patch` to capture those commits as a `.patch`/`.bundle` artifact.

I just hit a case where the resulting patch contained **76 commits** even though the agent had only added **1** commit on top of `origin/master`. Walking through the safeoutputs server log (`actions/setup/js/generate_git_patch.cjs`), all three strategies misfired in turn — and Strategy 3 in particular has a bug that makes its base-commit choice essentially arbitrary.

## What happened, strategy-by-strategy

From `mcp-logs/safeoutputs/server.log`:

### Strategy 1 (merge-base with configured base branch) — failed

The agent's `create_pull_request` payload passed `"base_branch"` equal to the head branch name (`<new-branch>`) rather than `master`. Strategy 1 therefore looked for `origin/<new-branch>`, which didn't exist, and fell through to a fetch attempt:

```
Strategy 1 (full): origin/<branch> not found locally, attempting fetch
Fetch failed - ERR_SYSTEM: fatal: could not read Username for 'https://github.com': No such device or address
```

The fetch failed because git credentials had already been cleaned by this point. (Whether the agent should be allowed to set `base_branch == branch` is a separate question, but Strategy 1 has no defense against it.)

### Strategy 2 (`GITHUB_SHA..HEAD`) — failed

```
Strategy 2: currentHead=<agent-commit-sha>, GITHUB_SHA=<workflow-repo-sha>
Strategy 2: GITHUB_SHA not found in repo (cross-repo checkout?)
```

In a side-repo-ops setup, `GITHUB_SHA` belongs to the workflow repo (`acme/agentic-workflows`), not the target repo where the patch is being generated. `git cat-file -e $GITHUB_SHA` is guaranteed to fail. This strategy is essentially a no-op for the side-repo-ops pattern.

### Strategy 3 (cross-repo fallback) — "succeeded" with the wrong base

This is the actual bug. The relevant code at `actions/setup/js/generate_git_patch.cjs` (around the `Strategy 3` block, ~lines 349–410 on `main`):

```js
const remoteRefsOutput = execGitSync(["for-each-ref", "--format=%(refname)", "refs/remotes/"], { cwd }).trim();
// ...
const remoteRefs = remoteRefsOutput.split("\n").filter(r => r);
// ...
const remoteExcludeArgs = remoteRefs.flatMap(ref => ["--not", ref]);
const commitCount = parseInt(execGitSync(["rev-list", "--count", branchName, ...remoteExcludeArgs], { cwd }).trim(), 10);
// ...
// Get the merge-base with the first remote ref (typically origin/HEAD or origin/main)
// to determine the starting point for the patch
let baseCommit;
for (const ref of remoteRefs) {
  try {
    baseCommit = execGitSync(["merge-base", ref, branchName], { cwd }).trim();
    if (baseCommit) {
      debugLog(`Strategy 3: Found merge-base ${baseCommit} with ref ${ref}`);
      break;
    }
  } catch { /* try next */ }
}
```

The inline comment claims the iteration is "typically `origin/HEAD` or `origin/main`," but `git for-each-ref refs/remotes/` returns refs in **lexicographic order**, not in any priority order. In my repo, the first ref was a stale 9-month-old feature branch (name redacted). Its merge-base with the agent's branch was a commit from September — 75 commits behind `origin/master`. `format-patch <that-base>..<branch>` then produced 76 patches: 75 of master's history + the agent's 1 real change.

From the log:

```
Strategy 3: Found 13 commits not reachable from any remote ref
git merge-base refs/remotes/origin/<oldest-alphabetical-ref> <agent-branch>
  => <commit from 9 months ago>
Strategy 3: Found merge-base ... with ref refs/remotes/origin/<oldest-alphabetical-ref>
git format-patch <that-base>..<agent-branch> --stdout
Strategy 3: SUCCESS - Generated patch with 15353 lines
```

Note that the `rev-list --count` *exclusion* set already includes `origin/master` (since it's one of the listed remote refs), so the strategy already "knows" master is a candidate base — it just doesn't use it for the merge-base step.

## Why this matters

- Patches blow up from KB to hundreds of KB / thousands of lines, sometimes tripping `max_patch_size` limits and rejecting otherwise-fine PRs.
- When a patch does land, the artifact misleadingly shows dozens of unrelated commits attributed to the agent's run.
- The chosen base is non-deterministic across repos because it depends on which feature branches happen to be checked out as remote-tracking refs.
- It's hard to notice — Strategy 3 reports `SUCCESS`, and the resulting PR is fine (GitHub computes the real diff server-side from the head commit).

## Suggested fixes

Ordered roughly by impact:

1. **Strategy 3 should prefer well-known refs.** Try refs in this order before falling back to alphabetical iteration:
   - `refs/remotes/origin/HEAD` (if it resolves)
   - the `baseBranch` argument (e.g. `refs/remotes/origin/main`, `refs/remotes/origin/master`)
   - any explicit fallback list (e.g. `main`, `master`, `develop`, `trunk`)
   - then the rest, alphabetically.
2. **Or: pick the ref whose merge-base produces the *smallest* commit count**, not the first one that yields any merge-base. This is robust against the "stale feature branch" case without needing to know branch names.
3. **Sanity-check the result.** If the resulting `format-patch` produces more than, say, 5–10× the count from `rev-list --count <branch> --not <all remote refs>`, log a warning (or fall back) — that's a strong signal the chosen base is bogus. In my case `rev-list` reported 13 unreachable commits but the patch contained 76.
4. **Strategy 2 in side-repo-ops checkouts.** Detect when `cwd` isn't the workflow's own checkout (e.g. when the patch is being generated against a subdirectory checkout of a different repo) and skip Strategy 2 entirely instead of running the doomed `git cat-file`. Minor, but it'd shorten the log and avoid the misleading "cross-repo checkout?" message in the normal case.
5. **Strategy 1 base-branch validation.** When the agent passes `base_branch == branch`, that's almost certainly a mistake — reject it (or warn loudly and substitute the default branch) rather than silently failing through to Strategy 3.

## Reproduction

The setup that triggered this:

- Workflow runs in `acme/agentic-workflows` (`GITHUB_SHA` belongs to this repo).
- Target repo `acme/target-repo` is checked out under `$GITHUB_WORKSPACE/target-repo` with full history and additional refs fetched (`fetch: "*"`).
- Agent runs `git fetch origin master && git checkout -b <new-branch> origin/master`, makes a 1-line change, commits.
- Agent calls `safeoutputs.create_pull_request` with `branch: <new-branch>`, `base_branch: <new-branch>` (a mistake on the agent's part, but not the root cause of the 76-commit patch).
- `safeoutputs` server runs `generate_git_patch`, which falls through to Strategy 3.
- The repo has ~10 remote-tracking refs from `fetch: "*"`; the alphabetically-first is a stale September feature branch, 75 commits behind master.
- Resulting patch: 76 commits, ~760 KB, ~15k lines (versus ~20 lines for the actual change).

Happy to follow up with a PR if any of these suggestions seem reasonable.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

safeoutputs generate_git_patch: Strategy 3 picks the alphabetically-first remote ref as the base, producing huge patches in side-repo-ops setups #34166

Context

What happened, strategy-by-strategy

Strategy 1 (merge-base with configured base branch) — failed

Strategy 2 (`GITHUB_SHA..HEAD`) — failed

Strategy 3 (cross-repo fallback) — "succeeded" with the wrong base

Why this matters

Suggested fixes

Reproduction

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

safeoutputs generate_git_patch: Strategy 3 picks the alphabetically-first remote ref as the base, producing huge patches in side-repo-ops setups #34166

Description

Context

What happened, strategy-by-strategy

Strategy 1 (merge-base with configured base branch) — failed

Strategy 2 (GITHUB_SHA..HEAD) — failed

Strategy 3 (cross-repo fallback) — "succeeded" with the wrong base

Why this matters

Suggested fixes

Reproduction

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Strategy 2 (`GITHUB_SHA..HEAD`) — failed