Skip to content

safeoutputs generate_git_patch: Strategy 3 picks the alphabetically-first remote ref as the base, producing huge patches in side-repo-ops setups #34166

@yskopets

Description

@yskopets

🤖 This issue has been generated by Claude Code.

Context

In a "side-repo-ops" pattern, a workflow lives in a workflow repository (acme/agentic-workflows) but operates on a separate target repository (acme/target-repo) checked out as a subdirectory of GITHUB_WORKSPACE. The agent writes commits into the target checkout and then calls safeoutputs.create_pull_request. The safe-outputs server runs generate_git_patch to capture those commits as a .patch/.bundle artifact.

I just hit a case where the resulting patch contained 76 commits even though the agent had only added 1 commit on top of origin/master. Walking through the safeoutputs server log (actions/setup/js/generate_git_patch.cjs), all three strategies misfired in turn — and Strategy 3 in particular has a bug that makes its base-commit choice essentially arbitrary.

What happened, strategy-by-strategy

From mcp-logs/safeoutputs/server.log:

Strategy 1 (merge-base with configured base branch) — failed

The agent's create_pull_request payload passed "base_branch" equal to the head branch name (<new-branch>) rather than master. Strategy 1 therefore looked for origin/<new-branch>, which didn't exist, and fell through to a fetch attempt:

Strategy 1 (full): origin/<branch> not found locally, attempting fetch
Fetch failed - ERR_SYSTEM: fatal: could not read Username for 'https://github.com': No such device or address

The fetch failed because git credentials had already been cleaned by this point. (Whether the agent should be allowed to set base_branch == branch is a separate question, but Strategy 1 has no defense against it.)

Strategy 2 (GITHUB_SHA..HEAD) — failed

Strategy 2: currentHead=<agent-commit-sha>, GITHUB_SHA=<workflow-repo-sha>
Strategy 2: GITHUB_SHA not found in repo (cross-repo checkout?)

In a side-repo-ops setup, GITHUB_SHA belongs to the workflow repo (acme/agentic-workflows), not the target repo where the patch is being generated. git cat-file -e $GITHUB_SHA is guaranteed to fail. This strategy is essentially a no-op for the side-repo-ops pattern.

Strategy 3 (cross-repo fallback) — "succeeded" with the wrong base

This is the actual bug. The relevant code at actions/setup/js/generate_git_patch.cjs (around the Strategy 3 block, ~lines 349–410 on main):

const remoteRefsOutput = execGitSync(["for-each-ref", "--format=%(refname)", "refs/remotes/"], { cwd }).trim();
// ...
const remoteRefs = remoteRefsOutput.split("\n").filter(r => r);
// ...
const remoteExcludeArgs = remoteRefs.flatMap(ref => ["--not", ref]);
const commitCount = parseInt(execGitSync(["rev-list", "--count", branchName, ...remoteExcludeArgs], { cwd }).trim(), 10);
// ...
// Get the merge-base with the first remote ref (typically origin/HEAD or origin/main)
// to determine the starting point for the patch
let baseCommit;
for (const ref of remoteRefs) {
  try {
    baseCommit = execGitSync(["merge-base", ref, branchName], { cwd }).trim();
    if (baseCommit) {
      debugLog(`Strategy 3: Found merge-base ${baseCommit} with ref ${ref}`);
      break;
    }
  } catch { /* try next */ }
}

The inline comment claims the iteration is "typically origin/HEAD or origin/main," but git for-each-ref refs/remotes/ returns refs in lexicographic order, not in any priority order. In my repo, the first ref was a stale 9-month-old feature branch (name redacted). Its merge-base with the agent's branch was a commit from September — 75 commits behind origin/master. format-patch <that-base>..<branch> then produced 76 patches: 75 of master's history + the agent's 1 real change.

From the log:

Strategy 3: Found 13 commits not reachable from any remote ref
git merge-base refs/remotes/origin/<oldest-alphabetical-ref> <agent-branch>
  => <commit from 9 months ago>
Strategy 3: Found merge-base ... with ref refs/remotes/origin/<oldest-alphabetical-ref>
git format-patch <that-base>..<agent-branch> --stdout
Strategy 3: SUCCESS - Generated patch with 15353 lines

Note that the rev-list --count exclusion set already includes origin/master (since it's one of the listed remote refs), so the strategy already "knows" master is a candidate base — it just doesn't use it for the merge-base step.

Why this matters

  • Patches blow up from KB to hundreds of KB / thousands of lines, sometimes tripping max_patch_size limits and rejecting otherwise-fine PRs.
  • When a patch does land, the artifact misleadingly shows dozens of unrelated commits attributed to the agent's run.
  • The chosen base is non-deterministic across repos because it depends on which feature branches happen to be checked out as remote-tracking refs.
  • It's hard to notice — Strategy 3 reports SUCCESS, and the resulting PR is fine (GitHub computes the real diff server-side from the head commit).

Suggested fixes

Ordered roughly by impact:

  1. Strategy 3 should prefer well-known refs. Try refs in this order before falling back to alphabetical iteration:
    • refs/remotes/origin/HEAD (if it resolves)
    • the baseBranch argument (e.g. refs/remotes/origin/main, refs/remotes/origin/master)
    • any explicit fallback list (e.g. main, master, develop, trunk)
    • then the rest, alphabetically.
  2. Or: pick the ref whose merge-base produces the smallest commit count, not the first one that yields any merge-base. This is robust against the "stale feature branch" case without needing to know branch names.
  3. Sanity-check the result. If the resulting format-patch produces more than, say, 5–10× the count from rev-list --count <branch> --not <all remote refs>, log a warning (or fall back) — that's a strong signal the chosen base is bogus. In my case rev-list reported 13 unreachable commits but the patch contained 76.
  4. Strategy 2 in side-repo-ops checkouts. Detect when cwd isn't the workflow's own checkout (e.g. when the patch is being generated against a subdirectory checkout of a different repo) and skip Strategy 2 entirely instead of running the doomed git cat-file. Minor, but it'd shorten the log and avoid the misleading "cross-repo checkout?" message in the normal case.
  5. Strategy 1 base-branch validation. When the agent passes base_branch == branch, that's almost certainly a mistake — reject it (or warn loudly and substitute the default branch) rather than silently failing through to Strategy 3.

Reproduction

The setup that triggered this:

  • Workflow runs in acme/agentic-workflows (GITHUB_SHA belongs to this repo).
  • Target repo acme/target-repo is checked out under $GITHUB_WORKSPACE/target-repo with full history and additional refs fetched (fetch: "*").
  • Agent runs git fetch origin master && git checkout -b <new-branch> origin/master, makes a 1-line change, commits.
  • Agent calls safeoutputs.create_pull_request with branch: <new-branch>, base_branch: <new-branch> (a mistake on the agent's part, but not the root cause of the 76-commit patch).
  • safeoutputs server runs generate_git_patch, which falls through to Strategy 3.
  • The repo has ~10 remote-tracking refs from fetch: "*"; the alphabetically-first is a stale September feature branch, 75 commits behind master.
  • Resulting patch: 76 commits, ~760 KB, ~15k lines (versus ~20 lines for the actual change).

Happy to follow up with a PR if any of these suggestions seem reasonable.

Metadata

Metadata

Labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions