create_pull_request: git am fallback also fails and cancels unrelated safe outputs — needs stronger retry #33285

@IEvangelist

Problem

A create_pull_request safe-output failed because git am could not apply the generated patch, and the built-in fallback (re-checking out the original base commit and re-running git am) also failed. The job then exited with a hard error and the second safe-output message (notify_source_pr) was cancelled — even though it was a completely independent operation.

The agent's token and context spend had already been paid by the time this failure happened in the post-processing step, so a transient failure here is especially expensive.

Failing run

https://github.com/microsoft/aspire/actions/runs/26078776570/job/76676900386

The agent ran in microsoft/aspire and targeted a PR in microsoft/aspire.dev (cross-repo create-pull-request against release/13.4).

Observed behavior

The fallback path that's supposed to "save" a failed git am by recreating the branch at the patch's original base commit also failed:

Attempting fallback: create PR branch at original base commit...
Original base commit from patch generation: 1cb508fd49bedc4afeb0b8fc008d51689756d853
...
Switched to a new branch 'docs/pr-17234-eager-config-migration-33637d63a4a37bea'
Created branch ... at original base commit 1cb508fd49bedc4afeb0b8fc008d51689756d853
/usr/bin/git am /tmp/gh-aw/aw-microsoft-aspire.dev-docs-pr-17234-eager-config-migration.patch
error: patch failed: src/frontend/config/sidebar/docs.topics.ts:867
error: src/frontend/config/sidebar/docs.topics.ts: patch does not apply
error: patch failed: src/frontend/scripts/check-data-files.mjs:21
error: src/frontend/scripts/check-data-files.mjs: patch does not apply
error: src/frontend/src/content/docs/app-host/hot-reload-and-watch.mdx: already exists in index
error: patch failed: src/frontend/src/content/docs/dashboard/index.mdx:17
error: src/frontend/src/content/docs/dashboard/index.mdx: patch does not apply
... (many more files) ...
error: patch failed: src/frontend/src/data/aspire-integrations.json:2
error: src/frontend/src/data/aspire-integrations.json: patch does not apply
error: patch failed: src/frontend/src/data/github-stats.json:1
error: src/frontend/src/data/github-stats.json: patch does not apply
error: patch failed: src/frontend/tests/e2e/ui-regressions.spec.ts:266
error: src/frontend/tests/e2e/ui-regressions.spec.ts: patch does not apply
Applying: Merge main into release/13.4 (#893)
Patch failed at 0001 Merge main into release/13.4 (#893)
...
Warning: Fallback to original base commit failed: The process '/usr/bin/git' failed with exit code 128
Error: ✗ Message 1 (create_pull_request) failed: Failed to apply patch
Warning: ⚠️ Code push operation 'create_pull_request' failed — remaining safe outputs will be cancelled
⏭ Message 2 (notify_source_pr) cancelled — Cancelled: code push operation failed (create_pull_request: Failed to apply patch)

Two distinct problems are visible:

  1. The patch contains a mix of "modify existing file" hunks and one "create new file" hunk (hot-reload-and-watch.mdx: already exists in index), and it fails to apply even at the original base commit 1cb508fd…. This smells like the cross-repo / different-tree patch-generation issue from #17969 (Cross-repo safe-output PRs fail: patch generated as "create new file" when target file already exists), but it surfaces here as the fallback also failing rather than the initial git am.
  2. Unrelated safe outputs are cancelled because one code-push message failed. notify_source_pr has no dependency on create_pull_request succeeding, but it never runs.
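A minimal sketch of how a runner could tell these failure kinds apart before choosing a recovery strategy. The function name `classify_am_errors` and the result keys are hypothetical and are not part of the safe-outputs runner. The patterns only cover the two message shapes visible in the log above.

```python
import re
from typing import Dict, List

# Hypothetical patterns, written against the two error shapes in the log above.
_ALREADY_EXISTS = re.compile(r"^error: (?P<path>.+): already exists in index$")
_DOES_NOT_APPLY = re.compile(r"^error: (?P<path>.+): patch does not apply$")


def classify_am_errors(log: str) -> Dict[str, List[str]]:
    """Group the failing paths in a git am log by failure kind."""
    result: Dict[str, List[str]] = {"already_exists": [], "does_not_apply": []}
    for raw in log.splitlines():
        line = raw.strip()
        match = _ALREADY_EXISTS.match(line)
        if match:
            # "Create new file" hunk for a file that is already tracked.
            result["already_exists"].append(match.group("path"))
            continue
        match = _DOES_NOT_APPLY.match(line)
        if match:
            # "Modify" hunk whose context does not match the checked-out tree.
            result["does_not_apply"].append(match.group("path"))
    return result
```

Paths in the `already_exists` bucket are candidates for the "apply as a modify" repair, while paths in `does_not_apply` are candidates for a three-way retry.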

Expected behavior

When a git am patch application fails after the agent has already finished its work, the safe-outputs runner should be more resilient:

  1. Stronger retry / repair before giving up. Cheap recovery attempts that don't require re-prompting the model should be exhausted first, e.g.:
    • git am --3way (in addition to plain git am).
    • git apply --3way, or git apply --reject (git apply documents --3way as incompatible with --reject), followed by git am --continue after staging the clean hunks.
    • For files where the only failure is already exists in index, fall back to applying the hunks as a modify against the file already present in the tree (re-derive a "modify" diff from the patch's + body, using the existing file as base).
    • For cross-repo targets, regenerate the patch against the target repo's tree (the long-standing root cause from #17969, Cross-repo safe-output PRs fail: patch generated as "create new file" when target file already exists), even if only as a one-shot fix-up after git am fails.
  2. Don't cancel independent safe outputs. A failure in create_pull_request should not implicitly cancel notify_source_pr (or any other message that doesn't depend on the PR being created). Either declare dependencies explicitly, or have non-code-push messages keep running.
  3. Optionally, push the partial result as a draft PR / artifact so a human can finish the apply with a clear diff in front of them, instead of throwing the whole agent run away.
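Item 2 could be sketched as follows, assuming each message declares its dependencies explicitly. The `Message` type, its `depends_on` field, and `run_messages` are all hypothetical and do not describe the runner's actual data model. The sketch also assumes messages arrive ordered so that dependencies run first.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    name: str
    handler: Callable[[], None]
    # Hypothetical field: names of messages that must succeed first.
    depends_on: List[str] = field(default_factory=list)


def run_messages(messages: List[Message]) -> Dict[str, str]:
    """Run every message; cancel only those with a dependency that did not succeed."""
    status: Dict[str, str] = {}
    for msg in messages:
        blocked = [dep for dep in msg.depends_on if status.get(dep) != "ok"]
        if blocked:
            status[msg.name] = "cancelled (depends on " + ", ".join(blocked) + ")"
            continue
        try:
            msg.handler()
            status[msg.name] = "ok"
        except Exception as exc:  # a failure is recorded, not propagated
            status[msg.name] = f"failed: {exc}"
    return status
```

With this shape, a notify_source_pr message with an empty `depends_on` still runs after create_pull_request fails, while a message that really needs the PR can list `create_pull_request` and be cancelled with an explicit reason.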

Why this matters

The agent already spent its context window and tokens producing this output. A git am mismatch in the post-processing step is a very cheap problem to recover from (no model calls needed to retry with --3way, no model calls needed to regenerate the patch against the target tree). It's the worst possible place to bail out with no retry, because re-running the workflow means re-spending all those tokens.

Tool-like / mechanical steps that run after the model has finished should have aggressive retry logic precisely because the expensive part of the run has already been paid for.
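One possible shape for such a retry ladder is sketched below, as an illustration rather than the runner's implementation. `apply_with_fallbacks`, `run_git`, and the strategy list are hypothetical names. Only `git am`, `git am --3way`, and `git am --abort` are used, and the existing "recreate the branch at the original base commit" step is not modeled here.

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner takes git arguments and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_git(args: Sequence[str]) -> int:
    """Default runner: invoke git in the current working directory."""
    return subprocess.run(["git", *args]).returncode


def apply_with_fallbacks(patch_path: str, run: Runner = run_git) -> Optional[str]:
    """Try progressively more tolerant strategies; return the label that worked."""
    strategies = [
        ("am", ["am", patch_path]),
        ("am --3way", ["am", "--3way", patch_path]),
    ]
    for label, args in strategies:
        if run(args) == 0:
            return label
        # Leave the repository clean before the next attempt; the exit code
        # is ignored because there may be no am session in progress.
        run(["am", "--abort"])
    return None  # every mechanical strategy failed; escalate from here
```

Passing the runner in as a parameter keeps the ladder testable without a real repository, and further rungs (for example, regenerating the patch against the target tree) could be appended to `strategies` without changing the loop.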

Possibly related

Environment
