
Add agentic fix workflow for codegen build failures #101

Merged
edburns merged 3 commits into main from edburns/dd-2955542-agentic-codegen
Apr 24, 2026

edburns commented Apr 24, 2026

Related to #93

Summary

Adds automated detection and repair of Java build failures caused by code generation changes. When @github/copilot is updated (e.g., via Dependabot PR #94) and the regenerated code breaks mvn verify, a gh-aw agentic workflow is triggered to fix the handwritten source code automatically.

Changes

New files

  • .github/workflows/codegen-agentic-fix.md — gh-aw manifest for the agentic fix agent. Defines safe-outputs (push-to-pull-request-branch, add-comment, noop), network/tool constraints, and a structured 3-attempt fix loop.
  • .github/workflows/codegen-agentic-fix.lock.yml — Compiled gh-aw lockfile with pinned action SHAs, container images, and secret declarations.

Modified files

  • .github/workflows/codegen-check.yml — On PRs, regenerated files are now committed back to the PR branch. If mvn verify fails after regeneration, the agentic fix workflow is triggered. Push-to-main behavior is preserved as-is (fail if stale).
  • .github/workflows/update-copilot-dependency.yml — After codegen and PR creation, mvn verify is run. On failure, the agentic fix workflow is triggered. Includes a polling loop that waits for the fix to complete and runs a final verification.

Design

The pipeline follows a two-workflow pattern:

  1. Trigger workflows (codegen-check.yml, update-copilot-dependency.yml) detect failures and dispatch the agentic fix.
  2. Fix workflow (codegen-agentic-fix.lock.yml) runs under gh-aw guardrails with scoped permissions, network firewall, and MCP gateway. It checks out the branch, reproduces the failure, applies fixes to handwritten code only, and pushes via push-to-pull-request-branch safe-output.
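
The "handwritten code only" rule could be checked mechanically before a push. The following is a minimal sketch, not part of gh-aw or of this PR: the function name `check_protected_paths` is hypothetical, and the protected locations are the ones this PR description lists as off-limits to the agent. It reads changed paths on stdin and reports any that fall under a protected location.

```shell
# check_protected_paths: hypothetical helper, not taken from the workflow.
# Reads changed file paths on stdin (one per line), prints each path that
# falls under a protected location, and returns 1 if any was found.
check_protected_paths() {
  bad=0
  while IFS= read -r path; do
    case "$path" in
      src/generated/java/*|pom.xml|scripts/codegen/*|.github/*)
        echo "protected: $path"
        bad=1
        ;;
    esac
  done
  return "$bad"
}

# Possible wiring (assumes the base branch is origin/main):
#   git diff --name-only origin/main...HEAD | check_protected_paths
```

Note that the `pom.xml` pattern only matches the root POM; a multi-module layout would need a broader pattern.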

Key constraints enforced on the agent:

  • Never modify src/generated/java/, pom.xml, scripts/codegen/, or .github/
  • Maximum 3 fix attempts before escalating via PR comment
  • Must run mvn spotless:apply before committing
  • Only pushes if mvn verify passes
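
The attempt limit above can be pictured as a small bounded loop. This is only a sketch of the control flow, since the real loop is expressed as agent instructions in the gh-aw manifest rather than as a script. `fix_loop` and its two hook arguments are hypothetical names: the first stands in for the agent's edit step, the second for `mvn verify`.

```shell
# fix_loop FIX_CMD VERIFY_CMD
# FIX_CMD and VERIFY_CMD are hypothetical hooks (command or function names).
# FIX_CMD receives the attempt number; VERIFY_CMD stands in for "mvn verify".
fix_loop() {
  fix_cmd=$1
  verify_cmd=$2
  attempt=1
  while [ "$attempt" -le 3 ]; do
    "$fix_cmd" "$attempt"
    if "$verify_cmd"; then
      # Caller would format with mvn spotless:apply, commit, and push here.
      echo "verified on attempt $attempt"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  # Caller would escalate with a PR comment instead of pushing.
  echo "escalate after 3 failed attempts"
  return 1
}
```

The non-zero return on exhaustion is what lets a caller choose between the push path and the escalation path.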

Testing

edburns and others added 2 commits April 24, 2026 14:03
- Create codegen-agentic-fix gh-aw workflow for fixing mvn verify
  failures after code generation changes
- update-copilot-dependency: run mvn verify after codegen, trigger
  agentic fix on failure before creating PR
- codegen-check: commit regenerated files back to PR branch, run
  mvn verify, trigger agentic fix on failure

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 24, 2026 18:58

Copilot AI left a comment


Pull request overview

Adds an “agentic fix” automation path to detect and repair Java build failures that occur after code generation updates (e.g., following @github/copilot schema bumps), by dispatching a constrained gh-aw workflow when mvn verify fails.

Changes:

  • Add a new agentic workflow manifest + compiled lockfile for automated repair attempts.
  • Update codegen-check to commit regenerated outputs on PRs and dispatch the agentic fix on mvn verify failure.
  • Update update-copilot-dependency to run mvn verify post-codegen and dispatch/poll the agentic fix on failure.

| File | Description |
| --- | --- |
| .github/workflows/update-copilot-dependency.yml | Runs mvn verify after codegen PR creation; dispatches and waits for the agentic fix workflow on failure. |
| .github/workflows/codegen-check.yml | On PRs, commits regenerated outputs back to the PR branch, runs mvn verify, and dispatches the agentic fix workflow on failure. |
| .github/workflows/codegen-agentic-fix.md | Defines the gh-aw agent prompt/guardrails and the 3-attempt fix loop. |
| .github/workflows/codegen-agentic-fix.lock.yml | Compiled gh-aw lockfile with pinned actions/images and safe-output wiring. |

Copilot's findings

Comments suppressed due to low confidence (3)

.github/workflows/codegen-check.yml:154

  • The polling loop always reads the most recent run for the workflow+branch (--limit=1). If there’s an existing completed run on the branch (or the new run hasn’t been created yet), this can incorrectly “complete” immediately and proceed without waiting for the newly-dispatched run. Consider capturing a timestamp before dispatch and selecting the first run with createdAt after that time, or otherwise correlating the specific run you triggered.
          for i in $(seq 1 60); do
            RUN_ID=$(gh run list \
              --workflow=codegen-agentic-fix.lock.yml \
              --branch="$BRANCH" \
              --limit=1 \
              --json databaseId,status \
              --jq '.[0].databaseId')

            STATUS=$(gh run list \
              --workflow=codegen-agentic-fix.lock.yml \
              --branch="$BRANCH" \
              --limit=1 \
              --json databaseId,status \
              --jq '.[0].status')
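
The timestamp-based correlation suggested in this comment could look roughly like the sketch below. It is an illustration under assumptions, not code from the PR: `pick_run_after` is a hypothetical helper, and it expects one run per line as `createdAt<TAB>databaseId<TAB>status` with ISO 8601 UTC timestamps, which compare correctly as plain strings.

```shell
# pick_run_after SINCE
# Hypothetical helper. Reads "createdAt<TAB>databaseId<TAB>status" lines on
# stdin and prints "databaseId<TAB>status" for the earliest run created at
# or after SINCE. Prints nothing if no such run exists yet.
pick_run_after() {
  awk -F '\t' -v since="$1" '
    $1 >= since && (best == "" || $1 < best) { best = $1; id = $2; st = $3 }
    END { if (best != "") printf "%s\t%s\n", id, st }
  '
}

# Possible wiring (not executed here; BRANCH is assumed to be set):
#   SINCE=$(date -u +%Y-%m-%dT%H:%M:%SZ)   # record before dispatching
#   ...dispatch the workflow...
#   gh run list --workflow=codegen-agentic-fix.lock.yml --branch="$BRANCH" \
#     --limit=10 --json createdAt,databaseId,status \
#     --jq '.[] | [.createdAt, .databaseId, .status] | @tsv' \
#     | pick_run_after "$SINCE"
```

An empty result means the new run has not been registered yet, so the loop should keep waiting rather than treat an older run as its own. Clock skew between the runner and GitHub is a remaining caveat; subtracting a few seconds from `SINCE` is one way to allow for it.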

.github/workflows/update-copilot-dependency.yml:236

  • The wait loop has the same correlation issue as codegen-check: it always polls --limit=1 for the workflow+branch, so it can pick up a previous completed run and stop early (or miss the newly triggered run until later). Capture a start timestamp and filter by createdAt, or otherwise ensure you’re waiting on the specific run that was just dispatched.
          for i in $(seq 1 60); do
            RUN_ID=$(gh run list \
              --workflow=codegen-agentic-fix.lock.yml \
              --branch="$BRANCH" \
              --limit=1 \
              --json databaseId,status \
              --jq '.[0].databaseId')

            STATUS=$(gh run list \
              --workflow=codegen-agentic-fix.lock.yml \
              --branch="$BRANCH" \
              --limit=1 \
              --json databaseId,status \
              --jq '.[0].status')

.github/workflows/codegen-agentic-fix.md:115

  • Same truncation concern when verifying fixes (mvn verify 2>&1 | tail -100): if the remaining failure is earlier in the log, the agent may not see it and can loop unproductively. Keep the full Maven output available (e.g., tee to a file) and only tail for the step summary if needed.
4. **Verify the fix:**
   ```bash
   mvn verify 2>&1 | tail -100
   ```


- **Files reviewed:** 4/4 changed files
- **Comments generated:** 4


This commit addresses all 4 review comments from
#101 (review)

---

## .github/workflows/codegen-check.yml

### Review comment (line 128): --ref "$BRANCH" will fail for PRs that don't contain the workflow file
Copilot said: `gh workflow run` with `--ref "$BRANCH"` requires
`codegen-agentic-fix.lock.yml` to exist on the PR branch. Dependabot
or codegen-only PRs branched before this workflow was added won't have
it, causing "workflow not found".

Fix applied: Removed `--ref "$BRANCH"` from the `gh workflow run` call.
GitHub now resolves the workflow file from main (default branch). The
target branch is already passed via `-f branch="$BRANCH"` input.
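
As a rough illustration of the dispatch after this change, the call reduces to the workflow file name plus the `branch` input. The wrapper name `dispatch_fix` and the `GH` override are assumptions added here so the command can be dry-run; the workflow's actual step may differ in its details.

```shell
# dispatch_fix BRANCH
# Hypothetical wrapper. With no --ref, the workflow file is resolved from the
# default branch; the branch to repair travels as a workflow input instead.
# GH is an assumed override (defaults to "gh") so the call can be dry-run.
dispatch_fix() {
  ${GH:-gh} workflow run codegen-agentic-fix.lock.yml -f branch="$1"
}

# Dry run: GH=echo dispatch_fix my-feature-branch
```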

### Review comment (line 95): Downstream steps run even when regen push fails
Copilot said: `continue-on-error: true` on the push step means
downstream steps (mvn verify, agentic fix trigger) run against a
branch that doesn't have the regenerated code, so the agent can't
reproduce the failure.

Fix applied: Added a new step "Fail if regenerated files could not be
pushed" that exits with a clear error message and manual fix
instructions when the push is rejected (Dependabot read-only token,
fork PRs). Gated Java setup, mvn verify, and agentic fix trigger on
`steps.push-regen.outcome == 'success'`.

## .github/workflows/update-copilot-dependency.yml

### Review comment (line 211): Same --ref "$BRANCH" issue as codegen-check.yml
Copilot said: Same dispatch-from-PR-branch problem.

Fix applied: Removed `--ref "$BRANCH"` from the `gh workflow run` call,
identical to the codegen-check.yml fix.

## .github/workflows/codegen-agentic-fix.md

### Review comment (line 85): mvn verify output truncated by tail -100
Copilot said: `mvn verify 2>&1 | tail -100` hides root compilation
errors that appear early in the log. The agent may loop unproductively
without seeing the actual cause.

Fix applied: Changed both `mvn verify` invocations (Step 1 reproduce
and Step 2 verify) from `tail -100` to `tee /tmp/mvn-verify.log`.
Added guidance telling the agent to check the full log file for root
causes. Recompiled the lockfile with `gh aw compile`.
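
For illustration, here is one way to keep the complete log while still printing a short tail. This sketch is a variation on the `tee` change rather than the manifest's actual text: it writes to the file first, so the command's exit status is preserved without depending on how the shell reports pipeline status. `run_and_log` is a hypothetical name.

```shell
# run_and_log LOGFILE COMMAND [ARGS...]
# Hypothetical helper. Captures all output (stdout and stderr) in LOGFILE,
# prints only the last 100 lines, and returns the command's own exit status.
run_and_log() {
  log_file=$1
  shift
  status=0
  "$@" >"$log_file" 2>&1 || status=$?
  tail -n 100 "$log_file"
  return "$status"
}

# Possible use: run_and_log /tmp/mvn-verify.log mvn verify
```

Root compilation errors that scroll out of the tail remain searchable in the log file, for example with `grep -n ERROR /tmp/mvn-verify.log`.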

## .github/workflows/codegen-agentic-fix.lock.yml

Recompiled via `gh aw compile` to reflect the updated .md instructions.

Signed-off-by: Ed Burns <edburns@microsoft.com>
edburns merged commit 1ab1510 into main on Apr 24, 2026
8 checks passed
edburns deleted the edburns/dd-2955542-agentic-codegen branch on April 24, 2026 21:54