Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,10 @@

## Unreleased

### Feat

- **checkup**: implement a PR-head freshness lease for `pdd checkup --pr` so the loop automatically reruns when the remote PR head advances mid-checkup. If the local fix can be cleanly rebased onto the new head, the loop keeps the rebased path and reruns the verifier; if a rebase conflict occurs or if in `--no-fix` mode the head is no longer fresh, the loop discards the stale worktree and cached state and performs a full rerun from the latest head, with a bounded retry limit of 2 full reruns (#1116).

### Fix

- **checkup**: enforce a SHA-backed verification trust boundary in `pdd checkup --pr --review-loop` so unverified fixer attempts are never rendered as completed fixes. `FixResult` now carries `fixer_result`/`push_status`/`local_fixer_commit_sha`/`pushed_head_sha`, `ReviewLoopState` carries `verified_head_sha`/`remote_pr_head_sha`/`verification_status_by_round`, and the final report renders fixed-field `### Fixes Attempted` bullets plus header `verified-head-sha:` / `remote-pr-head-sha:` lines. Before promoting `fresh-final-review: clean` or `verification=verified`, the loop re-fetches the remote PR head and downgrades to `verification=unverified` on mismatch or budget exhaustion (#1088).
Expand Down
6 changes: 5 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2788,9 +2788,13 @@ In **issue mode**, each step posts its findings as a comment on the GitHub issue

**PR Mode**: Use `--pr` and `--issue` to run checkup against an existing PR and the issue it is intended to resolve. By default this runs the full standard checkup flow on the PR worktree, commits eligible generated fixes, pushes them back to the same PR branch, and skips step 8 because the PR already exists. Add `--no-fix` when you want verification-only behavior. PR mode reloads `architecture.json` and `.pddrc` from the PR worktree before running, so audits see the PR's project state and not the parent checkout's.

PR mode enforces a **PR-head freshness lease** so the run automatically reruns when the remote PR head advances mid-checkup (#1116). Before Step 7 (Verification) and before the final push/report, the orchestrator re-fetches the remote PR head SHA. If the head has advanced and the local fix can be cleanly rebased onto the new head (only during the push phase), the loop keeps the rebased tree and reruns the verifier. If a rebase conflict occurs, or if the head advances before a local commit exists, the loop discards the stale worktree and cached state and performs a full rerun from the latest head (bounded to 2 full reruns).

The final report header always includes `verified-head-sha:` (the PR head SHA the verifier most recently cleared) and `remote-pr-head-sha:` (the current PR head observed at final-report render time) to support SHA-backed verification. Before rendering `fresh-final-review: clean` or `verification=verified`, the orchestrator re-fetches the remote PR head; if it has advanced past the verified SHA, affected findings are marked `unverified` and the clean-review status is downgraded.

**Review-Loop Mode**: Add `--review-loop` to PR mode when you want PDD to use a primary reviewer and separate fixer on the same PR. The loop uses one isolated worktree for the PR branch, treats the active reviewer as the authority, sends every valid finding to the fixer, commits and pushes successful fixes back to the PR head ref, then re-runs the active reviewer to verify the fixes and perform another full PR review. Failed pushes abort before verification, and active reviewer provider failures remain not-clean. Use `--reviewer-fallback` when the primary reviewer is known to be fragile in a given environment; the fallback reviewer is opt-in and distinct from the fixer.

The review loop enforces a SHA-backed verification trust boundary so an unverified fixer attempt is never rendered as a completed fix (#1088). Each `### Fixes Attempted` bullet in the final report is rendered in the fixed-field form `fixer_result=… push_status=… local_sha=… pushed_sha=… verification=verified|unverified [summary=…]`; bare `success`/`fixed` tokens from fixer output never appear as the leading status. The report header always includes `verified-head-sha:` (the PR head SHA the verifier most recently cleared) and `remote-pr-head-sha:` (the current PR head observed at final-report render time). Before rendering `fresh-final-review: clean` or `verification=verified`, the loop re-fetches the remote PR head; if it has advanced past the verified SHA, affected bullets are forced to `verification=unverified` and `fresh-final-review` is downgraded so the report cannot claim a fresh clean review on a head the verifier never saw. If budget/cost/time exhausts after a fixer push but before the verifier clears, affected findings remain open and the report renders `verification=unverified` rather than a misleading `success`.
Each `### Fixes Attempted` bullet in the final report is rendered in the fixed-field form `fixer_result=… push_status=… local_sha=… pushed_sha=… verification=verified|unverified [summary=…]`; bare `success`/`fixed` tokens from fixer output never appear as the leading status. If budget/cost/time exhausts after a fixer push but before the verifier clears, affected findings remain open and the report renders `verification=unverified` rather than a misleading `success`.

Example:
```bash
Expand Down
6 changes: 3 additions & 3 deletions architecture.json
Original file line number Diff line number Diff line change
Expand Up @@ -7539,7 +7539,7 @@
},
{
"reason": "Implements the PR-mode primary-reviewer/fixer review loop for pdd checkup.",
"description": "Runs one primary reviewer such as Codex against an existing PR, sends the reviewer's actionable findings to a separate fixer such as Claude, then has the active reviewer evaluate the fixer's changes and rationales in the same loop-owned worktree until clean or max rounds. An optional reviewer_fallback can take over once when the primary reviewer cannot complete, while preserving the original primary status as non-active diagnostic state; on a successful takeover the superseded primary's Per-Reviewer Status row is annotated `(optional, superseded by <fallback>)` so downstream verdict adapters resolve the report to ship_degraded rather than unknown. An optional fixer_fallback is the analog for the fix step: when the primary fixer reports success=False (typical cause: Claude Code subscription-tier credential exhaustion classified as credential-limit) the loop captures the pre-fix HEAD SHA, resets the worktree (`git reset --hard <pre_fix_sha>` + `git clean -fd`) so the failed primary's partial edits do not leak, alias-normalizes the fallback role, refuses it if it equals the primary fixer, the active reviewer, or the originally configured reviewer (preserved as ReviewLoopState.original_reviewer), then invokes the fallback once. On success the fallback role becomes ReviewLoopState.active_fixer and drives every subsequent round's fix step (the one-shot fallback is consumed); both the failed primary FixResult and the fallback FixResult are appended to state.fixes for an honest audit trail. When the fallback also fails its FixResult is still appended before the loop breaks; failed-fixer summaries are routed through _defang_adapter_trip_wires before rendering so [CRITICAL], issue_aligned: false, and max-*-reached: true tokens from raw subprocess output cannot downgrade an otherwise-verified fallback verdict. Review-only mode runs just the primary reviewer first pass without fixer, commit, or push. The parser filters findings that only describe external GitHub/CI readiness state with no repository-file evidence, so status-only auto-heal, Cloud Build, mergeability, and pending/action-required required-check findings do not keep the fixer loop running. The fixer may return fixed, not_valid, partially_fixed, or blocked, but only the active reviewer can close a finding. Issue #1088 trust boundary: findings only flip to status=fixed after the verifier clears the exact SHA the fixer just pushed. ReviewLoopState carries verified_head_sha (the pushed SHA the verifier most recently cleared), remote_pr_head_sha (observed at render time via a single _fetch_pr_metadata re-fetch), and verification_status_by_round (per-round outcome in {verified, unverified, stale, skipped}). When the fixer left the worktree unchanged (`No changes to push.`) or `git rev-parse HEAD` returns empty after a reported push, the loop skips the verifier, marks the round skipped, and stops; verification never runs against a SHA that did not land. When _finalize re-fetches the PR head and observes either a SHA mismatch or a missing head_sha, fresh-final downgrades to missing, any verified rounds are recorded as stale, and fixed findings revert to open in final-state.json so downstream consumers cannot treat the stale verifier verdict as ship-ready. FixResult and the per-round round-N-fix-...findings.json artifact persist fixer_result, push_status, local_fixer_commit_sha, and pushed_head_sha (rendered explicitly, even when null) so the on-disk audit trail matches the prompt contract. The loop persists prompt+output+normalized-findings+dedup-state artifacts per round (and a final-state.json), pushes fixes back to the PR head ref via the shared token-refresh helper from agentic_e2e_fix_orchestrator, and emits a Step-7-compatible final report whose findings table contains only remaining/unfixed findings.",
"description": "Runs one primary reviewer such as Codex against an existing PR, sends the reviewer's actionable findings to a separate fixer such as Claude, then has the active reviewer evaluate the fixer's changes and rationales in the same loop-owned worktree until clean or max rounds. An optional reviewer_fallback can take over once when the primary reviewer cannot complete, while preserving the original primary status as non-active diagnostic state; on a successful takeover the superseded primary's Per-Reviewer Status row is annotated `(optional, superseded by <fallback>)` so downstream verdict adapters resolve the report to ship_degraded rather than unknown. An optional fixer_fallback is the analog for the fix step: when the primary fixer reports success=False (typical cause: Claude Code subscription-tier credential exhaustion classified as credential-limit) the loop captures the pre-fix HEAD SHA, resets the worktree (`git reset --hard <pre_fix_sha>` + `git clean -fd`) so the failed primary's partial edits do not leak, alias-normalizes the fallback role, refuses it if it equals the primary fixer, the active reviewer, or the originally configured reviewer (preserved as ReviewLoopState.original_reviewer), then invokes the fallback once. On success the fallback role becomes ReviewLoopState.active_fixer and drives every subsequent round's fix step (the one-shot fallback is consumed); both the failed primary FixResult and the fallback FixResult are appended to state.fixes for an honest audit trail. When the fallback also fails its FixResult is still appended before the loop breaks; failed-fixer summaries are routed through _defang_adapter_trip_wires before rendering so [CRITICAL], issue_aligned: false, and max-*-reached: true tokens from raw subprocess output cannot downgrade an otherwise-verified fallback verdict. Review-only mode runs just the primary reviewer first pass without fixer, commit, or push. The parser filters findings that only describe external GitHub/CI readiness state with no repository-file evidence, so status-only auto-heal, Cloud Build, mergeability, and pending/action-required required-check findings do not keep the fixer loop running. The fixer may return fixed, not_valid, partially_fixed, or blocked, but only the active reviewer can close a finding. Issue #1088 trust boundary: findings only flip to status=fixed after the verifier clears the exact SHA the fixer just pushed. ReviewLoopState carries verified_head_sha (the pushed SHA the verifier most recently cleared), remote_pr_head_sha (observed at render time via a single _fetch_pr_metadata re-fetch), and verification_status_by_round (per-round outcome in {verified, unverified, stale, skipped}), and pr_head_refreshes_count (autonomous rerun counter). When the fixer left the worktree unchanged (`No changes to push.`) or `git rev-parse HEAD` returns empty after a reported push, the loop skips the verifier, marks the round skipped, and stops; verification never runs against a SHA that did not land. When _finalize re-fetches the PR head and observes either a SHA mismatch or a missing head_sha, fresh-final downgrades to missing, any verified rounds are recorded as stale, and fixed findings revert to open in final-state.json so downstream consumers cannot treat the stale verifier verdict as ship-ready. FixResult and the per-round round-N-fix-...findings.json artifact persist fixer_result, push_status, local_fixer_commit_sha, and pushed_head_sha (rendered explicitly, even when null) so the on-disk audit trail matches the prompt contract. The loop persists prompt+output+normalized-findings+dedup-state artifacts per round (and a final-state.json), pushes fixes back to the PR head ref via the shared token-refresh helper from agentic_e2e_fix_orchestrator, and emits a Step-7-compatible final report whose findings table contains only remaining/unfixed findings.",
"dependencies": [
"agentic_common_python.prompt",
"agentic_change_python.prompt",
Expand Down Expand Up @@ -7643,7 +7643,7 @@
},
{
"reason": "Multi-step orchestrator for pdd checkup — 8 steps with iterative fix-verify loop, worktree isolation, and fix-capable PR mode that pushes back to the same PR.",
"description": "Orchestrates the 8-step agentic checkup workflow: discover, deps, build, interfaces, test, fix (6.1/6.2/6.3), verify, create PR. Steps 3-7 run in an iterative while loop (max 3 iterations) with exit on 'All Issues Fixed'; exhausting the loop without that exact signal returns failure and does not push fixes or create/skip through a successful PR gate. Supports resume via workflow state persistence, git worktree isolation, --no-fix mode, and PR verification mode. In PR mode it checks out the PR head in a dedicated worktree, runs the full fix-capable checkup on that PR code, commits eligible generated fixes, pushes them back to the same PR branch, re-runs Step 7 after a rebase-on-updated-head push, posts final PR/issue reports only after the pushed PR head is verified, and skips Step 8 because the PR already exists.",
"description": "Orchestrates the 8-step agentic checkup workflow: discover, deps, build, interfaces, test, fix (6.1/6.2/6.3), verify, create PR. Steps 3-7 run in an iterative while loop (max 3 iterations) with exit on 'All Issues Fixed'; exhausting the loop without that exact signal returns failure and does not push fixes or create/skip through a successful PR gate. Supports resume via workflow state persistence, git worktree isolation, --no-fix mode, and PR verification mode. In PR mode it checks out the PR head in a dedicated worktree, runs the full fix-capable checkup on that PR code, commits eligible generated fixes, pushes them back to the same PR branch, re-runs Step 7 after a rebase-on-updated-head push, posts final PR/issue reports only after the pushed PR head is verified, and skips Step 8 because the PR already exists. Issue #1116: implements PR-head freshness lease with autonomous reruns.",
"dependencies": [
"agentic_common_python.prompt",
"checkup_review_loop_python.prompt"
Expand Down Expand Up @@ -7779,7 +7779,7 @@
},
{
"name": "_build_state",
"signature": "(issue_number: int, issue_url: str, last_completed_step: Union[int, float], step_outputs: Dict[str, str], total_cost: float, model_used: str, github_comment_id: Optional[int], changed_files: Optional[List[str]] = None, worktree_path: Optional[Path] = None, fix_verify_iteration: int = 0, previous_fixes: str = \"\", mode: str = \"issue\", pr_number: Optional[int] = None, pr_owner: Optional[str] = None, pr_repo: Optional[str] = None, pr_head_sha: Optional[str] = None) -> Dict",
"signature": "(issue_number: int, issue_url: str, last_completed_step: Union[int, float], step_outputs: Dict[str, str], total_cost: float, model_used: str, github_comment_id: Optional[int], changed_files: Optional[List[str]] = None, worktree_path: Optional[Path] = None, fix_verify_iteration: int = 0, previous_fixes: str = \"\", mode: str = \"issue\", pr_number: Optional[int] = None, pr_owner: Optional[str] = None, pr_repo: Optional[str] = None, pr_head_sha: Optional[str] = None, pr_head_refreshes_count: int = 0) -> Dict",
"returns": "Dict",
"sideEffects": [
"None"
Expand Down
6 changes: 6 additions & 0 deletions context/commit_and_push_if_changed_rule.prompt
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
10. `_commit_and_push_if_changed` MUST:
- Stage tracked edits with `git add -u`, then add untracked files explicitly via `git add -- <paths>` EXCEPT those matching `_is_untracked_pdd_meta_artifact` (the helper that filters stray `.pdd/meta/*.json` fingerprints from out-of-scope modules). Never use `git add -A` — it would leak unrelated metadata artifacts into the fix commit. Then commit with the explicit bot identity `PDD Bot <pdd-bot@users.noreply.github.com>` only when staged changes remain.
- Resolve `(clone_url, head_ref)` from `gh api repos/{pr_owner}/{pr_repo}/pulls/{pr_number}` via `_fetch_pr_metadata`. Missing values abort with a clear message.
- Delegate the push itself to the shared retry helper, passing the head repo's `clone_url`, `HEAD:{head_ref}` as the refspec, and `force_with_lease_on_non_fast_forward=False`.
- `_commit_and_push_if_changed` returns `(pushed: bool, message: str, pushed_head_sha: Optional[str], rebase_conflicted: bool)`.
- If a push fails because the PR branch advanced remotely (`fetch first`, `non-fast-forward`, remote contains work, or branch-behind wording), MUST NOT force-push over the remote commit. Instead fetch the exact PR branch ref as `refs/heads/{head_ref}` into `FETCH_HEAD` (retrying fetch with the same token sources as push auth if needed), then rebase only the newly-created fixer commit onto the fetched head using plain `git rebase --onto FETCH_HEAD HEAD~1 HEAD`, invoked with the same explicit `-c user.name=PDD Bot -c user.email=pdd-bot@users.noreply.github.com` overrides the commit step uses because the worktree's git config cannot be trusted to carry a usable committer identity (under pytest's HOME isolation and on cloud-batch runs Linux git refuses the auto-detected `root@<host>.(none)` identity and aborts the rebase). This prevents a force-pushed PR head from resurrecting old commits that the remote intentionally dropped. Retry the same non-force push after rebase, allowing a small bounded number of fetch/rebase/retry cycles in case another checkup attempt or bot advances the branch again between the fetch and retry. If rebase fails due to conflict, set `rebase_conflicted = True` and abort before verifier because the PR still does not contain the local fix. Cap the git push/rebase retry budget at up to three push attempts (two intermediate rebases) (this is distinct from the full workflow refresh budget) and surface the final error redacted via `_redact_secret` against `_github_token_from_env()` so tokenized URLs that git may echo back from the push helper never leak into the user-visible message. If `git rebase --abort` itself fails during cleanup, append its error to the rebase-failure message instead of swallowing it. The retry loop MUST capture the fixer commit's SHA via `git rev-parse HEAD` immediately after the bot commit succeeds, and BEFORE each rebase retry MUST `git reset --hard <fixer_sha>` so the worktree's HEAD always points at the original fixer commit before the rebase runs. This is required because git can fast-forward or drop the fixer commit as empty when the fetched PR head already contains the patch, leaving HEAD on a remote commit; without the reset, a subsequent retry's `HEAD~1..HEAD` range would describe that remote commit and could resurrect work a maintainer force-pushed away.
Loading
Loading