Skip to content

feat: autonomous PR-watcher (Phase C)#29

Merged
michaelzwang13 merged 1 commit into
mainfrom
feat/autonomous-pr-watcher
May 24, 2026
Merged

feat: autonomous PR-watcher (Phase C)#29
michaelzwang13 merged 1 commit into
mainfrom
feat/autonomous-pr-watcher

Conversation

@michaelzwang13
Copy link
Copy Markdown
Owner

@michaelzwang13 michaelzwang13 commented May 24, 2026

Closes #9. Closes #28.

The last remaining piece of the Code Review Engineer epic (#10). A background asyncio loop in the platform's FastAPI lifespan now walks running code-review-engineer agents × their watched_repos, fetches open PRs from GitHub, dedups against reviewed_prs (Phase D), and dispatches a review task. The agent's LLM then decides to invoke github-pr-review from inside its container — the watcher is the trigger, not the bookkeeper.

What changed

New table — watched_repos (backend/migrations/004_code_review_engineer.sql, mirrored in schema.sql)

New model — WatchedRepoModel (backend/app/models/watched_repo.py)

  • create/list_by_agent/list_all/get_by_id/delete. create translates the PG unique-constraint violation into WatchedRepoExists so the router can return 409.

New router — watched_repos (backend/app/routers/watched_repos.py)

  • POST /agents/{id}/watched-repos (201) / 409 on cross-agent conflict
  • GET /agents/{id}/watched-repos
  • DELETE /agents/{id}/watched-repos/{watched_id}
  • User-side X-Api-Key auth, scoped to caller's agents (404 if not owned).

New service — PRWatcher (backend/app/services/pr_watcher.py)

  • 120s cadence, single global loop iterating across all running CRE agents.
  • Per-(agent, repo) error isolation: one failed GitHub call must not stop the rest.
  • 409 from the sidecar = "agent busy, defer to next tick" (no queue, no fork).
  • Startup-only 30-min staleness gate — on the first tick after platform boot, ignore PRs older than 30 min so we don't backlog-review what accumulated while the platform was down. Steady-state ticks have no time gate.

Lifespan wiring (backend/app/main.py)

  • lifespan handler starts the watcher task on boot, cancels it cleanly on shutdown.
  • PR_WATCHER_ENABLED=false opts out — used by the test suite (conftest.py) so TestClient(app) doesn't spin up a real background loop.

Memory injection rides for freeDispatcher.dispatch_task already injects agent_memory into role_context (Phase D). Every watcher-triggered review gets current memory by reusing that path.

Why a single global loop (not per-agent)

One asyncio.Task iterating across all agents serially is plenty for hackathon scale. Per-agent watchers would multiply the loop count with no real benefit, and complicate the lifespan teardown.

Why dedup stays at the agent level

reviewed_prs is still unique(agent_id, owner, repo, pr_number). #28 is prevented at the watched_repos insert, so by the time a PR is being reviewed, only one agent ever sees it. If we ever allow multi-agent overlap (we won't soon), we'd revisit reviewed_prs then — not now.

Tests

17 new — 125 → 142 total, all passing.

  • test_watched_repos.py (9): router success/auth/404/409 paths, model conflict translation.
  • test_pr_watcher.py (8): dispatch-on-fresh, dedup-skip, 409-defer, per-repo isolation, startup staleness, steady-state ignores age, no-agents short-circuit, clean cancel.

Smoke test (against live backend)

Verified end-to-end via /tmp/phaseC_smoke.py:

  1. POST → 201
  2. Same user, different agent, same repo → 409 from the live PG unique constraint (Prevent double-review when multiple agents watch the same repo #28 verified, not just unit-tested)
  3. GET lists per agent
  4. PRWatcher.tick() walks the real DB, hits the missing-GitHub-cred case, per-repo isolation swallows it — tick completes cleanly
  5. DELETE → 204
  6. After delete, the previously-blocked agent can subscribe

Migration

Same 004_code_review_engineer.sql continues to absorb the Code Review Engineer epic (Phases B, D, now C). Idempotent.

Deliberately out of scope

Test plan

  • pytest passes (142/142)
  • Smoke test against live backend
  • Manual end-to-end: hire a Code Review Engineer, subscribe to a repo with a real OAuth credential, open a PR, watch the platform logs for pr_watcher: dispatched review …

🤖 Generated with Claude Code

Background asyncio loop in the platform's FastAPI lifespan that walks
running code-review-engineer agents × their watched_repos, fetches open
PRs from GitHub, dedups against reviewed_prs, and dispatches a review
task — the last remaining piece of the Code Review Engineer epic.

  - watched_repos table (#28: unique(user_id, owner, repo) prevents two
    agents under the same user from watching the same repo and posting
    duplicate reviews)
  - WatchedRepoModel translates the PG unique-constraint violation into
    WatchedRepoExists so the router can return 409
  - watched_repos router: POST/GET/DELETE /agents/{id}/watched-repos
    under user-side X-Api-Key auth
  - PRWatcher service: 120s cadence, per-(agent,repo) error isolation,
    409-from-sidecar treated as "agent busy, defer to next tick",
    startup-only 30-min staleness gate to avoid backlog-flooding on
    platform restart
  - main.py lifespan starts/cancels the watcher; PR_WATCHER_ENABLED=false
    opt-out for tests and ad-hoc debug runs
  - 17 new tests (125 → 142, all passing)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@michaelzwang13 michaelzwang13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

PR #29: autonomous PR-watcher (Phase C)

This PR implements the final piece of the Code Review Engineer epic - a background asyncio loop that polls GitHub for open PRs and dispatches review tasks to running agents.

Review

Strengths:

  • Clean architecture with proper separation of concerns
  • Good error isolation (per-repo failures do not break the loop)
  • Smart deduplication via unique(user_id, owner, repo) constraint prevents double reviews (#28)
  • Proper lifespan management with graceful shutdown
  • 30-minute startup staleness gate prevents backlog flooding on platform restart
  • 409 handling for busy agents is thoughtful
  • Good test coverage (17 new tests, 125 to 142)

Minor notes:

  1. The broad except Exception with string matching for duplicate key errors in watched_repo.py is a bit fragile, but acceptable given Supabase client limitations
  2. Consider adding a TypedDict for list_running_by_role() return type for better type safety

Overall: Well-structured implementation with clean code, proper documentation, and thoughtful edge case handling. LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prevent double-review when multiple agents watch the same repo Phase C — Autonomous PR review

1 participant