feat: Closed PR Review Auto-Improver (automated feedback loop)#1755
nick-inkeep merged 20 commits into main
Automated system that analyzes human reviewer feedback after PRs are merged to identify generalizable improvements for the pr-review-* subagent system.
- Workflow triggers on merged PRs, extracts human/bot comments
- Agent applies 4-criteria generalizability test
- Creates draft PRs with improvements to pr-review-*.md files
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Include diffHunk in GraphQL query (shows code each comment is on)
- Add Phase 2 "Deep-Dive on Promising Comments" with explicit guidance:
  - Read the full file to understand broader context
  - Grep for schemas/types/patterns mentioned in comments
  - Understand the anti-pattern before judging generalizability
- Update Tool Policy to emphasize context gathering
- Renumber phases (now 6 phases total)
The agent now actively investigates each comment rather than judging based on comment text alone.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Based on write-agent skill guidance:
1. Add near-miss example (questions/discussions ≠ reviewer feedback)
2. Strengthen Role & Mission - describe what "excellence looks like"
3. Failure modes now use contrastive examples (❌ vs ✅)
4. Phase 2 now checklist format with stop condition
5. Example shows completed checklist, not just steps
Key insight: "Stop here if you can't articulate a clear principle" prevents vague improvements from polluting reviewers.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Phase 2 now uses git rev-list + git show to see code at comment time
- Progressive gathering: diffHunk → full file → PR diff → other files
- GraphQL query now includes createdAt for all comment types
- Added git rev-list and git show to allowedTools
This ensures the agent sees what the human reviewer saw, not the final merged state, which may have fixes applied.
Co-Authored-By: Claude <noreply@anthropic.com>
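The time-based lookup in this commit amounts to picking the newest commit that is no later than the comment's createdAt, which is roughly what `git rev-list -1 --before=<createdAt>` resolves. The TypeScript below is a minimal sketch of that selection over an in-memory list. The `Commit` shape, `commitAtCommentTime`, and the sample SHAs are hypothetical and only illustrate the idea; the workflow itself shells out to git.

```typescript
// Hypothetical shape for illustration; the real workflow calls git directly.
interface Commit {
  sha: string;
  committedAt: string; // ISO 8601 timestamp
}

// Returns the newest commit made at or before the comment was written,
// or undefined when the comment predates every commit in the list.
function commitAtCommentTime(
  commits: Commit[],
  commentCreatedAt: string,
): Commit | undefined {
  const cutoff = Date.parse(commentCreatedAt);
  let best: Commit | undefined;
  for (const c of commits) {
    const t = Date.parse(c.committedAt);
    if (t <= cutoff && (best === undefined || t > Date.parse(best.committedAt))) {
      best = c;
    }
  }
  return best;
}

// Sample data (made up): three commits on consecutive days.
const commitLog: Commit[] = [
  { sha: "aaa111", committedAt: "2024-01-01T10:00:00Z" },
  { sha: "bbb222", committedAt: "2024-01-02T10:00:00Z" },
  { sha: "ccc333", committedAt: "2024-01-03T10:00:00Z" },
];

// A comment written on Jan 2 in the afternoon saw commit bbb222, not the final ccc333.
const seenByReviewer = commitAtCommentTime(commitLog, "2024-01-02T15:30:00Z");
console.log(seenByReviewer?.sha);
```

The point of the design is that the third commit, which may contain the fix, is excluded from what the agent inspects.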
Two exit paths at each level:
- EXIT A: Not generalizable (repo-specific, one-off bug, style preference)
- EXIT B: Pattern found (can articulate anti-pattern + universal principle)
Includes decision flow diagram and two contrasting examples showing early exit (repo-specific DateUtils) vs pattern discovery (type/schema DRY).
Co-Authored-By: Claude <noreply@anthropic.com>
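The two exit paths can be modeled as a small discriminated union. The sketch below is an assumption about how such a decision might be encoded, not the agent's actual logic (which lives in a prompt). `Findings`, `evaluate`, and the example strings are hypothetical.

```typescript
// EXIT A reasons are the three named in the commit message above.
type ExitA = {
  exit: "A";
  reason: "repo-specific" | "one-off bug" | "style preference";
};
type ExitB = { exit: "B"; antiPattern: string; principle: string };
type Outcome = ExitA | ExitB;

// Hypothetical input: what has been articulated so far at the current level.
interface Findings {
  disqualifier?: ExitA["reason"];
  antiPattern?: string;
  principle?: string;
}

// Returns an exit when one applies, or null to keep gathering context.
function evaluate(f: Findings): Outcome | null {
  if (f.disqualifier) {
    return { exit: "A", reason: f.disqualifier };
  }
  if (f.antiPattern && f.principle) {
    return { exit: "B", antiPattern: f.antiPattern, principle: f.principle };
  }
  return null;
}

// Early exit: feedback about a repo-specific helper.
console.log(evaluate({ disqualifier: "repo-specific" }));

// Pattern found: both the anti-pattern and the principle can be stated.
console.log(
  evaluate({
    antiPattern: "Type manually redefined next to an existing schema",
    principle: "Derive types from the schema instead of duplicating them",
  }),
);
```

Returning `null` when neither exit applies mirrors the progressive gathering described earlier: the analysis moves to the next context level instead of forcing a verdict.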
- Role & Mission: Add "what the best human analyst would do" section
- Failure modes: Add "Asserting when uncertain" with contrastive example
- Generalizability: Add confidence calibration guidance
- Add explicit conservative default: "when torn, choose lower confidence"
Per write-agent skill review: personality should describe best human behavior, failure modes should include asserting when uncertain (relevant for classification tasks).
Co-Authored-By: Claude <noreply@anthropic.com>
Pattern extracted from PR #1737 human reviewer feedback (amikofalvy):
- Types should derive from Zod schemas using z.infer<typeof schema>
- Use Pick/Omit/Partial instead of manually redefining type subsets
- Extract shared enum/union schemas instead of inline string literals
Changes:
- pr-review-types.md: New anti-pattern + analysis step 6 with detection patterns
- pr-review-consistency.md: Extended "Reuse" section to cover types
This demonstrates the closed-pr-review-auto-improver output: these are the exact changes the agent proposed when run against PR #1737.
Co-Authored-By: Claude <noreply@anthropic.com>
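As an illustration of the second and third bullets, the sketch below derives type subsets with Pick/Omit/Partial and takes a status union from a single constant. The `Agent` domain and every name in it are hypothetical. The `z.infer` case is left out only because it needs the zod package, which would make the example less self-contained.

```typescript
// Single source of truth for the allowed values (hypothetical domain).
const AGENT_STATUSES = ["draft", "active", "archived"] as const;
type AgentStatus = (typeof AGENT_STATUSES)[number];

interface Agent {
  id: string;
  name: string;
  status: AgentStatus;
  createdAt: string;
}

// Derived subsets: these follow Agent automatically when its fields change,
// unlike hand-written copies of the same fields.
type AgentSummary = Pick<Agent, "id" | "name">;
type AgentInsert = Omit<Agent, "id" | "createdAt">;
type AgentUpdate = Partial<AgentInsert>;

function applyUpdate(agent: Agent, update: AgentUpdate): Agent {
  return { ...agent, ...update };
}

function summarize(agent: Agent): AgentSummary {
  return { id: agent.id, name: agent.name };
}

const original: Agent = {
  id: "a1",
  name: "Support bot",
  status: "draft",
  createdAt: "2024-01-01",
};
const updated = applyUpdate(original, { status: "active" });
console.log(updated.status, summarize(updated));
```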
Extended "Schema-Type Derivation Discipline" to cover full spectrum:
- Zod/validation schemas (z.infer)
- Database schemas (Prisma, Drizzle generated types)
- Internal packages (@inkeep/*, shared types)
- External packages/SDKs (OpenAI, Vercel AI SDK)
- Function signatures (Parameters<>, ReturnType<>)
- Existing domain types (Pick, Omit, Partial)
Added table format for clarity and comprehensive detection patterns.
Co-Authored-By: Claude <noreply@anthropic.com>
Expanded type derivation guidance based on actual patterns found in agents repo:
- Awaited<ReturnType<>> for async function returns
- keyof typeof for constants-derived types
- interface extends and intersection (&) for composition
- Discriminated unions with type guards
- satisfies operator for type-safe constants
- Re-exports for API surface boundaries
- Type duplication detection signals
Patterns sourced from agents-api codebase analysis including:
- env.ts, middleware/*, types/app.ts, domains/run/*
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
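Three of the patterns listed above can be shown together in a few lines. This is a sketch with made-up names (`loadConfig`, `LOG_LEVELS`, `shouldLog`), not code from the agents-api codebase, and it assumes TypeScript 4.9 or later for the `satisfies` operator.

```typescript
// Hypothetical async loader; only its inferred return type matters here.
async function loadConfig() {
  return { region: "us-east-1", retries: 3 };
}

// Awaited<ReturnType<...>> unwraps the Promise, so Config follows the function.
type Config = Awaited<ReturnType<typeof loadConfig>>;

// `satisfies` checks the shape without widening the literal keys.
const LOG_LEVELS = {
  debug: 10,
  info: 20,
  error: 40,
} satisfies Record<string, number>;

// keyof typeof derives "debug" | "info" | "error" from the constant.
type LogLevel = keyof typeof LOG_LEVELS;

// True when a message at `message` level passes the `current` threshold.
function shouldLog(current: LogLevel, message: LogLevel): boolean {
  return LOG_LEVELS[message] >= LOG_LEVELS[current];
}

function describeConfig(config: Config): string {
  return `${config.region} (retries: ${config.retries})`;
}

loadConfig().then((config) => {
  console.log(describeConfig(config), shouldLog("info", "error"));
});
```

Adding a new level to `LOG_LEVELS` extends `LogLevel` without touching any type declaration, which is the duplication the guidance is meant to prevent.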
Added guidance for Zod schema extension/derivation patterns based on
codebase research (packages/agents-core/src/validation/schemas.ts):
- .extend() for adding/overriding fields
- .pick()/.omit() for field subsetting
- .partial() for Insert → Update schema derivation
- .extend().refine() for cross-field validation
- Anti-patterns: parallel schemas, duplicated fields
Examples from codebase:
- SubAgentInsertSchema.extend({ id: ResourceIdSchema })
- SubAgentUpdateSchema = SubAgentInsertSchema.partial()
- StopWhenSchema.pick({ transferCountIs: true })
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Clear separation of concerns:
- pr-review-types: Illegal states, invariants, unsafe narrowing
- pr-review-consistency: DRY, schema reuse, convention conformance
Moved to consistency:
- Zod schema composition patterns (.extend, .pick, .partial)
- Type derivation detection signals
- satisfies operator, re-exports conventions
Kept in types (type safety focus):
- Discriminated unions vs optional fields (prevents illegal states)
- Type guards vs unsafe `as` assertions
- Detection of union types without discriminants
Added cross-reference note in types agent pointing to consistency for derivation/DRY concerns.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
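The type-safety items kept in pr-review-types can be illustrated with a short contrast. The sketch below uses hypothetical `LooseResult` and `Result` types. It assumes TypeScript 4.9 or later, where the `in` operator narrows an `object` to one with that property.

```typescript
// Optional fields allow illegal states: this compiles even though it is nonsense.
interface LooseResult {
  status: "ok" | "error";
  data?: string;
  error?: string;
}
const illegal: LooseResult = { status: "ok", error: "boom" };

// A discriminated union makes each state carry exactly its own fields.
type Result =
  | { status: "ok"; data: string }
  | { status: "error"; error: string };

// Type guard: narrows unknown input by checking it at runtime,
// rather than asserting the shape with `as Result`.
function isResult(value: unknown): value is Result {
  if (typeof value !== "object" || value === null) return false;
  if (!("status" in value)) return false;
  if (value.status === "ok") {
    return "data" in value && typeof value.data === "string";
  }
  if (value.status === "error") {
    return "error" in value && typeof value.error === "string";
  }
  return false;
}

// The discriminant lets the compiler pick the right fields in each branch.
function render(result: Result): string {
  switch (result.status) {
    case "ok":
      return result.data;
    case "error":
      return `failed: ${result.error}`;
  }
}

console.log(isResult(illegal)); // the loose value is rejected by the guard
console.log(render({ status: "error", error: "timeout" }));
```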
…and Phase 5.5
- Add skills: pr-review-subagents-available, pr-review-subagents-guidelines, find-similar-patterns
- Add proper exit states at Phase 1, 2, and 4 (embedded in workflow, not separate section)
- Add Phase 5 step 2: "Find examples of the pattern" with judgment guidance
- Add Phase 5.5: Full file review & integration planning (scope fit, duplication check)
- Update output contract with detailed JSON structure and exit examples
- Add reviewer tagging to close the feedback loop
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The pr-review-consistency.md and pr-review-types.md improvements belong in PR #1759, not this auto-improver feature branch.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move agent and skills to inkeep/internal-cc-plugins for CI/CD-only loading:
- Removed: .claude/agents/closed-pr-review-auto-improver.md
- Removed: .agents/skills/{find-similar-patterns,pr-review-subagents-available,pr-review-subagents-guidelines}/
Updated workflow:
- Added step to clone inkeep/internal-cc-plugins
- Added --plugin-dir flag to load agent from plugin
Prerequisites before merging:
1. Create private repo: inkeep/internal-cc-plugins
2. Push plugin content to new repo
3. Add GH_PAT_PLUGINS secret to inkeep/agents
Co-Authored-By: Claude <noreply@anthropic.com>
GitHub Apps provide better security and maintainability:
- 8-hour token lifetime (vs days/infinite for PATs)
- No user account dependency (survives personnel changes)
- Zero manual rotation (tokens generated fresh each run)
- Scales to N plugins without additional credentials
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add manual trigger with pr_number input
- Add Get PR Metadata step to fetch data via API (works for both triggers)
- Update all PR references to use the new metadata outputs
- Enables testing against historical PRs like #1737
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Claude finished @nick-inkeep's task in 4m 9s
PR Review Summary
5 Key Findings | Risk: Medium 🟠
| Location | Issue | Reason Excluded |
|---|---|---|
| yml:172-186 | Diff truncation at 200KB vs 500KB in other workflow | Likely intentional given different use case; lower is more conservative |
| yml:18-21 | Missing id-token: write permission | May not be needed since this workflow doesn't use OIDC |
Discarded as invalid or not applicable
| Location | Issue | Reason Excluded |
|---|---|---|
| pull_request_target security | Potential for secret exfiltration | Mitigated by merged == true check; only runs on approved code |
| Agent has Edit/Write tools | Broad permissions | Acceptable since it creates draft PRs requiring human approval |
💡 APPROVE WITH SUGGESTIONS
Summary: This is a well-designed feedback loop system with sound architecture. The security model (merged PRs only + draft PRs for outputs) is appropriate. The three major findings are all straightforward fixes: (1) validate merge status for workflow_dispatch, (2) use unique HEREDOC delimiters, and (3) align action versions. None are blocking, but addressing them before merge would improve robustness. Excited to see the pr-review agents learning from human feedback! 🎓
Completed Tasks
- Analyzed PR context and understood the changes
- Explored codebase for relevant patterns and architecture
- Performed security review (`pull_request_target` usage, tokens)
- Performed standards and quality review (shell scripting, error handling)
- Checked consistency with existing workflows
- Aggregated and filtered findings
- Posted final review summary
Fixes from Claude Code review:
1. Add merge validation for workflow_dispatch (Major)
   - Prevents analyzing unmerged PRs via manual trigger
   - Validates PR is merged before proceeding
2. Use unique HEREDOC delimiters (Major)
   - Prevents collision if PR body/comments contain "EOF"
   - Uses unique suffixes like __BODY_DELIM_7f3a9b2c__
3. Pin claude-code-action to SHA (Major)
   - Aligns with claude-code-review.yml for consistency
   - Tracks issue #892 for AJV validation bug
4. Add concurrency control (Minor)
   - Prevents race conditions on concurrent runs
   - Groups by PR number, doesn't cancel in-progress
5. Add shell error handling (Minor)
   - set -eo pipefail in all shell blocks
   - Fail fast on command errors
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
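The delimiter fix in item 2 matters because GitHub Actions reads multiline outputs in the form `name<<DELIMITER`, then the value, then `DELIMITER` on its own line. A fixed `EOF` delimiter ends the value early if the PR body happens to contain that line. The workflow does this in shell. The TypeScript below is only a sketch of the same idea, and `randomSuffix` and `formatMultilineOutput` are hypothetical names.

```typescript
// Builds a hex suffix. Math.random keeps the sketch dependency-free;
// a real workflow would likely prefer a stronger random source.
function randomSuffix(length = 8): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// Formats a value as `name<<DELIM`, the value, then `DELIM`,
// using a delimiter that does not occur inside the value.
function formatMultilineOutput(
  name: string,
  value: string,
  suffix: () => string = randomSuffix,
): string {
  const makeDelimiter = () => `__${name.toUpperCase()}_DELIM_${suffix()}__`;
  let delimiter = makeDelimiter();
  // Regenerate in the unlikely case that the value already contains it.
  while (value.includes(delimiter)) {
    delimiter = makeDelimiter();
  }
  return `${name}<<${delimiter}\n${value}\n${delimiter}\n`;
}

// A PR body containing a bare "EOF" line would break a fixed EOF delimiter.
const prBody = "Steps to reproduce:\nEOF\nmore text";
console.log(formatMultilineOutput("body", prBody));
```

Passing the suffix generator as a parameter is a choice made for this sketch so the formatting can be checked deterministically.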
Claude finished @nick-inkeep's task in 3m 53s
PR Review Summary
0 Key Findings | Risk: Low
✅ Previous Issues Addressed
All 5 issues from the previous review have been properly addressed:
🎯 Quality Observations
Security Model - Well-designed:
Improvements Over Existing Patterns - This workflow actually demonstrates better practices than the existing
🟡 Optional Enhancement
One optional improvement to consider (not blocking):
Debug artifact upload - The

    - name: Upload Debug Artifacts
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: auto-improver-debug-pr-${{ steps.pr-meta.outputs.pr_number }}-${{ github.run_id }}
        path: ${{ steps.analyze.outputs.execution_file }}
        retention-days: 7
        if-no-files-found: ignore

This is purely optional and doesn't affect functionality.
Other Findings (2)
Potentially valid but out of scope
Discarded as not applicable
✅ APPROVE
Summary: All 5 issues from the previous review have been properly addressed. The workflow now has robust error handling, secure patterns, and unique HEREDOC delimiters. In fact, this workflow demonstrates better practices than some existing workflows in the repo. The architecture is sound: plugin isolation via private repo, GitHub App authentication, and draft PR output requiring human approval. Ready to merge! 🎉
Completed Tasks
Uploads execution logs when workflow fails, matching pattern from claude-code-review.yml. 7-day retention.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Claude finished @nick-inkeep's task in 3m 58s
PR Review Summary
0 Key Findings | Risk: Low
✅ All Previous Issues Remain Addressed
Verified all 5 issues from the initial review are still properly implemented in the current commit (454eff2):
🎯 Quality Assessment
Security Model - Well-designed:
Consistency with Existing Patterns - This workflow demonstrates improved practices:
Architecture - Sound design:
🔍 Observations (Not Blocking)
Other Findings (0)
Discarded as not applicable
✅ APPROVE
Summary: This PR is ready to merge. All previously identified issues have been properly addressed and verified. The workflow implements robust security patterns (merged-PR-only trigger, GitHub App auth, tool allowlisting), comprehensive error handling, and proper concurrency control. The architecture (plugin isolation via private repo with draft PR outputs) is well-designed for an automated feedback loop system. This workflow actually demonstrates better practices than some existing workflows in the repo! 🎉
Completed Tasks
Summary
Introduces an automated system that learns from human reviewers to continuously improve our AI code review agents.
The Problem
When human reviewers catch issues that our `pr-review-*` agents miss, that knowledge currently dies with the PR. We manually noticed patterns like "Type Definition Discipline" (PR #1737) but there's no systematic way to:
The Solution
A GitHub Actions workflow that triggers after every PR merge (or manually for testing):
Key innovation: Git time-travel — The agent reconstructs what the human reviewer saw at comment time (not the final merged state), since issues are often fixed before merge.
Architecture
Plugin isolation: The agent and skills live in a private repo (`inkeep/internal-cc-plugins`) and are loaded via `--plugin-dir` at CI runtime. This keeps CI/CD-only capabilities out of the main workspace.
Cross-repo auth: Uses a GitHub App (`inkeep-internal-ci`) with 8-hour tokens instead of PATs for better security and zero manual rotation.
How It Works
- `pull_request_target: [closed]` + `merged == true`, or `workflow_dispatch` for manual testing
- `internal-cc-plugins`
- `git rev-list --before` + `git show` to see code at comment time
- `pr-review-*.md`
Generalizability Test (all must pass)
Conservative by default: Better to miss a good pattern than pollute reviewers with repo-specific noise.
Manual Testing
After merge, you can test against historical PRs:
`1737` for the type/schema feedback)
Files Changed
`.github/workflows/closed-pr-review-auto-improver.yml`
Prerequisites (already set up)
- `inkeep-internal-ci` created with Contents:read permission
- `internal-cc-plugins` repo
- `INTERNAL_CI_APP_ID`, `INTERNAL_CI_APP_PRIVATE_KEY`
- `inkeep/internal-cc-plugins` with agent + skills
Design Decisions
Test Plan
`workflow_dispatch` with historical PR number (e.g., 1737)
🤖 Generated with Claude Code