Track actionable review follow-ups#816
Conversation
|
fullsend review is working on this — view logs |
Site previewPreview: https://b384940e-site.fullsend-ai.workers.dev Commit: |
ralphbean
left a comment
There was a problem hiding this comment.
Review summary
This is a well-structured feature that directly addresses #789. The dedup strategy (content-addressed SHA-256 markers), the forge abstraction, and the test coverage for core flows are all solid. Three items need attention before merge; three more are noted for follow-up.
Request changes
truncate()slices on byte offsets — will produce invalid UTF-8 on multi-byte characters in issue titles.ListOpenIssuesfetches all open issues in the repo for dedup. At minimum, filter by thetype/chorelabel to reduce API calls and payload size.- The 422 retry in
CreateIssuecatches all 422 errors, not just label validation. Should either inspect the error response or document the intentional broadness.
Notes (deferred)
- Follow-up issue creation failure causes
post-reviewto exit non-zero even though the formal review already succeeded. Consider making this a soft failure. - The dedup loop recomputes SHA-256 markers in O(N×M) — precomputing markers before the loop is an easy optimization.
- Test coverage gaps: no unit tests for
truncate/compactWhitespacehelpers, no golden-value test for marker hash stability, no error-path tests forListOpenIssues.
| type reviewFollowupIssue struct { | ||
| finding ReviewFinding | ||
| issue *forge.Issue | ||
| created bool |
There was a problem hiding this comment.
[important] truncate() uses len(s) and s[:max-3] which operate on byte offsets. If a multi-byte UTF-8 character straddles the cut point, this produces invalid UTF-8 in the GitHub issue title. Should use []rune conversion:
func truncate(s string, max int) string {
runes := []rune(s)
if len(runes) <= max {
return s
}
if max <= 3 {
return string(runes[:max])
}
return strings.TrimSpace(string(runes[:max-3])) + "..."
}| return nil | ||
| } | ||
|
|
||
| func labelNames(labels []struct { |
There was a problem hiding this comment.
[important] This fetches all open issues in the repo (up to 10,000) every time post-review runs with actionable findings. For active repos this burns rate-limit quota and adds latency.
Since all follow-up issues are created with the type/chore label, adding &labels=type/chore to the query string would significantly reduce the result set and API calls. Even better long-term would be using GitHub's search API to search for the marker prefix directly.
| if len(labels) > 0 { | ||
| payload["labels"] = labels | ||
| } | ||
| resp, err := c.post(ctx, fmt.Sprintf("/repos/%s/%s/issues", owner, repo), payload) |
There was a problem hiding this comment.
[important] The 422 retry is intentionally broad — it catches any StatusUnprocessableEntity, not just label validation errors. A 422 for body-too-long or title validation would trigger a retry without labels that will likely fail again with a confusing double error.
Consider either:
- Inspecting the GitHub error response
errorsarray forfield: "labels"before retrying, or - Adding an inline comment explaining why the broad catch is acceptable (labels are best-effort, the only label used is
type/chore).
| return submitFormalReview(cmd.Context(), client, owner, repoName, pr, parsed.Action, parsed.HeadSHA, commentURL, dryRun, printer) | ||
| if err := submitFormalReview(cmd.Context(), client, owner, repoName, pr, parsed.Action, parsed.HeadSHA, commentURL, dryRun, printer); err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
[moderate] (deferred) If postApprovedFollowUpIssues fails, the entire post-review command exits non-zero even though submitFormalReview already succeeded. Since follow-up issue creation is supplementary to the review, consider logging a warning and returning nil instead of propagating the error.
| marker := reviewFollowupIssueMarker(owner, repo, finding) | ||
| if strings.Contains(issue.Body, marker) { | ||
| existingByMarker[marker] = issue | ||
| } |
There was a problem hiding this comment.
[moderate] (deferred) The nested loop recomputes reviewFollowupIssueMarker for each finding on every issue (O(N×M) hash computations). Precomputing markers into a map before the loop is straightforward:
markers := make(map[string]ReviewFinding, len(actionable))
for _, f := range actionable {
markers[reviewFollowupIssueMarker(owner, repo, f)] = f
}
for _, issue := range openIssues {
for marker := range markers {
if strings.Contains(issue.Body, marker) {
existingByMarker[marker] = issue
}
}
}| @@ -558,3 +567,177 @@ func TestSubmitFormalReview_ListErrorSkipsCleanup(t *testing.T) { | |||
| assert.Empty(t, fc.MinimizedComments) | |||
| require.Len(t, fc.CreatedReviews, 1, "review should still be created despite list error") | |||
| } | |||
There was a problem hiding this comment.
[minor] (deferred) Good coverage for the core flows. A few edge-case gaps to consider for follow-up:
- Unit tests for
truncate()boundary cases (empty string, max=0, multi-byte chars) - A golden-value test for
reviewFollowupIssueMarkerto guard against accidental hash changes - Error-path test for
ListOpenIssuesfailures
There was a problem hiding this comment.
Good catches on the test gaps. A few additional edge cases to consider:
- Non-approve actions: No test verifying that
request-changes,comment, orfailureactions don't create follow-up issues (theaction != "approve"gate at postreview.go:310) - CreateIssue error path: Missing test for when
CreateIssuefails (rate limit, permissions, GitHub outage) - High/critical with actionable: No test confirming that high/critical severity findings are skipped even when
actionable: true - Mixed scenario: No test for the common case where some findings are new and others already have issues (tests both "Created" and "Existing" sections of the summary)
|
@ggallen — you filed #789 originally. Does this implementation match what you had in mind? Specifically:
Any concerns or gaps vs. the original intent? |
|
fullsend review is working on this — view logs |
|
Thanks for the careful review. I pushed a follow-up commit addressing the requested changes:
Validation:
I left the soft-failure suggestion unchanged for now because it changes the reliability semantics: if the formal approval succeeds but follow-up issue creation fails, failing the workflow makes the missing follow-up visible instead of silently approving without tracking the actionable finding. Happy to switch that to warning-only if you prefer. |
Review: #816Head SHA: 7ca8bdf SummaryThis PR delivers exactly what #789 requests: automatic GitHub issue creation for actionable low/info findings when the review agent approves a PR, with deterministic dedup markers, a sticky PR summary comment, and best-effort label application with retry logic. The implementation is clean, well-structured, follows existing patterns (sticky comments, fake client, forge interface), and has comprehensive test coverage including creation, filtering, dedup within and across PRs, duplicate-marker warnings, dry-run, and error propagation. The interface change ( FindingsInfo
FooterOutcome: approve Previous runReview: #816Head SHA: b1a7911 SummaryThis PR cleanly implements the feature requested in #789: creating follow-up GitHub issues for actionable low/info review findings when the review agent approves a PR. The code is well-structured, the interface changes are backward-compatible (variadic FindingsInfo
FooterOutcome: approve Previous run (2)Review: #816Head SHA: 76a7b30 SummaryThis PR implements follow-up issue creation for actionable low/info review findings when a PR is approved, directly addressing #789. The implementation is well-structured: FindingsMedium / Low / InfoNo critical, high, or medium findings.
FooterOutcome: approve |
76a7b30 to
b1a7911
Compare
|
fullsend review is working on this — view logs |
|
resolved merge conflicts. |
ralphbean
left a comment
There was a problem hiding this comment.
Review Summary
This PR cleanly implements the feature requested in #789: creating follow-up GitHub issues for actionable low/info review findings when approving PRs. The implementation is well-structured with comprehensive test coverage, deterministic deduplication via SHA-256 markers, and thoughtful error handling (label retry fallback, graceful degradation).
@dhshah13 addressed the previous review feedback effectively:
- ✅ UTF-8-safe truncation
- ✅ Label filtering for issue queries
- ✅ Narrow 422 retry to label validation errors
- ✅ Precomputed markers for O(N+M) scanning
- ✅ Additional test coverage
The strategic direction is sound and aligns with the project's autonomous development goals. A few minor edge cases noted inline for follow-up work, but nothing blocking.
Approving with deferred notes for future improvements.
| for marker := range markers { | ||
| if strings.Contains(issue.Body, marker) { | ||
| existingByMarker[marker] = issue | ||
| } |
There was a problem hiding this comment.
[moderate] (deferred) The collision handling silently overwrites when multiple issues contain the same marker:
existingByMarker[marker] = issue // Overwrites if duplicate foundWhile marker collisions are extremely unlikely (SHA-256 hash), if data corruption occurs or issues are manually edited, this could cause a finding to be marked as "tracked" when it's actually lost. Consider logging a warning when collision is detected, or document this as expected behavior.
Not blocking for this PR, but worth considering for robustness.
There was a problem hiding this comment.
Good call. I went ahead and addressed this.
Duplicate markers no longer silently overwrite the prior entry during duplicate detection. The scan keeps the first matching issue deterministically, emits a warning with both issue numbers, and skips replacing the existing marker mapping.
I also added TestPostApprovedFollowUpIssues_WarnsOnDuplicateMarkers to cover the manual/corrupted duplicate-marker case and verify the PR summary continues to reuse the first tracked issue instead of drifting silently.
Validation:
go test ./internal/cli -run TestPostApprovedFollowUpIssues_WarnsOnDuplicateMarkersgo test ./internal/cli ./internal/forge/...
|
fullsend review is working on this — view logs |
ralphbean
left a comment
There was a problem hiding this comment.
Review Summary
This is a well-designed feature that directly implements the requirements from #789. The code quality is high with comprehensive test coverage and thoughtful deduplication logic.
Strategic Assessment
✅ Good idea — Solves real workflow problem of tracking actionable findings after PR approval
✅ Fits scope — Core to fullsend's autonomous development mission
✅ Implements #789 — All requirements addressed (actionable flag, deduplication, sticky summary)
Code Quality
- ✅ Only creates follow-up issues for approved reviews with actionable low/info findings
- ✅ Content-based hashing prevents duplicates across runs and PRs
- ✅ Graceful label creation fallback
- ✅ Comprehensive test coverage with edge cases
- ✅ Handles UTF-8 correctly in title truncation
- ✅ Duplicate marker warning (commit 7ca8bdf) is a good addition
Findings
I've noted a few areas for potential future improvement in inline comments below (all deferred, non-blocking):
- Race condition in duplicate detection (difficult to solve without GitHub API atomic operations)
- Field validation before hashing (schema should enforce this)
- Schema-code alignment for actionable flag (minor documentation improvement)
I also have one minor suggestion about the reviewActionToEvent function accepting both "request-changes" and "request_changes" variants when the schema only defines the hyphenated version. I'll post that as a separate comment since the function signature wasn't modified in this PR.
Recommendation: Approve. The deferred items are nice-to-haves that don't block merging this excellent feature.
| } | ||
|
|
||
| printer.StepStart("Checking for existing review follow-up issues") | ||
| openIssues, err := client.ListOpenIssues(ctx, owner, repo, "type/chore") |
There was a problem hiding this comment.
[important] (deferred) The duplicate detection has a potential race condition: if two post-review runs execute concurrently for different PRs with identical findings, both could list issues, find no duplicates, and both create duplicate issues.
While the warning at line 347 helps detect this after the fact, there's no prevention. This is difficult to solve without database-level locking or GitHub's atomic operations.
Suggestion: Document this known limitation in a code comment so future maintainers understand the tradeoff. For example:
// Note: Duplicate detection is subject to race conditions if concurrent
// post-review runs create issues for the same finding. The warning below
// helps detect this case, but prevention would require atomic operations
// not available through the GitHub API.Not blocking for this PR, but worth documenting.
| return b.String() | ||
| } | ||
|
|
||
| func reviewFollowupIssueMarker(owner, repo string, finding ReviewFinding) string { |
There was a problem hiding this comment.
[important] (deferred) The marker hash is computed without validating that required fields are non-empty. If finding.File, finding.Category, or other fields are empty strings, the hash still computes but may lead to unexpected collisions or confusing markers.
Consider adding defensive validation in postApprovedFollowUpIssues before calling this function, or add a comment documenting the assumption that findings are pre-validated by the JSON schema.
Not blocking for this PR since the schema should enforce required fields, but defensive programming would make this more robust.
| "description": { "type": "string", "minLength": 1 }, | ||
| "remediation": { "type": "string" } | ||
| "remediation": { "type": "string" }, | ||
| "actionable": { |
There was a problem hiding this comment.
[minor] (deferred) The schema allows actionable on findings of any severity, but the code in postreview.go:actionableApprovedFindings() only processes findings with low or info severity. An agent could mark a medium finding as actionable: true, but it would be silently ignored.
Consider updating the description to clarify that actionable only applies to low/info findings:
"actionable": {
"type": "boolean",
"description": "True when this low/info severity finding should be tracked as a follow-up issue if the review approves. Only applies to 'low' and 'info' severity findings."
}Not blocking, but helps align schema with implementation.
|
[minor] Additional note on The function accepts both Consider either:
Prefer option 1 for clarity unless there's a specific reason to support both. |
|
fullsend review is working on this — view logs |
|
Agreed. I checked the surrounding contract and there is no compatibility reason to keep the underscore variant: the schema, docs, and agent instructions all define I removed Validation:
|
ralphbean
left a comment
There was a problem hiding this comment.
Review Summary
Well-designed feature that directly implements #789. The deduplication strategy (content-addressed SHA-256 markers), forge abstraction, and test coverage are all solid. The author addressed all prior blocking feedback (UTF-8 truncation, label filtering, narrow 422 retry, precomputed markers, duplicate marker warnings).
Strategic assessment
- Good idea — solves a real workflow gap where non-blocking findings vanish after merge
- In scope — core to fullsend's autonomous development mission
- Implements #789 — all requirements addressed
One deferred note on issue creation volume; otherwise LGTM.
Approving.
| printer.StepDone("Duplicate check complete") | ||
|
|
||
| results := make([]reviewFollowupIssue, 0, len(actionable)) | ||
| for _, finding := range actionable { |
There was a problem hiding this comment.
[moderate] (deferred) No cap on follow-up issues per review run. If a review agent emits many actionable low/info findings (e.g., 50+), this loop creates all of them as GitHub issues in a single run with no rate limiting or cap. This could be noisy for maintainers and could hit GitHub API rate limits.
Consider adding a configurable cap (e.g., max 10 per run) with a warning when hit. Not blocking for this PR.
There was a problem hiding this comment.
Good call. I added a cap, using 3 as the stricter limit rather than the example 10 so one review run cannot flood maintainers.
Details:
--max-follow-up-issuesnow defaults to3.- Allowed range is
0..3;0disables follow-up issue creation and values above3are rejected. - Existing duplicate issues do not consume the creation cap.
- When the cap is hit, the sticky follow-up summary includes a warning with the skipped count.
- Added tests for the default, validation, cap behavior, and the existing-issues-do-not-consume-cap case.
Validation:
go test ./internal/cli -run 'Test(NewPostReviewCmd_MaxFollowUpIssuesDefault|ValidateMaxReviewFollowUpIssues|PostApprovedFollowUpIssues|ReviewActionToEvent|ParseReviewResult_Findings)'bash internal/scaffold/fullsend-repo/scripts/validate-output-schema-test.sh
|
fullsend review is working on this — view logs |
…up-issues # Conflicts: # internal/cli/postreview.go # internal/scaffold/fullsend-repo/agents/review.md # internal/scaffold/fullsend-repo/skills/code-review/SKILL.md # internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md
|
fullsend review is working on this — view logs |
ralphbean
left a comment
There was a problem hiding this comment.
Review Summary
Well-designed feature that directly implements #789. The dedup strategy (content-addressed SHA-256 markers), forge abstraction, test coverage, and error handling are all solid. The author addressed all prior blocking feedback across 4 review rounds.
Strategic assessment
- Good idea — solves a real workflow gap where non-blocking findings vanish after merge
- In scope — core to fullsend's autonomous development mission
- Implements #789 — all requirements addressed (actionable flag, dedup, sticky summary, always-on)
Notes (deferred, non-blocking)
One deferred note on markdown injection risk in agent-sourced content flowing into issue bodies. See inline comment.
Approving.
| return fmt.Sprintf("Follow-up from PR #%d: %s", pr, truncate(summary, 88)) | ||
| } | ||
|
|
||
| func reviewFollowupIssueBody(owner, repo string, pr int, finding ReviewFinding, marker string) string { |
There was a problem hiding this comment.
[moderate, deferred] finding.Description and finding.Remediation are rendered as raw markdown in the issue body. A compromised agent could inject misleading links or formatting. Consider sanitizing or escaping markdown-active characters in agent-sourced content before writing to GitHub issues. Low risk today since the source is a controlled agent, but worth hardening given the project's threat model prioritizes external injection.
Summary
Closes #789.
This teaches
fullsend post-reviewto preserve actionable low/info review findings after an approval by creating GitHub follow-up issues and posting a sticky PR summary.What Changed
actionableflag.loworinfofindings.Validation
go test ./internal/cli ./internal/forge/...bash internal/scaffold/fullsend-repo/scripts/validate-output-schema-test.shLive Smoke Test
Verified against real GitHub data in
dhshah13/fullsend:post-reviewrun approved the PR and created follow-up issue: Follow-up from PR #2: Document a cleanup owner for this smoke-test fixture before reusing it. dhshah13/fullsend#3post-reviewrun reused the existing issue and did not create a duplicate.Notes
This is always-on behavior, matching #789. It only creates follow-up issues for approved reviews, and only when findings are both
low/infoand explicitly markedactionable: true.