feat(gate): add pure decision kernel#47
Conversation
|
Blocker Major
Minor
Nit |
|
Blocker Major Minor Nit The prior findings are resolved: partial marker reconciliation now rejects mismatched local run IDs, and non-positive attempts are validated with test coverage. I did not find new architectural drift or product-scope issues in the fix. |
|
Findings
|
monit-reviewer
left a comment
There was a problem hiding this comment.
Automated PR Review
Reviewed commit: d4abb0b
Summary
| Reviewer | Findings |
|---|---|
| harness-engineering:harness-architecture-reviewer | 3 |
| harness-engineering:harness-enforcement-reviewer | 4 |
| harness-engineering:harness-self-documenting-code-reviewer | 2 |
harness-engineering:harness-architecture-reviewer (3 findings)
💡 Suggestion - internal/gate/gate.go:265
Resume short-circuits before decideStaleBase is called, so stale-base candidates with LockStateHeld and HeartbeatStale=true never generate their warnings when a resumable exact run is present. A potentially stuck stale-base run goes silently unnoticed whenever a resume is available. The precedence rule is undocumented and untested.
💡 Suggestion - internal/gate/gate.go:318
decideStaleBase calls validateRunShape before the resumable(run) filter, so a non-resumable stale candidate with an invalid PostMode, non-positive Attempt, or invalid FailureClass returns DecisionError instead of being silently skipped. Callers supplying denormalized terminal/aborted rows from a broader query will hit unexpected errors. Either move validation after the resumable guard or document this as a strict-inputs contract and add a test covering an invalid non-resumable candidate.
💡 Suggestion - internal/gate/gate.go:426
validatePR does not check RunID for PRStateStaleBase (the case falls through to 'return Decision{}' without a RunID check), but decidePR echoes pr.RunID in the returned Decision for that state. Every other non-fresh PR state explicitly validates RunID. An empty RunID on a stale-base marker silently produces a Decision with RunID="", losing the audit reference the stale-base message explicitly documents.
harness-engineering:harness-enforcement-reviewer (4 findings)
internal/gate/gate.go:242
DryRun is checked before the Rerun+RetryPosts mutual-exclusion guard, so passing DryRun=true combined with --rerun or --retry-posts silently returns DecisionFresh with no error, hiding an almost-certainly unintentional flag combination. The mutual-exclusion check covers only Rerun+RetryPosts. Add guards for DryRun+Rerun and DryRun+RetryPosts, or document that DryRun is always highest-priority and add tests for each silenced combination.
💡 Suggestion - internal/gate/gate.go:297
decideStaleBase only checks run.RunID == "" for resumable candidates; non-resumable stale-base entries with an empty RunID pass validateRunShape and are silently skipped without a validation error. This accepts garbage inputs inconsistently with the contract enforced by validateRun for ExactRuns. Lift the RunID check to apply before the resumable guard.
💡 Suggestion - internal/gate/gate.go:455
retryEligible selects by PostModeLive && requiredPending+requiredFailedTerminal > 0 without filtering by run state. A RunStateRunning row with non-zero pending counts satisfies the predicate; decideRetryPosts would then return DecisionRetryPosts targeting an actively-running row, risking duplicate post actions if the live run completes normally afterward. Consider excluding RunStateRunning and RunStateIncomplete from retry-posts eligibility, or add a test asserting a running row with pending counts is not selected.
💡 Suggestion - internal/gate/gate_test.go:29
No test combines a resumable exact row, stale-base candidates, and a non-trivial PR marker (e.g. PRStateCompleteReview or PRStatePartial) in the same request. The code returns on the first resumable hit before reaching stale-base or PR branches, so correctness is not in doubt, but there is no regression guard for the three-way precedence invariant. This was an open finding in the PR discussion (comment 3) that the current diff does not fully address.
harness-engineering:harness-self-documenting-code-reviewer (2 findings)
💡 Suggestion - internal/gate/gate.go:351
In the PRStatePartial switch, the explicit cases for RunStateRunning and RunStateIncomplete are unreachable: the preceding resumable(*partialRun) check already returns ErrorPartialResumableInconsistent for both states. The default branch is also unreachable because validateRun rejects unknown states before decidePR is reached. These dead arms misrepresent which states the switch actually handles.
💡 Suggestion - internal/gate/gate.go:387
decidePR for PRStateCompleteReview returns Decision{Kind: DecisionEarlyExit, RunID: pr.RunID} with no Outcome field, silently dropping any valid Outcome present on the marker. PRStateCompleteNoDiff propagates its outcome. Either forward the outcome in the EarlyExit decision or have validatePR reject a non-empty outcome for PRStateCompleteReview to make the contract explicit.
4 info-level observations excluded. Run with --verbose to include.
3 PR discussion threads considered.
Completed in 5m 42s | $1.75 | sonnet | daemon 0.2.121 | Glorfindel
| Field | Value |
|---|---|
| Model | sonnet |
| Reviewers | hybrid-synthesis, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor |
| Engine | claude · sonnet |
| Reviewed by | pr-review-daemon · monit-pr-reviewer |
| Duration | 5m 42s wall · 14m 49s compute (Reviewers: 3m 37s · Synthesis: 2m 02s) |
| Cost | $1.75 |
| Tokens | 287.5k in / 56.3k out |
| Turns | 10 |
Per-workstream usage
| Workstream | Model | In | Out | Cache read | Cache create | Cost |
|---|---|---|---|---|---|---|
| hybrid-synthesis | sonnet | 32.5k | 8.1k | 18.6k | 13.8k (1h) | $0.19 |
| harness-engineering:harness-architecture-reviewer | sonnet | 54.4k | 13.3k | 6.9k | 47.5k (1h) | $0.39 |
| harness-engineering:harness-enforcement-reviewer | sonnet | 58.0k | 10.2k | 6.9k | 51.1k (1h) | $0.36 |
| harness-engineering:harness-knowledge-reviewer | sonnet | 60.5k | 12.4k | 6.9k | 53.6k (1h) | $0.40 |
| harness-engineering:harness-self-documenting-code-reviewer | sonnet | 56.8k | 8.7k | 6.9k | 49.9k (1h) | $0.33 |
| security:security-code-auditor | haiku | 25.4k | 3.6k | 0 | 25.4k (1h) | $0.07 |
Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.
| AbortStaleRunIDs []string | ||
| Warnings []string | ||
| ErrorReason ErrorReason | ||
| FailureClass FailureClass |
There was a problem hiding this comment.
🟡 Medium (harness-engineering:harness-enforcement-reviewer): DryRun is checked before the Rerun+RetryPosts mutual-exclusion guard, so passing DryRun=true combined with --rerun or --retry-posts silently returns DecisionFresh with no error, hiding an almost-certainly unintentional flag combination. The mutual-exclusion check covers only Rerun+RetryPosts. Add guards for DryRun+Rerun and DryRun+RetryPosts, or document that DryRun is always highest-priority and add tests for each silenced combination.
Reply to this thread when addressed.
There was a problem hiding this comment.
Fixed - reject dry-run with rerun or retry-posts.
| return decision | ||
| } | ||
|
|
||
| if run, ok := latestMatchingRun(req.ExactRuns, resumable); ok { |
There was a problem hiding this comment.
🔵 Low (harness-engineering:harness-architecture-reviewer): Resume short-circuits before decideStaleBase is called, so stale-base candidates with LockStateHeld and HeartbeatStale=true never generate their warnings when a resumable exact run is present. A potentially stuck stale-base run goes silently unnoticed whenever a resume is available. The precedence rule is undocumented and untested.
Reply to this thread when addressed.
There was a problem hiding this comment.
Fixed - added explicit resume precedence coverage.
| return decision | ||
| } | ||
| var supersede []string | ||
| for _, run := range runs { |
There was a problem hiding this comment.
🔵 Low (harness-engineering:harness-enforcement-reviewer): decideStaleBase only checks run.RunID == "" for resumable candidates; non-resumable stale-base entries with an empty RunID pass validateRunShape and are silently skipped without a validation error. This accepts garbage inputs inconsistently with the contract enforced by validateRun for ExactRuns. Lift the RunID check to apply before the resumable guard.
Reply to this thread when addressed.
There was a problem hiding this comment.
Fixed - stale candidates now require run IDs.
| return errorDecision(ErrorRetryPostsIneligible, "no live run has required pending or failed_terminal actions") | ||
| } | ||
| return Decision{Kind: DecisionRetryPosts, RunID: run.RunID} | ||
| } |
There was a problem hiding this comment.
🔵 Low (harness-engineering:harness-architecture-reviewer): decideStaleBase calls validateRunShape before the resumable(run) filter, so a non-resumable stale candidate with an invalid PostMode, non-positive Attempt, or invalid FailureClass returns DecisionError instead of being silently skipped. Callers supplying denormalized terminal/aborted rows from a broader query will hit unexpected errors. Either move validation after the resumable guard or document this as a strict-inputs contract and add a test covering an invalid non-resumable candidate.
Reply to this thread when addressed.
There was a problem hiding this comment.
Fixed - strict stale candidate validation is tested.
| } | ||
| return warnings, Decision{} | ||
| } | ||
|
|
There was a problem hiding this comment.
🔵 Low (harness-engineering:harness-self-documenting-code-reviewer): In the PRStatePartial switch, the explicit cases for RunStateRunning and RunStateIncomplete are unreachable: the preceding resumable(*partialRun) check already returns ErrorPartialResumableInconsistent for both states. The default branch is also unreachable because validateRun rejects unknown states before decidePR is reached. These dead arms misrepresent which states the switch actually handles.
Reply to this thread when addressed.
There was a problem hiding this comment.
Fixed - simplified partial-state dispatch.
| return invalidInput(fmt.Sprintf("partial marker local row %q has unsupported state %q", partialRun.RunID, partialRun.State)) | ||
| } | ||
| default: | ||
| return invalidInput(fmt.Sprintf("invalid PR state %q", pr.State)) |
There was a problem hiding this comment.
🔵 Low (harness-engineering:harness-self-documenting-code-reviewer): decidePR for PRStateCompleteReview returns Decision{Kind: DecisionEarlyExit, RunID: pr.RunID} with no Outcome field, silently dropping any valid Outcome present on the marker. PRStateCompleteNoDiff propagates its outcome. Either forward the outcome in the EarlyExit decision or have validatePR reject a non-empty outcome for PRStateCompleteReview to make the contract explicit.
Reply to this thread when addressed.
There was a problem hiding this comment.
Fixed - reject outcomes on complete-review markers.
| if !run.FailureClass.Valid() { | ||
| return invalidInput(fmt.Sprintf("run %q has invalid failure class %q", run.RunID, run.FailureClass)) | ||
| } | ||
| return Decision{} |
There was a problem hiding this comment.
🔵 Low (harness-engineering:harness-architecture-reviewer): validatePR does not check RunID for PRStateStaleBase (the case falls through to 'return Decision{}' without a RunID check), but decidePR echoes pr.RunID in the returned Decision for that state. Every other non-fresh PR state explicitly validates RunID. An empty RunID on a stale-base marker silently produces a Decision with RunID="", losing the audit reference the stale-base message explicitly documents.
Reply to this thread when addressed.
There was a problem hiding this comment.
Fixed - stale-base PR markers require run IDs.
| } | ||
| if !pr.Outcome.RealVerdict() { | ||
| return invalidInput("partial marker must carry a real verdict outcome") | ||
| } |
There was a problem hiding this comment.
🔵 Low (harness-engineering:harness-enforcement-reviewer): retryEligible selects by PostModeLive && requiredPending+requiredFailedTerminal > 0 without filtering by run state. A RunStateRunning row with non-zero pending counts satisfies the predicate; decideRetryPosts would then return DecisionRetryPosts targeting an actively-running row, risking duplicate post actions if the live run completes normally afterward. Consider excluding RunStateRunning and RunStateIncomplete from retry-posts eligibility, or add a test asserting a running row with pending counts is not selected.
Reply to this thread when addressed.
There was a problem hiding this comment.
Fixed - retry-posts excludes resumable rows.
| want: Decision{Kind: DecisionRepair, RunID: "run-partial", Outcome: PROutcomeApproved}, | ||
| }, | ||
| { | ||
| name: "resumable exact row wins before complete PR marker", |
There was a problem hiding this comment.
🔵 Low (harness-engineering:harness-enforcement-reviewer): No test combines a resumable exact row, stale-base candidates, and a non-trivial PR marker (e.g. PRStateCompleteReview or PRStatePartial) in the same request. The code returns on the first resumable hit before reaching stale-base or PR branches, so correctness is not in doubt, but there is no regression guard for the three-way precedence invariant. This was an open finding in the PR discussion (comment 3) that the current diff does not fully address.
Reply to this thread when addressed.
There was a problem hiding this comment.
Fixed - added three-way precedence coverage.
monit-reviewer
left a comment
There was a problem hiding this comment.
Automated PR Review
Reviewed commit: ffb4ff8 | Previous: d4abb0b (incremental)
Summary
No issues found.
12 PR discussion threads considered.
Completed in 1m 04s | $0.99 | sonnet | daemon 0.2.121 | Glorfindel
| Field | Value |
|---|---|
| Model | sonnet |
| Mode | Re-review · Cycle 2 · Session resumed |
| Reviewers | hybrid-synthesis, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor |
| Engine | claude · sonnet |
| Reviewed by | pr-review-daemon · monit-pr-reviewer |
| Duration | 1m 04s wall · 1m 27s compute (Reviewers: 25s · Synthesis: 19s) |
| Cost | $0.99 |
| Tokens | 328.4k in / 5.4k out |
| Turns | 6 |
Per-workstream usage
| Workstream | Model | In | Out | Cache read | Cache create | Cost |
|---|---|---|---|---|---|---|
| hybrid-synthesis | sonnet | 45.1k | 1.1k | 18.6k | 26.5k (1h) | $0.12 |
| harness-engineering:harness-architecture-reviewer | sonnet | 54.6k | 152 | 4.0k | 50.6k (1h) | $0.19 |
| harness-engineering:harness-enforcement-reviewer | sonnet | 51.6k | 145 | 4.0k | 47.5k (1h) | $0.18 |
| harness-engineering:harness-knowledge-reviewer | sonnet | 53.7k | 136 | 4.0k | 49.7k (1h) | $0.19 |
| harness-engineering:harness-self-documenting-code-reviewer | sonnet | 50.1k | 164 | 4.0k | 46.0k (1h) | $0.18 |
| security:security-code-auditor | haiku | 45.6k | 2.7k | 0 | 45.6k (1h) | $0.07 |
| discussion-summarizer | — | 27.7k | 945 | 18.6k | 9.1k (1h) | $0.06 |
Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.
Summary
Verification
Closes #21