feat: impl+verify loop improvements (P1-P7), bench, profiles, state machine by php-workx · Pull Request #6 · php-workx/verk

php-workx · 2026-05-17T06:38:08Z

Summary

50 commits implementing all 9 plans from docs/plans/:

P1 Intent echo before dispatch — runIntentGate in internal/engine/intent.go; called from ticket_run.go before implement phase; policy flag off by default (policy.intent.enabled)
P2 Standards injection — DetectLanguagesFromPaths + WorkerRequest.Standards
P3 Anti-rationalization prompt — Hard Edit Guard section in worker prompts
P4 Finding-level resolution evidence — ResolutionEvidence / DiffRange / TestReference in closeout schema v2; MigrateReviewFindingsToV2
P5 Wave-level reviewer — internal/engine/wave_review.go; WaveReviewArtifact; shadow/enforce mode; wired into epic outer loop
P6 Epic plan-time reviewer — internal/engine/epic_review.go; EpicPlanReviewArtifact; runs before first wave; shadow/enforce mode
P7 Compiled-constraint promotion — internal/engine/constraints/ package; candidate accumulation; CLI (verk constraints list/show/disable/history); closeout + verify hooks; 2 seed constraints
Ticket quality gate — 14 quality codes, EvaluateTicketQuality, RunTicketQualityGate, policy config
Agent profiles — DetectProfile (4 profiles), rationalization injection, go:embed profile MDs
Ticket state machine — TicketOutcome / TicketPhase separation, stopKind enum, classifyTicketStop
Per-worker review diffs — DiffAgainstFiles, reviewBaseline / collectReviewDelta
Memory learning loop — EscapedDefect / PromotionEntry JSONL store, verk learn CLI
Benchmark subsystem — matrix, provider registry, runner, report, compare, governance; verk bench CLI
llmcli runtime bridge — routes claude + codex adapters through llmcli

Test plan

go build ./... — clean
go test ./... — 1157 tests pass across 22 packages
just lint — clean

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- New CLI: inspect (with safe repairs), learn (record/list/show/promote lessons), bench (list/run/report/compare), and constraints management.
- Deterministic ticket quality gate, epic plan-time reviewer, and intent “echo” gate that can block runs pre-dispatch.
- Agent profiles, intent workflow, planner-review prompt, and profile shown in status output.
- Benchmarking: providers, matrix/runner, reporting, comparison, registry, and verknative/aider providers.
Documentation
- Added Memory Learning Loop, Ticket Quality Gate, runtime adapter plans, benchmarking guides, and operation/design notes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Wire captureReviewBaseline and collectReviewDelta into RunTicket so the reviewer sees only what the current worker attempt changed, not the entire worktree diff since base commit. Pre-existing dirty files are excluded; files the worker creates or modifies are included. Add reviewBaseline field and currentReviewBaseline() helper to ticketRunState for resume safety. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add TestRunTicket_RepairReviewDiffOnlyIncludesRepairDelta to prove that the second review request only includes files changed by the repair worker, not files from the initial implementation phase. The test passed immediately, confirming that Task 4 already wired per-attempt baseline capture correctly for both Implement and Repair phases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Introduces DefaultReopenTargetForSnapshot that routes retry decisions through the snapshot's Outcome field instead of relying solely on the execution phase. failed_retryable → implement; needs_decision and blocked → no automatic retry; empty outcome falls back to the existing phase-based logic for backward compatibility. Adds Outcome field to BlockedTicket so callers can propagate the outcome to CLI rendering. Updates collectBlockedTickets to load the outcome from the snapshot and call DefaultReopenTargetForSnapshot, gating RetryPhase on having a trusted snapshot rather than ticket-store status alone. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add stopKind enum and classifyTicketStop helper to map WHY a ticket stopped to a precise TicketOutcome. Add currentOutcome field to ticketRunState and update snapshot() to prefer the explicit outcome over the phase-derived fallback. Annotate all TicketPhaseBlocked transition sites. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Partition blocked-run guidance into three distinct sections based on ticket Outcome: "Retryable tickets" (failed_retryable), "Tickets needing decision" (needs_decision with implement/repair/block commands), and "Blocked tickets" (hard blockers with no retry path). Updates retryableBlockedTickets to exclude needs_decision tickets from the interactive prompt, and updates tests to match the new section labels. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Introduce ticketDecision type and decisionPrompter interface so tests can inject stub answers. Wire promptNeedsDecisionTickets into handleBlockedEpicRun so needs_decision tickets get an interactive menu in TTY sessions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add user-facing reference doc explaining ready/in_progress/failed_retryable/needs_decision/blocked outcomes, interactive and non-interactive verk run behavior, and backward-compatibility rules for legacy blocked artifacts. Mark the plan as implemented in the plan file and INDEX.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

RunEpic now evaluates ticket quality before dispatching any workers. If the deterministic lint finds blocking findings (P2 or above), the run returns ErrEpicBlocked and persists the artifact to .verk/runs/<run-id>/ticket-quality.json so the CLI can surface it. Existing tests updated: epicChildTicket gains a stub ValidationCommands so it satisfies the gate; hand-crafted child tickets in epic_run_test.go and e2e helpers_test.go similarly updated (removing ambiguous "done" AC in favour of the already-present ValidationCommands). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Run the deterministic ticket-quality evaluator in RunTicket before the first worker is dispatched. Blocking findings transition the ticket to Blocked and persist .verk/runs/<run-id>/ticket-quality.json without invoking any worker. RunEpic suppresses the per-ticket gate via SkipTicketQualityGate so it does not overwrite the epic-scoped artifact. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds TicketSummary, RawTicketQualityFinding, PlannerReviewRequest, and PlannerReviewResult types plus PlannerSystemPrompt constant and BuildPlannerReviewPrompt function so the engine can build deterministic planner-role review requests without adapter wiring. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add e2e coverage for the quality gate (CLI scenario block, multi-surface integration-gap block, and inspect --fix safe-repair path), publish the user-facing reference doc, and mark the plan as Implemented in the index. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Append-only JSONL storage for escaped defects and promoted rules under .verk/memory/, plus `verk learn escaped|list|show|promote` subcommands as the foundation for the memory learning loop. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add user-facing reference for the memory learning loop feature and mark the plan as implemented in the plans index. Also fix a pre-existing gofumpt alignment issue in epos/types.go that was blocking the commit hook. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds Profile as a first-class Ticket field (stored via Extra["profile"] in epos, mirroring the model pattern), profile constants and ValidateProfile, and a pure DetectProfile function that routes tickets to security > contract > frontend > backend based on tags, criteria, body keywords, and owned-path extensions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds AdvisoryFindingsFromPromotedRules to the engine (severity P3, never blocking) and wires promoted rules as a "Lessons From Prior Escaped Defects" section into BuildPlannerReviewPrompt. Callers opt in; EvaluateTicketQuality is unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add ResolveTicketProfile helper that detects and writes back the agent profile when empty, or validates it when explicit. Wire into the epic dispatch path and single-ticket CLI path before BuildPlanArtifact. Add AgentProfile field to PlanArtifact so the resolved value is captured in the plan artifact for audit purposes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements Tasks 1-3 of the benchmark-adoption plan: foundational types with JSON/YAML round-trip serialization, matrix parser with validation and pairing expansion, and a provider registry. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ity CLI, bench verknative)

Add IntentConfig to policy (Enabled=false default, MaxAttempts=2), implement runIntentGate with the three validation rules (missing criteria, superset paths, empty test plan) and MaxAttempts retry loop, wire the gate into RunTicket at the Intake phase before dispatching the implementation worker, and add SR-1 post-implementation alignment check that logs out-of-scope changed files as an advisory warning in implementation.out_of_scope_files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

After all tickets in a wave complete, run a single fresh-context reviewer over the union diff to catch cross-ticket contradictions, integration drift, incomplete fanout, and orphaned references. Shadow mode persists the artifact without blocking; enforce mode blocks the wave on findings at or above the configured threshold. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Before the first wave dispatches, run a single fresh-context reviewer over the root ticket and all child tickets to catch plan-level gaps (missing tickets, cross-service boundary omissions, ownership conflicts). Shadow mode persists the artifact without blocking; enforce mode blocks the epic with reason epic_plan_review_gap. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds internal/engine/constraints/ package with signature derivation, candidate store (JSONL append), constraint execution at verify time, and promotion helpers. Hooks into ticket closeout (candidate recording) and verification (active constraint execution). Adds `verk constraints` CLI subcommands (list, show, disable, history). Adds ConstraintsConfig to policy with Enabled=false default and seeds two inactive grep constraints in .verk/constraints/index.json. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

qodo-code-review · 2026-05-17T06:38:12Z

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

coderabbitai · 2026-05-17T06:38:20Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

</review_stack_artifact>

coderabbitai

Actionable comments posted: 3

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (18)

internal/adapters/repo/git/repo.go-431-438 (1)

431-438: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid full-file reads before applying synthetic diff limits.

syntheticUntrackedDiff still reads entire files into memory (Line 431) and only truncates later. Very large untracked files can cause memory spikes/OOM even though output is capped.

Suggested fix

 import (
 	"bytes"
 	"context"
 	"errors"
 	"fmt"
+	"io"
 	"os"
 	"os/exec"
 	"path"
 	"path/filepath"
 	"sort"
@@
-			read, readErr := os.ReadFile(fullPath)
+			f, openErr := os.Open(fullPath)
+			if openErr != nil {
+				if errors.Is(openErr, os.ErrNotExist) {
+					continue
+				}
+				return "", fmt.Errorf("open untracked file %q: %w", p, openErr)
+			}
+			read, readErr := io.ReadAll(io.LimitReader(f, syntheticDiffSizeLimit+1))
+			_ = f.Close()
 			if readErr != nil {
 				if errors.Is(readErr, os.ErrNotExist) {
 					continue
 				}
 				return "", fmt.Errorf("read untracked file %q: %w", p, readErr)
 			}
 			content = read
 		}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/adapters/repo/git/repo.go` around lines 431 - 438, The code in
syntheticUntrackedDiff currently does a full-file os.ReadFile(fullPath)
(variables: fullPath, read, readErr, content) which can OOM for huge untracked
files; change it to open the file (os.Open), then read only up to the configured
synthetic diff limit using io.LimitReader (e.g. io.ReadAll(io.LimitReader(f,
limit+1))) so you can detect truncation without loading the entire file,
preserve the existing os.ErrNotExist handling, return the same error on
non-not-exist read errors, and set content to the truncated bytes (and
mark/truncate the output if you read more than limit).

internal/adapters/runtime/llmclibridge/bridge.go-323-327 (1)

323-327: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make reasoning-effort support conditional in diagnostics.

DiagnoseRuntime currently treats codex as unavailable when OptionReasoningEffort is unsupported, even though Run only requires that option when reasoning is explicitly requested. This can produce false-negative readiness results.

Suggested fix

 func requiredRuntimeOptions(runtimeName string) []llmclient.OptionName {
 	opts := []llmclient.OptionName{
 		llmclient.OptionWorkingDirectory,
 		llmclient.OptionEnvironment,
 		llmclient.OptionTimeout,
 		llmclient.OptionRawCapture,
 	}
 	if strings.TrimSpace(runtimeName) == RuntimeCodex {
 		opts = append(opts,
 			llmclient.OptionCodexJSONL,
-			llmclient.OptionReasoningEffort,
 		)
 	}
 	return opts
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/adapters/runtime/llmclibridge/bridge.go` around lines 323 - 327,
DiagnoseRuntime currently fails codex readiness if OptionReasoningEffort is
unsupported even though Run only needs that option when reasoning is requested;
update DiagnoseRuntime so it only requires llmclient.OptionReasoningEffort when
the diagnostics are simulating or validating a reasoning-enabled Run (or when an
explicit reasoning flag/parameter is passed), while still requiring
llmclient.OptionCodexJSONL (and other core options) unconditionally for
RuntimeCodex; locate the readiness checks in DiagnoseRuntime and change the
conditional that treats OptionReasoningEffort as mandatory into a conditional
that checks for a “reasoning requested” condition (or an extra boolean
parameter) before marking the runtime unavailable for missing
OptionReasoningEffort so Run and diagnostics align.

internal/bench/governance.go-40-50 (1)

40-50: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject impossible public sampling metadata (sampled > total_corpus).

Line 40-45 only enforces non-zero values. A public suite with sampling.sampled greater than sampling.total_corpus currently passes validation.

🔧 Suggested fix

 	if s.Sampling.TotalCorpus == 0 {
 		return fmt.Errorf("bench: public suite %q must specify sampling.total_corpus", s.Name)
 	}
 	if s.Sampling.Sampled == 0 {
 		return fmt.Errorf("bench: public suite %q must specify sampling.sampled", s.Name)
 	}
+	if s.Sampling.Sampled > s.Sampling.TotalCorpus {
+		return fmt.Errorf(
+			"bench: public suite %q has sampling.sampled=%d greater than sampling.total_corpus=%d",
+			s.Name, s.Sampling.Sampled, s.Sampling.TotalCorpus,
+		)
+	}
 	if s.TaskCount < minPublicTaskCount {

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/bench/governance.go` around lines 40 - 50, The validation currently
only checks non-zero sampling values but misses the impossible case where
s.Sampling.Sampled > s.Sampling.TotalCorpus; update the validation (in the same
function that checks s.Sampling.TotalCorpus and s.Sampling.Sampled) to add a
check that if s.Sampling.Sampled > s.Sampling.TotalCorpus return a fmt.Errorf
describing the problem (e.g., "bench: public suite %q has sampled > total_corpus
(%d > %d)"). Ensure you reference s.Name, s.Sampling.Sampled and
s.Sampling.TotalCorpus in the error message for clarity.

internal/bench/compare.go-210-251 (1)

210-251: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate markdown write/flush errors instead of returning success.

Line 211 onward ignores fmt.Fprintf/fmt.Fprintln and tabwriter.Flush errors, so RenderComparisonMarkdown can return nil on failed writes (e.g., broken pipe).

🔧 Suggested fix

 func RenderComparisonMarkdown(w io.Writer, cmp Comparison) error {
-	fmt.Fprintf(w, "# Benchmark Comparison\n\n")
-	fmt.Fprintf(w, "**Baseline:** %s (%s)  \n", cmp.Baseline.RunID, cmp.Baseline.SuiteName)
-	fmt.Fprintf(w, "**Candidate:** %s (%s)  \n\n", cmp.Candidate.RunID, cmp.Candidate.SuiteName)
+	writef := func(format string, args ...any) error {
+		if _, err := fmt.Fprintf(w, format, args...); err != nil {
+			return fmt.Errorf("bench: render markdown comparison: %w", err)
+		}
+		return nil
+	}
+	if err := writef("# Benchmark Comparison\n\n"); err != nil {
+		return err
+	}
+	if err := writef("**Baseline:** %s (%s)  \n", cmp.Baseline.RunID, cmp.Baseline.SuiteName); err != nil {
+		return err
+	}
+	if err := writef("**Candidate:** %s (%s)  \n\n", cmp.Candidate.RunID, cmp.Candidate.SuiteName); err != nil {
+		return err
+	}
@@
-	tw.Flush()
-	fmt.Fprintln(w)
+	if err := tw.Flush(); err != nil {
+		return fmt.Errorf("bench: render markdown comparison: %w", err)
+	}
+	if err := writef("\n"); err != nil {
+		return err
+	}
@@
-	tw.Flush()
-	fmt.Fprintln(w)
+	if err := tw.Flush(); err != nil {
+		return fmt.Errorf("bench: render markdown comparison: %w", err)
+	}
+	if err := writef("\n"); err != nil {
+		return err
+	}
 
 	return nil
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/bench/compare.go` around lines 210 - 251, RenderComparisonMarkdown
currently ignores errors from fmt.Fprintf/Fprintln and tabwriter.Flush, which
can hide write failures; update the function (RenderComparisonMarkdown) to check
the error returned by each fmt.Fprintf/Fprintln and by tw.Flush (for each
tabwriter instance) and return the first non-nil error encountered instead of
continuing, ensuring every write/flush call (including the initial headers, the
summary block, and the per-task loop) captures its error and propagates it to
the caller.

internal/bench/matrix.go-75-79 (1)

75-79: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

fixed-reviewer pairing currently degenerates to “first profile is reviewer” in full-verk.

ValidateMatrix makes reviewer fields effectively present for every full-verk profile, so expandFixedReviewer’s first Reviewer.Model != "" match is always the first profile. This can silently mis-pair workers/reviewers and skew benchmark outcomes.

💡 Fix direction

 func ValidateMatrix(m Matrix) error {
+    if m.ComparisonDesign == designFixedReviewer {
+        // Require a single, unambiguous reviewer target across profiles,
+        // or add an explicit `reviewer_profile_id` field in Matrix.
+    }
     ...
 }

 func expandFixedReviewer(m Matrix) [][2]MatrixProfile {
-    reviewerIdx := len(m.Profiles) - 1
-    for i, p := range m.Profiles {
-        if p.Reviewer.Model != "" {
-            reviewerIdx = i
-            break
-        }
-    }
+    // Resolve reviewer deterministically from explicit config
+    // (preferred) or from a validated single reviewer target.
+    reviewerIdx := resolveFixedReviewerIndex(m)
     ...
 }

Also applies to: 122-132

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/bench/matrix.go` around lines 75 - 79, The problem is that
ValidateMatrix populates reviewer fields for full-verk profiles causing
expandFixedReviewer to match the first profile (since Reviewer.Model != ""), so
fixed-reviewer pairing degenerates; fix by changing the pairing logic in
expandFixedReviewer to only consider reviewer candidates that were explicitly
specified (e.g., introduce and check a boolean/marker like ReviewerExplicit or
ReviewerSource on Profile) or by moving the expandFixedReviewer call so it runs
before ValidateMatrix fills in defaults; update expandFixedReviewer to reference
that explicit marker (or change call order around ValidateMatrix and
expandFixedReviewer) to ensure only originally-specified reviewers are matched.

internal/bench/providers/verknative/provider.go-118-123 (1)

118-123: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Score allows path traversal via expect_marker.

filepath.Join(workspaceDir, marker) with unvalidated marker can escape the workspace (../... or absolute paths), allowing unintended file reads.

🛡️ Suggested hardening

 	marker, ok := markerRaw.(string)
 	if !ok || marker == "" {
 		return bench.Score{Solved: false}
 	}
-
-	path := filepath.Join(workspaceDir, filepath.FromSlash(marker))
+	clean := filepath.Clean(filepath.FromSlash(marker))
+	if filepath.IsAbs(clean) || clean == ".." || strings.HasPrefix(clean, ".."+string(filepath.Separator)) {
+		return bench.Score{Solved: false}
+	}
+	path := filepath.Join(workspaceDir, clean)
+	rel, err := filepath.Rel(workspaceDir, path)
+	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
+		return bench.Score{Solved: false}
+	}
-	data, err := os.ReadFile(path)
+	data, err := os.ReadFile(path)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/bench/providers/verknative/provider.go` around lines 118 - 123, The
code constructs path := filepath.Join(workspaceDir, filepath.FromSlash(marker))
with an unvalidated marker (expect_marker) which allows path traversal; change
to validate and sanitize marker before reading: reject absolute markers
(filepath.IsAbs), clean the marker (filepath.Clean) and ensure it does not
contain upward segments ("..") or path separators that escape the workspace,
then compute the final path and verify it is inside workspaceDir (use
filepath.Join + filepath.Clean and confirm filepath.Rel(workspaceDir, finalPath)
does not start with ".."); only call os.ReadFile on the validated finalPath
(references: variables marker/expect_marker, workspaceDir, path, and functions
filepath.FromSlash, filepath.Clean, filepath.IsAbs, filepath.Rel, os.ReadFile).

internal/bench/report.go-141-169 (1)

141-169: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Propagate write/flush errors in RenderMarkdown.

RenderMarkdown returns error, but writes at Line 141+ and all tw.Flush() calls are unchecked. A short write or flush failure currently returns nil and emits partial output.

Proposed direction

+	writef := func(format string, args ...any) error {
+		_, err := fmt.Fprintf(w, format, args...)
+		return err
+	}
+
-	fmt.Fprintf(w, "# Benchmark Report\n\n")
+	if err := writef("# Benchmark Report\n\n"); err != nil {
+		return fmt.Errorf("bench: render markdown header: %w", err)
+	}
@@
-	tw.Flush()
+	if err := tw.Flush(); err != nil {
+		return fmt.Errorf("bench: flush profiles table: %w", err)
+	}

Apply the same pattern across all fmt.Fprintf/Fprintln and tw.Flush sites.

Also applies to: 173-185, 189-194, 198-209, 220-232, 262-293

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/bench/report.go` around lines 141 - 169, RenderMarkdown currently
ignores write and flush errors (many fmt.Fprintf/Fprintln calls and tw.Flush
use), so partial output is silently returned nil; update RenderMarkdown to check
and propagate errors from every fmt.Fprintf/Fprintln and from tabwriter.Flush
calls. For each write call in RenderMarkdown (e.g., the header writes, label
loop over r.Labels, profile table loop over r.Profiles) capture the returned
(int, error) and if err != nil return it, and similarly check the error returned
by each tw.Flush() and return it when non-nil so the function surface returns
the first write/flush error encountered.

internal/bench/report.go-103-111 (1)

103-111: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cost totals drop Score.CostUSD when Usage is empty.

Both aggregateTotals and RenderCSV only read TaskResult.Usage. If executors provide cost only via TaskResult.Score (as DefaultExecutor does), report/csv cost becomes zero.

Proposed fix

 		for _, u := range r.Usage {
 			switch u.Confidence {
 			case "exact":
 				t.ExactCostUSD += u.CostUSD
 			default:
 				t.EstimatedCostUSD += u.CostUSD
 			}
 		}
+		if len(r.Usage) == 0 && r.Score.CostUSD > 0 {
+			switch defaultUsageConfidence(UsageRecord{
+				CostUSD:    r.Score.CostUSD,
+				Confidence: r.Score.Confidence,
+			}) {
+			case "exact":
+				t.ExactCostUSD += r.Score.CostUSD
+			default:
+				t.EstimatedCostUSD += r.Score.CostUSD
+			}
+		}

And in RenderCSV, fallback to tr.Score.CostUSD/tr.Score.Confidence when len(tr.Usage)==0.

Also applies to: 325-333

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/bench/report.go` around lines 103 - 111, aggregateTotals and
RenderCSV currently only sum costs from TaskResult.Usage, which drops cost when
Usage is empty; update both functions (aggregateTotals and RenderCSV) to fall
back to TaskResult.Score.CostUSD and TaskResult.Score.Confidence when
len(tr.Usage)==0 (i.e., treat a single pseudo-usage entry sourced from tr.Score
when Usage is empty) so totals and CSV output include cost provided via
TaskResult.Score (used by DefaultExecutor); locate the loops over r.Usage and
tr.Usage and add the conditional fallback to use tr.Score.CostUSD and
tr.Score.Confidence when Usage is empty.

internal/bench/registry.go-22-27 (1)

22-27: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard against nil/empty providers in Register.

At Line 23, p.Name() can panic if p is nil, and empty names are currently accepted. This should return an error instead of crashing or registering an invalid key.

Proposed fix

 import (
 	"fmt"
 	"sort"
+	"strings"
 )
@@
 func (r *Registry) Register(p Provider) error {
+	if p == nil {
+		return fmt.Errorf("bench: provider is nil")
+	}
-	name := p.Name()
+	name := strings.TrimSpace(p.Name())
+	if name == "" {
+		return fmt.Errorf("bench: provider name is empty")
+	}
 	if _, exists := r.providers[name]; exists {
 		return fmt.Errorf("bench: provider %q is already registered", name)
 	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/bench/registry.go` around lines 22 - 27, Registry.Register currently
calls p.Name() without validating p and allows empty names, which can panic or
register invalid keys; update Register to first check that the Provider
parameter (p) is not nil and return a descriptive error if it is, then obtain
name := p.Name() and validate that name != "" (return an error for empty names),
and only then check r.providers[name] and assign r.providers[name] = p;
reference the Registry.Register method, the Provider parameter p, the p.Name()
call, and the r.providers map when making the changes.

internal/bench/runner.go-144-147 (1)

144-147: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate RunOptions.Provider before dereferencing.

At Line 144, opts.Provider.LoadTasks(...) will panic when Provider is nil. Return a validation error instead.

Proposed fix

 func Run(ctx context.Context, opts RunOptions) (RunSummary, error) {
 	now := opts.clock()
+	if opts.Provider == nil {
+		return RunSummary{}, fmt.Errorf("bench: provider is required")
+	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/bench/runner.go` around lines 144 - 147, Check for nil
RunOptions.Provider before calling opts.Provider.LoadTasks to avoid a panic; if
opts.Provider is nil return an appropriate error (e.g., fmt.Errorf("bench: nil
provider in RunOptions") or a wrapped validation error) instead of proceeding.
Update the code surrounding the LoadTasks call in runner.go (the block using
opts.Provider.LoadTasks and returning RunSummary{}) to validate opts.Provider,
and ensure the error type/message clearly indicates a missing/invalid provider
when returning. Keep the rest of the LoadTasks error handling unchanged.

internal/bench/report.go-327-333 (1)

327-333: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not infer CSV confidence from only the first usage row.

At Line 329, confidence is taken from the first usage entry. For mixed usage records (e.g., exact + estimated), this can overstate confidence for the aggregated task cost.

Proposed fix

 		var totalCost float64
-		var confidence string
+		confidence := "unavailable"
 		for _, u := range tr.Usage {
 			totalCost += u.CostUSD
-			if confidence == "" {
-				confidence = u.Confidence
-			}
+			c := defaultUsageConfidence(u)
+			switch {
+			case c == "exact" && confidence == "unavailable":
+				confidence = "exact"
+			case c != "exact":
+				confidence = "estimated"
+			}
 		}
-		confidence = defaultUsageConfidence(UsageRecord{CostUSD: totalCost, Confidence: confidence})
+		if len(tr.Usage) == 0 {
+			confidence = defaultUsageConfidence(UsageRecord{CostUSD: totalCost, Confidence: confidence})
+		}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/bench/report.go` around lines 327 - 333, The code currently sets
confidence using only the first entry from tr.Usage; instead, scan all usage
rows (the tr.Usage slice) and compute a combined confidence before calling
defaultUsageConfidence. For example, iterate over each UsageRecord in tr.Usage,
accumulate totalCost as you already do, and derive combinedConfidence by
checking every u.Confidence (e.g., if any record is non-"Exact" mark the
aggregate as "Estimated"/the most conservative value or otherwise keep "Exact");
then call defaultUsageConfidence(UsageRecord{CostUSD: totalCost, Confidence:
combinedConfidence}) so the aggregated task confidence reflects all usage rows
rather than just the first.

internal/cli/bench.go-199-201 (1)

199-201: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle repo-root resolution failures before reading run artifacts.

Line 200 and Line 218 discard resolveRepoRoot errors, so report/compare can read from unintended relative paths when not in a repo.

Suggested fix

 func runBenchReport(cmd *cobra.Command, runID, format string) error {
-	repoRoot, _ := resolveRepoRoot()
+	repoRoot, err := resolveRepoRoot()
+	if err != nil {
+		return cmdError(cmd, fmt.Errorf("resolve repo root: %w", err), 1)
+	}
 	outDir := filepath.Join(repoRoot, ".verk", "bench", "runs", runID)
-	rr, err := bench.LoadRunResult(outDir)
+	rr, err := bench.LoadRunResult(outDir)
 	if err != nil {
 		return cmdError(cmd, err, 1)
 	}
@@
 func runBenchCompare(cmd *cobra.Command, baseline, candidate, format string) error {
-	repoRoot, _ := resolveRepoRoot()
+	repoRoot, err := resolveRepoRoot()
+	if err != nil {
+		return cmdError(cmd, fmt.Errorf("resolve repo root: %w", err), 1)
+	}
-	base, err := bench.LoadRunResult(filepath.Join(repoRoot, ".verk", "bench", "runs", baseline))
+	base, err := bench.LoadRunResult(filepath.Join(repoRoot, ".verk", "bench", "runs", baseline))

Also applies to: 217-219

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cli/bench.go` around lines 199 - 201, The code in runBenchReport
currently ignores errors from resolveRepoRoot() and proceeds to build outDir and
read artifacts, which can point to unintended relative paths; update
runBenchReport to check the error returned by resolveRepoRoot() (the call that
assigns repoRoot, err := resolveRepoRoot()) and return or log the error
immediately instead of discarding it, ensuring outDir is only constructed when
repoRoot is valid; apply the same fix to the other occurrence where
resolveRepoRoot() is called (the block around lines 217-219) so both places
validate and handle the error before using repoRoot.

internal/cli/bench_test.go-64-69 (1)

64-69: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update stale expectation in bench run test.

This test now contradicts current behavior and is failing CI (Line 67 expects non-zero, command returns 0).

Suggested fix

-func TestBenchRun_NotYetImplemented(t *testing.T) {
+func TestBenchRun_Completes(t *testing.T) {
 	_, _, code := runBenchCmd(t, "run", "smoke")
-	if code == 0 {
-		t.Fatal("bench run: expected non-zero exit code, got 0")
+	if code != 0 {
+		t.Fatalf("bench run: expected exit 0, got %d", code)
 	}
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cli/bench_test.go` around lines 64 - 69, The test
TestBenchRun_NotYetImplemented has a stale assertion: it calls runBenchCmd(t,
"run", "smoke") and asserts the exit code is non-zero; update the expectation to
match current behavior by asserting code == 0 (or using t.Fatalf on non-zero) so
the test expects a zero exit code from runBenchCmd; modify the assertion in
TestBenchRun_NotYetImplemented to validate success instead of failure.

internal/cli/run_blocked.go-108-110 (1)

108-110: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Decision prompt ignores run cancellation while waiting for input.

Line 144 hardcodes context.Background(), so a canceled run context does not interrupt ChooseTicketDecision; the prompt can hang until another keypress arrives. Thread the caller context through the prompter API and pass it to readPromptByte.

Suggested fix

 type decisionPrompter interface {
-	ChooseTicketDecision(ticket engine.BlockedTicket) (ticketDecision, error)
+	ChooseTicketDecision(ctx context.Context, ticket engine.BlockedTicket) (ticketDecision, error)
 }

-func (p terminalDecisionPrompter) ChooseTicketDecision(ticket engine.BlockedTicket) (ticketDecision, error) {
+func (p terminalDecisionPrompter) ChooseTicketDecision(ctx context.Context, ticket engine.BlockedTicket) (ticketDecision, error) {
+	if ctx == nil {
+		ctx = context.Background()
+	}
 	reason := strings.TrimSpace(ticket.Reason)
 	if reason == "" {
 		reason = noReasonRecorded
 	}
@@
-	ctx := context.Background()
 	for {
 		_, _ = fmt.Fprint(p.w, "  Choice: ")
 		b, ok, cancelled := readPromptByte(ctx, p.r)
@@
-		decision, err := prompter.ChooseTicketDecision(t)
+		decision, err := prompter.ChooseTicketDecision(ctx, t)

Also applies to: 132-150, 176-177, 197-197, 584-595

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cli/run_blocked.go` around lines 108 - 110, The decision prompter
currently ignores cancellation because ChooseTicketDecision is called with
context.Background(); change the decisionPrompter interface signature to
ChooseTicketDecision(ctx context.Context, ticket engine.BlockedTicket)
(ticketDecision, error), update all implementations and call sites to pass the
caller's ctx instead of context.Background(), and thread that same ctx into
readPromptByte so the prompt read is cancellable; ensure all referenced calls
(ChooseTicketDecision invocations and readPromptByte usage) are updated
accordingly.

internal/engine/constraints/promote.go-25-33 (1)

25-33: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Unused promoted map is dead code.

The promoted map is built but never used to filter or check anything. The loop over c.PromotedFrom does nothing (_ = pf). Either remove this code or implement the intended filtering logic.

-	// Collect promoted signatures so we can detect already-promoted ones.
-	promoted := make(map[Signature]struct{})
-	for _, c := range idx.Constraints {
-		promoted[Signature(c.ID)] = struct{}{}
-		// Also track by finding signature patterns
-		for _, pf := range c.PromotedFrom {
-			_ = pf // just a marker
-		}
-	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/engine/constraints/promote.go` around lines 25 - 33, The code builds
a promoted map (promoted := make(map[Signature]struct{})) and iterates
c.PromotedFrom without using either; remove the dead code or implement the
intended check: either delete the entire promoted collection and the unused
loop, or use promoted inside the constraint processing to skip already-promoted
constraints by checking if Signature(c.ID) exists in promoted or by populating
promoted from c.PromotedFrom and using that set where constraints are evaluated;
refer to the promoted variable, Signature(c.ID), idx.Constraints, and
c.PromotedFrom to locate and update the logic accordingly.

internal/engine/review_delta.go-105-110 (1)

105-110: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid false attribution when a file was already deleted at baseline.

At Line 107, os.IsNotExist always marks the file as worker-changed. If baseline already recorded it as non-existent (snap.Exists == false), this is unchanged pre-existing state and should not be attributed to the worker.

💡 Suggested fix

 		current, readErr := os.ReadFile(abs)
 		if readErr != nil {
 			if os.IsNotExist(readErr) {
-				// File was deleted by the worker — include it.
-				workerChanged = append(workerChanged, f)
+				// Include only when the file existed at baseline and is now deleted.
+				if snap.Exists {
+					workerChanged = append(workerChanged, f)
+				}
 				continue
 			}
 			return reviewDelta{}, fmt.Errorf("collectReviewDelta: read %s: %w", f, readErr)
 		}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/engine/review_delta.go` around lines 105 - 110, When
os.ReadFile(abs) returns os.IsNotExist, don't automatically attribute deletion
to the worker; check the baseline snapshot's exists flag (snap.Exists) first and
only append f to workerChanged when the baseline recorded the file as existing.
Update the block around the ReadFile error handling in review_delta.go (the
branch using os.IsNotExist(readErr)) to conditionally append to workerChanged
only if snap.Exists is true, then continue.

internal/engine/intent.go-155-160 (1)

155-160: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use owned-path containment, not exact equality, for TargetFiles checks.

Line 157 currently requires exact path matches, so a valid target like internal/app/foo.go is rejected when ownership is internal/app. That can incorrectly fail intent convergence.

💡 Suggested fix

-	// Rule 2: TargetFiles must be a subset of OwnedPaths (empty TargetFiles is OK).
-	for _, f := range result.TargetFiles {
-		if _, ok := ownedSet[f]; !ok {
-			return "superset_paths"
-		}
-	}
+	// Rule 2: TargetFiles must be within OwnedPaths (file == owned path OR nested under owned dir).
+	for _, f := range result.TargetFiles {
+		target := filepath.Clean(strings.TrimSpace(f))
+		allowed := false
+		for owned := range ownedSet {
+			scope := filepath.Clean(strings.TrimSpace(owned))
+			if scope == "" {
+				continue
+			}
+			if target == scope || strings.HasPrefix(target, scope+string(filepath.Separator)) {
+				allowed = true
+				break
+			}
+		}
+		if !allowed {
+			return "superset_paths"
+		}
+	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/engine/intent.go` around lines 155 - 160, The current loop in
intent.go that checks result.TargetFiles against ownedSet uses exact equality
and should instead check containment (owned directory covers files beneath it).
Replace the exact map lookup for each f in result.TargetFiles with a containment
test: iterate owned paths from ownedSet and treat f as owned if f == owned or f
has the owned path as a directory prefix (e.g. f starts with owned + "/"); use
filepath.Clean on both sides and strings.HasPrefix for the prefix check to
handle path separators correctly so the function that returns "superset_paths"
only does so when a target file is truly outside all owned paths.

internal/engine/ticket_quality_gate.go-41-42 (1)

41-42: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add runID validation to prevent directory traversal in artifact paths.

The runID parameter is used directly in filepath.Join at line 41 without validation. While current callers validate via ValidateArtifactIdentifier, the function should be defensive. Use the existing validation pattern:

Suggested fix

func persistTicketQualityArtifact(repoRoot, runID string, artifact state.TicketQualityArtifact) error {
+	if err := ValidateArtifactIdentifier(runID, "run id"); err != nil {
+		return err
+	}
 	dir := filepath.Join(repoRoot, ".verk", "runs", runID)
 	path := filepath.Join(dir, "ticket-quality.json")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/engine/ticket_quality_gate.go` around lines 41 - 42, The code uses
runID directly in filepath.Join to build dir/path; add defensive validation by
calling ValidateArtifactIdentifier(runID) before computing dir and path, and
return an error (or propagate) if validation fails so malicious runID cannot
cause directory traversal; update the block around the dir :=
filepath.Join(repoRoot, ".verk", "runs", runID) / path := filepath.Join(dir,
"ticket-quality.json") to validate runID first and only proceed to join when
ValidateArtifactIdentifier succeeds.

🟡 Minor comments (12)

docs/ticket-state-machine.md-67-82 (1)

67-82: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks.

These fences currently violate MD040. Please annotate them with an explicit language (e.g., text for CLI output examples).

Also applies to: 96-99, 111-119, 130-133
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ticket-state-machine.md` around lines 67 - 82, Update the fenced code
blocks in docs/ticket-state-machine.md to include explicit language identifiers
(e.g., use ```text instead of ``` ) for all CLI/output examples; specifically
change the fenced blocks at the shown snippet and the other occurrences
referenced (lines around 96-99, 111-119, 130-133) so they comply with MD040 by
adding the appropriate language token such as "text" or "bash".

docs/plans/2026-05-16-llmcli-runtime-adapter-integration.md-3-10 (1)

3-10: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove blank lines inside the opening blockquote block.

Lines 6 and 8 create MD028 violations (no-blanks-blockquote). Keep the status/related-tickets/for-claude lines in one continuous blockquote.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/plans/2026-05-16-llmcli-runtime-adapter-integration.md` around lines 3 -
10, The blockquote contains blank lines causing MD028; remove the empty lines so
the three blockquote lines starting with "**Status: Implemented / Reference.**",
"**Related tickets:** `fi-8ken` and children." and "**For Claude:** REQUIRED
SUB-SKILL: Use superpowers:executing-plans..." are contiguous within the same >
block (i.e., delete the blank lines between those > lines to form one continuous
blockquote).

docs/plans/2026-04-22-ticket-state-machine.md-3-5 (1)

3-5: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix blockquote blank-line formatting to satisfy markdownlint.

Line 4 introduces a blank line between blockquote lines, which triggers MD028. Keep the blockquote contiguous (or make the blank line a quoted > line).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/plans/2026-04-22-ticket-state-machine.md` around lines 3 - 5, The
blockquote in docs/plans/2026-04-22-ticket-state-machine.md is split by a bare
blank line between the two quoted lines ("**Status: Implemented** — All six
tasks landed..." and "**For Claude:** REQUIRED SUB-SKILL: Use
superpowers:executing-plans..."), which triggers markdownlint MD028; fix it by
either removing the blank line so the two lines remain contiguous in the same
blockquote, or make the blank line a quoted line by adding a leading ">" to
preserve spacing; update the lines containing those exact strings accordingly.

docs/plans/INDEX.md-53-67 (1)

53-67: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix ordered-list numbering in “Current Priority Order.”

The sequence currently jumps from 1. to 3. (Line 57), which reads like a missing priority item. Renumber this section consistently.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/plans/INDEX.md` around lines 53 - 67, The ordered list under "Current
Priority Order" has inconsistent numbering (it jumps from "1." to "3."); update
the numeric prefixes so they are sequential (change the item starting "Implement
the Claude Code foundation..." from "3." to "2." and adjust any following items
if present) so the list reads 1., 2., 3., 4., 5. and preserves the existing item
text (e.g., lines beginning "Finish the remaining active tool-reliability...",
"Implement the Claude Code foundation...", etc.).

internal/adapters/runtime/prompt.go-410-415 (1)

410-415: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Harden intent-block validation against whitespace-only list items.

finalizeIntentBlock currently accepts payloads like {"covered_criteria":[" "]} as valid. That allows an effectively empty intent echo to pass.

Suggested fix

 func finalizeIntentBlock(b VerkIntentBlock) (VerkIntentBlock, bool) {
-	if len(b.CoveredCriteria) == 0 && len(b.TargetFiles) == 0 && strings.TrimSpace(b.TestPlan) == "" {
+	b.CoveredCriteria = trimNonEmpty(b.CoveredCriteria)
+	b.TargetFiles = trimNonEmpty(b.TargetFiles)
+	b.TestPlan = strings.TrimSpace(b.TestPlan)
+	if len(b.CoveredCriteria) == 0 && len(b.TargetFiles) == 0 && b.TestPlan == "" {
 		return VerkIntentBlock{}, false
 	}
-	b.TestPlan = strings.TrimSpace(b.TestPlan)
 	return b, true
 }
+
+func trimNonEmpty(items []string) []string {
+	out := make([]string, 0, len(items))
+	for _, item := range items {
+		if s := strings.TrimSpace(item); s != "" {
+			out = append(out, s)
+		}
+	}
+	return out
+}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/adapters/runtime/prompt.go` around lines 410 - 415,
finalizeIntentBlock currently treats lists containing whitespace-only strings as
non-empty; update finalizeIntentBlock to trim each entry of b.CoveredCriteria
and b.TargetFiles, filter out entries that become empty after trimming, and
replace the slices with the filtered results before validation; then apply the
existing TestPlan trim and return the block only if the resulting
CoveredCriteria or TargetFiles slices are non-empty or TestPlan is non-empty.
Ensure you operate on VerkIntentBlock fields CoveredCriteria, TargetFiles, and
TestPlan within the finalizeIntentBlock function.

internal/adapters/runtime/llmclibridge/bridge_test.go-238-240 (1)

238-240: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The cancelled-case stderr check is currently a no-op.

When wantStderr == "", strings.Contains(x, "") always passes, so this branch does not verify stderr behavior.

Suggested fix

-			if !strings.Contains(string(result.Stderr), tt.wantStderr) {
-				t.Fatalf("expected stderr to contain %q, got %q", tt.wantStderr, result.Stderr)
-			}
+			if tt.wantStderr == "" {
+				if len(result.Stderr) != 0 {
+					t.Fatalf("expected empty stderr, got %q", result.Stderr)
+				}
+			} else if !strings.Contains(string(result.Stderr), tt.wantStderr) {
+				t.Fatalf("expected stderr to contain %q, got %q", tt.wantStderr, result.Stderr)
+			}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/adapters/runtime/llmclibridge/bridge_test.go` around lines 238 -
240, The test currently uses strings.Contains(result.Stderr, tt.wantStderr)
which is a no-op when tt.wantStderr == "" (strings.Contains(..., "") is always
true); change the assertion to branch on tt.wantStderr: if tt.wantStderr == ""
assert that result.Stderr is empty (len(result.Stderr) == 0 or == ""), otherwise
assert strings.Contains(string(result.Stderr), tt.wantStderr) and keep the
t.Fatalf messages referencing tt.wantStderr and result.Stderr to preserve
diagnostics. Use the existing tt.wantStderr and result.Stderr symbols to locate
and update the check in the test.

internal/bench/providers/aiderpolyglot/provider.go-15-18 (1)

15-18: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate suite before configuration/not-implemented paths in LoadTasks.

Right now, an invalid suite can be reported as ErrNotConfigured (or generic not-implemented), which hides caller input errors and complicates CLI/debug UX.

🔧 Proposed patch

 import (
 	"errors"
+	"fmt"
 	"verk/internal/bench"
 )
 ...
 func (p *Provider) LoadTasks(suite string) ([]bench.Task, error) {
+	if suite != "smoke" {
+		return nil, fmt.Errorf("aider-polyglot: unknown suite %q", suite)
+	}
 	if p.DatasetPath == "" {
 		return nil, ErrNotConfigured
 	}
 	// TODO: read dataset metadata from p.DatasetPath; not implemented in this spike.
-	return nil, errors.New("aider-polyglot: loading from dataset not yet implemented (spike)")
+	return nil, errors.New("aider-polyglot: loading from dataset not yet implemented (spike)")
 }

Also applies to: 52-58

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/bench/providers/aiderpolyglot/provider.go` around lines 15 - 18, In
LoadTasks, validate the incoming suite parameter immediately (before any
configuration checks or returning ErrNotConfigured/ErrNotImplemented) and return
an explicit validation error for a bad suite; update provider.go's LoadTasks to
perform the suite check first (use or add a clear error symbol such as
bench.ErrInvalidSuite or a new package-level error) so callers get a
deterministic validation error instead of ErrNotConfigured/ErrNotImplemented;
ensure this same early validation is applied to the other affected code paths
around the 52-58 area that currently branch to configuration/not-implemented
handling.

internal/cli/learn.go-207-209 (1)

207-209: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject whitespace-only --rule-id values.

Line 207 accepts " " as present; trim before validating and persist the trimmed value.

Suggested fix

-			if ruleID == "" {
+			ruleID = strings.TrimSpace(ruleID)
+			if ruleID == "" {
 				return cmdError(cmd, fmt.Errorf("--rule-id is required"), 2)
 			}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cli/learn.go` around lines 207 - 209, Trim whitespace from the
ruleID flag before validating and using it: replace direct checks of ruleID (the
variable validated in the block that currently calls cmdError) with a trimmed
value (e.g., strings.TrimSpace(ruleID)) and validate that trimmed string is
non-empty; persist and pass the trimmed value onward instead of the original.
Ensure the cmdError check uses the trimmedRuleID and that any subsequent
references in this command handler (where ruleID is used) are updated to the
trimmed variable.

internal/cli/learn.go-43-53 (1)

43-53: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tighten required-flag validation for --summary and --missed-by.

Whitespace-only --summary and comma-only/blank --missed-by currently pass required checks.

Suggested fix

-			if summary == "" {
+			summary = strings.TrimSpace(summary)
+			if summary == "" {
 				return cmdError(cmd, fmt.Errorf("--summary is required"), 2)
 			}
-			if missedBy == "" {
+			missedBy = strings.TrimSpace(missedBy)
+			if missedBy == "" {
 				return cmdError(cmd, fmt.Errorf("--missed-by is required"), 2)
 			}
 
 			missedByList := splitComma(missedBy)
+			if len(missedByList) == 0 {
+				return cmdError(cmd, fmt.Errorf("--missed-by is required"), 2)
+			}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cli/learn.go` around lines 43 - 53, The current checks treat
whitespace-only --summary and comma-only/blank entries in --missed-by as valid;
update the validation in the CLI handler to trim and validate inputs: replace
the summary check to use strings.TrimSpace(summary) and return the same cmdError
if it's empty after trimming; for missedBy, split with splitComma but then
iterate with mb = strings.TrimSpace(mb), skip or treat empty mb as an error
(e.g., if len(missedByList)==0 or any trimmed mb == "" return cmdError about
required --missed-by), and validate each non-empty trimmed mb against
memory.ValidMissedBy when calling cmdError for unknown values; keep references
to splitComma, ValidMissedBy, cmdError, summary, and missedBy so the change is
easy to locate.

internal/cli/status_test.go-171-171 (1)

171-171: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert the CLI execute error in the test.

Line 171 drops the root.Execute() error, which can hide command failures and make this test pass for the wrong reason.

Suggested fix

-	_ = root.Execute()
+	if err := root.Execute(); err != nil {
+		t.Fatalf("status command failed: %v", err)
+	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cli/status_test.go` at line 171, The test currently ignores the
error returned by root.Execute(), which can hide failures; update the test in
internal/cli/status_test.go to capture the return value from root.Execute() and
assert it's nil (or assert the expected error) using the test framework's
assertion (e.g., t.Fatalf/t.Errorf or require.NoError/assert.NoError) so command
failures fail the test; locate the call to root.Execute() and replace the
discarded result with a proper check that fails the test if an unexpected error
is returned.

internal/engine/constraints/signature.go-27-34 (1)

27-34: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Platform inconsistency: filepath.Dir uses OS-specific separators.

On Windows, filepath.Dir returns paths with backslashes, but glob patterns typically use forward slashes. This could cause inconsistent signature hashes across platforms.

 func fileToGlob(path string) string {
 	dir := filepath.Dir(path)
 	ext := filepath.Ext(path)
 	if ext == "" {
 		return path
 	}
-	return dir + "/*" + ext
+	return filepath.ToSlash(dir) + "/*" + ext
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/engine/constraints/signature.go` around lines 27 - 34, fileToGlob
uses filepath.Dir which yields OS-specific separators and can create backslashes
on Windows, causing inconsistent glob/signature strings; update fileToGlob to
normalize to forward slashes by calling filepath.ToSlash on the inputs/outputs:
when ext == "" return filepath.ToSlash(path), otherwise compute dir :=
filepath.ToSlash(filepath.Dir(path)) and return dir + "/*" + ext (using "/"
explicitly for the glob separator) so the produced glob always uses forward
slashes across platforms.

internal/engine/ticket_quality_gate_test.go-148-155 (1)

148-155: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert RunEpic errors in artifact-persistence subtests.

Both subtests ignore the returned error, which can hide behavior regressions unrelated to artifact writing.

✅ Suggested test tightening

-		_, _ = RunEpic(context.Background(), RunEpicRequest{
+		_, err := RunEpic(context.Background(), RunEpicRequest{
 			RepoRoot:     repoRoot,
 			RunID:        runID,
 			RootTicketID: epic.ID,
 			BaseCommit:   baseCommit,
 			Adapter:      runtimefake.New(nil, nil),
 			Config:       policy.DefaultConfig(),
 		})
+		if !errors.Is(err, ErrEpicBlocked) {
+			t.Fatalf("expected ErrEpicBlocked in blocked case, got: %v", err)
+		}
@@
-		_, _ = RunEpic(context.Background(), RunEpicRequest{
+		_, err := RunEpic(context.Background(), RunEpicRequest{
 			RepoRoot:     repoRoot,
 			RunID:        runID,
 			RootTicketID: epic.ID,
 			BaseCommit:   baseCommit,
 			Adapter:      adapter,
 			Config:       policy.DefaultConfig(),
 		})
+		if err != nil {
+			t.Fatalf("expected passed case to succeed, got: %v", err)
+		}

Also applies to: 223-230

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/engine/ticket_quality_gate_test.go` around lines 148 - 155, The
subtests for artifact-persistence call RunEpic(...) and ignore its returned
error, which can hide regressions; update those calls in
ticket_quality_gate_test.go (the artifact-persistence subtests around the
RunEpicRequest usage and the similar block at the later occurrence) to capture
the error return and assert success (e.g., use require.NoError(t, err) or
assert.NoError(t, err)) and/or fail the test when err != nil so failures in
RunEpic are surfaced; locate the RunEpic call and the RunEpicRequest struct
usage to modify the test assertions accordingly.

🧹 Nitpick comments (6)

internal/bench/providers/aiderpolyglot/provider_test.go (1)

76-79: ⚡ Quick win

Assert SupportedModes contents, not just count.

A slice-length check alone can miss regressions where modes are incorrect but still two items.

✅ Stronger assertion

 	wantModes := []bench.BenchmarkMode{bench.ModeFullVerk, bench.ModeWorkerOnly}
 	if len(caps.SupportedModes) != len(wantModes) {
 		t.Errorf("SupportedModes: got %v, want %v", caps.SupportedModes, wantModes)
 	}
+	for i, want := range wantModes {
+		if caps.SupportedModes[i] != want {
+			t.Errorf("SupportedModes[%d]: got %q, want %q", i, caps.SupportedModes[i], want)
+		}
+	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/bench/providers/aiderpolyglot/provider_test.go` around lines 76 -
79, The test currently only checks the length of caps.SupportedModes against
wantModes; change it to assert the actual contents match — compare
caps.SupportedModes to wantModes (either by deep-equality with reflect.DeepEqual
or by converting to a set and comparing elements) so the test fails if the
specific modes differ; update the assertion around the variables wantModes and
caps.SupportedModes in provider_test.go to verify element equality (and consider
order-insensitive comparison if order isn't guaranteed).

internal/bench/providers/verknative/provider_test.go (1)

113-151: ⚡ Quick win

Add a scorer test that rejects ../ and absolute marker paths.

Given scoring reads from disk, a dedicated boundary test will lock in safe behavior once path sanitization is added.

🧪 Example test to add

+func TestScore_UnsolvedWhenMarkerEscapesWorkspace(t *testing.T) {
+	dir := t.TempDir()
+	task := smokeTask("../outside.txt", "ok")
+	score := verknative.Score(task, dir)
+	if score.Solved {
+		t.Fatal("Score.Solved should be false for traversal marker path")
+	}
+}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/bench/providers/verknative/provider_test.go` around lines 113 - 151,
Add a new unit test in internal/bench/providers/verknative/provider_test.go
(e.g. TestScore_RejectsPathTraversalAndAbsoluteMarker) that constructs tasks
with marker paths containing "../" and absolute paths (like "../secret.txt" and
"/etc/passwd") using the existing smokeTask helper and calls
verknative.Score(task, dir); ensure the test writes matching files both inside
and outside dir to prove traversal would succeed if unsanitized, but assert
score.Solved is false for both inputs to lock in safe behavior; reference
smokeTask and verknative.Score when locating where to add the test.

internal/bench/runner_test.go (1)

100-475: 🏗️ Heavy lift

Add resume regression tests for result preservation and run-id consistency.

Please add tests that assert Resume keeps prior completed task outcomes (not rewritten as cancelled) and that persisted run.json/complete.json retain the checkpoint run ID.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/bench/runner_test.go` around lines 100 - 475, Add two tests in
runner_test.go near the other Run tests that exercise Resume: (1) create an
initial run via Run with at least two tasks, ensure one is completed and
checkpoint/checkpoint.json and results.jsonl contain that task, then call Resume
with the same OutDir and verify the previously completed task remains completed
(not changed to "cancelled") in results.jsonl and checkpoint.json; (2) perform a
run, capture the run ID written to run.json and complete.json (or checkpoint run
metadata), then resume and assert the same run ID is preserved in the persisted
run.json/complete.json files after Resume; locate code references like Run,
Resume, results.jsonl, checkpoint.json, run.json, and complete.json to implement
the assertions.

internal/bench/report_test.go (1)

202-254: ⚡ Quick win

Add CSV assertions for mixed-confidence and score-only-cost rows.

Current CSV tests validate first-row behavior, but they don't cover mixed usage confidence or fallback to TaskResult.Score when Usage is empty—both are important report invariants.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/bench/report_test.go` around lines 202 - 254,
TestRenderCSV_HasHeaderAndRows is missing assertions for rows that should show
mixed-confidence and cost derived from TaskResult.Score when Usage is empty;
update the test data from makeTestRunResult/BuildReport (or ensure tasks like
"task-2" and "task-4" exist in the generated report) and add CSV assertions
after parsing records to check that the mixed-usage row has cost_confidence
"mixed" (e.g., for task-2) and that the score-only row uses the TaskResult.Score
value for cost_usd and sets cost_confidence to the appropriate fallback (e.g.,
"exact" or "unavailable" as defined by RenderCSV logic); locate these checks in
TestRenderCSV_HasHeaderAndRows and add comparisons against records[...] indices
consistent with the header (cost_usd at index 7, cost_confidence at index 8).

internal/engine/constraints/promote.go (1)

134-136: 💤 Low value

Prefer strings.HasSuffix for suffix checks.

Using strings.HasSuffix is more idiomatic and readable than manual index slicing.

-		if len(e) > 6 && e[len(e)-6:] == ".jsonl" {
+		if strings.HasSuffix(e, ".jsonl") {
 			sigs = append(sigs, Signature(e[:len(e)-6]))

Note: You'd also want to use strings.TrimSuffix(e, ".jsonl") for extracting the signature.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/engine/constraints/promote.go` around lines 134 - 136, Replace the
manual slice-based suffix check and extraction in promote.go: instead of
checking len(e) > 6 && e[len(e)-6:] == ".jsonl" and slicing e[:len(e)-6], use
strings.HasSuffix(e, ".jsonl") to test the suffix and strings.TrimSuffix(e,
".jsonl") to get the base name, then append Signature(trimmed) to sigs
(referencing variables e, sigs and the Signature type).

internal/engine/constraints/execute.go (1)

144-146: 💤 Low value

Silent error swallowing may hide actionable failures.

When git diff fails for reasons other than "no commits yet", the error is silently discarded. Consider returning the error in ExecuteResult.Err so operators can debug unexpected git failures.

-		// Other errors: return as informational, not fatal
-		return "", false, nil
+		// Other errors: return as informational
+		return "", false, fmt.Errorf("git diff failed: %w", runErr)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/engine/constraints/execute.go` around lines 144 - 146, The code
currently swallows non-"no commits yet" git errors by returning "", false, nil;
instead propagate that error into the execution result so callers can
surface/debug it. Locate the return in internal/engine/constraints/execute.go
(the branch handling git diff failures) and replace the nil error with an
ExecuteResult that carries the original error in its Err field (or otherwise set
ExecuteResult.Err to the caught error) so the function returns the error via
ExecuteResult.Err rather than dropping it.

18 issues fixed across security (path traversal, OOM reads), error handling (swallowed write/flush errors, nil guards, validation), and logic bugs (checkpoint key collision, prefix vs exact matching, false attribution, dead code, cross-platform signature stability). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/bench/compare.go`:
- Around line 129-167: pairResults currently keys bMap/cMap only by TaskID which
causes overwrites when a run has multiple profiles; change the maps in
pairResults to use a composite key (e.g., TaskID + ProfileID or TaskID +
ProfileName) so each distinct profile-result remains separate, update the
composite key creation in pairResults where bMap[candidate[i].TaskID] and
similar are assigned, and propagate that composite identity into the
PairedTaskComparison struct (add a field like CompositeID or ProfileID) so
rendering/diffing and classifyOutcome lookups remain accurate; ensure
classifyOutcome(b, c) and any callers of PairedTaskComparison are adjusted to
expect/use the composite identity.

In `@internal/bench/governance.go`:
- Around line 19-27: The switch in SuiteClaimLabel is missing a case for
SamplingModeRegression so regression suites fall through to "exploratory"; add a
case for SamplingModeRegression and return the intended regression label (e.g.,
the same string used for SamplingModeSmoke like "regression/sanity only — not an
externally defensible ranking" or the correct regression-specific text) to
ensure SamplingModeRegression maps to the proper claim label.

In `@internal/bench/report.go`:
- Around line 272-275: aggregateTotals currently adds Score.CostUSD into the
grand totals when tr.Usage is empty but the per-profile (ps.costUSD aggregation
loop over tr.Usage) and CSV/profile table code paths omit Score.CostUSD, causing
inconsistent cost figures; update the per-profile and CSV/profile aggregation
logic (where ps.costUSD is accumulated from tr.Usage) to fall back to or also
include tr.Score.CostUSD when tr.Usage is empty (mirror the aggregateTotals
behavior), ensuring code paths that compute ps.costUSD and CSV rows reference
tr.Score.CostUSD as the source of cost when tr.Usage has no entries.

In `@internal/engine/constraints/promote.go`:
- Around line 67-75: The timeout_ms parsing silently falls back and allows
non-positive values; update the rawSpec["timeout_ms"] handling so that if the
key exists you validate its type and range and return an error immediately on
invalid input: accept only int or float64 (convert float64->int), reject other
types, and require the resulting timeoutMs > 0; set timeoutMs to the validated
value only after these checks. Ensure the error is propagated from the
surrounding parsing/constructor path so invalid timeout_ms cannot be persisted
(referencing rawSpec and timeoutMs in promote.go).

In `@internal/engine/intent.go`:
- Around line 64-67: The code trims criterion IDs only in the if guard but
stores the raw c.ID into requiredCriteria, causing mismatches for IDs with
surrounding whitespace; update the logic in the loop over plan.Criteria so you
compute a normalized id := strings.TrimSpace(c.ID) (or lower-case if
normalization requires) and then use that normalized id for both the non-empty
check and as the key stored in requiredCriteria; apply the same normalization
where membership checks occur (the other occurrence around lines 139-140) so all
set insertions and lookups use the same normalized ID variable.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 827e93a4-d8da-4df8-94d6-f6e89d78e16f

📥 Commits

Reviewing files that changed from the base of the PR and between a2d9934 and 407bba2.

📒 Files selected for processing (17)

internal/adapters/repo/git/repo.go
internal/adapters/runtime/llmclibridge/bridge.go
internal/bench/compare.go
internal/bench/governance.go
internal/bench/providers/verknative/provider.go
internal/bench/registry.go
internal/bench/report.go
internal/bench/runner.go
internal/bench/types.go
internal/cli/learn.go
internal/engine/constraints/promote.go
internal/engine/constraints/signature.go
internal/engine/intent.go
internal/engine/intent_test.go
internal/engine/review_delta.go
internal/engine/test_resolver.go
internal/engine/ticket_quality_gate.go

🚧 Files skipped from review as they are similar to previous changes (9)

internal/bench/registry.go
internal/bench/types.go
internal/engine/ticket_quality_gate.go
internal/engine/constraints/signature.go
internal/cli/learn.go
internal/bench/providers/verknative/provider.go
internal/engine/intent_test.go
internal/adapters/repo/git/repo.go
internal/engine/review_delta.go

…ion label, cost consistency, timeout validation, criterion normalization)

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/bench/governance.go`:
- Line 29: Replace the raw string literal "exploratory" with the existing shared
constant designExploratory in the function that currently returns "exploratory"
(update the return in governance.go to return designExploratory); this removes
the duplicate constant and fixes the goconst lint error while reusing the shared
symbol.

In `@internal/bench/report.go`:
- Around line 132-138: The repeated string literals in the switch (returning
"system result", "capability result", "runtime result", "exploratory") and
elsewhere ("exact", "estimated") should be replaced with package-level constants
to satisfy goconst; add clear constant names (e.g., resultSystem,
resultCapability, resultRuntime, resultExploratory, labelExact, labelEstimated)
and update the switch that checks ModeWorkerOnly and ModeRuntimeProbe (and the
other mode cases) to return those constants, and likewise replace the duplicate
literals around lines 379-385 with the new constants so all occurrences reuse
the same identifiers.
- Around line 348-357: The current loop over tr.Usage sums totalCost but picks
the first non-empty confidence, which can mislabel rows when later entries are
less certain; update the aggregation to compute totalCost as before but compute
confidence by scanning all u.Confidence values and choosing the most
conservative/worst certainty (e.g., if any u.Confidence == "estimated" then
confidence = "estimated", otherwise if any == "exact" then "exact", else ""),
then handle the len(tr.Usage)==0 fallback to tr.Score.CostUSD and finally call
defaultUsageConfidence(UsageRecord{CostUSD: totalCost, Confidence: confidence})
to produce the final confidence; adjust the code around tr.Usage, totalCost,
confidence, defaultUsageConfidence, and UsageRecord accordingly.

In `@internal/engine/intent.go`:
- Around line 154-156: The check that iterates over result.TargetFiles should
normalize each entry (trim surrounding whitespace and skip empty/whitespace-only
strings) before calling fileInOwned against ownedPaths to avoid false
"superset_paths" rejections; update the loop that references result.TargetFiles
to trim entries (e.g., strings.TrimSpace) and continue past empty results, then
call fileInOwned(f, ownedPaths) as before so only meaningful file paths are
validated.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a0835c12-529e-4775-9c69-666676033092

📥 Commits

Reviewing files that changed from the base of the PR and between 407bba2 and 3673bdb.

📒 Files selected for processing (6)

internal/bench/compare.go
internal/bench/governance.go
internal/bench/governance_test.go
internal/bench/report.go
internal/engine/constraints/promote.go
internal/engine/intent.go

🚧 Files skipped from review as they are similar to previous changes (3)

internal/bench/governance_test.go
internal/engine/constraints/promote.go
internal/bench/compare.go

…CSV, trim TargetFiles

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/bench/report.go`:
- Around line 128-132: Add a dedicated constant for the repeated "unavailable"
confidence string and replace literal occurrences with it: extend the existing
constants block (which defines claimSystemResult, confidenceExact,
confidenceEstim) by adding something like confidenceUnavailable = "unavailable",
then update all uses (e.g., where the code currently compares/assigns the
literal "unavailable" around the confidence logic referenced in the diff,
including the other occurrences noted) to reference confidenceUnavailable
instead of the string literal.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5d7b5eed-a758-45a2-922e-027972cbc7a6

📥 Commits

Reviewing files that changed from the base of the PR and between 3673bdb and 726c9ee.

📒 Files selected for processing (3)

internal/bench/governance.go
internal/bench/report.go
internal/engine/intent.go

🚧 Files skipped from review as they are similar to previous changes (2)

internal/bench/governance.go
internal/engine/intent.go

php-workx and others added 30 commits May 14, 2026 15:44

fix: add scoped git diffs for review

d0209fe

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: emit no-trailing-newline marker in synthetic diffs

897762a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: cap hunk header count and add binary/empty tests

e5cdb04

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: show scoped review files in reviewer prompts

9db9620

test: assert no files-under-review section when ChangedFiles empty

4937a83

fix: capture per-attempt review deltas

f6462fb

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: assert review changed files are persisted

f3d7d90

test: guard reviewer diff against unrelated dirty files

c9bc683

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add ticket quality artifact types

2fe24d7

feat: add deterministic ticket quality lint

3d76604

feat: plan safe ticket quality repairs

610fee4

feat: add ticket quality inspect command

80e4092

feat: consolidate parallel agent work (profiles, planner prompt, qual…

f962e73

…ity CLI, bench verknative)

php-workx and others added 16 commits May 16, 2026 19:26

feat: resolve closeout test references

d0ff0b8

feat: add runtime intent adapter surface

deb97ab

docs: sync plan statuses and decisions

cb3acf5

docs: prioritize llmcli runtime stability

e847d2d

feat: add llmcli runtime bridge

e765329

feat: route claude adapter through llmcli bridge

cc4fec8

feat: route codex adapter through llmcli bridge

a909778

chore: remove obsolete runtime subprocess helpers

1213d2a

feat: add llmcli runtime doctor diagnostics

10281b7

docs: document llmcli fallback operations

1613c14

fix: handle large codex jsonl events

4c352ae

docs: mark llmcli adapter implemented

4ab8848

php-workx self-assigned this May 17, 2026

fix: remove stale stub guard for bench run (now implemented)

a2d9934

coderabbitai Bot reviewed May 17, 2026

View reviewed changes

Comment thread internal/bench/runner.go Outdated

Comment thread internal/bench/runner.go

Comment thread internal/engine/test_resolver.go

coderabbitai Bot reviewed May 17, 2026

View reviewed changes

Comment thread internal/bench/compare.go

Comment thread internal/bench/report.go

Comment thread internal/engine/constraints/promote.go

Comment thread internal/engine/intent.go

fix: address second-round CodeRabbit findings (composite key, regress…

3673bdb

…ion label, cost consistency, timeout validation, criterion normalization)

coderabbitai Bot reviewed May 17, 2026

View reviewed changes

Comment thread internal/bench/governance.go Outdated

Comment thread internal/bench/report.go Outdated

Comment thread internal/bench/report.go Outdated

Comment thread internal/engine/intent.go

fix: use shared constants for repeated strings, fix mixed-confidence …

726c9ee

…CSV, trim TargetFiles

coderabbitai Bot reviewed May 17, 2026

View reviewed changes

Comment thread internal/bench/report.go

fix: add confidenceUnavailable constant to eliminate goconst lint error

adeefe9

php-workx merged commit 95cead2 into main May 17, 2026
11 checks passed

php-workx deleted the worktree-feat+plan-implementations branch May 17, 2026 15:18

Conversation

php-workx commented May 17, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

qodo-code-review Bot commented May 17, 2026

Qodo reviews are paused for this user.

Uh oh!

coderabbitai Bot commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

php-workx commented May 17, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 17, 2026 •

edited

Loading