feat(ledger): add SQLite run ledger by rianjs · Pull Request #37 · open-cli-collective/codereview-cli

rianjs · 2026-05-30T20:33:27Z

Summary

add internal/ledger SQLite store with schema version 1 and CR-06 startup pragmas
add single-writer write path, typed storage constants, run/session/finding/action/named-session persistence
add transactional AllocateRun with recovery-mode insert-only semantics and concurrent attempt allocation coverage

Verification

go test ./internal/ledger
go test -race ./internal/ledger
go test ./...
make check

Closes #13

rianjs · 2026-05-30T21:12:20Z

Architect review: clean after recovery allocation and validation coverage fixes.

STATUS: blockers=0 majors=0 minors=0 nits=0

rianjs · 2026-05-30T21:15:24Z

Coverage is broad, but there are still gaps to tighten before daemon review.

Blocker: AllocateRun is not yet proven transactional on failure after the PR upsert and before run insert commit. A non-atomic implementation could still pass.
Major: negative-path tests do not consistently prove no mutation on failure; they mostly assert error classification.
Minor: runs_resume is asserted by index name only, not definition.
Nit: recovery constraint classification has a white-box helper test; public-path coverage should carry as much of that as possible.

monit-reviewer

Automated PR Review

Reviewed commit: e692e58

Summary

Reviewer	Findings
harness-engineering:harness-architecture-reviewer	4
harness-engineering:harness-enforcement-reviewer	1
harness-engineering:harness-knowledge-reviewer	1
harness-engineering:harness-self-documenting-code-reviewer	1
security:security-code-auditor	1

harness-engineering:harness-architecture-reviewer (4 findings)

⚠️ Should Fix - internal/ledger/ledger.go:1237

isSQLiteConstraintError uses strings.Contains(err.Error(), "constraint") to classify constraint violations. This is fragile: any error whose message incidentally contains "constraint" (wrapped I/O errors, future driver message changes, trigger errors naming a constraint) will be misclassified, causing allocateRunTx to enter an incorrect retry path or return a false ErrRunExists. The modernc.org/sqlite driver exposes a structured *sqlite3.Error type with a numeric Code field; the correct check is errors.As to extract that type and compare against SQLITE_CONSTRAINT (19). This function is the load-bearing guard for the entire AllocateRun retry and recovery path.

💡 Suggestion - internal/ledger/ledger.go:356

write() holds s.mu while blocking on the unbuffered s.writes channel send. Under load with tight per-request timeouts, all concurrent callers queue behind the mutex rather than at the channel, so they cannot even begin context-cancellation detection until the previous caller finishes dispatching. Consider a buffered channel (capacity matching expected burst) or releasing the mutex before the channel send and re-checking closed state afterward to allow callers to race independently.

💡 Suggestion - internal/ledger/ledger.go:1282

The findings table declares a foreign key on session_row_id referencing sessions(session_row_id) without ON DELETE CASCADE. If a DeleteSession method is added later, SQLite will block the delete with a FK violation rather than cascading to findings, creating an asymmetric deletion contract relative to the runs → sessions cascade. Add ON DELETE CASCADE here, or add an explanatory comment if orphaned findings are acceptable.

💡 Suggestion - internal/ledger/ledger.go:1293

planned_actions.finding_id is a nullable TEXT column with no REFERENCES constraint pointing to findings(finding_id). With foreign_keys = ON, SQLite only enforces declared FK constraints; a caller can insert a planned action referencing a non-existent finding and it will be silently accepted. Given that the outbox consumer will look up findings by this ID, a dangling reference creates a silent data integrity gap. Add REFERENCES findings(finding_id), or document that the column is intentionally unvalidated.

harness-engineering:harness-enforcement-reviewer (1 findings)

⚠️ Should Fix - internal/ledger/ledger.go:344

In write(), if ctx is cancelled after the request is dispatched to the writer goroutine (first select) but before the response is received (second select), the caller receives ctx.Err() while the writer may still execute and commit the DB operation. For AllocateRun specifically, this means a run can be persisted while the caller believes the call failed. A subsequent retry with the same RunID will receive ErrRunExists, producing confusing recovery semantics. Consider returning the allocated run ID even on context cancellation, or documenting this hazard explicitly so callers can handle the ambiguous state.

harness-engineering:harness-knowledge-reviewer (1 findings)

💡 Suggestion - internal/ledger/ledger.go:412

classifyAllocateConstraint returns (retry=true, nil) when params.RunID is non-empty but the run_id does not yet exist in the database. The semantics are surprising: a caller who provided an explicit RunID and received a constraint violation unrelated to run_id uniqueness would be silently retried rather than having the real error surfaced. The path is currently unreachable under the single-writer design, but a comment or tighter condition would make the intent explicit and prevent confusion during future multi-writer work.

harness-engineering:harness-self-documenting-code-reviewer (1 findings)

💡 Suggestion - internal/ledger/ledger.go:619

GetPR, GetRun, GetSession, GetNamedSession, ListFindings, and ListPlannedActions call s.db directly without checking s.closed. After Close() is called, these methods return a generic 'sql: database is closed' error rather than the typed ErrClosed sentinel. Callers using errors.Is(err, ErrClosed) to detect post-close access will miss these cases, while the write path correctly checks s.closed before dispatching.

security:security-code-auditor (1 findings)

⚠️ Should Fix - internal/ledger/ledger.go:465

In recovery mode, classifyAllocateConstraint returns retry=true when params.RunID is set, a constraint error occurs, and the RunID does not yet exist. However, the only other uniqueness constraint that could fire is on (pr_key, sha, base_sha, profile, posting_identity, attempt). Since recovery mode retries with the same parameters, it will hit the same constraint violation on every iteration, producing a truly infinite loop. Recovery mode should fail immediately on non-run_id constraint violations rather than retrying.

3 info-level observations excluded. Run with --verbose to include.

2 PR discussion threads considered.

Completed in 3m 59s | $1.35 | sonnet | daemon 0.2.121 | Glorfindel

Field	Value
Model	sonnet
Reviewers	hybrid-synthesis, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine	claude · sonnet
Reviewed by	pr-review-daemon · monit-pr-reviewer
Duration	3m 59s wall · 10m 57s compute (Reviewers: 2m 24s · Synthesis: 1m 32s)
Cost	$1.35
Tokens	217.8k in / 41.8k out
Turns	6

Per-workstream usage

Workstream	Model	In	Out	Cache read	Cache create	Cost
hybrid-synthesis	sonnet	33.2k	5.5k	0	33.2k (1h)	$0.21
harness-engineering:harness-architecture-reviewer	sonnet	37.0k	4.7k	2.1k	34.9k (1h)	$0.23
harness-engineering:harness-enforcement-reviewer	sonnet	37.0k	7.4k	2.1k	34.9k (1h)	$0.27
harness-engineering:harness-knowledge-reviewer	sonnet	37.0k	6.0k	2.1k	34.9k (1h)	$0.25
harness-engineering:harness-self-documenting-code-reviewer	sonnet	36.7k	5.4k	2.1k	34.6k (1h)	$0.24
security:security-code-auditor	haiku	37.0k	12.8k	0	36.9k (1h)	$0.14

_{Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.}

monit-reviewer · 2026-05-30T21:22:17Z

+  completed_at   TEXT,
+  outcome        TEXT,
+  artifact_path  TEXT NOT NULL,
+  blocking_count INTEGER NOT NULL DEFAULT 0,


🟡 Medium (harness-engineering:harness-architecture-reviewer): isSQLiteConstraintError uses strings.Contains(err.Error(), "constraint") to classify constraint violations. This is fragile: any error whose message incidentally contains "constraint" (wrapped I/O errors, future driver message changes, trigger errors naming a constraint) will be misclassified, causing allocateRunTx to enter an incorrect retry path or return a false ErrRunExists. The modernc.org/sqlite driver exposes a structured *sqlite3.Error type with a numeric Code field; the correct check is errors.As to extract that type and compare against SQLITE_CONSTRAINT (19). This function is the load-bearing guard for the entire AllocateRun retry and recovery path.

Reply to this thread when addressed.

Fixed - switched constraint detection to modernc error codes.

monit-reviewer · 2026-05-30T21:22:17Z

+		_ = db.Close()
+		return nil, err
+	}
+	if _, err := dbmig.Apply(ctx, db, migrations()); err != nil {


🟡 Medium (harness-engineering:harness-enforcement-reviewer): In write(), if ctx is cancelled after the request is dispatched to the writer goroutine (first select) but before the response is received (second select), the caller receives ctx.Err() while the writer may still execute and commit the DB operation. For AllocateRun specifically, this means a run can be persisted while the caller believes the call failed. A subsequent retry with the same RunID will receive ErrRunExists, producing confusing recovery semantics. Consider returning the allocated run ID even on context cancellation, or documenting this hazard explicitly so callers can handle the ambiguous state.

Reply to this thread when addressed.

Fixed - dispatched writes now return the writer result.

monit-reviewer · 2026-05-30T21:22:17Z

+		if err == nil {
+			return run, nil
+		}
+		retry, classifyErr := classifyAllocateConstraint(ctx, db, params, err)


🟡 Medium (security:security-code-auditor): In recovery mode, classifyAllocateConstraint returns retry=true when params.RunID is set, a constraint error occurs, and the RunID does not yet exist. However, the only other uniqueness constraint that could fire is on (pr_key, sha, base_sha, profile, posting_identity, attempt). Since recovery mode retries with the same parameters, it will hit the same constraint violation on every iteration, producing a truly infinite loop. Recovery mode should fail immediately on non-run_id constraint violations rather than retrying.

Reply to this thread when addressed.

Fixed - recovery retries only resume-key collisions.

monit-reviewer · 2026-05-30T21:22:17Z

+	}
+	go store.writer()
+	return store, nil
+}


🔵 Low (harness-engineering:harness-architecture-reviewer): write() holds s.mu while blocking on the unbuffered s.writes channel send. Under load with tight per-request timeouts, all concurrent callers queue behind the mutex rather than at the channel, so they cannot even begin context-cancellation detection until the previous caller finishes dispatching. Consider a buffered channel (capacity matching expected burst) or releasing the mutex before the channel send and re-checking closed state afterward to allow callers to race independently.

Reply to this thread when addressed.

Fixed - added a bounded writer queue.

monit-reviewer · 2026-05-30T21:22:17Z

+	`CREATE TABLE planned_actions (
+  action_id      TEXT PRIMARY KEY,
+  run_id         TEXT NOT NULL REFERENCES runs(run_id) ON DELETE CASCADE,
+  kind           TEXT NOT NULL,


🔵 Low (harness-engineering:harness-architecture-reviewer): The findings table declares a foreign key on session_row_id referencing sessions(session_row_id) without ON DELETE CASCADE. If a DeleteSession method is added later, SQLite will block the delete with a FK violation rather than cascading to findings, creating an asymmetric deletion contract relative to the runs → sessions cascade. Add ON DELETE CASCADE here, or add an explanatory comment if orphaned findings are acceptable.

Reply to this thread when addressed.

Fixed - findings cascade on session deletion.

monit-reviewer · 2026-05-30T21:22:17Z

+  attempted_at   TEXT,
+  posted_at      TEXT,
+  upstream_id    TEXT,
+  error          TEXT


🔵 Low (harness-engineering:harness-architecture-reviewer): planned_actions.finding_id is a nullable TEXT column with no REFERENCES constraint pointing to findings(finding_id). With foreign_keys = ON, SQLite only enforces declared FK constraints; a caller can insert a planned action referencing a non-existent finding and it will be silently accepted. Given that the outbox consumer will look up findings by this ID, a dangling reference creates a silent data integrity gap. Add REFERENCES findings(finding_id), or document that the column is intentionally unvalidated.

Reply to this thread when addressed.

Fixed - planned actions now reference findings.

monit-reviewer · 2026-05-30T21:22:17Z

+
+// DeleteRun deletes a run and lets SQLite cascade child rows.
+func (s *Store) DeleteRun(ctx context.Context, runID string) error {
+	if strings.TrimSpace(runID) == "" {


🔵 Low (harness-engineering:harness-self-documenting-code-reviewer): GetPR, GetRun, GetSession, GetNamedSession, ListFindings, and ListPlannedActions call s.db directly without checking s.closed. After Close() is called, these methods return a generic 'sql: database is closed' error rather than the typed ErrClosed sentinel. Callers using errors.Is(err, ErrClosed) to detect post-close access will miss these cases, while the write path correctly checks s.closed before dispatching.

Reply to this thread when addressed.

Fixed - reads now return ErrClosed after Close.

monit-reviewer · 2026-05-30T21:22:17Z

+
+func configureSQLite(ctx context.Context, db *sql.DB) error {
+	if _, err := db.ExecContext(ctx, "PRAGMA foreign_keys = ON"); err != nil {
+		return fmt.Errorf("ledger: enable foreign keys: %w", err)


🔵 Low (harness-engineering:harness-knowledge-reviewer): classifyAllocateConstraint returns (retry=true, nil) when params.RunID is non-empty but the run_id does not yet exist in the database. The semantics are surprising: a caller who provided an explicit RunID and received a constraint violation unrelated to run_id uniqueness would be silently retried rather than having the real error surfaced. The path is currently unreachable under the single-writer design, but a comment or tighter condition would make the intent explicit and prevent confusion during future multi-writer work.

Reply to this thread when addressed.

Fixed - unrelated constraints are no longer retried.

monit-reviewer

Automated PR Review

Reviewed commit: 83790ad | Previous: e692e58 (incremental)

Approved with 1 non-blocking suggestion below. Address at your discretion.

Summary

Reviewer	Findings
harness-engineering:harness-enforcement-reviewer	1

harness-engineering:harness-enforcement-reviewer (1 findings)

💡 Suggestion - internal/ledger/ledger.go:1299

isResumeAttemptConstraintError identifies resume-key collisions by checking that all six column names appear in the driver error message text. This couples the retry decision to the modernc.org/sqlite message format: a driver update that changes the "UNIQUE constraint failed: table.col1, table.col2" format (e.g., drops the table prefix, changes the separator) would cause the function to silently return false, surfacing a raw constraint error instead of retrying. The failure mode is no-retry rather than incorrect-retry, which is safer than the previous isSQLiteConstraintError pattern, but it remains a version-sensitive contract. A named constraint (CONSTRAINT resume_key UNIQUE(...)) would make the error message carry the constraint name instead, making the match stable; alternatively, a test fixture that pins the expected message format from the driver would catch a regression at test time rather than in production.

10 PR discussion threads considered.

Completed in 2m 01s | $1.85 | sonnet | daemon 0.2.121 | Glorfindel

Field	Value
Model	sonnet
Mode	Re-review · Cycle 2 · Session resumed
Reviewers	hybrid-synthesis, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine	claude · sonnet
Reviewed by	pr-review-daemon · monit-pr-reviewer
Duration	2m 01s wall · 4m 50s compute (Reviewers: 1m 24s · Synthesis: 20s)
Cost	$1.85
Tokens	540.8k in / 15.8k out
Turns	7

Per-workstream usage

Workstream	Model	In	Out	Cache read	Cache create	Cost
hybrid-synthesis	sonnet	43.3k	1.1k	18.6k	24.7k (1h)	$0.12
harness-engineering:harness-architecture-reviewer	sonnet	75.4k	112	2.8k	72.5k (1h)	$0.27
harness-engineering:harness-enforcement-reviewer	sonnet	78.0k	4.2k	2.8k	75.2k (1h)	$0.35
harness-engineering:harness-knowledge-reviewer	sonnet	157.7k	3.5k	6.9k	150.8k (1h)	$0.62
harness-engineering:harness-self-documenting-code-reviewer	sonnet	75.4k	3.2k	2.8k	72.5k (1h)	$0.32
security:security-code-auditor	haiku	83.5k	2.9k	0	83.5k (1h)	$0.12
discussion-summarizer	—	27.5k	815	18.6k	8.9k (1h)	$0.05

_{Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.}

monit-reviewer · 2026-05-30T21:31:37Z

+  session_row_id      TEXT PRIMARY KEY,
+  run_id              TEXT NOT NULL REFERENCES runs(run_id) ON DELETE CASCADE,
+  provider_session_id TEXT NOT NULL,
+  role                TEXT NOT NULL,


🔵 Low (harness-engineering:harness-enforcement-reviewer): isResumeAttemptConstraintError identifies resume-key collisions by checking that all six column names appear in the driver error message text. This couples the retry decision to the modernc.org/sqlite message format: a driver update that changes the "UNIQUE constraint failed: table.col1, table.col2" format (e.g., drops the table prefix, changes the separator) would cause the function to silently return false, surfacing a raw constraint error instead of retrying. The failure mode is no-retry rather than incorrect-retry, which is safer than the previous isSQLiteConstraintError pattern, but it remains a version-sensitive contract. A named constraint (CONSTRAINT resume_key UNIQUE(...)) would make the error message carry the constraint name instead, making the match stable; alternatively, a test fixture that pins the expected message format from the driver would catch a regression at test time rather than in production.

Reply to this thread when addressed.

Covered by TestClassifyAllocateConstraintForRecoveryMode, which generates the driver duplicate-resume error and verifies retry classification.

rianjs added 2 commits May 30, 2026 16:33

Add SQLite run ledger

113dc44

Closes #13

Harden recovery allocation handling

4659a5e

Prove ledger failure atomicity

e692e58

monit-reviewer suggested changes May 30, 2026

View reviewed changes

Harden ledger write and constraint handling

83790ad

monit-reviewer approved these changes May 30, 2026

View reviewed changes

rianjs merged commit 3a526cc into main May 30, 2026
8 checks passed

rianjs deleted the feat/cr-06-ledger-store branch May 30, 2026 21:33

Conversation

rianjs commented May 30, 2026

Summary

Verification

Uh oh!

rianjs commented May 30, 2026

Uh oh!

rianjs commented May 30, 2026

Uh oh!

monit-reviewer left a comment

Choose a reason for hiding this comment

Automated PR Review

Summary

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

monit-reviewer left a comment

Choose a reason for hiding this comment

Automated PR Review

Summary

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants