docs(plan): codemap audit --base <ref> — worktree + reindex strategy (v1.x backlog)#51
docs(plan): codemap audit --base <ref> — worktree + reindex strategy (v1.x backlog)#51SutuSebastian merged 2 commits intomainfrom
Conversation
…(v1.x backlog) Plan for the next-best agent-value loop: PR-review structural-diff. Replaces today's 3-step --baseline dance with one verb. Reuses 90% of the existing audit infrastructure (PR #33); only new piece is the worktree+reindex snapshot path. Cache-by-resolved-sha; LRU 5/500 MiB; mutual-exclusive with --baseline; per-delta override compatible. Hard error on non-git projects (no graceful fallback — there's no meaningful 'ref' without git). Plan only — implementation follows after CodeRabbit review per the impact (#49→#50) / watch (#46→#47) workflow.
|
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📝 WalkthroughWalkthroughAdded a comprehensive design plan ( ChangesDocumentation: audit --base Feature Plan
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 36 seconds.Comment |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
docs/plans/audit-base.md (2)
55-78: ⚡ Quick winClarify backwards-compat/type changes needed for
AuditBase(source: "ref"+ newbase.ref).The plan correctly states
base.sourcebecomes an enum extension with"ref"(Lines 77-78) and addsbase.ref(Lines 63-66). However, to prevent downstream breakage, it would help to explicitly call out that the TypeScriptAuditBase/AuditEnvelopetypes must be updated sosourceaccepts"baseline" | "ref"andbase.refexists for"ref"audits (or is optional).Without that, consumers that currently assume only
source: "baseline"may fail compilation or mishandle runtime shapes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/plans/audit-base.md` around lines 55 - 78, Update the TypeScript types for AuditBase/AuditEnvelope so the per-delta discriminator base.source accepts the union "baseline" | "ref" and ensure base.ref is present (or at least optional) on the shape used when base.source === "ref"; convert AuditBase to a discriminated union (or widen the interface) that includes a variant with base.source: "ref" and base.ref: string (and keep the existing baseline variant) so consumers that narrow on base.source handle the new shape without type errors.
109-110: ⚡ Quick winDocument
CODEMAP_AUDIT_CACHE_SIZEsemantics (or remove the mention).You mention
CODEMAP_AUDIT_CACHE_SIZE(Line 109-110) as “defer to env var + sane defaults” but don’t specify:
- whether it’s a hard cap or soft limit,
- format (MiB? bytes?),
- default value,
- units, and how it interacts with the “5 entries OR 500 MiB” eviction (Line 84, 109-110).
Given the repo’s docs guidelines for config options in
*.md, please either fully document it (with an example) or drop the env-var detail until v1.x+ when it’s actually implemented.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/plans/audit-base.md` around lines 109 - 110, Update the docs to either fully specify CODEMAP_AUDIT_CACHE_SIZE or remove its mention: if keeping it, add semantics for CODEMAP_AUDIT_CACHE_SIZE (environment variable), stating whether it is a hard cap or soft target, the accepted format/units (e.g., integer bytes or suffixes like MiB/GiB), the default value, how it interacts with the existing “5 entries OR 500 MiB” eviction policy (which wins when limits conflict), and a small example export line; if you prefer to defer, remove the reference to CODEMAP_AUDIT_CACHE_SIZE entirely so the doc doesn’t promise an unimplemented config option.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/plans/audit-base.md`:
- Around line 83-91: Clarify and reconcile D1/D2/D8 by specifying that the
worktree created via `git worktree add` is ephemeral and removed immediately
after producing a cached index: on cache miss create worktree at
`<projectRoot>/.codemap/audit-cache/<sha>/worktree-<tmp>` run
`runCodemapIndex({mode: "full"})` to build a temp `.codemap.db` stored as
`<projectRoot>/.codemap/audit-cache/<sha>/.codemap.db`, then `git worktree
remove --force <path>` in a `finally` while leaving the cached `.codemap.db` and
metadata (LRU timestamp/size) intact for future cache hits; explain that cache
hits use the cached `.codemap.db` (no worktree present) to skip reindexing, and
that eviction (LRU after 5 entries or 500 MiB) removes both the cached DB file
and any stale worktree dirs if present.
- Line 84: The sha-keyed cache can race when two processes resolve the same sha
and both create/populate <projectRoot>/.codemap/audit-cache/<sha>/; implement an
atomic per-sha strategy: acquire a per-sha lockfile (e.g., .lock inside the
<sha> directory or a global lockdir with atomic mkdir) before any populate,
write the temp DB/artifacts to a temp path (e.g., <sha>.tmp.<pid>), fsync/close
them, then atomically rename into the final <sha> directory so readers never see
partial state; readers should treat a missing final dir as a cache miss and, if
they encounter a lockfile, either wait with timeout/retry or single-flight by
checking the lock and then re-checking the final dir after lock release; apply
this for the code that resolves the cache key (the --base → resolved sha logic)
and the cache eviction routine (LRU/size checks) to avoid races during eviction.
---
Nitpick comments:
In `@docs/plans/audit-base.md`:
- Around line 55-78: Update the TypeScript types for AuditBase/AuditEnvelope so
the per-delta discriminator base.source accepts the union "baseline" | "ref" and
ensure base.ref is present (or at least optional) on the shape used when
base.source === "ref"; convert AuditBase to a discriminated union (or widen the
interface) that includes a variant with base.source: "ref" and base.ref: string
(and keep the existing baseline variant) so consumers that narrow on base.source
handle the new shape without type errors.
- Around line 109-110: Update the docs to either fully specify
CODEMAP_AUDIT_CACHE_SIZE or remove its mention: if keeping it, add semantics for
CODEMAP_AUDIT_CACHE_SIZE (environment variable), stating whether it is a hard
cap or soft target, the accepted format/units (e.g., integer bytes or suffixes
like MiB/GiB), the default value, how it interacts with the existing “5 entries
OR 500 MiB” eviction policy (which wins when limits conflict), and a small
example export line; if you prefer to defer, remove the reference to
CODEMAP_AUDIT_CACHE_SIZE entirely so the doc doesn’t promise an unimplemented
config option.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 044362b6-cb36-4975-9538-304f36120ea0
📒 Files selected for processing (2)
docs/plans/audit-base.mddocs/roadmap.md
…, worktree-as-cache lifecycle clarity, TS type widening callout - D1/D2/D8 rewritten: worktree IS the cache entry (kept until LRU evicts); cleanup runs only on reindex failure rollback OR LRU eviction. The earlier ambiguity (D2 said 'cache by sha' while D8 said 'remove in finally') is resolved. - D11 added: atomic cache populate via per-pid temp dir + POSIX rename → free single-flight semantics; no lock files needed. Same pattern for eviction. Closes the race CodeRabbit flagged on concurrent CI matrix runs against the same sha. - AuditBase TS type widening to discriminated union called out explicitly above the Decisions table (Tracer 1 ships it). - CODEMAP_AUDIT_CACHE_SIZE env var mention dropped — was promising an unimplemented config knob; v1 hardcodes the limits, defer to v1.x+.
|
Triaged the 4 findings (1 Major + 3 Quick wins) per `pr-comment-fact-check`. All 4 had merit — applied in 9c92b05. Applied (4 of 4):
Plan PR ready for re-review or merge. |
…impl of PR #51 plan) (#52) * feat(audit): runAuditFromRef + worktree cache (Tracer 1) * feat(audit): cmd-audit.ts --base <ref> parser + runner wired (Tracer 2) - parseAuditRest gains --base <ref> / --base=<ref>; rejects --base + --baseline as mutually exclusive (clean error message); per-delta --<delta>-baseline still composes orthogonally. - runAuditCmd dispatches to runAuditFromRef when opts.base is set; otherwise unchanged baseline-prefix path. - makeWorktreeReindex injects the runtime save/restore around the worktree-side reindex (initCodemap mutates global state — restore needed so live-DB ops after the reindex still target the original project). - main.ts forwards parsed.base. - Help text updated with --base examples. - .gitignore: .codemap/audit-cache/ (don't track cached worktrees; .codemap/recipes/ stays tracked). - Smoke: bun src/index.ts audit --base HEAD~1 --json --summary works on this repo (4 files added vs HEAD~1, 7 deps, 0 deprecated). Second run hits cache (~110ms). - 6 new parser tests cover --base alone, --base=, mutual exclusivity, per-delta composition, missing value. * feat(audit): MCP audit tool gains base?: z.string() (Tracer 3) - audit-engine.ts: makeWorktreeReindex lifted from cmd-audit.ts (single source of truth — both transports inject the same factory). - tool-handlers.ts: handleAudit gains base/baseline_prefix mutual-exclusion guard + dispatches to runAuditFromRef when args.base set. - mcp-server.ts: audit tool description rewritten to document the 3 snapshot sources (base / baseline_prefix / baselines per-delta) + their composition rules. - 2 new MCP tests: rejects base + baseline_prefix combo, returns clean error in non-git project. - HTTP POST /tool/audit lights up automatically via the existing dispatcher (Tracer 4 next). * feat(audit): HTTP POST /tool/audit auto-wired (Tracer 4) The dispatcher already validates with auditArgsSchema (which gained the `base?: string` field in Tracer 3) — this commit just adds tests: - audit + base + baseline_prefix combo → 400 mutually exclusive - audit + base in non-git project → 400 'requires a git repository' Both transports (MCP + HTTP) now expose the new --base path through the same handleAudit pure function. * docs(audit-base): sync README + architecture + glossary + agent rule/skill + changeset (Tracer 5) - README.md: 'audit --base origin/main' code block + cache caveat - docs/architecture.md: Audit wiring extended with --base details (worktree + atomic populate + cache + LRU + non-git error + GIT_* env hygiene); audit-engine.ts inventory mentions runAuditFromRef + makeWorktreeReindex; audit-worktree.ts added to application/ inventory - docs/glossary.md: 'audit --base / git-ref baseline' entry; existing 'audit' entry now mentions 3 mutually-exclusive sources - docs/roadmap.md: drop the v1.x backlog item (it shipped) - docs/research/fallow.md: B.5 row updated to 'v1 + v1.x partial shipped'; Adjacent-shipped bullet referencing PR #51 (plan) + #52 (impl) - .agents/rules/codemap.md + templates/agents/rules/codemap.md: new 'Audit vs git ref' table row + paragraph rewrite (Rule 10 lockstep) - .agents/skills/codemap/SKILL.md + templates/agents/skills/codemap/SKILL.md: audit tool args + behavior reflect new 'base' field (Rule 10 lockstep) - src/agents-init.ts: ensureGitignoreCodemapPattern now adds '.codemap/audit-cache/' alongside '.codemap.*' (the audit cache lives inside .codemap/ but .codemap/recipes/ stays git-tracked); test updated. - .changeset/codemap-audit-base.md: minor — full design rationale - docs/plans/audit-base.md: deleted per docs-governance Rule 3 * fix(audit-base): address all 6 CodeRabbit review comments + 2 nitpicks Threads (all 6 actionable, no hallucinations): 1. .agents/rules + templates: stale 'B.6' spec leak → 'query_baselines entries' (consistent terminology) 2. audit-engine.ts: process-level mutex (_reindexChain promise queue) around makeWorktreeReindex's runtime-singleton swap. Concurrent HTTP/MCP audits no longer interleave initCodemap/configureResolver state. Long-term cleanup (runCodemapIndex takes explicit context) tracked separately. 3. audit-worktree.test.ts: commitFiles fixture now checks git commit exit code; throws with stderr on failure (was silently passing against previous commit) 4. audit-worktree.ts: tmpName gains randomUUID() suffix (defensive against cross-process races; in-process serialised by mutex from #2) 5. audit-worktree.ts: evictIfOverLimits accepts protectPath; populateWorktree passes finalPath so the freshly-populated entry can't evict itself when it alone exceeds MAX_CACHE_BYTES. New test covers the protect path. 6. mcp-server.ts + .agents/skills + templates: rephrased 'three mutually exclusive sources' to 'two mutually exclusive PRIMARY sources + optional per-delta overrides' (matches handleAudit's actual exclusivity rule) Nitpicks: - fallow.md status: 'v1 + v1.x partial shipped' → 'v1 + v1.x shipped; verdict deferred' (clarity) - cmd-audit.test.ts: added --base= empty-value + --base "" two-token-empty-string parser tests for symmetry with --baseline coverage
…age` table) (#56) * docs(plan): static coverage ingestion (Istanbul JSON → `coverage` table) Plans the C.11 candidate from `research/fallow.md` — `codemap ingest-coverage <path>` reads Istanbul `coverage-final.json` into two new tables (`coverage` symbol-level + `file_coverage` rollup), joinable to `symbols` for the killer "what's structurally dead AND untested?" recipe in one query. Resolves the open question from `fallow.md § 6` ("symbols column vs separate table?") in favour of a separate table with `ON DELETE CASCADE` (D1) — coverage shape evolves independently of structural columns; LEFT JOIN keeps NULL semantics explicit; rows survive `--full` reindex via the `query_baselines` precedent (D6). Key decisions: - Istanbul JSON in v1; LCOV in v1.x; raw V8 traces never (D3, fallow's paid moat). - One-shot `ingest-coverage` verb decoupled from `codemap` index runs (D4) — coverage cadence (per `bun test --coverage`) ≠ index cadence (per file edit). - Statement coverage only in v1 (D5); branch/function deferred until a consumer asks. - MCP/HTTP exposure as a query column, not a separate `coverage` tool (D9) — composes with every existing recipe + ad-hoc SQL. - `codemap audit --delta coverage` deferred to v1.x (D10) — raw schema first. Five-tracer plan: schema bump → engine → CLI verb → fixture + golden recipe → docs. Plan only — implementation follows after CodeRabbit review per the established workflow (PRs #46/47, #49/50, #51/52, #53/54). * docs(plan): fact-check fixes — drop hallucinated SQL/projection/runner claims Self-audit against the actual codebase surfaced four claims that didn't hold: 1. Killer recipe SQL referenced `callee_id` — `calls` is name-keyed (`callee_name TEXT`, no symbol-id FK; see `db.ts` `CallRow`). Rewrote the "no callers" predicate as `NOT EXISTS (… WHERE callee_name = s.name)`. 2. D7 claimed line-range projection is "the same `markers` already uses" — `markers` is line-pinned (`line_number INTEGER`), no projection. Reworded as "novel for this plan" with the actual mechanic spelled out. 3. D3 listed `bun test --coverage` as an Istanbul JSON emitter — `bun test --help` shows only `text` / `lcov` reporters today. Removed bun from the Istanbul-emitters list; left vitest/jest/c8/nyc with the explicit reporter flags they need. 4. D12 contradicted D6 ("rows absent until re-ingest" vs "rows survive `--full`"). Reconciled: empty is the correct initial state on first bump; subsequent bumps preserve via the `dropAll()` exclusion. Quoted the `lessons.md` policy verbatim instead of paraphrasing. * docs(plan): v2 — fix CASCADE hazard + innermost-wins projection + nits Self-grilling found two real schema design holes that would block execution: 1. **D6 CASCADE hazard.** Original draft keyed `coverage` on `symbol_id REFERENCES symbols(id) ON DELETE CASCADE`. Every `--full` reindex calls `dropAll()` → drops `symbols` → CASCADE wipes coverage, regardless of whether `coverage` itself was excluded from `dropAll()`. Recreated `symbols` get fresh auto-increment IDs anyway → coverage permanently lost without re-ingest. Fix: natural-key PK `(file_path, name, line_start)` — no FK to `symbols.id`. Survives the `symbols` drop-recreate cycle. Trade-off: orphan rows when files are deleted; cleaned by one explicit `DELETE FROM coverage WHERE file_path NOT IN (SELECT path FROM files)` after every ingest. 2. **D7 overlapping symbols.** Original draft: `line_start ≤ stmt_line ≤ line_end` matches every enclosing scope. With nested symbols (class methods inside classes, closures inside functions), one Istanbul statement projects onto 3+ symbols, inflating `total_statements` 2-3×. Fix: innermost-wins via `(line_end - line_start) ASC LIMIT 1`. New `skipped.statements_no_symbol` counter for statements that fall outside every symbol range (top-level expressions, side-effect imports). Nits cleared in the same pass: - D2: drop `file_coverage` rollup table from v1 (aggregateable via GROUP BY on the symbol-level table; doubling sources of truth without a benchmark is premature). Promote to v1.x with a real query. - D11: spec the `total_statements = 0 → coverage_pct IS NULL` edge case + document the cross-file name-collision lossiness in the killer recipe. - Drop `--prune` flag (orphan cleanup is unconditional, no flag needed). - Drop per-row `source` column (single meta key sufficient; one ingest at a time). - Update killer recipe SQL to use the natural-key 3-column join. - Drop made-up "~50 LoC LCOV ingester" estimate and "<50 ms / <1 ms / ~500 KB" performance numbers (no benchmark backed them). - Tracer 1 / 2 / 3 acceptance criteria updated to match the new schema. Plan is now ready for tracer-1 implementation. CodeRabbit pass deferred (rate-limited 57m). * docs(plan): tighten Bun-native API references (file read + perf note) Plan correctly inherits the established Node vs Bun runtime split, but the single tracer-3 reference understated it. Now: - Tracer 3 cites `packaging.md § Node vs Bun` as the canonical pattern source instead of pointing at config.ts in passing. - Performance section calls out the actual lever — `Bun.file(path).json()` uses Bun's native JSON parser, materially faster than V8 `JSON.parse` on multi-MB Istanbul payloads (real coverage files for medium codebases routinely hit several MB). No new Bun-native API surfaces are added — the feature doesn't need globbing, file writes, spawn, or hashing beyond what the existing engines already use through their abstractions. * docs(plan): v3 — ship LCOV in v1 + drop --source flag + bundle killer recipe The "fully capable, no half-way APIs" principle reshapes three things: 1. **LCOV ingester ships in v1** alongside Istanbul. Original draft deferred LCOV to v1.x, which would exclude `bun test --coverage` users — i.e. codemap's own primary runtime. That's the textbook half-baked surface the principle bans. Two parser front-ends share one `upsertCoverageRows` core; LCOV is regex tokenizing over `SF:` / `DA:` / `end_of_record`. Tracer 2 splits into 2a (shared core + Istanbul parser) and 2b (LCOV parser), both writing identical normalised CoverageRow[] into the same upsert path. 2. **`--source istanbul|lcov` flag dropped.** Auto-detection from extension (`.json` → istanbul, `.info` → lcov, directory → probe both, error on ambiguous) is unambiguous; a flag for "tell codemap what it can already see" is API noise. Misnamed files can be renamed (one-liner) cheaper than codemap can grow a flag. 3. **Killer recipe ships as bundled `untested-and-dead.{sql,md}`** in `templates/recipes/`. Per the recipes-as-content registry (PR #37), the high-value queries become first-class agent surface. A buried doc snippet would be invisible to agents at session start; the bundled recipe shows up in `--recipes-json` and gets a `codemap query --recipe untested-and-dead` direct invocation. Tracer 4 also fans out: Istanbul + LCOV fixtures cover the same partial coverage shape; three golden recipes (`coverage-istanbul.json`, `coverage-lcov.json`, `untested-and-dead.json`) prove format equivalence. Out-of-scope, alternatives, performance section, title, and goal statement all updated to match. * docs(plan): v4 — agent-journey audit + bundled recipe shelf (D13) Walked every D / OOS / tracer item against "fully capable + agent first-class + no half-baked APIs". Found three half-baked surfaces: 1. **D2 deferral leaks "compose GROUP BY yourself" onto the agent.** Deferring the `file_coverage` table is correct (no benchmark proves it's needed) — but the agent-facing answer for "rank files by coverage" was missing. Fix: keep table deferral, ship a bundled `files-by-coverage.{sql,md}` recipe so the GROUP BY view IS first-class. 2. **D11 name-collision lossiness was acknowledged but unmitigated.** The killer recipe's `callee_name = s.name` cross-file lossiness was documented in the recipe SQL comment, but the recipe `.md` didn't give the agent any narrowing pattern. Now D11 ships three concrete narrowing patterns in the `.md` (file_path scope, default- export filter, exported-only restriction) so the agent has workable mitigations on day one. 3. **Missing recipe shelf for common agent questions.** Walking the journey: only "What's structurally dead AND untested?" had a recipe; "Rank files by coverage" and "Worst-covered exported symbols" forced ad-hoc SQL. Three recipes fully cover the agent journey end-to-end. New D13 codifies the bundled-recipe principle: every common agent question gets a `--recipe` verb. Three v1 recipes: - `untested-and-dead.{sql,md}` (killer, with name-collision mitigations) - `files-by-coverage.{sql,md}` (replaces D2's table deferral) - `worst-covered-exports.{sql,md}` (top-N agent ask) Each `.md` carries a frontmatter `actions` block (per PR #26) so agents get per-row follow-up hints. All three appear in `--recipes-json` automatically — agents discover them at session start. New "Agent journey" section makes the principle visible: a table mapping every common agent question to the v1 verb that answers it. If a row ever shows "compose SQL yourself" without a recipe, the surface is half-baked and needs a recipe before tracer 1 ships. Tracer 4 expanded: ships all three recipes + five golden snapshots (adds files-by-coverage.json + worst-covered-exports.json on top of the three existing). Tracer 5 expanded: glossary + agent rule trigger table gain three new rows. Plan now passes the principle audit end-to-end.
Summary
Plan for
codemap audit --base <ref>— the v1.x backlog item that closes the highest-frequency agent loop left open after PR #50: PR-review structural diff.Today, comparing the working tree to `origin/main` is a 3-step `--baseline` dance (switch branches, reindex, save baselines, switch back). `--base ` collapses it to one verb that:
Why
PR #50 just shipped `codemap impact` — the "blast radius" loop. The remaining high-frequency agent loop without a one-verb solution is "what changed structurally in this PR?" `--base ` reuses 90% of the audit infrastructure (only new piece is the worktree + reindex snapshot path) and lights up MCP + HTTP automatically via the existing `tool-handlers.ts` dispatcher.
See `docs/plans/audit-base.md` for the full design — sketched API, output envelope, 10 decisions (D1-D10), 5 tracers, perf considerations, alternatives rejected, and out-of-scope list.
Test plan
Summary by CodeRabbit
Release Notes
audit --basefeature specification