From 61eec22f7bf8ddf96941aafc7d5c7d7c0f6e2afd Mon Sep 17 00:00:00 2001 From: Sutu Sebastian Date: Sun, 3 May 2026 15:25:22 +0300 Subject: [PATCH 1/2] =?UTF-8?q?docs(plan):=20codemap=20audit=20--base=20=20=E2=80=94=20worktree=20+=20reindex=20strategy=20(v1.x=20b?= =?UTF-8?q?acklog)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Plan for the next-best agent-value loop: PR-review structural-diff. Replaces today's 3-step --baseline dance with one verb. Reuses 90% of the existing audit infrastructure (PR #33); only new piece is the worktree+reindex snapshot path. Cache-by-resolved-sha; LRU 5/500 MiB; mutual-exclusive with --baseline; per-delta override compatible. Hard error on non-git projects (no graceful fallback — there's no meaningful 'ref' without git). Plan only — implementation follows after CodeRabbit review per the impact (#49→#50) / watch (#46→#47) workflow. --- docs/plans/audit-base.md | 129 +++++++++++++++++++++++++++++++++++++++ docs/roadmap.md | 2 +- 2 files changed, 130 insertions(+), 1 deletion(-) create mode 100644 docs/plans/audit-base.md diff --git a/docs/plans/audit-base.md b/docs/plans/audit-base.md new file mode 100644 index 0000000..5b65ac1 --- /dev/null +++ b/docs/plans/audit-base.md @@ -0,0 +1,129 @@ +# `codemap audit --base ` — git-ref baseline (worktree + reindex) + +> **Status:** in design (no code) · **Backlog:** [`docs/roadmap.md` § Backlog](../roadmap.md#backlog) → "`codemap audit --base ` (v1.x)". Delete this file when shipped (per [`docs/README.md` Rule 3](../README.md)). + +## Goal + +`codemap audit --baseline ` (PR #33) compares the live index against pre-saved per-delta baselines. That covers the workflow "I baseline once at v1.0.0, then track drift between releases." It does NOT cover the workflow agents hit every PR review: + +> "What changed structurally between this branch and `origin/main`?" 
+ +Today's workaround is a 3-step dance: `codemap query --save-baseline=pr-base -r files`, `git checkout origin/main && codemap && codemap query --save-baseline=…`, switch back, `codemap audit --baseline pr-base`. The agent has to remember to switch branches, remember the baseline name, AND keep the index in sync each time. + +`codemap audit --base ` collapses that into one verb. Worktree + reindex against any committish, diff against current — same `{head, deltas}` envelope `--baseline` already emits. + +## Why this is the next-best agent-value move + +| Loop | Status (post-PR #50) | This plan | +| ----------------------------------------------------------------------------- | ------------------------------------- | ------------------------ | +| "Is the index stale?" | Solved by `--watch` | — | +| "Where is X?" | Solved by `show` / `snippet` | — | +| "Blast radius of X?" | Solved by `impact` (PR #50) | — | +| **"What changed in this PR? / Did my refactor break anything structurally?"** | Workaround: 3-step `--baseline` dance | **`audit --base `** | +| "When did X arrive?" | Requires `git log -L` shell-out | Out of scope | +| "Is this dead code?" | `impact` says 0 callers; coverage gap | Out of scope (C.11) | + +PR review is a **daily** agent loop. This unblocks it with one verb that reuses 90% of the existing audit infrastructure. + +## Sketched API + +CLI surface (additive on the existing `audit` command): + +```bash +# Existing flags (unchanged): +codemap audit --baseline [---baseline ] [--summary] [--json] [--no-index] + +# New flag: +codemap audit --base [--summary] [--json] [--no-index] + # = any committish: origin/main · HEAD~5 · v1.0.0 · · etc. + +# New: combine --base with explicit per-delta override +codemap audit --base origin/main --files-baseline pr-files [--json] + # files delta uses pre-saved 'pr-files' baseline; dependencies + deprecated + # use the worktree-derived rows from origin/main. 
+ +# Errors: +codemap audit --base origin/main --baseline pr # ERROR: --base and --baseline are mutually exclusive +codemap audit --base bogus-ref # ERROR: --base: cannot resolve "bogus-ref" to a commit (git rev-parse failed) +codemap audit --base origin/main # in non-git project: ERROR: --base requires a git repository +``` + +MCP tool: `audit` gains an optional `base?: string` arg with the same semantics. HTTP `POST /tool/audit` lights up automatically via the existing dispatcher. + +## Output envelope + +Identical to today's `--baseline` shape. Only the per-delta `base` metadata changes: + +```jsonc +{ + "head": { "sha": "abc123", "indexed_at": 1714742400 }, + "deltas": { + "files": { + "base": { + "source": "ref", // NEW value (was always "baseline") + "ref": "origin/main", // NEW field (alongside existing name) + "sha": "def456", // resolved sha at audit time + "indexed_at": 1714742400 // when the worktree index ran + }, + "added": [{ "path": "src/new.ts" }, ...], + "removed": [{ "path": "src/old.ts" }, ...] + }, + "dependencies": { ... }, + "deprecated": { ... } + } +} +``` + +Per-delta `base.source` already exists as a discriminator (`"baseline"` today). Adding `"ref"` is a backwards-compatible enum extension; consumers that switch on `base.source` get a clean miss instead of a silent failure. 
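Review note: the "clean miss instead of a silent failure" behaviour described above can be illustrated with a small consumer-side sketch. The field names follow the envelope shown in the plan; the type name `AuditBaseSketch` and the helper `describeBase` are hypothetical and are not taken from the codemap source.

```typescript
// Sketch only: field names mirror the envelope in the plan; the type and
// function names are illustrative assumptions, not codemap's real definitions.
type AuditBaseSketch =
  | { source: "baseline"; name: string; sha: string; indexed_at: number }
  | { source: "ref"; ref: string; sha: string; indexed_at: number };

// Narrow on the discriminator before touching arm-specific fields.
function describeBase(base: AuditBaseSketch): string {
  switch (base.source) {
    case "baseline":
      return `baseline ${base.name} @ ${base.sha}`;
    case "ref":
      return `ref ${base.ref} @ ${base.sha}`;
    default: {
      // If another arm is added later, this assignment stops compiling,
      // so the gap shows up at build time rather than at runtime.
      const unreachable: never = base;
      throw new Error(`unknown base.source: ${JSON.stringify(unreachable)}`);
    }
  }
}

console.log(
  describeBase({ source: "ref", ref: "origin/main", sha: "def456", indexed_at: 1714742400 }),
); // prints "ref origin/main @ def456"
```

A consumer written this way reports both what the user asked for (`ref`) and what it resolved to (`sha`), which is the pairing D10 asks the envelope to carry.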
+ +## Decisions + +| # | Decision | +| --- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| D1 | **Worktree + reindex strategy.** Use `git worktree add /tmp/codemap-base- ` to materialise the ref into a sibling directory, then run the existing `runCodemapIndex` against it with a temp `.codemap.db`. Run each delta's canonical SQL on the temp DB to get the baseline rows. Tear down the worktree + temp DB on exit. **NOT** in-place `git checkout` — that would mutate the user's working tree, break `watch` if running, and disrupt open editors. | +| D2 | **Cache the temp index by sha.** Worktrees go to `/.codemap/audit-cache//` (project-local, gitignored alongside `.codemap.db`). Cache hit on second run against same `` (resolved sha) → skip the reindex (~100x faster on a 100k-symbol repo). Eviction: LRU after 5 cache entries OR 500 MiB total — defer config knobs to v1.x+. Cache key is the **resolved sha**, not the ref string, so `--base origin/main` and `--base abc123` (where `abc123` is `origin/main`'s tip) share one entry. | +| D3 | **Ref resolution upfront.** `git rev-parse --verify "${ref}^{commit}"` runs first — if it fails, return `{error: "codemap audit: --base: cannot resolve \"\" to a commit"}` before touching the worktree. Same shape `getFilesChangedSince` already uses (`git-changed.ts:23`). | +| D4 | **Non-git projects.** Hard error: `{error: "codemap audit: --base requires a git repository"}`. No graceful fallback — there's no meaningful "ref" without git. 
Detected via `existsSync(join(root, ".git"))` (cheap, runs before any spawn). | +| D5 | **Dirty working tree.** Audit current state regardless. The whole point is "compare current (potentially uncommitted) work to ." NO check / warning — symmetric with how `getFilesChangedSince` already handles `git status` rows alongside `git diff` output. | +| D6 | **Mutual exclusivity with `--baseline`.** Reject `--base X --baseline Y` at parse time with a structured error. `--base` + per-delta `---baseline name` IS allowed (D7) — that's the "I have a saved baseline for `files` but want fresh refs for the others" escape hatch. | +| D7 | **Per-delta override interaction.** `--base ` populates all 3 deltas from the worktree by default. `---baseline ` on top overrides ONE delta to use the saved baseline; the other two still use the worktree. Mirrors how `--baseline ` + per-delta overrides compose today (`resolveAuditBaselines` is the shared composer). | +| D8 | **Worktree cleanup.** `git worktree remove --force ` in a `finally`. On crash mid-audit: stale worktrees accumulate in `.codemap/audit-cache/` — the LRU eviction (D2) sweeps them eventually. Optional `codemap audit --prune-cache` verb defer to v1.x+ once we see real-world stale-cache rates. | +| D9 | **Index prelude on the worktree.** Always run `runCodemapIndex({mode: "full"})` against the temp DB on cache miss — the worktree's tree has its own changed-set we can't reconstruct from the live `.codemap.db`. CLI `--no-index` controls the **head-side** prelude (existing flag, unchanged); the worktree-side reindex is non-optional because there's no prior index to be incremental against. | +| D10 | **`base.ref` field is the user-supplied string** (`origin/main`), `base.sha` is the resolved sha. Both surface in the envelope so CI logs can echo what the user asked for AND what it resolved to. Mirrors how baselines record `git_ref` today. 
| + +## Tracers + +| # | Slice | Acceptance | +| --- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- | +| 1 | `application/audit-engine.ts` extends with `runAuditFromRef({db, ref, perDeltaOverrides, root})` — pure function: takes a ref string, materialises the worktree (via new `application/audit-worktree.ts` helper), runs deltas, returns the same `AuditEnvelope` shape with `base.source: "ref"`. Unit tests against a tmp git repo fixture (init + commit + run audit against `HEAD~1`). | Returns 3 deltas; `base.ref` matches input; `base.sha` is the resolved sha; cache hit on second run against same sha. | +| 2 | `cmd-audit.ts` parser gains `--base `. Parser rejects `--base + --baseline` combo with a structured error. Per-delta `---baseline` still composes. Wired through `runAuditCmd` to dispatch to `runAuditFromRef` when `opts.base` is set. | `bun src/index.ts audit --base HEAD~1 --json` returns the envelope on this repo's history; error cases give clean messages. | +| 3 | MCP `audit` tool args schema gains `base?: z.string()`. `handleAudit` dispatches to the new ref path when set. Mutual-exclusion guard mirrors the CLI parser. | MCP integration test runs `audit` with `base: "HEAD~1"` against a fixture repo, confirms envelope shape + `base.source = "ref"`. 
| +| 4 | HTTP `POST /tool/audit` auto-wired via the existing `dispatchTool` switch arm — Zod validation on the new `base` arg lights up automatically. | HTTP integration test: POST with `{base: "HEAD~1"}` returns 200 + envelope; non-git project → 400 with the clean error message. | +| 5 | Docs sync — README (new audit example), `docs/architecture.md` § Audit wiring extended (worktree + cache section), `docs/glossary.md` (`audit --base` entry), `.agents/rules/codemap.md` + `templates/agents/rules/codemap.md` (Rule 10 lockstep — new table row + paragraph), `.agents/skills/codemap/SKILL.md` + templates (audit tool description gets `base` field), changeset (minor). Delete this plan file. Update `docs/roadmap.md` to remove the v1.x backlog entry. Add `.codemap/audit-cache/` to the auto-`.gitignore` list in `agents-init.ts` (mirrors how `.codemap.*` is handled today). | All docs updated; plan deleted. | + +## Performance considerations + +- **Cache miss** (first run against a ref): full reindex on the worktree. ~3 s for codemap (~110 files); ~30 s for a 10k-file repo. One-time cost per ref. +- **Cache hit** (subsequent runs): skip reindex; just open the temp DB, run 3 SQL queries, diff. Sub-100ms. +- **Worktree size**: `git worktree add` is essentially free (shares git objects via the `.git/worktrees/` linkfile). Only the working-tree files are duplicated; `.git/` itself is not copied. +- **Disk pressure**: 5-entry LRU × ~repo working-tree size. For a 10 MB working tree → 50 MB cache ceiling. Configurable via env var if needed (`CODEMAP_AUDIT_CACHE_SIZE` — defer to demand). +- **Concurrent audits**: each acquires its own worktree dir (sha-keyed); no lock needed. SQLite WAL mode handles the read concurrency on the head DB. 
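Review note: the eviction rule behind the disk-pressure bullet (LRU after 5 entries or 500 MiB, per D2) can be sketched as a pure function. This is a minimal sketch under stated assumptions: the plan does not say how entries are tracked, so the `CacheEntry` shape and the `selectEvictions` name are hypothetical, and the sketch only decides what to evict without touching disk or git.

```typescript
// Hypothetical shape: the plan does not define how cache entries are tracked.
interface CacheEntry {
  sha: string; // resolved commit sha (the cache key in D2)
  bytes: number; // on-disk size of the entry
  lastUsedMs: number; // last access time, epoch milliseconds
}

const MAX_ENTRIES = 5; // D2: at most 5 cache entries
const MAX_BYTES = 500 * 1024 * 1024; // D2: at most 500 MiB in total

// Returns the shas that should be evicted. Entries are kept newest-first
// until either limit would be exceeded; everything older is evicted.
function selectEvictions(
  entries: CacheEntry[],
  maxEntries: number = MAX_ENTRIES,
  maxBytes: number = MAX_BYTES,
): string[] {
  const newestFirst = [...entries].sort((a, b) => b.lastUsedMs - a.lastUsedMs);
  const evicted: string[] = [];
  let keptCount = 0;
  let keptBytes = 0;
  let limitReached = false;
  for (const entry of newestFirst) {
    const fits = keptCount < maxEntries && keptBytes + entry.bytes <= maxBytes;
    if (!limitReached && fits) {
      keptCount += 1;
      keptBytes += entry.bytes;
    } else {
      limitReached = true;
      evicted.push(entry.sha);
    }
  }
  return evicted;
}

// Six 10 MiB entries: the five most recent stay, the oldest goes.
const sixSmall: CacheEntry[] = Array.from({ length: 6 }, (_, i) => ({
  sha: `sha${i}`,
  bytes: 10 * 1024 * 1024,
  lastUsedMs: i,
}));
console.log(selectEvictions(sixSmall)); // ["sha0"]
```

Keeping the decision separate from the filesystem work makes the limits easy to unit-test; the actual removal step would still follow whatever order the plan settles on.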
+ +## Alternatives considered (and rejected for now) + +| Candidate | Why not | +| ------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| In-place `git checkout ` then reindex then checkout back | Mutates the user's working tree, breaks `watch` if running, disrupts editor open buffers, and any uncommitted work has to be stashed. Worktree avoids all of that. | +| Read git objects directly via `git cat-file --batch` (no working tree) | Possible but re-implements `git checkout`'s file-extraction logic. Worktree IS git's official "compare against another commit without disturbing the main tree" primitive. | +| Per-file diff against the ref (no full reindex on the worktree) | Would need a way to reconstruct the index incrementally from a tree object. Codemap's `getChangedFiles` does the inverse (current vs `last_indexed_commit`) — running it backwards is non-trivial and produces a less reliable result than just reindexing. | +| Save baselines automatically per ref | We have `--save-baseline` for that. `--base` is for **ad-hoc** comparisons where the user doesn't want to litter `query_baselines` with throwaway snapshots. | +| Use `git diff --raw -- ` to get the file change set, then derive deltas without reindexing | `dependencies` + `deprecated` deltas need parsed-symbol facts, not just file paths. No way to get those without running the parser on the ref's content. Files-only delta could shortcut this, but the asymmetry (one fast, two slow) is worse UX than uniform "all 3 take a worktree." | + +## Out of scope + +- **`--base` for non-audit commands** (e.g. `codemap query --base ""`). 
Would need a generic ref-snapshot facility; defer until two consumers ask. +- **Auto-baseline-save on first ref audit** (cache the worktree's row data into `query_baselines` for cross-tool reuse). Conflates two distinct lifecycles (baselines are durable; ref caches are LRU-evictable). +- **Ref-vs-ref audit** (`--base origin/main --head v1.0.0`). v1 always compares the current working tree to one ref. Two-ref audit is a v1.x+ item if asked. +- **Verdict / threshold integration** (`audit.deltas[].{added_max, action}`). Already on the v1.x backlog as a separate item; orthogonal to this plan. +- **Network refs** (`--base https://github.com/foo/bar@main`). The user's local clone has to know the ref already. Fetching is `git fetch`'s job. +- **Worktree cache config** (size limit, TTL, eviction policy). Defer to env var + sane defaults; only build a config surface if benchmarks demand it. diff --git a/docs/roadmap.md b/docs/roadmap.md index fa16986..ec27bed 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -35,7 +35,7 @@ Codemap stays a structural-index primitive that other tools can consume. Out of ## Backlog -- [ ] **`codemap audit --base `** (v1.x) — worktree+reindex snapshot strategy. v1 shipped `--baseline ` / `---baseline ` (B.6 reuse) — see [`architecture.md` § Audit wiring](./architecture.md#cli-usage). v1.x adds `--base ` for "audit against an arbitrary ref I haven't pre-baselined" (defers worktree spawn + cache decision until a real consumer asks). +- [ ] **`codemap audit --base `** (v1.x) — worktree+reindex snapshot strategy. v1 shipped `--baseline ` / `---baseline ` (B.6 reuse) — see [`architecture.md` § Audit wiring](./architecture.md#cli-usage). v1.x adds `--base ` for "audit against an arbitrary ref I haven't pre-baselined." Plan: [`plans/audit-base.md`](./plans/audit-base.md). - [ ] **`codemap audit` verdict + thresholds** (v1.x) — `verdict: "pass" | "warn" | "fail"` driven by `codemap.config.audit.deltas[].{added_max, action}`.
Triggers: two consumers ship `jq`-based threshold scripts with similar shapes, OR one consumer asks with a concrete config sketch. Until then, raw deltas + consumer-side `jq` is the CI exit-code idiom. - [ ] **Monorepo / workspace awareness** — discover workspaces from `pnpm-workspace.yaml` / `package.json` and index per-workspace dependency graphs - [ ] **Cross-agent handoff artifact** — _speculative_; layered prefix/delta JSON written on session-stop, read on session-start. Complementary to indexing rather than core to it; revisit if user demand emerges From 9c92b055a837b19ca2e6860a5c2dbb7ea5c78769 Mon Sep 17 00:00:00 2001 From: Sutu Sebastian Date: Sun, 3 May 2026 16:36:13 +0300 Subject: [PATCH 2/2] =?UTF-8?q?docs(plan):=20address=20CodeRabbit=20findin?= =?UTF-8?q?gs=20=E2=80=94=20atomic=20cache=20populate=20(D11),=20worktree-?= =?UTF-8?q?as-cache=20lifecycle=20clarity,=20TS=20type=20widening=20callou?= =?UTF-8?q?t?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - D1/D2/D8 rewritten: worktree IS the cache entry (kept until LRU evicts); cleanup runs only on reindex failure rollback OR LRU eviction. The earlier ambiguity (D2 said 'cache by sha' while D8 said 'remove in finally') is resolved. - D11 added: atomic cache populate via per-pid temp dir + POSIX rename → free single-flight semantics; no lock files needed. Same pattern for eviction. Closes the race CodeRabbit flagged on concurrent CI matrix runs against the same sha. - AuditBase TS type widening to discriminated union called out explicitly above the Decisions table (Tracer 1 ships it). - CODEMAP_AUDIT_CACHE_SIZE env var mention dropped — was promising an unimplemented config knob; v1 hardcodes the limits, defer to v1.x+. 
--- docs/plans/audit-base.md | 31 +++++++++++++++++-------------- 1 file changed, 17 insertions(+), 14 deletions(-) diff --git a/docs/plans/audit-base.md b/docs/plans/audit-base.md index 5b65ac1..4b5f013 100644 --- a/docs/plans/audit-base.md +++ b/docs/plans/audit-base.md @@ -76,20 +76,23 @@ Identical to today's `--baseline` shape. Only the per-delta `base` metadata chan Per-delta `base.source` already exists as a discriminator (`"baseline"` today). Adding `"ref"` is a backwards-compatible enum extension; consumers that switch on `base.source` get a clean miss instead of a silent failure. +**TS type change required.** `AuditBase` (in `src/application/audit-engine.ts`) becomes a discriminated union: the existing `{source: "baseline", name, sha, indexed_at}` shape stays, plus a new `{source: "ref", ref, sha, indexed_at}` arm. Tracer 1 ships the type widening; downstream consumers that narrow on `base.source` keep compiling because the discriminator is exhaustive. + ## Decisions -| # | Decision | -| --- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| D1 | **Worktree + reindex strategy.** Use `git worktree add /tmp/codemap-base- ` to materialise the ref into a sibling directory, then run the existing `runCodemapIndex` against it with a temp `.codemap.db`. Run each delta's canonical SQL on the temp DB to get the baseline rows. Tear down the worktree + temp DB on exit. **NOT** in-place `git checkout` — that would mutate the user's working tree, break `watch` if running, and disrupt open editors. 
| -| D2 | **Cache the temp index by sha.** Worktrees go to `/.codemap/audit-cache//` (project-local, gitignored alongside `.codemap.db`). Cache hit on second run against same `` (resolved sha) → skip the reindex (~100x faster on a 100k-symbol repo). Eviction: LRU after 5 cache entries OR 500 MiB total — defer config knobs to v1.x+. Cache key is the **resolved sha**, not the ref string, so `--base origin/main` and `--base abc123` (where `abc123` is `origin/main`'s tip) share one entry. | -| D3 | **Ref resolution upfront.** `git rev-parse --verify "${ref}^{commit}"` runs first — if it fails, return `{error: "codemap audit: --base: cannot resolve \"\" to a commit"}` before touching the worktree. Same shape `getFilesChangedSince` already uses (`git-changed.ts:23`). | -| D4 | **Non-git projects.** Hard error: `{error: "codemap audit: --base requires a git repository"}`. No graceful fallback — there's no meaningful "ref" without git. Detected via `existsSync(join(root, ".git"))` (cheap, runs before any spawn). | -| D5 | **Dirty working tree.** Audit current state regardless. The whole point is "compare current (potentially uncommitted) work to ." NO check / warning — symmetric with how `getFilesChangedSince` already handles `git status` rows alongside `git diff` output. | -| D6 | **Mutual exclusivity with `--baseline`.** Reject `--base X --baseline Y` at parse time with a structured error. `--base` + per-delta `---baseline name` IS allowed (D7) — that's the "I have a saved baseline for `files` but want fresh refs for the others" escape hatch. | -| D7 | **Per-delta override interaction.** `--base ` populates all 3 deltas from the worktree by default. `---baseline ` on top overrides ONE delta to use the saved baseline; the other two still use the worktree. Mirrors how `--baseline ` + per-delta overrides compose today (`resolveAuditBaselines` is the shared composer). | -| D8 | **Worktree cleanup.** `git worktree remove --force ` in a `finally`. 
On crash mid-audit: stale worktrees accumulate in `.codemap/audit-cache/` — the LRU eviction (D2) sweeps them eventually. Optional `codemap audit --prune-cache` verb defer to v1.x+ once we see real-world stale-cache rates. | -| D9 | **Index prelude on the worktree.** Always run `runCodemapIndex({mode: "full"})` against the temp DB on cache miss — the worktree's tree has its own changed-set we can't reconstruct from the live `.codemap.db`. CLI `--no-index` controls the **head-side** prelude (existing flag, unchanged); the worktree-side reindex is non-optional because there's no prior index to be incremental against. | -| D10 | **`base.ref` field is the user-supplied string** (`origin/main`), `base.sha` is the resolved sha. Both surface in the envelope so CI logs can echo what the user asked for AND what it resolved to. Mirrors how baselines record `git_ref` today. | +| # | Decision | +| --- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| D1 | **Worktree + reindex strategy.** Use `git worktree add /.codemap/audit-cache// ` to materialise the ref alongside the project (NOT in `/tmp` — keeps it on the 
same filesystem as `.codemap.db` so `git worktree add`'s linkfile resolves and so it auto-falls under the project-local `.gitignore` entry). Run the existing `runCodemapIndex` against it; the indexer writes `.codemap.db` inside the worktree dir. Run each delta's canonical SQL on that DB to get the baseline rows. **NOT** in-place `git checkout` — that would mutate the user's working tree, break `watch` if running, and disrupt open editors. | +| D2 | **Cache IS the worktree dir.** `/.codemap/audit-cache//` is both the materialised tree AND the location of the temp `.codemap.db`. Cache hit on second run against same `` (resolved sha): existence check passes → just open the existing `.codemap.db` (no `git worktree add`, no reindex). Cache miss: see D11 (atomic populate). Eviction: LRU after 5 cache entries OR 500 MiB total (hardcoded in v1; no config surface — defer to v1.x+ if real consumers ask). Eviction calls `git worktree remove --force ` then `rm -rf` for safety. Cache key is the **resolved sha**, not the ref string, so `--base origin/main` and `--base abc123` (where `abc123` is `origin/main`'s tip) share one entry. **Worktree never removed on success** — that's the whole point of caching. | +| D3 | **Ref resolution upfront.** `git rev-parse --verify "${ref}^{commit}"` runs first — if it fails, return `{error: "codemap audit: --base: cannot resolve \"\" to a commit"}` before touching the worktree. Same shape `getFilesChangedSince` already uses (`git-changed.ts:23`). | +| D4 | **Non-git projects.** Hard error: `{error: "codemap audit: --base requires a git repository"}`. No graceful fallback — there's no meaningful "ref" without git. Detected via `existsSync(join(root, ".git"))` (cheap, runs before any spawn). | +| D5 | **Dirty working tree.** Audit current state regardless. The whole point is "compare current (potentially uncommitted) work to ." 
NO check / warning — symmetric with how `getFilesChangedSince` already handles `git status` rows alongside `git diff` output. | +| D6 | **Mutual exclusivity with `--baseline`.** Reject `--base X --baseline Y` at parse time with a structured error. `--base` + per-delta `---baseline name` IS allowed (D7) — that's the "I have a saved baseline for `files` but want fresh refs for the others" escape hatch. | +| D7 | **Per-delta override interaction.** `--base ` populates all 3 deltas from the worktree by default. `---baseline ` on top overrides ONE delta to use the saved baseline; the other two still use the worktree. Mirrors how `--baseline ` + per-delta overrides compose today (`resolveAuditBaselines` is the shared composer). | +| D8 | **Cleanup runs on failure, not success.** Per D2 the worktree IS the cache entry — keeping it is correct. `git worktree remove --force ` runs only on (a) cache-miss reindex throwing midway (rollback so a half-populated dir doesn't poison future cache hits), or (b) LRU eviction. Stale entries from process crashes (SIGKILL between worktree-add and reindex completion) get swept by the next eviction cycle. Optional `codemap audit --prune-cache` verb deferred to v1.x+ once real-world stale rates motivate it. | +| D9 | **Index prelude on the worktree.** Always run `runCodemapIndex({mode: "full"})` against the temp DB on cache miss — the worktree's tree has its own changed-set we can't reconstruct from the live `.codemap.db`. CLI `--no-index` controls the **head-side** prelude (existing flag, unchanged); the worktree-side reindex is non-optional because there's no prior index to be incremental against. | +| D10 | **`base.ref` field is the user-supplied string** (`origin/main`), `base.sha` is the resolved sha. Both surface in the envelope so CI logs can echo what the user asked for AND what it resolved to. Mirrors how baselines record `git_ref` today. 
| +| D11 | **Atomic cache populate (concurrency safety).** Two `codemap audit --base ` invocations resolving to the same sha must not race. Populate sequence on cache miss: (a) `mkdir -p .codemap/audit-cache/.tmp..` (per-pid temp dir, never the final path); (b) `git worktree add .codemap/audit-cache/.tmp.. ` then reindex into it; (c) `rename(.tmp.., )` — POSIX `rename` is atomic for same-filesystem moves and fails with `ENOTEMPTY` or `EEXIST` if `/` already exists as a non-empty directory, which a completed entry always is because it holds `.codemap.db` (lost the race; remove the `.tmp` dir + reuse the winner's cache entry). Readers (cache-hit path) treat `/.codemap.db`'s existence as proof the entry is complete because the rename only happens after the reindex finishes. No lock files needed — POSIX `rename` semantics give us single-flight for free. Same pattern applies to LRU eviction: rename the victim dir to `.tmp.evict.` first, then `rm -rf` and `git worktree remove`. | ## Tracers @@ -106,8 +109,8 @@ Per-delta `base.source` already exists as a discriminator (`"baseline"` today). - **Cache miss** (first run against a ref): full reindex on the worktree. ~3 s for codemap (~110 files); ~30 s for a 10k-file repo. One-time cost per ref. - **Cache hit** (subsequent runs): skip reindex; just open the temp DB, run 3 SQL queries, diff. Sub-100ms. - **Worktree size**: `git worktree add` is essentially free (shares git objects via the `.git/worktrees/` linkfile). Only the working-tree files are duplicated; `.git/` itself is not copied. -- **Disk pressure**: 5-entry LRU × ~repo working-tree size. For a 10 MB working tree → 50 MB cache ceiling. Configurable via env var if needed (`CODEMAP_AUDIT_CACHE_SIZE` — defer to demand). -- **Concurrent audits**: each acquires its own worktree dir (sha-keyed); no lock needed. SQLite WAL mode handles the read concurrency on the head DB. +- **Disk pressure**: 5-entry LRU × ~repo working-tree size. For a 10 MB working tree → 50 MB cache ceiling. No config surface in v1; defer to v1.x+ if real consumers ask.
+- **Concurrent audits**: safe via the atomic populate pattern (D11) — POSIX `rename` gives single-flight semantics without lock files. SQLite WAL mode handles read concurrency on the head DB. ## Alternatives considered (and rejected for now)
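Review note on D11: the populate-then-rename sequence can be sketched with plain directories. This is a sketch under stated assumptions, not the planned implementation: `populateCacheEntry` is a hypothetical helper, the `populate` callback stands in for "`git worktree add` plus full reindex", and the temp-dir naming is illustrative. It relies on `rename` refusing to replace a non-empty directory on POSIX systems.

```typescript
import { existsSync, mkdirSync, mkdtempSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type PopulateResult = "hit" | "won" | "lost";

// Hypothetical helper: `populate` stands in for "git worktree add + reindex".
function populateCacheEntry(
  cacheRoot: string,
  sha: string,
  populate: (dir: string) => void,
): PopulateResult {
  const finalDir = join(cacheRoot, sha);
  // Cache hit: the index file only appears at the final path after a rename.
  if (existsSync(join(finalDir, ".codemap.db"))) return "hit";

  mkdirSync(cacheRoot, { recursive: true });
  // Per-process temp dir next to the final path, so the rename stays on one filesystem.
  const tempDir = mkdtempSync(join(cacheRoot, `.tmp.${process.pid}.`));
  try {
    populate(tempDir);
    renameSync(tempDir, finalDir); // publishes the finished entry in one step
    return "won";
  } catch (err) {
    // Roll back on any failure so a half-built dir never looks like a cache entry.
    rmSync(tempDir, { recursive: true, force: true });
    const code = (err as { code?: string }).code;
    const lostRace =
      (code === "ENOTEMPTY" || code === "EEXIST") && existsSync(join(finalDir, ".codemap.db"));
    if (lostRace) return "lost"; // another process published first; reuse its entry
    throw err;
  }
}

// Usage: the second call finds the finished entry and skips populate.
const root = mkdtempSync(join(tmpdir(), "audit-cache-demo-"));
const fakeIndex = (dir: string): void => writeFileSync(join(dir, ".codemap.db"), "");
console.log(populateCacheEntry(root, "abc123", fakeIndex)); // prints "won"
console.log(populateCacheEntry(root, "abc123", fakeIndex)); // prints "hit"
```

The sketch deliberately leaves git out; how `git worktree` bookkeeping behaves when the directory is renamed after `git worktree add` is not something it demonstrates.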