feat(query): --save-baseline / --baseline (B.6) — snapshots in .codemap.db by SutuSebastian · Pull Request #30 · stainless-code/codemap

SutuSebastian · 2026-05-01T07:29:35Z

Summary

Implements B.6 from docs/research/fallow.md § Tier B: snapshot a query result set, refactor, then diff. Four new flags on codemap query:

Flag	What it does
`--save-baseline[=<name>]`	Snapshot the rows. Name defaults to `--recipe` id; ad-hoc SQL needs `=<name>`. Re-saving overwrites in place.
`--baseline[=<name>]`	Diff current rows vs the saved snapshot. Output: `{baseline:{...}, current_row_count, added: [...], removed: [...]}`.
`--baselines`	List saved baselines (no `rows_json` payload).
`--drop-baseline <name>`	Delete one.
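The naming rule in the table (bare flag falls back to the `--recipe` id, ad-hoc SQL must spell out `=<name>`) can be sketched as a small resolver. This is a hypothetical helper written for illustration: the function name, the `true | string` flag encoding and the error wording are assumptions, not the PR's parser.

```typescript
// Hypothetical: a bare flag is parsed as `true`, `--flag=<name>` as the string.
type BaselineFlag = true | string;
type NameResult = { ok: true; name: string } | { ok: false; message: string };

function resolveBaselineName(
  flagName: "--save-baseline" | "--baseline",
  flag: BaselineFlag,
  recipeId: string | undefined,
): NameResult {
  // An explicit `=<name>` always wins, as long as it is non-empty.
  if (typeof flag === "string") {
    return flag.length > 0
      ? { ok: true, name: flag }
      : { ok: false, message: `codemap: ${flagName}= needs a non-empty name.` };
  }
  // Bare flag with a recipe: the recipe id doubles as the baseline name.
  if (recipeId !== undefined) return { ok: true, name: recipeId };
  // Bare flag with ad-hoc SQL: there is no natural name, so reject.
  return { ok: false, message: `codemap: ${flagName} with ad-hoc SQL needs ${flagName}=<name>.` };
}
```

For example, `resolveBaselineName("--save-baseline", true, "fan-out")` resolves to the name `fan-out`, while the same bare flag without a recipe is rejected.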

Storage decision: DB, not files

Snapshots live in a new query_baselines table inside .codemap.db rather than .codemap/baselines/<recipe>.json. Driven by the brainstorm in this conversation and the SQL-index thesis:

Axis	DB table (this PR)	JSON files (rejected)
Thesis fit	One file, one query surface	Parallel artifact, dilutes the pitch
Gitignore	`.codemap.*` already covers it	Need new entry / new dir
Cross-baseline queries	SQL JOIN	N file reads + glue
Atomicity	Single SQLite transaction	fs temp-file dance
Format versioning	Schema bump (already a primitive)	Hand-rolled `version` field per file
Discoverability	`SELECT * FROM query_baselines` works on day one	New CLI subcommand to enumerate

SCHEMA_VERSION 4 → 5. The new table is intentionally absent from dropAll() so --full and future schema rebuilds preserve baselines (only index tables get dropped).
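A rough sketch of what the storage side could look like, assuming SQLite's `ON CONFLICT ... DO UPDATE` upsert for "re-saving overwrites in place". Column names follow the PR walkthrough (name, recipe_id, sql, rows_json, row_count, git_ref, created_at); the SQL types and constraints, the index table names, and the `dropAllStatements` helper are all hypothetical.

```typescript
// Hypothetical DDL: column names from the PR walkthrough, types assumed.
const CREATE_QUERY_BASELINES = `
CREATE TABLE IF NOT EXISTS query_baselines (
  name       TEXT PRIMARY KEY,
  recipe_id  TEXT,
  sql        TEXT NOT NULL,
  rows_json  TEXT NOT NULL,
  row_count  INTEGER NOT NULL,
  git_ref    TEXT,
  created_at TEXT NOT NULL
)`;

// Re-saving the same name replaces the row (SQLite upsert on the primary key).
const UPSERT_QUERY_BASELINE = `
INSERT INTO query_baselines (name, recipe_id, sql, rows_json, row_count, git_ref, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(name) DO UPDATE SET
  recipe_id  = excluded.recipe_id,
  sql        = excluded.sql,
  rows_json  = excluded.rows_json,
  row_count  = excluded.row_count,
  git_ref    = excluded.git_ref,
  created_at = excluded.created_at`;

// Illustrative index tables that --full / schema bumps rebuild. query_baselines
// is deliberately NOT listed, which is what lets snapshots survive a rebuild.
const INDEX_TABLES = ["files", "symbols", "dependencies"] as const;

function dropAllStatements(): string[] {
  return INDEX_TABLES.map((table) => `DROP TABLE IF EXISTS ${table}`);
}
```

Keeping the survival rule as "absent from the drop list" is cheap, but it also means a later change to `query_baselines` itself needs an in-place migration rather than a drop-and-recreate.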

Composition

With	Behaviour
`--summary`	Collapses diff to `{baseline:{...}, current_row_count, added: N, removed: N}`
`--changed-since <ref>`	Pre-filters current rows before the diff (PR-scoped delta against the saved snapshot)
`--recipe` + recipe `actions`	Actions attach to `added` rows only — the rows the agent should act on
`--group-by`	Mutually exclusive — different output shape
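How `--summary` and recipe `actions` compose with a baseline diff can be sketched as a pure output-shaping step. The payload keys follow the PR description; the `string[]` shape for `actions` and the function itself are assumptions for illustration.

```typescript
type Row = Record<string, unknown>;

interface BaselineMeta {
  name: string;
  recipe_id: string | null;
  row_count: number;
  git_ref: string | null;
  created_at: string; // storage type assumed
}

interface SummaryOutput { baseline: BaselineMeta; current_row_count: number; added: number; removed: number }
interface FullOutput { baseline: BaselineMeta; current_row_count: number; added: Row[]; removed: Row[] }

function shapeBaselineOutput(
  baseline: BaselineMeta,
  currentRowCount: number,
  diff: { added: Row[]; removed: Row[] },
  summary: boolean,
  actions?: string[], // hypothetical shape for recipe actions
): SummaryOutput | FullOutput {
  if (summary) {
    // --summary keeps the metadata but collapses both row lists to counts.
    return {
      baseline,
      current_row_count: currentRowCount,
      added: diff.added.length,
      removed: diff.removed.length,
    };
  }
  // Actions attach to `added` rows only; `removed` rows are returned untouched.
  const added = actions === undefined ? diff.added : diff.added.map((row) => ({ ...row, actions }));
  return { baseline, current_row_count: currentRowCount, added, removed: diff.removed };
}
```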

Diff identity

Per-row JSON.stringify equality. No fuzzy "changed" category in v1 (avoids the row-key heuristic; agents can re-derive richer diffs from the raw rows).
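Using `JSON.stringify(row)` as the identity key has one subtlety worth seeing concretely: the text depends on property order and drops `undefined` values. Rows produced by the same SQL share column order, so the plain key is stable in practice. The sorted-key variant below is a hypothetical hardening, not something the PR confirms it does.

```typescript
type Row = Record<string, unknown>;

// Identity key as described: the row's JSON text.
function rowKey(row: Row): string {
  return JSON.stringify(row);
}

// Hypothetical hardening: sort keys first so column order cannot matter.
function sortedRowKey(row: Row): string {
  const entries = Object.keys(row)
    .sort()
    .map((key) => [key, row[key]] as const);
  return JSON.stringify(Object.fromEntries(entries));
}
```

Note also that `rowKey({ a: null })` is `{"a":null}` while `rowKey({ a: undefined })` is `{}`, so any driver difference between `null` and `undefined` changes identity.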

Test plan

- bun run check passes (build, format:check, lint:ci, test, typecheck, test:golden; all 19 golden scenarios green; 4 new parser tests + 1 new db round-trip test).
- End-to-end smoked against this clone:
  - Save default-name (--save-baseline -r fan-out) → {"saved":"fan-out","row_count":10,…}
  - List → shows the saved baseline with metadata
  - Diff against unchanged tree → {added:[],removed:[]}
  - Contrived diff (SELECT … LIMIT 5 saved → LIMIT 7 baseline'd) → 2 added rows
  - --summary diff → {added:2,removed:0}
  - Drop → list shows only the remaining baseline
  - bun:sqlite null vs better-sqlite3 undefined coercion handled (caught in the db test).
- Schema bump documented in architecture.md § query_baselines + glossary entry + Schema Versioning.
- Per Rule 10, rule + skill updated in lockstep across .agents/ and templates/agents/.
- Minor changeset (schema bump per .agents/lessons.md).
- CI green.
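The null/undefined item is terse. One plausible source of such a mismatch, assumed here rather than confirmed by the PR, is the "no row" sentinel: bun:sqlite's `Statement.get()` returns `null` when nothing matches, while better-sqlite3's returns `undefined`. A hypothetical wrapper that folds both into `undefined` might look like this; the driver objects below are stand-ins, not real driver instances.

```typescript
// Minimal structural type for a prepared statement; real drivers expose more.
interface GetStatement<T> {
  get(...params: unknown[]): T | null | undefined;
}

// Hypothetical helper: treat both "no row" sentinels as `undefined`, so a
// getQueryBaseline-style lookup behaves the same on either driver.
function getOne<T>(stmt: GetStatement<T>, ...params: unknown[]): T | undefined {
  return stmt.get(...params) ?? undefined;
}

// Stand-ins mimicking each driver's "no row" result.
const bunStyle: GetStatement<{ name: string }> = { get: () => null };
const betterSqliteStyle: GetStatement<{ name: string }> = { get: () => undefined };
```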

Summary by CodeRabbit

Release Notes

New Features
- Added query baseline management: save result snapshots with --save-baseline, compare against saved baselines with --baseline to display added/removed rows, list stored baselines with --baselines, and delete baselines with --drop-baseline.
- Baselines persist across --full runs and schema changes.
Tests
- Added comprehensive test coverage for baseline parsing, storage, and diff operations.
Documentation
- Updated CLI reference, architecture guide, and skill documentation with baseline workflows and command examples.

…ap.db

Adds the four-flag baseline surface from docs/research/fallow.md B.6:
- --save-baseline[=<name>]: snapshot result rows (name = recipe id by default)
- --baseline[=<name>]: diff current result vs saved snapshot
- --baselines: list saved baselines (no rows_json payload)
- --drop-baseline <name>: delete one

Storage decision: snapshots live in a new `query_baselines` table inside .codemap.db rather than parallel JSON files. Wins over the file-per-baseline sketch:
- One on-disk artifact, no new gitignore entries
- Atomic writes (single SQLite txn)
- Cross-baseline queries are SQL JOINs
- No file format design / hand-rolled version field

Schema 4 → 5. The new table is intentionally absent from dropAll() so baselines survive `--full` and future SCHEMA_VERSION rebuilds (only index tables get dropped). Future schema changes to query_baselines itself need an in-place migration.

Diff identity for v1 = canonical JSON.stringify(row). Output: {baseline:{name, recipe_id, row_count, git_ref, created_at}, current_row_count, added: [...rows], removed: [...rows]}

Composes with everything:
- --summary collapses to {added:N, removed:N}
- --changed-since filters before the diff
- --baseline + --recipe attach recipe `actions` to the `added` rows only (the rows the agent should act on)
- --group-by is mutually exclusive with --baseline (different output shape)

Tests cover parser shape for all four flags, db round-trip with upsert / get / list / delete + the bun-vs-better-sqlite3 null/undefined coercion.

End-to-end smoked: save / list / diff (no change) / contrived diff (5→7 rows) / --summary diff / drop, all under both --recipe and ad-hoc-with-explicit-name modes.

Per Rule 10: rule + skill updated in lockstep across .agents/ and templates/agents/. Schema bump justifies a minor changeset per .agents/lessons.md "changesets bump policy (pre-v1)".

changeset-bot · 2026-05-01T07:29:39Z

🦋 Changeset detected

Latest commit: ff6986e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
@stainless-code/codemap	Minor


coderabbitai · 2026-05-01T07:29:48Z

Warning

Rate limit exceeded

@SutuSebastian has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 45 minutes and 54 seconds before requesting another review.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8497a528-bc85-4473-a882-631d61e8218c

📥 Commits

Reviewing files that changed from the base of the PR and between 64e3e98 and ff6986e.

📒 Files selected for processing (8)

.agents/lessons.md
.agents/skills/codemap/SKILL.md
README.md
docs/architecture.md
src/cli/cmd-query.test.ts
src/cli/cmd-query.ts
src/db.test.ts
templates/agents/skills/codemap/SKILL.md

📝 Walkthrough

Walkthrough

Introduces query baseline functionality to Codemap's query CLI command, enabling users to snapshot current query results to a persisted .codemap.db table, compare subsequent runs against saved baselines via JSON-stringified row identity, and manage baselines via list/delete operations. Schema version incremented to 5.

Changes

Cohort / File(s)	Summary
Documentation Updates `.agents/rules/codemap.md`, `.agents/skills/codemap/SKILL.md`, `docs/architecture.md`, `docs/glossary.md`, `templates/agents/rules/codemap.md`, `templates/agents/skills/codemap/SKILL.md`	Extended CLI reference with baseline workflow (`--save-baseline`, `--baseline`, `--baselines`, `--drop-baseline`) and diffing mechanics (per-row JSON equality, added/removed categorization). Updated recipe `actions` behavior to attach only to `added` rows under `--baseline`. Schema version bump to 5 and new `query_baselines` table documentation.
Changelog & README `.changeset/query-baselines.md`, `README.md`	Standard changeset entry documenting minor version bump with baseline feature and CLI example additions to project README.
Database Layer `src/db.ts`, `src/db.test.ts`	Schema increment to v5; new `query_baselines` table with metadata (name, recipe_id, sql, row_count, git_ref, created_at) and canonical `rows_json` snapshot. New CRUD exports: `upsertQueryBaseline`, `getQueryBaseline`, `listQueryBaselines`, `deleteQueryBaseline`. Comprehensive lifecycle tests.
CLI Implementation `src/cli/cmd-query.ts`, `src/cli/cmd-query.test.ts`, `src/cli/main.ts`	Parser augmented to recognize baseline flags with validation (mutual exclusivity, mandatory naming for ad-hoc SQL). New command kinds (`listBaselines`, `dropBaseline`) and run-object fields (`saveBaseline`, `baseline`). `runQueryCmd` branches to baseline operations (snapshot persistence, diffing with set-membership identity, summary/full output modes). New exported handlers `runListBaselinesCmd` and `runDropBaselineCmd`. Extensive test coverage for parsing and error scenarios.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant CLI Parser
    participant runQueryCmd
    participant Database

    rect rgba(100, 150, 200, 0.5)
    Note over User,Database: Save Baseline Workflow
    User->>CLI Parser: codemap query --save-baseline[=name] -r recipe
    CLI Parser->>runQueryCmd: { kind: 'run', saveBaseline: true|string, ... }
    runQueryCmd->>Database: Execute query for recipe
    Database-->>runQueryCmd: Current result rows
    runQueryCmd->>Database: upsertQueryBaseline(name, sql, rows_json, ...)
    Database-->>runQueryCmd: Baseline stored
    runQueryCmd-->>User: Snapshot confirmed
    end

    rect rgba(150, 100, 200, 0.5)
    Note over User,Database: Baseline Diff Workflow
    User->>CLI Parser: codemap query --baseline[=name] -r recipe
    CLI Parser->>runQueryCmd: { kind: 'run', baseline: true|string, ... }
    runQueryCmd->>Database: Execute query for recipe
    Database-->>runQueryCmd: Current result rows
    runQueryCmd->>Database: getQueryBaseline(name)
    Database-->>runQueryCmd: Saved baseline snapshot
    runQueryCmd->>runQueryCmd: Compute diff (JSON.stringify set membership)
    runQueryCmd-->>User: { added: [...], removed: [...] }
    end

    rect rgba(200, 150, 100, 0.5)
    Note over User,Database: Baseline Management
    User->>CLI Parser: codemap query --baselines
    CLI Parser->>runQueryCmd: { kind: 'listBaselines', ... }
    runQueryCmd->>Database: listQueryBaselines()
    Database-->>runQueryCmd: Metadata list
    runQueryCmd-->>User: Baselines (name, recipe_id, row_count, created_at)

    User->>CLI Parser: codemap query --drop-baseline name
    CLI Parser->>runQueryCmd: { kind: 'dropBaseline', name, ... }
    runQueryCmd->>Database: deleteQueryBaseline(name)
    Database-->>runQueryCmd: boolean (success)
    runQueryCmd-->>User: Deletion confirmed
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

feat: query JSON/recipes, golden & benchmark tooling, docs hub #8: Baseline implementation directly extends the query CLI wiring (parseQueryRest/runQueryCmd) and DB schema that PR #8 introduced.
feat(cli): codemap validate / context / --performance, four new query recipes, friendlier no-DB error #23: Both PRs modify query command parsing logic and test coverage in src/cli/cmd-query.* files.
feat(query): Tier A flags (--summary, --changed-since, --group-by) + per-row recipe actions #26: Both PRs adjust per-row recipe actions behavior; PR #26 adds the feature while this PR constrains it to attach only to added rows under --baseline diffs.

Suggested labels

enhancement, documentation


🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main feature addition: baseline snapshot and diff functionality for the query command with storage in .codemap.db.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


Two lessons appended after auditing this PR against .agents/rules/:
- Backticks inside SQL/help-text template literals: hit twice now (B.7 schema comment + B.6 help text). The cmd-query.ts help string and the db.ts CREATE TABLE strings are both template literals; a Markdown-style `--flag` code span inside terminates the literal early, and TypeScript fails several lines later with a cryptic "expected `,` or `)`". Lesson: use plain prose in those strings, or escape each inner backtick as \`.
- STOP-before-Grep applies to symbol lookups too: used Grep for `printQueryResult`, `getCurrentCommit`, `dropAll` in PR #30 when `SELECT … FROM symbols WHERE name = ?` was the right tool. The codemap rule already covers this; the lesson clarifies that "symbol lookup" is the trigger, not "structural question."

Also slim two non-earning code comments per the concise-comments rule.
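The template-literal lesson in miniature: a bare backtick closes the literal, while an escaped one stays part of the string. The constant names here are illustrative only, not the actual help text in cmd-query.ts.

```typescript
// Does NOT compile: the second backtick closes the literal early.
//   const BROKEN = `Use `--baseline` to diff against a saved snapshot.`;

// Escaping each inner backtick keeps the literal intact.
const HELP_ESCAPED = `Use \`--baseline\` to diff against a saved snapshot.`;

// The plain-prose option from the lesson avoids escaping entirely.
const HELP_PROSE = `Use --baseline to diff against a saved snapshot.`;
```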

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/cli/cmd-query.ts (1)

284-343: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make baseline mode mutually exclusive with --group-by.

runQueryCmd() handles saveBaseline/baseline before grouped execution, so codemap query --group-by owner --baseline -r fan-out currently returns an ungrouped baseline result and silently drops --group-by.

Suggested guard

```diff
   if (saveBaseline !== undefined && baseline !== undefined) {
     return {
       kind: "error",
       message:
         "codemap: --save-baseline and --baseline are mutually exclusive in one run.",
     };
   }
+  if (groupBy !== undefined && (saveBaseline !== undefined || baseline !== undefined)) {
+    return {
+      kind: "error",
+      message:
+        "codemap: --group-by cannot be combined with --save-baseline or --baseline.",
+    };
+  }
```
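Taken together with the related inline comment on `--baselines` / `--drop-baseline`, the parse-time rules can be sketched as one validation function. The flag record and the third error message are hypothetical; the first two messages are the ones quoted in this review.

```typescript
// Hypothetical flag record produced by argument parsing.
interface BaselineFlags {
  saveBaseline?: true | string;
  baseline?: true | string;
  listBaselines?: boolean;
  dropBaseline?: string;
  groupBy?: string;
  summary?: boolean;
  changedSince?: string;
}

// Returns an error message, or undefined when the combination is allowed.
function validateBaselineFlags(f: BaselineFlags): string | undefined {
  const inBaselineMode = f.saveBaseline !== undefined || f.baseline !== undefined;
  if (f.saveBaseline !== undefined && f.baseline !== undefined) {
    return "codemap: --save-baseline and --baseline are mutually exclusive in one run.";
  }
  // Grouped output and baseline diffs have different shapes.
  if (f.groupBy !== undefined && inBaselineMode) {
    return "codemap: --group-by cannot be combined with --save-baseline or --baseline.";
  }
  // Management commands run no query, so query-shaping flags would be ignored.
  const isManagement = f.listBaselines === true || f.dropBaseline !== undefined;
  if (
    isManagement &&
    (inBaselineMode || f.summary === true || f.changedSince !== undefined || f.groupBy !== undefined)
  ) {
    return "codemap: --baselines / --drop-baseline accept no query-shaping flags.";
  }
  return undefined;
}
```

Rejecting at parse time, rather than letting `runQueryCmd` pick a branch, is what turns a silently dropped flag into a visible error.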

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cli/cmd-query.ts` around lines 284 - 343, Add a guard so baseline mode
cannot be used with grouped execution: in the same parsing block (the function
handling CLI args, near the existing saveBaseline/baseline checks) check if
groupBy !== undefined and (saveBaseline !== undefined || baseline !== undefined)
and return an error result (kind: "error") with a clear message like "codemap:
--group-by cannot be used with --save-baseline or --baseline." Reference the
existing variables saveBaseline, baseline, groupBy and the run branch that
returns { kind: "run", ... } so the new check runs before that branch is
returned.

🧹 Nitpick comments (1)

src/db.test.ts (1)
125-188: ⚡ Quick win

Add one assertion for the rebuild-survival path.

This test covers CRUD well, but the headline contract of the feature is that query_baselines survives --full / schema rebuilds because dropAll() leaves it behind. Without exercising that path once, a future schema refactor can break the marquee behavior while this suite still passes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db.test.ts` around lines 125 - 188, Add a check that query_baselines
survive a full schema rebuild by invoking dropAll(db) after the initial upserts
and then asserting the baselines still exist via listQueryBaselines(db) and/or
getQueryBaseline(db, "fan-out"); specifically call dropAll(db) (the rebuild path
referenced in the comment) and then
expect(listQueryBaselines(db).map(b=>b.name)).toContain("fan-out") and/or
expect(getQueryBaseline(db, "fan-out")).toBeDefined() before continuing
deletions and closeDb.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/architecture.md`:
- Line 120: Update the docs to reflect that `--baseline --summary` still emits
`baseline` metadata and `current_row_count` in addition to added/removed counts;
edit the paragraph describing `--save-baseline`, `--baseline[=<name>]` and the
`--summary` behavior in the architecture doc (the block referencing
`--baseline[=<name>]` and the claimed output `{added: N, removed: N}`) so it
shows the actual payload shape `{baseline:{...}, current_row_count, added: N,
removed: N}` when `--summary` is used with `--baseline`; keep references to
`runQueryCmd`/`--summary`/`--baseline` to locate the text to change.

In `@README.md`:
- Around line 88-93: The README baseline section is missing three user-visible
behaviors: document that the full JSON diff output (when using --json
--baseline) includes a current_row_count field; state that running ad-hoc SQL in
baseline mode requires specifying a baseline name (e.g., when using --baseline
you must provide the saved name from --save-baseline); and explicitly note that
--group-by cannot be combined with --baseline. Update the examples and prose
around the commands shown (references: codemap query --save-baseline, codemap
query --json --baseline, codemap query --group-by, codemap query --baselines) to
mention these constraints and show a short example of the JSON diff including
current_row_count and an example of specifying a baseline name.

In `@src/cli/cmd-query.ts`:
- Around line 249-281: The listBaselines and dropBaselineName branches (the code
paths that return { kind: "listBaselines", json } and { kind: "dropBaseline",
... }) currently ignore flags like summary, changedSince, and groupBy; update
the guard conditions inside the listBaselines (checking listBaselines) and
dropBaselineName (checking dropBaselineName) branches to also reject if summary
!== undefined || changedSince !== undefined || groupBy !== undefined (in
addition to the existing checks against recipeId, printSqlId, saveBaseline,
baseline, i < rest.length, etc.), and return the same kind:"error" pattern with
an appropriate message so commands like "codemap query --summary --baselines"
fail instead of silently ignoring those flags.
- Around line 840-854: The diffRows implementation collapses duplicates by using
Set(JSON.stringify(row)), causing incorrect diffs for multisets; update diffRows
to perform multiset diffing by using frequency maps keyed by JSON.stringify
within the function (e.g., build baseCounts and curCounts maps), decrement
counts when matching, then reconstruct added as entries in current whose count
in baseCounts is exhausted and removed as entries in baseline whose count in
curCounts is exhausted; retain the same return shape ({ added, removed }) and
reference the diffRows function name so the change is localized.
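A sketch of the multiset diff the diffRows comment asks for. It uses a single counter map (a compact variant of the two frequency maps the comment describes) and keys rows by their JSON text; it is an illustration, not the code merged in the PR.

```typescript
type Row = Record<string, unknown>;

// Multiset diff keyed by JSON text: duplicate rows are counted, not collapsed.
function diffRows(baseline: Row[], current: Row[]): { added: Row[]; removed: Row[] } {
  // Unmatched baseline copies remaining per key.
  const remaining = new Map<string, number>();
  for (const row of baseline) {
    const key = JSON.stringify(row);
    remaining.set(key, (remaining.get(key) ?? 0) + 1);
  }

  // Each current row consumes one baseline copy; if none is left, it is new.
  const added: Row[] = [];
  for (const row of current) {
    const key = JSON.stringify(row);
    const n = remaining.get(key) ?? 0;
    if (n > 0) remaining.set(key, n - 1);
    else added.push(row);
  }

  // Baseline copies never consumed were removed.
  const removed: Row[] = [];
  for (const row of baseline) {
    const key = JSON.stringify(row);
    const n = remaining.get(key) ?? 0;
    if (n > 0) {
      removed.push(row);
      remaining.set(key, n - 1);
    }
  }
  return { added, removed };
}
```

With this shape, `diffRows([A, A], [A])` reports `removed: [A]`, the case a `Set` of keys misses because both copies of `A` collapse into one entry.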

In `@templates/agents/skills/codemap/SKILL.md`:
- Around line 222-230: Update the schema table entry for row_count in SKILL.md
so it correctly describes that row_count is the cached number of saved rows
(i.e., the count of entries represented by rows_json), not the character length
of rows_json; locate the table in templates/agents/skills/codemap/SKILL.md (the
row with "row_count | INTEGER") and change its Description to something like
"Cached number of saved rows" and mirror the exact same wording in
.agents/skills/codemap/SKILL.md.
- Around line 41-47: Update the SKILL.md flags list to explicitly state that
baseline mode cannot be combined with --group-by: find the section describing
"--baseline[=<name>]" and "--group-by owner|directory|package" and add a short
note such as "Note: --baseline (and --save-baseline) cannot be used together
with --group-by; these flags are mutually exclusive" so agents/clients won't
synthesize the invalid combined command; ensure the note appears adjacent to
both flag descriptions so readers of either entry see the restriction.



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d95033e4-26a5-4424-bdff-19220265e20e

📥 Commits

Reviewing files that changed from the base of the PR and between 09c6370 and 64e3e98.

📒 Files selected for processing (13)

.agents/rules/codemap.md
.agents/skills/codemap/SKILL.md
.changeset/query-baselines.md
README.md
docs/architecture.md
docs/glossary.md
src/cli/cmd-query.test.ts
src/cli/cmd-query.ts
src/cli/main.ts
src/db.test.ts
src/db.ts
templates/agents/rules/codemap.md
templates/agents/skills/codemap/SKILL.md

…guards, doc payloads)

Six actionable + one nitpick (one Major), all verified correct.

Code:
- diffRows: switch from a naive Set to a multiset frequency-map diff. Naive Set([A,A]) vs Set([A]) reported no removal, which is wrong for non-DISTINCT queries (e.g. `SELECT name FROM symbols`). Now baseline [A,A] vs current [A] correctly reports removed: [A].
- Parser: --group-by + --save-baseline / --baseline now errors at parse time. Previously runQueryCmd routed to the baseline branch first and silently dropped --group-by.
- Parser: --baselines and --drop-baseline now reject --summary, --changed-since, and --group-by (in addition to the existing recipe / SQL / save / baseline checks). These flags were previously accepted and silently ignored.

Docs (synced across architecture.md, README.md, AND both copies of SKILL.md per Rule 10):
- --baseline --summary payload corrected: includes baseline + current_row_count alongside the added/removed counts (was documented as just {added: N, removed: N}).
- README baseline section calls out current_row_count, the rule that ad-hoc SQL needs a name, and the --group-by mutex.
- SKILL.md row_count description: "Cached length of rows_json" was ambiguous (could mean character length); now "Cached number of rows in the saved result set."
- SKILL.md --group-by description: "Mutually exclusive with --save-baseline / --baseline." Mirrored on the --baseline side too.
- rows_json description: "multiset diff identity (duplicate rows preserved)" instead of "set-diff identity = per-row JSON-stringify equality."

Tests:
- New diffRows multiset suite (6 cases, including 3-of-3 duplicates and per-key independence).
- New parser tests: --group-by + baseline mutex, --baselines / --drop-baseline no-op-flag rejection.
- New db round-trip test: query_baselines survives dropAll(), the schema-rebuild contract that is the marquee of B.6.

Export diffRows so it can be unit-tested in isolation; runtime callers already use it through the same module.

SutuSebastian · 2026-05-01T07:44:26Z

Two CodeRabbit "outside diff range" findings also addressed in ff6986e:

src/cli/cmd-query.ts L284-343 (Major) — --group-by + --baseline mutex. Real bug: runQueryCmd checked saveBaseline / baseline first and routed to that branch, silently dropping --group-by. Now guarded at parse time:
```
$ codemap query --json --group-by directory --baseline -r fan-out
codemap: --group-by cannot be combined with --save-baseline or --baseline (different output shapes).
```
src/db.test.ts L125-188 (nitpick) — exercise the dropAll() survival path. Headline contract of B.6 is that baselines survive --full and SCHEMA rebuilds. Added a dedicated test that calls dropAll(db); createTables(db); and asserts the baseline still exists, so a future schema refactor can't silently break it.

…chitecture skills (#32)

Two unrelated docs changes batched:

## 1. Plan: `codemap audit --base <ref>` (B.5)

Per `docs/README.md` Rule 3 (plans live in `plans/<feature-name>.md`, link from `roadmap.md`), drafts the design for **B.5** before writing any code. The research note explicitly calls this "the single highest-leverage candidate this refresh."

| Decision | v1 |
| --- | --- |
| **Snapshot strategy** | Temp worktree + full reindex under `.codemap.audit-<sha>/` (gitignored by the existing `.codemap.*` glob). Defers caching / perf-tuning until a real consumer hits the wall. |
| **Built-in deltas** | `files`, `dependencies`, `deprecated`, `visibility`, `barrels`, `hot_files`. Each wraps an existing recipe; no new analysis layer. |
| **Verdict** | `pass` / `warn` / `fail` with thresholds **opt-in via `codemap.config.audit`**. v1 emits raw deltas only (default `pass`). |
| **Exit codes** | `0` / `1` / `2`, mirroring `git diff --exit-code`. |
| **Composition** | `--json` / `--summary` work; `--changed-since` / `--group-by` / `--save-baseline` / `--baseline` are mutex (different shapes / semantics). |
| **Tracer-bullet sequence** | 7 commits: scaffold → worktree → first delta → remaining deltas → threshold config → docs+agents (Rule 10) → changeset. |

Both prerequisites just merged on `main`: B.6 (PR #30) proves the snapshot-in-DB primitive; B.7 (PR #28) provides the `symbols.visibility` column the `visibility` delta needs.

## 2. Adopt two Tier 3 skills from [`mattpocock/skills`](https://github.com/mattpocock/skills)

Sourced after evaluating three skills mid-thread; the two adopted ones earn their always-zero-cost slot:

| Skill | What |
| --- | --- |
| **`grill-me`** | 8-line interview-pattern skill. Walk a design tree branch by branch, recommend an answer per question, ask one at a time. Fills the gap visible in commit 1's plan: I made many decisions by myself; `grill-me` would have surfaced them for a second opinion before they crystallised. |
| **`improve-codebase-architecture`** | Ousterhout-style deepening vocabulary (`module / interface / seam / adapter / depth / leverage / locality`), the deletion test, "one adapter = hypothetical seam, two = real," dependency categories (`DEEPENING.md`), and parallel-sub-agent "Design It Twice" interface exploration (`INTERFACE-DESIGN.md`). |

Both are maintainer-only (under `.agents/skills/` + `.cursor/skills/` symlinks per `agents-first-convention`). **Not added to `templates/agents/`**, following the same precedent as PR #25 (the consumer surface ships only the codemap rule + skill).

### Translation notes

`improve-codebase-architecture/SKILL.md` adapted at three points to fit codemap's docs framework (the upstream version assumes `CONTEXT.md` + `docs/adr/`; we have neither):
- `CONTEXT.md` references → `docs/glossary.md` (Rule 9 already enforces glossary updates per PR).
- `docs/adr/` references → `docs/plans/<topic>.md` (Rule 3; but plans are mortal: decisions of record lift to `architecture.md` per Rule 2, then the plan is deleted).
- "Offer ADR on rejection" step → dropped. Codemap doesn't keep decision records; the closest is "lift to architecture.md."

Companion files (`LANGUAGE.md`, `DEEPENING.md`, `INTERFACE-DESIGN.md`) ship **verbatim**; none reference `CONTEXT.md` or ADRs.

`grill-me/SKILL.md` extended with two short codemap-specific notes: prefer `codemap` over `Grep` when exploring (per the `codemap` rule), and write crystallised answers into the in-flight `docs/plans/<name>.md` inline (Rule 3).

### Skipped

- **`grill-with-docs`** (the third skill in the upstream "grill" family): requires standing up CONTEXT.md / `docs/adr/` infrastructure that conflicts with the lift-to-architecture-then-delete-the-plan lifecycle codemap already runs. The salvageable ADR 3-criteria gate is recorded in this conversation; lift it if codemap ever needs ADRs.

### Tier 3 list updated

`.agents/rules/agents-tier-system.md` Tier 3 list extended with both new skills, and the previously missing `docs-governance` + `docs-lifecycle-sweep` entries from PR #25.

## Test plan

- [x] `bun run check` green (no behavior changed; pure docs + skills).
- [x] All cross-references resolve (plan → research → architecture / lessons; skill files → glossary.md / architecture.md / codemap rule / each other).
- [x] `.cursor/skills/{grill-me,improve-codebase-architecture}` symlinks resolve.
- [x] Plan calls itself out as **Plan** type per `docs/README.md § Document Lifecycle`: delete on ship, lift to `architecture.md`.
- [ ] CI green.

) * docs(research): refresh fallow.md + scan against current ship state fallow.md gains a "Status snapshot (as of 2026-05-01)" section that tabulates every adoption candidate's ship status — single source of truth for "what's open" without munging the original tier tables. Captures: - Tier A all shipped (PR #26) - B.5 partial (v1 in PR #33; --base <ref> + verdict deferred to v1.x) - B.6 shipped (PR #30) — table-in-DB, not parallel JSON files - B.7 shipped (PR #28) — landed on `symbols`, not `exports` - B.8 / C.9 / C.10 / C.11 / D.* still as-was - MCP server (agent-transports v1) shipped in PR #35 (adjacent — not a numbered fallow candidate but worth surfacing here) § 6 open questions: marks the 2 settled ones (actions ownership, audit verdict default) with their resolution PRs; preserves the 2 still-open ones (coverage column shape, plugin layer scope). § 3 already-shipped block: updates the visibility-tags note to acknowledge B.7 promoted it from regex to structured column instead of saying "B.7 proposes promoting" (which it doesn't anymore). competitive-scan-2026-04.md § 4: marks MCP server wrapping `query` as ✅ shipped via PR #35 with a cross-link to fallow.md's status snapshot. Other items still tracked there. No behavior change; pure docs refresh to match current reality. * docs(research): fix MD056 — D row in fallow.md status snapshot was 4 cells, header was 5 CodeRabbit caught: status-snapshot table header has 5 columns (Tier / # / Item / Status / Where) but the D.12-D.16 row only had 4 (collapsed Status + Where into one cell). Markdown parses that as a malformed table; renderers either drop the row or misalign neighbouring rows. Added the missing 5th cell pointing back at § 1's Defer / skip table for the per-row reasoning. 
* docs(research): align B.7 row title to shipped column name (symbols.visibility)

  CodeRabbit caught: the Tier B table's B.7 row title still said 'exports.visibility column' despite the body hedging '(or symbols)' and the shipped column landing on symbols. The status snapshot row at L22 already says symbols. Updated the title to match shipped reality and added an explicit nod to the original hedge so the historical-record property survives.
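MD056 is markdownlint's table-column-count rule: every row of a pipe table should have as many cells as the header row. A minimal TypeScript sketch of that check, assuming simple tables (no escaped `\|` and no pipes inside code spans). `countCells` and `mismatchedRows` are hypothetical helpers written for illustration, not markdownlint's implementation, and the sample rows only mimic the shape of the fallow.md status snapshot, not its real content:

```typescript
// Hypothetical helper: count the cells in one GFM pipe-table row.
// Assumes no escaped pipes (\|) and no pipes inside code spans.
function countCells(row: string): number {
  let body = row.trim();
  // Leading and trailing pipes are optional delimiters, not cell separators.
  if (body.startsWith("|")) body = body.slice(1);
  if (body.endsWith("|")) body = body.slice(0, -1);
  return body.split("|").length;
}

// Returns 1-based row numbers (within the table) whose cell count differs
// from the header row: the rows an MD056-style check would flag.
function mismatchedRows(tableLines: string[]): number[] {
  if (tableLines.length === 0) return [];
  const expected = countCells(tableLines[0]);
  const flagged: number[] = [];
  tableLines.forEach((line, i) => {
    if (countCells(line) !== expected) flagged.push(i + 1);
  });
  return flagged;
}

// Illustrative rows shaped like the status snapshot (not its real content).
const snapshot = [
  "| Tier | # | Item | Status | Where |",
  "| --- | --- | --- | --- | --- |",
  "| B | B.6 | baselines | shipped | PR #30 |",
  "| D | D.12-D.16 | deferred items | see § 1 |", // Status + Where collapsed
];

console.log(JSON.stringify(mismatchedRows(snapshot))); // [4]
```

The sketch compares for inequality rather than "fewer than" because MD056 reports rows with too many cells as well as too few.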

coderabbitai Bot reviewed May 1, 2026, with comment threads on:

- `docs/architecture.md` (outdated)
- `README.md`
- `src/cli/cmd-query.ts` (two threads, one outdated)
- `templates/agents/skills/codemap/SKILL.md` (two threads)

SutuSebastian merged commit a309d52 into main May 1, 2026 (9 checks passed), then deleted the feat/query-baselines branch at 07:47.

Referenced by:

- chore: version packages #31 (Open; mentioned by github-actions Bot, May 1, 2026)
- docs: codemap-audit (B.5) plan + adopt grill-me & improve-codebase-architecture skills #32 (Merged, 5 tasks; mentioned by SutuSebastian, May 1, 2026)
- docs(plans): draft agent-transports (MCP server v1 + HTTP API v1.x) #34 (Merged, 5 tasks; mentioned by SutuSebastian, May 1, 2026)
- feat: codemap audit v1 (B.5) — structural-drift command with per-delta baselines #33 (Merged, 7 tasks; mentioned by coderabbitai Bot, May 1, 2026)
- feat(mcp): codemap mcp — agent-transports v1 (Tracer 1 of 7) #35 (Merged; referenced May 1, 2026)
- docs(research): refresh fallow.md + scan against current ship state #36 (Merged; referenced May 1, 2026)
- feat(recipes): recipes-as-content registry — bundled .md + project-local .codemap/recipes/ #37 (Merged, 10 tasks; mentioned by SutuSebastian, May 1, 2026)
- feat(query): --format sarif | annotations (B.8 — pipe rows into GitHub Code Scanning + PR annotations) #43 (Merged; referenced May 2, 2026)
- feat(serve): codemap serve — HTTP API exposing every MCP tool over POST /tool/{name} #44 (Merged; referenced May 2, 2026)

SutuSebastian merged 3 commits into main from feat/query-baselines

SutuSebastian commented May 1, 2026 (edited by coderabbitai Bot)

changeset-bot Bot commented May 1, 2026 (edited): 🦋 Changeset detected

coderabbitai Bot commented May 1, 2026 (edited): Rate limit exceeded. ❌ Failed checks (1 warning)

coderabbitai Bot left a comment

SutuSebastian commented May 1, 2026

1 participant