feat(query): --save-baseline / --baseline (B.6) — snapshots in .codemap.db#30
feat(query): --save-baseline / --baseline (B.6) — snapshots in .codemap.db#30SutuSebastian merged 3 commits intomainfrom
Conversation
…ap.db
Adds the four-flag baseline surface from docs/research/fallow.md B.6:
- --save-baseline[=<name>] snapshot result rows (name = recipe id by default)
- --baseline[=<name>] diff current result vs saved snapshot
- --baselines list saved baselines (no rows_json payload)
- --drop-baseline <name> delete one
Storage decision: snapshots live in a new `query_baselines` table inside
.codemap.db rather than parallel JSON files. Wins over the file-per-baseline
sketch:
- One on-disk artifact, no new gitignore entries
- Atomic writes (single SQLite txn)
- Cross-baseline queries are SQL JOINs
- No file format design / hand-rolled version field
Schema 4 → 5. The new table is intentionally absent from dropAll() so
baselines survive `--full` and future SCHEMA_VERSION rebuilds (only
index tables get dropped). Future schema changes to query_baselines
itself need an in-place migration.
Diff identity for v1 = canonical JSON.stringify(row). Output:
{baseline:{name, recipe_id, row_count, git_ref, created_at},
current_row_count,
added: [...rows],
removed: [...rows]}
Composes with everything: --summary collapses to {added:N, removed:N};
--changed-since filters before the diff; --baseline + --recipe attach
recipe `actions` to the `added` rows only (the rows the agent should
act on); --group-by is mutually exclusive with --baseline (different
output shape).
Tests cover parser shape for all four flags, db round-trip with
upsert / get / list / delete + the bun-vs-better-sqlite3 null/undefined
coercion. End-to-end smoked: save / list / diff (no change) / contrived
diff (5→7 rows) / --summary diff / drop, all under both --recipe and
ad-hoc-with-explicit-name modes.
Per Rule 10: rule + skill updated in lockstep across .agents/ and
templates/agents/.
Schema bump justifies a minor changeset per .agents/lessons.md
"changesets bump policy (pre-v1)".
🦋 Changeset detectedLatest commit: ff6986e The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (8)
📝 WalkthroughWalkthroughIntroduces query baseline functionality to Codemap's Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant CLI Parser
participant runQueryCmd
participant Database
rect rgba(100, 150, 200, 0.5)
Note over User,Database: Save Baseline Workflow
User->>CLI Parser: codemap query --save-baseline[=name] -r recipe
CLI Parser->>runQueryCmd: { kind: 'run', saveBaseline: true|string, ... }
runQueryCmd->>Database: Execute query for recipe
Database-->>runQueryCmd: Current result rows
runQueryCmd->>Database: upsertQueryBaseline(name, sql, rows_json, ...)
Database-->>runQueryCmd: Baseline stored
runQueryCmd-->>User: Snapshot confirmed
end
rect rgba(150, 100, 200, 0.5)
Note over User,Database: Baseline Diff Workflow
User->>CLI Parser: codemap query --baseline[=name] -r recipe
CLI Parser->>runQueryCmd: { kind: 'run', baseline: true|string, ... }
runQueryCmd->>Database: Execute query for recipe
Database-->>runQueryCmd: Current result rows
runQueryCmd->>Database: getQueryBaseline(name)
Database-->>runQueryCmd: Saved baseline snapshot
runQueryCmd->>runQueryCmd: Compute diff (JSON.stringify set membership)
runQueryCmd-->>User: { added: [...], removed: [...] }
end
rect rgba(200, 150, 100, 0.5)
Note over User,Database: Baseline Management
User->>CLI Parser: codemap query --baselines
CLI Parser->>runQueryCmd: { kind: 'listBaselines', ... }
runQueryCmd->>Database: listQueryBaselines()
Database-->>runQueryCmd: Metadata list
runQueryCmd-->>User: Baselines (name, recipe_id, row_count, created_at)
User->>CLI Parser: codemap query --drop-baseline name
CLI Parser->>runQueryCmd: { kind: 'dropBaseline', name, ... }
runQueryCmd->>Database: deleteQueryBaseline(name)
Database-->>runQueryCmd: boolean (success)
runQueryCmd-->>User: Deletion confirmed
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 45 minutes and 54 seconds.Comment |
Two lessons appended after auditing this PR against .agents/rules/: - Backticks inside SQL/help-text template literals — hit twice now (B.7 schema comment + B.6 help text). The cmd-query.ts help string and db.ts CREATE TABLE strings are both `template literals`; a Markdown-style `--flag` code-fence inside terminates the literal early and TypeScript explodes several lines later with a cryptic "expected `,` or `)`". Lesson: use plain prose in those strings, or escape with \\\`. - STOP-before-Grep applies to symbol lookups too — used Grep for `printQueryResult`, `getCurrentCommit`, `dropAll` in PR #30 when `SELECT … FROM symbols WHERE name = ?` was the right tool. The codemap rule already covers this; lesson clarifies that "symbol lookup" is the trigger, not "structural question." Also slim two non-earning code comments per concise-comments rule.
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cli/cmd-query.ts (1)
284-343:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake baseline mode mutually exclusive with
--group-by.
runQueryCmd()handlessaveBaseline/baselinebefore grouped execution, socodemap query --group-by owner --baseline -r fan-outcurrently returns an ungrouped baseline result and silently drops--group-by.Suggested guard
if (saveBaseline !== undefined && baseline !== undefined) { return { kind: "error", message: "codemap: --save-baseline and --baseline are mutually exclusive in one run.", }; } + if (groupBy !== undefined && (saveBaseline !== undefined || baseline !== undefined)) { + return { + kind: "error", + message: + "codemap: --group-by cannot be combined with --save-baseline or --baseline.", + }; + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cli/cmd-query.ts` around lines 284 - 343, Add a guard so baseline mode cannot be used with grouped execution: in the same parsing block (the function handling CLI args, near the existing saveBaseline/baseline checks) check if groupBy !== undefined and (saveBaseline !== undefined || baseline !== undefined) and return an error result (kind: "error") with a clear message like "codemap: --group-by cannot be used with --save-baseline or --baseline." Reference the existing variables saveBaseline, baseline, groupBy and the run branch that returns { kind: "run", ... } so the new check runs before that branch is returned.
🧹 Nitpick comments (1)
src/db.test.ts (1)
125-188: ⚡ Quick winAdd one assertion for the rebuild-survival path.
This test covers CRUD well, but the headline contract of the feature is that
query_baselinessurvives--full/ schema rebuilds becausedropAll()leaves it behind. Without exercising that path once, a future schema refactor can break the marquee behavior while this suite still passes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/db.test.ts` around lines 125 - 188, Add a check that query_baselines survive a full schema rebuild by invoking dropAll(db) after the initial upserts and then asserting the baselines still exist via listQueryBaselines(db) and/or getQueryBaseline(db, "fan-out"); specifically call dropAll(db) (the rebuild path referenced in the comment) and then expect(listQueryBaselines(db).map(b=>b.name)).toContain("fan-out") and/or expect(getQueryBaseline(db, "fan-out")).toBeDefined() before continuing deletions and closeDb.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/architecture.md`:
- Line 120: Update the docs to reflect that `--baseline --summary` still emits
`baseline` metadata and `current_row_count` in addition to added/removed counts;
edit the paragraph describing `--save-baseline`, `--baseline[=<name>]` and the
`--summary` behavior in the architecture doc (the block referencing
`--baseline[=<name>]` and the claimed output `{added: N, removed: N}`) so it
shows the actual payload shape `{baseline:{...}, current_row_count, added: N,
removed: N}` when `--summary` is used with `--baseline`; keep references to
`runQueryCmd`/`--summary`/`--baseline` to locate the text to change.
In `@README.md`:
- Around line 88-93: The README baseline section is missing three user-visible
behaviors: document that the full JSON diff output (when using --json
--baseline) includes a current_row_count field; state that running ad-hoc SQL in
baseline mode requires specifying a baseline name (e.g., when using --baseline
you must provide the saved name from --save-baseline); and explicitly note that
--group-by cannot be combined with --baseline. Update the examples and prose
around the commands shown (references: codemap query --save-baseline, codemap
query --json --baseline, codemap query --group-by, codemap query --baselines) to
mention these constraints and show a short example of the JSON diff including
current_row_count and an example of specifying a baseline name.
In `@src/cli/cmd-query.ts`:
- Around line 249-281: The listBaselines and dropBaselineName branches (the code
paths that return { kind: "listBaselines", json } and { kind: "dropBaseline",
... }) currently ignore flags like summary, changedSince, and groupBy; update
the guard conditions inside the listBaselines (checking listBaselines) and
dropBaselineName (checking dropBaselineName) branches to also reject if summary
!== undefined || changedSince !== undefined || groupBy !== undefined (in
addition to the existing checks against recipeId, printSqlId, saveBaseline,
baseline, i < rest.length, etc.), and return the same kind:"error" pattern with
an appropriate message so commands like "codemap query --summary --baselines"
fail instead of silently ignoring those flags.
- Around line 840-854: The diffRows implementation collapses duplicates by using
Set(JSON.stringify(row)), causing incorrect diffs for multisets; update diffRows
to perform multiset diffing by using frequency maps keyed by JSON.stringify
within the function (e.g., build baseCounts and curCounts maps), decrement
counts when matching, then reconstruct added as entries in current whose count
in baseCounts is exhausted and removed as entries in baseline whose count in
curCounts is exhausted; retain the same return shape ({ added, removed }) and
reference the diffRows function name so the change is localized.
In `@templates/agents/skills/codemap/SKILL.md`:
- Around line 222-230: Update the schema table entry for row_count in SKILL.md
so it correctly describes that row_count is the cached number of saved rows
(i.e., the count of entries represented by rows_json), not the character length
of rows_json; locate the table in templates/agents/skills/codemap/SKILL.md (the
row with "row_count | INTEGER") and change its Description to something like
"Cached number of saved rows" and mirror the exact same wording in
.agents/skills/codemap/SKILL.md.
- Around line 41-47: Update the SKILL.md flags list to explicitly state that
baseline mode cannot be combined with --group-by: find the section describing
"--baseline[=<name>]" and "--group-by owner|directory|package" and add a short
note such as "Note: --baseline (and --save-baseline) cannot be used together
with --group-by; these flags are mutually exclusive" so agents/clients won't
synthesize the invalid combined command; ensure the note appears adjacent to
both flag descriptions so readers of either entry see the restriction.
---
Outside diff comments:
In `@src/cli/cmd-query.ts`:
- Around line 284-343: Add a guard so baseline mode cannot be used with grouped
execution: in the same parsing block (the function handling CLI args, near the
existing saveBaseline/baseline checks) check if groupBy !== undefined and
(saveBaseline !== undefined || baseline !== undefined) and return an error
result (kind: "error") with a clear message like "codemap: --group-by cannot be
used with --save-baseline or --baseline." Reference the existing variables
saveBaseline, baseline, groupBy and the run branch that returns { kind: "run",
... } so the new check runs before that branch is returned.
---
Nitpick comments:
In `@src/db.test.ts`:
- Around line 125-188: Add a check that query_baselines survive a full schema
rebuild by invoking dropAll(db) after the initial upserts and then asserting the
baselines still exist via listQueryBaselines(db) and/or getQueryBaseline(db,
"fan-out"); specifically call dropAll(db) (the rebuild path referenced in the
comment) and then
expect(listQueryBaselines(db).map(b=>b.name)).toContain("fan-out") and/or
expect(getQueryBaseline(db, "fan-out")).toBeDefined() before continuing
deletions and closeDb.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d95033e4-26a5-4424-bdff-19220265e20e
📒 Files selected for processing (13)
.agents/rules/codemap.md.agents/skills/codemap/SKILL.md.changeset/query-baselines.mdREADME.mddocs/architecture.mddocs/glossary.mdsrc/cli/cmd-query.test.tssrc/cli/cmd-query.tssrc/cli/main.tssrc/db.test.tssrc/db.tstemplates/agents/rules/codemap.mdtemplates/agents/skills/codemap/SKILL.md
…guards, doc payloads)
Six actionable + one nitpick (one Major), all verified correct.
Code:
- diffRows: switch from naive Set to multiset frequency-map diff. Naive
Set([A,A]) vs Set([A]) reported no removal — wrong for non-DISTINCT
queries (e.g. `SELECT name FROM symbols`). Now baseline [A,A] vs
current [A] correctly reports removed: [A].
- Parser: --group-by + --save-baseline / --baseline now errors at parse
time. Previously runQueryCmd routed to the baseline branch first and
silently dropped --group-by.
- Parser: --baselines and --drop-baseline now reject --summary,
--changed-since, and --group-by (in addition to the existing recipe
/ SQL / save / baseline checks). Was silently accepted-and-ignored.
Docs (synced across architecture.md, README.md, AND both copies of
SKILL.md per Rule 10):
- --baseline --summary payload corrected: includes baseline + current_row_count
alongside added/removed counts (was documented as just {added: N, removed: N}).
- README baseline section calls out current_row_count, ad-hoc-needs-name,
--group-by mutex.
- SKILL.md row_count description: "Cached length of rows_json" was
ambiguous (could mean character length); now "Cached number of rows
in the saved result set."
- SKILL.md --group-by description: "Mutually exclusive with
--save-baseline / --baseline." Mirrored on the --baseline side too.
- rows_json description: "multiset diff identity (duplicate rows
preserved)" instead of "set-diff identity = per-row JSON-stringify
equality."
Tests:
- New diffRows multiset suite (6 cases including 3-of-3 duplicates and
per-key independence).
- New parser tests: --group-by + baseline mutex, --baselines / --drop-baseline
no-op-flag rejection.
- New db round-trip test: query_baselines survives dropAll() — the
schema-rebuild contract that's the marquee of B.6.
Export diffRows so it can be unit-tested in isolation; runtime callers
already use it through the same module.
|
Two CodeRabbit "outside diff range" findings also addressed in
|
…chitecture skills (#32) Two unrelated docs changes batched: ## 1. Plan: `codemap audit --base <ref>` (B.5) Per `docs/README.md` Rule 3 (plans live in `plans/<feature-name>.md`, link from `roadmap.md`), drafts the design for **B.5** before writing any code. The research note explicitly calls this "the single highest-leverage candidate this refresh." | Decision | v1 | | --- | --- | | **Snapshot strategy** | Temp worktree + full reindex under `.codemap.audit-<sha>/` (gitignored by the existing `.codemap.*` glob). Defers caching / perf-tuning until a real consumer hits the wall. | | **Built-in deltas** | `files`, `dependencies`, `deprecated`, `visibility`, `barrels`, `hot_files`. Each wraps an existing recipe — no new analysis layer. | | **Verdict** | `pass` / `warn` / `fail` with thresholds **opt-in via `codemap.config.audit`**. v1 emits raw deltas only (default `pass`). | | **Exit codes** | `0` / `1` / `2` — mirrors `git diff --exit-code`. | | **Composition** | `--json` / `--summary` work; `--changed-since` / `--group-by` / `--save-baseline` / `--baseline` are mutex (different shapes / semantics). | | **Tracer-bullet sequence** | 7 commits: scaffold → worktree → first delta → remaining deltas → threshold config → docs+agents (Rule 10) → changeset. | Both prerequisites just merged on `main`: B.6 (PR #30) proves the snapshot-in-DB primitive; B.7 (PR #28) provides the `symbols.visibility` column the `visibility` delta needs. ## 2. Adopt two Tier 3 skills from [`mattpocock/skills`](https://github.com/mattpocock/skills) Sourced after evaluating three skills mid-thread; the two adopted ones earn their always-zero-cost slot: | Skill | What | | --- | --- | | **`grill-me`** | 8-line interview-pattern skill. Walk a design tree branch by branch, recommend an answer per question, ask one at a time. Filled the gap visible in commit 1's plan: I made many decisions by myself; `grill-me` would have surfaced them for second opinion before they crystallised. | | **`improve-codebase-architecture`** | Ousterhout-style deepening vocabulary (`module / interface / seam / adapter / depth / leverage / locality`), the deletion test, "one adapter = hypothetical seam, two = real," dependency categories (`DEEPENING.md`), and parallel-sub-agent "Design It Twice" interface exploration (`INTERFACE-DESIGN.md`). | Both are maintainer-only (under `.agents/skills/` + `.cursor/skills/` symlinks per `agents-first-convention`). **Not added to `templates/agents/`** — same precedent as PR #25 (consumer surface ships only the codemap rule + skill). ### Translation notes `improve-codebase-architecture/SKILL.md` adapted at three points to fit codemap's docs framework (the upstream version assumes `CONTEXT.md` + `docs/adr/`; we have neither): - `CONTEXT.md` references → `docs/glossary.md` (Rule 9 already enforces glossary updates per PR). - `docs/adr/` references → `docs/plans/<topic>.md` (Rule 3 — but plans are mortal; decisions of record lift to `architecture.md` per Rule 2 then the plan is deleted). - "Offer ADR on rejection" step → dropped. Codemap doesn't keep decision records; the closest is "lift to architecture.md." Companion files (`LANGUAGE.md`, `DEEPENING.md`, `INTERFACE-DESIGN.md`) ship **verbatim** — none reference `CONTEXT.md` or ADRs. `grill-me/SKILL.md` extended with two short codemap-specific notes: prefer `codemap` over `Grep` when exploring (per the `codemap` rule), and write crystallised answers into the in-flight `docs/plans/<name>.md` inline (Rule 3). ### Skipped - **`grill-with-docs`** (the third skill in the upstream "grill" family) — requires standing up CONTEXT.md / `docs/adr/` infrastructure that conflicts with the lift-to-architecture-then-delete-the-plan lifecycle codemap already runs. The salvageable ADR 3-criteria gate is recorded in this conversation; lift if codemap ever needs ADRs. ### Tier 3 list updated `.agents/rules/agents-tier-system.md` Tier 3 list extended with both new skills, and the previously-missing `docs-governance` + `docs-lifecycle-sweep` entries from PR #25. ## Test plan - [x] `bun run check` green (no behavior changed; pure docs + skills). - [x] All cross-references resolve (plan → research → architecture / lessons; skill files → glossary.md / architecture.md / codemap rule / each other). - [x] `.cursor/skills/{grill-me,improve-codebase-architecture}` symlinks resolve. - [x] Plan calls itself out as **Plan** type per `docs/README.md § Document Lifecycle` — delete on ship, lift to `architecture.md`. - [ ] CI green.
) * docs(research): refresh fallow.md + scan against current ship state fallow.md gains a "Status snapshot (as of 2026-05-01)" section that tabulates every adoption candidate's ship status — single source of truth for "what's open" without munging the original tier tables. Captures: - Tier A all shipped (PR #26) - B.5 partial (v1 in PR #33; --base <ref> + verdict deferred to v1.x) - B.6 shipped (PR #30) — table-in-DB, not parallel JSON files - B.7 shipped (PR #28) — landed on `symbols`, not `exports` - B.8 / C.9 / C.10 / C.11 / D.* still as-was - MCP server (agent-transports v1) shipped in PR #35 (adjacent — not a numbered fallow candidate but worth surfacing here) § 6 open questions: marks the 2 settled ones (actions ownership, audit verdict default) with their resolution PRs; preserves the 2 still-open ones (coverage column shape, plugin layer scope). § 3 already-shipped block: updates the visibility-tags note to acknowledge B.7 promoted it from regex to structured column instead of saying "B.7 proposes promoting" (which it doesn't anymore). competitive-scan-2026-04.md § 4: marks MCP server wrapping `query` as ✅ shipped via PR #35 with a cross-link to fallow.md's status snapshot. Other items still tracked there. No behavior change; pure docs refresh to match current reality. * docs(research): fix MD056 — D row in fallow.md status snapshot was 4 cells, header was 5 CodeRabbit caught: status-snapshot table header has 5 columns (Tier / # / Item / Status / Where) but the D.12-D.16 row only had 4 (collapsed Status + Where into one cell). Markdown parses that as a malformed table; renderers either drop the row or misalign neighbouring rows. Added the missing 5th cell pointing back at § 1's Defer / skip table for the per-row reasoning. * docs(research): align B.7 row title to shipped column name (symbols.visibility) CodeRabbit caught: Tier B table B.7 row title still said 'exports.visibility column' despite the body hedging '(or symbols)' AND the shipped column landing on symbols. Status snapshot row at L22 already says symbols. Updated the title to match shipped reality + added an explicit nod to the original hedge so the historical-record property survives.
Summary
Implements B.6 from
docs/research/fallow.md§ Tier B: snapshot a query result set, refactor, then diff. Four new flags oncodemap query:--save-baseline[=<name>]--recipeid; ad-hoc SQL needs=<name>. Re-saving overwrites in place.--baseline[=<name>]{baseline:{...}, current_row_count, added: [...], removed: [...]}.--baselinesrows_jsonpayload).--drop-baseline <name>Storage decision: DB, not files
Snapshots live in a new
query_baselinestable inside.codemap.dbrather than.codemap/baselines/<recipe>.json. Driven by the brainstorm in this conversation and the SQL-index thesis:.codemap.*already covers itversionfield per fileSELECT * FROM query_baselinesworks on day oneSCHEMA_VERSION4 → 5. The new table is intentionally absent fromdropAll()so--fulland future schema rebuilds preserve baselines (only index tables get dropped).Composition
--summary{baseline:{...}, current_row_count, added: N, removed: N}--changed-since <ref>--recipe+ recipeactionsaddedrows only — the rows the agent should act on--group-byDiff identity
Per-row
JSON.stringifyequality. No fuzzy "changed" category in v1 (avoids the row-key heuristic; agents can re-derive richer diffs from the raw rows).Test plan
bun run checkpasses (build, format:check, lint:ci, test, typecheck, test:golden — all 19 golden scenarios green; 4 new parser tests + 1 new db round-trip test).--save-baseline -r fan-out) →{"saved":"fan-out","row_count":10,…}{added:[],removed:[]}SELECT … LIMIT 5saved →LIMIT 7baseline'd) → 2 added rows--summarydiff →{added:2,removed:0}architecture.md § query_baselines+ glossary entry +Schema Versioning..agents/andtemplates/agents/..agents/lessons.md).Summary by CodeRabbit
Release Notes
New Features
--save-baseline, compare against saved baselines with--baselineto display added/removed rows, list stored baselines with--baselines, and delete baselines with--drop-baseline.--fullruns and schema changes.Tests
Documentation