From 64e3e98de289da10c4f1198c19b8314a67cc7012 Mon Sep 17 00:00:00 2001 From: Sutu Sebastian Date: Fri, 1 May 2026 10:28:42 +0300 Subject: [PATCH 1/3] feat(query): --save-baseline / --baseline (B.6) — snapshots in .codemap.db MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds the four-flag baseline surface from docs/research/fallow.md B.6: - --save-baseline[=] snapshot result rows (name = recipe id by default) - --baseline[=] diff current result vs saved snapshot - --baselines list saved baselines (no rows_json payload) - --drop-baseline delete one Storage decision: snapshots live in a new `query_baselines` table inside .codemap.db rather than parallel JSON files. Wins over the file-per-baseline sketch: - One on-disk artifact, no new gitignore entries - Atomic writes (single SQLite txn) - Cross-baseline queries are SQL JOINs - No file format design / hand-rolled version field Schema 4 → 5. The new table is intentionally absent from dropAll() so baselines survive `--full` and future SCHEMA_VERSION rebuilds (only index tables get dropped). Future schema changes to query_baselines itself need an in-place migration. Diff identity for v1 = canonical JSON.stringify(row). Output: {baseline:{name, recipe_id, row_count, git_ref, created_at}, current_row_count, added: [...rows], removed: [...rows]} Composes with most flags: --summary collapses to {added:N, removed:N}; --changed-since filters before the diff; --baseline + --recipe attach recipe `actions` to the `added` rows only (the rows the agent should act on); --group-by is mutually exclusive with --baseline (different output shape). Tests cover parser shape for all four flags, db round-trip with upsert / get / list / delete + the bun-vs-better-sqlite3 null/undefined coercion.
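For reviewers, a minimal TypeScript sketch of the v1 diff rule described above. It is not the code in src/cli/cmd-query.ts: `Row`, `BaselineDiff`, and `diffRows` are hypothetical names chosen for illustration. Each row's identity is its JSON.stringify string, and the added/removed lists fall out of two set-membership checks.

```typescript
// Hypothetical sketch of the v1 baseline diff; not codemap's actual implementation.
type Row = Record<string, unknown>;

interface BaselineDiff {
  added: Row[]; // rows in the current result with no exact match in the baseline
  removed: Row[]; // rows in the baseline with no exact match in the current result
}

function diffRows(baselineRows: Row[], currentRows: Row[]): BaselineDiff {
  // Identity = JSON.stringify(row). The string follows property insertion
  // order, which is stable when both sides come from the same SQL.
  const baselineIds = new Set(baselineRows.map((row) => JSON.stringify(row)));
  const currentIds = new Set(currentRows.map((row) => JSON.stringify(row)));
  return {
    added: currentRows.filter((row) => !baselineIds.has(JSON.stringify(row))),
    removed: baselineRows.filter((row) => !currentIds.has(JSON.stringify(row))),
  };
}

// Usage: one row disappeared and one appeared since the snapshot.
const saved: Row[] = [
  { name: "a", file_path: "src/x.ts" },
  { name: "b", file_path: "src/y.ts" },
];
const current: Row[] = [
  { name: "b", file_path: "src/y.ts" },
  { name: "c", file_path: "src/z.ts" },
];
const diff = diffRows(saved, current);
// diff.added   -> [{ name: "c", file_path: "src/z.ts" }]
// diff.removed -> [{ name: "a", file_path: "src/x.ts" }]

// The --summary collapse keeps only the counts.
const summary = { added: diff.added.length, removed: diff.removed.length }; // { added: 1, removed: 1 }
```

Because there is no fuzzy "changed" category, a row whose values changed shows up once in removed (old values) and once in added (new values). String identity also means `{a: 1, b: 2}` and `{b: 2, a: 1}` count as different rows.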
End-to-end smoked: save / list / diff (no change) / contrived diff (5→7 rows) / --summary diff / drop, all under both --recipe and ad-hoc-with-explicit-name modes. Per Rule 10: rule + skill updated in lockstep across .agents/ and templates/agents/. Schema bump justifies a minor changeset per .agents/lessons.md "changesets bump policy (pre-v1)". --- .agents/rules/codemap.md | 28 +- .agents/skills/codemap/SKILL.md | 19 +- .changeset/query-baselines.md | 5 + README.md | 6 + docs/architecture.md | 18 +- docs/glossary.md | 4 + src/cli/cmd-query.test.ts | 120 ++++++ src/cli/cmd-query.ts | 501 ++++++++++++++++++++++- src/cli/main.ts | 17 + src/db.test.ts | 69 ++++ src/db.ts | 110 ++++- templates/agents/rules/codemap.md | 30 +- templates/agents/skills/codemap/SKILL.md | 19 +- 13 files changed, 910 insertions(+), 36 deletions(-) create mode 100644 .changeset/query-baselines.md diff --git a/.agents/rules/codemap.md b/.agents/rules/codemap.md index 0567241..e1fe5db 100644 --- a/.agents/rules/codemap.md +++ b/.agents/rules/codemap.md @@ -12,18 +12,22 @@ A local database (default **`.codemap.db`**) indexes structure: symbols, imports ## CLI (this repository) -| Context | Incremental index | Query | -| ------------------------------ | ------------------ | ------------------------------------------------------------------------------------------ | -| **Default** — from this clone | `bun src/index.ts` | `bun src/index.ts query --json ""` | -| Same entry | `bun run dev` | (same as first row) | -| Query (ASCII table — optional) | — | `bun src/index.ts query ""` | -| Recipe | — | `bun src/index.ts query --json --recipe fan-out` (see **`bun src/index.ts query --help`**) | -| Recipe catalog / SQL | — | `bun src/index.ts query --recipes-json` · `bun src/index.ts query --print-sql fan-out` | -| Counts only | — | `bun src/index.ts query --json --summary -r deprecated-symbols` | -| PR-scoped rows | — | `bun src/index.ts query --json --changed-since origin/main -r fan-out` | -| Bucket by 
owner / dir / pkg | — | `bun src/index.ts query --json --group-by directory -r fan-in` | - -**Recipe `actions`:** with **`--json`**, recipes that define an `actions` template append it to every row (kebab-case verb + description — e.g. `fan-out` → `review-coupling`). Inspect via **`--recipes-json`**. Ad-hoc SQL never carries actions. +| Context | Incremental index | Query | +| ------------------------------ | ------------------ | --------------------------------------------------------------------------------------------------------- | +| **Default** — from this clone | `bun src/index.ts` | `bun src/index.ts query --json ""` | +| Same entry | `bun run dev` | (same as first row) | +| Query (ASCII table — optional) | — | `bun src/index.ts query ""` | +| Recipe | — | `bun src/index.ts query --json --recipe fan-out` (see **`bun src/index.ts query --help`**) | +| Recipe catalog / SQL | — | `bun src/index.ts query --recipes-json` · `bun src/index.ts query --print-sql fan-out` | +| Counts only | — | `bun src/index.ts query --json --summary -r deprecated-symbols` | +| PR-scoped rows | — | `bun src/index.ts query --json --changed-since origin/main -r fan-out` | +| Bucket by owner / dir / pkg | — | `bun src/index.ts query --json --group-by directory -r fan-in` | +| Save / diff a baseline | — | `bun src/index.ts query --save-baseline -r visibility-tags` then `… --json --baseline -r visibility-tags` | +| List / drop baselines | — | `bun src/index.ts query --baselines` · `bun src/index.ts query --drop-baseline ` | + +**Recipe `actions`:** with **`--json`**, recipes that define an `actions` template append it to every row (kebab-case verb + description — e.g. `fan-out` → `review-coupling`). Under `--baseline`, actions attach to the **`added`** rows only. Inspect via **`--recipes-json`**. Ad-hoc SQL never carries actions. 
+ +**Baselines** (`query_baselines` table inside `.codemap.db`, no parallel JSON files): `--save-baseline[=]` snapshots a result set; `--baseline[=]` diffs the current result against it (added / removed rows; identity = `JSON.stringify(row)`). Name defaults to the `--recipe` id; ad-hoc SQL needs an explicit `=`. Survives `--full` and SCHEMA bumps. After **`bun run build`**, **`node dist/index.mjs`** matches the published **`codemap`** binary (same flags). **`bun link`** / global **`codemap`** also work when testing the packaged CLI. diff --git a/.agents/skills/codemap/SKILL.md b/.agents/skills/codemap/SKILL.md index d99c5ce..833016c 100644 --- a/.agents/skills/codemap/SKILL.md +++ b/.agents/skills/codemap/SKILL.md @@ -41,7 +41,10 @@ Replace placeholders (`'...'`) with your module path, file glob, or symbol name. - **`--summary`** — counts only. With **`--json`**: **`{"count": N}`**. With **`--group-by`**: **`{"group_by": "", "groups": [{key, count}]}`**. - **`--changed-since `** — post-filter rows by **`path`** / **`file_path`** / **`from_path`** / **`to_path`** / **`resolved_path`** against **`git diff --name-only ...HEAD ∪ git status --porcelain`**. Rows with no recognised path column pass through. - **`--group-by owner|directory|package`** — partition into buckets and emit **`{"group_by", "groups": [{key, count, rows}]}`**. **`owner`** reads CODEOWNERS (last matching rule wins); **`directory`** is the first path segment; **`package`** uses **`package.json`** **`workspaces`** or **`pnpm-workspace.yaml`**. -- **Per-row recipe `actions`** — recipes that define an **`actions: [{type, auto_fixable?, description?}]`** template append it to every row in **`--json`** output (recipe-only; ad-hoc SQL never carries actions). Inspect via **`--recipes-json`**. +- **`--save-baseline[=]`** — snapshot the result rows to the **`query_baselines`** table inside `.codemap.db` (no parallel JSON files; survives `--full` and SCHEMA bumps). 
Name defaults to the `--recipe` id; ad-hoc SQL needs an explicit `=`. Re-saving with the same name overwrites in place. +- **`--baseline[=]`** — diff the current result against the saved baseline. Output `{baseline:{...}, current_row_count, added: [...], removed: [...]}` (with `--json`) or a two-section terminal dump. Identity = per-row `JSON.stringify` equality. Pair with `--summary` for `{added: N, removed: N}`. +- **`--baselines`** lists saved baselines (no `rows_json` payload); **`--drop-baseline `** deletes one. +- **Per-row recipe `actions`** — recipes that define an **`actions: [{type, auto_fixable?, description?}]`** template append it to every row in **`--json`** output (recipe-only; ad-hoc SQL never carries actions). Under `--baseline`, actions attach to the **`added`** rows only (the rows the agent should act on). Inspect via **`--recipes-json`**. **Determinism:** Bundled recipes use stable secondary **`ORDER BY`** tie-breakers (and ordered inner **`LIMIT`** samples where applicable). Prefer **`--recipe`** over pasting SQL when you need the maintained ordering. **Canonical SQL** is **`src/cli/query-recipes.ts`** (`QUERY_RECIPES`). @@ -212,6 +215,20 @@ LIMIT 10 | `project_root` | Absolute path to project | | `schema_version` | Schema version number | +### `query_baselines` — Saved query result snapshots (user data) + +User-facing baselines saved by `codemap query --save-baseline`, replayed by `codemap query --baseline`. **Survives `--full` and SCHEMA bumps** — intentionally absent from `dropAll()`. 
+ +| Column | Type | Description | +| ---------- | ------- | ---------------------------------------------------------------------------------------- | +| name | TEXT PK | User-supplied name; defaults to the `--recipe` id (ad-hoc SQL requires an explicit name) | +| recipe_id | TEXT | The `--recipe` id when known; NULL for ad-hoc SQL | +| sql | TEXT | The SQL that produced the snapshot | +| rows_json | TEXT | Canonical `JSON.stringify(rows)` — set-diff identity = per-row JSON-stringify equality | +| row_count | INTEGER | Cached length of `rows_json` | +| git_ref | TEXT | `git rev-parse HEAD` at save time, or NULL when not a git working tree | +| created_at | INTEGER | `Date.now()` at save time (epoch ms) | + ## Query patterns ### Basic lookups diff --git a/.changeset/query-baselines.md b/.changeset/query-baselines.md new file mode 100644 index 0000000..3802a07 --- /dev/null +++ b/.changeset/query-baselines.md @@ -0,0 +1,5 @@ +--- +"@stainless-code/codemap": minor +--- + +`codemap query --save-baseline` / `--baseline` — snapshot a query result set and diff against it later. Stored in the new `query_baselines` table inside `.codemap.db` (no parallel JSON files). `--baselines` lists saved snapshots, `--drop-baseline ` deletes one. Diff identity is per-row `JSON.stringify` equality; `--summary` collapses to `{added: N, removed: N}`. Recipe `actions` attach to the `added` rows when running under `--baseline`. Baselines survive `--full` and SCHEMA rebuilds. `SCHEMA_VERSION` bumps from 4 to 5. 
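A hedged TypeScript sketch of how "recipe `actions` attach to the `added` rows" could look. `RecipeAction` follows the `{type, auto_fixable?, description?}` template shape documented in the skill file; `attachActionsToAdded`, `BaselineDiff`, and the `actions` row key are assumptions for illustration, not codemap's implementation.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not codemap's API.
type Row = Record<string, unknown>;

interface RecipeAction {
  type: string; // kebab-case verb, e.g. "review-coupling"
  auto_fixable?: boolean;
  description?: string;
}

interface BaselineDiff {
  added: Row[];
  removed: Row[];
}

// Only rows that are new since the snapshot are actionable;
// removed rows pass through untouched.
function attachActionsToAdded(
  diff: BaselineDiff,
  actions: RecipeAction[] | undefined, // undefined for ad-hoc SQL or recipes without a template
): BaselineDiff {
  if (actions === undefined || actions.length === 0) return diff;
  return {
    added: diff.added.map((row) => ({ ...row, actions })),
    removed: diff.removed,
  };
}

// Usage with the verb documented for the fan-out recipe.
const withActions = attachActionsToAdded(
  { added: [{ file_path: "src/a.ts" }], removed: [{ file_path: "src/b.ts" }] },
  [{ type: "review-coupling" }],
);
// withActions.added[0]   -> { file_path: "src/a.ts", actions: [{ type: "review-coupling" }] }
// withActions.removed[0] -> { file_path: "src/b.ts" }
```

Attaching to `added` only keeps the agent's to-do list equal to what the change introduced; removed rows describe findings that no longer exist.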
diff --git a/README.md b/README.md index 619f162..6532405 100644 --- a/README.md +++ b/README.md @@ -85,6 +85,12 @@ codemap query --json --summary --changed-since HEAD~5 "SELECT file_path FROM sym codemap query --json --summary --group-by directory -r fan-in codemap query --json --group-by owner -r deprecated-symbols codemap query --json --summary --group-by package "SELECT file_path FROM symbols" +# Snapshot a result, refactor, then diff (saved inside .codemap.db, no JSON files) +codemap query --save-baseline -r visibility-tags # save under name "visibility-tags" +codemap query --json --baseline -r visibility-tags # full diff: {baseline, added, removed} +codemap query --json --summary --baseline -r visibility-tags # counts only: {added, removed} +codemap query --baselines # list saved baselines +codemap query --drop-baseline visibility-tags # delete # Recipes that define per-row action templates append "actions" hints (kebab-case verb + # description) in --json output; ad-hoc SQL never carries actions. Inspect via --recipes-json. # List bundled recipes as JSON, or print one recipe's SQL (no DB required) diff --git a/docs/architecture.md b/docs/architecture.md index fbe5108..71011e7 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -117,7 +117,7 @@ A local SQLite database (`.codemap.db`) indexes the project tree and stores stru **Commands and flags** (index, query, **`codemap agents init`**, **`--root`**, **`--config`**, environment): [../README.md § CLI](../README.md#cli) — **do not duplicate** flag lists here; this section only adds implementation notes. From this repository: **`bun run dev`** or **`bun src/index.ts`** (same flags). 
-**Query wiring:** **`src/cli/cmd-query.ts`** (argv, **`printQueryResult`**, `--recipe` / `-r` alias, **`--summary`**, **`--changed-since`**, **`--group-by`**), **`src/cli/query-recipes.ts`** (**`QUERY_RECIPES`** — bundled SQL only source; optional **`actions: RecipeAction[]`** per recipe), **`src/cli/main.ts`** (**`--recipes-json`** / **`--print-sql`** exit before config/DB). With **`--json`**, errors use **`{"error":"…"}`** on stdout for SQL failures, DB open, and bootstrap (same shape); **`runQueryCmd`** sets **`process.exitCode`** instead of **`process.exit`**. Friendlier "no `.codemap.db`" — `no such table: ` and `no such column: ` errors are rewritten in **`enrichQueryError`** to point at `codemap` / `codemap --full`. **`--summary`** filters output only — the SQL still executes against the index; output collapses to `{"count": N}` (with `--json`) or `count: N`. **`--changed-since `** post-filters result rows by `path` / `file_path` / `from_path` / `to_path` / `resolved_path` against `git diff --name-only ...HEAD ∪ git status --porcelain` (helper: **`src/git-changed.ts`** — `getFilesChangedSince`, `filterRowsByChangedFiles`, `PATH_COLUMNS`); rows with no recognised path column pass through. **`--group-by `** (`owner` | `directory` | `package`) routes through **`runGroupedQuery`** in `cmd-query.ts` and emits `{"group_by": "", "groups": [{key, count, rows}]}` (or `[{key, count}]` with `--summary`); helpers in **`src/group-by.ts`** (`groupRowsBy`, `firstDirectory`, `loadCodeowners`, `discoverWorkspaceRoots`, `makePackageBucketizer`, `codeownersGlobToRegex`). CODEOWNERS lookup is last-match-wins (GitHub semantics); workspace discovery reads `package.json` `workspaces` and `pnpm-workspace.yaml` `packages:`. **Per-row recipe `actions`** are appended only when the user runs **`--recipe `** with **`--json`** AND the recipe defines an `actions` template — programmatic `cm.query(sql)` and ad-hoc CLI SQL never carry actions. 
The **`components-by-hooks`** recipe ranks by hook count with a **comma-based tally** on **`hooks_used`** (no SQLite JSON1). Shipped **`templates/agents/`** documents **`codemap query --json`** as the primary agent example ([README § CLI](../README.md#cli)). +**Query wiring:** **`src/cli/cmd-query.ts`** (argv, **`printQueryResult`**, `--recipe` / `-r` alias, **`--summary`**, **`--changed-since`**, **`--group-by`**, **`--save-baseline`** / **`--baseline`** / **`--baselines`** / **`--drop-baseline`**), **`src/cli/query-recipes.ts`** (**`QUERY_RECIPES`** — bundled SQL only source; optional **`actions: RecipeAction[]`** per recipe), **`src/cli/main.ts`** (**`--recipes-json`** / **`--print-sql`** exit before config/DB). With **`--json`**, errors use **`{"error":"…"}`** on stdout for SQL failures, DB open, and bootstrap (same shape); **`runQueryCmd`** sets **`process.exitCode`** instead of **`process.exit`**. Friendlier "no `.codemap.db`" — `no such table: ` and `no such column: ` errors are rewritten in **`enrichQueryError`** to point at `codemap` / `codemap --full`. **`--summary`** filters output only — the SQL still executes against the index; output collapses to `{"count": N}` (with `--json`) or `count: N`. **`--changed-since `** post-filters result rows by `path` / `file_path` / `from_path` / `to_path` / `resolved_path` against `git diff --name-only ...HEAD ∪ git status --porcelain` (helper: **`src/git-changed.ts`** — `getFilesChangedSince`, `filterRowsByChangedFiles`, `PATH_COLUMNS`); rows with no recognised path column pass through. **`--group-by `** (`owner` | `directory` | `package`) routes through **`runGroupedQuery`** in `cmd-query.ts` and emits `{"group_by": "", "groups": [{key, count, rows}]}` (or `[{key, count}]` with `--summary`); helpers in **`src/group-by.ts`** (`groupRowsBy`, `firstDirectory`, `loadCodeowners`, `discoverWorkspaceRoots`, `makePackageBucketizer`, `codeownersGlobToRegex`). 
CODEOWNERS lookup is last-match-wins (GitHub semantics); workspace discovery reads `package.json` `workspaces` and `pnpm-workspace.yaml` `packages:`. **`--save-baseline[=]`** snapshots the result to the **`query_baselines`** table inside `.codemap.db` (no parallel JSON files; survives `--full` / SCHEMA bumps because the table is intentionally absent from `dropAll()`); name defaults to `--recipe` id, ad-hoc SQL needs an explicit name. **`--baseline[=]`** replays the SQL, fetches the saved row set, and emits `{baseline:{...}, current_row_count, added: [...], removed: [...]}` (or `{added: N, removed: N}` with `--summary`); identity is per-row `JSON.stringify` equality, no fuzzy "changed" category in v1. **`--baselines`** (read-only list) and **`--drop-baseline `** complete the surface; helpers in **`src/db.ts`** (`upsertQueryBaseline`, `getQueryBaseline`, `listQueryBaselines`, `deleteQueryBaseline`). **Per-row recipe `actions`** are appended only when the user runs **`--recipe `** with **`--json`** AND the recipe defines an `actions` template — programmatic `cm.query(sql)` and ad-hoc CLI SQL never carry actions; under `--baseline`, actions attach to `added` rows only (the rows the agent should act on). The **`components-by-hooks`** recipe ranks by hook count with a **comma-based tally** on **`hooks_used`** (no SQLite JSON1). Shipped **`templates/agents/`** documents **`codemap query --json`** as the primary agent example ([README § CLI](../README.md#cli)). **Validate wiring:** **`src/cli/cmd-validate.ts`** — **`computeValidateRows`** is a pure function over `(db, projectRoot, paths)` returning `{path, status}` rows where `status ∈ stale | missing | unindexed`. CLI wraps it with read-once-and-print + exits **1** on any drift (git-status semantics). Path normalization: **`toProjectRelative`** converts CLI input to POSIX-style relative keys matching the `files.path` storage format (Windows backslash → forward slash); same convention as `lint-staged.config.js`. 
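The path normalization described for `toProjectRelative` can be sketched with `node:path`. This is a hypothetical re-implementation: the name `toProjectRelativeSketch` and its `(projectRoot, input)` signature are assumptions, so check `src/cli/cmd-validate.ts` for the real behavior (for example, how paths outside the project root are treated).

```typescript
import { isAbsolute, relative, resolve } from "node:path";

// Hypothetical sketch of converting CLI input to a files.path-style key.
function toProjectRelativeSketch(projectRoot: string, input: string): string {
  // Normalize Windows separators first so "src\\cli\\a.ts" and "src/cli/a.ts"
  // produce the same key.
  const slashed = input.replace(/\\/g, "/");
  const absolute = isAbsolute(slashed) ? slashed : resolve(projectRoot, slashed);
  // relative() emits the platform separator; convert back to POSIX form.
  // Paths outside projectRoot come back with a leading "../".
  return relative(projectRoot, absolute).replace(/\\/g, "/");
}

// Usage (POSIX host):
// toProjectRelativeSketch("/repo", "src\\cli\\a.ts")     -> "src/cli/a.ts"
// toProjectRelativeSketch("/repo", "/repo/src/x.ts")     -> "src/x.ts"
// toProjectRelativeSketch("/repo", "./src/../lib/y.ts")  -> "lib/y.ts"
```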
@@ -161,7 +161,7 @@ Optional **`codemap.config.ts`** (default export: object or async factory) or ** **Fresh database:** the default CLI **`codemap`** (incremental) calls **`createSchema()`** in **`runCodemapIndex`** before **`getChangedFiles()`**, so the **`meta`** table exists before **`getMeta(..., "last_indexed_commit")`** runs on an empty **`.codemap.db`**. -Current schema version: **4** — see [Schema Versioning](#schema-versioning) for details. +Current schema version: **5** — see [Schema Versioning](#schema-versioning) for details. All tables use `STRICT` mode. Tables marked with `WITHOUT ROWID` store data directly in the primary key B-tree. PRAGMAs and index design: [SQLite Performance Configuration](#sqlite-performance-configuration). @@ -308,6 +308,20 @@ Edges are deduped per (caller_scope, callee) per file: if `foo` calls `bar` thre | key | TEXT PK | e.g. `schema_version`, `last_indexed_commit`, `indexed_at` | | value | TEXT | Stored value | +### `query_baselines` — Saved query result snapshots (user data) (`STRICT`) + +User-facing baselines saved by `codemap query --save-baseline`, replayed by `codemap query --baseline` for diffs (added / removed rows). Lives next to the index tables so the entire codemap state stays in one SQLite file — no parallel JSON snapshot files. **Intentionally absent from `dropAll()`** so `--full` and `SCHEMA_VERSION` rebuilds preserve baselines (only index tables get dropped). + +| Column | Type | Description | +| ---------- | ------- | ----------------------------------------------------------------------------------------- | +| name | TEXT PK | User-supplied name; defaults to the `--recipe` id (ad-hoc SQL must pass an explicit name) | +| recipe_id | TEXT | The `--recipe` id when known; NULL for ad-hoc SQL | +| sql | TEXT | The SQL that produced the snapshot (replayable; useful when re-running on a new branch) | +| rows_json | TEXT | Canonical `JSON.stringify(rows)`. 
Diff identity is per-row JSON-stringify equality | +| row_count | INTEGER | Cached number of rows in `rows_json`, for fast `--baselines` listing | +| git_ref | TEXT | `git rev-parse HEAD` at save time, or NULL when not a git working tree | +| created_at | INTEGER | `Date.now()` at save time (epoch ms) | + ### Indexes All tables have covering indexes tuned for AI agent query patterns. See [Covering indexes](#covering-indexes) and [Partial indexes](#partial-indexes) for the full list. diff --git a/docs/glossary.md b/docs/glossary.md index 0e23ba9..8f45405 100644 --- a/docs/glossary.md +++ b/docs/glossary.md @@ -299,6 +299,10 @@ A managed root-level file (`CLAUDE.md`, `AGENTS.md`, `GEMINI.md`, `.github/copil Any SQL run against `.codemap.db` — either a **recipe** (bundled SQL) or ad-hoc. Distinct from **query-recipes.ts** (the file that holds bundled recipe SQL strings). +### query baseline + +A snapshot of a query result set saved by `codemap query --save-baseline[=]` and replayed by `codemap query --baseline[=]` for added/removed diffs. Stored in the `query_baselines` table inside `.codemap.db` (no parallel JSON files; survives `--full` and `SCHEMA_VERSION` rebuilds because the table is intentionally absent from `dropAll()`). Default name = `--recipe` id; ad-hoc SQL must pass an explicit name. Diff identity is per-row `JSON.stringify` equality — exact match, no fuzzy "changed" category in v1. + ### query recipe See **recipe**.
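A sketch of what one saved `query_baselines` row could contain, following the column table and the naming rule above. `buildBaselineRecord` and its options object are hypothetical (the real helpers are `upsertQueryBaseline` and friends in `src/db.ts`); `flag` mirrors the parser's `saveBaseline: string | true` value, where `true` means the bare flag was used.

```typescript
// Hypothetical sketch; field names follow the documented columns, helpers are assumptions.
type Row = Record<string, unknown>;

interface QueryBaselineRecord {
  name: string;
  recipe_id: string | null;
  sql: string;
  rows_json: string;
  row_count: number;
  git_ref: string | null;
  created_at: number; // epoch ms
}

function buildBaselineRecord(opts: {
  flag: string | true; // explicit =<name>, or true for the bare flag
  recipeId?: string;
  sql: string;
  rows: Row[];
  gitRef?: string | null; // `git rev-parse HEAD`, or null outside a git working tree
  now?: number; // injectable clock; defaults to Date.now()
}): QueryBaselineRecord {
  // A bare flag falls back to the recipe id; ad-hoc SQL has no default name.
  const name = opts.flag === true ? opts.recipeId : opts.flag;
  if (!name) {
    throw new Error("--save-baseline needs an explicit name when used without --recipe");
  }
  return {
    name,
    recipe_id: opts.recipeId ?? null,
    sql: opts.sql,
    rows_json: JSON.stringify(opts.rows),
    row_count: opts.rows.length, // number of rows, not the string length of rows_json
    git_ref: opts.gitRef ?? null,
    created_at: opts.now ?? Date.now(),
  };
}

// Usage: the bare flag under --recipe saves under the recipe id.
const record = buildBaselineRecord({
  flag: true,
  recipeId: "visibility-tags",
  sql: "SELECT file_path FROM symbols",
  rows: [{ file_path: "src/a.ts" }, { file_path: "src/b.ts" }],
  gitRef: null,
});
// record.name === "visibility-tags", record.row_count === 2
```

Keying the table on `name` is what lets a re-save under the same name overwrite the previous snapshot in place.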
diff --git a/src/cli/cmd-query.test.ts b/src/cli/cmd-query.test.ts index 997412a..1188ce3 100644 --- a/src/cli/cmd-query.test.ts +++ b/src/cli/cmd-query.test.ts @@ -29,6 +29,8 @@ describe("parseQueryRest", () => { changedSince: undefined, recipeId: undefined, groupBy: undefined, + saveBaseline: undefined, + baseline: undefined, }); }); @@ -42,6 +44,8 @@ describe("parseQueryRest", () => { changedSince: undefined, recipeId: undefined, groupBy: undefined, + saveBaseline: undefined, + baseline: undefined, }); }); @@ -55,6 +59,8 @@ describe("parseQueryRest", () => { changedSince: undefined, recipeId: undefined, groupBy: undefined, + saveBaseline: undefined, + baseline: undefined, }); }); @@ -68,6 +74,8 @@ describe("parseQueryRest", () => { changedSince: undefined, recipeId: undefined, groupBy: undefined, + saveBaseline: undefined, + baseline: undefined, }); }); @@ -83,6 +91,8 @@ describe("parseQueryRest", () => { changedSince: undefined, recipeId: "fan-out", groupBy: undefined, + saveBaseline: undefined, + baseline: undefined, }); }); @@ -101,6 +111,8 @@ describe("parseQueryRest", () => { changedSince: "origin/main", recipeId: undefined, groupBy: undefined, + saveBaseline: undefined, + baseline: undefined, }); }); @@ -123,6 +135,8 @@ describe("parseQueryRest", () => { changedSince: "HEAD~3", recipeId: "fan-out", groupBy: undefined, + saveBaseline: undefined, + baseline: undefined, }); }); @@ -142,6 +156,8 @@ describe("parseQueryRest", () => { changedSince: undefined, recipeId: undefined, groupBy: "directory", + saveBaseline: undefined, + baseline: undefined, }); }); @@ -157,6 +173,8 @@ describe("parseQueryRest", () => { changedSince: undefined, recipeId: "fan-in", groupBy: "owner", + saveBaseline: undefined, + baseline: undefined, }); }); @@ -172,6 +190,98 @@ describe("parseQueryRest", () => { if (r.kind === "error") expect(r.message).toContain("unknown --group-by"); }); + // ---------- baseline flags ---------- + + it("parses bare --save-baseline + --recipe (default 
name = recipe id)", () => { + const r = parseQueryRest(["query", "--save-baseline", "-r", "fan-out"]); + if (r.kind !== "run") throw new Error("expected run"); + expect(r.recipeId).toBe("fan-out"); + expect(r.saveBaseline).toBe(true); + expect(r.baseline).toBeUndefined(); + }); + + it("parses --save-baseline= with ad-hoc SQL", () => { + const r = parseQueryRest([ + "query", + "--save-baseline=pre-refactor", + "SELECT 1", + ]); + if (r.kind !== "run") throw new Error("expected run"); + expect(r.saveBaseline).toBe("pre-refactor"); + }); + + it("errors when bare --save-baseline meets ad-hoc SQL with no following name", () => { + const r = parseQueryRest(["query", "--save-baseline"]); + expect(r.kind).toBe("error"); + }); + + it("errors when --save-baseline= has empty name", () => { + const r = parseQueryRest(["query", "--save-baseline=", "SELECT 1"]); + expect(r.kind).toBe("error"); + if (r.kind === "error") expect(r.message).toContain("non-empty name"); + }); + + it("parses --baseline= with ad-hoc SQL", () => { + const r = parseQueryRest(["query", "--baseline=pre-refactor", "SELECT 1"]); + if (r.kind !== "run") throw new Error("expected run"); + expect(r.baseline).toBe("pre-refactor"); + }); + + it("parses bare --baseline + --recipe", () => { + const r = parseQueryRest(["query", "--baseline", "-r", "fan-out"]); + if (r.kind !== "run") throw new Error("expected run"); + expect(r.baseline).toBe(true); + expect(r.recipeId).toBe("fan-out"); + }); + + it("errors when --save-baseline and --baseline are combined", () => { + const r = parseQueryRest([ + "query", + "--save-baseline", + "--baseline", + "-r", + "fan-out", + ]); + expect(r.kind).toBe("error"); + if (r.kind === "error") expect(r.message).toContain("mutually exclusive"); + }); + + it("parses --baselines as a list operation", () => { + expect(parseQueryRest(["query", "--baselines"])).toEqual({ + kind: "listBaselines", + json: false, + }); + expect(parseQueryRest(["query", "--json", "--baselines"])).toEqual({ + 
kind: "listBaselines", + json: true, + }); + }); + + it("rejects --baselines combined with SQL or other flags", () => { + expect(parseQueryRest(["query", "--baselines", "SELECT 1"]).kind).toBe( + "error", + ); + expect(parseQueryRest(["query", "--baselines", "-r", "fan-out"]).kind).toBe( + "error", + ); + }); + + it("parses --drop-baseline ", () => { + expect( + parseQueryRest(["query", "--drop-baseline", "pre-refactor"]), + ).toEqual({ + kind: "dropBaseline", + name: "pre-refactor", + json: false, + }); + }); + + it("errors when --drop-baseline has no name", () => { + const r = parseQueryRest(["query", "--drop-baseline"]); + expect(r.kind).toBe("error"); + if (r.kind === "error") expect(r.message).toContain("--drop-baseline"); + }); + it("errors when --changed-since has no ref", () => { const r = parseQueryRest(["query", "--changed-since"]); expect(r.kind).toBe("error"); @@ -213,6 +323,8 @@ describe("parseQueryRest", () => { changedSince: undefined, recipeId: "fan-out-sample-json", groupBy: undefined, + saveBaseline: undefined, + baseline: undefined, }); }); @@ -228,6 +340,8 @@ describe("parseQueryRest", () => { changedSince: undefined, recipeId: "fan-out", groupBy: undefined, + saveBaseline: undefined, + baseline: undefined, }); }); @@ -243,6 +357,8 @@ describe("parseQueryRest", () => { changedSince: undefined, recipeId: "fan-out-sample", groupBy: undefined, + saveBaseline: undefined, + baseline: undefined, }); }); @@ -258,6 +374,8 @@ describe("parseQueryRest", () => { changedSince: undefined, recipeId: "fan-out", groupBy: undefined, + saveBaseline: undefined, + baseline: undefined, }); }); @@ -273,6 +391,8 @@ describe("parseQueryRest", () => { changedSince: undefined, recipeId: "fan-out", groupBy: undefined, + saveBaseline: undefined, + baseline: undefined, }); }); diff --git a/src/cli/cmd-query.ts b/src/cli/cmd-query.ts index 109cbbb..29f64e3 100644 --- a/src/cli/cmd-query.ts +++ b/src/cli/cmd-query.ts @@ -1,5 +1,17 @@ -import { printQueryResult, queryRows } 
from "../application/index-engine"; +import { + getCurrentCommit, + printQueryResult, + queryRows, +} from "../application/index-engine"; import { loadUserConfig, resolveCodemapConfig } from "../config"; +import { + closeDb, + deleteQueryBaseline, + getQueryBaseline, + listQueryBaselines, + openDb, + upsertQueryBaseline, +} from "../db"; import { filterRowsByChangedFiles, getFilesChangedSince } from "../git-changed"; import type { Bucketizer, GroupByMode } from "../group-by"; import { @@ -36,9 +48,13 @@ export function parseQueryRest(rest: string[]): changedSince: string | undefined; recipeId: string | undefined; groupBy: GroupByMode | undefined; + saveBaseline: string | true | undefined; + baseline: string | true | undefined; } | { kind: "recipesCatalog" } - | { kind: "printRecipeSql"; id: string } { + | { kind: "printRecipeSql"; id: string } + | { kind: "listBaselines"; json: boolean } + | { kind: "dropBaseline"; name: string; json: boolean } { if (rest[0] !== "query") { throw new Error("parseQueryRest: expected query"); } @@ -58,6 +74,10 @@ export function parseQueryRest(rest: string[]): let recipesJson = false; let printSqlId: string | undefined; let groupBy: GroupByMode | undefined; + let listBaselines = false; + let dropBaselineName: string | undefined; + let saveBaseline: string | true | undefined; + let baseline: string | true | undefined; while (i < rest.length) { const a = rest[i]; @@ -105,6 +125,76 @@ export function parseQueryRest(rest: string[]): i += 2; continue; } + // --save-baseline | --save-baseline= | --save-baseline + if (a === "--save-baseline" || a.startsWith("--save-baseline=")) { + const eq = a.indexOf("="); + if (eq !== -1) { + const v = a.slice(eq + 1); + if (!v) { + return { + kind: "error", + message: + 'codemap: "--save-baseline=" requires a non-empty name. 
Drop the "=" to use the recipe id as the default name.', + }; + } + saveBaseline = v; + i++; + continue; + } + const next = rest[i + 1]; + if (next !== undefined && !next.startsWith("-")) { + saveBaseline = next; + i += 2; + continue; + } + saveBaseline = true; + i++; + continue; + } + // --baseline | --baseline= | --baseline + if (a === "--baseline" || a.startsWith("--baseline=")) { + const eq = a.indexOf("="); + if (eq !== -1) { + const v = a.slice(eq + 1); + if (!v) { + return { + kind: "error", + message: + 'codemap: "--baseline=" requires a non-empty name. Drop the "=" to use the recipe id as the default name.', + }; + } + baseline = v; + i++; + continue; + } + const next = rest[i + 1]; + if (next !== undefined && !next.startsWith("-")) { + baseline = next; + i += 2; + continue; + } + baseline = true; + i++; + continue; + } + if (a === "--baselines") { + listBaselines = true; + i++; + continue; + } + if (a === "--drop-baseline") { + const name = rest[i + 1]; + if (name === undefined || name.startsWith("-")) { + return { + kind: "error", + message: + 'codemap: "--drop-baseline" requires a name. 
Example: codemap query --drop-baseline pre-refactor', + }; + } + dropBaselineName = name; + i += 2; + continue; + } if (a === "--recipes-json") { recipesJson = true; i++; @@ -156,6 +246,49 @@ export function parseQueryRest(rest: string[]): return { kind: "recipesCatalog" }; } + if (listBaselines) { + if ( + recipeId !== undefined || + printSqlId !== undefined || + saveBaseline !== undefined || + baseline !== undefined || + dropBaselineName !== undefined || + i < rest.length + ) { + return { + kind: "error", + message: + "codemap: --baselines does not take SQL, --recipe, --save-baseline, --baseline, or --drop-baseline.", + }; + } + return { kind: "listBaselines", json }; + } + + if (dropBaselineName !== undefined) { + if ( + recipeId !== undefined || + printSqlId !== undefined || + saveBaseline !== undefined || + baseline !== undefined || + i < rest.length + ) { + return { + kind: "error", + message: + "codemap: --drop-baseline does not take SQL or other baseline / recipe flags.", + }; + } + return { kind: "dropBaseline", name: dropBaselineName, json }; + } + + if (saveBaseline !== undefined && baseline !== undefined) { + return { + kind: "error", + message: + "codemap: --save-baseline and --baseline are mutually exclusive in one run.", + }; + } + if (printSqlId !== undefined) { if (recipeId !== undefined) { return { @@ -197,7 +330,17 @@ export function parseQueryRest(rest: string[]): message: `codemap: unknown recipe "${recipeId}". Known recipes: ${known}`, }; } - return { kind: "run", sql, json, summary, changedSince, recipeId, groupBy }; + return { + kind: "run", + sql, + json, + summary, + changedSince, + recipeId, + groupBy, + saveBaseline, + baseline, + }; } const sql = rest.slice(i).join(" ").trim(); @@ -205,7 +348,22 @@ export function parseQueryRest(rest: string[]): return { kind: "error", message: - 'codemap: missing SQL or recipe. Usage: codemap query [--json] [--summary] [--changed-since ] [--group-by ] "" | codemap query [...] 
--recipe | codemap query --recipes-json | codemap query --print-sql ', + 'codemap: missing SQL or recipe. Usage: codemap query [--json] [--summary] [--changed-since ] [--group-by ] [--save-baseline[=] | --baseline[=]] "" | codemap query [...] --recipe | codemap query --recipes-json | codemap query --print-sql | codemap query --baselines | codemap query --drop-baseline ', + }; + } + // Ad-hoc SQL needs an explicit baseline name (no recipe id default). + if (saveBaseline === true) { + return { + kind: "error", + message: + 'codemap: "--save-baseline" needs an explicit name when used without --recipe (recipe id is the default name otherwise). Use --save-baseline=.', + }; + } + if (baseline === true) { + return { + kind: "error", + message: + 'codemap: "--baseline" needs an explicit name when used without --recipe. Use --baseline=.', }; } return { @@ -216,6 +374,8 @@ export function parseQueryRest(rest: string[]): changedSince, recipeId: undefined, groupBy, + saveBaseline, + baseline, }; } @@ -250,10 +410,12 @@ function formatRecipeHelpLines(): string { */ export function printQueryCmdHelp(): void { const recipeBlock = formatRecipeHelpLines(); - console.log(`Usage: codemap query [--json] [--summary] [--changed-since ] [--group-by ] "" - codemap query [--json] [--summary] [--changed-since ] [--group-by ] --recipe (alias: -r) + console.log(`Usage: codemap query [--json] [--summary] [--changed-since ] [--group-by ] [--save-baseline[=] | --baseline[=]] "" + codemap query [...] --recipe (alias: -r) codemap query --recipes-json codemap query --print-sql + codemap query --baselines + codemap query --drop-baseline Read-only SQL against .codemap.db (after at least one successful index run). The CLI does not cap row count — use SQL LIMIT (and ORDER BY) when you need a bounded result set. @@ -263,6 +425,7 @@ Flags: On error, prints a single object: {"error":""} to stdout. --summary Print only the row count (no rows). With --json: {"count": N}. Without: count: N. 
With --group-by, output collapses to {"group_by": "", "groups": [{key, count}]}. + With --baseline, collapses to {baseline, current_row_count, added: N, removed: N}. Useful for dashboards and agent context windows where the rows are noise. --changed-since Filter result rows to those touching files changed since . The ref can be any committish (origin/main, HEAD~5, a sha, a tag). Rows are kept if any of @@ -276,6 +439,19 @@ Flags: directory First path segment (src/cli/foo.ts → src). package Workspace dir from package.json/workspaces or pnpm-workspace.yaml; out-of-workspace paths bucket to "". + --save-baseline[=] + Snapshot the result rows to the query_baselines table inside .codemap.db + for later --baseline diffs. Name defaults to the --recipe id; ad-hoc SQL + must pass an explicit =. Stores SQL, rows, row count, current git + HEAD (when available), and a timestamp. Re-saving with the same name + overwrites in place. Survives --full and SCHEMA_VERSION rebuilds. + --baseline[=] Diff the current result against the saved baseline of the same name. + Output: {baseline:{...}, current_row_count, added: [...], removed: [...]} + (with --json) or a two-section terminal dump. Set membership uses + JSON.stringify(row) — exact-match identity, no fuzzy "changed" category. + Recipe actions, when defined, attach to the added rows only. + --baselines List saved baselines (name, recipe_id, row_count, git_ref, created_at). + --drop-baseline Delete a saved baseline. Exits 1 if the name doesn't exist. --recipe, -r Run bundled SQL (no SQL string on the command line). --recipes-json Print all bundled recipes (id, description, sql) as JSON to stdout. No DB. --print-sql Print one recipe's SQL text to stdout (does not run the query). No DB. 
@@ -307,6 +483,16 @@ Examples: codemap query --json --summary --group-by owner -r deprecated-symbols codemap query --json --summary --group-by package "SELECT file_path FROM symbols" + # Snapshot a result, refactor, then diff + codemap query --save-baseline -r visibility-tags # save under name "visibility-tags" + # ... refactor ... + codemap query --json --baseline -r visibility-tags # full diff + codemap query --json --summary --baseline -r visibility-tags # counts only + codemap query --save-baseline=pre-refactor "SELECT name, file_path FROM symbols WHERE visibility = 'beta'" + codemap query --baseline=pre-refactor "SELECT name, file_path FROM symbols WHERE visibility = 'beta'" + codemap query --baselines # list + codemap query --drop-baseline pre-refactor # delete + # Inspect recipes without touching the DB codemap query --recipes-json codemap query --print-sql fan-out @@ -326,6 +512,8 @@ export async function runQueryCmd(opts: { changedSince?: string | undefined; recipeId?: string | undefined; groupBy?: GroupByMode | undefined; + saveBaseline?: string | true | undefined; + baseline?: string | true | undefined; }): Promise { try { const user = await loadUserConfig(opts.root, opts.configFile); @@ -347,6 +535,35 @@ export async function runQueryCmd(opts: { ? getQueryRecipeActions(opts.recipeId) : undefined; + // Baseline ops branch off here — they don't compose with --group-by because + // the diff semantics are about row identity, not bucketing. (--summary still + // composes: collapses the diff to {added: N, removed: N}.) + if (opts.saveBaseline !== undefined) { + runSaveBaseline({ + sql: opts.sql, + json: opts.json === true, + recipeId: opts.recipeId, + baselineName: + opts.saveBaseline === true + ? (opts.recipeId as string) + : opts.saveBaseline, + changedFiles, + }); + return; + } + if (opts.baseline !== undefined) { + runBaselineDiff({ + sql: opts.sql, + json: opts.json === true, + summary: opts.summary === true, + baselineName: + opts.baseline === true ? 
(opts.recipeId as string) : opts.baseline, + changedFiles, + recipeActions, + }); + return; + } + if (opts.groupBy !== undefined) { runGroupedQuery({ sql: opts.sql, @@ -373,6 +590,70 @@ export async function runQueryCmd(opts: { } } +/** Bootstrap a DB connection and run the list-baselines command. */ +export async function runListBaselinesCmd(opts: { + root: string; + configFile: string | undefined; + json: boolean; +}): Promise { + try { + const user = await loadUserConfig(opts.root, opts.configFile); + initCodemap(resolveCodemapConfig(opts.root, user)); + configureResolver(getProjectRoot(), getTsconfigPath()); + const db = openDb(); + try { + const rows = listQueryBaselines(db); + if (opts.json) { + console.log(JSON.stringify(rows)); + } else if (rows.length === 0) { + console.log("(no baselines)"); + } else { + console.table(rows); + } + } finally { + closeDb(db, { readonly: true }); + } + } catch (err) { + const msg = err instanceof Error ? err.message : String(err); + emitErrorMaybeJson(msg, opts.json); + } +} + +/** Bootstrap a DB connection and drop the named baseline; exits 1 on miss. */ +export async function runDropBaselineCmd(opts: { + root: string; + configFile: string | undefined; + name: string; + json: boolean; +}): Promise { + try { + const user = await loadUserConfig(opts.root, opts.configFile); + initCodemap(resolveCodemapConfig(opts.root, user)); + configureResolver(getProjectRoot(), getTsconfigPath()); + const db = openDb(); + try { + const dropped = deleteQueryBaseline(db, opts.name); + if (!dropped) { + emitErrorMaybeJson( + `codemap: no baseline named "${opts.name}". Use --baselines to list saved baselines.`, + opts.json, + ); + return; + } + if (opts.json) { + console.log(JSON.stringify({ dropped: opts.name })); + } else { + console.log(`Dropped baseline: ${opts.name}`); + } + } finally { + closeDb(db); + } + } catch (err) { + const msg = err instanceof Error ? 
err.message : String(err); + emitErrorMaybeJson(msg, opts.json); + } +} + function emitErrorMaybeJson(message: string, json: boolean | undefined) { if (json === true) { console.log(JSON.stringify({ error: message })); @@ -473,3 +754,211 @@ function attachActionsForGrouped( if ("actions" in obj) return obj; return { ...obj, actions }; } + +// `git rev-parse HEAD` may legitimately fail (no git, detached worktree, etc.). +// Baselines just record git_ref = NULL in that case — no fatal error. +function tryGetGitRef(): string | null { + try { + const sha = getCurrentCommit(); + return sha || null; + } catch { + return null; + } +} + +function runSaveBaseline(opts: { + sql: string; + json: boolean; + recipeId: string | undefined; + baselineName: string; + changedFiles: Set | undefined; +}) { + let rows: unknown[]; + try { + rows = queryRows(opts.sql); + } catch (err) { + emitErrorMaybeJson( + err instanceof Error ? err.message : String(err), + opts.json, + ); + return; + } + if (opts.changedFiles !== undefined) { + rows = filterRowsByChangedFiles(rows, opts.changedFiles); + } + + const db = openDb(); + let savedAt: number; + let gitRef: string | null; + try { + savedAt = Date.now(); + gitRef = tryGetGitRef(); + upsertQueryBaseline(db, { + name: opts.baselineName, + recipe_id: opts.recipeId ?? null, + sql: opts.sql, + rows_json: JSON.stringify(rows), + row_count: rows.length, + git_ref: gitRef, + created_at: savedAt, + }); + } finally { + closeDb(db); + } + + if (opts.json) { + console.log( + JSON.stringify({ + saved: opts.baselineName, + recipe_id: opts.recipeId ?? null, + row_count: rows.length, + git_ref: gitRef, + created_at: savedAt, + }), + ); + } else { + const ref = gitRef ? 
` @ ${gitRef.slice(0, 8)}` : ""; + console.log( + `Saved baseline "${opts.baselineName}" (${rows.length} rows${ref}).`, + ); + } +} + +interface BaselineDiff { + baseline: { + name: string; + recipe_id: string | null; + row_count: number; + git_ref: string | null; + created_at: number; + }; + current_row_count: number; + added: unknown[]; + removed: unknown[]; +} + +// Set-membership = canonical JSON.stringify(row). For v1 we don't compute a +// "changed" category — it requires a row-key heuristic and the agent can +// re-derive richer diffs from the underlying rows if needed. +function diffRows( + baseline: unknown[], + current: unknown[], +): { + added: unknown[]; + removed: unknown[]; +} { + const baseSet = new Set(baseline.map((r) => JSON.stringify(r))); + const curSet = new Set(current.map((r) => JSON.stringify(r))); + const added = current.filter((r) => !baseSet.has(JSON.stringify(r))); + const removed = baseline.filter((r) => !curSet.has(JSON.stringify(r))); + return { added, removed }; +} + +function runBaselineDiff(opts: { + sql: string; + json: boolean; + summary: boolean; + baselineName: string; + changedFiles: Set | undefined; + recipeActions: ReadonlyArray | undefined; +}) { + const db = openDb(); + let baseline: ReturnType; + try { + baseline = getQueryBaseline(db, opts.baselineName); + } finally { + closeDb(db, { readonly: true }); + } + + if (baseline === undefined) { + emitErrorMaybeJson( + `codemap: no baseline named "${opts.baselineName}". Use --baselines to list saved baselines.`, + opts.json, + ); + return; + } + + let baselineRows: unknown[]; + try { + baselineRows = JSON.parse(baseline.rows_json) as unknown[]; + } catch { + emitErrorMaybeJson( + `codemap: baseline "${opts.baselineName}" has corrupt rows_json — drop and re-save.`, + opts.json, + ); + return; + } + + let currentRows: unknown[]; + try { + currentRows = queryRows(opts.sql); + } catch (err) { + emitErrorMaybeJson( + err instanceof Error ? 
err.message : String(err), + opts.json, + ); + return; + } + if (opts.changedFiles !== undefined) { + currentRows = filterRowsByChangedFiles(currentRows, opts.changedFiles); + } + + const { added, removed } = diffRows(baselineRows, currentRows); + + // Recipe actions enrich `added` only — they're the rows the agent should act on. + const enrichedAdded = + opts.recipeActions !== undefined && opts.recipeActions.length > 0 + ? added.map((row) => attachActionsForGrouped(row, opts.recipeActions!)) + : added; + + const diff: BaselineDiff = { + baseline: { + name: baseline.name, + recipe_id: baseline.recipe_id, + row_count: baseline.row_count, + git_ref: baseline.git_ref, + created_at: baseline.created_at, + }, + current_row_count: currentRows.length, + added: enrichedAdded, + removed, + }; + + if (opts.summary) { + const payload = { + baseline: diff.baseline, + current_row_count: diff.current_row_count, + added: added.length, + removed: removed.length, + }; + if (opts.json) { + console.log(JSON.stringify(payload)); + } else { + console.log( + `baseline "${diff.baseline.name}": ${diff.baseline.row_count} rows → ${diff.current_row_count} rows (+${added.length} / -${removed.length})`, + ); + } + return; + } + + if (opts.json) { + console.log(JSON.stringify(diff)); + return; + } + + // Terminal mode — readable two-section dump. 
+ console.log( + `baseline "${diff.baseline.name}": ${diff.baseline.row_count} rows → ${diff.current_row_count} rows (+${added.length} / -${removed.length})`, + ); + if (added.length > 0) { + console.log(`\n added (+${added.length}):`); + console.table(enrichedAdded); + } + if (removed.length > 0) { + console.log(`\n removed (-${removed.length}):`); + console.table(removed); + } + if (added.length === 0 && removed.length === 0) { + console.log(" no diff."); + } +} diff --git a/src/cli/main.ts b/src/cli/main.ts index 1ff29e9..dc83c56 100644 --- a/src/cli/main.ts +++ b/src/cli/main.ts @@ -114,6 +114,8 @@ Copies bundled agent templates into .agents/ under the project root. printQueryCmdHelp, printRecipesCatalogJson, printRecipeSqlToStdout, + runDropBaselineCmd, + runListBaselinesCmd, runQueryCmd, } = await import("./cmd-query.js"); const parsed = parseQueryRest(rest); @@ -135,6 +137,19 @@ Copies bundled agent templates into .agents/ under the project root. } return; } + if (parsed.kind === "listBaselines") { + await runListBaselinesCmd({ root, configFile, json: parsed.json }); + return; + } + if (parsed.kind === "dropBaseline") { + await runDropBaselineCmd({ + root, + configFile, + name: parsed.name, + json: parsed.json, + }); + return; + } await runQueryCmd({ root, configFile, @@ -144,6 +159,8 @@ Copies bundled agent templates into .agents/ under the project root. 
changedSince: parsed.changedSince, recipeId: parsed.recipeId, groupBy: parsed.groupBy, + saveBaseline: parsed.saveBaseline, + baseline: parsed.baseline, }); return; } diff --git a/src/db.test.ts b/src/db.test.ts index dc632b7..ca09ff5 100644 --- a/src/db.test.ts +++ b/src/db.test.ts @@ -4,12 +4,16 @@ import { closeDb, createIndexes, createTables, + deleteQueryBaseline, getMeta, getAllFileHashes, + getQueryBaseline, insertFile, insertSymbols, + listQueryBaselines, SCHEMA_VERSION, setMeta, + upsertQueryBaseline, } from "./db"; import { openCodemapDatabase } from "./sqlite-db"; @@ -117,4 +121,69 @@ describe("SQLite layer (in-memory)", () => { closeDb(db); } }); + + it("query_baselines round-trips upsert / get / list / delete", () => { + const db = openCodemapDatabase(":memory:"); + try { + createTables(db); + expect(listQueryBaselines(db)).toEqual([]); + expect(getQueryBaseline(db, "fan-out")).toBeUndefined(); + + upsertQueryBaseline(db, { + name: "fan-out", + recipe_id: "fan-out", + sql: "SELECT 1", + rows_json: JSON.stringify([{ a: 1 }, { a: 2 }]), + row_count: 2, + git_ref: "abc1234", + created_at: 1_700_000_000_000, + }); + + const got = getQueryBaseline(db, "fan-out"); + expect(got).toEqual({ + name: "fan-out", + recipe_id: "fan-out", + sql: "SELECT 1", + rows_json: JSON.stringify([{ a: 1 }, { a: 2 }]), + row_count: 2, + git_ref: "abc1234", + created_at: 1_700_000_000_000, + }); + + // Re-saving with the same name overwrites in place. + upsertQueryBaseline(db, { + name: "fan-out", + recipe_id: "fan-out", + sql: "SELECT 1", + rows_json: JSON.stringify([{ a: 1 }]), + row_count: 1, + git_ref: "def5678", + created_at: 1_700_000_001_000, + }); + expect(getQueryBaseline(db, "fan-out")?.row_count).toBe(1); + expect(getQueryBaseline(db, "fan-out")?.git_ref).toBe("def5678"); + + // Second baseline coexists. 
+ upsertQueryBaseline(db, { + name: "pre-refactor", + recipe_id: null, + sql: "SELECT name FROM symbols", + rows_json: "[]", + row_count: 0, + git_ref: null, + created_at: 1_700_000_002_000, + }); + + const list = listQueryBaselines(db); + // Sorted DESC by created_at — pre-refactor first. + expect(list.map((b) => b.name)).toEqual(["pre-refactor", "fan-out"]); + expect(list[0]).not.toHaveProperty("rows_json"); // summary view omits payload + + expect(deleteQueryBaseline(db, "pre-refactor")).toBe(true); + expect(deleteQueryBaseline(db, "pre-refactor")).toBe(false); // already gone + expect(listQueryBaselines(db).map((b) => b.name)).toEqual(["fan-out"]); + } finally { + closeDb(db); + } + }); }); diff --git a/src/db.ts b/src/db.ts index 89dc61f..c4480aa 100644 --- a/src/db.ts +++ b/src/db.ts @@ -2,7 +2,7 @@ import { openCodemapDatabase } from "./sqlite-db"; import type { CodemapDatabase, BindValues } from "./sqlite-db"; /** Bump on any DDL change; `createSchema()` auto-rebuilds on mismatch. */ -export const SCHEMA_VERSION = 4; +export const SCHEMA_VERSION = 5; export type { CodemapDatabase }; @@ -138,6 +138,22 @@ export function createTables(db: CodemapDatabase) { key TEXT PRIMARY KEY, value TEXT ) STRICT, WITHOUT ROWID; + + -- User-data table: query result snapshots for --save-baseline / --baseline. + -- Lives next to the index tables so the entire codemap state is one SQLite file + -- (no parallel JSON files / new gitignore entries). Intentionally absent from + -- dropAll() so --full and SCHEMA_VERSION rebuilds preserve baselines (only + -- index tables get dropped). Future schema bumps that change THIS tables shape + -- need an in-place migration rather than relying on the schema-mismatch rebuild. 
+ CREATE TABLE IF NOT EXISTS query_baselines ( + name TEXT PRIMARY KEY, + recipe_id TEXT, + sql TEXT NOT NULL, + rows_json TEXT NOT NULL, + row_count INTEGER NOT NULL, + git_ref TEXT, + created_at INTEGER NOT NULL + ) STRICT; `); } @@ -621,3 +637,95 @@ export function getAllFileHashes(db: CodemapDatabase): Map { } return map; } + +/** + * Snapshot of a `query --recipe ` (or ad-hoc SQL) result, captured by + * `--save-baseline` and replayed by `--baseline`. `rows_json` is the + * canonical JSON.stringify of the row array — set-diff happens in JS by + * stringifying current rows and comparing membership. + */ +export interface QueryBaselineRow { + name: string; + recipe_id: string | null; + sql: string; + rows_json: string; + row_count: number; + git_ref: string | null; + created_at: number; +} + +export function upsertQueryBaseline( + db: CodemapDatabase, + baseline: QueryBaselineRow, +) { + db.run( + `INSERT INTO query_baselines (name, recipe_id, sql, rows_json, row_count, git_ref, created_at) + VALUES (?, ?, ?, ?, ?, ?, ?) + ON CONFLICT(name) DO UPDATE SET + recipe_id = excluded.recipe_id, + sql = excluded.sql, + rows_json = excluded.rows_json, + row_count = excluded.row_count, + git_ref = excluded.git_ref, + created_at = excluded.created_at`, + [ + baseline.name, + baseline.recipe_id, + baseline.sql, + baseline.rows_json, + baseline.row_count, + baseline.git_ref, + baseline.created_at, + ], + ); +} + +export function getQueryBaseline( + db: CodemapDatabase, + name: string, +): QueryBaselineRow | undefined { + // bun:sqlite returns null for misses; better-sqlite3 returns undefined. Coerce here. + return ( + db + .query( + `SELECT name, recipe_id, sql, rows_json, row_count, git_ref, created_at + FROM query_baselines WHERE name = ?`, + ) + .get(name) ?? undefined + ); +} + +/** Lightweight metadata view of every saved baseline (omits `rows_json`). 
*/ +export interface QueryBaselineSummaryRow { + name: string; + recipe_id: string | null; + row_count: number; + git_ref: string | null; + created_at: number; +} + +export function listQueryBaselines( + db: CodemapDatabase, +): QueryBaselineSummaryRow[] { + return db + .query( + `SELECT name, recipe_id, row_count, git_ref, created_at + FROM query_baselines ORDER BY created_at DESC, name ASC`, + ) + .all(); +} + +/** @returns true if a baseline with that name was deleted. */ +export function deleteQueryBaseline( + db: CodemapDatabase, + name: string, +): boolean { + const before = db + .query<{ n: number }>( + "SELECT COUNT(*) AS n FROM query_baselines WHERE name = ?", + ) + .get(name); + if (!before || before.n === 0) return false; + db.run("DELETE FROM query_baselines WHERE name = ?", [name]); + return true; +} diff --git a/templates/agents/rules/codemap.md b/templates/agents/rules/codemap.md index 6ee813c..d511894 100644 --- a/templates/agents/rules/codemap.md +++ b/templates/agents/rules/codemap.md @@ -18,19 +18,23 @@ Install **[@stainless-code/codemap](https://www.npmjs.com/package/@stainless-cod **Examples below use `codemap`** — prefix with **`npx @stainless-code/codemap`** (or **`pnpm dlx`**, **`yarn dlx`**, **`bunx`**) when the CLI is not on your **`PATH`**. 
-| Action | Command | -| --------------------------------- | ------------------------------------------------------------------------ | -| Incremental index | `codemap` | -| Query (JSON — default for agents) | `codemap query --json ""` | -| Query (ASCII table — optional) | `codemap query ""` | -| Query (recipe) | `codemap query --json --recipe fan-out` (see **`codemap query --help`**) | -| Recipe catalog (JSON) | `codemap query --recipes-json` | -| Print one recipe’s SQL | `codemap query --print-sql fan-out` | -| Counts only | `codemap query --json --summary -r deprecated-symbols` | -| PR-scoped rows | `codemap query --json --changed-since origin/main -r fan-out` | -| Bucket by owner / dir / pkg | `codemap query --json --group-by directory -r fan-in` | - -**Recipe `actions`:** with **`--json`**, recipes that define an `actions` template append it to every row (kebab-case verb + description — e.g. `fan-out` → `review-coupling`). Inspect via **`--recipes-json`**. Ad-hoc SQL never carries actions. 
+| Action | Command | +| --------------------------------- | ------------------------------------------------------------------------------------------------ | +| Incremental index | `codemap` | +| Query (JSON — default for agents) | `codemap query --json ""` | +| Query (ASCII table — optional) | `codemap query ""` | +| Query (recipe) | `codemap query --json --recipe fan-out` (see **`codemap query --help`**) | +| Recipe catalog (JSON) | `codemap query --recipes-json` | +| Print one recipe’s SQL | `codemap query --print-sql fan-out` | +| Counts only | `codemap query --json --summary -r deprecated-symbols` | +| PR-scoped rows | `codemap query --json --changed-since origin/main -r fan-out` | +| Bucket by owner / dir / pkg | `codemap query --json --group-by directory -r fan-in` | +| Save / diff a baseline | `codemap query --save-baseline -r visibility-tags` then `… --json --baseline -r visibility-tags` | +| List / drop baselines | `codemap query --baselines` · `codemap query --drop-baseline ` | + +**Recipe `actions`:** with **`--json`**, recipes that define an `actions` template append it to every row (kebab-case verb + description — e.g. `fan-out` → `review-coupling`). Under `--baseline`, actions attach to the **`added`** rows only. Inspect via **`--recipes-json`**. Ad-hoc SQL never carries actions. + +**Baselines** (`query_baselines` table inside `.codemap.db`, no parallel JSON files): `--save-baseline[=]` snapshots a result set; `--baseline[=]` diffs the current result against it (added / removed rows; identity = `JSON.stringify(row)`). Name defaults to the `--recipe` id; ad-hoc SQL needs an explicit `=`. Survives `--full` and SCHEMA bumps. **Bundled rules/skills:** **`codemap agents init`** writes **`.agents/`** from the package (see [docs/agents.md](../../../docs/agents.md)). 
diff --git a/templates/agents/skills/codemap/SKILL.md b/templates/agents/skills/codemap/SKILL.md index 1f3f849..40674b0 100644 --- a/templates/agents/skills/codemap/SKILL.md +++ b/templates/agents/skills/codemap/SKILL.md @@ -41,7 +41,10 @@ Replace placeholders (`'...'`) with your module path, file glob, or symbol name. - **`--summary`** — counts only. With **`--json`**: **`{"count": N}`**. With **`--group-by`**: **`{"group_by": "", "groups": [{key, count}]}`**. - **`--changed-since `** — post-filter rows by **`path`** / **`file_path`** / **`from_path`** / **`to_path`** / **`resolved_path`** against **`git diff --name-only ...HEAD ∪ git status --porcelain`**. Rows with no recognised path column pass through. - **`--group-by owner|directory|package`** — partition into buckets and emit **`{"group_by", "groups": [{key, count, rows}]}`**. **`owner`** reads CODEOWNERS (last matching rule wins); **`directory`** is the first path segment; **`package`** uses **`package.json`** **`workspaces`** or **`pnpm-workspace.yaml`**. -- **Per-row recipe `actions`** — recipes that define an **`actions: [{type, auto_fixable?, description?}]`** template append it to every row in **`--json`** output (recipe-only; ad-hoc SQL never carries actions). Inspect via **`--recipes-json`**. +- **`--save-baseline[=]`** — snapshot the result rows to the **`query_baselines`** table inside `.codemap.db` (no parallel JSON files; survives `--full` and SCHEMA bumps). Name defaults to the `--recipe` id; ad-hoc SQL needs an explicit `=`. Re-saving with the same name overwrites in place. +- **`--baseline[=]`** — diff the current result against the saved baseline. Output `{baseline:{...}, current_row_count, added: [...], removed: [...]}` (with `--json`) or a two-section terminal dump. Identity = per-row `JSON.stringify` equality. Pair with `--summary` for `{added: N, removed: N}`. +- **`--baselines`** lists saved baselines (no `rows_json` payload); **`--drop-baseline `** deletes one. 
+- **Per-row recipe `actions`** — recipes that define an **`actions: [{type, auto_fixable?, description?}]`** template append it to every row in **`--json`** output (recipe-only; ad-hoc SQL never carries actions). Under `--baseline`, actions attach to the **`added`** rows only (the rows the agent should act on). Inspect via **`--recipes-json`**. **Determinism:** Bundled recipes use stable secondary **`ORDER BY`** tie-breakers (and ordered inner **`LIMIT`** samples where applicable). Prefer **`--recipe`** over pasting SQL when you need the maintained ordering. **Canonical SQL** is whatever **`codemap query --print-sql `** or **`codemap query --recipes-json`** returns (single source in the CLI). @@ -212,6 +215,20 @@ LIMIT 10 | `project_root` | Absolute path to project | | `schema_version` | Schema version number | +### `query_baselines` — Saved query result snapshots (user data) + +User-facing baselines saved by `codemap query --save-baseline`, replayed by `codemap query --baseline`. **Survives `--full` and SCHEMA bumps** — intentionally absent from `dropAll()`. 
+ +| Column | Type | Description | +| ---------- | ------- | ---------------------------------------------------------------------------------------- | +| name | TEXT PK | User-supplied name; defaults to the `--recipe` id (ad-hoc SQL requires an explicit name) | +| recipe_id | TEXT | The `--recipe` id when known; NULL for ad-hoc SQL | +| sql | TEXT | The SQL that produced the snapshot | +| rows_json | TEXT | Canonical `JSON.stringify(rows)` — set-diff identity = per-row JSON-stringify equality | +| row_count | INTEGER | Cached length of `rows_json` | +| git_ref | TEXT | `git rev-parse HEAD` at save time, or NULL when not a git working tree | +| created_at | INTEGER | `Date.now()` at save time (epoch ms) | + ## Query patterns ### Basic lookups From 577854861ed73492ec2125abcb9d9eaa60ffc27d Mon Sep 17 00:00:00 2001 From: Sutu Sebastian Date: Fri, 1 May 2026 10:32:10 +0300 Subject: [PATCH 2/3] chore(agents): record .agents/ self-audit findings from PR #30 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two lessons appended after auditing this PR against .agents/rules/: - Backticks inside SQL/help-text template literals — hit twice now (B.7 schema comment + B.6 help text). The cmd-query.ts help string and db.ts CREATE TABLE strings are both `template literals`; a Markdown-style `--flag` code-fence inside terminates the literal early and TypeScript explodes several lines later with a cryptic "expected `,` or `)`". Lesson: use plain prose in those strings, or escape with \\\`. - STOP-before-Grep applies to symbol lookups too — used Grep for `printQueryResult`, `getCurrentCommit`, `dropAll` in PR #30 when `SELECT … FROM symbols WHERE name = ?` was the right tool. The codemap rule already covers this; lesson clarifies that "symbol lookup" is the trigger, not "structural question." Also slim two non-earning code comments per concise-comments rule. 
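A minimal sketch of the template-literal pitfall described above. The strings are hypothetical stand-ins, not the real `printQueryCmdHelp()` or `createTables()` text. The escaped form keeps an inner backtick inside the literal, and the plain-prose form (the option the lesson prefers) avoids the character entirely.

```typescript
// Hypothetical help strings, NOT copied from cmd-query.ts.
// A bare inner backtick here would end the literal early and break parsing,
// so the first variant escapes each inner backtick with a backslash.
const escapedHelp = `Baselines survive \`--full\` rebuilds.`;

// Preferred variant per the lesson: plain prose, no inner backticks at all.
const plainHelp = `Baselines survive --full rebuilds.`;

console.log(escapedHelp); // Baselines survive `--full` rebuilds.
console.log(plainHelp); // Baselines survive --full rebuilds.
```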
--- .agents/lessons.md | 2 ++ src/cli/cmd-query.ts | 3 --- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/.agents/lessons.md b/.agents/lessons.md index 89b7310..ba03283 100644 --- a/.agents/lessons.md +++ b/.agents/lessons.md @@ -10,3 +10,5 @@ Each entry is a single bullet: `- **** — `. Newest entries at t - **changesets bump policy (pre-v1)** — while in `0.x`, default to **patch** for everything (additive features, fixes, docs, internal refactors); reserve **minor** for schema-breaking changes that force a `.codemap.db` rebuild (matches 0.2.0 precedent: new tables/columns/`SCHEMA_VERSION` bump). Strict SemVer kicks in only after `1.0.0`. Don't propose `minor` just because new CLI commands or public types were added. - **agent rule + skill maintenance** — when shipping a CLI flag, recipe id, recipe `actions` template, schema column, or any agent-queryable surface, update **both** copies of the codemap rule + skill in the **same PR** per [docs/README.md Rule 10](../docs/README.md): `templates/agents/rules/codemap.md` + `templates/agents/skills/codemap/SKILL.md` (ships to npm) **and** `.agents/rules/codemap.md` + `.agents/skills/codemap/SKILL.md` (this clone's mirror). Drift between the two pairs should be CLI-prefix-only (`codemap` vs `bun src/index.ts`). Forgetting this leaves installed agents with a stale view of the CLI — that's how `--summary` / `--changed-since` / `--group-by` / `actions` / `symbols.visibility` shipped without any `templates/agents/` mention until PR #29 retro-fixed it. +- **backticks inside SQL or help-text template literals** — never put a literal backtick inside a `` `...` `` template-literal string. 
`db.ts` SQL DDL strings (multi-line CREATE TABLE templates) and `printQueryCmdHelp()` (multi-line help text) are both `` `...` `` template literals; an inner backtick — typically a Markdown-style code-fence around a flag like `` `--full` `` — terminates the literal early and the parser blows up several lines later with cryptic "expected `,` or `)`" errors. **Use plain prose in those strings** (`--full` not `` `--full` ``), or escape (`` \` ``) if you really need the character. Hit twice (B.7 + B.6 PR #30); the lesson is general — applies to any TS template literal that gets pasted prose later, not just SQL / help text. +- **STOP-before-Grep applies to symbol lookups too** — `Grep` for symbol names like `printQueryResult`, `getCurrentCommit`, `dropAll` violates the [`codemap` rule](rules/codemap.md). The codemap query `SELECT file_path, line_start FROM symbols WHERE name = ''` answers it faster and without scanning. Reach for `Grep` only when the question is content-shaped (regex over file bodies, finding pattern usages inside function bodies, etc.) — not when it's "where is X defined / who calls X / what does file Y export." This was a PR #30 self-correction. diff --git a/src/cli/cmd-query.ts b/src/cli/cmd-query.ts index 29f64e3..058bd8c 100644 --- a/src/cli/cmd-query.ts +++ b/src/cli/cmd-query.ts @@ -125,7 +125,6 @@ export function parseQueryRest(rest: string[]): i += 2; continue; } - // --save-baseline | --save-baseline= | --save-baseline if (a === "--save-baseline" || a.startsWith("--save-baseline=")) { const eq = a.indexOf("="); if (eq !== -1) { @@ -151,7 +150,6 @@ export function parseQueryRest(rest: string[]): i++; continue; } - // --baseline | --baseline= | --baseline if (a === "--baseline" || a.startsWith("--baseline=")) { const eq = a.indexOf("="); if (eq !== -1) { @@ -946,7 +944,6 @@ function runBaselineDiff(opts: { return; } - // Terminal mode — readable two-section dump. 
console.log( `baseline "${diff.baseline.name}": ${diff.baseline.row_count} rows → ${diff.current_row_count} rows (+${added.length} / -${removed.length})`, ); From ff6986e35215628338feb03bcba42cec74795daf Mon Sep 17 00:00:00 2001 From: Sutu Sebastian Date: Fri, 1 May 2026 10:43:35 +0300 Subject: [PATCH 3/3] fix(query): address PR #30 CodeRabbit feedback (multiset diff, mutex guards, doc payloads) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Six actionable + one nitpick (one Major), all verified correct. Code: - diffRows: switch from naive Set to multiset frequency-map diff. Naive Set([A,A]) vs Set([A]) reported no removal — wrong for non-DISTINCT queries (e.g. `SELECT name FROM symbols`). Now baseline [A,A] vs current [A] correctly reports removed: [A]. - Parser: --group-by + --save-baseline / --baseline now errors at parse time. Previously runQueryCmd routed to the baseline branch first and silently dropped --group-by. - Parser: --baselines and --drop-baseline now reject --summary, --changed-since, and --group-by (in addition to the existing recipe / SQL / save / baseline checks). Was silently accepted-and-ignored. Docs (synced across architecture.md, README.md, AND both copies of SKILL.md per Rule 10): - --baseline --summary payload corrected: includes baseline + current_row_count alongside added/removed counts (was documented as just {added: N, removed: N}). - README baseline section calls out current_row_count, ad-hoc-needs-name, --group-by mutex. - SKILL.md row_count description: "Cached length of rows_json" was ambiguous (could mean character length); now "Cached number of rows in the saved result set." - SKILL.md --group-by description: "Mutually exclusive with --save-baseline / --baseline." Mirrored on the --baseline side too. - rows_json description: "multiset diff identity (duplicate rows preserved)" instead of "set-diff identity = per-row JSON-stringify equality." 
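For reviewers, here is a minimal standalone sketch of the cardinality bug that the diffRows change fixes. naiveSetDiff and multisetDiff are hypothetical names used only for illustration; they are not the exported diffRows. Both key rows on JSON.stringify(row), which is sensitive to key order. The sketch therefore assumes, as the saved baselines do, that rows produced by the same SQL serialize their columns in the same order.

```ts
type Row = Record<string, unknown>;

// Hypothetical: the pre-fix shape, which tests set membership on JSON.stringify(row).
function naiveSetDiff(baseline: Row[], current: Row[]) {
  const baseSet = new Set(baseline.map((r) => JSON.stringify(r)));
  const curSet = new Set(current.map((r) => JSON.stringify(r)));
  return {
    added: current.filter((r) => !baseSet.has(JSON.stringify(r))),
    removed: baseline.filter((r) => !curSet.has(JSON.stringify(r))),
  };
}

// Hypothetical: multiset diff. Each row in `from` consumes one matching
// occurrence counted from `against`. Rows with no occurrence left are leftovers.
function multisetDiff(baseline: Row[], current: Row[]) {
  const consume = (from: Row[], against: Row[]): Row[] => {
    const counts = new Map<string, number>();
    for (const r of against) {
      const k = JSON.stringify(r);
      counts.set(k, (counts.get(k) ?? 0) + 1);
    }
    const leftover: Row[] = [];
    for (const r of from) {
      const k = JSON.stringify(r);
      const n = counts.get(k) ?? 0;
      if (n > 0) counts.set(k, n - 1);
      else leftover.push(r);
    }
    return leftover;
  };
  return {
    added: consume(current, baseline),
    removed: consume(baseline, current),
  };
}

// Non-DISTINCT result: the baseline had two identical rows, and one is gone now.
const baseline = [{ name: "A" }, { name: "A" }];
const current = [{ name: "A" }];
console.log(naiveSetDiff(baseline, current).removed.length); // 0 (removal missed)
console.log(multisetDiff(baseline, current).removed.length); // 1
```

Counting occurrences, rather than only testing membership, is what preserves cardinality. In the example above, the second A in the baseline finds no remaining partner in current, so it lands in removed.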
Tests: - New diffRows multiset suite (6 cases including 3-of-3 duplicates and per-key independence). - New parser tests: --group-by + baseline mutex, --baselines / --drop-baseline no-op-flag rejection. - New db round-trip test: query_baselines survives dropAll() — the schema-rebuild contract that's the marquee of B.6. Export diffRows so it can be unit-tested in isolation; runtime callers already use it through the same module. --- .agents/skills/codemap/SKILL.md | 10 +-- README.md | 7 +- docs/architecture.md | 2 +- src/cli/cmd-query.test.ts | 102 ++++++++++++++++++++++- src/cli/cmd-query.ts | 65 +++++++++++---- src/db.test.ts | 28 +++++++ templates/agents/skills/codemap/SKILL.md | 10 +-- 7 files changed, 196 insertions(+), 28 deletions(-) diff --git a/.agents/skills/codemap/SKILL.md b/.agents/skills/codemap/SKILL.md index 833016c..d77e2d4 100644 --- a/.agents/skills/codemap/SKILL.md +++ b/.agents/skills/codemap/SKILL.md @@ -40,10 +40,10 @@ Replace placeholders (`'...'`) with your module path, file glob, or symbol name. - **`--summary`** — counts only. With **`--json`**: **`{"count": N}`**. With **`--group-by`**: **`{"group_by": "", "groups": [{key, count}]}`**. - **`--changed-since `** — post-filter rows by **`path`** / **`file_path`** / **`from_path`** / **`to_path`** / **`resolved_path`** against **`git diff --name-only ...HEAD ∪ git status --porcelain`**. Rows with no recognised path column pass through. -- **`--group-by owner|directory|package`** — partition into buckets and emit **`{"group_by", "groups": [{key, count, rows}]}`**. **`owner`** reads CODEOWNERS (last matching rule wins); **`directory`** is the first path segment; **`package`** uses **`package.json`** **`workspaces`** or **`pnpm-workspace.yaml`**. +- **`--group-by owner|directory|package`** — partition into buckets and emit **`{"group_by", "groups": [{key, count, rows}]}`**. 
**`owner`** reads CODEOWNERS (last matching rule wins); **`directory`** is the first path segment; **`package`** uses **`package.json`** **`workspaces`** or **`pnpm-workspace.yaml`**. **Mutually exclusive with `--save-baseline` / `--baseline`.** - **`--save-baseline[=]`** — snapshot the result rows to the **`query_baselines`** table inside `.codemap.db` (no parallel JSON files; survives `--full` and SCHEMA bumps). Name defaults to the `--recipe` id; ad-hoc SQL needs an explicit `=`. Re-saving with the same name overwrites in place. -- **`--baseline[=]`** — diff the current result against the saved baseline. Output `{baseline:{...}, current_row_count, added: [...], removed: [...]}` (with `--json`) or a two-section terminal dump. Identity = per-row `JSON.stringify` equality. Pair with `--summary` for `{added: N, removed: N}`. -- **`--baselines`** lists saved baselines (no `rows_json` payload); **`--drop-baseline `** deletes one. +- **`--baseline[=]`** — diff the current result against the saved baseline. Output `{baseline:{...}, current_row_count, added: [...], removed: [...]}` (with `--json`) or a two-section terminal dump. Identity = per-row multiset equality (canonical `JSON.stringify` keyed frequency map; duplicates preserved). Pair with `--summary` for `{baseline:{...}, current_row_count, added: N, removed: N}`. **Mutually exclusive with `--group-by`.** +- **`--baselines`** lists saved baselines (no `rows_json` payload); **`--drop-baseline `** deletes one. Both reject every other flag — they're list-only / drop-only operations. - **Per-row recipe `actions`** — recipes that define an **`actions: [{type, auto_fixable?, description?}]`** template append it to every row in **`--json`** output (recipe-only; ad-hoc SQL never carries actions). Under `--baseline`, actions attach to the **`added`** rows only (the rows the agent should act on). Inspect via **`--recipes-json`**. 
**Determinism:** Bundled recipes use stable secondary **`ORDER BY`** tie-breakers (and ordered inner **`LIMIT`** samples where applicable). Prefer **`--recipe`** over pasting SQL when you need the maintained ordering. **Canonical SQL** is **`src/cli/query-recipes.ts`** (`QUERY_RECIPES`). @@ -224,8 +224,8 @@ User-facing baselines saved by `codemap query --save-baseline`, replayed by `cod | name | TEXT PK | User-supplied name; defaults to the `--recipe` id (ad-hoc SQL requires an explicit name) | | recipe_id | TEXT | The `--recipe` id when known; NULL for ad-hoc SQL | | sql | TEXT | The SQL that produced the snapshot | -| rows_json | TEXT | Canonical `JSON.stringify(rows)` — set-diff identity = per-row JSON-stringify equality | -| row_count | INTEGER | Cached length of `rows_json` | +| rows_json | TEXT | Canonical `JSON.stringify(rows)` — multiset diff identity (duplicate rows preserved) | +| row_count | INTEGER | Cached number of rows in the saved result set | | git_ref | TEXT | `git rev-parse HEAD` at save time, or NULL when not a git working tree | | created_at | INTEGER | `Date.now()` at save time (epoch ms) | diff --git a/README.md b/README.md index 6532405..e553ed6 100644 --- a/README.md +++ b/README.md @@ -87,10 +87,13 @@ codemap query --json --group-by owner -r deprecated-symbols codemap query --json --summary --group-by package "SELECT file_path FROM symbols" # Snapshot a result, refactor, then diff (saved inside .codemap.db, no JSON files) codemap query --save-baseline -r visibility-tags # save under name "visibility-tags" -codemap query --json --baseline -r visibility-tags # full diff: {baseline, added, removed} -codemap query --json --summary --baseline -r visibility-tags # counts only: {added, removed} +codemap query --json --baseline -r visibility-tags # full diff: {baseline, current_row_count, added, removed} +codemap query --json --summary --baseline -r visibility-tags # counts only: {baseline, current_row_count, added: N, removed: N} +codemap query 
--save-baseline=pre-refactor "SELECT file_path FROM symbols" # ad-hoc SQL needs an explicit = +codemap query --baseline=pre-refactor "SELECT file_path FROM symbols" codemap query --baselines # list saved baselines codemap query --drop-baseline visibility-tags # delete +# --group-by is mutually exclusive with --save-baseline / --baseline (different output shapes) # Recipes that define per-row action templates append "actions" hints (kebab-case verb + # description) in --json output; ad-hoc SQL never carries actions. Inspect via --recipes-json. # List bundled recipes as JSON, or print one recipe's SQL (no DB required) diff --git a/docs/architecture.md b/docs/architecture.md index 71011e7..4c1f8a1 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -117,7 +117,7 @@ A local SQLite database (`.codemap.db`) indexes the project tree and stores stru **Commands and flags** (index, query, **`codemap agents init`**, **`--root`**, **`--config`**, environment): [../README.md § CLI](../README.md#cli) — **do not duplicate** flag lists here; this section only adds implementation notes. From this repository: **`bun run dev`** or **`bun src/index.ts`** (same flags). -**Query wiring:** **`src/cli/cmd-query.ts`** (argv, **`printQueryResult`**, `--recipe` / `-r` alias, **`--summary`**, **`--changed-since`**, **`--group-by`**, **`--save-baseline`** / **`--baseline`** / **`--baselines`** / **`--drop-baseline`**), **`src/cli/query-recipes.ts`** (**`QUERY_RECIPES`** — bundled SQL only source; optional **`actions: RecipeAction[]`** per recipe), **`src/cli/main.ts`** (**`--recipes-json`** / **`--print-sql`** exit before config/DB). With **`--json`**, errors use **`{"error":"…"}`** on stdout for SQL failures, DB open, and bootstrap (same shape); **`runQueryCmd`** sets **`process.exitCode`** instead of **`process.exit`**. Friendlier "no `.codemap.db`" — `no such table: ` and `no such column: ` errors are rewritten in **`enrichQueryError`** to point at `codemap` / `codemap --full`. 
**`--summary`** filters output only — the SQL still executes against the index; output collapses to `{"count": N}` (with `--json`) or `count: N`. **`--changed-since `** post-filters result rows by `path` / `file_path` / `from_path` / `to_path` / `resolved_path` against `git diff --name-only ...HEAD ∪ git status --porcelain` (helper: **`src/git-changed.ts`** — `getFilesChangedSince`, `filterRowsByChangedFiles`, `PATH_COLUMNS`); rows with no recognised path column pass through. **`--group-by `** (`owner` | `directory` | `package`) routes through **`runGroupedQuery`** in `cmd-query.ts` and emits `{"group_by": "", "groups": [{key, count, rows}]}` (or `[{key, count}]` with `--summary`); helpers in **`src/group-by.ts`** (`groupRowsBy`, `firstDirectory`, `loadCodeowners`, `discoverWorkspaceRoots`, `makePackageBucketizer`, `codeownersGlobToRegex`). CODEOWNERS lookup is last-match-wins (GitHub semantics); workspace discovery reads `package.json` `workspaces` and `pnpm-workspace.yaml` `packages:`. **`--save-baseline[=]`** snapshots the result to the **`query_baselines`** table inside `.codemap.db` (no parallel JSON files; survives `--full` / SCHEMA bumps because the table is intentionally absent from `dropAll()`); name defaults to `--recipe` id, ad-hoc SQL needs an explicit name. **`--baseline[=]`** replays the SQL, fetches the saved row set, and emits `{baseline:{...}, current_row_count, added: [...], removed: [...]}` (or `{added: N, removed: N}` with `--summary`); identity is per-row `JSON.stringify` equality, no fuzzy "changed" category in v1. **`--baselines`** (read-only list) and **`--drop-baseline `** complete the surface; helpers in **`src/db.ts`** (`upsertQueryBaseline`, `getQueryBaseline`, `listQueryBaselines`, `deleteQueryBaseline`). 
**Per-row recipe `actions`** are appended only when the user runs **`--recipe `** with **`--json`** AND the recipe defines an `actions` template — programmatic `cm.query(sql)` and ad-hoc CLI SQL never carry actions; under `--baseline`, actions attach to `added` rows only (the rows the agent should act on). The **`components-by-hooks`** recipe ranks by hook count with a **comma-based tally** on **`hooks_used`** (no SQLite JSON1). Shipped **`templates/agents/`** documents **`codemap query --json`** as the primary agent example ([README § CLI](../README.md#cli)). +**Query wiring:** **`src/cli/cmd-query.ts`** (argv, **`printQueryResult`**, `--recipe` / `-r` alias, **`--summary`**, **`--changed-since`**, **`--group-by`**, **`--save-baseline`** / **`--baseline`** / **`--baselines`** / **`--drop-baseline`**), **`src/cli/query-recipes.ts`** (**`QUERY_RECIPES`** — bundled SQL only source; optional **`actions: RecipeAction[]`** per recipe), **`src/cli/main.ts`** (**`--recipes-json`** / **`--print-sql`** exit before config/DB). With **`--json`**, errors use **`{"error":"…"}`** on stdout for SQL failures, DB open, and bootstrap (same shape); **`runQueryCmd`** sets **`process.exitCode`** instead of **`process.exit`**. Friendlier "no `.codemap.db`" — `no such table: ` and `no such column: ` errors are rewritten in **`enrichQueryError`** to point at `codemap` / `codemap --full`. **`--summary`** filters output only — the SQL still executes against the index; output collapses to `{"count": N}` (with `--json`) or `count: N`. **`--changed-since `** post-filters result rows by `path` / `file_path` / `from_path` / `to_path` / `resolved_path` against `git diff --name-only ...HEAD ∪ git status --porcelain` (helper: **`src/git-changed.ts`** — `getFilesChangedSince`, `filterRowsByChangedFiles`, `PATH_COLUMNS`); rows with no recognised path column pass through. 
**`--group-by `** (`owner` | `directory` | `package`) routes through **`runGroupedQuery`** in `cmd-query.ts` and emits `{"group_by": "", "groups": [{key, count, rows}]}` (or `[{key, count}]` with `--summary`); helpers in **`src/group-by.ts`** (`groupRowsBy`, `firstDirectory`, `loadCodeowners`, `discoverWorkspaceRoots`, `makePackageBucketizer`, `codeownersGlobToRegex`). CODEOWNERS lookup is last-match-wins (GitHub semantics); workspace discovery reads `package.json` `workspaces` and `pnpm-workspace.yaml` `packages:`. **`--save-baseline[=]`** snapshots the result to the **`query_baselines`** table inside `.codemap.db` (no parallel JSON files; survives `--full` / SCHEMA bumps because the table is intentionally absent from `dropAll()`); name defaults to `--recipe` id, ad-hoc SQL needs an explicit name. **`--baseline[=]`** replays the SQL, fetches the saved row set, and emits `{baseline:{...}, current_row_count, added: [...], removed: [...]}` (or `{baseline:{...}, current_row_count, added: N, removed: N}` with `--summary`); identity is per-row multiset equality (canonical `JSON.stringify` keyed frequency map — duplicate rows are tracked, not collapsed). No fuzzy "changed" category in v1. **`--group-by` is mutually exclusive** with both `--save-baseline` and `--baseline` (different output shapes). **`--baselines`** (read-only list) and **`--drop-baseline `** complete the surface; helpers in **`src/db.ts`** (`upsertQueryBaseline`, `getQueryBaseline`, `listQueryBaselines`, `deleteQueryBaseline`). **Per-row recipe `actions`** are appended only when the user runs **`--recipe `** with **`--json`** AND the recipe defines an `actions` template — programmatic `cm.query(sql)` and ad-hoc CLI SQL never carry actions; under `--baseline`, actions attach to `added` rows only (the rows the agent should act on). The **`components-by-hooks`** recipe ranks by hook count with a **comma-based tally** on **`hooks_used`** (no SQLite JSON1). 
Shipped **`templates/agents/`** documents **`codemap query --json`** as the primary agent example ([README § CLI](../README.md#cli)). **Validate wiring:** **`src/cli/cmd-validate.ts`** — **`computeValidateRows`** is a pure function over `(db, projectRoot, paths)` returning `{path, status}` rows where `status ∈ stale | missing | unindexed`. CLI wraps it with read-once-and-print + exits **1** on any drift (git-status semantics). Path normalization: **`toProjectRelative`** converts CLI input to POSIX-style relative keys matching the `files.path` storage format (Windows backslash → forward slash); same convention as `lint-staged.config.js`. diff --git a/src/cli/cmd-query.test.ts b/src/cli/cmd-query.test.ts index 1188ce3..f572b3a 100644 --- a/src/cli/cmd-query.test.ts +++ b/src/cli/cmd-query.test.ts @@ -1,6 +1,6 @@ import { describe, expect, it } from "bun:test"; -import { parseQueryRest } from "./cmd-query"; +import { diffRows, parseQueryRest } from "./cmd-query"; import { getQueryRecipeActions, getQueryRecipeSql, @@ -282,6 +282,106 @@ describe("parseQueryRest", () => { if (r.kind === "error") expect(r.message).toContain("--drop-baseline"); }); + it("errors when --group-by is combined with --save-baseline or --baseline", () => { + const r1 = parseQueryRest([ + "query", + "--group-by", + "directory", + "--save-baseline", + "-r", + "fan-out", + ]); + expect(r1.kind).toBe("error"); + if (r1.kind === "error") expect(r1.message).toContain("--group-by"); + + const r2 = parseQueryRest([ + "query", + "--group-by", + "directory", + "--baseline", + "-r", + "fan-out", + ]); + expect(r2.kind).toBe("error"); + if (r2.kind === "error") expect(r2.message).toContain("--group-by"); + }); + + it("errors when --baselines is combined with no-op flags", () => { + expect(parseQueryRest(["query", "--summary", "--baselines"]).kind).toBe( + "error", + ); + expect( + parseQueryRest(["query", "--changed-since", "main", "--baselines"]).kind, + ).toBe("error"); + expect( + parseQueryRest(["query", 
"--group-by", "directory", "--baselines"]).kind, + ).toBe("error"); + }); + + it("errors when --drop-baseline is combined with no-op flags", () => { + expect( + parseQueryRest(["query", "--summary", "--drop-baseline", "x"]).kind, + ).toBe("error"); + expect( + parseQueryRest([ + "query", + "--group-by", + "directory", + "--drop-baseline", + "x", + ]).kind, + ).toBe("error"); + }); +}); + +describe("diffRows (multiset)", () => { + it("reports no diff when baseline equals current", () => { + expect(diffRows([{ a: 1 }], [{ a: 1 }])).toEqual({ + added: [], + removed: [], + }); + }); + + it("reports added rows as the new ones not in baseline", () => { + expect(diffRows([{ a: 1 }], [{ a: 1 }, { a: 2 }])).toEqual({ + added: [{ a: 2 }], + removed: [], + }); + }); + + it("reports removed rows as those gone from current", () => { + expect(diffRows([{ a: 1 }, { a: 2 }], [{ a: 1 }])).toEqual({ + added: [], + removed: [{ a: 2 }], + }); + }); + + it("preserves duplicate-row cardinality (multiset, not set)", () => { + // Baseline [A, A] vs current [A]: one A is removed, NOT zero. 
+ expect(diffRows([{ a: 1 }, { a: 1 }], [{ a: 1 }])).toEqual({ + added: [], + removed: [{ a: 1 }], + }); + }); + + it("matches three-of-three duplicates", () => { + expect( + diffRows([{ a: 1 }, { a: 1 }, { a: 1 }], [{ a: 1 }, { a: 1 }]), + ).toEqual({ added: [], removed: [{ a: 1 }] }); + }); + + it("handles per-key independence in mixed multisets", () => { + expect( + diffRows( + [{ k: "x" }, { k: "x" }, { k: "y" }], + [{ k: "x" }, { k: "y" }, { k: "y" }, { k: "z" }], + ), + ).toEqual({ + added: [{ k: "y" }, { k: "z" }], + removed: [{ k: "x" }], + }); + }); + it("errors when --changed-since has no ref", () => { const r = parseQueryRest(["query", "--changed-since"]); expect(r.kind).toBe("error"); diff --git a/src/cli/cmd-query.ts b/src/cli/cmd-query.ts index 058bd8c..e92e593 100644 --- a/src/cli/cmd-query.ts +++ b/src/cli/cmd-query.ts @@ -251,12 +251,15 @@ export function parseQueryRest(rest: string[]): saveBaseline !== undefined || baseline !== undefined || dropBaselineName !== undefined || + summary || + changedSince !== undefined || + groupBy !== undefined || i < rest.length ) { return { kind: "error", message: - "codemap: --baselines does not take SQL, --recipe, --save-baseline, --baseline, or --drop-baseline.", + "codemap: --baselines is a list-only operation; it does not take SQL or any other --recipe / --save-baseline / --baseline / --drop-baseline / --summary / --changed-since / --group-by flag.", }; } return { kind: "listBaselines", json }; @@ -268,12 +271,15 @@ export function parseQueryRest(rest: string[]): printSqlId !== undefined || saveBaseline !== undefined || baseline !== undefined || + summary || + changedSince !== undefined || + groupBy !== undefined || i < rest.length ) { return { kind: "error", message: - "codemap: --drop-baseline does not take SQL or other baseline / recipe flags.", + "codemap: --drop-baseline only takes a name; it does not compose with SQL or any other --recipe / --save-baseline / --baseline / --summary / --changed-since / 
--group-by flag.", }; } return { kind: "dropBaseline", name: dropBaselineName, json }; @@ -286,6 +292,16 @@ export function parseQueryRest(rest: string[]): "codemap: --save-baseline and --baseline are mutually exclusive in one run.", }; } + if ( + groupBy !== undefined && + (saveBaseline !== undefined || baseline !== undefined) + ) { + return { + kind: "error", + message: + "codemap: --group-by cannot be combined with --save-baseline or --baseline (different output shapes).", + }; + } if (printSqlId !== undefined) { if (recipeId !== undefined) { @@ -835,20 +851,41 @@ interface BaselineDiff { removed: unknown[]; } -// Set-membership = canonical JSON.stringify(row). For v1 we don't compute a -// "changed" category — it requires a row-key heuristic and the agent can -// re-derive richer diffs from the underlying rows if needed. -function diffRows( +// Multiset diff keyed on canonical JSON.stringify(row). Naive set-diff would +// collapse duplicates: a baseline of [A, A] vs current [A] would report no +// removal even though one A is gone. Frequency maps preserve cardinality so +// non-DISTINCT queries (e.g. `SELECT name FROM symbols`) diff correctly. +// Still no "changed" category — that needs a row-key heuristic; agents can +// derive richer diffs from the raw row sets if needed. +export function diffRows( baseline: unknown[], current: unknown[], -): { - added: unknown[]; - removed: unknown[]; -} { - const baseSet = new Set(baseline.map((r) => JSON.stringify(r))); - const curSet = new Set(current.map((r) => JSON.stringify(r))); - const added = current.filter((r) => !baseSet.has(JSON.stringify(r))); - const removed = baseline.filter((r) => !curSet.has(JSON.stringify(r))); +): { added: unknown[]; removed: unknown[] } { + const countKeys = (rows: unknown[]) => { + const m = new Map(); + for (const r of rows) { + const k = JSON.stringify(r); + m.set(k, (m.get(k) ?? 
0) + 1); + } + return m; + }; + const baseCounts = countKeys(baseline); + const curCounts = countKeys(current); + + const added: unknown[] = []; + for (const r of current) { + const k = JSON.stringify(r); + const remaining = baseCounts.get(k) ?? 0; + if (remaining > 0) baseCounts.set(k, remaining - 1); + else added.push(r); + } + const removed: unknown[] = []; + for (const r of baseline) { + const k = JSON.stringify(r); + const remaining = curCounts.get(k) ?? 0; + if (remaining > 0) curCounts.set(k, remaining - 1); + else removed.push(r); + } return { added, removed }; } diff --git a/src/db.test.ts b/src/db.test.ts index ca09ff5..a45e643 100644 --- a/src/db.test.ts +++ b/src/db.test.ts @@ -5,6 +5,7 @@ import { createIndexes, createTables, deleteQueryBaseline, + dropAll, getMeta, getAllFileHashes, getQueryBaseline, @@ -186,4 +187,31 @@ describe("SQLite layer (in-memory)", () => { closeDb(db); } }); + + it("query_baselines survives dropAll() — the schema-rebuild contract", () => { + const db = openCodemapDatabase(":memory:"); + try { + createTables(db); + upsertQueryBaseline(db, { + name: "fan-out", + recipe_id: "fan-out", + sql: "SELECT 1", + rows_json: "[]", + row_count: 0, + git_ref: null, + created_at: 1, + }); + + // dropAll() is what `--full` and SCHEMA_VERSION-mismatch rebuilds invoke. + // The headline contract of B.6 is that user baselines survive that path — + // exercise it explicitly so a future schema refactor can't silently break it. 
+ dropAll(db); + createTables(db); + + expect(listQueryBaselines(db).map((b) => b.name)).toEqual(["fan-out"]); + expect(getQueryBaseline(db, "fan-out")?.recipe_id).toBe("fan-out"); + } finally { + closeDb(db); + } + }); }); diff --git a/templates/agents/skills/codemap/SKILL.md b/templates/agents/skills/codemap/SKILL.md index 40674b0..469531a 100644 --- a/templates/agents/skills/codemap/SKILL.md +++ b/templates/agents/skills/codemap/SKILL.md @@ -40,10 +40,10 @@ Replace placeholders (`'...'`) with your module path, file glob, or symbol name. - **`--summary`** — counts only. With **`--json`**: **`{"count": N}`**. With **`--group-by`**: **`{"group_by": "", "groups": [{key, count}]}`**. - **`--changed-since `** — post-filter rows by **`path`** / **`file_path`** / **`from_path`** / **`to_path`** / **`resolved_path`** against **`git diff --name-only ...HEAD ∪ git status --porcelain`**. Rows with no recognised path column pass through. -- **`--group-by owner|directory|package`** — partition into buckets and emit **`{"group_by", "groups": [{key, count, rows}]}`**. **`owner`** reads CODEOWNERS (last matching rule wins); **`directory`** is the first path segment; **`package`** uses **`package.json`** **`workspaces`** or **`pnpm-workspace.yaml`**. +- **`--group-by owner|directory|package`** — partition into buckets and emit **`{"group_by", "groups": [{key, count, rows}]}`**. **`owner`** reads CODEOWNERS (last matching rule wins); **`directory`** is the first path segment; **`package`** uses **`package.json`** **`workspaces`** or **`pnpm-workspace.yaml`**. **Mutually exclusive with `--save-baseline` / `--baseline`.** - **`--save-baseline[=]`** — snapshot the result rows to the **`query_baselines`** table inside `.codemap.db` (no parallel JSON files; survives `--full` and SCHEMA bumps). Name defaults to the `--recipe` id; ad-hoc SQL needs an explicit `=`. Re-saving with the same name overwrites in place. 
-- **`--baseline[=]`** — diff the current result against the saved baseline. Output `{baseline:{...}, current_row_count, added: [...], removed: [...]}` (with `--json`) or a two-section terminal dump. Identity = per-row `JSON.stringify` equality. Pair with `--summary` for `{added: N, removed: N}`. -- **`--baselines`** lists saved baselines (no `rows_json` payload); **`--drop-baseline `** deletes one. +- **`--baseline[=]`** — diff the current result against the saved baseline. Output `{baseline:{...}, current_row_count, added: [...], removed: [...]}` (with `--json`) or a two-section terminal dump. Identity = per-row multiset equality (canonical `JSON.stringify` keyed frequency map; duplicates preserved). Pair with `--summary` for `{baseline:{...}, current_row_count, added: N, removed: N}`. **Mutually exclusive with `--group-by`.** +- **`--baselines`** lists saved baselines (no `rows_json` payload); **`--drop-baseline `** deletes one. Both reject every other flag — they're list-only / drop-only operations. - **Per-row recipe `actions`** — recipes that define an **`actions: [{type, auto_fixable?, description?}]`** template append it to every row in **`--json`** output (recipe-only; ad-hoc SQL never carries actions). Under `--baseline`, actions attach to the **`added`** rows only (the rows the agent should act on). Inspect via **`--recipes-json`**. **Determinism:** Bundled recipes use stable secondary **`ORDER BY`** tie-breakers (and ordered inner **`LIMIT`** samples where applicable). Prefer **`--recipe`** over pasting SQL when you need the maintained ordering. **Canonical SQL** is whatever **`codemap query --print-sql `** or **`codemap query --recipes-json`** returns (single source in the CLI). 
@@ -224,8 +224,8 @@ User-facing baselines saved by `codemap query --save-baseline`, replayed by `cod | name | TEXT PK | User-supplied name; defaults to the `--recipe` id (ad-hoc SQL requires an explicit name) | | recipe_id | TEXT | The `--recipe` id when known; NULL for ad-hoc SQL | | sql | TEXT | The SQL that produced the snapshot | -| rows_json | TEXT | Canonical `JSON.stringify(rows)` — set-diff identity = per-row JSON-stringify equality | -| row_count | INTEGER | Cached length of `rows_json` | +| rows_json | TEXT | Canonical `JSON.stringify(rows)` — multiset diff identity (duplicate rows preserved) | +| row_count | INTEGER | Cached number of rows in the saved result set | | git_ref | TEXT | `git rev-parse HEAD` at save time, or NULL when not a git working tree | | created_at | INTEGER | `Date.now()` at save time (epoch ms) |