diff --git a/.agents/rules/codemap.md b/.agents/rules/codemap.md index b69b659..9648606 100644 --- a/.agents/rules/codemap.md +++ b/.agents/rules/codemap.md @@ -12,33 +12,34 @@ A local database (default **`.codemap/index.db`**) indexes structure: symbols, i ## CLI (this repository) -| Context | Incremental index | Query | -| ------------------------------ | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| **Default** — from this clone | `bun src/index.ts` | `bun src/index.ts query --json ""` | -| Same entry | `bun run dev` | (same as first row) | -| Query (ASCII table — optional) | — | `bun src/index.ts query ""` | -| Recipe | — | `bun src/index.ts query --json --recipe fan-out` (see **`bun src/index.ts query --help`**) | -| Parametrised recipe | — | `bun src/index.ts query --json --recipe find-symbol-by-kind --params kind=function,name_pattern=%Query%` — params declared in recipe `.md` frontmatter and validated before SQL binding. | -| Rename preview | — | `bun src/index.ts query --recipe rename-preview --params old=usePermissions,new=useAccess,kind=function --format diff` — read-only unified diff; codemap never writes files. 
| -| Recipe catalog / SQL | — | `bun src/index.ts query --recipes-json` · `bun src/index.ts query --print-sql fan-out` | -| Counts only | — | `bun src/index.ts query --json --summary -r deprecated-symbols` | -| PR-scoped rows | — | `bun src/index.ts query --json --changed-since origin/main -r fan-out` | -| Bucket by owner / dir / pkg | — | `bun src/index.ts query --json --group-by directory -r fan-in` | -| Save / diff a baseline | — | `bun src/index.ts query --save-baseline -r visibility-tags` then `… --json --baseline -r visibility-tags` | -| List / drop baselines | — | `bun src/index.ts query --baselines` · `bun src/index.ts query --drop-baseline ` | -| Per-delta audit | — | `bun src/index.ts audit --json --baseline base` (auto-resolves `base-files` / `base-dependencies` / `base-deprecated`) | -| Audit vs git ref | — | `bun src/index.ts audit --base origin/main --json` — worktree+reindex against any committish; sub-100ms second run via sha-keyed cache. Mutually exclusive with `--baseline`; per-delta overrides compose. | -| MCP server (for agent hosts) | — | `bun src/index.ts mcp [--no-watch] [--debounce ]` — JSON-RPC on stdio; one tool per CLI verb. Watcher default-ON since 2026-05. See **MCP** section below. | -| HTTP server (for non-MCP) | — | `bun src/index.ts serve [--host 127.0.0.1] [--port 7878] [--token ] [--no-watch] [--debounce ]` — same tool taxonomy over POST /tool/{name}. Watcher default-ON since 2026-05. | -| Watch mode (live reindex) | — | `bun src/index.ts watch [--debounce 250] [--quiet]` — standalone long-running process; debounced reindex on file changes. `mcp` / `serve` boot the watcher in-process by default — pass `--no-watch` (or `CODEMAP_WATCH=0`) to opt out. 
| -| Targeted read (metadata) | — | `bun src/index.ts show [--kind ] [--in ] [--json]` — file:line + signature | -| Targeted read (source text) | — | `bun src/index.ts snippet [--kind ] [--in ] [--json]` — same lookup + source from disk + stale flag | -| Impact (blast-radius walker) | — | `bun src/index.ts impact [--direction up\|down\|both] [--depth N] [--via ] [--limit N] [--summary] [--json]` — replaces hand-composed `WITH RECURSIVE` queries | -| Coverage ingest | — | `bun src/index.ts ingest-coverage [--json]` — Istanbul (`coverage-final.json`) or LCOV (`lcov.info`); format auto-detected. Joinable to `symbols` for "untested AND dead" queries. | -| SARIF / GH annotations | — | `bun src/index.ts query --recipe deprecated-symbols --format sarif` · `… --format annotations` | -| Mermaid graph (≤50 edges) | — | `bun src/index.ts query --format mermaid 'SELECT from_path AS "from", to_path AS "to" FROM dependencies LIMIT 50'` — recipes / SQL must alias columns to `{from, to, label?, kind?}`; rejects unbounded inputs. | -| Diff preview | — | `bun src/index.ts query --format diff ''` — read-only unified diff; `--format diff-json` returns structured hunks for agents. | -| FTS5 full-text (opt-in) | `--with-fts` | `bun src/index.ts --with-fts --full` enables `source_fts` virtual table; `query --recipe text-in-deprecated-functions` demos JOINs. 
| +| Context | Incremental index | Query | +| ------------------------------ | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **Default** — from this clone | `bun src/index.ts` | `bun src/index.ts query --json ""` | +| Same entry | `bun run dev` | (same as first row) | +| Query (ASCII table — optional) | — | `bun src/index.ts query ""` | +| Recipe | — | `bun src/index.ts query --json --recipe fan-out` (see **`bun src/index.ts query --help`**) | +| Parametrised recipe | — | `bun src/index.ts query --json --recipe find-symbol-by-kind --params kind=function,name_pattern=%Query%` — params declared in recipe `.md` frontmatter and validated before SQL binding. | +| Boundary violations | — | `bun src/index.ts query --json --recipe boundary-violations` — joins `dependencies` × `boundary_rules` (config-driven) via SQLite `GLOB`. `.codemap/config.ts` `boundaries: [{name, from_glob, to_glob, action?}]`; default `action: "deny"`. SARIF / annotations work via the `file_path` alias. | +| Rename preview | — | `bun src/index.ts query --recipe rename-preview --params old=usePermissions,new=useAccess,kind=function --format diff` — read-only unified diff; codemap never writes files. 
| +| Recipe catalog / SQL | — | `bun src/index.ts query --recipes-json` · `bun src/index.ts query --print-sql fan-out` | +| Counts only | — | `bun src/index.ts query --json --summary -r deprecated-symbols` | +| PR-scoped rows | — | `bun src/index.ts query --json --changed-since origin/main -r fan-out` | +| Bucket by owner / dir / pkg | — | `bun src/index.ts query --json --group-by directory -r fan-in` | +| Save / diff a baseline | — | `bun src/index.ts query --save-baseline -r visibility-tags` then `… --json --baseline -r visibility-tags` | +| List / drop baselines | — | `bun src/index.ts query --baselines` · `bun src/index.ts query --drop-baseline ` | +| Per-delta audit | — | `bun src/index.ts audit --json --baseline base` (auto-resolves `base-files` / `base-dependencies` / `base-deprecated`) | +| Audit vs git ref | — | `bun src/index.ts audit --base origin/main --json` — worktree+reindex against any committish; sub-100ms second run via sha-keyed cache. Mutually exclusive with `--baseline`; per-delta overrides compose. | +| MCP server (for agent hosts) | — | `bun src/index.ts mcp [--no-watch] [--debounce ]` — JSON-RPC on stdio; one tool per CLI verb. Watcher default-ON since 2026-05. See **MCP** section below. | +| HTTP server (for non-MCP) | — | `bun src/index.ts serve [--host 127.0.0.1] [--port 7878] [--token ] [--no-watch] [--debounce ]` — same tool taxonomy over POST /tool/{name}. Watcher default-ON since 2026-05. | +| Watch mode (live reindex) | — | `bun src/index.ts watch [--debounce 250] [--quiet]` — standalone long-running process; debounced reindex on file changes. `mcp` / `serve` boot the watcher in-process by default — pass `--no-watch` (or `CODEMAP_WATCH=0`) to opt out. 
| +| Targeted read (metadata) | — | `bun src/index.ts show [--kind ] [--in ] [--json]` — file:line + signature | +| Targeted read (source text) | — | `bun src/index.ts snippet [--kind ] [--in ] [--json]` — same lookup + source from disk + stale flag | +| Impact (blast-radius walker) | — | `bun src/index.ts impact [--direction up\|down\|both] [--depth N] [--via ] [--limit N] [--summary] [--json]` — replaces hand-composed `WITH RECURSIVE` queries | +| Coverage ingest | — | `bun src/index.ts ingest-coverage [--json]` — Istanbul (`coverage-final.json`) or LCOV (`lcov.info`); format auto-detected. Joinable to `symbols` for "untested AND dead" queries. | +| SARIF / GH annotations | — | `bun src/index.ts query --recipe deprecated-symbols --format sarif` · `… --format annotations` | +| Mermaid graph (≤50 edges) | — | `bun src/index.ts query --format mermaid 'SELECT from_path AS "from", to_path AS "to" FROM dependencies LIMIT 50'` — recipes / SQL must alias columns to `{from, to, label?, kind?}`; rejects unbounded inputs. | +| Diff preview | — | `bun src/index.ts query --format diff ''` — read-only unified diff; `--format diff-json` returns structured hunks for agents. | +| FTS5 full-text (opt-in) | `--with-fts` | `bun src/index.ts --with-fts --full` enables `source_fts` virtual table; `query --recipe text-in-deprecated-functions` demos JOINs. | **Recipe metadata:** with **`--json`**, recipes that define an `actions` template append it to every row (kebab-case verb + description — e.g. `fan-out` → `review-coupling`). Under `--baseline`, actions attach to the **`added`** rows only. Parametrised recipes declare `params` in `.md` frontmatter; pass values with `--params key=value[,key=value]` (repeatable; last value wins). Inspect both via **`--recipes-json`**. Ad-hoc SQL never carries actions or params. 
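The `boundary-violations` row in the table above describes a join of `dependencies` against `boundary_rules` via SQLite `GLOB`. A minimal in-memory sketch of that matching logic follows. It is not the bundled recipe SQL: `globToRegExp` and `findBoundaryViolations` are hypothetical helpers, and only the field names (`name`, `from_glob`, `to_glob`, `action`, `from_path`, `to_path`, `file_path`) come from the documentation.

```typescript
// Hypothetical shapes for illustration; field names follow the documented config and tables.
interface BoundaryRule {
  name: string;
  from_glob: string;
  to_glob: string;
  action?: "deny" | "allow";
}

interface DependencyEdge {
  from_path: string;
  to_path: string;
}

interface Violation {
  rule: string;
  file_path: string; // alias of from_path, mirroring the recipe's location column
  to_path: string;
}

// Approximates SQLite GLOB: `*` matches any run of characters (including `/`),
// `?` matches exactly one character, `[...]` is a character class (`^` negates).
function globToRegExp(glob: string): RegExp {
  let out = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    const close = ch === "[" ? glob.indexOf("]", i + 1) : -1;
    if (ch === "*") {
      out += ".*";
    } else if (ch === "?") {
      out += ".";
    } else if (close !== -1) {
      // Copy the class body through, escaping backslashes only.
      out += "[" + glob.slice(i + 1, close).replace(/\\/g, "\\\\") + "]";
      i = close;
    } else {
      // Everything else is a literal; escape regex metacharacters.
      out += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp("^" + out + "$", "s");
}

// Returns one row per (edge, deny rule) pair where both globs match.
function findBoundaryViolations(
  edges: DependencyEdge[],
  rules: BoundaryRule[],
): Violation[] {
  const deny = rules
    .filter((r) => (r.action ?? "deny") === "deny") // omitted action means "deny"
    .map((r) => ({
      name: r.name,
      from: globToRegExp(r.from_glob),
      to: globToRegExp(r.to_glob),
    }));
  const violations: Violation[] = [];
  for (const edge of edges) {
    for (const rule of deny) {
      if (rule.from.test(edge.from_path) && rule.to.test(edge.to_path)) {
        violations.push({
          rule: rule.name,
          file_path: edge.from_path,
          to_path: edge.to_path,
        });
      }
    }
  }
  return violations;
}

// Synthetic data: only the first edge crosses the ui -> server boundary.
const sampleRules: BoundaryRule[] = [
  { name: "ui-cant-touch-server", from_glob: "src/ui/**", to_glob: "src/server/**" },
];
const sampleEdges: DependencyEdge[] = [
  { from_path: "src/ui/Button.tsx", to_path: "src/server/db.ts" },
  { from_path: "src/ui/Button.tsx", to_path: "src/ui/theme.ts" },
  { from_path: "src/server/api.ts", to_path: "src/server/db.ts" },
];
console.log(findBoundaryViolations(sampleEdges, sampleRules));
```

Because `*` in `GLOB` also matches `/`, `src/ui/*` and `src/ui/**` behave the same under this matching; the sketch keeps that behaviour rather than imitating shell-style globstar semantics.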
diff --git a/.agents/skills/codemap/SKILL.md b/.agents/skills/codemap/SKILL.md index c720f50..b4c4796 100644 --- a/.agents/skills/codemap/SKILL.md +++ b/.agents/skills/codemap/SKILL.md @@ -45,6 +45,7 @@ Replace placeholders (`'...'`) with your module path, file glob, or symbol name. - **`--baseline[=]`** — diff the current result against the saved baseline. Output `{baseline:{...}, current_row_count, added: [...], removed: [...]}` (with `--json`) or a two-section terminal dump. Identity = per-row multiset equality (canonical `JSON.stringify` keyed frequency map; duplicates preserved). Pair with `--summary` for `{baseline:{...}, current_row_count, added: N, removed: N}`. **Mutually exclusive with `--group-by`.** - **`--baselines`** lists saved baselines (no `rows_json` payload); **`--drop-baseline `** deletes one. Both reject every other flag — they're list-only / drop-only operations. - **Per-row recipe `actions`** — recipes that define an **`actions: [{type, auto_fixable?, description?}]`** template append it to every row in **`--json`** output (recipe-only; ad-hoc SQL never carries actions). Under `--baseline`, actions attach to the **`added`** rows only (the rows the agent should act on). Inspect via **`--recipes-json`**. +- **Boundary violations (config-driven)** — declare `boundaries: [{name, from_glob, to_glob, action?}]` in `.codemap/config.ts` and run `bun src/index.ts query --recipe boundary-violations [--format sarif]`. The `action` field defaults to `"deny"` (the only shape v1 surfaces); rules are reconciled into the `boundary_rules` table on every index pass and joined against `dependencies` via SQLite `GLOB`. See [`docs/architecture.md` § `boundary_rules`](../../../docs/architecture.md#boundary_rules--architecture-boundary-rules-config-derived-strict-without-rowid). 
- **Project-local recipes** — drop **`.sql`** (and optional **`.md`** for description body, params, and actions) into **`/recipes/`** (default `/.codemap/recipes/`) to make team-internal SQL a first-class CLI verb. `--recipes-json` and the `codemap://recipes` MCP resource list project recipes alongside bundled ones with **`source: "bundled" | "project"`** discriminating them. Project recipes win on id collision; entries that override a bundled id carry **`shadows: true`** so agents reading the catalog at session start know when a recipe behaves differently from the documented bundled version. `.md` supports YAML frontmatter for `params:` and per-row `actions:` — **block-list shape only** (loader's hand-rolled parser; no inline-flow `[{...}]`). Param types: `string | number | boolean`; pass values with `--params key=value[,key=value]` (repeatable; last value wins). Example: `bun src/index.ts query --json --recipe find-symbol-by-kind --params kind=function,name_pattern=%Query%`. Validation: SQL is rejected at load time if it starts with DML/DDL (DELETE/DROP/UPDATE/etc.); params validate before SQL binding; runtime `PRAGMA query_only=1` is the parser-proof backstop. `.codemap/index.db` is gitignored; **`.codemap/recipes/` is NOT** — recipes are git-tracked source code authored for human review. **Audit (`bun src/index.ts audit`)** — separate top-level command for structural-drift verdicts. Composes B.6 baselines into a per-delta `{head, deltas}` envelope; v1 ships `files` / `dependencies` / `deprecated`. 
Two snapshot-source shapes: @@ -510,9 +511,9 @@ bun src/index.ts query --json "SELECT key, value FROM meta" ## Troubleshooting -| Problem | Solution | -| -------------------------- | ---------------------------------------------------------------------------------------------------------------------- | -| Stale results after rebase | Run **`bun src/index.ts --full`** (or **`codemap --full`** when exercising the packaged CLI) | -| Missing file in results | Check exclude / include globs in **`codemap.config.ts`**, **`codemap.config.json`**, or defaults in **`src/index.ts`** | -| `resolved_path` is NULL | Import is an external package (not in project) | -| Resolver errors | Verify `tsconfig.json` paths (or **`tsconfigPath`** in config) when resolving aliases | +| Problem | Solution | +| -------------------------- | ------------------------------------------------------------------------------------------------------------------------ | +| Stale results after rebase | Run **`bun src/index.ts --full`** (or **`codemap --full`** when exercising the packaged CLI) | +| Missing file in results | Check exclude / include globs in **`.codemap/config.ts`**, **`.codemap/config.json`**, or defaults in **`src/index.ts`** | +| `resolved_path` is NULL | Import is an external package (not in project) | +| Resolver errors | Verify `tsconfig.json` paths (or **`tsconfigPath`** in config) when resolving aliases | diff --git a/.changeset/boundary-violations.md b/.changeset/boundary-violations.md new file mode 100644 index 0000000..900c2eb --- /dev/null +++ b/.changeset/boundary-violations.md @@ -0,0 +1,51 @@ +--- +"@stainless-code/codemap": minor +--- + +feat(boundaries): config-driven architecture-boundary rules + `boundary-violations` recipe + +Adds the smallest substrate for first-class architecture boundary checks. Schema bump 8 → 9. 
+ +**Configure** + +```ts +import { defineConfig } from "@stainless-code/codemap"; + +export default defineConfig({ + boundaries: [ + { + name: "ui-cant-touch-server", + from_glob: "src/ui/**", + to_glob: "src/server/**", + }, + ], +}); +``` + +`action` defaults to `"deny"` (the only shape v1 surfaces); `"allow"` reserves the slot for future whitelist semantics. + +**Substrate** + +- New config field `boundaries: BoundaryRule[]` on the Zod user-config schema (`src/config.ts`); validated at config-load time. +- New table `boundary_rules(name PK, from_glob, to_glob, action CHECK IN ('deny','allow'))` (`STRICT, WITHOUT ROWID`) — fully derived from config, dropped on `--full` / `SCHEMA_VERSION` rebuilds and re-filled by the next index pass. +- New helper `reconcileBoundaryRules(db, rules)` in `src/db.ts`; called from `runCodemapIndex` after `createSchema` so the table tracks config exactly. +- New runtime accessor `getBoundaryRules()`. + +**Recipe** + +`templates/recipes/boundary-violations.{sql,md}` joins `dependencies` × `boundary_rules` via SQLite `GLOB` and surfaces violating import edges as locatable rows. `--format sarif` and `--format annotations` light up automatically (the recipe aliases `dependencies.from_path` to `file_path`). Use as a CI gate: + +```bash +codemap query --recipe boundary-violations --format sarif > findings.sarif +``` + +**Lockstep** + +- `docs/architecture.md` § Schema gains a `boundary_rules` subsection. +- `docs/glossary.md` adds `boundaries` / `boundary_rules` / `boundary-violations` entry. +- `docs/roadmap.md § Backlog` removes the now-shipped item per Rule 2. +- `templates/agents/rules/codemap.md`, `.agents/rules/codemap.md`, `templates/agents/skills/codemap/SKILL.md`, `.agents/skills/codemap/SKILL.md`, and `README.md` all document the new shape. + +**Tests** + +`src/application/boundary-rules.test.ts` covers schema creation, idempotent reconciliation, CHECK constraint, and the recipe SQL against a synthetic dependency graph. 
`src/config.test.ts` covers Zod validation including default-action filling and unknown-action rejection. diff --git a/README.md b/README.md index 01b6231..6a44c06 100644 --- a/README.md +++ b/README.md @@ -80,6 +80,10 @@ codemap query --json --recipe fan-out-sample # Parametrised recipes validate params from .md frontmatter before SQL binding. codemap query --json --recipe find-symbol-by-kind --params kind=function,name_pattern=%Query% codemap query --recipe rename-preview --params old=usePermissions,new=useAccess,kind=function --format diff +# Architecture-boundary rules (declare in .codemap/config.ts): +# boundaries: [{ name: "ui-cant-touch-server", from_glob: "src/ui/**", to_glob: "src/server/**" }] +# Default action is "deny"; the table is reconciled from config on every index pass. +codemap query --recipe boundary-violations --format sarif > boundary-findings.sarif # Counts only (skip the rows) — pairs well with --recipe for dashboards / agent context windows codemap query --json --summary -r deprecated-symbols # PR-scoped: filter result rows to those touching files changed since @@ -128,7 +132,7 @@ codemap query --format mermaid 'SELECT from_path AS "from", to_path AS "to" FROM codemap query --format diff 'SELECT "README.md" AS file_path, 1 AS line_start, "# Codemap" AS before_pattern, "# Codemap Preview" AS after_pattern' codemap query --format diff-json 'SELECT "README.md" AS file_path, 1 AS line_start, "# Codemap" AS before_pattern, "# Codemap Preview" AS after_pattern' | jq '.summary' # --with-fts — opt-in FTS5 virtual table populated at index time. Default OFF (preserves -# .codemap/index.db size); CLI flag wins over codemap.config.ts `fts5` field. Toggle change +# .codemap/index.db size); CLI flag wins over .codemap/config.ts `fts5` field. Toggle change # auto-detects and forces a full rebuild so `source_fts` stays consistent. 
codemap --with-fts --full codemap query --recipe text-in-deprecated-functions # demonstrates FTS5 ⨯ symbols ⨯ coverage JOIN diff --git a/codemap.config.example.json b/codemap.config.example.json index 9c9bbc7..8a94928 100644 --- a/codemap.config.example.json +++ b/codemap.config.example.json @@ -1,4 +1,11 @@ { "include": ["src/**/*.{ts,tsx,js,jsx}", "src/**/*.css", "**/*.{md,json}"], - "excludeDirNames": ["node_modules", ".git", "dist", "build", ".output"] + "excludeDirNames": ["node_modules", ".git", "dist", "build", ".output"], + "boundaries": [ + { + "name": "ui-cant-touch-server", + "from_glob": "src/ui/*", + "to_glob": "src/server/*" + } + ] } diff --git a/docs/architecture.md b/docs/architecture.md index 6b6ffda..783c3ed 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -123,7 +123,7 @@ A local SQLite database (`.codemap/index.db`) indexes the project tree and store **Validate wiring:** **`src/cli/cmd-validate.ts`** (argv + render) + **`src/application/validate-engine.ts`** (engine — **`computeValidateRows`** + **`toProjectRelative`**). `computeValidateRows` is a pure function over `(db, projectRoot, paths)` returning `{path, status}` rows where `status ∈ stale | missing | unindexed`. CLI wraps it with read-once-and-print + exits **1** on any drift (git-status semantics). Path normalization: **`toProjectRelative`** converts CLI input to POSIX-style relative keys matching the `files.path` storage format (Windows backslash → forward slash); same convention as `lint-staged.config.js`. Also reused by `cmd-show.ts` / `cmd-snippet.ts` and the MCP show/snippet handlers — single canonical implementation. -**Audit wiring:** **`src/cli/cmd-audit.ts`** (argv, `--baseline ` auto-resolve sugar, `---baseline ` per-delta explicit overrides, `--base ` git-ref baseline, `--json`, `--summary`, `--no-index`) + **`src/application/audit-engine.ts`** (delta registry + diff). 
Mirrors the `cmd-index.ts ↔ application/index-engine.ts` seam — CLI parses + dispatches; engine does the diff. **`runAudit({db, baselines})`** iterates the per-delta baseline map; deltas absent from the map don't run. Each entry in **`V1_DELTAS`** pins a canonical SQL projection (`files`: `SELECT path FROM files`; `dependencies`: `SELECT from_path, to_path FROM dependencies`; `deprecated`: `SELECT name, kind, file_path FROM symbols WHERE doc_comment LIKE '%@deprecated%'`) plus a `requiredColumns` list. **`computeDelta`** validates baseline column-set membership, projects baseline rows down to the canonical column subset (extras dropped — schema-drift-resilient), runs the canonical SQL via the caller's DB connection, and set-diffs via the existing **`src/diff-rows.ts`** multiset helper (shared with `query --baseline`). Each emitted delta carries its own **`base`** metadata so mixed-baseline audits (e.g. `--baseline base --dependencies-baseline override`) are first-class. **`runAuditCmd`** runs an auto-incremental-index prelude (`runCodemapIndex({mode: "incremental", quiet: true})`) before the diff so `head` reflects the current source — `--no-index` opts out for frozen-DB CI scenarios. **`resolveAuditBaselines({db, baselinePrefix, perDelta})`** composes the baseline map: auto-resolves `-` for slots that exist (silently absent otherwise) and lets per-delta flags override individual slots. v1 ships no `verdict` / threshold config / non-zero exit codes — consumers compose `--json` + `jq` for CI exit codes; v1.x still tracks `verdict` + `codemap.config.audit` thresholds. **`--base ` (shipped):** **`runAuditFromRef({db, ref, perDeltaOverrides, projectRoot, reindex})`** materialises the ref via **`application/audit-worktree.ts`** — `git rev-parse --verify "^{commit}"` → resolved sha → cache lookup at `/.codemap/audit-cache//`. 
Cache miss: per-pid temp dir (`.tmp...`) gets `git worktree add --detach`, the injected `reindex` callback (`makeWorktreeReindex` in production — re-inits the runtime singletons against the worktree path, runs `runCodemapIndex({mode: "full"})`, restores) writes `.codemap/index.db` inside, then POSIX `rename` claims the final `/` slot. **Atomic populate** — concurrent processes resolving the same sha race-safely without lock files (loser's rename fails with EEXIST → falls through to cache hit). Eviction: hardcoded LRU 5 entries / 500 MiB; `git worktree remove --force` then `rm -rf` for each victim; orphan `.tmp.*` dirs older than 10 min get swept too. Per-delta `base` metadata gains a discriminator: existing baseline-source remains `{source: "baseline", name, sha, indexed_at}`; new ref-source is `{source: "ref", ref, sha, indexed_at}`. `--base` is mutually exclusive with `--baseline ` (parser + handler both guard); composes orthogonally with per-delta `---baseline name` overrides. Hard error on non-git projects (`existsSync(/.git)` check before any spawn). All git spawns in `audit-worktree.ts` strip inherited `GIT_*` env vars so a containing git operation (e.g. running codemap inside a husky hook) doesn't route worktree calls at the wrong index. +**Audit wiring:** **`src/cli/cmd-audit.ts`** (argv, `--baseline ` auto-resolve sugar, `---baseline ` per-delta explicit overrides, `--base ` git-ref baseline, `--json`, `--summary`, `--no-index`) + **`src/application/audit-engine.ts`** (delta registry + diff). Mirrors the `cmd-index.ts ↔ application/index-engine.ts` seam — CLI parses + dispatches; engine does the diff. **`runAudit({db, baselines})`** iterates the per-delta baseline map; deltas absent from the map don't run. 
Each entry in **`V1_DELTAS`** pins a canonical SQL projection (`files`: `SELECT path FROM files`; `dependencies`: `SELECT from_path, to_path FROM dependencies`; `deprecated`: `SELECT name, kind, file_path FROM symbols WHERE doc_comment LIKE '%@deprecated%'`) plus a `requiredColumns` list. **`computeDelta`** validates baseline column-set membership, projects baseline rows down to the canonical column subset (extras dropped — schema-drift-resilient), runs the canonical SQL via the caller's DB connection, and set-diffs via the existing **`src/diff-rows.ts`** multiset helper (shared with `query --baseline`). Each emitted delta carries its own **`base`** metadata so mixed-baseline audits (e.g. `--baseline base --dependencies-baseline override`) are first-class. **`runAuditCmd`** runs an auto-incremental-index prelude (`runCodemapIndex({mode: "incremental", quiet: true})`) before the diff so `head` reflects the current source — `--no-index` opts out for frozen-DB CI scenarios. **`resolveAuditBaselines({db, baselinePrefix, perDelta})`** composes the baseline map: auto-resolves `-` for slots that exist (silently absent otherwise) and lets per-delta flags override individual slots. v1 ships no `verdict` / threshold config / non-zero exit codes — consumers compose `--json` + `jq` for CI exit codes; v1.x still tracks `verdict` + thresholds under an `audit` field on the config object (`.codemap/config.{ts,js,json}`). **`--base ` (shipped):** **`runAuditFromRef({db, ref, perDeltaOverrides, projectRoot, reindex})`** materialises the ref via **`application/audit-worktree.ts`** — `git rev-parse --verify "^{commit}"` → resolved sha → cache lookup at `/.codemap/audit-cache//`. 
Cache miss: per-pid temp dir (`.tmp...`) gets `git worktree add --detach`, the injected `reindex` callback (`makeWorktreeReindex` in production — re-inits the runtime singletons against the worktree path, runs `runCodemapIndex({mode: "full"})`, restores) writes `.codemap/index.db` inside, then POSIX `rename` claims the final `/` slot. **Atomic populate** — concurrent processes resolving the same sha race-safely without lock files (loser's rename fails with EEXIST → falls through to cache hit). Eviction: hardcoded LRU 5 entries / 500 MiB; `git worktree remove --force` then `rm -rf` for each victim; orphan `.tmp.*` dirs older than 10 min get swept too. Per-delta `base` metadata gains a discriminator: existing baseline-source remains `{source: "baseline", name, sha, indexed_at}`; new ref-source is `{source: "ref", ref, sha, indexed_at}`. `--base` is mutually exclusive with `--baseline ` (parser + handler both guard); composes orthogonally with per-delta `---baseline name` overrides. Hard error on non-git projects (`existsSync(/.git)` check before any spawn). All git spawns in `audit-worktree.ts` strip inherited `GIT_*` env vars so a containing git operation (e.g. running codemap inside a husky hook) doesn't route worktree calls at the wrong index. **Context wiring:** **`src/cli/cmd-context.ts`** (argv + render) + **`src/application/context-engine.ts`** (engine — **`buildContextEnvelope`**, **`classifyIntent`**, `ContextEnvelope` type). `buildContextEnvelope` composes the JSON envelope from existing recipes (`fan-in` for `hubs`, `markers` SELECT for `sample_markers`, `QUERY_RECIPES` map for the catalog). **`classifyIntent`** maps `--for ""` to one of `refactor | debug | test | feature | explore | other` via regex against the trimmed input; whitespace-only intents are rejected. `--compact` drops `hubs` + `sample_markers` and emits one-line JSON; otherwise pretty-prints with 2-space indent. 
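The audit wiring above set-diffs baseline rows against current rows with a multiset helper, and the skill doc describes row identity as a canonical `JSON.stringify` keyed frequency map with duplicates preserved. The sketch below illustrates that idea under stated assumptions: `canonicalKey` and `diffRowsSketch` are hypothetical names, and sorting column names before serialising is an assumption about what "canonical" means here, not a description of `src/diff-rows.ts`.

```typescript
type Row = Record<string, unknown>;

// Assumed canonical form: serialise with sorted column names so that
// column order in the source object does not affect row identity.
function canonicalKey(row: Row): string {
  const sorted: Row = {};
  for (const key of Object.keys(row).sort()) {
    sorted[key] = row[key];
  }
  return JSON.stringify(sorted);
}

// Multiset diff: a row that appears twice in the baseline and once in the
// current result yields exactly one `removed` entry.
function diffRowsSketch(
  baseline: Row[],
  current: Row[],
): { added: Row[]; removed: Row[] } {
  // Frequency map of baseline rows, keyed by canonical JSON.
  const remaining = new Map<string, number>();
  for (const row of baseline) {
    const key = canonicalKey(row);
    remaining.set(key, (remaining.get(key) ?? 0) + 1);
  }

  // Each current row consumes one matching baseline occurrence, else it is new.
  const added: Row[] = [];
  for (const row of current) {
    const key = canonicalKey(row);
    const count = remaining.get(key) ?? 0;
    if (count > 0) {
      remaining.set(key, count - 1);
    } else {
      added.push(row);
    }
  }

  // Whatever baseline occurrences were never consumed have been removed.
  const removed: Row[] = [];
  for (const row of baseline) {
    const key = canonicalKey(row);
    const count = remaining.get(key) ?? 0;
    if (count > 0) {
      removed.push(row);
      remaining.set(key, count - 1);
    }
  }

  return { added, removed };
}

// Synthetic `dependencies`-style rows.
const baselineRows: Row[] = [
  { from_path: "a.ts", to_path: "b.ts" },
  { from_path: "a.ts", to_path: "b.ts" },
  { from_path: "a.ts", to_path: "c.ts" },
];
const currentRows: Row[] = [
  { to_path: "b.ts", from_path: "a.ts" }, // same row, different key order
  { from_path: "a.ts", to_path: "d.ts" },
];
console.log(diffRowsSketch(baselineRows, currentRows));
```

A plain `Set` would collapse the duplicated `a.ts → b.ts` edge and hide the fact that one occurrence disappeared; the frequency map keeps that information.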
@@ -181,7 +181,7 @@ Optional **`/config.{ts,js,json}`** (default `.codemap/config.*`; def **Fresh database:** the default CLI **`codemap`** (incremental) calls **`createSchema()`** in **`runCodemapIndex`** before **`getChangedFiles()`**, so the **`meta`** table exists before **`getMeta(..., "last_indexed_commit")`** runs on an empty **`.codemap/index.db`**. -Current schema version: **8** — see [Schema Versioning](#schema-versioning) for details. +Current schema version: **9** — see [Schema Versioning](#schema-versioning) for details. All tables use `STRICT` mode. Tables marked with `WITHOUT ROWID` store data directly in the primary key B-tree. PRAGMAs and index design: [SQLite Performance Configuration](#sqlite-performance-configuration). @@ -333,7 +333,7 @@ The `fts5_enabled` key tracks the FTS5 toggle state at the last reindex; mismatc ### `source_fts` — Opt-in FTS5 virtual table over file content -Always created (near-zero space when empty); populated by the indexer only when the resolved config has FTS5 enabled (`codemap.config.ts` `fts5: true` OR `--with-fts` CLI flag at index time). Tokenizer `porter unicode61` (Porter stemmer over Unicode-aware tokeniser; ~3× smaller than the trigram alternative). `file_path UNINDEXED` skips tokenising paths since filtering is exact via `WHERE file_path = ?`. +Always created (near-zero space when empty); populated by the indexer only when the resolved config has FTS5 enabled (`.codemap/config.ts` `fts5: true` OR `--with-fts` CLI flag at index time). Tokenizer `porter unicode61` (Porter stemmer over Unicode-aware tokeniser; ~3× smaller than the trigram alternative). `file_path UNINDEXED` skips tokenising paths since filtering is exact via `WHERE file_path = ?`. 
 | Column | Type | Description |
 | --------- | -------------- | -------------------------------------------------------------------------- |
@@ -375,6 +375,21 @@ Three meta keys (`coverage_last_ingested_at` / `_path` / `_format`) record fresh

 Bundled recipes consuming the table — `untested-and-dead`, `files-by-coverage`, `worst-covered-exports`. Each ships a frontmatter `actions` block (per PR #26) so agents see per-row follow-up hints in `--json` output.

+### `boundary_rules` — Architecture-boundary rules (config-derived) (`STRICT, WITHOUT ROWID`)
+
+Reconciled from `.codemap/config.ts` `boundaries: [...]` on every index pass via `reconcileBoundaryRules` in `db.ts`; the wiring lives in `application/run-index.ts` right after `createSchema`. Empty when the user declares no boundaries. Bundled `boundary-violations` recipe joins this table against `dependencies` via SQLite `GLOB` to surface forbidden imports; `--format sarif` lights up automatically because the recipe row aliases `dependencies.from_path` to `file_path` (the existing location-column priority list catches it).
+
+Dropped on every `--full` / `SCHEMA_VERSION` rebuild like the other index tables — the next index pass re-fills it from config, so no migration is needed when the schema bumps. Distinct from `query_baselines` / `coverage`: those are user data and survive rebuilds; `boundary_rules` is config data and is rebuilt deterministically.
+
+| Column | Type | Description |
+| --------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| name | TEXT PK | Stable identifier from config — surfaced in recipe rows and SARIF message bodies. |
+| from_glob | TEXT | SQLite `GLOB` pattern matched against `dependencies.from_path` (the file doing the import). |
+| to_glob | TEXT | SQLite `GLOB` pattern matched against `dependencies.to_path` (the file being imported). |
+| action | TEXT | `'deny'` or `'allow'` (CHECK constraint). v1 recipe filters on `action = 'deny'`; `'allow'` reserves the slot for future whitelist semantics. Defaults to `'deny'` in config. |
+
+Keep this table tiny by construction — one row per declared boundary. Glob complexity stays in SQLite's `GLOB` (`*` / `?` / `[abc]`); rich shapes (layer ordering, element-type rules, except-self) compile down to extra `boundary_rules` rows or stay user-side per Moat A.
+
 ### Indexes

 All tables have covering indexes tuned for AI agent query patterns. See [Covering indexes](#covering-indexes) and [Partial indexes](#partial-indexes) for the full list.
diff --git a/docs/glossary.md b/docs/glossary.md
index bb6345e..047ab39 100644
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -51,6 +51,10 @@ In Codemap usage: a file with a high number of `exports` rows — typically a pu

 The shared `batchInsert()` helper in `src/db.ts`. Splits inserts into multi-row `INSERT … VALUES (…),(…)` statements of `BATCH_SIZE` (500) rows each, with pre-computed placeholder strings. Used by every `insertX` function.

+### `boundaries` (config) / `boundary_rules` (table) / `boundary-violations` (recipe)
+
+Architecture-boundary substrate. Users declare `boundaries: [{name, from_glob, to_glob, action?}]` in `.codemap/config.ts`; the resolver fills `action` to `"deny"` when omitted. Every index pass calls `reconcileBoundaryRules` (in `src/db.ts`) which clears `boundary_rules` and re-inserts from the resolved config — config is the single source of truth, the table is a denormalised lookup. Bundled `boundary-violations` recipe joins `dependencies` × `boundary_rules` via SQLite `GLOB` and surfaces forbidden import edges; `--format sarif` lights up automatically because the recipe row aliases `dependencies.from_path` to `file_path`. CHECK constraint pins `action ∈ {'deny','allow'}`. v1 only honours `'deny'`; `'allow'` reserves the slot for future whitelist semantics. See [architecture.md § `boundary_rules`](./architecture.md#boundary_rules--architecture-boundary-rules-config-derived-strict-without-rowid).
+
 ### `bun:sqlite`

 Bun's native SQLite binding. Codemap uses it on Bun; falls back to `better-sqlite3` on Node. Both are wrapped by `src/sqlite-db.ts` so call sites are runtime-agnostic.
@@ -105,7 +109,7 @@ Per-function decision-point count (REAL column on `symbols`). Computed by the pa

 ### `source_fts` (FTS5 virtual table) / `--with-fts` / opt-in full-text

-Opt-in FTS5 virtual table over file content (`tokenize='porter unicode61'`). Always created (near-zero space when empty); populated only when the resolved config has FTS5 enabled (`codemap.config.ts` `fts5: true` OR `--with-fts` CLI flag at index time; CLI wins, logs stderr override). Demonstrates the FTS5 ⨯ `symbols` ⨯ `coverage` JOIN composability that ripgrep can't match — bundled recipe `text-in-deprecated-functions` exemplifies the JOIN. Toggle change auto-detects via `meta.fts5_enabled` and forces a full rebuild so `source_fts` is consistently populated. Stderr telemetry `[fts5] source_fts populated: files / KB` on first populate. Distinct from arbitrary full-text storage — the table is structurally identical to `coverage` (both `WITHOUT ROWID`-class virtual tables in the substrate). Default OFF preserves `.codemap/index.db` size for non-users (~30–50% growth on text-heavy projects).
+Opt-in FTS5 virtual table over file content (`tokenize='porter unicode61'`). Always created (near-zero space when empty); populated only when the resolved config has FTS5 enabled (`.codemap/config.ts` `fts5: true` OR `--with-fts` CLI flag at index time; CLI wins, logs stderr override). Demonstrates the FTS5 ⨯ `symbols` ⨯ `coverage` JOIN composability that ripgrep can't match — bundled recipe `text-in-deprecated-functions` exemplifies the JOIN. Toggle change auto-detects via `meta.fts5_enabled` and forces a full rebuild so `source_fts` is consistently populated. Stderr telemetry `[fts5] source_fts populated: files / KB` on first populate. Distinct from arbitrary full-text storage — the table is structurally identical to `coverage` (both `WITHOUT ROWID`-class virtual tables in the substrate). Default OFF preserves `.codemap/index.db` size for non-users (~30–50% growth on text-heavy projects).

 ### `--format mermaid` / `formatMermaid` / `MERMAID_MAX_EDGES`

diff --git a/docs/plans/c9-plugin-layer.md b/docs/plans/c9-plugin-layer.md
index 3fe210d..8a3f73c 100644
--- a/docs/plans/c9-plugin-layer.md
+++ b/docs/plans/c9-plugin-layer.md
@@ -27,7 +27,7 @@ These are committed to v1. Questions opened against them must justify against th
 These are the design questions the plan-PR resolves before impl starts (per the parallel-plan-PR shape). Each gets a section below as it crystallises.

 - **Q1 — Plugin contract shape.** JSON schema (declarative)? Zod-validated TS module (typed)? Markdown-with-frontmatter (mirrors recipe-as-content registry per PR #37)? What fields beyond `entry_globs`?
-- **Q2 — Plugin discovery mechanism.** npm peerDep registration (mirrors community-adapter pattern from [`roadmap.md § Strategy`](../roadmap.md#strategy))? Path-glob auto-discovery from `/.codemap/plugins/`? Config-listed (`codemap.config.ts` `plugins: [...]`)? Some combination?
+- **Q2 — Plugin discovery mechanism.** npm peerDep registration (mirrors community-adapter pattern from [`roadmap.md § Strategy`](../roadmap.md#strategy))? Path-glob auto-discovery from `/.codemap/plugins/`? Config-listed (`.codemap/config.{ts,js,json}` `plugins: [...]`)? Some combination?
 - **Q3 — Schema delta.** `is_entry INTEGER DEFAULT 0` column on `files` (single boolean) vs separate `entry_annotations(file_path, plugin_id, reason)` table (multiple plugins can co-annotate; preserves provenance). The latter is moat-B-aligned but slightly more storage.
 - **Q4 — Reachability sweep algorithm.** BFS from `is_entry` files over `dependencies`? Materialised `is_reachable` column (cheap reads, expensive write)? On-demand recursive CTE (no materialisation; might be slow on big graphs)? Cache invalidation strategy?
 - **Q5 — Bundled starter plugins for v1.** Next.js (`app/**/page.tsx`, `pages/**/*.{ts,tsx}`, `app/**/layout.tsx`, etc.)? Vite (`vite.config.{ts,js}`, HTML `