feat(fts5+mermaid): full FTS5 + Mermaid plan implementation (slices 1-7) + docs audit #69
…ite path)
First slice of the FTS5+Mermaid plan implementation per
docs/plans/fts5-mermaid.md. Ships the FTS5 substrate end-to-end so
follow-up slices (demo recipe, Mermaid formatter, MCP/HTTP plumbing)
have something to layer on.
Schema (per Q1):
- SCHEMA_VERSION 6 → 7 (forces dropAll on first upgrade).
- New `source_fts` virtual table — columns (file_path UNINDEXED,
content), tokeniser 'porter unicode61'. Always created; near-zero
space when empty.
- Helpers: upsertSourceFts (DELETE+INSERT — FTS5 doesn't accept
INSERT OR REPLACE on virtual tables), deleteSourceFts (manual
mirror of FK CASCADE since virtual tables can't be FK targets),
clearSourceFts.
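The DELETE+INSERT upsert described above can be sketched as follows. This is a minimal illustration, not the code in db.ts: the `SqlRunner` interface and the `RecordingDb` fake are hypothetical stand-ins for whatever SQLite handle the project uses. Only the table name, the column names, and the helper names come from the commit message.

```typescript
// Hypothetical minimal database handle (an assumption for this sketch).
// The real code presumably calls a SQLite driver with a similar method.
interface SqlRunner {
  run(sql: string, params: readonly unknown[]): void;
}

// Upsert one file's source text: delete any existing row for the path,
// then insert the fresh content, instead of relying on INSERT OR REPLACE.
function upsertSourceFts(db: SqlRunner, filePath: string, content: string): void {
  db.run("DELETE FROM source_fts WHERE file_path = ?", [filePath]);
  db.run("INSERT INTO source_fts (file_path, content) VALUES (?, ?)", [filePath, content]);
}

// Manual mirror of the FK cascade: called whenever a files row is removed.
function deleteSourceFts(db: SqlRunner, filePath: string): void {
  db.run("DELETE FROM source_fts WHERE file_path = ?", [filePath]);
}

// Recording fake so the statement order can be inspected without SQLite.
class RecordingDb implements SqlRunner {
  readonly calls: { sql: string; params: readonly unknown[] }[] = [];
  run(sql: string, params: readonly unknown[]): void {
    this.calls.push({ sql, params });
  }
}
```

In the real indexer both statements would run inside the same transaction as the files row insert, as the Indexer notes in this commit describe.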
Config (per Q6):
- New `fts5: boolean` field in Zod schema (default false, optional).
- `ResolvedCodemapConfig.fts5` resolved from config + CLI; CLI wins
per `--root` / `--state-dir` precedent. Logs stderr override line
when CLI overrides config.
- New `getFts5Enabled()` runtime accessor.
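The precedence rule above (config default false, CLI wins, log on override) can be sketched as a small pure function. The function name `resolveFts5`, its return shape, and the wording of the log line are assumptions made for illustration; the commit only states the behaviour.

```typescript
interface Fts5Resolution {
  enabled: boolean;         // final value the indexer should use
  overriddenByCli: boolean; // true when the CLI changed the config value
}

// configValue: `fts5` from the config file (optional, defaults to false).
// cliValue: true when --with-fts was passed, undefined when it was not.
function resolveFts5(
  configValue: boolean | undefined,
  cliValue: boolean | undefined,
): Fts5Resolution {
  const fromConfig = configValue ?? false;
  if (cliValue === undefined) {
    return { enabled: fromConfig, overriddenByCli: false };
  }
  return { enabled: cliValue, overriddenByCli: cliValue !== fromConfig };
}

// Usage: config says false, CLI passes --with-fts.
const resolved = resolveFts5(false, true);
if (resolved.overriddenByCli) {
  // Hypothetical message text; the real stderr line may be worded differently.
  console.error("[fts5] --with-fts overrides config fts5: false");
}
```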
CLI:
- `--with-fts` flag in parseBootstrapArgs; threaded through
bootstrapCodemap → resolveCodemapConfig.fts5Cli; rest also pushed
into validateIndexModeArgs allowlist.
Worker plumbing (per Q2):
- WorkerInput.fts5Enabled propagated from worker-pool via
getFts5Enabled().
- ParsedFile.content optional; worker tees source into it only when
fts5Enabled (zero serialization cost on default-OFF path).
Indexer (per Q2):
- insertParsedResults writes source_fts in same transaction as files
row insert.
- Single-threaded path (parse-on-main, used in incremental
per-relPath loop) also calls upsertSourceFts gated on
getFts5Enabled().
- deleteFilesFromIndex mirrors DELETE to source_fts.
- indexFiles fullRebuild path re-seeds meta (fts5_enabled,
schema_version) after dropAll wipes meta.
Toggle-change auto-detect (per Q3):
- run-index reads meta.fts5_enabled; mismatch with current resolved
config upgrades incremental → full and logs stderr line. First-run
(no value) seeds silently. Already-full mode skips the upgrade
message but still syncs.
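The toggle-change rules above reduce to a three-way decision. The sketch below is an assumption-laden illustration: the function name, the result shape, and the "1"/"0" encoding of meta.fts5_enabled are hypothetical, and only the behaviour (seed silently, upgrade on mismatch, log only when an incremental run was upgraded) is taken from the commit message.

```typescript
type IndexMode = "incremental" | "full";

interface ToggleDecision {
  mode: IndexMode;     // mode the run should actually use
  logUpgrade: boolean; // whether to print the stderr upgrade line
  writeMeta: boolean;  // whether meta.fts5_enabled needs (re)writing
}

// storedValue: meta.fts5_enabled as read from the meta table,
// undefined on first run. The "1"/"0" encoding is an assumption.
function detectToggleChange(
  storedValue: string | undefined,
  enabled: boolean,
  requested: IndexMode,
): ToggleDecision {
  const current = enabled ? "1" : "0";
  if (storedValue === undefined) {
    // First run: seed the key silently, keep the requested mode.
    return { mode: requested, logUpgrade: false, writeMeta: true };
  }
  if (storedValue === current) {
    return { mode: requested, logUpgrade: false, writeMeta: false };
  }
  // Mismatch: force a full rebuild; only log if this changed the mode.
  return { mode: "full", logUpgrade: requested === "incremental", writeMeta: true };
}
```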
Verification:
- bun run typecheck passes.
- bun test: 746 pass, 0 fail.
- End-to-end smoke test (/tmp/fts5-smoke-test):
- Index without --with-fts → source_fts empty ✓
- Re-index with --with-fts → toggle-change auto-detect upgraded
incremental to full → MATCH 'TODO' returns the seeded file ✓
Slice 1 deliberately does NOT include:
- Telemetry on first FTS5 populate (Q7) — defer.
- Mermaid formatter (slices 3-5).
- Demo recipe text-in-deprecated-functions (slice 2).
- Docs/agents lockstep (slice 6) — defer until the demo recipe lands
so the rule update names a real recipe.
- Patch changeset (slice 7).
Files changed: 12 (db.ts, config.ts, runtime.ts, bootstrap.ts,
bootstrap-codemap.ts, main.ts, cmd-index.ts, parsed-types.ts,
parse-worker-core.ts, worker-pool.ts, index-engine.ts, run-index.ts).
🦋 Changeset detected
Latest commit: 6cfa3f5
The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
…try + lockstep
Completes the FTS5+Mermaid plan (docs/plans/fts5-mermaid.md). Builds on
slice 1's substrate.
Slice 2 — demo recipe text-in-deprecated-functions:
Bundled recipe: @deprecated functions in files containing
TODO/FIXME/HACK markers AND coverage <50%. Demonstrates FTS5 ⨯ symbols
⨯ coverage JOIN composability that ripgrep can't match. Returns empty
when FTS5 is off (source_fts empty). Action template:
review-cleanup-priority.
Slices 3+4 — Mermaid formatter with bounded-input contract:
- formatMermaid(rows, opts) in output-formatters.ts. Renders
{from, to, label?, kind?} as flowchart LR. Reuses existing formatter
plumbing pattern from SARIF / annotations.
- MERMAID_MAX_EDGES = 50 hard-coded const (Q4). Auto-truncation
explicitly out of scope (would be a verdict masquerading as output
mode, violating moat A).
- Reject error names recipe + count + scoping knobs (LIMIT / --via /
WHERE) so agent knows how to scope.
Slice 5 — MCP + HTTP plumbing:
- formatEnum gains "mermaid"; QueryArgs / QueryRecipeArgs format unions
extended.
- tool-handlers.runFormattedQuery branches on "mermaid" → wraps
formatMermaid in try/catch (bounded-input rejection becomes a
structured ToolResult error).
- MCP wrapToolResult needs no change — generic non-JSON passthrough
handles "mermaid" same as "sarif" / "annotations".
- HTTP writeToolResult needs no change — fall-through "text/plain;
charset=utf-8" applies to mermaid (not sarif+json).
- Tool descriptions in mcp-server.ts updated to mention "mermaid"
format.
Slice 6 — agent rule + skill lockstep (Rule 10):
Both templates/agents/ AND .agents/ codemap rule + skill updated:
- --format mermaid example row in CLI table
- --with-fts row in CLI table
- text-in-deprecated-functions in trigger pattern + recipe id list
Slice 7 — Q7 telemetry + patch changeset:
- Stderr line on first FTS5 populate:
[fts5] source_fts populated: <N> files / <X> KB (uncompressed content).
Cheap (single SUM(length(content)) on source_fts at end of full
reindex); only fires when fts5 just became populated (fts5WasEmpty
pre-check).
- Patch changeset (.changeset/fts5-mermaid.md) per pre-v1 lesson:
additive feature, default-OFF, behaviour-preserving for existing
users.
Verification end-to-end:
- bun test: 754 pass, 0 fail (8 new mermaid formatter tests)
- bun run check passes (format, lint, typecheck, 23/23 golden queries)
- Smoke test with @deprecated function + TODO comment:
* codemap --with-fts --full → "[fts5] source_fts populated: 2 files /
138 B (uncompressed content)" ✓
* query --recipe text-in-deprecated-functions → returns the deprecated
function ✓
* query --format mermaid 'SELECT from_path AS "from", to_path AS "to"
FROM dependencies LIMIT 50' → renders flowchart LR ✓
* query --format mermaid 'SELECT from_path AS "from", to_path AS "to"
FROM dependencies' (unbounded) → rejects with scope-suggestion
error ✓
Plan slices all complete. PR #69 description gets refreshed in a
follow-up commit (this commit covers the impl — the doc-audit pass on
docs/ comes next per user request).
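The bounded-input contract from slices 3+4 can be sketched roughly like this. It is a hedged illustration, not the code in output-formatters.ts: the node-id scheme (n0, n1, ...), the quote escaping, the `recipe` option, and the exact error wording are assumptions, and `kind` is accepted but ignored here. The {from, to, label?, kind?} row shape, the flowchart LR header, the 50-edge ceiling, and the reject-instead-of-truncate behaviour come from the commit message.

```typescript
interface EdgeRow {
  from: string;
  to: string;
  label?: string;
  kind?: string; // part of the documented row shape; unused in this sketch
}

const MERMAID_MAX_EDGES = 50;

// Mermaid treats a raw double quote as a string terminator, so use its entity form.
function escapeLabel(text: string): string {
  return text.replace(/"/g, "#quot;");
}

function formatMermaid(rows: readonly EdgeRow[], opts: { recipe?: string } = {}): string {
  if (rows.length > MERMAID_MAX_EDGES) {
    // Reject rather than truncate; name the source, the count, and the scoping knobs.
    const source = opts.recipe ? `recipe "${opts.recipe}"` : "query";
    throw new Error(
      `${source} returned ${rows.length} edges; --format mermaid accepts at most ` +
        `${MERMAID_MAX_EDGES}. Narrow the result with LIMIT, --via, or WHERE.`,
    );
  }
  const ids = new Map<string, string>();
  const lines = ["flowchart LR"];
  // Assign a stable synthetic id per distinct endpoint and declare the node once.
  const idFor = (name: string): string => {
    let id = ids.get(name);
    if (id === undefined) {
      id = `n${ids.size}`;
      ids.set(name, id);
      lines.push(`  ${id}["${escapeLabel(name)}"]`);
    }
    return id;
  };
  for (const row of rows) {
    const from = idFor(row.from);
    const to = idFor(row.to);
    lines.push(row.label ? `  ${from} -->|"${escapeLabel(row.label)}"| ${to}` : `  ${from} --> ${to}`);
  }
  return lines.join("\n");
}
```

Throwing a plain Error keeps the sketch small; a caller such as the tool handler described in slice 5 could catch it and turn it into a structured error result.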
Per docs-governance lifecycle (docs/README.md Rule 3 + Lifecycle table:
"Plan: Deleted when work ships"):
LIFTED (decisions promoted from docs/plans/fts5-mermaid.md to canonical
homes):
- docs/architecture.md
- Schema version 6 → 7
- meta table description: added fts5_enabled key
- New "source_fts" table section under § Schema, after meta —
documents tokenizer, file_path UNINDEXED, opt-in toggle precedence,
auto-detect, telemetry, and the bundled demo recipe
- application/ engines list: output-formatters.ts now mentions
Mermaid alongside SARIF + GH annotations
- docs/glossary.md
- New "source_fts (FTS5 virtual table) / --with-fts / opt-in
full-text" entry — covers schema shape, toggle precedence,
auto-detect, telemetry, default-OFF rationale, and the JOIN
composability that separates FTS5 from ripgrep
- New "--format mermaid / formatMermaid / MERMAID_MAX_EDGES" entry
— covers the {from, to, label?, kind?} input contract, bounded-
input rejection (50 edges), why auto-truncation is out of scope
- docs/roadmap.md
- Backlog: "Optional FTS5 for opt-in full-text" line removed (work
has shipped per Rule 2 — backlog items move OUT when shipped)
- Non-goals: "Full-text search across all file bodies — use ripgrep
/ IDE / opt-in FTS5 (see backlog)" REWORDED to "Full-text search
default-on" non-goal — FTS5 ships per --with-fts, but default-on
is still out of scope until v2 size-tax measurements
DELETED:
- docs/plans/fts5-mermaid.md (per Rule 3 + Lifecycle: plans deleted
when work ships; decisions live in canonical homes above)
- docs/README.md File Ownership row updated to drop fts5-mermaid.md
from in-flight plans list (still has c9-plugin-layer.md)
Audit method: grep [Ff]allow / FTS5 / source_fts / --with-fts /
--format mermaid across docs/ — categorised hits as documentation
(lift to architecture.md / glossary.md), historical (research notes
keep their content; closed records like fallow.md stay closed), or
plan-residue (delete fts5-mermaid.md per lifecycle).
Verification:
- Schema version assertion in architecture.md matches SCHEMA_VERSION
in db.ts (7).
- meta.fts5_enabled key documented matches META_FTS5_ENABLED_KEY in
db.ts.
- source_fts column shape (file_path UNINDEXED, content) +
tokenizer ('porter unicode61') match the actual CREATE VIRTUAL
TABLE in db.ts createTables.
- --with-fts CLI flag documented matches parseBootstrapArgs +
validateIndexModeArgs allowlists.
- --format mermaid documented matches OUTPUT_FORMATS + formatEnum in
cmd-query.ts + tool-handlers.ts.
- 50-edge ceiling matches MERMAID_MAX_EDGES in output-formatters.ts.
- bun run check passes (format, lint, typecheck, 23/23 golden
queries).
… on parser.ts
Fact-checking against codebase post-PR-#69-and-#70 surfaced four stale
spots; concise-comments rule re-applied to recently-authored parser.ts
comments.
DOCS LIFTED (post-FTS5 / Mermaid / complexity merge):
- README.md (root) line 113 — --format enum was missing `mermaid`.
Updated to <text|json|sarif|annotations|mermaid> + added the
bounded-input contract one-liner + 50-edge ceiling note. Added
--with-fts example block alongside (was missing entirely; README is
the canonical CLI surface per docs/README.md Single source of truth
table).
- docs/architecture.md output-formatters paragraph — described only
formatSarif + formatAnnotations; missing formatMermaid +
bounded-input contract. Added formatMermaid description +
MERMAID_MAX_EDGES reference + the no-auto-truncation reasoning (would
be a verdict masquerading as output mode). Updated the --format CLI
enum to include mermaid; same for the MCP tools format union.
- .agents/skills/codemap/SKILL.md +
templates/agents/skills/codemap/SKILL.md — recipe-id list missed three
coverage recipes (untested-and-dead, files-by-coverage,
worst-covered-exports) shipped earlier in PR #65/#56 era. Lockstep
update per Rule 10. Skill now lists 20 of 20 bundled recipe ids.
CONCISE-COMMENTS SWEEP on parser.ts (recently authored):
- Trimmed the 14-line complexityStack JSDoc block to 6 lines. Kept: the
-1 sentinel rationale (non-obvious), the WeakMap rationale (the bug
fix from PR #70 review). Cut: re-stating push/pop semantics obvious
from method names + step-by-step "this then that" prose.
- Removed the "Defer complexity push to..." comment in the
VariableDeclaration handler. The 4-line block restated the design
decision documented one screen up in the complexityStack jsdoc;
cross-ref makes it redundant. Per concise-comments § "Cut" rule:
"Cross-references that save grep time" — keep when they actually do;
cut when they restate.
Verification:
- bun run check: format + lint + typecheck + 23/23 golden ✓
- Recipe count: SQL files = 20, skill mentions = 20 (1:1 match) ✓
- SCHEMA_VERSION = 8 in db.ts; docs/architecture.md says 8 ✓
- complexity column documented in architecture.md + glossary.md ✓
- --with-fts in README.md + architecture.md + glossary.md + roadmap.md
(consumer-facing surfaces all aligned) ✓
- --format mermaid in README.md + architecture.md + glossary.md + agent
rule/skill ✓
…sted recipe (research note § 1.4) (#70)
* feat(complexity): cyclomatic complexity column + high-complexity-untested recipe
Research note § 1.4 ship-pick (c) per § 5 cadence. Schema bump
SCHEMA_VERSION 7 → 8.
Schema:
- symbols.complexity REAL column. NULL for non-function kinds and class
methods (v1 limitation documented in recipe .md).
Parser:
- complexityStack maintained alongside scopeStack. Function entry pushes
{symbolIndex, count: 1}; branching-node visitors increment top.count;
function exit pops + writes count into the already-pushed symbol row's
complexity field.
- McCabe decision points counted: if, while, do-while, for, for-in,
for-of, case X (not default:), &&/||/??, ?:, catch.
Bundled recipe high-complexity-untested:
- Joins symbols (complexity >= 10) with coverage (< 50%).
- Combines structural + runtime evidence axes — surfaces
refactor-priority candidates that untested-and-dead and
worst-covered-exports miss (they catch dead-or-uncalled, this catches
called-but-undertested-AND-branchy).
Empirical sanity check on codemap's own index after reindex:
- extractFileData (parser.ts main visitor) → complexity 108 ✓
- stringifyTypeNode → 42 ✓
- All non-function kinds have NULL complexity ✓
- high-complexity-untested recipe returns 7 functions all from
src/parser.ts (which has 0% coverage; complexity ≥ 10) ✓
Lockstep updates per Rule 10 (templates/agents + .agents):
- Trigger pattern row "What's high-complexity AND undertested?"
- Quick reference row for SELECT name, complexity FROM symbols
- Recipe-id list extended in SKILL.md
Plus architecture.md (schema version 8, complexity column docs),
glossary.md (cyclomatic complexity entry), patch changeset.
Files changed:
- src/db.ts (SCHEMA_VERSION + symbols.complexity column + insertSymbols
bind + SymbolRow optional complexity field)
- src/parser.ts (complexityStack + branching node visitors + push/pop in
FunctionDeclaration / VariableDeclaration arrow-fn paths)
- templates/recipes/high-complexity-untested.{sql,md}
- docs/architecture.md (schema version + symbols column doc)
- docs/glossary.md (new entry)
- templates/agents/rules/codemap.md + .agents/rules/codemap.md (trigger
+ quick-ref rows)
- templates/agents/skills/codemap/SKILL.md +
.agents/skills/codemap/SKILL.md (recipe-id list)
- .changeset/cyclomatic-complexity.md (patch)
Verification:
- bun test: 754 pass
- bun run check passes (format, lint, typecheck, 23/23 golden queries)
- Live re-index against codemap source produces sensible complexity
values (parser visitor itself is the highest at 108, which tracks)
* docs(skill): add complexity column to symbols schema in skill files
CodeRabbit catch on PR #70: the high-complexity-untested recipe row was
added to .agents/skills/codemap/SKILL.md but the symbols table schema
section (under "### `symbols` — Functions, types, ...") still listed
columns through `visibility` only, missing the new `complexity REAL`
column. Verified by reading the file — claim was correct.
Both lockstep mirrors (.agents/ + templates/agents/) updated with the
same row:
| complexity | REAL | Cyclomatic complexity (`1 + decision points`) for function-shaped symbols. NULL for non-functions and class methods (v1). Powers --recipe high-complexity-untested. Decision points: if, while, do…while, for/for-in/for-of, case X: (not default:), &&/||/??/?:, catch |
Per docs/README.md Rule 10 — agent rule + skill schema docs must stay
in lockstep with code-side schema changes. The trigger-pattern row +
recipe-id list were already updated; the schema-table row was the gap.
* fix(complexity): per-function visitors fix multi-declarator misattribution + cleanups
CodeRabbit raised three valid findings on PR #70. All fact-checked
against the code; all correct.
A) docs/architecture.md symbols schema table was malformed:
- Markdown table separator row had extra `| --- | ---` segments because
oxfmt mis-counted columns when the description contained `|` chars
inside `&&`/`||`/`??` backtick spans.
- The complexity row's description was split across THREE cells with
broken backtick fences.
- Fix: restored single-row layout (3 cells: Column | Type |
Description) and rephrased the decision-point list to avoid `|` inside
backticks ("short-circuit `&&` / `||` / `??`" instead of
"`&&`/`||`/`??`").
B) src/parser.ts complexity misattribution on multi-declarator
VariableDeclaration (e.g. `const a = () => {…}, b = () => {…};`):
Pre-fix: VariableDeclaration enter pushed all declarators' complexity
entries up front. Then visitor traversed `a`'s body — branches
incremented top (= b's entry). Then `b`'s body. Exit pops in reverse →
symbols[1].complexity = 3 (wrong), symbols[0].complexity = 1 (wrong).
Real bug.
Fix: push/pop complexity on the FUNCTION-shaped node visitors
(ArrowFunctionExpression / FunctionExpression) — not on
VariableDeclaration. The VariableDeclaration handler still creates the
symbol row but only RECORDS the symbol → init-node mapping in a
WeakMap. The ArrowFunctionExpression / FunctionExpression enter handler
reads the WeakMap to know which symbol to write back to; anonymous
arrow fns (callbacks, IIFEs) get -1 and just track count without
persistence.
Verified against fixture:
const a = () => { if (1===1) {…} }, b = () => { if (2===2) {…} }, c = () => 5;
→ a=2, b=2, c=1 (correct; pre-fix was a=1, b=3, c=1)
C) popComplexityInto guard was a no-op (callers passed
top.symbolIndex, so the equality check was always true). Simplified to
parameterless popComplexityTop() that always pops + writes back if
symbolIndex >= 0. Folds naturally into the B refactor — every push/pop
pair now lives in a function-shaped visitor.
Also re-ran codemap query against codemap source post-fix:
extractFileData=108, stringifyTypeNode=42, extractClassMembers=18,
extractLiteralValue=15, extractObjectMembers=14
Same scores as pre-fix on these (no FunctionExpression / arrow nesting
in those particular functions, so the bug didn't surface) — confirms
the refactor is a strict improvement, not a regression.
* docs: audit + lift remaining stale references; concise-comments sweep on parser.ts
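The multi-declarator fix from the fix(complexity) commit can be illustrated with a toy visitor. This is a simplified sketch under stated assumptions: `ToyNode` is a made-up three-node AST rather than the parser's real node types, and only if-statements count as decision points. What it borrows from the commit is the technique: the VariableDeclaration handler only records the symbol → init-node mapping in a WeakMap, and the push/pop happens on the function-shaped node, with -1 as the sentinel for anonymous functions.

```typescript
// Hypothetical toy AST, far smaller than a real parser's node set.
type ToyNode =
  | { type: "VariableDeclaration"; declarators: { name: string; init: ToyNode }[] }
  | { type: "ArrowFunctionExpression"; body: ToyNode[] }
  | { type: "IfStatement"; body: ToyNode[] }
  | { type: "Literal" };

interface SymbolRow {
  name: string;
  complexity: number | null; // null for non-function symbols
}

function computeComplexity(program: ToyNode[]): SymbolRow[] {
  const symbols: SymbolRow[] = [];
  const symbolForInit = new WeakMap<object, number>();
  const stack: { symbolIndex: number; count: number }[] = [];

  const visit = (node: ToyNode): void => {
    switch (node.type) {
      case "VariableDeclaration":
        // Create symbol rows and record which init node belongs to which row.
        // No complexity entry is pushed here.
        for (const d of node.declarators) {
          symbols.push({ name: d.name, complexity: null });
          if (d.init.type === "ArrowFunctionExpression") {
            symbolForInit.set(d.init, symbols.length - 1);
          }
        }
        for (const d of node.declarators) visit(d.init);
        break;
      case "ArrowFunctionExpression": {
        // Push on function entry; -1 marks an anonymous function.
        stack.push({ symbolIndex: symbolForInit.get(node) ?? -1, count: 1 });
        node.body.forEach(visit);
        const top = stack.pop();
        const row = top && top.symbolIndex >= 0 ? symbols[top.symbolIndex] : undefined;
        if (top && row) row.complexity = top.count;
        break;
      }
      case "IfStatement": {
        // A decision point increments the innermost enclosing function.
        const top = stack[stack.length - 1];
        if (top) top.count += 1;
        node.body.forEach(visit);
        break;
      }
      case "Literal":
        break;
    }
  };

  program.forEach(visit);
  return symbols;
}
```

Fed a declaration shaped like the commit's fixture (two arrows with one `if` each, one arrow with a literal body), this yields a=2, b=2, c=1, because each function's entry is on top of the stack only while its own body is being visited.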
Summary
Complete implementation of the FTS5+Mermaid plan (docs/plans/fts5-mermaid.md, now deleted per docs-governance lifecycle). All seven slices in one PR; decisions lifted to canonical homes (architecture.md, glossary.md, roadmap.md).
What ships
FTS5 (opt-in, default OFF)
- `source_fts` virtual table — `(file_path UNINDEXED, content)` columns, `tokenize='porter unicode61'`. Always created (near-zero space when empty).
- `codemap.config.ts` `fts5: true` OR `--with-fts` CLI flag at index time. CLI wins; logs stderr line on override.
- `parsed.content` teed from worker only when toggle on (zero serialization cost on default-OFF). Atomic with `files` row insert.
- `meta.fts5_enabled` mismatch upgrades incremental → full so `source_fts` is consistently populated.
- Stderr line on first populate: `[fts5] source_fts populated: <N> files / <X> KB`.
- `text-in-deprecated-functions` — `@deprecated` functions in files with `TODO`/`FIXME`/`HACK` markers AND coverage `<50%`. Demonstrates the FTS5 ⨯ `symbols` ⨯ `coverage` JOIN.
Mermaid output formatter
- `--format mermaid` rendering `{from, to, label?, kind?}` rows as `flowchart LR`.
- 50-edge ceiling (`MERMAID_MAX_EDGES` const). Unbounded inputs reject with scope-suggestion error naming recipe + count + `LIMIT` / `--via` / `WHERE` knobs.
- `query` / `query_recipe` + HTTP `POST /tool/query` — same plumbing as SARIF/annotations.
Schema bump
`SCHEMA_VERSION` 6 → 7. First reindex after upgrade triggers a full rebuild via the existing version-mismatch path. Existing user data (`query_baselines`, `coverage`) preserved (intentionally absent from `dropAll()`).
Docs audit (lift + delete)
Per docs-governance Rule 3 + Lifecycle ("Plan: Deleted when work ships"), decisions lifted to canonical homes:
- `source_fts` schema: `docs/architecture.md` § Schema (new section)
- `--with-fts` CLI flag: `docs/architecture.md` § Schema cross-ref + glossary entry
- `meta.fts5_enabled` semantics: `docs/architecture.md` § meta table description
- `--format mermaid` shape: `docs/architecture.md` § application/ engines list + glossary entry
- `docs/glossary.md`: `--format mermaid` entry
- `docs/glossary.md`: `source_fts` entry
- `docs/roadmap.md` Backlog: removed (Rule 2 — shipped items move out)
- `docs/roadmap.md` Non-goals: reworded to "default-on" non-goal
Plus `docs/plans/fts5-mermaid.md` deleted; `docs/README.md` File Ownership updated.
Verification
End-to-end smoke test:
Test plan
- `bun test` — 754 / 754 pass (8 new Mermaid formatter tests)
- `bun run check` — format + lint + typecheck + 23/23 golden queries
- Schema version in `architecture.md` matches `SCHEMA_VERSION` const in `db.ts`
- `meta.fts5_enabled` documented matches `META_FTS5_ENABLED_KEY` const
- 50-edge ceiling matches `MERMAID_MAX_EDGES` const
Pre-v1 patch changeset
Per `.agents/lessons.md` "changesets bump policy" — additive feature, default-OFF for FTS5, behaviour-preserving for existing users (`--with-fts` opt-in; Mermaid is a new output mode, not a breaking change to existing formats).
Doc-governance compliance
- `templates/agents/` AND `.agents/` codemap rule + skill mention `--with-fts`, `--format mermaid`, `text-in-deprecated-functions` recipe, bounded-input contract.
- `concise-comments` — comments swept post-write; kept only non-obvious-constraint commentary (FTS5 no-INSERT-OR-REPLACE, FK CASCADE doesn't reach virtual tables, dropAll wipes meta) + spec cross-refs (Q1/Q2/Q3/Q6).
Out of scope
Other research note § 1 candidates remain parked: § 1.3 cyclomatic complexity (AST walker), § 1.5 boundary violations (Zod config + new table), § 1.6 unused type members (substrate gap), § 1.9 recipe-recency (new table + reconciler), § 1.10 rename-preview (parametrised-recipes infra). The (b) C.9 plugin layer plan continues iterating in parallel per the original ship sequence.