feat: codemap-richer-index — substrate extraction (Tiers 1-12)#79

Merged
SutuSebastian merged 36 commits into main from feat/codemap-richer-index on May 15, 2026

SutuSebastian commented May 15, 2026

Summary

Adds 9 new tables + ~12 new columns to the codemap structural index, taking it from "files / symbols / imports / exports / dependencies / calls" to a substrate that supports bindings-precise refactoring, cycle detection, metrics, runtime auditing, and test-suite introspection.

Tracks the strategic plan in docs/plans/substrate-extraction.md (existing). All work is fully additive — no existing recipe / query / table shape changed semantics.

What landed

| Tier | New tables/columns | Flagship recipes |
| --- | --- | --- |
| Tier 1 | symbol name_column_*, scope_local_id; imports/exports/calls/markers position cols; import_specifiers | foundations |
| Tier 2 (.0-.5) | scopes, references (with kind='member' later), bindings, refined globals/TYPE_GLOBALS | find-references, find-write-sites, find-symbol-references |
| Tier 4 | function_params | find-by-param-type |
| Tier 6 | re_export_chains | barrel-chains |
| Tier 11 | symbol body_line_count / param_count / nesting_depth; file_metrics | large-functions, deeply-nested-functions |
| Tier 12 | module_cycles (via Tarjan SCC) | circular-imports |
| Tier 5 | runtime_markers | find-leftover-console, env-var-audit |
| Tier 9 | test_suites | find-skipped-tests, tests-by-file |
| Cleanup | audit-cache excluded from default glob; JSX intrinsics + TSQualifiedName handled | |

Empirical results

codemap-self (~350 files):

  • Full reindex: ~600 ms
  • Targeted (1 file): 9 ms
  • Bindings precision: 98.75% resolved

merchant-dashboard-v2 (2120 files, real React app):

  • Full reindex: 3.77 s
  • Targeted: 9 ms
  • Bindings precision: 98.25% resolved (1.75% unresolved is callback-edge floor)
  • 1855 it/test blocks + 459 describes auto-detected across a mixed vitest + bun-test setup
  • 223 console calls + 17 env vars + 24 import cycles + 38 XL functions flagged

Why this is a separate PR

This is the groundwork that enables future write capabilities (the feat/codemap-apply PR) — but it's orthogonal:

  • This PR: extracts richer index data from AST
  • feat/codemap-apply: writes files from diff-shaped rows

Different layers, different review concerns. Splitting keeps each PR reviewable.

Test plan

  • All 930 existing tests pass at every commit
  • All golden scenarios pass at every commit
  • Schema version bumped at every additive table (SCHEMA_VERSION 10 → 25)
  • Empirical perf measured on codemap-self + merchant-dashboard-v2
  • (Reviewer) verify substrate is sensible on another repo of your choice via `bun src/index.ts --root --full`

Commit history

16 granular commits preserved (one per slice). The fine-grained history surfaces the empirical-test loop — e.g. the JSX intrinsics fix is its own commit, triggered by running the substrate on a real React codebase and finding 17% unresolved.

Squash-on-merge is fine if the team prefers a single landing commit.

Deferred to follow-up (NOT in this PR)

  • Callback arrow scope precision (the ~1% unresolved floor)
  • Token-count metric (oxc tokeniser integration)
  • Lint-tool suppressions enrichment (eslint-disable, ts-expect-error)
  • Tier 7 CSS expansion (selectors, @media)
  • Tier 13 ORM/SQL templates (Prisma/Drizzle parsing)

Summary by CodeRabbit

  • New Features

    • New codemap apply command and transport tool for recipe-driven edits with dry-run, confirmation gating, conflict detection, and atomic per-file writes.
    • Expanded indexing and analysis: import specifiers, lexical scopes, references, bindings, function params, runtime markers, test suites, re-export chains, and module-cycle detection.
    • New query/recipe templates for discovery (call-sites, export/import sites, references, barrel chains, circular imports, large/deep functions, env-vars, tests rollups, etc.).
  • Database Schema

    • Schema version bumped with many new tables and finer positional metadata.
  • Tests

    • Extensive unit and end-to-end tests for apply behavior, engines, and indexing.
  • Documentation

    • Updated docs and templates describing apply flow, new schema, recipes, and usage.

Pure transport-agnostic engine implementing phase-1 validation and the
dry-run output shape from the merged plan (Q1–Q10 in
docs/plans/codemap-apply.md). No CLI, no MCP/HTTP wiring, no write
branch yet (Slice 2 lands the latter).

Re-locks Q8 to substring-match (a) — the original "exact byte-match"
draft contradicted the existing buildDiffJson formatter contract and
would have made every shipped rename-preview row conflict (the recipe
emits before_pattern = old_name as the bare identifier, not the full
line). New phase-1 mirrors application/output-formatters.ts buildDiffJson
verbatim: actual.includes(before) for the match check, first-occurrence
substring replace for the transformation (Slice 2), $-pre-escape per
GetSubstitution.
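
A minimal sketch of the phase-1 match check and the first-occurrence replace described above, assuming a row carries only `before_pattern` and `after_pattern`. The names `DiffRow`, `RowResult`, `escapeReplacement`, `applyRowToLine`, and the conflict text are hypothetical and not taken from `apply-engine.ts`.

```typescript
// Hypothetical row shape; the field names follow the commit message.
interface DiffRow {
  before_pattern: string;
  after_pattern: string;
}

type RowResult =
  | { ok: true; line: string }
  | { ok: false; reason: string };

// In a replacement string "$$" yields one literal "$", so doubling every "$"
// stops sequences such as "$&" or "$1" from being interpreted.
function escapeReplacement(after: string): string {
  return after.replace(/\$/g, "$$$$");
}

function applyRowToLine(actual: string, row: DiffRow): RowResult {
  // Substring match, not an exact byte match of the whole line.
  if (!actual.includes(row.before_pattern)) {
    return { ok: false, reason: "before_pattern not found on line" }; // illustrative text
  }
  // A string pattern replaces only the leftmost occurrence.
  const line = actual.replace(row.before_pattern, escapeReplacement(row.after_pattern));
  return { ok: true, line };
}
```

With these assumptions, a row renaming `inject` to `$inject` round-trips the dollar sign, and a line containing the pattern twice has only its first occurrence rewritten.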

Slice scope:
- src/application/apply-engine.ts — applyDiffPayload({rows, projectRoot,
  dryRun}) returning Q5's ApplyJsonPayload envelope. dryRun=false with a
  clean phase 1 throws NotImplemented (Slice 2 fills in the write).
- src/application/apply-engine.test.ts — 14 unit tests covering happy
  paths, all three conflict reasons, row-shape validation, deterministic
  files[] sort, and the Slice-2 guard semantics.
- docs/plans/codemap-apply.md — Q8 re-lock + edge-case table refresh.

Tests: 14/14 pass. Typecheck / lint / format clean.

Phase 2 lands behind the `!dryRun && conflicts.length === 0` gate per
Q2 (c). Each modified file is written to a sibling temp path then
`rename`d into place — POSIX-atomic per file, so concurrent readers
see either the pre- or post-rename content, never a torn write.
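
The temp-then-rename write can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: the helper name `writeFileAtomic`, the temp-file naming scheme, and the cleanup-on-failure branch are all hypothetical. Only the Node.js calls themselves (`writeFileSync`, `renameSync`, `randomBytes`) are real APIs.

```typescript
import { randomBytes } from "node:crypto";
import { mkdtempSync, readFileSync, readdirSync, renameSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import * as path from "node:path";

// Hypothetical helper: write to a sibling temp path, then rename into place.
function writeFileAtomic(targetPath: string, content: string): void {
  const suffix = randomBytes(6).toString("hex"); // random suffix avoids collisions
  const tempPath = path.join(
    path.dirname(targetPath),
    `.${path.basename(targetPath)}.${suffix}.tmp`, // naming scheme is an assumption
  );
  try {
    writeFileSync(tempPath, content);
    // The rename replaces the target in one step on POSIX file systems.
    renameSync(tempPath, targetPath);
  } catch (err) {
    try {
      unlinkSync(tempPath); // best-effort cleanup of a half-written temp file
    } catch {
      // the temp file may never have been created
    }
    throw err;
  }
}

// Usage: write into a scratch directory and list what is left behind.
const scratchDir = mkdtempSync(path.join(tmpdir(), "apply-sketch-"));
const scratchFile = path.join(scratchDir, "a.ts");
writeFileAtomic(scratchFile, "export const a = 1;\n");
const writtenText = readFileSync(scratchFile, "utf8");
const remainingEntries = readdirSync(scratchDir);
```

Keeping the temp file in the same directory as the target matters because a rename across file systems is not atomic.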

Implementation:
- Phase-1 caches each source's text; phase-2 reuses the cache (one
  read per file across both phases). TOCTOU window collapses to the
  gap between phase-1 read and phase-2 rename — accepted per Q2.
- Phase-2 splits on raw "\n" (not /\r?\n/) so CRLF lines retain their
  trailing \r and round-trip when joined back with "\n". Phase-1
  conflict reporting still strips the \r so `actual_at_line` is clean.
- Edits applied per-file in descending line order — defensive default
  for when multi-line transforms land (today's single-line rows are
  order-independent).
- `$`-pre-escape on `after_pattern` per GetSubstitution rule (mirrors
  buildDiffJson) so identifiers like `$inject` round-trip safely.
- Temp paths use `crypto.randomBytes(6)` so concurrent applies don't
  collide; cleanup on success is implicit (rename atomically removes
  the source name).
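
The line handling above can be sketched as a pure function. The assumptions here are that `line_start` is 1-based and that each edit touches a single line; `LineEdit` and `applyEdits` are hypothetical names.

```typescript
// Hypothetical edit shape; line_start is assumed to be 1-based.
interface LineEdit {
  line_start: number;
  before_pattern: string;
  after_pattern: string;
}

function applyEdits(source: string, edits: LineEdit[]): string {
  // Split on raw "\n" so a CRLF line keeps its trailing "\r".
  const lines = source.split("\n");
  // Descending line order: later lines are edited first.
  const ordered = [...edits].sort((a, b) => b.line_start - a.line_start);
  for (const edit of ordered) {
    const index = edit.line_start - 1;
    const current = lines[index];
    if (current === undefined || !current.includes(edit.before_pattern)) {
      throw new Error(`line ${edit.line_start}: before_pattern not found`);
    }
    // Double every "$" so the replacement string is taken literally.
    const escaped = edit.after_pattern.replace(/\$/g, "$$$$");
    lines[index] = current.replace(edit.before_pattern, escaped);
  }
  // Joining with "\n" restores the original line endings byte for byte.
  return lines.join("\n");
}
```

Because the `\r` stays attached to each line, a CRLF file comes back as CRLF without the function knowing which convention the file uses.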

Tests: 20/20 pass. Failure-mode coverage: chmod 0o555 on the project
dir to force the temp-write to fail; dry-run no-op-on-disk; no temp
siblings left behind on success; conflict short-circuits before any
writes (good.ts untouched when bad.ts is missing).

Adds `codemap apply <recipe-id>` as a positional verb (per Q4) wired
through the same dispatch as every other CLI command. Recipe execution
reuses `queryRows` + the existing `--params` plumbing (`parseParamsCli`
+ `resolveRecipeParams`); rows feed straight into `applyDiffPayload`.

Q6 gating matrix implemented:
- TTY no `--yes` → phase-1 dry-run preview, prompt `Proceed? [y/N]`
  on stderr, default-N, phase-2 only on `y` (uses node:readline).
- TTY `--yes` → no prompt; proceed if validation clean.
- Non-TTY no `--yes` (no `--dry-run`) → reject with stderr message
  ("Pass --yes for non-interactive runs, or --dry-run for preview.").
- `--dry-run` + `--yes` → mutually exclusive, parse-time error.
- `--json` everywhere routes errors as `{"error":"..."}` envelopes.
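
The gating matrix reduces to a small decision function. The sketch below is hypothetical (`GateInput`, `GateDecision`, and `decideGate` are illustrative names, and the mutual-exclusion message is invented); only the non-interactive rejection message is quoted from the list above.

```typescript
type GateDecision =
  | { kind: "error"; message: string }
  | { kind: "dry-run" } // phase 1 only, nothing written
  | { kind: "prompt" } // preview, then ask "Proceed? [y/N]"
  | { kind: "apply" }; // proceed if validation is clean

interface GateInput {
  interactive: boolean; // true when a TTY is available for the prompt
  yes: boolean;
  dryRun: boolean;
}

function decideGate({ interactive, yes, dryRun }: GateInput): GateDecision {
  if (dryRun && yes) {
    // Illustrative wording; the real parse-time error text is not shown in the source.
    return { kind: "error", message: "--dry-run and --yes are mutually exclusive" };
  }
  if (dryRun) return { kind: "dry-run" };
  if (yes) return { kind: "apply" };
  if (interactive) return { kind: "prompt" };
  return {
    kind: "error",
    message: "Pass --yes for non-interactive runs, or --dry-run for preview.",
  };
}
```

Keeping the decision separate from the prompt and the I/O makes each row of the matrix testable without a subprocess.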

Files:
- src/cli/cmd-apply.ts — argv parser + run loop. Mirrors cmd-impact's
  shape (positional + flags + JSON envelope).
- src/cli/cmd-apply.test.ts — 10 subprocess integration tests:
  dry-run no-op, --yes happy path (with cross-file import rename via
  rename-preview), Q7 (a) idempotent re-run after reindex, Q6 non-TTY
  rejection (text + JSON), unknown recipe id, missing positional,
  mutex check, --help prints without bootstrap.
- src/cli/main.ts + bootstrap.ts — register the verb.

realpath note: tests `realpathSync` the temp project root so oxc-
resolver's symlink-dereferenced `resolved_path` aligns with the
indexed file paths (without it the import-rename rows in rename-
preview return empty on macOS where /tmp → /private/tmp).

Tests: 10/10 integration + 20/20 engine. Typecheck / lint / format clean.

Registers `apply` as the 13th tool over both MCP (stdio) and HTTP
transports. Dispatches the same `applyDiffPayload` engine the CLI uses;
output envelope is identical to the CLI's --json output (Q5).

- src/application/tool-handlers.ts — `handleApply(args, root)` + Zod
  schema (`applyArgsSchema`). Q6 gate enforced: non-TTY transports
  always require `yes: true` (no prompt to fall back on). dry_run + yes
  rejected as mutually exclusive. Unknown recipe returns 404.
- src/application/mcp-server.ts — `registerApplyTool` mirrors the
  impact tool's shape; description encodes the Q5 envelope + Q2 (c)
  all-or-nothing semantics so agents can reason about the tool without
  reading docs.
- src/application/http-server.ts — adds `apply` to TOOL_NAMES + the
  POST /tool/{name} dispatcher case.
- src/application/tool-handlers.test.ts — 4 handleApply tests (404,
  yes-required, mutex, dry-run envelope shape). 104 mcp/http server
  tests still green; tool catalogs are inferred from TOOL_NAMES so
  the new tool surfaces automatically in /tools listings.

Per the plan's Slice 4 lock: `query_batch` does NOT get an apply
analogue (Moat-A: batched writes are verdict-shaped; consumers
compose multiple apply calls if they need cross-recipe writes).

Final slice: lifts the durable design from the plan into reference
docs and retires the plan file per docs/README.md Rule 3.

- docs/architecture.md — new "Apply wiring" section (engine + phase-1
  algorithm + phase-2 atomic temp-rename + Q6 gate + Q5 envelope + Q7
  idempotency) plus "Boundary verification — apply write path" SQL
  kit. Layering table mentions `apply-engine.ts`.
- docs/glossary.md — `codemap apply` / apply tool entry.
- docs/roadmap.md — backlog entry removed (shipped).
- docs/plans/codemap-apply.md — DELETED (closing-state lifecycle per
  docs-governance skill: delete + lift, never "Slim & keep in plans/").
- .agents/rules/codemap.md + .agents/skills/codemap/SKILL.md — Apply
  row in CLI table, "Apply (`bun src/index.ts apply <recipe-id>`)"
  paragraph, MCP `apply` tool listed alongside `impact`.
- templates/agents/rules/codemap.md + templates/agents/skills/codemap/
  SKILL.md — same updates in the published-package mirror (uses
  `codemap` instead of `bun src/index.ts`).
- .changeset/codemap-apply.md — minor bump; summarises Q1–Q10 locks
  + boundary discipline anchor.

Boundary kit verified empty after a fresh reindex of the apply files;
140/140 tests pass across apply-engine + tool-handlers + cmd-apply +
mcp/http-server suites.

Lands four fixes from a triangulated review of three independent agent
audits (Composer, GPT-5.5, Codex). Two HIGH-severity correctness bugs
were each reproducible against the prior `apply-engine.ts` in 30 seconds:

F1 (HIGH) — Path traversal. Pre-fix:
  applyDiffPayload({ rows: [{ file_path: "../outside.ts", ... }],
                     projectRoot: "/tmp/proj/", dryRun: false })
returned `applied: true` and mutated a sibling-of-root file. Now phase 1
resolves the project root once and rejects (a) absolute `file_path`
inputs and (b) any candidate whose `path.resolve(resolvedRoot, file_path)`
lands outside it. New conflict reason: `path escapes project root`.
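
A sketch of the containment check described for F1, assuming both rejected cases report the same conflict reason. `checkContainment` is a hypothetical name; the `path` functions are standard Node.js calls.

```typescript
import * as path from "node:path";

const PATH_ESCAPES_ROOT = "path escapes project root";

// Returns the conflict reason, or null when the path stays inside the root.
function checkContainment(projectRoot: string, filePath: string): string | null {
  // (a) Absolute inputs are rejected outright.
  if (path.isAbsolute(filePath)) return PATH_ESCAPES_ROOT;

  // (b) Resolve against the root, then confirm the result is still inside it.
  const resolvedRoot = path.resolve(projectRoot);
  const candidate = path.resolve(resolvedRoot, filePath);
  const relative = path.relative(resolvedRoot, candidate);
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative); // different drive on Windows
  return escapes ? PATH_ESCAPES_ROOT : null;
}
```

Comparing via `path.relative` rather than a plain string prefix avoids treating a sibling such as `/tmp/proj-other` as being inside `/tmp/proj`. Symlinks are not dereferenced in this sketch.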

F2 (HIGH) — Phase-2 partial cross-file write. Pre-fix: two rows on the
same `(file_path, line_start)` both passed phase-1 (substring check
against original source); phase-2 applied the first replace, the
second's substring assertion failed, the function threw — AFTER earlier
files in alphabetical order had already been `renameSync`d. The "Q2 (c)
all-or-nothing" guarantee was demonstrably broken. Now phase 1
maintains a per-file Set<line_start>; the second hit at the same line
emits a `duplicate edit on same line` conflict before any write.
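
The per-file `Set<line_start>` guard from F2 might look like the sketch below. `EditRow`, `Conflict`, and `findDuplicateEdits` are hypothetical names, and the sketch keys the map on `file_path` as given, without any path normalisation.

```typescript
interface EditRow {
  file_path: string;
  line_start: number;
}

interface Conflict {
  file_path: string;
  line_start: number;
  reason: string;
}

function findDuplicateEdits(rows: EditRow[]): Conflict[] {
  const seenLines = new Map<string, Set<number>>();
  const conflicts: Conflict[] = [];
  for (const row of rows) {
    let lines = seenLines.get(row.file_path);
    if (!lines) {
      lines = new Set<number>();
      seenLines.set(row.file_path, lines);
    }
    if (lines.has(row.line_start)) {
      // Second hit on the same line: report it before any write happens.
      conflicts.push({ ...row, reason: "duplicate edit on same line" });
    } else {
      lines.add(row.line_start);
    }
  }
  return conflicts;
}
```

Because the check runs entirely in phase 1, a non-empty result can abort the run while every file on disk is still untouched.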

F3 (MEDIUM, doc-first) — Same-line `before_pattern` ambiguity. The
formatter precedent (`buildDiffJson`) uses `actual.replace(before, after)`,
which rewrites only the leftmost occurrence. `const foo = foo();` with
`before = "foo"` becomes `const bar = foo();` — variable renamed,
recursive call broken, `applied: true` reported. This mirrors the
formatter exactly and the `--format diff` preview shows the same shape,
so the audit's recommendation of an engine-level fix would diverge
preview from execution. Documented as a deliberate limitation in the
engine docstring + `architecture.md § Apply wiring` caveat instead;
test pins the current behaviour so a future engine change lands as a
deliberate breaking change rather than silent drift.

F4 + F6 (LOW) — `apply-engine.ts` docstring no longer points at the
deleted plan (now links to `docs/architecture.md` for durable design);
`apply-engine` added to the `application/` row of the Key Files table
in architecture.md (it was meant to be in that enumeration alongside
the other 14 engines).

Tests: 25 unit tests (8 new — three F1 paths, one F2 repro, one F3
limitation pin, plus existing happy-paths / failure-modes); 41 pass
across the apply path. Boundary kit returns []. Changeset entry
amended with the path-containment + overlap-detection bullets so the
release notes carry the security-relevant fixes.

Triangulated audit doc + the three source agent reviews are NOT
checked in — they served their purpose for this fix-up commit and
removing them avoids stale "review backlog" cruft per docs-governance.

Follow-up (separate PR): the audit also surfaced that
`DEFAULT_EXCLUDE_DIR_NAMES` in src/config.ts doesn't include `.codemap`,
so `audit --base` followed by `--full` walks the audit-cache subtree.
Tracked separately because the gap predates this PR.

Concise-comments sweep on the apply surface: module docstring goes
from a six-section narrative to three named call-outs (same-line
ambiguity / TOCTOU / EOL); inline comments drop redundant prose where
the next line of code already says it. Net 65 lines removed across
src/ with no behavioural change.

Docs sync: post-fix the engine collects FIVE conflict reasons (added
`path escapes project root` + `duplicate edit on same line` in commit
bdf7ef3), but the agent rule, the published-package agent rule, and
the glossary all still said "three." Updated all three to enumerate
the full set + briefly describe what each new guard rejects.

Touched:
- src/application/apply-engine.ts — slim docstring + 6 inline blocks.
- src/application/apply-engine.test.ts — slim test rationale where
  the assertion already conveyed it.
- src/cli/cmd-apply.ts — collapse two same-branch returns into one
  union; slim Q6/path-derivation comments.
- src/application/tool-handlers.ts — slim handleApply schema/header
  doc to one sentence each.
- .agents/rules/codemap.md + templates/agents/rules/codemap.md +
  docs/glossary.md — three → five conflict reasons + new-guard one-liners.

Tests + typecheck + format + boundary kit all green.

Triaged 9 actionable comments via pr-comment-fact-check. Each finding
verified against the source on aaabc13; 2 were already addressed by
the prior commit (CodeRabbit auto-tagged with "✅ Addressed in
aaabc13"); 7 are new fixes here:

F1 (paragraph merge in .agents/rules/codemap.md, partial earlier-fix):
  CodeRabbit's auto-tag was optimistic — the conflict-reason count
  was synced in aaabc13 but the Impact section's tail (`...
  --summary trims …`jq '.summary.nodes'``) was still stitched onto
  the END of the Apply paragraph. Restored the section break.

F3 (after_pattern: "" silently dropped):
  `readString` rejected empty strings, so a deletion-shaped row got
  silently skipped by phase-1's required-keys check. New
  `readStringAllowEmpty` helper for `after_pattern` only — empty
  `before_pattern` still rejected (would match anywhere on the line).
  Regression test deletes a `// FIXME(team): ` prefix.
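
The difference between the two readers can be sketched as below. The helper bodies and the `readPatterns` wrapper are guesses at the shape, not the engine's code; only the helper names `readString` and `readStringAllowEmpty` come from the commit message.

```typescript
type Row = Record<string, unknown>;

// Rejects non-strings and empty strings.
function readString(row: Row, key: string): string | undefined {
  const value = row[key];
  return typeof value === "string" && value.length > 0 ? value : undefined;
}

// Rejects non-strings only, so "" survives as a valid value.
function readStringAllowEmpty(row: Row, key: string): string | undefined {
  const value = row[key];
  return typeof value === "string" ? value : undefined;
}

// Hypothetical wrapper: an empty before_pattern would match anywhere on the
// line, so only after_pattern is allowed to be empty.
function readPatterns(row: Row): { before: string; after: string } | undefined {
  const before = readString(row, "before_pattern");
  const after = readStringAllowEmpty(row, "after_pattern");
  if (before === undefined || after === undefined) return undefined;
  return { before, after };
}
```

A deletion-shaped row then passes validation, and replacing the pattern with the empty string removes it from the line.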

F4 (cache-key dedup `a.ts` vs `./a.ts`):
  Pre-fix, the cache + pending + seenLines maps used the raw
  `file_path` as their key. Two rows naming the same disk file via
  different spellings created two cache entries → second write
  clobbered the first edit. New `canonicalizeFilePath` collapses
  every spelling to a project-relative form. Symlink-realpath
  defense remains documented as a separate (heavier) follow-up.
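
One way to collapse spellings such as `a.ts` and `./a.ts` to a single key is sketched below. The body is an assumption; only the name `canonicalizeFilePath` comes from the commit message, and the sketch does not dereference symlinks.

```typescript
import * as path from "node:path";

// Sketch: map any spelling of a path inside the root to one
// project-relative, forward-slash form.
function canonicalizeFilePath(projectRoot: string, filePath: string): string {
  const root = path.resolve(projectRoot);
  const relative = path.relative(root, path.resolve(root, filePath));
  // Normalise separators so the key is identical on every platform.
  return relative.split(path.sep).join("/");
}
```

Using the canonical form as the key for the cache, pending-edit, and seen-line maps means two rows naming the same file always share one entry.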

F5 (Q2 (c) over-promised on I/O failures):
  CodeRabbit's "🔴 Heavy lift" — a writeFileSync/renameSync mid-loop
  failure leaves files 1..N-1 already renamed with no rollback. Full
  transactional rollback (per-file backups + restore-on-throw) is
  deferred. Honest fix: weakened the Q2 (c) claim in
  `architecture.md § Apply wiring` to "all-or-nothing (semantic) —
  phase-1 conflicts abort phase 2 entirely; phase-2 I/O failures are
  NOT transactional across files." Engine docstring carries the same
  caveat as a fourth call-out.

F6 (TTY check used wrong stream):
  Gate checked `process.stdout.isTTY` but `promptYesNo()` reads from
  `process.stdin` and writes to `process.stderr`. So
  `codemap apply foo | tee log.txt` (interactive stdin, piped stdout)
  was rejected as non-TTY. Now gates on `stdin.isTTY && stderr.isTTY`.

F7 (user-cancel rendered "no rows applicable"):
  Abort path called `emitResult(preview, opts)` with `opts.dryRun ===
  false`, so `renderTerminal` fell through to "no rows applicable" —
  contradicting the user's explicit cancel. Terminal mode now prints
  `apply <id>: aborted by user; no files written.`; JSON consumers
  still get the full preview envelope.

F9 (skill files missed two conflict reasons):
  `.agents/skills/codemap/SKILL.md` + `templates/agents/skills/
  codemap/SKILL.md` apply tool description didn't enumerate the 5
  conflict reasons. Synced.

Tests: 44/44 (3 new — `./a.ts` dedup, deletion via empty
after_pattern, empty-before-still-rejected). Typecheck / lint /
format clean.

R.17 (`docs/plans/substrate-extraction.md`) per-tier extractor architecture
+ Tier 1 substrate landed together so the substrate-extraction plan's
shared-state patterns (ScopeTracker, ComplexityTracker, ComponentDetector)
ship validated by real consumers, not in isolation.

Architecture (R.17): `src/extractors/` hosts a `TierExtractor` registry
called from a thin `parser.ts` orchestrator via a multiplexed visitor —
10 modules (types + 3 shared trackers + 6 extractors + offsets/jsdoc/
type-stringify helpers) replace the 968-line `parser.ts` monolith;
parser.ts shrinks to 247 lines. 6 collaborating extractors on 3 shared
trackers, chaining handlers on `CallExpression` / `FunctionDeclaration`
/ `VariableDeclaration` + their `:exit` pairs.

Tier 1 substrate: column-precise positions on `calls`, `exports`,
`symbols.name_*`, `markers.column_*`, plus a new `import_specifiers`
child table that splits the `imports.specifiers` JSON blob into typed
rows. SCHEMA_VERSION 10→14. 4 flagship recipes + 4 golden fixtures
(`find-call-sites`, `find-export-sites`, `find-symbol-definitions`,
`find-import-sites`) form a complete identifier-locator family —
foundation for Tier 6's app-wide rename recipe extension.

Empirical cost (clean rebuild, median of 3):
  codemap-self     ~924 files: 11.4→14.3 MB (+25%); ~280→300 ms (+7%)
  merchant-dash  ~2120 files: 37.5→50.1 MB (+33%); ~740→900 ms (+22%)
Targeted reindex flat (~15 ms). Full reindex worst case ~900 ms,
about 66x under R.10's 1-min pain threshold. DB growth is +25-33%,
a small share of R.9's "~5-10x total budget across 13 tiers."

930/930 tests pass; 19 golden scenarios pass (4 new). Test fixtures
updated in impact-engine.test.ts / mcp-server.test.ts /
resource-handlers.test.ts to match new schema. R.17 architecture
validated end-to-end by Tier 1 consumers — `symbolsExtractor` populates
name_column_*; `callsExtractor` populates line+column; `markers.ts`
populates column_*; orchestrator wires `staticImportSpecifierRows`.

What landed (SCHEMA_VERSION 14 → 16):

- **`scopes` table** — composite PK `(file_path, local_id)`, WITHOUT ROWID.
  `local_id` is a per-file 0-based counter assigned at parse time so refs
  encode their scope without round-tripping SQLite autoincrement.
  Kinds shipped: module / function / arrow / class / method. Block / for /
  catch deferred (R.11 conservative escape valve covers it).
- **`references` table** — per-identifier-use rows with column-precise
  position, kind (value/type/jsx), enclosing scope, is_write flag.
- **`is_write` per R.13** — writePositions / suppressedReads sets keyed by
  node.start. Pre-marker handlers for AssignmentExpression (simple `=`
  suppresses read), UpdateExpression / UnaryExpression(delete) (dual-emit),
  VariableDeclarator with initializer (write-only), ForOf/In LHS,
  AssignmentPattern.
- **Declaration suppression** — Function/Class/Interface/Type/Enum/Module
  declarations NOT duplicated in references (they live in symbols).
- **Shorthand dedup** — oxc walker visits the SAME Identifier twice for
  `import {foo}` / `export {foo}` / `{foo}` shorthand; dedup by
  (node.start, is_write).
- **`referencesExtractor`** module per R.17 (132 lines); ScopeTracker
  extended with pushKind / currentLocalId / getRecorded / finaliseModule.
- **Recipes:** find-references --params name=X, find-write-sites
  --params name=X + golden fixtures.

Empirical (codemap-self, 925 files):

| Metric         | Tier 1   | Tier 2  | Delta |
| --------------- | --------- | -------- | ------ |
| Full reindex    | ~300 ms   | 767 ms   | +2.5×  |
| Targeted (1 f)  | 8 ms     | 9 ms     | +12%   |
| Rows            | n/a       | 127k refs / 2k scopes | new   |

All within R.9 / R.10 thresholds (<1 min full, <100 ms targeted).
930 tests pass; all golden scenarios pass.

Deferred to Tier 2.1:
- bindings table + pass-2 cross-file resolution (R.12)
- Reference kinds: decorator / shorthand-prop / member-access / spread /
  rest / as-cast / typeof / keyof
- Block / for / catch scope kinds

What landed (SCHEMA_VERSION 16 → 17):

- `bindings` table — (reference_id, resolved_symbol_id, resolution_kind,
  is_external). PK on reference_id; CASCADE on reference deletion.
  resolution_kind enum: same-file / imported / global / unresolved.
- `symbols.scope_local_id` column — captures the declaring scope (parent
  of the symbol's own body scope). Class members anchor to their class's
  pushed scope. Captured BEFORE any new scope is pushed.
- Pass-2 binding resolver (`src/application/bindings-engine.ts`) — two
  phases: one SELECT per table into in-memory Maps, then per-reference
  resolution via scope-walk → imports → globals → unresolved. ~300ms for
  127k refs on codemap-self.
- Cross-file resolution uses `imports.resolved_path` (dependencies lacks
  the module specifier). Module-scope target symbol picked when the
  target file's exports list matches.
- Full-rebuild only — targeted reindex skips bindings refresh per R.10's
  <100ms contract. Orphan rows CASCADE-cleared on incremental edits.
- find-symbol-references recipe — bindings-precise (filters same-name
  shadows + different-source imports). Golden fixture added.
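
The resolution order (scope-walk, then imports, then globals, then unresolved) can be sketched with in-memory maps. Everything here is illustrative: the `Scope` and `FileIndex` shapes, the `resolveReference` name, and the tiny globals set are assumptions, and the sketch returns only the resolution kind rather than a symbol id.

```typescript
type ResolutionKind = "same-file" | "imported" | "global" | "unresolved";

interface Scope {
  localId: number;
  parentId: number | null; // null for the module scope
  declared: Set<string>; // names declared directly in this scope
}

interface FileIndex {
  scopes: Map<number, Scope>;
  importedNames: Set<string>;
}

// Illustrative subset only; the real globals list is much longer.
const VALUE_GLOBALS = new Set(["console", "process", "Math"]);

function resolveReference(file: FileIndex, name: string, scopeId: number): ResolutionKind {
  // 1. Walk outward from the enclosing scope to the module scope.
  let current = file.scopes.get(scopeId);
  while (current) {
    if (current.declared.has(name)) return "same-file";
    current = current.parentId === null ? undefined : file.scopes.get(current.parentId);
  }
  // 2. Fall back to the file's imports, 3. then to known globals.
  if (file.importedNames.has(name)) return "imported";
  if (VALUE_GLOBALS.has(name)) return "global";
  return "unresolved";
}
```

Checking scopes before imports and globals is what lets a local declaration shadow an imported or global name of the same spelling.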

Also: concise-comments sweep — stripped vintage `Tier 2.1` prefixes from
source; kept forward-deferral notes (`defer to Tier 6`) and R.NN
cross-refs.

Empirical (codemap-self, 932 files):

- Full reindex: 767 ms → 1175 ms (+53%)
- Targeted (1 file): 9 ms → 9 ms (no regression)
- Bindings distribution: 33% same-file / 17% imported / 4% global /
  45% unresolved (mostly TS type params + function params, future tiers)

930 tests pass; all golden scenarios pass.

Deferred to Tier 2.2:
- Re-export chain walking
- Function-parameter symbols
- Type-parameter symbols

What landed:

- Function/method/arrow parameter symbols (kind='param') with
  scope_local_id = function's own scope. TSParameterProperty
  (constructor `public foo: T`) emits at class scope.
- Type parameter symbols (kind='type-param') for FunctionDeclaration,
  ClassDeclaration, arrow vars, and class methods. Interfaces and type
  aliases deferred — they don't push their own scope.
- Re-export chain walking in bindings-engine — bounded at 10 hops with
  cycle detection. `export { foo } from './bar'` now resolves to the
  original definition. Path resolution is relative-only against the
  indexed-paths set.
- pushParams / pushTypeParams helpers in src/extractors/params.ts.
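
A bounded chain walk with cycle detection could be sketched as follows. The map shape, the `keyOf` helper, and the way the result reports truncation are assumptions for illustration; only the 10-hop bound comes from the list above.

```typescript
interface ReExport {
  toFile: string;
  toName: string;
}

interface ChainResult {
  file: string;
  name: string;
  hops: number;
  truncated: boolean; // true when a cycle or the hop limit stopped the walk
}

const MAX_HOPS = 10;

// Hypothetical key format for a (file, exported name) pair.
const keyOf = (file: string, name: string): string => `${file}#${name}`;

function walkReExportChain(
  reExports: Map<string, ReExport>,
  file: string,
  name: string,
): ChainResult {
  const visited = new Set<string>();
  let current = { file, name };
  let hops = 0;
  while (true) {
    const key = keyOf(current.file, current.name);
    const next = reExports.get(key);
    // No further re-export: this is the original definition.
    if (!next) return { ...current, hops, truncated: false };
    if (visited.has(key) || hops >= MAX_HOPS) return { ...current, hops, truncated: true };
    visited.add(key);
    current = { file: next.toFile, name: next.toName };
    hops += 1;
  }
}
```

The visited set catches cycles early, while the hop bound caps the cost of pathological but acyclic barrel stacks.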

Empirical (codemap-self, 933 files):

| Metric           | Pre    | Post   | Delta             |
| ----- | ---- | ---- | ---------------- |
| Symbols           | ~11.8k | 14k    | +2.2k             |
| Same-file refs    | 42257  | 51299  | +9042 (+21%)      |
| Unresolved refs   | 58073  | 49534  | -8539 (-15%)      |
| Unresolved %      | 45%    | 39%    | down              |
| Full reindex      | 1175ms | 1513ms | +29%              |
| Targeted (1 file) | 9ms    | 9ms    | no regression     |

930 tests pass; all golden scenarios pass (index-summary rebaselined
to reflect new param/type-param rows).

Deferred to Tier 2.3:
- Destructuring pattern params ({a,b}, [a,b])
- Interface/type-alias type-param scoping
- Callback arrow scoping
- External-module bindings via .d.ts

…2 close)

What landed (SCHEMA_VERSION 17 → 18):

- kind='member' for non-computed property access (obj.foo). Bindings
  resolver skips these. Single biggest unresolved-bucket cut (~50%).
- Object-literal / class-member key suppression (long-hand Property,
  MethodDefinition, PropertyDefinition, TSPropertySignature,
  TSMethodSignature). Shorthand and computed still emit normally.
- Destructuring pattern bindings — walkPattern generator handles
  Identifier / AssignmentPattern / RestElement / ObjectPattern /
  ArrayPattern / TSParameterProperty recursively. Same helper for
  function params and variable destructuring (`const { a, b } = obj`).
- TYPE_GLOBALS set in bindings-engine — TS built-ins (Record, Partial,
  ReadonlyArray, Map, etc.) resolve to global instead of unresolved.
- Extra value globals: performance, import, require, module, exports,
  __dirname, __filename, self.
- `as const` skip: TSTypeReference name=const no longer emitted.
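
The recursive pattern walk can be sketched as a generator over a reduced node type. The `Pattern` union below is a simplified, ESTree-style assumption rather than oxc's actual AST types, and the sketch yields only names where the real helper presumably carries positions as well.

```typescript
// Simplified ESTree-style shapes; field names are assumptions.
type Pattern =
  | { type: "Identifier"; name: string }
  | { type: "AssignmentPattern"; left: Pattern }
  | { type: "RestElement"; argument: Pattern }
  | { type: "TSParameterProperty"; parameter: Pattern }
  | { type: "ArrayPattern"; elements: Array<Pattern | null> }
  | {
      type: "ObjectPattern";
      properties: Array<
        { type: "Property"; value: Pattern } | { type: "RestElement"; argument: Pattern }
      >;
    };

// Yields every binding name introduced by a pattern, in source order.
function* walkPattern(node: Pattern): Generator<string> {
  switch (node.type) {
    case "Identifier":
      yield node.name;
      break;
    case "AssignmentPattern": // `a = 1` binds `a`
      yield* walkPattern(node.left);
      break;
    case "RestElement": // `...rest` binds `rest`
      yield* walkPattern(node.argument);
      break;
    case "TSParameterProperty": // constructor `public foo: T`
      yield* walkPattern(node.parameter);
      break;
    case "ArrayPattern":
      for (const element of node.elements) {
        if (element) yield* walkPattern(element); // holes are null
      }
      break;
    case "ObjectPattern":
      for (const prop of node.properties) {
        yield* walkPattern(prop.type === "Property" ? prop.value : prop);
      }
      break;
  }
}
```

Since function parameters and variable declarators both hold a pattern node, one generator can serve both call sites.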

Empirical (codemap-self, 933 files):

| Metric            | Pre        | Post       | Delta           |
| ----- | ----- | ----- | ------------- |
| `kind='member'`   | 0          | 26701      | new             |
| Bindings rows     | 127k       | 84k        | -34%            |
| Unresolved        | 49534      | 4634       | -90%            |
| Unresolved %      | 39%        | 5.5%       | -34 pts         |
| Full reindex      | 1513ms     | 1025ms     | -32%            |
| Targeted (1 file) | 9ms        | 9ms        | no regression   |

Tier 2 closed. Remaining 5.5% is dominated by callback arrow params
(s, r, e, etc.) which need structural arrow scoping — deferred as
separate post-Tier-2 work.

930 tests pass; all goldens pass.

What landed:

- claimedScopeNodes WeakSet<object> on ExtractContext. Every extractor
  that pushes scope for a specific AST node marks the node here so
  downstream extractors don't double-push.
- ArrowFunctionExpression handler in scopesExtractor — for callback
  arrows (not claimed by VariableDeclaration), pushes anonymous arrow
  scope + emits params. Named arrows stay claimed and don't double-push.
- CatchClause handler — try/catch param scoped to catch body scope.
  Bindingless catch (TS 4.4+) handled.
- ScopeTracker.currentParent walks past anonymous scopes (empty-name)
  so parent_name of nested symbols anchors to the nearest named owner.
- Extra globals: Bun, Deno.

Empirical (codemap-self, 933 files):

| Metric            | Pre        | Post       | Delta           |
| ----- | ----- | ----- | ------------- |
| Same-file         | 51972      | 55480      | +6.7%           |
| Unresolved        | 4634       | 1102       | -76%            |
| Unresolved %      | 5.5%       | 1.3%       | -4.2 pts        |
| Full reindex      | 1025ms     | 1224ms     | +19%            |
| Targeted (1 file) | 9ms        | 9ms        | no regression   |

Tier 2 closed at 1.3% unresolved. The remainder is unindexable
(infer T, audit-cache re-indexes, edge cases).

930 tests pass; all goldens pass.

What landed (SCHEMA_VERSION 18 → 19):

- scopes.kind enum extended: interface, type-alias, for, catch.
- TSInterfaceDeclaration / TSTypeAliasDeclaration push their own scope
  so type-params resolve via the standard walk. Type-param symbols
  emitted at the new scope (was: emitted at parent scope, causing
  same-letter collisions across interfaces).
- ForOfStatement / ForInStatement push a 'for' scope; VariableDeclaration
  / pattern in `left` emits bindings at the for-scope so body refs
  resolve to the loop variable, not the enclosing function.
- CatchClause kind correctly tagged as 'catch' (was 'function').
- Added value globals: RegExp, Iterator, AsyncIterator.

Empirical (codemap-self, 933 files):

| Metric            | Pre        | Post       | Delta           |
| ----- | ----- | ----- | ------------- |
| Same-file         | 55480      | 55524      | +44             |
| Global            | 6019       | 6034       | +15             |
| Unresolved        | 1102       | 1119       | +17 (noise)     |
| Unresolved %      | 1.30%      | 1.32%      | flat            |
| Full reindex      | 1224ms     | 1249ms     | +2%             |
| Targeted (1 file) | 9ms        | 9ms        | no regression   |

Net flat on the unresolved bucket — the value is structural (interface
type-params + for-loop body bindings now have correct scope graphs),
which unblocks future precision wins.

930 tests pass; all goldens pass.

What landed (SCHEMA_VERSION 19 → 20):

- symbols: body_line_count, param_count, nesting_depth columns
  (nesting_depth deferred; needs a separate tracker — pushed as NULL).
- file_metrics table: one row per indexed TS/JS file with total_lines,
  code_lines, blank_lines, comment_lines, function_count, class_count,
  interface_count, export_count. Distinguishing let/const/var/arrow is
  deferred (the parser doesn't track the keyword variant on VariableDeclaration).
- Per-file metrics computed in parser.ts orchestrator from existing
  ctx data + lineMap (no extra walk).
- Recipe: `large-functions` — body_line_count ≥ 50 ranked by size.
  Golden fixture added.

Empirical (codemap-self, 933 files):

- 446 file_metrics rows (TS/JS files only; CSS/markdown are indexed
  separately and don't go through extractFileData).
- Top function: extractFileData at 511 body lines, 3 params.
- Top file: src/cli/cmd-query.ts at 1411 lines, 19 functions, 10 exports.

930 tests pass; all goldens pass.

What landed (SCHEMA_VERSION 20 → 21):

- module_cycles table — (file_path PK, cycle_id, cycle_size). Only
  cyclic files appear; non-cyclic files have no row.
- src/application/cycles-engine.ts — iterative Tarjan's SCC over the
  dependencies graph. O(V+E). Runs once per full rebuild after
  bindings resolution.
- Recipe: `circular-imports` — every file in a cycle, grouped by
  cycle_id. Golden fixture detects the fixture's store ↔ cache cycle.
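
For readers unfamiliar with the iterative form, here is a compact sketch of Tarjan's SCC algorithm using an explicit work stack instead of recursion. It is not the code in `cycles-engine.ts`: the `Graph` type and `findCycles` name are hypothetical, and this version reports only components with more than one file, so a file that imports itself is ignored.

```typescript
type Graph = Map<string, string[]>; // file -> files it depends on

function findCycles(graph: Graph): string[][] {
  const index = new Map<string, number>();
  const lowlink = new Map<string, number>();
  const onStack = new Set<string>();
  const stack: string[] = [];
  const cycles: string[][] = [];
  let nextIndex = 0;

  for (const root of graph.keys()) {
    if (index.has(root)) continue;
    // Each frame remembers a node and the next outgoing edge to examine.
    const work: Array<{ node: string; edge: number }> = [{ node: root, edge: 0 }];
    while (work.length > 0) {
      const frame = work[work.length - 1]!;
      const node = frame.node;
      if (!index.has(node)) {
        // First visit: assign index and lowlink, push on the SCC stack.
        index.set(node, nextIndex);
        lowlink.set(node, nextIndex);
        nextIndex += 1;
        stack.push(node);
        onStack.add(node);
      }
      const edges = graph.get(node) ?? [];
      if (frame.edge < edges.length) {
        const target = edges[frame.edge]!;
        frame.edge += 1;
        if (!index.has(target)) {
          work.push({ node: target, edge: 0 }); // descend
        } else if (onStack.has(target)) {
          lowlink.set(node, Math.min(lowlink.get(node)!, index.get(target)!));
        }
        continue;
      }
      // All edges examined: close the frame.
      work.pop();
      if (lowlink.get(node) === index.get(node)) {
        const component: string[] = [];
        let member: string;
        do {
          member = stack.pop()!;
          onStack.delete(member);
          component.push(member);
        } while (member !== node);
        if (component.length > 1) cycles.push(component.sort());
      }
      const parent = work[work.length - 1];
      if (parent) {
        // Equivalent to the post-recursion lowlink update in the recursive form.
        lowlink.set(parent.node, Math.min(lowlink.get(parent.node)!, lowlink.get(node)!));
      }
    }
  }
  return cycles;
}
```

Each node and edge is handled a constant number of times, which gives the O(V+E) bound, and the explicit stack avoids call-stack overflow on deep dependency chains.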

Empirical (codemap-self, 936 files):

- Full reindex: 1224ms → 1171ms (essentially flat — Tarjan is fast).
- 3 cycles in the indexed DB (all the same fixture cycle: once in
  current source, twice in audit-cache copies of the fixture).

930 tests pass; all goldens pass.

What landed (SCHEMA_VERSION 21 → 22):

- re_export_chains table — (from_file, from_name) PK, (to_file, to_name,
  hops, truncated). Only re-export entries are materialised; direct
  exports don't appear.
- resolveReExportChains + persistReExportChains in bindings-engine.
  Reuses the same chain-walker bindings-engine uses for resolution.
- Recipe: barrel-chains — every chain ordered by hops DESC. Golden
  fixture covers the minimal fixture's shop barrel.
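
A minimal sketch of how one chain could be walked to its terminal export, assuming re-export sources are already resolved to file paths. The names `walkChain`, `ReExport`, the `"file\0name"` key format, and the `maxHops` cap are all hypothetical; the PR does not state the real limit or the walker's signature.

```typescript
interface ReExport { source: string; imported_name: string }

interface ChainRow {
  from_file: string;
  from_name: string;
  to_file: string;
  to_name: string;
  hops: number;
  truncated: boolean;
}

// Assumed lookup key for "export `name` in `file`".
const chainKey = (file: string, name: string): string => `${file}\0${name}`;

function walkChain(
  reExports: Map<string, ReExport>,
  fromFile: string,
  fromName: string,
  maxHops = 8, // hypothetical cap
): ChainRow | null {
  let file = fromFile;
  let name = fromName;
  let hops = 0;
  const seen = new Set<string>();
  for (;;) {
    const k = chainKey(file, name);
    const next = reExports.get(k);
    if (!next) break; // reached a direct export
    if (hops >= maxHops || seen.has(k)) {
      // Too deep or cyclic: report how far we got and flag it.
      return { from_file: fromFile, from_name: fromName, to_file: file, to_name: name, hops, truncated: true };
    }
    seen.add(k);
    file = next.source;
    name = next.imported_name;
    hops++;
  }
  if (hops === 0) return null; // direct exports are not materialised
  return { from_file: fromFile, from_name: fromName, to_file: file, to_name: name, hops, truncated: false };
}
```

Returning `null` for direct exports mirrors the rule above that only re-export entries get a row.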

Empirical (codemap-self, 938 files):

- 106 chains materialised (mostly internal barrels + audit-cache copies).
- 0 truncated.
- Full reindex: 1171ms → 1104ms (no measurable cost — same loops bindings already runs).

930 tests pass; all goldens pass.
What landed (SCHEMA_VERSION 22 → 23):

- function_params table — one row per leaf parameter, ordered by
  position. Keyed by (file_path, owner_name, owner_kind) to
  disambiguate same-name functions vs methods.
- Columns: position, name, type_text (stringified annotation),
  default_text (raw source of default expr), is_rest, is_optional,
  + column-precise position.
- pushParams in src/extractors/params.ts extended to emit
  function_params rows alongside the existing kind='param' symbol
  rows. ownerKind passed by caller (function/method/etc).
- Recipe: find-by-param-type --params type_text=X — every fn taking
  a param with exact type annotation match. Golden fixture covers
  createClient(config: ClientConfig).
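
To make the row shape concrete, the sketch below models a `function_params` row and an in-memory stand-in for the exact-match lookup. The interface and function names are hypothetical, and the file path in the usage note is invented; the real recipe is a SQL query against the table.

```typescript
// Mirrors the columns listed above; column-precise position fields are omitted.
interface FunctionParamRow {
  file_path: string;
  owner_name: string;
  owner_kind: string; // e.g. "function" or "method"
  position: number;
  name: string;
  type_text: string | null;
  default_text: string | null;
  is_rest: boolean;
  is_optional: boolean;
}

// Exact string match on the annotation, one result per owning function.
function findByParamType(rows: FunctionParamRow[], typeText: string): string[] {
  const owners = new Set<string>();
  for (const row of rows) {
    if (row.type_text === typeText) {
      owners.add(`${row.file_path}:${row.owner_kind}:${row.owner_name}`);
    }
  }
  return Array.from(owners).sort();
}
```

With a row for `createClient(config: ClientConfig)` in a hypothetical `src/client.ts`, `findByParamType(rows, "ClientConfig")` would return `["src/client.ts:function:createClient"]`. Because the match is on stringified text, `ClientConfig | undefined` would not match.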

Empirical (codemap-self, 940 files):

- 2257 function_params rows.
- Full reindex: 1104ms → 1201ms (+9%).
- symbols rows with kind='param': count unchanged (function_params is
  emitted in parallel, not as a replacement).

930 tests pass; all goldens pass.
What landed:

- ComplexityTracker extended with enterNest/exitNest. Frame tracks
  currentDepth + maxDepth alongside cyclomatic count; popTop writes
  maxDepth to symbol.nesting_depth.
- complexityExtractor: IfStatement/While/DoWhile/For/ForIn/ForOf/
  ConditionalExpression/CatchClause now have enter+exit handlers
  that bump nesting. SwitchCase + LogicalExpression remain
  cyclomatic-only (flat decision points, not depth).
- Recipe: deeply-nested-functions — depth >= 4 ranked by depth then
  complexity. Golden fixture added.
- large-functions recipe SELECT extended to include nesting_depth.
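
The depth tracking described above can be sketched as a stack of per-function frames. `enterNest`, `exitNest`, and `popTop` are named in this commit; the class name `NestingTracker`, `pushFunction`, and `bumpComplexity` are assumptions added so the example is self-contained.

```typescript
interface Frame {
  complexity: number;
  currentDepth: number;
  maxDepth: number;
}

class NestingTracker {
  private readonly frames: Frame[] = [];

  // Hypothetical: called when the walker enters a function body.
  pushFunction(): void {
    this.frames.push({ complexity: 1, currentDepth: 0, maxDepth: 0 });
  }

  // Hypothetical: flat decision points (e.g. SwitchCase) only bump the count.
  bumpComplexity(): void {
    const top = this.frames[this.frames.length - 1];
    if (top) top.complexity++;
  }

  enterNest(): void {
    const top = this.frames[this.frames.length - 1];
    if (!top) return;
    top.complexity++;
    top.currentDepth++;
    top.maxDepth = Math.max(top.maxDepth, top.currentDepth);
  }

  exitNest(): void {
    const top = this.frames[this.frames.length - 1];
    if (top && top.currentDepth > 0) top.currentDepth--;
  }

  // Returns the values that would be written to the symbol row.
  popTop(): { complexity: number; nesting_depth: number } | null {
    const top = this.frames.pop();
    return top ? { complexity: top.complexity, nesting_depth: top.maxDepth } : null;
  }
}
```

Keeping one frame per function means a nested function starts again at depth 0 rather than inheriting the depth of its enclosing function, which is a design choice of this sketch and not confirmed by the commit.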

Empirical:

- codemap-self (943 files): 622 fns at depth 0, 441 at 1, 331 at 2,
  ... 3 fns at depth 9 (parseArgs in arg-handling scripts).
- merchant-dashboard-v2 (2120 files): top finds include `createStream`
  (depth 9, generated code) and `main` in provision-vars.ts (depth 6,
  complexity 84, 488 lines; a real refactor target).

930 tests pass; all goldens pass. Tier 11 fully closed.
What landed:

- JSX intrinsics suppressed: lowercase tags (div/span/h1/etc.) +
  every JSXAttribute name (className/onClick/value/etc.) +
  JSXMemberExpression .property (`<Foo.Bar />` Bar).
- TSQualifiedName handler — `React.ReactNode` / `A.B.C` in type
  position now emits the namespace head as kind='type' and the
  member tail as kind='member'.
- TYPE_GLOBALS extended: DOM elements (HTMLDivElement, SVGSVGElement,
  etc.), DOM events (MouseEvent, KeyboardEvent, PointerEvent, …),
  Web APIs (RequestInit, IntersectionObserver, …), React ambient
  types (ReactNode, ComponentProps, Dispatch, SetStateAction, FC,
  CSSProperties, HTMLAttributes, etc.).
- Value GLOBALS extended: React namespace + constructor-callable
  Web APIs (new IntersectionObserver, new FormData, new URL, etc.).

Empirical:

- codemap-self: 1.32% → 1.25% unresolved (essentially flat — already
  near floor; small wins from DOM event types).
- merchant-dashboard-v2 (2120 files): **17.1% → 1.75%** unresolved.
  Same-file +18% (more refs resolve precisely).
- Full reindex on dashboard: 3997ms → 3706ms (slight gain from less
  unresolved binding work).

930 tests + all goldens pass.
Audit worktrees under .codemap/audit-cache/<sha>/ are full project
snapshots used by `codemap audit --base <ref>`. They were being
indexed alongside live source, multiplying every row by however
many audits had been run.

On codemap-self, excluding them dropped files indexed from 944 → 347 and full
reindex from 1273ms → 595ms (~53% less time). Cycle / re-export-chain
queries no longer duplicate findings across snapshot copies.

Surgical fix: added 'audit-cache' as a dir-name to
DEFAULT_EXCLUDE_DIR_NAMES. Recipes / config / index.db itself stay
indexable (project-local recipes ARE part of the workflow).
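
A directory-name exclusion of this kind can be sketched as a segment check. Only the `audit-cache` entry is confirmed by this commit; the `node_modules` entry, the helper name `isExcludedPath`, and the paths in the usage note are assumptions for illustration.

```typescript
// Assumed contents: "audit-cache" comes from this commit, "node_modules" is a guess.
const DEFAULT_EXCLUDE_DIR_NAMES: ReadonlySet<string> = new Set([
  "node_modules",
  "audit-cache",
]);

// Matches on directory segments only, so a file that merely happens to be
// named like an excluded directory is still indexed.
function isExcludedPath(
  relPath: string,
  excluded: ReadonlySet<string> = DEFAULT_EXCLUDE_DIR_NAMES,
): boolean {
  const dirSegments = relPath.split("/").slice(0, -1);
  return dirSegments.some((segment) => excluded.has(segment));
}
```

Under these assumptions, `.codemap/audit-cache/<sha>/src/index.ts` is skipped while `.codemap/recipes/custom.sql` stays indexable, which matches the intent stated above.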

930 tests pass; all goldens pass.
What landed (SCHEMA_VERSION 23 → 24):

- runtime_markers table — (kind, detail, line, column, scope) for
  every console.* / debugger / throw / process.env access.
  kind enum: 'console' / 'debugger' / 'throw' / 'process-env'.
  detail: method name for console, env-var name for process-env,
  truncated thrown expression text for throw, NULL for debugger.
- runtime-markers extractor — matches MemberExpression on
  console.X and process.env.X, DebuggerStatement, ThrowStatement.
  Throw expr text capped at 200 chars to keep rows scannable.
- Recipes: find-leftover-console (all console calls),
  env-var-audit (env vars ranked by use + file fan-out).
- Goldens for both.
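
The `detail` rules above can be sketched over a minimal ESTree-like shape. The node types here are pared down to what the example needs, and `classifyMember` and `throwMarker` are hypothetical names, not the extractor's API.

```typescript
type MarkerKind = "console" | "debugger" | "throw" | "process-env";
interface Marker { kind: MarkerKind; detail: string | null }

interface Ident { type: "Identifier"; name: string }
interface MemberExpr {
  type: "MemberExpression";
  object: Expr;
  property: Expr;
  computed: boolean;
}
type Expr = Ident | MemberExpr;

const MAX_THROW_DETAIL = 200; // cap stated in the commit message

// console.X -> method name; process.env.X -> env-var name; otherwise null.
function classifyMember(node: MemberExpr): Marker | null {
  if (node.computed || node.property.type !== "Identifier") return null;
  const obj = node.object;
  if (obj.type === "Identifier" && obj.name === "console") {
    return { kind: "console", detail: node.property.name };
  }
  if (
    obj.type === "MemberExpression" &&
    !obj.computed &&
    obj.object.type === "Identifier" &&
    obj.object.name === "process" &&
    obj.property.type === "Identifier" &&
    obj.property.name === "env"
  ) {
    return { kind: "process-env", detail: node.property.name };
  }
  return null;
}

// Thrown expression text is truncated so rows stay scannable.
function throwMarker(exprText: string): Marker {
  return { kind: "throw", detail: exprText.slice(0, MAX_THROW_DETAIL) };
}
```

This sketch ignores computed access such as `process.env["X"]`; whether the real extractor handles that form is not stated in the PR.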

Real-world (merchant-dashboard-v2, 2120 files):

- 223 console calls (155 .log, 51 .error, 11 .warn)
- 114 throw statements
- 17 process.env reads dominated by NODE_ENV (10), plus single-use
  config (Sentry, AI chat, API_URL) — audit candidates.

930 tests pass; all goldens pass.
What landed (SCHEMA_VERSION 24 → 25):

- test_suites table — (file_path, name, kind, line_start, line_end,
  parent_suite_id, is_skipped, is_only, is_todo, framework). Captures
  every describe / it / test / suite / context block.
- Framework detection per file from imports: vitest / jest /
  bun-test / node-test / mocha / unknown (mocha-style globals).
- src/extractors/tests.ts — parses callee shape (Identifier or
  `.skip`/`.only`/`.todo` MemberExpression), extracts name from
  StringLiteral / TemplateLiteral first arg. Tracks parent stack to
  resolve parent_suite_id. `.each` collapses to one row per template
  (parametrised expansion is a runtime concern).
- Recipes: find-skipped-tests (flags .skip / .only / .todo, with
  status column) + tests-by-file (per-file roll-up). Goldens added.
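
The callee parsing and parent tracking can be sketched as follows. This assumes the callee has already been split into dot-separated segments (for example `["describe", "skip"]`) and that row ids are simply insertion indexes; `SuiteCollector` and both assumptions are hypothetical, not the shape of src/extractors/tests.ts.

```typescript
const SUITE_CALLEES = new Set(["describe", "it", "test", "suite", "context"]);

interface SuiteRow {
  id: number; // assumed: insertion index within the file
  name: string;
  kind: string;
  parent_suite_id: number | null;
  is_skipped: boolean;
  is_only: boolean;
  is_todo: boolean;
}

class SuiteCollector {
  readonly rows: SuiteRow[] = [];
  private readonly parents: number[] = [];

  // Returns false (and records nothing) for calls that are not suite blocks.
  // The caller should invoke exit() only when enter() returned true.
  enter(segments: string[], name: string): boolean {
    const head = segments[0];
    if (head === undefined || !SUITE_CALLEES.has(head)) return false;
    const modifiers = segments.slice(1);
    const id = this.rows.length;
    this.rows.push({
      id,
      name,
      kind: head,
      parent_suite_id: this.parents[this.parents.length - 1] ?? null,
      is_skipped: modifiers.includes("skip"),
      is_only: modifiers.includes("only"),
      is_todo: modifiers.includes("todo"),
    });
    this.parents.push(id);
    return true;
  }

  exit(): void {
    this.parents.pop();
  }
}
```

A `.each` call would pass through here once per template, consistent with the one-row-per-template rule above.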

Real-world (merchant-dashboard-v2, 2120 files):

- 1404 it + 246 test + 414 describe (bun-test) + 132 it /
  45 describe (vitest). Mixed-framework codebase detected
  per-file from imports.
- 0 skipped/only/todo blocks — disciplined test suite.

930 tests pass; all goldens pass.
@changeset-bot

changeset-bot Bot commented May 15, 2026

🦋 Changeset detected

Latest commit: d00cabe

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

| Name | Type |
| --- | --- |
| @stainless-code/codemap | Minor |


@coderabbitai

coderabbitai Bot commented May 15, 2026

Warning

Rate limit exceeded

@SutuSebastian has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 42 minutes and 18 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d746fa7b-bf61-4ec3-a905-c4149f5bd2e1

📥 Commits

Reviewing files that changed from the base of the PR and between d757272 and d00cabe.

📒 Files selected for processing (1)
  • .changeset/codemap-richer-index.md
📝 Walkthrough


Adds a two-phase diff apply engine wired into CLI/MCP/HTTP with tests and docs. Simultaneously overhauls indexing with new schema, parser extractors, bindings/cycles resolution, updates index-engine, and introduces numerous recipes, fixtures, and research documentation.

Changes

Codemap apply engine and integrations

| Layer | File(s) | Summary |
| --- | --- | --- |
| Apply engine implementation and validations | `src/application/apply-engine.ts` | Two-phase validate/apply with path containment, duplicate detection, conflict reporting, and atomic file writes. |
| Apply engine unit tests | `src/application/apply-engine.test.ts` | Covers envelopes, conflicts, ordering, CRLF, deletions, $ escaping, duplicates, path escapes, and failures. |
| CLI command, parsing, and e2e tests | `src/cli/*` | Adds apply command, args/help, interactive preview/prompt, JSON/TTY output; end-to-end tests. |
| MCP/HTTP tool handlers and server wiring | `src/application/tool-handlers.ts`, `src/application/http-server.ts`, `src/application/mcp-server.ts`, tests | Registers apply tool, validates payloads, dispatches engine, returns structured results; tests added. |
| Apply docs, glossary, changeset, and roadmap update | `.agents/*`, `docs/*`, `templates/agents/*`, `.changeset/*` | Documents apply behavior/envelope and wiring; glossary entry; changeset; roadmap/backlog adjusted. |

Richer indexing substrate, bindings/cycles, recipes, and fixtures

| Layer | File(s) | Summary |
| --- | --- | --- |
| DB schema/version and TS row APIs | `src/db.ts` | Schema version bump with new tables/columns and indexes; updated insert/drop and TypeScript row interfaces. |
| Parser refactor, parsed types, and adapter outputs | `src/parser.ts`, `src/parsed-types.ts`, `src/adapters/*` | Modular extractor pipeline; returns new datasets; adapters emit extended payloads. |
| New extraction tiers (offsets/jsdoc/types/scopes/symbols/components/complexity/calls/references/runtime-markers/tests) | `src/extractors/*`, `src/markers.ts` | Implements tiered extractors to populate new schema tables and marker columns. |
| Index engine pass-2, bindings resolver, and import cycles | `src/application/index-engine.ts`, `src/application/bindings-engine.ts`, `src/application/cycles-engine.ts`, tests | Persists new datasets; computes/persists bindings, re-export chains, and module cycles; updates tests. |
| Recipe templates and golden fixtures | `templates/recipes/*`, `fixtures/golden/**/*` | Adds SQL/Markdown recipes for new capabilities and corresponding golden fixtures and scenarios. |
| Research note, architecture tables, config, and params merge tweak | `docs/*`, `src/config.ts`, `src/application/recipe-params.ts` | Adds research note; adjusts architecture docs; excludes audit-cache; tweaks params merge. |

Sequence Diagram(s)

sequenceDiagram
  participant CLI
  participant MCP
  participant HTTP
  participant ToolHandlers
  participant ApplyEngine
  CLI->>ToolHandlers: parse/run apply
  MCP->>ToolHandlers: apply(args)
  HTTP->>ToolHandlers: apply(JSON)
  ToolHandlers->>ApplyEngine: validate/apply(rows, root, dryRun)
  ApplyEngine-->>ToolHandlers: envelope {mode, applied, files, conflicts, summary}

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested labels

enhancement, documentation

Poem

A rabbit taps the keys with glee,
Diffs align in harmony.
Validate, then write—so spry,
Conflicts pause, then safely try.
New maps bind names through every lane,
Cycles traced, the loops we tame—🐇✨


Local macOS and CI Linux returned the same 4 markers in different
implementation-defined order without ORDER BY. Fix is to make the
query deterministic. Also bumps markers-all-kinds golden (counts
reflect the audit-cache exclusion from 9b5f9a2).

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 12

🧹 Nitpick comments (6)
docs/research/codemap-richer-index-synthesis-2026-05.md (1)

100-113: 💤 Low value

Aspirational code example in research note — consider validation status.

This Path B sketch shows a future integration pattern with ts-morph, not a current API. The import path @stainless-code/codemap-mcp and method query_recipe may not match the actual package structure when this is implemented.

The coding guideline states "Code examples in documentation must be tested and kept current." However, this is a research note containing design explorations rather than user-facing documentation. The example is explicitly labeled as a "sketch" of a potential future direction (Path B), and section 9 notes these research notes will be slimmed or retired as decisions lift to plan PRs.

Consider whether aspirational design examples in research notes should be syntax-validated even if not functionally tested, or if the guideline only applies to user-facing documentation of current features.

As per coding guidelines: "Code examples in documentation must be tested and kept current". Here, however, the guideline is being applied to a design sketch in a research note, which describes future possibilities rather than the current implementation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/research/codemap-richer-index-synthesis-2026-05.md` around lines 100 -
113, The example code block uses non‑existent/unstable symbols (codemap,
query_recipe from "@stainless-code/codemap-mcp", and ts-morph's Project) and is
an aspirational sketch, so update the note to mark the snippet as
unvalidated/experimental: add an explicit one‑line disclaimer before the code
that this is a Path B sketch, not a tested API, and mention the specific symbols
(codemap, query_recipe, Project, ts-morph) that may change; alternatively,
replace the snippet with a syntactically valid but clearly stubbed example
(e.g., pseudocode or comment placeholders) so the document obeys the “examples
must be kept current” guideline while preserving the research intent.
src/application/index-engine.ts (1)

293-312: ⚡ Quick win

fetchTableStats not updated for the new tables.

The richer-index PR adds scopes, references, bindings, import_specifiers, function_params, runtime_markers, test_suites, re_export_chains, module_cycles, file_metrics, but the post-index stats summary still only lists the original 12. Users running codemap --full won't see the new tiers' row counts in the indexer output, which makes "did extraction work?" hard to verify without ad-hoc SQL. Consider extending IndexTableStats and this query to surface the new tables.
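
One way to extend the stats query without hand-writing each column is to generate it from a table list. This is a sketch only: the table names come from the comment above, while `buildStatsQuery` and the identifier check are hypothetical and do not reflect how `fetchTableStats` is actually written.

```typescript
// Table names as listed in the review comment.
const NEW_TABLES: readonly string[] = [
  "scopes",
  "references",
  "bindings",
  "import_specifiers",
  "function_params",
  "runtime_markers",
  "test_suites",
  "re_export_chains",
  "module_cycles",
  "file_metrics",
];

// Builds one SELECT with a COUNT(*) sub-select per table.
// Identifiers are double-quoted because `references` is an SQL keyword.
function buildStatsQuery(tables: readonly string[]): string {
  const columns = tables.map((table) => {
    if (!/^[a-z_][a-z0-9_]*$/.test(table)) {
      throw new Error(`unexpected table name: ${table}`);
    }
    return `(SELECT COUNT(*) FROM "${table}") AS "${table}"`;
  });
  return `SELECT ${columns.join(", ")}`;
}
```

The resulting single-row result maps naturally onto an object with one numeric field per table, which is the shape an extended `IndexTableStats` would need.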

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/application/index-engine.ts` around lines 293 - 312, The fetchTableStats
function and the IndexTableStats type need to be updated to include the eight+
new tables added by the richer-index PR; update the SQL in fetchTableStats to
SELECT COUNT(*) for each new table (scopes, references, bindings,
import_specifiers, function_params, runtime_markers, test_suites,
re_export_chains, module_cycles, file_metrics, etc.) and expand the
IndexTableStats shape to include corresponding fields so the returned row maps
to the new properties; ensure the returned object keys match IndexTableStats
property names and that fetchTableStats still returns the row cast as
IndexTableStats.
src/application/bindings-engine.ts (3)

541-557: ⚡ Quick win

Duplicated re_export_source parser — extract a shared helper.

The 16-line block that parses re_export_source into (source, imported_name) is replicated verbatim in resolveBindings (541‑557) and resolveReExportChains (752‑764). Any future tweak (e.g., new .default encoding) must be mirrored in both places. Extract a small helper to keep them in sync.

♻️ Suggested helper
+function parseReExportSource(
+  raw: string,
+  fallbackName: string,
+): { source: string; imported_name: string } {
+  const dotIdx = raw.lastIndexOf(".");
+  const sourceTail = raw.split("/").pop() ?? "";
+  const hasNameSuffix =
+    !sourceTail.startsWith(".") &&
+    dotIdx > raw.lastIndexOf("/") &&
+    dotIdx > 0;
+  return {
+    source: hasNameSuffix ? raw.slice(0, dotIdx) : raw,
+    imported_name: hasNameSuffix ? raw.slice(dotIdx + 1) : fallbackName,
+  };
+}

…then call it from both loops.

Also applies to: 752-764

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/application/bindings-engine.ts` around lines 541 - 557, Extract the
duplicated parsing logic for r.re_export_source into a single helper (e.g.,
parseReExportSource or parseReExportSourceString) that accepts the
re_export_source string and returns an object with { source, imported_name };
then replace the 16-line parsing block in resolveBindings and the identical
block in resolveReExportChains with calls to that helper, passing
r.re_export_source and using the returned source/imported_name to call
re.set(r.name, ...) (or the equivalent in the other loop). Ensure the helper
implements the same rules (detecting the last dot after the last slash, handling
`.default` and names without suffixes) so behavior remains identical.

437-597: 💤 Low value

Add JSDoc to exported binding-engine functions.

resolveBindings, resolveReExportChains, persistReExportChains, and persistBindings are exported public APIs. Only resolveReExportChains has a doc comment. The repo's coding guideline requires accompanying documentation on public APIs; a one-liner stating the responsibility, side effects (persistReExportChains deletes & rewrites the whole table), and the orphan-clearing behavior of persistBindings would help callers understand the contract without reading the body.

As per coding guidelines: "All public APIs must have accompanying documentation".

Also applies to: 735-811

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/application/bindings-engine.ts` around lines 437 - 597, Add JSDoc
comments to the exported public functions resolveBindings,
resolveReExportChains, persistReExportChains, and persistBindings: for each
provide a one-line summary of responsibility and any side effects (e.g.,
persistReExportChains deletes and rewrites the entire re-export table) and for
persistBindings mention its orphan-clearing behavior when writing bindings;
place the comments immediately above each function declaration (matching the
style used for resolveReExportChains) so callers can understand the contract
without reading the implementation.

13-371: 💤 Low value

Duplicate entries in TYPE_GLOBALS and GLOBALS — dedupe.

Set<string> literals contain several duplicates that bloat the source and obscure intent:

  • TYPE_GLOBALS: Window, Document, HTMLElement, FileReader, FormData, Headers, Request, Response, URL, URLSearchParams, Event all appear twice in different "section comments".
  • GLOBALS: Number, FileReader, Image, FormData, AbortController, Headers, Request, Response, URL, URLSearchParams, Blob, File are repeated.

Also, GLOBALS contains reserved keywords / non-identifiers (undefined, null, true, false, this, arguments, super, new, void, typeof, instanceof, in, of) which can never be references.name values, so they're dead entries. None of these affect correctness (Set dedupes at insert) but they make the allowlists harder to audit.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/application/bindings-engine.ts` around lines 13 - 371, Remove duplicate
and invalid entries from the two allowlists: dedupe TYPE_GLOBALS and GLOBALS by
ensuring each symbol appears only once (e.g., remove repeated "Window",
"Document", "HTMLElement", "FileReader", "FormData", "Headers", "Request",
"Response", "URL", "URLSearchParams", "Event", etc. from TYPE_GLOBALS and
repeated "Number", "FileReader", "Image", "FormData", "AbortController",
"Headers", "Request", "Response", "URL", "URLSearchParams", "Blob", "File" from
GLOBALS). Also remove reserved keywords/non-identifiers from GLOBALS (e.g.,
"undefined", "null", "true", "false", "this", "arguments", "super", "new",
"void", "typeof", "instanceof", "in", "of") since they cannot be reference
names; keep the sets concise and optionally reorder or comment-group remaining
entries for readability while preserving the Set literal names TYPE_GLOBALS and
GLOBALS.
src/extractors/runtime-markers.ts (1)

30-47: 💤 Low value

end parameter is unused — drop it or wire it through.

emit() accepts end but only does void end;. Either remove the param (cleaner signature) or emit line_end/column_end alongside line_start/column_start for symmetry with markers (which now records both). Recipes that highlight runtime markers in editors would benefit from the end span.

♻️ Suggested simplification
-    function emit(
-      kind: "console" | "debugger" | "throw" | "process-env",
-      start: number,
-      end: number,
-      detail: string | null,
-    ) {
+    function emit(
+      kind: "console" | "debugger" | "throw" | "process-env",
+      start: number,
+      detail: string | null,
+    ) {
       const lineStart = offsetToLine(lineMap, start);
       const lineStartOffset = lineMap[lineStart - 1] ?? 0;
       out.push({
         file_path: relPath,
         kind,
         line_start: lineStart,
         column_start: start - lineStartOffset,
         detail,
         scope_local_id: scopes.currentLocalId(),
       });
-      void end;
     }

…and drop end at each call site.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/extractors/runtime-markers.ts` around lines 30 - 47, The emit function
currently accepts an unused end parameter; either remove end from the emit
signature and all its call sites (simpler), or wire it through to emit span end
fields: compute lineEnd = offsetToLine(lineMap, end) and lineEndOffset =
lineMap[lineEnd - 1] ?? 0 and add line_end: lineEnd and column_end: end -
lineEndOffset to the object pushed into out (alongside existing file_path, kind,
line_start, column_start, detail, scope_local_id), updating all callers of emit
and keeping references to emit, offsetToLine, lineMap, out, relPath, and
scopes.currentLocalId() in mind; ensure symmetry with the markers
representation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/application/apply-engine.ts`:
- Around line 122-124: The lexical resolve of projectRoot (resolvedRoot) doesn't
follow symlinks and allows symlink-escape when later using filesystem-aware ops
(readFileSync, writeFileSync, renameSync); fix by resolving symlinks with
fs.realpathSync on the resolvedRoot (or realpathSync the file's parent before
any write/read/rename) and use that real path in isWithinRoot checks so all
isWithinRoot comparisons use real paths rather than path.resolve results; update
uses of resolvedRoot/isWithinRoot to compare against the realpath (or validate
realpath of targets) to prevent symlink escape.
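
The containment check described above can be sketched with the symlink resolver injected as a parameter, which keeps the example self-contained. In Node the resolver would typically be `fs.realpathSync`. The function name `isWithinRealRoot` is hypothetical, and the prefix check assumes POSIX-style `/` separators.

```typescript
// `realpath` resolves symlinks; it is injected here so the sketch has no imports.
function isWithinRealRoot(
  root: string,
  target: string,
  realpath: (p: string) => string,
): boolean {
  // Strip trailing slashes so "/proj/" and "/proj" compare equal.
  const realRoot = realpath(root).replace(/\/+$/, "");
  const realTarget = realpath(target);
  // Require a separator after the root so "/proj-evil" is not inside "/proj".
  return realTarget === realRoot || realTarget.startsWith(realRoot + "/");
}
```

Because `fs.realpathSync` throws for paths that do not exist yet, a new file would need its parent directory resolved instead, as the comment suggests.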

In `@src/application/index-engine.ts`:
- Around line 443-452: The incremental path unconditionally calls
insertFileMetrics(db, [data.fileMetrics]) which can throw if data.fileMetrics is
undefined/null; match the full-rebuild behavior in insertParsedResults by adding
a presence guard (e.g., if (data.fileMetrics)) before calling insertFileMetrics
so insertFileMetrics is only invoked when fileMetrics exists; locate the call in
the incremental flow where insertFileMetrics is invoked and apply the same
conditional check used in insertParsedResults (and consider extractFileData's
possible undefined return).

In `@src/extractors/calls.ts`:
- Around line 37-44: The callee extraction currently only handles a single
MemberExpression level (Identifier.property or ThisExpression.property) and
drops nested chains like foo.bar.baz(); update the logic around callee?.type ===
"MemberExpression" to recursively flatten MemberExpression chains: walk the
callee object up while its type is MemberExpression, prepending each property
(use ".prop" for non-computed properties and "[<expression>]" for computed
properties) until you reach an Identifier or ThisExpression, then prefix the
root name (identifier name or "this") to produce calleeName; preserve existing
handling for Identifier and ThisExpression and ensure you respect computed vs
non-computed properties when building the string.
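
The chain flattening requested above might look like the following, written against a pared-down ESTree-like node type. `calleeName` and the `CalleeNode` union are hypothetical, and computed properties are only rendered when they are identifiers or literals; anything else makes the function give up and return `null`.

```typescript
type CalleeNode =
  | { type: "Identifier"; name: string }
  | { type: "ThisExpression" }
  | { type: "Literal"; raw: string }
  | { type: "MemberExpression"; object: CalleeNode; property: CalleeNode; computed: boolean };

// Walks up the object side of nested MemberExpressions, prepending each
// property, until it reaches an Identifier or ThisExpression root.
function calleeName(node: CalleeNode): string | null {
  const parts: string[] = [];
  let cur: CalleeNode = node;
  while (cur.type === "MemberExpression") {
    const prop = cur.property;
    if (cur.computed) {
      const inner =
        prop.type === "Identifier" ? prop.name :
        prop.type === "Literal" ? prop.raw :
        null;
      if (inner === null) return null;
      parts.unshift(`[${inner}]`);
    } else {
      if (prop.type !== "Identifier") return null;
      parts.unshift(`.${prop.name}`);
    }
    cur = cur.object;
  }
  if (cur.type === "Identifier") return cur.name + parts.join("");
  if (cur.type === "ThisExpression") return "this" + parts.join("");
  return null; // e.g. a call on a literal
}
```

Using a loop instead of recursion keeps the root handling in one place, so the existing `Identifier` and `ThisExpression` cases fall out as chains of length zero.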

In `@src/extractors/components.ts`:
- Around line 20-32: The code currently uses a single variable currentScope in
the scope-tracking object (enter, current, exit) which loses outer scope when
nested components are traversed; replace currentScope with a stack (e.g.,
scopeStack: string[]), modify enter(name) to push name and ensure
hookCalls.set(name, []) when first seen, modify current() to return the top of
the stack (last element) or null when empty, and modify exit() to pop the stack;
apply the same stack-based change to the other similar scope-tracking blocks
(the ones mirroring enter/current/exit logic) so nested scope attribution is
preserved across all occurrences (refer to the functions/methods named enter,
current, exit and the variable hookCalls to locate each instance).
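
A stack-based version of the scope tracker could look like this. `enter`, `current`, `exit`, and `hookCalls` are the names used in the comment; the factory `createHookScopes` and the `recordHook` helper are assumptions added so the nesting behaviour can be demonstrated.

```typescript
function createHookScopes() {
  const scopeStack: string[] = [];
  const hookCalls = new Map<string, string[]>();

  // Innermost component scope, or null when outside any component.
  function current(): string | null {
    return scopeStack[scopeStack.length - 1] ?? null;
  }

  function enter(name: string): void {
    scopeStack.push(name);
    if (!hookCalls.has(name)) hookCalls.set(name, []);
  }

  // Popping restores the enclosing scope instead of resetting to null.
  function exit(): void {
    scopeStack.pop();
  }

  // Hypothetical recorder: attributes a hook call to the innermost scope.
  function recordHook(hook: string): void {
    const scope = current();
    if (scope !== null) hookCalls.get(scope)?.push(hook);
  }

  return { enter, current, exit, recordHook, hookCalls };
}
```

With a single variable, a hook called after a nested component closes would be attributed to nothing; with the stack it goes back to the outer component.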

In `@src/extractors/references.ts`:
- Around line 74-80: The VariableDeclarator visitor currently only marks
declarations as writes when node.init exists, so bare declarations like let x;
or var y; are missed; update the VariableDeclarator handler (function name:
VariableDeclarator) to mark declaration binding positions unconditionally for
Identifier ids by adding the id.start to writePositions and suppressedReads
regardless of node.init (keep the existing id type check), i.e., remove or
bypass the node.init gating so declarations without initializers emit write
refs.
- Around line 54-60: The AssignmentExpression visitor only handles Identifier
LHS nodes; update it to treat ObjectPattern and ArrayPattern LHSs by walking the
pattern and adding each Identifier's start to writePositions (and to
suppressedReads when node.operator === "=") so destructured targets like ({a} =
obj) or [x] = arr are marked as writes rather than reads; keep the existing
behavior for simple Identifier handling and for non-"=" operators (compound
assignments) where suppressedReads should not be added.
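
Both fixes come down to collecting every identifier bound by a pattern. The sketch below walks a simplified ESTree-like pattern type; `collectTargets`, `markAssignment`, and the `Pattern` union are hypothetical, while `writePositions` and `suppressedReads` are the set names used in the comment.

```typescript
interface PropertyNode { type: "Property"; value: Pattern }
interface RestNode { type: "RestElement"; argument: Pattern }
type Pattern =
  | { type: "Identifier"; name: string; start: number }
  | { type: "ObjectPattern"; properties: Array<PropertyNode | RestNode> }
  | { type: "ArrayPattern"; elements: Array<Pattern | null> }
  | { type: "AssignmentPattern"; left: Pattern }
  | RestNode;

// Returns the start offset of every identifier the pattern binds.
function collectTargets(p: Pattern, out: number[] = []): number[] {
  switch (p.type) {
    case "Identifier":
      out.push(p.start);
      break;
    case "ObjectPattern":
      for (const prop of p.properties) {
        collectTargets(prop.type === "Property" ? prop.value : prop, out);
      }
      break;
    case "ArrayPattern":
      for (const el of p.elements) if (el) collectTargets(el, out); // holes are null
      break;
    case "AssignmentPattern":
      collectTargets(p.left, out); // default value side is a read, not a target
      break;
    case "RestElement":
      collectTargets(p.argument, out);
      break;
  }
  return out;
}

// Plain "=" overwrites the target, so the read is suppressed;
// compound operators such as "+=" both read and write.
function markAssignment(
  operator: string,
  left: Pattern,
  writePositions: Set<number>,
  suppressedReads: Set<number>,
): void {
  for (const start of collectTargets(left)) {
    writePositions.add(start);
    if (operator === "=") suppressedReads.add(start);
  }
}
```

The same `collectTargets` walk could serve the `VariableDeclarator` case, called on `id` regardless of whether `init` is present.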

In `@src/extractors/scopes.ts`:
- Line 33: The current scope string assembly (variable scopeStr built from name)
uses empty names for anonymous scopes which produces identical text (e.g.,
"foo.") and causes call-edge collisions; update the logic in the scope-building
code (the scopeStr assignment and the other occurrences that append name) to
substitute a stable anonymous token when name === "" — for example use a
deterministic local identifier (node.localId or a generated stable id tied to
the block/function) instead of an empty string, so replace `${name}` with
`${name || <stableAnonIdForNode>}` wherever scopeStr is constructed (ensure the
same token source is used at all occurrences to keep tokens stable across runs).

In `@src/extractors/symbols.ts`:
- Around line 151-152: The code currently uses node.start/node.end when
computing line spans via offsetToLine for each declarator (affecting
lineStart/lineEnd and body_line_count), which overestimates spans in
multi-declaration statements; update the logic in the symbols extraction so each
declarator uses its own positions: use decl.start/decl.end for declarators and,
for declarators whose init is a function, use init.start/init.end to compute the
function body span (so offsetToLine(lineMap, decl.start/decl.end) or
offsetToLine(lineMap, init.start/init.end) as appropriate), and ensure the same
change is applied where node.start/node.end are used later (the other occurrence
analogous to lines 180-181).

In `@src/extractors/type-stringify.ts`:
- Around line 16-18: The current TSQualifiedName handling uses tn.left?.name
which fails when tn.left is another TSQualifiedName; replace that inline string
interpolation with a recursive resolver that walks TSQualifiedName nodes to
build the full dotted identifier (e.g., implement a helper like
getQualifiedName(node) or update the current stringify function to, when
encountering tn.type === "TSQualifiedName", recursively compute the left part
(if left.type === "TSQualifiedName" call the resolver, otherwise use left.name)
and then append "." + right.name so nested names like A.B.C are returned intact
before any generic/composed logic.
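
The recursive resolver suggested above is short enough to show in full. `getQualifiedName` is the helper name proposed in the comment; the `EntityName` type is a simplified stand-in for the parser's node types.

```typescript
type EntityName =
  | { type: "Identifier"; name: string }
  | {
      type: "TSQualifiedName";
      left: EntityName;
      right: { type: "Identifier"; name: string };
    };

// Recurses down the left side so nested names such as A.B.C stay intact.
function getQualifiedName(node: EntityName): string {
  return node.type === "TSQualifiedName"
    ? `${getQualifiedName(node.left)}.${node.right.name}`
    : node.name;
}
```

For `A.B.C`, the parser nests the qualification on the left (`(A.B).C`), which is why reading `left.name` directly yields `undefined` for anything deeper than two segments.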

In `@templates/recipes/find-export-sites.md`:
- Line 9: The action description uses call-site wording "callee-token-precise"
which is inconsistent for an export-site recipe; update the description string
so it uses export-specific terminology (e.g., replace "callee-token-precise"
with "export-name-token-precise" or "export-token-precise") so the phrase reads:
"Each row carries `file_path:line_start:column_start` (export-name-token-precise
per R.6) plus `is_re_export`..." and keep the rest of the sentence intact.

In `@templates/recipes/find-skipped-tests.sql`:
- Around line 6-10: The CASE that computes status currently checks is_skipped
before is_only, causing rows with multiple flags to be labeled 'skipped' instead
of the higher-priority 'only'; update the CASE expression (the WHEN branches
that set status) so that WHEN is_only = 1 THEN 'only' comes before WHEN
is_skipped = 1 THEN 'skipped' (keep is_todo logic and the AS status alias
unchanged) so `.only` is prioritized when multiple flags are set.

In `@templates/recipes/large-functions.md`:
- Line 10: Update the description in templates/recipes/large-functions.md to
explicitly state that the query returns only the top 50 largest functions (the
SQL uses LIMIT 50), so users know the output is truncated; mention the ranking
is by body_line_count and includes complexity, and note that if they need the
full set they should remove or change the LIMIT in the corresponding SQL query.

---

Nitpick comments:
In `@docs/research/codemap-richer-index-synthesis-2026-05.md`:
- Around line 100-113: The example code block uses non‑existent/unstable symbols
(codemap, query_recipe from "@stainless-code/codemap-mcp", and ts-morph's
Project) and is an aspirational sketch, so update the note to mark the snippet
as unvalidated/experimental: add an explicit one‑line disclaimer before the code
that this is a Path B sketch, not a tested API, and mention the specific symbols
(codemap, query_recipe, Project, ts-morph) that may change; alternatively,
replace the snippet with a syntactically valid but clearly stubbed example
(e.g., pseudocode or comment placeholders) so the document obeys the “examples
must be kept current” guideline while preserving the research intent.

In `@src/application/bindings-engine.ts`:
- Around line 541-557: Extract the duplicated parsing logic for
r.re_export_source into a single helper (e.g., parseReExportSource or
parseReExportSourceString) that accepts the re_export_source string and returns
an object with { source, imported_name }; then replace the 16-line parsing block
in resolveBindings and the identical block in resolveReExportChains with calls
to that helper, passing r.re_export_source and using the returned
source/imported_name to call re.set(r.name, ...) (or the equivalent in the other
loop). Ensure the helper implements the same rules (detecting the last dot after
the last slash, handling `.default` and names without suffixes) so behavior
remains identical.
- Around line 437-597: Add JSDoc comments to the exported public functions
resolveBindings, resolveReExportChains, persistReExportChains, and
persistBindings: for each provide a one-line summary of responsibility and any
side effects (e.g., persistReExportChains deletes and rewrites the entire
re-export table) and for persistBindings mention its orphan-clearing behavior
when writing bindings; place the comments immediately above each function
declaration (matching the style used for resolveReExportChains) so callers can
understand the contract without reading the implementation.
- Around line 13-371: Remove duplicate and invalid entries from the two
allowlists: dedupe TYPE_GLOBALS and GLOBALS by ensuring each symbol appears only
once (e.g., remove repeated "Window", "Document", "HTMLElement", "FileReader",
"FormData", "Headers", "Request", "Response", "URL", "URLSearchParams", "Event",
etc. from TYPE_GLOBALS and repeated "Number", "FileReader", "Image", "FormData",
"AbortController", "Headers", "Request", "Response", "URL", "URLSearchParams",
"Blob", "File" from GLOBALS). Also remove reserved keywords/non-identifiers from
GLOBALS (e.g., "undefined", "null", "true", "false", "this", "arguments",
"super", "new", "void", "typeof", "instanceof", "in", "of") since they cannot be
reference names; keep the sets concise and optionally reorder or comment-group
remaining entries for readability while preserving the Set literal names
TYPE_GLOBALS and GLOBALS.

In `@src/application/index-engine.ts`:
- Around line 293-312: The fetchTableStats function and the IndexTableStats type
need to be updated to include the eight+ new tables added by the richer-index
PR; update the SQL in fetchTableStats to SELECT COUNT(*) for each new table
(scopes, references, bindings, import_specifiers, function_params,
runtime_markers, test_suites, re_export_chains, module_cycles, file_metrics,
etc.) and expand the IndexTableStats shape to include corresponding fields so
the returned row maps to the new properties; ensure the returned object keys
match IndexTableStats property names and that fetchTableStats still returns the
row cast as IndexTableStats.

In `@src/extractors/runtime-markers.ts`:
- Around line 30-47: The emit function currently accepts an unused end
parameter; either remove end from the emit signature and all its call sites
(simpler), or wire it through to emit span end fields: compute lineEnd =
offsetToLine(lineMap, end) and lineEndOffset = lineMap[lineEnd - 1] ?? 0 and add
line_end: lineEnd and column_end: end - lineEndOffset to the object pushed into
out (alongside existing file_path, kind, line_start, column_start, detail,
scope_local_id), updating all callers of emit and keeping references to emit,
offsetToLine, lineMap, out, relPath, and scopes.currentLocalId() in mind; ensure
symmetry with the markers representation.
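
The offset arithmetic in the comment above can be sketched as follows. This is a minimal illustration, assuming `lineMap` holds the start offset of each line (index 0 is line 1); `buildLineMap`, `spanOf`, and this `offsetToLine` body are hypothetical stand-ins, and the real helpers in `src/extractors/offsets.ts` may differ.

```typescript
// Hypothetical stand-ins for the project's offset helpers (assumed shapes).

/** Start offset of every line; index 0 corresponds to line 1. */
function buildLineMap(source: string): number[] {
  const starts = [0];
  for (let i = 0; i < source.length; i++) {
    if (source[i] === "\n") starts.push(i + 1);
  }
  return starts;
}

/** 1-based line containing `offset`, found by binary search over line starts. */
function offsetToLine(lineMap: number[], offset: number): number {
  let lo = 0;
  let hi = lineMap.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if ((lineMap[mid] ?? 0) <= offset) lo = mid;
    else hi = mid - 1;
  }
  return lo + 1;
}

interface Span {
  line_start: number;
  column_start: number;
  line_end: number;
  column_end: number;
}

/** Converts a [start, end) byte range into line/column fields. */
function spanOf(lineMap: number[], start: number, end: number): Span {
  const lineStart = offsetToLine(lineMap, start);
  const lineEnd = offsetToLine(lineMap, end);
  return {
    line_start: lineStart,
    column_start: start - (lineMap[lineStart - 1] ?? 0),
    line_end: lineEnd,
    column_end: end - (lineMap[lineEnd - 1] ?? 0),
  };
}
```

With both ends recorded, a consumer could highlight the whole `console.log` token rather than only its first character.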

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1f375594-3e45-41a9-a686-5c2acd6c859f

📥 Commits

Reviewing files that changed from the base of the PR and between bfc0b8a and b5191f1.

📒 Files selected for processing (99)
  • .agents/rules/codemap.md
  • .agents/skills/codemap/SKILL.md
  • .changeset/codemap-apply.md
  • docs/architecture.md
  • docs/glossary.md
  • docs/plans/codemap-apply.md
  • docs/plans/substrate-extraction.md
  • docs/research/codemap-richer-index-synthesis-2026-05.md
  • docs/roadmap.md
  • fixtures/golden/minimal/barrel-chains.json
  • fixtures/golden/minimal/circular-imports.json
  • fixtures/golden/minimal/deeply-nested-functions.json
  • fixtures/golden/minimal/env-var-audit.json
  • fixtures/golden/minimal/find-by-param-type.json
  • fixtures/golden/minimal/find-call-sites.json
  • fixtures/golden/minimal/find-export-sites.json
  • fixtures/golden/minimal/find-import-sites.json
  • fixtures/golden/minimal/find-leftover-console.json
  • fixtures/golden/minimal/find-references.json
  • fixtures/golden/minimal/find-skipped-tests.json
  • fixtures/golden/minimal/find-symbol-definitions.json
  • fixtures/golden/minimal/find-symbol-references.json
  • fixtures/golden/minimal/find-write-sites.json
  • fixtures/golden/minimal/index-summary.json
  • fixtures/golden/minimal/large-functions.json
  • fixtures/golden/minimal/tests-by-file.json
  • fixtures/golden/scenarios.json
  • src/adapters/builtin.ts
  • src/adapters/types.ts
  • src/application/apply-engine.test.ts
  • src/application/apply-engine.ts
  • src/application/bindings-engine.ts
  • src/application/cycles-engine.ts
  • src/application/http-server.ts
  • src/application/impact-engine.test.ts
  • src/application/index-engine.ts
  • src/application/mcp-server.test.ts
  • src/application/mcp-server.ts
  • src/application/recipe-params.ts
  • src/application/resource-handlers.test.ts
  • src/application/tool-handlers.test.ts
  • src/application/tool-handlers.ts
  • src/cli/bootstrap.ts
  • src/cli/cmd-apply.test.ts
  • src/cli/cmd-apply.ts
  • src/cli/main.ts
  • src/config.ts
  • src/db.ts
  • src/extractors/calls.ts
  • src/extractors/complexity.ts
  • src/extractors/components.ts
  • src/extractors/jsdoc.ts
  • src/extractors/markers.ts
  • src/extractors/offsets.ts
  • src/extractors/params.ts
  • src/extractors/references.ts
  • src/extractors/runtime-markers.ts
  • src/extractors/scopes.ts
  • src/extractors/symbols.ts
  • src/extractors/tests.ts
  • src/extractors/type-stringify.ts
  • src/extractors/types.ts
  • src/markers.ts
  • src/parsed-types.ts
  • src/parser.ts
  • templates/agents/rules/codemap.md
  • templates/agents/skills/codemap/SKILL.md
  • templates/recipes/barrel-chains.md
  • templates/recipes/barrel-chains.sql
  • templates/recipes/circular-imports.md
  • templates/recipes/circular-imports.sql
  • templates/recipes/deeply-nested-functions.md
  • templates/recipes/deeply-nested-functions.sql
  • templates/recipes/env-var-audit.md
  • templates/recipes/env-var-audit.sql
  • templates/recipes/find-by-param-type.md
  • templates/recipes/find-by-param-type.sql
  • templates/recipes/find-call-sites.md
  • templates/recipes/find-call-sites.sql
  • templates/recipes/find-export-sites.md
  • templates/recipes/find-export-sites.sql
  • templates/recipes/find-import-sites.md
  • templates/recipes/find-import-sites.sql
  • templates/recipes/find-leftover-console.md
  • templates/recipes/find-leftover-console.sql
  • templates/recipes/find-references.md
  • templates/recipes/find-references.sql
  • templates/recipes/find-skipped-tests.md
  • templates/recipes/find-skipped-tests.sql
  • templates/recipes/find-symbol-definitions.md
  • templates/recipes/find-symbol-definitions.sql
  • templates/recipes/find-symbol-references.md
  • templates/recipes/find-symbol-references.sql
  • templates/recipes/find-write-sites.md
  • templates/recipes/find-write-sites.sql
  • templates/recipes/large-functions.md
  • templates/recipes/large-functions.sql
  • templates/recipes/tests-by-file.md
  • templates/recipes/tests-by-file.sql
💤 Files with no reviewable changes (2)
  • docs/roadmap.md
  • docs/plans/codemap-apply.md

Comment thread src/application/apply-engine.ts
Comment thread src/application/index-engine.ts
Comment thread src/extractors/calls.ts
Comment thread src/extractors/components.ts Outdated
Comment thread src/extractors/references.ts
Comment thread src/extractors/symbols.ts Outdated
Comment thread src/extractors/type-stringify.ts Outdated
Comment thread templates/recipes/find-export-sites.md Outdated
Comment thread templates/recipes/find-skipped-tests.sql
Comment thread templates/recipes/large-functions.md Outdated
10 of 12 unresolved threads applied (1 deferred to feat/codemap-apply,
1 pushed back as deliberate semantics — see PR replies).

Applied:

- index-engine.ts: add missing `if (parsed.fileMetrics)` guard on the
  incremental path to mirror the full-rebuild guard at line 259.
- calls.ts: recursive member-expression flatten. `obj.foo.bar()` now
  emits `obj.foo.bar` instead of being dropped. Computed segments
  (`a[i].b()`) still abort flattening — recipe queries filter on
  dot-joined identifier shape that computed breaks. Empirically: 79
  new chain rows on codemap-self.
- components.ts: scope tracker → stack so nested PascalCase functions
  (`function Outer() { function Inner() {…} }`) preserve Outer's
  attribution after Inner exits.
- references.ts: AssignmentExpression LHS walker handles
  ObjectPattern / ArrayPattern / AssignmentPattern / RestElement,
  marking every leaf Identifier as write (and suppressing read for
  simple `=`).
- scopes.ts: anonymous scopes use `$anon_<localId>` segment in
  `scopeStr` so two sibling callbacks don't share `caller_scope` and
  dedup as one edge in `calls`.
- symbols.ts: per-declarator spans (decl.start/.end) instead of the
  whole VariableDeclaration so `body_line_count` doesn't inflate on
  `const a = (…) => long, b = (…) => longer`.
- type-stringify.ts: recursive `qualifiedNameOf` walks
  TSQualifiedName chains so `A.B.C` doesn't truncate to
  `undefined.C`.
- find-export-sites.md: rename `callee-token-precise` →
  `export-name-token-precise` (copy-paste from find-call-sites).
- find-skipped-tests.sql: prioritise `.only` over `.skip` in the
  status CASE (a row with both flags hides the higher-risk one).
- large-functions.md: mention `LIMIT 50` in the description.

930 tests pass; all goldens pass.
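
The `calls.ts` change listed above (recursive member-expression flatten, with computed segments aborting) might look roughly like this. It is a sketch under stated assumptions: the `Expr` shapes are a simplified ESTree-like stand-in declared locally, and `flattenCallee` is a hypothetical name, not the extractor's actual function.

```typescript
// Simplified ESTree-like node shapes, declared locally for illustration only.
type Expr =
  | { type: "Identifier"; name: string }
  | { type: "MemberExpression"; object: Expr; property: Expr; computed: boolean }
  | { type: "Other" };

/**
 * Returns "a.b.c" for a plain dotted chain, or null as soon as any
 * segment is computed (`a[i].b`) or is not an identifier.
 */
function flattenCallee(node: Expr): string | null {
  if (node.type === "Identifier") return node.name;
  if (node.type !== "MemberExpression") return null;
  const prop = node.property;
  if (node.computed || prop.type !== "Identifier") return null;
  const head = flattenCallee(node.object);
  return head === null ? null : `${head}.${prop.name}`;
}
```

Returning `null` for the whole chain, rather than a partial name, keeps every emitted callee in the dot-joined identifier shape that the recipe queries filter on.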
5 of 6 nitpicks applied (1 declined — see PR reply).

Applied:

- `IndexTableStats` + `fetchTableStats` extended to surface the 10 new
  tables in the post-index summary (scopes, references, bindings,
  import_specifiers, function_params, runtime_markers, test_suites,
  re_export_chains, module_cycles, file_metrics). Empty-stats
  initialiser bumped to match.
- Extracted `parseReExportSource(raw, fallbackName)` helper —
  resolveBindings + resolveReExportChains shared a 16-line block; any
  future `.default` encoding tweak now lives in one place.
- JSDoc one-liners on the two side-effecting bindings-engine exports
  (`persistBindings` orphan-clear, `persistReExportChains` truncate +
  rewrite). resolveX functions left bare — names are self-evident per
  concise-comments.
- Deduped TYPE_GLOBALS (4 dupes: Window, Document, HTMLElement, Event)
  and GLOBALS (12 dupes: Number, FileReader, Image, FormData,
  AbortController, Headers, Request, Response, Blob, File, URL,
  URLSearchParams). Kept reserved keywords (`undefined`, `null`, `this`,
  etc.) — `undefined` IS a valid Identifier in JS; the others are cheap
  defensive entries with no false-positive cost.
- runtime_markers gained `column_end` column (was unused emit param).
  SCHEMA_VERSION 25 → 26. Editor-highlight recipes can now span the
  whole `console.X` / `process.env.X` token.
- Research note `codemap-richer-index-synthesis-2026-05.md` Path B
  sketch gained an "Unvalidated illustration" disclaimer.

Declined: none — all nitpicks were genuine cheap wins. See PR for
the one applied selectively (JSDoc on side-effecting persist*; resolve*
left bare).

930 tests pass; all goldens pass.
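
For readers without the diff at hand, a helper with the `parseReExportSource(raw, fallbackName)` signature could be sketched as below. The body is an assumption built only from the rules quoted in the review (the last dot after the last slash separates the imported name; names without a suffix fall back); the actual encoding of `re_export_source` in `bindings-engine.ts` may differ.

```typescript
interface ReExportTarget {
  source: string;
  imported_name: string;
}

/**
 * Assumed encoding: "<module specifier>.<imported name>", e.g. "./util.default".
 * A dot only counts as the separator when it comes after the last slash,
 * is not the first character, and is followed by at least one character.
 */
function parseReExportSource(raw: string, fallbackName: string): ReExportTarget {
  const lastSlash = raw.lastIndexOf("/");
  const lastDot = raw.lastIndexOf(".");
  const hasSuffix = lastDot > lastSlash && lastDot > 0 && lastDot < raw.length - 1;
  if (!hasSuffix) return { source: raw, imported_name: fallbackName };
  return {
    source: raw.slice(0, lastDot),
    imported_name: raw.slice(lastDot + 1),
  };
}
```

Under these assumptions the leading dots of `./` and `../` never count as separators, because they sit before the last slash.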
…bstrate

The Tier 2-12 substrate landed (commits a1a17ba…aa18b13) but reference
docs still listed the pre-substrate 12 tables. This catches them up.

- architecture.md § Schema — added 10 new table sections (scopes,
  references, bindings, import_specifiers, function_params, file_metrics,
  re_export_chains, module_cycles, runtime_markers, test_suites) plus
  new columns on existing tables (symbols: scope_local_id /
  name_column_* / body_line_count / param_count / nesting_depth;
  calls: line_start / column_*; exports: line_* / column_* / is_re_export;
  markers: column_*).
- glossary.md — entries for bindings, scopes, references,
  function_params, file_metrics, re_export_chains, module_cycles,
  runtime_markers, test_suites, import_specifiers (alphabetic slots).
- roadmap.md Moat B example list extended with the new table names so
  the "what recipe does dropping this kill?" reviewer test has the full
  surface to defend.
- golden-queries.md status table — scenario-coverage description lists
  all current indexed tables instead of just the original 12.

930 tests pass; all goldens pass.
`codemap agents init` ships these files; they're what end-user agents
read. Before this commit, neither mentioned a single new table — agents
installed in downstream projects would have no idea the 10 new
substrate tables existed.

- templates/agents/rules/codemap.md
  - Indexing-summary sentence lists the new tables.
  - Trigger patterns gained 12 new rows: find-references,
    find-symbol-references, find-write-sites, find-by-param-type,
    circular-imports, barrel-chains, find-leftover-console,
    env-var-audit, find-skipped-tests, tests-by-file, large-functions,
    deeply-nested-functions, nesting_depth lookup.
  - Quick reference queries gained 9 substrate-table SQL examples.

- templates/agents/skills/codemap/SKILL.md
  - Added schema sections (matching the .agents/skills/codemap/SKILL.md
    + docs/architecture.md format) for: import_specifiers, scopes,
    references, bindings, function_params, file_metrics,
    re_export_chains, module_cycles, runtime_markers, test_suites.

930 tests pass; all goldens pass.
zod 4.4 reorganized internals; @modelcontextprotocol/sdk pulled its own
4.3 copy via dep range, so `zod/v4/core` resolved to two different
$ZodType identities and TS rejected our registerTool inputSchemas. Pin
both via a `"zod": "$zod"` override so a single 4.4.3 is shared.
express-rate-limit@8.5.2 bumped its ip-address constraint to ^10.2.0,
so a fresh resolve naturally picks the patched version without help.
Same fresh resolve also retires the unrelated fast-uri / hono CI audit
findings (ajv 8.20 + @hono/node-server 1.19.14 → patched transitives).

bun audit: "No vulnerabilities found".
tsdown@0.22.0 moved unrun from `dependencies` to `peerDependencies`
(constraint `*`). Locally `bun run build` works because Bun's run
intercepts the `#!/usr/bin/env node` shebang and executes tsdown under
Bun (autoLoader → "native", unrun path skipped). In CI the same script
ends up executing the binstub under Node, where `process.versions.bun`
is undefined → autoLoader resolves to "unrun" → import fails.

Declaring unrun (^0.3.0) as a devDependency satisfies the peer for the
Node code path. Verified `node node_modules/tsdown/dist/run.mjs`
succeeds locally.
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/db.ts (1)

1265-1294: ⚡ Quick win

insertTestSuites uses firstId + parent_index mapping with minimal but practical constraints.

The code assumes:

  1. Parent rows are emitted before children — the test extractor's DFS traversal guarantees this: a parent is added at index idx, pushed onto parentStack, and all children are added while the parent is on the stack, so all children indices are > idx. This is not an implicit contract; it's enforced by the extractor.

  2. No inter-batch concurrent writes to test_suites: batchInsert calls db.run() once per BATCH_SIZE batch without transaction wrapping. In the single-process per-file extraction context, this is safe; a concurrent writer between batches would desynchronize the firstId + parent_index mapping and violate the immediate FK on parent_suite_id.

  3. No gaps in sqlite_sequence — single-process per-file context makes this unlikely.

For robustness, consider wrapping the function body in a SAVEPOINT (precedent: reconcileBoundaryRules on lines 627–641) to pin the sqlite_sequence snapshot to the inserts and handle failure atomicity:

♻️ Suggested hardening
 export function insertTestSuites(db: CodemapDatabase, rows: TestSuiteRow[]) {
   if (!rows.length) return;
-  // Insert in a single transaction; rowids are sequential so the
-  // parent_index → real id mapping is `firstId + parent_index`.
+  db.run("SAVEPOINT insert_test_suites");
+  try {
     const firstIdRow = db
       .query<{ seq: number | null }>(
         "SELECT seq FROM sqlite_sequence WHERE name = 'test_suites'",
       )
       .get();
     const firstId = (firstIdRow?.seq ?? 0) + 1;
     batchInsert(
       db,
       rows,
       "INSERT INTO test_suites (file_path, name, kind, line_start, line_end, parent_suite_id, is_skipped, is_only, is_todo, framework)",
       "(?,?,?,?,?,?,?,?,?,?)",
       (r, v) =>
         v.push(
           r.file_path,
           r.name,
           r.kind,
           r.line_start,
           r.line_end,
           r.parent_index === null ? null : firstId + r.parent_index,
           r.is_skipped,
           r.is_only,
           r.is_todo,
           r.framework,
         ),
     );
+    db.run("RELEASE SAVEPOINT insert_test_suites");
+  } catch (error) {
+    db.run("ROLLBACK TO SAVEPOINT insert_test_suites");
+    db.run("RELEASE SAVEPOINT insert_test_suites");
+    throw error;
+  }
 }
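
The `firstId + parent_index` mapping discussed above can be illustrated without a database. The sketch below assumes a depth-first extractor that records each suite's parent as an index into the emitted row array; `SuiteNode`, `flattenSuites`, and `assignIds` are hypothetical names used only for this illustration.

```typescript
interface SuiteNode {
  name: string;
  children?: SuiteNode[];
}

interface TestSuiteRow {
  name: string;
  parent_index: number | null;
}

/** DFS emission: a parent is pushed before its children, so parent_index < child index. */
function flattenSuites(roots: SuiteNode[]): TestSuiteRow[] {
  const rows: TestSuiteRow[] = [];
  const visit = (node: SuiteNode, parentIndex: number | null): void => {
    const idx = rows.length;
    rows.push({ name: node.name, parent_index: parentIndex });
    for (const child of node.children ?? []) visit(child, idx);
  };
  for (const root of roots) visit(root, null);
  return rows;
}

/** Maps row indices to ids, assuming ids are handed out sequentially after `lastSeq`. */
function assignIds(rows: TestSuiteRow[], lastSeq: number | null) {
  const firstId = (lastSeq ?? 0) + 1;
  return rows.map((r, i) => ({
    id: firstId + i,
    name: r.name,
    parent_suite_id: r.parent_index === null ? null : firstId + r.parent_index,
  }));
}
```

The mapping only holds while nothing else inserts into the table between reading the sequence and writing the last row, which is the gap the suggested SAVEPOINT is meant to close.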
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/db.ts` around lines 1265 - 1294, insertTestSuites currently reads
sqlite_sequence then calls batchInsert across multiple db.run batches which can
be desynchronized by intervening writes; wrap the body of insertTestSuites in a
SAVEPOINT (like reconcileBoundaryRules does) so you pin the sqlite_sequence
snapshot, SELECT the seq after creating the SAVEPOINT to compute firstId, run
batchInsert while inside that SAVEPOINT, and RELEASE the SAVEPOINT on success or
ROLLBACK TO SAVEPOINT on error to preserve atomicity and keep the parent_index →
real id mapping correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/extractors/runtime-markers.ts`:
- Around line 54-60: The current runtime-marker emission uses only identifier
text (callee.object.name === "console"/"process") which misclassifies locally
shadowed bindings; before calling emit("console", ...) or emit("process", ...)
add a scope-resolved guard that verifies the Identifier is not bound in the
active scope chain (i.e. skip if there is a local binding for the Identifier
name). Locate the MemberExpression handler that references callee and emit, and
use your parser/scope API (currentScope/ancestors or
scopeManager.getScope/getBinding/hasBinding) to check for a local binding for
callee.object.name; only call emit when no local binding exists. Apply the same
change to both console and process emission sites so locally shadowed variables
are ignored (preserving global-only detection used by find-leftover-console and
env-var-audit).
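
A scope-resolved guard of the kind requested above could be sketched like this. The `Scope` class and `isGlobalRuntimeObject` are hypothetical and stand in for whatever scope tracker the extractor already has; only the lookup logic is the point.

```typescript
/** Hypothetical lexical scope: a set of declared names plus a parent link. */
class Scope {
  readonly parent: Scope | null;
  private readonly names = new Set<string>();

  constructor(parent: Scope | null = null) {
    this.parent = parent;
  }

  bind(name: string): void {
    this.names.add(name);
  }

  /** True when `name` is declared in this scope or any enclosing scope. */
  hasBinding(name: string): boolean {
    for (let s: Scope | null = this; s !== null; s = s.parent) {
      if (s.names.has(name)) return true;
    }
    return false;
  }
}

/** Only unshadowed `console` / `process` identifiers count as runtime markers. */
function isGlobalRuntimeObject(name: string, scope: Scope): boolean {
  return (name === "console" || name === "process") && !scope.hasBinding(name);
}
```

An emit site would call the guard with the identifier name and the current scope, and skip the marker when it returns `false`.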

---

Nitpick comments:
In `@src/db.ts`:
- Around line 1265-1294: insertTestSuites currently reads sqlite_sequence then
calls batchInsert across multiple db.run batches which can be desynchronized by
intervening writes; wrap the body of insertTestSuites in a SAVEPOINT (like
reconcileBoundaryRules does) so you pin the sqlite_sequence snapshot, SELECT the
seq after creating the SAVEPOINT to compute firstId, run batchInsert while
inside that SAVEPOINT, and RELEASE the SAVEPOINT on success or ROLLBACK TO
SAVEPOINT on error to preserve atomicity and keep the parent_index → real id
mapping correct.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4e92d930-8cad-495b-b54d-5b23d17e6e81

📥 Commits

Reviewing files that changed from the base of the PR and between b5191f1 and d757272.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (24)
  • docs/architecture.md
  • docs/glossary.md
  • docs/golden-queries.md
  • docs/research/codemap-richer-index-synthesis-2026-05.md
  • docs/roadmap.md
  • fixtures/golden/scenarios.json
  • package.json
  • src/application/bindings-engine.ts
  • src/application/index-engine.ts
  • src/application/run-index.ts
  • src/application/types.ts
  • src/db.ts
  • src/extractors/calls.ts
  • src/extractors/components.ts
  • src/extractors/references.ts
  • src/extractors/runtime-markers.ts
  • src/extractors/scopes.ts
  • src/extractors/symbols.ts
  • src/extractors/type-stringify.ts
  • templates/agents/rules/codemap.md
  • templates/agents/skills/codemap/SKILL.md
  • templates/recipes/find-export-sites.md
  • templates/recipes/find-skipped-tests.sql
  • templates/recipes/large-functions.md
✅ Files skipped from review due to trivial changes (5)
  • docs/golden-queries.md
  • templates/recipes/find-export-sites.md
  • templates/agents/rules/codemap.md
  • docs/architecture.md
  • docs/research/codemap-richer-index-synthesis-2026-05.md
🚧 Files skipped from review as they are similar to previous changes (10)
  • templates/recipes/find-skipped-tests.sql
  • src/application/index-engine.ts
  • src/extractors/type-stringify.ts
  • src/extractors/calls.ts
  • src/extractors/scopes.ts
  • docs/roadmap.md
  • src/extractors/symbols.ts
  • src/extractors/references.ts
  • src/application/bindings-engine.ts
  • src/extractors/components.ts

Comment thread src/extractors/runtime-markers.ts
SutuSebastian and others added 2 commits May 15, 2026 16:16
PR #79 ships 10 new substrate tables + column additions across 4
existing tables + SCHEMA_VERSION 10 → 26; that fits the .agents/lessons.md
bump policy for "minor" cleanly. Without this changeset, the next
Changesets release would land the substrate silently because
.changeset/codemap-apply.md (the only file on this branch) is consumed
by PR #78's merge.
@SutuSebastian SutuSebastian merged commit ec91bdf into main May 15, 2026
11 checks passed
@SutuSebastian SutuSebastian deleted the feat/codemap-richer-index branch May 15, 2026 13:28
@github-actions github-actions Bot mentioned this pull request May 15, 2026
SutuSebastian added a commit that referenced this pull request May 15, 2026
* feat(apply): slice 1 — apply-engine phase-1 validation + dry-run

Pure transport-agnostic engine implementing phase-1 validation and the
dry-run output shape from the merged plan (Q1–Q10 in
docs/plans/codemap-apply.md). No CLI, no MCP/HTTP wiring, no write
branch yet (Slice 2 lands the latter).

Re-locks Q8 to substring-match (a) — the original "exact byte-match"
draft contradicted the existing buildDiffJson formatter contract and
would have made every shipped rename-preview row conflict (the recipe
emits before_pattern = old_name as the bare identifier, not the full
line). New phase-1 mirrors application/output-formatters.ts buildDiffJson
verbatim: actual.includes(before) for the match check, first-occurrence
substring replace for the transformation (Slice 2), $-pre-escape per
GetSubstitution.

Slice scope:
- src/application/apply-engine.ts — applyDiffPayload({rows, projectRoot,
  dryRun}) returning Q5's ApplyJsonPayload envelope. dryRun=false with a
  clean phase 1 throws NotImplemented (Slice 2 fills in the write).
- src/application/apply-engine.test.ts — 14 unit tests covering happy
  paths, all three conflict reasons, row-shape validation, deterministic
  files[] sort, and the Slice-2 guard semantics.
- docs/plans/codemap-apply.md — Q8 re-lock + edge-case table refresh.

Tests: 14/14 pass. Typecheck / lint / format clean.
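
The substring-match check and `$`-pre-escape described in this commit can be sketched as a pure function. `previewLineEdit` and its conflict reason string are hypothetical names for illustration; only the `includes` check, the first-occurrence `replace`, and the `$` escaping are taken from the message above.

```typescript
type LineCheck =
  | { ok: true; next: string }
  | { ok: false; reason: string };

/**
 * Phase-1 style check for one row: the line must contain `before`;
 * the preview replaces only the first occurrence.
 */
function previewLineEdit(actual: string, before: string, after: string): LineCheck {
  if (!actual.includes(before)) {
    // Reason text is illustrative, not the engine's actual wording.
    return { ok: false, reason: "before_pattern not found on line" };
  }
  // In a replacement string, "$$" yields one literal "$", so every "$" in
  // `after` is doubled first; otherwise "$&" or "$1" would be interpreted.
  const safeAfter = after.replace(/\$/g, "$$$$");
  return { ok: true, next: actual.replace(before, safeAfter) };
}
```

Because `String.prototype.replace` with a string pattern rewrites only the leftmost match, `previewLineEdit("const foo = foo();", "foo", "bar")` yields `const bar = foo();`.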

* feat(apply): slice 2 — phase 2 writes via temp + rename

Phase 2 lands behind the `!dryRun && conflicts.length === 0` gate per
Q2 (c). Each modified file is written to a sibling temp path then
`rename`d into place — POSIX-atomic per file, so concurrent readers
see either the pre- or post-rename content, never a torn write.

Implementation:
- Phase-1 caches each source's text; phase-2 reuses the cache (one
  read per file across both phases). TOCTOU window collapses to the
  gap between phase-1 read and phase-2 rename — accepted per Q2.
- Phase-2 splits on raw "\n" (not /\r?\n/) so CRLF lines retain their
  trailing \r and round-trip when joined back with "\n". Phase-1
  conflict reporting still strips the \r so `actual_at_line` is clean.
- Edits applied per-file in descending line order — defensive default
  for when multi-line transforms land (today's single-line rows are
  order-independent).
- `$`-pre-escape on `after_pattern` per GetSubstitution rule (mirrors
  buildDiffJson) so identifiers like `$inject` round-trip safely.
- Temp paths use `crypto.randomBytes(6)` so concurrent applies don't
  collide; cleanup on success is implicit (rename atomically removes
  the source name).
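
The write path described in the list above might be approximated as follows. This is a sketch, not the engine's code: `writeFileAtomic` and `replaceOnLine` are hypothetical names, the temp-file naming scheme is an assumption, and a replacer function stands in for the `$`-pre-escape the commit actually uses.

```typescript
import { randomBytes } from "node:crypto";
import { renameSync, rmSync, writeFileSync } from "node:fs";

/** Writes to a sibling temp path, then renames it over the target. */
function writeFileAtomic(targetPath: string, content: string): void {
  const tempPath = `${targetPath}.${randomBytes(6).toString("hex")}.tmp`;
  try {
    writeFileSync(tempPath, content);
    renameSync(tempPath, targetPath); // the temp name disappears on success
  } catch (error) {
    rmSync(tempPath, { force: true }); // best-effort cleanup on failure
    throw error;
  }
}

/**
 * Edits one 1-based line. Splitting on "\n" (not /\r?\n/) leaves any
 * trailing "\r" attached to its line, so CRLF files round-trip on join.
 */
function replaceOnLine(text: string, line: number, before: string, after: string): string {
  const lines = text.split("\n");
  const current = lines[line - 1];
  if (current === undefined || !current.includes(before)) {
    throw new Error(`line ${line} does not contain the expected text`);
  }
  // A replacer function inserts `after` literally (no "$" interpretation).
  lines[line - 1] = current.replace(before, () => after);
  return lines.join("\n");
}
```

A caller would compute the new text with `replaceOnLine` for each row of a file and pass the final string to `writeFileAtomic` once.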

Tests: 20/20 pass. Failure-mode coverage: chmod 0o555 on the project
dir to force the temp-write to fail; dry-run no-op-on-disk; no temp
siblings left behind on success; conflict short-circuits before any
writes (good.ts untouched when bad.ts is missing).

* feat(apply): slice 3 — CLI verb + recipe execution + TTY/--yes gate

Adds `codemap apply <recipe-id>` as a positional verb (per Q4) wired
through the same dispatch as every other CLI command. Recipe execution
reuses `queryRows` + the existing `--params` plumbing (`parseParamsCli`
+ `resolveRecipeParams`); rows feed straight into `applyDiffPayload`.

Q6 gating matrix implemented:
- TTY no `--yes` → phase-1 dry-run preview, prompt `Proceed? [y/N]`
  on stderr, default-N, phase-2 only on `y` (uses node:readline).
- TTY `--yes` → no prompt; proceed if validation clean.
- Non-TTY no `--yes` (no `--dry-run`) → reject with stderr message
  ("Pass --yes for non-interactive runs, or --dry-run for preview.").
- `--dry-run` + `--yes` → mutually exclusive, parse-time error.
- `--json` everywhere routes errors as `{"error":"..."}` envelopes.
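
The gating matrix above reduces to a small decision function. The sketch below is illustrative: `resolveGate` and the `Gate` type are hypothetical, and the mutual-exclusion message is invented for the example (the non-TTY message is the one quoted in the list).

```typescript
type Gate =
  | { action: "error"; message: string }
  | { action: "dry-run" }
  | { action: "prompt" }
  | { action: "apply" };

interface GateOptions {
  isTTY: boolean;
  yes: boolean;
  dryRun: boolean;
}

/** Decides what to do before phase 2; "apply" still requires a clean phase 1. */
function resolveGate(opts: GateOptions): Gate {
  if (opts.dryRun && opts.yes) {
    // Illustrative wording; the CLI's actual parse-time error may differ.
    return { action: "error", message: "--dry-run and --yes are mutually exclusive" };
  }
  if (opts.dryRun) return { action: "dry-run" };
  if (opts.yes) return { action: "apply" };
  if (opts.isTTY) return { action: "prompt" };
  return {
    action: "error",
    message: "Pass --yes for non-interactive runs, or --dry-run for preview.",
  };
}
```

Keeping the decision pure makes each row of the matrix testable without spawning a subprocess or faking a TTY.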

Files:
- src/cli/cmd-apply.ts — argv parser + run loop. Mirrors cmd-impact's
  shape (positional + flags + JSON envelope).
- src/cli/cmd-apply.test.ts — 10 subprocess integration tests:
  dry-run no-op, --yes happy path (with cross-file import rename via
  rename-preview), Q7 (a) idempotent re-run after reindex, Q6 non-TTY
  rejection (text + JSON), unknown recipe id, missing positional, mut-
  ex check, --help prints without bootstrap.
- src/cli/main.ts + bootstrap.ts — register the verb.

realpath note: tests `realpathSync` the temp project root so oxc-
resolver's symlink-dereferenced `resolved_path` aligns with the
indexed file paths (without it the import-rename rows in rename-
preview return empty on macOS where /tmp → /private/tmp).

Tests: 10/10 integration + 20/20 engine. Typecheck / lint / format clean.

* feat(apply): slice 4 — MCP/HTTP `apply` tool

Registers `apply` as the 13th tool over both MCP (stdio) and HTTP
transports. Dispatches the same `applyDiffPayload` engine the CLI uses;
output envelope is identical to the CLI's --json output (Q5).

- src/application/tool-handlers.ts — `handleApply(args, root)` + Zod
  schema (`applyArgsSchema`). Q6 gate enforced: non-TTY transports
  always require `yes: true` (no prompt to fall back on). dry_run + yes
  rejected as mutually exclusive. Unknown recipe returns 404.
- src/application/mcp-server.ts — `registerApplyTool` mirrors the
  impact tool's shape; description encodes the Q5 envelope + Q2 (c)
  all-or-nothing semantics so agents can reason about the tool without
  reading docs.
- src/application/http-server.ts — adds `apply` to TOOL_NAMES + the
  POST /tool/{name} dispatcher case.
- src/application/tool-handlers.test.ts — 4 handleApply tests (404,
  yes-required, mutex, dry-run envelope shape). 104 mcp/http server
  tests still green; tool catalogs are inferred from TOOL_NAMES so
  the new tool surfaces automatically in /tools listings.

Per the plan's Slice 4 lock: `query_batch` does NOT get an apply
analogue (Moat-A: batched writes are verdict-shaped; consumers
compose multiple apply calls if they need cross-recipe writes).

* docs(apply): slice 5 — lockstep + plan retire

Final slice — lifts the durable design from the plan into reference
docs and retires the plan file per docs/README.md Rule 3.

- docs/architecture.md — new "Apply wiring" section (engine + phase-1
  algorithm + phase-2 atomic temp-rename + Q6 gate + Q5 envelope + Q7
  idempotency) plus "Boundary verification — apply write path" SQL
  kit. Layering table mentions `apply-engine.ts`.
- docs/glossary.md — `codemap apply` / apply tool entry.
- docs/roadmap.md — backlog entry removed (shipped).
- docs/plans/codemap-apply.md — DELETED (closing-state lifecycle per
  docs-governance skill: delete + lift, never "Slim & keep in plans/").
- .agents/rules/codemap.md + .agents/skills/codemap/SKILL.md — Apply
  row in CLI table, "Apply (`bun src/index.ts apply <recipe-id>`)"
  paragraph, MCP `apply` tool listed alongside `impact`.
- templates/agents/rules/codemap.md + templates/agents/skills/codemap/
  SKILL.md — same updates in the published-package mirror (uses
  `codemap` instead of `bun src/index.ts`).
- .changeset/codemap-apply.md — minor bump; summarises Q1–Q10 locks
  + boundary discipline anchor.

Boundary kit verified empty after a fresh reindex of the apply files;
140/140 tests pass across apply-engine + tool-handlers + cmd-apply +
mcp/http-server suites.

* fix(apply): path-containment + overlap detection (triangulated review)

Lands four fixes from a triangulated review of three independent agent
audits (Composer, GPT-5.5, Codex). Two HIGH-severity correctness bugs
were each reproducible against the prior `apply-engine.ts` in 30 seconds:

F1 (HIGH) — Path traversal. Pre-fix:
  applyDiffPayload({ rows: [{ file_path: "../outside.ts", ... }],
                     projectRoot: "/tmp/proj/", dryRun: false })
returned `applied: true` and mutated a sibling-of-root file. Now phase 1
resolves the project root once and rejects (a) absolute `file_path`
inputs and (b) any candidate whose `path.resolve(resolvedRoot, file_path)`
lands outside it. New conflict reason: `path escapes project root`.
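
A containment check in the spirit of F1 could look like the sketch below. `escapesProjectRoot` is a hypothetical name and the `path.relative` test is one common way to express "lands outside the root"; the engine's actual comparison may be written differently.

```typescript
import * as path from "node:path";

/** True when `filePath` is absolute or resolves to a location outside `projectRoot`. */
function escapesProjectRoot(projectRoot: string, filePath: string): boolean {
  if (path.isAbsolute(filePath)) return true;
  const resolvedRoot = path.resolve(projectRoot);
  const candidate = path.resolve(resolvedRoot, filePath);
  const rel = path.relative(resolvedRoot, candidate);
  // A relative path that climbs out of the root starts with a ".." segment.
  return rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel);
}
```

Comparing whole `..` segments, rather than a raw string prefix, avoids rejecting legitimate names such as `..foo/a.ts`. Symlinks inside the root are not resolved by this sketch.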

F2 (HIGH) — Phase-2 partial cross-file write. Pre-fix: two rows on the
same `(file_path, line_start)` both passed phase-1 (substring check
against original source); phase-2 applied the first replace, the
second's substring assertion failed, the function threw — AFTER earlier
files in alphabetical order had already been `renameSync`d. The "Q2 (c)
all-or-nothing" guarantee was demonstrably broken. Now phase 1
maintains a per-file Set<line_start>; the second hit at the same line
emits a `duplicate edit on same line` conflict before any write.
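
The per-file `Set<line_start>` guard described for F2 can be sketched as follows. `findDuplicateEdits` and the row and conflict shapes are hypothetical simplifications; only the reason string is taken from the commit message.

```typescript
interface EditRow {
  file_path: string;
  line_start: number;
}

interface Conflict {
  file_path: string;
  line_start: number;
  reason: string;
}

/** Flags every row after the first that targets an already-claimed (file, line). */
function findDuplicateEdits(rows: EditRow[]): Conflict[] {
  const seen = new Map<string, Set<number>>();
  const conflicts: Conflict[] = [];
  for (const row of rows) {
    let lines = seen.get(row.file_path);
    if (!lines) {
      lines = new Set<number>();
      seen.set(row.file_path, lines);
    }
    if (lines.has(row.line_start)) {
      conflicts.push({
        file_path: row.file_path,
        line_start: row.line_start,
        reason: "duplicate edit on same line",
      });
    } else {
      lines.add(row.line_start);
    }
  }
  return conflicts;
}
```

Running this during phase 1 means the overlap surfaces as a conflict before any file is renamed into place, which is what restores the all-or-nothing behaviour.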

F3 (MEDIUM, doc-first) — Same-line `before_pattern` ambiguity. The
formatter precedent (`buildDiffJson`) uses `actual.replace(before, after)`,
which rewrites only the leftmost occurrence. `const foo = foo();` with
`before = "foo"` becomes `const bar = foo();` — variable renamed,
recursive call broken, `applied: true` reported. This mirrors the
formatter exactly and the `--format diff` preview shows the same shape,
so the audit's recommendation of an engine-level fix would diverge
preview from execution. Documented as a deliberate limitation in the
engine docstring + `architecture.md § Apply wiring` caveat instead;
test pins the current behaviour so a future engine change lands as a
deliberate breaking change rather than silent drift.

F4 + F6 (LOW) — `apply-engine.ts` docstring no longer points at the
deleted plan (now links to `docs/architecture.md` for durable design);
`apply-engine` added to the `application/` row of the Key Files table
in architecture.md (it was meant to be in that enumeration alongside
the other 14 engines).

Tests: 25 unit tests (8 new — three F1 paths, one F2 repro, one F3
limitation pin, plus existing happy-paths / failure-modes); 41 pass
across the apply path. Boundary kit returns []. Changeset entry
amended with the path-containment + overlap-detection bullets so the
release notes carry the security-relevant fixes.

Triangulated audit doc + the three source agent reviews are NOT
checked in — they served their purpose for this fix-up commit and
removing them avoids stale "review backlog" cruft per docs-governance.

Follow-up (separate PR): the audit also surfaced that
`DEFAULT_EXCLUDE_DIR_NAMES` in src/config.ts doesn't include `.codemap`,
so `audit --base` followed by `--full` walks the audit-cache subtree.
Tracked separately because the gap predates this PR.

* chore(apply): slim comments + sync docs to five conflict reasons

Concise-comments sweep on the apply surface — module docstring goes
from a six-section narrative to three named call-outs (same-line
ambiguity / TOCTOU / EOL); inline comments drop redundant prose where
the next line of code already says it. Net 65 lines removed across
src/ with no behavioural change.

Docs sync: post-fix the engine collects FIVE conflict reasons (added
`path escapes project root` + `duplicate edit on same line` in commit
bdf7ef3), but the agent rule, the published-package agent rule, and
the glossary all still said "three." Updated all three to enumerate
the full set + briefly describe what each new guard rejects.

Touched:
- src/application/apply-engine.ts — slim docstring + 6 inline blocks.
- src/application/apply-engine.test.ts — slim test rationale where
  the assertion already conveyed it.
- src/cli/cmd-apply.ts — collapse two same-branch returns into one
  union; slim Q6/path-derivation comments.
- src/application/tool-handlers.ts — slim handleApply schema/header
  doc to one sentence each.
- .agents/rules/codemap.md + templates/agents/rules/codemap.md +
  docs/glossary.md — three → five conflict reasons + new-guard one-liners.

Tests + typecheck + format + boundary kit all green.

* fix(apply): address CodeRabbit review (7 of 9; 2 already fixed)

Triaged 9 actionable comments via pr-comment-fact-check. Each finding
verified against the source on aaabc13; 2 were already addressed by
the prior commit (CodeRabbit auto-tagged with "✅ Addressed in
aaabc13"); 7 are new fixes here:

F1 (paragraph merge in .agents/rules/codemap.md, partial earlier-fix):
  CodeRabbit's auto-tag was optimistic — the conflict-reason count
  was synced in aaabc13 but the Impact section's tail (`...
  --summary trims …`jq '.summary.nodes'``) was still stitched onto
  the END of the Apply paragraph. Restored the section break.

F3 (after_pattern: "" silently dropped):
  `readString` rejected empty strings, so a deletion-shaped row got
  silently skipped by phase-1's required-keys check. New
  `readStringAllowEmpty` helper for `after_pattern` only — empty
  `before_pattern` still rejected (would match anywhere on the line).
  Regression test deletes a `// FIXME(team): ` prefix.
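
The split between the two readers can be sketched like this. The signatures and the `Row` type are assumptions for illustration; only the helper names and the empty-string behaviour come from the description above.

```typescript
// Assumed row shape: a loosely typed record coming out of a query result.
type Row = Record<string, unknown>;

// Rejects empty strings: suitable for before_pattern, where "" would match anywhere.
function readString(row: Row, key: string): string | undefined {
  const value = row[key];
  return typeof value === "string" && value.length > 0 ? value : undefined;
}

// Accepts empty strings: suitable for after_pattern, where "" means "delete the match".
function readStringAllowEmpty(row: Row, key: string): string | undefined {
  const value = row[key];
  return typeof value === "string" ? value : undefined;
}
```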

F4 (cache-key dedup `a.ts` vs `./a.ts`):
  Pre-fix, the cache + pending + seenLines maps used the raw
  `file_path` as their key. Two rows naming the same disk file via
  different spellings created two cache entries → second write
  clobbered the first edit. New `canonicalizeFilePath` collapses
  every spelling to a project-relative form. Symlink-realpath
  defense remains documented as a separate (heavier) follow-up.
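
One way such a canonicalisation could look, assuming Node's `path` module and a known project root. The body is a guess at the approach, not the engine's implementation, and it deliberately does not resolve symlinks (the follow-up noted above).

```typescript
import * as path from "node:path";

// Hypothetical stand-in for the engine's canonicalizeFilePath.
function canonicalizeFilePath(projectRoot: string, filePath: string): string {
  // Resolve against the root so "a.ts", "./a.ts" and "src/../a.ts" converge.
  const absolute = path.resolve(projectRoot, filePath);
  // Project-relative, forward-slash form used as the shared map key.
  return path.relative(projectRoot, absolute).split(path.sep).join("/");
}
```

Using the canonical form as the key for the cache, pending, and seen-lines maps means two spellings of one file share a single entry.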

F5 (Q2 (c) over-promised on I/O failures):
  CodeRabbit's "🔴 Heavy lift" — a writeFileSync/renameSync mid-loop
  failure leaves files 1..N-1 already renamed with no rollback. Full
  transactional rollback (per-file backups + restore-on-throw) is
  deferred. Honest fix: weakened the Q2 (c) claim in
  `architecture.md § Apply wiring` to "all-or-nothing (semantic) —
  phase-1 conflicts abort phase 2 entirely; phase-2 I/O failures are
  NOT transactional across files." Engine docstring carries the same
  caveat as a fourth call-out.

F6 (TTY check used wrong stream):
  Gate checked `process.stdout.isTTY` but `promptYesNo()` reads from
  `process.stdin` and writes to `process.stderr`. So
  `codemap apply foo | tee log.txt` (interactive stdin, piped stdout)
  was rejected as non-TTY. Now gates on `stdin.isTTY && stderr.isTTY`.
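
The corrected gate can be sketched as a small predicate. `canPrompt` and `TtyLike` are hypothetical names; the streams are passed in as parameters so the check can be exercised without a real terminal.

```typescript
// Minimal structural type covering the one property the gate reads.
interface TtyLike {
  isTTY?: boolean;
}

// The prompt reads from stdin and writes to stderr, so those are the
// only two streams that must be terminals. stdout is not consulted.
function canPrompt(stdin: TtyLike, stderr: TtyLike): boolean {
  return stdin.isTTY === true && stderr.isTTY === true;
}

// In a CLI this would be called as: canPrompt(process.stdin, process.stderr)
```

With this shape, piping stdout into `tee` no longer affects whether the confirmation prompt is allowed.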

F7 (user-cancel rendered "no rows applicable"):
  Abort path called `emitResult(preview, opts)` with `opts.dryRun ===
  false`, so `renderTerminal` fell through to "no rows applicable" —
  contradicting the user's explicit cancel. Terminal mode now prints
  `apply <id>: aborted by user; no files written.`; JSON consumers
  still get the full preview envelope.

F9 (skill files missed two conflict reasons):
  `.agents/skills/codemap/SKILL.md` + `templates/agents/skills/
  codemap/SKILL.md` apply tool description didn't enumerate the 5
  conflict reasons. Synced.

Tests: 44/44 (3 new — `./a.ts` dedup, deletion via empty
after_pattern, empty-before-still-rejected). Typecheck / lint /
format clean.

* refactor(mergeParams): simplify parameter merging logic

* feat(tier1): R.17 extractor architecture + position-precision substrate

R.17 (`docs/plans/substrate-extraction.md`) per-tier extractor architecture
+ Tier 1 substrate landed together so the substrate-extraction plan's
shared-state patterns (ScopeTracker, ComplexityTracker, ComponentDetector)
ship validated by real consumers, not in isolation.

Architecture (R.17): `src/extractors/` hosts a `TierExtractor` registry
called from a thin `parser.ts` orchestrator via a multiplexed visitor —
10 modules (types + 3 shared trackers + 6 extractors + offsets/jsdoc/
type-stringify helpers) replace the 968-line `parser.ts` monolith;
parser.ts shrinks to 247 lines. 6 collaborating extractors on 3 shared
trackers, chaining handlers on `CallExpression` / `FunctionDeclaration`
/ `VariableDeclaration` + their `:exit` pairs.

Tier 1 substrate: column-precise positions on `calls`, `exports`,
`symbols.name_*`, `markers.column_*`, plus a new `import_specifiers`
child table that splits the `imports.specifiers` JSON blob into typed
rows. SCHEMA_VERSION 10→14. 4 flagship recipes + 4 golden fixtures
(`find-call-sites`, `find-export-sites`, `find-symbol-definitions`,
`find-import-sites`) form a complete identifier-locator family —
foundation for Tier 6's app-wide rename recipe extension.

Empirical cost (clean rebuild, median of 3):
  codemap-self     ~924 files: 11.4→14.3 MB (+25%); ~280→300 ms (+7%)
  merchant-dash  ~2120 files: 37.5→50.1 MB (+33%); ~740→900 ms (+22%)
Targeted reindex flat (~15 ms). Full reindex worst case ~900 ms —
66x under R.10's 1-min pain threshold. DB growth of +25-33% is a small
fraction of R.9's "~5-10x total budget across 13 tiers."

930/930 tests pass; 19 golden scenarios pass (4 new). Test fixtures
updated in impact-engine.test.ts / mcp-server.test.ts /
resource-handlers.test.ts to match new schema. R.17 architecture
validated end-to-end by Tier 1 consumers — `symbolsExtractor` populates
name_column_*; `callsExtractor` populates line+column; `markers.ts`
populates column_*; orchestrator wires `staticImportSpecifierRows`.

* feat(tier2): scopes + references substrate per R.11/R.13

What landed (SCHEMA_VERSION 14 → 16):

- **`scopes` table** — composite PK `(file_path, local_id)`, WITHOUT ROWID.
  `local_id` is a per-file 0-based counter assigned at parse time so refs
  encode their scope without round-tripping SQLite autoincrement.
  Kinds shipped: module / function / arrow / class / method. Block / for /
  catch deferred (R.11 conservative escape valve covers it).
- **`references` table** — per-identifier-use rows with column-precise
  position, kind (value/type/jsx), enclosing scope, is_write flag.
- **`is_write` per R.13** — writePositions / suppressedReads sets keyed by
  node.start. Pre-marker handlers for AssignmentExpression (simple `=`
  suppresses read), UpdateExpression / UnaryExpression(delete) (dual-emit),
  VariableDeclarator with initializer (write-only), ForOf/In LHS,
  AssignmentPattern.
- **Declaration suppression** — Function/Class/Interface/Type/Enum/Module
  declarations NOT duplicated in references (they live in symbols).
- **Shorthand dedup** — oxc walker visits the SAME Identifier twice for
  `import {foo}` / `export {foo}` / `{foo}` shorthand; dedup by
  (node.start, is_write).
- **`referencesExtractor`** module per R.17 (132 lines); ScopeTracker
  extended with pushKind / currentLocalId / getRecorded / finaliseModule.
- **Recipes:** find-references --params name=X, find-write-sites
  --params name=X + golden fixtures.

Empirical (codemap-self, 925 files):

| Metric            | Tier 1  | Tier 2                | Delta |
| ----------------- | ------- | --------------------- | ----- |
| Full reindex      | ~300 ms | 767 ms                | ~2.6× |
| Targeted (1 file) | 8 ms    | 9 ms                  | +12%  |
| Rows              | n/a     | 127k refs / 2k scopes | new   |

All within R.9 / R.10 thresholds (<1 min full, <100 ms targeted).
930 tests pass; all golden scenarios pass.

Deferred to Tier 2.1:
- bindings table + pass-2 cross-file resolution (R.12)
- Reference kinds: decorator / shorthand-prop / member-access / spread /
  rest / as-cast / typeof / keyof
- Block / for / catch scope kinds

* feat(tier2.1): bindings substrate + find-symbol-references recipe

What landed (SCHEMA_VERSION 16 → 17):

- `bindings` table — (reference_id, resolved_symbol_id, resolution_kind,
  is_external). PK on reference_id; CASCADE on reference deletion.
  resolution_kind enum: same-file / imported / global / unresolved.
- `symbols.scope_local_id` column — captures the declaring scope (parent
  of the symbol's own body scope). Class members anchor to their class's
  pushed scope. Captured BEFORE any new scope is pushed.
- Pass-2 binding resolver (`src/application/bindings-engine.ts`) — two
  phases: one SELECT per table into in-memory Maps, then per-reference
  resolution via scope-walk → imports → globals → unresolved. ~300ms for
  127k refs on codemap-self.
- Cross-file resolution uses `imports.resolved_path` (dependencies lacks
  the module specifier). Module-scope target symbol picked when the
  target file's exports list matches.
- Full-rebuild only — targeted reindex skips bindings refresh per R.10's
  <100ms contract. Orphan rows CASCADE-cleared on incremental edits.
- find-symbol-references recipe — bindings-precise (filters same-name
  shadows + different-source imports). Golden fixture added.
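
The per-reference resolution order (scope-walk, then imports, then globals, then unresolved) can be sketched as below. The data shapes and the `resolveKind` name are assumptions for illustration; the real resolver also picks a `resolved_symbol_id`, which this sketch omits.

```typescript
type ResolutionKind = "same-file" | "imported" | "global" | "unresolved";

// Assumed in-memory scope record, keyed by per-file local_id.
interface Scope {
  parent: number | null; // enclosing scope's local_id, null at module scope
  declared: Set<string>; // names declared directly in this scope
}

function resolveKind(
  name: string,
  scopeId: number,
  scopes: Map<number, Scope>,
  importedNames: Set<string>,
  globals: Set<string>,
): ResolutionKind {
  // 1. Walk outward through enclosing scopes; the nearest declaration wins.
  let current: number | null = scopeId;
  while (current !== null) {
    const scope = scopes.get(current);
    if (!scope) break;
    if (scope.declared.has(name)) return "same-file";
    current = scope.parent;
  }
  // 2. Names bound by an import in this file.
  if (importedNames.has(name)) return "imported";
  // 3. Known runtime or type globals.
  if (globals.has(name)) return "global";
  return "unresolved";
}
```

Because the scope walk runs first, a local declaration that shadows an import or a global resolves to the local symbol.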

Also: concise-comments sweep — stripped vintage `Tier 2.1` prefixes from
source; kept forward-deferral notes (`defer to Tier 6`) and R.NN
cross-refs.

Empirical (codemap-self, 932 files):

- Full reindex: 767 ms → 1175 ms (+53%)
- Targeted (1 file): 9 ms → 9 ms (no regression)
- Bindings distribution: 33% same-file / 17% imported / 4% global /
  45% unresolved (mostly TS type params + function params, future tiers)

930 tests pass; all golden scenarios pass.

Deferred to Tier 2.2:
- Re-export chain walking
- Function-parameter symbols
- Type-parameter symbols

* feat(tier2.2): function/type params + re-export chain walking

What landed:

- Function/method/arrow parameter symbols (kind='param') with
  scope_local_id = function's own scope. TSParameterProperty
  (constructor `public foo: T`) emits at class scope.
- Type parameter symbols (kind='type-param') for FunctionDeclaration,
  ClassDeclaration, arrow vars, and class methods. Interfaces and type
  aliases deferred — they don't push their own scope.
- Re-export chain walking in bindings-engine — bounded at 10 hops with
  cycle detection. `export { foo } from './bar'` now resolves to the
  original definition. Path resolution is relative-only against the
  indexed-paths set.
- pushParams / pushTypeParams helpers in src/extractors/params.ts.
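
A bounded chain walk with cycle detection might look like the sketch below. The `ReExport` map, its `file#name` key format, and the choice to flag both cycles and hop-limit hits as `truncated` are assumptions; only the 10-hop bound and the existence of cycle detection come from the notes above.

```typescript
interface ReExport {
  toFile: string;
  toName: string;
}

interface ChainResult {
  file: string;
  name: string;
  hops: number;
  truncated: boolean;
}

const MAX_HOPS = 10;

// reExports is keyed by `${file}#${name}` and holds only re-export entries.
function walkReExportChain(
  reExports: Map<string, ReExport>,
  startFile: string,
  startName: string,
): ChainResult {
  const visited = new Set<string>();
  let file = startFile;
  let name = startName;
  let hops = 0;
  for (;;) {
    const key = `${file}#${name}`;
    const next = reExports.get(key);
    // No further re-export: this is the original definition.
    if (!next) return { file, name, hops, truncated: false };
    // Stop on a revisited entry (cycle) or when the hop budget is spent.
    if (visited.has(key) || hops >= MAX_HOPS) {
      return { file, name, hops, truncated: true };
    }
    visited.add(key);
    file = next.toFile;
    name = next.toName;
    hops += 1;
  }
}
```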

Empirical (codemap-self, 933 files):

| Metric           | Pre    | Post   | Delta             |
| ----- | ---- | ---- | ---------------- |
| Symbols           | ~11.8k | 14k    | +2.2k             |
| Same-file refs    | 42257  | 51299  | +9042 (+21%)      |
| Unresolved refs   | 58073  | 49534  | -8539 (-15%)      |
| Unresolved %      | 45%    | 39%    | down              |
| Full reindex      | 1175ms | 1513ms | +29%              |
| Targeted (1 file) | 9ms    | 9ms    | no regression     |

930 tests pass; all golden scenarios pass (index-summary rebaselined
to reflect new param/type-param rows).

Deferred to Tier 2.3:
- Destructuring pattern params ({a,b}, [a,b])
- Interface/type-alias type-param scoping
- Callback arrow scoping
- External-module bindings via .d.ts

* feat(tier2.3): member-kind refs + destructuring + type globals (Tier 2 close)

What landed (SCHEMA_VERSION 17 → 18):

- kind='member' for non-computed property access (obj.foo). Bindings
  resolver skips these. Single biggest unresolved-bucket cut (~50%).
- Object-literal / class-member key suppression (long-hand Property,
  MethodDefinition, PropertyDefinition, TSPropertySignature,
  TSMethodSignature). Shorthand and computed still emit normally.
- Destructuring pattern bindings — walkPattern generator handles
  Identifier / AssignmentPattern / RestElement / ObjectPattern /
  ArrayPattern / TSParameterProperty recursively. Same helper for
  function params and variable destructuring (`const { a, b } = obj`).
- TYPE_GLOBALS set in bindings-engine — TS built-ins (Record, Partial,
  ReadonlyArray, Map, etc.) resolve to global instead of unresolved.
- Extra value globals: performance, import, require, module, exports,
  __dirname, __filename, self.
- `as const` skip: TSTypeReference name=const no longer emitted.
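
The recursive destructuring walk can be sketched as a generator over a simplified pattern AST. The `Pattern` type below is a reduced, hypothetical stand-in for the parser's node types (TSParameterProperty is omitted), and this version yields only leaf names rather than full binding rows.

```typescript
type Pattern =
  | { type: "Identifier"; name: string }
  | { type: "AssignmentPattern"; left: Pattern }
  | { type: "RestElement"; argument: Pattern }
  | {
      type: "ObjectPattern";
      properties: Array<
        | { type: "Property"; value: Pattern }
        | { type: "RestElement"; argument: Pattern }
      >;
    }
  | { type: "ArrayPattern"; elements: Array<Pattern | null> };

// Yields every leaf identifier bound by a (possibly nested) pattern.
function* walkPattern(node: Pattern): Generator<string> {
  switch (node.type) {
    case "Identifier":
      yield node.name;
      break;
    case "AssignmentPattern": // `a = 1` binds `a`
      yield* walkPattern(node.left);
      break;
    case "RestElement": // `...rest` binds `rest`
      yield* walkPattern(node.argument);
      break;
    case "ObjectPattern":
      for (const prop of node.properties) {
        yield* walkPattern(prop.type === "Property" ? prop.value : prop);
      }
      break;
    case "ArrayPattern":
      for (const element of node.elements) {
        if (element) yield* walkPattern(element); // holes like [, b] are null
      }
      break;
  }
}
```

The same walker serves both function parameters and `const { a, b } = obj` declarations, since both reduce to a pattern node.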

Empirical (codemap-self, 933 files):

| Metric            | Pre        | Post       | Delta           |
| ----- | ----- | ----- | ------------- |
| `kind='member'`   | 0          | 26701      | new             |
| Bindings rows     | 127k       | 84k        | -34%            |
| Unresolved        | 49534      | 4634       | -90%            |
| Unresolved %      | 39%        | 5.5%       | -34 pts         |
| Full reindex      | 1513ms     | 1025ms     | -32%            |
| Targeted (1 file) | 9ms        | 9ms        | no regression   |

Tier 2 closed. Remaining 5.5% is dominated by callback arrow params
(s, r, e, etc.) which need structural arrow scoping — deferred as
separate post-Tier-2 work.

930 tests pass; all goldens pass.

* feat(tier2.4): arrow + catch scoping (Tier 2 truly closed at 1.3%)

What landed:

- claimedScopeNodes WeakSet<object> on ExtractContext. Every extractor
  that pushes scope for a specific AST node marks the node here so
  downstream extractors don't double-push.
- ArrowFunctionExpression handler in scopesExtractor — for callback
  arrows (not claimed by VariableDeclaration), pushes anonymous arrow
  scope + emits params. Named arrows stay claimed and don't double-push.
- CatchClause handler — try/catch param scoped to catch body scope.
  Bindingless catch (`catch {}`, ES2019 optional catch binding) handled.
- ScopeTracker.currentParent walks past anonymous scopes (empty-name)
  so parent_name of nested symbols anchors to the nearest named owner.
- Extra globals: Bun, Deno.
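
The claim-once pattern behind `claimedScopeNodes` can be sketched as below. `pushScopeOnce` and the `pushed` log are hypothetical; the sketch only shows how a `WeakSet<object>` keyed on AST node identity prevents a second extractor from pushing a scope for the same node.

```typescript
// Node identity is the key; a WeakSet lets nodes be collected after parsing.
const claimedScopeNodes = new WeakSet<object>();
const pushed: string[] = [];

// Returns true if this call claimed the node and pushed a scope for it.
function pushScopeOnce(node: object, kind: string): boolean {
  if (claimedScopeNodes.has(node)) return false; // an earlier extractor owns it
  claimedScopeNodes.add(node);
  pushed.push(kind);
  return true;
}
```

In this shape, a named arrow claimed by the VariableDeclaration handler is skipped when the ArrowFunctionExpression handler later sees the same node, while an unclaimed callback arrow gets its own scope.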

Empirical (codemap-self, 933 files):

| Metric            | Pre        | Post       | Delta           |
| ----- | ----- | ----- | ------------- |
| Same-file         | 51972      | 55480      | +6.7%           |
| Unresolved        | 4634       | 1102       | -76%            |
| Unresolved %      | 5.5%       | 1.3%       | -4.2 pts        |
| Full reindex      | 1025ms     | 1224ms     | +19%            |
| Targeted (1 file) | 9ms        | 9ms        | no regression   |

Tier 2 closed at 1.3% unresolved. Remaining is unindexable
(infer T, audit-cache re-indexes, edge cases).

930 tests pass; all goldens pass.

* feat(tier2.5): interface/type-alias + for-of/in body scoping

What landed (SCHEMA_VERSION 18 → 19):

- scopes.kind enum extended: interface, type-alias, for, catch.
- TSInterfaceDeclaration / TSTypeAliasDeclaration push their own scope
  so type-params resolve via the standard walk. Type-param symbols
  emitted at the new scope (was: emitted at parent scope, causing
  same-letter collisions across interfaces).
- ForOfStatement / ForInStatement push a 'for' scope; VariableDeclaration
  / pattern in `left` emits bindings at the for-scope so body refs
  resolve to the loop variable, not the enclosing function.
- CatchClause kind correctly tagged as 'catch' (was 'function').
- Added value globals: RegExp, Iterator, AsyncIterator.

Empirical (codemap-self, 933 files):

| Metric            | Pre        | Post       | Delta           |
| ----- | ----- | ----- | ------------- |
| Same-file         | 55480      | 55524      | +44             |
| Global            | 6019       | 6034       | +15             |
| Unresolved        | 1102       | 1119       | +17 (noise)     |
| Unresolved %      | 1.30%      | 1.32%      | flat            |
| Full reindex      | 1224ms     | 1249ms     | +2%             |
| Targeted (1 file) | 9ms        | 9ms        | no regression   |

Net flat on the unresolved bucket — the value is structural (interface
type-params + for-loop body bindings now have correct scope graphs),
which unblocks future precision wins.

930 tests pass; all goldens pass.

* feat(tier11): per-symbol metrics + file_metrics aggregate

What landed (SCHEMA_VERSION 19 → 20):

- symbols: body_line_count, param_count, nesting_depth columns
  (nesting_depth deferred; needs a separate tracker — pushed as NULL).
- file_metrics table: one row per indexed TS/JS file with total_lines,
  code_lines, blank_lines, comment_lines, function_count, class_count,
  interface_count, export_count. Distinguishing let/const/var/arrow is
  deferred (parser doesn't track keyword variant on VariableDeclaration).
- Per-file metrics computed in parser.ts orchestrator from existing
  ctx data + lineMap (no extra walk).
- Recipe: `large-functions` — body_line_count ≥ 50 ranked by size.
  Golden fixture added.

Empirical (codemap-self, 933 files):

- 446 file_metrics rows (TS/JS files only; CSS/markdown are indexed
  separately and don't go through extractFileData).
- Top function: extractFileData at 511 body lines, 3 params.
- Top file: src/cli/cmd-query.ts at 1411 lines, 19 functions, 10 exports.

930 tests pass; all goldens pass.

* feat(tier12): module_cycles table via Tarjan's SCC

What landed (SCHEMA_VERSION 20 → 21):

- module_cycles table — (file_path PK, cycle_id, cycle_size). Only
  cyclic files appear; non-cyclic files have no row.
- src/application/cycles-engine.ts — iterative Tarjan's SCC over the
  dependencies graph. O(V+E). Runs once per full rebuild after
  bindings resolution.
- Recipe: `circular-imports` — every file in a cycle, grouped by
  cycle_id. Golden fixture detects the fixture's store ↔ cache cycle.
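
For readers unfamiliar with the technique, here is a compact iterative Tarjan SCC over an adjacency map. It is a generic sketch, not the contents of cycles-engine.ts: the `findCycles` name, the `Map<string, string[]>` graph shape, and the decision to report only components with more than one file are assumptions.

```typescript
// graph: file path -> list of file paths it depends on.
function findCycles(graph: Map<string, string[]>): string[][] {
  const index = new Map<string, number>();
  const lowlink = new Map<string, number>();
  const onStack = new Set<string>();
  const stack: string[] = [];
  const cycles: string[][] = [];
  let counter = 0;

  for (const root of graph.keys()) {
    if (index.has(root)) continue;
    // Explicit frames (node + next-edge cursor) replace recursion.
    const work: Array<{ node: string; edge: number }> = [{ node: root, edge: 0 }];
    while (work.length > 0) {
      const frame = work[work.length - 1]!;
      const node = frame.node;
      if (!index.has(node)) {
        // First visit: assign index/lowlink and push onto the SCC stack.
        index.set(node, counter);
        lowlink.set(node, counter);
        counter += 1;
        stack.push(node);
        onStack.add(node);
      }
      const edges = graph.get(node) ?? [];
      if (frame.edge < edges.length) {
        const next = edges[frame.edge]!;
        frame.edge += 1;
        if (!index.has(next)) {
          work.push({ node: next, edge: 0 }); // tree edge: descend
        } else if (onStack.has(next)) {
          lowlink.set(node, Math.min(lowlink.get(node)!, index.get(next)!));
        }
        continue;
      }
      // All edges explored: if node is an SCC root, pop its component.
      if (lowlink.get(node) === index.get(node)) {
        const component: string[] = [];
        let member: string;
        do {
          member = stack.pop()!;
          onStack.delete(member);
          component.push(member);
        } while (member !== node);
        if (component.length > 1) cycles.push(component.sort());
      }
      work.pop();
      const parent = work[work.length - 1];
      if (parent) {
        // Propagate the child's lowlink up the tree edge.
        lowlink.set(parent.node, Math.min(lowlink.get(parent.node)!, lowlink.get(node)!));
      }
    }
  }
  return cycles;
}
```

Each node and edge is visited once, which matches the O(V+E) bound quoted above; the explicit work stack avoids call-stack overflow on deep import graphs.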

Empirical (codemap-self, 936 files):

- Full reindex: 1224ms → 1171ms (essentially flat — Tarjan is fast).
- 3 cycles in the indexed DB (all the fixture cycle, appearing once in
  current source + twice in audit-cache copies of the same fixture).

930 tests pass; all goldens pass.

* feat(tier6): re_export_chains materialised table

What landed (SCHEMA_VERSION 21 → 22):

- re_export_chains table — (from_file, from_name) PK, (to_file, to_name,
  hops, truncated). Only re-export entries are materialised; direct
  exports don't appear.
- resolveReExportChains + persistReExportChains in bindings-engine.
  Reuses the same chain-walker bindings-engine uses for resolution.
- Recipe: barrel-chains — every chain ordered by hops DESC. Golden
  fixture covers the minimal fixture's shop barrel.

Empirical (codemap-self, 938 files):

- 106 chains materialised (mostly internal barrels + audit-cache copies).
- 0 truncated.
- Full reindex: 1171ms → 1104ms (no measurable cost — same loops bindings already runs).

930 tests pass; all goldens pass.

* feat(tier4): function_params first-class table

What landed (SCHEMA_VERSION 22 → 23):

- function_params table — one row per leaf parameter, ordered by
  position. Keyed by (file_path, owner_name, owner_kind) to
  disambiguate same-name functions vs methods.
- Columns: position, name, type_text (stringified annotation),
  default_text (raw source of default expr), is_rest, is_optional,
  + column-precise position.
- pushParams in src/extractors/params.ts extended to emit
  function_params rows alongside the existing kind='param' symbol
  rows. ownerKind passed by caller (function/method/etc).
- Recipe: find-by-param-type --params type_text=X — every fn taking
  a param with exact type annotation match. Golden fixture covers
  createClient(config: ClientConfig).

Empirical (codemap-self, 940 files):

- 2257 function_params rows.
- Full reindex: 1104ms → 1201ms (+9%).
- Symbols.kind='param' count unchanged — parallel emission.

930 tests pass; all goldens pass.

* feat(tier11.5): nesting_depth tracker (Tier 11 close)

What landed:

- ComplexityTracker extended with enterNest/exitNest. Frame tracks
  currentDepth + maxDepth alongside cyclomatic count; popTop writes
  maxDepth to symbol.nesting_depth.
- complexityExtractor: IfStatement/While/DoWhile/For/ForIn/ForOf/
  ConditionalExpression/CatchClause now have enter+exit handlers
  that bump nesting. SwitchCase + LogicalExpression remain
  cyclomatic-only (flat decision points, not depth).
- Recipe: deeply-nested-functions — depth >= 4 ranked by depth then
  complexity. Golden fixture added.
- large-functions recipe SELECT extended to include nesting_depth.
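
The enter/exit bookkeeping can be sketched with a single frame object. `NestingFrame` is a hypothetical, stripped-down stand-in for one ComplexityTracker frame; it omits the cyclomatic count and the write to `symbol.nesting_depth`.

```typescript
class NestingFrame {
  private currentDepth = 0;
  maxDepth = 0;

  // Called on entering if / while / for / catch / conditional nodes.
  enterNest(): void {
    this.currentDepth += 1;
    if (this.currentDepth > this.maxDepth) this.maxDepth = this.currentDepth;
  }

  // Called on the matching :exit handler.
  exitNest(): void {
    this.currentDepth -= 1;
  }
}

// Simulated visit of: if (...) { for (...) { } } while (...) { }
const frame = new NestingFrame();
frame.enterNest(); // if
frame.enterNest(); // for
frame.exitNest(); // for:exit
frame.exitNest(); // if:exit
frame.enterNest(); // while
frame.exitNest(); // while:exit
// frame.maxDepth === 2
```

Sibling constructs do not add to each other, which is why the trailing `while` leaves the maximum at 2.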

Empirical:

- codemap-self (943 files): 622 fns at depth 0, 441 at 1, 331 at 2,
  ... 3 fns at depth 9 (parseArgs in arg-handling scripts).
- merchant-dashboard-v2 (2120 files): top finds include `createStream`
  (depth 9, gen'd), `main` in provision-vars.ts (depth 6, complexity 84,
  488 lines — real refactor target).

930 tests pass; all goldens pass. Tier 11 fully closed.

* feat(refs): JSX intrinsics + DOM/React globals + TS qualified names

What landed:

- JSX intrinsics suppressed: lowercase tags (div/span/h1/etc.) +
  every JSXAttribute name (className/onClick/value/etc.) +
  JSXMemberExpression .property (`<Foo.Bar />` Bar).
- TSQualifiedName handler — `React.ReactNode` / `A.B.C` in type
  position now emits the namespace head as kind='type' and the
  member tail as kind='member'.
- TYPE_GLOBALS extended: DOM elements (HTMLDivElement, SVGSVGElement,
  etc.), DOM events (MouseEvent, KeyboardEvent, PointerEvent, …),
  Web APIs (RequestInit, IntersectionObserver, …), React ambient
  types (ReactNode, ComponentProps, Dispatch, SetStateAction, FC,
  CSSProperties, HTMLAttributes, etc.).
- Value GLOBALS extended: React namespace + constructor-callable
  Web APIs (new IntersectionObserver, new FormData, new URL, etc.).

Empirical:

- codemap-self: 1.32% → 1.25% unresolved (essentially flat — already
  near floor; small wins from DOM event types).
- merchant-dashboard-v2 (2120 files): **17.1% → 1.75%** unresolved.
  Same-file +18% (more refs resolve precisely).
- Full reindex on dashboard: 3997ms → 3706ms (slight gain from less
  unresolved binding work).

930 tests + all goldens pass.

* fix(indexer): exclude .codemap/audit-cache from default glob

Audit worktrees under .codemap/audit-cache/<sha>/ are full project
snapshots used by `codemap audit --base <ref>`. They were being
indexed alongside live source, multiplying every row by however
many audits had been run.

On codemap-self this dropped indexed files from 944 to 347 and full
reindex from 1273ms to 595ms (~53% less time). Cycle / re-export-chain
queries no longer duplicate findings across snapshot copies.

Surgical fix: added 'audit-cache' as a dir-name to
DEFAULT_EXCLUDE_DIR_NAMES. Recipes / config / index.db itself stay
indexable (project-local recipes ARE part of the workflow).

930 tests pass; all goldens pass.

* feat(tier5): runtime_markers + find-leftover-console + env-var-audit

What landed (SCHEMA_VERSION 23 → 24):

- runtime_markers table — (kind, detail, line, column, scope) for
  every console.* / debugger / throw / process.env access.
  kind enum: 'console' / 'debugger' / 'throw' / 'process-env'.
  detail: method name for console, env-var name for process-env,
  truncated thrown expression text for throw, NULL for debugger.
- runtime-markers extractor — matches MemberExpression on
  console.X and process.env.X, DebuggerStatement, ThrowStatement.
  Throw expr text capped at 200 chars to keep rows scannable.
- Recipes: find-leftover-console (all console calls),
  env-var-audit (env vars ranked by use + file fan-out).
- Goldens for both.

Real-world (merchant-dashboard-v2, 2120 files):

- 223 console calls (155 .log, 51 .error, 11 .warn)
- 114 throw statements
- 17 process.env reads dominated by NODE_ENV (10), plus single-use
  config (Sentry, AI chat, API_URL) — audit candidates.

930 tests pass; all goldens pass.

* feat(tier9): test_suites + find-skipped-tests + tests-by-file

What landed (SCHEMA_VERSION 24 → 25):

- test_suites table — (file_path, name, kind, line_start, line_end,
  parent_suite_id, is_skipped, is_only, is_todo, framework). Captures
  every describe / it / test / suite / context block.
- Framework detection per file from imports: vitest / jest /
  bun-test / node-test / mocha / unknown (mocha-style globals).
- src/extractors/tests.ts — parses callee shape (Identifier or
  `.skip`/`.only`/`.todo` MemberExpression), extracts name from
  StringLiteral / TemplateLiteral first arg. Tracks parent stack to
  resolve parent_suite_id. `.each` collapses to one row per template
  (parametrised expansion is a runtime concern).
- Recipes: find-skipped-tests (flags .skip / .only / .todo, with
  status column) + tests-by-file (per-file roll-up). Goldens added.

Real-world (merchant-dashboard-v2, 2120 files):

- 1404 it + 246 test + 414 describe (bun-test) + 132 it /
  45 describe (vitest). Mixed-framework codebase detected
  per-file from imports.
- 0 skipped/only/todo blocks — disciplined test suite.

930 tests pass; all goldens pass.

* fix(golden): add ORDER BY line_number to markers-notes-todo query

Local macOS and CI Linux returned the same 4 markers in different
implementation-defined order without ORDER BY. Fix is to make the
query deterministic. Also bumps markers-all-kinds golden (counts
reflect the audit-cache exclusion from 9b5f9a2).

* fix(refs+calls+scopes+symbols): address CodeRabbit review on #79

10 of 12 unresolved threads applied (1 deferred to feat/codemap-apply,
1 pushed back as deliberate semantics — see PR replies).

Applied:

- index-engine.ts: add missing `if (parsed.fileMetrics)` guard on the
  incremental path to mirror the full-rebuild guard at line 259.
- calls.ts: recursive member-expression flatten. `obj.foo.bar()` now
  emits `obj.foo.bar` instead of being dropped. Computed segments
  (`a[i].b()`) still abort flattening — recipe queries filter on
  dot-joined identifier shape that computed breaks. Empirically: 79
  new chain rows on codemap-self.
- components.ts: scope tracker → stack so nested PascalCase functions
  (`function Outer() { function Inner() {…} }`) preserve Outer's
  attribution after Inner exits.
- references.ts: AssignmentExpression LHS walker handles
  ObjectPattern / ArrayPattern / AssignmentPattern / RestElement,
  marking every leaf Identifier as write (and suppressing read for
  simple `=`).
- scopes.ts: anonymous scopes use `$anon_<localId>` segment in
  `scopeStr` so two sibling callbacks don't share `caller_scope` and
  dedup as one edge in `calls`.
- symbols.ts: per-declarator spans (decl.start/.end) instead of the
  whole VariableDeclaration so `body_line_count` doesn't inflate on
  `const a = (…) => long, b = (…) => longer`.
- type-stringify.ts: recursive `qualifiedNameOf` walks
  TSQualifiedName chains so `A.B.C` doesn't truncate to
  `undefined.C`.
- find-export-sites.md: rename `callee-token-precise` →
  `export-name-token-precise` (copy-paste from find-call-sites).
- find-skipped-tests.sql: prioritise `.only` over `.skip` in the
  status CASE (a row with both flags hides the higher-risk one).
- large-functions.md: mention `LIMIT 50` in the description.
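
The recursive member-expression flatten from the calls.ts item can be sketched over a reduced expression type. `Expr` and `flattenCallee` are hypothetical names, and the `Other` variant stands in for every node kind the sketch does not model.

```typescript
type Expr =
  | { type: "Identifier"; name: string }
  | { type: "MemberExpression"; object: Expr; property: Expr; computed: boolean }
  | { type: "Other" };

// Returns a dot-joined callee name, or null when any segment is computed
// or is not a plain identifier.
function flattenCallee(node: Expr): string | null {
  if (node.type === "Identifier") return node.name;
  if (node.type === "MemberExpression") {
    if (node.computed || node.property.type !== "Identifier") return null;
    const head = flattenCallee(node.object);
    return head === null ? null : `${head}.${node.property.name}`;
  }
  return null;
}
```

Aborting on a computed segment keeps every emitted name in the dot-joined identifier shape that the recipe queries filter on.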

930 tests pass; all goldens pass.

* fix(bindings+stats+markers): address CodeRabbit nitpicks on #79

6 of 6 nitpicks applied (1 applied selectively; see PR reply).

Applied:

- `IndexTableStats` + `fetchTableStats` extended to surface the 10 new
  tables in the post-index summary (scopes, references, bindings,
  import_specifiers, function_params, runtime_markers, test_suites,
  re_export_chains, module_cycles, file_metrics). Empty-stats
  initialiser bumped to match.
- Extracted `parseReExportSource(raw, fallbackName)` helper —
  resolveBindings + resolveReExportChains shared a 16-line block; any
  future `.default` encoding tweak now lives in one place.
- JSDoc one-liners on the two side-effecting bindings-engine exports
  (`persistBindings` orphan-clear, `persistReExportChains` truncate +
  rewrite). resolveX functions left bare — names are self-evident per
  concise-comments.
- Deduped TYPE_GLOBALS (4 dupes: Window, Document, HTMLElement, Event)
  and GLOBALS (12 dupes: Number, FileReader, Image, FormData,
  AbortController, Headers, Request, Response, Blob, File, URL,
  URLSearchParams). Kept reserved keywords (`undefined`, `null`, `this`,
  etc.) — `undefined` IS a valid Identifier in JS; the others are cheap
  defensive entries with no false-positive cost.
- runtime_markers gained `column_end` column (was unused emit param).
  SCHEMA_VERSION 25 → 26. Editor-highlight recipes can now span the
  whole `console.X` / `process.env.X` token.
- Research note `codemap-richer-index-synthesis-2026-05.md` Path B
  sketch gained an "Unvalidated illustration" disclaimer.

Declined: none — all nitpicks were genuine cheap wins. See PR for
the one applied selectively (JSDoc on side-effecting persist*; resolve*
left bare).

930 tests pass; all goldens pass.

* docs: sync architecture / glossary / roadmap / golden-queries with substrate

The Tier 2-12 substrate landed (commits a1a17ba…aa18b13) but reference
docs still listed the pre-substrate 12 tables. This catches them up.

- architecture.md § Schema — added 10 new table sections (scopes,
  references, bindings, import_specifiers, function_params, file_metrics,
  re_export_chains, module_cycles, runtime_markers, test_suites) plus
  new columns on existing tables (symbols: scope_local_id /
  name_column_* / body_line_count / param_count / nesting_depth;
  calls: line_start / column_*; exports: line_* / column_* / is_re_export;
  markers: column_*).
- glossary.md — entries for bindings, scopes, references,
  function_params, file_metrics, re_export_chains, module_cycles,
  runtime_markers, test_suites, import_specifiers (alphabetic slots).
- roadmap.md Moat B example list extended with the new table names so
  the "what recipe does dropping this kill?" reviewer test has the full
  surface to defend.
- golden-queries.md status table — scenario-coverage description lists
  all current indexed tables instead of just the original 12.

930 tests pass; all goldens pass.

* templates(agents): sync user-facing rule + skill with substrate

`codemap agents init` ships these files; they're what end-user agents
read. Before this commit, neither mentioned a single new table — agents
installed in downstream projects would have no idea the 10 new
substrate tables existed.

- templates/agents/rules/codemap.md
  - Indexing-summary sentence lists the new tables.
  - Trigger patterns gained 13 new rows: find-references,
    find-symbol-references, find-write-sites, find-by-param-type,
    circular-imports, barrel-chains, find-leftover-console,
    env-var-audit, find-skipped-tests, tests-by-file, large-functions,
    deeply-nested-functions, nesting_depth lookup.
  - Quick reference queries gained 9 substrate-table SQL examples.

- templates/agents/skills/codemap/SKILL.md
  - Added schema sections (matching the .agents/skills/codemap/SKILL.md
    + docs/architecture.md format) for: import_specifiers, scopes,
    references, bindings, function_params, file_metrics,
    re_export_chains, module_cycles, runtime_markers, test_suites.

930 tests pass; all goldens pass.

* chore(deps): bump deps to latest + dedupe zod across MCP SDK

zod 4.4 reorganized internals; @modelcontextprotocol/sdk pulled its own
4.3 copy via dep range, so `zod/v4/core` resolved to two different
$ZodType identities and TS rejected our registerTool inputSchemas. Pin
both via a `"zod": "$zod"` override so a single 4.4.3 is shared.

* chore(deps): drop ip-address override (no longer load-bearing)

express-rate-limit@8.5.2 bumped its ip-address constraint to ^10.2.0,
so a fresh resolve naturally picks the patched version without help.
Same fresh resolve also retires the unrelated fast-uri / hono CI audit
findings (ajv 8.20 + @hono/node-server 1.19.14 → patched transitives).

bun audit: "No vulnerabilities found".

* fix(build): declare unrun as devDependency for tsdown 0.22

tsdown@0.22.0 moved unrun from `dependencies` to `peerDependencies`
(constraint `*`). Locally `bun run build` works because Bun's run
intercepts the `#!/usr/bin/env node` shebang and executes tsdown under
Bun (autoLoader → "native", unrun path skipped). In CI the same script
ends up executing the binstub under Node, where `process.versions.bun`
is undefined → autoLoader resolves to "unrun" → import fails.

Declaring unrun (^0.3.0) as a devDependency satisfies the peer for the
Node code path. Verified `node node_modules/tsdown/dist/run.mjs`
succeeds locally.
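
The loader choice described above can be sketched as follows. This is a minimal illustration, not tsdown's actual source: `resolveAutoLoader` and its `versions` parameter (a stand-in for `process.versions`) are hypothetical names, and the real resolution may consider more than the one check shown.

```typescript
// Hypothetical sketch of the loader choice; not tsdown's real implementation.
type Loader = "native" | "unrun";

// `versions` stands in for `process.versions`; only the `bun` key matters here.
function resolveAutoLoader(versions: Record<string, string | undefined>): Loader {
  // Under Bun, `versions.bun` is a version string, so the native path is taken.
  // Under Node it is undefined, so the loader falls back to the `unrun` peer,
  // which must therefore be installed.
  return versions.bun !== undefined ? "native" : "unrun";
}

const underBun = resolveAutoLoader({ node: "22.0.0", bun: "1.2.0" });
const underNode = resolveAutoLoader({ node: "22.0.0" });
console.log(underBun, underNode);
```

The version strings are placeholders; only the presence or absence of the `bun` key changes the result.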

* feat(cli): add `codemap skill` and `codemap rule` content-serving verbs

Foundation for the thin-pointer plan: pointer files written by
`agents init` can redirect agents here so future codemap upgrades carry
today's content without re-running init. This commit only adds the
fetch surface — templates still ship full content; pointers come in a
later bullet.

`codemap skill` → bundled templates/agents/skills/codemap/SKILL.md
`codemap rule`  → bundled templates/agents/rules/codemap.md

* feat(mcp): add codemap://rule resource (mirror of codemap://skill)

Pairs with the new `codemap rule` CLI verb so MCP-native agents can
fetch the bundled rule markdown over JSON-RPC just like the skill.
Same lazy-cache pattern as skill — neither changes mid-session.

Updates resources/list, the listResources() catalog, mcp + serve help
text, and the corresponding mcp-server / http-server tests.

* feat(skill): thin pointer template + server-side section assembler

Consumer-disk SKILL.md (written by `agents init`) is now a ~16-line
pointer with rich trigger frontmatter; full content lives at
templates/agent-content/skill/00-full.md inside the package and is
assembled live by `codemap skill` / `codemap://skill`. Package
upgrades carry the current reference automatically; no
`agents init --force` is needed unless the pointer protocol itself
changes.

Section files concatenate in lexical order (joined by a blank line), so
later bullets drop `*.gen.md` sections (recipes, schema) alongside
hand-written prose without touching the assembler.
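
A minimal sketch of the lexical-order concatenation, assuming the section files have already been read into a name-to-body map. `assembleSections` and `SectionFiles` are hypothetical names for illustration; the real assembler reads from the packaged templates directory.

```typescript
// Hypothetical in-memory stand-in for the section files on disk.
type SectionFiles = Record<string, string>;

function assembleSections(files: SectionFiles): string {
  return Object.entries(files)
    // Plain code-unit comparison, so "00-" sorts before "10-" before "20-".
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([, body]) => body.trim())
    // Sections are joined by exactly one blank line.
    .join("\n\n");
}

const assembled = assembleSections({
  "10-recipes.gen.md": "## Recipes\n",
  "00-full.md": "# Codemap skill\n",
});
console.log(assembled);
```

Because ordering comes from the filename alone, adding a section is a matter of dropping in a file with the right numeric prefix.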

* feat(rule): thin pointer template + assembler covers both kinds

Mirrors the skill change from c9889ac for the rule side: full content
moves to templates/agent-content/rule/00-full.md inside the package;
consumer-disk templates/agents/rules/codemap.md becomes a ~22-line
pointer with the alwaysApply frontmatter and the CLI/MCP fetch links.

assembleSkill() / assembleRule() collapse into one generic
assembleAgentContent(kind) — same lexical-section-order semantics as
the skill side, ready for *.gen.md additions in later bullets.

* feat(skill+rule): pointer-version stamp + once-per-process staleness nag

EXPECTED_POINTER_VERSION constant lets the binary detect when a
consumer's `.agents/{skills/codemap/SKILL,rules/codemap}.md` is from
an older pointer protocol than this codemap expects, and print a
single stderr nag at process start telling them to re-run
`codemap agents init --force`. Should fire roughly once a year, not
per release: bump the constant only when the pointer SHAPE changes,
not when the content served by `codemap skill` / `codemap rule`
changes.

Heuristic for unstamped files: nag only when the file is >50 lines
(pre-pointer legacy template); short user-managed overrides stay
silent. Warning goes to stderr so `codemap skill > file.md` stays
clean.
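
The staleness check and the once-per-process guard might look roughly like this. The stamp format (`pointer-version: N`), the function names, and the `warn` callback are all assumptions made for the sketch; the commit message does not show the real stamp syntax. In the binary the callback would write to stderr.

```typescript
const LEGACY_LINE_THRESHOLD = 50;

// Hypothetical stamp format, used only for this sketch.
const STAMP_RE = /pointer-version:\s*(\d+)/;

function shouldNag(content: string, expectedVersion: number): boolean {
  const match = STAMP_RE.exec(content);
  if (match) {
    // Stamped file: nag only when it predates the expected pointer protocol.
    return Number(match[1]) < expectedVersion;
  }
  // Unstamped file: long files are assumed to be the pre-pointer legacy template,
  // short ones are treated as user-managed overrides and left alone.
  return content.split("\n").length > LEGACY_LINE_THRESHOLD;
}

let alreadyNagged = false;

function nagOnce(content: string, expectedVersion: number, warn: (msg: string) => void): void {
  if (alreadyNagged || !shouldNag(content, expectedVersion)) return;
  alreadyNagged = true; // at most one warning per process
  warn("pointer file is stale; re-run `codemap agents init --force`");
}

const messages: string[] = [];
const legacyTemplate = "line\n".repeat(60); // unstamped and longer than 50 lines
nagOnce(legacyTemplate, 1, (m) => messages.push(m));
nagOnce(legacyTemplate, 1, (m) => messages.push(m));
console.log(messages.length);
```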

Eats own dog food: `.agents/skills/codemap/SKILL.md` and
`.agents/rules/codemap.md` regenerated as v1 pointers via
`agents init --force`.

* feat(skill): auto-generated recipe catalog section

`templates/agent-content/skill/10-recipes.gen.md` becomes a placeholder
file that triggers `renderRecipesSection()` at fetch time:
`codemap skill` (and `codemap://skill`) now embeds a live markdown
table of every recipe id, source (bundled / project), params, and
description, regenerated from `listQueryRecipeCatalog()` per call.

Adding a recipe to `templates/recipes/` (or a project-local one to
`.codemap/recipes/` when the runtime is up) now surfaces in the skill
output automatically, with no template edit and no `agents init` re-run.

`*.gen.md` filenames preserve lexical section order; renderer dispatch
is keyed by `<kind>/<filename>` in the RENDERERS map. Fallback prose
in the placeholder file ships for environments where the renderer is
unreachable (today: none, since the generator runs in-process).
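
As a rough sketch of the dispatch: the key shape (`<kind>/<filename>`) and the RENDERERS name come from this commit, while `renderSection`, the stub renderer body, and the choice to fall back whenever no renderer is registered are assumptions for illustration.

```typescript
type Renderer = () => string;

// Keys follow the `<kind>/<filename>` shape; the renderer here is a stub
// standing in for the real recipe-catalog renderer.
const RENDERERS: Record<string, Renderer> = {
  "skill/10-recipes.gen.md": () =>
    "| id | source |\n| --- | --- |\n| find-references | bundled |",
};

function renderSection(kind: string, filename: string, placeholderBody: string): string {
  const renderer = RENDERERS[`${kind}/${filename}`];
  // Hand-written sections (and any section without a renderer) are served as-is.
  return renderer ? renderer() : placeholderBody;
}

const generated = renderSection("skill", "10-recipes.gen.md", "fallback prose");
const handWritten = renderSection("skill", "00-full.md", "# Codemap skill");
console.log(generated, handWritten);
```

Keying on the filename keeps the assembler generic: it never needs to know which sections are generated.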

* feat(skill): auto-generated schema DDL section

`templates/agent-content/skill/20-schema.gen.md` triggers
`renderSchemaSection()` at fetch time: it opens an in-memory SQLite
database, runs the same `createTables()` codemap uses for real indexes,
reads `sqlite_schema`, and emits each table's CREATE TABLE statement as
a markdown subsection.

Adding a column / table in `src/db.ts` automatically surfaces in
`codemap skill` output, so there is no separate schema doc to keep in
sync. Result memoized at module scope since the DDL is static per process.
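
The render-and-memoize step could be sketched like this, leaving the SQLite access out. The rows are assumed to come from a query along the lines of `SELECT name, sql FROM sqlite_schema WHERE type = 'table'`; `renderSchemaSectionSketch`, the `###` heading level, and the sample `files` DDL are hypothetical.

```typescript
// One row per table, as SQLite reports it in sqlite_schema.
interface SchemaRow {
  name: string;
  sql: string;
}

// Built at runtime so this example nests cleanly inside a markdown fence.
const FENCE = "`".repeat(3);

function renderSchemaMarkdown(rows: SchemaRow[]): string {
  return rows
    .map((row) => `### ${row.name}\n\n${FENCE}sql\n${row.sql}\n${FENCE}`)
    .join("\n\n");
}

// Module-scope memo: the DDL cannot change within one process.
let memo: string | undefined;

function renderSchemaSectionSketch(loadRows: () => SchemaRow[]): string {
  if (memo === undefined) {
    memo = renderSchemaMarkdown(loadRows());
  }
  return memo;
}

let loads = 0;
const loadRows = (): SchemaRow[] => {
  loads += 1;
  // Sample DDL invented for the example.
  return [{ name: "files", sql: "CREATE TABLE files (path TEXT PRIMARY KEY)" }];
};

const firstRender = renderSchemaSectionSketch(loadRows);
const secondRender = renderSchemaSectionSketch(loadRows);
console.log(firstRender === secondRender, loads);
```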

* feat(skill): split hand-written half into named sections; dedupe generated content

Plan parity: the monolithic 660-line `00-full.md` is replaced by named
sections matching the screenshot layout (`00-overview`,
`10-recipes-context`, `20-recipes.gen` (renumbered from 10-),
`30-schema.gen` (renumbered from 20-), `40-query-patterns`,
`50-maintenance`, `90-troubleshooting`). Section order is lexical so
renumbering = file rename; no code change.

The hand-written enumeration of recipe ids and the entire per-table
schema description block are gone; `20-recipes.gen.md` and
`30-schema.gen.md` cover that ground exhaustively. Net skill output
drops from 1156 → ~850 lines while gaining drift-free recipes + schema.

RENDERERS map keys updated for the renumbered generated sections.

* docs: sync agent-content pointer pattern across README + docs

Updates the canonical agent docs to reflect bullets 1–8:

- docs/agents.md gains the templates/agents/ vs templates/agent-content/
  split, the Live fetch surface table (CLI / MCP / HTTP), the section
  assembler + *.gen.md mechanism, the pointer protocol + staleness
  detection contract, and an updated maintenance-discipline note.
- docs/architecture.md Key Files table adds cmd-skill.ts +
  agent-content.ts rows; MCP wiring mentions codemap://rule and points
  at the assembler; Agent templates line summarises the pointer +
  staleness model.
- docs/packaging.md notes the templates/agents/ vs templates/agent-content/
  split in the published surface.
- docs/README.md Rule 10 retired the "two-copy" obligation (auto-flows
  via *.gen.md + single source for narrative in agent-content/skill/),
  doc-index row + duplication-prevention row updated.
- README.md (root) version-matched-guidance paragraph rewritten;
  resource list mentions codemap://rule.

Per docs-governance: user-visible behavior change required doc sync.

* docs(templates): fact-check templates/ against actual codemap surface

Audit pass after the pointer rewrite landed:

- agent-content/rule/00-full.md: `codemap mcp` line now lists
  `[--no-watch] [--debounce <ms>]` like serve does; "six resources"
  → "seven" with codemap://rule added to the lazy-cache list and the
  resources bullet; `.agents/skills/codemap/SKILL.md` reference
  rewritten to point at `codemap skill` / `codemap://skill` (the
  on-disk file is now a thin pointer); "Bundled rules/skills" + "Generic
  defaults" paragraphs rewritten to reflect the live-serve model.
- agents/README.md rewritten to explain the pointer pattern, the
  templates/agents/ vs templates/agent-content/ split, and how to drop
  project-specific rules / skills alongside the bundled pointers.
- recipes/find-symbol-references.md: dev-mode `bun src/index.ts --full`
  swapped for consumer-facing `codemap --full` (this file ships to npm).

Recipe-id surface verified: `codemap query --recipes-json` lists the
same 40 ids as the on-disk `.sql` files; outcome aliases
(`dead-code` / `deprecated` / `boundaries` / `hotspots` /
`coverage-gaps`) all responded to `--help`.
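
A parity check of this kind reduces to a two-way set difference. The sketch below assumes the catalog ids and the `.sql` filenames have already been collected into arrays; `compareRecipeIds` is a hypothetical helper, and the JSON shape of `--recipes-json` is not modeled.

```typescript
interface ParityReport {
  missingOnDisk: string[];    // ids in the catalog with no matching .sql file
  missingInCatalog: string[]; // .sql files the catalog does not list
}

function compareRecipeIds(catalogIds: string[], sqlFiles: string[]): ParityReport {
  // A recipe id is taken to be the filename without its .sql extension.
  const diskIds = sqlFiles.map((file) => file.replace(/\.sql$/, ""));
  return {
    missingOnDisk: catalogIds.filter((id) => diskIds.indexOf(id) === -1).sort(),
    missingInCatalog: diskIds.filter((id) => catalogIds.indexOf(id) === -1).sort(),
  };
}

const report = compareRecipeIds(
  ["barrel-chains", "circular-imports"],
  ["barrel-chains.sql", "tests-by-file.sql"],
);
console.log(report);
```

Both lists being empty corresponds to the "same ids" result reported above.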

* docs: surface parity — skill/rule served by CLI + MCP + HTTP everywhere

You called out that several places said `codemap://skill` / `codemap://rule`
are MCP-only; they're also exposed by `codemap serve` over HTTP at
`GET /resources/{encoded-uri}` (same shared `readResource()` handler).
Fixes:

- templates/agents/rules/codemap.md gains an HTTP bullet next to CLI + MCP.
- templates/agent-content/rule/00-full.md both "(MCP)" mentions now read
  "(MCP) and HTTP via codemap serve" with the explicit URL pattern.
- templates/agent-content/skill/10-recipes-context.md Resources block
  clarifies the HTTP mirror up front so the bulleted URIs apply to both.
- docs/agents.md Live fetch surface table fills the HTTP row for BOTH
  Skill and Rule (no longer "for either URI" with an em-dash); also
  updates the "All three resolve to..." paragraph to note MCP+HTTP share
  the resource cache.
- docs/glossary.md MCP-server entry: six → seven resources, codemap://rule
  added, "same handlers serve codemap serve (HTTP)" expanded to mention
  the resource handler path explicitly.
- docs/README.md duplication-prevention bullet (Rule 10 narrative-changes
  branch) names all three transports.

CLI / MCP / HTTP all go through assembleAgentContent(kind) — there is no
MCP-only or HTTP-only path. Docs now reflect that contract everywhere.
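
For the HTTP transport, building the request URL might look like the sketch below. It assumes `{encoded-uri}` means standard percent-encoding (`encodeURIComponent`); the host, port, and the `resourceUrl` helper are placeholders rather than codemap's documented defaults.

```typescript
// Builds the `GET /resources/{encoded-uri}` URL for a codemap:// resource.
function resourceUrl(baseUrl: string, resourceUri: string): string {
  const trimmedBase = baseUrl.replace(/\/+$/, ""); // tolerate trailing slashes
  return `${trimmedBase}/resources/${encodeURIComponent(resourceUri)}`;
}

// Placeholder host and port; use whatever `codemap serve` is bound to.
const ruleUrl = resourceUrl("http://localhost:8080/", "codemap://rule");
console.log(ruleUrl);
```

Encoding the whole URI as one path segment keeps the `://` from being read as extra path separators.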

* feat(rule): trim to ~100 lines — priming-only; reference moves to skill

The rule is loaded into agent context **every turn** while the skill
loads only on trigger, so the rule should carry priming, not reference.
The pre-trim rule had grown to 248 lines (CLI command table, recipe
metadata, audit / apply / baselines narrative, MCP details, …) — all
useful, none of it priming.

Trimmed rule keeps the load-bearing surface:
- STOP banner + one-paragraph indexed-tables overview
- "How to query" (CLI usage + DISTINCT footgun + failure contract)
- Trigger patterns table — the actual priming surface (30 rows)
- Top-11 quick reference queries (foundational tables)
- "When Grep is appropriate" (3 bullets)
- "Keeping the index fresh" (3 commands)
- Pointer to `codemap skill` / `codemap://skill` / HTTP for full ref

Net: 248 → 102 lines, ~59% cut → ~900 tokens saved per turn × every
session. The dropped content was already in the skill (CLI table /
recipe context / MCP narrative live in 10-recipes-context.md;
maintenance lives in 50-maintenance.md), so this is dedupe rather than
loss — agents fetch the skill once-per-need instead of paying the
weight every turn.

Substrate-table example queries (references / bindings / function_params /
runtime_markers / test_suites / module_cycles / re_export_chains /
coverage) were only in the rule's quick-reference table; they now live
in the skill's 40-query-patterns.md as a new "Substrate tables" section
so the deep reference still carries them.

* docs(templates): trim verbosity + remove npm-unreachable references

Two passes in one commit:

(1) Trim narrative bloat in the skill / templates README without losing
vital info:

- skill/00-overview.md: dropped the dated "edit your copy" advice (with
  the pointer pattern, consumers add SIBLING files instead of editing
  pointers); merged CLI invocation + dlx alternatives into a single
  bash block; kept the output contract at 4 bullets but tightened the
  wording. 31 → 27 lines.
- skill/10-recipes-context.md: collapsed the Tools section from 13
  multi-line paragraphs (each duplicating param-by-param detail that
  lives in `codemap <verb> --help`) to 13 one-line existence +
  transport notes pointing back at `--help`. Same for the Resources
  section. Vital info preserved: tool / resource existence, args
  shape, MCP-only / live-cache flags; lost: param walkthroughs already
  in `--help` output.
- templates/agents/README.md: rewrote to focus on what consumers need
  (pointer pattern, how to extend with sibling files) instead of repo
  contributor framing. 13 → 7 lines.

(2) Remove markdown links + prose path references that escape the npm
tarball (`templates/` is published; `docs/`, `src/`, `.github/`
are NOT, so links to those 404 for consumers):

- templates/agents/README.md: `[docs/agents.md](../../docs/agents.md)`
  → GitHub-anchored URL (reachable for both npm + GitHub readers).
- templates/recipes/refactor-risk-ranking.md: dropped
  "(docs/research/non-goals-reassessment-2026-05.md § 1.4)" — kept the
  per-symbol-vs-per-file rationale.
- templates/recipes/untested-and-dead.md: dropped "D11 of
  docs/plans/coverage-ingestion.md" — kept "Known v1 limitation".
- templates/recipes/files-by-coverage.md: dropped "per D2 of
  docs/plans/coverage-ingestion.md" — kept the GROUP BY index note.
- templates/recipes/high-complexity-untested.md: `src/parser.ts` →
  "Codemap's parser walker"; contributor-only "Refactor the
  MethodDefinition visitor" advice → file-an-issue note.
- templates/recipes/boundary-violations.md: `src/config.ts` → "Codemap's
  Zod config schema (the `boundaries` array on the user-config object)".

Audit-grep clean: no remaining `docs/...` or repo-internal source-file
references in templates/; only the explicit GitHub URL remains and is
intentional.

* chore: add changeset for agent-content pointer pattern (minor)

* docs: correct `codemap://recipes` caching claim across all 3 surfaces

CodeRabbit caught a real drift: my pointer-pattern doc-sync commits
(c077a68, 3eabdb7) grouped `codemap://recipes` and `codemap://recipes/{id}`
under "lazy-cached" wording in three places, contradicting the in-code
truth that resource-handlers.ts has `schemaCache` / `skillCache` /
`ruleCache` but deliberately NO cache for recipes (the inline
`last_run_at` / `run_count` recency fields would freeze at first-read).

Fixed in:
- docs/architecture.md § MCP wiring — split the Resources sentence by
  freshness contract (lazy-cached vs live read-per-call).
- docs/glossary.md MCP entry — same split; recipes move to the
  live-read group with a one-clause explanation.
- README.md MCP block comment — list lazy-cached + live URIs separately.
- src/application/resource-handlers.ts module docstring: was
  "catalog-style resources cache lazily" (misleading: recipes ARE
  catalog-shaped but NOT cached); rewritten to enumerate each cache class.

In-code source of truth (the comment above the `let schemaCache` /
`skillCache` / `ruleCache` declarations) was already correct.
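
The freshness split can be illustrated with a small sketch: schema, skill, and rule are cached on first read, while recipes are re-read on every call so recency fields stay live. `readResourceSketch`, the single `Map` (the real module uses separate cache variables), and the `codemap://schema` URI (inferred from `schemaCache`) are assumptions.

```typescript
// URIs assumed to be lazily cached; everything else is read live per call.
const LAZY_CACHED = new Set<string>(["codemap://schema", "codemap://skill", "codemap://rule"]);
const cache = new Map<string, string>();

function readResourceSketch(uri: string, load: () => string): string {
  if (!LAZY_CACHED.has(uri)) {
    // e.g. codemap://recipes: caching would freeze last_run_at / run_count.
    return load();
  }
  const hit = cache.get(uri);
  if (hit !== undefined) return hit;
  const fresh = load();
  cache.set(uri, fresh);
  return fresh;
}

let reads = 0;
const load = (): string => `read #${++reads}`;

const skillA = readResourceSketch("codemap://skill", load);
const skillB = readResourceSketch("codemap://skill", load);     // served from cache
const recipesA = readResourceSketch("codemap://recipes", load); // live read
const recipesB = readResourceSketch("codemap://recipes", load); // live read again
console.log(skillA, skillB, recipesA, recipesB);
```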

* docs(skill-gen): add HTTP route to fallback prose in recipes/schema placeholders

CodeRabbit nitpick (run 99f80eff): the placeholder bodies in
20-recipes.gen.md + 30-schema.gen.md only mentioned CLI + MCP fetch
paths, not HTTP. The bodies are normally replaced at fetch time by the
RENDERERS map, so end-users rarely see them — but contributors reading
the files on disk should see the same CLI/MCP/HTTP parity the rest of
the PR's docs carry.

Both files now list `codemap query` (CLI), MCP resource URI, and
`GET /resources/{encoded-uri}` against `codemap serve` (HTTP).