Skip to content

perf(db): drop redundant idx_edges_source and idx_edges_target#15

Open
mschreib28 wants to merge 2 commits into
mainfrom
upstream/perf/drop-redundant-edge-indexes
Open

perf(db): drop redundant idx_edges_source and idx_edges_target#15
mschreib28 wants to merge 2 commits into
mainfrom
upstream/perf/drop-redundant-edge-indexes

Conversation

@mschreib28
Copy link
Copy Markdown
Owner

Summary\n\nDrops two redundant secondary indexes from the edges table:\n\n- idx_edges_source — fully covered by idx_edges_source_kind via SQLite's left-prefix scan.\n- idx_edges_target — fully covered by idx_edges_target_kind.\n\nThese indexes pre-date the (source, kind) / (target, kind) composites added later. Once the wider indexes were introduced, the narrow ones became dead weight on every write and every disk page — but they were never removed.\n\n## Empirical validation\n\nMeasured on a synthesized 50K-node / 250K-edge SQLite database that mirrors a realistic mid-size codebase. Reproducer: scripts/spikes/spike-edge-indexes.mjs.\n\n| Metric | Baseline (with redundant) | Stripped | Δ |\n|---|---|---|---|\n| DB size | 34.7 MB | 27.0 MB | -22.2% |\n| Bulk insert (250K edges) | 590 ms | 431 ms | 1.37× faster |\n| WHERE source = ? latency | reference | 0.88× | no regression |\n| WHERE target = ? latency | reference | 0.95× | no regression |\n\nEXPLAIN output on the stripped DB confirms the kind-prefixed indexes still cover source-only / target-only queries:\n\n\nSEARCH edges USING COVERING INDEX idx_edges_source_kind (source=?)\n\n\nRun yourself with node scripts/spikes/spike-edge-indexes.mjs.\n\n## Why this is safe\n\n| Query shape | Old plan | New plan |\n|---|---|---|\n| WHERE source = ? | idx_edges_source | idx_edges_source_kind (left-prefix, covering) |\n| WHERE target = ? | idx_edges_target | idx_edges_target_kind (left-prefix, covering) |\n| WHERE source = ? AND kind = ? | idx_edges_source_kind | idx_edges_source_kind (unchanged) |\n| WHERE target = ? AND kind = ? | idx_edges_target_kind | idx_edges_target_kind (unchanged) |\n\nSQLite's query planner uses a B-tree's leading column(s) for prefix matches, so the composite indexes are a strict superset of the dropped ones for these queries.\n\n## Migration\n\nAdds schema migration v4 (CURRENT_SCHEMA_VERSION 3 → 4):\n\nsql\nDROP INDEX IF EXISTS idx_edges_source;\nDROP INDEX IF EXISTS idx_edges_target;\n\n\n- Fresh databases skip both indexes (removed from schema.sql).\n- Existing v3 databases drop them on next open via the migration runner.\n- Reversible (just re-create the indexes if needed) — no data loss.\n\n## Test plan\n\n- [x] npx tsc --noEmit clean\n- [x] npx vitest run — 382/382 tests pass\n- [x] New tests in __tests__/pr19-improvements.test.ts:\n - Fresh DB does not contain idx_edges_source / idx_edges_target\n - Upgrade path drops both narrow indexes when present and bumps schema version to 4\n- [x] Spike script reproduces the measurements above\n\n## Files changed\n\n| File | Change |\n|---|---|\n| src/db/migrations.ts | New v4 migration; CURRENT_SCHEMA_VERSION 3 → 4 |\n| src/db/schema.sql | Removed two CREATE INDEX lines |\n| __tests__/foundation.test.ts | Updated v3 → v4 expectation |\n| __tests__/pr19-improvements.test.ts | Added v4 migration tests + updated version expectation |\n| scripts/spikes/spike-edge-indexes.mjs | New benchmark reproducer |\n\n🤖 Generated with Claude Code\n


Copied from colbymchenry/codegraph#122

andreinknv and others added 2 commits April 27, 2026 17:09
Today every PR adding a schema migration claims
`CURRENT_SCHEMA_VERSION = next` AND adds an array entry to
`migrations: Migration[]` in src/db/migrations.ts. Two PRs both
claiming the same version resolve as: "second PR's v4 silently
no-ops on existing DBs" — a real silent-data-loss bug class
(PR colbymchenry#113's reviewer caught one).

After this refactor:

  Adding a new schema migration:
  1. Pick the next free 3-digit prefix (`git ls-files
     'src/db/migrations/[0-9]*.ts'` shows what's taken).
  2. Create `src/db/migrations/<NNN>-<short-name>.ts` exporting a
     `MIGRATION: MigrationModule` (description + up).
  3. Add one import line and one entry to
     `src/db/migrations/index.ts`'s REGISTERED_MODULES array.

Two PRs both creating `004-foo.ts` collide on the FILESYSTEM —
the maintainer sees it instantly. No more silent skipped
migrations.

## What's new

- `src/db/migrations/types.ts` — `MigrationModule { description,
  up }` and `Migration extends MigrationModule { version }`.
- `src/db/migrations/002-project-metadata.ts` — extracted v2
  body verbatim.
- `src/db/migrations/003-lower-name-index.ts` — extracted v3
  body verbatim.
- `src/db/migrations/index.ts` — central registry. Static-imports
  each migration, parses the version FROM THE FILENAME (no
  hand-typed version field that can drift), enforces strict
  `NNN-kebab-name.ts` shape, validates uniqueness/sort at module
  load (throws loudly on collision), exposes ALL_MIGRATIONS and
  CURRENT_SCHEMA_VERSION.
- `src/db/migrations.ts` — refactored to a thin runner. Same
  exported surface (CURRENT_SCHEMA_VERSION, getCurrentVersion,
  runMigrations, needsMigration, getPendingMigrations,
  getMigrationHistory, Migration type) — every existing import
  keeps working unchanged.
- `__tests__/migrations-registry.test.ts` — 8 invariant tests:
  registry non-empty, versions unique + strictly ascending,
  CURRENT_SCHEMA_VERSION matches max, every file matches the
  strict NNN-kebab-name pattern, no orphan files, no phantom
  registrations.

## Reviewer pass

Independent reviewer ran once. 3 REQUEST_CHANGES + 1 INFO addressed:

1. Hand-typed `version` field in REGISTERED_MODULES could drift
   from filename. **Fixed**: removed the version field; registry
   now parses version from filename via FILENAME_PATTERN regex
   inside validateRegistered.
2. Filename-pattern test was lenient (allowed 4-digit or 1-digit
   prefixes). **Fixed**: new "every migration file matches the
   strict NNN-kebab-name.ts pattern" test catches malformed
   filenames as orphan-detection-bypassing offenders.
3. `getPendingMigrations` returned `readonly Migration[]`,
   breaking callers that typed the result as `Migration[]`.
   **Fixed**: returns a fresh mutable array via `.slice()`.
4. No throw-on-duplicate test for validateRegistered (module
   evaluation timing). Acknowledged; not added.

## Backward compat

Every existing import works unchanged:
- `import { CURRENT_SCHEMA_VERSION } from './migrations'` ✓
- `import { runMigrations } from './migrations'` ✓
- `import { needsMigration } from './migrations'` ✓
- `import { getMigrationHistory } from './migrations'` ✓
- `import { getPendingMigrations } from './migrations'` — returns
   mutable Migration[] (preserved)
- `Migration` type — re-exported

## Affected open PRs

Every migration-touching PR (colbymchenry#102 UNIQUE edges, colbymchenry#105 cochange,
colbymchenry#108 perf db, colbymchenry#111 LLM features, my colbymchenry#112 centrality+churn, colbymchenry#113
issue-history, colbymchenry#114 config-refs, colbymchenry#115 sql-refs) currently
claims migration v4 and conflicts with each other on
`migrations.ts`. After this lands they each become:
- 1 new file: `src/db/migrations/<NNN>-<name>.ts`
- 2 lines in registry.ts (import + array entry)

Conflict shape changes from "next free version + array entry +
CURRENT_SCHEMA_VERSION bump in one file" (4-way conflict) to "1
new file" + 2-line registry edit. If two PRs target the same
NNN, the filesystem collision surfaces immediately — no silent
skipped migrations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These two narrow indexes are fully covered by the wider
idx_edges_source_kind and idx_edges_target_kind composite indexes
via SQLite's left-prefix scan. Keeping them costs DB size and bulk-
insert time without giving any query that the kind-prefixed indexes
don't already cover.

Empirical measurements on a 50K-node / 250K-edge synthesized DB
(scripts/spikes/spike-edge-indexes.mjs):

  - DB size: -22.2% (34.7 MB -> 27.0 MB)
  - Bulk insert: 1.37x faster (590ms -> 431ms)
  - Source-only / target-only query latency: no regression
    (EXPLAIN: SEARCH edges USING COVERING INDEX
     idx_edges_source_kind (source=?))

Ported to the post-colbymchenry#118 file-based-migration form: migration body
lives in src/db/migrations/017-drop-redundant-edge-indexes.ts and
is registered in src/db/migrations/index.ts. CURRENT_SCHEMA_VERSION
is auto-derived from the registry — no constant bump.

Schema: removes the two CREATE INDEX statements on the narrow
indexes; the wider kind-prefixed indexes (which actually serve
queries) stay.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants