perf(db): drop redundant idx_edges_source and idx_edges_target#15
Open
mschreib28 wants to merge 2 commits into
Open
perf(db): drop redundant idx_edges_source and idx_edges_target#15mschreib28 wants to merge 2 commits into
mschreib28 wants to merge 2 commits into
Conversation
Today every PR adding a schema migration claims `CURRENT_SCHEMA_VERSION = next` AND adds an array entry to `migrations: Migration[]` in src/db/migrations.ts. Two PRs both claiming the same version resolve as: "second PR's v4 silently no-ops on existing DBs" — a real silent-data-loss bug class (PR colbymchenry#113's reviewer caught one). After this refactor: Adding a new schema migration: 1. Pick the next free 3-digit prefix (`git ls-files 'src/db/migrations/[0-9]*.ts'` shows what's taken). 2. Create `src/db/migrations/<NNN>-<short-name>.ts` exporting a `MIGRATION: MigrationModule` (description + up). 3. Add one import line and one entry to `src/db/migrations/index.ts`'s REGISTERED_MODULES array. Two PRs both creating `004-foo.ts` collide on the FILESYSTEM — the maintainer sees it instantly. No more silent skipped migrations. ## What's new - `src/db/migrations/types.ts` — `MigrationModule { description, up }` and `Migration extends MigrationModule { version }`. - `src/db/migrations/002-project-metadata.ts` — extracted v2 body verbatim. - `src/db/migrations/003-lower-name-index.ts` — extracted v3 body verbatim. - `src/db/migrations/index.ts` — central registry. Static-imports each migration, parses the version FROM THE FILENAME (no hand-typed version field that can drift), enforces strict `NNN-kebab-name.ts` shape, validates uniqueness/sort at module load (throws loudly on collision), exposes ALL_MIGRATIONS and CURRENT_SCHEMA_VERSION. - `src/db/migrations.ts` — refactored to a thin runner. Same exported surface (CURRENT_SCHEMA_VERSION, getCurrentVersion, runMigrations, needsMigration, getPendingMigrations, getMigrationHistory, Migration type) — every existing import keeps working unchanged. - `__tests__/migrations-registry.test.ts` — 8 invariant tests: registry non-empty, versions unique + strictly ascending, CURRENT_SCHEMA_VERSION matches max, every file matches the strict NNN-kebab-name pattern, no orphan files, no phantom registrations. ## Reviewer pass Independent reviewer ran once. 3 REQUEST_CHANGES + 1 INFO addressed: 1. Hand-typed `version` field in REGISTERED_MODULES could drift from filename. **Fixed**: removed the version field; registry now parses version from filename via FILENAME_PATTERN regex inside validateRegistered. 2. Filename-pattern test was lenient (allowed 4-digit or 1-digit prefixes). **Fixed**: new "every migration file matches the strict NNN-kebab-name.ts pattern" test catches malformed filenames as orphan-detection-bypassing offenders. 3. `getPendingMigrations` returned `readonly Migration[]`, breaking callers that typed the result as `Migration[]`. **Fixed**: returns a fresh mutable array via `.slice()`. 4. No throw-on-duplicate test for validateRegistered (module evaluation timing). Acknowledged; not added. ## Backward compat Every existing import works unchanged: - `import { CURRENT_SCHEMA_VERSION } from './migrations'` ✓ - `import { runMigrations } from './migrations'` ✓ - `import { needsMigration } from './migrations'` ✓ - `import { getMigrationHistory } from './migrations'` ✓ - `import { getPendingMigrations } from './migrations'` — returns mutable Migration[] (preserved) - `Migration` type — re-exported ## Affected open PRs Every migration-touching PR (colbymchenry#102 UNIQUE edges, colbymchenry#105 cochange, colbymchenry#108 perf db, colbymchenry#111 LLM features, my colbymchenry#112 centrality+churn, colbymchenry#113 issue-history, colbymchenry#114 config-refs, colbymchenry#115 sql-refs) currently claims migration v4 and conflicts with each other on `migrations.ts`. After this lands they each become: - 1 new file: `src/db/migrations/<NNN>-<name>.ts` - 2 lines in registry.ts (import + array entry) Conflict shape changes from "next free version + array entry + CURRENT_SCHEMA_VERSION bump in one file" (4-way conflict) to "1 new file" + 2-line registry edit. If two PRs target the same NNN, the filesystem collision surfaces immediately — no silent skipped migrations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These two narrow indexes are fully covered by the wider
idx_edges_source_kind and idx_edges_target_kind composite indexes
via SQLite's left-prefix scan. Keeping them costs DB size and bulk-
insert time without giving any query that the kind-prefixed indexes
don't already cover.
Empirical measurements on a 50K-node / 250K-edge synthesized DB
(scripts/spikes/spike-edge-indexes.mjs):
- DB size: -22.2% (34.7 MB -> 27.0 MB)
- Bulk insert: 1.37x faster (590ms -> 431ms)
- Source-only / target-only query latency: no regression
(EXPLAIN: SEARCH edges USING COVERING INDEX
idx_edges_source_kind (source=?))
Ported to the post-colbymchenry#118 file-based-migration form: migration body
lives in src/db/migrations/017-drop-redundant-edge-indexes.ts and
is registered in src/db/migrations/index.ts. CURRENT_SCHEMA_VERSION
is auto-derived from the registry — no constant bump.
Schema: removes the two CREATE INDEX statements on the narrow
indexes; the wider kind-prefixed indexes (which actually serve
queries) stay.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary\n\nDrops two redundant secondary indexes from the
edgestable:\n\n-idx_edges_source— fully covered byidx_edges_source_kindvia SQLite's left-prefix scan.\n-idx_edges_target— fully covered byidx_edges_target_kind.\n\nThese indexes pre-date the(source, kind)/(target, kind)composites added later. Once the wider indexes were introduced, the narrow ones became dead weight on every write and every disk page — but they were never removed.\n\n## Empirical validation\n\nMeasured on a synthesized 50K-node / 250K-edge SQLite database that mirrors a realistic mid-size codebase. Reproducer:scripts/spikes/spike-edge-indexes.mjs.\n\n| Metric | Baseline (with redundant) | Stripped | Δ |\n|---|---|---|---|\n| DB size | 34.7 MB | 27.0 MB | -22.2% |\n| Bulk insert (250K edges) | 590 ms | 431 ms | 1.37× faster |\n|WHERE source = ?latency | reference | 0.88× | no regression |\n|WHERE target = ?latency | reference | 0.95× | no regression |\n\nEXPLAIN output on the stripped DB confirms the kind-prefixed indexes still cover source-only / target-only queries:\n\n\nSEARCH edges USING COVERING INDEX idx_edges_source_kind (source=?)\n\n\nRun yourself withnode scripts/spikes/spike-edge-indexes.mjs.\n\n## Why this is safe\n\n| Query shape | Old plan | New plan |\n|---|---|---|\n|WHERE source = ?|idx_edges_source|idx_edges_source_kind(left-prefix, covering) |\n|WHERE target = ?|idx_edges_target|idx_edges_target_kind(left-prefix, covering) |\n|WHERE source = ? AND kind = ?|idx_edges_source_kind|idx_edges_source_kind(unchanged) |\n|WHERE target = ? AND kind = ?|idx_edges_target_kind|idx_edges_target_kind(unchanged) |\n\nSQLite's query planner uses a B-tree's leading column(s) for prefix matches, so the composite indexes are a strict superset of the dropped ones for these queries.\n\n## Migration\n\nAdds schema migration v4 (CURRENT_SCHEMA_VERSION3 → 4):\n\nsql\nDROP INDEX IF EXISTS idx_edges_source;\nDROP INDEX IF EXISTS idx_edges_target;\n\n\n- Fresh databases skip both indexes (removed fromschema.sql).\n- Existing v3 databases drop them on next open via the migration runner.\n- Reversible (just re-create the indexes if needed) — no data loss.\n\n## Test plan\n\n- [x]npx tsc --noEmitclean\n- [x]npx vitest run— 382/382 tests pass\n- [x] New tests in__tests__/pr19-improvements.test.ts:\n - Fresh DB does not containidx_edges_source/idx_edges_target\n - Upgrade path drops both narrow indexes when present and bumps schema version to 4\n- [x] Spike script reproduces the measurements above\n\n## Files changed\n\n| File | Change |\n|---|---|\n|src/db/migrations.ts| New v4 migration;CURRENT_SCHEMA_VERSION3 → 4 |\n|src/db/schema.sql| Removed twoCREATE INDEXlines |\n|__tests__/foundation.test.ts| Updated v3 → v4 expectation |\n|__tests__/pr19-improvements.test.ts| Added v4 migration tests + updated version expectation |\n|scripts/spikes/spike-edge-indexes.mjs| New benchmark reproducer |\n\n🤖 Generated with Claude Code\nCopied from colbymchenry/codegraph#122