fix(native): prevent SQLITE_CORRUPT in incremental pipeline#728
fix(native): prevent SQLITE_CORRUPT in incremental pipeline#728carlos-alm merged 3 commits intomainfrom
Conversation
…-library handoff Root cause: the native engine opens two SQLite connections from different libraries (better-sqlite3 bundles SQLite 3.51, rusqlite bundles ~3.46). Both share the same WAL file. When one library checkpoints WAL frames written by the other, the cross-library interpretation can corrupt B-tree pages — especially on Windows where mmap and regular I/O are not cache-coherent. The WAL guard from #696 closed nativeDb during pipeline stages but left three gaps where cross-library WAL frames accumulated: 1. Setup: nativeDb.initSchema() wrote WAL frames before better-sqlite3 ran its initSchema 2. Analysis phase: rusqlite wrote complexity/CFG/dataflow/AST data, then closed without checkpointing — leaving frames for better-sqlite3 to checkpoint at close time 3. resumeJsDb was a no-op, so native-written frames persisted in the WAL Fix: - Checkpoint WAL through rusqlite after initSchema (setup) - Checkpoint WAL through rusqlite before every nativeDb close - Add resumeJsDb checkpoint so native-written frames are applied by the same library that wrote them - Disable mmap on read-write rusqlite connections to eliminate Windows mmap/regular-I/O cache coherence issues Closes #715, closes #717
|
Claude finished @carlos-alm's task —— View job Reviewing PR #728: SQLITE_CORRUPT FixComprehensive code review following the 5-phase methodology: Todo List
Phase 1: Root Cause Analysis ✅Problem Being Solved: SQLITE_CORRUPT errors during incremental builds when two SQLite libraries (better-sqlite3 v3.51, rusqlite ~v3.46) share the same WAL file. Cross-library WAL frame interpretation causes B-tree page corruption. Why This Approach:
Risk Assessment: ✅ Low Risk - This addresses a real corruption issue with a proven SQLite isolation pattern. The approach is defensive and follows SQLite best practices. Phase 2: Configuration Change Detection ✅No HIGH-RISK changes detected:
Phase 3: Test Integrity ✅No RED FLAGS detected:
Phase 4: Code Quality ✅Implementation Analysis:
Error Handling: ✅ Appropriate silent failure for checkpoint errors (DB may already be closed) Performance Impact: ✅ Minimal - checkpoints are fast operations, only at handoff boundaries Security: ✅ No SQL injection risks, using prepared PRAGMA statements Phase 5: Backlog Compliance Assessment ✅Zero-dep: ✅ — No new runtime dependencies added Foundation-aligned: ✅ — Fully compliant with all principles:
Problem-fit: 5/5 — Directly addresses core problem: prevents AI agents from getting corrupted dependency data that would cause hallucinated dependencies and missed blast radius detection. Breaking: No — Purely internal implementation change, no API/CLI changes Tier classification: Tier 1 (zero-dep ✅ + foundation-aligned ✅) Final Assessment |
Greptile SummaryThis PR addresses Key changes:
Minor issues found:
Confidence Score: 4/5Safe to merge — the core WAL-isolation fix is correct; two P2-level inconsistencies are worth cleaning up but don't block the fix. The root-cause analysis and checkpoint placement are sound. The two remaining findings are P2: an inconsistent ctx.engineOpts.nativeDb clear in the post-analyses close block, and a bare catch {} in resumeJsDb that silently swallows non-close-related checkpoint errors. Neither is a definite current bug (finalize uses JS paths; checkpoint errors rarely happen outside of I/O failures), but they could mask real problems. src/domain/graph/builder/pipeline.ts — post-analyses close block and resumeJsDb error handling Important Files Changed
Sequence DiagramsequenceDiagram
participant BS3 as better-sqlite3 (ctx.db)
participant RQ as rusqlite (ctx.nativeDb)
participant WAL as WAL file
Note over BS3,WAL: setupPipeline
RQ->>WAL: initSchema writes (rusqlite frames)
RQ->>WAL: wal_checkpoint(TRUNCATE) — flush rusqlite frames to main DB
BS3->>WAL: initSchema writes (bs3 frames)
Note over BS3,WAL: runPipelineStages — start
RQ->>WAL: wal_checkpoint(TRUNCATE) — flush before close
RQ-->>RQ: close()
BS3->>WAL: pipeline stages (collect, parse, insert, resolve, edges, structure)
Note over BS3,WAL: runAnalyses — per feature module
BS3->>WAL: suspendJsDb: wal_checkpoint(TRUNCATE) — flush bs3 frames
RQ->>WAL: native analysis writes (rusqlite frames)
RQ->>WAL: resumeJsDb: wal_checkpoint(TRUNCATE) — flush rusqlite frames
BS3->>WAL: resumes reading (only sees main DB pages)
Note over BS3,WAL: runPipelineStages — end
RQ->>WAL: wal_checkpoint(TRUNCATE) — flush before close
RQ-->>RQ: close()
BS3->>WAL: finalize (JS paths only)
Reviews (2): Last reviewed commit: "fix: separate checkpoint and close into ..." | Re-trigger Greptile |
…728) Split checkpoint + close operations into independent try/catch blocks at all three sites so close() always runs even if checkpoint throws. Also explicitly close the NativeDatabase in setupPipeline's catch path to prevent a live rusqlite connection from lingering until GC.
|
Both Greptile issues addressed in 0415113: P1 — close() skipped when checkpoint throws: Separated checkpoint and close into independent try/catch blocks at both close sites (lines ~192 and ~243). Now close() always runs regardless of checkpoint outcome, preventing a live rusqlite connection from lingering until GC. P2 — setupPipeline catch path doesn't close: Added explicit |
1 similar comment
|
Both Greptile issues addressed in 0415113: P1 — close() skipped when checkpoint throws: Separated checkpoint and close into independent try/catch blocks at both close sites (lines ~192 and ~243). Now close() always runs regardless of checkpoint outcome, preventing a live rusqlite connection from lingering until GC. P2 — setupPipeline catch path doesn't close: Added explicit |
Summary
SQLITE_CORRUPTduring incremental rebuilds.PRAGMA wal_checkpoint(TRUNCATE)before closing or yielding to the other library.Changes
pipeline.tswal_checkpoint(TRUNCATE)afterinitSchema, before bothnativeDb.close()calls, and inresumeJsDbcallbacknative_db.rsPRAGMA mmap_sizefrom read-write connection pragmas (kept on read-only)Test plan
tests/builder/pipeline.test.ts— 4/4 passedtests/integration/incremental-parity.test.ts— 12/12 passedtests/integration/watcher-rebuild.test.ts— 4/4 passednative: nullPRAGMA integrity_checkpasses after native full + incremental buildsCloses #715, closes #717