fix(native): prevent SQLITE_CORRUPT in incremental pipeline by carlos-alm · Pull Request #728 · optave/ops-codegraph-tool

carlos-alm · 2026-04-01T01:46:31Z

Summary

Root cause: Two different SQLite libraries (better-sqlite3 v3.51, rusqlite ~v3.46) share the same WAL file. When one library checkpoints WAL frames written by the other, cross-library interpretation can corrupt B-tree pages — manifesting as SQLITE_CORRUPT during incremental rebuilds.
Checkpoint before every cross-library handoff: Each library now checkpoints its own WAL frames via PRAGMA wal_checkpoint(TRUNCATE) before closing or yielding to the other library.
Disable mmap on read-write connections: Eliminates Windows mmap/regular-I/O cache coherence issues between the two libraries sharing a DB file.
resumeJsDb now checkpoints: Previously a no-op — native-written WAL frames accumulated until better-sqlite3 checkpointed them at close time. Now rusqlite checkpoints its own frames immediately after each analysis write batch.
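The handoff rule in the summary can be sketched as below. This is a minimal illustration, not the code from `pipeline.ts`: the `SqlConnection` interface, its `exec` method, the `RecordingConnection` fake, and the `analysis_results` table are all hypothetical names introduced so the sketch runs without SQLite. Only the `PRAGMA wal_checkpoint(TRUNCATE)` statement comes from the PR.

```typescript
// Hypothetical minimal connection shape for this sketch; not a real library API.
interface SqlConnection {
  exec(sql: string): void;
}

// The library that wrote the WAL frames checkpoints them itself,
// before the other library is allowed to touch the database.
function checkpointBeforeHandoff(conn: SqlConnection): void {
  conn.exec("PRAGMA wal_checkpoint(TRUNCATE)");
}

// Fake connection that only records statements, so the sketch needs no database.
class RecordingConnection implements SqlConnection {
  readonly statements: string[] = [];
  exec(sql: string): void {
    this.statements.push(sql);
  }
}

const nativeConn = new RecordingConnection();
nativeConn.exec("INSERT INTO analysis_results VALUES (1)"); // hypothetical native write
checkpointBeforeHandoff(nativeConn); // last statement before yielding to the other library
```

The point of the pattern is ordering: the checkpoint is the last statement a library issues before the other library resumes.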

Changes

| File | What |
| --- | --- |
| `pipeline.ts` | Add `wal_checkpoint(TRUNCATE)` after `initSchema`, before both `nativeDb.close()` calls, and in `resumeJsDb` callback |
| `native_db.rs` | Remove `PRAGMA mmap_size` from read-write connection pragmas (kept on read-only) |
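The `native_db.rs` change amounts to choosing pragmas by connection mode. The real code is Rust and is not shown in this thread, so the following is only a TypeScript sketch of the idea. The function name, the `busy_timeout` pragma, and the mmap size are assumptions for illustration; the PR states only that `PRAGMA mmap_size` is dropped for read-write connections and kept for read-only ones.

```typescript
type ConnectionMode = "read-only" | "read-write";

// Assumed value for illustration only; the PR does not state the actual size.
const ASSUMED_MMAP_BYTES = 256 * 1024 * 1024;

// Hypothetical helper: returns the pragmas to run when opening a connection.
function connectionPragmas(mode: ConnectionMode): string[] {
  const pragmas = ["PRAGMA busy_timeout = 5000"]; // assumed shared pragma
  if (mode === "read-only") {
    // mmap stays on read-only connections, which never write WAL frames.
    pragmas.push(`PRAGMA mmap_size = ${ASSUMED_MMAP_BYTES}`);
  }
  return pragmas;
}

const readWritePragmas = connectionPragmas("read-write");
const readOnlyPragmas = connectionPragmas("read-only");
```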

Test plan

tests/builder/pipeline.test.ts — 4/4 passed
tests/integration/incremental-parity.test.ts — 12/12 passed
tests/integration/watcher-rebuild.test.ts — 4/4 passed
Biome lint clean
CI: incremental benchmark should no longer report native: null
CI: PRAGMA integrity_check passes after native full + incremental builds

Closes #715, closes #717

…-library handoff

Root cause: the native engine opens two SQLite connections from different libraries (better-sqlite3 bundles SQLite 3.51, rusqlite bundles ~3.46). Both share the same WAL file. When one library checkpoints WAL frames written by the other, the cross-library interpretation can corrupt B-tree pages, especially on Windows where mmap and regular I/O are not cache-coherent.

The WAL guard from #696 closed nativeDb during pipeline stages but left three gaps where cross-library WAL frames accumulated:

1. Setup: nativeDb.initSchema() wrote WAL frames before better-sqlite3 ran its initSchema
2. Analysis phase: rusqlite wrote complexity/CFG/dataflow/AST data, then closed without checkpointing, leaving frames for better-sqlite3 to checkpoint at close time
3. resumeJsDb was a no-op, so native-written frames persisted in the WAL

Fix:

- Checkpoint WAL through rusqlite after initSchema (setup)
- Checkpoint WAL through rusqlite before every nativeDb close
- Add resumeJsDb checkpoint so native-written frames are applied by the same library that wrote them
- Disable mmap on read-write rusqlite connections to eliminate Windows mmap/regular-I/O cache coherence issues

Closes #715, closes #717

claude · 2026-04-01T01:46:49Z

Claude finished @carlos-alm's task

Reviewing PR #728: SQLITE_CORRUPT Fix

Comprehensive code review following the 5-phase methodology:

Todo List

Read and analyze changed files
Phase 1: Root Cause Analysis
Phase 2: Configuration Change Detection
Phase 3: Test Integrity
Phase 4: Code Quality
Phase 5: Backlog Compliance Assessment
Provide final recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: SQLITE_CORRUPT errors during incremental builds when two SQLite libraries (better-sqlite3 v3.51, rusqlite ~v3.46) share the same WAL file. Cross-library WAL frame interpretation causes B-tree page corruption.

Why This Approach:

WAL Checkpointing: Forces each library to checkpoint its own WAL frames via PRAGMA wal_checkpoint(TRUNCATE) before handoff, ensuring clean state
mmap Removal: Eliminates Windows cache coherence issues between libraries
Strategic Placement: Checkpoints at all critical handoff points (init, suspend, resume, close)

Risk Assessment: ✅ Low Risk - This addresses a real corruption issue with a proven SQLite isolation pattern. The approach is defensive and follows SQLite best practices.

Phase 2: Configuration Change Detection ✅

No HIGH-RISK changes detected:

❌ No Biome/linting rules disabled or relaxed
❌ No test timeouts increased or coverage thresholds lowered
❌ No build configuration changes
❌ No TypeScript strict mode flags disabled
✅ Only change: Removed PRAGMA mmap_size from read-write connections (justified for cache coherence)

Phase 3: Test Integrity ✅

No RED FLAGS detected:

✅ PR mentions all existing tests passing (4/4, 12/12, 4/4)
✅ No removed or weakened assertions
✅ No test files modified alongside code changes
✅ No eslint-disable comments added
✅ Reasonable PR size (2 files, +27/-6 lines)
✅ Single focused concern (SQLite corruption fix)

Phase 4: Code Quality ✅

Implementation Analysis:

pipeline.ts:133 - Schema init checkpoint: ✅ Correct placement after nativeDb.initSchema()

pipeline.ts:196, 247 - Close-time checkpoints: ✅ Proper error handling with try/catch, prevents dual-connection corruption

pipeline.ts:52-56 - resumeJsDb callback: ✅ CRITICAL FIX - Previously a no-op, now checkpoints native WAL frames before better-sqlite3 resumes

native_db.rs:385 - mmap removal: ✅ Well-commented rationale, kept on read-only connections (no WAL conflict)

Error Handling: ✅ Appropriate silent failure for checkpoint errors (DB may already be closed)

Performance Impact: ✅ Minimal - checkpoints are fast operations, only at handoff boundaries

Security: ✅ No SQL injection risks, using prepared PRAGMA statements

Phase 5: Backlog Compliance Assessment ✅

Zero-dep: ✅ — No new runtime dependencies added

Foundation-aligned: ✅ — Fully compliant with all principles:

✅ P1 (always-current): Fixes corruption affecting build accuracy
✅ P4 (zero-cost core): No external dependencies, improves reliability
✅ P8 (not an agent): Pure infrastructure fix, no decision-making
✅ P5 (embeddable-first): Maintains programmatic API contract

Problem-fit: 5/5 — Directly addresses core problem: prevents AI agents from getting corrupted dependency data that would cause hallucinated dependencies and missed blast radius detection.

Breaking: No — Purely internal implementation change, no API/CLI changes

Tier classification: Tier 1 (zero-dep ✅ + foundation-aligned ✅)

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: Real SQLITE_CORRUPT issue from cross-library WAL sharing
**Why This Approach**: Proven SQLite isolation pattern with strategic checkpointing
**Risk Assessment**: Low risk, defensive fix following SQLite best practices

## Backlog Compliance
- Zero-dep: ✓ — no new dependencies
- Foundation-aligned: ✓ — fully compliant with all principles  
- Problem-fit: 5 — directly prevents corrupted dependency data
- Breaking: No — internal implementation only
- Tier: 1

## Critical Concerns
None. This is a well-implemented fix for a real corruption issue.

## Final Recommendation
- Rating: ⭐⭐⭐⭐☆ (4/5)
- Action: APPROVE
- Reasoning: Addresses real corruption bug with proven SQLite pattern. Excellent root cause analysis, conservative implementation, comprehensive checkpointing at all handoff points. Only minor concern is the complexity of dual-engine coordination, but this is inherent to the architecture and well-handled here.

greptile-apps · 2026-04-01T01:49:49Z

Greptile Summary

This PR addresses SQLITE_CORRUPT errors during incremental rebuilds caused by two SQLite libraries (rusqlite and better-sqlite3) sharing the same WAL-mode database file. The fix applies PRAGMA wal_checkpoint(TRUNCATE) at every library handoff point and disables mmap on read-write rusqlite connections.

Key changes:

pipeline.ts: Adds wal_checkpoint(TRUNCATE) after initSchema, before both nativeDb.close() calls, and converts resumeJsDb from a no-op into a real rusqlite checkpoint.
pipeline.ts: Properly closes nativeDb in the setup fallback error path (previously the connection could leak).
native_db.rs: Removes PRAGMA mmap_size from read-write connections to eliminate Windows mmap/regular-I/O cache-coherence issues between the two libraries; read-only connections retain mmap_size.

Minor issues found:

The post-analyses nativeDb.close() block does not clear ctx.engineOpts.nativeDb, inconsistent with the earlier close block which explicitly nulls it to prevent stale-reference access.
resumeJsDb uses a bare catch {} that silently suppresses all checkpoint errors, not just "connection already closed" — a real I/O failure during checkpoint would be invisible and leave better-sqlite3 exposed to cross-library WAL frames.
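One way to address the bare `catch {}` concern is to ignore only the "connection already closed" case and surface everything else. The sketch below is an assumption-heavy illustration, not the project's code: `checkpointOnResume`, `isAlreadyClosedError`, and the error-message pattern are hypothetical, and a real implementation would need to match whatever error the native binding actually raises.

```typescript
// Assumption: a closed connection can be recognised from its error message.
function isAlreadyClosedError(err: unknown): boolean {
  return err instanceof Error && /not open|closed/i.test(err.message);
}

// Runs the checkpoint; stays silent only for the expected "already closed" case.
function checkpointOnResume(
  runCheckpoint: () => void,
  warn: (message: string) => void,
): void {
  try {
    runCheckpoint();
  } catch (err) {
    if (isAlreadyClosedError(err)) return;
    const detail = err instanceof Error ? err.message : String(err);
    warn(`resumeJsDb: WAL checkpoint failed: ${detail}`);
  }
}

// Usage with fake checkpoints that fail in two different ways.
const warnings: string[] = [];
checkpointOnResume(() => {
  throw new Error("database connection is closed");
}, (m) => warnings.push(m));
checkpointOnResume(() => {
  throw new Error("disk I/O error");
}, (m) => warnings.push(m));
```

With this shape, a genuine I/O failure during the checkpoint produces a warning instead of disappearing.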

Confidence Score: 4/5

Safe to merge — the core WAL-isolation fix is correct; two P2-level inconsistencies are worth cleaning up but don't block the fix.

The root-cause analysis and checkpoint placement are sound. The two remaining findings are P2: an inconsistent ctx.engineOpts.nativeDb clear in the post-analyses close block, and a bare catch {} in resumeJsDb that silently swallows non-close-related checkpoint errors. Neither is a definite current bug (finalize uses JS paths; checkpoint errors rarely happen outside of I/O failures), but they could mask real problems.

src/domain/graph/builder/pipeline.ts — post-analyses close block and resumeJsDb error handling

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/domain/graph/builder/pipeline.ts | Adds wal_checkpoint(TRUNCATE) at every rusqlite→better-sqlite3 handoff point (post-initSchema, pre-close ×2, and in resumeJsDb); also adds proper nativeDb.close() in the setup error path. Two minor inconsistencies: ctx.engineOpts.nativeDb is not cleared in the post-analyses close block, and resumeJsDb swallows all checkpoint errors silently. |
| crates/codegraph-core/src/native_db.rs | Removes PRAGMA mmap_size from read-write connections to avoid Windows mmap/regular-I/O cache incoherence when two SQLite libraries share a WAL file; read-only connections intentionally retain mmap_size. Change is targeted and correct. |

Sequence Diagram

sequenceDiagram
    participant BS3 as better-sqlite3 (ctx.db)
    participant RQ as rusqlite (ctx.nativeDb)
    participant WAL as WAL file

    Note over BS3,WAL: setupPipeline
    RQ->>WAL: initSchema writes (rusqlite frames)
    RQ->>WAL: wal_checkpoint(TRUNCATE) — flush rusqlite frames to main DB
    BS3->>WAL: initSchema writes (bs3 frames)

    Note over BS3,WAL: runPipelineStages — start
    RQ->>WAL: wal_checkpoint(TRUNCATE) — flush before close
    RQ-->>RQ: close()
    BS3->>WAL: pipeline stages (collect, parse, insert, resolve, edges, structure)

    Note over BS3,WAL: runAnalyses — per feature module
    BS3->>WAL: suspendJsDb: wal_checkpoint(TRUNCATE) — flush bs3 frames
    RQ->>WAL: native analysis writes (rusqlite frames)
    RQ->>WAL: resumeJsDb: wal_checkpoint(TRUNCATE) — flush rusqlite frames
    BS3->>WAL: resumes reading (only sees main DB pages)

    Note over BS3,WAL: runPipelineStages — end
    RQ->>WAL: wal_checkpoint(TRUNCATE) — flush before close
    RQ-->>RQ: close()
    BS3->>WAL: finalize (JS paths only)

Reviews (2): Last reviewed commit: "fix: separate checkpoint and close into ..."

…728) Split checkpoint + close operations into independent try/catch blocks at all three sites so close() always runs even if checkpoint throws. Also explicitly close the NativeDatabase in setupPipeline's catch path to prevent a live rusqlite connection from lingering until GC.

carlos-alm · 2026-04-01T02:26:23Z

Both Greptile issues addressed in 0415113:

P1 — close() skipped when checkpoint throws: Separated checkpoint and close into independent try/catch blocks at both close sites (lines ~192 and ~243). Now close() always runs regardless of checkpoint outcome, preventing a live rusqlite connection from lingering until GC.

P2 — setupPipeline catch path doesn't close: Added explicit ctx.nativeDb?.close() (wrapped in its own try/catch) before clearing the reference. Also fixed the misleading message from "init failed" to "setup failed" since the error could come from the checkpoint call rather than init.
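The P1 fix described above, checkpoint and close in independent try/catch blocks, can be sketched as follows. This is a hedged illustration rather than the code from commit 0415113: `NativeDbLike`, its `exec` method, and `checkpointAndClose` are hypothetical names, and the fake database exists only so the behaviour can be observed without SQLite.

```typescript
// Hypothetical minimal shape of the native connection for this sketch.
interface NativeDbLike {
  exec(sql: string): void;
  close(): void;
}

// Checkpoint and close are guarded separately, so close() runs
// even when the checkpoint throws.
function checkpointAndClose(db: NativeDbLike, warn: (message: string) => void): void {
  try {
    db.exec("PRAGMA wal_checkpoint(TRUNCATE)");
  } catch (err) {
    warn(`checkpoint failed: ${String(err)}`);
  }
  try {
    db.close();
  } catch (err) {
    warn(`close failed: ${String(err)}`);
  }
}

// Usage with a fake connection whose checkpoint always fails.
const events: string[] = [];
const failingDb: NativeDbLike = {
  exec(): void {
    throw new Error("checkpoint failed on purpose");
  },
  close(): void {
    events.push("closed");
  },
};
checkpointAndClose(failingDb, (m) => events.push(`warn: ${m}`));
```

With a single shared try block, the thrown checkpoint error would skip `close()` and leave the connection open until garbage collection, which is the problem the comment describes.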

carlos-alm · 2026-04-01T02:26:47Z

@greptileai

Merge branch 'main' into fix/wal-checkpoint-corruption

9a1dfff

carlos-alm merged commit ef15740 into main Apr 1, 2026
18 checks passed

carlos-alm deleted the fix/wal-checkpoint-corruption branch April 1, 2026 03:47

github-actions bot locked and limited conversation to collaborators Apr 1, 2026

carlos-alm merged 3 commits into main from fix/wal-checkpoint-corruption
