perf(native): use single rusqlite connection for entire build pipeline#897
perf(native): use single rusqlite connection for entire build pipeline#897carlos-alm merged 6 commits intomainfrom
Conversation
The v3.9.2 release PR (#891) bumped package.json and Cargo.toml to 3.9.2 while optionalDependencies still pointed to 3.9.1. This caused preflight tests to fail because the installed native binary (3.9.1) mismatched the declared version (3.9.2). Reverting to 3.9.1 so the publish workflow can bump versions atomically after building native binaries at the correct version.
When the native addon is available, the build pipeline now uses only rusqlite for all stages via a NativeDbProxy that implements the BetterSqlite3Database interface. This eliminates the dual-connection WAL corruption problem and removes the open/close/reopen dance that forced most stages to fall back to JS. Key changes: - NativeDbProxy wraps NativeDatabase.queryAll/queryGet/exec to satisfy the BetterSqlite3Database interface transparently - setupPipeline() opens only rusqlite when native is available - runPipelineStages() skips WAL checkpoint dance in native-first mode - tryNativeInsert() skips WAL guards when single connection is active - Fallback to better-sqlite3 when native is unavailable (unchanged) - CODEGRAPH_FORCE_JS_PIPELINE=1 and --engine wasm bypass native-first Benchmarks (native v3.9.1, 677 files): - Full build: 6,668ms → 5,844ms (12% faster) - 1-file rebuild: 1,375ms → 960ms (30% faster) - No-op rebuild: 17ms (unchanged) - CFG phase: 466ms → 6.7ms (70x faster) - Finalize: 156ms → 25ms (6x faster) - DB size: 27.2MB → 23.3MB (14% smaller)
|
Claude finished @carlos-alm's task —— View job PR Review AnalysisAnalyzing this performance optimization PR through the 5-phase review process... Todo List
Phase 1: Root Cause Analysis ✅Problem: The build pipeline suffered from dual-connection WAL corruption when both better-sqlite3 and rusqlite attempted to write to the same SQLite database. This forced most pipeline stages to fall back to JavaScript, eliminating performance gains from the native Rust code. Approach: Introduce a Justification: This is addressing a real architectural problem, not masking symptoms. The approach is sound and benchmarks show significant improvements across multiple metrics. Phase 2: Configuration Change Detection ✅No concerning configuration changes detected:
Phase 3: Test Integrity ✅No red flags detected:
Phase 4: Code Quality ✅Strong implementation quality: Architecture: The Error handling: Comprehensive fallback logic at multiple levels:
Security: No injection risks detected - no dynamic SQL construction, all operations delegate to existing rusqlite methods. Performance: The proxy design is minimal overhead - direct delegation to native methods with no complex translation logic. Phase 5: Backlog Compliance Assessment ✅Root Cause AnalysisProblem Being Solved: Dual-connection WAL corruption between better-sqlite3 and rusqlite causing performance degradation and forced JS fallbacks Backlog Compliance
Final Recommendation
Outstanding work. This is exactly the type of performance optimization that makes codegraph more responsive for AI agents while maintaining reliability. |
Codegraph Impact Analysis23 functions changed → 31 callers affected across 16 files
|
Greptile SummaryThis PR introduces
Confidence Score: 4/5Safe to merge after addressing the nativeFirstProxy reset in the orchestrator catch block. One P1 issue remains: the catch block in buildGraph() doesn't reset nativeFirstProxy when the orchestrator fails after writing data, which can cause a silent empty return. The P2 sanitize() gap is low-risk in practice since pipeline SQL params are strings/numbers. All other changes are clean and previous thread concerns are resolved. src/domain/graph/builder/pipeline.ts — the catch block around tryNativeOrchestrator needs a nativeFirstProxy reset.
|
| Filename | Overview |
|---|---|
| src/domain/graph/builder/native-db-proxy.ts | New class wrapping NativeDatabase to satisfy BetterSqlite3Database; sanitize() only coerces undefined→null, leaving other non-primitive types unguarded at runtime. |
| src/domain/graph/builder/pipeline.ts | Core orchestrator updated for native-first mode; catch block after tryNativeOrchestrator does not reset nativeFirstProxy, risking a silent empty return when the orchestrator fails after committing data. |
| src/domain/graph/builder/context.ts | Adds nativeFirstProxy boolean flag to PipelineContext; straightforward and well-documented addition. |
| src/domain/graph/builder/stages/insert-nodes.ts | WAL checkpoint logic correctly split between native-first (no checkpoint needed) and dual-connection paths; logic is clean. |
| src/db/connection.ts | acquireAdvisoryLock and releaseAdvisoryLock exported; no logic changes, straightforward visibility promotion. |
| src/db/index.ts | Re-exports the two newly-exported lock helpers; trivial change. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[buildGraph] --> B[setupPipeline]
B --> C{native available & FORCE_JS != 1?}
C -- Yes --> D[acquireAdvisoryLock / openReadWrite / initSchema]
D --> E[NativeDbProxy wraps NativeDatabase]
E --> F[ctx.db = proxy / nativeFirstProxy = true]
C -- No --> G[openDb better-sqlite3 / initSchema]
G --> H[nativeFirstProxy = false]
F --> I[tryNativeOrchestrator]
H --> I
I --> J{shouldSkip?}
J -- Yes --> K[runPipelineStages]
J -- No --> L[nativeDb.buildGraph]
L --> M{result.earlyExit?}
M -- Yes --> N[closeDbPair → return]
M -- No --> O[setBuildMeta / runPostNative…]
O --> P{throws?}
P -- Yes --> Q[⚠️ catch: nativeFirstProxy still true]
Q --> K
P -- No --> R[closeDbPair → return BuildResult]
K --> S{nativeFirstProxy?}
S -- Yes --> T[Native-first stages: collect→detect→parse→insert→resolve→edges→structure→analyses→finalize]
S -- No --> U[Legacy dual-connection stages with WAL dance]
Comments Outside Diff (1)
-
src/domain/graph/builder/pipeline.ts, line 794-801 (link)nativeFirstProxynot reset when orchestrator fails after writing dataWhen
nativeFirstProxy=trueandtryNativeOrchestratorthrows afternativeDb.buildGraph()has already committed its writes (e.g. duringsetBuildMeta), the catch block falls through torunPipelineStageswithctx.nativeFirstProxystilltrue.detectChangesthen sees all file hashes as current (the orchestrator just updated them), setsctx.earlyExit=true, andbuildGraph()returnsundefined— silently discarding the finished orchestrator build and leaving the DB missing its updatedbuild_meta.The catch block should reset
ctx.nativeFirstProxytofalse, close the existing native connection, and reopen a fresh better-sqlite3 handle before falling through:} catch (err) { warn(`Native build orchestrator failed, falling back to JS pipeline: ${toErrorMessage(err)}`); if (ctx.nativeFirstProxy) { ctx.nativeFirstProxy = false; try { ctx.nativeDb?.close(); } catch { /* ignore */ } ctx.nativeDb = undefined; try { ctx.db = openDb(ctx.dbPath); initSchema(ctx.db); } catch (reopenErr) { warn(`DB reopen failed after orchestrator failure: ${toErrorMessage(reopenErr)}`); throw reopenErr; } } // Fall through to JS pipeline }
Reviews (2): Last reviewed commit: "fix: sanitize undefined params and retri..." | Re-trigger Greptile
| run(...params: unknown[]): { changes: number; lastInsertRowid: number | bigint } { | ||
| ndb.queryAll(sql, params as Array<string | number | null>); | ||
| return RUN_STUB; | ||
| }, |
There was a problem hiding this comment.
run() stub silently drops changes and lastInsertRowid
RUN_STUB always returns { changes: 0, lastInsertRowid: 0 }, so any caller that inspects these fields — e.g. if (stmt.run(...).changes === 0) throw … — will silently get wrong data. No pipeline stage currently consumes these values, but the proxy advertises full BetterSqlite3Database compatibility, which could surprise future contributors.
Consider delegating to a dedicated exec-style method on NativeDatabase (if one exists or is added) that returns actual rowcount/rowid, or at minimum document the known limitation at the call site.
| run(...params: unknown[]): { changes: number; lastInsertRowid: number | bigint } { | |
| ndb.queryAll(sql, params as Array<string | number | null>); | |
| return RUN_STUB; | |
| }, | |
| run(...params: unknown[]): { changes: number; lastInsertRowid: number | bigint } { | |
| // NOTE: changes and lastInsertRowid are not available via queryAll — | |
| // callers that rely on these values must use the native fast-paths directly. | |
| ndb.queryAll(sql, params as Array<string | number | null>); | |
| return RUN_STUB; | |
| }, |
There was a problem hiding this comment.
Added a NOTE comment on the run() stub clarifying that changes and lastInsertRowid are not available via queryAll and callers relying on these values must use native fast-paths directly. This documents the known limitation at the call site as suggested.
There was a problem hiding this comment.
Follow-up: the run() stub was actually causing a real bug — undefined params from callers like block.label threw serde_json::Value errors in napi-rs, and the zero-valued lastInsertRowid broke CFG block-edge mapping. Fixed in de237e4: added sanitize() to coerce undefined to null, and replaced the static stub with a real SELECT last_insert_rowid() query.
| transaction<F extends (...args: any[]) => any>( | ||
| fn: F, | ||
| ): (...args: F extends (...a: infer A) => unknown ? A : never) => ReturnType<F> { | ||
| const ndb = this.#ndb; | ||
| return ((...args: unknown[]) => { | ||
| ndb.exec('BEGIN'); | ||
| try { | ||
| const result = fn(...args); | ||
| ndb.exec('COMMIT'); | ||
| return result; | ||
| } catch (e) { | ||
| try { | ||
| ndb.exec('ROLLBACK'); | ||
| } catch { | ||
| // Ignore rollback errors — the original error is more important | ||
| } | ||
| throw e; | ||
| } | ||
| }) as any; | ||
| } |
There was a problem hiding this comment.
transaction() doesn't support nested transactions
better-sqlite3's .transaction() automatically promotes to a savepoint when called inside an existing transaction, so nesting is safe. This implementation always issues BEGIN, which SQLite will reject with "cannot start a transaction within a transaction" if a transaction is already active.
The current pipeline stages (detectChanges, insertNodes, buildEdges, buildStructure) each call db.transaction(...) at the top level and not recursively, so this isn't a bug today. But since the proxy claims the same interface, a future stage that wraps an existing helper in a transaction will break in native-first mode without an obvious error message. Consider adding a comment documenting this limitation.
| transaction<F extends (...args: any[]) => any>( | |
| fn: F, | |
| ): (...args: F extends (...a: infer A) => unknown ? A : never) => ReturnType<F> { | |
| const ndb = this.#ndb; | |
| return ((...args: unknown[]) => { | |
| ndb.exec('BEGIN'); | |
| try { | |
| const result = fn(...args); | |
| ndb.exec('COMMIT'); | |
| return result; | |
| } catch (e) { | |
| try { | |
| ndb.exec('ROLLBACK'); | |
| } catch { | |
| // Ignore rollback errors — the original error is more important | |
| } | |
| throw e; | |
| } | |
| }) as any; | |
| } | |
| transaction<F extends (...args: any[]) => any>( | |
| fn: F, | |
| ): (...args: F extends (...a: infer A) => unknown ? A : never) => ReturnType<F> { | |
| const ndb = this.#ndb; | |
| return ((...args: unknown[]) => { | |
| // NOTE: nested transactions (savepoints) are not supported — ensure callers | |
| // do not invoke a transaction() wrapper from within an existing transaction. | |
| ndb.exec('BEGIN'); | |
| try { | |
| const result = fn(...args); | |
| ndb.exec('COMMIT'); | |
| return result; | |
| } catch (e) { | |
| try { | |
| ndb.exec('ROLLBACK'); | |
| } catch { | |
| // Ignore rollback errors — the original error is more important | |
| } | |
| throw e; | |
| } | |
| }) as any; | |
| } |
There was a problem hiding this comment.
Added a NOTE comment on the transaction() method documenting that nested transactions (savepoints) are not supported and callers must not invoke a transaction() wrapper from within an existing transaction. This makes the behavioral divergence from better-sqlite3 explicit as suggested.
…DbProxy (#897) better-sqlite3 silently coerces `undefined` to NULL, but napi-rs/serde cannot represent `undefined` — causing `buildCFGData` to throw "undefined cannot be represented as a serde_json::Value" and silently produce zero CFG blocks. Also replace the static RUN_STUB with a real `last_insert_rowid()` query so callers like CFG block-edge mapping get correct rowid values.
Summary
NativeDbProxythat implements theBetterSqlite3DatabaseinterfaceCODEGRAPH_FORCE_JS_PIPELINE=1and--engine wasmbypass native-first modeBenchmarks (native v3.9.1, 677 files)
How it works
NativeDbProxywrapsNativeDatabase.queryAll/queryGet/exec/pragmato satisfy theBetterSqlite3Databaseinterface. When native is detected insetupPipeline(), the proxy replaces the real better-sqlite3 connection asctx.db. All stage code works unchanged — they callctx.db.prepare(sql).all(...)and the proxy routes it through rusqlite.Test plan
CODEGRAPH_FORCE_JS_PIPELINE=1produces identical results--engine wasmproduces identical results