perf(native): fix WASM fallback bypass and batch SQL inserts #606
carlos-alm merged 8 commits into main
Add `npm run benchmark` script to make benchmark execution discoverable instead of requiring manual `node --import ./scripts/ts-resolve-loader.js` invocation. Warn users when embeddings predate the last graph rebuild so they know to re-run `codegraph embed` for fresh search results. Impact: 1 functions changed, 8 affected
Fix interface property signatures (dotted names, single-line spans) incorrectly triggering WASM tree creation on native builds across engine.ts, complexity.ts, and cfg.ts. Add statement caching and batch UPDATE optimizations for insert and role classification stages. Native full build: 2001ms vs WASM 3116ms (1.6x faster). Key wins: complexity 4.2x, cfg 3.2x, parse 2.4x faster. Impact: 26 functions changed, 25 affected
Greptile Summary
This PR delivers two independent improvements: a correctness fix that prevents interface/type property signatures from incorrectly triggering the WASM parser fallback on native builds, and a suite of SQL batch-insert optimizations that reduce per-iteration statement compilation and statement count. Together they yield a measured 1.6× end-to-end speedup on a full native build. Key changes:
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant T as insertNodes (transaction)
participant P1 as insertDefinitionsAndExports
participant P23 as insertChildrenAndEdges
participant DB as SQLite DB
T->>P1: call
P1->>DB: batchInsertNodes(files + defs + exports)<br/>INSERT OR IGNORE INTO nodes [chunks of 500]
P1->>DB: batch UPDATE exported=1<br/>WHERE OR conditions [chunks of 500]
P1-->>T: done
T->>P23: call
Note over P23: Pass 1 – collect file→def edges & child rows
P23->>DB: bulkNodeIdsByFile (per file)
P23-->>P23: accumulate edgeRows[file→def] + childRows
P23->>DB: batchInsertNodes(childRows)<br/>INSERT OR IGNORE INTO nodes [chunks of 500]
Note over P23: Pass 2 – re-fetch IDs (now includes children)
P23->>DB: bulkNodeIdsByFile (per file, again)
P23-->>P23: accumulate edgeRows[def→child + parameter_of]
P23->>DB: batchInsertEdges(all edgeRows)<br/>INSERT INTO edges [chunks of 500]
P23-->>T: done
Reviews (3): Last reviewed commit: "fix: correct misleading comments and cac..."
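The chunked `INSERT OR IGNORE` steps in the diagram could look roughly like the sketch below. It is not the PR's `helpers.ts`: the `Db`/`Stmt` interfaces, the four-column `nodes` layout, and the function bodies are assumptions for illustration. Only the names `batchInsertNodes` and `getNodeStmt`, the 500-row chunk size, and the WeakMap-based cache come from this thread.

```typescript
// Minimal structural types standing in for a better-sqlite3-like API (assumption).
interface Stmt { run(...params: unknown[]): unknown; }
interface Db { prepare(sql: string): Stmt; }

const CHUNK = 500; // rows per statement, as shown in the diagram
type NodeRow = [string, string, string, number]; // name, kind, file, line (assumed columns)

// One cache per database handle, keyed by the number of rows in the statement.
const nodeStmtCache = new WeakMap<Db, Map<number, Stmt>>();

function getNodeStmt(db: Db, rowCount: number): Stmt {
  let perDb = nodeStmtCache.get(db);
  if (!perDb) {
    perDb = new Map<number, Stmt>();
    nodeStmtCache.set(db, perDb);
  }
  let stmt = perDb.get(rowCount);
  if (!stmt) {
    // Compile the multi-row INSERT once per distinct chunk size.
    const placeholders = Array.from({ length: rowCount }, () => '(?, ?, ?, ?)').join(', ');
    stmt = db.prepare(
      `INSERT OR IGNORE INTO nodes (name, kind, file, line) VALUES ${placeholders}`,
    );
    perDb.set(rowCount, stmt);
  }
  return stmt;
}

function batchInsertNodes(db: Db, rows: NodeRow[]): void {
  for (let i = 0; i < rows.length; i += CHUNK) {
    const chunk = rows.slice(i, i + CHUNK);
    const vals: unknown[] = [];
    for (const row of chunk) vals.push(row[0], row[1], row[2], row[3]);
    getNodeStmt(db, chunk.length).run(...vals);
  }
}
```

With this shape, a build only compiles one statement for full chunks and at most one more for the final partial chunk, regardless of how many rows are inserted.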
const stmt = HOTSPOT_QUERIES[metric] ?? HOTSPOT_QUERIES['fan-in'];
const rows = stmt?.all(kind, limit);
const hotspots = rows.map((r) => ({
Unnecessary optional chaining produces potentially-undefined rows
`HOTSPOT_QUERIES['fan-in']` is always a concrete `Statement` (it's a required property of the local object literal), so after the `??` fallback `stmt` is non-nullable. The `?.` on `stmt.all()` is therefore superfluous, but depending on how TypeScript resolves the type, it may widen `rows` to `Statement.all(...) | undefined`. The immediately following `rows.map(...)` on line 749 has no null guard, so if TypeScript infers `rows` as `T[] | undefined` this is a latent type error. Either keep the original `!` assertions (which were semantically accurate), or add an explicit null check before the map:
- const stmt = HOTSPOT_QUERIES[metric] ?? HOTSPOT_QUERIES['fan-in'];
- const rows = stmt?.all(kind, limit);
- const hotspots = rows.map((r) => ({
+ const stmt = HOTSPOT_QUERIES[metric] ?? HOTSPOT_QUERIES['fan-in'];
+ const rows = stmt.all(kind, limit) as {
Or, if strict null-checks are not flagging this already, at minimum add a guard:
const hotspots = (rows ?? []).map((r) => ({
Fixed: removed the superfluous `?.` on `stmt.all()`. Since `stmt` is always concrete after the `??` fallback to `HOTSPOT_QUERIES['fan-in']`, the optional chaining was unnecessary and could widen the return type to include `undefined`. Now uses a direct `stmt.all()` call.
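As a small illustration of the typing point discussed in this thread, the sketch below contrasts a direct call on a statement that is guaranteed by a `??` fallback with an optional call on a possibly-missing statement. The `Row`, `Stmt`, `QUERIES`, `hotspotNames`, and `rowsOrUndefined` names are hypothetical stand-ins, not the PR's actual code.

```typescript
interface Row { name: string; fanIn: number; }
interface Stmt { all(kind: string, limit: number): Row[]; }

// Hypothetical stand-in for a prepared statement backed by an in-memory array.
const makeStmt = (rows: Row[]): Stmt => ({
  all: (_kind, limit) => rows.slice(0, limit),
});

type Metric = 'fan-in' | 'fan-out';

// 'fan-in' is a required property, so it is always a concrete Stmt;
// 'fan-out' is optional and may be absent.
const QUERIES: { 'fan-in': Stmt; 'fan-out'?: Stmt } = {
  'fan-in': makeStmt([{ name: 'parse', fanIn: 3 }]),
};

function hotspotNames(metric: Metric, kind: string, limit: number): string[] {
  // After the ?? fallback the type is Stmt, so a direct call is safe.
  const stmt = QUERIES[metric] ?? QUERIES['fan-in'];
  const rows = stmt.all(kind, limit); // Row[]
  return rows.map((r) => r.name);
}

function rowsOrUndefined(stmt: Stmt | undefined, kind: string, limit: number): Row[] | undefined {
  // When the receiver really can be undefined, ?. yields Row[] | undefined
  // and callers need a guard such as (rows ?? []) before mapping.
  return stmt?.all(kind, limit);
}
```

Dropping `?.` in the first function keeps the result type at `Row[]`, so the following `.map(...)` needs no guard.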
for (let i = 0; i < exportKeys.length; i += EXPORT_CHUNK) {
  const end = Math.min(i + EXPORT_CHUNK, exportKeys.length);
  const chunkSize = end - i;
  const conditions = Array.from(
    { length: chunkSize },
    () => '(name = ? AND kind = ? AND file = ? AND line = ?)',
  ).join(' OR ');
  const vals: unknown[] = [];
  for (let j = i; j < end; j++) {
    const k = exportKeys[j] as unknown[];
    vals.push(k[0], k[1], k[2], k[3]);
  }
  db.prepare(`UPDATE nodes SET exported = 1 WHERE ${conditions}`).run(...vals);
Export-marking UPDATE statement is re-prepared on every batch iteration
db.prepare(...) is called inside the loop, so for each chunk a new SQL string is parsed and a new prepared statement is compiled. This is inconsistent with the WeakMap-based caching introduced in helpers.ts (getNodeStmt / getEdgeStmt), which was added specifically to avoid per-chunk recompilation.
For a typical build the number of distinct chunk sizes is small (usually just 1 or 2), so the overhead is modest, but the inconsistency means this UPDATE still pays a preparation cost per batch iteration that the INSERT paths no longer do. Hoisting the statement cache outside the loop (or reusing the same pattern from helpers.ts) would close the gap:
// outside the loop:
const stmtCache = new Map<number, BetterSqlite3.Statement>();
// inside:
let updateStmt = stmtCache.get(chunkSize);
if (!updateStmt) {
updateStmt = db.prepare(`UPDATE nodes SET exported = 1 WHERE ${conditions}`);
stmtCache.set(chunkSize, updateStmt);
}
updateStmt.run(...vals);
Fixed — added a Map<number, BetterSqlite3.Statement> cache (exportStmtCache) outside the loop body, keyed by chunk size. The prepared statement is now reused across iterations with the same chunk size, consistent with the WeakMap-based caching pattern in helpers.ts.
…ity.ts (#606) Impact: 3 functions changed, 2 affected
- Fix misleading "single nodeIdMap pass" comment in insertChildrenAndEdges (actually two passes: one before and one after batchInsertNodes)
- Cache role UPDATE prepared statements per chunk size in classifyNodeRoles, consistent with WeakMap-based caching pattern used in helpers.ts
Impact: 2 functions changed, 4 affected
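The role-update batching mentioned in this commit might be sketched as follows. This is an illustration under stated assumptions, not the body of `classifyNodeRoles`: the `Db`/`Stmt` interfaces, the `applyRoles` name, the `roleById` input shape, and the chunk size of 500 are all hypothetical. Only the `UPDATE nodes SET role = ? WHERE id IN (...)` statement and the per-chunk-size caching come from the PR text.

```typescript
// Minimal structural types standing in for a better-sqlite3-like API (assumption).
interface Stmt { run(...params: unknown[]): unknown; }
interface Db { prepare(sql: string): Stmt; }

const ROLE_CHUNK = 500; // assumed chunk size

function applyRoles(db: Db, roleById: Map<number, string>): void {
  // Group node ids by role so one statement can set a role for many ids.
  const idsByRole = new Map<string, number[]>();
  roleById.forEach((role, id) => {
    const ids = idsByRole.get(role);
    if (ids) ids.push(id);
    else idsByRole.set(role, [id]);
  });

  // Cache prepared statements by the number of ids in the IN (...) list.
  const stmtBySize = new Map<number, Stmt>();
  idsByRole.forEach((ids, role) => {
    for (let i = 0; i < ids.length; i += ROLE_CHUNK) {
      const chunk = ids.slice(i, i + ROLE_CHUNK);
      let stmt = stmtBySize.get(chunk.length);
      if (!stmt) {
        const marks = chunk.map(() => '?').join(', ');
        stmt = db.prepare(`UPDATE nodes SET role = ? WHERE id IN (${marks})`);
        stmtBySize.set(chunk.length, stmt);
      }
      stmt.run(role, ...chunk);
    }
  });
}
```

Because the SQL text depends only on the number of placeholders, statements can be shared across roles: two roles with the same chunk length reuse one compiled statement.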
Addressed the remaining items from the second Greptile review:
Impact: 3 functions changed, 8 affected
Summary
- Interface property signatures (dotted names such as `Interface.prop`, single-line spans) were incorrectly triggering WASM tree creation on native builds. Fixed in `engine.ts` (`ensureWasmTreesIfNeeded`, `setupVisitors`), `complexity.ts` (`initWasmParsersIfNeeded`), and `cfg.ts` (`initCfgParsers`) by adding `!d.name.includes('.')` and `endLine > line` filters.
- Statement caching in `helpers.ts`, chunk size 200→500, batched export marking in `insert-nodes.ts`, batch `UPDATE nodes SET role = ? WHERE id IN (...)` grouped by role in `structure.ts`.
- Replaced `!` non-null assertions with safe alternatives in touched files.
Benchmark results (native vs WASM, full build on codegraph itself)
Test plan
`symbols.dataflow` for supported languages)
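The two filters named in the Summary (`!d.name.includes('.')` and `endLine > line`) could be combined into a predicate along the lines of the sketch below. The `Definition` shape and the `hasAnalyzableBody` and `needsWasmTree` helpers are hypothetical. The real checks in `engine.ts`, `complexity.ts`, and `cfg.ts` likely test additional conditions that are not shown in this thread.

```typescript
// Assumed shape of a parsed definition; only name, line, and endLine matter here.
interface Definition {
  name: string;
  line: number;
  endLine?: number;
}

// True when a definition can have a body worth analysing:
// not a dotted property signature, and spanning more than one line.
function hasAnalyzableBody(d: Definition): boolean {
  const isDotted = d.name.includes('.');
  const isMultiLine = d.endLine !== undefined && d.endLine > d.line;
  return !isDotted && isMultiLine;
}

// A file needs a WASM tree only if some analysable definition
// is still missing its native result (the predicate is supplied by the caller).
function needsWasmTree(
  defs: Definition[],
  isMissingNativeResult: (d: Definition) => boolean,
): boolean {
  return defs.some((d) => hasAnalyzableBody(d) && isMissingNativeResult(d));
}
```

Under this sketch, a file containing only interface property signatures never triggers the fallback, because every such definition is filtered out before the missing-result check runs.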