fix: close edge gap in watcher single-file rebuild (#533)#542
carlos-alm merged 9 commits into main
Conversation
`rebuildFile` (used by watch mode) deleted all edges for a changed file but only rebuilt the file's own outgoing edges — incoming edges from other files were lost. This produced ~3.3% fewer edges than a full build.

Root causes fixed:

- No reverse-dep cascade: files importing the changed file never had their outgoing edges rebuilt after the changed file's node IDs changed. Added `findReverseDeps` + two-pass rebuild (direct edges first, then barrel resolution) to match the build pipeline's behavior.
- Missing child nodes: `insertFileNodes` skipped `def.children` (parameters, properties), losing `contains`/`parameter_of` edges.
- Missing containment edges: file→symbol and dir→file contains edges were never created by the watcher path.
- Missing ancillary table cleanup: `function_complexity`, `cfg_blocks`, etc. had FK references to old nodes, causing `SQLITE_CONSTRAINT_FOREIGNKEY` on node deletion. Added `purgeAncillaryData` before node deletion.
- No barrel resolution: import edges through re-export chains (barrel files) were not resolved, losing transitive import edges.
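The reverse-dep lookup can be illustrated with a minimal sketch, assuming an in-memory import graph (file → list of files it imports); the real `findReverseDeps` queries the SQLite edges table instead:

```javascript
// Sketch only: invert an import graph to find files that import changedFile.
// The production version resolves this via import edges in SQLite.
function findReverseDeps(importGraph, changedFile) {
  const deps = [];
  for (const [file, targets] of importGraph) {
    if (file !== changedFile && targets.includes(changedFile)) deps.push(file);
  }
  return deps;
}
```

Each file returned here then has its outgoing edges rebuilt in the two-pass cascade described above.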
Claude finished @carlos-alm's task — PR Review Analysis
Root Cause Analysis ✅

Problem Being Solved: Real issue — incremental builds (watcher mode) produced ~3.3% fewer edges than full builds. The bug was specifically in the `rebuildFile` path.

Why This Approach: The fix correctly addresses each root cause systematically with proper cascade handling, ancillary table cleanup, and barrel resolution.

Risk Assessment: ✅ LOW RISK - This is clearly fixing fundamental correctness bugs, not hiding symptoms.

Backlog Compliance ✅
Critical Concerns

None identified. This is a high-quality bug fix that addresses fundamental accuracy issues in the incremental build pipeline.

Final Recommendation
This PR exemplifies the type of fundamental correctness improvement that strengthens codegraph's reliability for AI agents and users. The two-pass rebuild algorithm with reverse-dep cascade is exactly the kind of systematic fix needed for a dependency graph tool. Recommendation: Merge immediately.
Greptile Summary

This PR fixes a longstanding ~3.3% edge gap between watcher (incremental) builds and full builds by overhauling `rebuildFile`.
Confidence Score: 4/5
Sequence Diagram

```mermaid
sequenceDiagram
    participant W as watcher.js
    participant R as rebuildFile()
    participant DB as SQLite DB
    participant P as parseFileIncremental()
    W->>R: rebuildFile(db, rootDir, filePath, stmts)
    R->>DB: findReverseDeps(relPath)
    DB-->>R: [depFile1, depFile2, ...]
    R->>DB: purgeAncillaryData(relPath)
    R->>DB: deleteEdgesForFile(relPath)
    R->>DB: deleteNodes(relPath)
    R->>P: parseFileIncremental(filePath)
    P-->>R: symbols
    R->>DB: insertFileNodes(defs + children + exports)
    R->>DB: buildContainmentEdges(file→def, def→child)
    R->>DB: rebuildDirContainment(dir→file)
    R->>DB: buildImportEdges + resolveBarrelImportEdges
    R->>DB: buildCallEdges
    loop for each depFile in reverseDeps
        R->>P: parseReverseDep(depFile)
        P-->>R: symbols_
        R->>DB: deleteOutgoingEdges(depFile)
    end
    loop Pass 1 — direct edges, no barrel
        R->>DB: rebuildReverseDepEdges(depFile, skipBarrel=true)
    end
    loop Pass 2 — barrel import edges
        R->>DB: resolveBarrelImportEdges per import
    end
    R-->>W: result object with edgesAdded
```
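The flow above can be condensed into a synchronous skeleton. This is an illustration only — helper names are taken from the diagram, the real versions are async and query SQLite — but it shows the ordering constraints: ancillary cleanup before node deletion, and each reverse dep parsed before its edges are deleted.

```javascript
// Illustrative skeleton of rebuildFile()'s ordering. All helpers are injected
// stubs here; real implementations live in the watcher module.
function rebuildFileSketch(relPath, h) {
  const reverseDeps = h.findReverseDeps(relPath);
  h.purgeAncillaryData(relPath); // FK cleanup must precede node deletion
  h.deleteEdgesForFile(relPath);
  h.deleteNodes(relPath);
  const symbols = h.parseFileIncremental(relPath);
  h.insertFileNodes(symbols);
  h.buildContainmentEdges(relPath);
  h.buildImportEdges(relPath);
  h.buildCallEdges(relPath);

  // Parse each reverse dep before deleting its edges, so a failed parse
  // never orphans that file's outgoing edges.
  const parsed = [];
  for (const dep of reverseDeps) {
    const depSymbols = h.parseReverseDep(dep);
    if (!depSymbols) continue;
    h.deleteOutgoingEdges(dep);
    parsed.push(dep);
  }
  for (const dep of parsed) h.rebuildReverseDepEdges(dep, { skipBarrel: true }); // pass 1
  for (const dep of parsed) h.resolveBarrelImportEdges(dep); // pass 2
  return { rebuilt: relPath, reverseDeps: parsed };
}
```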
```js
// ── Ancillary table cleanup ────────────────────────────────────────────
function purgeAncillaryData(db, relPath) {
  const tryExec = (sql, ...args) => {
    try {
      db.prepare(sql).run(...args);
    } catch {
      /* table may not exist */
    }
  };
  tryExec(
    'DELETE FROM function_complexity WHERE node_id IN (SELECT id FROM nodes WHERE file = ?)',
    relPath,
  );
  tryExec(
    'DELETE FROM node_metrics WHERE node_id IN (SELECT id FROM nodes WHERE file = ?)',
    relPath,
  );
  tryExec(
    'DELETE FROM cfg_edges WHERE function_node_id IN (SELECT id FROM nodes WHERE file = ?)',
    relPath,
  );
  tryExec(
    'DELETE FROM cfg_blocks WHERE function_node_id IN (SELECT id FROM nodes WHERE file = ?)',
    relPath,
  );
  tryExec(
    'DELETE FROM dataflow WHERE source_id IN (SELECT id FROM nodes WHERE file = ?) OR target_id IN (SELECT id FROM nodes WHERE file = ?)',
    relPath,
    relPath,
  );
  tryExec('DELETE FROM ast_nodes WHERE file = ?', relPath);
}
```
purgeAncillaryData silently swallows all exceptions
The bare catch {} in tryExec is commented as /* table may not exist */, but it catches every exception — not just SQLite's "no such table" error. If any DELETE fails for a genuine reason (e.g., a locked database, a schema inconsistency, or a bug in the WHERE clause), the error is silently discarded. The subsequent stmts.deleteNodes.run(relPath) on line 382 would then fail with SQLITE_CONSTRAINT_FOREIGNKEY because the ancillary rows (which still reference those nodes' IDs) were never cleaned up. This is especially risky because the entire function exists specifically to prevent FK violations.
A more targeted guard would only ignore "no such table" errors:
```js
const tryExec = (sql, ...args) => {
  try {
    db.prepare(sql).run(...args);
  } catch (err) {
    if (!err?.message?.includes('no such table')) throw err;
  }
};
```
Fixed — `tryExec` now only catches errors containing 'no such table' and re-throws all other exceptions. This prevents masking genuine write failures while still handling the optional-table case.
```js
function isBarrelFile(db, relPath) {
  const reexportCount = db
    .prepare(
      `SELECT COUNT(*) as c FROM edges e
       JOIN nodes n ON e.source_id = n.id
       WHERE e.kind = 'reexports' AND n.file = ? AND n.kind = 'file'`,
    )
    .get(relPath)?.c;
  return (reexportCount || 0) > 0;
}

function resolveBarrelTarget(db, barrelPath, symbolName, visited = new Set()) {
  if (visited.has(barrelPath)) return null;
  visited.add(barrelPath);

  // Find re-export targets from this barrel
  const reexportTargets = db
    .prepare(
      `SELECT DISTINCT n2.file FROM edges e
       JOIN nodes n1 ON e.source_id = n1.id
       JOIN nodes n2 ON e.target_id = n2.id
       WHERE e.kind = 'reexports' AND n1.file = ? AND n1.kind = 'file'`,
    )
    .all(barrelPath);

  for (const { file: targetFile } of reexportTargets) {
    // Check if the symbol is defined in this target file
    const hasDef = db
      .prepare(
        `SELECT 1 FROM nodes WHERE name = ? AND file = ? AND kind != 'file' AND kind != 'directory' LIMIT 1`,
      )
      .get(symbolName, targetFile);
    if (hasDef) return targetFile;

    // Recurse through barrel chains
    if (isBarrelFile(db, targetFile)) {
      const deeper = resolveBarrelTarget(db, targetFile, symbolName, visited);
      if (deeper) return deeper;
    }
  }
  return null;
}
```
Prepared statements allocated inside hot-loop functions
Both isBarrelFile and resolveBarrelTarget call db.prepare(...) on every invocation. These functions are called in a tight loop — once per imported name, for every import from a barrel file, and resolveBarrelTarget recurses through barrel chains. Allocating a new prepared statement on each call negates the performance benefit of prepared statements (the whole point of which is to parse the SQL once and reuse the plan).
The queries should be prepared once at module scope (or lazily on first call, cached in a closure) and reused:

```js
// At module scope
let _isBarrelStmt = null;
let _reexportTargetsStmt = null;
let _hasDefStmt = null;

function getIsBarrelStmt(db) {
  return (_isBarrelStmt ??= db.prepare(
    `SELECT COUNT(*) as c FROM edges e
     JOIN nodes n ON e.source_id = n.id
     WHERE e.kind = 'reexports' AND n.file = ? AND n.kind = 'file'`,
  ));
}
```

For a project with dozens of barrel imports this is a minor overhead, but for large repos with many barrel files the cost can add up across the full reverse-dep cascade.
Fixed — barrel resolution prepared statements (isBarrelStmt, reexportTargetsStmt, hasDefStmt) are now lazily cached at module scope via getBarrelStmts(db) and reused across all invocations for the same database instance.
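The per-database caching described above can be sketched with a `WeakMap` keyed by the db handle. This is an illustration, not the PR's exact implementation: a `WeakMap` avoids the stale-statement problem that a bare module-level variable would have when more than one database is open, and lets closed db handles be garbage-collected along with their statements.

```javascript
// Sketch: lazily cache prepared statements per database instance.
const stmtCache = new WeakMap(); // db handle -> Map(sql -> prepared statement)

function getStmt(db, sql) {
  let bySql = stmtCache.get(db);
  if (!bySql) stmtCache.set(db, (bySql = new Map()));
  let stmt = bySql.get(sql);
  if (!stmt) bySql.set(sql, (stmt = db.prepare(sql)));
  return stmt;
}
```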
- purgeAncillaryData: only catch "no such table" errors instead of swallowing all exceptions (P1 from Greptile)
- Cache barrel resolution prepared statements to avoid re-preparing inside hot loops (P2 from Greptile)
- Fix stale @param _db JSDoc tag
- Prefix unused db param with underscore in rebuildDirContainment
Addressed all Greptile feedback:
All 6 integration tests pass, lint is green.
```js
for (const depRelPath of reverseDeps) {
  deleteOutgoingEdges(db, depRelPath);
  const symbols_ = await parseReverseDep(rootDir, depRelPath, engineOpts, cache);
  if (symbols_) depSymbols.set(depRelPath, symbols_);
}
```
Reverse-dep edges permanently deleted when parse fails
deleteOutgoingEdges is called unconditionally for every reverse-dep file before attempting to parse it. If parseReverseDep returns null (file unreadable, parse error, or file already deleted), no entry is placed in depSymbols. Pass 1 and Pass 2 then skip the file entirely, meaning its outgoing edges were wiped and are never rebuilt — leaving the graph in a permanently inconsistent state until the next full build.
The fix is to parse first and only delete edges for files that successfully parsed:

```js
// Parse BEFORE deleting so a failed parse doesn't orphan the file
const depSymbols = new Map();
for (const depRelPath of reverseDeps) {
  const symbols_ = await parseReverseDep(rootDir, depRelPath, engineOpts, cache);
  if (symbols_) depSymbols.set(depRelPath, symbols_);
}
// Now it's safe to delete — every file in depSymbols will be rebuilt
for (const [depRelPath] of depSymbols) {
  deleteOutgoingEdges(db, depRelPath);
}
```

This preserves the invariant that edges are only deleted when they will be immediately rebuilt.
Fixed in bdf3f77 — moved deleteOutgoingEdges after parseReverseDep succeeds. If parsing returns null, edges are now preserved instead of permanently lost. Also fixed a TypeError: typeMap.get is not a function in the same commit: the native engine returns typeMap as a plain array, which was not being coerced to a Map for JS files after the TS-only backfill restriction.
The native engine returns typeMap as a plain array, not a Map. After the TS-only backfill restriction (8e78e62), JS files no longer get their native typeMap converted to a Map, causing a TypeError on .get() during incremental reverse-dep edge rebuilds. Also moves deleteOutgoingEdges after parseReverseDep succeeds, preventing permanent edge loss when a reverse-dep file fails to parse.
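A coercion along these lines handles both shapes; this is a sketch, and the assumption that the array form holds `[name, typeInfo]` entries is mine, not confirmed by the PR:

```javascript
// Sketch: normalize typeMap regardless of engine return shape.
// Assumes the array form is a list of [name, typeInfo] entries (illustrative).
function toTypeMap(tm) {
  if (tm instanceof Map) return tm;          // already a Map — use as-is
  if (Array.isArray(tm)) return new Map(tm); // plain array of entries
  return new Map(Object.entries(tm ?? {}));  // plain object or null/undefined
}
```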
Addressed remaining Greptile summary observations:
All 6 integration tests pass. Lint clean.
Summary
Fixes #533 — incremental builds (watcher path) produced ~3.3% fewer edges than full builds.
The bug was in `rebuildFile` (used by watch mode), not the build pipeline. When a file changed, the watcher deleted all edges (incoming + outgoing) but only rebuilt the changed file's own outgoing edges. Incoming edges from other files were permanently lost.

Root causes fixed:
- `findReverseDeps` + two-pass rebuild (direct edges first, then barrel resolution)
- `insertFileNodes` skipped `def.children` (parameters, properties), losing `contains`/`parameter_of` edges
- file→symbol and dir→file contains edges were never created
- `function_complexity`, `cfg_blocks`, etc. caused `SQLITE_CONSTRAINT_FOREIGNKEY` on node deletion

Test plan
- `watcher-rebuild.test.js` — exercises `rebuildFile` directly against the deep-deps fixture, verifies identical nodes/edges vs full build
- `incr-edge-gap.test.js` — build pipeline parity test (leaf + mid-level file touch)
- `deep-deps-project` fixture — 9 files with barrel re-exports and multi-level deps
- `incremental-parity.test.js` passes (barrel-project fixture)