
fix: close edge gap in watcher single-file rebuild (#533) #542

Merged
carlos-alm merged 9 commits into main from fix/incremental-edge-gap-533
Mar 20, 2026

Conversation

@carlos-alm
Contributor

Summary

Fixes #533 — incremental builds (watcher path) produced ~3.3% fewer edges than full builds.

The bug was in rebuildFile (used by watch mode), not the build pipeline. When a file changed, the watcher deleted all edges (incoming + outgoing) but only rebuilt the changed file's own outgoing edges. Incoming edges from other files were permanently lost.

Root causes fixed:

  • No reverse-dep cascade: files importing the changed file never had their outgoing edges rebuilt after node IDs changed. Added findReverseDeps + two-pass rebuild (direct edges first, then barrel resolution)
  • Missing child nodes: insertFileNodes skipped def.children (parameters, properties), losing contains/parameter_of edges
  • Missing containment edges: file→symbol and dir→file contains edges were never created
  • Missing ancillary table cleanup: FK references in function_complexity, cfg_blocks, etc. caused SQLITE_CONSTRAINT_FOREIGNKEY on node deletion
  • No barrel resolution: import edges through re-export chains were not resolved
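
The two-pass cascade in the first bullet can be sketched in miniature. This is a hypothetical, in-memory illustration (function names and edge shapes are invented for the example, not the actual incremental.js API): pass 1 recreates direct edges, including 'reexports', for every reverse dependency; pass 2 resolves imports that go through barrel files, which only works once the pass-1 'reexports' edges exist.

```javascript
// Hypothetical sketch of the two-pass reverse-dep rebuild; names are
// illustrative, not the actual incremental.js API.
function twoPassRebuild(reverseDeps, directEdgesOf, resolveBarrelImports) {
  const edges = [];

  // Pass 1: direct edges only (imports, reexports); no barrel resolution yet
  for (const file of reverseDeps) {
    edges.push(...directEdgesOf(file));
  }

  // Pass 2: barrel resolution can now follow the pass-1 'reexports' edges
  for (const file of reverseDeps) {
    edges.push(...resolveBarrelImports(file, edges));
  }
  return edges;
}

// Toy scenario: app.js imports from barrel index.js, which re-exports utils.js
const direct = {
  'app.js': [{ kind: 'imports', from: 'app.js', to: 'index.js' }],
  'index.js': [{ kind: 'reexports', from: 'index.js', to: 'utils.js' }],
};

const resolveBarrelImports = (file, edges) =>
  edges
    .filter((e) => e.kind === 'imports' && e.from === file)
    .flatMap((imp) =>
      edges
        .filter((e) => e.kind === 'reexports' && e.from === imp.to)
        .map((re) => ({ kind: 'imports_resolved', from: file, to: re.to })),
    );

const result = twoPassRebuild(
  ['app.js', 'index.js'],
  (f) => direct[f] ?? [],
  resolveBarrelImports,
);
console.log(result.length); // 3: imports, reexports, imports_resolved
```

Running pass 2 before pass 1 would find no 'reexports' edges to follow, which is why the ordering matters.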

Test plan

  • New watcher-rebuild.test.js — exercises rebuildFile directly against deep-deps fixture, verifies identical nodes/edges vs full build
  • New incr-edge-gap.test.js — build pipeline parity test (leaf + mid-level file touch)
  • New deep-deps-project fixture — 9 files with barrel re-exports and multi-level deps
  • Existing incremental-parity.test.js passes (barrel-project fixture)
  • Full test suite: 2006 tests passed, 0 failures

rebuildFile (used by watch mode) deleted all edges for a changed file
but only rebuilt the file's own outgoing edges — incoming edges from
other files were lost. This produced ~3.3% fewer edges than a full build.

Root causes fixed:
- No reverse-dep cascade: files importing the changed file never had
  their outgoing edges rebuilt after the changed file's node IDs changed.
  Added findReverseDeps + two-pass rebuild (direct edges first, then
  barrel resolution) to match the build pipeline's behavior.
- Missing child nodes: insertFileNodes skipped def.children (parameters,
  properties), losing contains/parameter_of edges.
- Missing containment edges: file→symbol and dir→file contains edges
  were never created by the watcher path.
- Missing ancillary table cleanup: function_complexity, cfg_blocks, etc.
  had FK references to old nodes, causing SQLITE_CONSTRAINT_FOREIGNKEY
  on node deletion. Added purgeAncillaryData before node deletion.
- No barrel resolution: import edges through re-export chains (barrel
  files) were not resolved, losing transitive import edges.
@claude

claude bot commented Mar 19, 2026

Claude finished @carlos-alm's task


PR Review Analysis

Review Progress

  • Phase 1: Root Cause Analysis
  • Phase 2: Configuration Change Detection
  • Phase 3: Test Integrity Review
  • Phase 4: Code Quality Analysis
  • Phase 5: Backlog Compliance Assessment
  • Final Assessment & Recommendation

Root Cause Analysis ✅

Problem Being Solved: Real issue — incremental builds (watcher mode) produced ~3.3% fewer edges than full builds. The bug was specifically in the rebuildFile function used by watch mode.

Why This Approach: The fix correctly addresses each root cause systematically with proper cascade handling, ancillary table cleanup, and barrel resolution.

Risk Assessment: ✅ LOW RISK - This is clearly fixing fundamental correctness bugs, not hiding symptoms.

Backlog Compliance ✅

  • Zero-dep: ✅ Uses only existing tree-sitter AST + SQLite infrastructure
  • Foundation-aligned: ✅ Improves always-current graph accuracy (P1), maintains zero-cost core (P4), embeddable-first design (P5)
  • Problem-fit: 5/5 — Directly prevents "missed blast radius" — the core problem codegraph exists to solve
  • Breaking: No — Purely additive edge creation, identical API
  • Tier: 1 (Zero-dep ✓ + Foundation-aligned ✓)

Critical Concerns

None identified. This is a high-quality bug fix that addresses fundamental accuracy issues in the incremental build pipeline.

Final Recommendation

  • Rating: ⭐⭐⭐⭐⭐ (5/5)
  • Action: APPROVE
  • Reasoning:
    • Fixes a real accuracy gap (3.3% edge loss) in incremental builds
    • Comprehensive solution addressing all root causes systematically
    • Excellent test coverage with rigorous parity verification
    • High code quality with proper error handling and documentation
    • Zero-dep, foundation-aligned, directly solves the core blast-radius problem
    • No configuration relaxation or test weakening detected

This PR exemplifies the type of fundamental correctness improvement that strengthens codegraph's reliability for AI agents and users. The two-pass rebuild algorithm with reverse-dep cascade is exactly the kind of systematic fix needed for a dependency graph tool.

Recommendation: Merge immediately.


@greptile-apps
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR fixes a longstanding ~3.3% edge gap between watcher (incremental) builds and full builds by overhauling rebuildFile in incremental.js with five targeted corrections: reverse-dep cascade (files that import the changed file have their outgoing edges rebuilt), child-node insertion (def.children parameters/properties), containment edge creation (file→symbol, dir→file), ancillary table cleanup before node deletion (function_complexity, cfg_blocks, etc.), and barrel re-export resolution. A new deep-deps-project fixture with 9 files and barrel re-exports, plus two new integration test files, gives high confidence the fix is correct.

  • Reverse-dep cascade (findReverseDeps + two-pass rebuild) correctly handles node ID changes when a file is rebuilt; pass 1 creates reexports edges so pass 2 can resolve barrel chains that depend on them
  • typeMap coercion from native-engine array format to Map (fixing a TypeError) is a clean, narrowly-scoped fix
  • purgeAncillaryData now correctly guards against FK violations on stmts.deleteNodes, and the tryExec helper now only swallows 'no such table' errors (prior review feedback addressed)
  • Prepared-statement caching for getRevDepStmts and getBarrelStmts follows the lazy-singleton pattern (prior review feedback addressed)
  • Minor: purgeAncillaryData still creates 6 fresh db.prepare() calls on every rebuildFile invocation — unlike the barrel/rev-dep caches, these are not cached at module scope
  • Minor: The second test case in incr-edge-gap.test.js (mid-level file touch) is missing the raw edge-count assertion that the first test case includes, leaving a small gap for duplicate-edge regressions to pass silently

Confidence Score: 4/5

  • Safe to merge — correctness logic is sound, prior critical feedback was addressed, and the two-pass cascade design is well-tested.
  • The core algorithm (find-before-purge, two-pass reverse-dep cascade, 'no such table'-only catch) is correct and the full test suite (2006 passing) validates parity. One point deducted for the purgeAncillaryData prepared-statement allocation on every call (a minor but addressable hot-path overhead) and the missing edge-count assertion in the second integration test.
  • src/domain/graph/builder/incremental.js — purgeAncillaryData statement caching; tests/integration/incr-edge-gap.test.js — missing length assertion in second test case.

Important Files Changed

Filename Overview
src/domain/graph/builder/incremental.js Core fix for the watcher edge gap — adds reverse-dep cascade, containment edges, child-node insertion, barrel resolution, and ancillary table cleanup. Logic is sound. Minor: purgeAncillaryData recreates 6 prepared statements on every call, unlike the barrel/rev-dep statement caches introduced in the same PR.
tests/integration/watcher-rebuild.test.js New integration test that exercises rebuildFile directly against the deep-deps fixture and asserts identical nodes/edges vs a full build. Well-structured with beforeAll/afterAll isolation and detailed diagnostic output on failure.
tests/integration/incr-edge-gap.test.js Build-pipeline parity test for leaf and mid-level file touches. The second test case is missing a raw edge-count assertion (unlike the first), which could allow duplicate-edge regressions to pass silently.
tests/fixtures/deep-deps-project/shared/constants.js New fixture file — leaf node with constants and a utility function, correctly serving as the deeply-imported file that triggers the reverse-dep cascade in the tests.
tests/benchmarks/resolution/fixtures/typescript/index.ts One-line benchmark fixture change: adds the type modifier to a UserService import to make it a type-only import. Straightforward and unrelated to the core fix.

Sequence Diagram

sequenceDiagram
    participant W as watcher.js
    participant R as rebuildFile()
    participant DB as SQLite DB
    participant P as parseFileIncremental()

    W->>R: rebuildFile(db, rootDir, filePath, stmts)
    R->>DB: findReverseDeps(relPath)
    DB-->>R: [depFile1, depFile2, ...]
    R->>DB: purgeAncillaryData(relPath)
    R->>DB: deleteEdgesForFile(relPath)
    R->>DB: deleteNodes(relPath)
    R->>P: parseFileIncremental(filePath)
    P-->>R: symbols
    R->>DB: insertFileNodes(defs + children + exports)
    R->>DB: buildContainmentEdges(file→def, def→child)
    R->>DB: rebuildDirContainment(dir→file)
    R->>DB: buildImportEdges + resolveBarrelImportEdges
    R->>DB: buildCallEdges

    loop for each depFile in reverseDeps
        R->>P: parseReverseDep(depFile)
        P-->>R: symbols_
        R->>DB: deleteOutgoingEdges(depFile)
    end

    loop Pass 1 — direct edges, no barrel
        R->>DB: rebuildReverseDepEdges(depFile, skipBarrel=true)
    end

    loop Pass 2 — barrel import edges
        R->>DB: resolveBarrelImportEdges per import
    end

    R-->>W: result object with edgesAdded

Comments Outside Diff (2)

  1. src/domain/graph/builder/incremental.js, line 152-157 (link)

    purgeAncillaryData creates six prepared statements on every call

    Unlike getRevDepStmts and getBarrelStmts (which lazily cache their statements at module scope), purgeAncillaryData calls db.prepare(sql) six times on every rebuildFile invocation. In heavy watch-mode usage — where the same DB instance persists for the lifetime of the watcher process — these statements are recreated needlessly on every file-change event. Preparing a statement re-parses and compiles the SQL each time, which is exactly what the caching pattern elsewhere was introduced to avoid.

    The fix follows the same pattern as the barrel/rev-dep caches:

    let _purgeDb = null;
    const _purgeStmts = {};
    
    function getPurgeStmts(db) {
      if (_purgeDb !== db) {
        _purgeDb = db;
        _purgeStmts.funcComplexity = db.prepare(
          'DELETE FROM function_complexity WHERE node_id IN (SELECT id FROM nodes WHERE file = ?)',
        );
        _purgeStmts.nodeMetrics = db.prepare(
          'DELETE FROM node_metrics WHERE node_id IN (SELECT id FROM nodes WHERE file = ?)',
        );
        // ... etc.
      }
      return _purgeStmts;
    }

    Then tryExec calls getPurgeStmts(db).funcComplexity.run(relPath) etc. The 'no such table' guard on .run() still applies.

  2. tests/integration/incr-edge-gap.test.js, line 633-637 (link)

    Missing edge count assertion in the mid-level file test

    The first test (touching the leaf file) asserts missing, extra, and raw edge counts:

    expect(missing).toEqual([]);
    expect(extra).toEqual([]);
    expect(incrGraph.edges.length).toBe(fullGraph.edges.length); // ← explicit count check

    This second test checks only missing and extra. Because these are built from Set keys, duplicate edges collapse — so a scenario where the incremental build produces duplicate edges (possible with bare INSERT INTO edges without OR IGNORE) would yield equal missing/extra sets while still having a different raw count. The total-length assertion in the first test catches this; the second test misses it.
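
A tiny self-contained demonstration of the collapse (simplified string edge keys, not the test's real key format):

```javascript
// Set-based missing/extra comparison collapses duplicates, so only a raw
// length check catches a duplicated edge. Keys here are simplified strings.
const fullEdges = ['imports:a.js->b.js', 'calls:f->g'];
const incrEdges = ['imports:a.js->b.js', 'imports:a.js->b.js', 'calls:f->g'];

const fullSet = new Set(fullEdges);
const incrSet = new Set(incrEdges);
const missing = [...fullSet].filter((e) => !incrSet.has(e));
const extra = [...incrSet].filter((e) => !fullSet.has(e));

console.log(missing.length, extra.length); // 0 0: the sets look identical
console.log(incrEdges.length === fullEdges.length); // false: raw count differs
```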

Last reviewed commit: "Merge branch 'main' ..."

Comment on lines +143 to +175
// ── Ancillary table cleanup ────────────────────────────────────────────

function purgeAncillaryData(db, relPath) {
  const tryExec = (sql, ...args) => {
    try {
      db.prepare(sql).run(...args);
    } catch {
      /* table may not exist */
    }
  };
  tryExec(
    'DELETE FROM function_complexity WHERE node_id IN (SELECT id FROM nodes WHERE file = ?)',
    relPath,
  );
  tryExec(
    'DELETE FROM node_metrics WHERE node_id IN (SELECT id FROM nodes WHERE file = ?)',
    relPath,
  );
  tryExec(
    'DELETE FROM cfg_edges WHERE function_node_id IN (SELECT id FROM nodes WHERE file = ?)',
    relPath,
  );
  tryExec(
    'DELETE FROM cfg_blocks WHERE function_node_id IN (SELECT id FROM nodes WHERE file = ?)',
    relPath,
  );
  tryExec(
    'DELETE FROM dataflow WHERE source_id IN (SELECT id FROM nodes WHERE file = ?) OR target_id IN (SELECT id FROM nodes WHERE file = ?)',
    relPath,
    relPath,
  );
  tryExec('DELETE FROM ast_nodes WHERE file = ?', relPath);
}
Contributor

P1 purgeAncillaryData silently swallows all exceptions

The bare catch {} in tryExec is commented as /* table may not exist */, but it catches every exception — not just SQLite's "no such table" error. If any DELETE fails for a genuine reason (e.g., a locked database, a schema inconsistency, or a bug in the WHERE clause), the error is silently discarded. The subsequent stmts.deleteNodes.run(relPath) on line 382 would then fail with SQLITE_CONSTRAINT_FOREIGNKEY because the ancillary rows (which still reference those nodes' IDs) were never cleaned up. This is especially risky because the entire function exists specifically to prevent FK violations.

A more targeted guard would only ignore "no such table" errors:

const tryExec = (sql, ...args) => {
  try {
    db.prepare(sql).run(...args);
  } catch (err) {
    if (!err?.message?.includes('no such table')) throw err;
  }
};

Contributor Author

Fixed — tryExec now only catches errors containing 'no such table' and re-throws all other exceptions. This prevents masking genuine write failures while still handling the optional-table case.

Comment on lines +179 to +219
function isBarrelFile(db, relPath) {
  const reexportCount = db
    .prepare(
      `SELECT COUNT(*) as c FROM edges e
       JOIN nodes n ON e.source_id = n.id
       WHERE e.kind = 'reexports' AND n.file = ? AND n.kind = 'file'`,
    )
    .get(relPath)?.c;
  return (reexportCount || 0) > 0;
}

function resolveBarrelTarget(db, barrelPath, symbolName, visited = new Set()) {
  if (visited.has(barrelPath)) return null;
  visited.add(barrelPath);

  // Find re-export targets from this barrel
  const reexportTargets = db
    .prepare(
      `SELECT DISTINCT n2.file FROM edges e
       JOIN nodes n1 ON e.source_id = n1.id
       JOIN nodes n2 ON e.target_id = n2.id
       WHERE e.kind = 'reexports' AND n1.file = ? AND n1.kind = 'file'`,
    )
    .all(barrelPath);

  for (const { file: targetFile } of reexportTargets) {
    // Check if the symbol is defined in this target file
    const hasDef = db
      .prepare(
        `SELECT 1 FROM nodes WHERE name = ? AND file = ? AND kind != 'file' AND kind != 'directory' LIMIT 1`,
      )
      .get(symbolName, targetFile);
    if (hasDef) return targetFile;

    // Recurse through barrel chains
    if (isBarrelFile(db, targetFile)) {
      const deeper = resolveBarrelTarget(db, targetFile, symbolName, visited);
      if (deeper) return deeper;
    }
  }
  return null;
}
Contributor

P2 Prepared statements allocated inside hot-loop functions

Both isBarrelFile and resolveBarrelTarget call db.prepare(...) on every invocation. These functions are called in a tight loop — once per imported name, for every import from a barrel file, and resolveBarrelTarget recurses through barrel chains. Allocating a new prepared statement on each call negates the performance benefit of prepared statements (the whole point of which is to parse the SQL once and reuse the plan).

The queries should be prepared once at module scope (or lazily on first call, cached in a closure) and reused:

// At module scope
let _isBarrelStmt = null;
let _reexportTargetsStmt = null;
let _hasDefStmt = null;

function getIsBarrelStmt(db) {
  return (_isBarrelStmt ??= db.prepare(
    `SELECT COUNT(*) as c FROM edges e
     JOIN nodes n ON e.source_id = n.id
     WHERE e.kind = 'reexports' AND n.file = ? AND n.kind = 'file'`
  ));
}

For a project with dozens of barrel imports this is a minor overhead, but for large repos with many barrel files the cost can add up across the full reverse-dep cascade.

Contributor Author

Fixed — barrel resolution prepared statements (isBarrelStmt, reexportTargetsStmt, hasDefStmt) are now lazily cached at module scope via getBarrelStmts(db) and reused across all invocations for the same database instance.

- purgeAncillaryData: only catch "no such table" errors instead of
  swallowing all exceptions (P1 from Greptile)
- Cache barrel resolution prepared statements to avoid re-preparing
  inside hot loops (P2 from Greptile)
- Fix stale @param _db JSDoc tag
- Prefix unused db param with underscore in rebuildDirContainment
@carlos-alm
Contributor Author

Addressed all Greptile feedback:

  1. P1 — purgeAncillaryData catch-all: Now only catches 'no such table' errors; re-throws everything else.
  2. P2 — Prepared statements in hot loops: Barrel resolution stmts are now lazily cached at module scope via getBarrelStmts(db).
  3. P2 — Stale JSDoc: Fixed @param _db → @param db.
  4. Lint/format: Fixed all Biome issues in source, fixtures, and tests.

All 6 integration tests pass, lint is green.

@greptileai

Comment on lines +443 to +447
for (const depRelPath of reverseDeps) {
  deleteOutgoingEdges(db, depRelPath);
  const symbols_ = await parseReverseDep(rootDir, depRelPath, engineOpts, cache);
  if (symbols_) depSymbols.set(depRelPath, symbols_);
}
Contributor

P1 Reverse-dep edges permanently deleted when parse fails

deleteOutgoingEdges is called unconditionally for every reverse-dep file before attempting to parse it. If parseReverseDep returns null (file unreadable, parse error, or file already deleted), no entry is placed in depSymbols. Pass 1 and Pass 2 then skip the file entirely, meaning its outgoing edges were wiped and are never rebuilt — leaving the graph in a permanently inconsistent state until the next full build.

The fix is to parse first and only delete edges for files that successfully parsed:

// Parse BEFORE deleting so a failed parse doesn't orphan the file
const depSymbols = new Map();
for (const depRelPath of reverseDeps) {
  const symbols_ = await parseReverseDep(rootDir, depRelPath, engineOpts, cache);
  if (symbols_) depSymbols.set(depRelPath, symbols_);
}
// Now it's safe to delete — every file in depSymbols will be rebuilt
for (const [depRelPath] of depSymbols) {
  deleteOutgoingEdges(db, depRelPath);
}

This preserves the invariant that edges are only deleted when they will be immediately rebuilt.

Contributor Author

Fixed in bdf3f77 — moved deleteOutgoingEdges after parseReverseDep succeeds. If parsing returns null, edges are now preserved instead of permanently lost. Also fixed a TypeError: typeMap.get is not a function in the same commit: the native engine returns typeMap as a plain array, which was not being coerced to a Map for JS files after the TS-only backfill restriction.

The native engine returns typeMap as a plain array, not a Map. After the
TS-only backfill restriction (8e78e62), JS files no longer get their
native typeMap converted to a Map, causing a TypeError on .get() during
incremental reverse-dep edge rebuilds.

Also moves deleteOutgoingEdges after parseReverseDep succeeds, preventing
permanent edge loss when a reverse-dep file fails to parse.
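
A minimal sketch of the typeMap coercion this commit describes. The helper name and the fallback behavior are illustrative assumptions, not the actual incremental.js code; the one thing taken from the commit message is that the native engine hands back typeMap as a plain array of pairs rather than a Map:

```javascript
// Hypothetical sketch: normalize a native-engine typeMap (array of
// [name, type] pairs) into a Map before any .get() calls, avoiding the
// "typeMap.get is not a function" TypeError described above.
function coerceTypeMap(typeMap) {
  if (typeMap instanceof Map) return typeMap;
  if (Array.isArray(typeMap)) return new Map(typeMap);
  return new Map(); // unexpected or missing shape: empty map, no TypeError
}

const fromNative = [['getUser', 'UserService'], ['port', 'number']];
const tm = coerceTypeMap(fromNative);
console.log(tm.get('getUser')); // 'UserService'
```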
@carlos-alm
Contributor Author

@greptileai

@carlos-alm
Contributor Author

Addressed remaining Greptile summary observations:

  1. Stmt re-preparation inconsistency (11f4f0f): findReverseDeps and deleteOutgoingEdges now use lazily-cached prepared statements via getRevDepStmts(db), consistent with the barrel-stmt optimization.

  2. Pass 2 barrel-resolution duplication (11f4f0f): Extracted the shared barrel import resolution logic into resolveBarrelImportEdges() — both buildImportEdges and the Pass 2 reverse-dep cascade now call the same helper, eliminating the ~25-line duplication.

  3. Weak edge assertion in watcher-rebuild test (4cf64d8): readGraph now includes n1.file AS src_file and n2.file AS tgt_file in the edge query, so same-named nodes in different files are distinguished. Also wrapped in try/finally for db.close() safety.

All 6 integration tests pass. Lint clean.
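
The disambiguated comparison from point 3 can be sketched as follows. The src_file/tgt_file aliases come from the comment above; the SQL text and key format are otherwise illustrative assumptions, not the test's actual code:

```javascript
// Hypothetical sketch: keying edges on kind plus each endpoint's name AND
// file means two same-named symbols in different files no longer collide.
const edgeQuerySql = `
  SELECT e.kind,
         n1.name AS src, n1.file AS src_file,
         n2.name AS tgt, n2.file AS tgt_file
  FROM edges e
  JOIN nodes n1 ON e.source_id = n1.id
  JOIN nodes n2 ON e.target_id = n2.id`;

const edgeKey = (r) => `${r.kind}:${r.src_file}#${r.src}->${r.tgt_file}#${r.tgt}`;

// The same symbol name 'helper' in two files yields distinct keys:
const k1 = edgeKey({ kind: 'calls', src: 'main', src_file: 'app.js', tgt: 'helper', tgt_file: 'a.js' });
const k2 = edgeKey({ kind: 'calls', src: 'main', src_file: 'app.js', tgt: 'helper', tgt_file: 'b.js' });
console.log(k1 !== k2); // true
```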

@greptileai

@carlos-alm carlos-alm merged commit 15af91f into main Mar 20, 2026
15 checks passed
@carlos-alm carlos-alm deleted the fix/incremental-edge-gap-533 branch March 20, 2026 06:42
@github-actions github-actions bot locked and limited conversation to collaborators Mar 20, 2026

Successfully merging this pull request may close these issues.

bug: incremental builds produce fewer edges than full builds (3.3% gap)
