
fix: close edge gap in watcher single-file rebuild (#533) #542

Merged
carlos-alm merged 9 commits into main from fix/incremental-edge-gap-533
Mar 20, 2026

Conversation

@carlos-alm
Contributor

Summary

Fixes #533 — incremental builds (watcher path) produced ~3.3% fewer edges than full builds.

The bug was in rebuildFile (used by watch mode), not the build pipeline. When a file changed, the watcher deleted all edges (incoming + outgoing) but only rebuilt the changed file's own outgoing edges. Incoming edges from other files were permanently lost.

Root causes fixed:

  • No reverse-dep cascade: files importing the changed file never had their outgoing edges rebuilt after node IDs changed. Added findReverseDeps + two-pass rebuild (direct edges first, then barrel resolution)
  • Missing child nodes: insertFileNodes skipped def.children (parameters, properties), losing contains/parameter_of edges
  • Missing containment edges: file→symbol and dir→file contains edges were never created
  • Missing ancillary table cleanup: FK references in function_complexity, cfg_blocks, etc. caused SQLITE_CONSTRAINT_FOREIGNKEY on node deletion
  • No barrel resolution: import edges through re-export chains were not resolved
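
The two-pass cascade in the first bullet can be sketched in miniature. This is a hypothetical, in-memory illustration (function names and edge shapes are invented for the example, not the actual incremental.js API): pass 1 recreates direct edges, including 'reexports', for every reverse dependency; pass 2 resolves imports that go through barrel files, which only works once the pass-1 'reexports' edges exist.

```javascript
// Hypothetical sketch of the two-pass reverse-dep rebuild; names are
// illustrative, not the actual incremental.js API.
function twoPassRebuild(reverseDeps, directEdgesOf, resolveBarrelImports) {
  const edges = [];

  // Pass 1: direct edges only (imports, reexports); no barrel resolution yet
  for (const file of reverseDeps) {
    edges.push(...directEdgesOf(file));
  }

  // Pass 2: barrel resolution can now follow the pass-1 'reexports' edges
  for (const file of reverseDeps) {
    edges.push(...resolveBarrelImports(file, edges));
  }
  return edges;
}

// Toy scenario: app.js imports from barrel index.js, which re-exports utils.js
const direct = {
  'app.js': [{ kind: 'imports', from: 'app.js', to: 'index.js' }],
  'index.js': [{ kind: 'reexports', from: 'index.js', to: 'utils.js' }],
};

const resolveBarrelImports = (file, edges) =>
  edges
    .filter((e) => e.kind === 'imports' && e.from === file)
    .flatMap((imp) =>
      edges
        .filter((e) => e.kind === 'reexports' && e.from === imp.to)
        .map((re) => ({ kind: 'imports_resolved', from: file, to: re.to })),
    );

const result = twoPassRebuild(
  ['app.js', 'index.js'],
  (f) => direct[f] ?? [],
  resolveBarrelImports,
);
console.log(result.length); // 3: imports, reexports, imports_resolved
```

Running pass 2 before pass 1 would find no 'reexports' edges to follow, which is why the ordering matters.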

Test plan

  • New watcher-rebuild.test.js — exercises rebuildFile directly against deep-deps fixture, verifies identical nodes/edges vs full build
  • New incr-edge-gap.test.js — build pipeline parity test (leaf + mid-level file touch)
  • New deep-deps-project fixture — 9 files with barrel re-exports and multi-level deps
  • Existing incremental-parity.test.js passes (barrel-project fixture)
  • Full test suite: 2006 tests passed, 0 failures

rebuildFile (used by watch mode) deleted all edges for a changed file
but only rebuilt the file's own outgoing edges — incoming edges from
other files were lost. This produced ~3.3% fewer edges than a full build.

Root causes fixed:
- No reverse-dep cascade: files importing the changed file never had
  their outgoing edges rebuilt after the changed file's node IDs changed.
  Added findReverseDeps + two-pass rebuild (direct edges first, then
  barrel resolution) to match the build pipeline's behavior.
- Missing child nodes: insertFileNodes skipped def.children (parameters,
  properties), losing contains/parameter_of edges.
- Missing containment edges: file→symbol and dir→file contains edges
  were never created by the watcher path.
- Missing ancillary table cleanup: function_complexity, cfg_blocks, etc.
  had FK references to old nodes, causing SQLITE_CONSTRAINT_FOREIGNKEY
  on node deletion. Added purgeAncillaryData before node deletion.
- No barrel resolution: import edges through re-export chains (barrel
  files) were not resolved, losing transitive import edges.
@claude

claude bot commented Mar 19, 2026

Claude finished @carlos-alm's task


PR Review Analysis

Review Progress

  • Phase 1: Root Cause Analysis
  • Phase 2: Configuration Change Detection
  • Phase 3: Test Integrity Review
  • Phase 4: Code Quality Analysis
  • Phase 5: Backlog Compliance Assessment
  • Final Assessment & Recommendation

Root Cause Analysis ✅

Problem Being Solved: Real issue — incremental builds (watcher mode) produced ~3.3% fewer edges than full builds. The bug was specifically in the rebuildFile function used by watch mode.

Why This Approach: The fix correctly addresses each root cause systematically with proper cascade handling, ancillary table cleanup, and barrel resolution.

Risk Assessment: ✅ LOW RISK - This is clearly fixing fundamental correctness bugs, not hiding symptoms.

Backlog Compliance ✅

  • Zero-dep: ✅ Uses only existing tree-sitter AST + SQLite infrastructure
  • Foundation-aligned: ✅ Improves always-current graph accuracy (P1), maintains zero-cost core (P4), embeddable-first design (P5)
  • Problem-fit: 5/5 — Directly prevents "missed blast radius" — the core problem codegraph exists to solve
  • Breaking: No — Purely additive edge creation, identical API
  • Tier: 1 (Zero-dep ✓ + Foundation-aligned ✓)

Critical Concerns

None identified. This is a high-quality bug fix that addresses fundamental accuracy issues in the incremental build pipeline.

Final Recommendation

  • Rating: ⭐⭐⭐⭐⭐ (5/5)
  • Action: APPROVE
  • Reasoning:
    • Fixes a real accuracy gap (3.3% edge loss) in incremental builds
    • Comprehensive solution addressing all root causes systematically
    • Excellent test coverage with rigorous parity verification
    • High code quality with proper error handling and documentation
    • Zero-dep, foundation-aligned, directly solves the core blast-radius problem
    • No configuration relaxation or test weakening detected

This PR exemplifies the type of fundamental correctness improvement that strengthens codegraph's reliability for AI agents and users. The two-pass rebuild algorithm with reverse-dep cascade is exactly the kind of systematic fix needed for a dependency graph tool.

Recommendation: Merge immediately.


@greptile-apps
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR fixes a longstanding ~3.3% edge gap between watcher (incremental) builds and full builds by overhauling rebuildFile in incremental.js with five targeted corrections: reverse-dep cascade (files that import the changed file have their outgoing edges rebuilt), child-node insertion (def.children parameters/properties), containment edge creation (file→symbol, dir→file), ancillary table cleanup before node deletion (function_complexity, cfg_blocks, etc.), and barrel re-export resolution. A new deep-deps-project fixture with 9 files and barrel re-exports, plus two new integration test files, gives high confidence the fix is correct.

  • Reverse-dep cascade (findReverseDeps + two-pass rebuild) correctly handles node ID changes when a file is rebuilt; pass 1 creates reexports edges so pass 2 can resolve barrel chains that depend on them
  • typeMap coercion from native-engine array format to Map (fixing a TypeError) is a clean, narrowly-scoped fix
  • purgeAncillaryData now correctly guards against FK violations on stmts.deleteNodes, and the tryExec helper now only swallows 'no such table' errors (prior review feedback addressed)
  • Prepared-statement caching for getRevDepStmts and getBarrelStmts follows the lazy-singleton pattern (prior review feedback addressed)
  • Minor: purgeAncillaryData still creates 6 fresh db.prepare() calls on every rebuildFile invocation — unlike the barrel/rev-dep caches, these are not cached at module scope
  • Minor: The second test case in incr-edge-gap.test.js (mid-level file touch) is missing the raw edge-count assertion that the first test case includes, leaving a small gap for duplicate-edge regressions to pass silently

Confidence Score: 4/5

  • Safe to merge — correctness logic is sound, prior critical feedback was addressed, and the two-pass cascade design is well-tested.
  • The core algorithm (find-before-purge, two-pass reverse-dep cascade, 'no such table'-only catch) is correct and the full test suite (2006 passing) validates parity. One point deducted for the purgeAncillaryData prepared-statement allocation on every call (a minor but addressable hot-path overhead) and the missing edge-count assertion in the second integration test.
  • src/domain/graph/builder/incremental.js — purgeAncillaryData statement caching; tests/integration/incr-edge-gap.test.js — missing length assertion in second test case.

Important Files Changed

Filename Overview
src/domain/graph/builder/incremental.js Core fix for the watcher edge gap — adds reverse-dep cascade, containment edges, child-node insertion, barrel resolution, and ancillary table cleanup. Logic is sound. Minor: purgeAncillaryData recreates 6 prepared statements on every call, unlike the barrel/rev-dep statement caches introduced in the same PR.
tests/integration/watcher-rebuild.test.js New integration test that exercises rebuildFile directly against the deep-deps fixture and asserts identical nodes/edges vs a full build. Well-structured with beforeAll/afterAll isolation and detailed diagnostic output on failure.
tests/integration/incr-edge-gap.test.js Build-pipeline parity test for leaf and mid-level file touches. The second test case is missing a raw edge-count assertion (unlike the first), which could allow duplicate-edge regressions to pass silently.
tests/fixtures/deep-deps-project/shared/constants.js New fixture file — leaf node with constants and a utility function, correctly serving as the deeply-imported file that triggers the reverse-dep cascade in the tests.
tests/benchmarks/resolution/fixtures/typescript/index.ts One-line benchmark fixture change: adds the type modifier to a UserService import to make it a type-only import. Straightforward and unrelated to the core fix.

Sequence Diagram

sequenceDiagram
    participant W as watcher.js
    participant R as rebuildFile()
    participant DB as SQLite DB
    participant P as parseFileIncremental()

    W->>R: rebuildFile(db, rootDir, filePath, stmts)
    R->>DB: findReverseDeps(relPath)
    DB-->>R: [depFile1, depFile2, ...]
    R->>DB: purgeAncillaryData(relPath)
    R->>DB: deleteEdgesForFile(relPath)
    R->>DB: deleteNodes(relPath)
    R->>P: parseFileIncremental(filePath)
    P-->>R: symbols
    R->>DB: insertFileNodes(defs + children + exports)
    R->>DB: buildContainmentEdges(file→def, def→child)
    R->>DB: rebuildDirContainment(dir→file)
    R->>DB: buildImportEdges + resolveBarrelImportEdges
    R->>DB: buildCallEdges

    loop for each depFile in reverseDeps
        R->>P: parseReverseDep(depFile)
        P-->>R: symbols_
        R->>DB: deleteOutgoingEdges(depFile)
    end

    loop Pass 1 — direct edges, no barrel
        R->>DB: rebuildReverseDepEdges(depFile, skipBarrel=true)
    end

    loop Pass 2 — barrel import edges
        R->>DB: resolveBarrelImportEdges per import
    end

    R-->>W: result object with edgesAdded

Comments Outside Diff (2)

  1. src/domain/graph/builder/incremental.js, line 152-157 (link)

    purgeAncillaryData creates six prepared statements on every call

    Unlike getRevDepStmts and getBarrelStmts (which lazily cache their statements at module scope), purgeAncillaryData calls db.prepare(sql) six times on every rebuildFile invocation. In heavy watch-mode usage — where the same DB instance persists for the lifetime of the watcher process — these statements are recreated needlessly on every file-change event. Preparing a statement re-parses and compiles the SQL each time, which is exactly what the caching pattern elsewhere was introduced to avoid.

    The fix follows the same pattern as the barrel/rev-dep caches:

    let _purgeDb = null;
    const _purgeStmts = {};
    
    function getPurgeStmts(db) {
      if (_purgeDb !== db) {
        _purgeDb = db;
        _purgeStmts.funcComplexity = db.prepare(
          'DELETE FROM function_complexity WHERE node_id IN (SELECT id FROM nodes WHERE file = ?)',
        );
        _purgeStmts.nodeMetrics = db.prepare(
          'DELETE FROM node_metrics WHERE node_id IN (SELECT id FROM nodes WHERE file = ?)',
        );
        // ... etc.
      }
      return _purgeStmts;
    }

    Then tryExec calls getPurgeStmts(db).funcComplexity.run(relPath) etc. The 'no such table' guard on .run() still applies.

  2. tests/integration/incr-edge-gap.test.js, line 633-637 (link)

    Missing edge count assertion in the mid-level file test

    The first test (touching the leaf file) asserts missing, extra, and raw edge counts:

    expect(missing).toEqual([]);
    expect(extra).toEqual([]);
    expect(incrGraph.edges.length).toBe(fullGraph.edges.length); // ← explicit count check

    This second test checks only missing and extra. Because these are built from Set keys, duplicate edges collapse — so a scenario where the incremental build produces duplicate edges (possible with bare INSERT INTO edges without OR IGNORE) would yield equal missing/extra sets while still having a different raw count. The total-length assertion in the first test catches this; the second test misses it.
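
A tiny self-contained demonstration of the collapse (simplified string edge keys, not the test's real key format):

```javascript
// Set-based missing/extra comparison collapses duplicates, so only a raw
// length check catches a duplicated edge. Keys here are simplified strings.
const fullEdges = ['imports:a.js->b.js', 'calls:f->g'];
const incrEdges = ['imports:a.js->b.js', 'imports:a.js->b.js', 'calls:f->g'];

const fullSet = new Set(fullEdges);
const incrSet = new Set(incrEdges);
const missing = [...fullSet].filter((e) => !incrSet.has(e));
const extra = [...incrSet].filter((e) => !fullSet.has(e));

console.log(missing.length, extra.length); // 0 0: the sets look identical
console.log(incrEdges.length === fullEdges.length); // false: raw count differs
```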

Last reviewed commit: "Merge branch 'main' ..."

Comment on lines +143 to +175
// ── Ancillary table cleanup ────────────────────────────────────────────

function purgeAncillaryData(db, relPath) {
  const tryExec = (sql, ...args) => {
    try {
      db.prepare(sql).run(...args);
    } catch {
      /* table may not exist */
    }
  };
  tryExec(
    'DELETE FROM function_complexity WHERE node_id IN (SELECT id FROM nodes WHERE file = ?)',
    relPath,
  );
  tryExec(
    'DELETE FROM node_metrics WHERE node_id IN (SELECT id FROM nodes WHERE file = ?)',
    relPath,
  );
  tryExec(
    'DELETE FROM cfg_edges WHERE function_node_id IN (SELECT id FROM nodes WHERE file = ?)',
    relPath,
  );
  tryExec(
    'DELETE FROM cfg_blocks WHERE function_node_id IN (SELECT id FROM nodes WHERE file = ?)',
    relPath,
  );
  tryExec(
    'DELETE FROM dataflow WHERE source_id IN (SELECT id FROM nodes WHERE file = ?) OR target_id IN (SELECT id FROM nodes WHERE file = ?)',
    relPath,
    relPath,
  );
  tryExec('DELETE FROM ast_nodes WHERE file = ?', relPath);
}
Contributor

P1 purgeAncillaryData silently swallows all exceptions

The bare catch {} in tryExec is commented as /* table may not exist */, but it catches every exception — not just SQLite's "no such table" error. If any DELETE fails for a genuine reason (e.g., a locked database, a schema inconsistency, or a bug in the WHERE clause), the error is silently discarded. The subsequent stmts.deleteNodes.run(relPath) on line 382 would then fail with SQLITE_CONSTRAINT_FOREIGNKEY because the ancillary rows (which still reference those nodes' IDs) were never cleaned up. This is especially risky because the entire function exists specifically to prevent FK violations.

A more targeted guard would only ignore "no such table" errors:

const tryExec = (sql, ...args) => {
  try {
    db.prepare(sql).run(...args);
  } catch (err) {
    if (!err?.message?.includes('no such table')) throw err;
  }
};

Contributor Author

Fixed — tryExec now only catches errors containing 'no such table' and re-throws all other exceptions. This prevents masking genuine write failures while still handling the optional-table case.

Comment on lines +179 to +219
function isBarrelFile(db, relPath) {
  const reexportCount = db
    .prepare(
      `SELECT COUNT(*) as c FROM edges e
       JOIN nodes n ON e.source_id = n.id
       WHERE e.kind = 'reexports' AND n.file = ? AND n.kind = 'file'`,
    )
    .get(relPath)?.c;
  return (reexportCount || 0) > 0;
}

function resolveBarrelTarget(db, barrelPath, symbolName, visited = new Set()) {
  if (visited.has(barrelPath)) return null;
  visited.add(barrelPath);

  // Find re-export targets from this barrel
  const reexportTargets = db
    .prepare(
      `SELECT DISTINCT n2.file FROM edges e
       JOIN nodes n1 ON e.source_id = n1.id
       JOIN nodes n2 ON e.target_id = n2.id
       WHERE e.kind = 'reexports' AND n1.file = ? AND n1.kind = 'file'`,
    )
    .all(barrelPath);

  for (const { file: targetFile } of reexportTargets) {
    // Check if the symbol is defined in this target file
    const hasDef = db
      .prepare(
        `SELECT 1 FROM nodes WHERE name = ? AND file = ? AND kind != 'file' AND kind != 'directory' LIMIT 1`,
      )
      .get(symbolName, targetFile);
    if (hasDef) return targetFile;

    // Recurse through barrel chains
    if (isBarrelFile(db, targetFile)) {
      const deeper = resolveBarrelTarget(db, targetFile, symbolName, visited);
      if (deeper) return deeper;
    }
  }
  return null;
}
Contributor

P2 Prepared statements allocated inside hot-loop functions

Both isBarrelFile and resolveBarrelTarget call db.prepare(...) on every invocation. These functions are called in a tight loop — once per imported name, for every import from a barrel file, and resolveBarrelTarget recurses through barrel chains. Allocating a new prepared statement on each call negates the performance benefit of prepared statements (the whole point of which is to parse the SQL once and reuse the plan).

The queries should be prepared once at module scope (or lazily on first call, cached in a closure) and reused:

// At module scope
let _isBarrelStmt = null;
let _reexportTargetsStmt = null;
let _hasDefStmt = null;

function getIsBarrelStmt(db) {
  return (_isBarrelStmt ??= db.prepare(
    `SELECT COUNT(*) as c FROM edges e
     JOIN nodes n ON e.source_id = n.id
     WHERE e.kind = 'reexports' AND n.file = ? AND n.kind = 'file'`
  ));
}

For a project with dozens of barrel imports this is a minor overhead, but for large repos with many barrel files the cost can add up across the full reverse-dep cascade.

Contributor Author

Fixed — barrel resolution prepared statements (isBarrelStmt, reexportTargetsStmt, hasDefStmt) are now lazily cached at module scope via getBarrelStmts(db) and reused across all invocations for the same database instance.

- purgeAncillaryData: only catch "no such table" errors instead of
  swallowing all exceptions (P1 from Greptile)
- Cache barrel resolution prepared statements to avoid re-preparing
  inside hot loops (P2 from Greptile)
- Fix stale @param _db JSDoc tag
- Prefix unused db param with underscore in rebuildDirContainment
@carlos-alm
Contributor Author

Addressed all Greptile feedback:

  1. P1 — purgeAncillaryData catch-all: Now only catches 'no such table' errors; re-throws everything else.
  2. P2 — Prepared statements in hot loops: Barrel resolution stmts are now lazily cached at module scope via getBarrelStmts(db).
  3. P2 — Stale JSDoc: Fixed @param _db → @param db.
  4. Lint/format: Fixed all Biome issues in source, fixtures, and tests.

All 6 integration tests pass, lint is green.

@greptileai

Comment on lines +443 to +447
for (const depRelPath of reverseDeps) {
  deleteOutgoingEdges(db, depRelPath);
  const symbols_ = await parseReverseDep(rootDir, depRelPath, engineOpts, cache);
  if (symbols_) depSymbols.set(depRelPath, symbols_);
}
Contributor

P1 Reverse-dep edges permanently deleted when parse fails

deleteOutgoingEdges is called unconditionally for every reverse-dep file before attempting to parse it. If parseReverseDep returns null (file unreadable, parse error, or file already deleted), no entry is placed in depSymbols. Pass 1 and Pass 2 then skip the file entirely, meaning its outgoing edges were wiped and are never rebuilt — leaving the graph in a permanently inconsistent state until the next full build.

The fix is to parse first and only delete edges for files that successfully parsed:

// Parse BEFORE deleting so a failed parse doesn't orphan the file
const depSymbols = new Map();
for (const depRelPath of reverseDeps) {
  const symbols_ = await parseReverseDep(rootDir, depRelPath, engineOpts, cache);
  if (symbols_) depSymbols.set(depRelPath, symbols_);
}
// Now it's safe to delete — every file in depSymbols will be rebuilt
for (const [depRelPath] of depSymbols) {
  deleteOutgoingEdges(db, depRelPath);
}

This preserves the invariant that edges are only deleted when they will be immediately rebuilt.

Contributor Author

Fixed in bdf3f77 — moved deleteOutgoingEdges after parseReverseDep succeeds. If parsing returns null, edges are now preserved instead of permanently lost. Also fixed a TypeError: typeMap.get is not a function in the same commit: the native engine returns typeMap as a plain array, which was not being coerced to a Map for JS files after the TS-only backfill restriction.

The native engine returns typeMap as a plain array, not a Map. After the
TS-only backfill restriction (8e78e62), JS files no longer get their
native typeMap converted to a Map, causing a TypeError on .get() during
incremental reverse-dep edge rebuilds.

Also moves deleteOutgoingEdges after parseReverseDep succeeds, preventing
permanent edge loss when a reverse-dep file fails to parse.
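
A minimal sketch of the typeMap coercion this commit describes. The helper name and the fallback behavior are illustrative assumptions, not the actual incremental.js code; the one thing taken from the commit message is that the native engine hands back typeMap as a plain array of pairs rather than a Map:

```javascript
// Hypothetical sketch: normalize a native-engine typeMap (array of
// [name, type] pairs) into a Map before any .get() calls, avoiding the
// "typeMap.get is not a function" TypeError described above.
function coerceTypeMap(typeMap) {
  if (typeMap instanceof Map) return typeMap;
  if (Array.isArray(typeMap)) return new Map(typeMap);
  return new Map(); // unexpected or missing shape: empty map, no TypeError
}

const fromNative = [['getUser', 'UserService'], ['port', 'number']];
const tm = coerceTypeMap(fromNative);
console.log(tm.get('getUser')); // 'UserService'
```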
@carlos-alm
Contributor Author

@greptileai

@carlos-alm
Contributor Author

Addressed remaining Greptile summary observations:

  1. Stmt re-preparation inconsistency (11f4f0f): findReverseDeps and deleteOutgoingEdges now use lazily-cached prepared statements via getRevDepStmts(db), consistent with the barrel-stmt optimization.

  2. Pass 2 barrel-resolution duplication (11f4f0f): Extracted the shared barrel import resolution logic into resolveBarrelImportEdges() — both buildImportEdges and the Pass 2 reverse-dep cascade now call the same helper, eliminating the ~25-line duplication.

  3. Weak edge assertion in watcher-rebuild test (4cf64d8): readGraph now includes n1.file AS src_file and n2.file AS tgt_file in the edge query, so same-named nodes in different files are distinguished. Also wrapped in try/finally for db.close() safety.

All 6 integration tests pass. Lint clean.
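
The disambiguated comparison from point 3 can be sketched as follows. The src_file/tgt_file aliases come from the comment above; the SQL text and key format are otherwise illustrative assumptions, not the test's actual code:

```javascript
// Hypothetical sketch: keying edges on kind plus each endpoint's name AND
// file means two same-named symbols in different files no longer collide.
const edgeQuerySql = `
  SELECT e.kind,
         n1.name AS src, n1.file AS src_file,
         n2.name AS tgt, n2.file AS tgt_file
  FROM edges e
  JOIN nodes n1 ON e.source_id = n1.id
  JOIN nodes n2 ON e.target_id = n2.id`;

const edgeKey = (r) => `${r.kind}:${r.src_file}#${r.src}->${r.tgt_file}#${r.tgt}`;

// The same symbol name 'helper' in two files yields distinct keys:
const k1 = edgeKey({ kind: 'calls', src: 'main', src_file: 'app.js', tgt: 'helper', tgt_file: 'a.js' });
const k2 = edgeKey({ kind: 'calls', src: 'main', src_file: 'app.js', tgt: 'helper', tgt_file: 'b.js' });
console.log(k1 !== k2); // true
```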

@greptileai

@carlos-alm carlos-alm merged commit 15af91f into main Mar 20, 2026
15 checks passed
@carlos-alm carlos-alm deleted the fix/incremental-edge-gap-533 branch March 20, 2026 06:42
@github-actions github-actions bot locked and limited conversation to collaborators Mar 20, 2026

Successfully merging this pull request may close these issues.

bug: incremental builds produce fewer edges than full builds (3.3% gap)
