bug: incremental rebuild leaks ~249 duplicate edges per run #979

@carlos-alm

Description

Found during dogfooding v3.9.4

Severity: Critical
Command: codegraph build

Every incremental rebuild after a content change inserts a fresh batch of duplicate edges into the DB without removing the prior copies. The DB grows unbounded with each rebuild, and queries over it return inflated fan-in / fan-out / caller / callee counts.

Reproduction

# In the codegraph source repo:
rm -rf .codegraph
npx codegraph build .
# Baseline: 17278 nodes, 36325 edges

# Mutate a file and rebuild incrementally three times:
echo "// probe-1" >> src/domain/queries.ts
npx codegraph build .
# 17278 nodes, 36536 edges  (+211)

echo "// probe-2" >> src/domain/queries.ts
npx codegraph build .
# 17278 nodes, 36785 edges  (+249)

echo "// probe-3" >> src/domain/queries.ts
npx codegraph build .
# 17278 nodes, 37034 edges  (+249)

Node count stays at 17278 across all rebuilds; only edges leak. A full rebuild (rm -rf .codegraph && codegraph build .) restores the baseline of 36325 edges.

The oneFileRebuildMs tier of scripts/benchmark.js (which runs 3 incrementals on queries.ts) reproduces the pattern: baseline 36325 → 36536 → 36785 → 37034 → 37283 across the benchmark's rebuild passes.

Sample duplicated edges after one incremental rebuild

After one rebuild, the number of duplicated (source_id, target_id, kind) signatures jumps from 29 (legitimate multi-site edges) to 276. Inspection shows that most of the new duplicates are edges sourced from files other than the one that was modified: for example, src/domain/parser.ts emits its edges multiple times even though only src/domain/queries.ts was touched.

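-- List every (source_id, target_id, kind) signature stored more than once.
SELECT n1.file AS source_file, n2.file AS target_file, e.kind, COUNT(*) AS n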
FROM edges e
JOIN nodes n1 ON n1.id = e.source_id
JOIN nodes n2 ON n2.id = e.target_id
GROUP BY e.source_id, e.target_id, e.kind
HAVING n > 1
ORDER BY n DESC;
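The same grouping can be reproduced outside SQLite to check which files the extra copies come from. This is a hypothetical in-memory sketch: `EdgeRow`, `duplicateSignatureCount`, and `duplicatesBySourceFile` are illustrative names, not codegraph APIs, and the three rows are made-up sample data standing in for the joined edges/nodes rows.

```typescript
// Hypothetical stand-in for the joined rows: the edge key columns plus the
// source node's file (what the JOIN on nodes n1 supplies in the SQL above).
interface EdgeRow {
  sourceId: number;
  targetId: number;
  kind: string;
  sourceFile: string;
}

// Same grouping as the SQL: one signature per (source_id, target_id, kind).
function signatureKey(r: EdgeRow): string {
  return `${r.sourceId}|${r.targetId}|${r.kind}`;
}

function signatureCounts(rows: EdgeRow[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const r of rows) {
    const key = signatureKey(r);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Number of signatures with COUNT(*) > 1: the kind of figure the issue
// reports (29 before, 276 after one incremental rebuild).
function duplicateSignatureCount(rows: EdgeRow[]): number {
  const counts = signatureCounts(rows);
  return Object.keys(counts).filter((k) => counts[k] > 1).length;
}

// Duplicated signatures bucketed by source file, to see whether the extra
// copies come from files other than the one that was edited.
function duplicatesBySourceFile(rows: EdgeRow[]): Record<string, number> {
  const counts = signatureCounts(rows);
  const counted: Record<string, boolean> = {};
  const byFile: Record<string, number> = {};
  for (const r of rows) {
    const key = signatureKey(r);
    if (counts[key] > 1 && !counted[key]) {
      counted[key] = true; // count each duplicated signature once
      byFile[r.sourceFile] = (byFile[r.sourceFile] || 0) + 1;
    }
  }
  return byFile;
}

// Made-up sample: one edge from the edited file, one parser.ts edge stored twice.
const rows: EdgeRow[] = [
  { sourceId: 1, targetId: 2, kind: "calls", sourceFile: "src/domain/queries.ts" },
  { sourceId: 3, targetId: 4, kind: "calls", sourceFile: "src/domain/parser.ts" },
  { sourceId: 3, targetId: 4, kind: "calls", sourceFile: "src/domain/parser.ts" },
];
const dupCount = duplicateSignatureCount(rows); // 1
const dupByFile = duplicatesBySourceFile(rows); // { "src/domain/parser.ts": 1 }
```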

Expected behavior

Incremental rebuild of one changed file should only touch that file's edges. Edge count should be stable across repeated incrementals against the same changeset.

Actual behavior

~249 duplicate edges are inserted every incremental rebuild, sourced from files that were not changed. The duplicates persist across subsequent rebuilds.

Root cause

Not fully diagnosed. The incremental flow appears to re-insert edges for files adjacent to the changed file (importers? same-directory resolution targets?) without de-duplicating against the existing rows. The bug is most likely in the native engine's edge-persistence path (the codegraph-core Rust crate), since native is the default engine and the benchmark script exercises it.
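The suspected mechanism can be modeled in a few lines. This is a hypothetical sketch of the hypothesis above, not codegraph's actual pipeline: it assumes a run deletes edges only for the changed file but re-inserts edges for the changed file plus one adjacent file (`parser.ts` here). `leakyIncrementalRebuild`, `edgesFor`, and the sample edges are invented for illustration.

```typescript
// Hypothetical model of the suspected leak, not the real codegraph pipeline.
// Assumption: a run purges edges only for the changed file but re-inserts
// edges for the changed file plus some "adjacent" files.
interface Edge {
  sourceFile: string;
  sourceId: number;
  targetId: number;
  kind: string;
}

// Edges each file emits when parsed. Input is fixed, so every re-parse is identical.
const parsed: Record<string, Edge[]> = {
  "src/domain/queries.ts": [
    { sourceFile: "src/domain/queries.ts", sourceId: 1, targetId: 2, kind: "calls" },
  ],
  "src/domain/parser.ts": [
    { sourceFile: "src/domain/parser.ts", sourceId: 2, targetId: 3, kind: "calls" },
    { sourceFile: "src/domain/parser.ts", sourceId: 2, targetId: 4, kind: "imports" },
  ],
};

function edgesFor(files: string[]): Edge[] {
  let out: Edge[] = [];
  for (const f of files) out = out.concat(parsed[f] || []);
  return out;
}

function leakyIncrementalRebuild(db: Edge[], changedFile: string, reparsedFiles: string[]): Edge[] {
  // Only the changed file's old edges are deleted...
  const kept = db.filter((e) => e.sourceFile !== changedFile);
  // ...but every re-parsed file's edges are appended, so each run adds
  // another copy of the adjacent files' edges.
  return kept.concat(edgesFor(reparsedFiles));
}

// Full build, then three incremental runs that only touch queries.ts.
let db = edgesFor(Object.keys(parsed));
const edgeCounts: number[] = [db.length];
for (let run = 0; run < 3; run++) {
  db = leakyIncrementalRebuild(db, "src/domain/queries.ts", [
    "src/domain/queries.ts",
    "src/domain/parser.ts",
  ]);
  edgeCounts.push(db.length);
}
// edgeCounts: [3, 5, 7, 9], a constant +2 per run (parser.ts's edge count)
```

Under that assumption the edge count grows by the same amount on every run while the changed file's own edges stay correct, which is the shape of the repro above.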

Suggested fix

Either:

  1. Delete edges sourced from the changed file (and any dependent files the pipeline decides to re-run) before re-inserting them, or
  2. Use INSERT OR IGNORE with a unique index on (source_id, target_id, kind), and audit whether confidence / dynamic columns should be part of the key.

Option 1 is more correct since edge properties like confidence can change between parses and silently keeping the old value is a stale-data bug.
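A minimal sketch contrasting the two options on an in-memory edge list, assuming a `confidence` value that changes between parses. `deleteThenInsert` and `insertOrIgnore` are hypothetical helpers; the second only emulates SQLite's `INSERT OR IGNORE` against a unique key on `(source_id, target_id, kind)` and is not the real persistence code.

```typescript
// Hypothetical edge row; `confidence` stands in for any property that can
// change between parses.
interface Edge {
  sourceFile: string;
  sourceId: number;
  targetId: number;
  kind: string;
  confidence: number;
}

// The key proposed for a unique index: (source_id, target_id, kind).
function edgeKey(e: Edge): string {
  return `${e.sourceId}|${e.targetId}|${e.kind}`;
}

// Option 1: delete every edge sourced from a file being re-inserted (changed
// file plus any dependents the pipeline re-runs), then insert the fresh rows.
function deleteThenInsert(db: Edge[], reparsedFiles: string[], fresh: Edge[]): Edge[] {
  const kept = db.filter((e) => reparsedFiles.indexOf(e.sourceFile) === -1);
  return kept.concat(fresh);
}

// Option 2: emulate INSERT OR IGNORE against the unique key. A row whose key
// already exists is skipped even if its other columns changed; rows sharing
// a key within the same batch are collapsed too.
function insertOrIgnore(db: Edge[], fresh: Edge[]): Edge[] {
  const seen: Record<string, boolean> = {};
  for (const e of db) seen[edgeKey(e)] = true;
  const out = db.slice();
  for (const e of fresh) {
    const key = edgeKey(e);
    if (!seen[key]) {
      seen[key] = true;
      out.push(e);
    }
  }
  return out;
}

const old: Edge[] = [
  { sourceFile: "src/domain/queries.ts", sourceId: 1, targetId: 2, kind: "calls", confidence: 0.5 },
  { sourceFile: "src/domain/parser.ts", sourceId: 3, targetId: 4, kind: "calls", confidence: 1 },
];
// Re-parse of queries.ts and an adjacent parser.ts; the queries.ts edge's confidence changed.
const fresh: Edge[] = [
  { sourceFile: "src/domain/queries.ts", sourceId: 1, targetId: 2, kind: "calls", confidence: 0.9 },
  { sourceFile: "src/domain/parser.ts", sourceId: 3, targetId: 4, kind: "calls", confidence: 1 },
];
const reparsed = ["src/domain/queries.ts", "src/domain/parser.ts"];

const opt1 = deleteThenInsert(old, reparsed, fresh); // 2 rows, queries.ts edge now has confidence 0.9
const opt2 = insertOrIgnore(old, fresh); // 2 rows, queries.ts edge keeps stale confidence 0.5
```

Both variants stay at two rows across repeated runs, but only Option 1 picks up the new confidence. The same unique key would also collapse legitimate multi-site edges that share a signature (the 29 counted above), which is one more reason the key needs auditing under Option 2.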

Impact

The results of fn-impact, fn-deps, stats, triage, map, and roles are all wrong after any incremental rebuild. The longer a dev session runs without a full rebuild, the more inflated the numbers become. Watch-mode users are the most exposed.

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
