Found during dogfooding v3.9.4
Severity: Critical
Command: codegraph build
Every incremental rebuild after a content change inserts a fresh batch of duplicate edges into the DB without removing the prior copies. The DB grows unbounded with each rebuild, and queries over it return inflated fan-in / fan-out / caller / callee counts.
Reproduction
# In the codegraph source repo:
rm -rf .codegraph
npx codegraph build .
# Baseline: 17278 nodes, 36325 edges
# Mutate a file and rebuild twice incrementally:
echo "// probe-1" >> src/domain/queries.ts
npx codegraph build .
# 17278 nodes, 36536 edges (+211)
echo "// probe-2" >> src/domain/queries.ts
npx codegraph build .
# 17278 nodes, 36785 edges (+249)
echo "// probe-3" >> src/domain/queries.ts
npx codegraph build .
# 17278 nodes, 37034 edges (+249)
Node count is stable at 17278 across all rebuilds — only edges leak. A full rebuild (rm -rf .codegraph && codegraph build .) restores the baseline 36325 edges.
The oneFileRebuildMs tier of scripts/benchmark.js (which runs 3 incrementals on queries.ts) reproduces the pattern: baseline 36325 → 36536 → 36785 → 37034 → 37283 across the benchmark's rebuild passes.
Sample duplicated edges after one incremental rebuild
After one rebuild, the duplicate-edge signature count jumps from 29 (legit multi-site edges) to 276. Inspection shows most of the new duplicates are edges sourced from files other than the file that was modified — e.g. src/domain/parser.ts emits its edges multiple times even though only src/domain/queries.ts was touched.
SELECT n1.file, n2.file, e.kind, COUNT(*) AS n
FROM edges e
JOIN nodes n1 ON n1.id = e.source_id
JOIN nodes n2 ON n2.id = e.target_id
GROUP BY e.source_id, e.target_id, e.kind
HAVING n > 1
ORDER BY n DESC;
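To track one number between rebuilds, the query above can be reduced to a count of duplicated signatures. The sketch below runs against a toy in-memory database. The two-table schema is inferred from the columns the query uses and is an assumption, not the actual codegraph schema; `duplicate_signatures` is a hypothetical helper name.

```python
import sqlite3


def duplicate_signatures(conn):
    """Count (source_id, target_id, kind) groups that have more than one row."""
    row = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT 1 FROM edges
            GROUP BY source_id, target_id, kind
            HAVING COUNT(*) > 1
        )
        """
    ).fetchone()
    return row[0]


# Toy schema (assumed): only the columns referenced by the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, file TEXT);
CREATE TABLE edges (source_id INTEGER, target_id INTEGER, kind TEXT);
INSERT INTO nodes VALUES (1, 'src/domain/parser.ts'), (2, 'src/domain/queries.ts');
INSERT INTO edges VALUES (1, 2, 'calls');
""")

before = duplicate_signatures(conn)
# Simulate the leak: the same edge is inserted again without deleting the old row.
conn.execute("INSERT INTO edges VALUES (1, 2, 'calls')")
after = duplicate_signatures(conn)
```

Pointing the same helper at the real database after each `codegraph build` would show whether the count stays flat, assuming the real tables expose these column names.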
Expected behavior
Incremental rebuild of one changed file should only touch that file's edges. Edge count should be stable across repeated incrementals against the same changeset.
Actual behavior
Each incremental rebuild inserts a new batch of duplicate edges (+211 on the first rebuild and +249 on each later one in the reproduction above), most of them sourced from files that were not changed. The duplicates persist across subsequent rebuilds.
Root cause
Not fully diagnosed. The incremental flow appears to re-insert edges for files adjacent to the changed file (importers? same-directory resolution targets?) without de-duplicating against the existing row set. Likely in the native engine's edge-persistence path (codegraph-core Rust crate) given that native is the default engine and the benchmark script exercises it.
Suggested fix
Either:
- Delete edges sourced from the changed file (and any dependent files the pipeline decides to re-run) before re-inserting them, or
- Use INSERT OR IGNORE with a unique index on (source_id, target_id, kind), and audit whether confidence / dynamic columns should be part of the key.
Option 1 (delete, then re-insert) is more correct: edge properties like confidence can change between parses, and INSERT OR IGNORE would silently keep the old value, which is a stale-data bug.
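A minimal sketch of option 1, assuming a simplified schema with a `confidence` column on `edges`. The table layout, the `replace_edges_for_files` helper, and the file names are all hypothetical and only illustrate the delete-then-insert pattern; this is not the codegraph-core persistence code.

```python
import sqlite3


def replace_edges_for_files(conn, files, new_edges):
    """Delete every edge sourced from `files`, then insert `new_edges`, atomically."""
    placeholders = ",".join("?" for _ in files)
    with conn:  # single transaction: commits on success, rolls back on error
        conn.execute(
            "DELETE FROM edges WHERE source_id IN "
            f"(SELECT id FROM nodes WHERE file IN ({placeholders}))",
            list(files),
        )
        conn.executemany(
            "INSERT INTO edges (source_id, target_id, kind, confidence) "
            "VALUES (?, ?, ?, ?)",
            new_edges,
        )


# Toy schema (assumed) with one edge per file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, file TEXT);
CREATE TABLE edges (source_id INTEGER, target_id INTEGER, kind TEXT, confidence REAL);
INSERT INTO nodes VALUES (1, 'a.ts'), (2, 'b.ts');
INSERT INTO edges VALUES (1, 2, 'calls', 0.5), (2, 1, 'imports', 1.0);
""")

# Re-run the pipeline for a.ts three times; the re-parsed edge has a new confidence.
for _ in range(3):
    replace_edges_for_files(conn, ["a.ts"], [(1, 2, "calls", 0.9)])

edge_count = conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
confidence = conn.execute(
    "SELECT confidence FROM edges WHERE source_id = 1"
).fetchone()[0]
```

In this sketch the edge count stays at 2 across repeated rebuilds and the stored confidence follows the latest parse. The list passed as `files` would need to include every file the pipeline decides to re-run, not just the changed one.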
Impact
All fn-impact, fn-deps, stats, triage, map, roles results are wrong after any incremental rebuild. The longer a dev session runs without a full rebuild, the more inflated the numbers become. Watch-mode users are the most exposed.