
fix(incremental): prevent duplicate edges on rebuild (#979)#998

Merged
carlos-alm merged 5 commits into main from fix/incremental-edge-duplication-979
Apr 22, 2026

@carlos-alm
Contributor

Summary

  • Stage 6b (reparse_barrel_candidates in crates/codegraph-core/src/build_pipeline.rs) only purged imports and reexports edges before Stage 7 re-emitted every edge kind, so every incremental rebuild appended duplicate calls, receiver, extends, implements, imports-type, and dynamic-imports edges for any hybrid barrel file picked up via reverse-dep expansion.
  • Extended the scoped DELETE to cover all 8 edge kinds that Stage 7 re-emits. contains and parameter_of are intentionally excluded because Stage 5 (insert_nodes) does not re-run for barrel candidates merged in at Stage 6b — wiping them would permanently drop those edges.
  • Added a regression test and fixture that exercises a hybrid barrel (reexports AND local definitions that call helpers).
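The reason for excluding contains and parameter_of (second bullet) can be shown with a small toy model. It assumes, as the PR states, that Stage 5 emits the structural edges and does not re-run for barrel candidates merged in at Stage 6b. The types, edges, and function names below are hypothetical and only illustrate the ordering problem.

```typescript
// Hypothetical toy model; not codegraph's real types or stage functions.
type Kind = "contains" | "parameter_of" | "calls" | "imports";
interface Edge { file: string; kind: Kind; target: string }

// Stage 5 (insert_nodes) emits structural edges; Stage 7 emits the rest.
const stage5 = (file: string): Edge[] => [
  { file, kind: "contains", target: `${file}#processValue` },
  { file, kind: "parameter_of", target: `${file}#processValue` },
];
const stage7 = (file: string): Edge[] => [
  { file, kind: "calls", target: "clampValue" },
  { file, kind: "imports", target: "core/helpers.js" },
];

// A barrel candidate joins at Stage 6b, after Stage 5 has already run, so only
// the scoped purge and Stage 7 touch its edges. Nothing re-creates Stage 5 edges.
function reparseBarrelCandidate(edges: Edge[], file: string, purgeAll: boolean): Edge[] {
  const preserved: Kind[] = purgeAll ? [] : ["contains", "parameter_of"];
  const kept = edges.filter((e) => e.file !== file || preserved.includes(e.kind));
  return [...kept, ...stage7(file)];
}

const barrel = "core/index.js";
const fullBuild = [...stage5(barrel), ...stage7(barrel)];
const kindsOf = (edges: Edge[]): Kind[] => edges.map((e) => e.kind).sort();

console.log(kindsOf(reparseBarrelCandidate(fullBuild, barrel, false)));
// [ 'calls', 'contains', 'imports', 'parameter_of' ]
console.log(kindsOf(reparseBarrelCandidate(fullBuild, barrel, true)));
// [ 'calls', 'imports' ]  (structural edges are gone for good)
```

Purging every kind would look harmless for a file that also goes through Stage 5, but for a barrel candidate the second call shows the structural edges simply vanish.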

Root cause

DELETE FROM edges WHERE source_id IN (... file = ?1) AND kind IN ('imports', 'reexports') was too narrow. Stage 7 re-emits 8 kinds for re-parsed files, so the other 6 kinds leaked duplicates on every incremental rebuild (~250 new edges per run in real repos).
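To make the root cause concrete, here is a toy in-memory model of one hybrid barrel across three incremental rebuilds. Everything in it (the Edge shape, rebuildBarrel, the specific edges) is an illustrative assumption rather than codegraph's actual data model. The two purge predicates mirror the old kind IN ('imports', 'reexports') scope and the merged kind NOT IN ('contains', 'parameter_of') scope.

```typescript
// Illustrative model only: these types and names are hypothetical, not codegraph's.
type EdgeKind =
  | "contains" | "parameter_of"
  | "imports" | "imports-type" | "dynamic-imports" | "reexports"
  | "calls" | "receiver" | "extends" | "implements";

interface Edge {
  sourceFile: string; // file owning the edge's source node (the `file = ?1` scope)
  source: string;
  target: string;
  kind: EdgeKind;
}

const FILE = "core/index.js"; // the hybrid barrel

// What a Stage 7 re-parse of the barrel emits in this toy model.
const STAGE7_EDGES: Edge[] = [
  { sourceFile: FILE, source: FILE, target: "core/helpers.js", kind: "imports" },
  { sourceFile: FILE, source: FILE, target: "doubleValue", kind: "reexports" },
  { sourceFile: FILE, source: "processValue", target: "clampValue", kind: "calls" },
  { sourceFile: FILE, source: "processAll", target: "processValue", kind: "calls" },
];

// Full build: a structural edge from Stage 5 plus everything Stage 7 emits.
const FULL_BUILD: Edge[] = [
  { sourceFile: FILE, source: FILE, target: "processValue", kind: "contains" },
  ...STAGE7_EDGES,
];

// Old Stage 6b scope: kind IN ('imports', 'reexports').
const narrowPurge = (k: EdgeKind): boolean => k === "imports" || k === "reexports";
// Merged Stage 6b scope: kind NOT IN ('contains', 'parameter_of').
const broadPurge = (k: EdgeKind): boolean => k !== "contains" && k !== "parameter_of";

// One incremental rebuild of the barrel: scoped purge, then Stage 7 re-emission.
function rebuildBarrel(edges: Edge[], shouldPurge: (k: EdgeKind) => boolean): Edge[] {
  const kept = edges.filter((e) => !(e.sourceFile === FILE && shouldPurge(e.kind)));
  return [...kept, ...STAGE7_EDGES];
}

// Edge totals after each of `rebuilds` consecutive incremental rebuilds.
function edgeCounts(rebuilds: number, shouldPurge: (k: EdgeKind) => boolean): number[] {
  let edges = FULL_BUILD;
  const counts: number[] = [];
  for (let i = 0; i < rebuilds; i++) {
    edges = rebuildBarrel(edges, shouldPurge);
    counts.push(edges.length);
  }
  return counts;
}

console.log(edgeCounts(3, narrowPurge)); // [ 7, 9, 11 ]  (two stale `calls` rows per rebuild)
console.log(edgeCounts(3, broadPurge));  // [ 5, 5, 5 ]
```

Only the purge predicate differs between the two runs. With the narrow scope, each calls edge Stage 7 re-emits is appended next to the copy from the previous run, which is the growth pattern the PR reports, at toy scale.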

Before / After

| Run | Main (#979) | This PR |
|---|---|---|
| Full build | 28487 | 28487 |
| Incr. #1 | 28699 | 28445 |
| Incr. #2 | 28949 | 28445 |
| Incr. #3 | 29199 | 28445 |

Stable edge counts across consecutive incremental rebuilds; no new duplicates introduced beyond the pre-existing full-build baseline.
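As a quick check on the "~250 per run" figure, the per-run deltas can be computed from the table. The totals are copied from the table; deltas is a helper introduced only for this illustration.

```typescript
// Edge totals copied from the Before/After table: full build, then incremental #1-#3.
const mainRuns = [28487, 28699, 28949, 29199];
const fixRuns = [28487, 28445, 28445, 28445];

// Difference between each run's total and the previous run's total.
function deltas(runs: number[]): number[] {
  return runs.slice(1).map((total, i) => total - runs[i]);
}

console.log(deltas(mainRuns)); // [ 212, 250, 250 ]
console.log(deltas(fixRuns));  // [ -42, 0, 0 ]
```

On main, growth settles at 250 edges per rebuild after 212 on the first incremental run. With the fix, the total stays flat after the first incremental run.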

Test plan

  • New regression test tests/integration/incremental-edge-duplication.test.ts passes (fails on main to prove it catches the bug).
  • cargo test --lib in crates/codegraph-core: 176 passed.
  • npm run lint:fix: clean.
  • Integration suite for the rest of the repo unaffected.

Fixes #979

Stage 6b (reparse_barrel_candidates) purged only 'imports' and 'reexports'
edges for re-parsed barrel files before Stage 7 re-emitted every edge kind.
As a result, every incremental rebuild appended duplicate 'calls',
'receiver', 'extends', 'implements', 'imports-type', and 'dynamic-imports'
edges for any hybrid barrel file picked up via reverse-dep expansion,
causing the edge count to grow by ~250 per rebuild.

Extend the scoped DELETE to cover all 8 edge kinds that Stage 7 re-emits.
'contains' and 'parameter_of' are intentionally excluded because Stage 5
(insert_nodes) only runs on the original changed + reverse-dep file set,
not on barrel candidates that are merged in here after Stage 5 — wiping
them would permanently drop those edges.

Add a regression test (tests/integration/incremental-edge-duplication.test.ts)
that runs three consecutive incremental rebuilds against a hybrid-barrel
fixture (reexports AND local definitions that call helpers) and asserts
the edge count is stable and no new duplicates are introduced beyond the
pre-existing full-build baseline.

Before: 28487 -> 28699 -> 28949 -> 29199 (growing ~250/rebuild)
After:  28487 -> 28445 -> 28445 -> 28445 (stable)

Fixes #979

Impact: 7 functions changed, 5 affected
@claude

claude Bot commented Apr 21, 2026

Claude finished @carlos-alm's task.


PR Review: Fix Incremental Edge Duplication

Reviewing PR #998, the fix for the incremental edge duplication reported in #979. Following the 5-phase review process:

Todo List

  • Phase 1: Root cause analysis - understand the problem and approach
  • Phase 2: Configuration change detection - check for rule relaxations
  • Phase 3: Test integrity - verify test quality and assertions
  • Phase 4: Code quality - review implementation details
  • Phase 5: Backlog compliance - rate against zero-dep, foundation-aligned, problem-fit, breaking criteria
  • Final Assessment: Provide rating and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: Stage 6b (reparse_barrel_candidates) in the Rust native engine was only purging imports and reexports edges before Stage 7 re-emitted all 8 edge kinds, causing incremental rebuilds to leak duplicate edges for calls, receiver, extends, implements, imports-type, and dynamic-imports for any hybrid barrel file picked up via reverse-dep expansion.

Why This Approach: The fix extends the scoped DELETE to cover all 8 edge kinds that Stage 7 re-emits while intentionally excluding contains and parameter_of since Stage 5 (insert_nodes) doesn't re-run for barrel candidates merged at Stage 6b. This is the correct approach - surgical targeting of the exact edge kinds being re-emitted.

Risk Assessment: Low risk. The change is well-isolated to the specific issue, maintains the same exclusion logic for edges that shouldn't be purged, and includes comprehensive regression testing.

Phase 2: Configuration Change Detection ✅

No configuration files modified. No lint rules relaxed, no test timeouts increased, no build configuration changes. This is purely a bug fix in application logic.

Phase 3: Test Integrity ✅

Regression Test Quality: Excellent. The test at tests/integration/incremental-edge-duplication.test.ts:

  • Creates a proper hybrid barrel scenario (re-exports AND local function calls)
  • Tests the exact failure mode (3 consecutive incremental rebuilds)
  • Asserts both edge count stability AND that no new duplicates are introduced beyond baseline
  • Uses isolated temporary directories to avoid cross-contamination
  • 60s timeout is appropriate for integration test with multiple builds

No Test Weakening: All assertions are strict equality/bounds checks. No existing tests were modified.
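The invariants the merged test enforces (the third was added later in review) can be sketched as a pure function over per-rebuild snapshots. This assumes a duplicate is any extra row sharing a (source, target, kind) triple with an earlier row, which matches how the review thread describes the check. snapshot and checkInvariants are hypothetical helpers, not the actual test code.

```typescript
// Hypothetical helpers; not the code in incremental-edge-duplication.test.ts.
interface Edge { source: number; target: number; kind: string }
interface Snapshot { total: number; duplicates: number }

// duplicates = rows beyond the first for each (source, target, kind) triple.
function snapshot(edges: Edge[]): Snapshot {
  const triples = new Set(edges.map((e) => `${e.source}|${e.target}|${e.kind}`));
  return { total: edges.length, duplicates: edges.length - triples.size };
}

// Returns the violated invariants; an empty array means all three hold.
function checkInvariants(history: Snapshot[], freshFull: Snapshot): string[] {
  const failures: string[] = [];
  // Invariant 1: totals stay flat across consecutive incremental rebuilds.
  if (history.some((s) => s.total !== history[0].total)) failures.push("inv1");
  // Invariant 2: no rebuild has more duplicates than a clean full build.
  if (history.some((s) => s.duplicates > freshFull.duplicates)) failures.push("inv2");
  // Invariant 3: the final total equals a clean full build of the same code.
  if (history[history.length - 1].total !== freshFull.total) failures.push("inv3");
  return failures;
}

const clean: Edge[] = [
  { source: 1, target: 2, kind: "calls" },
  { source: 1, target: 3, kind: "imports" },
];
const leaky: Edge[] = [...clean, { source: 1, target: 2, kind: "calls" }];

console.log(snapshot(leaky)); // { total: 3, duplicates: 1 }
console.log(checkInvariants([snapshot(clean), snapshot(clean)], snapshot(clean))); // []
console.log(checkInvariants([snapshot(clean), snapshot(leaky)], snapshot(clean))); // [ 'inv1', 'inv2', 'inv3' ]
```

Unlike the merged test, this sketch checks Invariant 2 on every rebuild rather than only the last one, which is the gap Greptile flags as a minor P2 finding.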

Phase 4: Code Quality ✅

Implementation Analysis (build_pipeline.rs:667-671):

✅ Correct Edge Kind Coverage: The DELETE now covers exactly the 8 kinds Stage 7 re-emits:

  • Old: 'imports', 'reexports' (2/8 kinds)
  • New: 'imports', 'imports-type', 'dynamic-imports', 'reexports', 'calls', 'receiver', 'extends', 'implements' (8/8 kinds)

✅ Proper Exclusions: Correctly excludes contains and parameter_of since Stage 5 doesn't re-run for barrel candidates.

✅ Clear Documentation: Excellent inline comments explaining the reasoning and which edges are intentionally excluded and why.

✅ SQL Safety: Parameterized query prevents injection. The subquery pattern matches existing codebase patterns.

Phase 5: Backlog Compliance ✅

Zero-dep: ✅ No new runtime dependencies added
Foundation-aligned: ✅ Fixes data accuracy without changing core architecture
Problem-fit: 5/5. Directly prevents edge count inflation that would impact AI agent token usage
Breaking: No - Internal fix with no API or output changes

Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Root Cause Analysis

Problem Being Solved: Incremental rebuilds leaked duplicate edges due to insufficient edge purging in Stage 6b barrel re-parsing
Why This Approach: Surgical expansion of DELETE scope to match Stage 7's re-emission behavior
Risk Assessment: Low risk - isolated fix with comprehensive regression testing

Backlog Compliance

  • Zero-dep: ✅ — no new dependencies
  • Foundation-aligned: ✅ — improves data accuracy without architectural changes
  • Problem-fit: 5 — directly prevents token waste from inflated edge counts
  • Breaking: No
  • Tier: 1

Final Recommendation

  • Rating: ⭐⭐⭐⭐⭐ (5/5)
  • Action: APPROVE
  • Reasoning: Exemplary bug fix - precise root cause identification, surgical fix, excellent test coverage, and clear documentation. The fix directly addresses edge count inflation that impacts AI agent performance.

@greptile-apps
Contributor

greptile-apps Bot commented Apr 21, 2026

Greptile Summary

This PR fixes incremental rebuild edge duplication (#979) by broadening Stage 6b's DELETE from a positive allowlist of two edge kinds to a negative filter (NOT IN ('contains', 'parameter_of')), ensuring all edge kinds re-emitted by Stage 7 are purged before barrel candidates are re-parsed. A new regression test and hybrid-barrel fixture validate three invariants: count stability across consecutive rebuilds, no new duplicates beyond a fresh full build, and total parity with a clean full build after all mutations.

Confidence Score: 5/5

Safe to merge — core fix is correct and well-tested; only finding is a minor test-coverage gap.

The Rust change is a targeted, well-reasoned one-liner that addresses previous review feedback (negative filter over allowlist). All three prior P0/P1 concerns from earlier rounds are resolved. The single remaining comment is a P2 style suggestion to extend Invariant 2 to all three rebuilds; Invariants 1 and 3 together already make the practical risk negligible.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| crates/codegraph-core/src/build_pipeline.rs | Switches Stage 6b DELETE from a brittle positive allowlist to a safe negative filter (NOT IN ('contains', 'parameter_of')), preventing duplicate edge accumulation for hybrid barrels on incremental rebuilds. Comment updated to accurately describe hybrid barrel behaviour. |
| tests/integration/incremental-edge-duplication.test.ts | Solid regression test with three complementary invariants. Minor gap: Invariant 2 only checks history[2].duplicates; intermediate rebuilds are not verified for duplicate spikes. |
| tests/fixtures/issue-979-hybrid-barrel/core/index.js | Purpose-built hybrid barrel fixture: simultaneously re-exports doubleValue from helpers and imports + uses both helpers in local function bodies, faithfully exercising the bug scenario. |
| tests/fixtures/issue-979-hybrid-barrel/app.js | Simple entry-point fixture that imports from driver.js, providing the reverse-dep chain needed to trigger barrel candidate re-parse. |
| tests/fixtures/issue-979-hybrid-barrel/consumers/driver.js | Consumer fixture that imports from the hybrid barrel; the test mutates this file to trigger incremental rebuilds. |
| tests/fixtures/issue-979-hybrid-barrel/core/helpers.js | Helper utilities exported and called by the hybrid barrel, completing the fixture's call-edge coverage. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    S5["Stage 5: insert_nodes\n(runs on changed + reverse-deps)\nEmits: contains, parameter_of"]
    S6b["Stage 6b: reparse_barrel_candidates\nIdentifies hybrid barrel files\nvia reverse-dep expansion"]
    DEL["DELETE FROM edges\nWHERE source_id IN file nodes\nAND kind NOT IN\n('contains','parameter_of')\n✅ Clears: calls, receiver, extends,\nimplements, imports, reexports,\nimports-type, dynamic-imports"]
    KEEP["Preserved edges:\ncontains, parameter_of\n(emitted by Stage 5, not re-run\nfor barrel candidates)"]
    S7["Stage 7: insert_edges\nRe-emits ALL outgoing edge kinds\nfor re-parsed barrel files"]
    STABLE["Stable edge count\nacross incremental rebuilds"]

    S5 --> S6b
    S6b --> DEL
    DEL --> KEEP
    DEL --> S7
    S7 --> STABLE

    style DEL fill:#d4edda,stroke:#28a745
    style KEEP fill:#fff3cd,stroke:#ffc107
    style STABLE fill:#cce5ff,stroke:#004085


Reviews (4): Last reviewed commit: "Merge branch 'main' into fix/incremental..."

Comment on lines +668 to +670
  "DELETE FROM edges WHERE source_id IN (SELECT id FROM nodes WHERE file = ?1) \
- AND kind IN ('imports', 'reexports')",
+ AND kind IN ('imports', 'imports-type', 'dynamic-imports', 'reexports', \
+ 'calls', 'receiver', 'extends', 'implements')",
Contributor


P2: Positive allowlist is brittle; prefer a negative filter

The IN (...) list must be kept in sync with every edge kind Stage 7 emits. If a new kind (e.g. type-reference, uses) is added to Stage 7 later, it will silently accumulate duplicates again for barrel candidates, reproducing the exact class of bug this PR fixes.

Since only two kinds must be preserved (contains and parameter_of), inverting the filter is safer and self-documenting:

Suggested change
- "DELETE FROM edges WHERE source_id IN (SELECT id FROM nodes WHERE file = ?1) \
- AND kind IN ('imports', 'imports-type', 'dynamic-imports', 'reexports', \
- 'calls', 'receiver', 'extends', 'implements')",
+ "DELETE FROM edges WHERE source_id IN (SELECT id FROM nodes WHERE file = ?1) \
+ AND kind NOT IN ('contains', 'parameter_of')",
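The brittleness argument can be checked directly: given the kinds Stage 7 emits, compute which ones each purge scope would leave behind. The eight current kinds come from this PR; "uses" stands in for a hypothetical future kind, as in the comment above, and the helper names are illustrative.

```typescript
// Kinds Stage 7 re-emits today (per this PR), plus one hypothetical future kind.
const TODAY = ["imports", "imports-type", "dynamic-imports", "reexports",
  "calls", "receiver", "extends", "implements"];
const FUTURE = [...TODAY, "uses"]; // "uses" is hypothetical

const allowlist = new Set(TODAY);                        // kind IN (...)
const preserved = new Set(["contains", "parameter_of"]); // kind NOT IN (...)

const purgedByAllowlist = (kind: string): boolean => allowlist.has(kind);
const purgedByNegative = (kind: string): boolean => !preserved.has(kind);

// Stage 7 kinds the purge leaves in place, i.e. kinds that duplicate on every rebuild.
function leakedKinds(stage7Kinds: string[], purged: (k: string) => boolean): string[] {
  return stage7Kinds.filter((k) => !purged(k));
}

console.log(leakedKinds(TODAY, purgedByAllowlist));  // []
console.log(leakedKinds(FUTURE, purgedByAllowlist)); // [ 'uses' ]
console.log(leakedKinds(FUTURE, purgedByNegative));  // []
```

Both scopes are equivalent for today's kinds; they only diverge once Stage 7 gains a kind the allowlist does not name.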


Contributor Author


Good catch. Switched Stage 6b's DELETE to a negative filter (AND kind NOT IN ('contains', 'parameter_of')) in 19f7651 so any future Stage 7 edge kind is automatically purged on barrel re-parse. The two preserved kinds are the only ones emitted by Stage 5 that must survive, so this is both safer and self-documenting.

Comment on lines +82 to +88
// Invariant 1: incremental edge count must not grow across rebuilds.
expect(history[1].total).toBe(history[0].total);
expect(history[2].total).toBe(history[0].total);

// Invariant 2: incremental must not introduce new duplicates beyond the
// pre-existing duplicates present in a clean full build.
expect(history[2].duplicates).toBeLessThanOrEqual(freshFull.duplicates);
Contributor


P2: Test does not assert total-count parity with a fresh full build

Invariant 1 proves the count stabilises across rebuilds 2 and 3 relative to rebuild 1, but does not assert that rebuild 1 (history[0]) produces the same total edge count as a clean full build over the same code state. A scenario where the first incremental rebuild leaves stale edges that are not flagged as (source, target, kind) duplicates (e.g. edges pointing at a stale node id) would pass both invariants. Adding the following assertion would close the gap:

// After applying all 3 bumps, both dirs have the same code — edge totals should match.
expect(history[2].total).toBe(freshFull.total);
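A toy version of the gap described above: a stale edge that points at an orphaned node id is not a (source, target, kind) duplicate, so Invariant 2 passes, but the total no longer matches a clean build. The ids and the re-insertion scenario are hypothetical; in the real test the two builds live in separate directories, so only totals and duplicate counts are compared, never ids.

```typescript
// Hypothetical edges; ids are illustrative only.
interface Edge { source: number; target: number; kind: string }

// Rows beyond the first for each (source, target, kind) triple.
function duplicateCount(edges: Edge[]): number {
  return edges.length - new Set(edges.map((e) => `${e.source}|${e.target}|${e.kind}`)).size;
}

// Scenario: node 7 was deleted and re-inserted as node 12 during an incremental
// rebuild. The new edge is emitted, but the old one targeting id 7 survives.
const incremental: Edge[] = [
  { source: 1, target: 12, kind: "calls" },
  { source: 1, target: 7, kind: "calls" }, // stale: id 7 no longer exists
];
const freshFull: Edge[] = [{ source: 1, target: 12, kind: "calls" }];

const invariant2Holds = duplicateCount(incremental) <= duplicateCount(freshFull);
const invariant3Holds = incremental.length === freshFull.length;

console.log(invariant2Holds); // true: the stale row is not a duplicate triple
console.log(invariant3Holds); // false: the total-parity check catches it
```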


Contributor Author


Added in 0093a02 as Invariant 3: expect(history[2].total).toBe(freshFull.total). This closes the gap where stale edges pointing at orphaned node ids would slip past the (source, target, kind) duplicate check. Test still passes locally.

@github-actions
Contributor

github-actions Bot commented Apr 21, 2026

Codegraph Impact Analysis

7 functions changed, 5 callers affected across 4 files

  • reparse_barrel_candidates in crates/codegraph-core/src/build_pipeline.rs:586 (1 transitive callers)
  • main in tests/fixtures/issue-979-hybrid-barrel/app.js:3 (0 transitive callers)
  • run in tests/fixtures/issue-979-hybrid-barrel/consumers/driver.js:3 (1 transitive callers)
  • clampValue in tests/fixtures/issue-979-hybrid-barrel/core/helpers.js:1 (3 transitive callers)
  • doubleValue in tests/fixtures/issue-979-hybrid-barrel/core/helpers.js:7 (3 transitive callers)
  • processValue in tests/fixtures/issue-979-hybrid-barrel/core/index.js:6 (3 transitive callers)
  • processAll in tests/fixtures/issue-979-hybrid-barrel/core/index.js:10 (2 transitive callers)

…998)

Replace the positive allowlist in Stage 6b's barrel-candidate DELETE with
a `NOT IN ('contains', 'parameter_of')` filter so any future edge kind
added to Stage 7 is automatically purged on re-parse. Previously, adding
a new kind to Stage 7 would silently reintroduce the duplicate-edge
accumulation bug this PR fixes (#979).

Also refresh the adjacent comment: "barrel files are re-export-only" was
contradicted by the hybrid-barrel fixture this PR adds.

Impact: 1 functions changed, 1 affected
…uild (#998)

Add Invariant 3 to the #979 regression test: after the 3 incremental
bumps, the edge total must equal a clean full build over the same code
state. The previous invariants only asserted stability across rebuilds
and a duplicate-free (source, target, kind) set, which would miss stale
edges pointing at orphaned node ids that escape the scoped DELETE.
@carlos-alm
Contributor Author

@greptileai Thanks for the thorough review. All three findings addressed:

  1. Positive allowlist (build_pipeline.rs:668) — switched to AND kind NOT IN ('contains', 'parameter_of') in 19f7651. Future Stage 7 edge kinds are now automatically purged.
  2. Stale re-export-only comment (build_pipeline.rs:644) — rewritten in 19f7651 to describe hybrid barrels (reexports AND local call sites) and clarify why dataflow/AST analysis is still skipped.
  3. Test parity gap (incremental-edge-duplication.test.ts) — added Invariant 3: expect(history[2].total).toBe(freshFull.total) in 0093a02 so stale edges pointing at orphaned node ids can't slip through.

Ran npx vitest run tests/integration (565 passed) and cargo test --lib (176 passed) locally. Ready for re-review.

@carlos-alm carlos-alm merged commit b075f5a into main Apr 22, 2026
27 checks passed
@carlos-alm carlos-alm deleted the fix/incremental-edge-duplication-979 branch April 22, 2026 07:50
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 22, 2026

Development

Successfully merging this pull request may close these issues.

bug: incremental rebuild leaks ~249 duplicate edges per run
