fix(incremental): prevent duplicate edges on rebuild (#979)#998
fix(incremental): prevent duplicate edges on rebuild (#979)#998carlos-alm merged 5 commits intomainfrom
Conversation
Stage 6b (reparse_barrel_candidates) purged only 'imports' and 'reexports' edges for re-parsed barrel files before Stage 7 re-emitted every edge kind. As a result, every incremental rebuild appended duplicate 'calls', 'receiver', 'extends', 'implements', 'imports-type', and 'dynamic-imports' edges for any hybrid barrel file picked up via reverse-dep expansion, causing the edge count to grow by ~250 per rebuild. Extend the scoped DELETE to cover all 8 edge kinds that Stage 7 re-emits. 'contains' and 'parameter_of' are intentionally excluded because Stage 5 (insert_nodes) only runs on the original changed + reverse-dep file set, not on barrel candidates that are merged in here after Stage 5 — wiping them would permanently drop those edges. Add a regression test (tests/integration/incremental-edge-duplication.test.ts) that runs three consecutive incremental rebuilds against a hybrid-barrel fixture (reexports AND local definitions that call helpers) and asserts the edge count is stable and no new duplicates are introduced beyond the pre-existing full-build baseline. Before: 28487 -> 28699 -> 28949 -> 29199 (growing ~250/rebuild) After: 28487 -> 28445 -> 28445 -> 28445 (stable) Fixes #979 Impact: 7 functions changed, 5 affected
|
Claude finished @carlos-alm's task —— View job PR Review: Fix Incremental Edge DuplicationReviewing PR #979 for incremental edge duplication fix. Following the 5-phase review process: Todo List
Phase 1: Root Cause Analysis ✅Problem Being Solved: Stage 6b ( Why This Approach: The fix extends the scoped DELETE to cover all 8 edge kinds that Stage 7 re-emits while intentionally excluding Risk Assessment: Low risk. The change is well-isolated to the specific issue, maintains the same exclusion logic for edges that shouldn't be purged, and includes comprehensive regression testing. Phase 2: Configuration Change Detection ✅No configuration files modified. No lint rules relaxed, no test timeouts increased, no build configuration changes. This is purely a bug fix in application logic. Phase 3: Test Integrity ✅Regression Test Quality: Excellent. The test at
No Test Weakening: All assertions are strict equality/bounds checks. No existing tests were modified. Phase 4: Code Quality ✅Implementation Analysis ( ✅ Correct Edge Kind Coverage: The DELETE now covers exactly the 8 kinds Stage 7 re-emits:
✅ Proper Exclusions: Correctly excludes ✅ Clear Documentation: Excellent inline comments explaining the reasoning and which edges are intentionally excluded and why. ✅ SQL Safety: Parameterized query prevents injection. The subquery pattern matches existing codebase patterns. Phase 5: Backlog Compliance ✅Zero-dep: ✅ No new runtime dependencies added Tier: 1 (zero-dep ✓ + foundation-aligned ✓) Root Cause AnalysisProblem Being Solved: Incremental rebuilds leaked duplicate edges due to insufficient edge purging in Stage 6b barrel re-parsing Backlog Compliance
Final Recommendation
|
Greptile SummaryThis PR fixes incremental rebuild edge duplication (#979) by broadening Stage 6b's Confidence Score: 5/5Safe to merge — core fix is correct and well-tested; only finding is a minor test-coverage gap. The Rust change is a targeted, well-reasoned one-liner that addresses previous review feedback (negative filter over allowlist). All three prior P0/P1 concerns from earlier rounds are resolved. The single remaining comment is a P2 style suggestion to extend Invariant 2 to all three rebuilds; Invariants 1 and 3 together already make the practical risk negligible. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
S5["Stage 5: insert_nodes\n(runs on changed + reverse-deps)\nEmits: contains, parameter_of"]
S6b["Stage 6b: reparse_barrel_candidates\nIdentifies hybrid barrel files\nvia reverse-dep expansion"]
DEL["DELETE FROM edges\nWHERE source_id IN file nodes\nAND kind NOT IN\n('contains','parameter_of')\n✅ Clears: calls, receiver, extends,\nimplements, imports, reexports,\nimports-type, dynamic-imports"]
KEEP["Preserved edges:\ncontains, parameter_of\n(emitted by Stage 5, not re-run\nfor barrel candidates)"]
S7["Stage 7: insert_edges\nRe-emits ALL outgoing edge kinds\nfor re-parsed barrel files"]
STABLE["Stable edge count\nacross incremental rebuilds"]
S5 --> S6b
S6b --> DEL
DEL --> KEEP
DEL --> S7
S7 --> STABLE
style DEL fill:#d4edda,stroke:#28a745
style KEEP fill:#fff3cd,stroke:#ffc107
style STABLE fill:#cce5ff,stroke:#004085
Reviews (4): Last reviewed commit: "Merge branch 'main' into fix/incremental..." | Re-trigger Greptile |
| "DELETE FROM edges WHERE source_id IN (SELECT id FROM nodes WHERE file = ?1) \ | ||
| AND kind IN ('imports', 'reexports')", | ||
| AND kind IN ('imports', 'imports-type', 'dynamic-imports', 'reexports', \ | ||
| 'calls', 'receiver', 'extends', 'implements')", |
There was a problem hiding this comment.
Positive allowlist is brittle — prefer a negative filter
The IN (...) list must be kept in sync with every edge kind Stage 7 emits. If a new kind (e.g. type-reference, uses) is added to Stage 7 later, it will silently accumulate duplicates again for barrel candidates, reproducing the exact class of bug this PR fixes.
Since only two kinds must be preserved (contains and parameter_of), inverting the filter is safer and self-documenting:
| "DELETE FROM edges WHERE source_id IN (SELECT id FROM nodes WHERE file = ?1) \ | |
| AND kind IN ('imports', 'reexports')", | |
| AND kind IN ('imports', 'imports-type', 'dynamic-imports', 'reexports', \ | |
| 'calls', 'receiver', 'extends', 'implements')", | |
| "DELETE FROM edges WHERE source_id IN (SELECT id FROM nodes WHERE file = ?1) \ | |
| AND kind NOT IN ('contains', 'parameter_of')", |
There was a problem hiding this comment.
Good catch — switched Stage 6b's DELETE to a negative filter (AND kind NOT IN ('contains', 'parameter_of')) in 19f7651 so any future Stage 7 edge kind is automatically purged on barrel re-parse. The two preserved kinds are the only ones emitted by Stage 5 that must survive, so this is both safer and self-documenting.
| // Invariant 1: incremental edge count must not grow across rebuilds. | ||
| expect(history[1].total).toBe(history[0].total); | ||
| expect(history[2].total).toBe(history[0].total); | ||
|
|
||
| // Invariant 2: incremental must not introduce new duplicates beyond the | ||
| // pre-existing duplicates present in a clean full build. | ||
| expect(history[2].duplicates).toBeLessThanOrEqual(freshFull.duplicates); |
There was a problem hiding this comment.
Test does not assert total-count parity with a fresh full build
Invariant 1 proves the count stabilises across rebuild 2 and 3 relative to rebuild 1, but does not assert that rebuild 1 (history[0]) produces the same total edge count as a clean full build over the same code state. A scenario where the first incremental rebuild leaves stale edges that happen to not be flagged as (source, target, kind) duplicates (e.g. edges pointing at a stale node id) would pass both invariants. Adding the following assertion would close the gap:
// After applying all 3 bumps, both dirs have the same code — edge totals should match.
expect(history[2].total).toBe(freshFull.total);There was a problem hiding this comment.
Added in 0093a02 as Invariant 3: expect(history[2].total).toBe(freshFull.total). This closes the gap where stale edges pointing at orphaned node ids would slip past the (source, target, kind) duplicate check. Test still passes locally.
Codegraph Impact Analysis7 functions changed → 5 callers affected across 4 files
|
…998) Replace the positive allowlist in Stage 6b's barrel-candidate DELETE with a `NOT IN ('contains', 'parameter_of')` filter so any future edge kind added to Stage 7 is automatically purged on re-parse. Previously, adding a new kind to Stage 7 would silently reintroduce the duplicate-edge accumulation bug this PR fixes (#979). Also refresh the adjacent comment: "barrel files are re-export-only" was contradicted by the hybrid-barrel fixture this PR adds. Impact: 1 functions changed, 1 affected
…uild (#998) Add Invariant 3 to the #979 regression test: after the 3 incremental bumps, the edge total must equal a clean full build over the same code state. The previous invariants only asserted stability across rebuilds and a duplicate-free (source, target, kind) set, which would miss stale edges pointing at orphaned node ids that escape the scoped DELETE.
|
@greptileai Thanks for the thorough review. All three findings addressed:
|
Summary
reparse_barrel_candidatesincrates/codegraph-core/src/build_pipeline.rs) only purgedimportsandreexportsedges before Stage 7 re-emitted every edge kind, so every incremental rebuild appended duplicatecalls,receiver,extends,implements,imports-type, anddynamic-importsedges for any hybrid barrel file picked up via reverse-dep expansion.DELETEto cover all 8 edge kinds that Stage 7 re-emits.containsandparameter_ofare intentionally excluded because Stage 5 (insert_nodes) does not re-run for barrel candidates merged in at Stage 6b — wiping them would permanently drop those edges.Root cause
DELETE FROM edges WHERE source_id IN (... file = ?1) AND kind IN ('imports', 'reexports')was too narrow. Stage 7 re-emits 8 kinds for re-parsed files, so the other 6 kinds leaked duplicates on every incremental rebuild (~250 new edges per run in real repos).Before / After
Stable edge counts across consecutive incremental rebuilds; no new duplicates introduced beyond the pre-existing full-build baseline.
Test plan
tests/integration/incremental-edge-duplication.test.tspasses (fails on main to prove it catches the bug).cargo test --libincrates/codegraph-core: 176 passed.npm run lint:fix: clean.Fixes #979