fix(native): purge stale rows when WASM-only files are deleted#1122
fix(native): purge stale rows when WASM-only files are deleted#1122carlos-alm wants to merge 6 commits into
Conversation
Closes #1073 When a file in a WASM-only language is deleted from disk, neither engine cleans up its DB rows: Rust's detect_removed_files (#1070) skips files outside is_supported_extension, and the JS-side backfill only inserts. Add computeWasmOnlyStaleFiles to detect (existingNodes ∪ existingHashes) not on disk, filtered to extensions installed for WASM but absent from NATIVE_SUPPORTED_EXTENSIONS so Rust still owns its own purge path. Wire it into backfillNativeDroppedFiles to call purgeFilesData after the better-sqlite3 handoff. Unit tests cover the helper. docs check acknowledged — internal pipeline fix, no doc-visible surface.
|
Claude finished @carlos-alm's task —— View job Comprehensive PR ReviewTodo List:
Root Cause AnalysisProblem Being Solved: When WASM-only files are deleted from disk, their rows persist in the database indefinitely. This happens because:
Why This Approach: The fix is correctly scoped - it only handles purges for extensions with installed WASM grammars that are NOT in Risk Assessment: Low risk. The approach is conservative and well-isolated. Currently, all registered languages have Rust extractors (per PR description), so the bug surface is empty today. The fix protects against future WASM-only languages entering the registry before their native ports. Code Quality Assessment⭐ Excellent Implementation QualityStrengths:
Technical Excellence:
Configuration & Breaking Changes
Test Integrity
Backlog Compliance
Minor ObservationsPositive Notes:
No Issues Found:
Final AssessmentThis PR represents high-quality defensive programming that directly supports the tool's core mission of providing accurate dependency information to AI agents. |
Greptile SummaryThis PR fixes a gap where WASM-only files deleted from disk left stale
Confidence Score: 5/5The change is narrowly scoped to WASM-only extensions and does not touch the native purge path; the only anomaly is an unreachable duplicate return statement left from refactoring. The core logic is correct and well-tested. The back-slash normalisation fix is properly applied ( The duplicate early-return in Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[tryNativeOrchestrator] --> B[detectDroppedLanguageGap]
B --> C[Collect existingNodes & existingHashes from DB]
C --> D[Build expected set from disk files]
D --> E[Compute missingRel / missingAbs]
D --> F[computeWasmOnlyStaleFiles]
E --> G{missingAbs OR staleRel non-empty?}
F --> G
G -- No --> H[Return]
G -- Yes --> I[backfillNativeDroppedFiles]
I --> J{staleRel non-empty?}
J -- Yes --> K[purgeFilesData DELETE rows]
J -- No --> L{missingAbs non-empty?}
K --> L
L -- No --> M[Return]
L -- Yes --> N[WASM backfill + batchInsertNodes]
Reviews (7): Last reviewed commit: "fix(stale-purge): preserve raw path so D..." | Re-trigger Greptile |
| const { byReason: staleByReason, totals: staleTotals } = classifyNativeDrops(staleRel); | ||
| info( | ||
| `Detected ${staleRel.length} deleted WASM-only file(s) the native orchestrator skipped; purging stale rows: ${formatDropExtensionSummary(staleByReason['unsupported-by-native'])}`, | ||
| ); | ||
| // staleRel is restricted above to extensions outside NATIVE_SUPPORTED_EXTENSIONS, | ||
| // so the native-extractor-failure bucket should always be empty here. | ||
| if (staleTotals['native-extractor-failure'] > 0) { | ||
| debug( | ||
| `backfillNativeDroppedFiles: stale-purge classified ${staleTotals['native-extractor-failure']} native-supported file(s) — unexpected; inspect the filter`, | ||
| ); | ||
| } | ||
| purgeFilesData(dbConn, staleRel); |
There was a problem hiding this comment.
classifyNativeDrops call is redundant for the stale-purge path
computeWasmOnlyStaleFiles already guarantees every path in staleRel has an extension outside NATIVE_SUPPORTED_EXTENSIONS, so classifyNativeDrops will always put 100 % of staleRel into byReason['unsupported-by-native'] and staleTotals['native-extractor-failure'] can never be > 0. The defensive debug log is harmless, but the full classification pass (including building the Map<string, string[]> buckets) is unused except to feed formatDropExtensionSummary. Consider building the extension summary directly from staleRel — or at minimum remove the staleTotals['native-extractor-failure'] guard block, since its impossible-condition silently contradicts the invariant comment above it.
There was a problem hiding this comment.
Fixed in 8af09cc. Replaced the classifyNativeDrops call with a dedicated groupByExtension helper that builds the Map<string, string[]> summary directly from staleRel. Also dropped the unreachable native-extractor-failure guard block — the comment above already documents the invariant, no need for impossible-condition debug logging contradicting it.
| const consider = (rel: string): void => { | ||
| if (expected.has(rel) || seen.has(rel)) return; |
There was a problem hiding this comment.
expected.has(rel) uses raw DB paths against normalised on-disk paths
expected is built with normalizePath(path.relative(ctx.rootDir, f)), while existingNodes and existingHashes are raw file column values from the DB. On Windows, a DB row stored with back-slashes (src\a.gleam) would fail the expected.has(rel) check even when the file still exists on disk, causing a false-positive stale detection and an unintended purge of live rows. The same path-normalisation mismatch already exists in the missingRel loop above (where it causes under-detection rather than over-deletion), so this is a pre-existing assumption — but the consequence is worse on the new purge path. Worth adding a normalizePath call on rel inside consider if Windows support is ever on the table, or a comment documenting the invariant that DB paths are always forward-slash-normalised.
There was a problem hiding this comment.
Fixed in 8af09cc. consider() now applies rawRel.replace(/\\/g, '/') before comparing against expected (which is already forward-slash-normalised), so a stale DB row carrying back-slashes — e.g. one migrated from a Windows-built DB — no longer triggers a false-positive purge of a live file. I used an explicit regex replace rather than normalizePath because normalizePath only touches path.sep, so it would be a no-op on POSIX against a Windows-flavoured string. Added a regression test in wasm-only-stale-files.test.ts for the back-slash case.
Codegraph Impact Analysis12 functions changed → 9 callers affected across 5 files
|
…tching (#1122) Address Greptile review on the WASM-only stale purge: - Replace classifyNativeDrops + unreachable native-extractor-failure guard in the stale-purge branch with a direct groupByExtension helper. The paths returned by computeWasmOnlyStaleFiles are guaranteed to be outside NATIVE_SUPPORTED_EXTENSIONS, so the classification pass was always a no-op for the unreachable bucket. - Normalise back-slashes to forward-slashes inside consider() before comparing against the expected set. Defends against a stale DB row (e.g. migrated from a Windows-built DB) being treated as missing and purged even when the file still exists on disk. Replace `\\` explicitly rather than calling normalizePath so the defence works on POSIX too. - Add a regression test for the back-slash path-matching case.
| seen.add(rel); | ||
| stale.push(rel); |
There was a problem hiding this comment.
stale.push(rel) stores the forward-slash-normalised path, but purgeFilesData executes DELETE FROM nodes WHERE file = ? with that value. If the DB row was stored with back-slashes (the Windows-migration case the comment describes), the SQL predicate won't match and the DELETE hits 0 rows — the stale row silently persists, which is exactly the regression this PR is trying to fix.
The dedup key should remain the normalised form (so src/a.gleam and src\a.gleam are still treated as one file), but the value pushed into stale must be the original rawRel so the later WHERE file = ? matches the actual stored path. The existing back-slash test only covers the "file still on disk" path; adding a counterpart with an empty expected set would catch this.
| seen.add(rel); | |
| stale.push(rel); | |
| seen.add(rel); | |
| stale.push(rawRel); |
There was a problem hiding this comment.
Fixed in 4c22410. consider() now pushes rawRel (not the normalised rel) into the stale list, so purgeFilesData's DELETE FROM nodes WHERE file = ? is byte-identical to the stored row. The dedup seen set still uses the normalised form so a path written once with \ and once with / is treated as one entry. Added the counterpart regression test (preserves back-slash form so DELETE matches the actual DB row) covering a stale back-slash row with an empty expected set.
Greptile P1: pushing the forward-slash-normalised path into the stale list meant 'DELETE FROM nodes WHERE file = ?' missed rows that had been written with back-slashes (e.g. legacy Windows-migrated DBs) — the exact regression #1073 is trying to fix. The dedup key keeps the normalised form so a file written once with '\' and once with '/' is still treated as one entry, but the value the SQL sees is now byte-identical to what's stored. Added a regression test covering a back-slash row with an empty 'expected' set.
Summary
computeWasmOnlyStaleFileshelper that finds files present innodes/file_hashesbut absent from disk, scoped to extensions installed for WASM and outsideNATIVE_SUPPORTED_EXTENSIONSso the Rust orchestrator still owns its own purge path.backfillNativeDroppedFiles: after the better-sqlite3 handoff, callpurgeFilesDatafor the stale set. Early-return now requires bothmissingAbsandstaleRelto be empty.Context
Closes #1073.
PR #1070 made Rust's
detect_removed_filesskip files outsideis_supported_extensionto stop purging-and-reinserting WASM-only files on every incremental rebuild (#1066). Side effect: when a WASM-only file is genuinely deleted from disk, no path removes its rows — Rust skips it, and JSbackfillNativeDroppedFilesonly inserts.The fix is scoped narrowly: the new helper restricts purges to extensions where the WASM grammar is installed and Rust has no extractor, so the natively-handled delete path is untouched. Every currently-registered language now has a Rust extractor, so the bug surface today is the empty set; the fix protects against any future WASM-only language entering the registry before its native port lands.
Test plan
npx vitest run tests/builder/wasm-only-stale-files.test.ts)tsc --noEmitcleanbiome checkclean on changed files