fix: GC scanning now handles nested/namespaced model types#352
Conversation
#330) The data lifecycle service scanned .swamp/data/ using a flat 3-level readDir() loop (type -> modelId -> dataName). For nested/namespaced model types like command/shell or aws/ec2/vpc, the first directory segment was incorrectly treated as the full type and subsequent segments as the model ID, causing expired data to never be garbage collected. Replace the manual directory scanning in findExpiredData() and deleteExpiredData() with calls to the repository's findAllGlobal() method, which already handles arbitrary type nesting depth via recursive directory walking with isModelIdDirectory() detection. This also removes the now-unnecessary repoDir constructor parameter from DefaultDataLifecycleService since the service no longer performs its own filesystem scanning. Co-Authored-By: Magistr <2269529+umag@users.noreply.github.com>
There was a problem hiding this comment.
Review Summary
This PR fixes a significant bug where garbage collection was failing to find and clean expired data for nested/namespaced model types (like command/shell, aws/ec2/vpc, @docker/host). The fix correctly delegates to the repository's findAllGlobal() method which already handles arbitrary nesting depth.
No Blocking Issues
The code is well-written and follows the project's patterns.
DDD Analysis
Good patterns observed:
- The
DataLifecycleServicecorrectly operates as a domain service - stateless and coordinating operations across multiple aggregates - The service properly delegates filesystem operations to the
UnifiedDataRepositoryrather than doing its own directory traversal - The refactoring removes the
repoDirparameter from the service, which was an infrastructure concern leaking into the domain layer - this is a positive change that improves separation of concerns - The service now depends only on repository interfaces, not filesystem paths
Architecture improvement: By removing the manual readDir() loops and using findAllGlobal(), the domain service no longer has knowledge of how model types map to directory structure. This keeps the domain layer clean of infrastructure details.
Code Quality
- ✅ TypeScript strict mode compliance - uses proper types throughout
- ✅ Uses type-only import for
ModelType(good practice) - ✅ Named exports used consistently
- ✅ AGPLv3 license headers present
- ✅ No
anytypes in the changed files (the existingAnyOptionsindata_gc.tspredates this PR and is already suppressed with a lint comment)
Test Coverage
- ✅ Tests added for
findExpiredDatacovering:- Deep nested types (
aws/ec2/vpc) - Two-level nested types (
command/shell) - Non-expired data (skipping behavior)
- Empty repositories
- Deep nested types (
- ✅ Tests use proper mocking of the
findAllGlobalmethod - ✅ Tests live next to source files as required
Security Review
- ✅ No security vulnerabilities introduced
- ✅ The existing path traversal protection in
assertPathContainedremains intact in the repository layer - ✅ No new user input handling that could introduce injection risks
Suggestions (non-blocking)
-
Minor optimization consideration: In
deleteExpiredData(),findAllGlobal()is called twice whendryRunis false - once infindExpiredData()and once for the version GC loop. This could be optimized to reuse the first result, but the current implementation is clear and correct. -
Test completeness: Consider adding a test for
deleteExpiredData()to verify the version GC loop correctly deduplicates type/modelId pairs, though the existing tests provide good coverage of the core functionality.
Overall, this is a well-executed bug fix that improves the domain architecture while maintaining comprehensive test coverage.
…42) (systeminit#1383) ## Summary Removes the legacy `registries.<kind>.failures[]` dual failure surface and makes `aggregateState.sourceDetails[]` the single authoritative source of extension failure state in `swamp doctor extensions`. Closes the W1-W6 rearchitecture story — W7 removes the last vestige of the old failure-reporting model. ### What changed - **Shared failure recorder** (`src/domain/extensions/source_failure_recorder.ts`) — extracted from reconcile's catch block. Both reconcile and buildIndex call the same helper, eliminating the consistency gap where buildIndex bypassed the aggregate. - **buildIndex failure routing** — warm-start catch routes failures through `recordSourceFailure` → `repository.saveAll()`. Cold-start failures also routed, with `originalError` preserving `ValidationError` type for correct state-tag discrimination. - **`markCatalogValidationFailed` removed** — validation failures now throw `ValidationError`, caught by the shared helper. No more direct catalog upserts that bypass the aggregate. - **`registries.failures[]` removed** — per-registry status derived from `sourceDetails[]` failure-state tags (`ValidationFailed`, `BundleBuildFailed`, `EntryPointUnreadable`). - **`extension_load_warnings.ts` split** — kept `emitTypeExtractionFailure` for stderr regex-mismatch warnings; removed `recordLoadFailures`, `getExtensionLoadWarnings`, and the warnings array accumulation. - **`last_error` catalog column** — schema migration adds `last_error TEXT` to `bundle_types`. Error messages persist across process invocations. Clears on recovery to Indexed. - **SQL state-preservation fix** — `upsert` ON CONFLICT UPDATE now conditionally excludes `state` and `last_error` when callers don't provide them, preventing `populateCatalogFromDir` from overwriting reconcile's failure states with `Indexed`. ### JSON schema changes (doctor extensions --json) 1. **Removed**: `registries.<kind>.failures[]` — the field no longer exists on `DoctorRegistryResult` 2. **Added**: `sourceDetails[].lastError?: string` — populated for failure states (`ValidationFailed`, `BundleBuildFailed`, `EntryPointUnreadable`), absent for healthy states ### Migration note Pre-migration catalog rows in failure states get empty `lastError` until the next reconcile pass. Self-heals on first run after upgrade — not a bug, just a transient state. ### Known behavior changes (follow-ups filed) - **Type-extraction warnings** (systeminit#351): models with non-literal `type` fields (e.g. `const TYPE = "..."`) produce stderr warnings but no longer flip doctor exit code. They don't register and don't appear in `sourceDetails[]`. Previously surfaced via the removed `registries.failures[]` pipeline. - **Streaming protocol** (systeminit#352): `kind-completed` events carry preliminary `status: "pass"` during the scan phase; final status is computed from aggregate state in the `completed` event. Internal-only (renderer is the sole consumer). ## Test plan - [x] 5908 unit/integration tests pass (0 failures) - [x] `deno check` / `deno lint` / `deno fmt` clean - [x] 24-point binary verification matrix against compiled `./swamp`: - Healthy model → Indexed, exit 0 - Schema-invalid model → failure state, exit 1, lastError populated - Missing version → ValidationFailed, exit 1 - Recovery: broken → fixed → Indexed, lastError clears - Rebundle loop guard: stable broken source converges (3 runs) - Mixed healthy + broken: healthy registers, broken surfaces - JSON contract: no `failures` field, `lastError` present - Vault failure surfaces in sourceDetails with correct registry status - lastError persists across separate process invocations - lastError absent from sourceDetails after recovery to Indexed - [x] SQL regression tests: upsert without state preserves both state and last_error; upsert with explicit state overwrites both - [x] Integration test: lastError lifecycle (populates on failure, clears on recovery) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Fixes #330.
Garbage collection (
swamp data gc) was silently failing to find and clean upexpired data for any model with a nested or namespaced type — including built-in
types like
command/shelland user-defined types likeaws/ec2/vpcor@docker/host.The root cause was that
DefaultDataLifecycleService.findExpiredData()and theversion GC loop in
deleteExpiredData()both scanned.swamp/data/using aflat 3-level
readDir()loop (type → modelId → dataName). For a type likecommand/shell, the scanner would readcommandas the full type andshellas a model ID, producing completely wrong lookups. Expired data for all
nested/namespaced types was never garbage collected.
What changed
findExpiredData()— replaced the manual 3-level directory scan with acall to the repository's
findAllGlobal()method, which already correctlyhandles arbitrary nesting depth using recursive directory walking with
isModelIdDirectory()detection.deleteExpiredData()version GC loop — replaced the second manualdirectory scan with unique type+modelId pairs derived from
findAllGlobal().repoDirconstructor parameter fromDefaultDataLifecycleServicesince the service no longer does its own filesystem path construction.
join,SWAMP_SUBDIRS,swampPath) and changedModelTypeto a type-only import.findExpiredDatacovering nested types (aws/ec2/vpc,command/shell), non-expired data, and empty repositories.What this means for users
If you use models with nested types (like
command/shell, custom extensionmodels under namespaces like
aws/ec2/vpc, or scoped types like@docker/host),swamp data gcwill now correctly identify and clean upexpired data and old versions for those models. Previously this data would
accumulate indefinitely regardless of lifetime or garbage collection policies.
Files changed
src/domain/data/data_lifecycle_service.tsfindAllGlobal(), removerepoDirparamsrc/cli/commands/data_gc.tsrepoDirfrom constructor callsrc/domain/data/data_lifecycle_service_test.tsfindAllGlobalto mock, add nested type testsTest plan
deno check— type checking passesdeno lint— no lint errorsdeno fmt— formatting correctdeno run test— all 1701 tests passdeno run compile— binary compiles successfully🤖 Generated with Claude Code