Skip to content

fix: GC scanning now handles nested/namespaced model types#352

Merged
stack72 merged 1 commit into
mainfrom
fix/gc-nested-model-types
Feb 14, 2026
Merged

fix: GC scanning now handles nested/namespaced model types#352
stack72 merged 1 commit into
mainfrom
fix/gc-nested-model-types

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented Feb 14, 2026

Summary

Fixes #330.

Garbage collection (swamp data gc) was silently failing to find and clean up
expired data for any model with a nested or namespaced type — including built-in
types like command/shell and user-defined types like aws/ec2/vpc or
@docker/host.

The root cause was that DefaultDataLifecycleService.findExpiredData() and the
version GC loop in deleteExpiredData() both scanned .swamp/data/ using a
flat 3-level readDir() loop (type → modelId → dataName). For a type like
command/shell, the scanner would read command as the full type and shell
as a model ID, producing completely wrong lookups. Expired data for all
nested/namespaced types was never garbage collected.

What changed

  • findExpiredData() — replaced the manual 3-level directory scan with a
    call to the repository's findAllGlobal() method, which already correctly
    handles arbitrary nesting depth using recursive directory walking with
    isModelIdDirectory() detection.
  • deleteExpiredData() version GC loop — replaced the second manual
    directory scan with unique type+modelId pairs derived from findAllGlobal().
  • Removed repoDir constructor parameter from DefaultDataLifecycleService
    since the service no longer does its own filesystem path construction.
  • Removed unused imports (join, SWAMP_SUBDIRS, swampPath) and changed
    ModelType to a type-only import.
  • Added tests for findExpiredData covering nested types (aws/ec2/vpc,
    command/shell), non-expired data, and empty repositories.

What this means for users

If you use models with nested types (like command/shell, custom extension
models under namespaces like aws/ec2/vpc, or scoped types like
@docker/host), swamp data gc will now correctly identify and clean up
expired data and old versions for those models. Previously this data would
accumulate indefinitely regardless of lifetime or garbage collection policies.

Files changed

File Change
src/domain/data/data_lifecycle_service.ts Replace manual dir scanning with findAllGlobal(), remove repoDir param
src/cli/commands/data_gc.ts Remove repoDir from constructor call
src/domain/data/data_lifecycle_service_test.ts Add findAllGlobal to mock, add nested type tests

Test plan

  • deno check — type checking passes
  • deno lint — no lint errors
  • deno fmt — formatting correct
  • deno run test — all 1701 tests pass
  • deno run compile — binary compiles successfully

🤖 Generated with Claude Code

#330)

The data lifecycle service scanned .swamp/data/ using a flat 3-level
readDir() loop (type -> modelId -> dataName). For nested/namespaced
model types like command/shell or aws/ec2/vpc, the first directory
segment was incorrectly treated as the full type and subsequent segments
as the model ID, causing expired data to never be garbage collected.

Replace the manual directory scanning in findExpiredData() and
deleteExpiredData() with calls to the repository's findAllGlobal()
method, which already handles arbitrary type nesting depth via recursive
directory walking with isModelIdDirectory() detection.

This also removes the now-unnecessary repoDir constructor parameter
from DefaultDataLifecycleService since the service no longer performs
its own filesystem scanning.

Co-Authored-By: Magistr <2269529+umag@users.noreply.github.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR fixes a significant bug where garbage collection was failing to find and clean expired data for nested/namespaced model types (like command/shell, aws/ec2/vpc, @docker/host). The fix correctly delegates to the repository's findAllGlobal() method which already handles arbitrary nesting depth.

No Blocking Issues

The code is well-written and follows the project's patterns.

DDD Analysis

Good patterns observed:

  • The DataLifecycleService correctly operates as a domain service - stateless and coordinating operations across multiple aggregates
  • The service properly delegates filesystem operations to the UnifiedDataRepository rather than doing its own directory traversal
  • The refactoring removes the repoDir parameter from the service, which was an infrastructure concern leaking into the domain layer - this is a positive change that improves separation of concerns
  • The service now depends only on repository interfaces, not filesystem paths

Architecture improvement: By removing the manual readDir() loops and using findAllGlobal(), the domain service no longer has knowledge of how model types map to directory structure. This keeps the domain layer clean of infrastructure details.

Code Quality

  • ✅ TypeScript strict mode compliance - uses proper types throughout
  • ✅ Uses type-only import for ModelType (good practice)
  • ✅ Named exports used consistently
  • ✅ AGPLv3 license headers present
  • ✅ No any types in the changed files (the existing AnyOptions in data_gc.ts predates this PR and is already suppressed with a lint comment)

Test Coverage

  • ✅ Tests added for findExpiredData covering:
    • Deep nested types (aws/ec2/vpc)
    • Two-level nested types (command/shell)
    • Non-expired data (skipping behavior)
    • Empty repositories
  • ✅ Tests use proper mocking of the findAllGlobal method
  • ✅ Tests live next to source files as required

Security Review

  • ✅ No security vulnerabilities introduced
  • ✅ The existing path traversal protection in assertPathContained remains intact in the repository layer
  • ✅ No new user input handling that could introduce injection risks

Suggestions (non-blocking)

  1. Minor optimization consideration: In deleteExpiredData(), findAllGlobal() is called twice when dryRun is false - once in findExpiredData() and once for the version GC loop. This could be optimized to reuse the first result, but the current implementation is clear and correct.

  2. Test completeness: Consider adding a test for deleteExpiredData() to verify the version GC loop correctly deduplicates type/modelId pairs, though the existing tests provide good coverage of the core functionality.

Overall, this is a well-executed bug fix that improves the domain architecture while maintaining comprehensive test coverage.

@stack72 stack72 merged commit 6817069 into main Feb 14, 2026
4 checks passed
@stack72 stack72 deleted the fix/gc-nested-model-types branch February 14, 2026 23:09
adamstac pushed a commit to adamstac/swamp that referenced this pull request May 14, 2026
…42) (systeminit#1383)

## Summary

Removes the legacy `registries.<kind>.failures[]` dual failure surface
and makes `aggregateState.sourceDetails[]` the single authoritative
source of extension failure state in `swamp doctor extensions`. Closes
the W1-W6 rearchitecture story — W7 removes the last vestige of the old
failure-reporting model.

### What changed

- **Shared failure recorder**
(`src/domain/extensions/source_failure_recorder.ts`) — extracted from
reconcile's catch block. Both reconcile and buildIndex call the same
helper, eliminating the consistency gap where buildIndex bypassed the
aggregate.
- **buildIndex failure routing** — warm-start catch routes failures
through `recordSourceFailure` → `repository.saveAll()`. Cold-start
failures also routed, with `originalError` preserving `ValidationError`
type for correct state-tag discrimination.
- **`markCatalogValidationFailed` removed** — validation failures now
throw `ValidationError`, caught by the shared helper. No more direct
catalog upserts that bypass the aggregate.
- **`registries.failures[]` removed** — per-registry status derived from
`sourceDetails[]` failure-state tags (`ValidationFailed`,
`BundleBuildFailed`, `EntryPointUnreadable`).
- **`extension_load_warnings.ts` split** — kept
`emitTypeExtractionFailure` for stderr regex-mismatch warnings; removed
`recordLoadFailures`, `getExtensionLoadWarnings`, and the warnings array
accumulation.
- **`last_error` catalog column** — schema migration adds `last_error
TEXT` to `bundle_types`. Error messages persist across process
invocations. Clears on recovery to Indexed.
- **SQL state-preservation fix** — `upsert` ON CONFLICT UPDATE now
conditionally excludes `state` and `last_error` when callers don't
provide them, preventing `populateCatalogFromDir` from overwriting
reconcile's failure states with `Indexed`.

### JSON schema changes (doctor extensions --json)

1. **Removed**: `registries.<kind>.failures[]` — the field no longer
exists on `DoctorRegistryResult`
2. **Added**: `sourceDetails[].lastError?: string` — populated for
failure states (`ValidationFailed`, `BundleBuildFailed`,
`EntryPointUnreadable`), absent for healthy states

### Migration note

Pre-migration catalog rows in failure states get empty `lastError` until
the next reconcile pass. Self-heals on first run after upgrade — not a
bug, just a transient state.

### Known behavior changes (follow-ups filed)

- **Type-extraction warnings** (systeminit#351): models with non-literal `type`
fields (e.g. `const TYPE = "..."`) produce stderr warnings but no longer
flip doctor exit code. They don't register and don't appear in
`sourceDetails[]`. Previously surfaced via the removed
`registries.failures[]` pipeline.
- **Streaming protocol** (systeminit#352): `kind-completed` events carry
preliminary `status: "pass"` during the scan phase; final status is
computed from aggregate state in the `completed` event. Internal-only
(renderer is the sole consumer).

## Test plan

- [x] 5908 unit/integration tests pass (0 failures)
- [x] `deno check` / `deno lint` / `deno fmt` clean
- [x] 24-point binary verification matrix against compiled `./swamp`:
  - Healthy model → Indexed, exit 0
  - Schema-invalid model → failure state, exit 1, lastError populated
  - Missing version → ValidationFailed, exit 1
  - Recovery: broken → fixed → Indexed, lastError clears
  - Rebundle loop guard: stable broken source converges (3 runs)
  - Mixed healthy + broken: healthy registers, broken surfaces
  - JSON contract: no `failures` field, `lastError` present
  - Vault failure surfaces in sourceDetails with correct registry status
  - lastError persists across separate process invocations
  - lastError absent from sourceDetails after recovery to Indexed
- [x] SQL regression tests: upsert without state preserves both state
and last_error; upsert with explicit state overwrites both
- [x] Integration test: lastError lifecycle (populates on failure,
clears on recovery)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Data lifecycle GC scanning fails for nested/namespaced model types

1 participant