Skip to content

feat(data): giga-swamp phase 3 — namespace-aware layout + per-namespace locking (swamp-club#486)#1474

Merged
stack72 merged 1 commit into
mainfrom
worktree-486
May 29, 2026
Merged

feat(data): giga-swamp phase 3 — namespace-aware layout + per-namespace locking (swamp-club#486)#1474
stack72 merged 1 commit into
mainfrom
worktree-486

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented May 29, 2026

Summary

Phase 3 of giga-swamp (multi-repo shared datastores). It makes the datastore storage layout namespace-aware so repos sharing a datastore land data under {datastore}/{namespace}/ and don't block each other on structural commands. Solo mode (empty namespace) is byte-identical to before — the regression gate.

Builds on Phase 1 (Namespace value object + config) and Phase 2 (catalog v4 + repository + DataRecord).

What changed

  • Path resolverDefaultDatastorePathResolver prepends {namespace}/ as the outermost datastore-tier segment via the single datastorePath() chokepoint (@std/path join; no prefix in solo mode).
  • Repo-local catalog — new centralized catalogDbPath() helper resolves _catalog.db to .swamp/data/ regardless of namespace (used by createCatalogStore and datastore compact, so the path can't drift). Each repo owns a private catalog, so per-repo full-replace backfill never clobbers another repo's rows.
  • Per-namespace global lock.locks/{namespace}.lock via a namespace-aware createDatastoreLock + datastoreGlobalLockOptions, routed through every global-lock site including the acquireModelLocks drain inspect, so the symmetric-drain protocol stays correct in namespaced mode. Per-model locks are left un-namespaced (model ids are UUIDs → already globally unique).
  • Namespace threading — the namespace value flows through the 13 direct FileSystemUnifiedDataRepository construction sites (via a shared namespaceFromResolver() helper) and the 2 execution_service sites + their callers, keeping catalog stamping in lockstep with the namespaced data path.

Boundaries (deliberately out of scope)

No CEL/query changes (Phase 4), no CLI columns/namespace commands (Phase 5), no sync changes (Phase 6), no existing-data migration (Phase 7), no DEFAULT_DATASTORE_SUBDIRS/markDirty changes. Secrets/vault-bundles are repo-local by construction (never routed through resolvePath). One known cosmetic artifact documented in design/datastores.md: in namespaced mode the un-namespaced per-model lock leaves empty {ds}/data/{type}/{id}/ dirs behind (no data, correctness-safe).

Test Plan

  • deno check / deno lint / deno fmt --check clean; deno run test 6471 passed / 0 failed.
  • New: 11 unit tests (resolver namespace prefixing + solo-identical + exclude-pattern guard; repo-local catalogDbPath across default/external-fs/namespaced/custom; namespaceFromResolver; namespaced FileLock lazy .locks/ + concurrent acquire; datastoreGlobalLockOptions) and a 3-case integration test (giga_swamp_namespace_layout_test.ts).
  • Compiled-binary smoke:
    • Solo — data at {ds}/data/ (no prefix), no .locks/, data querynamespace: "".
    • Namespaced — data at {ds}/infra/data/..., data list/query stamped namespace: "infra", repo-local catalog, .locks/ created.
    • Catalog relocation migration — pre-Phase-3 binary wrote data (catalog at old {ds}/data/_catalog.db); new binary transparently rebuilt it repo-local with identical query results and intact content; old catalog orphaned harmlessly.
    • Live MinIO (S3) — namespace is the outermost key prefix (giga/infra/data/...); local cache mirrors the remote layout; catalog not synced.

Closes swamp-club#486.

🤖 Generated with Claude Code

…ce locking (swamp-club#486)

Phase 3 makes the datastore storage layout namespace-aware so repos sharing a
datastore land data under {datastore}/{namespace}/ and never block each other on
structural commands. Solo mode (empty namespace) is byte-identical to before.

- Path resolver: DefaultDatastorePathResolver prepends {namespace}/ as the
  outermost datastore-tier segment via the single datastorePath() chokepoint.
- Catalog is repo-local at .swamp/data/_catalog.db via a centralized
  catalogDbPath() helper (used by createCatalogStore and datastore_compact), so
  repos sharing a datastore each own a private catalog and full-replace backfill
  never clobbers another repo's rows.
- Global datastore lock becomes .locks/{namespace}.lock via a namespace-aware
  createDatastoreLock + datastoreGlobalLockOptions, routed through every
  global-lock site including the acquireModelLocks drain inspect so the
  symmetric drain stays correct in namespaced mode. Per-model locks are left
  un-namespaced (model ids are UUIDs, already globally unique).
- Namespace value threaded through the 13 direct FileSystemUnifiedDataRepository
  construction sites (via namespaceFromResolver) and the 2 execution_service
  sites plus their callers, so catalog stamping stays in lockstep with the
  namespaced data path.

Verified: deno check/lint/fmt/test (6471 passed) + compiled-binary smoke for
solo, namespaced, and the external-filesystem catalog-relocation migration
(old catalog orphaned harmlessly, new one self-heals with identical results),
plus a live MinIO run confirming the namespace is the outermost S3 key prefix
and the local cache mirrors the remote layout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI UX Review

Blocking

None.

Suggestions

None.

Verdict

PASS — This PR has no user-visible UX impact. It is a pure infrastructure change (namespace-aware storage layout, per-namespace locking, centralized catalog path helper). No new flags, commands, or help text are introduced; no output format changes in log or JSON mode. The only observable CLI difference is a minor improvement: LockTimeoutError now reports the correct lock key (.locks/{namespace}.lock) in namespaced repos instead of the hardcoded .datastore.lock. Solo mode is byte-identical to before.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Well-structured Phase 3 implementation. The namespace threading is comprehensive — all 13 FileSystemUnifiedDataRepository construction sites, both execution_service paths, and all global-lock sites are correctly updated. The catalog relocation from datastore-tier to repo-local (resolvePathlocalPath) is a sound design decision with proper centralization via catalogDbPath(). Solo-mode backward compatibility is preserved throughout.

Blocking Issues

None.

Suggestions

  1. Defense-in-depth: validate namespace at config parse timedatastoreGlobalLockOptions() in repo_context.ts:1205 uses the raw config.namespace string in path construction (.locks/${namespace}.lock) before createNamespace() validates it at the composition root. The Namespace regex (/^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/) rejects all path-traversal characters, so there's no exploitable vulnerability, but if a future code path ever calls datastoreGlobalLockOptions before hitting the composition root, the raw string would flow into lock paths unvalidated. Consider either (a) validating in resolve_datastore.ts at parse time, or (b) accepting Namespace instead of DatastoreConfig in datastoreGlobalLockOptions to make the type system enforce validation order.

  2. Lock key uses string interpolation instead of join()repo_context.ts:1207: `.locks/${namespace}.lock` uses a forward-slash literal. Per CLAUDE.md, path operations should use @std/path. However, since lockKey is a logical key that also flows to cloud backends (S3) where forward slashes are canonical, this might be intentional. If it is, a brief inline comment noting the cross-backend rationale would prevent future reviewers from "fixing" it.

What looks good

  • Test coverage: 11 unit tests + 3 integration test cases covering solo, namespaced, and two-namespace-shared-datastore scenarios. The assertPathEquals helper is used for cross-platform correctness.
  • catalogDbPath centralization: Eliminates the old drift-prone pattern where datastore compact independently recomputed the catalog path. The localPath usage ensures the catalog is never namespace-prefixed.
  • namespaceFromResolver helper: Clean extraction that keeps the 13 direct construction sites from diverging on namespace derivation.
  • Design doc updates: Thorough documentation of the namespace prefixing strategy, lock partitioning, and the known cosmetic artifact (empty un-namespaced dirs from per-model locks).
  • Libswamp import boundary: All CLI commands import from src/libswamp/mod.ts or infrastructure directly — no internal libswamp path violations.
  • License headers: Present on the new integration test file.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Reviewed all 28 changed files. Traced namespace flow from config resolution → path resolver → repository factory → all 13 direct FileSystemUnifiedDataRepository construction sites + 5 WorkflowExecutionService callers + DefaultStepExecutor.buildDeps. Verified lock construction consistency across structural-command acquire, acquireModelLocks drain-inspect, TOCTOU retry, push-lock, and breakglass commands.

Critical / High

None.

Medium

  1. Stale PHASE 3 DEPENDENCY commentsrc/infrastructure/persistence/unified_data_repository.ts:84-96: The docstring on the namespace constructor parameter says "PHASE 3 DEPENDENCY: those direct construction sites … must thread the configured namespace through … This is a tracked dependency, not an oversight." This PR is Phase 3 and completes that work — every direct site now passes namespaceFromResolver(datastoreResolver). The comment is misleading; future readers will think the work is still pending. Should be updated to reflect completion.

Low

  1. Namespace validation timing in datastoreGlobalLockOptions and DefaultDatastorePathResolversrc/cli/repo_context.ts:1202-1208, src/infrastructure/persistence/default_datastore_path_resolver.ts:52,84: Both use config.namespace as a raw string (interpolated into .locks/${namespace}.lock and join(base, namespace, …)) before createNamespace() validation runs in createRepositoryContext (line 356 of repository_factory.ts). In requireInitializedRepo, the lock is acquired (via registerDatastoreSync) before the factory validates. An invalid namespace (e.g. containing /) would create a lock file at an unexpected path, then createNamespace would throw. Mitigated by: (a) the config is a local file — this is self-inflicted; (b) createNamespace rejects all path-traversal characters (/, ., spaces) via [a-z0-9][a-z0-9-]*; (c) the orphaned lock would eventually be cleaned up by staleness detection. Still, validating the namespace earlier (e.g. in resolveDatastoreConfig at src/cli/resolve_datastore.ts:290) would close the window entirely.

  2. waitForPerModelLocks scans un-namespaced rootsrc/cli/repo_context.ts:553,574: In a namespaced repo, per-model locks live at {ds}/data/{type}/{id}/.lock (un-namespaced), so waitForPerModelLocks(datastoreConfig.path) drains writers across all namespaces, not just the structural command's own. This is documented as "conservative over-wait" in design/datastores.md and is correctness-safe (data dirs are partitioned, catalogs are repo-local, global locks are namespaced). Noting it for completeness.

Verdict

PASS — The namespace threading is consistent and complete across all code paths. The catalog relocation from resolvePathlocalPath is correct and well-tested. Lock coordination properly uses datastoreGlobalLockOptions at every global-lock site. The integration test covers solo, namespaced, and shared-datastore scenarios. The one stale comment (Medium #1) is cosmetic; the validation timing (Low #1) is theoretical given the input source.

@stack72 stack72 merged commit 09c0ab5 into main May 29, 2026
11 checks passed
@stack72 stack72 deleted the worktree-486 branch May 29, 2026 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant