fix(docs): contract-compliant BlobAttachment + defensive read + heal 16 malformed rows (T11262) #812
Merged
…16 malformed rows (T11262)
`cleo docs fetch <slug>` and `cleo docs list --type <kind>` returned E_INTERNAL:
"Cannot read properties of undefined (reading 'split')" whenever a malformed
blob attachment row was encountered. Root cause: the writer at
`docs-update.ts:529-535` emitted a non-contract shape `{kind, name, mime, size,
blobId}` instead of the canonical `{kind, sha256, storageKey, mime, size}` from
`@cleocode/contracts`. A single malformed row broke every project-scope listing
because `resolveAllFromTasksDb` iterates eagerly and never recovers per-row.
Fix:
- `docs-read-model.ts:extractBlobName` — defensive read, falls back to legacy
`name` field when `storageKey` is undefined/empty rather than throwing.
- `docs-update.ts` — writer now emits the canonical BlobAttachment shape and
validates against `blobAttachmentSchema` before persisting.
- `attachment-store.ts:put` — computes the canonical `storageKey` for blob
kinds at the chokepoint (overriding caller placeholders) and validates the
full attachment against `attachmentSchema` before insert. Every future
writer is now contract-compliant by construction.
- `scripts/heal-malformed-blob-attachments.mjs` — one-shot, idempotent heal
for the 16 in-DB rows where `storageKey IS NULL`. Reports `Healed N rows`,
rerun reports `Healed 0 rows`.
- `blob-attachment-contract-T11262.test.ts` — 5 focused unit tests: writer
emits contract-compliant JSON, Zod rejects both legacy `{name, blobId}` and
`storageKey:''` shapes, JSON round-trip preserves the contract, defensive
read survives the malformed shape.
Verified live: `cleo docs fetch db-substrate-pglite-vision-2026-05-28`,
`cleo docs list --type research`, and `cleo docs list --task T11242` all
return success after running the heal script.
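A minimal sketch of the defensive read described above, assuming a row type and function bodies that are illustrative only (`StoredBlobRow` and `listBlobNames` are hypothetical names; the real `extractBlobName` may differ). It shows the fallback to the legacy `name` field and per-row recovery, so that one malformed row no longer fails a whole listing:

```typescript
// Canonical shape per the commit message: {kind, sha256, storageKey, mime, size}.
// Legacy rows carry {name, blobId} instead; only the fields needed here are modelled.
interface StoredBlobRow {
  kind: 'blob';
  storageKey?: string | null;
  name?: string | null; // legacy field written by the old non-contract writer
}

// Hypothetical stand-in for docs-read-model.ts:extractBlobName.
function extractBlobName(row: StoredBlobRow): string | null {
  const key = row.storageKey;
  if (typeof key === 'string' && key.length > 0) {
    // Use the last path segment of the storage key.
    return key.slice(key.lastIndexOf('/') + 1);
  }
  // storageKey missing or empty: fall back to the legacy field instead of throwing.
  const legacy = row.name;
  return typeof legacy === 'string' && legacy.length > 0 ? legacy : null;
}

// Per-row recovery: a row with no usable name is skipped, the rest are still listed.
function listBlobNames(rows: StoredBlobRow[]): string[] {
  const names: string[] = [];
  for (const row of rows) {
    const name = extractBlobName(row);
    if (name !== null) names.push(name);
  }
  return names;
}
```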
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…okepoint The PM-Core V2 migration (c636c66, T11013/T11018) routed getDbPath and ~40 other .cleo-path resolvers through resolveProjectByCwd + resolveCanonicalCleoDir (projectId -> nexus project_registry lookup) but did NOT carry forward the absolute-CLEO_DIR override that getProjectRoot still honors. Because the unit harness pins NEXUS_HOME to an empty per-fork temp sandbox, every test that sets CLEO_DIR=/tmp/.../.cleo (156 files) and any code reaching getDb threw E_PROJECT_NOT_FOUND across the suite. The breakage was masked on main because every commit since was docs-only and the path-filter skips the unit-test job. Restore the documented backward-compat contract at the single chokepoint: - resolveProjectByCwd: when CLEO_DIR is absolute, return a deterministic projectId derived from its parent (so it never throws on empty registry). - resolveCanonicalCleoDir: when CLEO_DIR is absolute, return it verbatim (bypassing the nexus lookup), mirroring getProjectRoot's precedence. Verified: paths resolution chokepoint tests 91/91 pass; session-store, decisions, attachment-store go red->green; no regression (project-info's 2 failures are pre-existing on origin/main, unrelated to CLEO_DIR). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… (T11011 fallout) remote.test.ts and dream-status.test.ts mocked the deprecated getCleoDirAbsolute export, but remote/index.ts and dream-cycle.ts were migrated to resolveProjectByCwd + resolveCanonicalCleoDir (T11011) — vitest threw 'No export defined on the mock'. Provide the migrated exports. 37/37 pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…no id round-trip (T11262)
Root fix for the PM-Core V2 resolution regression that broke the unit suite (129 failing files) and was masked because docs-only commits skip the unit-test CI job.
The migration introduced a second, lossy resolution chain: resolveProjectByCwd(cwd) derived a projectId from <root>/.cleo/project-info.json, then resolveCanonicalCleoDir(projectId) re-resolved that id to a path ONLY via the nexus project_registry, throwing E_PROJECT_NOT_FOUND for any project that exists on disk but is not registered (every per-fork test sandbox + fresh checkouts). It also dropped the long-standing 'presence of .cleo/ = project' and absolute-CLEO_DIR contracts.
Introduce a single canonical chokepoint resolveCleoDir(cwd): string with one precedence and NO id round-trip / nexus dependency for the local case:
1. active worktree scope
2. absolute CLEO_DIR override
3. nearest ancestor with a .cleo/ dir -> <root>/.cleo (direct)
4. nexus registry (cross-project id lookup)
5. throw
Reroute getDbPath, getBrainDbPath and ~30 other .cleo resolvers from the lossy two-step to resolveCleoDir. project-detection.ts health checks use a direct join(projectRoot, '.cleo') (they inspect an explicit root, not cwd). The projectId-based resolveProjectByCwd/resolveCanonicalCleoDir remain for genuine cross-project id lookups.
Verified: 98 of 128 originally-failing files go green; paths resolution suite 91/91; correct with project-info.json both present and absent; zero regressions (the 2 not-previously-listed failures are pre-existing on origin/main / a known flaky slow test). Remaining ~30 failures are unrelated pre-existing causes (nexus identity, release fixtures, evidence, migrations).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
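The five-step precedence in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: `ResolveEnv`, `resolveCleoDirSketch`, and the injected `dirExists`/`registryLookup` probes are hypothetical names introduced so the order of checks can be exercised without a real filesystem or nexus registry.

```typescript
import { existsSync } from 'node:fs';
import { dirname, isAbsolute, join } from 'node:path';

// All fields are hypothetical injection points for this sketch.
interface ResolveEnv {
  worktreeScopeCleoDir?: string;                   // step 1: active worktree scope
  cleoDirEnv?: string;                             // step 2: value of CLEO_DIR
  registryLookup?: (cwd: string) => string | null; // step 4: nexus registry
  dirExists?: (path: string) => boolean;           // filesystem probe (defaults to existsSync)
}

function resolveCleoDirSketch(cwd: string, env: ResolveEnv = {}): string {
  const dirExists = env.dirExists ?? existsSync;

  // 1. active worktree scope
  if (env.worktreeScopeCleoDir) return env.worktreeScopeCleoDir;

  // 2. absolute CLEO_DIR override, returned verbatim
  if (env.cleoDirEnv && isAbsolute(env.cleoDirEnv)) return env.cleoDirEnv;

  // 3. nearest ancestor containing a .cleo/ directory (no id round-trip)
  let dir = cwd;
  for (;;) {
    const candidate = join(dir, '.cleo');
    if (dirExists(candidate)) return candidate;
    const parent = dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }

  // 4. registry lookup, only consulted when the local walk finds nothing
  const registered = env.registryLookup ? env.registryLookup(cwd) : null;
  if (registered) return registered;

  // 5. nothing resolved
  throw new Error(`No CLEO project found from ${cwd}`);
}
```

Keeping the registry as step 4 is what lets an unregistered on-disk project (a test sandbox or fresh checkout) resolve at step 3 without touching the registry at all.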
…1262) The canonical resolveCleoDir omitted the worktree-gitlink resolution that getProjectRoot and the pre-refactor resolveProjectByCwd both had: when cwd is inside a git worktree (.git is a FILE pointing to <mainRepo>/.git/worktrees/), the canonical project root is the MAIN repo, which has the .cleo/. Worktrees live separately (under <cleoHome>/worktrees/) with no .cleo of their own, so the ancestor walk never finds one. Add step 3b mirroring _resolveMainRepoFromGitlink (T9092/T11034) so worktree-resident callers resolve the main repo's .cleo. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ble (T11262/E13) loadOwningTaskStatuses hard-threw (via openCleoDb->resolveCleoDir) when no .cleo project resolved from projectRoot, failing the entire 'cleo worktree list/prune' even though task-status is OPTIONAL enrichment — git worktree list is the primary source. Make the tasks-DB open best-effort: degrade to null statuses instead of throwing. Fixes worktree-list/prune/force-unlock integration tests (13/13). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ktree-prune, E13) The mock factory only stubbed execFileSync; the worktree status-enrichment path (list.ts) transitively imported by prune calls execFile, so vitest threw 'No execFile export on the mock'. 12/12 pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ks (T11280)
E13 unit-test stabilization tail. Two root-cause fixes + stale-mock repair:
1. probeAndMarkApplied: suppress the index existence probe for ANY migration that performs a table rebuild/rename (renameMap > 0), not only "rebuild-only" ones. wave0-schema-hardening creates idx_task_relations_related_to, which t9519/t10571 later DROP when rebuilding task_relations with a new PK, so probing that index reported wave0 as un-applied, causing migrate() to destructively re-run the rebuild and crash on "table sessions already exists". Final-table presence is the only reliable evidence for a rebuild migration.
2. migration-sqlite topo-sort fixture: T003 changed task->subtask so the 3-level hierarchy obeys the PM-Core V2 parent-type matrix (task->subtask only).
3. resolveCleoDir mocks: remote.test + dream-status mocks now expose the canonical resolveCleoDir helper their sources import after the T11262 refactor.
4. sqlite.test path-validation: pre-create .cleo/ in temp dirs so resolveCleoDir resolves them (no orphan synthesis, T9803/T11262).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… setup (T11280)
Root-cause fix for sqlite-lazy-init.test.ts ("importing sqlite.ts does NOT
require node:sqlite at module-load time", T1331). Three eager node:sqlite
loaders had crept into sqlite.ts's static import graph:
1. @cleocode/paths (cleo-paths.ts): eager `import { DatabaseSync } from
'node:sqlite'` → lazy createRequire; only resolveCanonicalCleoDir's
nexus-registry lookup needs it.
2. core paths.ts: same eager import (added T11022) → lazy createRequire.
3. core sqlite.ts: eager `import { drizzle } from 'drizzle-orm/node-sqlite'`
— the drizzle v1-beta driver statically imports node:sqlite — → lazy
_getDrizzle() via createRequire at the two runtime call sites.
All mirror the established getDbSyncConstructor pattern in sqlite-native.ts.
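The lazy-load pattern named above can be illustrated with a small generic helper. This is a sketch under assumptions: `lazyModule` and `loadCount` are hypothetical names, and the project's `getDbSyncConstructor` may be shaped differently. The point is only that the `require` call runs on first use rather than at module-load time.

```typescript
import { createRequire } from 'node:module';
import { join } from 'node:path';

// createRequire needs an absolute filename (or file URL) as its resolution base;
// the file itself does not have to exist.
const lazyRequire = createRequire(join(process.cwd(), 'lazy-loader.js'));

// Counts real loads so the deferral is observable.
let loadCount = 0;

// Returns a getter that loads `specifier` on first call and caches the result.
function lazyModule<T>(specifier: string): () => T {
  let cached: T | undefined;
  return () => {
    if (cached === undefined) {
      cached = lazyRequire(specifier) as T;
      loadCount += 1;
    }
    return cached;
  };
}
```

Under this sketch, a call such as `lazyModule<{ DatabaseSync: unknown }>('node:sqlite')` at module scope would not touch `node:sqlite` until the returned getter is first invoked.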
Also: relations/parallel-task-update/open-cleo-db tests now pre-create `.cleo/`
in their temp dirs so the canonical resolveCleoDir SSoT resolves them as project
roots (no orphan synthesis — T11262/T9803).
Full src/store/ suite: 108 files / 1352 tests green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…oDir (T11280)
- New paths.tryResolveCleoDir(cwd): string|null, a non-throwing resolveCleoDir for genuinely-optional `.cleo/` lookups. loadProjectContext (variable-substitution) now uses it so resolveToolCommand degrades gracefully (cwd marker-file primaryType detection) outside a CLEO project instead of throwing E_NO_PROJECT.
- Test fixtures updated to obey the PM-Core V2 parent-type matrix (saga→epic→task→subtask): coordination-parent children → subtask; generic-tree cycle chain → epic/task/subtask + sagas typed 'saga'; task-reopen-policy adds makeSaga for the saga root.
- sg-display-preservation: assert the INTENDED post-T10638 behavior: the retired legacy label shape (type=epic+labels:['saga']) is no longer a saga.
- task-engine: AC-coverage gate now runs inside the write transaction, so the test's tx mock gains getAcRows/getAcBindings/insertAcBindings.
- workgraph-architecture: allowlist workgraph/relations.ts (it references the groups-as-hierarchy pattern to FORBID it, not to traverse via groups).
- relates/update-relates/e3-absorbed-closure: pre-create `.cleo/` in temp dirs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ul project-health (T11280) Real product bug: the T11262 resolveCleoDir refactor (02fcd3c) corrupted two CheckResult functions in scaffold/project-detection.ts — a stray `return join(...)` at the top of checkLogDir and checkWorktreeInclude short-circuited each function, returning a STRING instead of a CheckResult and leaving the real logic as dead code (referencing undefined `logDir`/`legacyPath`). The bug was masked by a `@ts-nocheck` added in a3c0e4b. Fixed both functions (restored the local path bindings) and removed the now-unnecessary `@ts-nocheck` so the type checker guards this file again. This is what surfaced as `doctorProject` emitting a non-CheckResult string entry (project-tools.test.ts: "check entries have a string id field"). Also: checkProjectHealth uses tryResolveCleoDir + `<root>/.cleo` fallback so a reachable project dir with no `.cleo/` yet reports overall=unknown instead of throwing E_NO_PROJECT (project-health.test.ts). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…wording (T11280) c636c66 (PM-Core V2) intentionally extended ensureProjectInfo to backfill BOTH projectId AND name, changing the repair message from 'Added projectId' to the generic 'Backfilled missing fields'. Updated the stale characterization + project-info tests to the new intended behavior: assert 'Backfilled missing fields', and seed `name` in the skip-path fixtures so a complete record truly returns action='skipped'. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Real product bug: registerProjectOnEncounter (paths.ts) inserted the registry row with projectHash = canonicalResult.components.gitRoot (a PATH) instead of the canonical sha256 hash. This polluted project_registry with a path in the hash column, producing spurious "Project name already exists" collisions when a later nexusRegister computed the real hash for the same project. Now uses generateProjectHash(resolvedPath), consistent with nexusRegister. Test setup: tasks-bridge, living-brain (+ its absent-substrate cases), task-sweeper-wired, and registry "empty project" now pre-create `.cleo/` in their temp dirs so the canonical resolveCleoDir SSoT resolves them (T11262). Note: the reconcile.test.ts (4) + nexus-e2e "reconcile creates audit entries" (1) failures are a SEPARATE pre-existing-on-main bug — the unawaited registerProjectOnEncounter fire-and-forget races nexusReconcile's auto-register (project_path UNIQUE / canonicalId-vs-info-UUID identity conflict). Filed under T11280; out of scope for E13 path-resolution. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… (T11280) - plan-saga-leaf-changelog: saga fixtures use type='saga' + parent_id member containment (T10638 switched saga detection to isSagaShape/type='saga' and membership from task_relations.groups to parent_id); E_EPIC_EMPTY fixture retyped to a memberless type='saga'. - plan E_EVIDENCE_INSUFFICIENT: child set to status='active' so it is NOT grandfathered past the evidence gate (T9758/5d60cbc4a grandfathers already-done tasks per ADR-051 §11.1). - aggregator-ssot-first / release-engine / release-push-guard: pre-create `.cleo/` in temp dirs so resolveCleoDir resolves them deterministically (no reliance on a CLEO_DIR env leaked by a prior test). Full src/release/: 39 files / 347 tests green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…Core V2 (T11280) Real product bug: the T11262 resolveCleoDir refactor corrupted migrateConsensusFiles and migrateContributionFiles in lifecycle/consolidate-rcasd.ts — a stray `return join(resolveCleoDir(cwd), '<dir>')` short-circuited each, returning a path STRING instead of MoveRecord[] and leaving the real logic as dead code (undefined consensusDir/contribDir). Masked by `@ts-nocheck`. Fixed both (restored the local dir bindings) and removed `@ts-nocheck` so the type checker guards the file. doctor saga-audit + invariant-audit: fixtures retyped to PM-Core V2 `type='saga'` with parent_id membership; added a neutralizeSagaStructuralGuards helper that drops the parent-type-matrix triggers AND strips the chk_tasks_saga_no_parent CHECK (via writable_schema) so the audits can be tested against the deliberately-malformed saga-with-parent (I5) and nested-saga (I7) data they exist to detect. changesets/dual-write: pre-create `.cleo/` in temp dir (T11262). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…hema (T11280)
Real source bug: migrateSagaContainment read the raw DB via a non-existent
`(accessor as {db}).db` property — the SqliteDataAccessor exposes no such handle,
so every call returned E_GENERAL 'Database handle not available'. Now acquires the
canonical native handle via getNativeTasksDb() (after getTaskAccessor opens the
singleton) and runs ad-hoc reads through its prepared-statement .all().
Test: createRawTestDb now builds the schema via the real migration path
(createSqliteDataAccessor) instead of a hand-rolled partial schema (which diverged
from the migration chain and triggered a destructive wave0 `sessions` rebuild that
failed on a missing `name` column). Drops the task_relations_non_containment
triggers so the LEGACY parent-child groups edges the migration cleans up can be
seeded. Corrected the idempotency assertion: a second run is a safe no-op
(migrated=0, skipped=0) since run 1 removes the groups relation.
Full src/sagas/: 8 files / 67 tests green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…g asserts (T11280) - attachment-store.put: derive `size` (bytes) from the buffer at the write chokepoint for blob/local-file kinds (an explicit size still wins), matching the sha256/storageKey injection. The canonical attachmentSchema requires a non-negative `size`; callers (search-docs publish, changeset writer, ivtr-loop, docs-add) need not pass it. Fixes the ZodError in search-all-project-docs. - init-e2e: config version assertion 2.10.0 -> 2.11.0 (PM-Core V2 createDefaultConfig). - help-tier-snapshot: refreshed the operation-count regression-lock (294/413 -> 295/414) for the legitimately-added `docs llm-output` operation. - paths-debug: assert the canonical sha256-derived projectId shape (12-hex) that resolveProjectByCwd returns from a worktree, not the raw project-info field. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
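As a rough illustration of the size derivation at the write chokepoint: the sketch below assumes hypothetical names (`BlobInput`, `withDerivedSize`) and an integer check that the commit does not spell out. It shows only the stated rule that an explicit size wins and otherwise the byte length of the buffer is used.

```typescript
// Hypothetical caller-facing input: size is optional.
interface BlobInput {
  mime: string;
  size?: number;
}

// Fills in `size` from the payload when the caller did not provide one.
function withDerivedSize(input: BlobInput, bytes: Uint8Array): Required<BlobInput> {
  const size = input.size ?? bytes.byteLength; // an explicit size still wins
  if (!Number.isInteger(size) || size < 0) {
    // The commit says the schema requires a non-negative size; the integer
    // requirement here is an assumption of this sketch.
    throw new RangeError(`invalid attachment size: ${size}`);
  }
  return { mime: input.mime, size };
}
```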
Tighten the chk_tasks_saga_no_parent guard in saga-audit/invariant-audit tests to a single typeof narrowing — clears the two lint/complexity/useOptionalChain warnings introduced by neutralizeSagaStructuralGuards. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… stabilization for next release (T11262/T11280) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…--no-ff SSoT (T11282/T11124)
The TS napi binding integrateWorktree (T11124) and 4 worktree integration tests
(spawn/worktree-merge, spawn/worktree-audit, orchestrate/worktree-complete,
worktree-complete-auto) were shipped expecting a native integrate_worktree, but
worktrunk-core only had provision/destroy — getNative().integrateWorktree was
undefined ('not a function'). Implement it:
- worktrunk-core/git_wt.rs: integrate_worktree() runs git merge --no-ff of the
worktree branch into target_branch (preserving agent commit SHAs per ADR-062),
embeds the task ID in the merge-commit subject, returns IntegrateOutcome
{task_id, target_branch, merged, merge_commit, commit_count, rebased, error}.
Branch-missing -> merged:false/error; zero-commits -> merged:true + empty
mergeCommit; conflict -> abort + merged:false. HEAD left on target_branch.
- worktree-napi/lib.rs: IntegrateOpts/IntegrateResult napi structs + thin wrapper.
CI rebuilds the napi from the crate; local verification pending (stale target/
proc-macro2 metadata mismatch blocks a local incremental build — env, not code).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ate)
The summary value contained an unquoted colon ('(pre-existing PM-Core V2
breakage): migration...') which YAML parses as a nested mapping, failing
lint-changesets and the Release Readiness Preflight check (8 pass / 1 fail).
Quoting the scalar fixes it: 182/182 changesets now validate.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…party better-sqlite3; raise Node floor to 24.16 (T11242)
Owner-ratified substrate driver decision (2026-05-29): remain on first-party
node:sqlite (DatabaseSync); do NOT adopt better-sqlite3. SQLite 3.53.0 already
carries the WAL-reset corruption fix (epic:T1075 class) and node:sqlite keeps the
persistence layer zero-native-dependency.
- Convert the only 2 first-party better-sqlite3 references to node:sqlite:
- scripts/backfill-pipeline-stages.ts: Database -> DatabaseSync; db.transaction()
(a better-sqlite3-only API) -> explicit BEGIN/COMMIT/ROLLBACK.
- nexus/__tests__/tasks-bridge.test.ts: 'better-sqlite3' default type import ->
'node:sqlite' named type; DatabaseSync.Database -> DatabaseSync (node:sqlite
exports DatabaseSync as a class, not a namespace).
- specs/sqlite-pragmas.json: resolve the T10313 held pin -> 3.53.0 / Node 24.16.0
with the WAL-reset-fix-confirmed rationale.
- engines.node floor 24.13.0 -> 24.16.0 (root, core, cleo) to close the
silent-corruption window (Node 24.13/24.14 ship pre-fix SQLite 3.51.x).
- sqlite-version-pin.test.ts: add the WAL-reset-fix floor assertion (SQLite
>=3.53.0 on Node >=24.16.0).
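The `db.transaction()` to explicit BEGIN/COMMIT/ROLLBACK conversion mentioned above might look like the following sketch. `ExecDb` and `withTransaction` are hypothetical names; the only API assumed is an `exec(sql)` method, which `DatabaseSync` from `node:sqlite` provides. The converted script itself may structure this differently.

```typescript
// Minimal structural type: anything with exec(sql), e.g. node:sqlite's DatabaseSync.
interface ExecDb {
  exec(sql: string): void;
}

// Runs fn inside a transaction; commits on success, rolls back and rethrows on error.
function withTransaction<T>(db: ExecDb, fn: () => T): T {
  db.exec('BEGIN');
  try {
    const result = fn();
    db.exec('COMMIT');
    return result;
  } catch (err) {
    db.exec('ROLLBACK');
    throw err;
  }
}
```

Typing against the structural `ExecDb` interface rather than the concrete driver class keeps the helper testable with a recording fake.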
Verified: zero first-party better-sqlite3 imports remain; tsc @cleocode/core clean;
sqlite-version-pin.test 4/4.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t non-compliant writer (T11262) E9 (Saga T11242 docs SSoT pre-cutover hardening). The docs-fetch READ bug was already fixed (ce2ae31); this finishes E9 so EVERY existing + future blob row satisfies attachmentSchema's storageKey:z.string().min(1) — required by the T11242 exodus round-trip (E5/E2). - New idempotent data migration 20260529000075_t11262-heal-blob-storagekey: heals every kind:'blob' row with storageKey='' to the canonical CAS path <sha[:2]>/<sha[2:]><extFromMime(mime)>, mirroring attachment-store.ts blobPath. Verified on the operator tasks.db copy: 2476 -> 0; idempotent 2nd run = 0; every healed key points to a real on-disk file (2476/2476). - T10165-backfill-adr-index.ts: the last writer emitting storageKey='' now computes the contract path + runs blobAttachmentSchema.parse() before insert (matching docs-update.ts:545). All 3 blob-insert sites now validate. - biome import-order touch-up on the prior node:sqlite script conversion. Verified: tsc @cleocode/core clean; heal SQL validated on a tasks.db copy. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
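A hedged sketch of the canonical CAS path `<sha[:2]>/<sha[2:]><extFromMime(mime)>` and of an idempotent per-row heal. The MIME table, the `BlobRow` type, and the function bodies are assumptions for illustration; the real `extFromMime` and the SQL migration are not shown in this log.

```typescript
// Hypothetical MIME-to-extension table; the project's mapping may differ.
const EXT_BY_MIME: Record<string, string> = {
  'text/markdown': '.md',
  'application/json': '.json',
  'text/plain': '.txt',
};

function extFromMime(mime: string): string {
  return EXT_BY_MIME[mime] ?? '';
}

// Builds <first two hex chars>/<remaining hex chars><extension>.
function blobStorageKey(sha256: string, mime: string): string {
  if (!/^[0-9a-f]{64}$/.test(sha256)) {
    throw new Error('expected a 64-character lowercase hex sha256');
  }
  return `${sha256.slice(0, 2)}/${sha256.slice(2)}${extFromMime(mime)}`;
}

interface BlobRow {
  kind: 'blob';
  sha256: string;
  mime: string;
  storageKey: string;
}

// Idempotent heal: rows that already have a key are returned untouched.
function healRow(row: BlobRow): BlobRow {
  if (row.storageKey.length > 0) return row;
  return { ...row, storageKey: blobStorageKey(row.sha256, row.mime) };
}
```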
…t (T11281)
Owner directive (2026-05-29): a project's projectId is assigned ONCE and is its identity for life, regardless of which nexus it registers with; projectHash is the dual path-fingerprint that updates on relocation. registerProjectOnEncounter previously registered the PATH-DERIVED canonical id as the registry projectId, which mutates on move, contradicting the immutable-id contract and producing the T11281 'identity conflict: hash registered to <canonical> but project has <stored-id>' against nexusReconcile (which correctly keys on the stored project-info id).
Fix:
- Register the IMMUTABLE stored id (infoProjectId) as projectId; fall back to the canonical id only on first assignment (no stored id yet).
- On encounter at a new path (same immutable id) update projectPath AND projectHash: seamless move/rename/export-import.
- Record the path-derived canonical id + legacy ids as ALIASES -> immutable id, so lookups by any token still resolve via the alias table (backward compatible).
- onConflictDoNothing on the insert: the fire-and-forget auto-register (T11021) is best-effort/idempotent and must not throw the project_path UNIQUE race.
- CLEO_DISABLE_PROJECT_AUTOREGISTER (default off; production always auto-registers): reconcile/registry UNIT tests set it so the encounter-time side-effect doesn't pre-register the fixture out from under explicit registration assertions.
Verified: reconcile 5/5, nexus-e2e-registry 25/25, full nexus suite 372 pass / 0 regressions; tsc @cleocode/core clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
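The immutable-id contract can be illustrated with an in-memory model. This is only a sketch: `RegistrySketch`, its method names, and the Map-based storage are hypothetical stand-ins for the real project_registry and alias tables, and the ids and hashes below are placeholder strings.

```typescript
interface RegistryRow {
  projectId: string;   // immutable identity
  projectPath: string; // updated on relocation
  projectHash: string; // path fingerprint, updated on relocation
}

interface Encounter {
  storedId?: string;   // id from project-info, if one was already assigned
  canonicalId: string; // path-derived id
  path: string;
  hash: string;
}

class RegistrySketch {
  private rows = new Map<string, RegistryRow>();
  private aliases = new Map<string, string>();

  registerOnEncounter(e: Encounter): RegistryRow {
    // The stored id wins; the canonical id is used only on first assignment.
    const projectId = e.storedId ?? e.canonicalId;
    const row: RegistryRow = { projectId, projectPath: e.path, projectHash: e.hash };
    this.rows.set(projectId, row); // same id at a new path updates the row in place
    if (e.canonicalId !== projectId) this.aliases.set(e.canonicalId, projectId);
    return row;
  }

  // Resolves by immutable id or by any recorded alias.
  resolveById(token: string): RegistryRow | undefined {
    return this.rows.get(this.aliases.get(token) ?? token);
  }

  get rowCount(): number {
    return this.rows.size;
  }
}
```

In this model a move produces one updated row plus a second alias, so a lookup by the old path-derived id still lands on the relocated project.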
…ntract (T11281)
The T11021 tests asserted the superseded path-derived-canonical-id model — most
directly AC5 ('registers at new path with NEW canonical ID when directory moves',
asserting cid2 != cid1), which is the literal opposite of the owner's immutable-id
directive. Updated to the T11281 contract:
- AC1/AC3: the registered projectId IS the immutable stored project-info id
(verbatim), not a path-derived 12-hex id.
- AC3: the path-derived canonical id is recorded as an ALIAS — resolveProjectById
resolves by it to the same immutable-id row.
- AC5: a move RETAINS the immutable id; the single row is updated in place
(new path + new projectHash); no second row lingers at the old path.
These tests asserted obsolete behavior — the product is correct per T11281; the
tests were the stale side. Verified 7/7.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…cwd-default test CI-deterministic - contracts/operations-registry: regenerate the OPERATIONS snapshot to the current intended registry (827 ops). The drift is pre-existing PM-Core V2 (operations added without updating the snapshot; reproduces on clean main), not from this branch. - paths/cleo-paths 'uses process.cwd()': the test asserted the AMBIENT repo is a registered project, but .cleo/project-info.json is gitignored and ABSENT in CI -> resolveProjectByCwd() returned null -> failure. Drive it from a deterministic fixture + chdir + restore so it no longer depends on resolution working by accident (T11281 controlled-test-env principle). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ization)
brain-stdp-functional (real-CLI/real-brain.db) and performance-safety (hard
wall-clock <50ms/<200ms assertions) pass in isolation but flake under the full
parallel shard sweep (CPU + SQLITE_BUSY contention). CI only passes --retry=2 to
macOS shard 1, so add a test-level { retry: 2 } guard covering the other shards.
Pre-existing flakes (documented known-main flakes), not introduced by this branch.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
#812 green) Root-cause fixes for content/fixture drift PM-Core V2 left broken on main (none from this branch; diagnosed via parallel workflow investigators): - ct-cleo SKILL.md: commit 40cc928 (T10645) rewrote PM-Core V2 sections and dropped 6 doctrine markers the test requires -> restore T10638 groups-migration note, single-line 'MUST NOT satisfy containment' I3 phrase, typed-AC prose, T10639 note, T10632-34 + T10629-31 provenance; fix one test asserting a non-existent 'cleo graph validate' command. - CLEO-INJECTION.md: hygiene commit ebd9376 dropped 'arbitrary external absolute paths' + 'DocKindRegistry' path-policy doctrine -> restore. - lint-no-cwd-walkup.test.mjs: fixture hardcoded upgrade.ts (migrated to 0 violations by 4c680ed/T11011) -> swap to still-unmigrated project-info.ts. Verified: ct-cleo-skill + injection-content 44/44; injection-mvi-tiers 21/21; lint-no-cwd-walkup 30/30 isolated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…T11281 cross-file leak)
Root-cause fix for the cross-file test-isolation leak that failed saga-audit +
invariant-audit in CI's full-workspace shard (NOT per-package). The
brain-tool-complete PostToolUse capture + its setImmediate observers call
observeBrain->getBrainDb->resolveCleoDir; when a deferred observer runs after a
sibling test deleted CLEO_DIR + rm'd its tmpdir, resolveCleoDir throws
NEXUS_PROJECT_NOT_FOUND ('No CLEO project found') — masked locally by the
gitignored .cleo/project-info.json, uncaught in CI (absent) -> collateral-failed
the running test file.
A best-effort telemetry hook must never crash its host when there is no project
to observe into. Add isNoProjectError() guard (mirrors isMissingBrainSchemaError)
and swallow it in both handleToolStart/handleToolComplete catches. Narrow, tiny
blast radius; also a production-robustness fix. (Replaces the reverted over-broad
vitest.setup CLEO_ROOT-pin harness that broke 226 tests.)
Verified: full-workspace faithful-repro --shard=2/2 (project-info hidden = CI
condition) -> saga-audit + invariant-audit NO LONGER in the failure set; tsc clean.
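A sketch of the narrow guard described above. How the real `isNoProjectError` identifies the error is not shown in this log, so matching on the message text is an assumption, as is the wrapper name `runBestEffort` and the choice to rethrow everything else.

```typescript
// Assumption: the error is recognised by its message; the real guard may use a code.
function isNoProjectError(err: unknown): boolean {
  return err instanceof Error && err.message.includes('No CLEO project found');
}

// Runs a best-effort observer. Returns false when there was no project to
// observe into; any other failure is rethrown so real bugs stay visible.
function runBestEffort(observe: () => void): boolean {
  try {
    observe();
    return true;
  } catch (err) {
    if (isNoProjectError(err)) return false;
    throw err;
  }
}
```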
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…master surgery (T11281) The saga-audit + invariant-audit suites neutralize the tasks-table structural guards so they can seed deliberately invariant-violating fixtures (a saga with a parent_id for I5, a saga inside a saga for I7). The helper did this via `PRAGMA writable_schema=ON; UPDATE sqlite_master SET sql=… ` to strip the chk_tasks_saga_no_parent CHECK. node:sqlite @ SQLite 3.53.0 (standardized this branch in 6620e8e / T11242, engines.node >=24.16.0) runs in DEFENSIVE mode and rejects that write with "table sqlite_master may not be modified" — even under writable_schema=ON. CI (Node 24.16+/SQLite 3.53.0) failed all 12 tests; local (Node 24.13.1/SQLite 3.51.2) passed because the older driver permitted the sqlite_master write — a SQLite-version local≠CI trap. Replace the schema surgery with `PRAGMA ignore_check_constraints=ON`, which disables CHECK enforcement on the connection without mutating sqlite_master. Version-agnostic (verified on 3.51.2; defensive-mode-proof for 3.53.0) and simpler — the DROP TRIGGER lines are retained. Production guards are untouched. Fixes (CI run 26646441003): saga-audit.test.ts (6) + invariant-audit.test.ts (6). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…1281)
Adds `canonicalizePath()` to @cleocode/paths as the single path-canonicalization
entry point for the CLEO SDK: realpath (resolves symlinks + mount/alias
divergence — macOS /var→/private/var, Windows drive-case/8.3, Linux bind-mounts)
with a lexical fallback when the target does not exist on disk.
Closes two CI failures (run 26646441003) rooted in ad-hoc realpath handling:
1. computeCanonicalProjectId(repoPath) called realpathSync(resolve(repoPath))
raw. The cleo-paths test passes the literal '/mnt/projects/cleocode' (the dev
machine's own repo path) — it exists locally (realpath OK) but not on CI
(ENOENT). A canonical-id computation must not crash on a moved/absent path;
it now falls back to the lexical path. (local≠CI: hardcoded dev path.)
2. resolveProjectByCwd's inline realpath-with-fallback is replaced by the SSoT
helper. Its tests realpath-normalize their tmpdir fixture the same way the
function does, fixing the macOS /var vs /private/var assertion mismatch.
Routes both functions + the tests through canonicalizePath — no more scattered
realpathSync for path identity. Verified: canonicalizePath('/nonexistent') no
longer throws; cleo-paths.test.ts 35/35.
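The realpath-with-lexical-fallback behavior might be sketched as below. `canonicalizePathSketch` is a hypothetical name, and catching every error (not only ENOENT) is a simplification of this sketch; the actual `canonicalizePath` in @cleocode/paths may be more selective.

```typescript
import { realpathSync } from 'node:fs';
import { resolve } from 'node:path';

// Resolves symlinks and mount aliases when the target exists; otherwise
// falls back to the purely lexical absolute path instead of throwing.
function canonicalizePathSketch(p: string): string {
  const absolute = resolve(p);
  try {
    return realpathSync(absolute);
  } catch {
    return absolute;
  }
}
```

Tests that build fixtures under a temp directory can pass the fixture root through the same function so both sides of an assertion agree on `/var` versus `/private/var`.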
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…1281) Root cause of the 24.13.1-ran-CLEO-anyway class: four version declarations, none enforcing the real floor. The CLI guard (cli/index.ts) and doctor's checkNode() both compared MAJOR ONLY (major < 24 / major >= 24) against a hardcoded literal, while the true floor is engines.node >=24.16.0 (where bundled SQLite 3.53.0's WAL-reset corruption fix landed). 24.13.1 → major 24 → waved through, then the node:sqlite persistence layer diverged from CI (sqlite_master DEFENSIVE handling, etc.). engines.node also drifted across packages (>=24.0.0 / >=22 / >=20 / absent). The gate (one place, dynamic SSoT, consumed everywhere): - packages/paths/src/node-version-gate.ts — `evaluateNodeVersion()` (pure, full-semver compare; testable) + `enforceNodeVersion()` (no-op when compliant; else OS/manager-aware guidance + exit 1). The floor is READ AT RUNTIME from @cleocode/paths' own engines.node — bumping it is one root-package.json edit. Lives in paths: the zero-dep leaf importable before @cleocode/core, so an under-floor Node fails with guidance, not at node:sqlite load (paths does not eagerly load node:sqlite — lazy via createRequire). - cli/index.ts: the major-only guard → single `enforceNodeVersion()` call. This is the line that now stops 24.13.1. - core/system/dependencies.ts checkNode(): delegates to `evaluateNodeVersion()`, deleting the major-only literal (+ now-unused parseMajorVersion). - engines.node synced to >=24.16.0 across all 23 workspace packages (was the drift; paths' own value is what the gate reads, so this is correctness, not cosmetics). - scripts/lint-node-engine-ssot.mjs: CI guardrail asserting every package's engines.node === root + FALLBACK_MIN_NODE matches. Floor bumps stay one edit. Auto-install: default `enforce` (hard-fail + exact copy-paste command); opt-in `CLEO_NODE_AUTO_UPGRADE=1` runs the install then still exits non-zero — version managers switch Node via shell shims, so a child process cannot hot-swap the running interpreter. 
Silent toolchain mutation is deliberately never the default. Tests: 14/14 (24.13.1 fails, 24.16.0/26.x pass, malformed fails-closed, enforce exits 1, warn continues, no-op when compliant). Verified getRequiredNodeVersion() reads 24.16.0 from the SSoT; lint green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
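To illustrate the difference between the old major-only check and a full-semver comparison, here is a minimal sketch. `parseVersion`, `floorFromEngines`, and `meetsFloor` are hypothetical helpers; the real `evaluateNodeVersion()` returns a richer result and reads the floor from package.json, neither of which is modelled here.

```typescript
type Triple = [number, number, number];

// Parses "24.16.0" or "v24.16.0"; returns null for anything malformed.
function parseVersion(v: string): Triple | null {
  const m = /^v?(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// Extracts the floor from a simple ">=X.Y.Z" engines range (assumed format).
function floorFromEngines(range: string): string {
  return range.replace(/^\s*>=\s*/, '');
}

// Full major.minor.patch comparison; fails closed on malformed input.
function meetsFloor(current: string, floor: string): boolean {
  const c = parseVersion(current);
  const f = parseVersion(floor);
  if (c === null || f === null) return false;
  const [cMajor, cMinor, cPatch] = c;
  const [fMajor, fMinor, fPatch] = f;
  if (cMajor !== fMajor) return cMajor > fMajor;
  if (cMinor !== fMinor) return cMinor > fMinor;
  return cPatch >= fPatch;
}
```

A major-only check would accept 24.13.1 because 24 >= 24; comparing minor and patch as well is what rejects it against a 24.16.0 floor.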
Adds the engines.node SSoT lint (Gate 8) to the Arch Boundary Check workflow so floor drift is caught in CI, and documents it in the AGENTS.md architectural gate table. arch-boundary-check.yml is not template-managed (only release-*.yml are), so this does not affect the Deployed Template Parity gate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…path (T11281) worktree-identity.test.ts created its fixture under mkdtempSync(tmpdir()). On macOS tmpdir() is /var/folders/… but realpath resolves to /private/var/folders/…, so getCleoDirAbsolute's realpath-normalized worktree→main-repo resolution diverged from the raw-path assertions (macos shard-1 FAILs in run 26646441003: "getCleoDirAbsolute from worktree paths" ×N). Canonicalize the fixture root at creation via the @cleocode/paths canonicalizePath SSoT so git init, project-info, the worktree gitlink, and resolution all agree. Root cause C; macOS-only (CI is the arbiter — /private/var does not reproduce on Linux). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
macOS shard 2 was the only unit shard without --retry=2. It failed run 26649981753 with a single non-deterministic `EnvironmentTeardownError: Closing rpc while onUserConsoleLog was pending` — the Darwin worker-teardown race where deferred best-effort ops (brain observers, worktree integration, intelligence hooks) emit console output as the vitest worker closes. All 541 test files passed; only the teardown race failed the job. This is the pre-existing T10490 flake class already guarded on macOS shard 1. Briefing §8.5 explicitly prescribed standardizing the macOS retry beyond shard 1. Both ubuntu shards are deterministically green (saga/invariant audit + computeCanonicalProjectId + pipeline-stage all pass). Root cause (deferred-op test-boundary escape) remains tracked under T10490; retry is the interim guard. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ce (T11281 · T10490) macOS shard 2 kept failing with a single `EnvironmentTeardownError: Closing rpc while "onUserConsoleLog" was pending` (runs 26649981753, 26650685810) while all 541 test files passed. Root mechanism: CLEO's best-effort telemetry (setImmediate brain observers + intelligence/worktree hooks dispatched via Promise.allSettled) keeps logging AFTER a test file finishes; on Darwin timing the in-flight onUserConsoleLog RPC races worker teardown → unhandled rejection → exit 1. --retry (added for shard 2 in the prior commit) cannot help: there is no failing TEST to retry, only a post-run unhandled error. disableConsoleIntercept routes worker console output straight to the terminal (no RPC to race). Output is still inherited into the teed shard log so the schema-warning-budget gate keeps working, and real test failures / in-test unhandled rejections are unaffected. The deeper root cause — deferred best-effort ops escaping the test boundary — stays tracked under T10490; this removes the infra race that was failing the shard. Verified config accepted locally (14/14). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n-RPC race (T11281 · T10490)" This reverts commit 0dfc0db.
…(T11281 · T10490) Root cause of the macOS/ubuntu shard-2 `EnvironmentTeardownError: Closing rpc while "onUserConsoleLog" was pending` unhandled rejection (runs 26649981753, 26650685810, 26651270712): CLEO's best-effort brain side-effects, fired from deferred/fire-and-forget observers (setImmediate brain captures, runObserver, Step-9f prune sweep, detectSupersession), keep logging AFTER a test file completes via raw `console.warn`/`console.error`. vitest intercepts console.* and forwards each call to the main process over the `onUserConsoleLog` RPC; on a post-teardown log the in-flight RPC races the worker close → a single unhandled rejection fails the shard even though all 541 test files pass. `--retry` cannot help (no failing test) and root-level `disableConsoleIntercept` did not propagate to the workspace project runs (reverted in 18cc118). Route these logs through the existing pino logger (`getLogger(subsystem)`), which writes straight to stderr and bypasses vitest's console interceptor entirely — which is precisely why the structured `{level,subsystem}` logs never triggered the race. Diagnostics are preserved (now as structured JSON). Converted the confirmed deferred sources in graph-auto-populate (6, incl. the ensureTaskNode path in the unhandled-rejection stacks), observer-reflector (3), brain-maintenance (4 incl. prune-sweep), temporal-supersession (1). The broader deferred-op console→logger sweep remains tracked under T10490. Verified: core builds, brain-automation 54/54. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…1281)
CI used `node-version: '24'` (latest cached 24.x) — non-deterministic runner
roulette: the macOS runner resolved 24.15.0 (< the 24.16.0 engines floor; pnpm
warned "Unsupported engine"). The new enforceNodeVersion() gate then correctly
blocked the real `cleo` binary that brain-stdp-functional.test.ts spawns
("cleo memory dream exited 1: cleo requires Node.js >= 24.16.0", macos shard 1,
run 26652046242). ubuntu resolved 24.16+ and passed — confirming the deferred
console→pino fix cleared the teardown-RPC race (ubuntu+macos shard 2 now green).
Pin all ci.yml node-version entries to 24.16.0 so every runner meets the floor
the gate + engines enforce — and extend lint-node-engine-ssot to assert ci.yml's
node-version equals the root floor, closing the SSoT loop: engines.node →
gate enforces → CI runs it → lint keeps CI in lock-step. Also removes the
24.15-vs-24.16 SQLite-version nondeterminism that masked the original failures.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…T11242) llmtxt pins -> ^2026.5.15 (root/cleo/core/studio) — consumes the fully node:sqlite + rc.3 SDK release. drizzle-orm/drizzle-kit beta.22/beta.19 -> 1.0.0-rc.3 (root/core/nexus/playbooks) to satisfy llmtxt's peer + align the substrate. Pure version bump — rc.3 required zero source changes (build clean). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…11294)
The Manual Write Sweep CI job (T10372) self-cancelled on every PR #812 run, blocking the merge. Root cause: `cleo docs list --json` truncated the content hash to 8 chars in the machine envelope (a display hack that leaked into the data layer via a `_sortSha`/drop pattern), so scripts/sweep-manual-doc-writes.mjs had to spawn `cleo docs fetch <sha>` once per doc to recover the full digest. On PR #812 that is 1963 docs x 7-10s cold start (T11292) = 229-327 min, far past the job's timeout-minutes:5, after which workflow-global cancel-in-progress killed it.
Fix (root cause, no band-aid):
- packages/cleo/src/dispatch/domains/docs.ts: emit the FULL 64-hex sha256 in the docs.list JSON for both project + owner scopes; drop the _sortSha hack and sort by sha256 directly. Truncation is a render concern, not data.
- packages/contracts/src/operations/docs.ts: DocsAttachmentRow.sha256 doc now states the field is the full digest.
- scripts/sweep-manual-doc-writes.mjs: build an in-memory full-sha index from the single `docs list` call; classify() resolves in-sync via map lookup with ZERO per-file spawns. 1964 spawns -> 1.
Verified: real sweep over 1963 files now completes in 14.2s (was never); sweep unit test 4/4; tsc clean. docs-integration test asserts full sha.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
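The index-then-lookup idea behind the sweep fix might look roughly like this. It is a sketch under assumptions: `ListedDoc`, `buildShaIndex`, and `isInSync` are hypothetical names, the real script is an .mjs file whose classify() is not shown, and the index is modeled here as a plain set of digests.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape: only the field this sketch needs.
interface ListedDoc {
  sha256: string; // full 64-hex digest, as the fixed `docs list --json` emits
}

// Build the index once, from the rows of a single listing call.
function buildShaIndex(rows: ListedDoc[]): Set<string> {
  return new Set(rows.map((row) => row.sha256));
}

// Hash local content and resolve in-sync status by set membership,
// with no subprocess spawned per file.
function isInSync(index: Set<string>, content: string): boolean {
  const digest = createHash("sha256").update(content, "utf8").digest("hex");
  return index.has(digest);
}
```

With full digests in the listing, each per-file check becomes an in-memory lookup instead of a `cleo docs fetch` cold start.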
…0490)
addTask/completeTask kicked off best-effort graph + LOOM writes as orphaned import().then().catch() promises. Under the vitest forks pool a detached op from one test could still be in flight when the next test reset the shared SQLite singleton (store/sqlite.ts _db/_nativeDb), landing on the new fixture's connection and corrupting its reads, e.g. a freshly-written pipeline_stage reading back null, silently flipping the forward-only transition guard. That is the intermittent 'rejects backward stage transition' ubuntu-shard-1 failure the drizzle rc.3 microtask-timing shift surfaced on PR #812 (base 13be7d2 was green; the deps bump only moved timing).
Fix (mechanism-class, no band-aid):
- packages/core/src/store/background-ops.ts: trackBackgroundOp() registers each detached op; awaitBackgroundOps() drains them; pendingBackgroundOpCount() asserts the registry is empty.
- add.ts (ensureTaskNode + initLoomForEpic) + complete.ts (graph upsert/edge): wrap the detached promises in trackBackgroundOp. Production unchanged: still detached, nobody awaits on the hot path.
- test-db-helper.ts: createTestDb drains prior-test ops before resetDbState (at fixture creation AND in cleanup), so no detached promise survives a test boundary regardless of microtask scheduling.
Cannot be 100% verified against the intermittent race directly (it did not reproduce across full-shard, single-fork, and looped local runs), but the new background-ops regression test proves pendingBackgroundOpCount()===0 after addTask/epic-create + flush, so the orphan-promise mechanism class is eliminated. add/complete/pipeline-stage 99/99 green; core tsc clean. CI is the arbiter.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
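A registry of this kind can be sketched in a few lines. The function names below come from the commit message, but the bodies are an assumption about how such a registry could work, not the contents of background-ops.ts.

```typescript
// Hypothetical sketch of a detached-op registry (bodies are assumed, not the real file).
const pendingOps = new Set<Promise<void>>();

// Register a fire-and-forget promise without awaiting it on the hot path.
function trackBackgroundOp(op: Promise<unknown>): void {
  // Swallow rejections: these ops are best-effort, and draining must never throw.
  const settled: Promise<void> = op.then(
    () => undefined,
    () => undefined,
  );
  pendingOps.add(settled);
  void settled.then(() => {
    pendingOps.delete(settled);
  });
}

// Wait until every tracked op has settled. Loops because an op that is
// draining may itself register further ops.
async function awaitBackgroundOps(): Promise<void> {
  while (pendingOps.size > 0) {
    await Promise.all(Array.from(pendingOps));
  }
}

// Number of ops still in flight; a test can assert this is 0 after a drain.
function pendingBackgroundOpCount(): number {
  return pendingOps.size;
}
```

In this shape production callers stay fire-and-forget, while a test helper can call `awaitBackgroundOps()` before resetting shared state.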
'returns null exactly at the staleness boundary (age == threshold)' was an intermittent ubuntu-shard-1 flake: the test derived detectedAt from Date.now() then the detector re-read Date.now() later, so ageMs = threshold + elapsed; under CI load even 1ms tipped it past the threshold → a proposal instead of null. Detector logic (<= threshold → null) is correct; the test assumed zero elapsed time. Fix: vi.useFakeTimers + setSystemTime so both Date.now() reads share one frozen clock. Same timing-flake class as the pipeline-stage race fixed in 59b7a35; surfaced after that fix removed the first shard-1 flake. Verified 13/13 green; proactive sweep found no other Date.now()-window boundary tests lacking a frozen clock. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
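The boundary arithmetic behind this flake can be shown with the clock passed in explicitly. Note this is a different technique from the actual fix (which freezes time with vi.useFakeTimers + setSystemTime); `detectStale` is a hypothetical stand-in for the detector, used only to illustrate why a second clock read tips the result.

```typescript
// Hypothetical detector: null while age <= threshold, otherwise a proposal.
function detectStale(detectedAt: number, thresholdMs: number, now: number): string | null {
  const ageMs = now - detectedAt;
  return ageMs <= thresholdMs ? null : "proposal";
}

const thresholdMs = 60_000;
const testClock = 1_700_000_000_000; // the instant the test computes detectedAt
const detectedAt = testClock - thresholdMs;

// Frozen clock: both reads see the same instant, so age == threshold.
const frozenResult = detectStale(detectedAt, thresholdMs, testClock);

// Unfrozen clock: the detector reads the time 1 ms later, so age == threshold + 1.
const driftedResult = detectStale(detectedAt, thresholdMs, testClock + 1);
```

Here `frozenResult` is null and `driftedResult` is a proposal, which matches the intermittent failure described above.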
Branch protection on main requires a status context named 'CI'. GitHub posts one check-run per JOB (biome, unit-tests, ...), not per workflow, so the workflow being named 'CI' never produced a 'CI' check — the required context was a phantom that blocked every PR's normal-merge path (only enforce_admins= false override could merge, e.g. PR #811). Add an aggregate gate job (name: CI, if: always(), needs: all 45 jobs) that fails iff any upstream job concluded failure/cancelled and accepts skipped (path-filtered) jobs. This makes the required 'CI' context a real, durable check — no brittle per-job required list, and robust to docs-only PRs that skip the code jobs. ci.yml is hand-maintained (not under Gate 7 template parity), so editing it directly is in-bounds. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
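The pass/fail rule of the aggregate job can be modeled as a small predicate. The real gate is a YAML job in ci.yml that is not reproduced here; this TypeScript version only illustrates the rule, using the four result values GitHub Actions reports for `needs.<job_id>.result`. `gatePasses` is a hypothetical name.

```typescript
// The four values GitHub Actions reports for needs.<job_id>.result.
type JobResult = "success" | "failure" | "cancelled" | "skipped";

// The gate fails iff any upstream job failed or was cancelled;
// skipped (path-filtered) jobs are accepted.
function gatePasses(results: Record<string, JobResult>): boolean {
  return Object.values(results).every(
    (result) => result !== "failure" && result !== "cancelled",
  );
}
```

Treating skipped as acceptable is what keeps docs-only PRs mergeable when the code jobs are path-filtered out.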
Summary
Fixes the live `cleo docs fetch` / `cleo docs list` failure (`E_INTERNAL: Cannot read properties of undefined (reading 'split')`) that made canonical docs unreachable for any blob attachment written by `docs-update.ts`. Roots out the writer/reader contract drift and heals the 16 malformed rows already in the wild.
This unblocks the Gen-7 substrate saga (T11242), whose own research docs were unreachable via the canonical CLI surface.
Root cause
- Reader (`docs-read-model.ts:799`): `att.storageKey.split('/')` threw when `storageKey` was `undefined`. Because `resolveAllFromTasksDb()` iterates eagerly, a single malformed row poisoned project-scope list ops.
- Writer (`docs-update.ts`): emitted a non-contract `BlobAttachment` shape (`{kind, name, mime, size, blobId}`) instead of the contract `{kind, sha256, storageKey, mime, size}` from `packages/contracts/src/attachment.ts`. The Zod validator existed but was never invoked at write time.
Changes
- `docs-read-model.ts`: survives malformed/legacy rows.
- `docs-update.ts`: emits `sha256` + computed `storageKey`.
- `attachment-store.ts::put`: validates the full discriminated union (all 5 attachment variants) on every write, so future drift fails fast at the source.
- `scripts/heal-malformed-blob-attachments.mjs`: one-shot, idempotent; healed 16 NULL-`storageKey` rows.
- `docs/__tests__/blob-attachment-contract-T11262.test.ts`: 5 cases (writer emit, legacy reject, empty-string reject, JSON round-trip, defensive read).
Validation (all pass)
- `cleo docs fetch db-substrate-pglite-vision-2026-05-28` → success, 18900 bytes
- `cleo docs list --type research` → success, 50 rows
- `cleo docs list --task T11242` → success, 4 attachments
- `Healed 16 rows` then `Healed 0 rows` (idempotent)
- `pnpm biome check` clean
- `pnpm --filter @cleocode/core run build` clean
- `pnpm --filter @cleocode/core run test docs`: 5/5 new tests pass; no new failures (42 pre-existing `E_PROJECT_NOT_FOUND` harness failures on main are unrelated, confirmed by stash-and-rerun)
Notes
- `storageKey:''` rows don't break reads (empty string has `.split`); they heal-on-next-write through the chokepoint.
🤖 Generated with Claude Code
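For context, a defensive read along the lines described in this PR could look like the sketch below. It is not the code in docs-read-model.ts: the `BlobAttachmentLike` shape, the choice of the last path segment as the name, and the final placeholder string are all assumptions made for illustration.

```typescript
// Hypothetical shape; the real types live in @cleocode/contracts.
interface BlobAttachmentLike {
  kind: "blob";
  storageKey?: string | null;
  name?: string; // legacy field written by the non-contract writer
}

// Derive a display name without throwing on malformed or legacy rows.
function extractBlobName(att: BlobAttachmentLike): string {
  if (typeof att.storageKey === "string" && att.storageKey.length > 0) {
    // Assumption: the name is the last path segment of the storage key.
    const segments = att.storageKey.split("/");
    return segments[segments.length - 1] ?? att.storageKey;
  }
  // Fall back to the legacy field; the placeholder is an assumed last resort.
  return att.name ?? "(unnamed blob)";
}
```

The point of the guard is that one malformed row yields a fallback value instead of an exception that aborts the whole eager listing.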