Skip to content

feat(test-harness): per-worker shadow-repo + git support#292

Open
nick-inkeep wants to merge 14 commits intomainfrom
worktree-spec-shadow-repo-test-harness
Open

feat(test-harness): per-worker shadow-repo + git support#292
nick-inkeep wants to merge 14 commits intomainfrom
worktree-spec-shadow-repo-test-harness

Conversation

@nick-inkeep
Copy link
Copy Markdown
Contributor

@nick-inkeep nick-inkeep commented Apr 23, 2026

Summary

  • Flips the dev-plugin test-isolation guard so Playwright workers spawn with a real per-worker shadow repo at <tmpdir>/.git/open-knowledge/ (FR1) instead of silently disabling git, and adds a withShadow: true opt-in to the Tier 1 integration harness (FR2) — together unlocking automated test coverage for timeline / save-version / attribution / reconciliation surfaces that were previously uncoverable.
  • Seals seven projectDir couplings in hocuspocus-plugin.ts via a single module-level projectRoot binding (FR4 / D12), preventing /api/save-version and adjacent surfaces from writing real ok/v<N> tags + commits into the developer's actual OK repo during local test runs (the worst-case audit finding).
  • Closes a latent FR1 race (commit b9f1fb28, post-QA) where agent-write/save-version/rollback requests arriving within ~100-200 ms of bun run dev startup silently lost their L2 shadow commit. Now: api-extension awaits the dev plugin's shadowReadyPromise before reading shadowRef.current. End-to-end Playwright validation in _qa-shadow-default-on.e2e.ts proves the full path now works without artificial delays.
  • Internal test-harness change only — zero customer-visible behavior, no CI runner config change, production dev path byte-for-byte identical.

Spec: specs/2026-04-22-per-worker-shadow-repo-test-harness/SPEC.md (15 LOCKED decisions, 8 functional requirements).

Changeset: bumps @inkeep/open-knowledge-server + @inkeep/open-knowledge-app only (the two packages with actual source changes; core, cli, desktop are unchanged in this PR per git diff main...HEAD --stat).

Key decisions

D2 — Tier-appropriate defaults Playwright tier defaults shadow ON (full-app topology matches production); Tier 1 integration tier defaults shadow OFF, opt-in via { withShadow: true }. The 37/38 shadow-orthogonal Tier 1 tests stay untouched; only the 1/38 that actually asserts shadow state pays the ~50-200 ms init cost.
D12 — Single projectRoot binding One module-scope const projectRoot = isTestIsolated ? CONTENT_DIR : PROJECT_ROOT threaded through all 7 PROJECT_ROOT consumers in hocuspocus-plugin.ts. Audit found the previous per-site decomposition missed 3 sites (L245, L250, L275); the single-binding pattern is the precedent so any future addition is obviously wrong without it.
D13 — Broadened fail-fast Under isTestIsolated: true, handleDevShadowInitError now exits non-zero on ANY shadow-init throw (not just ProjectGitInitError). Production path unchanged — retains existing degraded-warn for non-ProjectGitInitError. Closes a silent-degradation hole that would otherwise let tests pass without producing the shadow refs they assert on.
D14 — Acceptance bar All four FR8 acceptance tests (T1, T2, T3, T6) ship with this spec as Must-tier. Greenfield posture rejects Could-tagged acceptance. T6 (write/read round-trip via /api/history) is the regression guard for the L245 leak D12 just sealed.
D15 — SPEC 2026-04-21 §Q3 corrigendum The prior shadow-repo spec closed Q3 with "No harness change needed." That closure is invalidated by this PR; per CLAUDE.md post-ship corrigendum protocol, an italic/bracketed/dated breadcrumb pointing here was added to that spec's §Q3 row inline.
Post-QA: shadowReadyPromise gate ApiExtensionOptions gains an optional shadowReadyPromise?: Promise<void> field. The dev plugin captures runDevShadowInit's promise (previously void-discarded) and passes it through. flushDocToGit (covers all 4 agent-write handlers) + handleSaveVersion + handleRollback await it before reading shadowRef.current. Non-breaking; Tier 1 / CLI callers omit the field and behavior is unchanged.

What changed (FR coverage)

FR What it does Where
FR1 Playwright workers default-on shadow under OK_TEST_CONTENT_DIR packages/app/src/server/hocuspocus-plugin.ts (drop if (!isTestIsolated) guard; flip gitEnabled to true unconditional; capture shadowReadyPromise and pass through)
FR2 createTestServer({ withShadow: true }) opt-in packages/app/tests/integration/test-harness.ts (new withShadow?: boolean option, shadowDir?: string exposure)
FR3 Drop redundant gitEnabled: false from migrated tests symlink-alias.test.ts:46, provider-pool-reconnect.test.ts:74
FR4 Single projectRoot binding sealing 7 PROJECT_ROOT couplings packages/app/src/server/hocuspocus-plugin.ts:121 (the binding) + 7 threaded sites
FR5 Fail-fast on ALL shadow-init errors under isolation (D13) packages/app/src/server/dev-shadow-init.ts (broadened handleDevShadowInitError) + dev-shadow-init.test.ts (3 new tests)
FR6 Migrate persistence-fan-out.test.ts (app integration tier) to harness opt-in packages/app/tests/integration/persistence-fan-out.test.ts (delete hand-fork; clearContributors→swapContributors; -130 net lines of duplicated boot code)
FR7 Auto-wire swapContributors() lifecycle packages/app/tests/integration/test-harness.ts (setup + cleanup; serial-execution constraint documented as STOP rule)
FR8 T1-T3 + T6 acceptance tests (all Must-tier) packages/app/tests/integration/shadow-harness-{agent-write,save-version,external-write,history-roundtrip}.test.ts (4 new files)
FR1 (post-QA) Race gate so Playwright agent writes deterministically produce shadow refs packages/server/src/api-extension.ts (shadowReadyPromise option + 3 await sites) + packages/app/tests/stress/_qa-shadow-default-on.e2e.ts (regression guard)

Plus:

  • Server package re-exportsswapContributors + destroyShadowRepo added to packages/server/src/index.ts (US-001 substrate).
  • requireShadowDir helper at test-harness.ts:506 for typed-boundary error messages when callers forget withShadow: true (with explicit unit test coverage in test-harness-helpers.test.ts after review-cloud pass 1).

Verification

QA verdict: GO — full plan at tmp/ship/qa-progress.json (gitignored, generated locally).

Story Tested Gaps Fidelity
US-001: server exports 1/1 ✅ shell
US-002: projectRoot binding (FR4) 2/2 ✅ shell
US-003: dev-plugin shadow + fail-fast (FR1, FR5) 2/2 ✅ shell
US-004: harness withShadow + lifecycle 5/5 ✅ shell
US-005: migrated persistence-fan-out 1/1 ✅ shell
US-007: T1-T3+T6 acceptance 4/4 ✅ shell
US-008: corrigendum 1/1 ✅ shell
FR1 e2e (post-QA, this commit) 1/1 ✅ browser (Playwright + simpleGit assertion)
(integration baseline) 1/1 ✅ shell
(perf baseline) 1/1 ✅ shell

Highlights:

  • FR1 end-to-end via real Playwright worker_qa-shadow-default-on.e2e.ts passes in ~4s. Asserts shadow exists at <contentDir>/.git/open-knowledge/, agent write produces refs/wip/main/agent-<id> with wip: subject and ok-actor: body within ~5s, no artificial delays. Without the post-QA fix, this test fails deterministically (verified during /debug).
  • FR4 invariant proof — full bun run check; pre-suite vs post-suite tag count: 0→0; HEAD: unchanged; git status --porcelain: empty; .open-knowledge/cache/ not created in dev's repo.
  • All 4 FR8 acceptance tests passshadow-harness-{agent-write,save-version,external-write,history-roundtrip}.test.ts green (8 pass / 0 fail combined with the migrated persistence-fan-out.test.ts).
  • D13 broadened fail-fastdev-shadow-init.test.ts 11/11 including new tests for plain Error + non-Error thrown values under isTestIsolated: true.
  • Cross-worker isolation — verified via docs-open.e2e.ts --workers=4 (19/20 pass; 1 unrelated pre-existing flake on F10 source-editor warm-swap, confirmed reproducible on pre-fix git stash state).

Pre-merge checklist for human reviewer:

  • Production dev-server smoke (manual): cd to your actual OK repo (not this worktree), run bun run dev, perform a few agent writes, trigger Save Version. Confirm shadow at PROJECT_ROOT/.git/open-knowledge/, ok/v<N> tag in PROJECT_ROOT/.git/, behavior identical to pre-PR.
  • Read-through of AGENTS.md updates: 7b5af64c documents the new withShadow option, dev-mode shadow default-on, and the PROJECT_ROOT STOP rule. Verify the section reads clearly enough that the next test author can write a shadow-asserting test in <10 min from docs alone.

Out of scope (per SPEC §16)

  • packages/server/src/persistence-fan-out.test.ts (server-tier sibling — deliberately independent of the app integration harness)
  • CRDT observer code, shadow-lock / server-lock primitives, production CLI behavior (no behavior changes)
  • PR Editor asset + embed surface (streaming uploads) #270's "unify dev-plugin + createServer" refactor (D1 — separate future spec; merge-order protocol documented in §13)
  • T4 (rollback E2E) + T5 (TimelinePanel UI render) — downstream feature tests (NG6); harness now supports them
  • Production CLI race in bootServer/standalone.ts (analogous to the dev plugin race fixed by b9f1fb28) — future spec; bounded customer impact (production users navigate UI before writing) and shadowReadyPromise is now a one-line addition at bootServer when needed.

Future considerations

Surfaced opportunities (from Phase 4 docs subprocess):

  • AGENTS.md / CLAUDE.md duplication — the file has two near-identical halves with drift on precedent counts (25 vs 27), V0-14 status, and a few label names. This PR's docs additions correctly mirror into both halves (verified byte-identical). The underlying duplication predates this PR and warrants a separate docs-housekeeping pass.
  • Child claude -p subprocess MCP visibility — nested subprocesses (Phase 4 docs, Phase 6 qa-plan) do not inherit parent-session MCP server config (no mcp__open-knowledge__* tools available in the child). Used the documented escape hatch ("Open Knowledge MCP unavailable") per CLAUDE.md.

Deferred (in-scope follow-up, not this PR):

  • Per-server-instance contributor-tracker state — long-term fix for the pendingContributors module-level state. Today's STOP rule (no test.concurrent() in withShadow:true files) is the convention enforcing serial execution.
  • Expose persistence.flushPendingGitCommit() on ServerInstance — would enable T2/T6 acceptance tests to drain L2 deterministically without polling. Out of spec scope (touches packages/server/src/standalone.ts per SPEC §16 EXCLUDE) but a single-line addition next time ServerInstance's shape opens.
  • handleDevShadowInitError category-extraction (cloud reviewer Consider, declined) — speculative refactor for hypothetical future error types; the 3-branch if/else is the right shape for 3 cases.
  • Production CLI race fix at bootServer — same shape as the dev plugin fix in b9f1fb28; separate spec when prioritized.

Review-cloud iteration history

Pass 1 (commit 120ccc13):

  • ✅ Trimmed changeset to 2 actually-changed packages (reviewer flagged 1 of 3 unchanged)
  • ✅ Added test-harness-helpers.test.ts with 3 tests covering requireShadowDir error path
  • ✅ Added point-in-time disclaimer to evidence/projectdir-couplings.md
  • ❌ Declined extract-error-categorization Consider (per reviewer's own "no change needed" rationale)

Pass 2 (cloud review of 120ccc13): ✅ APPROVE with 0 issues.

Pass 3 fix (commit b9f1fb28, post-QA validation):

  • ✅ Discovered FR1 race via /debug while writing end-to-end Playwright validation test
  • ✅ Fixed via shadowReadyPromise gate in api-extension (3 sites: flushDocToGit + handleSaveVersion + handleRollback)
  • ✅ Permanent regression-guard test in _qa-shadow-default-on.e2e.ts

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Apr 23, 2026

🦋 Changeset detected

Latest commit: 4c94ed7

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 5 packages
Name Type
@inkeep/open-knowledge-server Patch
@inkeep/open-knowledge-app Patch
@inkeep/open-knowledge Patch
@inkeep/open-knowledge-desktop Patch
@inkeep/open-knowledge-core Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link
Copy Markdown

@inkeep-internal-ci inkeep-internal-ci Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(1) Total Issues | Risk: Low

🟡 Minor (1) 🟡

Inline Comments:

  • 🟡 Minor: .changeset/per-worker-shadow-repo-test-harness.md:7 Changeset includes @inkeep/open-knowledge-core: patch but that package has no changes

💭 Consider (2) 💭

💭 1) test-harness.ts Add explicit test for requireShadowDir error path

Issue: The requireShadowDir() helper throws a descriptive error when shadowDir is undefined, but there's no explicit test for this path.

Why: While the error message is good and the helper is simple, an explicit test would document the expected behavior and catch regressions if the message changes.

Fix: Consider adding a small test case in the harness tests:

test('requireShadowDir throws descriptive error when shadowDir undefined', () => {
  const server = { shadowDir: undefined } as TestServer;
  expect(() => requireShadowDir(server)).toThrow(/withShadow: true/);
});

Refs:

💭 2) dev-shadow-init.ts Consider extracting error categorization for future extensibility

Issue: The handleDevShadowInitError function uses an if/else if/else chain that's correct and clear, but as more error types need special handling, this could grow.

Why: The current structure is appropriate for 2-3 branches but the pattern could be extended to a category-based approach if needed in the future.

Fix: No change needed now — the current implementation is clean and the if/else structure correctly expresses the mutual exclusion. Just noting this as a potential evolution path if more error types emerge.

Refs:

🧹 While You're Here (1) 🧹

🧹 1) evidence/projectdir-couplings.md Line numbers may have drifted

Issue: The evidence file documents specific line numbers for projectRoot consumers, but these may drift as the codebase evolves.

Why: Stale line numbers in documentation can mislead future readers investigating the STOP rule.

Fix: Consider adding a note that line numbers are point-in-time snapshots from the PR, or updating them if they've already drifted since the evidence was captured.

Refs:


💡 APPROVE WITH SUGGESTIONS

Summary: This is a well-structured, carefully designed internal test infrastructure change. The single projectRoot binding pattern (D12) correctly seals the PROJECT_ROOT leak surface, the fail-fast behavior under test isolation (D13) is properly implemented with clear mutual exclusion in the error handling, and the withShadow harness opt-in is thoughtfully designed with tier-appropriate defaults. The acceptance tests (T1-T3, T6) provide strong coverage of the new capabilities.

The main actionable item is the changeset incorrectly including @inkeep/open-knowledge-core: patch when that package has no changes — removing it will avoid a spurious version bump.

The test patterns are solid: try/finally lifecycle, AggregateError for multi-phase cleanup failures, and the STOP rule documentation for test.concurrent() with withShadow: true is well-placed. Nice work on the comprehensive spec and evidence artifacts.

Discarded (3)
Location Issue Reason Discarded
test-harness.ts L2 drain via destroy() is implicit Working correctly — destroy() is documented to flush L2 and the pattern is consistent across all acceptance tests
AGENTS.md Two halves have drift in precedent counts Pre-existing issue outside this PR's scope — the PR correctly mirrors new content in both halves
server-observers.ts No changes needed Standards reviewer found no issues meeting confidence threshold
Reviewers (3)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-tests 5 0 2 0 0 0 3
pr-review-standards 0 0 0 0 0 0 0
pr-review-devops 3 0 0 1 1 0 1
Total 8 0 2 1 1 0 4

Note: Standards reviewer found no issues meeting confidence threshold — this is a clean implementation.

Comment thread .changeset/per-worker-shadow-repo-test-harness.md
@github-actions github-actions Bot deleted a comment from inkeep-internal-ci Bot Apr 23, 2026
nick-inkeep added a commit that referenced this pull request Apr 23, 2026
- F1 (Minor): trim changeset — drop @inkeep/open-knowledge-core,
  @inkeep/open-knowledge, @inkeep/open-knowledge-desktop. Reviewer
  flagged only `core`; verification (`git diff main...HEAD --stat`)
  showed `cli` and `desktop` are also unchanged. The only packages with
  source changes are `server` (index.ts re-exports) and `app`
  (dev-shadow-init + hocuspocus-plugin). Avoids 3 spurious version bumps,
  not 1.

- F2 (Consider): add explicit test for requireShadowDir error path at
  packages/app/tests/integration/test-harness-helpers.test.ts (3 tests:
  withShadow:true mention, [test-harness] namespace, happy-path return).
  Locks in the descriptive-error contract that 9 call sites depend on.

- F4 (While You're Here): add point-in-time disclaimer to
  evidence/projectdir-couplings.md noting that line numbers will drift
  and the projectRoot binding identity is the durable contract. Verified
  current line numbers (L123, L136, L175, L207, L213, L220, L225) still
  match the code; the disclaimer is preventative for future maintainers.

- F3 (Consider — extract error categorization for future extensibility):
  declined per reviewer's own resolution ("No change needed now"); the
  3-branch if/else correctly expresses mutual exclusion and a category
  abstraction would be speculative refactoring for hypothetical future
  error types.

bun run check: 216 integration pass / 0 fail; 3 new tests added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nick-inkeep
Copy link
Copy Markdown
Contributor Author

Review pass 1 — assessment of review-body suggestions

Inline thread on .changeset/... resolved separately (see thread). Top-level summary for the 3 review-body items:

💭 Consider 1 — Add explicit test for requireShadowDir error path → ACCEPTED

Implemented in commit 120ccc13 as packages/app/tests/integration/test-harness-helpers.test.ts (3 tests):

  1. withShadow: true mention in error message (matches your suggested regex)
  2. [test-harness] namespace prefix in error message
  3. Happy-path return of the populated shadowDir

The contract is now locked in for the 9 call sites that depend on the descriptive error.

💭 Consider 2 — Extract error categorization for future extensibility → DECLINED (per your own resolution)

Your own rationale: "No change needed now — the current implementation is clean and the if/else structure correctly expresses the mutual exclusion." Agreed. Speculative refactor for hypothetical future error types is over-engineering. The 3-branch if/else if/else is the right shape for 3 cases. Greenfield posture says no deferred tech debt; this isn't tech debt, it's an evolution path that may never be needed.

🧹 While You're Here — Line numbers may have drifted in evidence/projectdir-couplings.mdPARTIALLY ACCEPTED

Verified via sed -n '<line>p' that the evidence file's claimed lines (L123, L136, L175, L207, L213, L220, L225) all match the current hocuspocus-plugin.ts byte-for-byte — there is no drift in this PR's HEAD.

However, your underlying concern (drift over future code evolution) is valid. Added a one-line disclaimer in commit 120ccc13:

Line numbers are point-in-time snapshots from this PR's HEAD ... the projectRoot binding identity (and the AGENTS.md STOP rule that names it) is the durable contract, not these line numbers. To re-verify the enumeration after future changes, run grep -nE 'projectRoot|PROJECT_ROOT' packages/app/src/server/hocuspocus-plugin.ts and reconcile.

This makes the durable contract (projectRoot binding identity + AGENTS.md STOP rule) explicit and gives future maintainers a verification one-liner.


bun run check green: 216 integration pass / 0 fail (3 new tests added).

Copy link
Copy Markdown

@inkeep-internal-ci inkeep-internal-ci Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low

This PR adds per-worker shadow-repo support for Playwright workers and a withShadow: true opt-in for the Tier 1 integration test harness. The implementation is thorough, well-documented, and follows established codebase conventions.

Key Observations

D12 Single projectRoot Binding
The single module-level binding pattern at hocuspocus-plugin.ts:121 correctly seals all 7 PROJECT_ROOT coupling sites. The STOP rule in AGENTS.md clearly documents this constraint with references to the evidence file.

D13 Broadened Fail-Fast
The handleDevShadowInitError if/else if/else chain correctly implements mutual exclusion:

  • ProjectGitInitErrorexit(1) (production AND isolation)
  • Any error + isTestIsolated: trueexit(1)
  • Non-ProjectGitInitError + production → degraded warn (continue)

Test coverage is comprehensive with 7 test cases covering all branches.

Test Harness withShadow Implementation

  • Proper lifecycle management with per-phase try/catch
  • swapContributors() correctly wired in setup and cleanup
  • destroyShadowRepo() called before rmSync() (correct ordering)
  • Error aggregation via AggregateError preserves debugging context
  • requireShadowDir() helper provides actionable error messages

Acceptance Tests (FR8)
All four acceptance tests (T1, T2, T3, T6) are well-structured:

  • Use plain test() (not test.concurrent()) per the STOP rule
  • Use pollUntil for event-driven waits (not fixed sleeps)
  • Cover agent write, save-version, external write, and history round-trip paths

🕐 Pending Recommendations (1)

  • 🟡 .changeset:7 Changeset included unchanged @inkeep/open-knowledge-core package — addressed in commit 120ccc13

✅ APPROVE

Summary: Clean implementation of per-worker shadow-repo test infrastructure. The single prior review finding (changeset including unchanged packages) has been addressed. The code follows established patterns (DI for testability, bracket-prefixed logging, STOP rule documentation), has comprehensive test coverage, and the AGENTS.md documentation accurately reflects the implementation. Ship it! 🚀

Discarded (0)

No findings discarded.

Reviewers (5)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 0 0 0 0 0 0 0
pr-review-tests 5 0 0 0 0 0 5
pr-review-errors 0 0 0 0 0 0 0
pr-review-devops 0 0 0 0 0 0 0
pr-review-consistency 0 0 0 0 0 0 0
Total 5 0 0 0 0 0 5

Note: pr-review-tests returned 5 observations (all MINOR) that were discarded as they were explicit "no fix needed" observations documenting good patterns rather than issues.

@github-actions github-actions Bot deleted a comment from inkeep-internal-ci Bot Apr 23, 2026
nick-inkeep added a commit that referenced this pull request Apr 23, 2026
…omise

Closes an FR1 race condition surfaced by /debug in PR #292 QA:
agent-write / save-version / rollback requests that arrived within the
~100-200 ms `runDevShadowInit` async window silently lost their L2
shadow commit because `commitToWipRef` short-circuited on
`shadowRef.current === undefined` — a branch intended for the "shadow
disabled" degraded case (D13), not "init in flight."

Customer impact was bounded (human editor users take seconds to write,
past the race window), but the race was load-bearing for the spec's FR1
claim that Playwright tests can drive shadow-asserting flows via the
dev plugin. Reproducer at `_qa-shadow-default-on.e2e.ts` fails
deterministically without this fix and passes within ~4s with it.

Changes:
- `packages/app/src/server/hocuspocus-plugin.ts`: capture the
  `runDevShadowInit` promise (previously `void`-discarded) as
  `shadowReadyPromise`, pass it into `createApiExtension({ ... })`.

- `packages/server/src/api-extension.ts`:
  - Add optional `shadowReadyPromise?: Promise<void>` to
    `ApiExtensionOptions`. Non-breaking; callers that init shadow
    synchronously (Tier 1 `createTestServer`, CLI `createServer` via
    `bootServer`) omit it and behavior is unchanged.
  - Gate `flushDocToGit` (covers all 4 agent-write handlers) on
    `Promise.resolve(shadowReadyPromise).then(...)` — L1+L2 chain only
    fires after shadow init settles.
  - Add `if (shadowReadyPromise) await shadowReadyPromise;` at the top
    of `handleSaveVersion` and `handleRollback`, before the existing
    `shadowRef?.current` read. The `if (!shadow)` degraded-mode guard
    remains as the genuine "shadow disabled" fallback.

- `packages/app/tests/stress/_qa-shadow-default-on.e2e.ts`: new
  permanent regression test. Uses `workerServer` fixture to verify (1)
  shadow exists at `<contentDir>/.git/open-knowledge/`, (2) agent
  write produces `refs/wip/main/agent-<id>` within ~5s without
  artificial delay, (3) commit subject matches `wip:`, (4) `ok-actor:`
  JSON body line present.

Spec §16 note: `packages/server/src/api-extension.ts` is not
explicitly IN-scope for SPEC 2026-04-22, but the addition is an
optional, non-breaking field on an existing options interface and is
required for FR1 to be Playwright-verifiable per the spec's own claim.
Defensible on the merits.

Verification:
- `_qa-shadow-default-on.e2e.ts` passes in ~4s (previously failed
  deterministically without pre-write probe delay).
- `bun run check` green: 216 integration pass / 0 fail / 43 files.
- `docs-open.e2e.ts --workers=4`: 19/20 pass; the 1 failure (F10
  source-editor warm-swap) was verified pre-existing — reproducible
  on pre-fix state via `git stash` + retry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@inkeep-internal-ci inkeep-internal-ci Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low

This is a re-review following commit b9f1fb28 which was pushed after the prior approval. The new commit addresses a race condition where shadow operations (flushDocToGit, handleSaveVersion, handleRollback) could execute before runDevShadowInit completed.

Review of New Commit b9f1fb28

Race Condition Fix

The fix introduces shadowReadyPromise — the promise returned by runDevShadowInit() — and gates all shadow-dependent operations on it:

  • hocuspocus-plugin.ts: Captures the init promise and passes it to createApiExtension
  • api-extension.ts: Gates flushDocToGit and related operations via Promise.resolve(shadowReadyPromise).then(...)

Design Validation

The "promise never rejects" claim is accurate:

  • runDevShadowInit wraps initShadowRepo in a try/catch
  • All errors route through handleDevShadowInitError which either exits (test isolation) or logs + continues (production degraded mode)
  • The outer .catch() in api-extension.ts is defense-in-depth for unexpected edge cases

Test Coverage

_qa-shadow-default-on.e2e.ts provides regression coverage:

  • Validates shadow repo exists immediately after boot
  • Drives an agent write with no artificial pre-write delay (the prior race window)
  • Asserts the WIP ref appears, proving the shadowReadyPromise gate works

Prior Review Items

All feedback from the prior review has been addressed:

  • ✅ Changeset trimmed to only changed packages (@inkeep/open-knowledge-server, @inkeep/open-knowledge-app)
  • requireShadowDir error-path tests added in test-harness-helpers.test.ts

✅ APPROVE

Summary: The new commit cleanly fixes a legitimate race condition with sound async gating. The "promise never rejects" design is verified accurate, and the E2E test provides adequate regression coverage. All prior feedback addressed. Ship it! 🚀

Reviewers (3)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 0 0 0 0 0 0 0
pr-review-errors 0 0 0 0 0 0 0
pr-review-tests 0 0 0 0 0 0 0
Total 0 0 0 0 0 0 0

Note: Focused review on new commit b9f1fb28 since prior approval already covered the main PR content.

@github-actions github-actions Bot deleted a comment from inkeep-internal-ci Bot Apr 23, 2026
nick-inkeep and others added 14 commits April 23, 2026 12:26
…x (FR6, FR7)

Re-exports the preferred atomic-drain contributor-tracker API and the
shadow-repo cleanup primitive from @inkeep/open-knowledge-server so the
Tier 1 integration harness (US-004) and new acceptance tests (US-007) can
wire shadow lifecycle without reaching into package internals.

clearContributors stays exported — removal is out of scope; US-005 is the
deliberate migration handoff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…D12)

Collapses all PROJECT_ROOT-derived call sites in the Vite dev plugin to one
module-level binding: `const projectRoot = isTestIsolated ? CONTENT_DIR :
PROJECT_ROOT`. Threaded through 8 sites — acquireServerLock.worktreeRoot,
runDevShadowInit arg, contentFilter.projectDir, BacklinkIndex.projectDir,
persistence projectDir + getCurrentBranch, api-extension projectDir +
getCurrentBranch, server-observer getCurrentBranch, and CONTENT_ROOT's
`relative(..., CONTENT_DIR)` call.

Production path (no OK_TEST_CONTENT_DIR) is byte-for-byte identical to
today. Under isolation, `/api/save-version` now lands ok/v<N> tags in the
tmpdir's .git/ rather than the developer's actual OK repo (closes the L250
leak that pre-audit enumeration missed — see R1 in SPEC §11a).

isTestIsolated declaration hoisted up with its comment to make the binding
visible at CONTENT_ROOT and acquireServerLock call sites. resolveContentConfig
keeps reading PROJECT_ROOT by design — the .open-knowledge/config.yml lives
at the repo root regardless of isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (FR1, FR5, D13)

Drops the `if (!isTestIsolated)` guard around runDevShadowInit so every
Playwright worker gets a per-tmpdir shadow at <tmpdir>/.git/open-knowledge/.
Persistence's gitEnabled flag flips from `!isTestIsolated` to `true`
unconditionally — commit paths short-circuit on shadowRef?.current so the
unattached-shadow window during async init is safe.

Broadens handleDevShadowInitError to fail-fast on ALL shadow-init errors
under isTestIsolated (D13), not just ProjectGitInitError (R6). Production
path is unchanged: ProjectGitInitError still exits, other errors still
degrade-warn. Under isolation, every throw exits so silent shadow gaps
cannot hide coverage regressions.

Refactors runDevShadowInit's trailing args (io, deps) into a single
options object that also carries isTestIsolated. Tests updated
accordingly; added two new cases for the D13 fail-fast path (plain Error
and non-Error thrown values under isTestIsolated: true).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cycle (FR2, FR7, D2, D10)

Adds `withShadow?: boolean` (default false) to `CreateTestServerOptions`.
When true, the harness:
- initializes a real shadow repo at <contentDir>/.git/open-knowledge/
  before `createServer` boots
- passes `gitEnabled: true` + `shadowRepo: shadow` through to the server
- exposes `shadowDir: shadow?.gitDir` on the returned `TestServer`
- drains module-state contributors via `swapContributors()` in both setup
  and cleanup so cross-test bleed is impossible under test.concurrent()

Cleanup order is load-bearing: swapContributors() + destroyShadowRepo(shadow)
run BEFORE rmSync(contentDir) so the shadow writer-lock release touches
filesystem state that rm is about to delete, not vice-versa.

Default-off preserves the 37/38 shadow-orthogonal Tier 1 tests without
modification (verified via full `bun run check` — 209 pass / 2 skip / 0 fail).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…adow: true }) (FR6, D7)

Deletes the hand-forked initShadowRepo + raw createServer + manual
contributor-tracker cleanup pattern. All four tests now use
`createTestServer({ withShadow: true, debounce: 60_000, maxDebounce: 60_000 })`
and consume `server.shadowDir` for simpleGit assertions. clearContributors
→ swapContributors (the deprecated surface → preferred atomic-drain API).

Harness fix: pass `contentRoot: '.'` when `withShadow: true`. The
persistence extension's default fallback (`relative(projectDir, contentDir)
|| 'content'`) resolves to literal 'content' when projectDir === contentDir
(harness layout), which makes buildWipTree fail `git add content` because
no such subdir exists. Overriding to '.' matches buildWipTree's own
`contentRoot || '.'` default and lets `git add .` include the doc files
correctly.

Nested-dir coverage (separate projectDir + contentDir/'content' layout)
stays at the server-tier sibling packages/server/src/persistence-fan-out.test.ts
per SPEC §11 S1 — the integration tier exercises the flat harness layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Under US-004's `withShadow: false` default, shadow is off unless the test
opts in — so an explicit `gitEnabled: false` line inside `createServer` /
`createPersistenceExtension` at the test site is now redundant. Deleting
the lines from symlink-alias.test.ts (L44) and provider-pool-reconnect.test.ts
(L74) keeps one less "reason to reason about" on a test that's orthogonal
to shadow behavior.

Safety net: both tests have `shadowRef?.current` as undefined (no shadow
handle is constructed), so `commitToWipRef` short-circuits at the ref
check at persistence.ts:263 regardless of gitEnabled. Behavior unchanged;
tests pass.

boot.test.ts and keepalive-presence-cleanup.test.ts live in packages/server/src/
not packages/app/tests/integration/, so they were already out of scope per
SPEC §16. D11's spirit (leave those alone) is honored automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four Must-tier acceptance tests proving the createTestServer({ withShadow:
true }) harness covers the three primary write surfaces + the primary
read surface end-to-end.

- T1 (shadow-harness-agent-write.test.ts): POST /api/agent-write-md →
  refs/wip/main/agent-<connId> with wip: subject + ok-actor: body line.
- T2 (shadow-harness-save-version.test.ts): POST /api/save-version →
  refs/checkpoints/main/<sha> in shadow AND ok/v1 tag in the TEST's
  tmpdir .git/ (NOT the developer's actual OK repo — FR4 isolation
  invariant). Uses child_process.execFileSync for parent-git tag reads
  so the app package doesn't need to pull simple-git as a devDep.
- T3 (shadow-harness-external-write.test.ts): applyExternalChange →
  refs/wip/main/file-system with reconcile: subject (D53 taxonomy).
- T6 (shadow-harness-history-roundtrip.test.ts): agent-write then
  GET /api/history returns an entry with the expected commit SHA.
  Regression guard for the L245 getCurrentBranch leak D12 sealed.

Drain pattern: `server.instance.destroy()` triggers Phase 3 + Phase 4
L1/L2 flush before assertions. For tests that need to query shadow
state mid-lifecycle (T2, T6), `hocuspocus.flushPendingStores()` + 200-
300ms wait drains L1 without tearing down the server.

3× deterministic pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the second-occurrence corrigendum breadcrumb at SPEC 2026-04-21-shadow-
repo-single-mode's Risk Register row for Q3 (line 274) per CLAUDE.md
"apply to every occurrence" rule. The short-form breadcrumb points back to
the full corrigendum at §Q3 row 227 and forward to this spec's authoritative
fix.

First-occurrence breadcrumb at §Q3 (line 227) was pre-populated during the
spec-drafting phase and carried in the worktree baseline — this commit
captures it alongside the second-occurrence addition.

Also commits the spec dir (SPEC.md + meta/_changelog.md + evidence/ +
meta/audit-findings.md + meta/design-challenge.md) which was untracked in
the worktree baseline. evidence/projectdir-couplings.md is already current
post-D12 (7 sites enumerated) — no content update needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e-binding STOP rule

AGENTS.md updates for the per-worker shadow-repo test-harness spec
(specs/2026-04-22-per-worker-shadow-repo-test-harness/):

- Tier 1 integration harness: document `createTestServer({ withShadow: true })`
  opt-in, `shadowDir` exposure, auto-wired swapContributors + destroyShadowRepo
  lifecycle, cost trade-off, and FR8 reference tests (shadow-harness-*).
- Dev mode: document per-worker shadow default-on under OK_TEST_CONTENT_DIR
  isolation (FR1) + fail-fast on all shadow-init errors (FR5 / D13) + the
  single-binding projectRoot pattern (D12).
- Known Pitfalls STOP rule: forbid new PROJECT_ROOT-derived bindings in
  hocuspocus-plugin.ts outside the single module-level projectRoot binding
  (SPEC §16 ASK_FIRST / D12).

SPEC 2026-04-21 §Q3 corrigendum already applied by US-008 (b50a960).
Edits mirrored across both duplicate halves of AGENTS.md (pre-existing
file drift, out of scope to reconcile in this phase).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- F1 (Minor): trim changeset — drop @inkeep/open-knowledge-core,
  @inkeep/open-knowledge, @inkeep/open-knowledge-desktop. Reviewer
  flagged only `core`; verification (`git diff main...HEAD --stat`)
  showed `cli` and `desktop` are also unchanged. The only packages with
  source changes are `server` (index.ts re-exports) and `app`
  (dev-shadow-init + hocuspocus-plugin). Avoids 3 spurious version bumps,
  not 1.

- F2 (Consider): add explicit test for requireShadowDir error path at
  packages/app/tests/integration/test-harness-helpers.test.ts (3 tests:
  withShadow:true mention, [test-harness] namespace, happy-path return).
  Locks in the descriptive-error contract that 9 call sites depend on.

- F4 (While You're Here): add point-in-time disclaimer to
  evidence/projectdir-couplings.md noting that line numbers will drift
  and the projectRoot binding identity is the durable contract. Verified
  current line numbers (L123, L136, L175, L207, L213, L220, L225) still
  match the code; the disclaimer is preventative for future maintainers.

- F3 (Consider — extract error categorization for future extensibility):
  declined per reviewer's own resolution ("No change needed now"); the
  3-branch if/else correctly expresses mutual exclusion and a category
  abstraction would be speculative refactoring for hypothetical future
  error types.

bun run check: 216 integration pass / 0 fail; 3 new tests added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…omise

Closes an FR1 race condition surfaced by /debug in PR #292 QA:
agent-write / save-version / rollback requests that arrived within the
~100-200 ms `runDevShadowInit` async window silently lost their L2
shadow commit because `commitToWipRef` short-circuited on
`shadowRef.current === undefined` — a branch intended for the "shadow
disabled" degraded case (D13), not "init in flight."

Customer impact was bounded (human editor users take seconds to write,
past the race window), but the race was load-bearing for the spec's FR1
claim that Playwright tests can drive shadow-asserting flows via the
dev plugin. Reproducer at `_qa-shadow-default-on.e2e.ts` fails
deterministically without this fix and passes within ~4s with it.

Changes:
- `packages/app/src/server/hocuspocus-plugin.ts`: capture the
  `runDevShadowInit` promise (previously `void`-discarded) as
  `shadowReadyPromise`, pass it into `createApiExtension({ ... })`.

- `packages/server/src/api-extension.ts`:
  - Add optional `shadowReadyPromise?: Promise<void>` to
    `ApiExtensionOptions`. Non-breaking; callers that init shadow
    synchronously (Tier 1 `createTestServer`, CLI `createServer` via
    `bootServer`) omit it and behavior is unchanged.
  - Gate `flushDocToGit` (covers all 4 agent-write handlers) on
    `Promise.resolve(shadowReadyPromise).then(...)` — L1+L2 chain only
    fires after shadow init settles.
  - Add `if (shadowReadyPromise) await shadowReadyPromise;` at the top
    of `handleSaveVersion` and `handleRollback`, before the existing
    `shadowRef?.current` read. The `if (!shadow)` degraded-mode guard
    remains as the genuine "shadow disabled" fallback.

- `packages/app/tests/stress/_qa-shadow-default-on.e2e.ts`: new
  permanent regression test. Uses `workerServer` fixture to verify (1)
  shadow exists at `<contentDir>/.git/open-knowledge/`, (2) agent
  write produces `refs/wip/main/agent-<id>` within ~5s without
  artificial delay, (3) commit subject matches `wip:`, (4) `ok-actor:`
  JSON body line present.

Spec §16 note: `packages/server/src/api-extension.ts` is not
explicitly IN-scope for SPEC 2026-04-22, but the addition is an
optional, non-breaking field on an existing options interface and is
required for FR1 to be Playwright-verifiable per the spec's own claim.
Defensible on the merits.

Verification:
- `_qa-shadow-default-on.e2e.ts` passes in ~4s (previously failed
  deterministically without pre-write probe delay).
- `bun run check` green: 216 integration pass / 0 fail / 43 files.
- `docs-open.e2e.ts --workers=4`: 19/20 pass; the 1 failure (F10
  source-editor warm-swap) was verified pre-existing — reproducible
  on pre-fix state via `git stash` + retry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nick-inkeep nick-inkeep force-pushed the worktree-spec-shadow-repo-test-harness branch from b9f1fb2 to 4c94ed7 Compare April 23, 2026 19:36
Copy link
Copy Markdown

@inkeep-internal-ci inkeep-internal-ci Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low

This is a re-review following the pull_request:synchronize event. The latest commit (4c94ed71) was already reviewed and APPROVED in the prior review cycle (2026-04-23T11:05:26Z). All prior feedback has been addressed.

Review Coverage

Reviewer Findings
pr-review-standards ✅ 0 issues — code quality, async gating pattern, and AGENTS.md STOP rule verified
pr-review-tests ✅ 0 blocking — E2E coverage is appropriate for the race condition fix
pr-review-errors ✅ 0 issues — fail-fast (D13) and degraded-warn paths correctly implemented
pr-review-devops ✅ 0 issues — changeset accurate, AGENTS.md updates complete

Key Validations

1. shadowReadyPromise Race Condition Fix

  • The promise gating is correctly applied to all 3 shadow-dependent paths (flushDocToGit, handleSaveVersion, handleRollback)
  • The "promise never rejects" claim is accurate — handleDevShadowInitError routes all errors through exit(1) or log+return, never throw
  • E2E regression test (_qa-shadow-default-on.e2e.ts) proves the fix end-to-end with no artificial delays

2. Single projectRoot Binding (D12)

  • All 9 consumer sites correctly thread through projectRoot instead of raw PROJECT_ROOT
  • AGENTS.md STOP rule clearly documents the constraint with evidence file reference

3. Fail-Fast Under Test Isolation (D13)

  • The 3-branch if/else if/else chain correctly implements mutual exclusion
  • 11 unit tests cover all branches including edge cases (non-Error thrown values)

4. Test Harness Cleanup Lifecycle

  • Per-phase try/catch with AggregateError preserves debugging context
  • Correct ordering: destroyShadowRepo() before rmSync()
  • swapContributors() drain in both setup and cleanup

💭 Consider (1) 💭

💭 1) api-extension.ts / _qa-shadow-default-on.e2e.ts Tier 1 test for shadowReadyPromise gate

Issue: The race condition fix is tested only via E2E (120s timeout), not Tier 1 integration.

Why: The race was discovered in the dev plugin's async init path (hocuspocus-plugin.ts), which isn't exercised by createTestServer (synchronous init). A Tier 1 test would need to inject a delayed shadowReadyPromise via test-only API surface that doesn't exist.

Resolution: The E2E test is appropriate coverage for this specific path. The test proves the full dev plugin → api-extension → shadow commit pipeline works with real Playwright isolation. Future work could expose a test hook, but this is not blocking.


✅ APPROVE

Summary: Excellent internal test infrastructure work. The single projectRoot binding seals a real leak vector (PROJECT_ROOT couplings that could write to the developer's checkout during local test runs), the shadowReadyPromise fix closes a legitimate race condition discovered during QA, and the test harness improvements enable shadow-repo coverage that was previously impossible. All prior feedback addressed. Ready to ship! 🚀

Reviewers (4)
Reviewer Returned Main Findings Consider Inline Comments Discarded
pr-review-standards 0 0 0 0 0
pr-review-tests 6 0 1 0 5
pr-review-errors 0 0 0 0 0
pr-review-devops 0 0 0 0 0
Total 6 0 1 0 5

Note: Test reviewer returned 6 observations (1 MAJOR downgraded to Consider after validation, 5 INFO/validated-good patterns). All other reviewers found no issues meeting confidence threshold.

@github-actions github-actions Bot deleted a comment from inkeep-internal-ci Bot Apr 23, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

This pull request has been automatically marked as stale because it has not had recent activity.
It will be closed in 7 days if no further activity occurs.

If this PR is still relevant:

  • Rebase it on the latest main branch
  • Add a comment explaining its current status
  • Request a review if it's ready

Thank you for your contributions!

@github-actions github-actions Bot added the stale label May 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant