fix(runner): quarantine undeletable worktree dirs (#96) by hadamrd · Pull Request #97 · hadamrd/forge-loop

hadamrd · 2026-05-28T08:50:01Z

Summary

Closes #96 (bug(runner): stale worktree from failed attempt blocks every retry → infinite worktree-create-failed loop).
When a prior worker planted files we can't chmod/rm, the dir survives _prep_worktree's cleanup, then git worktree add fails with "path already exists" → worktree-create-failed → infinite retry. This bit #84 (refactor(config): single pydantic-settings model replaces scattered config.py + os.environ.get sites) today: ~1h of ticks all failing with the identical error.
Fix: rename undeletable dir to wt-loop-<N>.stale-<ts> before worktree add. No sudo, no destruction — operator can inspect.
Applied to both _prep_worktree (fresh) and _prep_repair_worktree (repair); boot reaper sweeps .stale-* on next restart.
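
The quarantine step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `quarantine_if_undeletable` is hypothetical, and it assumes the worktree's parent directory is writable so that a same-directory rename succeeds even when the contents cannot be removed.

```python
import os
import shutil
import time
from pathlib import Path
from typing import Optional


def quarantine_if_undeletable(path: Path) -> Optional[Path]:
    """Try to delete `path`; if it survives, rename it out of the way.

    Returns the quarantine path, or None when the directory is gone.
    (Hypothetical helper; the real logic lives in _prep_worktree.)
    """
    # Best-effort cleanup: errors are swallowed, so we must check afterwards.
    shutil.rmtree(path, ignore_errors=True)
    if not path.exists():
        return None
    # Cleanup left the dir behind: move it aside instead of destroying it,
    # so the original path is free for `git worktree add`.
    stale = path.with_name(f"{path.name}.stale-{int(time.time())}")
    os.rename(path, stale)
    return stale
```

Renaming rather than escalating privileges keeps the planted state available for inspection, which matches the "no sudo, no destruction" intent stated above.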

Test plan

test_prep_worktree_quarantines_undeletable_dir — simulates PermissionError on rmtree, asserts quarantine dir created + worktree add still invoked
Existing test_prep_worktree_uses_configured_base_branch still passes
pytest tests/test_worker.py tests/test_init.py — 38 passed
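
For readers without the repo at hand, the shape of the quarantine test can be approximated like this. Everything here is an assumption for illustration: `prep_worktree_stand_in` is a hypothetical stand-in for _prep_worktree, and the git call is injected as a plain callable so the test can record it instead of running git.

```python
import os
import shutil
import time
from pathlib import Path
from typing import Callable, List
from unittest import mock


def prep_worktree_stand_in(path: Path, run_git: Callable[[List[str]], None]) -> None:
    """Hypothetical stand-in: clean up, quarantine leftovers, then add."""
    try:
        shutil.rmtree(path)
    except (PermissionError, FileNotFoundError):
        pass  # cleanup is best-effort; the exists() check below decides
    if path.exists():
        os.rename(path, path.with_name(f"{path.name}.stale-{int(time.time())}"))
    run_git(["git", "worktree", "add", str(path)])


def test_quarantines_undeletable_dir(tmp_path: Path) -> None:
    wt = tmp_path / "wt-loop-1"
    wt.mkdir()
    (wt / "marker").write_text("planted")
    calls: List[List[str]] = []
    # Simulate files we are not allowed to remove.
    with mock.patch("shutil.rmtree", side_effect=PermissionError("denied")):
        prep_worktree_stand_in(wt, calls.append)
    stale = list(tmp_path.glob("wt-loop-1.stale-*"))
    assert len(stale) == 1                                   # quarantine dir created
    assert (stale[0] / "marker").read_text() == "planted"    # nothing destroyed
    assert calls == [["git", "worktree", "add", str(wt)]]    # add still invoked
```

Under pytest, `tmp_path` is supplied by the built-in fixture of that name.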

…rktree-create-failed loop (#96)

When a prior worker planted files owned by a different uid (subprocess context mismatch), chmod+rmtree in _prep_worktree fails silently, leaves the dir behind, and the next git worktree add fails with "path already exists" → worktree-create-failed → infinite retry loop until operator manually sudo-cleans /tmp/wt-loop-<N>. Today this bit issue forge-loop#84: ~1h of ticks all failing the same way.

Fix: if cleanup leaves the dir behind, rename it to wt-loop-<N>.stale-<ts> so git worktree add proceeds. No sudo, no destruction (operator can inspect the planted state). Boot reaper sweeps .stale-* dirs on next restart. Applied to both _prep_worktree (fresh) and _prep_repair_worktree (repair).

Test: PermissionError on rmtree → quarantine dir exists, original marker preserved, worktree add still called.
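
The boot-time sweep mentioned in this commit message might look something like the sketch below. The function name `sweep_stale_worktrees` and the glob pattern are assumptions based on the wt-loop-<N>.stale-<ts> naming; the source does not say how the real reaper handles files it still cannot delete, so this version simply reports what it managed to remove.

```python
import shutil
from pathlib import Path
from typing import List


def sweep_stale_worktrees(base: Path) -> List[Path]:
    """Best-effort removal of quarantined worktree dirs under `base`.

    Hypothetical sketch: returns only the dirs that were actually removed.
    """
    removed: List[Path] = []
    for d in sorted(base.glob("wt-loop-*.stale-*")):
        if not d.is_dir():
            continue
        shutil.rmtree(d, ignore_errors=True)
        if not d.exists():
            removed.append(d)
    return removed
```

Live worktrees (names without a .stale- suffix) do not match the pattern and are left alone.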

… (#139)

Dogfood the manifestos system on forge-loop itself by writing the seed quality and testing manifestos that every future forge-loop change is gated against.

quality-manifesto.md codifies five rules drawn from this week's persistent-worker work: no shared module-level state (#100), typed Protocol+Fake at every I/O boundary (#104), single Settings source of truth (#98), typed events instead of untyped **fields (#99), and no subprocess.run for SDK-able services (#103, #105). Each rule names the concrete issue it came from so future contributors know the *why*.

testing-manifesto.md codifies six rules drawn from this week's iteration-probe bugs: one test per state-machine edge plus a fallthrough adversarial (would have caught #97/#120/#128), an adversarial test for the false case of every external-dep assumption, both ==0 and !=0 branches for every subprocess.returncode (specifically #128), a contract test pinning every Fake to its Real, hypothesis property tests on >4-branch / user-input functions (#102), and an adversarial test that every infinite-loop guard actually fires.

tests/test_manifestos_discovery.py is the meta-validation gate: it discovers and parses both files, asserts each rule has a rationale, asserts the spec-mandated issue references are present, and includes adversarial tests that stubs and missing files are detectable. 22 tests, all pass.

…loop) (#149)

Closes the feedback loop the CTO described: every bug we fix becomes a permanent gate. Today's PR #147 (critic SDK event-capture mismatch) exposed a 4-PR train of bugs with the same shape (#97, #120, #128, #147), all driven by string-literal discriminators that didn't match across module boundaries.

The critic (PR #141) reads the quality manifesto + flags sev1 violations. This rule + the critic infrastructure together mean the next worker that writes ``event["type"] == "result"`` (or similar cross-module string-comparison) gets the PR auto-blocked with the manifesto rationale.
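
To make the "string-literal discriminator" failure mode concrete, here is a small sketch of the enum-based alternative. The names `EventKind`, `parse_kind`, and `is_result` are hypothetical and are not taken from the forge-loop codebase (issue #148 proposes an SdkEventKind enum, whose actual members are not shown here).

```python
from enum import Enum


class EventKind(str, Enum):
    """Hypothetical event discriminator shared by producer and consumer."""
    RESULT = "result"
    ERROR = "error"


def parse_kind(raw: dict) -> EventKind:
    # Unknown or misspelled strings raise ValueError at the boundary,
    # instead of silently failing a comparison somewhere downstream.
    return EventKind(raw["type"])


def is_result(raw: dict) -> bool:
    return parse_kind(raw) is EventKind.RESULT
```

With a bare ``event["type"] == "result"`` check, a producer emitting "Result" would just evaluate to False; with the enum, the mismatch surfaces as an exception where the event is parsed.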

Adds the customer-facing documentation for the manifestos + brainstormer feature that closed the cosmetic-tickets gap. Real customers consuming this OSS need to know:

1. The four files they own (.forge/product-vision.md, axes.yaml, quality-manifesto.md, testing-manifesto.md).
2. The brainstormer dry-run + --apply workflow.
3. The feedback loop (`forge-loop manifesto suggest --from-pr <N>`) where every bug becomes a permanent gate.
4. What the worker + critic see (manifestos injected into briefs; sev1 violations block auto-merge).

README: new section "Manifestos & the brainstormer (axis-aligned tickets)" between Briefs and CLI reference. CLI reference table gains `brainstorm`, `brainstorm --apply`, `manifesto suggest --from-pr`.

GUIDE: new section 4 "Manifestos: drive what gets built (not just how)" between "discipline matters" and "the brief is your contract", with the real Titan brainstormer output as the worked example. Sections 5-10 renumbered accordingly.

Both docs cite PR #147 as the canonical feedback-loop example: a stringly-typed event-boundary bug that surfaced after #97/#120/#128 all had the same shape; the fix landed the manifesto rule that the critic now enforces.

hadamrd merged commit d3f6d97 into trunk May 28, 2026
2 checks passed

hadamrd deleted the cleanup/architecture-pass branch May 28, 2026 08:50

hadamrd mentioned this pull request May 28, 2026

refactor(sdk): SdkEventKind enum + typed discriminated SdkEvent — kill stringly-typed event boundaries #148

Open

5 tasks

hadamrd merged 1 commit into trunk from cleanup/architecture-pass
