feat(cc-247,cc-248): th_init --format=<preset> + --fail-fast harness options #142

Merged: screenleon merged 5 commits into main from cc-247-248-harness-options on May 23, 2026

@screenleon
Owner

Summary

PR-A of the CC-247/CC-248/CC-249 sequence (PR-0 docs landed in #141; PR-B for CC-249 follows after a /pre-impl spike).

Adds 5 named print-format presets + orthogonal --fail-fast boolean to scripts/lib/test-harness.sh, then migrates 5 harness consumers to drop their per-file pass/fail overrides. All consumer stdout byte-identical pre vs post.

API

th_init [--format=<preset>] [--fail-fast] [--filter <pat>] [--list]

Presets (closed enum; unknown value exits 1 with stderr listing all 5):

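| Preset               | PASS line          | PASS gating  | FAIL header        | FAIL detail         |
|----------------------|--------------------|--------------|--------------------|---------------------|
| colon-flat (default) | `PASS: %s\n`       | always       | `FAIL: %s: %s\n`   | inline              |
| colon-mixed          | `PASS: %s\n`       | always       | `FAIL %s\n` (2sp)  | `%s\n` (8sp)        |
| indent-1sp           | `PASS %s\n` (1sp)  | VERBOSE-only | `FAIL %s\n` (1sp)  | `%s\n` (8sp)        |
| indent-2sp           | `PASS %s\n` (2sp)  | always       | `FAIL %s\n` (2sp)  | `%s\n` (8sp)        |
| indent-2sp-quiet     | `PASS %s\n` (2sp)  | VERBOSE-only | `FAIL %s\n` (2sp)  | `%s\n` (no indent)  |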

--fail-fast: after fail() prints, calls th_summary (which exits 1 on non-zero FAIL).
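
To make the closed-enum behavior concrete, here is a minimal sketch of how `th_init` could validate `--format` and how `pass` could apply VERBOSE gating. It is not the code in scripts/lib/test-harness.sh: the variable names (`TH_FORMAT`, `TH_FAIL_FAST`), the wording of the error message, the exact leading whitespace in the format strings, and the rule that any non-empty `VERBOSE` enables output are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only; the real harness internals may differ.

TH_FORMAT="colon-flat"   # default preset, per the PR description
TH_FAIL_FAST=0           # assumed flag variable

th_init() {
  local arg name
  for arg in "$@"; do
    case "$arg" in
      --format=*)
        name="${arg#--format=}"
        case "$name" in
          colon-flat|colon-mixed|indent-1sp|indent-2sp|indent-2sp-quiet)
            TH_FORMAT="$name"
            ;;
          *)
            # Closed enum: reject anything else and list all 5 names on stderr.
            echo "unknown --format '$name'; valid: colon-flat colon-mixed indent-1sp indent-2sp indent-2sp-quiet" >&2
            exit 1
            ;;
        esac
        ;;
      --fail-fast) TH_FAIL_FAST=1 ;;
    esac
  done
}

pass() {
  case "$TH_FORMAT" in
    colon-flat|colon-mixed)
      printf 'PASS: %s\n' "$1"
      ;;
    indent-2sp)
      printf '  PASS  %s\n' "$1"
      ;;
    indent-1sp)
      # VERBOSE-only preset: stay silent unless VERBOSE is set (assumed: non-empty).
      if [ -n "${VERBOSE:-}" ]; then printf '  PASS %s\n' "$1"; fi
      ;;
    indent-2sp-quiet)
      if [ -n "${VERBOSE:-}" ]; then printf '  PASS  %s\n' "$1"; fi
      ;;
  esac
}
```

Under these assumptions, a consumer would call something like `th_init --format=indent-2sp --fail-fast` once at the top of the file and then use `pass` without defining its own override.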

Consumer migration (5 files)

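| File                          | Preset           | Fail-fast   | Override deleted |
|-------------------------------|------------------|-------------|------------------|
| scripts/test-commands.sh      | indent-1sp       |             | yes              |
| scripts/test-usage-weekly.sh  | indent-2sp       | --fail-fast | yes              |
| scripts/test-usage-tracker.sh | indent-2sp       | --fail-fast | yes              |
| scripts/test-hooks.sh         | indent-2sp-quiet |             | yes              |
| scripts/test-skill-refine.sh  | colon-mixed      | --fail-fast | yes              |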

scripts/test-run-all-tests.sh is NOT a harness consumer (uses its own pass_case/fail_case); only locally renamed to avoid name collision with harness symbols. Output format unchanged.

Test plan

  • bash scripts/test-test-harness.sh — 22/22 (was 11/11; +12 new preset/fail-fast cases + visibility fix to run_harness_probe)
  • 5 consumer suites pass with original counts (103/20/46/306/11)
  • bash scripts/test-run-all-tests.sh integration: 13/13
  • Golden parity per consumer (VERBOSE unset + VERBOSE=1): all format-bytes identical pre/post (only mktemp random suffix + timing measurements differ — non-deterministic noise)
  • grep -E '^(pass|fail)\(\)' scripts/test-*.sh → 0 matches (all overrides cleaned)
  • bash scripts/lint-scripts.sh — 52 OK
  • BACKLOG pm/scripts/validate.sh parity preserved at 30 (CC-228 baseline)
  • shellcheck --severity=style — not installed locally; CI will run
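
The golden-parity check above can be sketched as a small helper that normalizes the known noise and then compares two captured outputs. This is an illustration, not the procedure used in the PR (which was a git-stash round-trip diff): the function names and both `sed` patterns are assumptions about what the mktemp suffixes and timing values look like.

```bash
#!/usr/bin/env bash
# Hypothetical parity helper; patterns are assumptions about the noise.

normalize_output() {
  # Mask mktemp-style random suffixes and millisecond timings so that
  # only format bytes remain for comparison.
  sed -E \
    -e 's#tmp\.[A-Za-z0-9]+#tmp.XXXXXX#g' \
    -e 's/[0-9]+(\.[0-9]+)?ms/<time>/g'
}

golden_parity() {
  # $1 = file holding pre-migration stdout, $2 = post-migration stdout.
  # Returns 0 when the normalized outputs are identical.
  diff <(normalize_output < "$1") <(normalize_output < "$2") > /dev/null
}
```

Each consumer would be captured twice per side, once with VERBOSE unset and once with VERBOSE=1, and `golden_parity` run on each pair.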

PR-gate

/pr-gate standard tier: Final: GO — critic + qa-tester + architecture-reviewer all approve, only low-severity confirmations, zero blocks. (security-reviewer + risk-reviewer not in standard tier — no sensitive paths in diff.)

Notes for reviewers

  • BACKLOG pr: field for CC-247 + CC-248 will be flipped from TBD-PRA placeholder to actual PR# in a tail commit immediately after this PR opens (avoids the codex-doesn't-know-its-own-PR-number problem cleanly).
  • CC-249 (consolidate divergent assert_* helpers) stays deferred until the /pre-impl spike resolves the 3-way assert_contains signature divergence.
  • Initial codex dispatch hit the 30-min timeout in a self-verify debug loop after correctly catching one preset-implementation bug (the indent-1sp FAIL case was collapsed into the 2sp branch). The main thread fixed the one-line case-merge and added the missing pass_case call to run_harness_probe; full self_verify was clean after that.

🤖 Generated with Claude Code

screenleon and others added 5 commits May 23, 2026 21:25
…arness

Closed 5-preset enum + orthogonal fail-fast boolean. Default behavior
unchanged (colon-flat + collect-all).

Presets (one per pre-existing per-file override profile):
- colon-flat        — `PASS: %s\n` always / `FAIL: %s: %s\n` inline
- colon-mixed       — `PASS: %s\n` always / `  FAIL  ` 2sp + 8sp detail
- indent-1sp        — `  PASS ` 1sp VERBOSE-only / `  FAIL ` 1sp + 8sp detail
- indent-2sp        — `  PASS  ` 2sp always / `  FAIL  ` 2sp + 8sp detail
- indent-2sp-quiet  — `  PASS  ` 2sp VERBOSE-only / `  FAIL  ` 2sp + no-indent detail

Unknown --format value rejected at th_init parse-time with stderr listing
all 5 valid names. --fail-fast triggers th_summary (which exits 1 on
non-zero FAIL count) right after fail()'s print body.
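
The difference between collect-all and fail-fast can be sketched as follows. This is a simplified illustration under stated assumptions, not the harness source: the counter names, the summary line text, and the `run_demo` driver are hypothetical, and only the colon-flat print body is shown.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of fail-fast; names and summary text are assumed.

TH_PASS=0
TH_FAIL=0
TH_FAIL_FAST=0

th_summary() {
  printf '%d passed, %d failed\n' "$TH_PASS" "$TH_FAIL"
  # Exits 1 on a non-zero FAIL count, as the PR describes.
  if [ "$TH_FAIL" -gt 0 ]; then exit 1; fi
}

pass() {
  TH_PASS=$((TH_PASS + 1))
  printf 'PASS: %s\n' "$1"
}

fail() {
  TH_FAIL=$((TH_FAIL + 1))
  printf 'FAIL: %s: %s\n' "$1" "$2"   # print body runs first
  if [ "$TH_FAIL_FAST" -eq 1 ]; then
    th_summary                         # exits 1 here because TH_FAIL > 0
  fi
}

run_demo() {
  # $1 = 1 for fail-fast, 0 for collect-all (the default behavior).
  TH_FAIL_FAST="$1"
  pass "first"
  fail "second" "boom"
  pass "third"
  th_summary
}
```

In this sketch `run_demo 1` stops after the first failure and never reaches "third", while `run_demo 0` runs all three cases; both end with exit status 1 because one case failed.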

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er-file overrides

Per-file pass()/fail() overrides DELETED from 5 harness consumers; each
switches to th_init --format=<preset> (+ --fail-fast where applicable):

| File                       | Preset            | Fail-fast |
|----------------------------|-------------------|-----------|
| test-commands.sh           | indent-1sp        |           |
| test-usage-weekly.sh       | indent-2sp        | --fail-fast |
| test-usage-tracker.sh      | indent-2sp        | --fail-fast |
| test-hooks.sh              | indent-2sp-quiet  |           |
| test-skill-refine.sh       | colon-mixed       | --fail-fast |

Each consumer's stdout byte-identical pre- vs post-migration under both
VERBOSE unset and VERBOSE=1 (validated via git-stash round-trip diff;
only non-deterministic noise — mktemp dir suffix, performance timing —
varies).

Bonus: test-run-all-tests.sh renames its local `pass`/`fail` to
`pass_case`/`fail_case`. The file uses its own counters (does not source
the harness), but the rename avoids name collision with the harness's
preset-aware `pass`/`fail` symbols. Output format `PASS: %s\n` /
`FAIL: %s: %s\n` unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…test-harness

New cases (12 total, all green): preset-{colon-flat,colon-mixed,
indent-1sp-{verbose,off},indent-2sp,indent-2sp-quiet-{verbose,off}},
preset-default-matches-colon-flat, preset-unknown, fail-fast-{on,off,
no-failures,orthogonal}, filter-still-works, list-still-works.

Also: run_harness_probe now calls pass_case on success (was silent —
suite under-reported coverage; visible count rose from 11 to 22 with
no new failures).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lder)

CC-247 and CC-248 index rows' `pr:` field flipped from `pr:TBD` to
`pr:TBD-PRA`. Real PR number applied in a follow-up commit after
`gh pr create` returns.

Validator parity preserved at 30 pre-existing E-codes (CC-228 baseline).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tail commit per PR-A's special_instruction_pr_ref: flip placeholder to
actual PR number after `gh pr create`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@screenleon screenleon merged commit 90a600a into main May 23, 2026
16 checks passed
@screenleon screenleon deleted the cc-247-248-harness-options branch May 23, 2026 12:33
screenleon added a commit that referenced this pull request May 23, 2026
…close cc-250 (#145)

* docs(cc-251): brief-authoring discipline for multi-file dispatches + close cc-250

CC-251 — 3 patterns added to prevent codex apply_patch debug-loop hang
on > 4 files OR > 50 lines verbatim briefs:

1. **apply_patch retry-cap** (constraint text): HALT after 2nd
   consecutive verification failure on the same file; no 3rd retry.
   Codex has no internal retry-cap → without this, debug loop runs
   until dispatch timeout (1800s).

2. **Verbatim-as-attached-file** (pattern): write embedded content
   (override-policy paragraphs, BACKLOG rows, brief-template fragments)
   to /tmp/<task>-content/*.md BEFORE dispatch; brief references path
   + says "copy verbatim, do NOT paraphrase". Eliminates the
   hallucinate-when-retyping failure mode (CC-250 stderr observed
   `pass/fail print-format` → `print-format` — "pass/fail" dropped).

3. **`expected_head_sha` state pin** (schema field): 40-char sha in
   brief metadata + `git rev-parse HEAD == <sha>` self_verify check.
   Catches "wrong branch / branch advanced / file changed by another
   process" before any patch is attempted.
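
Patterns 1 and 3 can be sketched as two small shell helpers. These are illustrative only: the function names are hypothetical, the retry cap is modeled as "run a verification command at most twice" (a real dispatch would re-patch between attempts), and the lowercase-hex check on the sha is an added assumption beyond the 40-char requirement stated above.

```bash
#!/usr/bin/env bash
# Hypothetical self_verify helpers; not taken from the repository.

# Succeeds only when HEAD matches the pinned 40-char sha.
# $1 = expected sha; $2 = optional actual sha (defaults to git rev-parse HEAD).
check_expected_head() {
  local expected="$1" actual="${2:-}"
  if [ -z "$actual" ]; then
    actual="$(git rev-parse HEAD)" || return 2
  fi
  case "$expected" in
    *[!0-9a-f]*) echo "expected_head_sha must be lowercase hex" >&2; return 2 ;;
  esac
  if [ "${#expected}" -ne 40 ]; then
    echo "expected_head_sha must be 40 chars" >&2
    return 2
  fi
  if [ "$actual" != "$expected" ]; then
    echo "HEAD is $actual, brief pinned $expected; halting" >&2
    return 1
  fi
}

# Runs the verification command "$@" at most twice and halts after the
# 2nd consecutive failure instead of attempting a 3rd retry.
verify_with_retry_cap() {
  local attempt
  for attempt in 1 2; do
    if "$@"; then return 0; fi
    echo "verification failed (attempt $attempt of 2)" >&2
  done
  echo "HALT: retry cap reached" >&2
  return 1
}
```

Running `check_expected_head` before any patch is attempted is what lets a wrong or advanced branch surface as an immediate halt rather than as a verification failure later in the loop.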

Documented in:
- `agents/project-pm.md` — "Multi-file brief discipline" prose added to
  the "Writing a brief for codex-executor" section
- `docs/dispatch-brief.md` — `expected_head_sha` as Optional section
  with usage example
- Memory `[[feedback_codex_brief_discipline]]` (separate repo) — retro
  evidence from CC-247/248 #142 + CC-250 #144 dispatch hangs

For briefs touching > 8 files OR > 200 lines verbatim, ALSO split the
dispatch into 2–3 smaller ones (each 2–4 files); split alone without
the 3 patterns above doesn't fully prevent the hang.

Also closes **CC-250** as a tail-cleanup (status flip + `pr:TBD` →
`pr:#144` + body Outcome / See blocks; mirrors the pattern PR #144
brief explicitly deferred to a follow-up commit). MILESTONES.md M1
prerequisite sub-table flips CC-250 ⏳ → ✅ (#144) and adds CC-251 ⏳.

Long-term resolution: CC-235 (tiered-lifecycle-gate) enforces split
mechanically; CC-244 (typed pipeline) turns verbatim into schema
fields; CC-215 (pmctl) may add `--expect-head <sha>` wrapper flag.

Validator parity preserved at 30 (CC-228 baseline). No code change;
discipline is brief-authoring time, not runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(cc-251): normalize verbatim-length threshold to 50 lines

PR-gate critic [low]: intro states `> 50 lines` trigger but bullet 2
said `> 30 lines`. Standardize to 50 across both. Matches the memory
copy at feedback_codex_brief_discipline.md (also updated 30→50).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>