fix(cc-254): harness assert_* no auto-pass + 3 process artifacts from PR-B.2 retro#149
Merged
Merged
Conversation
PR-B.1 (#148) shipped assert_* helpers that auto-called `pass "$name"` on success. Spike Q1-Q5 assumed pure-rename consumer migration would work. PR-B.2 (consumer migration) dispatch 2026-05-24 surfaced a design conflict: 13 consumer test scripts ALL follow the `assert_X "$name" ...; pass "$name"` pattern — `assert_*` is a check; the consumer separately calls `pass` to increment the counter. Pure rename would double-count PASS (harness implicit pass + consumer explicit pass). Codex defensively shadowed the harness by re-defining the 4 helpers locally in 9 of 13 consumer files — Q3 explicitly rejected that "shim". Dispatch hit timeout 124 with 9 apply_patch failures across 4 complex files. Root cause = PR-B.1 helper API, not codex's interpretation or PR-B.2's brief. Fix: - Strip `pass "$name"` from each of the 4 helpers (`assert_exit`, `assert_file_contains`, `assert_file_matches`, `assert_string_contains`). Helpers still call `fail` on failure. - Header comment in `scripts/lib/test-harness.sh` documents the new contract: success returns 0 without side-effects; consumer responsible for `&& pass "$name"` if accounting is desired. - 4 pass-path self-tests in `scripts/test-test-harness.sh` amended to call `pass` explicitly after the assert. Fail-path self-tests unaffected. After this lands, PR-B.2 reverts to pure rename — each `assert_contains "$n" $f "$x"` becomes `assert_file_contains "$n" $f "$x"` and the existing `pass "$n"` underneath stays unchanged. No double- count, no shim. Verification: - bash scripts/test-test-harness.sh — 30/30 (no regression) - bash scripts/test-run-all-tests.sh — 13/13 integration - bash pm/scripts/validate.sh BACKLOG.md — parity 30 (CC-228 baseline) BACKLOG row CC-254 files this amendment; will close as ✅ pr:#NNN in tail commit after gh pr create. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three process improvements captured from the CC-249 PR-B.2 design-gap incident (see commit 9869034 + spike Amendment): 1. **`agents/project-pm.md`** — adds 3 new rules to "Writing a brief for codex-executor": - Spike-pilot rule: every API-design spike MUST include a `## Pilot walkthrough` section in the output spike doc (1 representative consumer, verbatim before/after diff applying every spike decision). If walkthrough can't be written cleanly, spike's API decision is not yet mature. - Additive-PR pilot rule: every PR shipping a new API MUST migrate at least 1 real consumer in the SAME PR. - Explore call-site-context rule: Explore symbol surveys must capture BOTH declarations AND call-site context (raw line + 2 before/after). Declaration-only surveys miss usage-pattern conflicts. 2. **`docs/spikes/CC-249.md`** — appends Amendment section to Open Risks documenting: - The hidden contract gap (consumers follow `assert_X ...; pass` pattern; spike never inspected call-sites) - Symptoms (PR-B.2 dispatch timeout + shadow-shim 9/13 files) - Recovery (CC-254 amendment) - Cross-links to the 2 new memory entries 3. Memory (separate repo, already committed): - `[[feedback_spike_pilot_required]]` — full rule + CC-249 retro evidence + cross-links - `[[feedback_explore_call_site_context]]` — full rule + Explore prompt template fragment + CC-249 retro evidence These artifacts make the lesson reusable across projects. Future spike + additive PR briefs are expected to apply the patterns; future Explore surveys for symbol divergence include the call-site context template fragment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per CC-250/CC-251/CC-252 tail-commit pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two things in one PR, both rooted in the CC-249 PR-B.2 dispatch incident (2026-05-24):
passfrom the 4 harnessassert_*helpers + update self-tests + document the new contractWhat broke (and why)
PR-B.1 helpers auto-called
pass "$name"on success. PR-B.2 dispatch surfaced a hidden contract: all 13 consumer test scripts already followassert_X "$name" ...; pass "$name"(assert is check; consumer calls pass). Pure rename would have double-counted PASS. Codex defensively shadowed the harness by re-defining 4 helpers in each of 9/13 consumer files — explicitly violating spike Q3 "no shim". Dispatch hit timeout 124 with 9 apply_patch failures in 4 complex files.Root cause = PR-B.1 helper API, not codex's behavior or PR-B.2's brief. Spike never inspected consumer call-sites, only declarations.
Fix (CC-254)
scripts/lib/test-harness.sh:pass "$name"from each of the 4 helpers (assert_exit,assert_file_contains,assert_file_matches,assert_string_contains)&& pass "$name"if PASS accounting is desiredfailon failure (unchanged)scripts/test-test-harness.sh:&& pass '<name>'after each assertAfter this lands, PR-B.2 reverts to pure rename — each
assert_contains "$n" $f "$x"→assert_file_contains "$n" $f "$x"; existingpass "$n"stays untouched. No double-count, no shim.Process artifacts
[[feedback_spike_pilot_required]]~/.claude/projects/.../memory/(separate repo, already committed)[[feedback_explore_call_site_context]]agents/project-pm.mddocs/spikes/CC-249.mdOpen RisksThe 3 brief-authoring rules added to
agents/project-pm.md:## Pilot walkthrough(1 representative consumer, verbatim before/after diff). If walkthrough can't be written cleanly → spike not yet mature.Verification
bash scripts/test-test-harness.sh— 30/30 (no regression after auto-pass removal + self-test amend)bash scripts/test-run-all-tests.sh— 13/13 integrationbash pm/scripts/validate.sh BACKLOG.md— parity 30 (CC-228 baseline)/pr-gate express— Final: GO, critic + qa-tester both[pass], zero findingsOut of scope
spike_v1schema addingpilot_walkthrough:field — long-term automation; ticket existspr:TBD→pr:#<this>aftergh pr create🤖 Generated with Claude Code