Tighten FULLY LOGGED enforcement so bots stop stopping at step 06 #10
Merged
…step cues, scaffold banners
The personal-crm-founders run (committed direct to main) stopped at
step 06 and shipped at ARTIFACT-ONLY tier: missing claim-check,
evaluation, evaluator-pass, and all working-log entries. Same stop
pattern as the four pre-Verdel artifact-only runs.
Four adjustments targeted at the exact stop pattern:
1. .claude/skills/pagekit/SKILL.md
- Moved the FULLY LOGGED stop rule to the top of Hard Rules. Was
buried at step 8.
- Added an 'Are you done? Self-check' section: a literal checkbox
list the bot must walk before handoff: all artifacts filled (not
placeholders), prompts/outputs paired for every step, slop-check
clean, run-check returns FULLY LOGGED or PUBLISHABLE.
- Added explicit rule: do not delete or leave blank the scaffolded
placeholder files (claim-check.md, evaluation.md, evaluator-pass.md,
working-log.md).
2. Added Next: cues to every per-step skill so the bot has a
continuation signal instead of stopping at the end of each step:
- signal-doc → message-spine
- message-spine → first-page-decision
- first-page-decision → page-argument-shape
- page-argument-shape → proof-map
- proof-map → first-page-draft
- first-page-draft → explicit 6-step handoff (slop-check,
claim-check, evaluation.md, evaluator-pass, working-log,
run-check). Also says 'The draft is not the deliverable. The
logged run is.'
- claim-check → evaluation + evaluator-pass + run-check
3. scripts/new-run.sh scaffold placeholders now carry
<!-- DO NOT DELETE --> banners at the top of:
- working-log.md
- claim-check.md
- evaluation.md
- evaluator-pass.md
Each banner explains the tier requirement and which skill to use to
fill it in. Stops the 'this is just a stub' misread.
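
The Next: chain in item 2 can be pictured as a simple lookup from each step to its successor. This is a minimal sketch, not how the skills implement it: the real cues are prose inside each per-step skill file, and the function name next_step is hypothetical.

```shell
# Hypothetical helper: map a per-step skill to what its Next: cue names.
# The pairs mirror the list in item 2; nothing here reads the repo.
next_step() {
  case "$1" in
    signal-doc)          echo "message-spine" ;;
    message-spine)       echo "first-page-decision" ;;
    first-page-decision) echo "page-argument-shape" ;;
    page-argument-shape) echo "proof-map" ;;
    proof-map)           echo "first-page-draft" ;;
    first-page-draft)    echo "handoff: slop-check, claim-check, evaluation.md, evaluator-pass, working-log, run-check" ;;
    claim-check)         echo "evaluation, evaluator-pass, run-check" ;;
    *)                   echo "unknown step: $1" >&2; return 1 ;;
  esac
}

# Walk the chain from the first step until the draft step is reached.
step="signal-doc"
while [ "$step" != "first-page-draft" ]; do
  nxt="$(next_step "$step")"
  echo "$step -> $nxt"
  step="$nxt"
done
```

The point of the chain is that no step ends without naming a successor, so the draft step hands off to logging instead of reading as the finish line.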
Verified:
- scripts/doctor.sh PASS
- scripts/slop-check.sh exit 0 clean
- runs/vegan-dog-food-verdel still PUBLISHABLE
- Fresh scaffold from new-run.sh shows the DO NOT DELETE banners and
classifies as FULLY LOGGED (not PUBLISHABLE) with a clear punch list
hnshah pushed a commit that referenced this pull request on Apr 15, 2026
…subagent

The personal-crm-founders run (first real agentic fully-logged run after the PR #10 enforcement tightenings) surfaced a 4-item punch list in its evaluator-pass. This PR applies all four. It also fixes a bug in the pagekit-claim-checker subagent that the run exposed.

## Punch list (from runs/personal-crm-founders/evaluator-pass.md)

### 1. First-page-decision template: falsification prompt

templates/first-page-decision-template.md: added an 'If this is a hypothesis: what would falsify it?' sub-field under 'Confidence basis for this decision'. Stops hypothesis-level decisions from being silently promoted to conclusions. Required when confidence is 'hypothesis'; optional when 'data' or 'signal'.

### 2. Evaluation scaffold: Source quality field

scripts/new-run.sh evaluation.md scaffold now includes a 'Source quality' section at the top: Real / Training fiction / Mixed. Surfaces the source provenance prominently in the evaluation rather than burying it one level down in sources/01-source-capture.md. A reader scanning the eval should immediately know whether the run was built on real or invented material.

### 3. Claim-check: distinguish remove-vs-verify

Claim-check previously collapsed 'cut this line' into a single correction category. The audit should preserve the distinction between:

- rewrite
- remove (wrong): disqualified; do not restore
- remove pending verification: potentially restorable if source X confirms

Updated three surfaces:

- prompts/07-claim-check.md (canonical prompt)
- .claude/agents/pagekit-claim-checker.md (subagent instructions)
- templates/claim-check-template.md (audit format)

### 4. Evaluation scaffold: weak-section to source-gap mapping

scripts/new-run.sh evaluation.md scaffold now requires that every section flagged as weak (in 'What stayed thin' or 'Where outputs drifted generic') name the specific source material that would fix it. A weak section without a source gap named is a weak section shipping by choice, not by constraint.

## Bug fix: claim-checker subagent corrupted the corrected draft

The personal-crm-founders run reported the subagent left inline '*[Rewritten: ...]*' annotations in body copy and introduced two new em-dashes during rewrites. The working-log shows these had to be manually cleaned before the corrected draft could pass slop-check.

Fixed in .claude/agents/pagekit-claim-checker.md with explicit hard rules for the corrected draft:

- No inline annotation markers (*[Rewritten:...]* etc.) in body copy. Provenance belongs in the audit, not the corrected draft.
- No new em-dashes introduced by rewrites (per frameworks/anti-slop.md).
- Self-scan rewrites for flagged patterns before saving.

Mirrored in prompts/07-claim-check.md so chat users get the same enforcement.

## Verified

- scripts/doctor.sh PASS
- scripts/slop-check.sh exit 0 clean
- runs/vegan-dog-food-verdel still PUBLISHABLE
- runs/personal-crm-founders still PUBLISHABLE
- Fresh scaffold shows the new Source quality field and Weak-section-to-source-gap mapping sections
- templates/first-page-decision-template.md shows the new falsification prompt
- Fresh scaffold classifies as FULLY LOGGED (below PUBLISHABLE, as expected for an empty scaffold)
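
The corrected-draft hard rules from the bug fix (no inline annotation markers, no em-dashes introduced by rewrites) lend themselves to a mechanical self-scan. The sketch below is an assumption about what such a scan could look like, not the logic in scripts/slop-check.sh or the subagent. The function name scan_draft and the sample text are hypothetical, and it flags any em-dash in the file rather than only newly introduced ones.

```shell
# Hypothetical self-scan for a corrected draft (scan_draft is an illustrative name).
# Returns non-zero if the file contains an inline annotation marker or an em-dash.
scan_draft() {
  draft="$1"
  emdash="$(printf '\342\200\224')"   # UTF-8 bytes of the em-dash character
  rc=0
  if grep -n -F '*[Rewritten:' "$draft"; then
    echo "FAIL: inline annotation marker in body copy" >&2
    rc=1
  fi
  if grep -n -F "$emdash" "$draft"; then
    echo "FAIL: em-dash found" >&2
    rc=1
  fi
  return "$rc"
}

# Demo on an invented sample line, not text from any real run.
tmp="$(mktemp)"
printf 'Founders lose track of warm intros. *[Rewritten: softened claim]*\n' > "$tmp"
if scan_draft "$tmp"; then
  echo "clean"
else
  echo "needs cleanup"
fi
rm -f "$tmp"
```

A stricter version would diff the corrected draft against the original so that only em-dashes added by the rewrite are reported.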
The `personal-crm-founders` run committed directly to main (96aee3c) stopped at step 06 and shipped at ARTIFACT-ONLY tier: missing claim-check, evaluation, evaluator-pass, and all working-log entries. Same stop pattern as the four pre-Verdel artifact-only runs.

Four adjustments targeted at the exact stop pattern.
## What's in this PR

### 1. Move FULLY LOGGED stop rule to the top of the orchestrator's Hard Rules

In `.claude/skills/pagekit/SKILL.md`, the rule was buried at step 8 of the procedure. Now the first Hard Rule is: "Do not declare the run done until `scripts/run-check.sh runs/<name>` returns `tier: FULLY LOGGED` (or `PUBLISHABLE`). This is the only completion signal. Step 06 producing a nice-looking draft is NOT completion."

### 2. Add "Are you done?" self-check to the orchestrator

A literal checkbox list the bot has to walk before handoff. Every box must be YES: all artifacts filled (not placeholders), prompts/outputs paired for every step 01-07, working-log filled across steps, slop-check clean, run-check returns FULLY LOGGED or PUBLISHABLE.
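
The completion signal in the first Hard Rule reduces to a single check on the tier that run-check reports. A minimal sketch, assuming the script prints a line of the form `tier: <NAME>` (the quoted rule suggests this, but the full output format is not shown here). The function names are hypothetical, and the demo feeds canned text instead of calling `scripts/run-check.sh`.

```shell
# tier_from_output: extract the tier name from run-check output on stdin.
# Assumes a line shaped like "tier: <NAME>"; the real output may differ.
tier_from_output() {
  sed -n 's/^tier: *//p' | head -n 1
}

# run_is_done: succeed only for the two tiers that count as completion.
run_is_done() {
  case "$1" in
    "FULLY LOGGED"|"PUBLISHABLE") return 0 ;;
    *) return 1 ;;
  esac
}

# Demo with canned output standing in for a real run-check call.
tier="$(printf 'run: runs/example\ntier: ARTIFACT-ONLY\n' | tier_from_output)"
if run_is_done "$tier"; then
  echo "done: $tier"
else
  echo "not done: $tier"
fi
```

Against a real run, the canned `printf` would be replaced by `bash scripts/run-check.sh runs/<name>`, provided its output matches the assumed shape.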
### 3. Add `Next:` cues to every per-step skill

Chain-of-continuation signal so the bot doesn't stop at the end of a step:

- `pagekit-signal-doc` → invoke `pagekit-message-spine`
- `pagekit-message-spine` → invoke `pagekit-first-page-decision`
- `pagekit-first-page-decision` → invoke `pagekit-page-argument-shape`
- `pagekit-page-argument-shape` → invoke `pagekit-proof-map`
- `pagekit-proof-map` → invoke `pagekit-first-page-draft`
- `pagekit-first-page-draft` → explicit 6-step handoff: slop-check, claim-check, write evaluation.md, run evaluator-pass, fill working-log, run run-check. Opens with "The draft is not the deliverable. The logged run is."
- `pagekit-claim-check` → write evaluation, run evaluator-pass, run run-check

### 4. Add `<!-- DO NOT DELETE -->` banners to scaffolded placeholders

`scripts/new-run.sh` now emits banners at the top of:

- `working-log.md`
- `claim-check.md`
- `evaluation.md`
- `evaluator-pass.md`

Each banner names the tier requirement and which skill to use to fill it. Stops the "this is just a stub" misread that caused the personal-crm-founders bot to not commit these files.
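
To make the banner idea concrete, here is a rough sketch of how a scaffold script could write a placeholder with a banner on top. It is not the code in `scripts/new-run.sh`: the function name `write_placeholder` and the second banner line are invented for illustration, and only the `DO NOT DELETE` marker comes from the description above.

```shell
# write_placeholder: hypothetical stand-in for a scaffold step.
# $1 = file to create, $2 = name of the skill that should fill it in.
# The wording of the second banner line is an assumption.
write_placeholder() {
  file="$1"
  skill="$2"
  cat > "$file" <<EOF
<!-- DO NOT DELETE -->
<!-- Required for FULLY LOGGED tier. Fill using: $skill -->

# $(basename "$file" .md)
EOF
}

# Demo in a throwaway directory so nothing in a real repo is touched.
run_dir="$(mktemp -d)"
write_placeholder "$run_dir/claim-check.md" "pagekit-claim-check"
head -n 2 "$run_dir/claim-check.md"
rm -rf "$run_dir"
```

Using an HTML comment keeps the banner invisible in rendered markdown while leaving it in plain view for a bot reading the raw file.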
## Verified

- `scripts/doctor.sh` → PASS
- `scripts/slop-check.sh` → exit 0 clean
- `runs/vegan-dog-food-verdel/` still classifies as PUBLISHABLE

## Test plan

- `.claude/skills/pagekit/SKILL.md` for the new Hard Rule at top and the self-check section
- `.claude/skills/pagekit-<step>/SKILL.md` for the `## Next` section
- `bash scripts/new-run.sh _testrun && head -8 runs/_testrun/claim-check.md` shows the banner; `rm -rf runs/_testrun`

## Next move after merge

Run the next bot (sonnet 4.6, ChatGPT, etc.) against these tightened rules and see if they reach FULLY LOGGED on their own without hand-holding. The personal-crm-founders run was our baseline data point for the pattern we're trying to fix.
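
The test plan's check for a `## Next` section in each per-step skill could be scripted instead of read by eye. A small sketch under assumptions: the function name `check_next_sections` is hypothetical, the directory layout follows the `pagekit-<step>/SKILL.md` paths named in the test plan, and the demo builds a throwaway tree with invented file contents.

```shell
# check_next_sections: report per-step SKILL.md files lacking a "## Next" heading.
# $1 = skills root (for the real repo this would presumably be .claude/skills).
check_next_sections() {
  root="$1"
  missing=0
  for f in "$root"/pagekit-*/SKILL.md; do
    [ -e "$f" ] || continue            # glob matched nothing
    if ! grep -q '^## Next' "$f"; then
      echo "missing Next section: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Demo on a temporary tree: one skill has the section, one does not.
root="$(mktemp -d)"
mkdir -p "$root/pagekit-signal-doc" "$root/pagekit-proof-map"
printf '# Signal doc\n\n## Next\nInvoke pagekit-message-spine.\n' > "$root/pagekit-signal-doc/SKILL.md"
printf '# Proof map\n' > "$root/pagekit-proof-map/SKILL.md"
if check_next_sections "$root"; then
  echo "all per-step skills have a Next section"
else
  echo "some per-step skills are missing one"
fi
rm -rf "$root"
```

The glob skips the orchestrator at `pagekit/SKILL.md`, which matches the test plan's split between the orchestrator check and the per-step check.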