Tighten FULLY LOGGED enforcement so bots stop stopping at step 06 by hnshah · Pull Request #10 · hnshah/pagekit

hnshah · 2026-04-14T23:55:43Z

The personal-crm-founders run committed directly to main (96aee3c) stopped at step 06 and shipped at ARTIFACT-ONLY tier: missing claim-check, evaluation, evaluator-pass, and all working-log entries. Same stop pattern as the four pre-Verdel artifact-only runs.

Four adjustments targeted at the exact stop pattern.

What's in this PR

1. Move FULLY LOGGED stop rule to the top of the orchestrator's Hard Rules

In .claude/skills/pagekit/SKILL.md, the stop rule was buried at step 8 of the procedure. It is now the first Hard Rule: "Do not declare the run done until scripts/run-check.sh runs/<name> returns tier: FULLY LOGGED (or PUBLISHABLE). This is the only completion signal. Step 06 producing a nice-looking draft is NOT completion."
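A minimal sketch of how a wrapper could apply this rule mechanically. It assumes run-check prints a line of the form `tier: <NAME>` on stdout; the exact output format is not shown in this PR, and `tier_is_complete` is a hypothetical helper, not something the repo ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads run-check output on stdin and succeeds only
# when the reported tier is FULLY LOGGED or PUBLISHABLE.
# Assumption: the output contains a line that starts with "tier: <NAME>".
tier_is_complete() {
  grep -Eq '^[[:space:]]*tier:[[:space:]]*(FULLY LOGGED|PUBLISHABLE)'
}
```

Possible usage: `bash scripts/run-check.sh runs/<name> | tier_is_complete || echo "not done yet"`.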

2. Add "Are you done?" self-check to the orchestrator

A literal checkbox list the bot has to walk before handoff. Every box must be YES: all artifacts filled (not placeholders), prompts/outputs paired for every step 01-07, working-log filled across steps, slop-check clean, run-check returns FULLY LOGGED or PUBLISHABLE.
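One box on that list could be checked mechanically, as in the sketch below. It assumes the four log files sit directly in the run directory (the PR shows this for claim-check.md and evaluator-pass.md only), and `missing_log_files` is a hypothetical name. It only detects missing or empty files; it cannot tell a filled file from a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints each required log file that is missing or
# empty in the given run directory. Prints nothing when all four exist
# and have content.
missing_log_files() {
  local run_dir="$1" f
  for f in working-log.md claim-check.md evaluation.md evaluator-pass.md; do
    # -s is true only if the file exists and has a size greater than zero
    [ -s "$run_dir/$f" ] || echo "$f"
  done
}
```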

3. Add `Next:` cues to every per-step skill

Chain-of-continuation signal so the bot doesn't stop at the end of a step:

pagekit-signal-doc → invoke pagekit-message-spine
pagekit-message-spine → invoke pagekit-first-page-decision
pagekit-first-page-decision → invoke pagekit-page-argument-shape
pagekit-page-argument-shape → invoke pagekit-proof-map
pagekit-proof-map → invoke pagekit-first-page-draft
pagekit-first-page-draft → explicit 6-step handoff: slop-check, claim-check, write evaluation.md, run evaluator-pass, fill working-log, run run-check. Opens with "The draft is not the deliverable. The logged run is."
pagekit-claim-check → write evaluation, run evaluator-pass, run run-check
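The single-successor part of the chain above can be read as a lookup table. The sketch below is illustrative only: `next_skill` is a hypothetical function, and the real cues live as prose in each skill's SKILL.md rather than in a script.

```shell
#!/usr/bin/env bash
# Hypothetical lookup: prints the skill to invoke after the given one.
# Returns 1 for skills that hand off to a multi-step list instead of a
# single successor (pagekit-first-page-draft, pagekit-claim-check) and
# for unknown names.
next_skill() {
  case "$1" in
    pagekit-signal-doc)          echo pagekit-message-spine ;;
    pagekit-message-spine)       echo pagekit-first-page-decision ;;
    pagekit-first-page-decision) echo pagekit-page-argument-shape ;;
    pagekit-page-argument-shape) echo pagekit-proof-map ;;
    pagekit-proof-map)           echo pagekit-first-page-draft ;;
    *) return 1 ;;
  esac
}
```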

4. Add `DO NOT DELETE` banners to scaffolded placeholders

scripts/new-run.sh now emits banners at the top of:

working-log.md
claim-check.md
evaluation.md
evaluator-pass.md

Each banner names the tier requirement and which skill to use to fill it. Stops the "this is just a stub" misread that caused the personal-crm-founders bot to not commit these files.
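A sketch of what emitting such a banner might look like. The banner wording and its HTML-comment form are assumptions for illustration, not the text scripts/new-run.sh actually writes; `write_placeholder` is a hypothetical function.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: create a placeholder file that opens with a banner
# naming the tier requirement and the skill that fills the file.
# The banner text below is assumed, not copied from scripts/new-run.sh.
write_placeholder() {
  local path="$1" skill="$2"
  cat > "$path" <<EOF
<!-- DO NOT DELETE: required for tier FULLY LOGGED. Fill with: $skill -->

EOF
}
```

Possible usage: `write_placeholder runs/<name>/claim-check.md pagekit-claim-check`.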

Verified

scripts/doctor.sh → PASS
scripts/slop-check.sh → exit 0 clean
runs/vegan-dog-food-verdel/ still classifies as PUBLISHABLE
Fresh scaffold carries the new banners and classifies as FULLY LOGGED (not PUBLISHABLE) with the correct upgrade path

Test plan

CI on this PR passes
Scan .claude/skills/pagekit/SKILL.md for the new Hard Rule at top and the self-check section
Scan each .claude/skills/pagekit-<step>/SKILL.md for the ## Next section
bash scripts/new-run.sh _testrun && head -8 runs/_testrun/claim-check.md shows the banner; rm -rf runs/_testrun
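The per-step scan in the test plan could be scripted along these lines. This is a sketch: `skills_missing_next` is a hypothetical helper, and it assumes the section heading starts a line as `## Next`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints every pagekit-<step>/SKILL.md under the
# given skills directory that has no line starting with "## Next".
skills_missing_next() {
  local skills_dir="$1" f
  for f in "$skills_dir"/pagekit-*/SKILL.md; do
    [ -e "$f" ] || continue   # the glob matched nothing
    grep -q '^## Next' "$f" || echo "$f"
  done
}
```

Possible usage: `skills_missing_next .claude/skills` should print nothing once every per-step skill carries its cue.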

Next move after merge

Run the next bot (Sonnet 4.6, ChatGPT, etc.) against these tightened rules and see whether it reaches FULLY LOGGED on its own, without hand-holding. The personal-crm-founders run was our baseline data point for the pattern we're trying to fix.

…step cues, scaffold banners

The personal-crm-founders run (committed directly to main) stopped at step 06 and shipped at ARTIFACT-ONLY tier: missing claim-check, evaluation, evaluator-pass, and all working-log entries. Same stop pattern as the four pre-Verdel artifact-only runs.

Four adjustments targeted at the exact stop pattern:

1. .claude/skills/pagekit/SKILL.md
   - Moved the FULLY LOGGED stop rule to the top of Hard Rules. Was buried at step 8.
   - Added an 'Are you done? Self-check' section: a literal checkbox list the bot must walk before handoff: all artifacts filled (not placeholders), prompts/outputs paired for every step, slop-check clean, run-check returns FULLY LOGGED or PUBLISHABLE.
   - Added explicit rule: do not delete or leave blank the scaffolded placeholder files (claim-check.md, evaluation.md, evaluator-pass.md, working-log.md).

2. Added Next: cues to every per-step skill so the bot has a continuation signal instead of stopping at the end of each step:
   - signal-doc → message-spine
   - message-spine → first-page-decision
   - first-page-decision → page-argument-shape
   - page-argument-shape → proof-map
   - proof-map → first-page-draft
   - first-page-draft → explicit 6-step handoff (slop-check, claim-check, evaluation.md, evaluator-pass, working-log, run-check). Also says 'The draft is not the deliverable. The logged run is.'
   - claim-check → evaluation + evaluator-pass + run-check

3. scripts/new-run.sh scaffold placeholders now carry DO NOT DELETE banners at the top of:
   - working-log.md
   - claim-check.md
   - evaluation.md
   - evaluator-pass.md

   Each banner explains the tier requirement and which skill to use to fill it in. Stops the 'this is just a stub' misread.

Verified:
- scripts/doctor.sh PASS
- scripts/slop-check.sh exit 0 clean
- runs/vegan-dog-food-verdel still PUBLISHABLE
- Fresh scaffold from new-run.sh shows the DO NOT DELETE banners and classifies as FULLY LOGGED (not PUBLISHABLE) with a clear punch list

…subagent

The personal-crm-founders run (first real agentic fully-logged run after the PR #10 enforcement tightenings) surfaced a 4-item punch list in its evaluator-pass. This PR applies all four. It also fixes a bug in the pagekit-claim-checker subagent that the run exposed.

## Punch list (from runs/personal-crm-founders/evaluator-pass.md)

### 1. First-page-decision template: falsification prompt

templates/first-page-decision-template.md: added an 'If this is a hypothesis: what would falsify it?' sub-field under 'Confidence basis for this decision'. Stops hypothesis-level decisions from being silently promoted to conclusions. Required when confidence is 'hypothesis'; optional when 'data' or 'signal'.

### 2. Evaluation scaffold: Source quality field

scripts/new-run.sh evaluation.md scaffold now includes a 'Source quality' section at the top: Real / Training fiction / Mixed. Surfaces the source provenance prominently in the evaluation rather than burying it one level down in sources/01-source-capture.md. A reader scanning the eval should immediately know whether the run was built on real or invented material.

### 3. Claim-check: distinguish remove-vs-verify

Claim-check previously collapsed 'cut this line' into a single correction category. The audit should preserve the distinction between:

- rewrite
- remove (wrong): disqualified; do not restore
- remove pending verification: potentially restorable if source X confirms

Updated three surfaces:

- prompts/07-claim-check.md (canonical prompt)
- .claude/agents/pagekit-claim-checker.md (subagent instructions)
- templates/claim-check-template.md (audit format)

### 4. Evaluation scaffold: weak-section to source-gap mapping

scripts/new-run.sh evaluation.md scaffold now requires that every section flagged as weak (in 'What stayed thin' or 'Where outputs drifted generic') name the specific source material that would fix it. A weak section without a source gap named is a weak section shipping by choice, not by constraint.

## Bug fix: claim-checker subagent corrupted the corrected draft

The personal-crm-founders run reported the subagent left inline '*[Rewritten: ...]*' annotations in body copy and introduced two new em-dashes during rewrites. The working-log shows these had to be manually cleaned before the corrected draft could pass slop-check.

Fixed in .claude/agents/pagekit-claim-checker.md with explicit hard rules for the corrected draft:

- No inline annotation markers (*[Rewritten:...]* etc.) in body copy. Provenance belongs in the audit, not the corrected draft.
- No new em-dashes introduced by rewrites (per frameworks/anti-slop.md).
- Self-scan rewrites for flagged patterns before saving.

Mirrored in prompts/07-claim-check.md so chat users get the same enforcement.

## Verified

- scripts/doctor.sh PASS
- scripts/slop-check.sh exit 0 clean
- runs/vegan-dog-food-verdel still PUBLISHABLE
- runs/personal-crm-founders still PUBLISHABLE
- Fresh scaffold shows the new Source quality field and Weak-section-to-source-gap mapping sections
- templates/first-page-decision-template.md shows the new falsification prompt
- Fresh scaffold classifies as FULLY LOGGED (below PUBLISHABLE, as expected for an empty scaffold)
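The hard rules for the corrected draft described in the bug fix above could be spot-checked with a small scan such as the sketch below. `draft_violations` is a hypothetical helper and is not the repo's scripts/slop-check.sh. It flags every em-dash in the file, not only ones introduced by a rewrite, so it is stricter than the rule as written.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints numbered lines of a corrected draft that
# contain an em-dash or an inline "*[Rewritten:" annotation marker.
# Exit status follows grep: 0 if any offending line was found, 1 if none.
draft_violations() {
  local draft="$1" emdash
  emdash="$(printf '\342\200\224')"   # UTF-8 bytes for U+2014 (em-dash)
  grep -nF -e "$emdash" -e '*[Rewritten:' "$draft"
}
```

Possible usage: `if draft_violations corrected-draft.md; then echo "clean up before saving"; fi`, where the draft filename is a placeholder.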

hnshah merged commit ab00589 into main Apr 15, 2026
1 check passed

hnshah mentioned this pull request Apr 15, 2026

Apply Me CRM evaluator-pass punch list + fix claim-checker subagent bug #11

Merged

7 tasks

hnshah deleted the claude/tighten-fully-logged-enforcement branch April 15, 2026 02:16

hnshah merged 1 commit into main from claude/tighten-fully-logged-enforcement
