Tighten FULLY LOGGED enforcement so bots stop stopping at step 06 #10

Merged
hnshah merged 1 commit into main from claude/tighten-fully-logged-enforcement on Apr 15, 2026

@hnshah commented Apr 14, 2026

The personal-crm-founders run, committed directly to main (96aee3c), stopped at step 06 and shipped at ARTIFACT-ONLY tier. It was missing claim-check, evaluation, evaluator-pass, and all working-log entries. This is the same stop pattern as the four pre-Verdel artifact-only runs.

Four adjustments targeted at the exact stop pattern.

What's in this PR

1. Move FULLY LOGGED stop rule to the top of the orchestrator's Hard Rules

.claude/skills/pagekit/SKILL.md — was buried at step 8 of the procedure. Now the first Hard Rule is: "Do not declare the run done until scripts/run-check.sh runs/<name> returns tier: FULLY LOGGED (or PUBLISHABLE). This is the only completion signal. Step 06 producing a nice-looking draft is NOT completion."
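
As a rough illustration of how that rule could be enforced mechanically, the sketch below gates on the tier line. It assumes run-check prints a line of the form `tier: <NAME>`; the helper names `tier_from_output` and `is_done` are hypothetical and not part of the repo.

```shell
#!/bin/sh
# Sketch only. Assumes run-check output contains a line "tier: <NAME>".
# tier_from_output and is_done are hypothetical helpers, not repo scripts.

# Read run-check output on stdin and print the tier name from the first tier line.
tier_from_output() {
  sed -n 's/^tier:[[:space:]]*//p' | head -n 1
}

# Succeed only for the two tiers the Hard Rule accepts as completion.
is_done() {
  case "$1" in
    "FULLY LOGGED"|"PUBLISHABLE") return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller might run `tier=$(bash scripts/run-check.sh "runs/$name" | tier_from_output)` and keep working until `is_done "$tier"` succeeds, where `$name` stands in for the run directory.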

2. Add "Are you done?" self-check to the orchestrator

A literal checkbox list the bot has to walk before handoff. Every box must be YES: all artifacts filled (not placeholders), prompts/outputs paired for every step 01-07, working-log filled across steps, slop-check clean, run-check returns FULLY LOGGED or PUBLISHABLE.
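
One box on that list, prompts and outputs paired for every step 01-07, lends itself to a quick scripted check. The sketch below is only an illustration: the function `missing_pairs` is hypothetical, and so is the directory layout it assumes (`prompts/NN-*.md` and `outputs/NN-*.md` inside the run directory).

```shell
#!/bin/sh
# Sketch only. missing_pairs is a hypothetical helper, and the layout
# <run>/prompts/NN-*.md and <run>/outputs/NN-*.md is an assumption.

# Print one line per missing prompt or output file for steps 01-07.
# Prints nothing when every step has both.
missing_pairs() (
  run=$1
  for n in 01 02 03 04 05 06 07; do
    for kind in prompts outputs; do
      # ls fails when the glob matches no file
      ls "$run/$kind/$n"-*.md >/dev/null 2>&1 || echo "$kind/$n"
    done
  done
)
```

A run that stopped after step 06 would, under this assumed layout, report `outputs/07` as missing.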

3. Add Next: cues to every per-step skill

Chain-of-continuation signal so the bot doesn't stop at the end of a step:

  • pagekit-signal-doc → invoke pagekit-message-spine
  • pagekit-message-spine → invoke pagekit-first-page-decision
  • pagekit-first-page-decision → invoke pagekit-page-argument-shape
  • pagekit-page-argument-shape → invoke pagekit-proof-map
  • pagekit-proof-map → invoke pagekit-first-page-draft
  • pagekit-first-page-draft → explicit 6-step handoff: slop-check, claim-check, write evaluation.md, run evaluator-pass, fill working-log, run run-check. Opens with "The draft is not the deliverable. The logged run is."
  • pagekit-claim-check → write evaluation, run evaluator-pass, run run-check
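
The chain above can be read as a simple lookup table. The sketch below encodes it that way for illustration; `next_skill` and `walk_chain` are hypothetical helpers, and only the skill names come from the list.

```shell
#!/bin/sh
# Sketch only. next_skill and walk_chain are hypothetical helpers.
# Skill names are taken from the Next: cues listed above.

# Print the skill that follows the given one, or fail when the step
# hands off to a multi-step closing sequence instead of a single skill.
next_skill() {
  case "$1" in
    pagekit-signal-doc)          echo pagekit-message-spine ;;
    pagekit-message-spine)       echo pagekit-first-page-decision ;;
    pagekit-first-page-decision) echo pagekit-page-argument-shape ;;
    pagekit-page-argument-shape) echo pagekit-proof-map ;;
    pagekit-proof-map)           echo pagekit-first-page-draft ;;
    *) return 1 ;;  # first-page-draft and claim-check lead into the handoff steps
  esac
}

# Print every hop from the given skill until no single next skill exists.
walk_chain() (
  step=$1
  while nxt=$(next_skill "$step"); do
    echo "$step -> $nxt"
    step=$nxt
  done
)
```

Calling `walk_chain pagekit-signal-doc` prints the five hops that end at pagekit-first-page-draft, which is where the explicit 6-step handoff takes over.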

4. Add <!-- DO NOT DELETE --> banners to scaffolded placeholders

scripts/new-run.sh now emits banners at the top of:

  • working-log.md
  • claim-check.md
  • evaluation.md
  • evaluator-pass.md

Each banner names the tier requirement and which skill to use to fill it. Stops the "this is just a stub" misread that caused the personal-crm-founders bot to not commit these files.
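
For readers who have not opened scripts/new-run.sh, a banner of this kind might be written and detected roughly as follows. This is a sketch under assumptions: only the `<!-- DO NOT DELETE -->` marker comes from this PR, while the second banner line and the helpers `write_placeholder` and `has_banner` are invented for illustration.

```shell
#!/bin/sh
# Sketch only. write_placeholder and has_banner are hypothetical helpers.
# Only the <!-- DO NOT DELETE --> marker is from the PR; the second
# banner line is illustrative wording, not the real scaffold text.

# $1 = file path, $2 = tier requirement, $3 = skill that fills the file
write_placeholder() {
  {
    printf '<!-- DO NOT DELETE -->\n'
    printf '<!-- Required for tier: %s. Fill with: %s -->\n' "$2" "$3"
    printf '\n'
  } > "$1"
}

# Succeed when the marker appears in the first 8 lines of the file.
has_banner() {
  head -n 8 "$1" | grep -qF -e '<!-- DO NOT DELETE -->'
}
```

Checking only the first lines matches the `head -8` step in the test plan and keeps the check cheap on long files.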

Verified

  • scripts/doctor.sh → PASS
  • scripts/slop-check.sh → exit 0 clean
  • runs/vegan-dog-food-verdel/ still classifies as PUBLISHABLE
  • Fresh scaffold carries the new banners and classifies as FULLY LOGGED (not PUBLISHABLE) with the correct upgrade path

Test plan

  • CI on this PR passes
  • Scan .claude/skills/pagekit/SKILL.md for the new Hard Rule at top and the self-check section
  • Scan each .claude/skills/pagekit-<step>/SKILL.md for the ## Next section
  • bash scripts/new-run.sh _testrun && head -8 runs/_testrun/claim-check.md shows the banner; rm -rf runs/_testrun

Next move after merge

Run the next bot (Sonnet 4.6, ChatGPT, etc.) against these tightened rules and see whether it reaches FULLY LOGGED on its own without hand-holding. The personal-crm-founders run is our baseline data point for the pattern we're trying to fix.

…step cues, scaffold banners

The personal-crm-founders run (committed direct to main) stopped at
step 06 and shipped at ARTIFACT-ONLY tier: missing claim-check,
evaluation, evaluator-pass, and all working-log entries. Same stop
pattern as the four pre-Verdel artifact-only runs.

Four adjustments targeted at the exact stop pattern:

1. .claude/skills/pagekit/SKILL.md
   - Moved the FULLY LOGGED stop rule to the top of Hard Rules. Was
     buried at step 8.
   - Added an 'Are you done? Self-check' section: a literal checkbox
     list the bot must walk before handoff, covering all artifacts
     filled (not placeholders), prompts/outputs paired for every step,
     slop-check clean, and run-check returning FULLY LOGGED or PUBLISHABLE.
   - Added explicit rule: do not delete or leave blank the scaffolded
     placeholder files (claim-check.md, evaluation.md, evaluator-pass.md,
     working-log.md).

2. Added Next: cues to every per-step skill so the bot has a
   continuation signal instead of stopping at the end of each step:
   - signal-doc → message-spine
   - message-spine → first-page-decision
   - first-page-decision → page-argument-shape
   - page-argument-shape → proof-map
   - proof-map → first-page-draft
   - first-page-draft → explicit 6-step handoff (slop-check,
     claim-check, evaluation.md, evaluator-pass, working-log,
     run-check). Also says 'The draft is not the deliverable. The
     logged run is.'
   - claim-check → evaluation + evaluator-pass + run-check

3. scripts/new-run.sh scaffold placeholders now carry
   <!-- DO NOT DELETE --> banners at the top of:
   - working-log.md
   - claim-check.md
   - evaluation.md
   - evaluator-pass.md
   Each banner explains the tier requirement and which skill to use to
   fill it in. Stops the 'this is just a stub' misread.

Verified:
- scripts/doctor.sh PASS
- scripts/slop-check.sh exit 0 clean
- runs/vegan-dog-food-verdel still PUBLISHABLE
- Fresh scaffold from new-run.sh shows the DO NOT DELETE banners and
  classifies as FULLY LOGGED (not PUBLISHABLE) with a clear punch list

@hnshah merged commit ab00589 into main on Apr 15, 2026
1 check passed
hnshah pushed a commit that referenced this pull request Apr 15, 2026
…subagent

The personal-crm-founders run (first real agentic fully-logged run
after the PR #10 enforcement tightenings) surfaced a 4-item punch list
in its evaluator-pass. This PR applies all four. It also fixes a bug
in the pagekit-claim-checker subagent that the run exposed.

## Punch list (from runs/personal-crm-founders/evaluator-pass.md)

### 1. First-page-decision template: falsification prompt
templates/first-page-decision-template.md — added an
'If this is a hypothesis: what would falsify it?' sub-field under
'Confidence basis for this decision'. Stops hypothesis-level
decisions from being silently promoted to conclusions. Required
when confidence is 'hypothesis'; optional when 'data' or 'signal'.

### 2. Evaluation scaffold: Source quality field
scripts/new-run.sh evaluation.md scaffold now includes a
'Source quality' section at the top: Real / Training fiction / Mixed.
Surfaces the source provenance prominently in the evaluation rather
than burying it one level down in sources/01-source-capture.md. A
reader scanning the eval should immediately know whether the run
was built on real or invented material.

### 3. Claim-check: distinguish remove-vs-verify
Claim-check previously collapsed 'cut this line' into a single
correction category. The audit should preserve the distinction
between:
  - rewrite
  - remove (wrong): disqualified; do not restore
  - remove pending verification: potentially restorable if source X confirms
Updated three surfaces:
  - prompts/07-claim-check.md (canonical prompt)
  - .claude/agents/pagekit-claim-checker.md (subagent instructions)
  - templates/claim-check-template.md (audit format)
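
The practical difference between the three categories is whether a cut line may come back later. A minimal sketch of that rule, with `is_restorable` as a hypothetical helper and the category labels taken from the list above:

```shell
#!/bin/sh
# Sketch only. is_restorable is a hypothetical helper. The labels follow
# the three correction categories named in this commit message.

# Succeed only when a line in this category may be restored later.
is_restorable() {
  case "$1" in
    "remove pending verification") return 0 ;;  # may return if the source confirms
    "remove (wrong)")              return 1 ;;  # disqualified; do not restore
    "rewrite")                     return 1 ;;  # line stays in rewritten form
    *) echo "unknown category: $1" >&2; return 2 ;;
  esac
}
```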

### 4. Evaluation scaffold: weak-section to source-gap mapping
scripts/new-run.sh evaluation.md scaffold now requires that every
section flagged as weak (in 'What stayed thin' or 'Where outputs
drifted generic') name the specific source material that would fix
it. A weak section without a source gap named is a weak section
shipping by choice, not by constraint.

## Bug fix: claim-checker subagent corrupted the corrected draft

The personal-crm-founders run reported the subagent left inline
'*[Rewritten: ...]*' annotations in body copy and introduced two new
em-dashes during rewrites. The working-log shows these had to be
manually cleaned before the corrected draft could pass slop-check.

Fixed in .claude/agents/pagekit-claim-checker.md with explicit
hard rules for the corrected draft:
  - No inline annotation markers (*[Rewritten:...]* etc.) in body
    copy. Provenance belongs in the audit, not the corrected draft.
  - No new em-dashes introduced by rewrites (per frameworks/anti-slop.md).
  - Self-scan rewrites for flagged patterns before saving.

Mirrored in prompts/07-claim-check.md so chat users get the same
enforcement.
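
The self-scan rule could look something like the sketch below. The helper names are hypothetical, and comparing em-dash counts between the original and corrected draft is a coarse stand-in for "no new em-dashes", not the check the subagent actually performs.

```shell
#!/bin/sh
# Sketch only. count_emdash, has_annotation, and draft_is_clean are
# hypothetical helpers mirroring the hard rules listed above.

# Count em-dash characters (U+2014, UTF-8 bytes E2 80 94) in a file.
count_emdash() {
  emdash=$(printf '\342\200\224')
  grep -o "$emdash" "$1" | wc -l | tr -d ' '
}

# Succeed when the file contains an inline *[Rewritten: ...]* marker.
has_annotation() {
  grep -qF -e '*[Rewritten:' "$1"
}

# $1 = original draft, $2 = corrected draft.
# Succeed only if the corrected draft has no annotation markers and
# no more em-dashes than the original.
draft_is_clean() {
  if has_annotation "$2"; then return 1; fi
  [ "$(count_emdash "$2")" -le "$(count_emdash "$1")" ]
}
```

Running a check like `draft_is_clean` before saving would catch both problems the personal-crm-founders run had to clean up by hand.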

## Verified

- scripts/doctor.sh PASS
- scripts/slop-check.sh exit 0 clean
- runs/vegan-dog-food-verdel still PUBLISHABLE
- runs/personal-crm-founders still PUBLISHABLE
- Fresh scaffold shows the new Source quality field and Weak-section-
  to-source-gap mapping sections
- templates/first-page-decision-template.md shows the new
  falsification prompt
- Fresh scaffold classifies as FULLY LOGGED (below PUBLISHABLE, as
  expected for an empty scaffold)

@hnshah deleted the claude/tighten-fully-logged-enforcement branch on April 15, 2026 at 02:16