Skill consolidation harness. The experiment that closes the loop.
Claude Code has create-skill: the only procedure that writes procedures. That's the Consolidate cell, dimmed because the agent never initiates it. This harness makes it automatic.
Track which action sequences repeat with user approval across sessions. When a pattern recurs above a threshold, condense it into a skill. Score it against a mutated variant; the winner survives. Two iterations to convergence.
Three agents, three roles:
- Codex (GPT-5.4) — scores skill variants against the contract. The A/B test harness.
- Claude Code — creates and mutates skills. The skill mutator.
- Human — approves or rejects consolidated skills. The Attend.
```
~/.claude/skills/        # the skill store (Remember)
~/.claude/memory/        # session logs, co-activation counts
Documents/AGI/variants/  # mutated skill candidates
Documents/AGI/results/   # scoring logs
```
1. PERCEIVE Scan session logs for repeating action sequences
2. CACHE Cluster by embedding similarity
3. FILTER Reject patterns below co-activation threshold
4. ATTEND Human reviews candidates (sleep replay)
5. CONSOLIDATE Write winning pattern as a skill file
6. REMEMBER Skill persists in ~/.claude/skills/
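The six steps above can be sketched as one pass over session logs. This is a minimal sketch, not the harness itself: it assumes logs are lists of action names, detects repeats as trigrams, and stubs out embedding clustering with exact-match grouping; human review and the skill-file write happen outside the function.

```python
from collections import Counter

def consolidate(session_logs, threshold=3):
    """Sketch of PERCEIVE -> CACHE -> FILTER over session logs.
    Logs are lists of action names (an assumption for illustration)."""
    # PERCEIVE: scan logs for repeating action sequences (here, trigrams)
    counts = Counter()
    for log in session_logs:
        for i in range(len(log) - 2):
            counts[tuple(log[i:i + 3])] += 1
    # CACHE: cluster similar sequences (stub: exact-match grouping)
    # FILTER: reject patterns below the co-activation threshold
    candidates = [seq for seq, n in counts.items() if n >= threshold]
    # ATTEND: human reviews `candidates` outside this function
    # CONSOLIDATE + REMEMBER: each survivor becomes a file in ~/.claude/skills/
    return candidates
```

A sequence that appears in three sessions clears the default threshold; one-off sequences are filtered out.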
The first skill to evolve is humanize, which has a well-defined contract:
- Input: a blog post
- Output: list of AI patterns found + opportunities for voice
- Verifiable: did the fix improve the prose?
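The contract can be pinned down as a data shape plus a check. Everything here is a hypothetical representation, not the actual skill's interface; the "did the fix improve the prose?" clause is proxied by a simple before/after comparison.

```python
from dataclasses import dataclass, field

@dataclass
class HumanizeReport:
    """Hypothetical output shape for the humanize contract."""
    ai_patterns: list = field(default_factory=list)        # AI patterns found
    voice_opportunities: list = field(default_factory=list)  # openings for voice

def contract_satisfied(before: HumanizeReport, after: HumanizeReport) -> bool:
    # Verifiable clause, proxied: fewer AI patterns remain after the fix.
    return len(after.ai_patterns) < len(before.ai_patterns)
```

Making the contract a concrete predicate is what lets Codex score variants mechanically instead of by vibes.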
- Read current humanize/SKILL.md
- Codex proposes one mutation (add a pattern, remove a pattern, change a threshold)
- Run both variants on the same test post
- Codex scores: which output better satisfies the contract?
- Winner replaces the skill. Loser is logged.
- Repeat until convergence (expected: 2 iterations per the slop-detection result)
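The loop above reduces to a small hill-climb. A sketch under stated assumptions: `mutate` and `score` stand in for Claude Code and Codex, a skill is whatever those two functions agree on, and convergence is approximated as two consecutive losing mutations.

```python
def evolve(skill, mutate, score, test_post, max_iters=10):
    """A/B loop sketch: mutate, score both variants on the same post,
    keep the winner, log the loser, stop when mutations stop winning."""
    history = []
    for i in range(max_iters):
        variant = mutate(skill)
        if score(variant, test_post) > score(skill, test_post):
            history.append(("winner", variant))
            skill = variant          # winner replaces the skill
        else:
            history.append(("loser", variant))  # loser is logged
        # convergence proxy: two mutations in a row fail to beat the incumbent
        if i > 0 and history[-1][0] == "loser" and history[-2][0] == "loser":
            break
    return skill, history
```

With a score that plateaus, the loop climbs until mutations overshoot, then stops.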
Posts from june.kim/_posts/ with humanize results in the git history. The diff is ground truth: what the human accepted.
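Recovering that ground truth means diffing the pre-humanize and post-humanize versions of a post. A minimal sketch with `difflib`, assuming the two versions are already extracted from git history as strings:

```python
import difflib

def accepted_edits(before: str, after: str):
    """The human-accepted diff as ground truth: which lines the fix
    removed and which it added, per the post's git history."""
    removed, added = [], []
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added
```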
Qualifiers like "a bit" dampen a skill to idempotency. "Tighten every paragraph a bit" converges in two passes — the second finds almost nothing to cut. Without the qualifier, repeated application collapses the output to a single word.
This is the convergence mechanism for skill mutation. Without a dampener, each mutation drifts further. With one, mutations that overshoot get corrected on the next evaluation. The qualifier is the Filter on the Filter.
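The idempotency claim is checkable by iterating a transform to a fixed point. A sketch with two toy transforms standing in for the skill (both are illustrative assumptions): a dampened one that converges once there is nothing left to tighten, and an undampened one that halves the text every pass until a single character remains.

```python
def apply_until_converged(transform, text, max_passes=10):
    """Run a transform repeatedly; stop when a pass changes nothing.
    A dampened skill hits this fixed point in a couple of passes."""
    passes = 0
    while passes < max_passes:
        new = transform(text)
        passes += 1
        if new == text:   # this pass found nothing to change
            break
        text = new
    return text, passes
```

`apply_until_converged(lambda t: t.replace("  ", " "), text)` settles quickly; `lambda t: t[: max(1, len(t) // 2)]` keeps cutting until only one character survives, the collapse the qualifier prevents.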
The harness produces a skill that:
- Finds more true patterns than the current skill (recall)
- Flags fewer false patterns (precision)
- Requires less human direction to apply fixes (autonomy)
- Changes how the agent processes the next post (the consolidation test)
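The first two criteria are mechanical to score once the ground truth exists. A sketch, assuming the skill's flagged patterns and the human-accepted patterns are both available as sets:

```python
def score_skill(flagged: set, truth: set) -> dict:
    """Precision/recall against the set of patterns the human
    actually accepted fixes for."""
    true_pos = len(flagged & truth)
    return {
        "recall": true_pos / len(truth) if truth else 0.0,       # true patterns found
        "precision": true_pos / len(flagged) if flagged else 0.0,  # flags that were real
    }
```

Autonomy and the consolidation test resist this kind of scoring; those stay with the human in the Attend role.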
- The Flicker — the blog post documenting this experiment
- The Natural Framework — the six steps
- Diagnosis LLM — SOAP notes on the agent's broken cells
- Consolidation — the procedural memory test
- The Parts Bin — candidate algorithms for each cell
- Slop Detection — two iterations to convergence
AGPL-3.0-or-later