
OBS-09: Rule audit across agents and decompose skills #2

@b-coman

Description


Origin

E2E Round 3, OBS-09 — Design Meta (see reports/e2e-003-ido4shape-cloud.md lines 407–453).

Surfaced while scoping OBS-07 and OBS-08 fixes: the proposed fixes were themselves over-prescriptive toward the agents — piling on rules, enforcement lists, detection patterns, and forbidden-term tables. The irony: OBS-07 complains that technical-spec-writer over-prescribes to implementer agents, then proposes to fix it by over-prescribing to technical-spec-writer.

Problem

Agent definitions accumulate rules until the cumulative effect is harmful:

  • Cognitive overload — every rule is a constraint the agent checks; past a threshold, agents spend more energy rule-checking than problem-solving
  • Rule conflicts — "be specific" vs. "leave room for implementation" point opposite directions without clear hierarchy
  • Literal-mindedness — agents following many rules become rule-followers, checking the letter and missing the spirit
  • Reduced judgment — more explicit rules = less agent thinking (exactly the failure mode OBS-07 identifies)
  • Diminishing returns — each new rule is marginally less effective; after ~7-10 rules per agent, marginal value turns negative

Additionally, Anthropic's Claude 4 best practices explicitly warn that MUST/NEVER/ALWAYS capitalized imperatives cause overtriggering on Opus 4.5/4.6: "The fix is to dial back any aggressive language. Where you might have said 'CRITICAL: You MUST use this tool when...', you can use more normal prompting like 'Use this tool when...'." Our agent definitions are full of this language. The rigidity observed in Round 3's tech-spec output may be partly caused by our own agent definitions.

Current rule inventory

| File | Lines | Rules / enforcement points | Shape |
| --- | --- | --- | --- |
| agents/code-analyzer.md | 227 | 7 numbered rules + 3 mode-specific instruction blocks | Mix of hard constraints (rule 3 context preservation, rule 7 Read-not-cat) and judgment calls (rule 4 "be honest", rule 5 "don't design solutions") |
| agents/technical-spec-writer.md | 218 | 6 numbered rules + Goldilocks section + Metadata sub-sections + Structure sub-section + 6-step Process with embedded rules (Step 5 alone has 7 items) | Heaviest accumulator. Unclear which rules are load-bearing |
| agents/spec-reviewer.md | 95 | Stage 1 (9 format checks) + Stage 2 (8 quality checks) + Governance Implications (4 items) + 3-tier classification | Dense enumeration. Format checks are legitimate (parser compliance). Quality checks are mostly judgment calls shaped as rules |
| skills/decompose/SKILL.md | 168 | Post-round-3 inline-execution rewrite | Watch for rule accumulation from Phase 1 fix |
| skills/decompose-tasks/SKILL.md | 136 | Same pattern | Same concern |
| skills/decompose-validate/SKILL.md | 179 | Same pattern | Same concern |

Not in scope: agents/project-manager/AGENT.md (384 lines, not tested yet — deferred to Round 5+).

Audit methodology — enforcement inventory

For each rule or rule-shaped item, classify into one of three cases:

Case A — Double-enforced (delete from prose)

Rule exists in agent/skill prose AND in a downstream validator (parser, spec-reviewer, ingest_spec).
Example: "task refs match `[A-Z]{2,5}-\d{2,3}[A-Z]?`" is in technical-spec-writer.md rule 5 AND in spec-reviewer.md Stage 1 AND in spec-parser.ts.
Action: Delete from writer prose. The parser + reviewer enforce it.
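The Case A pattern — the same constraint living in prose and in a downstream validator — can be illustrated with a minimal sketch. The regex is the one quoted above; the function name and shape are hypothetical, not spec-parser.ts's actual API:

```typescript
// Hypothetical sketch of the downstream check. Because a validator like this
// already runs on every spec, restating the pattern in writer prose is
// redundant (Case A) and can be deleted there.
const TASK_REF = /^[A-Z]{2,5}-\d{2,3}[A-Z]?$/;

function invalidTaskRefs(refs: string[]): string[] {
  // Returns the refs that fail the pattern, for the reviewer to flag.
  return refs.filter((ref) => !TASK_REF.test(ref));
}
```

For example, `invalidTaskRefs(["AUTH-02", "STOR-01A", "bad-ref"])` flags only `"bad-ref"`.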

Case B — Prose-only, genuinely qualitative (reshape as principle + example)

No parser can check this. It's a judgment call.
Example: "be honest about complexity."
Action: Keep in prose. Reshape from rule → principle + good/bad example pair.

Case C — Prose-only but enforceable (keep prose + consider enforcement)

Today it's prose-only, but a reviewer/hook/schema could catch violations.
Example: "preserve strategic context verbatim."
Action: Keep in prose (reshaped as explanation). Optionally build enforcement (see #6).
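The three cases and their default actions could be modeled as a small type for the enforcement inventory (Deliverable 1). This is a sketch; the field names and wording are assumptions, not an existing schema:

```typescript
// Illustrative model of one row in the enforcement inventory.
type EnforcementCase = "A" | "B" | "C";

interface InventoryEntry {
  file: string;             // e.g. "agents/technical-spec-writer.md"
  ruleText: string;         // the rule as currently written
  enforcementCase: EnforcementCase;
  downstreamCheck?: string; // Case A/C: the parser, reviewer, or hook that catches it
}

// Default action per case, mirroring the methodology above.
function actionFor(c: EnforcementCase): string {
  switch (c) {
    case "A": return "delete from prose (validator already enforces it)";
    case "B": return "reshape as principle + good/bad example";
    case "C": return "keep in prose, reshape language, consider building enforcement";
  }
}
```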

Per-rule audit criteria

  • Hard constraint or judgment call?
  • Redundant with another rule? → Consolidate
  • Conflicts with another rule? → Resolve with explicit priority, or remove the weaker
  • Can be replaced with an example that teaches the same thing? → Prefer example
  • Aspirational or actually affects behavior? → Remove if aspirational
  • Uses enforcement language (MUST, NEVER, ALWAYS) for something that's actually a judgment call? → Downgrade to principle

Empirical evidence informing the audit

Verbatim preservation (code-analyzer rule 3) — WORKING

Traced 5 capabilities (AUTH-02, STOR-01, PLUG-02, VIEW-05, PROJ-01) through all 3 artifacts:

  • Strategic spec → Canvas: verbatim preservation across all 5 capabilities
  • Canvas → Technical spec: first 1-2 sentences verbatim, then code context added via Per canvas: and Per D9: citations
  • Success conditions: preserved verbatim in canvas, migrated to task-level in tech spec
  • Group context: reliably included as "Part of [Group] ([priority])" in tech spec
  • Verdict: Rule 3 stays as a hard rule (Case C). Reshape language only: drop the all-caps "MUST carry forward verbatim" phrasing and add an explanation of why preservation matters.

Over-prescription in tech spec output — CONFIRMED

7 sampled tasks (PLAT-01A, PLAT-01B, PLAT-01D, STOR-01A, STOR-01B, PLUG-02A, PLUG-02B) show:

  • File paths and function signatures fully dictated
  • Directory structures enumerated
  • Config values and algorithms pinned
  • "Decisions" pre-made in parentheses
  • This correlates with the agent definitions' own over-prescriptive rule language

Deliverables

  1. Enforcement inventory table — per-rule classification (A/B/C) with rationale, returned for user sign-off before edits
  2. Rule edits — delete Case A duplicates, reshape Case B as principle + example, reshape Case C language
  3. Language pass — sweep all edited files for MUST/NEVER/ALWAYS/CRITICAL/IMPORTANT all-caps; replace with explanation + normal-case
  4. Rule reordering — identity/context at the top, process in the middle, hard rules and principles near the end
  5. Frontmatter alignment — fix format gaps discovered during Anthropic spec investigation (see Align skill and agent frontmatter with Anthropic's current spec #7)
  6. Regression safety log — for every rule removed/downgraded, document: original text, failure it prevented, which layer now catches it, whether Round 5 calibration should watch for it
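The language pass (Deliverable 3) can be sketched as a small helper that flags lines for manual rewriting. This is illustrative only — the function name and word list are assumptions, and a plain grep over the edited files would do equally well:

```typescript
// Flag all-caps imperatives so they can be rewritten by hand.
// Word list matches the sweep described in Deliverable 3.
const IMPERATIVES = /\b(MUST|NEVER|ALWAYS|CRITICAL|IMPORTANT)\b/;

function flagImperatives(markdown: string): { line: number; text: string }[] {
  return markdown
    .split("\n")
    .map((text, i) => ({ line: i + 1, text }))
    .filter(({ text }) => IMPERATIVES.test(text));
}
```

Note the matches are flagged, not auto-replaced: each hit needs a human rewrite into explanation plus normal-case phrasing, per the Claude 4 guidance quoted earlier.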

Acceptance criteria

  • Enforcement inventory table produced and user-approved before edits
  • Net rule count is reduced or equal (not increased) across all audited files
  • No MUST/NEVER/ALWAYS all-caps language remains in qualitative (Case B) rules
  • Hard rules (Case A/C) are reshaped with explanation of WHY, not just commandments
  • Every deleted rule has a regression-safety entry in the e2e report
  • `claude plugin validate .` passes after all edits

Dependencies

References

Labels

  • agent (Agent definition modification)
  • audit (Code/rule audit work)
  • e2e-obs (Traced from E2E test observation)
  • round-4 (v0.8.0 design refinement package)
