Origin
E2E Round 3, OBS-09 — Design Meta (see reports/e2e-003-ido4shape-cloud.md lines 407–453).
Surfaced while scoping OBS-07 and OBS-08 fixes: the proposed fixes were themselves over-prescriptive toward the agents — piling on rules, enforcement lists, detection patterns, and forbidden-term tables. The irony: OBS-07 complains that technical-spec-writer over-prescribes to implementer agents, then proposes to fix it by over-prescribing to technical-spec-writer.
Problem
Agent definitions accumulate rules until the cumulative effect is harmful:
Cognitive overload — every rule is a constraint the agent checks; past a threshold, agents spend more energy rule-checking than problem-solving
Rule conflicts — "be specific" vs. "leave room for implementation" point in opposite directions, with no clear hierarchy between them
Literal-mindedness — agents following many rules become rule-followers, checking the letter and missing the spirit
Reduced judgment — more explicit rules = less agent thinking (exactly the failure mode OBS-07 identifies)
Diminishing returns — each new rule is marginally less effective; after ~7-10 rules per agent, marginal value turns negative
Additionally, Anthropic's Claude 4 best practices explicitly warn that MUST/NEVER/ALWAYS capitalized imperatives cause overtriggering on Opus 4.5/4.6: "The fix is to dial back any aggressive language. Where you might have said 'CRITICAL: You MUST use this tool when...', you can use more normal prompting like 'Use this tool when...'." Our agent definitions are full of this language. The rigidity observed in Round 3's tech-spec output may be partly caused by our own agent definitions.
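To size the problem before auditing, a small scanner can list every all-caps imperative in an agent or skill file. This is a hypothetical helper, not part of the plugin; the keyword list is the MUST/NEVER/ALWAYS/CRITICAL/IMPORTANT set this issue targets, and the function name is illustrative.

```typescript
// Hypothetical audit helper: flags all-caps imperative keywords in an agent or
// skill definition. The keyword list is the one this issue targets.
const AGGRESSIVE_TERMS = ["MUST", "NEVER", "ALWAYS", "CRITICAL", "IMPORTANT"];

// \b boundaries stop "MUSTARD" from matching; there is no `i` flag, so
// normal-case "must" (the recommended phrasing) is deliberately not flagged.
const AGGRESSIVE_RE = new RegExp(`\\b(${AGGRESSIVE_TERMS.join("|")})\\b`, "g");

interface Hit {
  line: number; // 1-based line number in the definition file
  term: string; // the all-caps keyword that matched
  text: string; // trimmed source line, for the inventory table
}

function findAggressiveLanguage(source: string): Hit[] {
  const hits: Hit[] = [];
  source.split("\n").forEach((text, idx) => {
    // With the global flag, String.prototype.match returns every full match.
    for (const term of text.match(AGGRESSIVE_RE) ?? []) {
      hits.push({ line: idx + 1, term, text: text.trim() });
    }
  });
  return hits;
}

// Example: two hits on line 2; the normal-case lines 1 and 3 are not flagged.
const sample = [
  "Use this tool when the user asks for a spec.",
  "CRITICAL: You MUST use this tool when...",
  "You must not invent task refs.",
].join("\n");
console.log(findAggressiveLanguage(sample).map((h) => `${h.line}:${h.term}`));
```

A hit is only a candidate: per the audit below, all-caps language is downgraded when the rule is a judgment call, so each hit still needs a human classification.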
Current rule inventory
agents/code-analyzer.md
agents/technical-spec-writer.md
agents/spec-reviewer.md: Dense enumeration. Format checks are legitimate (parser compliance). Quality checks are mostly judgment calls shaped as rules.
skills/decompose/SKILL.md (168 lines): Post-round-3 inline-execution rewrite. Watch for rule accumulation from the Phase 1 fix.
skills/decompose-tasks/SKILL.md (136 lines): Same pattern, same concern.
skills/decompose-validate/SKILL.md (179 lines): Same pattern, same concern.
Not in scope: agents/project-manager/AGENT.md (384 lines, not tested yet; deferred to Round 5+).
Audit methodology — enforcement inventory
For each rule or rule-shaped item, classify into one of three cases:
Case A — Double-enforced (delete from prose)
Rule exists in agent/skill prose AND in a downstream validator (parser, spec-reviewer, ingest_spec).
Example: "task refs match [A-Z]{2,5}-\d{2,3}[A-Z]?" is in technical-spec-writer.md rule 5 AND in spec-reviewer.md Stage 1 AND in spec-parser.ts. Action: Delete from writer prose. The parser + reviewer enforce it.
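The downstream check that makes the writer-prose copy redundant can be sketched as below. The pattern is the one quoted above; wrapping it in ^...$ so the whole string must match is an assumption, and the function names are hypothetical (this is not spec-parser.ts's actual API).

```typescript
// Sketch of a task-ref format check like the one the parser/reviewer runs.
// Pattern quoted from this issue, anchored here (an assumption) so the whole
// string must match; function names are hypothetical.
const TASK_REF = /^[A-Z]{2,5}-\d{2,3}[A-Z]?$/;

function isValidTaskRef(ref: string): boolean {
  return TASK_REF.test(ref);
}

// Returns every malformed ref so a single report covers the whole spec.
function invalidTaskRefs(refs: string[]): string[] {
  return refs.filter((ref) => !isValidTaskRef(ref));
}

// X-01 (one letter) and AUTH-2 (one digit) are reported; the others pass.
console.log(invalidTaskRefs(["PLAT-01A", "STOR-01B", "X-01", "PLUG-002", "AUTH-2"]));
```

Since a check of this kind fails the spec mechanically, restating the format in the writer's prose adds a rule to read without adding enforcement.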
Case B — Prose-only, genuinely qualitative (reshape as principle + example)
No parser can check this. It's a judgment call.
Example: "be honest about complexity." Action: Keep in prose. Reshape from rule → principle + good/bad example pair.
Case C — Prose-only but enforceable (keep prose + consider enforcement)
Today it's prose-only, but a reviewer/hook/schema could catch violations.
Example: "preserve strategic context verbatim." Action: Keep in prose (reshaped as explanation). Optionally build enforcement (see #6).
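One possible shape for the optional Case C enforcement is a containment check: does each capability's strategic description appear verbatim in the downstream artifact? This sketch assumes descriptions are available as strings keyed by capability ID and that artifacts can be read as plain text; neither is specified in this issue, and all names are hypothetical.

```typescript
// Hypothetical Case C enforcement: check that each capability's strategic
// description survives verbatim in a downstream artifact (canvas or tech spec).
// Whitespace is normalized so re-wrapping is not counted as an edit, but any
// wording change still fails.
function normalize(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

interface PreservationResult {
  capability: string;
  preserved: boolean;
}

function checkVerbatim(
  strategic: Record<string, string>, // capability ID -> strategic description
  downstream: string // full text of the canvas or technical spec
): PreservationResult[] {
  const haystack = normalize(downstream);
  return Object.keys(strategic).map((capability) => ({
    capability,
    preserved: haystack.indexOf(normalize(strategic[capability])) !== -1,
  }));
}
```

Normalizing whitespace is a judgment call; a stricter variant would compare raw text. Since the technical spec carries forward only the first 1-2 sentences verbatim, a tech-spec check would likely pass only those sentences as the description.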
Per-rule audit criteria
Hard constraint or judgment call?
Redundant with another rule? → Consolidate
Conflicts with another rule? → Resolve with explicit priority, or remove the weaker
Can be replaced with an example that teaches the same thing? → Prefer example
Aspirational or actually affects behavior? → Remove if aspirational
Uses enforcement language (MUST, NEVER, ALWAYS) for something that's actually a judgment call? → Downgrade to principle
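The three-case classification, plus the last criterion about enforcement language, can be expressed as a small decision function. This is an illustrative sketch only: the field names are hypothetical, and in practice each flag is a judgment the auditor records in the enforcement inventory table.

```typescript
// Hypothetical encoding of the Case A/B/C decision; names are illustrative.
type RuleCase = "A" | "B" | "C";

interface RuleFacts {
  enforcedDownstream: boolean; // parser, spec-reviewer, or ingest_spec already checks it
  mechanicallyCheckable: boolean; // a reviewer, hook, or schema could check it
  allCapsLanguage: boolean; // phrased with MUST/NEVER/ALWAYS
}

interface Verdict {
  kind: RuleCase;
  action: string;
  downgradeLanguage: boolean; // drop all-caps phrasing from a rule that stays in prose
}

function classify(rule: RuleFacts): Verdict {
  if (rule.enforcedDownstream) {
    // Case A: the prose copy is deleted, so its phrasing no longer matters.
    return { kind: "A", action: "delete from prose", downgradeLanguage: false };
  }
  if (rule.mechanicallyCheckable) {
    // Case C: keep, reshaped as an explanation; optionally build enforcement.
    return {
      kind: "C",
      action: "reshape as explanation; consider enforcement",
      downgradeLanguage: rule.allCapsLanguage,
    };
  }
  // Case B: a judgment call, taught with a principle and a good/bad example pair.
  return {
    kind: "B",
    action: "reshape as principle + good/bad example pair",
    downgradeLanguage: rule.allCapsLanguage,
  };
}
```

Applied to the examples above, the task-ref rule lands in A, "preserve strategic context verbatim" in C, and "be honest about complexity" in B. Checking downstream enforcement first matters: a rule that is both enforced and checkable should be deleted from prose, not kept.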
Empirical evidence informing the audit
Verbatim preservation (code-analyzer rule 3) — WORKING
Traced 5 capabilities (AUTH-02, STOR-01, PLUG-02, VIEW-05, PROJ-01) through all 3 artifacts:
Strategic spec → Canvas: verbatim preservation across all 5 capabilities
Canvas → Technical spec: first 1-2 sentences verbatim, then code context added via "Per canvas:" and "Per D9:" citations
Success conditions: preserved verbatim in canvas, migrated to task-level in tech spec
Group context: reliably included as "Part of [Group] ([priority])" in tech spec
Verdict: Rule 3 stays as a hard rule (Case C). Reshape the language only: drop the all-caps "MUST carry forward verbatim" and add an explanation of WHY preservation matters.
Over-prescription in tech spec output — CONFIRMED
7 sampled tasks (PLAT-01A, PLAT-01B, PLAT-01D, STOR-01A, STOR-01B, PLUG-02A, PLUG-02B) show:
Deliverables
Replace MUST/NEVER/ALWAYS/CRITICAL/IMPORTANT all-caps with explanation + normal-case phrasing
Regression safety log — for every rule removed/downgraded, document: original text, failure it prevented, which layer now catches it, whether Round 5 calibration should watch for it
Acceptance criteria
Enforcement inventory table produced and user-approved before edits
Net rule count is reduced or equal (not increased) across all audited files
No MUST/NEVER/ALWAYS all-caps language remains in qualitative (Case B) rules
Hard rules (Case A/C) are reshaped with explanation of WHY, not just commandments
Every deleted rule has a regression-safety entry in the e2e report
claude plugin validate . passes after all edits
Dependencies
References
reports/e2e-003-ido4shape-cloud.md: OBS-09 (lines 407–453), current rule inventory (lines 416–421), audit criteria (lines 444–451)
agents/code-analyzer.md: 7 rules at lines 178–187, mode-specific rules at lines 188–227
agents/technical-spec-writer.md: rules at lines 211–219, process at lines 143–209, Goldilocks at lines 83–103
agents/spec-reviewer.md: Stage 1 at lines 25–38, Stage 2 at lines 39–48