Problem
After 3 simulation iterations that all passed (zero unresolved pain points, decreasing pain point count each iteration), a domain expert review found 31 critical gaps including: trading calendar mismatches across asset classes, no corporate action handling, missing indicator dependency ordering, no pipeline scheduling, and missing data deduplication fields.
The simulation never caught these because it was asking the wrong question.
Root Cause
The simulate-spec skill walks each bounded context in isolation, verifying that internal logic is consistent (no dead ends, no contradictions, all entities exercised). It does not verify that the spec accurately models real-world behavior.
Specific structural issues:
- Per-context walkthroughs miss cross-cutting constraints. Trading calendars, corporate actions, and operational scheduling affect every bounded context. Walking contexts independently means these concerns fall through the cracks.
- No "day in the life" end-to-end scenarios. The simulation walks individual rules but never traces a complete real-world scenario (e.g., "what happens on a weekend?" or "what happens after a stock split?").
- Convergence is a false signal. Decreasing pain point count across iterations is interpreted as improvement, but it only measures internal consistency. A spec can be perfectly internally consistent while being fundamentally wrong about how the domain works.
- The adversarial review-simulation gate only checks internal consistency. It verifies: unresolved pain points, untested entities, untested quality attributes, cross-context data shape consistency. It does not check whether the spec accurately reflects real-world domain behavior.
Proposed Fixes
1. Add "Cross-Cutting Walkthrough" requirement to simulate-spec
After per-context walkthroughs, require at least 3 cross-cutting scenarios that trace data through multiple contexts:
- A normal operational day (full pipeline from ingestion to output)
- An edge-case operational day (weekend, holiday, empty database cold start)
- A domain anomaly day (data split/delisting, external service failure, flash crash)
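One way the requirement could be enforced mechanically is sketched below. This is an illustration only, not how simulate-spec works today: the `Scenario` type, the `cross_cutting_gaps` function, and the context names in the example are all hypothetical. It assumes a scenario counts as cross-cutting only if it traces at least two distinct bounded contexts.

```python
from dataclasses import dataclass
from enum import Enum


class ScenarioKind(Enum):
    # The three scenario categories listed above.
    NORMAL_DAY = "normal operational day"
    EDGE_CASE_DAY = "edge-case operational day"
    ANOMALY_DAY = "domain anomaly day"


@dataclass(frozen=True)
class Scenario:
    name: str
    kind: ScenarioKind
    contexts: tuple[str, ...]  # bounded contexts the scenario traces, in order


def cross_cutting_gaps(scenarios: list[Scenario]) -> list[str]:
    """Return the reasons the cross-cutting requirement is not yet met."""
    gaps = []
    # Assumption: "cross-cutting" means touching two or more distinct contexts.
    valid = [s for s in scenarios if len(set(s.contexts)) >= 2]
    if len(valid) < 3:
        gaps.append(f"need at least 3 cross-cutting scenarios, found {len(valid)}")
    covered = {s.kind for s in valid}
    for kind in ScenarioKind:
        if kind not in covered:
            gaps.append(f"no scenario for: {kind.value}")
    return gaps


# Hypothetical example: two scenarios, none covering a domain anomaly.
scenarios = [
    Scenario("weekday run", ScenarioKind.NORMAL_DAY, ("ingestion", "indicators", "output")),
    Scenario("weekend run", ScenarioKind.EDGE_CASE_DAY, ("ingestion", "scheduling")),
]
for gap in cross_cutting_gaps(scenarios):
    print(gap)
```

An empty result would mean the minimum is met. It says nothing about whether the scenarios themselves are realistic, which still needs human or domain-expert judgment.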
2. Add "Domain Fidelity Check" to review-simulation gate
The adversarial review should explicitly verify that the spec accurately models real-world constraints, not just internal consistency. Add a checklist item:
"I have verified that the spec accurately handles: operational scheduling, data source alignment, calendar/datetime edge cases, and domain-specific anomaly events."
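A bare checkbox is easy to tick without doing the work, so the gate could instead ask for a short piece of evidence per item. The sketch below is one possible shape for that, under the assumption that the reviewer supplies a note pointing at the part of the spec that handles each concern. The function name and the evidence format are hypothetical.

```python
# The four concerns named in the checklist item above.
DOMAIN_FIDELITY_ITEMS = (
    "operational scheduling",
    "data source alignment",
    "calendar/datetime edge cases",
    "domain-specific anomaly events",
)


def domain_fidelity_check(evidence: dict[str, str]) -> tuple[bool, list[str]]:
    """Pass only if every item has a non-empty evidence note.

    `evidence` maps a checklist item to a note saying where the spec
    handles it (hypothetical format, e.g. a section reference).
    """
    missing = [
        item for item in DOMAIN_FIDELITY_ITEMS
        if not evidence.get(item, "").strip()
    ]
    return (not missing, missing)


# Hypothetical review in which only two of the four items were examined.
passed, missing = domain_fidelity_check({
    "operational scheduling": "pipeline schedule section",
    "calendar/datetime edge cases": "weekend walkthrough",
})
print(passed, missing)
```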
3. Add "Day 1 Deployment" walkthrough
A dedicated walkthrough that traces the system from empty state to first successful output. This catches cold-start gaps (historical data backfill, benchmark initialization, indicator warmup periods).
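The cold-start trace can be approximated as a dependency walk: start with nothing available, run the pipeline steps in order, and stop at the first step that needs data no earlier step produced. The following is a minimal sketch under that assumption. The `Step` type, the step names, and the data names are hypothetical and not taken from any real spec.

```python
from typing import NamedTuple, Optional


class Step(NamedTuple):
    name: str
    requires: frozenset  # data that must already exist before the step runs
    produces: frozenset  # data that exists after the step runs


def first_cold_start_gap(steps: list[Step]) -> Optional[str]:
    """Walk the steps from an empty state and report the first unmet need."""
    available: set = set()  # day 1: the system starts with no data at all
    for step in steps:
        missing = step.requires - available
        if missing:
            return f"{step.name} needs {sorted(missing)} but nothing earlier produces it"
        available |= step.produces
    return None


# Hypothetical pipeline that forgot a historical backfill step.
pipeline = [
    Step("ingest latest prices", frozenset(), frozenset({"latest_prices"})),
    Step("compute indicators",
         frozenset({"latest_prices", "price_history"}),
         frozenset({"indicators"})),
]
print(first_cold_start_gap(pipeline))
```

Here the walk stops at the indicator step because no earlier step supplies `price_history`, which is the kind of backfill gap a Day 1 walkthrough is meant to surface.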
4. Separate convergence metrics
Track two metrics independently:
- Internal consistency (pain point count — should decrease to zero)
- External coverage (number of real-world scenarios verified — should increase each iteration)
These are orthogonal. A spec can have zero pain points and zero real-world coverage.
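To make the separation concrete, the two metrics can be recorded per iteration and reported side by side, so that a falling pain point count cannot stand in for coverage. The sketch below assumes a simple per-iteration record. The type, the function, and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IterationResult:
    open_pain_points: int    # internal consistency: should reach zero
    scenarios_verified: int  # external coverage: should grow each iteration


def assess(history: list[IterationResult]) -> dict[str, bool]:
    """Report the two metrics independently; neither implies the other."""
    latest = history[-1]
    # Coverage counts as increasing only if every iteration beat the previous one.
    growing = all(
        later.scenarios_verified > earlier.scenarios_verified
        for earlier, later in zip(history, history[1:])
    )
    return {
        "internally_consistent": latest.open_pain_points == 0,
        "coverage_increasing": len(history) > 1 and growing,
        "has_external_coverage": latest.scenarios_verified > 0,
    }


# Hypothetical history: pain points fall to zero, but no real-world
# scenario is ever verified.
history = [
    IterationResult(open_pain_points=9, scenarios_verified=0),
    IterationResult(open_pain_points=4, scenarios_verified=0),
    IterationResult(open_pain_points=0, scenarios_verified=0),
]
print(assess(history))
```

In this example the spec looks converged on the first metric while both coverage flags are false, which is the situation described in the Problem section.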