Releases: safiqsindha/Ditto-V5

v5.0 — Phase D Closeout

@safiqsindha safiqsindha released this 30 Apr 22:12

v5.0 — Phase D Closeout (2026-04-30)

Five-cell parallel violation-detection diagnostic on Claude Haiku 4.5 at n=1,200 chains/cell, two API calls per chain (paired baseline / intervention).

Headline result

Four of five cells are significant at the Bonferroni-corrected threshold (α/5), each by many orders of magnitude; poker, already saturated at baseline, shows no change.

| Cell | Det@Base | Det@Int | Δ | 95% CI | FP@Base | FP@Int | b | c | χ² | p_Bonf |
|---|---|---|---|---|---|---|---|---|---|---|
| pubg | 75.9% | 100.0% | +24.1% | [+21.8, +26.6] | 0.0% | 0.0% | 0 | 289 | 287.0 | 1.1e-63 |
| nba | 57.4% | 100.0% | +42.6% | [+39.8, +45.4] | 0.8% | 9.9% | 0 | 511 | 509.0 | 5.2e-112 |
| csgo | 65.1% | 98.1% | +32.9% | [+30.3, +35.7] | 11.3% | 29.8% | 0 | 395 | 393.0 | 9.2e-87 |
| rocket_league | 0.2% | 6.2% | +5.9% | [+4.7, +7.3] | 0.0% | 0.0% | 0 | 71 | 69.0 | 4.9e-16 |
| poker | 100.0% | 99.9% | -0.1% | [-0.25, 0.00] | 0.5% | 1.1% | 1 | 0 | 0.0 | 1.000 |
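
The χ² and p_Bonf columns can be sanity-checked from the discordant-pair counts b and c alone. The sketch below assumes a continuity-corrected McNemar test with one degree of freedom and a Bonferroni factor of 5 (one per cell); the release does not name the test, so this form is inferred from the tabulated numbers, and the function name `mcnemar_bonferroni` is hypothetical. The reading of b and c (b = detected at baseline but missed under intervention, c = the reverse) is likewise inferred from the sign of Δ.

```python
import math

def mcnemar_bonferroni(b: int, c: int, n_tests: int = 5) -> tuple[float, float]:
    """Continuity-corrected McNemar chi-square and Bonferroni-adjusted p-value.

    Assumed meaning of the counts (not stated in the release):
      b: pairs detected at baseline but missed under intervention
      c: pairs missed at baseline but detected under intervention
    """
    discordant = b + c
    if discordant == 0:
        # No discordant pairs: no evidence of a difference either way.
        return 0.0, 1.0
    chi2 = max(abs(b - c) - 1, 0) ** 2 / discordant
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    # Bonferroni adjustment: multiply by the number of tests, cap at 1.
    return chi2, min(1.0, n_tests * p)

# (b, c) pairs copied from the headline table.
cells = {
    "pubg": (0, 289),
    "nba": (0, 511),
    "csgo": (0, 395),
    "rocket_league": (0, 71),
    "poker": (1, 0),
}

for name, (b, c) in cells.items():
    chi2, p_bonf = mcnemar_bonferroni(b, c)
    print(f"{name:14s} chi2={chi2:6.1f}  p_bonf={p_bonf:.1e}")
```

Under these assumptions the sketch agrees with the χ² and p_Bonf columns to the precision shown. The locked values remain those in RESULTS/phase_d_final.json.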

4-tier representational hierarchy

Tier Cells Anchored Observable Unary-reducible Profile
0 Saturated poker rule pre-internalized; no lift possible
1 Aligned pubg, nba clean intervention lift to perfect detection
2 Partial csgo ✗ (bomb sites) high lift, elevated FP — confabulation signature
3 Misaligned rocket_league ✗ (positions) tiny but real lift; strict grounding suppresses confabulation

Defensible claim

Constraint reasoning in LLMs is gated by representational alignment between the rule and the observable event structure. It succeeds when violations reduce to observable unary predicates over event streams; it degrades predictably under missing observability; it is suppressed entirely under strict grounding when required variables are absent.
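
To make "observable unary predicate over an event stream" concrete, here is a minimal sketch, not the project's pipeline. It assumes events are plain dicts and uses hypothetical field names (`type`, `site`) to encode the CSGO rule "bomb plants only at sites A or B". The predicate looks at one event at a time; when the field it needs is not present in the event, the rule cannot be evaluated at all, which is the missing-observability case.

```python
from typing import Callable, Iterable, Optional

Event = dict  # hypothetical event record: field name -> value

def plant_site_violation(event: Event) -> Optional[bool]:
    """Unary predicate: inspects a single event, no cross-event state.

    Returns True/False when the rule can be checked, None when the
    required variable ("site") is absent from the event.
    """
    if event.get("type") != "bomb_plant":
        return False  # rule does not apply to this event
    if "site" not in event:
        return None   # required variable not observable
    return event["site"] not in {"A", "B"}

def scan(events: Iterable[Event],
         predicate: Callable[[Event], Optional[bool]]) -> str:
    """Classify a chain as 'violation', 'undetermined', or 'clean'."""
    verdicts = [predicate(e) for e in events]
    if any(v is True for v in verdicts):
        return "violation"
    if any(v is None for v in verdicts):
        return "undetermined"
    return "clean"

# Site is surfaced: the violation is directly checkable.
print(scan([{"type": "kill"}, {"type": "bomb_plant", "site": "C"}],
           plant_site_violation))
# Site is not surfaced: the rule cannot be grounded in the chain.
print(scan([{"type": "bomb_plant"}], plant_site_violation))
```

A checker like this has an explicit "undetermined" outcome. The claim above concerns what a model does when it has no such outcome available: it either answers anyway or is held back by strict grounding.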

Mechanistic evidence (Layer-2 CoT)

  • NBA residual FPs (50/119 analyzed): 92% cite the 24-second shot clock — second-order shot-clock confabulation despite D-44's time_in_possession_s rename.
  • CSGO residual FPs (50/358 analyzed): 80% cite "Bomb plants only at sites A or B" — textbook constraint-triggered confabulation when the predicate variable isn't surfaced in the rendered chain.

The model fails exactly where the rule's required variables are absent — strongest evidence in v5 that the failure mode is representational, not reasoning-bound.
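
The percentages in the bullets above amount to a tally over the sampled explanations. A rough sketch of such a tally follows, assuming each sampled false positive is available as a plain explanation string and that a case-insensitive substring match is an acceptable tagging rule. Both are assumptions: the schema of RESULTS/phase_d_cot_residual_fps.json and the tagging method actually used are not described in this release, and the function name `citation_rate` and the sample strings are hypothetical.

```python
from typing import Sequence

def citation_rate(explanations: Sequence[str], phrases: Sequence[str]) -> float:
    """Fraction of explanations that mention at least one of the phrases."""
    if not explanations:
        return 0.0
    needles = [p.lower() for p in phrases]
    hits = sum(
        any(needle in text.lower() for needle in needles)
        for text in explanations
    )
    return hits / len(explanations)

# Synthetic stand-in data, not the archived responses.
sample = (
    ["Possession exceeded the 24-second shot clock."] * 46
    + ["The sequence of events looks inconsistent."] * 4
)
print(f"{citation_rate(sample, ['shot clock', '24-second']):.0%}")
```

Substring matching is a crude tagger; paraphrased citations would be missed, so a rate computed this way is best read as a lower bound.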

Total spend

~$5 across the entire experiment (Phase D + crash recovery + prior diagnostics).

What's next

  • v5.1: cross-model replication via OpenRouter (~11 models, n=300/cell, derived-state-marker ablation as second axis). Scoped separately.
  • v5.2: awpy CSGO + per-event RL extraction. Run after cross-model lands.

Documents

  • STATUS.md — end-state status
  • MEMO.md — internal memo + bridge document for the eventual V1–V5.1 arXiv preprint
  • DECISION_LOG.md — D-0 through D-45, every methodology decision
  • RESULTS/phase_d_final.json — locked headline numbers
  • RESULTS/phase_d_raw_batches/ — 23,998 archived raw API responses (~6 MB)
  • RESULTS/phase_d_cot_residual_fps.json — CoT mechanistic evidence

Known issues

  • 9 pre-existing test failures in Fortnite + CSGO pipeline mocks (#5). These pre-date this release and are unrelated to the v5 results.