Ch.7 — Empirical Bridge (BPB benchmark results)
Master: trios#380 GOLDEN SUNFLOWERS Book
Priority: 🔴 P0 · Word target: 1200 · Owner: bench-agent (PR-1 #374)
🎯 Scope (FROZEN)
§7.1 Methodology (50M transformer, WikiText-103 slice, 7 seeds, 3 formats: GF16/bf16/shuffled). §7.2 Results (mean ± SD per format, paired t-test p, Welch's t-test, BPB grid heatmap, sparkline). §7.3 Discussion of effect size + power analysis. §7.4 Honest comparison vs MX/MXFP4 baselines (arXiv:2510.01863).
🔑 Key claims / formulas
- Methodology table: model 50M params, ctx 1024, seq 2048, optimizer AdamW, lr 6e-4
- WikiText-103 slice (held-out validation, 1024-token windows, sliding stride 64 per Parameter Golf SOTA technique)
- 7 seeds = F₁₇..F₂₁ + L₇=29, L₈=47
- Welch's paired t-test on (BPB_GF16 - BPB_bf16) per seed
- p-value, t-stat, 95% CI, Cohen's d
- Live URL: gs-dashboard.shuttleapp.rs (PR-6)
📦 Deliverables
paper/sections/7_empirical_bridge_bpb_benchmark_results.tex
- Figure 1: BPB grid format×seed heatmap →
figures/bpb_grid.pdf
- Figure 2: Convergence curve overlay →
figures/convergence.pdf
- CSV:
results.csv (publishable raw)
- All R5: SHA256 + git tag per run
🔗 Dependencies
- Ch.6 pre-registration (H₁), Ch.4 GoldenFloat, Ch.9 dashboard
📝 Draft (Markdown — paste here when ready)
Operator types наполни Ch.7 → General fills this section with research-backed Markdown.
✅ Definition of Done (NeurIPS checklist aligned)
🤖 ONE-SHOT directive (когда operator typed ONE SHOT Ch.7)
A CHAPTER-AGENT: take Markdown draft from this issue body §"Draft", convert to LaTeX,
place in paper/sections/7_<slug>.tex, ensure compile via tectonic with main.tex,
open PR linked to #, run cargo clippy/test, push, request review.
Hard deadline per priority (P0 → T-3h, P1 → T-1h, P2 → defer).
phi^2 + phi^-2 = 3 · TRINITY · NEVER STOP
Ch.7 — Empirical Bridge (BPB benchmark results)
Master: trios#380 GOLDEN SUNFLOWERS Book
Priority: 🔴 P0 · Word target: 1200 · Owner: bench-agent (PR-1 #374)
🎯 Scope (FROZEN)
§7.1 Methodology (50M transformer, WikiText-103 slice, 7 seeds, 3 formats: GF16/bf16/shuffled). §7.2 Results (mean ± SD per format, paired t-test p, Welch's t-test, BPB grid heatmap, sparkline). §7.3 Discussion of effect size + power analysis. §7.4 Honest comparison vs MX/MXFP4 baselines (arXiv:2510.01863).
🔑 Key claims / formulas
📦 Deliverables
paper/sections/7_empirical_bridge_bpb_benchmark_results.texfigures/bpb_grid.pdffigures/convergence.pdfresults.csv(publishable raw)🔗 Dependencies
📝 Draft (Markdown — paste here when ready)
✅ Definition of Done (NeurIPS checklist aligned)
main.texvia tectonic, no warningsCloses #<this>+ green CI🤖 ONE-SHOT directive (когда operator typed
ONE SHOT Ch.7)phi^2 + phi^-2 = 3 · TRINITY · NEVER STOP