Skip to content

📖 Ch.7 — Empirical Bridge (BPB benchmark results) (1200w · 🔴 P0) #388

@gHashTag

Description

@gHashTag

Ch.7 — Empirical Bridge (BPB benchmark results)

Master: trios#380 GOLDEN SUNFLOWERS Book
Priority: 🔴 P0 · Word target: 1200 · Owner: bench-agent (PR-1 #374)

🎯 Scope (FROZEN)

§7.1 Methodology (50M transformer, WikiText-103 slice, 7 seeds, 3 formats: GF16/bf16/shuffled). §7.2 Results (mean ± SD per format, paired t-test p, Welch's t-test, BPB grid heatmap, sparkline). §7.3 Discussion of effect size + power analysis. §7.4 Honest comparison vs MX/MXFP4 baselines (arXiv:2510.01863).

🔑 Key claims / formulas

  • Methodology table: model 50M params, ctx 1024, seq 2048, optimizer AdamW, lr 6e-4
  • WikiText-103 slice (held-out validation, 1024-token windows, sliding stride 64 per Parameter Golf SOTA technique)
  • 7 seeds = F₁₇..F₂₁ + L₇=29, L₈=47
  • Welch's paired t-test on (BPB_GF16 - BPB_bf16) per seed
  • p-value, t-stat, 95% CI, Cohen's d
  • Live URL: gs-dashboard.shuttleapp.rs (PR-6)

📦 Deliverables

  • paper/sections/7_empirical_bridge_bpb_benchmark_results.tex
  • Figure 1: BPB grid format×seed heatmap → figures/bpb_grid.pdf
  • Figure 2: Convergence curve overlay → figures/convergence.pdf
  • CSV: results.csv (publishable raw)
  • All R5: SHA256 + git tag per run

🔗 Dependencies

  • Ch.6 pre-registration (H₁), Ch.4 GoldenFloat, Ch.9 dashboard

📝 Draft (Markdown — paste here when ready)

Operator types наполни Ch.7 → General fills this section with research-backed Markdown.

✅ Definition of Done (NeurIPS checklist aligned)

  • Word count 1200 ±10%
  • All claims cited (primary literature, not just our repos)
  • No AI as author (Acknowledgments only)
  • Compiles in main.tex via tectonic, no warnings
  • Limitations stated (where applicable)
  • Reproducibility info present (where applicable)
  • PR with Closes #<this> + green CI
  • L1 (no .sh) · L3 (clippy zero) · L4 (tests pass) · L8 (PUSH FIRST)

🤖 ONE-SHOT directive (когда operator typed ONE SHOT Ch.7)

A CHAPTER-AGENT: take Markdown draft from this issue body §"Draft", convert to LaTeX,
place in paper/sections/7_<slug>.tex, ensure compile via tectonic with main.tex,
open PR linked to #, run cargo clippy/test, push, request review.
Hard deadline per priority (P0 → T-3h, P1 → T-1h, P2 → defer).

phi^2 + phi^-2 = 3 · TRINITY · NEVER STOP

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions