# Trainer Consolidation Plan — crates/trios-trainer #321

@gHashTag

Issue: gHashTag/trios#143
Author: Computer agent (R5-honest, ground-truth via gh API)
Anchor: φ² + φ⁻² = 3
Champion baseline (STANDS): 2446855 BPB=2.2393 @ 27K seed=43
Gate-2 deadline: 2026-04-30 23:59 UTC (T-3.5d)


## R5-honest audit of current state

### Crates with overlapping training code (5)

| Crate | Size (src/) | Active path? | Verdict |
|---|---|---|---|
| crates/trios-train-cpu | ~330 KB across 23 .rs files in src/, 23 binaries in src/bin/ | ✅ champion path (hybrid_train.rs) | MIGRATE active subset → trios-trainer; DELETE rest |
| crates/trios-training | ~80 KB, half behind feature=burn-backend | ❌ 0 path-dep references in workspace | DELETE entire crate |
| crates/trios-training-ffi | 3 KB stub for Zig FFI | ❌ no code uses it | DELETE entire crate |
| crates/trios-igla-trainer | ~17 KB (jepa_runner + audit + schedule) | 🟡 referenced by leaderboard.yml + trios-cli + asha.rs | MERGE jepa_runner into trios-trainer; DELETE rest |
| crates/trios-igla-race | ~250 KB, 5 backup files | ✅ source of truth for ASHA / invariants / victory | KEEP: trainer depends on this. Delete main.rs.backup, main_{broken,clean,corrupted,fixed}.rs, victory_new.rs |

### Verified duplicates

  • 3 copies of transformer.rs: trios-train-cpu/src/transformer.rs, trios-training/src/transformer.rs, trios-training/src/trinity_3k_transformer.rs
  • 2 JEPA paths: trios-train-cpu/src/jepa/{ema,loss,masking,predictor,mod}.rs (29 KB) vs trios-igla-trainer/src/jepa_runner.rs (2.5 KB)
  • 2 EMA implementations: trios-train-cpu/src/jepa/ema.rs vs trios-igla-race/src/ema.rs
  • 2 invariants modules: trios-train-cpu/src/invariants.rs (14 KB) vs trios-igla-race/src/invariants.rs (16 KB) — race version is canonical (per L-R14)
  • 23 train binaries in trios-train-cpu/src/bin/ — only hybrid_train.rs is on the champion path
  • Backup pollution in trios-igla-race/src/ (main.rs.backup, main_broken.rs, main_clean.rs, main_corrupted.rs, main_fixed.rs, victory_new.rs) and in trios-train-cpu/src/ (lib.rs.tmp, jepa/predictor.rs.bak2, bin/ngram_train.rs.bak, bin/ngram_train_backup.rs)
  • R1 violations (Python in repo): scripts/igla_race_worker.py 18 KB, scripts/igla_train.py 8 KB, scripts/train_gpt.py 47 KB

## DEAD list (delete in consolidation PR)

### Whole crates

```
crates/trios-training/ — 0 references, archive entire dir
crates/trios-training-ffi/ — Zig stub never wired
```

### Backup / corrupted files

```
crates/trios-igla-race/src/main.rs.backup
crates/trios-igla-race/src/main_broken.rs
crates/trios-igla-race/src/main_clean.rs
crates/trios-igla-race/src/main_corrupted.rs
crates/trios-igla-race/src/main_fixed.rs
crates/trios-igla-race/src/victory_new.rs
crates/trios-train-cpu/src/lib.rs.tmp
crates/trios-train-cpu/src/jepa/predictor.rs.bak2
crates/trios-train-cpu/src/bin/ngram_train.rs.bak
crates/trios-train-cpu/src/bin/ngram_train_backup.rs
```
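
To keep files like these from creeping back in, a CI guard could flag suspicious names. The following is a minimal sketch, assuming the naming patterns seen in the list above are representative; `is_backup_artifact` and both suffix lists are hypothetical and not part of any existing crate.

```rust
use std::path::Path;

/// Returns true when a file name looks like a leftover backup or scratch copy.
/// Both suffix lists are assumptions derived from the names listed in this issue.
pub fn is_backup_artifact(path: &str) -> bool {
    let name = match Path::new(path).file_name().and_then(|n| n.to_str()) {
        Some(n) => n,
        None => return false,
    };
    // Extensions appended after the real one, e.g. `main.rs.backup`.
    const TRAILING: [&str; 4] = [".backup", ".bak", ".bak2", ".tmp"];
    if TRAILING.iter().any(|s| name.ends_with(s)) {
        return true;
    }
    // Stem suffixes on otherwise valid Rust files, e.g. `main_broken.rs`.
    const STEM: [&str; 6] = ["_backup", "_broken", "_clean", "_corrupted", "_fixed", "_new"];
    match name.strip_suffix(".rs") {
        Some(stem) => STEM.iter().any(|s| stem.ends_with(s)),
        None => false,
    }
}
```

The stem rule is deliberately blunt: a legitimate file named something like `parser_new.rs` would also be flagged, so a real guard would probably need an allow-list.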

### Dead binaries in trios-train-cpu/src/bin/

These are kept only because they share lib.rs; once the trainer migrates, they are deleted together with the crate.

```
attn_train.rs cpu_train.rs lstm_train.rs
concat_train.rs igla_train.rs ngram_train.rs
arch_explorer.rs igla_trigram.rs ngram_train_gf16.rs
r12_optimizer_race.rs train_cpu.rs transformer_train.rs
trinity_3k_simple_train.rs trinity_3k_fineweb_train.rs trinity_3k_tinyshakespeare.rs
trinity_pr1722.rs trinity_tournament.rs train_v2.rs
```

### Python scripts (R1 violation)

```
scripts/igla_race_worker.py
scripts/igla_train.py
scripts/train_gpt.py
```


## ALIVE list (migrate into crates/trios-trainer/)

```
crates/trios-train-cpu/src/transformer.rs → src/model.rs (canonical)
crates/trios-train-cpu/src/hybrid_attn.rs → src/model_hybrid_attn.rs
crates/trios-train-cpu/src/optimizer.rs → src/optimizer.rs
crates/trios-train-cpu/src/forward.rs → src/forward.rs
crates/trios-train-cpu/src/backward.rs → src/backward.rs
crates/trios-train-cpu/src/objective.rs → src/objective.rs
crates/trios-train-cpu/src/jepa/ → src/jepa/
crates/trios-train-cpu/src/gf16.rs → DELETE; re-export from trios-golden-float
crates/trios-train-cpu/src/tokenizer.rs → src/data/tokenizer.rs
crates/trios-train-cpu/src/bin/hybrid_train.rs → src/bin/trios-train.rs (rewritten)
crates/trios-train-cpu/src/bin/tjepa_train.rs → MERGE into trios-train.rs as --mode tjepa
crates/trios-igla-trainer/src/jepa_runner.rs → MERGE into src/jepa/runner.rs
crates/trios-igla-race/ → KEEP, depend on it
```
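
The plain moves in this list can be turned into `git mv` commands mechanically, which also serves the history-preservation rule in the risk register. This is a sketch under assumptions: `MOVES` and `git_mv_commands` are hypothetical names, and only the straightforward renames are included. The MERGE, DELETE, and "rewritten" entries need manual work and are left out.

```rust
/// (source path, destination relative to the new crate) pairs from the ALIVE list.
/// Only plain moves are listed; MERGE, DELETE, and rewritten entries are excluded.
pub const MOVES: [(&str, &str); 8] = [
    ("crates/trios-train-cpu/src/transformer.rs", "src/model.rs"),
    ("crates/trios-train-cpu/src/hybrid_attn.rs", "src/model_hybrid_attn.rs"),
    ("crates/trios-train-cpu/src/optimizer.rs", "src/optimizer.rs"),
    ("crates/trios-train-cpu/src/forward.rs", "src/forward.rs"),
    ("crates/trios-train-cpu/src/backward.rs", "src/backward.rs"),
    ("crates/trios-train-cpu/src/objective.rs", "src/objective.rs"),
    ("crates/trios-train-cpu/src/jepa", "src/jepa"),
    ("crates/trios-train-cpu/src/tokenizer.rs", "src/data/tokenizer.rs"),
];

/// Builds one `git mv <src> <target_crate>/<dst>` command line per entry.
pub fn git_mv_commands(target_crate: &str) -> Vec<String> {
    MOVES
        .iter()
        .map(|(src, dst)| format!("git mv {} {}/{}", src, target_crate, dst))
        .collect()
}
```

Calling `git_mv_commands("crates/trios-trainer")` yields eight command lines to review before running. Note that `git mv` does not create missing parent directories, so a destination such as `src/data/` has to exist first.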


## Target layout

```
crates/trios-trainer/
├── Cargo.toml ← single bin "trios-train"
├── README.md ← run-on-any-machine + Railway recipe
├── Dockerfile ← multi-stage rust:1.75-slim → debian:bookworm-slim
├── railway.json ← Railway service config
├── .dockerignore
├── configs/
│   ├── champion.toml ← reproduce 2446855 (BPB=2.2393)
│   ├── gate2-attempt.toml ← HybridAttn + JEPA push
│   └── needle-v1-mup.toml ← L-V1 muP-transfer variant
├── src/
│   ├── lib.rs ← façade exports
│   ├── config.rs ← TOML schema + env override + validate(INV-8)
│   ├── train_loop.rs ← step loop, eval, ledger emit
│   ├── ledger.rs ← triplet-validated emit + embargo block
│   ├── checkpoint.rs ← save/load
│   ├── model.rs / hybrid_attn.rs
│   ├── optimizer.rs (AdamW + Muon + φ-schedule)
│   ├── jepa.rs (mod) → jepa/{ema, loss, masking, predictor}.rs
│   ├── objective.rs ← combined loss
│   ├── data.rs (tokenizer + loaders)
│   ├── gf16.rs ← re-export from trios-golden-float
│   └── bin/trios-train.rs ← clap → load config → run()
└── tests/
    ├── reproduce_champion.rs (smoke + ignored full)
    └── invariants.rs (mirror INV-1..INV-10 from trios-igla-race)
```
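
The "env override" part of config.rs could look roughly like the sketch below. It assumes that a `TRIOS_SEED` environment variable, when set, wins over the seed from the TOML file. That precedence, and the names `resolve_seed` and `seed_from_env`, are illustrative assumptions rather than the crate's actual API. The INV-8 validation is not shown because its definition lives in trios-igla-race.

```rust
/// Resolves the effective seed: a `TRIOS_SEED` value, when present, overrides the TOML value.
/// The env value is passed in explicitly so the logic is testable without touching the process env.
pub fn resolve_seed(config_seed: u64, env_value: Option<&str>) -> Result<u64, String> {
    match env_value {
        // No override set: keep whatever the config file said.
        None => Ok(config_seed),
        // Override set: it must parse, otherwise fail loudly instead of silently falling back.
        Some(raw) => raw
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("TRIOS_SEED={:?} is not a valid u64: {}", raw, e)),
    }
}

/// Convenience wrapper that reads the real environment.
pub fn seed_from_env(config_seed: u64) -> Result<u64, String> {
    let value = std::env::var("TRIOS_SEED").ok();
    resolve_seed(config_seed, value.as_deref())
}
```

Rejecting a malformed override, rather than falling back to the config value, avoids a run that is silently recorded under the wrong seed.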


## Run patterns unlocked

### Any machine (clone + cargo)

```bash
git clone https://github.com/gHashTag/trios.git
cd trios
cargo run --release -p trios-trainer --bin trios-train -- \
  --config crates/trios-trainer/configs/champion.toml --seed 43
```
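
On the binary side, the two flags used above have to be parsed before the config is loaded. The target layout names clap for this; the sketch below swaps in a hand-rolled std-only parser purely so the example stays self-contained. `CliArgs` and `parse_args` are hypothetical names.

```rust
/// Parsed command-line arguments for the `--config <path> --seed <n>` invocation.
#[derive(Debug, PartialEq)]
pub struct CliArgs {
    pub config: String,
    pub seed: Option<u64>,
}

/// Parses arguments (without the program name). A std-only stand-in for clap.
pub fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<CliArgs, String> {
    let mut config = None;
    let mut seed = None;
    let mut iter = args.into_iter();
    while let Some(flag) = iter.next() {
        match flag.as_str() {
            "--config" => config = Some(iter.next().ok_or("--config needs a path")?),
            "--seed" => {
                let raw = iter.next().ok_or("--seed needs a value")?;
                let parsed = raw
                    .parse::<u64>()
                    .map_err(|e| format!("bad --seed {:?}: {}", raw, e))?;
                seed = Some(parsed);
            }
            other => return Err(format!("unknown argument: {}", other)),
        }
    }
    // The config path is mandatory; the seed may still come from elsewhere.
    Ok(CliArgs { config: config.ok_or("--config is required")?, seed })
}
```

In a real binary the input would be `std::env::args().skip(1)`.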

### Railway (3 parallel seeds for Gate-2)

```bash
railway login
railway link gHashTag/trios
for s in 43 44 45; do
  railway service create "trios-trainer-seed-$s"
  railway variables set "TRIOS_SEED=$s" --service "trios-trainer-seed-$s"
  railway up --service "trios-trainer-seed-$s"
done
```

### Docker on any VPS

```bash
docker run --rm -e TRIOS_SEED=44 -e TRIOS_LEDGER_PUSH=1 \
  -v "$PWD/assertions:/work/assertions" \
  ghcr.io/ghashtag/trios-trainer:latest
```


## Phased PR plan (R10 atomicity)

### PR-1 (skeleton, this artifact): empty crates/trios-trainer/ crate

  • Add to workspace members, compile cleanly, no migrated code yet
  • Adds Dockerfile, railway.json, configs/*.toml, README, ledger.rs (live), config.rs (live), train_loop.rs (skeleton)
  • Acceptance: cargo build -p trios-trainer is green; cargo test -p trios-trainer passes its 1 test

### PR-2: migrate model + optimizer + data

  • Move transformer.rs, hybrid_attn.rs, optimizer.rs, tokenizer.rs, forward.rs, backward.rs
  • Update trios-train-cpu/src/bin/hybrid_train.rs to depend on trios-trainer (transitional)
  • Acceptance: champion config dry-runs end-to-end; full run reproduces BPB = 2.2393 ± 0.01
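
The reproduction criterion can be pinned down as a small predicate for tests/reproduce_champion.rs. This is a sketch assuming the ± 0.01 band is applied symmetrically to the final BPB; `reproduces_champion` and the constant names are hypothetical.

```rust
/// Champion baseline BPB and the accepted deviation, both taken from this plan.
pub const CHAMPION_BPB: f64 = 2.2393;
pub const TOLERANCE: f64 = 0.01;

/// True when a measured BPB falls inside the reproduction band around the champion.
/// NaN and infinities are rejected explicitly so a diverged run never passes.
pub fn reproduces_champion(measured_bpb: f64) -> bool {
    measured_bpb.is_finite() && (measured_bpb - CHAMPION_BPB).abs() <= TOLERANCE
}
```

Because the band is two-sided, a run that lands well below 2.2393 also fails: that would be a different result, not a reproduction.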

### PR-3: migrate JEPA + objective

  • Move jepa/* and objective.rs; merge trios-igla-trainer::jepa_runner into src/jepa/runner.rs
  • Acceptance: gate2-attempt.toml runs full training on a single machine

### PR-4: DELETE phase (the housekeeping)

  • Remove crates/trios-training/, crates/trios-training-ffi/, all backup files, all 22 dead bins
  • Remove scripts/igla_*.py + scripts/train_gpt.py (R1)
  • Update CI .github/workflows/leaderboard.yml to call trios-trainer instead of trios-igla-trainer
  • Acceptance: workspace builds, test suite passes, no // TODO migrate comments remain

### PR-5: Railway publish

  • Push image to ghcr.io/ghashtag/trios-trainer
  • Wire Railway service gHashTag/trios → trios-trainer-seed-{43,44,45}
  • Acceptance: 3 parallel seeds emit rows to assertions/seed_results.jsonl from cloud
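
Checking the PR-5 acceptance criterion amounts to confirming that every expected seed shows up in the results file. The sketch below assumes each JSONL row carries an integer field named `seed`; that field name is a guess, since the row schema is not given here. It uses a deliberately naive text scan rather than a JSON parser.

```rust
/// Extracts the integer after a `"seed"` key in one JSONL row, if any.
/// This is a naive scan, not a JSON parser; the `seed` field name is an assumption.
fn seed_of(line: &str) -> Option<u64> {
    let key = "\"seed\"";
    let rest = &line[line.find(key)? + key.len()..];
    let rest = rest.trim_start().strip_prefix(':')?.trim_start();
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// Returns the expected seeds that have no row in the given JSONL text.
pub fn missing_seeds(jsonl: &str, expected: &[u64]) -> Vec<u64> {
    let seen: Vec<u64> = jsonl.lines().filter_map(seed_of).collect();
    expected.iter().copied().filter(|s| !seen.contains(s)).collect()
}
```

An empty result from `missing_seeds(contents, &[43, 44, 45])` would mean all three cloud runs have reported.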

## Risk register

| Risk | Mitigation |
|---|---|
| Champion path breaks during migration | PR-1 is an empty crate; PR-2 keeps trios-train-cpu alive in parallel until the reproduction test goes green |
| Lost git history on moved files | Use git mv for every migrated file; never copy-then-delete |
| INV-8 / INV-2 drift | Trainer imports from trios-igla-race only; no private invariants module |
| Embargo bypass | ledger.rs::is_embargoed runs before every emit; CI test asserts an embargoed SHA refuses |
| Railway image size | Multi-stage build → final image ≈ 250 MB (rust binary + git + libssl + ca-certs) |
| Cost on Railway | One trainer service per seed; auto-pause on idle; restartPolicyMaxRetries=10 |
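
The embargo mitigation can be illustrated with a small in-memory ledger. This is a sketch only: the plan names `ledger.rs::is_embargoed`, but the `Ledger` struct, the `emit` signature, and the row fields below are assumptions made for illustration. The triplet validation is omitted.

```rust
use std::collections::HashSet;

/// In-memory stand-in for the ledger: a set of embargoed SHAs plus the emitted rows.
pub struct Ledger {
    embargoed: HashSet<String>,
    rows: Vec<String>,
}

impl Ledger {
    pub fn new<I: IntoIterator<Item = String>>(embargoed: I) -> Self {
        Ledger { embargoed: embargoed.into_iter().collect(), rows: Vec::new() }
    }

    /// True when results for this commit SHA must not be published.
    pub fn is_embargoed(&self, sha: &str) -> bool {
        self.embargoed.contains(sha)
    }

    /// Appends one row unless the SHA is embargoed.
    /// The check runs before anything is written, so a refused row is never stored.
    pub fn emit(&mut self, sha: &str, seed: u64, bpb: f64) -> Result<(), String> {
        if self.is_embargoed(sha) {
            return Err(format!("refusing to emit: {} is embargoed", sha));
        }
        // Hypothetical row shape; the real schema is not specified in this plan.
        self.rows.push(format!("{{\"sha\":\"{}\",\"seed\":{},\"bpb\":{}}}", sha, seed, bpb));
        Ok(())
    }

    pub fn rows(&self) -> &[String] {
        &self.rows
    }
}
```

The CI test mentioned in the register would then reduce to asserting that `emit` returns an error for an embargoed SHA and leaves `rows()` unchanged.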

Anchor: φ² + φ⁻² = 3.
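
For reference, the anchor identity follows in a few lines, assuming φ denotes the golden ratio (the plan does not define the symbol):

```latex
\begin{aligned}
\varphi &= \tfrac{1+\sqrt{5}}{2}, \qquad \varphi^{2} = \varphi + 1 \\
\varphi^{-1} &= \varphi - 1 \\
\varphi^{-2} &= (\varphi - 1)^{2} = \varphi^{2} - 2\varphi + 1 = 2 - \varphi \\
\varphi^{2} + \varphi^{-2} &= (\varphi + 1) + (2 - \varphi) = 3
\end{aligned}
```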
