
🎯 IGLA RACE -- Distributed Hunt: JEPA-T + NCA + GF16 + ASHA + Coq Invariants (Rust-only, Never-Stopping) #143

@gHashTag

Description


🎯 IGLA RACE v2 -- ONE SHOT DASHBOARD

Updated: 2026-04-27T14:30+07 | phi^2 + phi^-2 = 3 | TRINITY | NEVER CLOSE
Deadline: Apr 30, 2026 | Target: BPB < 1.50 on 3 seeds | Gap: 0.68 BPB
Best: BPB = 2.18 (trios-train h=828 attn=2L, 81K steps, seed=43)
Rule #1: RUST ONLY -- zero .py, zero .sh.
Rule #2: Race NEVER stops until BPB < 1.50 is found across 3 seeds.
Rule #3: ALL agents work in branch main ONLY.


COQ INVARIANT SYSTEM -- TASK-COQ-001 (trinity-clara -> IGLA)

Document: trinity-clara/docs/TASK-COQ-001.md
Source: trinity-clara + 84 Coq theorems from t27
Principle: phi^2 + phi^-2 = 3 -- single algebraic anchor. All invariants derive from it.

Invariants INV-1..INV-10

| ID | Coq theorem | Status | Effect | Trinity link |
|---|---|---|---|---|
| INV-1 | bpb_decreases_with_real_gradient | partial | fixes TASK-5D | 7-step alpha_phi derivation |
| INV-2 | asha_champion_survives | PROVEN (0 Admitted) | 0 false prunes | threshold = 3.5 = phi^2+phi^-2+0.5 |
| INV-3 | gf16_safe_domain | Lucas proven | -40% configs | Lucas closure phi^2n + psi^2n in Z |
| INV-4 | nca_entropy_stability | PROVEN (0 Admitted) | -30% configs | A5/E8 band width = 1 (integer!) |
| INV-5 | lucas_closure_gf16 | n=1,2 proven | GF16 consistency | phi^2n + phi^-2n in Z for all n |
| INV-6 | ema_decay_valid | TODO | -20% configs | cos schedule in [0.996, 1.0] |
| INV-7 | igla_found_criterion | TODO | L-R14 gate | victory iff 3-seed BPB < 1.50 |
| INV-8 | lr_phi_band | PROVEN (0 Admitted) | -60% configs | lr = 0.004 = alpha_phi/phi^3 |
| INV-9 | qk_gain_phi_sq | IMPLEMENTED in model_hybrid_attn.rs | -10% configs | qk_gain = PHI_SQ = 2.618 |
| INV-10 | asha_rungs_trinity | TODO | correctness | rungs = 1000*3^k, base = Trinity |

phi-anchored Parameters

| Parameter | Old value | phi-derived | Theorem |
|---|---|---|---|
| bpb_prune_threshold | 2.65 (BUG!) | 3.5 = phi^2+phi^-2+0.5 | INV-2 ✅ |
| NCA grid | 9x9 = 81 | 81 = 3^4 | INV-4 ✅ |
| NCA K states | 9 | 9 = 3^2 | INV-4 ✅ |
| lr champion | 0.004 | alpha_phi/phi^3 | INV-8 ✅ |
| d_model | 384 | ~3^4 x phi^3 | INV-3 |
| qk_gain | 1.0 | PHI_SQ = 2.618 | INV-9 ✅ |
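A minimal sketch (not from the repo) of how the phi-derived constants in the table above fall out of the phi^2 + phi^-2 = 3 anchor; the constant names are illustrative:

```rust
// Derive the phi-anchored constants and check them against the single
// algebraic anchor phi^2 + phi^-2 = 3 (illustrative sketch, not repo code).
fn main() {
    let phi: f64 = (1.0 + 5.0_f64.sqrt()) / 2.0; // golden ratio
    let anchor = phi.powi(2) + phi.powi(-2);     // phi^2 + phi^-2
    assert!((anchor - 3.0).abs() < 1e-9);        // = 3 exactly, algebraically

    // INV-2: ASHA prune threshold = phi^2 + phi^-2 + 0.5 = 3.5
    let bpb_prune_threshold = anchor + 0.5;
    assert!((bpb_prune_threshold - 3.5).abs() < 1e-9);

    // INV-9: qk_gain = PHI_SQ = phi^2 ~ 2.618
    let qk_gain = phi.powi(2);
    assert!((qk_gain - 2.618).abs() < 1e-3);

    println!("anchor={anchor:.6} threshold={bpb_prune_threshold} qk_gain={qk_gain:.3}");
}
```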

DONE

Infrastructure

  • Neon DB schema -- igla_race_trials + igla_race_experience + igla_leaderboard view
  • crates/trios-igla-race/ -- coordinator crate
  • src/asha.rs -- ASHA rungs 1k->3k->9k->27k, prune logic
  • src/neon.rs -- tokio-postgres, register/checkpoint/lesson
  • src/status.rs -- leaderboard print
  • src/lessons.rs -- failure memory, lesson generation
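A hypothetical sketch of the rung/prune logic described above (the actual src/asha.rs may differ): rungs follow 1000*3^k per INV-10, and a trial is cut only above the INV-2 threshold, so a low-BPB champion can never be falsely pruned:

```rust
// ASHA rungs 1k -> 3k -> 9k -> 27k with the INV-2 prune bound (sketch only).
const RUNGS: [u32; 4] = [1_000, 3_000, 9_000, 27_000]; // 1000 * 3^k
const BPB_PRUNE_THRESHOLD: f64 = 3.5; // = phi^2 + phi^-2 + 0.5 (INV-2)

/// A trial survives the rung it just completed iff its BPB is under the bound.
fn survives_rung(bpb_at_rung: f64) -> bool {
    bpb_at_rung < BPB_PRUNE_THRESHOLD
}

fn main() {
    // The 2.18 champion survives every rung; a diverged trial at 4.2 is cut.
    for &steps in &RUNGS {
        assert!(survives_rung(2.18), "champion pruned at rung {steps}");
    }
    assert!(!survives_rung(4.2));
    println!("rungs={RUNGS:?} threshold={BPB_PRUNE_THRESHOLD}");
}
```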

Model Zoo (trios-train-cpu)

  • src/trinity_3k_model.rs -- Trinity 3k architecture
  • src/real_igla_model.rs -- Real IGLA model
  • src/pipeline.rs -- Training pipeline
  • src/optimizer.rs -- AdamW + Muon (base)
  • src/gf16.rs -- GF16 precision (inference)
  • src/forward.rs / backward.rs -- fwd/bwd pass

JEPA-T (jepa/ module)

  • jepa/ema.rs -- EMA target encoder wired
  • jepa/masking.rs -- mask_ratio=0.30, spans wired
  • jepa/loss.rs -- MSE gradient function
  • bin/tjepa_train.rs -- 309 lines, compiles, 90 tests pass (commit 132fac3)
  • TASK-5C predictor.rs -- real cross-attention predictor (commit 0fddd7f)
  • TASK-5E two-phase warm-up in main
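An assumed sketch of the EMA target-encoder update (the real jepa/ema.rs may differ), with a cosine decay schedule held inside the [0.996, 1.0] band that INV-6 requires:

```rust
// EMA target-encoder update with a cosine decay schedule (sketch, INV-6 band).
use std::f64::consts::PI;

/// Cosine schedule: decay ramps from tau0 = 0.996 toward 1.0 over training.
fn ema_decay(step: u32, total_steps: u32) -> f64 {
    let t = step as f64 / total_steps as f64;
    let tau0 = 0.996;
    1.0 - (1.0 - tau0) * 0.5 * (1.0 + (PI * t).cos())
}

/// target <- decay * target + (1 - decay) * online, element-wise.
fn ema_update(target: &mut [f64], online: &[f64], decay: f64) {
    for (t, o) in target.iter_mut().zip(online) {
        *t = decay * *t + (1.0 - decay) * *o;
    }
}

fn main() {
    // The schedule stays inside the proven band at both endpoints.
    assert!((ema_decay(0, 1000) - 0.996).abs() < 1e-12);
    assert!((ema_decay(1000, 1000) - 1.0).abs() < 1e-12);

    let mut target = vec![0.0; 4];
    ema_update(&mut target, &[1.0; 4], 0.996);
    assert!((target[0] - 0.004).abs() < 1e-12);
}
```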

CI / Tests

  • 411 tests pass -- GREEN
  • CI -- GREEN
  • cargo clippy -D warnings = 0

Hyperparameter Search

  • Champion: lr=0.004, seed=43, d_model=384, context=6 -> BPB 2.5329
  • SEED-EXPLORER -- variance=0.0077 < 0.02 = STABLE OPTIMUM
  • LR micro-search -- lr=0.004 is clear local minimum

GF16 Benchmarks

  • BENCH-004b: GF16 MNIST = 97.67% (0.00% gap vs f32)
  • BENCH-002: GF16 add = 7.2 ns/op (15% faster)
  • BENCH-005: FPGA GF16 add = 118 LUT
  • BENCH-006: FPGA MAC-16 = 71 LUT + 16 DSP

Coq Invariants (trinity-clara)

  • docs/TASK-COQ-001.md -- NASA-P10 full specification [CREATED 2026-04-25]
  • proofs/igla/igla_asha_bound.v -- INV-2 PROVEN (0 Admitted)
  • proofs/igla/gf16_precision.v -- INV-3, INV-5 Lucas proven
  • proofs/igla/nca_entropy_band.v -- INV-4 PROVEN, band_width=1 (0 Admitted)
  • proofs/igla/lr_convergence.v -- INV-8 PROVEN (0 Admitted)

Agent ALPHA -- trios-trainer-igla (2026-04-27)

  • TASK-5D: Real backward pass in tjepa_train.rs -- gradients are real (no fake 0.01)
  • TASK-5 full run: seed=43, steps=3000, d_model=384, lr=0.004 -> BPB=2.67 (Gate-1 FAILED)
  • T1-02: 2-layer HybridAttn with RoPE + ReLU^2 -> BPB=2.18 @ 81K steps (-0.35 from baseline)
  • T2-04: QK-Gain phi^2 implemented in model_hybrid_attn.rs (qk_gain=PHI_SQ)
  • T2-07: ReLU^2 activation in train_loop.rs
  • TASK-NCA: grid=81, K=9, entropy [1.5, 2.8], w=0.25 -- IMPLEMENTED in objective.rs
  • TASK-MUON: NS-5 in optimizer.rs -- IMPLEMENTED but FALSIFIED (worse than AdamW)
  • FIX ASHA PRUNING: threshold=3.5 in src/invariants.rs (INV-2 enforced)
  • TASK-1: tri CLI -- tri train/deploy/race subcommands
  • TASK-2: BPB printer -- trios-train prints BPB=X.XXXX at eval
  • TASK-3: tri race start -- ASHA worker via race::asha::run_worker()
  • Dockerfile: Pure Rust, no .sh, no curl, dynamic data download via ureq
  • railway.json: Dockerfile builder, NEVER restart
  • Env vars: TRIOS_SEED/STEPS/HIDDEN/LR/ATTN_LAYERS/OPTIMIZER -- all via clap env
  • Seeds: [43, 44, 45] -- GATE_FINAL_SEEDS updated per TRAINING_FLOW_V2
  • Data: auto-downloads tiny_shakespeare via ureq if file missing
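A std-only sketch of the TRIOS_* environment configuration (the actual binaries read these via clap's env support; defaults here mirror the current champion and are illustrative):

```rust
// Read TRIOS_* env vars with typed defaults (sketch; real code uses clap env).
use std::env;
use std::str::FromStr;

/// Parse an env var, falling back to `default` if unset or unparsable.
fn env_or<T: FromStr>(key: &str, default: T) -> T {
    env::var(key).ok().and_then(|v| v.parse().ok()).unwrap_or(default)
}

fn main() {
    let seed: u64 = env_or("TRIOS_SEED", 43);
    let steps: u32 = env_or("TRIOS_STEPS", 81_000);
    let hidden: usize = env_or("TRIOS_HIDDEN", 828);
    let lr: f64 = env_or("TRIOS_LR", 0.004);
    let attn_layers: u32 = env_or("TRIOS_ATTN_LAYERS", 2);
    println!("seed={seed} steps={steps} hidden={hidden} lr={lr} attn={attn_layers}");
}
```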

TODO

P0 -- BLOCKERS

  • ATTENTION BACKWARD -- model_hybrid_attn.rs has only forward(). Weights wq/wk/wv/wo don't receive gradients. Expected impact: -0.20 BPB
  • RAILWAY DEPLOY -- tri deploy all works but account hit 25 services/day limit. Need reset or raise.
  • SCALE UP -- hidden=828 plateaus at 2.15. Need h=2000+ or more layers.

P1 -- EXPERIMENTS NEEDED

  • JEPA gradients don't flow -- predictor.forward_backward() returns loss but doesn't update model params. jepa_loss=0.003 is too small to matter.
  • NCA gradients don't flow -- nca_entropy_loss() returns scalar but doesn't backprop into model. Need to scale proj gradients by NCA loss.
  • Muon FALSIFIED -- NS-1 on proj(828x64): BPB=2.59 vs AdamW=2.48. NS-5 too slow on CPU. Decision: stick with AdamW.
  • Larger model -- h=2000, 3 layers, seq=256. Need Railway for compute.
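One hypothetical way to connect the NCA penalty to the model, as the second bullet suggests: scale the projection-layer gradients by the auxiliary loss instead of returning a disconnected scalar. The scaling rule and names below are assumptions, not merged code:

```rust
// Hypothetical fix: make nca_entropy_loss() influence training by scaling
// projection gradients with (1 + w * nca_loss). Sketch only, not repo code.
const NCA_WEIGHT: f64 = 0.25; // w from TASK-NCA

/// Amplify raw projection-layer gradients when the entropy penalty is high,
/// so a distribution outside the [1.5, 2.8] band pushes harder on the weights.
fn scale_proj_grads(grads: &mut [f64], nca_loss: f64) {
    let scale = 1.0 + NCA_WEIGHT * nca_loss;
    for g in grads.iter_mut() {
        *g *= scale;
    }
}

fn main() {
    let mut grads = vec![0.1, -0.2];
    scale_proj_grads(&mut grads, 0.8); // scale = 1 + 0.25 * 0.8 = 1.2
    assert!((grads[0] - 0.12).abs() < 1e-12);
    assert!((grads[1] + 0.24).abs() < 1e-12);
}
```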

P2 -- Apr 28

  • TASK-GF16-TRAIN (BENCH-012): GF16 gradient training
  • Hybrid ASHA sweep: Config A/B/C
  • TASK-8: Launch on Railway (4 machines, 320 trials/hour)
  • COQ-INV-006/007: EMA proof, victory_condition.v

FINAL -- Apr 29-30

  • 3-seed verification: BPB < 1.50 on seeds 43, 44, 45 (p < 0.01)
  • cargo test --workspace = GREEN
  • Neon: status='winner'
  • INV-001..010 ALL GREEN: coqc *.v passes
  • Victory commit: git commit -m "IGLA FOUND: BPB=X.XXXX seed=43,44,45"

BPB ROADMAP -- UPDATED WITH REAL RESULTS

| Step | Technique | Expected Δ | Actual Δ | Target | Status |
|---|---|---|---|---|---|
| Baseline | 6-gram h=384 seed=43 | -- | -- | 2.5329 | ✅ DONE |
| T1-01 | JEPA-T real backward | -0.30 | ~0 | <=2.23 | ❌ jepa_loss=0.003 |
| T1-02 | Attention + ReLU^2 (2L) | -0.30 | -0.35 | <=2.00 | ✅ BPB=2.18 |
| T2-01 | Muon optimizer | -0.15 | +0.11 (worse!) | <=1.85 | ❌ FALSIFIED |
| T2-02 | NCA auxiliary (INV-4) | -0.15 | ~0 | <=1.70 | ❌ no grad flow |
| T2-04 | QK-Gain phi^2 (INV-9) | -0.10 | in model | <=1.60 | ✅ implemented |
| T2-07 | ReLU^2 activation | -0.08 | in model | <=1.52 | ✅ implemented |
| T2-07b | GF16 d_model=384 | -0.05 | not tested | <=1.47 | TODO |
| Missing | Attention backward | ? | ? | -- | KEY LEVER |
| Missing | Scale up (h=2000+) | ? | ? | -- | KEY LEVER |

Actual Results Table

| Config | Seed | Steps | Best BPB | Time | Notes |
|---|---|---|---|---|---|
| tjepa h=384 lr=0.004 | 43 | 3K | 2.67 | 3 min | TASK-5 full run |
| trios-train h=828 attn=2L | 42 | 81K | 2.19 | 2.9 h | V3 corrected grads |
| trios-train h=828 attn=2L | 43 | 81K | 2.18 | 2.9 h | BEST |
| trios-train h=828 attn=2L | 44 | 81K | 2.18 | 2.9 h | V3 corrected grads |
| P1 AdamW control h=828 | 43 | 12K | 2.48 | 24 min | P1 baseline |
| P1 Muon NS-1 h=828 | 43 | 12K | 2.59 | 13 min | FALSIFIED |
| Trinity3K h=27 l=2 | 42 | 10K | 2.81 | -- | Worse than n-gram |
| Trinity3K h=27 l=2 | 44 | 10K | 2.70 | -- | Best Trinity3K |

CURRENT STATUS (2026-04-27T14:30+07)

| Metric | Value | Status |
|---|---|---|
| Best BPB | 2.18 (h=828 attn=2L seed=43 81K steps) | ACTIVE |
| Tests | 411 pass | GREEN |
| CI | success | GREEN |
| ASHA pruning | threshold=3.5 | ✅ FIXED |
| QK-Gain | phi^2=2.618 in model_hybrid_attn.rs | ✅ |
| NCA | grid=81 K=9 in objective.rs | ✅ code, ❌ no grad flow |
| Muon | NS-5 in optimizer.rs | ❌ FALSIFIED vs AdamW |
| Attention backward | forward() only, no backward() | ❌ BLOCKER |
| Railway | account limit 25/day | ❌ BLOCKER |
| Seeds | [43, 44, 45] | ✅ updated |
| Dockerfile | pure Rust, dynamic data | ✅ |
| Gap to target | 2.18 - 1.50 = 0.68 BPB | NEED LEVERS |

KEY FINDINGS (Agent ALPHA experience)

  1. T1-02 (2-layer HybridAttn + ReLU^2) -- the only lever that worked: -0.35 BPB. ForwardCache struct caches intermediate activations for correct backprop.
  2. Muon FALSIFIED: NS-1 on 828x64 matrix gives +0.11 vs AdamW. NS-5 too slow (21M ops/step on CPU). Decision: AdamW wins.
  3. JEPA doesn't help: predictor.forward_backward() returns loss but doesn't update model params. The predictor is a separate network -- its gradients don't flow to the encoder.
  4. NCA doesn't help: nca_entropy_loss() returns scalar loss but has no gradient connection to model weights. Need to add gradient scaling.
  5. Attention has no backward: model_hybrid_attn.rs implements only forward(). The 64x64 attention weights are initialized randomly but never trained. This is the single biggest missing piece.
  6. Architecture ceiling: 338K params (vocab=128, h=828, d=64 attn) plateaus at ~2.15. Need to scale up.
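The cached-activation pattern from finding 1 can be illustrated with the ReLU^2 nonlinearity from T2-07 (a self-contained sketch; the repo's ForwardCache holds more than this): forward saves the pre-activation so backward can form the exact gradient d/dx relu(x)^2 = 2*relu(x).

```rust
// ReLU^2 forward/backward with a cached pre-activation (illustrative sketch).
struct ForwardCache {
    pre_act: Vec<f64>, // inputs to ReLU^2, saved during forward
}

fn relu2_forward(x: &[f64], cache: &mut ForwardCache) -> Vec<f64> {
    cache.pre_act = x.to_vec();
    x.iter().map(|&v| v.max(0.0).powi(2)).collect()
}

fn relu2_backward(grad_out: &[f64], cache: &ForwardCache) -> Vec<f64> {
    grad_out
        .iter()
        .zip(&cache.pre_act)
        .map(|(&g, &x)| g * 2.0 * x.max(0.0)) // chain rule through relu(x)^2
        .collect()
}

fn main() {
    let mut cache = ForwardCache { pre_act: vec![] };
    let y = relu2_forward(&[-1.0, 3.0], &mut cache);
    assert_eq!(y, vec![0.0, 9.0]);
    let gx = relu2_backward(&[1.0, 1.0], &cache);
    assert_eq!(gx, vec![0.0, 6.0]); // zero below the kink, 2x above it
}
```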

LAWS (L-R1..L-R14) -- VIOLATION = REVERT

| Law | Rule | Violation -> |
|---|---|---|
| L-R1 | RUST ONLY -- no .py, .sh, .ipynb | REVERT |
| L-R2 | WORKERS=4-16 via env var | REVERT |
| L-R3 | Every result -> Neon + .trinity/experience/ | LESSON MISSING |
| L-R4 | cargo test --workspace = GREEN before push | PR BLOCKED |
| L-R5 | cargo clippy -- -D warnings = 0 | PR BLOCKED |
| L-R6 | SIGTERM -> graceful shutdown | DATA LOSS |
| L-R7 | Neon query timeout <= 30 sec | WORKER CRASH |
| L-R8 | Trainer stdout: ONLY BPB=X.XXXX | PARSE FAIL |
| L-R9 | GF16 only with d_model >= 256 | +3.21 BPB (INV-3 PROVEN) |
| L-R10 | T-JEPA ASHA min rung = 3000 steps | FALSE PRUNE |
| L-R11 | NCA entropy [1.5, 2.8] = hard loss penalty | COLLAPSE (INV-4 PROVEN) |
| L-R12 | All agents -> branch main ONLY | CONFLICT |
| L-R13 | agent_id + branch='main' in every Neon record | DASHBOARD FAIL |
| L-R14 | coqc trinity-clara/proofs/igla/*.v = exit 0 before race | RACE INVALID |

VICTORY CONDITIONS (Closure)

Issue #143 closes ONLY when:

  • BPB < 1.50 on seeds 43, 44, 45 (three independent runs)
  • p < 0.01 statistical significance
  • cargo test --workspace = GREEN
  • Neon: status='winner'
  • coqc trinity-clara/proofs/igla/*.v = GREEN (INV-001..010 all compiled)
  • git commit -m "IGLA FOUND: BPB=X.XXXX seed=43,44,45"
  • git push origin main

phi^2 + phi^-2 = 3 | TRINITY | IGLA RACE v2 | 2026-04-27T14:30+07 | RUST ONLY | NEVER STOP
