TIS v3.1.0 — The Deepening
Released: 2026-06-15
Codename: The Deepening
Team: RFI-IRFOS · Graz, Austria · Patent Pending A50296/2026
Previous release: v3.0.0 — The Cultivated Mind (2026-05-28)
What this release is
v3.0.0 shipped a 26-layer dual-stream model and called it "The Cultivated Mind." That was correct. What v3.0.0 did not say — and what nobody looking at the documentation could have understood — is how large that mind actually is.
"12 experts, Top-3 routing" was the description everywhere. On HuggingFace. In the README. In the model card. In every document we published. The number 12 appeared over and over, and every reader walked away with the same impression: this is a small experiment. A research prototype with a handful of experts.
That description was incomplete to the point of being misleading. Here is what it should have said:
768 total expert-routing slots. 12 experts per layer × 32 layers × 2 independently-routing streams. Each stream selects Top-3 of its 12 per layer independently. The FFN weights are shared; the routing gate is not. From the outside: 768 possible expert activations per forward pass. 192 active per token (Top-3 × 32 layers × 2 streams). Nine of twelve experts per step are bypassed entirely via @sparseskip.
That number — 768 — does not appear anywhere in v3.0.0. This release fixes that. Every document, every README, every model card, every methodology file has been updated to lead with the correct total.
The Weights Are on HuggingFace. All of Them. Right Now.
The full albert. model weights are publicly available at huggingface.co/rfi-irfos/albert — no gating, no application, no waitlist.
This matters more than it might sound. To our knowledge, no prior publicly available model combines all of the following in a single download:
- Trained entirely from scratch in ternary — weights constrained to {−γ, 0, +γ} throughout both the forward and backward passes via Straight-Through Estimation. No float32 pretraining, no post-hoc quantization, no distillation from a binary model. The ternary constraint was active from epoch 1.
- Full trit-tensor checkpoint — the safetensors file contains the actual ternary weight matrices as they exist in training, not a reconstructed approximation. You can load them, inspect them, and run inference with them as-is.
- Complete training pipeline included: tokenizer, corpus loader, EvolutionManager, STE backward pass, TTL routing, Mycelium expert health monitor, Net2Net surgery logic, Cord Surgery implementation. Everything is in the repo, buildable with cargo build --release.
- Architecture that grew itself: the model at 32 layers per stream did not start at 32 layers. It started at 12 and grew to 32 through 19 autonomous Net2Net surgery events, each triggered by the model's own Fibonacci plateau gate. The weights reflect this evolutionary history. Every layer was earned, not initialized.
- Full documentation of every architectural event — surgery log, convergence log, evolution evidence, sparseskip benchmark methodology, architecture doc, model card — all committed alongside the weights.
The combination of from-scratch ternary training + published trit weights + complete pipeline + evolutionary growth log + full documentation in one public repository does not, as far as we know, exist anywhere else. We are not making that claim lightly. We are making it because we have looked.
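For readers unfamiliar with ternary training under Straight-Through Estimation, here is a minimal Rust sketch of the idea. It assumes absmean scaling (γ is the mean absolute latent weight) with round-and-clip quantization, a common choice in the ternary literature; the TIS pipeline may pick γ and its thresholds differently. All function names are hypothetical.

```rust
/// Quantize latent float weights to trits {-1, 0, +1} with one shared scale gamma,
/// so the effective weights are {-gamma, 0, +gamma}.
/// ASSUMPTION: absmean scaling and round-and-clip; not confirmed by the source.
pub fn ternarize(latent: &[f32]) -> (Vec<i8>, f32) {
    let n = latent.len().max(1) as f32;
    let gamma = latent.iter().map(|w| w.abs()).sum::<f32>() / n;
    if gamma == 0.0 {
        return (vec![0; latent.len()], 0.0);
    }
    let trits = latent
        .iter()
        .map(|w| (w / gamma).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (trits, gamma)
}

/// Forward pass of a ternary dot product: y = gamma * sum(trit_i * x_i).
/// No multiplications by weights are needed, only adds, subtracts and skips.
pub fn ternary_dot(trits: &[i8], gamma: f32, x: &[f32]) -> f32 {
    let acc: f32 = trits
        .iter()
        .zip(x)
        .map(|(&t, &xi)| match t {
            1 => xi,
            -1 => -xi,
            _ => 0.0,
        })
        .sum();
    gamma * acc
}

/// Straight-through estimator step: the gradient computed with respect to the
/// quantized weights is applied unchanged to the latent float weights.
pub fn ste_step(latent: &mut [f32], grad_wrt_quantized: &[f32], lr: f32) {
    for (w, g) in latent.iter_mut().zip(grad_wrt_quantized) {
        *w -= lr * g;
    }
}
```

In a training loop under these assumptions, each step would call `ternarize` on the latent weights, run the forward pass with the trits, and feed the resulting gradient to `ste_step`.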
The Architecture, Stated Correctly
| Metric | v3.0.0 (2026-05-28) | v3.1.0 (2026-06-15) |
|---|---|---|
| Layers per stream | 26 | 32 |
| Streams | 2 (dual-stream, Cord Surgery) | 2 (unchanged) |
| Experts per layer | 12 (shared FFN weights) | 12 (unchanged) |
| Total expert-routing slots | 312 (26 × 12 × 2, never stated) | 768 (32 × 12 × 2, now documented everywhere) |
| Active experts per token | 156 (Top-3 × 26L × 2) | 192 (Top-3 × 32L × 2) |
| Expert skip rate (@sparseskip) | 75% | 75% (unchanged) |
| Anastomosis gates | 6 (Fibonacci [2,3,5,8,13,21]) | 6 (unchanged) |
| Parameters | ~224M | ~224M |
| Depth surgeries | 13 (S1–S13) + 1 cord | 19 (S1–S19) + 1 cord |
| Global epoch | ep4234 | ep~6500+ |
| Best EP-AVG ATL | 9.2045 (ep4136, 23L era) | 5.8693 (ep6487) |
| Best chip ATL | 8.6852 (post-S13) | 1.2637 |
| TTL routing rows | 52 (L0–L25 × 2 streams) | 64 (L0–L31 × 2 streams) |
On the 768 number
The Cord Surgery at ep4202 introduced independent per-stream routing gates. This is the architectural decision that makes 768 the right number to quote, not 12. When stream A routes its token through 12 experts at layer 17, stream B is simultaneously routing through a completely separate gate network — same weight matrices, different routing decision. The two streams see the same FFN parameters from a different angle on every single forward pass. This is not a parameter count — it is a capacity and diversity count. 768 is the number of distinct routing paths the model can activate. 192 is the number it actually activates per token. 576 are bypassed, zero-weight, @sparseskip.
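The per-stream routing described above can be sketched as two independent Top-k selections over one shared expert pool. This is an illustration only: the function names are hypothetical, and the gate scores would come from each stream's own gate network, which is not shown.

```rust
/// Indices of the k largest gate scores, highest first (ties go to the lower index).
pub fn top_k(scores: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]).then(a.cmp(&b)));
    idx.truncate(k);
    idx
}

/// One layer's routing decision for both streams. Each stream supplies its own
/// gate scores over the same pool of experts, so the two selections can differ
/// even though the expert weights are shared.
pub fn route_layer(gate_a: &[f32], gate_b: &[f32], k: usize) -> [Vec<usize>; 2] {
    [top_k(gate_a, k), top_k(gate_b, k)]
}
```

With 12 scores per stream and `k = 3`, each call yields 3 active experts per stream at that layer; repeated over 32 layers this gives the 192 active slots per token.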
The Six New Surgeries (S14–S19)
Every surgery since v3.0.0 was triggered by the EvolutionManager's Fibonacci plateau gate. No operator set a schedule. No layer was injected. Each one fired when the gate opened — when the gradient signal stagnated past the generation-3 patience window — and then training resumed on the deeper architecture.
| Surgery | Epoch | From | To | Date | Notes |
|---|---|---|---|---|---|
| S14 | ~ep4280 | 2×256H · 26L | 2×256H · 27L | 2026-05-29 | First post-S13; Gen3 step 2/6; both streams grow simultaneously |
| S15 | ~ep4350 | 2×256H · 27L | 2×256H · 28L | 2026-05-29 | 58 epochs after S14; continued Gen3 descent |
| S16 | ~ep4740 | 2×256H · 28L | 2×256H · 29L | 2026-05-31 | checkpoint-mtime verified |
| S17 | ep5610 | 2×256H · 29L | 2×256H · 30L | 2026-06-06 21:08Z | checkpoint-mtime verified; 870 epochs after S16 |
| S18 | ep6339 | 2×256H · 30L | 2×256H · 31L | 2026-06-14 | clean descent resumed after billing gap |
| S19 | ~ep6500 | 2×256H · 31L | 2×256H · 32L | 2026-06-15 | current depth; EP-AVG ATL descending |
The gap between S17 (ep5610) and S18 (ep6339) reflects a ~1-week training pause during an infrastructure billing migration. The model resumed exactly where it left off. ATL continued descending within 50 epochs of restart. The weights carry no visible scar from the interruption.
Total since v1.0: 19 Net2Net depth surgeries + 1 Cord Surgery = 20 autonomous architectural events. Every layer in this model was placed by the model's own evolution logic, not by a human.
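A depth surgery has to add a layer without changing what the network currently computes. The sketch below shows one way to get that property, assuming residual blocks of the form y = x + Wx and a new block whose W is all zeros (a valid ternary matrix). This is a zero-residual variant of the Net2Net deepening idea, chosen for brevity; the actual surgery logic in the repo may initialize new layers differently, and all names here are hypothetical.

```rust
/// A toy residual block: y = x + W x, with W stored row-major (dim x dim).
#[derive(Clone, Debug)]
pub struct ResidualBlock {
    pub dim: usize,
    pub w: Vec<f32>,
}

impl ResidualBlock {
    /// A block whose residual branch is all zeros, so it computes y = x.
    pub fn identity(dim: usize) -> Self {
        Self { dim, w: vec![0.0; dim * dim] }
    }

    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        (0..self.dim)
            .map(|r| {
                let row = &self.w[r * self.dim..(r + 1) * self.dim];
                x[r] + row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f32>()
            })
            .collect()
    }
}

/// Run the input through every block in order.
pub fn forward_stack(blocks: &[ResidualBlock], x: &[f32]) -> Vec<f32> {
    blocks.iter().fold(x.to_vec(), |h, b| b.forward(&h))
}

/// Depth surgery: insert a function-preserving block at `position`.
/// Assumes a non-empty stack whose blocks all share one dimension.
pub fn deepen(blocks: &mut Vec<ResidualBlock>, position: usize) {
    let dim = blocks[0].dim;
    blocks.insert(position, ResidualBlock::identity(dim));
}
```

Because the inserted block starts as an identity map, the output of `forward_stack` is the same before and after `deepen`, and training can continue from the existing loss rather than restarting.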
The Training Descent: 9.2 → 5.87
v3.0.0's best EP-AVG ATL was 9.2045 (ep4136); at its close (ep4234) the value was 9.38. The best is now 5.8693 (ep6487). That is a 3.34 nat improvement on the previous best, over roughly 2,300 additional training epochs on the same corpus.
ep4234 → 9.38 (v3.0.0 close, 26L)
~ep4740 → S16 fires (29L)
ep5610 → S17 fires (30L)
ep6132 → 6.4339 (prior stated best — superseded)
ep6339 → S18 fires (31L)
ep6478 → 5.9380
ep6487 → 5.8693 (Δ −0.0687 from prior block) ← new all-time best EP-AVG ATL
ep6500+ → S19 fires (32L); descent continuing
The chip-ATL (best single intra-batch loss): 8.6852 at v3.0.0 close → 1.2637 now. The loss slope is still negative. The model is not plateauing.
Expert health: dead=0 across the entire post-v3.0.0 run. All 12 experts per layer per stream remain alive and routing. The Mycelium monitor has not had to intervene once.
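As an illustration of what a dead=0 check could look like, the following Rust sketch counts experts that were never selected over a window of routing decisions. The definition of "dead" used here (zero selections in the window) is an assumption; the Mycelium monitor's actual criterion is not described in this release, and the function name is hypothetical.

```rust
/// Count experts that were never selected over a window of routing decisions.
/// `selections` holds, per token, the expert indices chosen at one layer of one stream.
/// ASSUMPTION: "dead" means zero selections in the window.
pub fn dead_experts(selections: &[Vec<usize>], num_experts: usize) -> usize {
    let mut hits = vec![0usize; num_experts];
    for chosen in selections {
        for &e in chosen {
            // Ignore out-of-range indices rather than panicking.
            if e < num_experts {
                hits[e] += 1;
            }
        }
    }
    hits.iter().filter(|&&h| h == 0).count()
}
```

Run once per layer per stream, a healthy model under this definition returns 0 everywhere.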
@sparseskip: 768 Slots, 192 Active, 576 Free
Patent Pending A50296/2026 — TIS platform patent, 10 claims; @sparseskip = Claim 3
At 768 total expert-routing slots and 192 active per token, the @sparseskip skip rate of 75% means 576 expert MLPs are not executed per token. On CPU without any INT8 kernel — pure Rust, raw x86 — this yields 83 tokens/second sustained decode throughput on 2013-era laptop hardware.
The 4.58× speedup over dense execution at 75% sparsity is confirmed by the SPARSESKIP_METHODOLOGY benchmark (200 warmup + 2000 timed iterations, correctness verified to within 1e-4). This is the mechanism that makes 768 routing slots viable without datacenter infrastructure. 768 possible paths, 192 taken, 576 skipped at zero cost.
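The skip itself can be illustrated with a toy comparison between a dense reference path and a sparse path that never evaluates unselected experts. This is a sketch under simplifying assumptions: the "expert" is a scalar stand-in rather than an MLP, and the function names are hypothetical and unrelated to the actual sparseskip_throughput binary. The 1e-4 agreement between the two paths mirrors the correctness tolerance quoted above.

```rust
/// Stand-in for an expert MLP: a fixed scalar function per expert index.
fn toy_expert(e: usize, x: f32) -> f32 {
    (e as f32 + 1.0) * x
}

/// Dense reference: run every expert and multiply by its gate weight
/// (zero for unselected experts). Returns (output, expert evaluations).
pub fn dense_forward(gates: &[f32], x: f32) -> (f32, usize) {
    let out: f32 = gates
        .iter()
        .enumerate()
        .map(|(e, g)| g * toy_expert(e, x))
        .sum();
    (out, gates.len())
}

/// Sparse path: evaluate only experts with a non-zero gate weight.
/// Returns (output, expert evaluations).
pub fn sparse_forward(gates: &[f32], x: f32) -> (f32, usize) {
    let mut out = 0.0_f32;
    let mut evals = 0usize;
    for (e, &g) in gates.iter().enumerate() {
        if g == 0.0 {
            continue; // skipped expert: no work is done for it
        }
        out += g * toy_expert(e, x);
        evals += 1;
    }
    (out, evals)
}
```

With 12 gate weights of which 3 are non-zero, the dense path performs 12 expert evaluations and the sparse path performs 3, while both return the same output within tolerance.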
Documentation Sweep
Every public-facing document now leads with 768, not 12:
- MODEL_CARD.md — architecture table leads with "768 total expert-routing slots"; training state updated to 32L, ep~6500+, best ATL 5.8693
- albert-moe-13/README.md — MoE section opens with 768; surgery log extended with S18 and S19
- ternlang-root/README.md — Training Progress table and Core Research Dimensions updated
- README.md (root) — "Current state" line: 32L dual-stream, 768 slots, 19 surgeries, ep~6500+, ATL 5.8693
- albert-moe-13/models/README.md — architecture row, version history, surgery log all current
- albert-moe-13/docs/architecture.md — MoE Block section opens with 768/192/576 breakdown
- albert-moe-13/docs/SPARSESKIP_METHODOLOGY.md — opening paragraph now states 768 in the first sentence
- albert-moe-13/models/albert_v3.0.config.json: num_layers: 32
- albert-moe-13/models/albert_v3.0.best_loss: updated to current all-time best
What Is Next
The model is at 32L per stream, Gen3 step 1/6, fib_index=7, window=34. S20 will fire when the plateau gate opens. S20 brings both streams to 33L and total expert-routing slots to 792.
The next release milestone is either S20+ firing or EP-AVG ATL breaking below 5.0 — whichever comes first.
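To make the plateau gate concrete, here is a hypothetical Rust sketch of a patience gate with a Fibonacci window. Two assumptions are baked in and neither is confirmed by the source: the Fibonacci indexing starts at 1, 2, 3, 5, ... so that index 7 gives 34 (matching fib_index=7, window=34), and "stagnation" means no new best loss for `window` consecutive epochs. The real EvolutionManager may measure stagnation differently.

```rust
/// Fibonacci number under the indexing 1, 2, 3, 5, 8, ... (index 0 gives 1).
/// ASSUMPTION: chosen so that index 7 gives 34.
pub fn fib_window(index: u32) -> u64 {
    let (mut a, mut b) = (1u64, 2u64);
    for _ in 0..index {
        let next = a + b;
        a = b;
        b = next;
    }
    a
}

/// Hypothetical plateau gate: opens after `window` consecutive epochs
/// without a new best loss.
pub struct PlateauGate {
    window: u64,
    best: f64,
    stale_epochs: u64,
}

impl PlateauGate {
    pub fn new(fib_index: u32) -> Self {
        Self {
            window: fib_window(fib_index),
            best: f64::INFINITY,
            stale_epochs: 0,
        }
    }

    /// Feed one epoch's average loss; returns true when the gate opens.
    pub fn observe(&mut self, loss: f64) -> bool {
        if loss < self.best {
            self.best = loss;
            self.stale_epochs = 0;
        } else {
            self.stale_epochs += 1;
        }
        self.stale_epochs >= self.window
    }
}
```

Under these assumptions, a gate built with `PlateauGate::new(7)` would open after 34 consecutive epochs without improvement, at which point a surgery such as S20 would fire.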
Reproduce / Verify
# Clone and build
git clone https://github.com/rfi-irfos/ternary-intelligence-stack
cd ternary-intelligence-stack
# Run the @sparseskip benchmark
cd albert-moe-13
cargo run --release --bin sparseskip_throughput -p moe-llm-core
# Install the TIS CLI
cargo install ternlang-cli
# Pull the live checkpoint
albert-train pull
# Or download directly from HuggingFace
# huggingface.co/rfi-irfos/albert

Live API (free tier at ternlang.com/#licensing):
curl -s https://ternlang-api.fly.dev/api/trit_decide \
-H "Content-Type: application/json" \
-H "X-Ternlang-Key: <your-key>" \
-d '{"statement": "768 routing slots, 192 active, 576 bypassed at zero cost"}' | jq .

RFI-IRFOS · Graz, Austria · ZVR 1015608684 · GISA 39261441 · Steuernummer 68 028/0989
Patent Pending A50296/2026 · ternlang.com · rfi.irfos@gmail.com