v0.2.0b1 — SIKE validation & MoE decoder
Highlights
This release establishes scale-invariant retrieval across model sizes from 0.6B to 8B parameters, validated by the SIKE benchmark. Retrieval is no longer the bottleneck — it's consistent at 10/10 across all tested models.
SIKE Benchmark Results (q4_0 KV cache)
| Model | Retrieval | Accuracy | Notes |
|---|---|---|---|
| qwen3:0.6b | 10/10 | 2/10 | Parameter floor — retrieval works, model can't use it |
| qwen3:1.7b | 10/10 | 3/10 | |
| qwen3:4b | 10/10 | 9/10 | Sweet spot — 2.5GB VRAM |
| gemma4:e4b | 10/10 | 9/10 | MoE decoder enabled |
| qwen3:8b | 10/10 | 9/10 | |
MoE-aware decoder
- Front-loads KV answer slate in first 200 tokens for SWA (sliding-window attention) models
- Relevance-first gene ordering for MoE/small models (vs sequence_index for dense)
- Automatic activation via `MOE_MODEL_FAMILIES = ("gemma4",)`
- gemma4:e4b jumped from 5/10 → 9/10 accuracy with slate enabled
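The ordering switch described above can be sketched roughly as follows. Only `MOE_MODEL_FAMILIES` and the `sequence_index` ordering key come from these notes; the `Gene` record, its `relevance` field, and the `order_genes` helper are hypothetical names used for illustration, and the sketch handles only the MoE-family case, not the small-model case.

```python
from dataclasses import dataclass

MOE_MODEL_FAMILIES = ("gemma4",)  # value taken from the release notes


@dataclass
class Gene:
    """Hypothetical retrieved unit; field names are assumptions."""
    sequence_index: int   # position in the source sequence
    relevance: float      # retrieval score, higher is better
    text: str


def order_genes(genes, model_name):
    """Order genes relevance-first for MoE families, by sequence_index otherwise."""
    family = model_name.split(":", 1)[0].lower()
    if family in MOE_MODEL_FAMILIES:
        # MoE models: most relevant content comes first.
        return sorted(genes, key=lambda g: g.relevance, reverse=True)
    # Dense models: keep the original document order.
    return sorted(genes, key=lambda g: g.sequence_index)
```

With three genes scored 0.1, 0.9 and 0.5 at positions 0, 1 and 2, a `gemma4:e4b` request would see them in the order 1, 2, 0, while a `qwen3:8b` request would see 0, 1, 2.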
Per-request model detection
- Server reads `body["model"]` and adapts expression strategy per request
- `_should_use_slate()` gates on downstream model name + param count
- `SMALL_MODEL_THRESHOLD_B = 3.2`: excludes qwen3:4b, which works without slate
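A minimal sketch of how such a gate might look, assuming the parameter count is parsed from an Ollama-style tag such as `qwen3:1.7b`. The constants are taken from these notes; the function bodies, the `parse_param_count_b` helper, and the tag-parsing rule are assumptions rather than the shipped implementation.

```python
import re

MOE_MODEL_FAMILIES = ("gemma4",)   # from the release notes
SMALL_MODEL_THRESHOLD_B = 3.2      # from the release notes


def parse_param_count_b(model_name):
    """Return billions of parameters from a tag like 'qwen3:1.7b', else None.

    Hypothetical helper: assumes the size is the numeric suffix after ':'.
    """
    match = re.search(r":(\d+(?:\.\d+)?)b$", model_name.lower())
    return float(match.group(1)) if match else None


def should_use_slate(body):
    """Decide per request whether to front-load the answer slate."""
    model = body.get("model", "")
    family = model.split(":", 1)[0].lower()
    if family in MOE_MODEL_FAMILIES:
        return True  # MoE families always get the slate
    params_b = parse_param_count_b(model)
    # Small dense models get the slate; 4B and larger do not.
    return params_b is not None and params_b < SMALL_MODEL_THRESHOLD_B
```

Under these assumptions, `gemma4:e4b` and `qwen3:1.7b` enable the slate, while `qwen3:4b` falls above the 3.2 threshold and does not.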
Think-mode suppression for sub-3.2B models
- Small models' reasoning loops consume the entire output budget without producing answers
- Injects `/no_think` prefix and sets `temperature=0` for Qwen3 sub-3.2B
- q8_0 tested: worse than q4_0 (think mode gets more rope to hang itself)
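The suppression step could look something like the sketch below. It assumes an OpenAI-style chat body with a `messages` list and prefixes the last user message; the notes do not say where the prefix is injected, so that placement, the `suppress_think` name, and the `params_b` argument are all assumptions.

```python
import copy

SMALL_MODEL_THRESHOLD_B = 3.2  # from the release notes


def suppress_think(body, params_b):
    """Return a request body with think mode suppressed for small Qwen3 models.

    Hypothetical helper: leaves the caller's body untouched and returns a
    patched copy when suppression applies, otherwise returns body as is.
    """
    model = body.get("model", "").lower()
    if not model.startswith("qwen3") or params_b >= SMALL_MODEL_THRESHOLD_B:
        return body
    patched = copy.deepcopy(body)
    patched["temperature"] = 0
    # Prefix the most recent user message (assumed injection point).
    for message in reversed(patched.get("messages", [])):
        if message.get("role") == "user":
            message["content"] = "/no_think " + message["content"]
            break
    return patched
```

Returning a copy keeps the original request intact for logging or retries; a real server might patch in place instead.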
Storage & operations
- New `Genome.vacuum()` method + `/admin/vacuum` endpoint (752 MB → 523 MB, -30.4%)
- Clear documentation distinguishing checkpoint / refresh / compact / vacuum operations
- README refresh with badges, TOC, glossary, sample output
- Test corpus composition breakdown with public/private repo split
Cumulative changes since v0.1.0b2
- MoE-aware decoder with answer slate + relevance-first ordering
- SIKE benchmark validation across 5 model scales
- Per-request downstream model detection
- Think suppression for sub-3.2B models
- Genome.vacuum() + storage optimizations
- README overhaul + SIKE benchmark docs
All 179 tests passing.
🤖 Generated with Claude Code