Summary
Make quant.h the canonical source and auto-generate src/engine/*.c from it, eliminating the sync-divergence bug class that caused #77 (SmolLM2 numerical instability in libturboquant).
Problem
quant.h and src/engine/*.c implement the same forward pass independently. They have diverged:
- quant.h: SmolLM2 works (23 tok/s), Phi-3.5 works (6.5 tok/s)
- libturboquant: SmolLM2 produces garbage (layer 7 max=18,359), Phi-3.5 crashes
Every new architecture (Phi-3, Gemma-4, Qwen3) requires manual porting between the two codebases. This is the #1 source of bugs.
Proposed Solution
quant.h (single source of truth)
↓ tools/split_header.py (automated)
src/engine/tq_transformer.c ← auto-generated
src/engine/tq_generate.c ← auto-generated
src/engine/tq_model.c ← auto-generated
Implementation
- Add section markers to quant.h:
// --- SECTION: transformer ---
float* tq_forward(tq_model_t* model, ...) { ... }
// --- END SECTION ---
- Write tools/split_header.py that extracts sections into .c files
- CI check: python tools/split_header.py && git diff --exit-code src/engine/
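A minimal sketch of what the extraction step in tools/split_header.py might look like, assuming the marker syntax shown above. The function names, the `tq_<section>.c` naming rule, and the generated-file banner are illustrative assumptions, not an existing implementation. A real tool would likely also need to emit the right #include lines for each generated file.

```python
import re
from pathlib import Path

# Marker syntax taken from the proposal; everything else here is hypothetical.
SECTION_START = re.compile(r"^\s*// --- SECTION: (\w+) ---\s*$")
SECTION_END = re.compile(r"^\s*// --- END SECTION ---\s*$")
BANNER = "/* AUTO-GENERATED from quant.h by tools/split_header.py. Do not edit. */\n"


def split_sections(header_text):
    """Return {section_name: body_text} for every marked section."""
    sections = {}
    current = None  # name of the section being collected, if any
    body = []
    for lineno, line in enumerate(header_text.splitlines(), start=1):
        start = SECTION_START.match(line)
        if start:
            if current is not None:
                raise ValueError(f"line {lineno}: section opened inside '{current}'")
            current = start.group(1)
            if current in sections:
                raise ValueError(f"line {lineno}: duplicate section '{current}'")
            body = []
        elif SECTION_END.match(line):
            if current is None:
                raise ValueError(f"line {lineno}: END SECTION without SECTION")
            sections[current] = "\n".join(body) + "\n"
            current = None
        elif current is not None:
            body.append(line)
    if current is not None:
        raise ValueError(f"section '{current}' is never closed")
    return sections


def write_sections(sections, out_dir):
    """Write each section to <out_dir>/tq_<name>.c and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, body in sorted(sections.items()):
        path = out_dir / f"tq_{name}.c"  # assumed naming rule
        path.write_text(BANNER + body)
        written.append(path)
    return written


def main(header_path="quant.h", out_dir="src/engine"):
    """Regenerate the engine sources from the header."""
    return write_sections(split_sections(Path(header_path).read_text()), out_dir)
```

Because the output is a pure function of quant.h, running `main()` twice produces identical files, which is what lets the `git diff --exit-code` check catch hand edits to src/engine/.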
Precedent
- SQLite: the sqlite3.c amalgamation is machine-generated from the source tree by a build script, so the two forms cannot drift apart
- stb libraries: single-header is the distribution format
Impact
Priority: P1
Root-cause analysis from ClawTeam Claw-5 (Researcher) investigation