v0.2.0
lattice v0.2.0
Date: 2026-05-16
Previous: v0.1.2
Changelog
New Features
- feat(inference): QuaRot step 4a — strided sliding-window perplexity harness (CPU) — PR #30
- feat(inference): QuaRot step 4b — Metal Q4 perplexity path + dual-Q4 delta CLI — PR #32
- feat(inference): QuaRot step 4c — measured WikiText-2 PPL delta + harness corpus-cap fix — PR #33
Fixes
- fix(inference): `eval_perplexity` corpus silently capped at ~4K tokens (BPE default `max_seq_len = 4096`) — PR #33
- fix(inference): `from_q4_dir` now refuses both directions of the `tie_word_embeddings` / `lm_head.q4` contract mismatch — PR #32
- fix(inference): `from_q4_dir` validates `max_cache_len <= max_position_embeddings` — PR #32
Internal
- docs(adr-044): QuaRot v0 step 4 fully landed — measured WikiText-2 PPL delta on Qwen3.5-0.8B: unrotated Q4 PPL 25.57, QuaRot Q4 PPL 23.96, delta −1.61 (PASS, < 0.5 gate)
- docs(adr-044): "first publicly available pure-Rust QuaRot" claim hardened with comprehensive ecosystem survey provenance
Highlights
QuaRot v0 is complete. To the best of our knowledge, this is the first publicly available pure-Rust implementation of QuaRot (Ashkboos et al., NeurIPS 2024): Hadamard rotations are absorbed into the weight matrices, RMSNorm is fused, and forward equivalence is validated.
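To illustrate what "absorbed into weight matrices" and "forward-equivalence" mean, here is a minimal sketch. It is not the lattice-inference code. It assumes a normalized Sylvester-Hadamard matrix `H`, which is symmetric and orthogonal, so `(W H)(H x) = W x`. All function names here (`hadamard`, `matmul`, `matvec`, `forward_equivalence_error`) are hypothetical, and the dense `f64` matrices are chosen only for readability.

```rust
/// Normalized Sylvester-Hadamard matrix of size n x n (n must be a power of two).
/// This matrix is symmetric and orthogonal: H * H = I.
fn hadamard(n: usize) -> Vec<Vec<f64>> {
    assert!(n.is_power_of_two(), "n must be a power of two");
    let scale = 1.0 / (n as f64).sqrt();
    (0..n)
        .map(|i| {
            (0..n)
                // The sign of entry (i, j) is (-1)^popcount(i & j).
                .map(|j| if (i & j).count_ones() % 2 == 0 { scale } else { -scale })
                .collect()
        })
        .collect()
}

/// Dense matrix product a * b.
fn matmul(a: &[Vec<f64>], b: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (m, k, n) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; n]; m];
    for i in 0..m {
        for p in 0..k {
            for j in 0..n {
                out[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    out
}

/// Dense matrix-vector product a * x.
fn matvec(a: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    a.iter()
        .map(|row| row.iter().zip(x).map(|(w, v)| w * v).sum::<f64>())
        .collect()
}

/// Largest absolute difference between `W x` and `(W H)(H x)`.
/// A value near zero means the rotated layer is forward-equivalent.
fn forward_equivalence_error(w: &[Vec<f64>], x: &[f64]) -> f64 {
    let h = hadamard(x.len());
    let w_rot = matmul(w, &h); // rotation folded into the weights ahead of time
    let x_rot = matvec(&h, x); // activations arrive in the rotated basis
    let y = matvec(w, x);
    let y_rot = matvec(&w_rot, &x_rot);
    y.iter().zip(&y_rot).map(|(a, b)| (a - b).abs()).fold(0.0, f64::max)
}
```

In exact arithmetic the error is zero. The point of the rotation in QuaRot is that `W H` and `H x` tend to have fewer outliers than `W` and `x`, which makes them friendlier to low-bit quantization. The quantization step itself is not shown here.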
Measured on our pipeline (NOT the paper's numbers): WikiText-2 raw test split, Qwen3.5-0.8B, 297,156 tokens, window 512 / stride 256. QuaRot Q4 PPL = 23.96 vs. unrotated Q4 PPL = 25.57: 1.61 PPL lower at the same memory footprint.
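For readers unfamiliar with the window/stride setup, the following is a rough sketch of one common way to compute strided sliding-window perplexity. It is an assumption about the general technique, not a copy of the `eval_perplexity` harness, whose exact scoring rule is not described in these notes. The function name `strided_perplexity` and the `log_prob(context, target)` callback are hypothetical stand-ins for a real model call.

```rust
/// Strided sliding-window perplexity over `tokens`.
///
/// `log_prob(context, target)` is a hypothetical callback that returns the
/// natural-log probability the model assigns to `target` given `context`.
/// Each token is scored at most once, by the first window that contains it
/// with at least one token of context.
fn strided_perplexity<F>(tokens: &[u32], window: usize, stride: usize, log_prob: F) -> f64
where
    F: Fn(&[u32], u32) -> f64,
{
    assert!(tokens.len() >= 2, "need at least two tokens");
    assert!(stride > 0 && stride <= window, "need 0 < stride <= window");

    let mut nll = 0.0; // accumulated negative log-likelihood
    let mut scored = 0usize; // number of tokens that contributed to `nll`
    let mut prev_end = 0usize; // positions below this index are already scored
    let mut begin = 0usize;

    while begin < tokens.len() {
        let end = (begin + window).min(tokens.len());
        // Score only the positions this window adds. The first token of a
        // window has no in-window context, so scoring starts at begin + 1.
        for pos in prev_end.max(begin + 1)..end {
            nll -= log_prob(&tokens[begin..pos], tokens[pos]);
            scored += 1;
        }
        prev_end = end;
        if end == tokens.len() {
            break;
        }
        begin += stride;
    }

    // Perplexity is exp of the mean negative log-likelihood.
    (nll / scored as f64).exp()
}
```

With window 512 and stride 256, every token after the first window is predicted with between 256 and 511 tokens of context. A smaller stride gives each token more context on average but costs more forward passes.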
Crates
All 5 crates published to crates.io:
- lattice-inference v0.2.0
- lattice-embed v0.2.0
- lattice-fann v0.2.0
- lattice-tune v0.2.0
- lattice-transport v0.2.0