
v0.2.0


@ohdearquant ohdearquant released this 16 May 05:41
1dbe19b

lattice v0.2.0

Date: 2026-05-16
Previous: v0.1.2

Changelog

New Features

  • feat(inference): QuaRot step 4a — strided sliding-window perplexity harness (CPU) — PR #30
  • feat(inference): QuaRot step 4b — Metal Q4 perplexity path + dual-Q4 delta CLI — PR #32
  • feat(inference): QuaRot step 4c — measured WikiText-2 PPL delta + harness corpus-cap fix — PR #33

Fixes

  • fix(inference): eval_perplexity corpus silently capped at ~4K tokens (BPE default max_seq_len = 4096) — PR #33
  • fix(inference): from_q4_dir now refuses both directions of the tie_word_embeddings/lm_head.q4 contract mismatch — PR #32
  • fix(inference): from_q4_dir validates max_cache_len <= max_position_embeddings — PR #32
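
The two `from_q4_dir` checks above can be sketched roughly as follows. This is a minimal illustration, not lattice's code: the function `validate_q4_dir`, the `Q4LoadError` type, and its variants are hypothetical names, and reading "both directions" as "tied embeddings with an `lm_head.q4` present" and "untied embeddings without one" is an assumption drawn from the changelog wording.

```rust
/// Hypothetical error type for this sketch; not lattice's actual API.
#[derive(Debug, PartialEq)]
pub enum Q4LoadError {
    /// Config ties the embeddings, yet a separate lm_head.q4 is present.
    UnexpectedLmHead,
    /// Config does not tie the embeddings, yet lm_head.q4 is missing.
    MissingLmHead,
    /// Requested cache length exceeds the positions the model supports.
    CacheTooLong {
        max_cache_len: usize,
        max_position_embeddings: usize,
    },
}

/// Hypothetical validation step: rejects both directions of the
/// tie_word_embeddings / lm_head.q4 mismatch, then checks the cache bound.
pub fn validate_q4_dir(
    tie_word_embeddings: bool,
    lm_head_q4_present: bool,
    max_cache_len: usize,
    max_position_embeddings: usize,
) -> Result<(), Q4LoadError> {
    match (tie_word_embeddings, lm_head_q4_present) {
        (true, true) => return Err(Q4LoadError::UnexpectedLmHead),
        (false, false) => return Err(Q4LoadError::MissingLmHead),
        _ => {} // (true, false) and (false, true) are consistent
    }
    if max_cache_len > max_position_embeddings {
        return Err(Q4LoadError::CacheTooLong {
            max_cache_len,
            max_position_embeddings,
        });
    }
    Ok(())
}
```

Failing early with a typed error, rather than loading a directory whose files contradict its config, is the design choice this sketch is meant to illustrate.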

Internal

  • docs(adr-044): QuaRot v0 step 4 fully landed, with a measured WikiText-2 PPL delta on Qwen3.5-0.8B: unrotated Q4 PPL 25.57, QuaRot Q4 PPL 23.96, delta −1.61 (PASS; gate: delta < 0.5)
  • docs(adr-044): "first publicly available pure-Rust QuaRot" claim hardened with comprehensive ecosystem survey provenance
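
The gate arithmetic behind the ADR entry can be written out explicitly. This sketch assumes the delta is defined as QuaRot PPL minus unrotated PPL (so a negative value means QuaRot is better) and that the gate is a strict upper bound on that delta; the function names are hypothetical.

```rust
/// PPL delta, assuming the convention "QuaRot minus unrotated".
/// A negative delta means the rotated model has lower (better) perplexity.
pub fn ppl_delta(quarot_ppl: f64, unrotated_ppl: f64) -> f64 {
    quarot_ppl - unrotated_ppl
}

/// The gate passes when the delta stays strictly below the threshold.
pub fn passes_gate(delta: f64, gate: f64) -> bool {
    delta < gate
}
```

With the reported numbers, `ppl_delta(23.96, 25.57)` is about −1.61, which is below the 0.5 gate.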

Highlights

QuaRot v0 is complete. To the best of our knowledge, this is the first publicly available pure-Rust implementation of QuaRot (Ashkboos et al., NeurIPS 2024): Hadamard rotations absorbed into the weight matrices, RMSNorm fusion, and validated forward equivalence.
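
The identity that makes "absorbing" a rotation possible can be shown on a single dot product. The sketch below is not lattice's implementation: it uses a plain normalized Walsh-Hadamard transform on one vector, whereas QuaRot applies (randomized) Hadamard rotations to whole weight matrices. The names `fwht_normalized` and `dot` are illustrative.

```rust
/// In-place normalized fast Walsh-Hadamard transform.
/// With the 1/sqrt(n) scaling the transform is orthogonal and its own inverse.
pub fn fwht_normalized(v: &mut [f64]) {
    let n = v.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        for block in (0..n).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (v[i], v[i + h]);
                v[i] = a + b; // butterfly: sum
                v[i + h] = a - b; // butterfly: difference
            }
        }
        h *= 2;
    }
    let scale = 1.0 / (n as f64).sqrt();
    for x in v.iter_mut() {
        *x *= scale;
    }
}

/// Plain dot product, standing in for one output of a linear layer.
pub fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Because the transform is orthogonal, rotating both the activation vector and the weight column leaves `dot` unchanged, so the rotation of the weights can be done once, offline. The rotation also spreads outliers: `[8, 0, 0, 0]` becomes `[4, 4, 4, 4]`, which is the property that makes 4-bit quantization less lossy.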

Measured on our pipeline (NOT the paper's numbers): WikiText-2 raw test split, Qwen3.5-0.8B, 297,156 tokens, window 512 / stride 256. QuaRot Q4 PPL = 23.96 vs. unrotated Q4 PPL = 25.57, i.e. 1.61 PPL lower at the same memory footprint.
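
For readers unfamiliar with strided sliding-window perplexity, a minimal sketch follows. It assumes the common scheme in which each window advances by `stride` tokens and only tokens not scored by an earlier window contribute to the loss; the release notes do not spell out the harness's exact scoring policy, so treat this as an illustration. The function name and the `log_probs` callback (standing in for a model forward pass) are hypothetical.

```rust
/// Strided sliding-window perplexity over a token stream.
///
/// `log_probs(chunk)` is a stand-in for the model: it must return one value
/// per chunk position, where entry i is ln p(chunk[i] | chunk[..i]).
/// Entry 0 has no context and is never scored.
/// Returns None when there is nothing to score (fewer than two tokens).
pub fn strided_perplexity<F>(
    tokens: &[u32],
    window: usize,
    stride: usize,
    mut log_probs: F,
) -> Option<f64>
where
    F: FnMut(&[u32]) -> Vec<f64>,
{
    assert!(window >= 2 && stride >= 1 && stride <= window);
    let mut nll = 0.0; // accumulated negative log-likelihood
    let mut scored = 0usize; // number of tokens contributing to nll
    let mut prev_end = 0usize; // tokens[..prev_end] were covered already
    let mut begin = 0usize;
    while prev_end < tokens.len() {
        let end = (begin + window).min(tokens.len());
        let chunk = &tokens[begin..end];
        let lp = log_probs(chunk);
        assert_eq!(lp.len(), chunk.len());
        // Score only positions that are new in this window, and never the
        // first position of a window (it has no left context).
        let first_new = prev_end.max(begin + 1) - begin;
        for &l in &lp[first_new..] {
            nll -= l;
            scored += 1;
        }
        prev_end = end;
        if end == tokens.len() {
            break;
        }
        begin += stride;
    }
    if scored == 0 {
        None
    } else {
        Some((nll / scored as f64).exp())
    }
}
```

With window 512 / stride 256, every token after the first window is scored with at least 256 tokens of left context. A quick sanity check is a uniform model over V symbols, whose perplexity should come out as V.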

Crates

All 5 crates published to crates.io: