Release v0.2.5 — Embed Quality + Tokenizer Parity + HF Regression Gate · ohdearquant/lattice

Highlights

Embed quality + tokenizer parity + permanent HF regression gate. This release closes the embedding quality gap surfaced by the embed-perf-quality show. Four production embedding models (BGE-small, multilingual-E5-small, all-MiniLM-L6-v2, paraphrase-multilingual-MiniLM-L12-v2) now match HuggingFace reference at cosine ≥ 0.9998 end-to-end across 5 measured inputs.

Tokenizer parity was lifted from 8/28 to 32/32 audit cases across all three tokenizer families. Embedding service gains role-aware prompts + cache key distinguishment + per-model pooling routing. A new HF-reference parity regression test is wired into make ci so future tokenization or forward-pass divergence is caught automatically.

Tokenizer parity (8/28 → 32/32 cases)

SentencePiece BOS/EOS injection from tokenizer.json post_processor TemplateProcessing (E5 family). Template IDs are authoritative when present; falls back to vocab-name guesses only when the template omits explicit IDs.
Qwen BPE EOS injection (id 151643) + GPT-4-style pre-tokenizer regex alignment (URL/punctuation edge cases).
WordPiece AddedToken metadata preservation + longest-match scan for special tokens appearing literally in input text.
WordPiece CJK character splitting via Hiragana/Katakana NFD voicing fold (58 entries). Mirrors HuggingFace BertNormalizer NFD + Mn-stripping for canonical kana. Closes the BGE Japanese UNK bug.
SentencePiece trailing-whitespace Metaspace handling. The normalize() loop no longer emits a trailing ▁ for whitespace-trailing input — matches HF's "▁ is the space before a word" semantics. Closes the E5 leading/trailing-ws regression.

Embed service

Role-aware prompts: embed_query() / embed_passage() apply E5 "passage: " and Qwen task-instruction prefixes before forward. embed() retains backwards-compatible Generic semantics.
Role-aware cache keys: EmbeddingRole { Query | Passage | Generic } distinguished via Blake3 hash tag injection. Identical text in different roles no longer collides in CachedEmbeddingService.
Per-model pooling: BertPooling { Mean | CLS } enum on BertModel. BGE-{small,base,large}-en-v1.5 → CLS; E5/MiniLM/paraphrase → Mean; Qwen → LastToken (already routed via QwenModel). L2 normalization stays post-pool for all paths.

Permanent HF parity regression gate

scripts/gen_embed_parity_goldens.py — one-shot HuggingFace golden generator (uv run --with transformers --with torch --with numpy --with sentencepiece).
crates/embed/tests/embed_parity_vs_hf.rs — Rust integration test that loads committed JSON fixtures + computes lattice embeddings + asserts cosine + max-abs-diff against HF reference.
Committed fixtures for 5 models × 5 inputs in crates/embed/tests/fixtures/embed_parity_v1/.
Wired into make ci via scripts/ci.sh.

Model	Min cosine vs HF (5 inputs)
`BAAI/bge-small-en-v1.5`	0.999868
`intfloat/multilingual-e5-small`	0.999937
`sentence-transformers/all-MiniLM-L6-v2`	0.999899
`sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`	0.999875
`Qwen/Qwen3-Embedding-0.6B`	`#[ignore]` — forward-pass divergence under investigation, see #103

Bench-compare

No regression in untouched SIMD/forward kernels. The show touched only tokenizer code + BertPooling enum (single match-branch in BertModel::encode); SIMD code unchanged. Bench-compare against v0.2.4 was clean once measurement artifacts from cross-worktree builds were isolated.

Crates Published

lattice-inference 0.2.5
lattice-embed 0.2.5
lattice-fann 0.2.5
lattice-tune 0.2.5
lattice-transport 0.2.5

Follow-ups (deferred to subsequent releases)

#102 — SIMD throughput (quantization amortization, simsimd_comparison bench restore, NEON normalize target)
#103 — Qwen3-Embedding forward-pass divergence (0.948 cosine on whitespace input, 0.989 on tokens-match input)
#116 — Codex-review follow-ups from the show stack (edge cases the parity gate doesn't cover: AddedToken structured-field enforcement, Qwen \s+(?!\S) whitespace regex, WordPiece NFD combining-mark stripping, SentencePiece literal-▁ distinction)

Diff Stats

Across 14 commits merged via #104 + 1 commit via #117:

crates/inference/src/tokenizer/{bpe,common,sentencepiece,wordpiece}.rs — tokenizer fixes
crates/inference/src/{lib.rs, model/bert.rs, pool.rs} — BertPooling enum + branch
crates/embed/src/{cache.rs, lib.rs, model.rs} — EmbeddingRole + role-aware cache keys
crates/embed/src/service/{cached.rs, mod.rs, native.rs, tests.rs} — service-level role threading + tests
crates/embed/tests/{embed_parity_vs_hf.rs, tokenizer_parity_e2e.rs, fixtures/embed_parity_v1/*.json} — regression gate
crates/inference/tests/audit_tokenizer_parity.rs — 32 audit cases (28 original + 4 added in fixes)
scripts/gen_embed_parity_goldens.py, scripts/ci.sh — generator + CI wiring

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.2.5 — Embed Quality + Tokenizer Parity + HF Regression Gate

Choose a tag to compare

Sorry, something went wrong.