v0.1.2

ohdearquant released this 14 May 23:18

fd8971d

New Features

Gated attention (G1 SDPA-Output) — SIMD-accelerated output gating for full-attention layers (ADR-040) — #4
Differential attention — noise-cancelling dual-softmax attention with learnable λ (ADR-041) — #5
Native sparse attention — block-sparse + sliding-window attention with configurable patterns (ADR-042) — #6
LoRA serving CI tests — spy/delta hook tests verify all 12 adapted projections per layer fire correctly (ADR-043) — #7
Qwen3.5-0.8B support — config preset, fixture test, default model for both generation binaries — #7
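
As a rough illustration of the differential attention entry above, the sketch below computes one query position as the difference of two softmax attention maps, with the second map weighted by λ. It is not taken from lattice-inference: the function names, the single-position layout, and the use of a plain scalar for λ are assumptions made for the example, and the crate's actual implementation (ADR-041) may differ.

```rust
/// Numerically stable softmax over one row of attention scores.
fn softmax(row: &[f32]) -> Vec<f32> {
    let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = row.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Scaled dot-product attention weights of one query against all keys.
fn attn_weights(q: &[f32], keys: &[Vec<f32>]) -> Vec<f32> {
    let scale = 1.0 / (q.len() as f32).sqrt();
    let scores: Vec<f32> = keys
        .iter()
        .map(|k| q.iter().zip(k).map(|(a, b)| a * b).sum::<f32>() * scale)
        .collect();
    softmax(&scores)
}

/// Differential attention for a single query position (hypothetical helper):
/// out = (softmax(q1 . K1) - lambda * softmax(q2 . K2)) . V
/// `lambda` is passed in as a plain scalar here; in a trained model it
/// would be a learned parameter.
fn differential_attention(
    q1: &[f32],
    q2: &[f32],
    k1: &[Vec<f32>],
    k2: &[Vec<f32>],
    values: &[Vec<f32>],
    lambda: f32,
) -> Vec<f32> {
    let a1 = attn_weights(q1, k1);
    let a2 = attn_weights(q2, k2);
    let dim = values.first().map_or(0, |v| v.len());
    let mut out = vec![0.0_f32; dim];
    for (j, v) in values.iter().enumerate() {
        // Subtracting the second map cancels weight that both maps share.
        let w = a1[j] - lambda * a2[j];
        for (o, x) in out.iter_mut().zip(v) {
            *o += w * x;
        }
    }
    out
}
```

With λ = 0 the function reduces to ordinary softmax attention. When both query/key pairs are identical and λ = 1, the two maps cancel and the output is all zeros, which is the noise-cancelling behaviour in its most extreme form.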

Fixes

partial_rotary_factor parsing — now explicitly extracted from nested rope_parameters in HF config.json instead of relying on a serde default coincidence — #7
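
The difference between explicit extraction and an implicit default can be sketched as follows. This is a minimal stand-in rather than the crate's code: the struct names, the `rotary_dim` helper, and the fallback of 1.0 (full rotary) are assumptions for illustration, and the structs are built by hand here instead of being deserialized from config.json.

```rust
/// Hypothetical mirror of the nested `rope_parameters` object in an
/// HF config.json.
#[derive(Debug, Clone, Default)]
struct RopeParameters {
    partial_rotary_factor: Option<f32>,
}

/// Hypothetical subset of the model config relevant to rotary embeddings.
#[derive(Debug, Clone, Default)]
struct ModelConfig {
    head_dim: usize,
    rope_parameters: Option<RopeParameters>,
}

/// Read the factor explicitly from the nested object. The 1.0 fallback is
/// an assumption of this sketch and applies only when the field is absent.
fn partial_rotary_factor(cfg: &ModelConfig) -> f32 {
    cfg.rope_parameters
        .as_ref()
        .and_then(|r| r.partial_rotary_factor)
        .unwrap_or(1.0)
}

/// Number of head dimensions that receive rotary embeddings, rounded down
/// to an even count because RoPE rotates dimensions in pairs.
fn rotary_dim(cfg: &ModelConfig) -> usize {
    let raw = (cfg.head_dim as f32 * partial_rotary_factor(cfg)) as usize;
    raw - raw % 2
}
```

Keeping the nested lookup in one function makes the fallback visible at the call site, so a config that omits the field and one that sets it are handled by the same, auditable path.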

Internal

ADR INDEX updated through ADR-043
The generate_lora and qwen35_generate binaries now accept the --model-dir, --model, and --temperature flags

Crates

All 5 crates at 0.1.2: lattice-inference, lattice-embed, lattice-fann, lattice-tune, lattice-transport