v0.1.2
New Features
- Gated attention (G1 SDPA-Output) — SIMD-accelerated output gating for full-attention layers (ADR-040) — #4
- Differential attention — noise-cancelling dual-softmax attention with learnable λ (ADR-041) — #5
- Native sparse attention — block-sparse + sliding-window attention with configurable patterns (ADR-042) — #6
- LoRA serving CI tests — spy/delta hook tests verify all 12 adapted projections per layer fire correctly (ADR-043) — #7
- Qwen3.5-0.8B support — config preset, fixture test, default model for both generation binaries — #7
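
For readers unfamiliar with the dual-softmax idea behind differential attention, here is a minimal sketch for a single query. It is not taken from lattice-inference: the function names are hypothetical, the scores are assumed to be already scaled query-key dot products, and `lambda` is passed as a plain scalar rather than a learned parameter. The second softmax map is scaled by λ and subtracted from the first, so attention mass that both maps assign to the same positions tends to cancel.

```rust
/// Numerically stable softmax over one row of attention scores.
fn softmax(scores: &[f32]) -> Vec<f32> {
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Hypothetical helper: differential attention weights for one query,
/// computed as softmax(s1) - lambda * softmax(s2).
fn diff_attention_weights(s1: &[f32], s2: &[f32], lambda: f32) -> Vec<f32> {
    assert_eq!(s1.len(), s2.len(), "both score rows must cover the same keys");
    let a1 = softmax(s1);
    let a2 = softmax(s2);
    a1.iter().zip(a2.iter()).map(|(x, y)| x - lambda * y).collect()
}

fn main() {
    // Illustrative scores only; real values come from two query/key projections.
    let s1 = [2.0, 0.5, 0.5];
    let s2 = [0.0, 0.5, 0.5];
    let weights = diff_attention_weights(&s1, &s2, 0.5);
    println!("{:?}", weights);
}
```

Because each softmax row sums to 1, the resulting weights sum to 1 - λ, and individual weights can be negative. Any normalization applied afterwards is outside the scope of this sketch.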
Fixes
- `partial_rotary_factor` parsing — now explicitly extracted from nested `rope_parameters` in HF config.json instead of relying on a serde default coincidence — #7
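
The difference between an explicit lookup and an accidental default can be sketched as follows. The struct and function names are hypothetical and do not reflect the crate's actual types. Plain `Option` fields stand in for the deserialized config.json so the example needs no external crates, and the fallback of 1.0 (rotary applied to the full head dimension) is an assumption, not something stated in these notes.

```rust
/// Hypothetical mirror of the nested `rope_parameters` object in config.json.
struct RopeParameters {
    partial_rotary_factor: Option<f32>,
}

/// Hypothetical top-level config; only the field relevant here is shown.
struct ModelConfig {
    rope_parameters: Option<RopeParameters>,
}

/// Look the factor up explicitly in the nested object.
/// The 1.0 fallback is an assumption made for this sketch.
fn partial_rotary_factor(cfg: &ModelConfig) -> f32 {
    cfg.rope_parameters
        .as_ref()
        .and_then(|rope| rope.partial_rotary_factor)
        .unwrap_or(1.0)
}

fn main() {
    // 0.25 is an illustrative value, not read from any real model config.
    let cfg = ModelConfig {
        rope_parameters: Some(RopeParameters {
            partial_rotary_factor: Some(0.25),
        }),
    };
    println!("partial_rotary_factor = {}", partial_rotary_factor(&cfg));
}
```

Making the lookup path and the fallback visible in one place means a missing or relocated key changes behavior in an obvious spot rather than silently picking up whatever default the deserializer supplies.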
Internal
- ADR INDEX updated through ADR-043
- `generate_lora` and `qwen35_generate` binaries accept `--model-dir`, `--model`, `--temperature` flags
Crates
All 5 crates at 0.1.2: lattice-inference, lattice-embed, lattice-fann, lattice-tune, lattice-transport