v0.1.2

@ohdearquant released this 14 May 23:18
fd8971d

New Features

  • Gated attention (G1 SDPA-Output) — SIMD-accelerated output gating for full-attention layers (ADR-040) — #4
  • Differential attention — noise-cancelling dual-softmax attention with learnable λ (ADR-041) — #5
  • Native sparse attention — block-sparse + sliding-window attention with configurable patterns (ADR-042) — #6
  • LoRA serving CI tests — spy/delta hook tests verify all 12 adapted projections per layer fire correctly (ADR-043) — #7
  • Qwen3.5-0.8B support — config preset, fixture test, default model for both generation binaries — #7
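For readers new to G1 gating, here is a minimal scalar sketch. It assumes that the G1 position means an elementwise sigmoid gate applied to the SDPA output before the output projection, with gate logits computed from the layer's input hidden state through a learned matrix. `gate_sdpa_output` and `w_gate` are hypothetical names, not the lattice-inference API, and the SIMD kernel itself is not reproduced.

```rust
/// Logistic sigmoid.
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Hypothetical G1 output gate (illustration only, not the crate's API):
/// out[j] = attn_out[j] * sigmoid(dot(hidden, column j of w_gate)).
/// `hidden` has length d_model; `w_gate` is d_model x d_out, row-major;
/// `attn_out` is the SDPA output (length d_out) before the output projection.
fn gate_sdpa_output(hidden: &[f32], w_gate: &[f32], attn_out: &mut [f32]) {
    let d_model = hidden.len();
    let d_out = attn_out.len();
    assert_eq!(w_gate.len(), d_model * d_out, "w_gate shape mismatch");
    for j in 0..d_out {
        // Gate logit for output channel j.
        let logit: f32 = (0..d_model).map(|i| hidden[i] * w_gate[i * d_out + j]).sum();
        attn_out[j] *= sigmoid(logit);
    }
}
```

With an all-zero `w_gate`, every gate is sigmoid(0) = 0.5 and the output is halved; large positive logits pass values through almost unchanged, and large negative logits suppress them.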
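Differential attention computes two softmax attention maps and subtracts the second, scaled by λ, from the first before multiplying by V. Attention noise common to both maps therefore cancels. The sketch below is a single-head, non-causal illustration with λ passed as a plain scalar. The function names are hypothetical, and the sketch omits the λ reparameterization and per-head normalization used in the Differential Transformer paper.

```rust
/// Numerically stable softmax over one row.
fn softmax(row: &[f32]) -> Vec<f32> {
    let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = row.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Row-wise softmax(q k^T / sqrt(d)) for q: n x d and k: m x d.
fn attn_weights(q: &[Vec<f32>], k: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let scale = 1.0 / (q[0].len() as f32).sqrt();
    q.iter()
        .map(|qi| {
            let scores: Vec<f32> = k
                .iter()
                .map(|kj| qi.iter().zip(kj).map(|(a, b)| a * b).sum::<f32>() * scale)
                .collect();
            softmax(&scores)
        })
        .collect()
}

/// Hypothetical single-head differential attention (non-causal):
/// (softmax(Q1 K1^T / sqrt(d)) - lambda * softmax(Q2 K2^T / sqrt(d))) V.
fn diff_attention(
    q1: &[Vec<f32>],
    k1: &[Vec<f32>],
    q2: &[Vec<f32>],
    k2: &[Vec<f32>],
    v: &[Vec<f32>],
    lambda: f32,
) -> Vec<Vec<f32>> {
    let a1 = attn_weights(q1, k1);
    let a2 = attn_weights(q2, k2);
    let dv = v[0].len();
    a1.iter()
        .zip(&a2)
        .map(|(r1, r2)| {
            let mut out = vec![0.0f32; dv];
            for (j, vj) in v.iter().enumerate() {
                // Differential weight for key j.
                let w = r1[j] - lambda * r2[j];
                for (o, x) in out.iter_mut().zip(vj) {
                    *o += w * x;
                }
            }
            out
        })
        .collect()
}
```

Setting λ = 0 recovers ordinary softmax attention. With identical branches and λ = 1, the two maps cancel exactly, which is the extreme case of the noise cancellation the feature relies on.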
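One way to express a configurable block-sparse plus sliding-window pattern is as a boolean mask predicate. This sketch assumes a static pattern: a causal sliding window plus a fixed set of always-visible key blocks. `SparsePattern` and its fields are hypothetical, and the sketch does not model learned, per-query block selection.

```rust
/// Hypothetical static sparsity pattern (illustration only):
/// causal sliding window plus a fixed set of globally visible key blocks.
struct SparsePattern {
    block_size: usize,
    window: usize,
    global_blocks: Vec<usize>,
}

impl SparsePattern {
    /// True if query position `q` may attend to key position `k`.
    fn allows(&self, q: usize, k: usize) -> bool {
        if k > q {
            return false; // causal: no attention to future keys
        }
        if q - k < self.window {
            return true; // inside the sliding window
        }
        // Otherwise visible only if the key's block is marked global.
        self.global_blocks.contains(&(k / self.block_size))
    }

    /// Dense boolean mask for a sequence of length n (row = query).
    fn mask(&self, n: usize) -> Vec<Vec<bool>> {
        (0..n).map(|q| (0..n).map(|k| self.allows(q, k)).collect()).collect()
    }
}
```

A real kernel would skip masked-out blocks entirely rather than materialize the dense `mask`; the dense form is only useful for inspecting or testing a pattern.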
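The spy/delta hook idea can be illustrated on a single LoRA-adapted projection, y = Wx + (α/r)·B(Ax). A counter records each time the adapter path runs, and a test asserts both the call count and the resulting delta. `LoraLinear` and its fields are hypothetical; the release's tests apply the same check to all 12 adapted projections per layer through the crate's own hooks.

```rust
use std::cell::Cell;

/// Hypothetical LoRA-adapted linear layer (illustration only).
/// Row-major matrices: w is d_out x d_in, a is r x d_in, b is d_out x r.
struct LoraLinear {
    w: Vec<f32>,
    a: Vec<f32>,
    b: Vec<f32>,
    d_in: usize,
    d_out: usize,
    r: usize,
    alpha: f32,
    calls: Cell<usize>, // spy: number of times the adapter path ran
}

/// Dense matrix-vector product for a row-major rows x cols matrix.
fn matvec(m: &[f32], x: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    (0..rows)
        .map(|i| (0..cols).map(|j| m[i * cols + j] * x[j]).sum::<f32>())
        .collect()
}

impl LoraLinear {
    /// y = W x + (alpha / r) * B (A x), recording the call in the spy counter.
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        let base = matvec(&self.w, x, self.d_out, self.d_in);
        let down = matvec(&self.a, x, self.r, self.d_in);
        let up = matvec(&self.b, &down, self.d_out, self.r);
        self.calls.set(self.calls.get() + 1);
        let scale = self.alpha / self.r as f32;
        base.iter().zip(&up).map(|(y, d)| y + scale * d).collect()
    }
}
```

Checking the counter catches adapters that are loaded but never invoked, while checking the output against W x alone catches adapters that run but contribute nothing.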

Fixes

  • partial_rotary_factor parsing: the value is now read explicitly from the nested rope_parameters object in the HF config.json instead of relying on a serde default that happened to match (#7)
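The parsing fix can be sketched as explicit resolution over an already-deserialized config, so that a missing key is reported instead of being silently filled by a default. The struct and function names are hypothetical. The top-level fallback for older configs and the choice to return an error are assumptions, and the `head_dim` and factor values in the test are illustrative only.

```rust
/// Hypothetical, already-deserialized view of the relevant config.json fields.
/// `None` means the key was absent from the JSON.
struct RopeParameters {
    partial_rotary_factor: Option<f64>,
}

struct HfTextConfig {
    head_dim: usize,
    partial_rotary_factor: Option<f64>,      // top-level key (assumed fallback)
    rope_parameters: Option<RopeParameters>, // nested object
}

/// Prefer the nested value, fall back to the top-level key, and report an
/// error rather than silently defaulting when neither is present.
fn resolve_partial_rotary_factor(cfg: &HfTextConfig) -> Result<f64, String> {
    cfg.rope_parameters
        .as_ref()
        .and_then(|rp| rp.partial_rotary_factor)
        .or(cfg.partial_rotary_factor)
        .ok_or_else(|| "partial_rotary_factor missing from config".to_string())
}

/// Number of head dimensions that receive the rotary embedding.
fn rotary_dim(cfg: &HfTextConfig) -> Result<usize, String> {
    let f = resolve_partial_rotary_factor(cfg)?;
    if !(f > 0.0 && f <= 1.0) {
        return Err(format!("partial_rotary_factor out of range: {f}"));
    }
    Ok((cfg.head_dim as f64 * f) as usize)
}
```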

Internal

  • ADR INDEX updated through ADR-043
  • generate_lora and qwen35_generate binaries accept --model-dir, --model, --temperature flags

Crates

All 5 crates at 0.1.2: lattice-inference, lattice-embed, lattice-fann, lattice-tune, lattice-transport