Releases: ssmall256/natten-mps

v0.3.2

11 Jun 18:31

natten-mps v0.3.2

Performance

  • Rewrite AV forward kernels with K-outer/D-inner loop order and SIMD accumulation — ~2-3x speedup on split-path AV operations
  • Rewrite d_query backward kernels using CSR inverse-map variants for significantly faster gradient computation
  • Support asymmetric head dimensions (dim_q != dim_v) in split QK/AV paths
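For readers new to the split path, the pure-Python sketch below shows what the two stages compute in 1D and where a K-outer/D-inner loop order sits in the AV stage. It is a reference illustration under stated assumptions, not the Metal kernels. It covers a single head with no batch, dilation, or causal mask, and uses a window of `kernel_size` keys per query that is clamped at the borders. All function names are hypothetical. Queries and keys share `dim_q` while values carry `dim_v`, so the output takes the value width.

```python
# Hypothetical pure-Python reference for a 1D split QK/AV path.
# Not natten-mps code: no batch, heads, dilation, or causal masking.
import math


def neighbors_1d(length, kernel_size):
    """Key indices for each query: kernel_size keys, window clamped at the borders."""
    half = kernel_size // 2
    nbrs = []
    for i in range(length):
        start = min(max(i - half, 0), length - kernel_size)
        nbrs.append(list(range(start, start + kernel_size)))
    return nbrs


def qk_forward(q, k, nbrs):
    """scores[i][n] = <q[i], k[nbrs[i][n]]> / sqrt(dim_q); q and k share dim_q."""
    scale = 1.0 / math.sqrt(len(q[0]))
    return [[scale * sum(a * b for a, b in zip(q[i], k[j])) for j in keys]
            for i, keys in enumerate(nbrs)]


def softmax_rows(scores):
    """Numerically stable softmax over each query's neighborhood."""
    out = []
    for row in scores:
        m = max(row)
        e = [math.exp(s - m) for s in row]
        total = sum(e)
        out.append([x / total for x in e])
    return out


def av_forward(attn, v, nbrs):
    """out[i][d] = sum_n attn[i][n] * v[nbrs[i][n]][d]; v and out use dim_v."""
    dim_v = len(v[0])
    out = []
    for i, keys in enumerate(nbrs):
        acc = [0.0] * dim_v             # per-query accumulator of width dim_v
        for n, j in enumerate(keys):    # K outer: visit each neighbor once
            w = attn[i][n]
            v_row = v[j]
            for d in range(dim_v):      # D inner: contiguous sweep over the head dim
                acc[d] += w * v_row[d]
        out.append(acc)
    return out


length, kernel_size, dim_q, dim_v = 6, 3, 4, 2   # dim_q != dim_v
q = [[0.1 * (i + d) for d in range(dim_q)] for i in range(length)]
k = [[0.05 * (i - d) for d in range(dim_q)] for i in range(length)]
v = [[float(i * dim_v + d) for d in range(dim_v)] for i in range(length)]
nbrs = neighbors_1d(length, kernel_size)
attn = softmax_rows(qk_forward(q, k, nbrs))
out = av_forward(attn, v, nbrs)
print(len(out), len(out[0]))  # 6 2
```

With the K loop outside, each neighbor's weight and value row are read once and swept across the head dimension, the kind of contiguous per-neighbor access that vectorized accumulation favors. The result matches a D-outer loop up to floating-point rounding; only the traversal order differs.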

Cleanup

  • Remove 12 dead backward kernels replaced by the new CSR inverse-map variants
  • Skip Metal-dependent tests when Metal backend is unavailable (CI fix)
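The notes above do not describe how the CSR inverse maps are laid out. The sketch below is a rough illustration of the general idea, not a reconstruction of the natten-mps kernels. It inverts 1D query-to-key neighbor lists into CSR form, using the hypothetical names `row_ptr`, `query_idx`, and `slot_idx`. It then uses that map for a gather-style value gradient. The value gradient was chosen only because it is the simplest gradient that collects contributions from many queries. The release applies its inverse-map variants to its own backward kernels.

```python
# Hypothetical illustration of a CSR inverse neighbor map; not natten-mps code.


def neighbors_1d(length, kernel_size):
    """Key indices for each query: kernel_size keys, window clamped at the borders."""
    half = kernel_size // 2
    nbrs = []
    for i in range(length):
        start = min(max(i - half, 0), length - kernel_size)
        nbrs.append(list(range(start, start + kernel_size)))
    return nbrs


def build_inverse_csr(nbrs, num_keys):
    """Invert query->key lists into key->(query, slot) lists stored as CSR.

    Entries for key j live at positions row_ptr[j] .. row_ptr[j + 1] - 1
    of query_idx and slot_idx.
    """
    counts = [0] * num_keys
    for keys in nbrs:
        for j in keys:
            counts[j] += 1
    row_ptr = [0]
    for c in counts:
        row_ptr.append(row_ptr[-1] + c)    # prefix sum of per-key counts
    query_idx = [0] * row_ptr[-1]
    slot_idx = [0] * row_ptr[-1]
    cursor = row_ptr[:-1]                  # copy: next free position per key
    for i, keys in enumerate(nbrs):
        for n, j in enumerate(keys):
            query_idx[cursor[j]] = i
            slot_idx[cursor[j]] = n
            cursor[j] += 1
    return row_ptr, query_idx, slot_idx


def grad_value(attn, d_out, row_ptr, query_idx, slot_idx):
    """dV[j][d] = sum over queries i attending to key j of attn[i][n] * dOut[i][d].

    Each row j is produced by a single loop iteration, so one GPU thread
    per key could accumulate it without atomic adds.
    """
    dim_v = len(d_out[0])
    dv = []
    for j in range(len(row_ptr) - 1):
        acc = [0.0] * dim_v
        for e in range(row_ptr[j], row_ptr[j + 1]):
            w = attn[query_idx[e]][slot_idx[e]]
            g = d_out[query_idx[e]]
            for d in range(dim_v):
                acc[d] += w * g[d]
        dv.append(acc)
    return dv


length, kernel_size = 5, 3
nbrs = neighbors_1d(length, kernel_size)
row_ptr, query_idx, slot_idx = build_inverse_csr(nbrs, length)
print(row_ptr)                                  # [0, 2, 5, 10, 13, 15]
print(query_idx[row_ptr[2]:row_ptr[3]])         # [0, 1, 2, 3, 4]: every query sees key 2

attn = [[0.2, 0.3, 0.5] for _ in range(length)]   # made-up weights per query slot
d_out = [[float(i), 1.0] for i in range(length)]  # made-up upstream gradient, dim_v = 2
dv = grad_value(attn, d_out, row_ptr, query_idx, slot_idx)
```

The trade-off is an extra precomputed index structure in exchange for turning a scatter, where many queries write into the same key row, into a gather where each key row has a single writer.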

Install

pip install natten-mps==0.3.2

v0.3.0

27 Feb 14:45

natten-mps v0.3.0

Neighborhood Attention for Apple Silicon — PyTorch MPS backend.

Highlights

  • 1D, 2D, and 3D neighborhood attention with fused and split-QK/AV paths
  • Metal compute shaders via torch.mps.compile_shader — no native extension build step
  • Variable-length (varlen) attention support
  • GQA / MQA, causal masking, dilation, additional KV
  • Automatic backend selection (metal → pure fallback)
  • CPU + MPS support; float16 / bfloat16 / float32
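As a rough guide to how dilation and causal masking change which keys a query sees, the sketch below computes 1D windows under NATTEN-style semantics. The semantics are assumed here: dilation splits the sequence into interleaved groups, and each query attends only within its own group. A non-causal window is shifted inward at the borders. A causal window ends at the query and shrinks near the start. `window_1d` is a hypothetical helper, and the exact border handling in natten-mps may differ.

```python
# Hypothetical sketch of NATTEN-style 1D neighborhoods with dilation and
# causal masking; not natten-mps code.


def window_1d(i, length, kernel_size, dilation=1, is_causal=False):
    """Key indices attended by query i.

    Positions with the same index mod `dilation` form one group; the window
    is computed in group-local coordinates and mapped back to global indices.
    """
    group = i % dilation
    pos = i // dilation                                  # position inside the group
    group_len = (length - group + dilation - 1) // dilation
    if is_causal:
        start = max(pos - kernel_size + 1, 0)            # window ends at the query
        end = pos + 1
    else:
        if group_len < kernel_size:
            raise ValueError("each dilation group needs at least kernel_size tokens")
        start = min(max(pos - kernel_size // 2, 0), group_len - kernel_size)
        end = start + kernel_size                        # always kernel_size keys
    return [p * dilation + group for p in range(start, end)]


length, kernel_size = 8, 3
for i in (0, 5, 6):
    print(i, window_1d(i, length, kernel_size, dilation=2))
# 0 [0, 2, 4]
# 5 [3, 5, 7]
# 6 [2, 4, 6]
print(window_1d(5, length, kernel_size, dilation=2, is_causal=True))  # [1, 3, 5]
```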

Install

pip install natten-mps

Requires PyTorch ≥ 2.8.0 and Apple Silicon.
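One way to check these requirements before installing is the minimal sketch below. It assumes a standard CPython on macOS. The helper names are hypothetical, and nothing here is part of natten-mps. It relies only on `platform.system()`, `platform.machine()`, `torch.__version__`, and `torch.backends.mps.is_available()`. Pre-release builds such as `2.8.0.dev…` are treated as 2.8.0.

```python
# Hypothetical requirements check; not part of natten-mps.
import platform

MIN_TORCH = (2, 8, 0)


def parse_version(version):
    """'2.8.0', '2.8.0+cpu' or '2.9.0.dev20250601' -> (major, minor, patch)."""
    parts = []
    for piece in version.split("+")[0].split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_apple_silicon():
    """True on macOS running natively on an arm64 CPU."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def check_requirements():
    """Return a list of problems; an empty list means the requirements look met."""
    problems = []
    if not is_apple_silicon():
        problems.append("not running natively on Apple Silicon (macOS arm64)")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if parse_version(torch.__version__) < MIN_TORCH:
        problems.append(f"PyTorch {torch.__version__} is older than 2.8.0")
    if not torch.backends.mps.is_available():
        problems.append("the MPS backend is not available")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("warning:", problem)
```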