Releases: ssmall256/natten-mps
v0.3.2
natten-mps v0.3.2
Performance
- Rewrite AV forward kernels with K-outer/D-inner loop order and SIMD accumulation — ~2-3x speedup on split-path AV operations
- Rewrite d_query backward kernels using CSR inverse-map variants for significantly faster gradient computation
- Support asymmetric head dimensions (dim_q != dim_v) in split QK/AV paths
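The notes above do not describe how the CSR inverse map is laid out. As a rough illustration of the general idea only, the sketch below builds one in plain Python for a 1D, non-causal window with dilation 1 that is clamped at the sequence borders (an assumption). For each key position it lists the queries whose window contains that key, stored as CSR offsets plus a flat index array. The function names are hypothetical and are not part of natten-mps.

```python
def window_start(i, length, kernel_size):
    """First key index of query i's window (assumed: clamped at the borders)."""
    return min(max(i - kernel_size // 2, 0), length - kernel_size)


def build_csr_inverse_map(length, kernel_size):
    """Return (offsets, indices): queries attending to key j are
    indices[offsets[j]:offsets[j + 1]]."""
    if kernel_size > length:
        raise ValueError("kernel_size must not exceed length")

    # Forward map: query i -> the keys in its window.
    forward = []
    for i in range(length):
        start = window_start(i, length, kernel_size)
        forward.append(range(start, start + kernel_size))

    # Count how many queries touch each key, then prefix-sum into offsets.
    counts = [0] * length
    for keys in forward:
        for j in keys:
            counts[j] += 1
    offsets = [0]
    for c in counts:
        offsets.append(offsets[-1] + c)

    # Scatter query indices into each key's slot range.
    indices = [0] * offsets[-1]
    cursor = offsets[:-1]  # slicing copies, so offsets is not modified
    for i, keys in enumerate(forward):
        for j in keys:
            indices[cursor[j]] = i
            cursor[j] += 1
    return offsets, indices


offsets, indices = build_csr_inverse_map(length=5, kernel_size=3)
```

With such a map, a backward pass can loop over one output position and gather its contributions directly instead of scattering with atomics; whether the library's kernels use it in exactly this way is not stated in the notes.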
Cleanup
- Remove 12 dead backward kernels replaced by the new CSR inverse-map variants
- Skip Metal-dependent tests when Metal backend is unavailable (CI fix)
Install
pip install natten-mps==0.3.2
v0.3.0
natten-mps v0.3.0
Neighborhood Attention for Apple Silicon — PyTorch MPS backend.
Highlights
- 1D, 2D, and 3D neighborhood attention with fused and split-QK/AV paths
- Metal compute shaders via torch.mps.compile_shader, with no native extension build step
- Variable-length (varlen) attention support
- GQA / MQA, causal masking, dilation, additional KV
- Automatic backend selection (metal → pure fallback)
- CPU + MPS support; float16 / bfloat16 / float32
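For readers new to the operation, here is a minimal pure-Python reference of 1D neighborhood attention. It is a sketch under stated assumptions, not the natten-mps API or its Metal kernels: a single head, no causal masking, dilation 1, scaling by 1/sqrt(dim_q), and a window that is shifted inward at the sequence borders so that every query sees exactly kernel_size keys. The function name na1d_reference is hypothetical.

```python
import math


def na1d_reference(q, k, v, kernel_size):
    """q, k: lists of L vectors of size dim_q; v: list of L vectors of size dim_v.

    Returns L output vectors of size dim_v. dim_v may differ from dim_q.
    """
    length = len(q)
    if kernel_size > length:
        raise ValueError("kernel_size must not exceed sequence length")
    scale = 1.0 / math.sqrt(len(q[0]))
    dim_v = len(v[0])
    out = []
    for i in range(length):
        # Assumed border handling: clamp the window so it stays in bounds.
        start = min(max(i - kernel_size // 2, 0), length - kernel_size)
        window = range(start, start + kernel_size)

        # QK step: scaled dot products against keys in the window only.
        scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in window]

        # Numerically stable softmax over the window.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)

        # AV step: weighted sum of the values in the window.
        row = [0.0] * dim_v
        for w, j in zip(weights, window):
            for d in range(dim_v):
                row[d] += (w / total) * v[j][d]
        out.append(row)
    return out


# Zero queries give uniform weights, so each output is the window mean of v.
q = [[0.0, 0.0] for _ in range(4)]
k = [[1.0, 2.0] for _ in range(4)]
v = [[0.0], [3.0], [6.0], [9.0]]
result = na1d_reference(q, k, v, kernel_size=3)
```

The scores and the weighted sum are computed as two separate steps here, which mirrors the split QK/AV structure in spirit; a fused path would combine them in one pass.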
Install
pip install natten-mps
Requires PyTorch ≥ 2.8.0 and Apple Silicon.
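One possible way to check these requirements before installing is sketched below. The helper names meets_minimum and environment_ok are hypothetical; only torch.__version__ and torch.backends.mps.is_available() come from PyTorch itself. The version parser reads just the leading numeric components, which is an assumption about how strict the check needs to be.

```python
import re


def meets_minimum(version, minimum=(2, 8, 0)):
    """True if the leading X.Y[.Z] of a version string is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        return False
    parts = tuple(int(p) if p is not None else 0 for p in match.groups())
    return parts >= minimum


def environment_ok():
    """True if PyTorch is importable, recent enough, and MPS is available."""
    try:
        import torch
    except ImportError:
        return False
    return bool(meets_minimum(torch.__version__) and torch.backends.mps.is_available())


if __name__ == "__main__":
    print("ready for natten-mps:", environment_ok())
```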