natten-mps v0.3.2
Performance
- Rewrite AV forward kernels with K-outer/D-inner loop order and SIMD accumulation — ~2-3x speedup on split-path AV operations
- Rewrite d_query backward kernels using CSR inverse-map variants for significantly faster gradient computation
- Support asymmetric head dimensions (dim_q != dim_v) in split QK/AV paths
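
For readers unfamiliar with the loop order named above, here is a minimal pure-Python sketch of an AV forward pass with the neighbor index K in the outer loop and the channel index D in the inner loop. It is only an illustration, not the Metal kernel: the function name `av_forward` and the `neighbors` index table are hypothetical. The value dimension is read from `value` alone, so it need not match the query/key dimension.

```python
def av_forward(attn, value, neighbors):
    """Compute out[i][d] = sum_k attn[i][k] * value[neighbors[i][k]][d].

    attn:      per-query attention weights, shape [L][K]
    value:     value rows, shape [N][dim_v]
    neighbors: hypothetical index table, shape [L][K], mapping each
               (query, slot) pair to a row of `value`
    """
    dim_v = len(value[0])  # taken from V only; independent of dim_q
    out = []
    for i, weights in enumerate(attn):
        acc = [0.0] * dim_v  # one accumulator per output channel
        for k, weight in enumerate(weights):  # K-outer: one neighbor at a time
            v_row = value[neighbors[i][k]]
            for d in range(dim_v):  # D-inner: contiguous walk over channels
                acc[d] += weight * v_row[d]
        out.append(acc)
    return out


attn = [[0.5, 0.5], [1.0, 0.0]]
neighbors = [[0, 1], [1, 2]]
value = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [5.0, 6.0, 7.0]]
result = av_forward(attn, value, neighbors)
```

With D innermost, each weight is loaded once and applied across a contiguous run of channels, which is the access pattern that lends itself to vectorized accumulation.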
Cleanup
- Remove 12 dead backward kernels replaced by the new CSR inverse-map variants
- Skip Metal-dependent tests when Metal backend is unavailable (CI fix)
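
One way to express this kind of skip is sketched below. The project's actual check is not shown in these notes; the sketch assumes availability is detected through PyTorch's `torch.backends.mps.is_available()`, and the helper `metal_available` and the test class are hypothetical names.

```python
import unittest


def metal_available():
    """Return True only if PyTorch imports and reports a usable MPS device.

    Assumption: Metal availability is probed through PyTorch's MPS backend.
    """
    try:
        import torch
        return bool(torch.backends.mps.is_available())
    except (ImportError, AttributeError):
        # No PyTorch, or a PyTorch build without the MPS backend module.
        return False


@unittest.skipUnless(metal_available(), "Metal (MPS) backend unavailable")
class MetalKernelTests(unittest.TestCase):
    """Hypothetical test class; skipped as a whole on machines without Metal."""

    def test_backend_reports_available(self):
        self.assertTrue(metal_available())
```

On a CI runner without Metal, the class is reported as skipped instead of failing.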
Install
pip install natten-mps==0.3.2