v0.3.2

@ssmall256 ssmall256 released this 11 Jun 18:31
· 1 commit to main since this release

natten-mps v0.3.2

Performance

  • Rewrite AV forward kernels with K-outer/D-inner loop order and SIMD accumulation — ~2-3x speedup on split-path AV operations
  • Rewrite d_query backward kernels using CSR inverse-map variants for significantly faster gradient computation
  • Support asymmetric head dimensions (dim_q != dim_v) in split QK/AV paths
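
For readers unfamiliar with the loop order named above, here is a minimal pure-Python sketch of a 1D neighborhood AV forward pass with the neighbor index K in the outer loop and the value dimension D in the inner loop. It is an illustration only, not the Metal kernel from this release: the function names, the clamped-window rule, and the list-of-lists layout are all assumptions, and the `acc` list merely stands in for SIMD accumulators. Because the output width is taken from `value` alone, the same loop works when dim_v differs from dim_q.

```python
def neighbor_start(i, length, kernel_size):
    """Start of the clamped 1D window for query i (assumed windowing rule)."""
    half = kernel_size // 2
    return min(max(i - half, 0), length - kernel_size)


def av_forward(attn, value, kernel_size):
    """Compute out[i][d] = sum_k attn[i][k] * value[start(i) + k][d].

    attn:  [length][kernel_size] attention weights
    value: [length][dim_v] value rows; dim_v need not match dim_q
    Assumes length >= kernel_size.
    """
    length = len(value)
    dim_v = len(value[0])
    out = []
    for i in range(length):
        start = neighbor_start(i, length, kernel_size)
        acc = [0.0] * dim_v  # stand-in for per-lane SIMD accumulators
        for k in range(kernel_size):  # K-outer: each weight is read once
            w = attn[i][k]
            row = value[start + k]
            for d in range(dim_v):  # D-inner: contiguous walk over one value row
                acc[d] += w * row[d]
        out.append(acc)
    return out
```

With K outermost, each attention weight is loaded once and reused across the whole value row, and the inner loop reads contiguous elements, which is the access pattern that vectorizes naturally.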

Cleanup

  • Remove 12 dead backward kernels replaced by the new CSR inverse-map variants
  • Skip Metal-dependent tests when Metal backend is unavailable (CI fix)
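
The skip pattern behind the CI fix can be sketched as follows. This uses the standard library's `unittest` so it runs anywhere; the project's actual test runner and availability check are not stated in these notes. `metal_available()` is a hypothetical probe that simply treats any macOS host as Metal-capable, where a real check would query the backend itself.

```python
import platform
import unittest


def metal_available():
    """Hypothetical probe: treats any macOS host as Metal-capable.

    A real check would ask the actual backend instead of the OS name.
    """
    return platform.system() == "Darwin"


class MetalKernelTests(unittest.TestCase):
    @unittest.skipUnless(metal_available(), "Metal backend unavailable")
    def test_runs_only_with_metal(self):
        # Placeholder for a test that would launch a Metal kernel.
        self.assertTrue(metal_available())

    def test_runs_everywhere(self):
        # Backend-independent tests keep running on every CI host.
        self.assertEqual(1 + 1, 2)


def run_suite():
    """Run the example tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MetalKernelTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

On a host without Metal, the decorated test is reported as skipped rather than failed, so the run still counts as successful.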

Install

pip install natten-mps==0.3.2
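
To confirm which version ended up installed, one option is to read the distribution metadata with the standard library. This is a small sketch; `installed_version` is a helper name introduced here, not part of natten-mps.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Shows the pinned version after a successful install, or None if not installed.
    print(installed_version("natten-mps"))
```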