ejkernel v0.0.50

@erfanzar released this 01 Jan 16:56
· 115 commits to main since this release

Breaking Changes

  • Renamed attention_sink to softmax_aux across ragged_page_attention_v3 and unified_attention (modules + kernels + tests).
  • Renamed mla_attention to flash_mla in ejkernel.modules / ejkernel.modules.operations.
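
Code that has to run against both older releases and v0.0.50 may want a small shim for these renames. The sketch below is illustrative only: `resolve_renamed` and `rename_kwarg` are hypothetical helpers that are not part of ejkernel, the namespaces are stand-ins for the real modules, and it assumes `attention_sink` was passed as a keyword argument.

```python
from types import SimpleNamespace


def resolve_renamed(module, new_name, old_name):
    """Return module.<new_name> if it exists, else fall back to module.<old_name>."""
    if hasattr(module, new_name):
        return getattr(module, new_name)
    return getattr(module, old_name)


def rename_kwarg(kwargs, old_key, new_key):
    """Return a copy of kwargs with old_key moved to new_key (hypothetical helper)."""
    out = dict(kwargs)
    if old_key in out:
        if new_key in out:
            raise TypeError(f"got both {old_key!r} and {new_key!r}")
        out[new_key] = out.pop(old_key)
    return out


# Stand-ins for an older release and v0.0.50 (NOT the real ejkernel modules).
old_modules = SimpleNamespace(mla_attention=lambda: "old")
new_modules = SimpleNamespace(flash_mla=lambda: "new")

op_old = resolve_renamed(old_modules, "flash_mla", "mla_attention")
op_new = resolve_renamed(new_modules, "flash_mla", "mla_attention")

# Translate the old keyword name to the new one before calling the op.
migrated = rename_kwarg(
    {"attention_sink": [0.5], "other": 1}, "attention_sink", "softmax_aux"
)
```

With the real package, the first argument to `resolve_renamed` would be `ejkernel.modules`; pinning the dependency to a single version avoids the need for a shim altogether.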

Added

  • New paged-attention ops with Triton + XLA backends:
    • chunked_prefill_paged_decode (updates block-tabled KV cache + runs unified paged attention).
    • decode_attention (paged decode attention returning the output plus the log-sum-exp, LSE, of the attention scores).
  • XLA backend support for flash_mla.
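
To make "paged decode attention returning output + LSE" concrete, here is a minimal pure-Python reference for a single query and a single head. It is a sketch under stated assumptions, not the ejkernel kernel: the function name, argument names, and cache layout (`k_pages[page][slot]`, a `block_table` mapping logical blocks to physical pages) are all hypothetical.

```python
import math


def paged_decode_attention(q, k_pages, v_pages, block_table, context_len, scale):
    """Single-query, single-head decode attention over a paged KV cache.

    q:           list[float] of length d
    k_pages:     k_pages[page][slot] is a key vector of length d
    v_pages:     same layout as k_pages, holding value vectors
    block_table: logical block index -> physical page index
    context_len: number of valid cached tokens
    Returns (output, lse), where lse is the log-sum-exp of the scaled scores.
    """
    page_size = len(k_pages[0])
    scores, values = [], []
    for t in range(context_len):
        page = block_table[t // page_size]  # physical page holding token t
        slot = t % page_size                # position of token t inside that page
        k = k_pages[page][slot]
        scores.append(scale * sum(qi * ki for qi, ki in zip(q, k)))
        values.append(v_pages[page][slot])

    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    lse = m + math.log(denom)
    d_v = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) / denom for j in range(d_v)]
    return out, lse


# Three physical pages of size 2; the sequence uses pages 2 and 0, in that order.
k_pages = [[[1.0, 0.0], [0.0, 1.0]], [[9.0, 9.0], [9.0, 9.0]], [[0.0, 1.0], [1.0, 1.0]]]
v_pages = [[[3.0], [0.0]], [[0.0], [0.0]], [[1.0], [2.0]]]
out, lse = paged_decode_attention(
    q=[1.0, 0.0], k_pages=k_pages, v_pages=v_pages,
    block_table=[2, 0], context_len=3, scale=1.0,
)
```

Returning the LSE alongside the output is commonly useful because partial results computed over disjoint slices of the context can then be merged exactly by reweighting each with `exp(lse_i - lse_total)`.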

Changed

  • make_dummy_rpa_inputs now supports padded page tables via total_num_pages for smaller physical caches.
  • Attention-module call signatures now place cfg at the end (chunked_prefill_paged_decode, decode_attention, unified_attention).

Testing

  • Added/expanded Triton↔XLA equivalence tests for chunked_prefill_paged_decode and decode_attention, including shard_map coverage.
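
The idea behind a backend-equivalence test can be sketched in a few lines: run two implementations on identical inputs and compare the outputs within a tolerance. The helper below is a hypothetical illustration, not the project's test code; two softmax variants stand in for the Triton and XLA backends, and the tolerances are arbitrary.

```python
import math


def assert_backends_match(fn_a, fn_b, inputs, rtol=1e-5, atol=1e-6):
    """Run two implementations on the same inputs and compare outputs elementwise."""
    for args in inputs:
        out_a, out_b = fn_a(*args), fn_b(*args)
        assert len(out_a) == len(out_b), "output length mismatch"
        for i, (a, b) in enumerate(zip(out_a, out_b)):
            assert math.isclose(a, b, rel_tol=rtol, abs_tol=atol), (
                f"mismatch at index {i} for inputs {args}: {a} vs {b}"
            )
    return True


def softmax_naive(xs):
    """Stand-in 'backend A': direct exponentiation."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [v / total for v in exps]


def softmax_stable(xs):
    """Stand-in 'backend B': subtracts the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [v / total for v in exps]


cases = [([0.0, 1.0, 2.0],), ([-3.0, 0.5],)]
ok = assert_backends_match(softmax_naive, softmax_stable, cases)
```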

Full Changelog: v0.0.47...v0.0.50