ejkernel v0.0.50

erfanzar released this 01 Jan 16:56

1c37afd

Breaking Changes

Renamed attention_sink to softmax_aux across ragged_page_attention_v3 and unified_attention (modules + kernels + tests).
Renamed mla_attention to flash_mla in ejkernel.modules / ejkernel.modules.operations.
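
For callers that still build argument dictionaries with the old name, a small shim can ease the migration. The following is a minimal sketch, assuming the attention_sink rename applies to a keyword argument; `rename_kwarg` and the `scale` entry are hypothetical and are not part of ejkernel.

```python
def rename_kwarg(kwargs, old, new):
    """Return a copy of kwargs with key `old` renamed to `new`.

    Hypothetical migration helper, not an ejkernel API.
    """
    out = dict(kwargs)
    if old in out:
        if new in out:
            # Refuse to guess which value the caller meant.
            raise TypeError(f"got both {old!r} and {new!r}")
        out[new] = out.pop(old)
    return out


# Placeholder arguments; only the attention_sink -> softmax_aux rename
# comes from the release notes.
legacy_kwargs = {"attention_sink": [0.0, 0.0], "scale": 0.125}
migrated = rename_kwarg(legacy_kwargs, "attention_sink", "softmax_aux")
```

The migrated dictionary can then be passed with `**migrated` to whichever function now expects softmax_aux.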

Added

New paged-attention ops with Triton + XLA backends:
- chunked_prefill_paged_decode (updates block-tabled KV cache + runs unified paged attention).
- decode_attention (paged decode attention returning output + LSE).
XLA backend support for flash_mla.
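
Returning the log-sum-exp (LSE) alongside the output, as decode_attention does, is commonly useful because attention computed over separate slices of the KV cache can be merged exactly afterwards. The sketch below is a pure-Python reference for a single query vector, not ejkernel's kernel or its signature; `decode_attention_ref` and `merge_partials` are illustrative names.

```python
import math


def decode_attention_ref(q, keys, values, scale):
    """Single-query softmax attention; returns (output, lse)."""
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    lse = m + math.log(denom)  # log of the sum of exp(scores)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(dim)]
    return out, lse


def merge_partials(out_a, lse_a, out_b, lse_b):
    """Combine two partial attention results using their LSE values."""
    m = max(lse_a, lse_b)
    wa, wb = math.exp(lse_a - m), math.exp(lse_b - m)
    total = wa + wb
    merged = [(wa * a + wb * b) / total for a, b in zip(out_a, out_b)]
    return merged, m + math.log(total)


q = [1.0, 0.5]
keys = [[0.2, 0.1], [0.9, -0.3], [-0.5, 0.4], [0.3, 0.8]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5]]
scale = 1.0 / math.sqrt(len(q))

full_out, full_lse = decode_attention_ref(q, keys, values, scale)
part_a = decode_attention_ref(q, keys[:2], values[:2], scale)
part_b = decode_attention_ref(q, keys[2:], values[2:], scale)
merged_out, merged_lse = merge_partials(*part_a, *part_b)
```

Up to floating-point rounding, `merged_out` and `merged_lse` match `full_out` and `full_lse`, which is the property that makes an LSE-returning decode op convenient to compose.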

Changed

make_dummy_rpa_inputs now supports padded page tables via total_num_pages for smaller physical caches.
Attention-module call signatures now place cfg at the end (chunked_prefill_paged_decode, decode_attention, unified_attention).

Testing

Added/expanded Triton↔XLA equivalence tests for chunked_prefill_paged_decode and decode_attention, including shard_map coverage.

Full Changelog: v0.0.47...v0.0.50
