ejkernel v0.0.50
Breaking Changes
- Renamed `attention_sink` to `softmax_aux` across `ragged_page_attention_v3` and `unified_attention` (modules + kernels + tests).
- Renamed `mla_attention` to `flash_mla` in `ejkernel.modules` / `ejkernel.modules.operations`.
Added
- New paged-attention ops with Triton + XLA backends:
  - `chunked_prefill_paged_decode` (updates block-tabled KV cache + runs unified paged attention).
  - `decode_attention` (paged decode attention returning output + LSE).
- XLA backend support for `flash_mla`.
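
For readers wondering why a decode op would return the LSE (log-sum-exp of the attention scores) alongside the output: one common use is merging attention results computed over separate KV chunks. The sketch below is a plain NumPy reference under that assumption. The helper names `decode_attention_ref` and `merge_partials` are hypothetical and are not the ejkernel API or its signatures.

```python
import numpy as np


def decode_attention_ref(q, k, v):
    """Single-query attention over one KV chunk, returning (output, lse).

    Hypothetical reference helper, not the ejkernel signature.
    q: (head_dim,), k and v: (seq_len, head_dim).
    """
    scores = k @ q / np.sqrt(q.shape[-1])   # scaled dot products, (seq_len,)
    m = scores.max()                        # subtract the max for stability
    weights = np.exp(scores - m)
    denom = weights.sum()
    out = (weights @ v) / denom             # softmax-weighted sum of values
    lse = m + np.log(denom)                 # log of the softmax normalizer
    return out, lse


def merge_partials(out_a, lse_a, out_b, lse_b):
    """Combine two partial results using only their outputs and LSEs."""
    lse = np.logaddexp(lse_a, lse_b)
    out = np.exp(lse_a - lse) * out_a + np.exp(lse_b - lse) * out_b
    return out, lse


rng = np.random.default_rng(0)
q = rng.standard_normal(16)
k = rng.standard_normal((12, 16))
v = rng.standard_normal((12, 16))

# Attention over the whole KV sequence at once.
full_out, full_lse = decode_attention_ref(q, k, v)

# The same sequence split into two chunks, then merged via the LSEs.
out_a, lse_a = decode_attention_ref(q, k[:5], v[:5])
out_b, lse_b = decode_attention_ref(q, k[5:], v[5:])
merged_out, merged_lse = merge_partials(out_a, lse_a, out_b, lse_b)
```

Under these assumptions `merged_out` and `merged_lse` agree with the single-pass result up to floating-point error, which is why carrying the LSE is enough to recombine partial results without revisiting the keys.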
Changed
- `make_dummy_rpa_inputs` now supports padded page tables via `total_num_pages` for smaller physical caches.
- Attention-module call signatures now place `cfg` at the end (`chunked_prefill_paged_decode`, `decode_attention`, `unified_attention`).
Testing
- Added/expanded Triton↔XLA equivalence tests for `chunked_prefill_paged_decode` and `decode_attention`, including shard_map coverage.
Full Changelog: v0.0.47...v0.0.50