Release v0.2.12 · flashinfer-ai/flashinfer

What's Changed

Fix TRTLLM NVFP4-out attention kernel scale factor dim issue by @elvischenv in #1460
perf: add fast path to TopPRenormProbKernel for top_p >= 1.0, significantly boosting SGLang workloads by @TianyuZhang1214 in #1483
fix: update cutedsl masked moe gemm by @yyihuang in #1488
feat: Support fp8 qkv, fp16/bf16 out MHA for trtllm-gen. by @weireweire in #1490
Add errors when dtype is anything other than int32 for ptr metadata by @pavanimajety in #1492
refactor: unify autotuner for bmm_fp8 by @ttyio in #1479
fix: update masked moe gemm fp4 tensor reshape by @yyihuang in #1495
Revert "feat: Support fp8 qkv, fp16/bf16 out MHA for trtllm-gen. (#1490)" by @yzh119 in #1496
fix(aot): unused compute in has_sm by @fecet in #1501
fix: Replace cub Max/Min with cuda::maximum/minimum for cuda 13 compatibility by @yongwww in #1500
doc: Update the masked grouped gemm doc by @kaixih in #1499
Perf: support scale_a/scale_b instead of combined scale in cutlass bmm_fp8 by @ttyio in #1491
feat: scaling at fp4 gemm epilogue by @yyihuang in #1498
Add benchmark for cutedsl gemm by @fzyzcjy in #1502
Do not import NVSHMEM in the AoT script unless explicitly requested by @nandor in #1506
bugfix: Fix stream handling in cutedsl gemm by @fzyzcjy in #1509
bump version to v0.2.12 by @yongwww in #1510
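
To illustrate the idea behind the top_p >= 1.0 fast path in #1483, here is a minimal pure-Python sketch of top-p renormalization. It is not the CUDA kernel and may differ from FlashInfer's exact semantics; the function name `top_p_renorm` and the list-based interface are hypothetical, chosen only for illustration. The point it shows: when top_p >= 1.0 no token is filtered out, so the sort and cumulative scan can be skipped entirely.

```python
def top_p_renorm(probs, top_p):
    """Illustrative top-p renormalization over one row of probabilities.

    Hypothetical helper for explanation only; not FlashInfer's API.
    """
    total = sum(probs)

    # Fast path: with top_p >= 1.0 every token is kept, so only
    # normalization is needed (no sorting, no cumulative scan).
    if top_p >= 1.0:
        return [p / total for p in probs]

    # Slow path: keep the highest-probability tokens until their
    # cumulative mass reaches top_p of the total, zero out the rest.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p * total:
            break

    kept_mass = sum(probs[i] for i in kept)
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / kept_mass
    return out
```

Sampling configurations that leave top-p disabled (top_p set to 1.0) always take the first branch, which is consistent with the PR title's note about SGLang workloads.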

Full Changelog: v0.2.11.post3...v0.2.12