We'd like to set up benchmarks (Helion vs. Triton vs. eager) for TritonBench kernels to track Helion's kernel coverage and performance.
OSS TritonBench (link):
(NOTE: implemented kernels are registered in KERNEL_MAPPINGS in https://github.com/pytorch/helion/blob/main/benchmarks/run.py#L38.)
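For orientation, here is a minimal sketch of what a registration in that table could look like, assuming KERNEL_MAPPINGS maps a TritonBench operator name to a Helion example module and kernel function. The actual structure in benchmarks/run.py may differ, so treat the entries below as illustrative only:

```python
# Hypothetical sketch of the registration pattern; the real KERNEL_MAPPINGS
# in benchmarks/run.py may use a different layout -- check the file first.
KERNEL_MAPPINGS: dict[str, tuple[str, str]] = {
    # <TritonBench operator name>: (<Helion example module>, <kernel function name>)
    "vector_add": ("examples.add", "add"),
    "softmax": ("examples.softmax", "softmax"),
}
```

The point is that each kernel checked off below should come with a matching entry in that table so benchmarks/run.py can pick it up.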
Forward kernels:
- mixed_gemm (Doable by following the int4_gemm pattern; should be one Helion kernel per mixed-dtype combination. See the sketch after this list.)
- fp8_gemm_rowwise
- fp8_fused_quant_gemm_rowwise (harder)
- blackwell_attentions (harder)
- decoding_attention (harder)
- fp8_gemm_blockwise (harder) (@yf225)
- template_attention (@Sibylau) ([Benchmark] template attention kernel and test #824)
- bf16xint16_gemm (@karthickai) ([Benchmark] bf16 x int16 helion kernel #794)
- flex_attention (harder) (@v0i0) ([example] flex attention #764)
- fp8_gemm_rowwise_grouped (@yf225) ([WIP] fp8_gemm_rowwise_grouped kernel #627)
- rope (@exclamaforte) (WIP Rope implementation #472)
- fused_linear_cross_entropy (@yf225) ([Example] Add fused_linear_cross_entropy example and unit test #342) ([Benchmark] Add fused_linear_cross_entropy to tritonbench integration #343)
- low_mem_dropout (@karthickai) ([Benchmark] Add low mem dropout example #641)
- fused_linear_jsd (@v0i0) ([example] fused_linear_jsd #494)
- jagged_layer_norm (@Sibylau) ([Benchmark] jagged_layer_norm kernel and test #704)
- jagged_sum (@Sibylau) ([Benchmark] jagged_sum kernel and test #676)
- gather_gemv (@Sibylau) ([Benchmark] gather_gemv kernel and test #635)
- grouped_gemm (@yf225) ([Example] grouped_gemm kernel example and tritonbench integration #620)
- int4_gemm (@yf225) ([Example] int4_gemm kernel example and tritonbench integration #613)
- kl_div (@Sibylau) ([Benchmark] kl_div kernel and test #615)
- welford (@karthickai) ([Benchmark] Welford kernel and example #614)
- jsd (@Sibylau) ([Benchmark] jsd kernel and test #611)
- swiglu (@Sibylau) ([Benchmark] swiglu example and test #584)
- geglu (@Sibylau) ([Benchmark] geglu example and test #582)
- addmm (@Sibylau) ([Benchmark] add addmm example and test #555)
- ragged_attention (Add jagged hstu attention example (i.e. ragged_attention) #527)
- cross_entropy (Add cross_entropy example and unit test #320) ([Benchmark] Add cross_entropy to tritonbench integration #321)
- flash_attention ([Benchmark] Add attention tritonbench integration #284)
- fp8_attention (Add fp8_attention example and unit test #318) ([Benchmark] Add fp8_attention to tritonbench integration #319)
- fp8_gemm (Add fp8_gemm example and test #267) ([Benchmark] Add fp8_gemm to TritonBench integration #268)
- gemm ([Examples] Add matmul variants with bias support and tests #379) ([Benchmark] Support kernel variants; setup matmul tritonbench integration #380)
- jagged_mean (Add jagged_mean example #263) ([Benchmark] Add jagged_mean tritonbench integration #264)
- jagged_softmax ([example] add jagged_softmax example #480)
- layer_norm ([Example] Layer Norm Forward #170)
- rms_norm (Add rms_norm example and test #252) ([Benchmark] Add rms_norm benchmark #253)
- softmax ([Benchmark] Add softmax tritonbench integration #286)
- sum (Add sum example and test #256) ([Benchmark] Add sum to TritonBench integration #257)
- vector_add ([Benchmark] Add initial TritonBench integration and vector_add benchmark example #247)
- vector_exp ([Benchmark] Add vector_exp benchmark #249)
- embedding ([Benchmark] Add embedding benchmark #248)
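For the mixed_gemm item above, a minimal sketch of the "one Helion kernel per mixed-dtype combination" idea, using eager stand-ins for the per-combination kernels. The function names and the dispatch table are hypothetical, not Helion's API; each real kernel would be its own @helion.kernel written by following the int4_gemm example:

```python
import torch

# Eager reference implementations standing in for hypothetical per-combination
# Helion kernels (e.g. bf16 x int8, fp16 x int8).
def mixed_gemm_bf16_int8(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Upcast both operands, matmul in fp32, cast back to the activation dtype.
    return (a.to(torch.float32) @ b.to(torch.float32)).to(torch.bfloat16)

def mixed_gemm_fp16_int8(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return (a.to(torch.float32) @ b.to(torch.float32)).to(torch.float16)

# One specialized kernel per (dtype_a, dtype_b) combination.
_MIXED_GEMM_KERNELS = {
    (torch.bfloat16, torch.int8): mixed_gemm_bf16_int8,
    (torch.float16, torch.int8): mixed_gemm_fp16_int8,
}

def mixed_gemm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Dispatch to the kernel specialized for this dtype pair.
    key = (a.dtype, b.dtype)
    if key not in _MIXED_GEMM_KERNELS:
        raise NotImplementedError(f"no mixed_gemm kernel for {key}")
    return _MIXED_GEMM_KERNELS[key](a, b)
```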
Backward kernels:
- embedding backward
- vector_add backward
- sum backward
- jagged_mean backward
- fp8_gemm backward
- fp8_attention backward
- flash_attention backward
- cross_entropy backward
- jagged_sum backward
- int4_gemm backward
- geglu backward
- kl_div backward
- fused_linear_jsd backward
- jsd backward
- jagged_softmax backward
- ragged_attention backward
- gather_gemv backward
- welford backward
- grouped_gemm backward
- blackwell_attentions backward
- flex_attention backward
- fp8_fused_quant_gemm_rowwise backward
- template_attention backward
- mixed_gemm backward
- roi_align backward
- decoding_attention backward
- fp8_gemm_rowwise backward
- softmax backward (@karthickai) (Add backward pass for softmax kernel #744)
- gemm backward (@tianrengao) (Add matmul/addmm bwd examples and add test coverage #748)
- addmm backward (@tianrengao) (Add matmul/addmm bwd examples and add test coverage #748)
- swiglu backward (@shunting314) ([helion] backward support for swiglu #756)
- vector_exp backward (@aditvenk) (Add backward kernel for exp #736)
- rms_norm backward (@mengluy0125) (Add rms_norm backward kernels #597)
- layer_norm backward (@yf225) (Add layer_norm backward kernels #588)
Meta-internal TritonBench (See T229696048).