We'd like to set up benchmarks (Helion vs. Triton vs. eager) for TritonBench kernels to track Helion's kernel coverage and performance.
OSS TritonBench (link):
(NOTE: implemented kernels are registered in KERNEL_MAPPINGS in https://github.com/pytorch/helion/blob/main/benchmarks/run.py#L38.)
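For orientation, here is a minimal sketch of what a registration in that table could look like, assuming KERNEL_MAPPINGS maps a TritonBench operator name to a Helion example module and kernel function. The actual structure in benchmarks/run.py may differ, so treat the entries below as illustrative only:

```python
# Hypothetical sketch of the registration pattern; the real KERNEL_MAPPINGS
# in benchmarks/run.py may use a different layout -- check the file first.
KERNEL_MAPPINGS: dict[str, tuple[str, str]] = {
    # <TritonBench operator name>: (<Helion example module>, <kernel function name>)
    "vector_add": ("examples.add", "add"),
    "softmax": ("examples.softmax", "softmax"),
}
```

The point is that each kernel checked off below should come with a matching entry in that table so benchmarks/run.py can pick it up.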
Forward kernels:
- mixed_gemm (Doable by following the int4_gemm pattern; should be one Helion kernel per mixed-dtype combination. See the sketch after this list.)
- fp8_gemm_rowwise
- fp8_fused_quant_gemm_rowwise (harder)
- blackwell_attentions (harder)
- decoding_attention (harder)
- fp8_gemm_blockwise (harder) (@yf225)
- template_attention (@Sibylau) ([Benchmark] template attention kernel and test #824)
- bf16xint16_gemm (@karthickai) ([Benchmark] bf16 x int16 helion kernel #794)
- flex_attention (harder) (@v0i0) ([example] flex attention #764)
- fp8_gemm_rowwise_grouped (@yf225) ([WIP] fp8_gemm_rowwise_grouped kernel #627)
- rope (@exclamaforte) (WIP Rope implementation #472)
- fused_linear_cross_entropy (@yf225) ([Example] Add fused_linear_cross_entropy example and unit test #342) ([Benchmark] Add fused_linear_cross_entropy to tritonbench integration #343)
- low_mem_dropout (@karthickai) ([Benchmark] Add low mem dropout example #641)
- fused_linear_jsd (@v0i0) ([example] fused_linear_jsd #494)
- jagged_layer_norm (@Sibylau) ([Benchmark] jagged_layer_norm kernel and test #704)
- jagged_sum (@Sibylau) ([Benchmark] jagged_sum kernel and test #676)
- gather_gemv (@Sibylau) ([Benchmark] gather_gemv kernel and test #635)
- grouped_gemm (@yf225) ([Example] grouped_gemm kernel example and tritonbench integration #620)
- int4_gemm (@yf225) ([Example] int4_gemm kernel example and tritonbench integration #613)
- kl_div (@Sibylau) ([Benchmark] kl_div kernel and test #615)
- welford (@karthickai) ([Benchmark] Welford kernel and example #614)
- jsd (@Sibylau) ([Benchmark] jsd kernel and test #611)
- swiglu (@Sibylau) ([Benchmark] swiglu example and test #584)
- geglu (@Sibylau) ([Benchmark] geglu example and test #582)
- addmm (@Sibylau) ([Benchmark] add addmm example and test #555)
- ragged_attention (Add jagged hstu attention example (i.e. ragged_attention) #527)
- cross_entropy (Add cross_entropy example and unit test #320) ([Benchmark] Add cross_entropy to tritonbench integration #321)
- flash_attention ([Benchmark] Add attention tritonbench integration #284)
- fp8_attention (Add fp8_attention example and unit test #318) ([Benchmark] Add fp8_attention to tritonbench integration #319)
- fp8_gemm (Add fp8_gemm example and test #267) ([Benchmark] Add fp8_gemm to TritonBench integration #268)
- gemm ([Examples] Add matmul variants with bias support and tests #379) ([Benchmark] Support kernel variants; setup matmul tritonbench integration #380)
- jagged_mean (Add jagged_mean example #263) ([Benchmark] Add jagged_mean tritonbench integration #264)
- jagged_softmax ([example] add jagged_softmax example #480)
- layer_norm ([Example] Layer Norm Forward #170)
- rms_norm (Add rms_norm example and test #252) ([Benchmark] Add rms_norm benchmark #253)
- softmax ([Benchmark] Add softmax tritonbench integration #286)
- sum (Add sum example and test #256) ([Benchmark] Add sum to TritonBench integration #257)
- vector_add ([Benchmark] Add initial TritonBench integration and vector_add benchmark example #247)
- vector_exp ([Benchmark] Add vector_exp benchmark #249)
- embedding ([Benchmark] Add embedding benchmark #248)
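For the mixed_gemm item above, a minimal sketch of the "one Helion kernel per mixed-dtype combination" idea, using eager stand-ins for the per-combination kernels. The function names and the dispatch table are hypothetical, not Helion's API; each real kernel would be its own @helion.kernel written by following the int4_gemm example:

```python
import torch

# Eager reference implementations standing in for hypothetical per-combination
# Helion kernels (e.g. bf16 x int8, fp16 x int8).
def mixed_gemm_bf16_int8(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Upcast both operands, matmul in fp32, cast back to the activation dtype.
    return (a.to(torch.float32) @ b.to(torch.float32)).to(torch.bfloat16)

def mixed_gemm_fp16_int8(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return (a.to(torch.float32) @ b.to(torch.float32)).to(torch.float16)

# One specialized kernel per (dtype_a, dtype_b) combination.
_MIXED_GEMM_KERNELS = {
    (torch.bfloat16, torch.int8): mixed_gemm_bf16_int8,
    (torch.float16, torch.int8): mixed_gemm_fp16_int8,
}

def mixed_gemm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Dispatch to the kernel specialized for this dtype pair.
    key = (a.dtype, b.dtype)
    if key not in _MIXED_GEMM_KERNELS:
        raise NotImplementedError(f"no mixed_gemm kernel for {key}")
    return _MIXED_GEMM_KERNELS[key](a, b)
```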
Backward kernels:
- embedding backward
- vector_add backward
- sum backward
- jagged_mean backward
- fp8_gemm backward
- fp8_attention backward
- flash_attention backward
- cross_entropy backward
- jagged_sum backward
- int4_gemm backward
- geglu backward
- kl_div backward
- fused_linear_jsd backward
- jsd backward
- jagged_softmax backward
- ragged_attention backward
- gather_gemv backward
- welford backward
- grouped_gemm backward
- blackwell_attentions backward
- flex_attention backward
- fp8_fused_quant_gemm_rowwise backward
- template_attention backward
- mixed_gemm backward
- roi_align backward
- decoding_attention backward
- fp8_gemm_rowwise backward
- softmax backward (@karthickai) (Add backward pass for softmax kernel #744)
- gemm backward (@tianrengao) (Add matmul/addmm bwd examples and add test coverage #748)
- addmm backward (@tianrengao) (Add matmul/addmm bwd examples and add test coverage #748)
- swiglu backward (@shunting314) ([helion] backward support for swiglu #756)
- vector_exp backward (@aditvenk) (Add backward kernel for exp #736)
- rms_norm backward (@mengluy0125) (Add rms_norm backward kernels #597)
- layer_norm backward (@yf225) (Add layer_norm backward kernels #588)
Meta-internal TritonBench (See T229696048).