Closed
Labels: matmul (matmul / gemm / mm / bmm / tl.dot / hl.dot related issues), ptc2025
Description
Is your feature request related to a problem? Please describe.
Getting accumulator precision right is one of the primary footguns when writing kernels such as attention (for example, I think the current Helion attention kernel does not get this right).
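To make the precision footgun concrete, here is a small pure-Python sketch that emulates fp16 and fp32 rounding with the standard struct module ('e' is IEEE half precision, 'f' is single precision). It is not Helion or Triton code. The reduction length, the input value 0.1, and the helper names round_to and dot are illustrative assumptions. It models a dot product of fp16 inputs whose running sum is stored either in fp16 or in fp32.

```python
import math
import struct


def round_to(x: float, fmt: str) -> float:
    """Round a Python float to IEEE half ('e') or single ('f') precision."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def dot(a, b, acc_fmt: str) -> float:
    """Dot product whose running sum is stored in the precision given by acc_fmt."""
    acc = 0.0
    for x, y in zip(a, b):
        # The product of two fp16 values is exact in the double used here;
        # the rounding that matters is storing the running sum back into acc_fmt.
        acc = round_to(acc + x * y, acc_fmt)
    return acc


K = 4096  # reduction length (illustrative)
a = [round_to(0.1, "e")] * K  # fp16 inputs
b = [round_to(0.1, "e")] * K

reference = math.fsum(x * y for x, y in zip(a, b))  # accurately rounded reference sum
acc_fp16 = dot(a, b, "e")  # accumulate in fp16
acc_fp32 = dot(a, b, "f")  # accumulate in fp32
print(f"reference={reference:.4f} fp16-acc={acc_fp16:.4f} fp32-acc={acc_fp32:.4f}")
```

With these inputs the fp16 running sum stops growing once the gap between adjacent fp16 values exceeds twice each product, so it ends well short of the reference, while the fp32 sum stays close. Long reductions are exactly where the accumulator dtype matters.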
Describe the solution you'd like
Currently, Helion provides hl.dot with an out kwarg that determines both the output data type and the tensor to accumulate into.
It would be nice to also have an out_dtype kwarg (consistent with tl.dot), so that getting f32 accumulation does not require creating an accumulator first; f32 accumulation is what you want in a kernel 99% of the time.
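A rough scalar model of the two call styles may help compare them. This is a hypothetical sketch, not Helion's API: dot_model is an invented name, the out argument is modeled as a (value, dtype) pair standing in for a caller-allocated accumulator, treating out and out_dtype as mutually exclusive is an assumption, and accumulating in the input dtype when neither kwarg is given is an assumption chosen to show the footgun, not a statement about hl.dot's actual default.

```python
import struct

# struct format codes used to emulate each dtype's rounding.
_FMT = {"float16": "e", "float32": "f"}


def round_to(x: float, dtype: str) -> float:
    """Round a Python float to the precision of the named dtype."""
    fmt = _FMT[dtype]
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def dot_model(a, b, in_dtype="float16", out=None, out_dtype=None):
    """Hypothetical scalar model of a dot with an out or out_dtype kwarg.

    out:       (value, dtype) pair standing in for a caller-allocated accumulator.
    out_dtype: accumulate from zero in this dtype, no accumulator needed.
    neither:   accumulate in the input dtype (assumption for illustration).
    Returns (result, accumulator dtype).
    """
    if out is not None and out_dtype is not None:
        # Assumption: the two kwargs are mutually exclusive in this model.
        raise ValueError("pass either out or out_dtype, not both")
    if out is not None:
        acc, acc_dtype = out
    else:
        acc, acc_dtype = 0.0, (out_dtype or in_dtype)
    for x, y in zip(a, b):
        acc = round_to(acc + x * y, acc_dtype)
    return acc, acc_dtype


a = [round_to(0.1, "float16")] * 4096
b = [round_to(0.1, "float16")] * 4096

via_out = dot_model(a, b, out=(0.0, "float32"))       # build the accumulator yourself
via_out_dtype = dot_model(a, b, out_dtype="float32")  # just name the dtype
default = dot_model(a, b)                             # no kwarg: fp16 accumulation here
```

The first two calls produce the same fp32 result; the only difference is that out_dtype spares the caller from allocating and zero-initializing an accumulator.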
Describe alternatives you've considered
You could create the accumulators yourself and pass them to out; it's just more work.
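The extra work matters most in a tiled kernel, where the accumulator has to persist across iterations of the K loop. The sketch below is again pure Python with struct-based rounding; the helper names, sizes, and input values are illustrative assumptions. It assumes each tile's partial dot is computed in fp32 and compares carrying the running total in fp32 (one accumulator created up front and reused) against summing the per-tile results into an fp16 total.

```python
import math
import struct


def round_to(x: float, fmt: str) -> float:
    """Round a Python float to IEEE half ('e') or single ('f') precision."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def tile_dot_fp32(a, b) -> float:
    """Partial dot product over one K tile, accumulated in fp32."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_to(acc + x * y, "f")
    return acc


def k_loop(a, b, block_k: int, carry_fmt: str) -> float:
    """Sum per-tile partial dots, keeping the running total in carry_fmt."""
    total = 0.0
    for k0 in range(0, len(a), block_k):
        part = tile_dot_fp32(a[k0:k0 + block_k], b[k0:k0 + block_k])
        # Rounding the carried total to carry_fmt after each tile models
        # the dtype the accumulator is stored in between K iterations.
        total = round_to(total + part, carry_fmt)
    return total


K, BLOCK_K = 4096, 64  # illustrative sizes
a = [round_to(0.1, "e")] * K  # fp16 inputs
b = [round_to(0.1, "e")] * K
reference = math.fsum(x * y for x, y in zip(a, b))

carried_fp32 = k_loop(a, b, BLOCK_K, "f")  # accumulator kept in fp32 across tiles
carried_fp16 = k_loop(a, b, BLOCK_K, "e")  # per-tile results summed into an fp16 total
```

Even though each tile is accumulated in fp32 here, rounding the carried total to fp16 between tiles still loses precision, so the accumulator has to stay f32 across the whole reduction, not just within one tile.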