
tl.dot for matrix size 32x8x16 (m-n-k) #3212

Open
Begunner opened this issue Feb 27, 2024 · 1 comment
Begunner commented Feb 27, 2024

Could tl.dot support the 32x8x16 (m-n-k) MMA shape, which is supported by the tensor cores?

When developing operators with Triton, it is sometimes essential to keep the N dimension of blocks as small as possible, yet the smallest size supported by tl.dot is 16.
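For context (this is my illustration, not something proposed in the thread): when N is smaller than the minimum tl.dot width, one common workaround is to zero-pad the block up to the supported width and slice the result back down. A minimal NumPy sketch of that arithmetic, using the 32x8x16 shapes from this issue:

```python
import numpy as np

# Hypothetical workaround sketch (NumPy, not Triton): if tl.dot requires
# each block dimension to be at least 16, a block with N = 8 can be
# zero-padded to N = 16, multiplied, and then sliced back down.
M, N, K = 32, 8, 16
a = np.random.rand(M, K).astype(np.float32)
b = np.random.rand(K, N).astype(np.float32)

# Pad B's N dimension from 8 up to the minimum supported width of 16.
b_padded = np.zeros((K, 16), dtype=np.float32)
b_padded[:, :N] = b

# Full-width dot product, then discard the padded columns.
c = (a @ b_padded)[:, :N]

assert np.allclose(c, a @ b)
```

The padding wastes half of the compute in the N dimension, which is exactly why native support for the smaller 32x8x16 MMA shape would be preferable.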

I've found a related comment from @jon-chuang. According to the link, the 32x8x16 MMA shape is supported in hardware. Could Triton support it at some point?

It seems reasonable that in this case, Triton would not use mma instructions, but rather ordinary FMA instructions. This, however, appears to be unimplemented. The list of supported sizes is here.

To my understanding, Triton also does not support optimizing other "edge-cases" when it comes to dot perf, for instance tall-and-skinny matmuls.

Originally posted by @jon-chuang in #2266 (comment)


jlebar (Collaborator) commented Feb 27, 2024

I don't see a reason not to support this, but like many features in Triton, it may be in a "patches welcome" situation until and unless one of the Triton maintainers needs this feature themselves.
