
tl.dot for matrix size 32x8x16 (m-n-k) #3212

Open
Begunner opened this issue Feb 27, 2024 · 1 comment
Begunner commented Feb 27, 2024

Could tl.dot support the 32x8x16 (m-n-k) MMA shape, which is supported by the tensor cores?

When developing operators with Triton, it is sometimes essential to keep the N dimension of blocks as small as possible, yet the smallest size supported by tl.dot is 16.
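For context (this is my illustration, not something proposed in the thread): when N is smaller than the minimum tl.dot width, one common workaround is to zero-pad the block up to the supported width and slice the result back down. A minimal NumPy sketch of that arithmetic, using the 32x8x16 shapes from this issue:

```python
import numpy as np

# Hypothetical workaround sketch (NumPy, not Triton): if tl.dot requires
# each block dimension to be at least 16, a block with N = 8 can be
# zero-padded to N = 16, multiplied, and then sliced back down.
M, N, K = 32, 8, 16
a = np.random.rand(M, K).astype(np.float32)
b = np.random.rand(K, N).astype(np.float32)

# Pad B's N dimension from 8 up to the minimum supported width of 16.
b_padded = np.zeros((K, 16), dtype=np.float32)
b_padded[:, :N] = b

# Full-width dot product, then discard the padded columns.
c = (a @ b_padded)[:, :N]

assert np.allclose(c, a @ b)
```

The padding wastes half of the compute in the N dimension, which is exactly why native support for the smaller 32x8x16 MMA shape would be preferable.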

I've found a related comment from @jon-chuang. According to the link, the 32x8x16 MMA shape is supported in hardware. Could Triton support it at some point?

It seems reasonable that in this case, Triton would not use mma instructions, but rather ordinary FMA instructions. This, however, appears to be unimplemented. The list of supported sizes is here.

To my understanding, Triton also does not support optimizing other "edge-cases" when it comes to dot perf, for instance tall-and-skinny matmuls.

Originally posted by @jon-chuang in #2266 (comment)


jlebar (Collaborator) commented Feb 27, 2024

I don't see a reason not to support this, but like many features in Triton, it may be in a "patches welcome" situation until and unless one of the Triton maintainers needs this feature themselves.
