Triton - Assertion failure: "Unexpected MMA layout version found" #142
Comments
Me too. Pascal card. They fsck us.
Hey @Ph0rk0z, thank you for confirming the issue is with our HW. Would you mind leaving a comment on the Triton issue I linked above to get the attention of the Triton team? Hopefully we can have a solution soon.
Also getting this on the cuda branch, Pascal card here as well.
Thanks for letting us know @C0deMunk33, would you mind also leaving a comment on the Triton issue? Thanks again!
Closing because triton-lang/triton#1505 seems to provide inference on Pascal-series GPUs for f32.
Does it work for you? I tried it and got this:
I will check with the stock implementation.
Loading models I get this now:
It doesn't appear to be running out of memory on GPU or CPU.
Has anyone run into this issue? I am currently on commit 9463299 of the triton branch.
I found issue #1271 in the Triton repo, but there doesn't seem to be a solution to it yet.
Setup Details
GPU: GTX 1070 Ti
CUDA: 11.8
OS: Win 10 via WSL2
Reproduction
To reproduce this, I am using the following for my model and my quantized weights:
"tokenizer_class" changed to "LlamaTokenizer" in tokenizer_config.json
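The tokenizer tweak mentioned above can be scripted instead of edited by hand. This is only a minimal sketch: the helper name `set_tokenizer_class` is made up for illustration, and it assumes `tokenizer_config.json` is a plain JSON object sitting in the model directory.

```python
import json
from pathlib import Path


def set_tokenizer_class(config_path, tokenizer_class="LlamaTokenizer"):
    """Set "tokenizer_class" in a tokenizer_config.json file.

    Hypothetical helper, not part of any library. Returns the value
    that was there before (or None if the key was missing).
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    previous = config.get("tokenizer_class")
    config["tokenizer_class"] = tokenizer_class
    # Write the file back, leaving every other key untouched.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return previous
```

Calling `set_tokenizer_class("path/to/model/tokenizer_config.json")` rewrites the file in place, so it may be worth keeping a copy of the original first.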
Python dependencies
Execution
I am running the inference with the following script:
CUDA
I was able to run the exact same model and quantized weights using the cuda branch at commit 610fdae.
Everything worked fine, but it took a long time to load the model and perform inference.
After reading #82 (comment), I wanted to check out the triton branch to experience it for myself.
Has anyone run into a similar issue? Or is there someone who can vouch that this works on a 1070 Ti?
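For anyone trying to work out whether their card is in the same boat, here is a rough check, sketched under one assumption: that the assertion is tied to the GPU lacking tensor-core MMA instructions, which only exist from compute capability 7.0 (Volta) onward. That is a guess and not a confirmed diagnosis. The GTX 1070 Ti is compute capability 6.1. The function name `supports_tensor_core_mma` is made up for this example; the `torch.cuda` calls only run if PyTorch with CUDA is installed.

```python
def supports_tensor_core_mma(capability):
    """Return True if a (major, minor) compute capability is 7.0 or newer.

    Hypothetical helper: tensor cores first appeared with Volta (sm_70),
    so Pascal (6.x) and older cards return False.
    """
    major, _minor = capability
    return major >= 7


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        capability = torch.cuda.get_device_capability(0)  # e.g. (6, 1)
        print(name, capability, supports_tensor_core_mma(capability))
    else:
        print("No CUDA device visible to PyTorch")
```

If this reports `False` for your card, it is in the same pre-Volta group as the Pascal cards discussed here.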