Triton - Assertion failure: "Unexpected MMA layout version found" #142

Closed
clxyder opened this issue Apr 8, 2023 · 7 comments

clxyder commented Apr 8, 2023

Has anyone run into this issue? I am currently on commit 9463299 of the triton branch.

python: /project/lib/Analysis/Utility.cpp:136: bool mlir::supportMMA(mlir::Value, int): Assertion `(version == 1 || version == 2) && "Unexpected MMA layout version found"' failed.
Aborted

I found issue #1271 in the triton repo, but there doesn't seem to be a solution to it yet.
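
As a quick sanity check on the hardware angle, here is a minimal sketch (my assumption: the assertion fires because Triton's MMA layouts target tensor cores, which need compute capability 7.0 or newer, while a Pascal card like the 1070 Ti reports 6.1):

# Hedged sanity check: assumes the MMA assertion is tied to tensor-core
# support, which requires compute capability >= 7.0 (Volta or newer).
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")  # a GTX 1070 Ti reports 6.1
print("Tensor-core MMA available:", (major, minor) >= (7, 0))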

Setup Details

GPU: GTX 1070 Ti
CUDA: 11.8
OS: Win 10 via WSL2

Reproduction

To reproduce this, I am using the following setup for my model and quantized weights:

Python dependencies

  • safetensors==0.3.0
  • datasets==2.10.1
  • sentencepiece==0.1.97
  • transformers==4.28.0.dev0
  • accelerate==0.17.1
  • triton==2.0.0
  • torch==2.0.0+cu118
  • protobuf==3.20.3

Execution

I am running the inference with the following script:

models_dir="/repos/language-models/models"
llama_7b_hf="${models_dir}/llama-7b-hf"
llama_4bit="${models_dir}/llama7b-4bit-v2/llama7b-4bit-ts-ao-g128-v2.safetensors"
prompt="Building a website can be done in 10 simple steps:"

CUDA_VISIBLE_DEVICES=0 python llama_inference.py "${llama_7b_hf}" --wbits 4 --groupsize 128 --load "${llama_4bit}" --text "${prompt}" --max_length 512

CUDA

I was able to run the exact same model and quantized weights using the cuda branch at commit 610fdae.

Everything worked fine, but it just took a long time to load the model and perform inference.

After reading #82 (comment), I wanted to check out the triton branch and experience it for myself.

Has anyone run into a similar issue? Or is there someone who can vouch that this works on a 1070 Ti?


Ph0rk0z commented Apr 8, 2023

Me too.. Pascal card here. They fsck us.


clxyder commented Apr 9, 2023

Hey @Ph0rk0z, thank you for confirming the issue is with our hardware.

Would you mind leaving a comment on the triton issue I linked above to get the attention of the triton team?

Hopefully we can have a solution soon.

@C0deMunk33

Also getting this on the cuda branch; Pascal card here as well.


clxyder commented Apr 11, 2023

Thanks for letting us know, @C0deMunk33. Would you mind also leaving a comment on the triton issue?

Thanks again!


clxyder commented Apr 13, 2023

Closing, since triton-lang/triton#1505 seems to enable f32 inference on Pascal-series GPUs.

clxyder closed this as completed Apr 13, 2023

Ph0rk0z commented Apr 14, 2023

Does it work for you? I tried it and got this:

error: 'llvm.intr.fmuladd' op requires the same type for all operands and results
Pass execution failed
LLVM ERROR: Failed to translate TritonGPU to LLVM IR.
Aborted

I will check with the stock implementation.


Ph0rk0z commented Apr 14, 2023

Loading models, I now get this:

File "/home/mint/text-generation-webui/repositories/GPTQ-for-LLaMa/custom_autotune.py", line 72, in _bench
    except triton.compiler.OutOfResources:
AttributeError: module 'triton.compiler' has no attribute 'OutOfResources'

Doesn't appear to be running out of memory on GPU or CPU.
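
If it helps, here is a hedged compatibility sketch for that AttributeError: the OutOfResources exception lives at triton.compiler.OutOfResources in the pinned triton 2.0.0, and the alternate module paths below are only guesses about where it may sit in other builds; the fallback keeps the except clause in custom_autotune.py from crashing either way:

# Hedged compatibility sketch: probe a few possible homes for OutOfResources.
# Only triton.compiler is confirmed (triton 2.0.0); the other paths are assumptions.
import importlib

def find_out_of_resources():
    candidates = (
        "triton.compiler",         # triton 2.0.0
        "triton.compiler.errors",  # assumption: possible newer layout
        "triton.runtime.errors",   # assumption: possible newer layout
    )
    for name in candidates:
        try:
            mod = importlib.import_module(name)
        except ImportError:
            continue
        exc = getattr(mod, "OutOfResources", None)
        if exc is not None:
            return exc
    return Exception  # last resort so `except OutOfResources:` still works

OutOfResources = find_out_of_resources()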
