
Do not run AccelerateMatmul on pre-Volta GPUs #1505

Merged
merged 3 commits into main on Apr 24, 2023

Conversation

geekypathak21
Contributor

Related to #1271. I am currently working on adding support for pre-Volta GPUs in Triton.

@geekypathak21
Contributor Author

geekypathak21 commented Apr 11, 2023

I want to ask one question: in convertFMADot() we are using LLVM::FMulAddOp, which requires the same type for all operands and results. While running the passes we convert our result tensor to tensor<64x64xf32, #triton_gpu.blocked<{sizePerThread = [4, 4], threadsPerWarp = [2, 16], warpsPerCTA = [4, 1], order = [1, 0]}>>, which has element type f32 even when we specify the result tensor as float16. Is there any specific reason for this? So currently this PR only supports float32 for the dot operation; passing a float16 type will give an error.

@Jokeren
Contributor

Jokeren commented Apr 11, 2023

So currently this PR only supports float32 for the dot operation; passing a float16 type will give an error.

What errors did you see?

@geekypathak21
Contributor Author

What errors did you see?

error: 'llvm.intr.fmuladd' op requires the same type for all operands and results
Pass execution failed
LLVM ERROR: Failed to translate TritonGPU to LLVM IR.
[1]    139783 abort (core dumped)  python3 test_files1.py

@Jokeren I got this error when passing these tensors:

 a = torch.randn((64, 16), device="cuda", dtype=torch.float16)
 b = torch.randn((16, 64), device="cuda", dtype=torch.float16)
 c = torch.empty((64, 64), device="cuda", dtype=torch.float16)

I tried dumping all the tensor types; the result tensor that ends up with a different type is:

tensor<64x64xf32, #triton_gpu.blocked<{sizePerThread = [4, 4], threadsPerWarp = [2, 16], warpsPerCTA = [4, 1], order = [1, 0]}>>

When I tried with float32 it worked successfully.
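
For reference, a minimal kernel along these lines hits the path described above. This is only a sketch, not the PR's test file; the kernel, launch parameters, and variable names are illustrative.

 # A minimal repro sketch (not code from this PR): a single-block tl.dot matmul
 # launched with the fp16 tensors above.
 import torch
 import triton
 import triton.language as tl

 @triton.jit
 def matmul_kernel(a_ptr, b_ptr, c_ptr, N, K,
                   BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
     # One program covers the whole (small) problem, so no tiling loop is needed.
     offs_m = tl.arange(0, BLOCK_M)
     offs_n = tl.arange(0, BLOCK_N)
     offs_k = tl.arange(0, BLOCK_K)
     a = tl.load(a_ptr + offs_m[:, None] * K + offs_k[None, :])   # (BLOCK_M, BLOCK_K)
     b = tl.load(b_ptr + offs_k[:, None] * N + offs_n[None, :])   # (BLOCK_K, BLOCK_N)
     acc = tl.dot(a, b)  # on pre-Volta this lowers through the FMA path discussed above
     tl.store(c_ptr + offs_m[:, None] * N + offs_n[None, :], acc.to(tl.float16))

 a = torch.randn((64, 16), device="cuda", dtype=torch.float16)
 b = torch.randn((16, 64), device="cuda", dtype=torch.float16)
 c = torch.empty((64, 64), device="cuda", dtype=torch.float16)
 matmul_kernel[(1,)](a, b, c, 64, 16, BLOCK_M=64, BLOCK_N=64, BLOCK_K=16)

As noted above, casting a and b to float32 before the tl.dot call avoids the operand-type mismatch on pre-Volta GPUs.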

lib/Analysis/Utility.cpp (outdated review comment, resolved)
@@ -147,8 +149,6 @@ class BlockedToMMA : public mlir::RewritePattern {
      mmaEnc = triton::gpu::MmaEncodingAttr::get(
          oldRetType.getContext(), versionMajor, 0 /*versionMinor*/,
          warpsPerTile);
    } else {
      llvm_unreachable("Mma layout only supports versionMajor in {1, 2}");
Contributor Author

@geekypathak21 geekypathak21 Apr 11, 2023

Due to the assertion, control will never reach here, so I think there is no point in adding it back.

@ptillet ptillet changed the title Adding support For Pre-volta GPU Do not run AccelerateMatmul on pre-Volta GPUs Apr 11, 2023
@Ph0rk0z

Ph0rk0z commented Apr 14, 2023

I got this kind of error:

error: 'llvm.intr.fmuladd' op requires the same type for all operands and results
Pass execution failed
LLVM ERROR: Failed to translate TritonGPU to LLVM IR.
Aborted

@Kraegge

Kraegge commented Apr 15, 2023

I get this error running it on a Tesla K80 (Kepler architecture, compute capability 3.7).

error: 'llvm.intr.fmuladd' op requires the same type for all operands and results
Pass execution failed
LLVM ERROR: Failed to translate TritonGPU to LLVM IR.

@geekypathak21
Contributor Author

@Ph0rk0z @Kraegge I am working on this bug as well and will update this PR when done.

@Ph0rk0z

Ph0rk0z commented Apr 15, 2023

Ok. I was eager to try Triton and see what kind of speeds I get on my Pascal card. I saw that GPTQ closed their bugs and assumed the best :)

@geekypathak21 geekypathak21 force-pushed the add-prevoltasupport branch 2 times, most recently from 8ef6fcf to b060693 on April 17, 2023 19:21
Contributor Author

@geekypathak21 geekypathak21 left a comment

Hey @ptillet, I have tried to add support for float16, but it requires some discussion before moving further, so I just added a warning in this commit.


if (computeCapability < 70) {
  if (oldAType.getElementType().isF16()) {
    llvm_unreachable("Float16 type is not supported with computeCapability "
Contributor Author

@geekypathak21 geekypathak21 Apr 17, 2023

A lot of people were facing this issue, so I added an error statement here.

auto oldAType = a.getType().cast<RankedTensorType>();
auto oldBType = b.getType().cast<RankedTensorType>();

if (computeCapability < 70) {
Contributor Author

V100 has a compute capability of 70, so I changed the check from <= to < because V100 supports the MMA layout.
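
For anyone following along, a small sketch of that boundary (the GPU-to-compute-capability examples are for context only and are not part of this PR):

 # Why the check is "< 70" rather than "<= 70": compute capability 70 (Volta/V100)
 # is the first generation with tensor cores, so it should take the MMA path.
 EXAMPLE_GPUS = {
     "Tesla K80 (Kepler)": 37,
     "Tesla P100 (Pascal)": 60,
     "GTX 1080 (Pascal)": 61,
     "Tesla V100 (Volta)": 70,
     "A100 (Ampere)": 80,
 }

 def uses_mma_layout(compute_capability: int) -> bool:
     # Equivalent to "not (computeCapability < 70)" in the C++ check above.
     return compute_capability >= 70

 for gpu, cc in EXAMPLE_GPUS.items():
     print(f"{gpu}: cc={cc}, MMA path={uses_mma_layout(cc)}")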

Collaborator

@ptillet ptillet Apr 19, 2023

The error should be somewhere else, probably at the beginning of semantic.dot in the frontend. Here, an assertion that computeCapability >= 70 should be enough.

Collaborator

Also, your check doesn't cover float8.
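
Putting these two review comments together, the kind of frontend guard being suggested might look roughly like the sketch below. The function name, argument names, and dtype spellings are assumptions for illustration, not the code that was merged.

 # Sketch of a guard at the start of the frontend dot lowering (semantic.dot),
 # rather than inside the C++ AccelerateMatmul pass. Names are illustrative.
 REDUCED_PRECISION = {"fp16", "bf16", "fp8e4", "fp8e5"}

 def check_dot_compute_capability(lhs_dtype: str, rhs_dtype: str, compute_capability: int) -> None:
     # Pre-Volta GPUs (compute capability < 70) only get the FMA fallback, which
     # currently handles fp32 operands; fp16/bf16/fp8 operands need MMA (>= 70).
     if compute_capability < 70:
         assert lhs_dtype not in REDUCED_PRECISION and rhs_dtype not in REDUCED_PRECISION, (
             "dot with fp16/bf16/fp8 operands requires compute capability >= 70; "
             "cast the inputs to float32 on pre-Volta GPUs"
         )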

Contributor Author

@ptillet Done as you have suggested 👍

@clxyder

clxyder commented Apr 21, 2023

Hey @geekypathak21, do you think it's possible to support 4- or 8-bit data representations for Pascal cards and newer?

@geekypathak21
Contributor Author

Hey @clxyder, I think it's possible to support 16-bit, but I'm not sure about 8- and 4-bit.

@geekypathak21 geekypathak21 force-pushed the add-prevoltasupport branch 2 times, most recently from 46f7e17 to 31ee8fc on April 21, 2023 08:58
@Ph0rk0z

Ph0rk0z commented Apr 21, 2023

Hey @clxyder, I think it's possible to support 16-bit, but I'm not sure about 8- and 4-bit.

But it is working in 4-bit on CUDA kernels currently. 8-bit through bitsandbytes is a bit sketchy and doesn't work with every model.

@ptillet ptillet merged commit 6d22643 into triton-lang:main Apr 24, 2023
pingzhuu pushed a commit to siliconflow/triton that referenced this pull request Apr 2, 2024
…#1505)

Related to triton-lang#1271. I am currently working on adding support for pre-Volta GPUs in Triton.

---------

Co-authored-by: Himanshu Pathak <himanshu@mtatva.com>
Co-authored-by: Philippe Tillet <phil@openai.com>
ZzEeKkAa pushed a commit to ZzEeKkAa/triton that referenced this pull request Aug 5, 2024
These fixes allow the Triton project to build under gcc-9.

cc triton-lang#1505