Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Fixed folding of inline imm into dot w/o opsel #73589

Merged
merged 1 commit into from
Nov 28, 2023

Conversation

rampitec
Copy link
Collaborator

A splat packed constant can be folded as an inline immediate but it shall use opsel. On gfx940 this code path can be skipped due to HW bug workaround and then it may be folded w/o opsel which is a bug. Fixed.

A splat packed constant can be folded as an inline immediate but
it shall use opsel. On gfx940 this code path can be skipped due
to HW bug workaround and then it may be folded w/o opsel which is
a bug. Fixed.
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 27, 2023

@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)

Changes

A splat packed constant can be folded as an inline immediate but it shall use opsel. On gfx940 this code path can be skipped due to HW bug workaround and then it may be folded w/o opsel which is a bug. Fixed.


Full diff: https://github.com/llvm/llvm-project/pull/73589.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+3-1)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.udot2.ll (+6-3)
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 0ec0370e21dfc16..5f8e32c812292b2 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -205,9 +205,11 @@ bool SIFoldOperands::updateOperand(FoldCandidate &Fold) const {
   const uint64_t TSFlags = MI->getDesc().TSFlags;
   if (Fold.isImm()) {
     if (TSFlags & SIInstrFlags::IsPacked && !(TSFlags & SIInstrFlags::IsMAI) &&
-        (!ST->hasDOTOpSelHazard() || !(TSFlags & SIInstrFlags::IsDOT)) &&
         AMDGPU::isFoldableLiteralV216(Fold.ImmToFold,
                                       ST->hasInv2PiInlineImm())) {
+      if (ST->hasDOTOpSelHazard() && (TSFlags & SIInstrFlags::IsDOT))
+        return false; // Prevent further folding of this operand without opsel.
+
       // Set op_sel/op_sel_hi on this operand or bail out if op_sel is
       // already set.
       unsigned Opcode = MI->getOpcode();
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.udot2.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.udot2.ll
index 490ce706455cd6f..04c84df859b27fa 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.udot2.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.udot2.ll
@@ -1,6 +1,6 @@
 ; RUN: llc -march=amdgcn -mcpu=gfx906 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX9,GFX906
-; RUN: llc -march=amdgcn -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX9,GFX940
-; RUN: llc -march=amdgcn -mcpu=gfx940 -global-isel -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX9,GFX940
+; RUN: llc -march=amdgcn -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX9,GFX940-SDAG
+; RUN: llc -march=amdgcn -mcpu=gfx940 -global-isel -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX9,GFX940-GISEL
 ; RUN: llc -march=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX10
 ; RUN: llc -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX10
 
@@ -43,7 +43,10 @@ entry:
 
 ; GCN-LABEL: {{^}}test_llvm_amdgcn_udot2_op_sel:
 ; GFX906: v_dot2_u32_u16 v{{[0-9]+}}, 1, v{{[0-9]+}}, s{{[0-9]+}} op_sel:[0,1,0] op_sel_hi:[0,0,1]{{$}}
-; GFX940: v_dot2_u32_u16 v{{[0-9]+}}, 1, v{{[0-9]+}}, s{{[0-9]+}}{{$}}
+; GFX940-SDAG: s_mov_b32 [[K:s[0-9]+]], 0x10001
+; GFX940-SDAG: v_dot2_u32_u16 v{{[0-9]+}}, [[K]], v{{[0-9]+}}, v{{[0-9]+}}{{$}}
+; GFX940-GISEL: v_mov_b32_e32 [[K:v[0-9]+]], 0x10001
+; GFX940-GISEL: v_dot2_u32_u16 v{{[0-9]+}}, [[K]], v{{[0-9]+}}, s{{[0-9]+}}{{$}}
 ; GFX10:  v_dot2_u32_u16 v{{[0-9]+}}, 1, v{{[0-9]+}}, s{{[0-9]+}} op_sel:[0,1,0] op_sel_hi:[0,0,1]{{$}}
 define amdgpu_kernel void @test_llvm_amdgcn_udot2_op_sel(
     ptr addrspace(1) %r,

@rampitec rampitec merged commit 82d22a1 into llvm:main Nov 28, 2023
3 of 4 checks passed
@rampitec rampitec deleted the udot-inline-opsel-fix branch November 28, 2023 08:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants