[AMDGPU] Fix PromoteAlloca size check of alloca for store #72528

bcahoon · 2023-11-16T16:04:50Z

When storing a subvector too many element were written when the
size of the alloca is smaller than the size of the vector store.
This patch checks for the minimum of the alloca vector and the
store vector to determine the number of elements to store.

When storing a subvector too many element were written when the size of the alloca is smaller than the size of the vector store. This patch checks for the minimum of the alloca vector and the store vector to determine the number of elements to store.

llvmbot · 2023-11-16T16:05:22Z

@llvm/pr-subscribers-backend-amdgpu

Author: None (bcahoon)

Changes

When storing a subvector too many element were written when the
size of the alloca is smaller than the size of the vector store.
This patch checks for the minimum of the alloca vector and the
store vector to determine the number of elements to store.

Full diff: https://github.com/llvm/llvm-project/pull/72528.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (+3-1)
(modified) llvm/test/CodeGen/AMDGPU/promote-alloca-subvecs.ll (+20)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
index 29591ddd669c95e..9293f8954cfe2b5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -469,6 +469,7 @@ static Value *promoteAllocaUserToVector(
       assert(AccessSize.isKnownMultipleOf(DL.getTypeStoreSize(VecEltTy)));
       const unsigned NumWrittenElts =
           AccessSize / DL.getTypeStoreSize(VecEltTy);
+      const unsigned NumVecElts = VectorTy->getNumElements();
       auto *SubVecTy = FixedVectorType::get(VecEltTy, NumWrittenElts);
       assert(DL.getTypeStoreSize(SubVecTy) == DL.getTypeStoreSize(AccessTy));
 
@@ -480,7 +481,8 @@ static Value *promoteAllocaUserToVector(
       Val = Builder.CreateBitOrPointerCast(Val, SubVecTy);
 
       Value *CurVec = GetOrLoadCurrentVectorValue();
-      for (unsigned K = 0; K < NumWrittenElts; ++K) {
+      for (unsigned K = 0, NumElts = std::min(NumWrittenElts, NumVecElts);
+           K < NumElts; ++K) {
         Value *CurIdx =
             Builder.CreateAdd(Index, ConstantInt::get(Index->getType(), K));
         CurVec = Builder.CreateInsertElement(
diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-subvecs.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-subvecs.ll
index 765aa250a48f4f0..d8e228e27a8360c 100644
--- a/llvm/test/CodeGen/AMDGPU/promote-alloca-subvecs.ll
+++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-subvecs.ll
@@ -458,3 +458,23 @@ finally:
   %load = load <4 x i16>, ptr addrspace(5) %ptr.2, align 2
   ret <4 x i16> %load
 }
+
+
+; Check the case when the alloca is smaller than the vector size.
+define void @math_kernel3(<4 x i32> %store) {
+; CHECK-LABEL: define void @math_kernel3
+; CHECK-SAME: (<4 x i32> [[STORE:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = extractelement <4 x i32> [[STORE]], i64 0
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <3 x i32> undef, i32 [[TMP0]], i32 0
+; CHECK-NEXT:    [[TMP2:%.*]] = extractelement <4 x i32> [[STORE]], i64 1
+; CHECK-NEXT:    [[TMP3:%.*]] = insertelement <3 x i32> [[TMP1]], i32 [[TMP2]], i32 1
+; CHECK-NEXT:    [[TMP4:%.*]] = extractelement <4 x i32> [[STORE]], i64 2
+; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <3 x i32> [[TMP3]], i32 [[TMP4]], i32 2
+; CHECK-NEXT:    ret void
+;
+entry:
+  %res = alloca <3 x i32>, align 16, addrspace(5)
+  store <4 x i32> %store, ptr addrspace(5) %res, align 16
+  ret void
+}

Pierre-vh

What did the IR look like before the change? I thought it'd just be folded to undef if you'd try to extract/insert past the end of a vector

In any case, change is good, test looks good, just wondering what this fixes exactly

bcahoon · 2023-11-17T23:51:41Z

Hi @Pierre-vh, I decided the change the test to better show the bug. The following is the output prior to the patch. The set of extract/insert element for the 2nd store start with the last value generated by the first set of extract/insert element instruction. The initial value becomes poison because the last insertelement uses an invalid index (outside the range). Let me know if that does/doesn't help.

%0 = extractelement <4 x i32> %store1, i64 0
%1 = insertelement <3 x i32> undef, i32 %0, i32 0
%2 = extractelement <4 x i32> %store1, i64 1
%3 = insertelement <3 x i32> %1, i32 %2, i32 1
%4 = extractelement <4 x i32> %store1, i64 2
%5 = insertelement <3 x i32> %3, i32 %4, i32 2
%6 = extractelement <4 x i32> %store1, i64 3
%7 = extractelement <4 x i32> %store2, i64 0
%8 = insertelement <3 x i32> poison, i32 %7, i32 0
%9 = extractelement <4 x i32> %store2, i64 1
%10 = insertelement <3 x i32> %8, i32 %9, i32 1
%11 = extractelement <4 x i32> %store2, i64 2
%12 = insertelement <3 x i32> %10, i32 %11, i32 2
%13 = extractelement <4 x i32> %store2, i64 3
ret void

Pierre-vh · 2023-11-20T07:24:22Z

Good catch! I thought in such cases the insertelement would just be eliminated (as it stores to nothing and it's a no-side-effect operation) but indeed semantics dictate it returns poison. https://llvm.org/docs/LangRef.html#insertelement-instruction

When storing a subvector, too many element were written when the size of the alloca is smaller than the size of the vector store. This patch checks for the minimum of the alloca vector and the store vector to determine the number of elements to store.

llvmbot added the backend:AMDGPU label Nov 16, 2023

bcahoon requested a review from Pierre-vh November 16, 2023 16:05

Pierre-vh approved these changes Nov 17, 2023

View reviewed changes

fixup! [AMDGPU] Fix PromoteAlloca size check of alloca for store

b807981

bcahoon merged commit 28b5054 into llvm:main Nov 20, 2023

bcahoon deleted the brcahoon/promote-alloca branch November 20, 2023 13:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMDGPU] Fix PromoteAlloca size check of alloca for store #72528

[AMDGPU] Fix PromoteAlloca size check of alloca for store #72528

Uh oh!

bcahoon commented Nov 16, 2023

Uh oh!

llvmbot commented Nov 16, 2023

Uh oh!

Pierre-vh left a comment

Uh oh!

bcahoon commented Nov 17, 2023

Uh oh!

Pierre-vh commented Nov 20, 2023

Uh oh!

Uh oh!

[AMDGPU] Fix PromoteAlloca size check of alloca for store #72528

[AMDGPU] Fix PromoteAlloca size check of alloca for store #72528

Uh oh!

Conversation

bcahoon commented Nov 16, 2023

Uh oh!

llvmbot commented Nov 16, 2023

Uh oh!

Pierre-vh left a comment

Choose a reason for hiding this comment

Uh oh!

bcahoon commented Nov 17, 2023

Uh oh!

Pierre-vh commented Nov 20, 2023

Uh oh!

Uh oh!