[LV] fix logical error in trunc cost #91136

artagnon · 2024-05-05T17:53:33Z

In LoopVectorizationCostModel::getInstructionCost(), when the condition canTruncateToMinimalBitwidth() is satisfied, for a trunc, the source type is computed as the smallest type of the source vector and the destination vector, and the destination type is computed as the largest type of the instruction and destination type. This is clearly a logical error, as the original source vector type could be smaller than the original destination vector type, and the trunc semantics are broken because we're attempting to widen.

Fixes #47665.

llvmbot · 2024-05-05T17:54:05Z

@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-backend-systemz

@llvm/pr-subscribers-llvm-transforms

Author: Ramkumar Ramachandra (artagnon)

Changes

In LoopVectorizationCostModel::getInstructionCost(), when the condition canTruncateToMinimalBitwidth() is satisfied, for a trunc, the source type is computed as the smallest type of the source vector and the destination vector, and the destination type is computed as the largest type of the instruction and destination type. This is clearly a logical error, as the original source vector type could be smaller than the original destination vector type, and the trunc semantics are broken because we're attempting to widen. Even if this were not the case, there is clearly no benefit of computing the largest vector type for the final destination type, as the cost of a trunc only depends on the vectorization factor.

Full diff: https://github.com/llvm/llvm-project/pull/91136.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+3-10)
(added) llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll (+148)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 3be0102bea3e34..96debb8e33b436 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -3390,12 +3390,6 @@ static Type *smallestIntegerVectorType(Type *T1, Type *T2) {
   return I1->getBitWidth() < I2->getBitWidth() ? T1 : T2;
 }
 
-static Type *largestIntegerVectorType(Type *T1, Type *T2) {
-  auto *I1 = cast<IntegerType>(cast<VectorType>(T1)->getElementType());
-  auto *I2 = cast<IntegerType>(cast<VectorType>(T2)->getElementType());
-  return I1->getBitWidth() > I2->getBitWidth() ? T1 : T2;
-}
-
 void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
                                             VPlan &Plan) {
   // Fix widened non-induction PHIs by setting up the PHI operands.
@@ -7123,16 +7117,15 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
       // "zext i8 %1 to i32" becomes "zext i8 %1 to i16".
       //
       // Calculate the modified src and dest types.
-      Type *MinVecTy = VectorTy;
       if (Opcode == Instruction::Trunc) {
-        SrcVecTy = smallestIntegerVectorType(SrcVecTy, MinVecTy);
+        SrcVecTy = smallestIntegerVectorType(SrcVecTy, VectorTy);
         VectorTy =
-            largestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy);
+            smallestIntegerVectorType(ToVectorTy(I->getType(), VF), VectorTy);
       } else if (Opcode == Instruction::ZExt || Opcode == Instruction::SExt) {
         // Leave SrcVecTy unchanged - we only shrink the destination element
         // type.
         VectorTy =
-            smallestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy);
+            smallestIntegerVectorType(ToVectorTy(I->getType(), VF), VectorTy);
       }
     }
 
diff --git a/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll b/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
new file mode 100644
index 00000000000000..1b793464340804
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
@@ -0,0 +1,148 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -passes=loop-vectorize,dce,simplifycfg -mtriple=s390x -mcpu=z14 -S %s | FileCheck %s --check-prefix=NOVEC
+; RUN: opt -passes=loop-vectorize,dce,simplifycfg -force-vector-width=2 -mtriple=s390x -mcpu=z14 -S %s | FileCheck %s
+
+define void @test(ptr %p) {
+; NOVEC-LABEL: define void @test(
+; NOVEC-SAME: ptr [[P:%.*]]) #[[ATTR0:[0-9]+]] {
+; NOVEC-NEXT:  entry:
+; NOVEC-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i32> poison, i32 0, i64 0
+; NOVEC-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i32> [[BROADCAST_SPLATINSERT]], <16 x i32> poison, <16 x i32> zeroinitializer
+; NOVEC-NEXT:    [[VEC_IV:%.*]] = add <16 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; NOVEC-NEXT:    [[TMP0:%.*]] = icmp ule <16 x i32> [[VEC_IV]], <i32 9, i32 9, i32 9, i32 9, i32 9, i32 9, i32 9, i32 9, i32 9, i32 9, i32 9, i32 9, i32 9, i32 9, i32 9, i32 9>
+; NOVEC-NEXT:    [[TMP1:%.*]] = extractelement <16 x i1> [[TMP0]], i32 0
+; NOVEC-NEXT:    [[TMP2:%.*]] = extractelement <16 x i1> [[TMP0]], i32 1
+; NOVEC-NEXT:    [[TMP3:%.*]] = xor i1 [[TMP1]], true
+; NOVEC-NEXT:    [[TMP4:%.*]] = xor i1 [[TMP2]], true
+; NOVEC-NEXT:    [[TMP5:%.*]] = xor i1 [[TMP3]], true
+; NOVEC-NEXT:    [[TMP6:%.*]] = xor i1 [[TMP4]], true
+; NOVEC-NEXT:    [[TMP7:%.*]] = or i1 [[TMP5]], [[TMP6]]
+; NOVEC-NEXT:    [[TMP8:%.*]] = extractelement <16 x i1> [[TMP0]], i32 2
+; NOVEC-NEXT:    [[TMP9:%.*]] = xor i1 [[TMP7]], true
+; NOVEC-NEXT:    [[TMP10:%.*]] = xor i1 [[TMP8]], true
+; NOVEC-NEXT:    [[TMP11:%.*]] = xor i1 [[TMP9]], true
+; NOVEC-NEXT:    [[TMP12:%.*]] = xor i1 [[TMP10]], true
+; NOVEC-NEXT:    [[TMP13:%.*]] = or i1 [[TMP11]], [[TMP12]]
+; NOVEC-NEXT:    [[TMP14:%.*]] = extractelement <16 x i1> [[TMP0]], i32 3
+; NOVEC-NEXT:    [[TMP15:%.*]] = xor i1 [[TMP13]], true
+; NOVEC-NEXT:    [[TMP16:%.*]] = xor i1 [[TMP14]], true
+; NOVEC-NEXT:    [[TMP17:%.*]] = xor i1 [[TMP15]], true
+; NOVEC-NEXT:    [[TMP18:%.*]] = xor i1 [[TMP16]], true
+; NOVEC-NEXT:    [[TMP19:%.*]] = or i1 [[TMP17]], [[TMP18]]
+; NOVEC-NEXT:    [[TMP20:%.*]] = extractelement <16 x i1> [[TMP0]], i32 4
+; NOVEC-NEXT:    [[TMP21:%.*]] = xor i1 [[TMP19]], true
+; NOVEC-NEXT:    [[TMP22:%.*]] = xor i1 [[TMP20]], true
+; NOVEC-NEXT:    [[TMP23:%.*]] = xor i1 [[TMP21]], true
+; NOVEC-NEXT:    [[TMP24:%.*]] = xor i1 [[TMP22]], true
+; NOVEC-NEXT:    [[TMP25:%.*]] = or i1 [[TMP23]], [[TMP24]]
+; NOVEC-NEXT:    [[TMP26:%.*]] = extractelement <16 x i1> [[TMP0]], i32 5
+; NOVEC-NEXT:    [[TMP27:%.*]] = xor i1 [[TMP25]], true
+; NOVEC-NEXT:    [[TMP28:%.*]] = xor i1 [[TMP26]], true
+; NOVEC-NEXT:    [[TMP29:%.*]] = xor i1 [[TMP27]], true
+; NOVEC-NEXT:    [[TMP30:%.*]] = xor i1 [[TMP28]], true
+; NOVEC-NEXT:    [[TMP31:%.*]] = or i1 [[TMP29]], [[TMP30]]
+; NOVEC-NEXT:    [[TMP32:%.*]] = extractelement <16 x i1> [[TMP0]], i32 6
+; NOVEC-NEXT:    [[TMP33:%.*]] = xor i1 [[TMP31]], true
+; NOVEC-NEXT:    [[TMP34:%.*]] = xor i1 [[TMP32]], true
+; NOVEC-NEXT:    [[TMP35:%.*]] = xor i1 [[TMP33]], true
+; NOVEC-NEXT:    [[TMP36:%.*]] = xor i1 [[TMP34]], true
+; NOVEC-NEXT:    [[TMP37:%.*]] = or i1 [[TMP35]], [[TMP36]]
+; NOVEC-NEXT:    [[TMP38:%.*]] = extractelement <16 x i1> [[TMP0]], i32 7
+; NOVEC-NEXT:    [[TMP39:%.*]] = xor i1 [[TMP37]], true
+; NOVEC-NEXT:    [[TMP40:%.*]] = xor i1 [[TMP38]], true
+; NOVEC-NEXT:    [[TMP41:%.*]] = xor i1 [[TMP39]], true
+; NOVEC-NEXT:    [[TMP42:%.*]] = xor i1 [[TMP40]], true
+; NOVEC-NEXT:    [[TMP43:%.*]] = or i1 [[TMP41]], [[TMP42]]
+; NOVEC-NEXT:    [[TMP44:%.*]] = extractelement <16 x i1> [[TMP0]], i32 8
+; NOVEC-NEXT:    [[TMP45:%.*]] = xor i1 [[TMP43]], true
+; NOVEC-NEXT:    [[TMP46:%.*]] = xor i1 [[TMP44]], true
+; NOVEC-NEXT:    [[TMP47:%.*]] = xor i1 [[TMP45]], true
+; NOVEC-NEXT:    [[TMP48:%.*]] = xor i1 [[TMP46]], true
+; NOVEC-NEXT:    [[TMP49:%.*]] = or i1 [[TMP47]], [[TMP48]]
+; NOVEC-NEXT:    [[TMP50:%.*]] = extractelement <16 x i1> [[TMP0]], i32 9
+; NOVEC-NEXT:    [[TMP51:%.*]] = xor i1 [[TMP49]], true
+; NOVEC-NEXT:    [[TMP52:%.*]] = xor i1 [[TMP50]], true
+; NOVEC-NEXT:    [[TMP53:%.*]] = xor i1 [[TMP51]], true
+; NOVEC-NEXT:    [[TMP54:%.*]] = xor i1 [[TMP52]], true
+; NOVEC-NEXT:    [[TMP55:%.*]] = or i1 [[TMP53]], [[TMP54]]
+; NOVEC-NEXT:    [[TMP56:%.*]] = extractelement <16 x i1> [[TMP0]], i32 10
+; NOVEC-NEXT:    [[TMP57:%.*]] = xor i1 [[TMP55]], true
+; NOVEC-NEXT:    [[TMP58:%.*]] = xor i1 [[TMP56]], true
+; NOVEC-NEXT:    [[TMP59:%.*]] = xor i1 [[TMP57]], true
+; NOVEC-NEXT:    [[TMP60:%.*]] = xor i1 [[TMP58]], true
+; NOVEC-NEXT:    [[TMP61:%.*]] = or i1 [[TMP59]], [[TMP60]]
+; NOVEC-NEXT:    [[TMP62:%.*]] = extractelement <16 x i1> [[TMP0]], i32 11
+; NOVEC-NEXT:    [[TMP63:%.*]] = xor i1 [[TMP61]], true
+; NOVEC-NEXT:    [[TMP64:%.*]] = xor i1 [[TMP62]], true
+; NOVEC-NEXT:    [[TMP65:%.*]] = xor i1 [[TMP63]], true
+; NOVEC-NEXT:    [[TMP66:%.*]] = xor i1 [[TMP64]], true
+; NOVEC-NEXT:    [[TMP67:%.*]] = or i1 [[TMP65]], [[TMP66]]
+; NOVEC-NEXT:    [[TMP68:%.*]] = extractelement <16 x i1> [[TMP0]], i32 12
+; NOVEC-NEXT:    [[TMP69:%.*]] = xor i1 [[TMP67]], true
+; NOVEC-NEXT:    [[TMP70:%.*]] = xor i1 [[TMP68]], true
+; NOVEC-NEXT:    [[TMP71:%.*]] = xor i1 [[TMP69]], true
+; NOVEC-NEXT:    [[TMP72:%.*]] = xor i1 [[TMP70]], true
+; NOVEC-NEXT:    [[TMP73:%.*]] = or i1 [[TMP71]], [[TMP72]]
+; NOVEC-NEXT:    [[TMP74:%.*]] = extractelement <16 x i1> [[TMP0]], i32 13
+; NOVEC-NEXT:    [[TMP75:%.*]] = xor i1 [[TMP73]], true
+; NOVEC-NEXT:    [[TMP76:%.*]] = xor i1 [[TMP74]], true
+; NOVEC-NEXT:    [[TMP77:%.*]] = xor i1 [[TMP75]], true
+; NOVEC-NEXT:    [[TMP78:%.*]] = xor i1 [[TMP76]], true
+; NOVEC-NEXT:    [[TMP79:%.*]] = or i1 [[TMP77]], [[TMP78]]
+; NOVEC-NEXT:    [[TMP80:%.*]] = extractelement <16 x i1> [[TMP0]], i32 14
+; NOVEC-NEXT:    [[TMP81:%.*]] = xor i1 [[TMP79]], true
+; NOVEC-NEXT:    [[TMP82:%.*]] = xor i1 [[TMP80]], true
+; NOVEC-NEXT:    [[TMP83:%.*]] = xor i1 [[TMP81]], true
+; NOVEC-NEXT:    [[TMP84:%.*]] = xor i1 [[TMP82]], true
+; NOVEC-NEXT:    [[TMP85:%.*]] = or i1 [[TMP83]], [[TMP84]]
+; NOVEC-NEXT:    [[TMP86:%.*]] = extractelement <16 x i1> [[TMP0]], i32 15
+; NOVEC-NEXT:    [[TMP87:%.*]] = xor i1 [[TMP85]], true
+; NOVEC-NEXT:    [[TMP88:%.*]] = xor i1 [[TMP86]], true
+; NOVEC-NEXT:    [[TMP89:%.*]] = xor i1 [[TMP87]], true
+; NOVEC-NEXT:    [[TMP90:%.*]] = xor i1 [[TMP88]], true
+; NOVEC-NEXT:    [[TMP91:%.*]] = or i1 [[TMP89]], [[TMP90]]
+; NOVEC-NEXT:    br i1 [[TMP91]], label [[TMP92:%.*]], label [[MIDDLE_BLOCK:%.*]]
+; NOVEC:       92:
+; NOVEC-NEXT:    store i1 false, ptr [[P]], align 1
+; NOVEC-NEXT:    br label [[MIDDLE_BLOCK]]
+; NOVEC:       middle.block:
+; NOVEC-NEXT:    [[INDEX_NEXT:%.*]] = add i32 0, 16
+; NOVEC-NEXT:    ret void
+;
+; CHECK-LABEL: define void @test(
+; CHECK-SAME: ptr [[P:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE30:%.*]]
+; CHECK:       vector.body:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE30]] ]
+; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
+; CHECK-NEXT:    [[TMP0:%.*]] = icmp eq i32 [[INDEX_NEXT]], 10
+; CHECK-NEXT:    br i1 [[TMP0]], label [[EXIT:%.*]], label [[PRED_STORE_CONTINUE30]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %for.body
+
+for.body:                                         ; preds = %for.body, %entry
+  %iv = phi i32 [ 0, %entry ], [ %iv.next, %for.body ]
+  %trunc = trunc i40 0 to i32
+  %icmp.eq = icmp eq i32 %trunc, 0
+  %zext = zext i1 %icmp.eq to i32
+  %icmp.ult = icmp ult i32 0, %zext
+  %or = or i1 %icmp.ult, true
+  %icmp.sgt = icmp sgt i1 %or, false
+  store i1 %icmp.sgt, ptr %p, align 1
+  %iv.next = add i32 %iv, 1
+  %cond = icmp ult i32 %iv.next, 10
+  br i1 %cond, label %for.body, label %exit
+
+exit:                                             ; preds = %for.body
+  ret void
+}
+;.
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+;.

artagnon · 2024-05-14T09:25:46Z

Gentle ping.

llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll

llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll

artagnon · 2024-05-14T18:32:47Z

Okay, I have updated the tests to read a little better, but I'm not sure if we really tackled the fragility issue: if instcombine runs before loop-vectorize in the future, these tests are null and void.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

artagnon · 2024-05-21T09:35:35Z

I've now implemented a more thorough fix.

In LoopVectorizationCostModel::getInstructionCost(), when the condition canTruncateToMinimalBitwidth() is satisfied, for a trunc, the source type is computed as the smallest type of the source vector and the destination vector, and the destination type is computed as the largest type of the instruction and destination type. This is clearly a logical error, as the original source vector type could be smaller than the original destination vector type, and the trunc semantics are broken because we're attempting to widen. Even if this were not the case, there is clearly no benefit of computing the largest vector type for the final destination type, as the cost of a trunc only depends on the vectorization factor. Fixes llvm#47765.

artagnon · 2024-05-24T10:25:05Z

Rebased.

fhahn

LGTM, thanks!

llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll

llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll

artagnon requested review from fhahn and david-arm May 5, 2024 17:53

llvmbot added backend:SystemZ vectorizers llvm:transforms labels May 5, 2024

artagnon force-pushed the lv-47665 branch from 2074a8b to 9a2d31d Compare May 7, 2024 13:07

artagnon added the backend:RISC-V label May 10, 2024

artagnon requested review from sjoerdmeijer, MaskRay and davemgreen May 10, 2024 20:10

artagnon changed the title ~~LoopVectorize: fix logical error in trunc cost~~ [LV] fix logical error in trunc cost May 14, 2024

david-arm reviewed May 14, 2024

View reviewed changes

llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll Outdated Show resolved Hide resolved

llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll Outdated Show resolved Hide resolved

llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll Outdated Show resolved Hide resolved

artagnon force-pushed the lv-47665 branch from 9a2d31d to 5451c6d Compare May 14, 2024 18:31

fhahn reviewed May 16, 2024

View reviewed changes

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Outdated Show resolved Hide resolved

artagnon force-pushed the lv-47665 branch from 5451c6d to 281bd1a Compare May 21, 2024 09:35

artagnon added 2 commits May 24, 2024 11:19

[LV] a more thorough fix

e85e136

artagnon force-pushed the lv-47665 branch from 281bd1a to e85e136 Compare May 24, 2024 10:21

fhahn approved these changes May 24, 2024

View reviewed changes

llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll Outdated Show resolved Hide resolved

llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll Outdated Show resolved Hide resolved

[LV] remove dce,simplifycfg from tests

3b204a1

artagnon merged commit bb0d29a into llvm:main May 24, 2024

artagnon deleted the lv-47665 branch May 24, 2024 17:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[LV] fix logical error in trunc cost #91136

[LV] fix logical error in trunc cost #91136

Uh oh!

artagnon commented May 5, 2024 •

edited

Loading

Uh oh!

llvmbot commented May 5, 2024 •

edited

Loading

Uh oh!

artagnon commented May 14, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

artagnon commented May 14, 2024

Uh oh!

Uh oh!

artagnon commented May 21, 2024

Uh oh!

artagnon commented May 24, 2024

Uh oh!

fhahn left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

[LV] fix logical error in trunc cost #91136

[LV] fix logical error in trunc cost #91136

Uh oh!

Conversation

artagnon commented May 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented May 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

artagnon commented May 14, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

artagnon commented May 14, 2024

Uh oh!

Uh oh!

artagnon commented May 21, 2024

Uh oh!

artagnon commented May 24, 2024

Uh oh!

fhahn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

artagnon commented May 5, 2024 •

edited

Loading

llvmbot commented May 5, 2024 •

edited

Loading