[TypePromotion] Support positive addition amounts in isSafeWrap. #81690

topperc · 2024-02-14T00:14:45Z

We can support these by changing the sext promotion to -zext(-C) and replacing a sgt check with ugt. Reframing the logic in terms of how the unsigned range are affected.

More comments in the patch.

llvmbot · 2024-02-14T00:15:16Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-aarch64

Author: Craig Topper (topperc)

Changes

We can support these by changing the sext promotion to -zext(-C) and replacing a sgt check with ugt. Reframing the logic in terms of how the unsigned range are affected.

More comments in the patch.

Patch is 54.14 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/81690.diff

11 Files Affected:

(modified) llvm/lib/CodeGen/TypePromotion.cpp (+52-57)
(modified) llvm/test/CodeGen/AArch64/and-mask-removal.ll (+31-31)
(modified) llvm/test/CodeGen/AArch64/lack-of-signed-truncation-check.ll (+36-26)
(modified) llvm/test/CodeGen/AArch64/signed-truncation-check.ll (+37-27)
(modified) llvm/test/CodeGen/AArch64/typepromotion-overflow.ll (+2-3)
(modified) llvm/test/CodeGen/AArch64/typepromotion-signed.ll (+4-3)
(modified) llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll (+150-90)
(modified) llvm/test/CodeGen/RISCV/signed-truncation-check.ll (+146-60)
(modified) llvm/test/CodeGen/RISCV/typepromotion-overflow.ll (+2-3)
(modified) llvm/test/Transforms/TypePromotion/ARM/icmps.ll (+4-3)
(modified) llvm/test/Transforms/TypePromotion/ARM/wrapping.ll (+6-4)

diff --git a/llvm/lib/CodeGen/TypePromotion.cpp b/llvm/lib/CodeGen/TypePromotion.cpp
index 48ad8de778010e..2116027f22c6bd 100644
--- a/llvm/lib/CodeGen/TypePromotion.cpp
+++ b/llvm/lib/CodeGen/TypePromotion.cpp
@@ -272,64 +272,58 @@ bool TypePromotionImpl::isSink(Value *V) {
 
 /// Return whether this instruction can safely wrap.
 bool TypePromotionImpl::isSafeWrap(Instruction *I) {
-  // We can support a potentially wrapping instruction (I) if:
+  // We can support a potentially wrapping Add/Sub instruction (I) if:
   // - It is only used by an unsigned icmp.
   // - The icmp uses a constant.
-  // - The wrapping value (I) is decreasing, i.e would underflow - wrapping
-  //   around zero to become a larger number than before.
   // - The wrapping instruction (I) also uses a constant.
   //
-  // We can then use the two constants to calculate whether the result would
-  // wrap in respect to itself in the original bitwidth. If it doesn't wrap,
-  // just underflows the range, the icmp would give the same result whether the
-  // result has been truncated or not. We calculate this by:
-  // - Zero extending both constants, if needed, to RegisterBitWidth.
-  // - Take the absolute value of I's constant, adding this to the icmp const.
-  // - Check that this value is not out of range for small type. If it is, it
-  //   means that it has underflowed enough to wrap around the icmp constant.
+  // This a common pattern emitted to check if a value is within a range.
   //
   // For example:
   //
-  // %sub = sub i8 %a, 2
-  // %cmp = icmp ule i8 %sub, 254
+  // %sub = sub i8 %a, C1
+  // %cmp = icmp ule i8 %sub, C2
+  //
+  // or
   //
-  // If %a = 0, %sub = -2 == FE == 254
-  // But if this is evalulated as a i32
-  // %sub = -2 == FF FF FF FE == 4294967294
-  // So the unsigned compares (i8 and i32) would not yield the same result.
+  // %add = add i8 %a, C1
+  // %cmp = icmp ule i8 %add, C2.
   //
-  // Another way to look at it is:
-  // %a - 2 <= 254
-  // %a + 2 <= 254 + 2
-  // %a <= 256
-  // And we can't represent 256 in the i8 format, so we don't support it.
+  // We will treat an add as though it were a subtract by -C1. To promote
+  // the Add/Sub we will zero extend the LHS and the subtracted amount. For Add,
+  // this means we need to negate the constant, zero extend to RegisterBitWidth,
+  // and negate in the larger type.
   //
-  // Whereas:
+  // This will produce a value in the range [-zext(C1), zext(X)-zext(C1)] where
+  // C1 is the subtracted amount. This is either a small unsigned number or a
+  // large unsigned number in the promoted type.
   //
-  // %sub i8 %a, 1
+  // Now we need to correct the compare constant C2. Values >= C1 in the
+  // original add result range have been remapped to large values in the
+  // promoted range. If the compare constant fell into this range we need to
+  // remap it as well. We can do this as -(zext(-C2)).
+  //
+  // For example:
+  //
+  // %sub = sub i8 %a, 2
   // %cmp = icmp ule i8 %sub, 254
   //
-  // If %a = 0, %sub = -1 == FF == 255
-  // As i32:
-  // %sub = -1 == FF FF FF FF == 4294967295
+  // becomes
   //
-  // In this case, the unsigned compare results would be the same and this
-  // would also be true for ult, uge and ugt:
-  // - (255 < 254) == (0xFFFFFFFF < 254) == false
-  // - (255 <= 254) == (0xFFFFFFFF <= 254) == false
-  // - (255 > 254) == (0xFFFFFFFF > 254) == true
-  // - (255 >= 254) == (0xFFFFFFFF >= 254) == true
+  // %zext = zext %a to i32
+  // %sub = sub i32 %zext, 2
+  // %cmp = icmp ule i32 %sub, 4294967294
   //
-  // To demonstrate why we can't handle increasing values:
+  // Another example:
   //
-  // %add = add i8 %a, 2
-  // %cmp = icmp ult i8 %add, 127
+  // %sub = sub i8 %a, 1
+  // %cmp = icmp ule i8 %sub, 254
   //
-  // If %a = 254, %add = 256 == (i8 1)
-  // As i32:
-  // %add = 256
+  // becomes
   //
-  // (1 < 127) != (256 < 127)
+  // %zext = zext %a to i32
+  // %sub = sub i32 %zext, 1
+  // %cmp = icmp ule i32 %sub, 254
 
   unsigned Opc = I->getOpcode();
   if (Opc != Instruction::Add && Opc != Instruction::Sub)
@@ -356,15 +350,10 @@ bool TypePromotionImpl::isSafeWrap(Instruction *I) {
   APInt OverflowConst = cast<ConstantInt>(I->getOperand(1))->getValue();
   if (Opc == Instruction::Sub)
     OverflowConst = -OverflowConst;
-  if (!OverflowConst.isNonPositive())
-    return false;
 
   SafeWrap.insert(I);
 
-  // Using C1 = OverflowConst and C2 = ICmpConst, we can either prove that:
-  //   zext(x) + sext(C1) <u zext(C2)  if C1 < 0 and C1 >s C2
-  //   zext(x) + sext(C1) <u sext(C2)  if C1 < 0 and C1 <=s C2
-  if (OverflowConst.sgt(ICmpConst)) {
+  if (OverflowConst.ugt(ICmpConst)) {
     LLVM_DEBUG(dbgs() << "IR Promotion: Allowing safe overflow for sext "
                       << "const of " << *I << "\n");
     return true;
@@ -487,18 +476,24 @@ void IRPromoter::PromoteTree() {
         continue;
 
       if (auto *Const = dyn_cast<ConstantInt>(Op)) {
-        // For subtract, we don't need to sext the constant. We only put it in
+        // For subtract, we only need to zext the constant. We only put it in
         // SafeWrap because SafeWrap.size() is used elsewhere.
-        // For cmp, we need to sign extend a constant appearing in either
-        // operand. For add, we should only sign extend the RHS.
-        Constant *NewConst =
-            ConstantInt::get(Const->getContext(),
-                             (SafeWrap.contains(I) &&
-                              (I->getOpcode() == Instruction::ICmp || i == 1) &&
-                              I->getOpcode() != Instruction::Sub)
-                                 ? Const->getValue().sext(PromotedWidth)
-                                 : Const->getValue().zext(PromotedWidth));
-        I->setOperand(i, NewConst);
+        // For Add and ICmp we need to find how far the constant is from the
+        // top of its original unsigned range and place it the same distance
+        // from the top of its new unsigned range. We can do this by negating
+        // the constant, zero extending it, then negating in the new type.
+        APInt NewConst;
+        if (SafeWrap.contains(I)) {
+          if (I->getOpcode() == Instruction::ICmp)
+            NewConst = -((-Const->getValue()).zext(PromotedWidth));
+          else if (I->getOpcode() == Instruction::Add && i == 1)
+            NewConst = -((-Const->getValue()).zext(PromotedWidth));
+          else
+            NewConst = Const->getValue().zext(PromotedWidth);
+        } else
+          NewConst = Const->getValue().zext(PromotedWidth);
+
+        I->setOperand(i, ConstantInt::get(Const->getContext(), NewConst));
       } else if (isa<UndefValue>(Op))
         I->setOperand(i, ConstantInt::get(ExtTy, 0));
     }
diff --git a/llvm/test/CodeGen/AArch64/and-mask-removal.ll b/llvm/test/CodeGen/AArch64/and-mask-removal.ll
index 17ff0159701689..4ed482c8f2ba39 100644
--- a/llvm/test/CodeGen/AArch64/and-mask-removal.ll
+++ b/llvm/test/CodeGen/AArch64/and-mask-removal.ll
@@ -65,9 +65,8 @@ if.end:                                           ; preds = %if.then, %entry
 define zeroext i1 @test8_0(i8 zeroext %x)  align 2 {
 ; CHECK-LABEL: test8_0:
 ; CHECK:       ; %bb.0: ; %entry
-; CHECK-NEXT:    add w8, w0, #74
-; CHECK-NEXT:    and w8, w8, #0xff
-; CHECK-NEXT:    cmp w8, #236
+; CHECK-NEXT:    sub w8, w0, #182
+; CHECK-NEXT:    cmn w8, #20
 ; CHECK-NEXT:    cset w0, lo
 ; CHECK-NEXT:    ret
 entry:
@@ -295,20 +294,20 @@ ret_true:
 define zeroext i1 @test16_2(i16 zeroext %x)  align 2 {
 ; CHECK-SD-LABEL: test16_2:
 ; CHECK-SD:       ; %bb.0: ; %entry
-; CHECK-SD-NEXT:    mov w8, #16882 ; =0x41f2
-; CHECK-SD-NEXT:    mov w9, #40700 ; =0x9efc
+; CHECK-SD-NEXT:    mov w8, #-48654 ; =0xffff41f2
+; CHECK-SD-NEXT:    mov w9, #-24836 ; =0xffff9efc
 ; CHECK-SD-NEXT:    add w8, w0, w8
-; CHECK-SD-NEXT:    cmp w9, w8, uxth
-; CHECK-SD-NEXT:    cset w0, hi
+; CHECK-SD-NEXT:    cmp w8, w9
+; CHECK-SD-NEXT:    cset w0, lo
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: test16_2:
 ; CHECK-GI:       ; %bb.0: ; %entry
-; CHECK-GI-NEXT:    mov w8, #16882 ; =0x41f2
-; CHECK-GI-NEXT:    mov w9, #40699 ; =0x9efb
+; CHECK-GI-NEXT:    mov w8, #-48654 ; =0xffff41f2
+; CHECK-GI-NEXT:    mov w9, #-24837 ; =0xffff9efb
 ; CHECK-GI-NEXT:    add w8, w0, w8
-; CHECK-GI-NEXT:    cmp w9, w8, uxth
-; CHECK-GI-NEXT:    cset w0, hs
+; CHECK-GI-NEXT:    cmp w8, w9
+; CHECK-GI-NEXT:    cset w0, ls
 ; CHECK-GI-NEXT:    ret
 entry:
   %0 = add i16 %x, 16882
@@ -349,20 +348,20 @@ ret_true:
 define zeroext i1 @test16_4(i16 zeroext %x)  align 2 {
 ; CHECK-SD-LABEL: test16_4:
 ; CHECK-SD:       ; %bb.0: ; %entry
-; CHECK-SD-NEXT:    mov w8, #29985 ; =0x7521
+; CHECK-SD-NEXT:    mov w8, #-35551 ; =0xffff7521
 ; CHECK-SD-NEXT:    mov w9, #15676 ; =0x3d3c
 ; CHECK-SD-NEXT:    add w8, w0, w8
-; CHECK-SD-NEXT:    cmp w9, w8, uxth
-; CHECK-SD-NEXT:    cset w0, lo
+; CHECK-SD-NEXT:    cmp w8, w9
+; CHECK-SD-NEXT:    cset w0, hi
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: test16_4:
 ; CHECK-GI:       ; %bb.0: ; %entry
-; CHECK-GI-NEXT:    mov w8, #29985 ; =0x7521
+; CHECK-GI-NEXT:    mov w8, #-35551 ; =0xffff7521
 ; CHECK-GI-NEXT:    mov w9, #15677 ; =0x3d3d
 ; CHECK-GI-NEXT:    add w8, w0, w8
-; CHECK-GI-NEXT:    cmp w9, w8, uxth
-; CHECK-GI-NEXT:    cset w0, ls
+; CHECK-GI-NEXT:    cmp w8, w9
+; CHECK-GI-NEXT:    cset w0, hs
 ; CHECK-GI-NEXT:    ret
 entry:
   %0 = add i16 %x, -35551
@@ -431,20 +430,20 @@ ret_true:
 define zeroext i1 @test16_7(i16 zeroext %x)  align 2 {
 ; CHECK-SD-LABEL: test16_7:
 ; CHECK-SD:       ; %bb.0: ; %entry
-; CHECK-SD-NEXT:    mov w8, #9272 ; =0x2438
-; CHECK-SD-NEXT:    mov w9, #22619 ; =0x585b
+; CHECK-SD-NEXT:    mov w8, #-56264 ; =0xffff2438
+; CHECK-SD-NEXT:    mov w9, #-42917 ; =0xffff585b
 ; CHECK-SD-NEXT:    add w8, w0, w8
-; CHECK-SD-NEXT:    cmp w9, w8, uxth
-; CHECK-SD-NEXT:    cset w0, lo
+; CHECK-SD-NEXT:    cmp w8, w9
+; CHECK-SD-NEXT:    cset w0, hi
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: test16_7:
 ; CHECK-GI:       ; %bb.0: ; %entry
-; CHECK-GI-NEXT:    mov w8, #9272 ; =0x2438
-; CHECK-GI-NEXT:    mov w9, #22620 ; =0x585c
+; CHECK-GI-NEXT:    mov w8, #-56264 ; =0xffff2438
+; CHECK-GI-NEXT:    mov w9, #-42916 ; =0xffff585c
 ; CHECK-GI-NEXT:    add w8, w0, w8
-; CHECK-GI-NEXT:    cmp w9, w8, uxth
-; CHECK-GI-NEXT:    cset w0, ls
+; CHECK-GI-NEXT:    cmp w8, w9
+; CHECK-GI-NEXT:    cset w0, hs
 ; CHECK-GI-NEXT:    ret
 entry:
   %0 = add i16 %x, 9272
@@ -508,16 +507,17 @@ define i64 @pr58109(i8 signext %0) {
 define i64 @pr58109b(i8 signext %0, i64 %a, i64 %b) {
 ; CHECK-SD-LABEL: pr58109b:
 ; CHECK-SD:       ; %bb.0:
-; CHECK-SD-NEXT:    add w8, w0, #1
-; CHECK-SD-NEXT:    tst w8, #0xfe
-; CHECK-SD-NEXT:    csel x0, x1, x2, eq
+; CHECK-SD-NEXT:    and w8, w0, #0xff
+; CHECK-SD-NEXT:    sub w8, w8, #255
+; CHECK-SD-NEXT:    cmn w8, #254
+; CHECK-SD-NEXT:    csel x0, x1, x2, lo
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: pr58109b:
 ; CHECK-GI:       ; %bb.0:
-; CHECK-GI-NEXT:    add w8, w0, #1
-; CHECK-GI-NEXT:    and w8, w8, #0xff
-; CHECK-GI-NEXT:    cmp w8, #2
+; CHECK-GI-NEXT:    mov w8, #-255 ; =0xffffff01
+; CHECK-GI-NEXT:    add w8, w8, w0, uxtb
+; CHECK-GI-NEXT:    cmn w8, #254
 ; CHECK-GI-NEXT:    csel x0, x1, x2, lo
 ; CHECK-GI-NEXT:    ret
   %2 = add i8 %0, 1
diff --git a/llvm/test/CodeGen/AArch64/lack-of-signed-truncation-check.ll b/llvm/test/CodeGen/AArch64/lack-of-signed-truncation-check.ll
index 56a18b00a974cb..8e464d16dd90c5 100644
--- a/llvm/test/CodeGen/AArch64/lack-of-signed-truncation-check.ll
+++ b/llvm/test/CodeGen/AArch64/lack-of-signed-truncation-check.ll
@@ -187,10 +187,11 @@ define i1 @add_ulecmp_i16_i8(i16 %x) nounwind {
 define i1 @add_ugecmp_i16_i8(i16 %x) nounwind {
 ; CHECK-LABEL: add_ugecmp_i16_i8:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxtb w8, w0
-; CHECK-NEXT:    and w8, w8, #0xffff
-; CHECK-NEXT:    cmp w8, w0, uxth
-; CHECK-NEXT:    cset w0, ne
+; CHECK-NEXT:    mov w8, #-65408 // =0xffff0080
+; CHECK-NEXT:    mov w9, #-65281 // =0xffff00ff
+; CHECK-NEXT:    add w8, w8, w0, uxth
+; CHECK-NEXT:    cmp w8, w9
+; CHECK-NEXT:    cset w0, hi
 ; CHECK-NEXT:    ret
   %tmp0 = add i16 %x, 128 ; 1U << (8-1)
   %tmp1 = icmp uge i16 %tmp0, 256 ; 1U << 8
@@ -256,10 +257,11 @@ define i1 @add_ugecmp_i64_i8(i64 %x) nounwind {
 define i1 @add_ugtcmp_i16_i8(i16 %x) nounwind {
 ; CHECK-LABEL: add_ugtcmp_i16_i8:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxtb w8, w0
-; CHECK-NEXT:    and w8, w8, #0xffff
-; CHECK-NEXT:    cmp w8, w0, uxth
-; CHECK-NEXT:    cset w0, ne
+; CHECK-NEXT:    mov w8, #-65408 // =0xffff0080
+; CHECK-NEXT:    mov w9, #-65281 // =0xffff00ff
+; CHECK-NEXT:    add w8, w8, w0, uxth
+; CHECK-NEXT:    cmp w8, w9
+; CHECK-NEXT:    cset w0, hi
 ; CHECK-NEXT:    ret
   %tmp0 = add i16 %x, 128 ; 1U << (8-1)
   %tmp1 = icmp ugt i16 %tmp0, 255 ; (1U << 8) - 1
@@ -301,9 +303,10 @@ define i1 @add_ugecmp_bad_i16_i8_cmp(i16 %x, i16 %y) nounwind {
 define i1 @add_ugecmp_bad_i8_i16(i16 %x) nounwind {
 ; CHECK-LABEL: add_ugecmp_bad_i8_i16:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    add w8, w0, #128
-; CHECK-NEXT:    and w8, w8, #0xffff
-; CHECK-NEXT:    cmp w8, #127
+; CHECK-NEXT:    mov w8, #-65408 // =0xffff0080
+; CHECK-NEXT:    mov w9, #-65409 // =0xffff007f
+; CHECK-NEXT:    add w8, w8, w0, uxth
+; CHECK-NEXT:    cmp w8, w9
 ; CHECK-NEXT:    cset w0, hi
 ; CHECK-NEXT:    ret
   %tmp0 = add i16 %x, 128 ; 1U << (8-1)
@@ -315,9 +318,10 @@ define i1 @add_ugecmp_bad_i8_i16(i16 %x) nounwind {
 define i1 @add_ugecmp_bad_i16_i8_c0notpoweroftwo(i16 %x) nounwind {
 ; CHECK-LABEL: add_ugecmp_bad_i16_i8_c0notpoweroftwo:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    add w8, w0, #192
-; CHECK-NEXT:    and w8, w8, #0xffff
-; CHECK-NEXT:    cmp w8, #255
+; CHECK-NEXT:    mov w8, #-65344 // =0xffff00c0
+; CHECK-NEXT:    mov w9, #-65281 // =0xffff00ff
+; CHECK-NEXT:    add w8, w8, w0, uxth
+; CHECK-NEXT:    cmp w8, w9
 ; CHECK-NEXT:    cset w0, hi
 ; CHECK-NEXT:    ret
   %tmp0 = add i16 %x, 192 ; (1U << (8-1)) + (1U << (8-1-1))
@@ -329,9 +333,10 @@ define i1 @add_ugecmp_bad_i16_i8_c0notpoweroftwo(i16 %x) nounwind {
 define i1 @add_ugecmp_bad_i16_i8_c1notpoweroftwo(i16 %x) nounwind {
 ; CHECK-LABEL: add_ugecmp_bad_i16_i8_c1notpoweroftwo:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    add w8, w0, #128
-; CHECK-NEXT:    and w8, w8, #0xffff
-; CHECK-NEXT:    cmp w8, #767
+; CHECK-NEXT:    mov w8, #-65408 // =0xffff0080
+; CHECK-NEXT:    mov w9, #-64769 // =0xffff02ff
+; CHECK-NEXT:    add w8, w8, w0, uxth
+; CHECK-NEXT:    cmp w8, w9
 ; CHECK-NEXT:    cset w0, hi
 ; CHECK-NEXT:    ret
   %tmp0 = add i16 %x, 128 ; 1U << (8-1)
@@ -343,9 +348,10 @@ define i1 @add_ugecmp_bad_i16_i8_c1notpoweroftwo(i16 %x) nounwind {
 define i1 @add_ugecmp_bad_i16_i8_magic(i16 %x) nounwind {
 ; CHECK-LABEL: add_ugecmp_bad_i16_i8_magic:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    add w8, w0, #64
-; CHECK-NEXT:    and w8, w8, #0xffff
-; CHECK-NEXT:    cmp w8, #255
+; CHECK-NEXT:    mov w8, #-65472 // =0xffff0040
+; CHECK-NEXT:    mov w9, #-65281 // =0xffff00ff
+; CHECK-NEXT:    add w8, w8, w0, uxth
+; CHECK-NEXT:    cmp w8, w9
 ; CHECK-NEXT:    cset w0, hi
 ; CHECK-NEXT:    ret
   %tmp0 = add i16 %x, 64 ; 1U << (8-1-1)
@@ -357,9 +363,10 @@ define i1 @add_ugecmp_bad_i16_i8_magic(i16 %x) nounwind {
 define i1 @add_ugecmp_bad_i16_i4(i16 %x) nounwind {
 ; CHECK-LABEL: add_ugecmp_bad_i16_i4:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    add w8, w0, #8
-; CHECK-NEXT:    and w8, w8, #0xffff
-; CHECK-NEXT:    cmp w8, #15
+; CHECK-NEXT:    mov w8, #-65528 // =0xffff0008
+; CHECK-NEXT:    mov w9, #-65521 // =0xffff000f
+; CHECK-NEXT:    add w8, w8, w0, uxth
+; CHECK-NEXT:    cmp w8, w9
 ; CHECK-NEXT:    cset w0, hi
 ; CHECK-NEXT:    ret
   %tmp0 = add i16 %x, 8 ; 1U << (4-1)
@@ -371,9 +378,12 @@ define i1 @add_ugecmp_bad_i16_i4(i16 %x) nounwind {
 define i1 @add_ugecmp_bad_i24_i8(i24 %x) nounwind {
 ; CHECK-LABEL: add_ugecmp_bad_i24_i8:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    add w8, w0, #128
-; CHECK-NEXT:    and w8, w8, #0xffffff
-; CHECK-NEXT:    cmp w8, #255
+; CHECK-NEXT:    mov w8, #128 // =0x80
+; CHECK-NEXT:    and w9, w0, #0xffffff
+; CHECK-NEXT:    movk w8, #65280, lsl #16
+; CHECK-NEXT:    add w8, w9, w8
+; CHECK-NEXT:    mov w9, #-16776961 // =0xff0000ff
+; CHECK-NEXT:    cmp w8, w9
 ; CHECK-NEXT:    cset w0, hi
 ; CHECK-NEXT:    ret
   %tmp0 = add i24 %x, 128 ; 1U << (8-1)
diff --git a/llvm/test/CodeGen/AArch64/signed-truncation-check.ll b/llvm/test/CodeGen/AArch64/signed-truncation-check.ll
index ab42e6463feeed..6a5795ff337dd7 100644
--- a/llvm/test/CodeGen/AArch64/signed-truncation-check.ll
+++ b/llvm/test/CodeGen/AArch64/signed-truncation-check.ll
@@ -200,10 +200,11 @@ define i1 @add_ugtcmp_i16_i8(i16 %x) nounwind {
 define i1 @add_ultcmp_i16_i8(i16 %x) nounwind {
 ; CHECK-LABEL: add_ultcmp_i16_i8:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxtb w8, w0
-; CHECK-NEXT:    and w8, w8, #0xffff
-; CHECK-NEXT:    cmp w8, w0, uxth
-; CHECK-NEXT:    cset w0, eq
+; CHECK-NEXT:    mov w8, #-65408 // =0xffff0080
+; CHECK-NEXT:    mov w9, #-65280 // =0xffff0100
+; CHECK-NEXT:    add w8, w8, w0, uxth
+; CHECK-NEXT:    cmp w8, w9
+; CHECK-NEXT:    cset w0, lo
 ; CHECK-NEXT:    ret
   %tmp0 = add i16 %x, 128 ; 1U << (8-1)
   %tmp1 = icmp ult i16 %tmp0, 256 ; 1U << 8
@@ -269,10 +270,11 @@ define i1 @add_ultcmp_i64_i8(i64 %x) nounwind {
 define i1 @add_ulecmp_i16_i8(i16 %x) nounwind {
 ; CHECK-LABEL: add_ulecmp_i16_i8:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxtb w8, w0
-; CHECK-NEXT:    and w8, w8, #0xffff
-; CHECK-NEXT:    cmp w8, w0, uxth
-; CHECK-NEXT:    cset w0, eq
+; CHECK-NEXT:    mov w8, #-65408 // =0xffff0080
+; CHECK-NEXT:    mov w9, #-65280 // =0xffff0100
+; CHECK-NEXT:    add w8, w8, w0, uxth
+; CHECK-NEXT:    cmp w8, w9
+; CHECK-NEXT:    cset w0, lo
 ; CHECK-NEXT:    ret
   %tmp0 = add i16 %x, 128 ; 1U << (8-1)
   %tmp1 = icmp ule i16 %tmp0, 255 ; (1U << 8) - 1
@@ -314,9 +316,9 @@ define i1 @add_ultcmp_bad_i16_i8_cmp(i16 %x, i16 %y) nounwind {
 define i1 @add_ultcmp_bad_i8_i16(i16 %x) nounwind {
 ; CHECK-LABEL: add_ultcmp_bad_i8_i16:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w8, w0, #0xffff
-; CHECK-NEXT:    add w8, w8, #128
-; CHECK-NEXT:    lsr w0, w8, #16
+; CHECK-NEXT:    mov w8, #-65408 // =0xffff0080
+; CHECK-NEXT:    cmn w8, w0, uxth
+; CHECK-NEXT:    cset w0, hs
 ; CHECK-NEXT:    ret
   %tmp0 = add i16 %x, 128 ; 1U << (8-1)
   %tmp1 = icmp ult i16 %tmp0, 128 ; 1U << (8-1)
@@ -327,9 +329,10 @@ define i1 @add_ultcmp_bad_i8_i16(i16 %x) nounwind {
 define i1 @add_ultcmp_bad_i16_i8_c0notpoweroftwo(i16 %x) nounwind {
 ; CHECK-LABEL: add_ultcmp_bad_i16_i8_c0notpoweroftwo:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    add w8, w0, #192
-; CHECK-NEXT:    and w8, w8, #0xffff
-; CHECK-NEXT:    cmp w8, #256
+; CHECK-NEXT:    mov w8, #-65344 // =0xffff00c0
+; CHECK-NEXT:    mov w9, #-65280 // =0xffff0100
+; CHECK-NEXT:    add w8, w8, w0, uxth
+; CHECK-NEXT:    cmp w8, w9
 ; CHECK-NEXT:    cset w0, lo
 ; CHECK-NEXT:    ret
   %tmp0 = add i16 %x, 192 ; (1U << (8-1)) + (1U << (8-1-1))
@@ -341,9 +344,10 @@ define i1 @add_ultcmp_bad_i16_i8_c0notpoweroftwo(i16 %x) nounwind {
 define i1 @add_ultcmp_bad_i16_i8_c1notpoweroftwo(i16 %x) nounwind {
 ; CHECK-LABEL: add_ultcmp_bad_i16_i8_c1notpoweroftwo:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    add w8, w0, #128
-; CHECK-NEXT:    and w8, w8, #0xffff
-; CHECK-NEXT:    cmp w8, #768
+; CHECK-NEXT:    mov w8, #-65408 // =0xffff0080
+; CHECK-NEXT:    mov w9, #-64768 // =0xffff0300
+; CHECK-NEXT:    add w8, w8, w0, uxth
+; CHECK-NEXT:    cmp w8, w9
 ; CHECK-NEXT:    cset w0, lo
 ; CHECK-NEXT:    ret
   %tmp0 = add i16 %x, 128 ; 1U << (8-1)
@@ -355,9 +359,10 @@ define i1 @add_ultcmp_bad_i16_i8_c1notpoweroftwo(i16 %x) nounwind {
 define i1 @add_ultcmp_bad_i16_i8_magic(i16 %x) nounwind {
 ; CHECK-LABEL: add_ultcmp_bad_i16_i8_magic:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    add w8, w0, #64
-; CHECK-NEXT:    and w8, w8, #0xffff
-; CHECK-NEXT:    cmp w8, #256
+; CHECK-NEXT:    mov w8, #-65472 // =0xffff0040
+; CHECK-NEXT:    mov w9, #-65280 // =0xffff0100
+; CHECK-NEXT:    add w8, w8, w0, uxth
+; CHECK-NEXT:    cmp w8, w9
 ; CHECK-NEXT:    cset w0, lo
 ; CHECK-NEXT:    ret
   %tmp0 = add i16 %x, 64 ; 1U << (8-1-1)
@@ -369,9 +374,10 @@ define i1 @add_ultcmp_bad_i16_i8_magic(i16 %x) nounwind {
 define i1 @add_ultcmp_bad_i16_i4(i16 %x) nounwind {
 ; CHECK-LABEL: add_ultcmp_bad_i16_i4:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    add w8, w0, #8
-; CHEC...
[truncated]

AtariDreams · 2024-02-14T00:29:24Z

Given how this seems to increase the number of generated instructions, I think this should be done in cases where if OverflowConst.isNonPositive() holds false, instead of doing it to all numbers.

topperc · 2024-02-14T00:48:57Z

Given how this seems to increase the number of generated instructions, I think this should be done in cases where if OverflowConst.isNonPositive() holds false, instead of doing it to all numbers.

A lot of the signed-truncation-check and lack-of-signed-truncation-check test cases are not the output that InstCombine produces. So I'm not sure increases there reflect real behavior.

I don't think the increases are directly related to the sign of the constants. We see increases on these test files from negative constants too. See #81574 If anything we should be costing the constants based on target specific cost like getImmCost not making decisions based on the sign.

sparker-arm · 2024-02-14T09:34:34Z

A lot of the signed-truncation-check and lack-of-signed-truncation-check test cases are not the output that InstCombine produces

Then, to allay fears, would you mind adding a test run that first processes the file with instcombine before codegen?

If anything we should be costing the constants based on target specific cost like getImmCost

There's already some basic heuristics, at the end of TryToPromote, so tracking increased costs of immediates SGTM.

We can support these by changing the sext promotion to -zext(-C) and replacing a sgt check with ugt. Reframing the logic in terms of how the unsigned range are affected. More comments in the patch.

topperc · 2024-02-29T00:44:59Z

I've updated the patch to check isLegalAddImmediate for the constant we that promotion will generate. If it's not legal we continue treating the add as a source instead of promoting it.

sparker-arm

LGTM, thanks for making the change.

…ap. (#81690)" This reverts commit 0813b90. Fixes miscompile reported in #84718.

…Wrap. (#81690)" With special case with Add constant is 0. Original message: We can support these by changing the sext promotion to -zext(-C) and replacing a sgt check with ugt. Reframing the logic in terms of how the unsigned range are affected. More comments in the patch. The new cases check isLegalAddImmediate to avoid some regressions in lit tests.

topperc requested review from nikic, preames, davemgreen, sparker-arm and avieira-arm February 14, 2024 00:14

llvmbot added backend:AArch64 llvm:transforms labels Feb 14, 2024

topperc marked this pull request as draft February 15, 2024 00:32

[TypePromotion] Support positive addition amounts in isSafeWrap.

97f08c6

We can support these by changing the sext promotion to -zext(-C) and replacing a sgt check with ugt. Reframing the logic in terms of how the unsigned range are affected. More comments in the patch.

topperc force-pushed the pr/typeprom-positive branch from f6a4820 to 97f08c6 Compare February 29, 2024 00:43

sparker-arm approved these changes Feb 29, 2024

View reviewed changes

topperc marked this pull request as ready for review February 29, 2024 18:50

topperc merged commit 0813b90 into llvm:main Mar 1, 2024
6 checks passed

topperc deleted the pr/typeprom-positive branch March 1, 2024 17:17

svs-quic mentioned this pull request Mar 11, 2024

miscompilation of trivial but non-canonical arithmetic by AArch64 backend #84718

Closed

topperc added a commit that referenced this pull request Mar 11, 2024

Revert "[TypePromotion] Support positive addition amounts in isSafeWr…

561ddb1

…ap. (#81690)" This reverts commit 0813b90. Fixes miscompile reported in #84718.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TypePromotion] Support positive addition amounts in isSafeWrap. #81690

[TypePromotion] Support positive addition amounts in isSafeWrap. #81690

topperc commented Feb 14, 2024

llvmbot commented Feb 14, 2024 •

edited

AtariDreams commented Feb 14, 2024

topperc commented Feb 14, 2024

sparker-arm commented Feb 14, 2024

topperc commented Feb 29, 2024

sparker-arm left a comment

[TypePromotion] Support positive addition amounts in isSafeWrap. #81690

[TypePromotion] Support positive addition amounts in isSafeWrap. #81690

Conversation

topperc commented Feb 14, 2024

llvmbot commented Feb 14, 2024 • edited

AtariDreams commented Feb 14, 2024

topperc commented Feb 14, 2024

sparker-arm commented Feb 14, 2024

topperc commented Feb 29, 2024

sparker-arm left a comment

Choose a reason for hiding this comment

llvmbot commented Feb 14, 2024 •

edited