
[SelectionDAG] Fold (icmp eq/ne (shift X, C), 0) -> (icmp eq/ne X, 0) #88801

Status: Open. Wants to merge 3 commits into main.
Conversation


@bjope bjope commented Apr 15, 2024

Optimize
(icmp eq/ne (shift X, C), 0) -> (icmp eq/ne X, 0)
We do this if all shifted out bits, as well as shifted in bits,
are known to be zero. And we also do it if all shifted out bits are
known to be equal to at least one bit that isn't shifted out.

Defensively limit this to one-use shifts (we haven't really
considered whether this can also be profitable when there are multiple
uses of the shift; that is likely to depend on the target).


github-actions bot commented Apr 15, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.


bjope commented Apr 15, 2024

Draft based on missing optimization found in #87646

Unfortunately it seems to cause other regressions. I'm a bit surprised that, for example, RISCV emits more instructions when the DAG is simplified to get rid of the unneeded shift.

@bjope bjope force-pushed the setccshift branch 2 times, most recently from 0e350c1 to f872394 Compare April 16, 2024 09:18
@bjope bjope changed the title [DAGCombine] Fold (icmp eq/ne (shift X, C), 0) -> (icmp eq/ne X, 0) [SelectionDAG] Fold (icmp eq/ne (shift X, C), 0) -> (icmp eq/ne X, 0) Apr 16, 2024
@@ -586,6 +586,7 @@ define i1 @add_ultcmp_i32_i16(i32 %x) nounwind {
; RV64I-NEXT: lui a1, 8
; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: srliw a0, a0, 16
; RV64I-NEXT: slli a0, a0, 16
Collaborator Author (bjope):

Looks like RISCV isn't optimizing

      t41: i64 = and t32, Constant:i64<4294901760>
    t42: i64 = setcc t41, Constant:i64<0>, setne:ch

as well as

      t43: i64 = and t32, Constant:i64<4294967295>
      t38: i64 = srl t43, Constant:i64<16>
    t39: i64 = setcc t38, Constant:i64<0>, setne:ch

It does fold SRL+AND into a single srliw. But the AND is lowered into srliw+slli. The problem is that it would need to understand that the setcc user doesn't care where the bits end up.
@asb / @topperc / @michaelmaitland : Is RISCV lacking ISel patterns for setcc+"and with shifted mask"?

Collaborator:

shouldFoldConstantShiftPairToMask is enabled by default - is that affecting it?

Collaborator Author (bjope):

I don't think shouldFoldConstantShiftPairToMask has anything to do with it here, since I can't see any shift pair being created in the first place.

We could add a new hook to avoid the simplification done here for RISCV. But I really think that this is something that RISCV should handle at selection (i.e. allowing the simplified DAG, but doing a better selection).
I have no idea if the regression here is acceptable; someone who knows more about RISCV would need to deal with it. I don't feel comfortable doing that myself (and I don't really have time to work on it, as it isn't important for my downstream target).

@bjope bjope force-pushed the setccshift branch 2 times, most recently from 4779673 to ec40d9b Compare April 23, 2024 13:02
@bjope bjope marked this pull request as ready for review April 23, 2024 13:02
@llvmbot llvmbot added the backend:ARM and llvm:SelectionDAG labels Apr 23, 2024

llvmbot commented Apr 23, 2024

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-llvm-selectiondag

Author: Björn Pettersson (bjope)

Changes

Optimize
(icmp eq/ne (shift X, C), 0) -> (icmp eq/ne X, 0)
We do this if all shifted out bits, as well as shifted in bits,
are known to be zero. And we also do it if all shifted out bits are
known to be equal to at least one bit that isn't shifted out.

Defensively limit this to one-use shifts (haven't really
considered if this can be profitable also when there are multiple
uses of the shift, but that is likely to depend on the target).


Full diff: https://github.com/llvm/llvm-project/pull/88801.diff

7 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (+21)
  • (modified) llvm/test/CodeGen/ARM/and-cmpz.ll (+5-5)
  • (modified) llvm/test/CodeGen/ARM/simplifysetcc_narrow_load.ll (+9-9)
  • (modified) llvm/test/CodeGen/Hexagon/isel-memory-vNi1.ll (+17-20)
  • (modified) llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll (+1)
  • (modified) llvm/test/CodeGen/RISCV/sextw-removal.ll (+22-16)
  • (modified) llvm/test/CodeGen/RISCV/signed-truncation-check.ll (+1)
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index c938b3996be393..a3d832617b4b51 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -4516,6 +4516,27 @@ SDValue TargetLowering::SimplifySetCC(EVT VT, SDValue N0, SDValue N1,
         }
       }
     }
+
+    // Optimize
+    //    (icmp eq/ne (shift N00, N01C), 0) -> (icmp eq/ne N00, 0)
+    // If shift is logical and all shifted out bits are known to be zero, then
+    // the zero'd ness doesnt change and we can omit the shift.
+    // If all shifted out bits are equal to at least one bit that isn't
+    // shifted out, then the zero'd ness doesnt change and we can omit the
+    // shift.
+    if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && C1.isZero() &&
+        N0.hasOneUse() &&
+        (N0.getOpcode() == ISD::SHL || N0.getOpcode() == ISD::SRL)) {
+      if (ConstantSDNode *ShAmt = isConstOrConstSplat(N0.getOperand(1))) {
+        SDValue N00 = N0.getOperand(0);
+        KnownBits Known = DAG.computeKnownBits(N00);
+        if (N0.getOpcode() == ISD::SRL)
+          Known = Known.reverseBits();
+        if (ShAmt->getAPIntValue().ule(Known.countMinLeadingZeros()) ||
+            ShAmt->getAPIntValue().ult(Known.countMinSignBits()))
+          return DAG.getSetCC(dl, VT, N00, N1, Cond);
+      }
+    }
   }
 
   // FIXME: Support vectors.
diff --git a/llvm/test/CodeGen/ARM/and-cmpz.ll b/llvm/test/CodeGen/ARM/and-cmpz.ll
index 1f72307f12a682..ca01680792eff1 100644
--- a/llvm/test/CodeGen/ARM/and-cmpz.ll
+++ b/llvm/test/CodeGen/ARM/and-cmpz.ll
@@ -89,12 +89,12 @@ false:
 }
 
 ; CHECK-LABEL: i16_cmpz:
-; T1:      uxth    r0, r0
-; T1-NEXT: lsrs    r0, r0, #9
+; T1:      lsls    r0, r0, #16
+; T1-NEXT: lsrs    r0, r0, #25
 ; T1-NEXT: bne
-; T2:      uxth    r0, r0
-; T2-NEXT: movs    r2, #0
-; T2-NEXT: cmp.w   r2, r0, lsr #9
+; T2:      tst.w   r0, #65024
+; T2-NEXT: it
+; T2-NEXT: bxne
 define void @i16_cmpz(i16 %x, ptr %foo) {
 entry:
   %cmp = icmp ult i16 %x, 512
diff --git a/llvm/test/CodeGen/ARM/simplifysetcc_narrow_load.ll b/llvm/test/CodeGen/ARM/simplifysetcc_narrow_load.ll
index 838da59f9e412c..bad99b5c7d976d 100644
--- a/llvm/test/CodeGen/ARM/simplifysetcc_narrow_load.ll
+++ b/llvm/test/CodeGen/ARM/simplifysetcc_narrow_load.ll
@@ -311,7 +311,7 @@ define i1 @test_48_16_8(ptr %y) {
 ; CHECK-LE-LABEL: test_48_16_8:
 ; CHECK-LE:       @ %bb.0:
 ; CHECK-LE-NEXT:    ldrh r0, [r0, #1]
-; CHECK-LE-NEXT:    lsls r0, r0, #8
+; CHECK-LE-NEXT:    cmp r0, #0
 ; CHECK-LE-NEXT:    movne r0, #1
 ; CHECK-LE-NEXT:    mov pc, lr
 ;
@@ -559,28 +559,28 @@ define i1 @test_24_8_8(ptr %y) {
 ; CHECK-LE-LABEL: test_24_8_8:
 ; CHECK-LE:       @ %bb.0:
 ; CHECK-LE-NEXT:    ldrb r0, [r0, #1]
-; CHECK-LE-NEXT:    lsls r0, r0, #8
+; CHECK-LE-NEXT:    cmp r0, #0
 ; CHECK-LE-NEXT:    movne r0, #1
 ; CHECK-LE-NEXT:    mov pc, lr
 ;
 ; CHECK-V7-LE-LABEL: test_24_8_8:
 ; CHECK-V7-LE:       @ %bb.0:
 ; CHECK-V7-LE-NEXT:    ldrb r0, [r0, #1]
-; CHECK-V7-LE-NEXT:    lsls r0, r0, #8
+; CHECK-V7-LE-NEXT:    cmp r0, #0
 ; CHECK-V7-LE-NEXT:    movwne r0, #1
 ; CHECK-V7-LE-NEXT:    bx lr
 ;
 ; CHECK-BE-LABEL: test_24_8_8:
 ; CHECK-BE:       @ %bb.0:
 ; CHECK-BE-NEXT:    ldrb r0, [r0, #1]
-; CHECK-BE-NEXT:    lsls r0, r0, #8
+; CHECK-BE-NEXT:    cmp r0, #0
 ; CHECK-BE-NEXT:    movne r0, #1
 ; CHECK-BE-NEXT:    mov pc, lr
 ;
 ; CHECK-V7-BE-LABEL: test_24_8_8:
 ; CHECK-V7-BE:       @ %bb.0:
 ; CHECK-V7-BE-NEXT:    ldrb r0, [r0, #1]
-; CHECK-V7-BE-NEXT:    lsls r0, r0, #8
+; CHECK-V7-BE-NEXT:    cmp r0, #0
 ; CHECK-V7-BE-NEXT:    movwne r0, #1
 ; CHECK-V7-BE-NEXT:    bx lr
   %a = load i24, ptr %y
@@ -633,28 +633,28 @@ define i1 @test_24_8_16(ptr %y) {
 ; CHECK-LE-LABEL: test_24_8_16:
 ; CHECK-LE:       @ %bb.0:
 ; CHECK-LE-NEXT:    ldrb r0, [r0, #2]
-; CHECK-LE-NEXT:    lsls r0, r0, #16
+; CHECK-LE-NEXT:    cmp r0, #0
 ; CHECK-LE-NEXT:    movne r0, #1
 ; CHECK-LE-NEXT:    mov pc, lr
 ;
 ; CHECK-V7-LE-LABEL: test_24_8_16:
 ; CHECK-V7-LE:       @ %bb.0:
 ; CHECK-V7-LE-NEXT:    ldrb r0, [r0, #2]
-; CHECK-V7-LE-NEXT:    lsls r0, r0, #16
+; CHECK-V7-LE-NEXT:    cmp r0, #0
 ; CHECK-V7-LE-NEXT:    movwne r0, #1
 ; CHECK-V7-LE-NEXT:    bx lr
 ;
 ; CHECK-BE-LABEL: test_24_8_16:
 ; CHECK-BE:       @ %bb.0:
 ; CHECK-BE-NEXT:    ldrb r0, [r0]
-; CHECK-BE-NEXT:    lsls r0, r0, #16
+; CHECK-BE-NEXT:    cmp r0, #0
 ; CHECK-BE-NEXT:    movne r0, #1
 ; CHECK-BE-NEXT:    mov pc, lr
 ;
 ; CHECK-V7-BE-LABEL: test_24_8_16:
 ; CHECK-V7-BE:       @ %bb.0:
 ; CHECK-V7-BE-NEXT:    ldrb r0, [r0]
-; CHECK-V7-BE-NEXT:    lsls r0, r0, #16
+; CHECK-V7-BE-NEXT:    cmp r0, #0
 ; CHECK-V7-BE-NEXT:    movwne r0, #1
 ; CHECK-V7-BE-NEXT:    bx lr
   %a = load i24, ptr %y
diff --git a/llvm/test/CodeGen/Hexagon/isel-memory-vNi1.ll b/llvm/test/CodeGen/Hexagon/isel-memory-vNi1.ll
index 2eecfa9f47f176..9df3289a3f589a 100644
--- a/llvm/test/CodeGen/Hexagon/isel-memory-vNi1.ll
+++ b/llvm/test/CodeGen/Hexagon/isel-memory-vNi1.ll
@@ -173,64 +173,61 @@ define void @f6(ptr %a0, i16 %a1) #0 {
 ; CHECK-LABEL: f6:
 ; CHECK:       // %bb.0: // %b0
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r2 = extractu(r1,#8,#8)
-; CHECK-NEXT:    }
-; CHECK-NEXT:    {
-; CHECK-NEXT:     r3 = #255
+; CHECK-NEXT:     r2 = #255
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     p1 = !bitsclr(r1,r3)
+; CHECK-NEXT:     r3 = ##65280
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     p0 = cmp.eq(r2,#0)
+; CHECK-NEXT:     p1 = !bitsclr(r1,r2)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     if (p0) r2 = #0
+; CHECK-NEXT:     p0 = !bitsclr(r1,r3)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
 ; CHECK-NEXT:     r1 = mux(p1,#8,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r3 = mux(p1,#2,#0)
+; CHECK-NEXT:     r2 = mux(p1,#2,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r5 = setbit(r1,#2)
+; CHECK-NEXT:     r3 = mux(p0,##128,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r6 = setbit(r3,#0)
+; CHECK-NEXT:     r4 = mux(p0,#32,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     if (!p0) r2 = #128
+; CHECK-NEXT:     r5 = setbit(r1,#2)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r4 = mux(p0,#0,#32)
+; CHECK-NEXT:     r6 = setbit(r2,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
 ; CHECK-NEXT:     if (!p1) r5 = add(r1,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     if (!p1) r6 = add(r3,#0)
+; CHECK-NEXT:     r1 = setbit(r3,#6)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r1 = setbit(r2,#6)
+; CHECK-NEXT:     if (!p1) r6 = add(r2,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r3 = setbit(r4,#4)
+; CHECK-NEXT:     r2 = setbit(r4,#4)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r5 = or(r6,r5)
+; CHECK-NEXT:     if (!p0) r2 = add(r4,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     if (!p0) r2 = add(r1,#0)
+; CHECK-NEXT:     if (!p0) r1 = add(r3,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     if (!p0) r4 = add(r3,#0)
+; CHECK-NEXT:     r4 = or(r6,r5)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r5 |= or(r4,r2)
+; CHECK-NEXT:     r4 |= or(r2,r1)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     memb(r0+#0) = r5
+; CHECK-NEXT:     memb(r0+#0) = r4
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
 ; CHECK-NEXT:     jumpr r31
diff --git a/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll b/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll
index 6e3a50542939f1..abfa1d3d3f06cb 100644
--- a/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll
+++ b/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll
@@ -534,6 +534,7 @@ define i1 @add_ugecmp_i32_i16(i32 %x) nounwind {
 ; RV64I-NEXT:    lui a1, 8
 ; RV64I-NEXT:    add a0, a0, a1
 ; RV64I-NEXT:    srliw a0, a0, 16
+; RV64I-NEXT:    slli a0, a0, 16
 ; RV64I-NEXT:    snez a0, a0
 ; RV64I-NEXT:    ret
 ;
diff --git a/llvm/test/CodeGen/RISCV/sextw-removal.ll b/llvm/test/CodeGen/RISCV/sextw-removal.ll
index f707cb31e3eced..dc5ff5c8081bfa 100644
--- a/llvm/test/CodeGen/RISCV/sextw-removal.ll
+++ b/llvm/test/CodeGen/RISCV/sextw-removal.ll
@@ -1034,31 +1034,34 @@ define signext i32 @bug(i32 signext %x) {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    beqz a0, .LBB18_4
 ; CHECK-NEXT:  # %bb.1: # %if.end
-; CHECK-NEXT:    srliw a2, a0, 16
-; CHECK-NEXT:    seqz a1, a2
+; CHECK-NEXT:    srliw a3, a0, 16
+; CHECK-NEXT:    seqz a1, a3
 ; CHECK-NEXT:    slli a1, a1, 4
 ; CHECK-NEXT:    sllw a1, a0, a1
+; CHECK-NEXT:    srliw a2, a1, 24
+; CHECK-NEXT:    slli a2, a2, 24
 ; CHECK-NEXT:    li a0, 16
-; CHECK-NEXT:    beqz a2, .LBB18_3
+; CHECK-NEXT:    beqz a3, .LBB18_3
 ; CHECK-NEXT:  # %bb.2: # %if.end
 ; CHECK-NEXT:    li a0, 32
 ; CHECK-NEXT:  .LBB18_3: # %if.end
-; CHECK-NEXT:    srliw a2, a1, 24
 ; CHECK-NEXT:    seqz a2, a2
 ; CHECK-NEXT:    slli a3, a2, 3
 ; CHECK-NEXT:    sllw a1, a1, a3
+; CHECK-NEXT:    srliw a3, a1, 28
+; CHECK-NEXT:    slli a3, a3, 28
 ; CHECK-NEXT:    neg a2, a2
 ; CHECK-NEXT:    andi a2, a2, -8
 ; CHECK-NEXT:    add a0, a0, a2
-; CHECK-NEXT:    srliw a2, a1, 28
-; CHECK-NEXT:    seqz a2, a2
+; CHECK-NEXT:    seqz a2, a3
 ; CHECK-NEXT:    slli a3, a2, 2
 ; CHECK-NEXT:    sllw a1, a1, a3
+; CHECK-NEXT:    srliw a3, a1, 30
+; CHECK-NEXT:    slli a3, a3, 30
 ; CHECK-NEXT:    neg a2, a2
 ; CHECK-NEXT:    andi a2, a2, -4
 ; CHECK-NEXT:    add a0, a0, a2
-; CHECK-NEXT:    srliw a2, a1, 30
-; CHECK-NEXT:    seqz a2, a2
+; CHECK-NEXT:    seqz a2, a3
 ; CHECK-NEXT:    slli a3, a2, 1
 ; CHECK-NEXT:    sllw a1, a1, a3
 ; CHECK-NEXT:    neg a2, a2
@@ -1074,31 +1077,34 @@ define signext i32 @bug(i32 signext %x) {
 ; NOREMOVAL:       # %bb.0: # %entry
 ; NOREMOVAL-NEXT:    beqz a0, .LBB18_4
 ; NOREMOVAL-NEXT:  # %bb.1: # %if.end
-; NOREMOVAL-NEXT:    srliw a2, a0, 16
-; NOREMOVAL-NEXT:    seqz a1, a2
+; NOREMOVAL-NEXT:    srliw a3, a0, 16
+; NOREMOVAL-NEXT:    seqz a1, a3
 ; NOREMOVAL-NEXT:    slli a1, a1, 4
 ; NOREMOVAL-NEXT:    sllw a1, a0, a1
+; NOREMOVAL-NEXT:    srliw a2, a1, 24
+; NOREMOVAL-NEXT:    slli a2, a2, 24
 ; NOREMOVAL-NEXT:    li a0, 16
-; NOREMOVAL-NEXT:    beqz a2, .LBB18_3
+; NOREMOVAL-NEXT:    beqz a3, .LBB18_3
 ; NOREMOVAL-NEXT:  # %bb.2: # %if.end
 ; NOREMOVAL-NEXT:    li a0, 32
 ; NOREMOVAL-NEXT:  .LBB18_3: # %if.end
-; NOREMOVAL-NEXT:    srliw a2, a1, 24
 ; NOREMOVAL-NEXT:    seqz a2, a2
 ; NOREMOVAL-NEXT:    slli a3, a2, 3
 ; NOREMOVAL-NEXT:    sllw a1, a1, a3
+; NOREMOVAL-NEXT:    srliw a3, a1, 28
+; NOREMOVAL-NEXT:    slli a3, a3, 28
 ; NOREMOVAL-NEXT:    neg a2, a2
 ; NOREMOVAL-NEXT:    andi a2, a2, -8
 ; NOREMOVAL-NEXT:    add a0, a0, a2
-; NOREMOVAL-NEXT:    srliw a2, a1, 28
-; NOREMOVAL-NEXT:    seqz a2, a2
+; NOREMOVAL-NEXT:    seqz a2, a3
 ; NOREMOVAL-NEXT:    slli a3, a2, 2
 ; NOREMOVAL-NEXT:    sllw a1, a1, a3
+; NOREMOVAL-NEXT:    srliw a3, a1, 30
+; NOREMOVAL-NEXT:    slli a3, a3, 30
 ; NOREMOVAL-NEXT:    neg a2, a2
 ; NOREMOVAL-NEXT:    andi a2, a2, -4
 ; NOREMOVAL-NEXT:    add a0, a0, a2
-; NOREMOVAL-NEXT:    srliw a2, a1, 30
-; NOREMOVAL-NEXT:    seqz a2, a2
+; NOREMOVAL-NEXT:    seqz a2, a3
 ; NOREMOVAL-NEXT:    slli a3, a2, 1
 ; NOREMOVAL-NEXT:    sllw a1, a1, a3
 ; NOREMOVAL-NEXT:    neg a2, a2
diff --git a/llvm/test/CodeGen/RISCV/signed-truncation-check.ll b/llvm/test/CodeGen/RISCV/signed-truncation-check.ll
index de36bcdb910609..c5e2c23169df0c 100644
--- a/llvm/test/CodeGen/RISCV/signed-truncation-check.ll
+++ b/llvm/test/CodeGen/RISCV/signed-truncation-check.ll
@@ -586,6 +586,7 @@ define i1 @add_ultcmp_i32_i16(i32 %x) nounwind {
 ; RV64I-NEXT:    lui a1, 8
 ; RV64I-NEXT:    add a0, a0, a1
 ; RV64I-NEXT:    srliw a0, a0, 16
+; RV64I-NEXT:    slli a0, a0, 16
 ; RV64I-NEXT:    seqz a0, a0
 ; RV64I-NEXT:    ret
 ;

// shift.
if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && C1.isZero() &&
N0.hasOneUse() &&
(N0.getOpcode() == ISD::SHL || N0.getOpcode() == ISD::SRL)) {
Contributor:

Could handle SRA too? Could also look for the "exact" flag before calculating known bits?

if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && C1.isZero() &&
N0.hasOneUse() &&
(N0.getOpcode() == ISD::SHL || N0.getOpcode() == ISD::SRL)) {
if (ConstantSDNode *ShAmt = isConstOrConstSplat(N0.getOperand(1))) {
Collaborator:

Could we use getValidMaximumShiftAmountConstant here instead?

Comment on lines 4534 to 4536
if ((IsRightShift && N0->getFlags().hasExact()) ||
(!IsRightShift && N0->getFlags().hasNoUnsignedWrap()) ||
(!IsRightShift && N0->getFlags().hasNoSignedWrap()))
Contributor:

Nit: this is just my personal preference to use ?: in expressions like this.

Suggested change
if ((IsRightShift && N0->getFlags().hasExact()) ||
(!IsRightShift && N0->getFlags().hasNoUnsignedWrap()) ||
(!IsRightShift && N0->getFlags().hasNoSignedWrap()))
if (IsRightShift ? N0->getFlags().hasExact() :
(N0->getFlags().hasNoUnsignedWrap() || N0->getFlags().hasNoSignedWrap()))

RKSimon added a commit that referenced this pull request Apr 24, 2024
Simplify callers which don't have their own DemandedElts mask.

Noticed while reviewing #88801
? APInt::getAllOnes(VT.getVectorMinNumElements())
: APInt(1, 1);
if (const APInt *ShAmt =
DAG.getValidMaximumShiftAmountConstant(N0, DemandedElts)) {
Collaborator:

I've pushed 9f2a068 which should allow you to just call getValidMaximumShiftAmountConstant(N0)

Optimize
  (icmp eq/ne (shift X, C), 0) -> (icmp eq/ne X, 0)
We do this if all shifted out bits, as well as shifted in bits,
are known to be zero. And we also do it if all shifted out bits are
known to be equal to at least one bit that isn't shifted out.

Defensively limit this to one-use shifts (haven't really
considered if this can be profitable also when there are multiple
uses of the shift, but that is likely to depend on the target).
if (IsRightShift)
Known = Known.reverseBits();
if (ShAmt->ule(Known.countMinLeadingZeros()) ||
ShAmt->ult(Known.countMinSignBits()))
Contributor:

Do these known-bits checks ever succeed? I think we will always have set the respective flag if they are true.

Collaborator Author (bjope):

Are you saying that SelectionDAG/DAGCombiner derives those flags somewhere? It is unclear to me if that actually is done.
Or are you assuming that the input to SelectionDAG already has these flags (and that lowering/legalization always sets the flags when introducing shifts)?

Anyway, there is at least one special case here that isn't covered by the flags (although I have no idea how important it is). Imagine a right shift where the five least significant bits are known to all be ones or all be zeros. If shifting 4 steps the shift isn't exact, but we still know that the zeroedness doesn't change, as the shifted-out bits are equal to at least one of the remaining bits.

But I think that, in general, poison-generating flags should mainly be used when needed to imply something that isn't known otherwise (such as UB on signed overflow). When used for a condition that can be derived by value tracking, those flags are just caching some information. But there aren't really any rules saying that it isn't OK to drop the flags, or that a transform must derive that information, right(?).
We are, for example, dropping poison-generating flags when moving freeze instructions through the DAG. So the flags may disappear during DAG combine while the transform here would still be legal. Therefore it seems weird to base the transform on the existence of the flags.

Collaborator Author (bjope):

For example, the test case here https://github.com/llvm/llvm-project/pull/88801/files/f603024d54df8b87428e9eb3ddb6b44856c28b5a#diff-d54fdf520f394753a79ba9e32d77d40bc4ffd7f1887c2710f5a067cab847e946 results in an optimized DAG like this:

Optimized type-legalized selection DAG: %bb.0 'i16_cmpz:entry'
SelectionDAG has 13 nodes:
  t0: ch,glue = EntryToken
          t2: i32,ch = CopyFromReg t0, Register:i32 %0
        t40: i32 = and t2, Constant:i32<65024>
      t34: i32 = srl t40, Constant:i32<9>
    t41: ch = br_cc t0, setne:ch, t34, Constant:i32<0>, BasicBlock:ch<if.end 0x5559bb945028>
  t14: ch = br t41, BasicBlock:ch<if.then 0x5559bb944f28>

where the SRL isn't marked as exact even though all shifted-out bits are zero.

Contributor:

What analysis do we have that lets us know consecutive trailing bits are equal without confirming they are either zeros or non-zeros? If they are non-zero we can just simplify the expression to false.

You are right, we don't have logic to add flags in visit{SHL,SRA,SRL}. I'll make patches for that, as it seems like a better approach to be able to rely on those flags rather than essentially implementing the logic to check for them here.

Collaborator Author (bjope):

I probably made some bad assumptions here when using computeKnownBits.

Using DAG.computeKnownBits(N00).countMinSignBits() isn't always as good as DAG.ComputeNumSignBits(N00). The latter might, for example, know the number of sign bits after an arithmetic right shift, while computeKnownBits can only tell the number of sign bits if those bits are known to be zero or one. So I really want to check ComputeNumSignBits, but then, as you say, we do not have anything for computing the number of trailing bits that are equal to each other.
For the left shift it might be better to use ComputeNumSignBits (unless we can trust the flags).
But as you say, for the right shift it seems like computeKnownBits won't help here.

Btw, if we add flags in visit{SHL,SRA,SRL}, then whether those flags are present when visiting the SETCC will depend on iteration order. And iteration order in DAGCombiner is a bit of a mystery.

Contributor:

Created #91239, but the diffs are pretty bad, so I guess this approach is reasonable for now.

I would prefer if you created helpers for checking the shift flags that use the known-bits analysis, so we don't end up re-implementing the exact/nuw/nsw check each time we need this.

Contributor:

Could we add the flags during the i16->i32 promotion?

Collaborator Author (bjope):

IMHO it is a waste of time to derive these things just in case someone wants to use them in the future. We run the DAG combiner several times, so adding logic in visitSHL to try to derive the flags, potentially failing each time, seems like a real waste if no DAG combine is interested in that information.

And I frankly don't know if poison-generating flags are the solution for caching value-tracking results. We could just as well want to cache that we failed to derive that the shift is exact, or that a particular shift is known not to be exact. But I'm not sure how to deal with that in a nice way.

I think it is much better to find a way where we can compute these things lazily when a fold asks for the information, rather than expecting the flags to always be set when possible. OTOH, when some optimization wants to know these kinds of properties and derives the information via value tracking, then we might want to cache that information somehow to avoid computing it again (mutate the DAG and add the flags).

In this particular case we will fold away the shift if the flags are set, so there is no shift left to add the flags to after doing the fold.

Regardless, if the community decides that flags should always be derived up-front instead of being computed lazily based on need, then implementing that strategy is out of scope for this pull request. If we want to go that way, I think we really need a much better framework for automatically adding flags when creating new nodes, as well as much better documentation regarding which poison-generating flags are mandatory on which DAG nodes (including some kind of verifier support).

Contributor:

The middle-end optimizations should have done the hard parts. We have to repeat a lot of optimizations in the DAG, where everything is slower and does a worse job, so ideally there would be a specific reason to do something here.

I think overall it would be better to try to maintain information we already had rather than re-infer it. For the motivating promotion case, SHL looks easy and is missed today: https://alive2.llvm.org/ce/z/asYzPY

The right shift cases are trickier since you need to inspect the input, which we probably shouldn't do eagerly.

@arsenm (Contributor) left a comment:

I kind of expect these flags to be present on the incoming IR, and combines need to do a better job of preserving them. Are there any examples where you need to recover the flags with value tracking late?

"Expecting an SHL/SRL/SRA operation.");
SDValue Op0 = Op.getOperand(0);
SDNodeFlags Flags = Op->getFlags();

Contributor:

Skip doing anything if the flags are already set?

Collaborator Author (bjope):

I've tried to guard the most expensive computations below by checking if the flags are already present, so I'm not quite sure what you are proposing.

if (IsRightShift ? N0->getFlags().hasExact()
: (N0->getFlags().hasNoUnsignedWrap() ||
N0->getFlags().hasNoSignedWrap()))
// We cam't rely on flags already being present on all shift operations,
Contributor:

Suggested change
// We cam't rely on flags already being present on all shift operations,
// We can't rely on flags already being present on all shift operations,


bjope commented May 7, 2024

I kind of expect these flags to be present on the incoming IR, and combines need to do a better job of preserving them. Are there any examples where you need to recover the flags with value tracking late?

See yesterday's discussion with @goldsteinn here: #88801 (comment)
