
[SelectionDAG] Fold (icmp eq/ne (shift X, C), 0) -> (icmp eq/ne X, 0) #88801

Status: Open. Wants to merge 3 commits into main.
Conversation


@bjope bjope commented Apr 15, 2024

Optimize
(icmp eq/ne (shift X, C), 0) -> (icmp eq/ne X, 0)
We do this if all shifted out bits, as well as shifted in bits,
are known to be zero. And we also do it if all shifted out bits are
known to be equal to at least one bit that isn't shifted out.

Defensively limit this to one-use shifts (we haven't really
considered whether this can also be profitable when there are multiple
uses of the shift; that is likely to depend on the target).


github-actions bot commented Apr 15, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.


bjope commented Apr 15, 2024

Draft based on missing optimization found in #87646

Unfortunately it seems to cause other regressions. I'm a bit surprised that, for example, RISCV emits more instructions when the DAG is simplified to get rid of the unneeded shift.

@bjope bjope force-pushed the setccshift branch 2 times, most recently from 0e350c1 to f872394 Compare April 16, 2024 09:18
@bjope bjope changed the title [DAGCombine] Fold (icmp eq/ne (shift X, C), 0) -> (icmp eq/ne X, 0) [SelectionDAG] Fold (icmp eq/ne (shift X, C), 0) -> (icmp eq/ne X, 0) Apr 16, 2024
@@ -586,6 +586,7 @@ define i1 @add_ultcmp_i32_i16(i32 %x) nounwind {
; RV64I-NEXT: lui a1, 8
; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: srliw a0, a0, 16
; RV64I-NEXT: slli a0, a0, 16
Collaborator Author (bjope):

Looks like RISCV isn't optimizing

      t41: i64 = and t32, Constant:i64<4294901760>
    t42: i64 = setcc t41, Constant:i64<0>, setne:ch

as well as

      t43: i64 = and t32, Constant:i64<4294967295>
      t38: i64 = srl t43, Constant:i64<16>
    t39: i64 = setcc t38, Constant:i64<0>, setne:ch

It does fold SRL+AND into a single srliw. But the AND is lowered into srliw+slli. The problem is that it would need to understand that the setcc user doesn't care where the bits end up.
@asb / @topperc / @michaelmaitland : Is RISCV lacking ISel patterns for setcc+"and with shifted mask"?

Collaborator:

shouldFoldConstantShiftPairToMask is enabled by default - is that affecting it?

Collaborator Author (bjope):

I don't think shouldFoldConstantShiftPairToMask has anything to do with it here, since I can't see any shift pair being created in the first place.

We could add a new hook to avoid the simplification done here for RISCV. But I really think that this is something that RISCV should handle at selection (i.e. allowing the simplified DAG, but doing a better selection).
I have no idea if the regression here is acceptable; someone who knows more about RISCV would need to deal with it. I don't feel comfortable doing that myself (and I don't really have time to work on it, as it isn't important for my downstream target).

@bjope bjope force-pushed the setccshift branch 2 times, most recently from 4779673 to ec40d9b Compare April 23, 2024 13:02
@bjope bjope marked this pull request as ready for review April 23, 2024 13:02
@llvmbot llvmbot added the backend:ARM and llvm:SelectionDAG labels Apr 23, 2024

llvmbot commented Apr 23, 2024

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-llvm-selectiondag

Author: Björn Pettersson (bjope)

Changes

Optimize
(icmp eq/ne (shift X, C), 0) -> (icmp eq/ne X, 0)
We do this if all shifted out bits, as well as shifted in bits,
are known to be zero. And we also do it if all shifted out bits are
known to be equal to at least one bit that isn't shifted out.

Defensively limit this to one-use shifts (haven't really
considered if this can be profitable also when there are multiple
uses of the shift, but that is likely to depend on the target).


Full diff: https://github.com/llvm/llvm-project/pull/88801.diff

7 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (+21)
  • (modified) llvm/test/CodeGen/ARM/and-cmpz.ll (+5-5)
  • (modified) llvm/test/CodeGen/ARM/simplifysetcc_narrow_load.ll (+9-9)
  • (modified) llvm/test/CodeGen/Hexagon/isel-memory-vNi1.ll (+17-20)
  • (modified) llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll (+1)
  • (modified) llvm/test/CodeGen/RISCV/sextw-removal.ll (+22-16)
  • (modified) llvm/test/CodeGen/RISCV/signed-truncation-check.ll (+1)
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index c938b3996be393..a3d832617b4b51 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -4516,6 +4516,27 @@ SDValue TargetLowering::SimplifySetCC(EVT VT, SDValue N0, SDValue N1,
         }
       }
     }
+
+    // Optimize
+    //    (icmp eq/ne (shift N00, N01C), 0) -> (icmp eq/ne N00, 0)
+    // If shift is logical and all shifted out bits are known to be zero, then
+    // the zero'd ness doesnt change and we can omit the shift.
+    // If all shifted out bits are equal to at least one bit that isn't
+    // shifted out, then the zero'd ness doesnt change and we can omit the
+    // shift.
+    if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && C1.isZero() &&
+        N0.hasOneUse() &&
+        (N0.getOpcode() == ISD::SHL || N0.getOpcode() == ISD::SRL)) {
+      if (ConstantSDNode *ShAmt = isConstOrConstSplat(N0.getOperand(1))) {
+        SDValue N00 = N0.getOperand(0);
+        KnownBits Known = DAG.computeKnownBits(N00);
+        if (N0.getOpcode() == ISD::SRL)
+          Known = Known.reverseBits();
+        if (ShAmt->getAPIntValue().ule(Known.countMinLeadingZeros()) ||
+            ShAmt->getAPIntValue().ult(Known.countMinSignBits()))
+          return DAG.getSetCC(dl, VT, N00, N1, Cond);
+      }
+    }
   }
 
   // FIXME: Support vectors.
diff --git a/llvm/test/CodeGen/ARM/and-cmpz.ll b/llvm/test/CodeGen/ARM/and-cmpz.ll
index 1f72307f12a682..ca01680792eff1 100644
--- a/llvm/test/CodeGen/ARM/and-cmpz.ll
+++ b/llvm/test/CodeGen/ARM/and-cmpz.ll
@@ -89,12 +89,12 @@ false:
 }
 
 ; CHECK-LABEL: i16_cmpz:
-; T1:      uxth    r0, r0
-; T1-NEXT: lsrs    r0, r0, #9
+; T1:      lsls    r0, r0, #16
+; T1-NEXT: lsrs    r0, r0, #25
 ; T1-NEXT: bne
-; T2:      uxth    r0, r0
-; T2-NEXT: movs    r2, #0
-; T2-NEXT: cmp.w   r2, r0, lsr #9
+; T2:      tst.w   r0, #65024
+; T2-NEXT: it
+; T2-NEXT: bxne
 define void @i16_cmpz(i16 %x, ptr %foo) {
 entry:
   %cmp = icmp ult i16 %x, 512
diff --git a/llvm/test/CodeGen/ARM/simplifysetcc_narrow_load.ll b/llvm/test/CodeGen/ARM/simplifysetcc_narrow_load.ll
index 838da59f9e412c..bad99b5c7d976d 100644
--- a/llvm/test/CodeGen/ARM/simplifysetcc_narrow_load.ll
+++ b/llvm/test/CodeGen/ARM/simplifysetcc_narrow_load.ll
@@ -311,7 +311,7 @@ define i1 @test_48_16_8(ptr %y) {
 ; CHECK-LE-LABEL: test_48_16_8:
 ; CHECK-LE:       @ %bb.0:
 ; CHECK-LE-NEXT:    ldrh r0, [r0, #1]
-; CHECK-LE-NEXT:    lsls r0, r0, #8
+; CHECK-LE-NEXT:    cmp r0, #0
 ; CHECK-LE-NEXT:    movne r0, #1
 ; CHECK-LE-NEXT:    mov pc, lr
 ;
@@ -559,28 +559,28 @@ define i1 @test_24_8_8(ptr %y) {
 ; CHECK-LE-LABEL: test_24_8_8:
 ; CHECK-LE:       @ %bb.0:
 ; CHECK-LE-NEXT:    ldrb r0, [r0, #1]
-; CHECK-LE-NEXT:    lsls r0, r0, #8
+; CHECK-LE-NEXT:    cmp r0, #0
 ; CHECK-LE-NEXT:    movne r0, #1
 ; CHECK-LE-NEXT:    mov pc, lr
 ;
 ; CHECK-V7-LE-LABEL: test_24_8_8:
 ; CHECK-V7-LE:       @ %bb.0:
 ; CHECK-V7-LE-NEXT:    ldrb r0, [r0, #1]
-; CHECK-V7-LE-NEXT:    lsls r0, r0, #8
+; CHECK-V7-LE-NEXT:    cmp r0, #0
 ; CHECK-V7-LE-NEXT:    movwne r0, #1
 ; CHECK-V7-LE-NEXT:    bx lr
 ;
 ; CHECK-BE-LABEL: test_24_8_8:
 ; CHECK-BE:       @ %bb.0:
 ; CHECK-BE-NEXT:    ldrb r0, [r0, #1]
-; CHECK-BE-NEXT:    lsls r0, r0, #8
+; CHECK-BE-NEXT:    cmp r0, #0
 ; CHECK-BE-NEXT:    movne r0, #1
 ; CHECK-BE-NEXT:    mov pc, lr
 ;
 ; CHECK-V7-BE-LABEL: test_24_8_8:
 ; CHECK-V7-BE:       @ %bb.0:
 ; CHECK-V7-BE-NEXT:    ldrb r0, [r0, #1]
-; CHECK-V7-BE-NEXT:    lsls r0, r0, #8
+; CHECK-V7-BE-NEXT:    cmp r0, #0
 ; CHECK-V7-BE-NEXT:    movwne r0, #1
 ; CHECK-V7-BE-NEXT:    bx lr
   %a = load i24, ptr %y
@@ -633,28 +633,28 @@ define i1 @test_24_8_16(ptr %y) {
 ; CHECK-LE-LABEL: test_24_8_16:
 ; CHECK-LE:       @ %bb.0:
 ; CHECK-LE-NEXT:    ldrb r0, [r0, #2]
-; CHECK-LE-NEXT:    lsls r0, r0, #16
+; CHECK-LE-NEXT:    cmp r0, #0
 ; CHECK-LE-NEXT:    movne r0, #1
 ; CHECK-LE-NEXT:    mov pc, lr
 ;
 ; CHECK-V7-LE-LABEL: test_24_8_16:
 ; CHECK-V7-LE:       @ %bb.0:
 ; CHECK-V7-LE-NEXT:    ldrb r0, [r0, #2]
-; CHECK-V7-LE-NEXT:    lsls r0, r0, #16
+; CHECK-V7-LE-NEXT:    cmp r0, #0
 ; CHECK-V7-LE-NEXT:    movwne r0, #1
 ; CHECK-V7-LE-NEXT:    bx lr
 ;
 ; CHECK-BE-LABEL: test_24_8_16:
 ; CHECK-BE:       @ %bb.0:
 ; CHECK-BE-NEXT:    ldrb r0, [r0]
-; CHECK-BE-NEXT:    lsls r0, r0, #16
+; CHECK-BE-NEXT:    cmp r0, #0
 ; CHECK-BE-NEXT:    movne r0, #1
 ; CHECK-BE-NEXT:    mov pc, lr
 ;
 ; CHECK-V7-BE-LABEL: test_24_8_16:
 ; CHECK-V7-BE:       @ %bb.0:
 ; CHECK-V7-BE-NEXT:    ldrb r0, [r0]
-; CHECK-V7-BE-NEXT:    lsls r0, r0, #16
+; CHECK-V7-BE-NEXT:    cmp r0, #0
 ; CHECK-V7-BE-NEXT:    movwne r0, #1
 ; CHECK-V7-BE-NEXT:    bx lr
   %a = load i24, ptr %y
diff --git a/llvm/test/CodeGen/Hexagon/isel-memory-vNi1.ll b/llvm/test/CodeGen/Hexagon/isel-memory-vNi1.ll
index 2eecfa9f47f176..9df3289a3f589a 100644
--- a/llvm/test/CodeGen/Hexagon/isel-memory-vNi1.ll
+++ b/llvm/test/CodeGen/Hexagon/isel-memory-vNi1.ll
@@ -173,64 +173,61 @@ define void @f6(ptr %a0, i16 %a1) #0 {
 ; CHECK-LABEL: f6:
 ; CHECK:       // %bb.0: // %b0
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r2 = extractu(r1,#8,#8)
-; CHECK-NEXT:    }
-; CHECK-NEXT:    {
-; CHECK-NEXT:     r3 = #255
+; CHECK-NEXT:     r2 = #255
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     p1 = !bitsclr(r1,r3)
+; CHECK-NEXT:     r3 = ##65280
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     p0 = cmp.eq(r2,#0)
+; CHECK-NEXT:     p1 = !bitsclr(r1,r2)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     if (p0) r2 = #0
+; CHECK-NEXT:     p0 = !bitsclr(r1,r3)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
 ; CHECK-NEXT:     r1 = mux(p1,#8,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r3 = mux(p1,#2,#0)
+; CHECK-NEXT:     r2 = mux(p1,#2,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r5 = setbit(r1,#2)
+; CHECK-NEXT:     r3 = mux(p0,##128,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r6 = setbit(r3,#0)
+; CHECK-NEXT:     r4 = mux(p0,#32,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     if (!p0) r2 = #128
+; CHECK-NEXT:     r5 = setbit(r1,#2)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r4 = mux(p0,#0,#32)
+; CHECK-NEXT:     r6 = setbit(r2,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
 ; CHECK-NEXT:     if (!p1) r5 = add(r1,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     if (!p1) r6 = add(r3,#0)
+; CHECK-NEXT:     r1 = setbit(r3,#6)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r1 = setbit(r2,#6)
+; CHECK-NEXT:     if (!p1) r6 = add(r2,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r3 = setbit(r4,#4)
+; CHECK-NEXT:     r2 = setbit(r4,#4)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r5 = or(r6,r5)
+; CHECK-NEXT:     if (!p0) r2 = add(r4,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     if (!p0) r2 = add(r1,#0)
+; CHECK-NEXT:     if (!p0) r1 = add(r3,#0)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     if (!p0) r4 = add(r3,#0)
+; CHECK-NEXT:     r4 = or(r6,r5)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     r5 |= or(r4,r2)
+; CHECK-NEXT:     r4 |= or(r2,r1)
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
-; CHECK-NEXT:     memb(r0+#0) = r5
+; CHECK-NEXT:     memb(r0+#0) = r4
 ; CHECK-NEXT:    }
 ; CHECK-NEXT:    {
 ; CHECK-NEXT:     jumpr r31
diff --git a/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll b/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll
index 6e3a50542939f1..abfa1d3d3f06cb 100644
--- a/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll
+++ b/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll
@@ -534,6 +534,7 @@ define i1 @add_ugecmp_i32_i16(i32 %x) nounwind {
 ; RV64I-NEXT:    lui a1, 8
 ; RV64I-NEXT:    add a0, a0, a1
 ; RV64I-NEXT:    srliw a0, a0, 16
+; RV64I-NEXT:    slli a0, a0, 16
 ; RV64I-NEXT:    snez a0, a0
 ; RV64I-NEXT:    ret
 ;
diff --git a/llvm/test/CodeGen/RISCV/sextw-removal.ll b/llvm/test/CodeGen/RISCV/sextw-removal.ll
index f707cb31e3eced..dc5ff5c8081bfa 100644
--- a/llvm/test/CodeGen/RISCV/sextw-removal.ll
+++ b/llvm/test/CodeGen/RISCV/sextw-removal.ll
@@ -1034,31 +1034,34 @@ define signext i32 @bug(i32 signext %x) {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    beqz a0, .LBB18_4
 ; CHECK-NEXT:  # %bb.1: # %if.end
-; CHECK-NEXT:    srliw a2, a0, 16
-; CHECK-NEXT:    seqz a1, a2
+; CHECK-NEXT:    srliw a3, a0, 16
+; CHECK-NEXT:    seqz a1, a3
 ; CHECK-NEXT:    slli a1, a1, 4
 ; CHECK-NEXT:    sllw a1, a0, a1
+; CHECK-NEXT:    srliw a2, a1, 24
+; CHECK-NEXT:    slli a2, a2, 24
 ; CHECK-NEXT:    li a0, 16
-; CHECK-NEXT:    beqz a2, .LBB18_3
+; CHECK-NEXT:    beqz a3, .LBB18_3
 ; CHECK-NEXT:  # %bb.2: # %if.end
 ; CHECK-NEXT:    li a0, 32
 ; CHECK-NEXT:  .LBB18_3: # %if.end
-; CHECK-NEXT:    srliw a2, a1, 24
 ; CHECK-NEXT:    seqz a2, a2
 ; CHECK-NEXT:    slli a3, a2, 3
 ; CHECK-NEXT:    sllw a1, a1, a3
+; CHECK-NEXT:    srliw a3, a1, 28
+; CHECK-NEXT:    slli a3, a3, 28
 ; CHECK-NEXT:    neg a2, a2
 ; CHECK-NEXT:    andi a2, a2, -8
 ; CHECK-NEXT:    add a0, a0, a2
-; CHECK-NEXT:    srliw a2, a1, 28
-; CHECK-NEXT:    seqz a2, a2
+; CHECK-NEXT:    seqz a2, a3
 ; CHECK-NEXT:    slli a3, a2, 2
 ; CHECK-NEXT:    sllw a1, a1, a3
+; CHECK-NEXT:    srliw a3, a1, 30
+; CHECK-NEXT:    slli a3, a3, 30
 ; CHECK-NEXT:    neg a2, a2
 ; CHECK-NEXT:    andi a2, a2, -4
 ; CHECK-NEXT:    add a0, a0, a2
-; CHECK-NEXT:    srliw a2, a1, 30
-; CHECK-NEXT:    seqz a2, a2
+; CHECK-NEXT:    seqz a2, a3
 ; CHECK-NEXT:    slli a3, a2, 1
 ; CHECK-NEXT:    sllw a1, a1, a3
 ; CHECK-NEXT:    neg a2, a2
@@ -1074,31 +1077,34 @@ define signext i32 @bug(i32 signext %x) {
 ; NOREMOVAL:       # %bb.0: # %entry
 ; NOREMOVAL-NEXT:    beqz a0, .LBB18_4
 ; NOREMOVAL-NEXT:  # %bb.1: # %if.end
-; NOREMOVAL-NEXT:    srliw a2, a0, 16
-; NOREMOVAL-NEXT:    seqz a1, a2
+; NOREMOVAL-NEXT:    srliw a3, a0, 16
+; NOREMOVAL-NEXT:    seqz a1, a3
 ; NOREMOVAL-NEXT:    slli a1, a1, 4
 ; NOREMOVAL-NEXT:    sllw a1, a0, a1
+; NOREMOVAL-NEXT:    srliw a2, a1, 24
+; NOREMOVAL-NEXT:    slli a2, a2, 24
 ; NOREMOVAL-NEXT:    li a0, 16
-; NOREMOVAL-NEXT:    beqz a2, .LBB18_3
+; NOREMOVAL-NEXT:    beqz a3, .LBB18_3
 ; NOREMOVAL-NEXT:  # %bb.2: # %if.end
 ; NOREMOVAL-NEXT:    li a0, 32
 ; NOREMOVAL-NEXT:  .LBB18_3: # %if.end
-; NOREMOVAL-NEXT:    srliw a2, a1, 24
 ; NOREMOVAL-NEXT:    seqz a2, a2
 ; NOREMOVAL-NEXT:    slli a3, a2, 3
 ; NOREMOVAL-NEXT:    sllw a1, a1, a3
+; NOREMOVAL-NEXT:    srliw a3, a1, 28
+; NOREMOVAL-NEXT:    slli a3, a3, 28
 ; NOREMOVAL-NEXT:    neg a2, a2
 ; NOREMOVAL-NEXT:    andi a2, a2, -8
 ; NOREMOVAL-NEXT:    add a0, a0, a2
-; NOREMOVAL-NEXT:    srliw a2, a1, 28
-; NOREMOVAL-NEXT:    seqz a2, a2
+; NOREMOVAL-NEXT:    seqz a2, a3
 ; NOREMOVAL-NEXT:    slli a3, a2, 2
 ; NOREMOVAL-NEXT:    sllw a1, a1, a3
+; NOREMOVAL-NEXT:    srliw a3, a1, 30
+; NOREMOVAL-NEXT:    slli a3, a3, 30
 ; NOREMOVAL-NEXT:    neg a2, a2
 ; NOREMOVAL-NEXT:    andi a2, a2, -4
 ; NOREMOVAL-NEXT:    add a0, a0, a2
-; NOREMOVAL-NEXT:    srliw a2, a1, 30
-; NOREMOVAL-NEXT:    seqz a2, a2
+; NOREMOVAL-NEXT:    seqz a2, a3
 ; NOREMOVAL-NEXT:    slli a3, a2, 1
 ; NOREMOVAL-NEXT:    sllw a1, a1, a3
 ; NOREMOVAL-NEXT:    neg a2, a2
diff --git a/llvm/test/CodeGen/RISCV/signed-truncation-check.ll b/llvm/test/CodeGen/RISCV/signed-truncation-check.ll
index de36bcdb910609..c5e2c23169df0c 100644
--- a/llvm/test/CodeGen/RISCV/signed-truncation-check.ll
+++ b/llvm/test/CodeGen/RISCV/signed-truncation-check.ll
@@ -586,6 +586,7 @@ define i1 @add_ultcmp_i32_i16(i32 %x) nounwind {
 ; RV64I-NEXT:    lui a1, 8
 ; RV64I-NEXT:    add a0, a0, a1
 ; RV64I-NEXT:    srliw a0, a0, 16
+; RV64I-NEXT:    slli a0, a0, 16
 ; RV64I-NEXT:    seqz a0, a0
 ; RV64I-NEXT:    ret
 ;

// shift.
if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && C1.isZero() &&
N0.hasOneUse() &&
(N0.getOpcode() == ISD::SHL || N0.getOpcode() == ISD::SRL)) {
Contributor:

Could handle SRA too? Could also look for the "exact" flag before calculating known bits?

if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && C1.isZero() &&
N0.hasOneUse() &&
(N0.getOpcode() == ISD::SHL || N0.getOpcode() == ISD::SRL)) {
if (ConstantSDNode *ShAmt = isConstOrConstSplat(N0.getOperand(1))) {
Collaborator:

Could we use getValidMaximumShiftAmountConstant here instead?

Comment on lines 4534 to 4536
if ((IsRightShift && N0->getFlags().hasExact()) ||
(!IsRightShift && N0->getFlags().hasNoUnsignedWrap()) ||
(!IsRightShift && N0->getFlags().hasNoSignedWrap()))
Contributor:

Nit: this is just my personal preference to use ?: in expressions like this.

Suggested change
if ((IsRightShift && N0->getFlags().hasExact()) ||
(!IsRightShift && N0->getFlags().hasNoUnsignedWrap()) ||
(!IsRightShift && N0->getFlags().hasNoSignedWrap()))
if (IsRightShift ? N0->getFlags().hasExact() :
(N0->getFlags().hasNoUnsignedWrap() || N0->getFlags().hasNoSignedWrap()))

RKSimon added a commit that referenced this pull request Apr 24, 2024
Simplify callers which don't have their own DemandedElts mask.

Noticed while reviewing #88801
? APInt::getAllOnes(VT.getVectorMinNumElements())
: APInt(1, 1);
if (const APInt *ShAmt =
DAG.getValidMaximumShiftAmountConstant(N0, DemandedElts)) {
Collaborator:

I've pushed 9f2a068 which should allow you to just call getValidMaximumShiftAmountConstant(N0)

Optimize
  (icmp eq/ne (shift X, C), 0) -> (icmp eq/ne X, 0)
We do this if all shifted out bits, as well as shifted in bits,
are known to be zero. And we also do it if all shifted out bits are
known to be equal to at least one bit that isn't shifted out.

Defensively limit this to one-use shifts (haven't really
considered if this can be profitable also when there are multiple
uses of the shift, but that is likely to depend on the target).
if (IsRightShift)
Known = Known.reverseBits();
if (ShAmt->ule(Known.countMinLeadingZeros()) ||
ShAmt->ult(Known.countMinSignBits()))
Contributor:

Do these known-bits checks ever succeed? I think we will always have set the respective flag if they are true.

Collaborator Author (bjope):

Are you saying that SelectionDAG/DAGCombiner derives those flags somewhere? It is unclear to me if that actually is done.
Or are you assuming that the input to SelectionDAG already has these flags (and that lowering/legalization always sets the flags when introducing shifts)?

Anyway, there is at least one special case here that isn't covered by the flags (although I have no idea how important it is). Imagine a right shift where the five least significant bits are known to all be ones or all be zeros. If shifting 4 steps the shift isn't exact, but we still know that the zeroedness doesn't change, as the shifted-out bits are equal to at least one of the remaining bits.

But I think that, in general, poison-generating flags should mainly be used when needed to imply something that isn't known otherwise (such as UB on signed overflow). When used for a condition that can be derived by value tracking, those flags are just caching some information. But there aren't really any rules saying that it isn't OK to drop the flags, or that a transform must derive that information, right(?).
We are, for example, dropping poison-generating flags when moving freeze instructions through the DAG. So the flags may disappear during DAG combine while the transform here would still be legal. Therefore it seems weird to base the transform on the existence of the flags.

Collaborator Author (bjope):

For example, the test case here https://github.com/llvm/llvm-project/pull/88801/files/f603024d54df8b87428e9eb3ddb6b44856c28b5a#diff-d54fdf520f394753a79ba9e32d77d40bc4ffd7f1887c2710f5a067cab847e946 results in an optimized DAG like this:

Optimized type-legalized selection DAG: %bb.0 'i16_cmpz:entry'
SelectionDAG has 13 nodes:
  t0: ch,glue = EntryToken
          t2: i32,ch = CopyFromReg t0, Register:i32 %0
        t40: i32 = and t2, Constant:i32<65024>
      t34: i32 = srl t40, Constant:i32<9>
    t41: ch = br_cc t0, setne:ch, t34, Constant:i32<0>, BasicBlock:ch<if.end 0x5559bb945028>
  t14: ch = br t41, BasicBlock:ch<if.then 0x5559bb944f28>

where the SRL isn't marked as exact even though all shifted-out bits are zero.

Contributor:

What analysis do we have that lets us know consecutive trailing bits are equal without confirming they are either zeros or non-zeros? If they are non-zero we can just simplify the expression to false.

You are right, we don't have logic to add flags in visit{SHL,SRA,SRL}. I'll make patches for that, as it seems like a better approach to be able to rely on those flags rather than essentially implementing the logic to check for them here.

Collaborator Author (bjope):

I probably made some bad assumptions here when using computeKnownBits.

Using DAG.computeKnownBits(N00).countMinSignBits() isn't always as good as DAG.ComputeNumSignBits(N00). The latter might, for example, know the number of sign bits after an arithmetic right shift, while computeKnownBits can only tell the number of sign bits if those bits are known to be zero or one. So I really want to check ComputeNumSignBits, but then, as you say, we do not have anything for computing the number of trailing bits that are equal to each other.
For the left shift it might be better to use ComputeNumSignBits (unless we can trust the flags).
But as you say, for the right shift it seems like computeKnownBits won't help here.

Btw, if we add flags in visit{SHL,SRA,SRL}, then whether those flags are present when visiting the SETCC will depend on iteration order. And iteration order in DAGCombiner is a bit of a mystery.

Contributor:

Created #91239, but the diffs are pretty bad, so I guess this approach is reasonable for now.

I would prefer if you created helpers for checking the shift flags that use the known-bits analysis, so we don't end up re-implementing the exact/nuw/nsw check each time we need this.

Contributor:

Could we add the flags during the i16->i32 promotion?

Collaborator Author (bjope):

IMHO it is a waste of time to derive these things just in case someone wants to use them in the future. We run the DAG combiner several times, so adding logic in visitSHL to try to derive the flags, potentially failing each time, seems like a real waste if no DAG combine is interested in that information.

And I frankly don't know if poison-generating flags are the solution for caching value-tracking results. We could just as well want to cache that we failed to derive that the shift is exact, or that a particular shift is known not to be exact. But I'm not sure how to deal with that in a nice way.

I think it is much better to find a way where we can compute these things lazily when a fold asks for the information, rather than expecting the flags to always be set when possible. OTOH, when some optimization wants to know these kinds of properties and derives the information via value tracking, then we might want to cache that information somehow to avoid computing it again (mutate the DAG and add the flags).

In this particular case we will fold away the shift if the flags are set, so there is no shift left to add the flags to after doing the fold.

Regardless, if the community decides that flags should always be derived up-front instead of being computed lazily based on need, then implementing that strategy is out of scope for this pull request. If we want to go that way, I think we really need a much better framework for automatically adding flags when creating new nodes, as well as much better documentation regarding which poison-generating flags are mandatory on which DAG nodes (including some kind of verifier support).

Contributor:

The middle-end optimizations should have done the hard parts. We have to repeat a lot of optimizations in the DAG, where everything is slower and does a worse job, so ideally there would be a specific reason to do something here.

I think overall it would be better to try to maintain information we already had rather than re-infer it. For the motivating promotion case, SHL looks easy and is missed today: https://alive2.llvm.org/ce/z/asYzPY

The right shift cases are trickier since you need to inspect the input, which we probably shouldn't do eagerly.

@arsenm (Contributor) left a comment:

I kind of expect these flags to be present on the incoming IR, and combines need to do a better job of preserving them. Are there any examples where you need to recover the flags with value tracking late?

"Expecting an SHL/SRL/SRA operation.");
SDValue Op0 = Op.getOperand(0);
SDNodeFlags Flags = Op->getFlags();

Contributor:

Skip doing anything if the flags are already set?

Collaborator Author (bjope):

I've tried to guard the most expensive computations below by checking if the flags are already present, so I'm not quite sure what you are proposing.

if (IsRightShift ? N0->getFlags().hasExact()
: (N0->getFlags().hasNoUnsignedWrap() ||
N0->getFlags().hasNoSignedWrap()))
// We cam't rely on flags already being present on all shift operations,
Contributor:

Suggested change
// We cam't rely on flags already being present on all shift operations,
// We can't rely on flags already being present on all shift operations,


bjope commented May 7, 2024

I kind of expect these flags to be present on the incoming IR, and combines need to do a better job of preserving them. Are there any examples where you need to recover the flags with value tracking late?

See yesterday's discussion with @goldsteinn here: #88801 (comment)
