[SelectionDAG] Fold (icmp eq/ne (shift X, C), 0) -> (icmp eq/ne X, 0) #88801

Open: wants to merge 3 commits into main
3 changes: 3 additions & 0 deletions llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1991,6 +1991,9 @@ class SelectionDAG {
return computeOverflowForMul(IsSigned, N0, N1) == OFK_Never;
}

/// Determine if a SHL/SRL/SRA operation is known to be exact (for right
/// shifts) or nuw/nsw (for left shifts).
SDNodeFlags computeShiftFlags(SDValue Op) const;

/// Test if the given value is known to have exactly one bit set. This differs
/// from computeKnownBits in that it doesn't necessarily determine which bit
/// is set.
31 changes: 31 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -4282,6 +4282,37 @@ SelectionDAG::computeOverflowForSignedMul(SDValue N0, SDValue N1) const {
return OFK_Sometime;
}

SDNodeFlags SelectionDAG::computeShiftFlags(SDValue Op) const {
assert((Op.getOpcode() == ISD::SRL || Op.getOpcode() == ISD::SRA ||
Op.getOpcode() == ISD::SHL) &&
"Expecting an SHL/SRL/SRA operation.");
SDValue Op0 = Op.getOperand(0);
SDNodeFlags Flags = Op->getFlags();

Contributor commented:
Skip doing anything if the flags are already set?

Collaborator (Author) replied:
I've tried to guard the most expensive computations below by checking whether the flags are already present, so I'm not quite sure what you propose.

// Left shift: Try to derive 'nsw' and 'nuw' via value tracking.
if (Op.getOpcode() == ISD::SHL) {
if (const APInt *ShAmt = getValidMaximumShiftAmountConstant(Op)) {
if (!Flags.hasNoSignedWrap() && ShAmt->ult(ComputeNumSignBits(Op0)))
Flags.setNoSignedWrap(true);
if (!Flags.hasNoUnsignedWrap() &&
ShAmt->ule(computeKnownBits(Op0).countMinLeadingZeros()))
Flags.setNoUnsignedWrap(true);
}
return Flags;
}

// Right shift: Try to derive 'exact' via value tracking.
if (Op.getOpcode() == ISD::SRL || Op.getOpcode() == ISD::SRA) {
if (!Flags.hasExact())
if (const APInt *ShAmt = getValidMaximumShiftAmountConstant(Op)) {
if (ShAmt->ule(computeKnownBits(Op0).countMinTrailingZeros()))
Flags.setExact(true);
}
}

return Flags;
}

bool SelectionDAG::isKnownToBeAPowerOfTwo(SDValue Val, unsigned Depth) const {
if (Depth >= MaxRecursionDepth)
return false; // Limit search depth.
21 changes: 21 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -4516,6 +4516,27 @@ SDValue TargetLowering::SimplifySetCC(EVT VT, SDValue N0, SDValue N1,
}
}
}

// Optimize
//   (setcc (shift N00, N01), 0, eq/ne) -> (setcc N00, 0, eq/ne)
// If all shifted-out bits are known to be zero, then whether the value
// is zero does not change and we can omit the shift. Likewise, if every
// shifted-out bit equals some bit that is not shifted out, whether the
// value is zero is preserved and we can omit the shift.
if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && C1.isZero() &&
N0.hasOneUse() &&
(N0.getOpcode() == ISD::SHL || N0.getOpcode() == ISD::SRL ||
N0.getOpcode() == ISD::SRA)) {
bool IsRightShift = N0.getOpcode() != ISD::SHL;
SDValue N00 = N0.getOperand(0);
// We can't rely on flags already being present on all shift operations,
// so let's compute the flags using value tracking.
SDNodeFlags Flags = DAG.computeShiftFlags(N0);
if (IsRightShift ? Flags.hasExact()
: (Flags.hasNoUnsignedWrap() || Flags.hasNoSignedWrap()))
return DAG.getSetCC(dl, VT, N00, N1, Cond);
}
}

// FIXME: Support vectors.
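The justification in the new comment can be checked exhaustively on a narrow type: whenever a left shift is nuw or nsw, or a right shift is exact, the shifted value is zero exactly when the unshifted operand is. A self-contained C++ check over all 8-bit values and shift amounts (illustrative, not LLVM code):

```cpp
#include <cassert>
#include <cstdint>

// Returns true if the fold (setcc (shift x, amt), 0, eq/ne) ->
// (setcc x, 0, eq/ne) holds for every 8-bit value whose shift carries
// the required flag (nuw/nsw for shl, exact for srl/sra).
bool foldIsSound() {
  for (unsigned X = 0; X < 256; ++X) {
    for (unsigned Amt = 0; Amt < 8; ++Amt) {
      uint8_t Shl = (uint8_t)(X << Amt);
      bool NUW = (unsigned)(Shl >> Amt) == X;                // unsigned round-trip
      bool NSW = (int8_t)((int8_t)Shl >> Amt) == (int8_t)X;  // signed round-trip
      uint8_t Srl = (uint8_t)(X >> Amt);
      bool Exact = (uint8_t)(Srl << Amt) == (uint8_t)X;      // no low bits lost
      // nuw or nsw on shl: zeroness of the result equals zeroness of X.
      if ((NUW || NSW) && ((Shl == 0) != (X == 0)))
        return false;
      // exact on srl/sra: same property for both right shifts.
      int8_t Sra = (int8_t)((int8_t)X >> Amt);
      if (Exact &&
          (((Srl == 0) != (X == 0)) || ((Sra == 0) != ((int8_t)X == 0))))
        return false;
    }
  }
  return true;
}
```

This brute-force check covers every case the new SimplifySetCC code can reach at 8 bits; the argument carries over to wider types bit-for-bit.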
10 changes: 5 additions & 5 deletions llvm/test/CodeGen/ARM/and-cmpz.ll
@@ -89,12 +89,12 @@ false:
}

; CHECK-LABEL: i16_cmpz:
; T1: uxth r0, r0
; T1-NEXT: lsrs r0, r0, #9
; T1: lsls r0, r0, #16
; T1-NEXT: lsrs r0, r0, #25
; T1-NEXT: bne
; T2: uxth r0, r0
; T2-NEXT: movs r2, #0
; T2-NEXT: cmp.w r2, r0, lsr #9
; T2: tst.w r0, #65024
; T2-NEXT: it
; T2-NEXT: bxne
define void @i16_cmpz(i16 %x, ptr %foo) {
entry:
%cmp = icmp ult i16 %x, 512
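For the i16_cmpz update above, the old and new sequences test the same predicate: for an unsigned 16-bit x, `x < 512` fails exactly when any of bits 15..9 is set, so `(x >> 9) != 0` and `(x & 0xFE00) != 0` (the new `tst.w r0, #65024`) agree everywhere. A quick C++ sanity check (illustrative):

```cpp
#include <cassert>
#include <cstdint>

bool viaShift(uint16_t X) { return (X >> 9) != 0; }      // old: uxth + lsrs #9
bool viaMask(uint16_t X)  { return (X & 0xFE00) != 0; }  // new: tst.w #65024 (0xFE00)
bool viaIcmp(uint16_t X)  { return !(X < 512); }         // the icmp ult, negated

// All three formulations agree on every 16-bit value.
bool allAgree() {
  for (uint32_t X = 0; X <= 0xFFFF; ++X)
    if (viaShift((uint16_t)X) != viaMask((uint16_t)X) ||
        viaMask((uint16_t)X) != viaIcmp((uint16_t)X))
      return false;
  return true;
}
```

The mask form needs no zero-extension of the argument, which is why the T2 sequence drops the `uxth`.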
18 changes: 9 additions & 9 deletions llvm/test/CodeGen/ARM/simplifysetcc_narrow_load.ll
@@ -311,7 +311,7 @@ define i1 @test_48_16_8(ptr %y) {
; CHECK-LE-LABEL: test_48_16_8:
; CHECK-LE: @ %bb.0:
; CHECK-LE-NEXT: ldrh r0, [r0, #1]
; CHECK-LE-NEXT: lsls r0, r0, #8
; CHECK-LE-NEXT: cmp r0, #0
; CHECK-LE-NEXT: movne r0, #1
; CHECK-LE-NEXT: mov pc, lr
;
@@ -559,28 +559,28 @@ define i1 @test_24_8_8(ptr %y) {
; CHECK-LE-LABEL: test_24_8_8:
; CHECK-LE: @ %bb.0:
; CHECK-LE-NEXT: ldrb r0, [r0, #1]
; CHECK-LE-NEXT: lsls r0, r0, #8
; CHECK-LE-NEXT: cmp r0, #0
; CHECK-LE-NEXT: movne r0, #1
; CHECK-LE-NEXT: mov pc, lr
;
; CHECK-V7-LE-LABEL: test_24_8_8:
; CHECK-V7-LE: @ %bb.0:
; CHECK-V7-LE-NEXT: ldrb r0, [r0, #1]
; CHECK-V7-LE-NEXT: lsls r0, r0, #8
; CHECK-V7-LE-NEXT: cmp r0, #0
; CHECK-V7-LE-NEXT: movwne r0, #1
; CHECK-V7-LE-NEXT: bx lr
;
; CHECK-BE-LABEL: test_24_8_8:
; CHECK-BE: @ %bb.0:
; CHECK-BE-NEXT: ldrb r0, [r0, #1]
; CHECK-BE-NEXT: lsls r0, r0, #8
; CHECK-BE-NEXT: cmp r0, #0
; CHECK-BE-NEXT: movne r0, #1
; CHECK-BE-NEXT: mov pc, lr
;
; CHECK-V7-BE-LABEL: test_24_8_8:
; CHECK-V7-BE: @ %bb.0:
; CHECK-V7-BE-NEXT: ldrb r0, [r0, #1]
; CHECK-V7-BE-NEXT: lsls r0, r0, #8
; CHECK-V7-BE-NEXT: cmp r0, #0
; CHECK-V7-BE-NEXT: movwne r0, #1
; CHECK-V7-BE-NEXT: bx lr
%a = load i24, ptr %y
@@ -633,28 +633,28 @@ define i1 @test_24_8_16(ptr %y) {
; CHECK-LE-LABEL: test_24_8_16:
; CHECK-LE: @ %bb.0:
; CHECK-LE-NEXT: ldrb r0, [r0, #2]
; CHECK-LE-NEXT: lsls r0, r0, #16
; CHECK-LE-NEXT: cmp r0, #0
; CHECK-LE-NEXT: movne r0, #1
; CHECK-LE-NEXT: mov pc, lr
;
; CHECK-V7-LE-LABEL: test_24_8_16:
; CHECK-V7-LE: @ %bb.0:
; CHECK-V7-LE-NEXT: ldrb r0, [r0, #2]
; CHECK-V7-LE-NEXT: lsls r0, r0, #16
; CHECK-V7-LE-NEXT: cmp r0, #0
; CHECK-V7-LE-NEXT: movwne r0, #1
; CHECK-V7-LE-NEXT: bx lr
;
; CHECK-BE-LABEL: test_24_8_16:
; CHECK-BE: @ %bb.0:
; CHECK-BE-NEXT: ldrb r0, [r0]
; CHECK-BE-NEXT: lsls r0, r0, #16
; CHECK-BE-NEXT: cmp r0, #0
; CHECK-BE-NEXT: movne r0, #1
; CHECK-BE-NEXT: mov pc, lr
;
; CHECK-V7-BE-LABEL: test_24_8_16:
; CHECK-V7-BE: @ %bb.0:
; CHECK-V7-BE-NEXT: ldrb r0, [r0]
; CHECK-V7-BE-NEXT: lsls r0, r0, #16
; CHECK-V7-BE-NEXT: cmp r0, #0
; CHECK-V7-BE-NEXT: movwne r0, #1
; CHECK-V7-BE-NEXT: bx lr
%a = load i24, ptr %y
37 changes: 17 additions & 20 deletions llvm/test/CodeGen/Hexagon/isel-memory-vNi1.ll
@@ -173,64 +173,61 @@ define void @f6(ptr %a0, i16 %a1) #0 {
; CHECK-LABEL: f6:
; CHECK: // %bb.0: // %b0
; CHECK-NEXT: {
; CHECK-NEXT: r2 = extractu(r1,#8,#8)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: r3 = #255
; CHECK-NEXT: r2 = #255
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: p1 = !bitsclr(r1,r3)
; CHECK-NEXT: r3 = ##65280
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: p0 = cmp.eq(r2,#0)
; CHECK-NEXT: p1 = !bitsclr(r1,r2)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: if (p0) r2 = #0
; CHECK-NEXT: p0 = !bitsclr(r1,r3)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: r1 = mux(p1,#8,#0)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: r3 = mux(p1,#2,#0)
; CHECK-NEXT: r2 = mux(p1,#2,#0)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: r5 = setbit(r1,#2)
; CHECK-NEXT: r3 = mux(p0,##128,#0)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: r6 = setbit(r3,#0)
; CHECK-NEXT: r4 = mux(p0,#32,#0)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: if (!p0) r2 = #128
; CHECK-NEXT: r5 = setbit(r1,#2)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: r4 = mux(p0,#0,#32)
; CHECK-NEXT: r6 = setbit(r2,#0)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: if (!p1) r5 = add(r1,#0)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: if (!p1) r6 = add(r3,#0)
; CHECK-NEXT: r1 = setbit(r3,#6)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: r1 = setbit(r2,#6)
; CHECK-NEXT: if (!p1) r6 = add(r2,#0)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: r3 = setbit(r4,#4)
; CHECK-NEXT: r2 = setbit(r4,#4)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: r5 = or(r6,r5)
; CHECK-NEXT: if (!p0) r2 = add(r4,#0)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: if (!p0) r2 = add(r1,#0)
; CHECK-NEXT: if (!p0) r1 = add(r3,#0)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: if (!p0) r4 = add(r3,#0)
; CHECK-NEXT: r4 = or(r6,r5)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: r5 |= or(r4,r2)
; CHECK-NEXT: r4 |= or(r2,r1)
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: memb(r0+#0) = r5
; CHECK-NEXT: memb(r0+#0) = r4
; CHECK-NEXT: }
; CHECK-NEXT: {
; CHECK-NEXT: jumpr r31
4 changes: 2 additions & 2 deletions llvm/test/CodeGen/PowerPC/opt-cmp-inst-cr0-live.ll
@@ -7,11 +7,11 @@ define signext i32 @fn1(i32 %baz) {
%2 = zext i32 %1 to i64
%3 = shl i64 %2, 48
%4 = ashr exact i64 %3, 48
; CHECK: RLWINM8 killed {{[^,]+}}, 0, 16, 27
; CHECK: RLDICR {{[^,]+}}, 48, 15
; CHECK: CMPLDI
; CHECK: BCC

; CHECK: ANDI8_rec {{[^,]+}}, 65520, implicit-def $cr0
; CHECK: RLDICR_rec {{[^,]+}}, 48, 15, implicit-def $cr0
; CHECK: COPY killed $cr0
; CHECK: BCC
%5 = icmp eq i64 %4, 0
1 change: 1 addition & 0 deletions llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll
@@ -534,6 +534,7 @@ define i1 @add_ugecmp_i32_i16(i32 %x) nounwind {
; RV64I-NEXT: lui a1, 8
; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: srliw a0, a0, 16
; RV64I-NEXT: slli a0, a0, 16
; RV64I-NEXT: snez a0, a0
; RV64I-NEXT: ret
;
38 changes: 22 additions & 16 deletions llvm/test/CodeGen/RISCV/sextw-removal.ll
@@ -1034,31 +1034,34 @@ define signext i32 @bug(i32 signext %x) {
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: beqz a0, .LBB18_4
; CHECK-NEXT: # %bb.1: # %if.end
; CHECK-NEXT: srliw a2, a0, 16
; CHECK-NEXT: seqz a1, a2
; CHECK-NEXT: srliw a3, a0, 16
; CHECK-NEXT: seqz a1, a3
; CHECK-NEXT: slli a1, a1, 4
; CHECK-NEXT: sllw a1, a0, a1
; CHECK-NEXT: srliw a2, a1, 24
; CHECK-NEXT: slli a2, a2, 24
; CHECK-NEXT: li a0, 16
; CHECK-NEXT: beqz a2, .LBB18_3
; CHECK-NEXT: beqz a3, .LBB18_3
; CHECK-NEXT: # %bb.2: # %if.end
; CHECK-NEXT: li a0, 32
; CHECK-NEXT: .LBB18_3: # %if.end
; CHECK-NEXT: srliw a2, a1, 24
; CHECK-NEXT: seqz a2, a2
; CHECK-NEXT: slli a3, a2, 3
; CHECK-NEXT: sllw a1, a1, a3
; CHECK-NEXT: srliw a3, a1, 28
; CHECK-NEXT: slli a3, a3, 28
; CHECK-NEXT: neg a2, a2
; CHECK-NEXT: andi a2, a2, -8
; CHECK-NEXT: add a0, a0, a2
; CHECK-NEXT: srliw a2, a1, 28
; CHECK-NEXT: seqz a2, a2
; CHECK-NEXT: seqz a2, a3
; CHECK-NEXT: slli a3, a2, 2
; CHECK-NEXT: sllw a1, a1, a3
; CHECK-NEXT: srliw a3, a1, 30
; CHECK-NEXT: slli a3, a3, 30
; CHECK-NEXT: neg a2, a2
; CHECK-NEXT: andi a2, a2, -4
; CHECK-NEXT: add a0, a0, a2
; CHECK-NEXT: srliw a2, a1, 30
; CHECK-NEXT: seqz a2, a2
; CHECK-NEXT: seqz a2, a3
; CHECK-NEXT: slli a3, a2, 1
; CHECK-NEXT: sllw a1, a1, a3
; CHECK-NEXT: neg a2, a2
@@ -1074,31 +1077,34 @@ define signext i32 @bug(i32 signext %x) {
; NOREMOVAL: # %bb.0: # %entry
; NOREMOVAL-NEXT: beqz a0, .LBB18_4
; NOREMOVAL-NEXT: # %bb.1: # %if.end
; NOREMOVAL-NEXT: srliw a2, a0, 16
; NOREMOVAL-NEXT: seqz a1, a2
; NOREMOVAL-NEXT: srliw a3, a0, 16
; NOREMOVAL-NEXT: seqz a1, a3
; NOREMOVAL-NEXT: slli a1, a1, 4
; NOREMOVAL-NEXT: sllw a1, a0, a1
; NOREMOVAL-NEXT: srliw a2, a1, 24
; NOREMOVAL-NEXT: slli a2, a2, 24
; NOREMOVAL-NEXT: li a0, 16
; NOREMOVAL-NEXT: beqz a2, .LBB18_3
; NOREMOVAL-NEXT: beqz a3, .LBB18_3
; NOREMOVAL-NEXT: # %bb.2: # %if.end
; NOREMOVAL-NEXT: li a0, 32
; NOREMOVAL-NEXT: .LBB18_3: # %if.end
; NOREMOVAL-NEXT: srliw a2, a1, 24
; NOREMOVAL-NEXT: seqz a2, a2
; NOREMOVAL-NEXT: slli a3, a2, 3
; NOREMOVAL-NEXT: sllw a1, a1, a3
; NOREMOVAL-NEXT: srliw a3, a1, 28
; NOREMOVAL-NEXT: slli a3, a3, 28
; NOREMOVAL-NEXT: neg a2, a2
; NOREMOVAL-NEXT: andi a2, a2, -8
; NOREMOVAL-NEXT: add a0, a0, a2
; NOREMOVAL-NEXT: srliw a2, a1, 28
; NOREMOVAL-NEXT: seqz a2, a2
; NOREMOVAL-NEXT: seqz a2, a3
; NOREMOVAL-NEXT: slli a3, a2, 2
; NOREMOVAL-NEXT: sllw a1, a1, a3
; NOREMOVAL-NEXT: srliw a3, a1, 30
; NOREMOVAL-NEXT: slli a3, a3, 30
; NOREMOVAL-NEXT: neg a2, a2
; NOREMOVAL-NEXT: andi a2, a2, -4
; NOREMOVAL-NEXT: add a0, a0, a2
; NOREMOVAL-NEXT: srliw a2, a1, 30
; NOREMOVAL-NEXT: seqz a2, a2
; NOREMOVAL-NEXT: seqz a2, a3
; NOREMOVAL-NEXT: slli a3, a2, 1
; NOREMOVAL-NEXT: sllw a1, a1, a3
; NOREMOVAL-NEXT: neg a2, a2
1 change: 1 addition & 0 deletions llvm/test/CodeGen/RISCV/signed-truncation-check.ll
@@ -586,6 +586,7 @@ define i1 @add_ultcmp_i32_i16(i32 %x) nounwind {
; RV64I-NEXT: lui a1, 8
; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: srliw a0, a0, 16
; RV64I-NEXT: slli a0, a0, 16
Collaborator (Author) commented:
Looks like RISCV isn't optimizing

      t41: i64 = and t32, Constant:i64<4294901760>
    t42: i64 = setcc t41, Constant:i64<0>, setne:ch

as well as

      t43: i64 = and t32, Constant:i64<4294967295>
      t38: i64 = srl t43, Constant:i64<16>
    t39: i64 = setcc t38, Constant:i64<0>, setne:ch

It does fold SRL+AND into a single srliw, but the AND is lowered into srliw+slli. The problem is that it would need to understand that the consuming setcc doesn't care about where the bits end up.
@asb / @topperc / @michaelmaitland: Is RISCV lacking ISel patterns for setcc + "and with shifted mask"?

Collaborator replied:
shouldFoldConstantShiftPairToMask is enabled by default - is that affecting it?

Collaborator (Author) replied:
I don't think shouldFoldConstantShiftPairToMask has anything to do with it here, since I can't see any shift pair being created in the first place.

We could add a new hook to avoid the simplification done here for RISCV, but I really think this is something that RISCV should handle at selection (i.e. allowing the simplified DAG but doing a better selection).
I have no idea whether the regression here is acceptable until someone who knows more about RISCV deals with it. I don't feel comfortable doing that myself (and I don't really have time to work on it, as it isn't important for my downstream target).
; RV64I-NEXT: seqz a0, a0
; RV64I-NEXT: ret
;