
Revert 112e49b38150b8bfdef01434309d1b05204193e4 due to a miscompile #71598

Merged

Conversation

alexfh (Contributor) commented on Nov 7, 2023

  • Revert "[DAGCombiner] Transform (icmp eq/ne (and X,C0),(shift X,C1)) to use rotate or to getter constants." - causes a miscompile, see 112e49b#commitcomment-131943923
  • Revert "[X86] Fix gcc warning about mix of enumeral and non-enumeral types. NFC", which fixes a compiler warning in the commit above

Revert "[X86] Fix gcc warning about mix of enumeral and non-enumeral types. NFC"

    This reverts commit 2d0ac85, which is necessary to revert 112e49b, which
    introduces a miscompile. See
    llvm@112e49b#commitcomment-131943923.

Revert "[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))` to use rotate or to getter constants."

    This reverts commit 112e49b, which introduces a miscompile. See
    llvm@112e49b#commitcomment-131943923.
llvmbot added the backend:X86 and llvm:SelectionDAG labels on Nov 7, 2023

llvmbot (Collaborator) commented on Nov 7, 2023

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: None (alexfh)

Changes
  • Revert "[DAGCombiner] Transform (icmp eq/ne (and X,C0),(shift X,C1)) to use rotate or to getter constants." - causes a miscompile, see 112e49b#commitcomment-131943923
  • Revert "[X86] Fix gcc warning about mix of enumeral and non-enumeral types. NFC", which fixes a compiler warning in the commit above

Full diff: https://github.com/llvm/llvm-project/pull/71598.diff

5 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/TargetLowering.h (-18)
  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+15-115)
  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (-67)
  • (modified) llvm/lib/Target/X86/X86ISelLowering.h (-5)
  • (modified) llvm/test/CodeGen/X86/cmp-shiftX-maskX.ll (+71-75)
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index c87537291e3b161..58aad70c4bb36e6 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -832,24 +832,6 @@ class TargetLoweringBase {
     return N->getOpcode() == ISD::FDIV;
   }
 
-  // Given:
-  //    (icmp eq/ne (and X, C0), (shift X, C1))
-  // or
-  //    (icmp eq/ne X, (rotate X, CPow2))
-
-  // If C0 is a mask or shifted mask and the shift amt (C1) isolates the
-  // remaining bits (i.e something like `(x64 & UINT32_MAX) == (x64 >> 32)`)
-  // Do we prefer the shift to be shift-right, shift-left, or rotate.
-  // Note: Its only valid to convert the rotate version to the shift version iff
-  // the shift-amt (`C1`) is a power of 2 (including 0).
-  // If ShiftOpc (current Opcode) is returned, do nothing.
-  virtual unsigned preferedOpcodeForCmpEqPiecesOfOperand(
-      EVT VT, unsigned ShiftOpc, bool MayTransformRotate,
-      const APInt &ShiftOrRotateAmt,
-      const std::optional<APInt> &AndMask) const {
-    return ShiftOpc;
-  }
-
   /// These two forms are equivalent:
   ///   sub %y, (xor %x, -1)
   ///   add (add %x, 1), %y
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index bee50d58c73c32c..a867d88f76c0cf6 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -12456,127 +12456,27 @@ SDValue DAGCombiner::visitSETCC(SDNode *N) {
 
   ISD::CondCode Cond = cast<CondCodeSDNode>(N->getOperand(2))->get();
   EVT VT = N->getValueType(0);
-  SDValue N0 = N->getOperand(0), N1 = N->getOperand(1);
-
-  SDValue Combined = SimplifySetCC(VT, N0, N1, Cond, SDLoc(N), !PreferSetCC);
 
-  if (Combined) {
-    // If we prefer to have a setcc, and we don't, we'll try our best to
-    // recreate one using rebuildSetCC.
-    if (PreferSetCC && Combined.getOpcode() != ISD::SETCC) {
-      SDValue NewSetCC = rebuildSetCC(Combined);
+  SDValue Combined = SimplifySetCC(VT, N->getOperand(0), N->getOperand(1), Cond,
+                                   SDLoc(N), !PreferSetCC);
 
-      // We don't have anything interesting to combine to.
-      if (NewSetCC.getNode() == N)
-        return SDValue();
+  if (!Combined)
+    return SDValue();
 
-      if (NewSetCC)
-        return NewSetCC;
-    }
-    return Combined;
-  }
+  // If we prefer to have a setcc, and we don't, we'll try our best to
+  // recreate one using rebuildSetCC.
+  if (PreferSetCC && Combined.getOpcode() != ISD::SETCC) {
+    SDValue NewSetCC = rebuildSetCC(Combined);
 
-  // Optimize
-  //    1) (icmp eq/ne (and X, C0), (shift X, C1))
-  // or
-  //    2) (icmp eq/ne X, (rotate X, C1))
-  // If C0 is a mask or shifted mask and the shift amt (C1) isolates the
-  // remaining bits (i.e something like `(x64 & UINT32_MAX) == (x64 >> 32)`)
-  // Then:
-  // If C1 is a power of 2, then the rotate and shift+and versions are
-  // equivilent, so we can interchange them depending on target preference.
-  // Otherwise, if we have the shift+and version we can interchange srl/shl
-  // which inturn affects the constant C0. We can use this to get better
-  // constants again determined by target preference.
-  if (Cond == ISD::SETNE || Cond == ISD::SETEQ) {
-    auto IsAndWithShift = [](SDValue A, SDValue B) {
-      return A.getOpcode() == ISD::AND &&
-             (B.getOpcode() == ISD::SRL || B.getOpcode() == ISD::SHL) &&
-             A.getOperand(0) == B.getOperand(0);
-    };
-    auto IsRotateWithOp = [](SDValue A, SDValue B) {
-      return (B.getOpcode() == ISD::ROTL || B.getOpcode() == ISD::ROTR) &&
-             B.getOperand(0) == A;
-    };
-    SDValue AndOrOp = SDValue(), ShiftOrRotate = SDValue();
-    bool IsRotate = false;
-
-    // Find either shift+and or rotate pattern.
-    if (IsAndWithShift(N0, N1)) {
-      AndOrOp = N0;
-      ShiftOrRotate = N1;
-    } else if (IsAndWithShift(N1, N0)) {
-      AndOrOp = N1;
-      ShiftOrRotate = N0;
-    } else if (IsRotateWithOp(N0, N1)) {
-      IsRotate = true;
-      AndOrOp = N0;
-      ShiftOrRotate = N1;
-    } else if (IsRotateWithOp(N1, N0)) {
-      IsRotate = true;
-      AndOrOp = N1;
-      ShiftOrRotate = N0;
-    }
-
-    if (AndOrOp && ShiftOrRotate && ShiftOrRotate.hasOneUse() &&
-        (IsRotate || AndOrOp.hasOneUse())) {
-      EVT OpVT = N0.getValueType();
-      // Get constant shift/rotate amount and possibly mask (if its shift+and
-      // variant).
-      auto GetAPIntValue = [](SDValue Op) -> std::optional<APInt> {
-        ConstantSDNode *CNode = isConstOrConstSplat(Op, /*AllowUndefs*/ false,
-                                                    /*AllowTrunc*/ false);
-        if (CNode == nullptr)
-          return std::nullopt;
-        return CNode->getAPIntValue();
-      };
-      std::optional<APInt> AndCMask =
-          IsRotate ? std::nullopt : GetAPIntValue(AndOrOp.getOperand(1));
-      std::optional<APInt> ShiftCAmt =
-          GetAPIntValue(ShiftOrRotate.getOperand(1));
-      unsigned NumBits = OpVT.getScalarSizeInBits();
-
-      // We found constants.
-      if (ShiftCAmt && (IsRotate || AndCMask) && ShiftCAmt->ult(NumBits)) {
-        unsigned ShiftOpc = ShiftOrRotate.getOpcode();
-        // Check that the constants meet the constraints.
-        bool CanTransform =
-            IsRotate ||
-            (*ShiftCAmt == (~*AndCMask).popcount() && ShiftOpc == ISD::SHL
-                 ? (~*AndCMask).isMask()
-                 : AndCMask->isMask());
-
-        // See if target prefers another shift/rotate opcode.
-        unsigned NewShiftOpc = TLI.preferedOpcodeForCmpEqPiecesOfOperand(
-            OpVT, ShiftOpc, ShiftCAmt->isPowerOf2(), *ShiftCAmt, AndCMask);
-        // Transform is valid and we have a new preference.
-        if (CanTransform && NewShiftOpc != ShiftOpc) {
-          SDLoc DL(N);
-          SDValue NewShiftOrRotate =
-              DAG.getNode(NewShiftOpc, DL, OpVT, ShiftOrRotate.getOperand(0),
-                          ShiftOrRotate.getOperand(1));
-          SDValue NewAndOrOp = SDValue();
-
-          if (NewShiftOpc == ISD::SHL || NewShiftOpc == ISD::SRL) {
-            APInt NewMask =
-                NewShiftOpc == ISD::SHL
-                    ? APInt::getHighBitsSet(NumBits,
-                                            NumBits - ShiftCAmt->getZExtValue())
-                    : APInt::getLowBitsSet(NumBits,
-                                           NumBits - ShiftCAmt->getZExtValue());
-            NewAndOrOp =
-                DAG.getNode(ISD::AND, DL, OpVT, ShiftOrRotate.getOperand(0),
-                            DAG.getConstant(NewMask, DL, OpVT));
-          } else {
-            NewAndOrOp = ShiftOrRotate.getOperand(0);
-          }
+    // We don't have anything interesting to combine to.
+    if (NewSetCC.getNode() == N)
+      return SDValue();
 
-          return DAG.getSetCC(DL, VT, NewAndOrOp, NewShiftOrRotate, Cond);
-        }
-      }
-    }
+    if (NewSetCC)
+      return NewSetCC;
   }
-  return SDValue();
+
+  return Combined;
 }
 
 SDValue DAGCombiner::visitSETCCCARRY(SDNode *N) {
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 22fba5601ccfd38..547461f3677e43d 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -3257,73 +3257,6 @@ bool X86TargetLowering::
   return NewShiftOpcode == ISD::SHL;
 }
 
-unsigned X86TargetLowering::preferedOpcodeForCmpEqPiecesOfOperand(
-    EVT VT, unsigned ShiftOpc, bool MayTransformRotate,
-    const APInt &ShiftOrRotateAmt, const std::optional<APInt> &AndMask) const {
-  if (!VT.isInteger())
-    return ShiftOpc;
-
-  bool PreferRotate = false;
-  if (VT.isVector()) {
-    // For vectors, if we have rotate instruction support, then its definetly
-    // best. Otherwise its not clear what the best so just don't make changed.
-    PreferRotate = Subtarget.hasAVX512() && (VT.getScalarType() == MVT::i32 ||
-                                             VT.getScalarType() == MVT::i64);
-  } else {
-    // For scalar, if we have bmi prefer rotate for rorx. Otherwise prefer
-    // rotate unless we have a zext mask+shr.
-    PreferRotate = Subtarget.hasBMI2();
-    if (!PreferRotate) {
-      unsigned MaskBits =
-          VT.getScalarSizeInBits() - ShiftOrRotateAmt.getZExtValue();
-      PreferRotate = (MaskBits != 8) && (MaskBits != 16) && (MaskBits != 32);
-    }
-  }
-
-  if (ShiftOpc == ISD::SHL || ShiftOpc == ISD::SRL) {
-    assert(AndMask.has_value() && "Null andmask when querying about shift+and");
-
-    if (PreferRotate && MayTransformRotate)
-      return ISD::ROTL;
-
-    // If vector we don't really get much benefit swapping around constants.
-    // Maybe we could check if the DAG has the flipped node already in the
-    // future.
-    if (VT.isVector())
-      return ShiftOpc;
-
-    // See if the beneficial to swap shift type.
-    if (ShiftOpc == ISD::SHL) {
-      // If the current setup has imm64 mask, then inverse will have
-      // at least imm32 mask (or be zext i32 -> i64).
-      if (VT == MVT::i64)
-        return AndMask->getSignificantBits() > 32 ? (unsigned)ISD::SRL
-                                                  : ShiftOpc;
-
-      // We can only benefit if req at least 7-bit for the mask. We
-      // don't want to replace shl of 1,2,3 as they can be implemented
-      // with lea/add.
-      return ShiftOrRotateAmt.uge(7) ? (unsigned)ISD::SRL : ShiftOpc;
-    }
-
-    if (VT == MVT::i64)
-      // Keep exactly 32-bit imm64, this is zext i32 -> i64 which is
-      // extremely efficient.
-      return AndMask->getSignificantBits() > 33 ? (unsigned)ISD::SHL : ShiftOpc;
-
-    // Keep small shifts as shl so we can generate add/lea.
-    return ShiftOrRotateAmt.ult(7) ? (unsigned)ISD::SHL : ShiftOpc;
-  }
-
-  // We prefer rotate for vectors of if we won't get a zext mask with SRL
-  // (PreferRotate will be set in the latter case).
-  if (PreferRotate || VT.isVector())
-    return ShiftOpc;
-
-  // Non-vector type and we have a zext mask with SRL.
-  return ISD::SRL;
-}
-
 bool X86TargetLowering::preferScalarizeSplat(SDNode *N) const {
   return N->getOpcode() != ISD::FP_EXTEND;
 }
diff --git a/llvm/lib/Target/X86/X86ISelLowering.h b/llvm/lib/Target/X86/X86ISelLowering.h
index 3b1b2603fd8fc61..8046f42736951cd 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.h
+++ b/llvm/lib/Target/X86/X86ISelLowering.h
@@ -1138,11 +1138,6 @@ namespace llvm {
         unsigned OldShiftOpcode, unsigned NewShiftOpcode,
         SelectionDAG &DAG) const override;
 
-    unsigned preferedOpcodeForCmpEqPiecesOfOperand(
-        EVT VT, unsigned ShiftOpc, bool MayTransformRotate,
-        const APInt &ShiftOrRotateAmt,
-        const std::optional<APInt> &AndMask) const override;
-
     bool preferScalarizeSplat(SDNode *N) const override;
 
     bool shouldFoldConstantShiftPairToMask(const SDNode *N,
diff --git a/llvm/test/CodeGen/X86/cmp-shiftX-maskX.ll b/llvm/test/CodeGen/X86/cmp-shiftX-maskX.ll
index 115f3863afc312f..8ec142acb71d4ce 100644
--- a/llvm/test/CodeGen/X86/cmp-shiftX-maskX.ll
+++ b/llvm/test/CodeGen/X86/cmp-shiftX-maskX.ll
@@ -20,8 +20,9 @@ define i1 @shr_to_shl_eq_i8_s2(i8 %x) {
 ; CHECK-LABEL: shr_to_shl_eq_i8_s2:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    movl %edi, %eax
-; CHECK-NEXT:    rolb $2, %al
-; CHECK-NEXT:    cmpb %al, %dil
+; CHECK-NEXT:    andb $63, %al
+; CHECK-NEXT:    shrb $2, %dil
+; CHECK-NEXT:    cmpb %dil, %al
 ; CHECK-NEXT:    sete %al
 ; CHECK-NEXT:    retq
   %and = and i8 %x, 63
@@ -34,9 +35,9 @@ define i1 @shl_to_shr_ne_i8_s7(i8 %x) {
 ; CHECK-LABEL: shl_to_shr_ne_i8_s7:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    movl %edi, %eax
-; CHECK-NEXT:    shrb $7, %al
-; CHECK-NEXT:    andb $1, %dil
-; CHECK-NEXT:    cmpb %al, %dil
+; CHECK-NEXT:    shlb $7, %al
+; CHECK-NEXT:    andb $-128, %dil
+; CHECK-NEXT:    cmpb %dil, %al
 ; CHECK-NEXT:    setne %al
 ; CHECK-NEXT:    retq
   %shl = shl i8 %x, 7
@@ -62,8 +63,9 @@ define i1 @shr_to_shl_eq_i8_s1(i8 %x) {
 ; CHECK-LABEL: shr_to_shl_eq_i8_s1:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    movl %edi, %eax
-; CHECK-NEXT:    rolb %al
-; CHECK-NEXT:    cmpb %al, %dil
+; CHECK-NEXT:    andb $127, %al
+; CHECK-NEXT:    shrb %dil
+; CHECK-NEXT:    cmpb %dil, %al
 ; CHECK-NEXT:    sete %al
 ; CHECK-NEXT:    retq
   %and = and i8 %x, 127
@@ -75,10 +77,10 @@ define i1 @shr_to_shl_eq_i8_s1(i8 %x) {
 define i1 @shr_to_shl_eq_i32_s3(i32 %x) {
 ; CHECK-LABEL: shr_to_shl_eq_i32_s3:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    # kill: def $edi killed $edi def $rdi
-; CHECK-NEXT:    leal (,%rdi,8), %eax
-; CHECK-NEXT:    andl $-8, %edi
-; CHECK-NEXT:    cmpl %eax, %edi
+; CHECK-NEXT:    movl %edi, %eax
+; CHECK-NEXT:    andl $536870911, %eax # imm = 0x1FFFFFFF
+; CHECK-NEXT:    shrl $3, %edi
+; CHECK-NEXT:    cmpl %edi, %eax
 ; CHECK-NEXT:    sete %al
 ; CHECK-NEXT:    retq
   %and = and i32 %x, 536870911
@@ -103,20 +105,14 @@ define i1 @shl_to_shr_eq_i32_s3_fail(i32 %x) {
 }
 
 define i1 @shl_to_shr_ne_i32_s16(i32 %x) {
-; CHECK-NOBMI-LABEL: shl_to_shr_ne_i32_s16:
-; CHECK-NOBMI:       # %bb.0:
-; CHECK-NOBMI-NEXT:    movzwl %di, %eax
-; CHECK-NOBMI-NEXT:    shrl $16, %edi
-; CHECK-NOBMI-NEXT:    cmpl %edi, %eax
-; CHECK-NOBMI-NEXT:    setne %al
-; CHECK-NOBMI-NEXT:    retq
-;
-; CHECK-BMI2-LABEL: shl_to_shr_ne_i32_s16:
-; CHECK-BMI2:       # %bb.0:
-; CHECK-BMI2-NEXT:    rorxl $16, %edi, %eax
-; CHECK-BMI2-NEXT:    cmpl %eax, %edi
-; CHECK-BMI2-NEXT:    setne %al
-; CHECK-BMI2-NEXT:    retq
+; CHECK-LABEL: shl_to_shr_ne_i32_s16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    movl %edi, %eax
+; CHECK-NEXT:    shll $16, %eax
+; CHECK-NEXT:    andl $-65536, %edi # imm = 0xFFFF0000
+; CHECK-NEXT:    cmpl %edi, %eax
+; CHECK-NEXT:    setne %al
+; CHECK-NEXT:    retq
   %shl = shl i32 %x, 16
   %and = and i32 %x, 4294901760
   %r = icmp ne i32 %shl, %and
@@ -141,8 +137,9 @@ define i1 @shl_to_shr_ne_i32_s16_fail(i32 %x) {
 define i1 @shr_to_shl_eq_i16_s1(i16 %x) {
 ; CHECK-LABEL: shr_to_shl_eq_i16_s1:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    movl %edi, %eax
-; CHECK-NEXT:    rolw %ax
+; CHECK-NEXT:    movzwl %di, %eax
+; CHECK-NEXT:    andl $32767, %edi # imm = 0x7FFF
+; CHECK-NEXT:    shrl %eax
 ; CHECK-NEXT:    cmpw %ax, %di
 ; CHECK-NEXT:    sete %al
 ; CHECK-NEXT:    retq
@@ -170,9 +167,9 @@ define i1 @shr_to_shl_eq_i16_s1_fail(i16 %x) {
 define i1 @shl_to_shr_eq_i64_s44(i64 %x) {
 ; CHECK-LABEL: shl_to_shr_eq_i64_s44:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    movq %rdi, %rax
-; CHECK-NEXT:    shrq $44, %rax
-; CHECK-NEXT:    andl $1048575, %edi # imm = 0xFFFFF
+; CHECK-NEXT:    movabsq $-17592186044416, %rax # imm = 0xFFFFF00000000000
+; CHECK-NEXT:    andq %rdi, %rax
+; CHECK-NEXT:    shlq $44, %rdi
 ; CHECK-NEXT:    cmpq %rax, %rdi
 ; CHECK-NEXT:    sete %al
 ; CHECK-NEXT:    retq
@@ -183,20 +180,13 @@ define i1 @shl_to_shr_eq_i64_s44(i64 %x) {
 }
 
 define i1 @shr_to_shl_ne_i64_s32(i64 %x) {
-; CHECK-NOBMI-LABEL: shr_to_shl_ne_i64_s32:
-; CHECK-NOBMI:       # %bb.0:
-; CHECK-NOBMI-NEXT:    movl %edi, %eax
-; CHECK-NOBMI-NEXT:    shrq $32, %rdi
-; CHECK-NOBMI-NEXT:    cmpq %rdi, %rax
-; CHECK-NOBMI-NEXT:    setne %al
-; CHECK-NOBMI-NEXT:    retq
-;
-; CHECK-BMI2-LABEL: shr_to_shl_ne_i64_s32:
-; CHECK-BMI2:       # %bb.0:
-; CHECK-BMI2-NEXT:    rorxq $32, %rdi, %rax
-; CHECK-BMI2-NEXT:    cmpq %rax, %rdi
-; CHECK-BMI2-NEXT:    setne %al
-; CHECK-BMI2-NEXT:    retq
+; CHECK-LABEL: shr_to_shl_ne_i64_s32:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    movl %edi, %eax
+; CHECK-NEXT:    shrq $32, %rdi
+; CHECK-NEXT:    cmpq %rdi, %rax
+; CHECK-NEXT:    setne %al
+; CHECK-NEXT:    retq
   %and = and i64 %x, 4294967295
   %shr = lshr i64 %x, 32
   %r = icmp ne i64 %and, %shr
@@ -240,9 +230,9 @@ define i1 @ashr_to_shl_ne_i64_s32_fail(i64 %x) {
 define i1 @shl_to_shr_eq_i64_s63(i64 %x) {
 ; CHECK-LABEL: shl_to_shr_eq_i64_s63:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    movq %rdi, %rax
-; CHECK-NEXT:    shrq $63, %rax
-; CHECK-NEXT:    andl $1, %edi
+; CHECK-NEXT:    movabsq $-9223372036854775808, %rax # imm = 0x8000000000000000
+; CHECK-NEXT:    andq %rdi, %rax
+; CHECK-NEXT:    shlq $63, %rdi
 ; CHECK-NEXT:    cmpq %rax, %rdi
 ; CHECK-NEXT:    sete %al
 ; CHECK-NEXT:    retq
@@ -268,14 +258,23 @@ define i1 @shl_to_shr_eq_i64_s63_fail(i64 %x) {
 }
 
 define i1 @shr_to_shl_eq_i64_s7(i64 %x) {
-; CHECK-LABEL: shr_to_shl_eq_i64_s7:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    movq %rdi, %rax
-; CHECK-NEXT:    shlq $7, %rax
-; CHECK-NEXT:    andq $-128, %rdi
-; CHECK-NEXT:    cmpq %rax, %rdi
-; CHECK-NEXT:    sete %al
-; CHECK-NEXT:    retq
+; CHECK-NOBMI-LABEL: shr_to_shl_eq_i64_s7:
+; CHECK-NOBMI:       # %bb.0:
+; CHECK-NOBMI-NEXT:    movabsq $144115188075855871, %rax # imm = 0x1FFFFFFFFFFFFFF
+; CHECK-NOBMI-NEXT:    andq %rdi, %rax
+; CHECK-NOBMI-NEXT:    shrq $7, %rdi
+; CHECK-NOBMI-NEXT:    cmpq %rdi, %rax
+; CHECK-NOBMI-NEXT:    sete %al
+; CHECK-NOBMI-NEXT:    retq
+;
+; CHECK-BMI2-LABEL: shr_to_shl_eq_i64_s7:
+; CHECK-BMI2:       # %bb.0:
+; CHECK-BMI2-NEXT:    movb $57, %al
+; CHECK-BMI2-NEXT:    bzhiq %rax, %rdi, %rax
+; CHECK-BMI2-NEXT:    shrq $7, %rdi
+; CHECK-BMI2-NEXT:    cmpq %rdi, %rax
+; CHECK-BMI2-NEXT:    sete %al
+; CHECK-BMI2-NEXT:    retq
   %and = and i64 %x, 144115188075855871
   %shr = lshr i64 %x, 7
   %r = icmp eq i64 %and, %shr
@@ -285,8 +284,9 @@ define i1 @shr_to_shl_eq_i64_s7(i64 %x) {
 define i1 @shl_to_shr_ne_i32_s24(i32 %x) {
 ; CHECK-LABEL: shl_to_shr_ne_i32_s24:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    movzbl %dil, %eax
-; CHECK-NEXT:    shrl $24, %edi
+; CHECK-NEXT:    movl %edi, %eax
+; CHECK-NEXT:    shll $24, %eax
+; CHECK-NEXT:    andl $-16777216, %edi # imm = 0xFF000000
 ; CHECK-NEXT:    cmpl %edi, %eax
 ; CHECK-NEXT:    setne %al
 ; CHECK-NEXT:    retq
@@ -312,20 +312,14 @@ define i1 @shr_to_shl_ne_i32_s24_fail(i32 %x) {
 }
 
 define i1 @shr_to_shl_ne_i32_s8(i32 %x) {
-; CHECK-NOBMI-LABEL: shr_to_shl_ne_i32_s8:
-; CHECK-NOBMI:       # %bb.0:
-; CHECK-NOBMI-NEXT:    movl %edi, %eax
-; CHECK-NOBMI-NEXT:    roll $8, %eax
-; CHECK-NOBMI-NEXT:    cmpl %eax, %edi
-; CHECK-NOBMI-NEXT:    setne %al
-; CHECK-NOBMI-NEXT:    retq
-;
-; CHECK-BMI2-LABEL: shr_to_shl_ne_i32_s8:
-; CHECK-BMI2:       # %bb.0:
-; CHECK-BMI2-NEXT:    rorxl $24, %edi, %eax
-; CHECK-BMI2-NEXT:    cmpl %eax, %edi
-; CHECK-BMI2-NEXT:    setne %al
-; CHECK-BMI2-NEXT:    retq
+; CHECK-LABEL: shr_to_shl_ne_i32_s8:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    movl %edi, %eax
+; CHECK-NEXT:    andl $16777215, %eax # imm = 0xFFFFFF
+; CHECK-NEXT:    shrl $8, %edi
+; CHECK-NEXT:    cmpl %edi, %eax
+; CHECK-NEXT:    setne %al
+; CHECK-NEXT:    retq
   %and = and i32 %x, 16777215
   %shr = lshr i32 %x, 8
   %r = icmp ne i32 %and, %shr
@@ -365,8 +359,9 @@ define <4 x i1> @shr_to_ror_eq_4xi32_s4(<4 x i32> %x) {
 ;
 ; CHECK-AVX512-LABEL: shr_to_ror_eq_4xi32_s4:
 ; CHECK-AVX512:       # %bb.0:
-; CHECK-AVX512-NEXT:    vprold $4, %xmm0, %xmm1
-; CHECK-AVX512-NEXT:    vpcmpeqd %xmm1, %xmm0, %xmm0
+; CHECK-AVX512-NEXT:    vpsrld $4, %xmm0, %xmm1
+; CHECK-AVX512-NEXT:    vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
+; CHECK-AVX512-NEXT:    vpcmpeqd %xmm0, %xmm1, %xmm0
 ; CHECK-AVX512-NEXT:    vpternlogq $15, %xmm0, %xmm0, %xmm0
 ; CHECK-AVX512-NEXT:    retq
   %shr = lshr <4 x i32> %x, <i32 4, i32 4, i32 4, i32 4>
@@ -407,8 +402,9 @@ define <4 x i1> @shl_to_ror_eq_4xi32_s8(<4 x i32> %x) {
 ;
 ; CHECK-AVX512-LABEL: shl_to_ror_eq_4xi32_s8:
 ; CHECK-AVX512:       # %bb.0:
-; CHECK-AVX512-NEXT:    vprold $8, %xmm0, %xmm1
-; CHECK-AVX512-NEXT:    vpcmpeqd %xmm1, %xmm0, %xmm0
+; CHECK-AVX512-NEXT:    vpslld $8, %xmm0, %xmm1
+; CHECK-AVX512-NEXT:    vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
+; CHECK-AVX512-NEXT:    vpcmpeqd %xmm0, %xmm1, %xmm0
 ; CHECK-AVX512-NEXT:    vpternlogq $15, %xmm0, %xmm0, %xmm0
 ; CHECK-AVX512-NEXT:    retq
   %shr = shl <4 x i32> %x, <i32 8, i32 8, i32 8, i32 8>

alexfh referenced this pull request on Nov 7, 2023

[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))` to use rotate or to getter constants.

If `C0` is a mask and `C1` shifts out all the masked bits (to
essentially compare two subsets of `X`), we can arbitrarily re-order
shift as `srl` or `shl`.

If `C1` (shift amount) is a power of 2, we can replace the and+shift
with a rotate.

Otherwise, based on target preference we can arbitrarily swap `shl`
and `srl` in/out to get better constants.

On x86 we can use this re-ordering to:
    1) get better `and` constants for `C0` (zero extended moves or
       avoid imm64).
    2) convert `srl` to `shl` if `shl` will be implementable with `lea`
       or `add` (both of which can be preferable).

Proofs: https://alive2.llvm.org/ce/z/qzGM_w

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D152116
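
For a concrete sense of the equivalence the commit message above relies on, here is a minimal standalone C++ check (an illustration, not code from the patch) that brute-forces the 16-bit case with mask 0x00FF and shift amount 8; the shift amount is a power of two, so the rotate form applies as well:

```cpp
#include <cassert>
#include <cstdint>

// Rotate right for a 16-bit value; s must be in (0, 16).
static uint16_t rotr16(uint16_t v, unsigned s) {
  return (uint16_t)((v >> s) | (v << (16 - s)));
}

int main() {
  for (uint32_t i = 0; i <= 0xFFFF; ++i) {
    uint16_t x = (uint16_t)i;
    // (and X, C0) == (srl X, C1): compare the low 8 bits with the high 8 bits.
    bool andSrl = (x & 0x00FF) == (x >> 8);
    // Swapped shift direction with the complementary mask.
    bool andShl = (uint16_t)(x & 0xFF00) == (uint16_t)(x << 8);
    // Rotate form, valid here because the shift amount (8) is a power of two.
    bool rot = x == rotr16(x, 8);
    assert(andSrl == andShl && andShl == rot);
  }
  return 0;
}
```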

alexfh (Contributor, Author) commented on Nov 8, 2023

Given that there have been no objections or alternative solutions so far, and that this is a real correctness problem, I'm going to move forward with the revert.

alexfh merged commit 067632e into llvm:main on Nov 8, 2023
5 checks passed

RKSimon (Collaborator) commented on Nov 8, 2023

@alexfh Hadn't @goldsteinn identified that #71489 was the fix to the underlying issue? Did that patch address the miscompiles you were seeing?

RKSimon (Collaborator) commented on Nov 8, 2023

I'm not asking you to undo this revert - I just want to make sure we track this properly so the patches can be re-committed.

alexfh (Contributor, Author) commented on Nov 8, 2023

> @alexfh Hadn't @goldsteinn identified that #71489 was the fix to the underlying issue? Did that patch address the miscompiles you were seeing?

I hadn't seen #71489 mentioned in relation to 112e49b. I can check whether that patch fixes the miscompile we're seeing.

alexfh (Contributor, Author) commented on Nov 8, 2023

@RKSimon The proposed fix in #71489 claims to address #71330, which is caused by a different commit - 0289dad. I've verified that it doesn't fix the miscompile introduced in 112e49b. Thus, the revert here is still necessary.

RKSimon (Collaborator) commented on Nov 8, 2023

Oops - sorry about that - that'll teach me for trying to use the sodding github app to keep track of everything!

goldsteinn (Contributor) commented
Thank you for reverting. Looking into the issue.

goldsteinn (Contributor) commented
Okay, it appears the implementation is using the wrong type size somewhere. The transform taking place is:
From:

t24: i1 = setcc t19, t22, setne:ch
  t19: i32 = and t3, Constant:i32<7>
    t3: i32 = truncate t2
      t2: i64,ch = CopyFromReg t0, Register:i64 %1
        t0: ch,glue = EntryToken
  t22: i32 = srl t3, Constant:i8<5>

To:

t33: i1 = setcc t32, t30, setne:ch
  t32: i32 = and t3, Constant:i32<-32>
    t3: i32 = truncate t2
      t2: i64,ch = CopyFromReg t0, Register:i64 %1
        t0: ch,glue = EntryToken
  t30: i32 = shl t3, Constant:i8<5>

This is correct for i8, but not for i32.
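
For reference, a small standalone C++ reproduction of the difference (illustrative values, not taken from the thread): at i32 width the rewritten compare disagrees with the original for x = 0xE7, while at i8 width the same rewrite holds.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // i32: original (x & 7) != (x >> 5) versus rewritten (x & -32) != (x << 5).
  uint32_t x = 0xE7;
  bool before = (x & 7u) != (x >> 5);                       // 7 != 7 -> false
  bool after = (x & 0xFFFFFFE0u) != (uint32_t)(x << 5);     // 0xE0 != 0x1CE0 -> true
  std::printf("i32: before=%d after=%d\n", before, after);  // the two forms disagree

  // i8: the same rewrite at 8-bit width, where it is actually correct.
  uint8_t y = 0xE7;
  bool before8 = (uint8_t)(y & 7) != (uint8_t)(y >> 5);     // 7 != 7 -> false
  bool after8 = (uint8_t)(y & 0xE0) != (uint8_t)(y << 5);   // 0xE0 != 0xE0 -> false
  std::printf("i8:  before=%d after=%d\n", before8, after8); // the two forms agree
}
```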

goldsteinn (Contributor) commented on Nov 8, 2023

The following seems to be the fix:

-        bool CanTransform =
-            IsRotate ||
-            (*ShiftCAmt == (~*AndCMask).popcount() && ShiftOpc == ISD::SHL
-                 ? (~*AndCMask).isMask()
-                 : AndCMask->isMask());
+        bool CanTransform = IsRotate;
+        if (!CanTransform) {
+          // Check that mask and shift complement each other
+          CanTransform = *ShiftCAmt == (~*AndCMask).popcount();
+          // Check that we are comparing all bits
+          CanTransform &= (*ShiftCAmt + AndCMask->popcount()) == NumBits;
+          // Check that the and mask is correct for the shift
+          CanTransform &=
+              ShiftOpc == ISD::SHL ? (~*AndCMask).isMask() : AndCMask->isMask();
+        }

Will post patch for recommit shortly.
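
To see what the added condition rules out in this case, here is a standalone sketch of just that numeric check (an illustration of the invariant, not the actual DAGCombiner code; the helper name is made up):

```cpp
#include <bit>
#include <cstdint>
#include <cstdio>

// For the and+srl form, the mask and the shift must together cover every bit:
// ShiftCAmt + popcount(AndCMask) == NumBits (with AndCMask a low-bit mask).
static bool coversAllBits(unsigned numBits, unsigned shiftAmt, uint32_t andMask) {
  return shiftAmt + std::popcount(andMask) == numBits;
}

int main() {
  // The miscompiled case above: i32, (x & 7) vs (x >> 5). 5 + 3 = 8 != 32 -> reject.
  std::printf("i32, shift 5, mask 7: %d\n", coversAllBits(32, 5, 0x7));
  // The same pattern at i8 width: 5 + 3 = 8 == 8 -> the transform is allowed.
  std::printf("i8,  shift 5, mask 7: %d\n", coversAllBits(8, 5, 0x7));
}
```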

goldsteinn (Contributor) commented
PR with fix at: #71729
