
[ValueTracking][X86] Compute KnownBits for phadd/phsub #92429

Merged: 2 commits merged into llvm:main on Jul 16, 2024

Conversation

mskamp (Contributor) commented May 16, 2024

Add KnownBits computations to ValueTracking and X86 DAG lowering.

These instructions add/subtract adjacent vector elements in their
operands. Example: phadd [X1, X2] [Y1, Y2] = [X1 + X2, Y1 + Y2].
This means that, in this example, we can compute the KnownBits of the
operation by computing the KnownBits of [X1, X2] + [X1, X2] and
[Y1, Y2] + [Y1, Y2] and intersecting the results. This approach
also generalizes to all x86 vector types.
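
As a concrete illustration (a hypothetical example, not taken from the patch itself): if every element of both operands is known to have only its low two bits set, i.e. values in [0, 3], then every element of the phadd result lies in [0, 6], so all bits above bit 2 are known zero, and masking the result with -8 folds to a constant zero vector. The tests below exercise exactly this pattern.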

There are also the operations phadd.sw and phsub.sw, which perform
saturating addition/subtraction. Use sadd_sat and ssub_sat to compute
the KnownBits of these operations.

Also adjust the existing test case pr53247.ll because it can be
transformed to a constant using the new KnownBits computation.

Fixes #82516.

mskamp requested a review from nikic as a code owner on May 16, 2024 at 16:53

llvmbot (Collaborator) commented May 16, 2024

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-llvm-analysis

Author: mskamp

Changes: as in the PR description above.

Patch is 26.99 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/92429.diff

4 Files Affected:

  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+48)
  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+73)
  • (added) llvm/test/Analysis/ValueTracking/knownbits-hadd-hsub.ll (+192)
  • (added) llvm/test/CodeGen/X86/knownbits-hadd-hsub.ll (+201)
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 2fdbb6e3ef840..e33cbef61e8f7 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -1725,6 +1725,54 @@ static void computeKnownBitsFromOperator(const Operator *I,
       case Intrinsic::x86_sse42_crc32_64_64:
         Known.Zero.setBitsFrom(32);
         break;
+      case Intrinsic::x86_ssse3_phadd_d:
+      case Intrinsic::x86_ssse3_phadd_w:
+      case Intrinsic::x86_ssse3_phadd_d_128:
+      case Intrinsic::x86_ssse3_phadd_w_128:
+      case Intrinsic::x86_avx2_phadd_d:
+      case Intrinsic::x86_avx2_phadd_w: {
+        computeKnownBits(I->getOperand(0), DemandedElts, Known, Depth + 1, Q);
+        computeKnownBits(I->getOperand(1), DemandedElts, Known2, Depth + 1, Q);
+
+        Known = KnownBits::computeForAddSub(true, false, false, Known, Known)
+                    .intersectWith(KnownBits::computeForAddSub(
+                        true, false, false, Known2, Known2));
+        break;
+      }
+      case Intrinsic::x86_ssse3_phadd_sw:
+      case Intrinsic::x86_ssse3_phadd_sw_128:
+      case Intrinsic::x86_avx2_phadd_sw: {
+        computeKnownBits(I->getOperand(0), DemandedElts, Known, Depth + 1, Q);
+        computeKnownBits(I->getOperand(1), DemandedElts, Known2, Depth + 1, Q);
+
+        Known = KnownBits::sadd_sat(Known, Known)
+                    .intersectWith(KnownBits::sadd_sat(Known2, Known2));
+        break;
+      }
+      case Intrinsic::x86_ssse3_phsub_d:
+      case Intrinsic::x86_ssse3_phsub_w:
+      case Intrinsic::x86_ssse3_phsub_d_128:
+      case Intrinsic::x86_ssse3_phsub_w_128:
+      case Intrinsic::x86_avx2_phsub_d:
+      case Intrinsic::x86_avx2_phsub_w: {
+        computeKnownBits(I->getOperand(0), DemandedElts, Known, Depth + 1, Q);
+        computeKnownBits(I->getOperand(1), DemandedElts, Known2, Depth + 1, Q);
+
+        Known = KnownBits::computeForAddSub(false, false, false, Known, Known)
+                    .intersectWith(KnownBits::computeForAddSub(
+                        false, false, false, Known2, Known2));
+        break;
+      }
+      case Intrinsic::x86_ssse3_phsub_sw:
+      case Intrinsic::x86_ssse3_phsub_sw_128:
+      case Intrinsic::x86_avx2_phsub_sw: {
+        computeKnownBits(I->getOperand(0), DemandedElts, Known, Depth + 1, Q);
+        computeKnownBits(I->getOperand(1), DemandedElts, Known2, Depth + 1, Q);
+
+        Known = KnownBits::ssub_sat(Known, Known)
+                    .intersectWith(KnownBits::ssub_sat(Known2, Known2));
+        break;
+      }
       case Intrinsic::riscv_vsetvli:
       case Intrinsic::riscv_vsetvlimax: {
         bool HasAVL = II->getIntrinsicID() == Intrinsic::riscv_vsetvli;
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index ecc5b3b3bf840..c23df2c91f385 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -37262,6 +37262,27 @@ void X86TargetLowering::computeKnownBitsForTargetNode(const SDValue Op,
     }
     break;
   }
+  case X86ISD::HADD: {
+    Known = DAG.computeKnownBits(Op.getOperand(0), DemandedElts, Depth + 1);
+    KnownBits Known2 =
+        DAG.computeKnownBits(Op.getOperand(1), DemandedElts, Depth + 1);
+
+    Known = KnownBits::computeForAddSub(true, false, false, Known, Known)
+                .intersectWith(KnownBits::computeForAddSub(true, false, false,
+                                                           Known2, Known2));
+    break;
+  }
+  case X86ISD::HSUB: {
+    Known =
+        DAG.computeKnownBits(Op.getOperand(0), DemandedElts, Depth + 1);
+    KnownBits Known2 =
+        DAG.computeKnownBits(Op.getOperand(1), DemandedElts, Depth + 1);
+
+    Known = KnownBits::computeForAddSub(false, false, false, Known, Known)
+                .intersectWith(KnownBits::computeForAddSub(false, false, false,
+                                                           Known2, Known2));
+    break;
+  }
   case ISD::INTRINSIC_WO_CHAIN: {
     switch (Op->getConstantOperandVal(0)) {
     case Intrinsic::x86_sse2_psad_bw:
@@ -37276,6 +37297,58 @@ void X86TargetLowering::computeKnownBitsForTargetNode(const SDValue Op,
       computeKnownBitsForPSADBW(LHS, RHS, Known, DemandedElts, DAG, Depth);
       break;
     }
+    case Intrinsic::x86_ssse3_phadd_d:
+    case Intrinsic::x86_ssse3_phadd_w:
+    case Intrinsic::x86_ssse3_phadd_d_128:
+    case Intrinsic::x86_ssse3_phadd_w_128:
+    case Intrinsic::x86_avx2_phadd_d:
+    case Intrinsic::x86_avx2_phadd_w: {
+      Known = DAG.computeKnownBits(Op.getOperand(1), DemandedElts, Depth + 1);
+      KnownBits Known2 =
+          DAG.computeKnownBits(Op.getOperand(2), DemandedElts, Depth + 1);
+
+      Known = KnownBits::computeForAddSub(true, false, false, Known, Known)
+                  .intersectWith(KnownBits::computeForAddSub(true, false, false,
+                                                             Known2, Known2));
+      break;
+    }
+    case Intrinsic::x86_ssse3_phadd_sw:
+    case Intrinsic::x86_ssse3_phadd_sw_128:
+    case Intrinsic::x86_avx2_phadd_sw: {
+      Known = DAG.computeKnownBits(Op.getOperand(1), DemandedElts, Depth + 1);
+      KnownBits Known2 =
+          DAG.computeKnownBits(Op.getOperand(2), DemandedElts, Depth + 1);
+
+      Known = KnownBits::sadd_sat(Known, Known)
+                  .intersectWith(KnownBits::sadd_sat(Known2, Known2));
+      break;
+    }
+    case Intrinsic::x86_ssse3_phsub_d:
+    case Intrinsic::x86_ssse3_phsub_w:
+    case Intrinsic::x86_ssse3_phsub_d_128:
+    case Intrinsic::x86_ssse3_phsub_w_128:
+    case Intrinsic::x86_avx2_phsub_d:
+    case Intrinsic::x86_avx2_phsub_w: {
+      Known = DAG.computeKnownBits(Op.getOperand(1), DemandedElts, Depth + 1);
+      KnownBits Known2 =
+          DAG.computeKnownBits(Op.getOperand(2), DemandedElts, Depth + 1);
+
+      Known = KnownBits::computeForAddSub(false, false, false, Known, Known)
+                  .intersectWith(KnownBits::computeForAddSub(
+                      false, false, false, Known2, Known2));
+      break;
+    }
+    case Intrinsic::x86_ssse3_phsub_sw:
+    case Intrinsic::x86_ssse3_phsub_sw_128:
+    case Intrinsic::x86_avx2_phsub_sw: {
+      Known = DAG.computeKnownBits(Op.getOperand(1), DemandedElts, Depth + 1);
+      KnownBits Known2 =
+          DAG.computeKnownBits(Op.getOperand(2), DemandedElts, Depth + 1);
+
+      Known = KnownBits::ssub_sat(Known, Known)
+                  .intersectWith(KnownBits::ssub_sat(Known2, Known2));
+      break;
+    }
     }
     break;
   }
diff --git a/llvm/test/Analysis/ValueTracking/knownbits-hadd-hsub.ll b/llvm/test/Analysis/ValueTracking/knownbits-hadd-hsub.ll
new file mode 100644
index 0000000000000..443ab72ee54cb
--- /dev/null
+++ b/llvm/test/Analysis/ValueTracking/knownbits-hadd-hsub.ll
@@ -0,0 +1,192 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -passes=instcombine < %s | FileCheck %s
+
+define <4 x i1> @hadd_and_eq_v4i32(<4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: define <4 x i1> @hadd_and_eq_v4i32(
+; CHECK-SAME: <4 x i32> [[X:%.*]], <4 x i32> [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret <4 x i1> zeroinitializer
+;
+entry:
+  %0 = and <4 x i32> %x, <i32 3, i32 3, i32 3, i32 3>
+  %1 = and <4 x i32> %y, <i32 3, i32 3, i32 3, i32 3>
+  %2 = tail call <4 x i32> @llvm.x86.ssse3.phadd.d.128(<4 x i32> %0, <4 x i32> %1)
+  %3 = and <4 x i32> %2, <i32 -8, i32 -8, i32 -8, i32 -8>
+  %ret = icmp eq <4 x i32> %3, <i32 3, i32 4, i32 5, i32 6>
+  ret <4 x i1> %ret
+}
+
+define <8 x i1> @hadd_and_eq_v8i16(<8 x i16> %x, <8 x i16> %y) {
+; CHECK-LABEL: define <8 x i1> @hadd_and_eq_v8i16(
+; CHECK-SAME: <8 x i16> [[X:%.*]], <8 x i16> [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret <8 x i1> <i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 true>
+;
+entry:
+  %0 = and <8 x i16> %x, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
+  %1 = and <8 x i16> %y, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
+  %2 = tail call <8 x i16> @llvm.x86.ssse3.phadd.w.128(<8 x i16> %0, <8 x i16> %1)
+  %3 = and <8 x i16> %2, <i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8>
+  %ret = icmp eq <8 x i16> %3, <i16 0, i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 0>
+  ret <8 x i1> %ret
+}
+
+define <8 x i1> @hadd_and_eq_v8i16_sat(<8 x i16> %x, <8 x i16> %y) {
+; CHECK-LABEL: define <8 x i1> @hadd_and_eq_v8i16_sat(
+; CHECK-SAME: <8 x i16> [[X:%.*]], <8 x i16> [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret <8 x i1> <i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 true>
+;
+entry:
+  %0 = and <8 x i16> %x, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
+  %1 = and <8 x i16> %y, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
+  %2 = tail call <8 x i16> @llvm.x86.ssse3.phadd.sw.128(<8 x i16> %0, <8 x i16> %1)
+  %3 = and <8 x i16> %2, <i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8>
+  %ret = icmp eq <8 x i16> %3, <i16 0, i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 0>
+  ret <8 x i1> %ret
+}
+
+define <8 x i1> @hadd_and_eq_v8i32(<8 x i32> %x, <8 x i32> %y) {
+; CHECK-LABEL: define <8 x i1> @hadd_and_eq_v8i32(
+; CHECK-SAME: <8 x i32> [[X:%.*]], <8 x i32> [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret <8 x i1> zeroinitializer
+;
+entry:
+  %0 = and <8 x i32> %x, <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
+  %1 = and <8 x i32> %y, <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
+  %2 = tail call <8 x i32> @llvm.x86.avx2.phadd.d(<8 x i32> %0, <8 x i32> %1)
+  %3 = and <8 x i32> %2, <i32 -8, i32 -8, i32 -8, i32 -8, i32 -8, i32 -8, i32 -8, i32 -8>
+  %ret = icmp eq <8 x i32> %3, <i32 3, i32 4, i32 5, i32 6, i32 3, i32 4, i32 5, i32 6>
+  ret <8 x i1> %ret
+}
+
+define <16 x i1> @hadd_and_eq_v16i16(<16 x i16> %x, <16 x i16> %y) {
+; CHECK-LABEL: define <16 x i1> @hadd_and_eq_v16i16(
+; CHECK-SAME: <16 x i16> [[X:%.*]], <16 x i16> [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret <16 x i1> <i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 true>
+;
+entry:
+  %0 = and <16 x i16> %x, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
+  %1 = and <16 x i16> %y, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
+  %2 = tail call <16 x i16> @llvm.x86.avx2.phadd.w(<16 x i16> %0, <16 x i16> %1)
+  %3 = and <16 x i16> %2, <i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8>
+  %ret = icmp eq <16 x i16> %3, <i16 0, i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 0, i16 0, i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 0>
+  ret <16 x i1> %ret
+}
+
+define <16 x i1> @hadd_and_eq_v16i16_sat(<16 x i16> %x, <16 x i16> %y) {
+; CHECK-LABEL: define <16 x i1> @hadd_and_eq_v16i16_sat(
+; CHECK-SAME: <16 x i16> [[X:%.*]], <16 x i16> [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret <16 x i1> <i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 true>
+;
+entry:
+  %0 = and <16 x i16> %x, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
+  %1 = and <16 x i16> %y, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
+  %2 = tail call <16 x i16> @llvm.x86.avx2.phadd.sw(<16 x i16> %0, <16 x i16> %1)
+  %3 = and <16 x i16> %2, <i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8>
+  %ret = icmp eq <16 x i16> %3, <i16 0, i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 0, i16 0, i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 0>
+  ret <16 x i1> %ret
+}
+
+define <4 x i1> @hsub_trunc_eq_v4i32(<4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: define <4 x i1> @hsub_trunc_eq_v4i32(
+; CHECK-SAME: <4 x i32> [[X:%.*]], <4 x i32> [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret <4 x i1> zeroinitializer
+;
+entry:
+  %0 = or <4 x i32> %x, <i32 65535, i32 65535, i32 65535, i32 65535>
+  %1 = or <4 x i32> %y, <i32 65535, i32 65535, i32 65535, i32 65535>
+  %2 = tail call <4 x i32> @llvm.x86.ssse3.phsub.d.128(<4 x i32> %0, <4 x i32> %1)
+  %conv = trunc <4 x i32> %2 to <4 x i16>
+  %ret = icmp eq <4 x i16> %conv, <i16 3, i16 4, i16 5, i16 6>
+  ret <4 x i1> %ret
+}
+
+define <8 x i1> @hsub_trunc_eq_v8i16(<8 x i16> %x, <8 x i16> %y) {
+; CHECK-LABEL: define <8 x i1> @hsub_trunc_eq_v8i16(
+; CHECK-SAME: <8 x i16> [[X:%.*]], <8 x i16> [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret <8 x i1> <i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 true>
+;
+entry:
+  %0 = or <8 x i16> %x, <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
+  %1 = or <8 x i16> %y, <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
+  %2 = tail call <8 x i16> @llvm.x86.ssse3.phsub.w.128(<8 x i16> %0, <8 x i16> %1)
+  %conv = trunc <8 x i16> %2 to <8 x i8>
+  %ret = icmp eq <8 x i8> %conv, <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 0>
+  ret <8 x i1> %ret
+}
+
+define <8 x i1> @hsub_and_eq_v8i16_sat(<8 x i16> %x, <8 x i16> %y) {
+; CHECK-LABEL: define <8 x i1> @hsub_and_eq_v8i16_sat(
+; CHECK-SAME: <8 x i16> [[X:%.*]], <8 x i16> [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = or <8 x i16> [[X]], <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
+; CHECK-NEXT:    [[TMP1:%.*]] = or <8 x i16> [[Y]], <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
+; CHECK-NEXT:    [[TMP2:%.*]] = tail call <8 x i16> @llvm.x86.ssse3.phsub.sw.128(<8 x i16> [[TMP0]], <8 x i16> [[TMP1]])
+; CHECK-NEXT:    [[TMP3:%.*]] = and <8 x i16> [[TMP2]], <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp eq <8 x i16> [[TMP3]], zeroinitializer
+; CHECK-NEXT:    ret <8 x i1> [[TMP4]]
+;
+entry:
+  %0 = or <8 x i16> %x, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
+  %1 = or <8 x i16> %y, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
+  %2 = tail call <8 x i16> @llvm.x86.ssse3.phsub.sw.128(<8 x i16> %0, <8 x i16> %1)
+  %3 = and <8 x i16> %2, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
+  %4 = icmp eq <8 x i16> %3, zeroinitializer
+  ret <8 x i1> %4
+}
+
+define <8 x i1> @hsub_trunc_eq_v8i32(<8 x i32> %x, <8 x i32> %y) {
+; CHECK-LABEL: define <8 x i1> @hsub_trunc_eq_v8i32(
+; CHECK-SAME: <8 x i32> [[X:%.*]], <8 x i32> [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret <8 x i1> zeroinitializer
+;
+entry:
+  %0 = or <8 x i32> %x, <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
+  %1 = or <8 x i32> %y, <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
+  %2 = tail call <8 x i32> @llvm.x86.avx2.phsub.d(<8 x i32> %0, <8 x i32> %1)
+  %conv = trunc <8 x i32> %2 to <8 x i16>
+  %ret = icmp eq <8 x i16> %conv, <i16 3, i16 4, i16 5, i16 6, i16 3, i16 4, i16 5, i16 6>
+  ret <8 x i1> %ret
+}
+
+define <16 x i1> @hsub_trunc_eq_v16i16(<16 x i16> %x, <16 x i16> %y) {
+; CHECK-LABEL: define <16 x i1> @hsub_trunc_eq_v16i16(
+; CHECK-SAME: <16 x i16> [[X:%.*]], <16 x i16> [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret <16 x i1> <i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 true>
+;
+entry:
+  %0 = or <16 x i16> %x, <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
+  %1 = or <16 x i16> %y, <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
+  %2 = tail call <16 x i16> @llvm.x86.avx2.phsub.w(<16 x i16> %0, <16 x i16> %1)
+  %conv = trunc <16 x i16> %2 to <16 x i8>
+  %ret = icmp eq <16 x i8> %conv, <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 0, i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 0>
+  ret <16 x i1> %ret
+}
+
+define <16 x i1> @hsub_and_eq_v16i16_sat(<16 x i16> %x, <16 x i16> %y) {
+; CHECK-LABEL: define <16 x i1> @hsub_and_eq_v16i16_sat(
+; CHECK-SAME: <16 x i16> [[X:%.*]], <16 x i16> [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = or <16 x i16> [[X]], <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
+; CHECK-NEXT:    [[TMP1:%.*]] = or <16 x i16> [[Y]], <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
+; CHECK-NEXT:    [[TMP2:%.*]] = tail call <16 x i16> @llvm.x86.avx2.phsub.sw(<16 x i16> [[TMP0]], <16 x i16> [[TMP1]])
+; CHECK-NEXT:    [[TMP3:%.*]] = and <16 x i16> [[TMP2]], <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp eq <16 x i16> [[TMP3]], zeroinitializer
+; CHECK-NEXT:    ret <16 x i1> [[TMP4]]
+;
+entry:
+  %0 = or <16 x i16> %x, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
+  %1 = or <16 x i16> %y, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
+  %2 = tail call <16 x i16> @llvm.x86.avx2.phsub.sw(<16 x i16> %0, <16 x i16> %1)
+  %3 = and <16 x i16> %2, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
+  %4 = icmp eq <16 x i16> %3, zeroinitializer
+  ret <16 x i1> %4
+}
diff --git a/llvm/test/CodeGen/X86/knownbits-hadd-hsub.ll b/llvm/test/CodeGen/X86/knownbits-hadd-hsub.ll
new file mode 100644
index 0000000000000..eba7b9843d991
--- /dev/null
+++ b/llvm/test/CodeGen/X86/knownbits-hadd-hsub.ll
@@ -0,0 +1,201 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx2 | FileCheck %s
+
+define <4 x i16> @hadd_trunc_v4i32(<4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: hadd_trunc_v4i32:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vpbroadcastd {{.*#+}} xmm2 = [3,3,3,3]
+; CHECK-NEXT:    vpand %xmm2, %xmm0, %xmm0
+; CHECK-NEXT:    vpand %xmm2, %xmm1, %xmm1
+; CHECK-NEXT:    vphaddd %xmm1, %xmm0, %xmm0
+; CHECK-NEXT:    vpackusdw %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    retq
+entry:
+  %0 = and <4 x i32> %x, <i32 3, i32 3, i32 3, i32 3>
+  %1 = and <4 x i32> %y, <i32 3, i32 3, i32 3, i32 3>
+  %2 = tail call <4 x i32> @llvm.x86.ssse3.phadd.d.128(<4 x i32> %0, <4 x i32> %1)
+  %conv = trunc <4 x i32> %2 to <4 x i16>
+  ret <4 x i16> %conv
+}
+
+define <8 x i8> @hadd_trunc_v8i16(<8 x i16> %x, <8 x i16> %y) {
+; CHECK-LABEL: hadd_trunc_v8i16:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vpbroadcastw {{.*#+}} xmm2 = [3,3,3,3,3,3,3,3]
+; CHECK-NEXT:    vpand %xmm2, %xmm0, %xmm0
+; CHECK-NEXT:    vpand %xmm2, %xmm1, %xmm1
+; CHECK-NEXT:    vphaddw %xmm1, %xmm0, %xmm0
+; CHECK-NEXT:    vpackuswb %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    retq
+entry:
+  %0 = and <8 x i16> %x, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
+  %1 = and <8 x i16> %y, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
+  %2 = tail call <8 x i16> @llvm.x86.ssse3.phadd.w.128(<8 x i16> %0, <8 x i16> %1)
+  %conv = trunc <8 x i16> %2 to <8 x i8>
+  ret <8 x i8> %conv
+}
+
+define <8 x i8> @hadd_trunc_v8i16_sat(<8 x i16> %x, <8 x i16> %y) {
+; CHECK-LABEL: hadd_trunc_v8i16_sat:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vpbroadcastw {{.*#+}} xmm2 = [3,3,3,3,3,3,3,3]
+; CHECK-NEXT:    vpand %xmm2, %xmm0, %xmm0
+; CHECK-NEXT:    vpand %xmm2, %xmm1, %xmm1
+; CHECK-NEXT:    vphaddsw %xmm1, %xmm0, %xmm0
+; CHECK-NEXT:    vpackuswb %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    retq
+entry:
+  %0 = and <8 x i16> %x, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
+  %1 = and <8 x i16> %y, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
+  %2 = tail call <8 x i16> @llvm.x86.ssse3.phadd.sw.128(<8 x i16> %0, <8 x i...
[truncated]

case Intrinsic::x86_avx2_phadd_d:
case Intrinsic::x86_avx2_phadd_w: {
computeKnownBits(I->getOperand(0), DemandedElts, Known, Depth + 1, Q);
computeKnownBits(I->getOperand(1), DemandedElts, Known2, Depth + 1, Q);
Contributor:

I'm not sure DemandedElts is right to propagate here. I.e., in your example of
[X1, X2], [Y1, Y2] -> [X1 + X2, Y1 + Y2]: if you only demand the first element, you still need both X1 and X2 from operand 0; you just don't need Y1 or Y2.

The rest of the implementations appear to need the same change.

Can you add some tests where the result is then shuffled/extracted from, to test with a DemandedElts mask that is not all ones?
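
For intuition, a minimal sketch of that mapping (a hypothetical standalone helper for a single 128-bit lane, not the exact LLVM API; it ignores the per-128-bit-lane splitting of 256-bit AVX2 ops):

    #include "llvm/ADT/APInt.h"
    using llvm::APInt;

    // For a horizontal op such as <4 x i32> phadd, result elements [0, Half)
    // pair up adjacent elements of the LHS and [Half, NumElts) pair up
    // adjacent elements of the RHS. Demanding result element I therefore
    // demands source elements 2*(I % Half) and 2*(I % Half) + 1 of one side.
    static void demandedEltsForHorizOp(unsigned NumElts,
                                       const APInt &DemandedOut,
                                       APInt &DemandedLHS, APInt &DemandedRHS) {
      DemandedLHS = DemandedRHS = APInt::getZero(NumElts);
      unsigned Half = NumElts / 2;
      for (unsigned I = 0; I != NumElts; ++I) {
        if (!DemandedOut[I])
          continue;
        APInt &Src = I < Half ? DemandedLHS : DemandedRHS;
        unsigned Base = (I % Half) * 2; // first of the two adjacent elements
        Src.setBit(Base);
        Src.setBit(Base + 1);
      }
    }

With this mapping, demanding only the first element of a <4 x i32> phadd result demands elements 0 and 1 of the LHS and nothing of the RHS.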

Contributor (Author):

Apparently, I didn't quite understand the purpose of DemandedElts. I've adjusted the code and added some tests.

RKSimon (Collaborator) left a comment:

It might be better to focus on the SelectionDAG variant first to get this right, but it's up to you.

@@ -37262,6 +37262,27 @@ void X86TargetLowering::computeKnownBitsForTargetNode(const SDValue Op,
}
break;
}
case X86ISD::HADD: {
Known = DAG.computeKnownBits(Op.getOperand(0), DemandedElts, Depth + 1);
Collaborator:

You can't use the result DemandedElts for the source operands; you'll need something like:

    APInt DemandedLHS, DemandedRHS;
    getHorizDemandedElts(VT, DemandedElts, DemandedLHS, DemandedRHS);

You can avoid the computeKnownBits calls for cases where DemandedLHS or DemandedRHS is zero.
Plus, you can probably get more refined known bits by getting the known bits of the odd/even elements separately, although that could mean 4 computeKnownBits calls.
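
A rough sketch of that odd/even refinement for the LHS half of an HADD (the RHS half is analogous; this is an illustrative simplification that demands all element pairs rather than only the demanded ones, and the variable names are hypothetical):

    // Query even and odd source elements with separate demanded masks so the
    // two addends of each horizontal sum get their own, tighter known bits
    // instead of one mask that mixes both.
    APInt DemandedEven = APInt::getZero(NumElts);
    APInt DemandedOdd = APInt::getZero(NumElts);
    for (unsigned I = 0; I + 1 < NumElts; I += 2) {
      DemandedEven.setBit(I);
      DemandedOdd.setBit(I + 1);
    }
    KnownBits KnownEven =
        DAG.computeKnownBits(Op.getOperand(0), DemandedEven, Depth + 1);
    KnownBits KnownOdd =
        DAG.computeKnownBits(Op.getOperand(0), DemandedOdd, Depth + 1);
    // Each LHS-derived result element is (even element) + (odd element).
    Known = KnownBits::computeForAddSub(/*Add=*/true, /*NSW=*/false,
                                        /*NUW=*/false, KnownEven, KnownOdd);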

Contributor (Author):

I've added the implementation with 4 computeKnownBits calls in the hope that it does not slow down the analysis too much.

llvm/lib/Target/X86/X86ISelLowering.cpp (outdated review thread, resolved)
case Intrinsic::x86_ssse3_phadd_sw_128:
case Intrinsic::x86_avx2_phadd_sw: {
computeKnownBits(I->getOperand(0), DemandedElts, Known, Depth + 1, Q);
computeKnownBits(I->getOperand(1), DemandedElts, Known2, Depth + 1, Q);
Collaborator:

wow, saturated hadd, haven't seen that for a while :)

mskamp force-pushed the feature_hadd_hsub_knownbits branch from 8aa2cec to 58d73b1 on May 19, 2024
/// \param DemandedEltsOp the demanded elements mask for the operation
/// \param DemandedEltsLHS the demanded elements mask for the left operand
/// \param DemandedEltsRHS the demanded elements mask for the right operand
void getHorizontalDemandedElts(const APInt &DemandedEltsOp,
Contributor:

I think this function name should make it clear you are only getting half the elements.

Contributor (Author):

Good point. Done. In the meantime, I've noticed that there is already another function that does almost the same. Therefore, I've consolidated them.


std::array<KnownBits, 2> KnownLHS;
for (unsigned Index = 0; Index < KnownLHS.size(); ++Index) {
if (!DemandedEltsLHS.isZero()) {
Contributor:

I think you must never be hitting this case; otherwise, I think we will run into issues with uninitialized use of KnownLHS. Likewise below for the RHS.

Contributor (Author):

I'm not sure I understand. In both cases, KnownLHS[Index] is initialized. Shouldn't both cases occur in the hadd_extract test cases? Shouldn't they fail if anything were uninitialized? At least, valgrind --tool=memcheck doesn't report any error when executing the tests.

Contributor:

Yeah, you're right; I misread the code.

mskamp force-pushed the feature_hadd_hsub_knownbits branch from 58d73b1 to c29661e on May 20, 2024
mskamp force-pushed the feature_hadd_hsub_knownbits branch from c29661e to ff64740 on June 2, 2024
mskamp force-pushed the feature_hadd_hsub_knownbits branch from ff64740 to 072a9c0 on June 4, 2024
mskamp force-pushed the feature_hadd_hsub_knownbits branch from 072a9c0 to d680eab on June 7, 2024
mskamp force-pushed the feature_hadd_hsub_knownbits branch from d680eab to 8c7dc00 on June 8, 2024
goldsteinn (Contributor):

This basically LGTM. Please wait for some additional approvals before pushing.

llvm/lib/Analysis/ValueTracking.cpp (outdated review thread, resolved)
llvm/lib/Analysis/ValueTracking.cpp (outdated review thread, resolved)
llvm/lib/Analysis/VectorUtils.cpp (outdated review thread, resolved)
[](const KnownBits &KnownLHS, const KnownBits &KnownRHS) {
return KnownBits::ssub_sat(KnownLHS, KnownRHS);
});
break;
Collaborator:

I'd prefer it if, in the DAG, we just handled the X86ISD::HADD/HSUB nodes and not the intrinsics: we try to do as little as possible with MMX types in the DAG, and the saturation instructions are very rare. It's much more likely that we just need to determine known bits for X86ISD::HADD/HSUB nodes we've created in the DAG.

Contributor (Author):

The implementation handled the intrinsics because otherwise some test cases would not fold. For example, the test case that truncates <4 x i32> to <4 x i16> does not fold when only the X86ISD::HADD/HSUB nodes are handled. In contrast, tests that truncate <8 x i32> to <8 x i16> work fine this way.

After looking at this problem again, I believe that the code that replaces the shuffle with a pack instruction might be too strict. This is probably also the case in this example: https://godbolt.org/z/KW5b6r7xW

Anyway, I've removed the handling of the intrinsics and adapted the test cases such that they still fold with only the X86ISD::HADD/HSUB nodes.

llvm/test/Analysis/ValueTracking/knownbits-hadd-hsub.ll (outdated review thread, resolved)
mskamp force-pushed the feature_hadd_hsub_knownbits branch from 8c7dc00 to e8ce125 on June 15, 2024
RKSimon (Collaborator) left a comment:

Sorry for this falling off my radar. A few minor comments, but otherwise almost ready to go.

llvm/lib/Target/X86/X86ISelLowering.cpp (outdated review thread, resolved)
llvm/lib/Target/X86/X86ISelLowering.cpp (outdated review thread, resolved)
llvm/lib/Target/X86/X86ISelLowering.cpp (outdated review thread, resolved)
mskamp force-pushed the feature_hadd_hsub_knownbits branch from e8ce125 to b64f2ea on July 12, 2024

RKSimon commented Jul 12, 2024

@mskamp Please can you rebase to fix the merge conflicts?

RKSimon (Collaborator) left a comment:

LGTM

Commit: Add KnownBits computations to ValueTracking and X86 DAG lowering (the full commit message restates the PR description above; Fixes llvm#82516)
mskamp force-pushed the feature_hadd_hsub_knownbits branch from b64f2ea to 35f153c on July 15, 2024
RKSimon merged commit b22fa90 into llvm:main on Jul 16, 2024 (7 checks passed)

sayhaan pushed a commit to sayhaan/llvm-project that referenced this pull request on Jul 16, 2024 (the commit summary restates the PR description; Differential Revision: https://phabricator.intern.facebook.com/D59822457)
yuxuanchen1997 pushed a commit that referenced this pull request on Jul 25, 2024 (the commit summary restates the PR description; Differential Revision: https://phabricator.intern.facebook.com/D60251514)
Successfully merging this pull request may close: [X86] Improve KnownBits for X86ISD::HADD/HSUB nodes (#82516)

4 participants