
[ValueTracking] isNonZero trunc of sub of ptr2int's with recursive GEP where pointers are limited to a 32bit alloc. #84933

Status: Open. Wants to merge 3 commits into base: main.
Conversation

@bipmis (Contributor) commented Mar 12, 2024

isKnownNonZero already handles sub(ptr2int, ptr2int).

This patch extends that to cases like trunc(sub(ptr2int, ptr2int)) to i32, where the pointer operands of the ptrtoint instructions refer to an object whose size either can be determined (e.g. an alloca) or is restricted via a compiler flag that limits all allocations to at most 32 bits / 4 GiB. In such cases we can look through the truncate and check whether its operand is known non-zero.
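For context, here is a hedged sketch (illustrative names, not from the patch) of the kind of C source that lowers to this IR pattern on an LP64 target:

```c
#include <assert.h>

/* A strlen-style loop whose result is narrowed to 32 bits: on LP64
 * targets the pointer difference (end - s) is 64-bit, and the cast to
 * 'unsigned' becomes trunc i64 (sub (ptrtoint, ptrtoint)) to i32 in IR.
 * The loop advancing 'end' is the "recursive GEP" the patch matches. */
static unsigned stringlen32(const char *s) {
    if (!s)
        return 0;
    const char *end = s;
    while (*end)
        ++end;
    return (unsigned)(end - s);
}
```

If the object behind `s` is known to be smaller than 4 GiB, the difference fits in 32 bits and the truncate loses no information, which is what lets isKnownNonZero look through it.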

@llvmbot (Collaborator) commented Mar 12, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: None (bipmis)

Changes



Patch is 21.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/84933.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+78)
  • (added) llvm/test/Analysis/ValueTracking/phi-known-bits-truncflag.ll (+359)
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 6d0e79e11eed43..9e8efb642bf922 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -35,6 +35,7 @@
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/VectorUtils.h"
 #include "llvm/Analysis/WithCache.h"
+#include "llvm/Analysis/MemoryBuiltins.h"
 #include "llvm/IR/Argument.h"
 #include "llvm/IR/Attributes.h"
 #include "llvm/IR/BasicBlock.h"
@@ -88,6 +89,10 @@ using namespace llvm::PatternMatch;
 static cl::opt<unsigned> DomConditionsMaxUses("dom-conditions-max-uses",
                                               cl::Hidden, cl::init(20));
 
+// Controls the maximum memory allocation to 32-bit or 4GB.
+static cl::opt<bool> PointerAllocationsLimitedto32bit(
+    "alloc-limit-32bit", cl::init(false),
+    cl::desc("A Pointer with max 32bit allocs"));
 
 /// Returns the bitwidth of the given scalar or pointer type. For vector types,
 /// returns the element type's bitwidth.
@@ -2539,7 +2544,80 @@ static bool isKnownNonZeroFromOperator(const Operator *I,
   case Instruction::ZExt:
     // ext X != 0 if X != 0.
     return isKnownNonZero(I->getOperand(0), Depth, Q);
+  case Instruction::Trunc:
+    // trunc(operand) is KnownNonZero if lower order bits of operand is a
+    // KnownNonZero. Handle scenarios like trunc i64 (sub(ptr2int, ptr2int)) to
+    // i32 / trunc i64 (lshr(sub(ptr2int, ptr2int))) to i32, where the ptrs are
+    // from same object and one of the pointers is a recursive GEP. And either
+    // the objectSize is known OR is indicative via a compiler flag, which
+    // suggests objectSize<4G.
+    Value *A, *B;
+    if (match(I->getOperand(0), m_Sub(m_PtrToIntSameSize(Q.DL, m_Value(A)),
+                                      m_PtrToIntSameSize(Q.DL, m_Value(B)))) ||
+        match(I->getOperand(0),
+              m_LShr(m_Sub(m_PtrToIntSameSize(Q.DL, m_Value(A)),
+                           m_PtrToIntSameSize(Q.DL, m_Value(B))),
+                     m_ConstantInt()))) {
+      // Check for a specific pattern where A is recursive GEP with a PHI ptr
+      // with incoming values as (Start,Step) and Start==B.
+      auto *GEPA = dyn_cast<GEPOperator>(A);
+      if (!GEPA) {
+        GEPA = dyn_cast<GEPOperator>(B);
+        std::swap(A, B);
+      }
+      if (!GEPA || GEPA->getNumIndices() != 1 ||
+          !isa<Constant>(GEPA->idx_begin()))
+        return false;
+
+      // Handle 2 incoming PHI values with one being a recursive GEP.
+      auto *PN = dyn_cast<PHINode>(GEPA->getPointerOperand());
+      if (!PN || PN->getNumIncomingValues() != 2)
+        return false;
+
+      // Search for the recursive GEP as an incoming operand, and record that as
+      // Step.
+      Value *Start = nullptr;
+      Value *Step = const_cast<Value *>(A);
+      if (PN->getIncomingValue(0) == Step)
+        Start = PN->getIncomingValue(1);
+      else if (PN->getIncomingValue(1) == Step)
+        Start = PN->getIncomingValue(0);
+      else
+        return false;
 
+      // Check if the pointers Start and B are same and ObjectSize can be
+      // determined/PointerAllocationsLimitedto32bit flag set.
+      if (Start == B) {
+        ObjectSizeOpts Opts;
+        Opts.RoundToAlign = false;
+        Opts.NullIsUnknownSize = true;
+        uint64_t ObjSize = 0;
+        // Can we get the ObjectSize and ObjectSize is within range.
+        // There is possibly more that we can do when ObjSize if known, and
+        // extend the same for other truncates. Restricting it for a 32bit
+        // truncate result.
+        if (getObjectSize(Start, ObjSize, Q.DL, Q.TLI, Opts)) {
+          if (ObjSize == 0 || ObjSize > (uint64_t)0xFFFFFFFF)
+            return false;
+        }
+        // Does the compiler flag specify Object allocs within 4G.
+        else if (!PointerAllocationsLimitedto32bit)
+          return false;
+      } else
+        return false;
+
+      // Check if trunc src type is same as Ptr type and dest is a 32bit.
+      // If objectSize<4G or PointerAllocationsLimitedto32bit flag set, check if
+      // the trunc operand is a KnownNonZero.
+      LLVMContext &Ctx = I->getContext();
+      Type *SrcTy = I->getOperand(0)->getType();
+      Type *DstTy = I->getType();
+      unsigned IntPtrWidth = Q.DL.getPointerSizeInBits();
+      unsigned TruncSrcBits = Q.DL.getTypeSizeInBits(SrcTy);
+      if (TruncSrcBits == IntPtrWidth && DstTy == Type::getInt32Ty(Ctx))
+        return isKnownNonZero(I->getOperand(0), Depth, Q);
+    }
+    break;
   case Instruction::Shl: {
     // shl nsw/nuw can't remove any non-zero bits.
     const OverflowingBinaryOperator *BO = cast<OverflowingBinaryOperator>(I);
diff --git a/llvm/test/Analysis/ValueTracking/phi-known-bits-truncflag.ll b/llvm/test/Analysis/ValueTracking/phi-known-bits-truncflag.ll
new file mode 100644
index 00000000000000..01c6e35e3fbdd2
--- /dev/null
+++ b/llvm/test/Analysis/ValueTracking/phi-known-bits-truncflag.ll
@@ -0,0 +1,359 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -alloc-limit-32bit=true -passes=instcombine < %s -S | FileCheck %s
+
+define i1 @recursiveGEP_withPtrSubTrunc(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT:    br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[A_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[VAL1]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT:    [[CMP3_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK:       while.end.i:
+; CHECK-NEXT:    br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK:       _Z9stringlenPKs.exit:
+; CHECK-NEXT:    ret i1 [[CMP_I]]
+;
+entry:
+  %cmp.i = icmp eq ptr %val1, null
+  br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+  %a.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %val1, %entry ]
+  %test.0.i = getelementptr inbounds i8, ptr %a.pn.i, i64 1
+  %0 = load i8, ptr %test.0.i, align 2
+  %cmp3.not.i = icmp eq i8 %0, 0
+  br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+  %conv4 = trunc i64 %sub.ptr.sub.i to i32
+  br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+  %retval.0.i = phi i32 [ %conv4, %while.end.i ], [ 0, %entry ]
+  %bool = icmp eq i32 %retval.0.i, 0
+  ret i1 %bool
+}
+
+define i1 @recursiveGEP_withPtrSubAshrTrunc(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubAshrTrunc(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT:    br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[A_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[VAL1]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 2
+; CHECK-NEXT:    [[TMP0:%.*]] = load i16, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT:    [[CMP3_NOT_I:%.*]] = icmp eq i16 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK:       while.end.i:
+; CHECK-NEXT:    br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK:       _Z9stringlenPKs.exit:
+; CHECK-NEXT:    ret i1 [[CMP_I]]
+;
+entry:
+  %cmp.i = icmp eq ptr %val1, null
+  br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+  %a.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %val1, %entry ]
+  %test.0.i = getelementptr inbounds i16, ptr %a.pn.i, i64 1
+  %0 = load i16, ptr %test.0.i, align 2
+  %cmp3.not.i = icmp eq i16 %0, 0
+  br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+  %sub.ptr.div = ashr exact i64 %sub.ptr.sub.i, 1
+  %conv4 = trunc i64 %sub.ptr.div to i32
+  br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+  %retval.0.i = phi i32 [ %conv4, %while.end.i ], [ 0, %entry ]
+  %bool = icmp eq i32 %retval.0.i, 0
+  ret i1 %bool
+}
+
+define i1 @recursiveGEP_withPtrSubTrunc_StartNotEqualtoB(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc_StartNotEqualtoB(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT:    br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[A_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[VAL1]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT:    [[CMP3_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK:       while.end.i:
+; CHECK-NEXT:    [[SUB_PTR_LHS_CAST_I:%.*]] = ptrtoint ptr [[TEST_0_I]] to i64
+; CHECK-NEXT:    [[TEST_1_I:%.*]] = getelementptr inbounds i8, ptr [[VAL1]], i64 -5
+; CHECK-NEXT:    [[SUB_PTR_RHS_CAST_I:%.*]] = ptrtoint ptr [[TEST_1_I]] to i64
+; CHECK-NEXT:    [[SUB_PTR_SUB_I:%.*]] = sub i64 [[SUB_PTR_LHS_CAST_I]], [[SUB_PTR_RHS_CAST_I]]
+; CHECK-NEXT:    [[CONV4:%.*]] = trunc i64 [[SUB_PTR_SUB_I]] to i32
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[CONV4]], 0
+; CHECK-NEXT:    br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK:       _Z9stringlenPKs.exit:
+; CHECK-NEXT:    [[RETVAL_0_I:%.*]] = phi i1 [ [[TMP1]], [[WHILE_END_I]] ], [ true, [[ENTRY]] ]
+; CHECK-NEXT:    ret i1 [[RETVAL_0_I]]
+;
+entry:
+  %cmp.i = icmp eq ptr %val1, null
+  br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+  %a.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %val1, %entry ]
+  %test.0.i = getelementptr inbounds i8, ptr %a.pn.i, i64 1
+  %0 = load i8, ptr %test.0.i, align 2
+  %cmp3.not.i = icmp eq i8 %0, 0
+  br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+  %test.1.i = getelementptr inbounds i8, ptr %val1, i64 -5
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %test.1.i to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+  %conv4 = trunc i64 %sub.ptr.sub.i to i32
+  br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+  %retval.0.i = phi i32 [ %conv4, %while.end.i ], [ 0, %entry ]
+  %bool = icmp eq i32 %retval.0.i, 0
+  ret i1 %bool
+}
+
+define i1 @recursiveGEP_withPtrSubTrunc_PhiOperandsCommuted(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc_PhiOperandsCommuted(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT:    br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[A_PN_I:%.*]] = phi ptr [ [[VAL1]], [[ENTRY:%.*]] ], [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT:    [[CMP3_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK:       while.end.i:
+; CHECK-NEXT:    br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK:       _Z9stringlenPKs.exit:
+; CHECK-NEXT:    ret i1 [[CMP_I]]
+;
+entry:
+  %cmp.i = icmp eq ptr %val1, null
+  br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+  %a.pn.i = phi ptr [ %val1, %entry ], [ %test.0.i, %while.cond.i ]
+  %test.0.i = getelementptr inbounds i8, ptr %a.pn.i, i64 1
+  %0 = load i8, ptr %test.0.i, align 2
+  %cmp3.not.i = icmp eq i8 %0, 0
+  br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+  %conv4 = trunc i64 %sub.ptr.sub.i to i32
+  br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+  %retval.0.i = phi i32 [ %conv4, %while.end.i ], [ 0, %entry ]
+  %bool = icmp eq i32 %retval.0.i, 0
+  ret i1 %bool
+}
+
+define i1 @recursiveGEP_withPtrSubTrunc_SubOperandsCommuted(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc_SubOperandsCommuted(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT:    br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[A_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[VAL1]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT:    [[CMP3_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK:       while.end.i:
+; CHECK-NEXT:    br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK:       _Z9stringlenPKs.exit:
+; CHECK-NEXT:    ret i1 [[CMP_I]]
+;
+entry:
+  %cmp.i = icmp eq ptr %val1, null
+  br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+  %a.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %val1, %entry ]
+  %test.0.i = getelementptr inbounds i8, ptr %a.pn.i, i64 1
+  %0 = load i8, ptr %test.0.i, align 2
+  %cmp3.not.i = icmp eq i8 %0, 0
+  br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.rhs.cast.i, %sub.ptr.lhs.cast.i
+  %conv4 = trunc i64 %sub.ptr.sub.i to i32
+  br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+  %retval.0.i = phi i32 [ %conv4, %while.end.i ], [ 0, %entry ]
+  %bool = icmp eq i32 %retval.0.i, 0
+  ret i1 %bool
+}
+
+define i1 @recursiveGEP_withPtrSubTrunc64to16(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc64to16(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT:    br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[A_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[VAL1]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT:    [[CMP3_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK:       while.end.i:
+; CHECK-NEXT:    [[SUB_PTR_LHS_CAST_I:%.*]] = ptrtoint ptr [[TEST_0_I]] to i64
+; CHECK-NEXT:    [[SUB_PTR_RHS_CAST_I:%.*]] = ptrtoint ptr [[VAL1]] to i64
+; CHECK-NEXT:    [[SUB_PTR_SUB_I:%.*]] = sub i64 [[SUB_PTR_LHS_CAST_I]], [[SUB_PTR_RHS_CAST_I]]
+; CHECK-NEXT:    [[CONV4:%.*]] = trunc i64 [[SUB_PTR_SUB_I]] to i16
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i16 [[CONV4]], 0
+; CHECK-NEXT:    br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK:       _Z9stringlenPKs.exit:
+; CHECK-NEXT:    [[RETVAL_0_I:%.*]] = phi i1 [ [[TMP1]], [[WHILE_END_I]] ], [ true, [[ENTRY]] ]
+; CHECK-NEXT:    ret i1 [[RETVAL_0_I]]
+;
+entry:
+  %cmp.i = icmp eq ptr %val1, null
+  br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+  %a.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %val1, %entry ]
+  %test.0.i = getelementptr inbounds i8, ptr %a.pn.i, i64 1
+  %0 = load i8, ptr %test.0.i, align 2
+  %cmp3.not.i = icmp eq i8 %0, 0
+  br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+  %conv4 = trunc i64 %sub.ptr.sub.i to i16
+  br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+  %retval.0.i = phi i16 [ %conv4, %while.end.i ], [ 0, %entry ]
+  %bool = icmp eq i16 %retval.0.i, 0
+  ret i1 %bool
+}
+
+%struct.Foo = type { [256 x i8] }
+declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture)
+declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture)
+
+define i1 @recursiveGEP_withPtrSubTrunc_knownObject() {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc_knownObject(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[FOO:%.*]] = alloca [[STRUCT_FOO:%.*]], align 1
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 256, ptr nonnull [[FOO]])
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(256) [[FOO]], i8 97, i64 255, i1 false)
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[FOO]], i64 255
+; CHECK-NEXT:    store i8 0, ptr [[ARRAYIDX]], align 1
+; CHECK-NEXT:    br label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[F_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[FOO]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[F_PN_I]], i64 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 1
+; CHECK-NEXT:    [[CMP5_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP5_NOT_I]], label [[LEN_EXIT:%.*]], label [[WHILE_COND_I]]
+; CHECK:       len.exit:
+; CHECK-NEXT:    call void @llvm.lifetime.end.p0(i64 256, ptr nonnull [[FOO]])
+; CHECK-NEXT:    ret i1 false
+;
+entry:
+  %foo = alloca %struct.Foo, align 1
+  call void @llvm.lifetime.start.p0(i64 256, ptr nonnull %foo)
+  call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(256) %foo, i8 97, i64 255, i1 false)
+  %arrayidx = getelementptr inbounds [256 x i8], ptr %foo, i64 0, i64 255
+  store i8 0, ptr %arrayidx, align 1
+  br label %while.cond.i
+
+while.cond.i:
+  %f.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %foo, %entry ]
+  %test.0.i = getelementptr inbounds i8, ptr %f.pn.i, i64 1
+  %0 = load i8, ptr %test.0.i, align 1
+  %cmp5.not.i = icmp eq i8 %0, 0
+  br i1 %cmp5.not.i, label %len.exit, label %while.cond.i
+
+len.exit:
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %foo to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+  %1 = trunc i64 %sub.ptr.sub.i to i32
+  %cmp = icmp eq i32 %1, 0
+  call void @llvm.lifetime.end.p0(i64 256, ptr nonnull %foo)
+  ret i1 %cmp
+}
+
+define i1 @recursiveGEP_withPtrSubTrunc_knownObject_ObjSize0() {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc_knownObject_ObjSize0(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[FOO:%.*]] = alloca [[STRUCT_FOO:%.*]], align 1
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 256, ptr nonnull [[FOO]])
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(256) [[FOO]], i8 97, i64 255, i1 false)
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[FOO]], i64 255
+; CHECK-NEXT:    store i8 0, ptr [[ARRAYIDX]], align 1
+; CHECK-NEXT:    [[FOO2:%.*]] = getelementptr inbounds i8, ptr [[FOO]], i64 256
+; CHECK-NEXT:    br label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[F_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[FOO2]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[F_PN_I]], i64 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 1
+; CHECK-NEXT:    [[CMP5_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP5_NOT_I]], label [[LEN_EXIT:%.*]], label [[WHILE_COND_I]]
+; CHECK:       len.exit:
+; CHECK-NEXT:    [[SUB_PTR_LHS_CAST_I:%.*]] = ptrtoint ptr [[TEST_0_I]] to i64
+; CHECK-NEXT:    [[SUB_PTR_RHS_CAST_I:%.*]] = ptrtoint ptr [[FOO2]] to i64
+; CHECK-NEXT:    [[SUB_PTR_SUB_I:%.*]] = sub i64 [[SUB_PTR_LHS_CAST_I]], [[SUB_PTR_...
[truncated]


github-actions bot commented Mar 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

// i32 / trunc i64 (lshr(sub(ptr2int, ptr2int))) to i32, where the ptrs are
// from same object and one of the pointers is a recursive GEP. And either
// the objectSize is known OR is indicative via a compiler flag, which
// suggests objectSize<4G.
Contributor:

I wonder if we could instead support the exact flag in trunc to indicate that it doesn't change the high bits.
I don't think this is the only case where we wish we could peek past trunc.

Contributor Author:

I am sure there are other cases that could benefit, particularly where the object size is known and can be determined.
The idea here was also to cover situations where we do not know much about the object and cannot identify the pointer's underlying allocation. Also, many applications never exceed 32-bit allocations; in these cases a compiler flag would help.

@goldsteinn (Contributor):

Is there a motivating case for this?

@bipmis (Contributor Author) commented Apr 22, 2024

Is there a motivating case for this?

@goldsteinn I believe it would extend these reductions to scenarios where the object size can be known.
Additionally it would speed up loops that implement a string length (a recursive GEP in a loop), which can then be reduced, leaving a truncate. A simple case can be seen at https://gcc.godbolt.org/z/K93v1dsaG
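To make the claim concrete, a small hedged illustration (plain C, not LLVM code) of why the 4 GiB bound matters: a non-zero 64-bit pointer difference stays non-zero after truncation to 32 bits exactly when it lies below 2^32.

```c
#include <stdint.h>

/* If 0 < diff < 2^32 (guaranteed when both pointers lie inside one
 * object smaller than 4 GiB), the low 32 bits cannot all be zero, so
 * truncation preserves non-zero-ness. A difference with only high
 * bits set shows why the object-size bound is needed. */
static int trunc32_is_nonzero(uint64_t diff) {
    return (uint32_t)diff != 0;
}
```

A value such as `1ULL << 32` truncates to 0, which is exactly the case the object-size check (or the proposed flag) rules out.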

@bipmis force-pushed the ValueTracking64bitPointersWith32bitAlloc branch 2 times, most recently from cbfbdb3 to 09a55bb on April 22, 2024 13:44
@bipmis (Contributor Author) commented Apr 22, 2024

Ping.

if (PN->getIncomingValue(0) == Step)
Start = PN->getIncomingValue(1);
else if (PN->getIncomingValue(1) == Step)
Start = PN->getIncomingValue(0);
Contributor:

I think you should use matchSimpleRecurrence here. That will need a pre-requisite patch to support GEP in the helper.

Contributor Author:

matchSimpleRecurrence is tightly bound to BinaryOperator and is used beyond ValueTracking. I could create a variant that handles GEPOperator.

@goldsteinn (Contributor):

I think this special case is probably misplaced given that we have trunc nuw/nsw. I think this should be converted to a fold in InstCombine that adds the nuw flag to the trunc. I have a patch to support nuw/nsw in isKnownNonZero for trunc: #89643
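As a hedged C analogue of the `trunc nuw` semantics being proposed (illustrative only, not LLVM source): `trunc nuw` asserts that the truncated-away bits are zero, so the narrow result equals the wide value and non-zero-ness carries through automatically.

```c
#include <assert.h>
#include <stdint.h>

/* Model of 'trunc nuw i64 %x to i32': the flag promises the dropped
 * high bits are zero; under that promise, x != 0 implies the
 * truncated result != 0, with no need to reason about object sizes. */
static uint32_t trunc_nuw_64_to_32(uint64_t x) {
    assert((x >> 32) == 0); /* the 'nuw' guarantee */
    return (uint32_t)x;
}
```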

@bipmis force-pushed the ValueTracking64bitPointersWith32bitAlloc branch from 09a55bb to b18d95c on April 23, 2024 17:28
@bipmis (Contributor Author) commented Apr 23, 2024

I think this special case is probably misplaced given that we have trunc nuw/nsw. I think this should be converted to a fold in InstCombine that adds the nuw flag to the trunc. I have a patch to support nuw/nsw in isKnownNonZero for trunc: #89643

Could be done. I think we can have this here for now and I can possibly do an update to trunc in InstCombine once the patch lands.

@goldsteinn (Contributor):

I think this special case is probably misplaced given that we have trunc nuw/nsw. I think this should be converted to a fold in InstCombine that adds the nuw flag to the trunc. I have a patch to support nuw/nsw in isKnownNonZero for trunc: #89643

Could be done. I think we can have this here for now and I can possibly do an update to trunc in InstCombine once the patch lands.

That patch will land today.

@nikic (Contributor) commented Apr 24, 2024

Looks like a case where it's better to change the source code to use the correct type instead (size_t instead of unsigned).

@bipmis force-pushed the ValueTracking64bitPointersWith32bitAlloc branch from b18d95c to 7b21620 on April 24, 2024 20:59
@bipmis (Contributor Author) commented Apr 24, 2024

I think this special case is probably misplaced given that we have trunc nuw/nsw. I think this should be converted to a fold in InstCombine that adds the nuw flag to the trunc. I have a patch to support nuw/nsw in isKnownNonZero for trunc: #89643

Could be done. I think we can have this here for now and I can possibly do an update to trunc in InstCombine once the patch lands.

That patch will land today.

Great. Made the changes.

@bipmis (Contributor Author) commented Apr 24, 2024

Looks like a case where it's better to change the source code to use the correct type instead (size_t instead of unsigned).

I would agree. However there are scenarios in benchmark applications where source cannot be changed.

@@ -910,6 +916,67 @@ Instruction *InstCombinerImpl::visitTrunc(TruncInst &Trunc) {
Changed = true;
}

// Handle scenarios like trunc i64 (sub(ptr2int, ptr2int)) to i32 / trunc i64
Contributor:

Shouldn't this logic be inside computeKnownBits() for sub? The information we get out of this is that the top bits of the sub result are zero. That will result in the nuw flag on trunc being set down the line.

// Controls the maximum memory allocation to 32-bit or 4GB.
static cl::opt<bool> PointerAllocationsLimitedto32bit(
"alloc-limit-32bit", cl::init(false),
cl::desc("A Pointer with max 32bit allocs"));
Contributor:

I'm not willing to add this flag.


Contributor Author:

I have provided my understanding of this code pattern and don't quite agree that it violates a benchmark rule or criterion.

if (!GEPA) {
GEPA = dyn_cast<GEPOperator>(B);
std::swap(A, B);
}
Contributor:

Don't you need inbounds for this (please also add test)? Otherwise the GEPs may go beyond the limit of the allocation size.

@nikic (Contributor) commented Apr 25, 2024

Looks like a case where it's better to change the source code to use the correct type instead (size_t instead of unsigned).

I would agree. However there are scenarios in benchmark applications where source cannot be changed.

I suspected you were going to say this. This patch very much looks like a hack for the sake of improving some benchmark score, which is something I am not very fond of. Of course, if we can get a clean implementation out of this and show that it benefits non-benchmark code (cc @dtcxzyw), then that's a different matter...

@dtcxzyw (Member) commented Apr 25, 2024

Looks like a case where it's better to change the source code to use the correct type instead (size_t instead of unsigned).

I would agree. However there are scenarios in benchmark applications where source cannot be changed.

I suspected you were going to say this. This patch very much looks like a hack for the sake of improving some benchmark score, which is something I am not very fond of. Of course, if we can get a clean implementation out of this and show that it benefits non-benchmark code (cc @dtcxzyw), then that's a different matter...

trunc(sub(ptr2int, ptr2int)) to i32 is a common pattern. But this patch seems to just handle a special case :( I cannot tell whether it provides benefits to some real-world applications.

@bipmis (Contributor Author) commented Apr 25, 2024

Looks like a case where it's better to change the source code to use the correct type instead (size_t instead of unsigned).

I would agree. However there are scenarios in benchmark applications where source cannot be changed.

I suspected you were going to say this. This patch very much looks like a hack for the sake of improving some benchmark score, which is something I am not very fond of. Of course, if we can get a clean implementation out of this and show that it benefits non-benchmark code (cc @dtcxzyw), then that's a different matter...

trunc(sub(ptr2int, ptr2int)) to i32 is a common pattern. But this patch seems to just handle a special case :( I cannot tell whether it provides benefits to some real-world applications.

It definitely extends beyond a single case. This would also apply to legacy code where a string-length result was bound to an int type; I am seeing this with utf8_check_string(), and there are scenarios where we would like to build such applications with the latest compiler.
Additionally, if trunc(sub(ptr2int, ptr2int)) to i32 is a common pattern, and it can be determined that the two pointers point into the same object whose size is either evaluated statically or known to the application user and suggested via the flag, it should be a valid optimisation.

@bipmis (Contributor Author) commented Apr 29, 2024

I looked at how commonly the sequence trunc(sub(ptr2int, ptr2int)) / (unsigned)strlen(...) is used, and it is indeed quite common in a class of applications beyond the legacy code example cited earlier. Some of the scenarios (not an exhaustive list) where we see this pattern and should be able to optimise out the trunc instruction:

  1. Database applications like SQLite - https://opensource.apple.com/source/CPANInternal/CPANInternal-108/DBD-SQLite/sqlite3.c, openDirectory()
  2. Multimedia applications - https://github.com/xiph/flac/blob/master/src/metaflac/options.c, parse_add_seekpoint()
  3. Compression algorithms - https://github.com/mcmilk/7-Zip/blob/master/CPP/7zip/UI/Common/HashCalc.cpp, AddHashResultLine()

In some of these cases, if allocations are limited to a maximum of 4 GiB (32-bit allocations), which is definitely possible and known to the application user, we could use the supported flag to optimise out the trunc instruction, which in turn could enable a host of other optimisations.
