
[ValueTracking] isNonZero trunc of sub of ptr2int's with recursive GEP where pointers are limited to a 32bit alloc. #84933

Status: Open. Wants to merge 3 commits into base: main.
Conversation

@bipmis (Contributor) commented Mar 12, 2024

isKnownNonZero already handles sub(ptr2int, ptr2int).

This patch extends that to cases like trunc(sub(ptr2int, ptr2int)) to i32, where the pointer operands of the ptrtoint instructions refer to an object whose size either can be determined (e.g. an alloca) or is restricted via a compiler flag that limits all allocations to at most 32 bits / 4 GiB. In such cases we can look through the truncate and check whether its operand is known non-zero.
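For context, here is a hedged sketch (illustrative names, not from the patch) of the kind of C source that lowers to this IR pattern on an LP64 target:

```c
#include <assert.h>

/* A strlen-style loop whose result is narrowed to 32 bits: on LP64
 * targets the pointer difference (end - s) is 64-bit, and the cast to
 * 'unsigned' becomes trunc i64 (sub (ptrtoint, ptrtoint)) to i32 in IR.
 * The loop advancing 'end' is the "recursive GEP" the patch matches. */
static unsigned stringlen32(const char *s) {
    if (!s)
        return 0;
    const char *end = s;
    while (*end)
        ++end;
    return (unsigned)(end - s);
}
```

If the object behind `s` is known to be smaller than 4 GiB, the difference fits in 32 bits and the truncate loses no information, which is what lets isKnownNonZero look through it.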

@llvmbot (Collaborator) commented Mar 12, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: None (bipmis)

Changes



Patch is 21.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/84933.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+78)
  • (added) llvm/test/Analysis/ValueTracking/phi-known-bits-truncflag.ll (+359)
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 6d0e79e11eed43..9e8efb642bf922 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -35,6 +35,7 @@
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/VectorUtils.h"
 #include "llvm/Analysis/WithCache.h"
+#include "llvm/Analysis/MemoryBuiltins.h"
 #include "llvm/IR/Argument.h"
 #include "llvm/IR/Attributes.h"
 #include "llvm/IR/BasicBlock.h"
@@ -88,6 +89,10 @@ using namespace llvm::PatternMatch;
 static cl::opt<unsigned> DomConditionsMaxUses("dom-conditions-max-uses",
                                               cl::Hidden, cl::init(20));
 
+// Controls the maximum memory allocation to 32-bit or 4GB.
+static cl::opt<bool> PointerAllocationsLimitedto32bit(
+    "alloc-limit-32bit", cl::init(false),
+    cl::desc("A Pointer with max 32bit allocs"));
 
 /// Returns the bitwidth of the given scalar or pointer type. For vector types,
 /// returns the element type's bitwidth.
@@ -2539,7 +2544,80 @@ static bool isKnownNonZeroFromOperator(const Operator *I,
   case Instruction::ZExt:
     // ext X != 0 if X != 0.
     return isKnownNonZero(I->getOperand(0), Depth, Q);
+  case Instruction::Trunc:
+    // trunc(operand) is KnownNonZero if lower order bits of operand is a
+    // KnownNonZero. Handle scenarios like trunc i64 (sub(ptr2int, ptr2int)) to
+    // i32 / trunc i64 (lshr(sub(ptr2int, ptr2int))) to i32, where the ptrs are
+    // from same object and one of the pointers is a recursive GEP. And either
+    // the objectSize is known OR is indicative via a compiler flag, which
+    // suggests objectSize<4G.
+    Value *A, *B;
+    if (match(I->getOperand(0), m_Sub(m_PtrToIntSameSize(Q.DL, m_Value(A)),
+                                      m_PtrToIntSameSize(Q.DL, m_Value(B)))) ||
+        match(I->getOperand(0),
+              m_LShr(m_Sub(m_PtrToIntSameSize(Q.DL, m_Value(A)),
+                           m_PtrToIntSameSize(Q.DL, m_Value(B))),
+                     m_ConstantInt()))) {
+      // Check for a specific pattern where A is recursive GEP with a PHI ptr
+      // with incoming values as (Start,Step) and Start==B.
+      auto *GEPA = dyn_cast<GEPOperator>(A);
+      if (!GEPA) {
+        GEPA = dyn_cast<GEPOperator>(B);
+        std::swap(A, B);
+      }
+      if (!GEPA || GEPA->getNumIndices() != 1 ||
+          !isa<Constant>(GEPA->idx_begin()))
+        return false;
+
+      // Handle 2 incoming PHI values with one being a recursive GEP.
+      auto *PN = dyn_cast<PHINode>(GEPA->getPointerOperand());
+      if (!PN || PN->getNumIncomingValues() != 2)
+        return false;
+
+      // Search for the recursive GEP as an incoming operand, and record that as
+      // Step.
+      Value *Start = nullptr;
+      Value *Step = const_cast<Value *>(A);
+      if (PN->getIncomingValue(0) == Step)
+        Start = PN->getIncomingValue(1);
+      else if (PN->getIncomingValue(1) == Step)
+        Start = PN->getIncomingValue(0);
+      else
+        return false;
 
+      // Check if the pointers Start and B are same and ObjectSize can be
+      // determined/PointerAllocationsLimitedto32bit flag set.
+      if (Start == B) {
+        ObjectSizeOpts Opts;
+        Opts.RoundToAlign = false;
+        Opts.NullIsUnknownSize = true;
+        uint64_t ObjSize = 0;
+        // Can we get the ObjectSize and ObjectSize is within range.
+        // There is possibly more that we can do when ObjSize if known, and
+        // extend the same for other truncates. Restricting it for a 32bit
+        // truncate result.
+        if (getObjectSize(Start, ObjSize, Q.DL, Q.TLI, Opts)) {
+          if (ObjSize == 0 || ObjSize > (uint64_t)0xFFFFFFFF)
+            return false;
+        }
+        // Does the compiler flag specify Object allocs within 4G.
+        else if (!PointerAllocationsLimitedto32bit)
+          return false;
+      } else
+        return false;
+
+      // Check if trunc src type is same as Ptr type and dest is a 32bit.
+      // If objectSize<4G or PointerAllocationsLimitedto32bit flag set, check if
+      // the trunc operand is a KnownNonZero.
+      LLVMContext &Ctx = I->getContext();
+      Type *SrcTy = I->getOperand(0)->getType();
+      Type *DstTy = I->getType();
+      unsigned IntPtrWidth = Q.DL.getPointerSizeInBits();
+      unsigned TruncSrcBits = Q.DL.getTypeSizeInBits(SrcTy);
+      if (TruncSrcBits == IntPtrWidth && DstTy == Type::getInt32Ty(Ctx))
+        return isKnownNonZero(I->getOperand(0), Depth, Q);
+    }
+    break;
   case Instruction::Shl: {
     // shl nsw/nuw can't remove any non-zero bits.
     const OverflowingBinaryOperator *BO = cast<OverflowingBinaryOperator>(I);
diff --git a/llvm/test/Analysis/ValueTracking/phi-known-bits-truncflag.ll b/llvm/test/Analysis/ValueTracking/phi-known-bits-truncflag.ll
new file mode 100644
index 00000000000000..01c6e35e3fbdd2
--- /dev/null
+++ b/llvm/test/Analysis/ValueTracking/phi-known-bits-truncflag.ll
@@ -0,0 +1,359 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -alloc-limit-32bit=true -passes=instcombine < %s -S | FileCheck %s
+
+define i1 @recursiveGEP_withPtrSubTrunc(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT:    br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[A_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[VAL1]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT:    [[CMP3_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK:       while.end.i:
+; CHECK-NEXT:    br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK:       _Z9stringlenPKs.exit:
+; CHECK-NEXT:    ret i1 [[CMP_I]]
+;
+entry:
+  %cmp.i = icmp eq ptr %val1, null
+  br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+  %a.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %val1, %entry ]
+  %test.0.i = getelementptr inbounds i8, ptr %a.pn.i, i64 1
+  %0 = load i8, ptr %test.0.i, align 2
+  %cmp3.not.i = icmp eq i8 %0, 0
+  br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+  %conv4 = trunc i64 %sub.ptr.sub.i to i32
+  br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+  %retval.0.i = phi i32 [ %conv4, %while.end.i ], [ 0, %entry ]
+  %bool = icmp eq i32 %retval.0.i, 0
+  ret i1 %bool
+}
+
+define i1 @recursiveGEP_withPtrSubAshrTrunc(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubAshrTrunc(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT:    br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[A_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[VAL1]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 2
+; CHECK-NEXT:    [[TMP0:%.*]] = load i16, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT:    [[CMP3_NOT_I:%.*]] = icmp eq i16 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK:       while.end.i:
+; CHECK-NEXT:    br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK:       _Z9stringlenPKs.exit:
+; CHECK-NEXT:    ret i1 [[CMP_I]]
+;
+entry:
+  %cmp.i = icmp eq ptr %val1, null
+  br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+  %a.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %val1, %entry ]
+  %test.0.i = getelementptr inbounds i16, ptr %a.pn.i, i64 1
+  %0 = load i16, ptr %test.0.i, align 2
+  %cmp3.not.i = icmp eq i16 %0, 0
+  br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+  %sub.ptr.div = ashr exact i64 %sub.ptr.sub.i, 1
+  %conv4 = trunc i64 %sub.ptr.div to i32
+  br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+  %retval.0.i = phi i32 [ %conv4, %while.end.i ], [ 0, %entry ]
+  %bool = icmp eq i32 %retval.0.i, 0
+  ret i1 %bool
+}
+
+define i1 @recursiveGEP_withPtrSubTrunc_StartNotEqualtoB(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc_StartNotEqualtoB(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT:    br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[A_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[VAL1]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT:    [[CMP3_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK:       while.end.i:
+; CHECK-NEXT:    [[SUB_PTR_LHS_CAST_I:%.*]] = ptrtoint ptr [[TEST_0_I]] to i64
+; CHECK-NEXT:    [[TEST_1_I:%.*]] = getelementptr inbounds i8, ptr [[VAL1]], i64 -5
+; CHECK-NEXT:    [[SUB_PTR_RHS_CAST_I:%.*]] = ptrtoint ptr [[TEST_1_I]] to i64
+; CHECK-NEXT:    [[SUB_PTR_SUB_I:%.*]] = sub i64 [[SUB_PTR_LHS_CAST_I]], [[SUB_PTR_RHS_CAST_I]]
+; CHECK-NEXT:    [[CONV4:%.*]] = trunc i64 [[SUB_PTR_SUB_I]] to i32
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[CONV4]], 0
+; CHECK-NEXT:    br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK:       _Z9stringlenPKs.exit:
+; CHECK-NEXT:    [[RETVAL_0_I:%.*]] = phi i1 [ [[TMP1]], [[WHILE_END_I]] ], [ true, [[ENTRY]] ]
+; CHECK-NEXT:    ret i1 [[RETVAL_0_I]]
+;
+entry:
+  %cmp.i = icmp eq ptr %val1, null
+  br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+  %a.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %val1, %entry ]
+  %test.0.i = getelementptr inbounds i8, ptr %a.pn.i, i64 1
+  %0 = load i8, ptr %test.0.i, align 2
+  %cmp3.not.i = icmp eq i8 %0, 0
+  br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+  %test.1.i = getelementptr inbounds i8, ptr %val1, i64 -5
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %test.1.i to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+  %conv4 = trunc i64 %sub.ptr.sub.i to i32
+  br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+  %retval.0.i = phi i32 [ %conv4, %while.end.i ], [ 0, %entry ]
+  %bool = icmp eq i32 %retval.0.i, 0
+  ret i1 %bool
+}
+
+define i1 @recursiveGEP_withPtrSubTrunc_PhiOperandsCommuted(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc_PhiOperandsCommuted(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT:    br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[A_PN_I:%.*]] = phi ptr [ [[VAL1]], [[ENTRY:%.*]] ], [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT:    [[CMP3_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK:       while.end.i:
+; CHECK-NEXT:    br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK:       _Z9stringlenPKs.exit:
+; CHECK-NEXT:    ret i1 [[CMP_I]]
+;
+entry:
+  %cmp.i = icmp eq ptr %val1, null
+  br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+  %a.pn.i = phi ptr [ %val1, %entry ], [ %test.0.i, %while.cond.i ]
+  %test.0.i = getelementptr inbounds i8, ptr %a.pn.i, i64 1
+  %0 = load i8, ptr %test.0.i, align 2
+  %cmp3.not.i = icmp eq i8 %0, 0
+  br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+  %conv4 = trunc i64 %sub.ptr.sub.i to i32
+  br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+  %retval.0.i = phi i32 [ %conv4, %while.end.i ], [ 0, %entry ]
+  %bool = icmp eq i32 %retval.0.i, 0
+  ret i1 %bool
+}
+
+define i1 @recursiveGEP_withPtrSubTrunc_SubOperandsCommuted(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc_SubOperandsCommuted(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT:    br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[A_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[VAL1]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT:    [[CMP3_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK:       while.end.i:
+; CHECK-NEXT:    br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK:       _Z9stringlenPKs.exit:
+; CHECK-NEXT:    ret i1 [[CMP_I]]
+;
+entry:
+  %cmp.i = icmp eq ptr %val1, null
+  br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+  %a.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %val1, %entry ]
+  %test.0.i = getelementptr inbounds i8, ptr %a.pn.i, i64 1
+  %0 = load i8, ptr %test.0.i, align 2
+  %cmp3.not.i = icmp eq i8 %0, 0
+  br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.rhs.cast.i, %sub.ptr.lhs.cast.i
+  %conv4 = trunc i64 %sub.ptr.sub.i to i32
+  br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+  %retval.0.i = phi i32 [ %conv4, %while.end.i ], [ 0, %entry ]
+  %bool = icmp eq i32 %retval.0.i, 0
+  ret i1 %bool
+}
+
+define i1 @recursiveGEP_withPtrSubTrunc64to16(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc64to16(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT:    br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[A_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[VAL1]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT:    [[CMP3_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK:       while.end.i:
+; CHECK-NEXT:    [[SUB_PTR_LHS_CAST_I:%.*]] = ptrtoint ptr [[TEST_0_I]] to i64
+; CHECK-NEXT:    [[SUB_PTR_RHS_CAST_I:%.*]] = ptrtoint ptr [[VAL1]] to i64
+; CHECK-NEXT:    [[SUB_PTR_SUB_I:%.*]] = sub i64 [[SUB_PTR_LHS_CAST_I]], [[SUB_PTR_RHS_CAST_I]]
+; CHECK-NEXT:    [[CONV4:%.*]] = trunc i64 [[SUB_PTR_SUB_I]] to i16
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i16 [[CONV4]], 0
+; CHECK-NEXT:    br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK:       _Z9stringlenPKs.exit:
+; CHECK-NEXT:    [[RETVAL_0_I:%.*]] = phi i1 [ [[TMP1]], [[WHILE_END_I]] ], [ true, [[ENTRY]] ]
+; CHECK-NEXT:    ret i1 [[RETVAL_0_I]]
+;
+entry:
+  %cmp.i = icmp eq ptr %val1, null
+  br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+  %a.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %val1, %entry ]
+  %test.0.i = getelementptr inbounds i8, ptr %a.pn.i, i64 1
+  %0 = load i8, ptr %test.0.i, align 2
+  %cmp3.not.i = icmp eq i8 %0, 0
+  br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+  %conv4 = trunc i64 %sub.ptr.sub.i to i16
+  br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+  %retval.0.i = phi i16 [ %conv4, %while.end.i ], [ 0, %entry ]
+  %bool = icmp eq i16 %retval.0.i, 0
+  ret i1 %bool
+}
+
+%struct.Foo = type { [256 x i8] }
+declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture)
+declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture)
+
+define i1 @recursiveGEP_withPtrSubTrunc_knownObject() {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc_knownObject(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[FOO:%.*]] = alloca [[STRUCT_FOO:%.*]], align 1
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 256, ptr nonnull [[FOO]])
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(256) [[FOO]], i8 97, i64 255, i1 false)
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[FOO]], i64 255
+; CHECK-NEXT:    store i8 0, ptr [[ARRAYIDX]], align 1
+; CHECK-NEXT:    br label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[F_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[FOO]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[F_PN_I]], i64 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 1
+; CHECK-NEXT:    [[CMP5_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP5_NOT_I]], label [[LEN_EXIT:%.*]], label [[WHILE_COND_I]]
+; CHECK:       len.exit:
+; CHECK-NEXT:    call void @llvm.lifetime.end.p0(i64 256, ptr nonnull [[FOO]])
+; CHECK-NEXT:    ret i1 false
+;
+entry:
+  %foo = alloca %struct.Foo, align 1
+  call void @llvm.lifetime.start.p0(i64 256, ptr nonnull %foo)
+  call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(256) %foo, i8 97, i64 255, i1 false)
+  %arrayidx = getelementptr inbounds [256 x i8], ptr %foo, i64 0, i64 255
+  store i8 0, ptr %arrayidx, align 1
+  br label %while.cond.i
+
+while.cond.i:
+  %f.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %foo, %entry ]
+  %test.0.i = getelementptr inbounds i8, ptr %f.pn.i, i64 1
+  %0 = load i8, ptr %test.0.i, align 1
+  %cmp5.not.i = icmp eq i8 %0, 0
+  br i1 %cmp5.not.i, label %len.exit, label %while.cond.i
+
+len.exit:
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %foo to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+  %1 = trunc i64 %sub.ptr.sub.i to i32
+  %cmp = icmp eq i32 %1, 0
+  call void @llvm.lifetime.end.p0(i64 256, ptr nonnull %foo)
+  ret i1 %cmp
+}
+
+define i1 @recursiveGEP_withPtrSubTrunc_knownObject_ObjSize0() {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc_knownObject_ObjSize0(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[FOO:%.*]] = alloca [[STRUCT_FOO:%.*]], align 1
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 256, ptr nonnull [[FOO]])
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(256) [[FOO]], i8 97, i64 255, i1 false)
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[FOO]], i64 255
+; CHECK-NEXT:    store i8 0, ptr [[ARRAYIDX]], align 1
+; CHECK-NEXT:    [[FOO2:%.*]] = getelementptr inbounds i8, ptr [[FOO]], i64 256
+; CHECK-NEXT:    br label [[WHILE_COND_I:%.*]]
+; CHECK:       while.cond.i:
+; CHECK-NEXT:    [[F_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[FOO2]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TEST_0_I]] = getelementptr inbounds i8, ptr [[F_PN_I]], i64 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 1
+; CHECK-NEXT:    [[CMP5_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP5_NOT_I]], label [[LEN_EXIT:%.*]], label [[WHILE_COND_I]]
+; CHECK:       len.exit:
+; CHECK-NEXT:    [[SUB_PTR_LHS_CAST_I:%.*]] = ptrtoint ptr [[TEST_0_I]] to i64
+; CHECK-NEXT:    [[SUB_PTR_RHS_CAST_I:%.*]] = ptrtoint ptr [[FOO2]] to i64
+; CHECK-NEXT:    [[SUB_PTR_SUB_I:%.*]] = sub i64 [[SUB_PTR_LHS_CAST_I]], [[SUB_PTR_...
[truncated]


github-actions bot commented Mar 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

// i32 / trunc i64 (lshr(sub(ptr2int, ptr2int))) to i32, where the ptrs are
// from same object and one of the pointers is a recursive GEP. And either
// the objectSize is known OR is indicative via a compiler flag, which
// suggests objectSize<4G.
Contributor:

I wonder if we could instead support the exact flag in trunc to indicate that it doesn't change the high bits.
I don't think this is the only case where we wish we could peek past trunc.

Contributor Author:

I am sure there are other cases that could benefit, particularly where the object size is known and can be determined.
The idea here was also to cover situations where we do not know much about the object and cannot identify the pointer's underlying allocation. Also, many applications never exceed 32-bit allocations; in these cases a compiler flag would help.

@goldsteinn (Contributor):

Is there a motivating case for this?

@bipmis (Contributor Author) commented Apr 22, 2024

Is there a motivating case for this?

@goldsteinn I believe it would extend these reductions to scenarios where the object size can be known.
Additionally it would speed up loops that implement a string length (a recursive GEP in a loop), which can then be reduced, leaving a truncate. A simple case can be seen at https://gcc.godbolt.org/z/K93v1dsaG
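To make the claim concrete, a small hedged illustration (plain C, not LLVM code) of why the 4 GiB bound matters: a non-zero 64-bit pointer difference stays non-zero after truncation to 32 bits exactly when it lies below 2^32.

```c
#include <stdint.h>

/* If 0 < diff < 2^32 (guaranteed when both pointers lie inside one
 * object smaller than 4 GiB), the low 32 bits cannot all be zero, so
 * truncation preserves non-zero-ness. A difference with only high
 * bits set shows why the object-size bound is needed. */
static int trunc32_is_nonzero(uint64_t diff) {
    return (uint32_t)diff != 0;
}
```

A value such as `1ULL << 32` truncates to 0, which is exactly the case the object-size check (or the proposed flag) rules out.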

@bipmis force-pushed the ValueTracking64bitPointersWith32bitAlloc branch 2 times, most recently from cbfbdb3 to 09a55bb on April 22, 2024 13:44
@bipmis (Contributor Author) commented Apr 22, 2024

Ping.

if (PN->getIncomingValue(0) == Step)
Start = PN->getIncomingValue(1);
else if (PN->getIncomingValue(1) == Step)
Start = PN->getIncomingValue(0);
Contributor:

I think you should use matchSimpleRecurrence here. That will need a pre-requisite patch to support GEP in the helper.

Contributor Author:

matchSimpleRecurrence is tightly bound to BinaryOperator and is used beyond ValueTracking. I could create a variant that handles GEPOperator.

@goldsteinn (Contributor):

I think this special case is probably misplaced given that we have trunc nuw/nsw. I think this should be converted to a fold in InstCombine that adds the nuw flag to the trunc. I have a patch to support nuw/nsw in isKnownNonZero for trunc: #89643
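As a hedged C analogue of the `trunc nuw` semantics being proposed (illustrative only, not LLVM source): `trunc nuw` asserts that the truncated-away bits are zero, so the narrow result equals the wide value and non-zero-ness carries through automatically.

```c
#include <assert.h>
#include <stdint.h>

/* Model of 'trunc nuw i64 %x to i32': the flag promises the dropped
 * high bits are zero; under that promise, x != 0 implies the
 * truncated result != 0, with no need to reason about object sizes. */
static uint32_t trunc_nuw_64_to_32(uint64_t x) {
    assert((x >> 32) == 0); /* the 'nuw' guarantee */
    return (uint32_t)x;
}
```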

@bipmis force-pushed the ValueTracking64bitPointersWith32bitAlloc branch from 09a55bb to b18d95c on April 23, 2024 17:28
@bipmis (Contributor Author) commented Apr 23, 2024

I think this special case is probably misplaced given that we have trunc nuw/nsw. I think this should be converted to a fold in InstCombine that adds the nuw flag to the trunc. I have a patch to support nuw/nsw in isKnownNonZero for trunc: #89643

Could be done. I think we can have this here for now and I can possibly do an update to trunc in InstCombine once the patch lands.

@goldsteinn (Contributor):

I think this special case is probably misplaced given that we have trunc nuw/nsw. I think this should be converted to a fold in InstCombine that adds the nuw flag to the trunc. I have a patch to support nuw/nsw in isKnownNonZero for trunc: #89643

Could be done. I think we can have this here for now and I can possibly do an update to trunc in InstCombine once the patch lands.

That patch will land today.

@nikic (Contributor) commented Apr 24, 2024

Looks like a case where it's better to change the source code to use the correct type instead (size_t instead of unsigned).

@bipmis force-pushed the ValueTracking64bitPointersWith32bitAlloc branch from b18d95c to 7b21620 on April 24, 2024 20:59
@bipmis (Contributor Author) commented Apr 24, 2024

I think this special case is probably misplaced given that we have trunc nuw/nsw. I think this should be converted to a fold in InstCombine that adds the nuw flag to the trunc. I have a patch to support nuw/nsw in isKnownNonZero for trunc: #89643

Could be done. I think we can have this here for now and I can possibly do an update to trunc in InstCombine once the patch lands.

That patch will land today.

Great. Made the changes.

@bipmis (Contributor Author) commented Apr 24, 2024

Looks like a case where it's better to change the source code to use the correct type instead (size_t instead of unsigned).

I would agree. However there are scenarios in benchmark applications where source cannot be changed.

@@ -910,6 +916,67 @@ Instruction *InstCombinerImpl::visitTrunc(TruncInst &Trunc) {
Changed = true;
}

// Handle scenarios like trunc i64 (sub(ptr2int, ptr2int)) to i32 / trunc i64
Contributor:

Shouldn't this logic be inside computeKnownBits() for sub? The information we get out of this is that the top bits of the sub result are zero. That will result in the nuw flag on trunc being set down the line.

// Controls the maximum memory allocation to 32-bit or 4GB.
static cl::opt<bool> PointerAllocationsLimitedto32bit(
"alloc-limit-32bit", cl::init(false),
cl::desc("A Pointer with max 32bit allocs"));
Contributor:

I'm not willing to add this flag.


Contributor Author:

I have provided my understanding of this code pattern and don't quite agree that it violates a benchmark rule or criterion.

if (!GEPA) {
GEPA = dyn_cast<GEPOperator>(B);
std::swap(A, B);
}
Contributor:

Don't you need inbounds for this (please also add test)? Otherwise the GEPs may go beyond the limit of the allocation size.

@nikic (Contributor) commented Apr 25, 2024

Looks like a case where it's better to change the source code to use the correct type instead (size_t instead of unsigned).

I would agree. However there are scenarios in benchmark applications where source cannot be changed.

I suspected you were going to say this. This patch very much looks like a hack for the sake of improving some benchmark score, which is something I am not very fond of. Of course, if we can get a clean implementation out of this and show that it benefits non-benchmark code (cc @dtcxzyw), then that's a different matter...

@dtcxzyw (Member) commented Apr 25, 2024

Looks like a case where it's better to change the source code to use the correct type instead (size_t instead of unsigned).

I would agree. However there are scenarios in benchmark applications where source cannot be changed.

I suspected you were going to say this. This patch very much looks like a hack for the sake of improving some benchmark score, which is something I am not very fond of. Of course, if we can get a clean implementation out of this and show that it benefits non-benchmark code (cc @dtcxzyw), then that's a different matter...

trunc(sub(ptr2int, ptr2int)) to i32 is a common pattern. But this patch seems to just handle a special case :( I cannot tell whether it provides benefits to some real-world applications.

@bipmis (Contributor Author) commented Apr 25, 2024

Looks like a case where it's better to change the source code to use the correct type instead (size_t instead of unsigned).

I would agree. However there are scenarios in benchmark applications where source cannot be changed.

I suspected you were going to say this. This patch very much looks like a hack for the sake of improving some benchmark score, which is something I am not very fond of. Of course, if we can get a clean implementation out of this and show that it benefits non-benchmark code (cc @dtcxzyw), then that's a different matter...

trunc(sub(ptr2int, ptr2int)) to i32 is a common pattern. But this patch seems to just handle a special case :( I cannot tell whether it provides benefits to some real-world applications.

It definitely extends beyond a single case. This would also apply to legacy code where a string-length result was bound to an int type; I am seeing this with utf8_check_string(), and there are scenarios where we would like to build such applications with the latest compiler.
Additionally, if trunc(sub(ptr2int, ptr2int)) to i32 is a common pattern, and it can be determined that the two pointers point into the same object whose size is either evaluated statically or known to the application user and suggested via the flag, it should be a valid optimisation.

@bipmis (Contributor Author) commented Apr 29, 2024

I looked at how commonly the sequence trunc(sub(ptr2int, ptr2int)) / (unsigned)strlen(...) is used, and it is indeed quite common in a class of applications beyond the legacy code example cited earlier. Some of the scenarios (not an exhaustive list) where we see this pattern and should be able to optimise out the trunc instruction:

  1. Database applications like SQLite - https://opensource.apple.com/source/CPANInternal/CPANInternal-108/DBD-SQLite/sqlite3.c, openDirectory()
  2. Multimedia applications - https://github.com/xiph/flac/blob/master/src/metaflac/options.c, parse_add_seekpoint()
  3. Compression algorithms - https://github.com/mcmilk/7-Zip/blob/master/CPP/7zip/UI/Common/HashCalc.cpp, AddHashResultLine()

In some of these cases, if allocations are limited to a maximum of 4 GiB (32-bit allocations), which is definitely possible and known to the application user, we could use the supported flag to optimise out the trunc instruction, which in turn could enable a host of other optimisations.
