[ValueTracking] isNonZero trunc of sub of ptr2int's with recursive GEP where pointers are limited to a 32bit alloc. #84933
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis

Author: None (bipmis)

Changes: We already handle isKnownNonZero for sub(ptr2int, ptr2int). In cases like trunc(sub(ptr2int, ptr2int)) to i32, where the pointer operands of the ptr2int casts refer to an object whose size either can be determined (e.g. an alloca) or is restricted via a compiler flag that limits all allocations to at most 32 bits/4 GB, we can look through the truncate and check whether the truncate's operand is known non-zero.

Patch is 21.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/84933.diff

2 Files Affected:
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 6d0e79e11eed43..9e8efb642bf922 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -35,6 +35,7 @@
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/VectorUtils.h"
#include "llvm/Analysis/WithCache.h"
+#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"
@@ -88,6 +89,10 @@ using namespace llvm::PatternMatch;
static cl::opt<unsigned> DomConditionsMaxUses("dom-conditions-max-uses",
cl::Hidden, cl::init(20));
+// Controls the maximum memory allocation to 32-bit or 4GB.
+static cl::opt<bool> PointerAllocationsLimitedto32bit(
+ "alloc-limit-32bit", cl::init(false),
+ cl::desc("A Pointer with max 32bit allocs"));
/// Returns the bitwidth of the given scalar or pointer type. For vector types,
/// returns the element type's bitwidth.
@@ -2539,7 +2544,80 @@ static bool isKnownNonZeroFromOperator(const Operator *I,
case Instruction::ZExt:
// ext X != 0 if X != 0.
return isKnownNonZero(I->getOperand(0), Depth, Q);
+ case Instruction::Trunc:
+ // trunc(operand) is KnownNonZero if lower order bits of operand is a
+ // KnownNonZero. Handle scenarios like trunc i64 (sub(ptr2int, ptr2int)) to
+ // i32 / trunc i64 (lshr(sub(ptr2int, ptr2int))) to i32, where the ptrs are
+ // from same object and one of the pointers is a recursive GEP. And either
+ // the objectSize is known OR is indicative via a compiler flag, which
+ // suggests objectSize<4G.
+ Value *A, *B;
+ if (match(I->getOperand(0), m_Sub(m_PtrToIntSameSize(Q.DL, m_Value(A)),
+ m_PtrToIntSameSize(Q.DL, m_Value(B)))) ||
+ match(I->getOperand(0),
+ m_LShr(m_Sub(m_PtrToIntSameSize(Q.DL, m_Value(A)),
+ m_PtrToIntSameSize(Q.DL, m_Value(B))),
+ m_ConstantInt()))) {
+ // Check for a specific pattern where A is recursive GEP with a PHI ptr
+ // with incoming values as (Start,Step) and Start==B.
+ auto *GEPA = dyn_cast<GEPOperator>(A);
+ if (!GEPA) {
+ GEPA = dyn_cast<GEPOperator>(B);
+ std::swap(A, B);
+ }
+ if (!GEPA || GEPA->getNumIndices() != 1 ||
+ !isa<Constant>(GEPA->idx_begin()))
+ return false;
+
+ // Handle 2 incoming PHI values with one being a recursive GEP.
+ auto *PN = dyn_cast<PHINode>(GEPA->getPointerOperand());
+ if (!PN || PN->getNumIncomingValues() != 2)
+ return false;
+
+ // Search for the recursive GEP as an incoming operand, and record that as
+ // Step.
+ Value *Start = nullptr;
+ Value *Step = const_cast<Value *>(A);
+ if (PN->getIncomingValue(0) == Step)
+ Start = PN->getIncomingValue(1);
+ else if (PN->getIncomingValue(1) == Step)
+ Start = PN->getIncomingValue(0);
+ else
+ return false;
+ // Check if the pointers Start and B are same and ObjectSize can be
+ // determined/PointerAllocationsLimitedto32bit flag set.
+ if (Start == B) {
+ ObjectSizeOpts Opts;
+ Opts.RoundToAlign = false;
+ Opts.NullIsUnknownSize = true;
+ uint64_t ObjSize = 0;
+ // Can we get the ObjectSize and ObjectSize is within range.
+ // There is possibly more that we can do when ObjSize if known, and
+ // extend the same for other truncates. Restricting it for a 32bit
+ // truncate result.
+ if (getObjectSize(Start, ObjSize, Q.DL, Q.TLI, Opts)) {
+ if (ObjSize == 0 || ObjSize > (uint64_t)0xFFFFFFFF)
+ return false;
+ }
+ // Does the compiler flag specify Object allocs within 4G.
+ else if (!PointerAllocationsLimitedto32bit)
+ return false;
+ } else
+ return false;
+
+ // Check if trunc src type is same as Ptr type and dest is a 32bit.
+ // If objectSize<4G or PointerAllocationsLimitedto32bit flag set, check if
+ // the trunc operand is a KnownNonZero.
+ LLVMContext &Ctx = I->getContext();
+ Type *SrcTy = I->getOperand(0)->getType();
+ Type *DstTy = I->getType();
+ unsigned IntPtrWidth = Q.DL.getPointerSizeInBits();
+ unsigned TruncSrcBits = Q.DL.getTypeSizeInBits(SrcTy);
+ if (TruncSrcBits == IntPtrWidth && DstTy == Type::getInt32Ty(Ctx))
+ return isKnownNonZero(I->getOperand(0), Depth, Q);
+ }
+ break;
case Instruction::Shl: {
// shl nsw/nuw can't remove any non-zero bits.
const OverflowingBinaryOperator *BO = cast<OverflowingBinaryOperator>(I);
diff --git a/llvm/test/Analysis/ValueTracking/phi-known-bits-truncflag.ll b/llvm/test/Analysis/ValueTracking/phi-known-bits-truncflag.ll
new file mode 100644
index 00000000000000..01c6e35e3fbdd2
--- /dev/null
+++ b/llvm/test/Analysis/ValueTracking/phi-known-bits-truncflag.ll
@@ -0,0 +1,359 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -alloc-limit-32bit=true -passes=instcombine < %s -S | FileCheck %s
+
+define i1 @recursiveGEP_withPtrSubTrunc(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT: br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK: while.cond.i:
+; CHECK-NEXT: [[A_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[VAL1]], [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 1
+; CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT: [[CMP3_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT: br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK: while.end.i:
+; CHECK-NEXT: br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK: _Z9stringlenPKs.exit:
+; CHECK-NEXT: ret i1 [[CMP_I]]
+;
+entry:
+ %cmp.i = icmp eq ptr %val1, null
+ br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+ %a.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %val1, %entry ]
+ %test.0.i = getelementptr inbounds i8, ptr %a.pn.i, i64 1
+ %0 = load i8, ptr %test.0.i, align 2
+ %cmp3.not.i = icmp eq i8 %0, 0
+ br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+ %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+ %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+ %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+ %conv4 = trunc i64 %sub.ptr.sub.i to i32
+ br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+ %retval.0.i = phi i32 [ %conv4, %while.end.i ], [ 0, %entry ]
+ %bool = icmp eq i32 %retval.0.i, 0
+ ret i1 %bool
+}
+
+define i1 @recursiveGEP_withPtrSubAshrTrunc(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubAshrTrunc(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT: br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK: while.cond.i:
+; CHECK-NEXT: [[A_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[VAL1]], [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 2
+; CHECK-NEXT: [[TMP0:%.*]] = load i16, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT: [[CMP3_NOT_I:%.*]] = icmp eq i16 [[TMP0]], 0
+; CHECK-NEXT: br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK: while.end.i:
+; CHECK-NEXT: br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK: _Z9stringlenPKs.exit:
+; CHECK-NEXT: ret i1 [[CMP_I]]
+;
+entry:
+ %cmp.i = icmp eq ptr %val1, null
+ br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+ %a.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %val1, %entry ]
+ %test.0.i = getelementptr inbounds i16, ptr %a.pn.i, i64 1
+ %0 = load i16, ptr %test.0.i, align 2
+ %cmp3.not.i = icmp eq i16 %0, 0
+ br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+ %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+ %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+ %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+ %sub.ptr.div = ashr exact i64 %sub.ptr.sub.i, 1
+ %conv4 = trunc i64 %sub.ptr.div to i32
+ br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+ %retval.0.i = phi i32 [ %conv4, %while.end.i ], [ 0, %entry ]
+ %bool = icmp eq i32 %retval.0.i, 0
+ ret i1 %bool
+}
+
+define i1 @recursiveGEP_withPtrSubTrunc_StartNotEqualtoB(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc_StartNotEqualtoB(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT: br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK: while.cond.i:
+; CHECK-NEXT: [[A_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[VAL1]], [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 1
+; CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT: [[CMP3_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT: br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK: while.end.i:
+; CHECK-NEXT: [[SUB_PTR_LHS_CAST_I:%.*]] = ptrtoint ptr [[TEST_0_I]] to i64
+; CHECK-NEXT: [[TEST_1_I:%.*]] = getelementptr inbounds i8, ptr [[VAL1]], i64 -5
+; CHECK-NEXT: [[SUB_PTR_RHS_CAST_I:%.*]] = ptrtoint ptr [[TEST_1_I]] to i64
+; CHECK-NEXT: [[SUB_PTR_SUB_I:%.*]] = sub i64 [[SUB_PTR_LHS_CAST_I]], [[SUB_PTR_RHS_CAST_I]]
+; CHECK-NEXT: [[CONV4:%.*]] = trunc i64 [[SUB_PTR_SUB_I]] to i32
+; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[CONV4]], 0
+; CHECK-NEXT: br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK: _Z9stringlenPKs.exit:
+; CHECK-NEXT: [[RETVAL_0_I:%.*]] = phi i1 [ [[TMP1]], [[WHILE_END_I]] ], [ true, [[ENTRY]] ]
+; CHECK-NEXT: ret i1 [[RETVAL_0_I]]
+;
+entry:
+ %cmp.i = icmp eq ptr %val1, null
+ br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+ %a.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %val1, %entry ]
+ %test.0.i = getelementptr inbounds i8, ptr %a.pn.i, i64 1
+ %0 = load i8, ptr %test.0.i, align 2
+ %cmp3.not.i = icmp eq i8 %0, 0
+ br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+ %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+ %test.1.i = getelementptr inbounds i8, ptr %val1, i64 -5
+ %sub.ptr.rhs.cast.i = ptrtoint ptr %test.1.i to i64
+ %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+ %conv4 = trunc i64 %sub.ptr.sub.i to i32
+ br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+ %retval.0.i = phi i32 [ %conv4, %while.end.i ], [ 0, %entry ]
+ %bool = icmp eq i32 %retval.0.i, 0
+ ret i1 %bool
+}
+
+define i1 @recursiveGEP_withPtrSubTrunc_PhiOperandsCommuted(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc_PhiOperandsCommuted(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT: br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK: while.cond.i:
+; CHECK-NEXT: [[A_PN_I:%.*]] = phi ptr [ [[VAL1]], [[ENTRY:%.*]] ], [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ]
+; CHECK-NEXT: [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 1
+; CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT: [[CMP3_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT: br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK: while.end.i:
+; CHECK-NEXT: br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK: _Z9stringlenPKs.exit:
+; CHECK-NEXT: ret i1 [[CMP_I]]
+;
+entry:
+ %cmp.i = icmp eq ptr %val1, null
+ br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+ %a.pn.i = phi ptr [ %val1, %entry ], [ %test.0.i, %while.cond.i ]
+ %test.0.i = getelementptr inbounds i8, ptr %a.pn.i, i64 1
+ %0 = load i8, ptr %test.0.i, align 2
+ %cmp3.not.i = icmp eq i8 %0, 0
+ br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+ %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+ %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+ %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+ %conv4 = trunc i64 %sub.ptr.sub.i to i32
+ br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+ %retval.0.i = phi i32 [ %conv4, %while.end.i ], [ 0, %entry ]
+ %bool = icmp eq i32 %retval.0.i, 0
+ ret i1 %bool
+}
+
+define i1 @recursiveGEP_withPtrSubTrunc_SubOperandsCommuted(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc_SubOperandsCommuted(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT: br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK: while.cond.i:
+; CHECK-NEXT: [[A_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[VAL1]], [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 1
+; CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT: [[CMP3_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT: br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK: while.end.i:
+; CHECK-NEXT: br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK: _Z9stringlenPKs.exit:
+; CHECK-NEXT: ret i1 [[CMP_I]]
+;
+entry:
+ %cmp.i = icmp eq ptr %val1, null
+ br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+ %a.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %val1, %entry ]
+ %test.0.i = getelementptr inbounds i8, ptr %a.pn.i, i64 1
+ %0 = load i8, ptr %test.0.i, align 2
+ %cmp3.not.i = icmp eq i8 %0, 0
+ br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+ %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+ %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+ %sub.ptr.sub.i = sub i64 %sub.ptr.rhs.cast.i, %sub.ptr.lhs.cast.i
+ %conv4 = trunc i64 %sub.ptr.sub.i to i32
+ br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+ %retval.0.i = phi i32 [ %conv4, %while.end.i ], [ 0, %entry ]
+ %bool = icmp eq i32 %retval.0.i, 0
+ ret i1 %bool
+}
+
+define i1 @recursiveGEP_withPtrSubTrunc64to16(ptr noundef %val1) {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc64to16(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[CMP_I:%.*]] = icmp eq ptr [[VAL1:%.*]], null
+; CHECK-NEXT: br i1 [[CMP_I]], label [[_Z9STRINGLENPKS_EXIT:%.*]], label [[WHILE_COND_I:%.*]]
+; CHECK: while.cond.i:
+; CHECK-NEXT: [[A_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[VAL1]], [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[TEST_0_I]] = getelementptr inbounds i8, ptr [[A_PN_I]], i64 1
+; CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 2
+; CHECK-NEXT: [[CMP3_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT: br i1 [[CMP3_NOT_I]], label [[WHILE_END_I:%.*]], label [[WHILE_COND_I]]
+; CHECK: while.end.i:
+; CHECK-NEXT: [[SUB_PTR_LHS_CAST_I:%.*]] = ptrtoint ptr [[TEST_0_I]] to i64
+; CHECK-NEXT: [[SUB_PTR_RHS_CAST_I:%.*]] = ptrtoint ptr [[VAL1]] to i64
+; CHECK-NEXT: [[SUB_PTR_SUB_I:%.*]] = sub i64 [[SUB_PTR_LHS_CAST_I]], [[SUB_PTR_RHS_CAST_I]]
+; CHECK-NEXT: [[CONV4:%.*]] = trunc i64 [[SUB_PTR_SUB_I]] to i16
+; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i16 [[CONV4]], 0
+; CHECK-NEXT: br label [[_Z9STRINGLENPKS_EXIT]]
+; CHECK: _Z9stringlenPKs.exit:
+; CHECK-NEXT: [[RETVAL_0_I:%.*]] = phi i1 [ [[TMP1]], [[WHILE_END_I]] ], [ true, [[ENTRY]] ]
+; CHECK-NEXT: ret i1 [[RETVAL_0_I]]
+;
+entry:
+ %cmp.i = icmp eq ptr %val1, null
+ br i1 %cmp.i, label %_Z9stringlenPKs.exit, label %while.cond.i
+
+while.cond.i:
+ %a.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %val1, %entry ]
+ %test.0.i = getelementptr inbounds i8, ptr %a.pn.i, i64 1
+ %0 = load i8, ptr %test.0.i, align 2
+ %cmp3.not.i = icmp eq i8 %0, 0
+ br i1 %cmp3.not.i, label %while.end.i, label %while.cond.i
+
+while.end.i:
+ %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+ %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+ %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+ %conv4 = trunc i64 %sub.ptr.sub.i to i16
+ br label %_Z9stringlenPKs.exit
+
+_Z9stringlenPKs.exit:
+ %retval.0.i = phi i16 [ %conv4, %while.end.i ], [ 0, %entry ]
+ %bool = icmp eq i16 %retval.0.i, 0
+ ret i1 %bool
+}
+
+%struct.Foo = type { [256 x i8] }
+declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture)
+declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture)
+
+define i1 @recursiveGEP_withPtrSubTrunc_knownObject() {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc_knownObject(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[FOO:%.*]] = alloca [[STRUCT_FOO:%.*]], align 1
+; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 256, ptr nonnull [[FOO]])
+; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(256) [[FOO]], i8 97, i64 255, i1 false)
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[FOO]], i64 255
+; CHECK-NEXT: store i8 0, ptr [[ARRAYIDX]], align 1
+; CHECK-NEXT: br label [[WHILE_COND_I:%.*]]
+; CHECK: while.cond.i:
+; CHECK-NEXT: [[F_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[FOO]], [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[TEST_0_I]] = getelementptr inbounds i8, ptr [[F_PN_I]], i64 1
+; CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 1
+; CHECK-NEXT: [[CMP5_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT: br i1 [[CMP5_NOT_I]], label [[LEN_EXIT:%.*]], label [[WHILE_COND_I]]
+; CHECK: len.exit:
+; CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 256, ptr nonnull [[FOO]])
+; CHECK-NEXT: ret i1 false
+;
+entry:
+ %foo = alloca %struct.Foo, align 1
+ call void @llvm.lifetime.start.p0(i64 256, ptr nonnull %foo)
+ call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(256) %foo, i8 97, i64 255, i1 false)
+ %arrayidx = getelementptr inbounds [256 x i8], ptr %foo, i64 0, i64 255
+ store i8 0, ptr %arrayidx, align 1
+ br label %while.cond.i
+
+while.cond.i:
+ %f.pn.i = phi ptr [ %test.0.i, %while.cond.i ], [ %foo, %entry ]
+ %test.0.i = getelementptr inbounds i8, ptr %f.pn.i, i64 1
+ %0 = load i8, ptr %test.0.i, align 1
+ %cmp5.not.i = icmp eq i8 %0, 0
+ br i1 %cmp5.not.i, label %len.exit, label %while.cond.i
+
+len.exit:
+ %sub.ptr.lhs.cast.i = ptrtoint ptr %test.0.i to i64
+ %sub.ptr.rhs.cast.i = ptrtoint ptr %foo to i64
+ %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+ %1 = trunc i64 %sub.ptr.sub.i to i32
+ %cmp = icmp eq i32 %1, 0
+ call void @llvm.lifetime.end.p0(i64 256, ptr nonnull %foo)
+ ret i1 %cmp
+}
+
+define i1 @recursiveGEP_withPtrSubTrunc_knownObject_ObjSize0() {
+; CHECK-LABEL: @recursiveGEP_withPtrSubTrunc_knownObject_ObjSize0(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[FOO:%.*]] = alloca [[STRUCT_FOO:%.*]], align 1
+; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 256, ptr nonnull [[FOO]])
+; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(256) [[FOO]], i8 97, i64 255, i1 false)
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[FOO]], i64 255
+; CHECK-NEXT: store i8 0, ptr [[ARRAYIDX]], align 1
+; CHECK-NEXT: [[FOO2:%.*]] = getelementptr inbounds i8, ptr [[FOO]], i64 256
+; CHECK-NEXT: br label [[WHILE_COND_I:%.*]]
+; CHECK: while.cond.i:
+; CHECK-NEXT: [[F_PN_I:%.*]] = phi ptr [ [[TEST_0_I:%.*]], [[WHILE_COND_I]] ], [ [[FOO2]], [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[TEST_0_I]] = getelementptr inbounds i8, ptr [[F_PN_I]], i64 1
+; CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[TEST_0_I]], align 1
+; CHECK-NEXT: [[CMP5_NOT_I:%.*]] = icmp eq i8 [[TMP0]], 0
+; CHECK-NEXT: br i1 [[CMP5_NOT_I]], label [[LEN_EXIT:%.*]], label [[WHILE_COND_I]]
+; CHECK: len.exit:
+; CHECK-NEXT: [[SUB_PTR_LHS_CAST_I:%.*]] = ptrtoint ptr [[TEST_0_I]] to i64
+; CHECK-NEXT: [[SUB_PTR_RHS_CAST_I:%.*]] = ptrtoint ptr [[FOO2]] to i64
+; CHECK-NEXT: [[SUB_PTR_SUB_I:%.*]] = sub i64 [[SUB_PTR_LHS_CAST_I]], [[SUB_PTR_...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
llvm/lib/Analysis/ValueTracking.cpp (outdated):
// i32 / trunc i64 (lshr(sub(ptr2int, ptr2int))) to i32, where the ptrs are
// from same object and one of the pointers is a recursive GEP. And either
// the objectSize is known OR is indicative via a compiler flag, which
// suggests objectSize<4G.
I wonder if we could instead support an `exact` flag on trunc to indicate that it doesn't change the high bits. I don't think this is the only case where we wish we could peek past a trunc.
I am sure there are other cases that can benefit, particularly where the object size is known and can be determined.
The idea here was also to cover situations where we do not know much about the object and cannot identify the pointer as belonging to a known object. Also, many applications never exceed a 32-bit allocation size; in these cases a compiler flag would be beneficial.
Is there a motivating case for this?
@goldsteinn I believe it would extend the reductions to scenarios where the object size can be known.
Ping.
llvm/lib/Analysis/ValueTracking.cpp (outdated):
if (PN->getIncomingValue(0) == Step)
  Start = PN->getIncomingValue(1);
else if (PN->getIncomingValue(1) == Step)
  Start = PN->getIncomingValue(0);
I think you should use `matchSimpleRecurrence` here. That will need a prerequisite patch to support GEP in the helper.
`matchSimpleRecurrence` is tightly bound to a `BinaryOperator`, and this extends beyond ValueTracking. I could create a variant to handle `GEPOperator`.
I think this special case is probably misplaced given that we have
Could be done. I think we can have this here for now, and I can possibly do an update to trunc in InstCombine once the patch lands.
That patch will land today.
Looks like a case where it's better to change the source code to use the correct type instead (size_t instead of unsigned).
Great. Made the changes.
I would agree. However, there are scenarios in benchmark applications where the source cannot be changed.
@@ -910,6 +916,67 @@ Instruction *InstCombinerImpl::visitTrunc(TruncInst &Trunc) {
    Changed = true;
  }
// Handle scenarios like trunc i64 (sub(ptr2int, ptr2int)) to i32 / trunc i64 |
Shouldn't this logic be inside computeKnownBits() for sub? The information we get out of this is that the top bits of the sub result are zero. That will result in the nuw flag on trunc being set down the line.
// Controls the maximum memory allocation to 32-bit or 4GB.
static cl::opt<bool> PointerAllocationsLimitedto32bit(
    "alloc-limit-32bit", cl::init(false),
    cl::desc("A Pointer with max 32bit allocs"));
I'm not willing to add this flag.
I agree.
WARNING: Adding this flag may violate SPEC's rules.
See also https://www.spec.org/cpu2017/Docs/runrules.html#rule_1.4 and https://www.spec.org/cpu2017/results/res2023q1/cpu2017-20221223-33220.html.
I have provided my understanding of this code pattern and don't quite agree that it violates a benchmark rule or criterion.
if (!GEPA) {
  GEPA = dyn_cast<GEPOperator>(B);
  std::swap(A, B);
}
Don't you need inbounds for this (please also add test)? Otherwise the GEPs may go beyond the limit of the allocation size.
I suspected you were going to say this. This patch very much looks like a hack for the sake of improving some benchmark score, which is something I am not very fond of. Of course, if we can get a clean implementation out of this and show that it benefits non-benchmark code (cc @dtcxzyw), then that's a different matter...
It definitely extends beyond a single case. This would also apply to legacy-code scenarios where `stringLength()` was bound to an `int` type; I am seeing this with `utf8_check_string()`. And there are scenarios where we would like to build these applications with the latest compiler.
I just looked at the usability of the sequence `trunc(sub(ptr2int, ptr2int))` / `(unsigned)strlen(...)`, and it indeed looks quite common in a class of applications beyond the legacy-code example cited earlier. Some of the other scenarios (not an extensive list) where we see this pattern and should be able to optimize out the trunc instruction would be