Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AggressiveInstCombine] Inline strcmp/strncmp #89371

Merged
merged 6 commits into from
May 3, 2024

Conversation

FLZ101
Copy link
Contributor

@FLZ101 FLZ101 commented Apr 19, 2024

Inline calls to strcmp(s1, s2) and strncmp(s1, s2, N), where N and exactly one of s1 and s2 are constant.

For example:

int res = strcmp(s, "ab");

is converted to

int res = (int)s[0] - (int)'a';
if (res != 0)
  goto END;
res = (int)s[1] - (int)'b';
if (res != 0)
  goto END;
res = (int)s[2] - (int)'\0';
END:

Ported from a similar gcc feature Inline strcmp with small constant strings.

Inline calls to strcmp(s1, s2) and strncmp(s1, s2, N),
where N and exactly one of s1 and s2 are constant.
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 19, 2024

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Franklin Zhang (FLZ101)

Changes

Inline calls to strcmp(s1, s2) and strncmp(s1, s2, N), where N and exactly one of s1 and s2 are constant.

For example:

int res = strcmp(s, "ab");

is converted to

int res = (int)s[0] - (int)'a';
if (res != 0)
  goto END;
res = (int)s[1] - (int)'b';
if (res != 0)
  goto END;
res = (int)s[2] - (int)'\0';
END:

Ported from a similar gcc feature Inline strcmp with small constant strings.


Patch is 31.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/89371.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp (+255-3)
  • (removed) llvm/test/Transforms/AggressiveInstCombine/strcmp.ll (-219)
  • (added) llvm/test/Transforms/AggressiveInstCombine/strncmp-1.ll (+203)
  • (added) llvm/test/Transforms/AggressiveInstCombine/strncmp-2.ll (+145)
diff --git a/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp b/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
index e586e9eda1322f..eddd7382c27bbc 100644
--- a/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
+++ b/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
@@ -19,6 +19,7 @@
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/BasicAliasAnalysis.h"
 #include "llvm/Analysis/ConstantFolding.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
 #include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
@@ -28,6 +29,7 @@
 #include "llvm/IR/Function.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/PatternMatch.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/BuildLibCalls.h"
 #include "llvm/Transforms/Utils/Local.h"
 
@@ -922,6 +924,251 @@ static bool foldPatternedLoads(Instruction &I, const DataLayout &DL) {
   return true;
 }
 
+static cl::opt<unsigned> StrNCmpInlineThreshold(
+    "strncmp-inline-threshold", cl::init(3), cl::Hidden,
+    cl::desc("The maximum length of a constant string for a builtin string cmp "
+             "call eligible for inlining. The default value is 3."));
+
+namespace {
+class StrNCmpInliner {
+public:
+  StrNCmpInliner(CallInst *CI, LibFunc Func, Function::iterator &BBNext,
+                 DomTreeUpdater *DTU, const DataLayout &DL)
+      : CI(CI), Func(Func), BBNext(BBNext), DTU(DTU), DL(DL) {}
+
+  bool optimizeStrNCmp();
+
+private:
+  bool inlineCompare(Value *LHS, StringRef RHS, uint64_t N, bool Switched);
+
+  CallInst *CI;
+  LibFunc Func;
+  Function::iterator &BBNext;
+  DomTreeUpdater *DTU;
+  const DataLayout &DL;
+};
+
+} // namespace
+
+/// First we normalize calls to strncmp/strcmp to the form of
+/// compare(s1, s2, N), which means comparing first N bytes of s1 and s2
+/// (without considering '\0')
+///
+/// Examples:
+///
+/// \code
+///   strncmp(s, "a", 3) -> compare(s, "a", 2)
+///   strncmp(s, "abc", 3) -> compare(s, "abc", 3)
+///   strncmp(s, "a\0b", 3) -> compare(s, "a\0b", 2)
+///   strcmp(s, "a") -> compare(s, "a", 2)
+///
+///   char s2[] = {'a'}
+///   strncmp(s, s2, 3) -> compare(s, s2, 3)
+///
+///   char s2[] = {'a', 'b', 'c', 'd'}
+///   strncmp(s, s2, 3) -> compare(s, s2, 3)
+/// \endcode
+///
+/// We only handle cases that N and exactly one of s1 and s2 are constant. Cases
+/// that s1 and s2 are both constant are already handled by the instcombine
+/// pass.
+///
+/// We do not handle cases that N > StrNCmpInlineThreshold.
+///
+/// We also do not handles cases that N < 2, which are already
+/// handled by the instcombine pass.
+///
+bool StrNCmpInliner::optimizeStrNCmp() {
+  if (StrNCmpInlineThreshold < 2)
+    return false;
+
+  Value *Str1P = CI->getArgOperand(0);
+  Value *Str2P = CI->getArgOperand(1);
+  // should be handled elsewhere
+  if (Str1P == Str2P)
+    return false;
+
+  StringRef Str1, Str2;
+  bool HasStr1 = getConstantStringInfo(Str1P, Str1, false);
+  bool HasStr2 = getConstantStringInfo(Str2P, Str2, false);
+  if (!(HasStr1 ^ HasStr2))
+    return false;
+
+  // note that '\0' and characters after it are not trimmed
+  StringRef Str = HasStr1 ? Str1 : Str2;
+
+  size_t Idx = Str.find('\0');
+  uint64_t N = Idx == StringRef::npos ? UINT64_MAX : Idx + 1;
+  if (Func == LibFunc_strncmp) {
+    if (!isa<ConstantInt>(CI->getArgOperand(2)))
+      return false;
+    N = std::min(N, cast<ConstantInt>(CI->getArgOperand(2))->getZExtValue());
+  }
+  // now N means how many bytes we need to compare at most
+  if (N > Str.size() || N < 2 || N > StrNCmpInlineThreshold)
+    return false;
+
+  Value *StrP = HasStr1 ? Str2P : Str1P;
+
+  // cases that StrP has two or more dereferenceable bytes might be better
+  // optimized elsewhere
+  bool CanBeNull = false, CanBeFreed = false;
+  if (StrP->getPointerDereferenceableBytes(DL, CanBeNull, CanBeFreed) > 1)
+    return false;
+
+  return inlineCompare(StrP, Str, N, HasStr1);
+}
+
+/// Convert
+///
+/// \code
+///   ret = compare(s1, s2, N)
+/// \endcode
+///
+/// into
+///
+/// \code
+///   ret = (int)s1[0] - (int)s2[0]
+///   if (ret != 0)
+///     goto NE
+///   ...
+///   ret = (int)s1[N-2] - (int)s2[N-2]
+///   if (ret != 0)
+///     goto NE
+///   ret = (int)s1[N-1] - (int)s2[N-1]
+///   NE:
+/// \endcode
+///
+/// CFG before and after the transformation:
+///
+/// (before)
+/// BBCI
+///
+/// (after)
+/// BBBefore -> BBSubs[0] (sub,icmp) --NE-> BBNE -> BBCI
+///                      |                    ^
+///                      E                    |
+///                      |                    |
+///             BBSubs[1] (sub,icmp) --NE-----+
+///                     ...                   |
+///             BBSubs[N-1]    (sub) ---------+
+///
+bool StrNCmpInliner::inlineCompare(Value *LHS, StringRef RHS, uint64_t N,
+                                   bool Switched) {
+  IRBuilder<> B(CI->getContext());
+
+  BasicBlock *BBCI = CI->getParent();
+  bool IsEntry = BBCI->isEntryBlock();
+  BasicBlock *BBBefore = splitBlockBefore(BBCI, CI, DTU, nullptr, nullptr,
+                                          BBCI->getName() + ".before");
+
+  SmallVector<BasicBlock *> BBSubs;
+  for (uint64_t i = 0; i < N + 1; ++i)
+    BBSubs.push_back(
+        BasicBlock::Create(CI->getContext(), "sub", BBCI->getParent(), BBCI));
+  BasicBlock *BBNE = BBSubs[N];
+
+  cast<BranchInst>(BBBefore->getTerminator())->setSuccessor(0, BBSubs[0]);
+
+  B.SetInsertPoint(BBNE);
+  PHINode *Phi = B.CreatePHI(CI->getType(), N);
+  B.CreateBr(BBCI);
+
+  Value *Base = LHS;
+  for (uint64_t i = 0; i < N; ++i) {
+    B.SetInsertPoint(BBSubs[i]);
+    Value *VL = B.CreateZExt(
+        B.CreateLoad(B.getInt8Ty(),
+                     B.CreateInBoundsGEP(B.getInt8Ty(), Base, B.getInt64(i))),
+        CI->getType());
+    Value *VR = ConstantInt::get(CI->getType(), RHS[i]);
+    Value *Sub = Switched ? B.CreateSub(VR, VL) : B.CreateSub(VL, VR);
+    if (i < N - 1)
+      B.CreateCondBr(B.CreateICmpNE(Sub, ConstantInt::get(CI->getType(), 0)),
+                     BBNE, BBSubs[i + 1]);
+    else
+      B.CreateBr(BBNE);
+
+    Phi->addIncoming(Sub, BBSubs[i]);
+  }
+
+  CI->replaceAllUsesWith(Phi);
+  CI->eraseFromParent();
+
+  BBNext = BBCI->getIterator();
+
+  // Update DomTree
+  if (DTU) {
+    if (IsEntry) {
+      DTU->recalculate(*BBCI->getParent());
+    } else {
+      SmallVector<DominatorTree::UpdateType, 8> Updates;
+      Updates.push_back({DominatorTree::Delete, BBBefore, BBCI});
+      Updates.push_back({DominatorTree::Insert, BBBefore, BBSubs[0]});
+      for (uint64_t i = 0; i < N; ++i) {
+        if (i < N - 1)
+          Updates.push_back({DominatorTree::Insert, BBSubs[i], BBSubs[i + 1]});
+        Updates.push_back({DominatorTree::Insert, BBSubs[i], BBNE});
+      }
+      Updates.push_back({DominatorTree::Insert, BBNE, BBCI});
+      DTU->applyUpdates(Updates);
+    }
+  }
+  return true;
+}
+
+static bool inlineLibCalls(Function &F, TargetLibraryInfo &TLI,
+                           const TargetTransformInfo &TTI, DominatorTree &DT,
+                           bool &MadeCFGChange) {
+  MadeCFGChange = false;
+  DomTreeUpdater DTU(&DT, DomTreeUpdater::UpdateStrategy::Lazy);
+
+  bool MadeChange = false;
+
+  Function::iterator CurrBB;
+  for (Function::iterator BB = F.begin(), BE = F.end(); BB != BE;) {
+    CurrBB = BB++;
+
+    for (BasicBlock::iterator II = CurrBB->begin(), IE = CurrBB->end();
+         II != IE; ++II) {
+      CallInst *Call = dyn_cast<CallInst>(&*II);
+      Function *CalledFunc;
+
+      if (!Call || !(CalledFunc = Call->getCalledFunction()))
+        continue;
+
+      if (Call->isNoBuiltin())
+        continue;
+
+      // Skip if function either has local linkage or is not a known library
+      // function.
+      LibFunc LF;
+      if (CalledFunc->hasLocalLinkage() || !TLI.getLibFunc(*CalledFunc, LF) ||
+          !TLI.has(LF))
+        continue;
+
+      switch (LF) {
+      case LibFunc_strcmp:
+      case LibFunc_strncmp: {
+        auto &DL = F.getParent()->getDataLayout();
+        if (StrNCmpInliner(Call, LF, BB, &DTU, DL).optimizeStrNCmp()) {
+          MadeCFGChange = true;
+          break;
+        }
+        continue;
+      }
+      default:
+        continue;
+      }
+
+      MadeChange = true;
+      break;
+    }
+  }
+
+  return MadeChange;
+}
+
 /// This is the entry point for folds that could be implemented in regular
 /// InstCombine, but they are separated because they are not expected to
 /// occur frequently and/or have more than a constant-length pattern match.
@@ -969,11 +1216,12 @@ static bool foldUnusualPatterns(Function &F, DominatorTree &DT,
 /// handled in the callers of this function.
 static bool runImpl(Function &F, AssumptionCache &AC, TargetTransformInfo &TTI,
                     TargetLibraryInfo &TLI, DominatorTree &DT,
-                    AliasAnalysis &AA) {
+                    AliasAnalysis &AA, bool &MadeCFGChange) {
   bool MadeChange = false;
   const DataLayout &DL = F.getParent()->getDataLayout();
   TruncInstCombine TIC(AC, TLI, DL, DT);
   MadeChange |= TIC.run(F);
+  MadeChange |= inlineLibCalls(F, TLI, TTI, DT, MadeCFGChange);
   MadeChange |= foldUnusualPatterns(F, DT, TTI, TLI, AA, AC);
   return MadeChange;
 }
@@ -985,12 +1233,16 @@ PreservedAnalyses AggressiveInstCombinePass::run(Function &F,
   auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
   auto &TTI = AM.getResult<TargetIRAnalysis>(F);
   auto &AA = AM.getResult<AAManager>(F);
-  if (!runImpl(F, AC, TTI, TLI, DT, AA)) {
+  bool MadeCFGChange = false;
+  if (!runImpl(F, AC, TTI, TLI, DT, AA, MadeCFGChange)) {
     // No changes, all analyses are preserved.
     return PreservedAnalyses::all();
   }
   // Mark all the analyses that instcombine updates as preserved.
   PreservedAnalyses PA;
-  PA.preserveSet<CFGAnalyses>();
+  if (MadeCFGChange)
+    PA.preserve<DominatorTreeAnalysis>();
+  else
+    PA.preserveSet<CFGAnalyses>();
   return PA;
 }
diff --git a/llvm/test/Transforms/AggressiveInstCombine/strcmp.ll b/llvm/test/Transforms/AggressiveInstCombine/strcmp.ll
deleted file mode 100644
index 99dd450e6f44e6..00000000000000
--- a/llvm/test/Transforms/AggressiveInstCombine/strcmp.ll
+++ /dev/null
@@ -1,219 +0,0 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt < %s -passes=aggressive-instcombine -S | FileCheck %s
-
-declare i32 @strcmp(ptr, ptr)
-
-@s0 = constant [1 x i8] c"\00"
-@s1 = constant [2 x i8] c"0\00"
-@s2 = constant [3 x i8] c"01\00"
-@s3 = constant [4 x i8] c"012\00"
-@s4 = constant [5 x i8] c"0123\00"
-
-; Expand strcmp(C, "x"), strcmp(C, "xy").
-
-define i1 @expand_strcmp_s0(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_s0(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s0)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s0)
-  %cmp = icmp eq i32 %call, 0
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_eq_s1(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_eq_s1(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s1)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s1)
-  %cmp = icmp eq i32 %call, 0
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_eq_s1_commuted(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_eq_s1_commuted(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr @s1, ptr [[C:%.*]])
-; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr @s1, ptr %C)
-  %cmp = icmp eq i32 %call, 0
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_ne_s1(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_ne_s1(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s1)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp ne i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s1)
-  %cmp = icmp ne i32 %call, 0
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_sgt_s1(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_sgt_s1(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s1)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp sgt i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s1)
-  %cmp = icmp sgt i32 %call, 0
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_sge_s1(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_sge_s1(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s1)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp sge i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s1)
-  %cmp = icmp sge i32 %call, 0
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_slt_s1(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_slt_s1(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s1)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s1)
-  %cmp = icmp slt i32 %call, 0
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_sle_s1(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_sle_s1(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s1)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp sle i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s1)
-  %cmp = icmp sle i32 %call, 0
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_s1_fail_1(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_s1_fail_1(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s1)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[CALL]], 1
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s1)
-  %cmp = icmp eq i32 %call, 1
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_s1_fail_2(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_s1_fail_2(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr @s1, ptr @s1)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr @s1, ptr @s1)
-  %cmp = icmp eq i32 %call, 0
-  ret i1 %cmp
-}
-
-define i32 @expand_strcmp_s1_fail_3(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_s1_fail_3(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s1)
-; CHECK-NEXT:    ret i32 [[CALL]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s1)
-  ret i32 %call
-}
-
-define i1 @expand_strcmp_eq_s2(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_eq_s2(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s2)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s2)
-  %cmp = icmp eq i32 %call, 0
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_ne_s2(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_ne_s2(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s2)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp ne i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s2)
-  %cmp = icmp ne i32 %call, 0
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_sgt_s2(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_sgt_s2(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s2)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp sgt i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s2)
-  %cmp = icmp sgt i32 %call, 0
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_sge_s2(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_sge_s2(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s2)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp sge i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s2)
-  %cmp = icmp sge i32 %call, 0
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_slt_s2(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_slt_s2(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s2)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s2)
-  %cmp = icmp slt i32 %call, 0
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_sle_s2(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_sle_s2(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s2)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp sle i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s2)
-  %cmp = icmp sle i32 %call, 0
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_s3(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_s3(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s3)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s3)
-  %cmp = icmp eq i32 %call, 0
-  ret i1 %cmp
-}
-
-define i1 @expand_strcmp_s4(ptr %C) {
-; CHECK-LABEL: @expand_strcmp_s4(
-; CHECK-NEXT:    [[CALL:%.*]] = call i32 @strcmp(ptr [[C:%.*]], ptr @s4)
-; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[CALL]], 0
-; CHECK-NEXT:    ret i1 [[CMP]]
-;
-  %call = call i32 @strcmp(ptr %C, ptr @s4)
-  %cmp = icmp eq i32 %call, 0
-  ret i1 %cmp
-}
diff --git a/llvm/test/Transforms/AggressiveInstCombine/strncmp-1.ll b/llvm/test/Transforms/AggressiveInstCombine/strncmp-1.ll
new file mode 100644
index 00000000000000..4679d6d7fca143
--- /dev/null
+++ b/llvm/test/Transforms/AggressiveInstCombine/strncmp-1.ll
@@ -0,0 +1,203 @@
+; RUN: opt -S -passes=aggressive-instcombine -strncmp-inline-threshold=3 < %s | FileCheck %s
+
+declare i32 @strncmp(ptr nocapture, ptr nocapture, i64)
+declare i32 @strcmp(ptr nocapture, ptr nocapture)
+
+@.str = private unnamed_addr constant [3 x i8] c"ab\00", align 1
+@.str.1 = private unnamed_addr constant [2 x i8] c"a\00", align 1
+
+define i32 @test_strncmp_1(ptr nocapture readonly %s) {
+; CHECK-LABEL: @test_strncmp_1(
+; CHECK-NEXT:  entry.before:
+; CHECK-NEXT:    br label [[SUB:%.*]]
+; CHECK:       sub:
+; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds i8, ptr [[S:%.*]], i64 0
+; CHECK-NEXT:    [[TMP1:%.*]] = load i8, ptr [[TMP0]], align 1
+; CHECK-NEXT:    [[TMP2:%.*]] = zext i8 [[TMP1]] to i32
+; CHECK-NEXT:    [[TMP3:%.*]] = sub i32 97, [[TMP2]]
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp ne i32 [[TMP3]], 0
+; CHECK-NEXT:    br i1 [[TMP4]], label [[SUB2:%.*]], label [[SUB1:%.*]]
+; CHECK:       sub1:
+; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i8, ptr [[S]], i64 1
+; CHECK-NEXT:    [[TMP6:%.*]] = load i8, ptr [[TMP5]], align 1
+; CHECK-NEXT:    [[TMP7:%.*]] = zext i8 [[TMP6]] to i32
+; CHECK-NEXT:    [[TMP8:%.*]] = sub i32 98, [[TMP7]]
+; CHECK-NEXT:    br label [[SUB2]]
+; CHECK:       sub2:
+; CHECK-NEXT:    [[TMP9:%.*]] = phi i32 [ [[TMP3]], [[SUB]] ], [ [[TMP8]], [[SUB1]] ]
+; CHECK-NEXT:    br label [[ENTRY:%.*]]
+; CHECK:       entry:
+; CHECK-NEXT:    ret i32 [[TMP9]]
+;
+entry:
+  %call = tail call i32 @strncmp(ptr nonnull dereferenceable(3) @.str, ptr nonnull dereferenceable(1) %s, i64 2)
+  ret i32 %call
+}
+
+define i32 @test_strncmp_2(ptr nocapture readonly %s) {
+; CHECK-LABEL: @test_strncmp_2(
+; CHECK-NEXT:  entry.before:
+; CHECK-NEXT:    br label [[SUB:%.*]]
+; CHECK:       sub:
+; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds i8, ptr [[S:%.*]], i64 0
+; CHECK-NEXT:    [[TMP1:%.*]] = load i8, ptr [[TMP0]], align 1
+; CHECK-NEXT:    [[TMP2:%.*]] = zext i8 [[TMP1]] to i32
+; CHECK-NEXT:    [[TMP3:%.*]] = sub i32 97, [[TMP2]]
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp ne i32 [[TMP3]], 0
+; CHECK-NEXT:    br i1 [[TMP4]], label [[SUB3:%.*]], label [[SUB1:%.*]]
+; CHECK:       sub1:
+; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i8, ptr [[S]], i64 1
+; CHECK-NEXT:    [[TMP6:%.*]] = load i8, ptr [[TMP5]], align 1
+; CHECK-NEXT:    [[TMP7:%.*]] = zext i8 [[TMP6]] to i32
+; CHECK-NEXT:    [[TMP8:%.*]] = sub i32 98, [[TMP7]]
+; CHECK-NEXT:    [[TMP9:%.*]] = icmp ne i32 [[TMP8]], 0
+; CHECK-NEXT:    br i1 [[TMP9]], label [[SUB3]], label [[SUB2:%.*]]
+; CHECK:       sub2:
+; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr inbounds i8, ptr [[S]], i64 2
+; CHECK-NEXT:    [[TMP11:%.*]] = load i8, ptr [[TMP10]], align 1
+; CHECK-NEXT:    [[TMP12:%.*]] = zext i8 [[TMP11]] to i32
+; CHECK-NEXT:    [[TMP13:%.*]] = sub i32 0, [[TMP12]]
+; CHECK-NEXT:    br label [[SUB3]]
+; CHECK:       sub3:
+; CHECK-NEXT:    [[TMP14:%.*]] = phi i32 [ [[TMP3]], [[SUB]] ], [ [[TMP8]], [[SUB1]] ], [ ...
[truncated]

@nikic nikic requested review from nikic and dtcxzyw April 19, 2024 11:48
@firewave
Copy link

See #58003 and 0b779b0 for a previous attempt.

* add missing test cases
* remove redundant code
@nikic
Copy link
Contributor

nikic commented Apr 23, 2024

It looks like this patch crashes when compiling lencod from llvm-test-suite in ReleaseLTO-g configuration: https://llvm-compile-time-tracker.com/show_error.php?commit=73bf9d626bf3fb7a43453f6731f1aecab83177b4

@FLZ101 FLZ101 force-pushed the feature-strncmp-inlining-ai branch from 666c908 to e2d3521 Compare April 26, 2024 02:54
@FLZ101
Copy link
Contributor Author

FLZ101 commented Apr 26, 2024

It looks like this patch crashes when compiling lencod from llvm-test-suite in ReleaseLTO-g configuration: https://llvm-compile-time-tracker.com/show_error.php?commit=73bf9d626bf3fb7a43453f6731f1aecab83177b4

It seems to be a bug related to the new DebugInfo, I found the change as below could fix it.

--- a/llvm/lib/IR/BasicBlock.cpp
+++ b/llvm/lib/IR/BasicBlock.cpp
@@ -1009,7 +1009,7 @@ void BasicBlock::spliceDebugInfoImpl(BasicBlock::iterator Dest, BasicBlock *Src,
     // generate the iterator with begin() / getFirstInsertionPt(), it means
     // any trailing debug-info at the end of the block would "normally" have
     // been pushed in front of "First". Move it there now.
-    DbgMarker *FirstMarker = getMarker(First);
+    DbgMarker *FirstMarker = createMarker(First);
     DbgMarker *TrailingDbgRecords = getTrailingDbgRecords();
     if (TrailingDbgRecords) {
       FirstMarker->absorbDebugValues(*TrailingDbgRecords, true);

I am preparing a minimal test case and will create a new pull request to fix this bug soon.

@FLZ101 FLZ101 force-pushed the feature-strncmp-inlining-ai branch from ede43f9 to e2d3521 Compare April 27, 2024 03:38
@FLZ101
Copy link
Contributor Author

FLZ101 commented Apr 27, 2024

It looks like this patch crashes when compiling lencod from llvm-test-suite in ReleaseLTO-g configuration: https://llvm-compile-time-tracker.com/show_error.php?commit=73bf9d626bf3fb7a43453f6731f1aecab83177b4

Shoule have been fixed by #90312

Copy link

github-actions bot commented May 1, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

* isOnlyUsedInZeroComparison
* more tests
* ...
@FLZ101 FLZ101 force-pushed the feature-strncmp-inlining-ai branch from 436675a to d5769df Compare May 1, 2024 04:18
@nikic
Copy link
Contributor

nikic commented May 1, 2024

Looks like there is a hang while building the stage 2 compiler now: https://llvm-compile-time-tracker.com/show_error.php?commit=c690fb9bf301e84b21a239dfafb278bab262a762

@FLZ101
Copy link
Contributor Author

FLZ101 commented May 1, 2024

Looks like there is a hang while building the stage 2 compiler now: https://llvm-compile-time-tracker.com/show_error.php?commit=c690fb9bf301e84b21a239dfafb278bab262a762

I should have known what is wrong. How can I run these tests before I push the fix?

@fhahn
Copy link
Contributor

fhahn commented May 1, 2024

Looks like there is a hang while building the stage 2 compiler now: https://llvm-compile-time-tracker.com/show_error.php?commit=c690fb9bf301e84b21a239dfafb278bab262a762

I should have known what is wrong. How can I run these tests before I push the fix?

I'd recommend you try and reproduce this locally, which you'll need to investigate. To do that, you need to do a 2 stage build

  1. build clang using your system compiler
  2. build LLVM/Clang again with the clang you just built

* use SplitBlock rather than splitBlockBefore to
  avoid invalidating the Instruction iterator
@FLZ101
Copy link
Contributor Author

FLZ101 commented May 1, 2024

Looks like there is a hang while building the stage 2 compiler now: https://llvm-compile-time-tracker.com/show_error.php?commit=c690fb9bf301e84b21a239dfafb278bab262a762

Should have been fixed by latest commit 4ca7ba9. Tested by doing a two stage build locally.

@FLZ101 FLZ101 force-pushed the feature-strncmp-inlining-ai branch from 286f9fe to 8e63645 Compare May 2, 2024 10:21
; CHECK-NEXT: [[TMP8:%.*]] = icmp ne i32 [[TMP7]], 0
; CHECK-NEXT: br i1 [[TMP8]], label [[NE]], label [[SUB2:%.*]]
; CHECK: sub_2:
; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, ptr [[S]], i64 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the inbounds on %s valid here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, otherwise the strncmp call would be UB.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I guess, was thinking we only have derefeanceable(1) on %s, but the entire transform is premised on standard definition of strncmp so you're right.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

llvm/test/Transforms/AggressiveInstCombine/strncmp-2.ll Outdated Show resolved Hide resolved
llvm/test/Transforms/AggressiveInstCombine/strncmp-2.ll Outdated Show resolved Hide resolved
@nikic nikic merged commit 6b94870 into llvm:main May 3, 2024
3 of 4 checks passed
@mstorsjo
Copy link
Member

mstorsjo commented May 5, 2024

This caused a miscompile in ffmpeg for x86 platforms.

It's reproducible on both Linux and MinGW, for both i686 and x86_64. To reproduce on linux on x86:

$ git clone https://github.com/ffmpeg/ffmpeg
$ mkdir ffmpeg-build
$ cd ffmpeg-build
$ ../ffmpeg/configure --cc=clang --samples=$(pwd)/../samples
$ make fate-rsync # download test samples
$ make -j$(nproc) fate-subtitles

The miscompilation lies in the object file libavformat/subtitles.o.

I presume we should revert this until the issue has been resolved?

@FLZ101
Copy link
Contributor Author

FLZ101 commented May 6, 2024

This caused a miscompile in ffmpeg for x86 platforms.

It's reproducible on both Linux and MinGW, for both i686 and x86_64. To reproduce on linux on x86:

$ git clone https://github.com/ffmpeg/ffmpeg
$ mkdir ffmpeg-build
$ cd ffmpeg-build
$ ../ffmpeg/configure --cc=clang --samples=$(pwd)/../samples
$ make fate-rsync # download test samples
$ make -j$(nproc) fate-subtitles

The miscompilation lies in the object file libavformat/subtitles.o.

I presume we should revert this until the issue has been resolved?

I'll check this issue soon.

This optimization could be disabled by specifying "-mllvm -strncmp-inline-threshold=0" and/or "-Wl,-mllvm,-strncmp-inline-threshold=0".

@mstorsjo
Copy link
Member

mstorsjo commented May 6, 2024

Hmm, did you remove the message about you being unable to reproduce the issue, or is github glitching?

Anyway, I re-reproduced the issue, with clang at d70267f as in your case.

To simplify the reproducing and get rid of some potential variables, you can grab https://martin.st/temp/subtitles-preproc.c, compile it with clang -O2 subtitles-preproc.c -c -o libavformat/subtitles.o on top of your existing build, then rerun those tests with make -j$(nproc) fate-subtitles. Without the -O2, or with an older clang, the tests run successfully.

@FLZ101
Copy link
Contributor Author

FLZ101 commented May 6, 2024

Sorry, I thought the issue was a compilation error. Then I realized it was about test failures. Yes, I can reproduce it.

@FLZ101
Copy link
Contributor Author

FLZ101 commented May 6, 2024

I should have known what is wrong, and will push the fix soon.

@FLZ101
Copy link
Contributor Author

FLZ101 commented May 6, 2024

Should have been fixed by #91204.

@mstorsjo
Copy link
Member

mstorsjo commented May 7, 2024

Should have been fixed by #91204.

Thanks! I can confirm that the issue is gone now, in all of my regularly tested setups.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants