Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LICM] Support integer mul/add in hoistFPAssociation. #67736

Merged
merged 5 commits into from
Feb 12, 2024

Conversation

topperc
Copy link
Collaborator

@topperc topperc commented Sep 28, 2023

The reassociation this is trying to repair can happen for integer types too.

This patch adds support for integer mul/add to hoistFPAssociation. The function has been renamed to hoistMulAddAssociation. I've used separate statistics and limits for integer to allow tuning flexibility.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 28, 2023

@llvm/pr-subscribers-llvm-transforms

Changes

This is a naive copy/paste of the FP code. We could do more integration to share code.

Posting for discussion.


Patch is 70.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67736.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LICM.cpp (+74)
  • (added) llvm/test/Transforms/LICM/expr-reassociate-int.ll (+1078)
diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp
index 4cb70cbdf093b36..d3f3cf84b4bcc1c 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -110,6 +110,9 @@ STATISTIC(NumAddSubHoisted, "Number of add/subtract expressions reassociated "
                             "and hoisted out of the loop");
 STATISTIC(NumFPAssociationsHoisted, "Number of invariant FP expressions "
                                     "reassociated and hoisted out of the loop");
+STATISTIC(NumIntAssociationsHoisted,
+          "Number of invariant int expressions "
+          "reassociated and hoisted out of the loop");
 
 /// Memory promotion is enabled by default.
 static cl::opt<bool>
@@ -135,6 +138,12 @@ cl::opt<unsigned> FPAssociationUpperLimit(
         "Set upper limit for the number of transformations performed "
         "during a single round of hoisting the reassociated expressions."));
 
+cl::opt<unsigned> IntAssociationUpperLimit(
+    "licm-max-num-int-reassociations", cl::init(5U), cl::Hidden,
+    cl::desc(
+        "Set upper limit for the number of transformations performed "
+        "during a single round of hoisting the reassociated expressions."));
+
 // Experimental option to allow imprecision in LICM in pathological cases, in
 // exchange for faster compile. This is to be removed if MemorySSA starts to
 // address the same issue. LICM calls MemorySSAWalker's
@@ -2719,6 +2728,65 @@ static bool hoistFPAssociation(Instruction &I, Loop &L,
   return true;
 }
 
+static bool hoistIntAssociation(Instruction &I, Loop &L,
+                                ICFLoopSafetyInfo &SafetyInfo,
+                                MemorySSAUpdater &MSSAU, AssumptionCache *AC,
+                                DominatorTree *DT) {
+  using namespace PatternMatch;
+  Value *VariantOp = nullptr, *InvariantOp = nullptr;
+
+  if (!match(&I, m_Mul(m_Value(VariantOp), m_Value(InvariantOp))))
+    return false;
+  if (L.isLoopInvariant(VariantOp))
+    std::swap(VariantOp, InvariantOp);
+  if (L.isLoopInvariant(VariantOp) || !L.isLoopInvariant(InvariantOp))
+    return false;
+  Value *Factor = InvariantOp;
+
+  // First, we need to make sure we should do the transformation.
+  SmallVector<Use *> Changes;
+  SmallVector<BinaryOperator *> Worklist;
+  if (BinaryOperator *VariantBinOp = dyn_cast<BinaryOperator>(VariantOp))
+    Worklist.push_back(VariantBinOp);
+  while (!Worklist.empty()) {
+    BinaryOperator *BO = Worklist.pop_back_val();
+    if (!BO->hasOneUse())
+      return false;
+    BinaryOperator *Op0, *Op1;
+    if (match(BO, m_Add(m_BinOp(Op0), m_BinOp(Op1)))) {
+      Worklist.push_back(Op0);
+      Worklist.push_back(Op1);
+      continue;
+    }
+    if (BO->getOpcode() != Instruction::Mul || L.isLoopInvariant(BO))
+      return false;
+    Use &U0 = BO->getOperandUse(0);
+    Use &U1 = BO->getOperandUse(1);
+    if (L.isLoopInvariant(U0))
+      Changes.push_back(&U0);
+    else if (L.isLoopInvariant(U1))
+      Changes.push_back(&U1);
+    else
+      return false;
+    if (Changes.size() > IntAssociationUpperLimit)
+      return false;
+  }
+  if (Changes.empty())
+    return false;
+
+  // We know we should do it so let's do the transformation.
+  auto *Preheader = L.getLoopPreheader();
+  assert(Preheader && "Loop is not in simplify form?");
+  IRBuilder<> Builder(Preheader->getTerminator());
+  for (auto *U : Changes) {
+    assert(L.isLoopInvariant(U->get()));
+    U->set(Builder.CreateMul(U->get(), Factor, "factor.op.mul"));
+  }
+  I.replaceAllUsesWith(VariantOp);
+  eraseInstruction(I, SafetyInfo, MSSAU);
+  return true;
+}
+
 static bool hoistArithmetics(Instruction &I, Loop &L,
                              ICFLoopSafetyInfo &SafetyInfo,
                              MemorySSAUpdater &MSSAU, AssumptionCache *AC,
@@ -2752,6 +2820,12 @@ static bool hoistArithmetics(Instruction &I, Loop &L,
     return true;
   }
 
+  if (hoistIntAssociation(I, L, SafetyInfo, MSSAU, AC, DT)) {
+    ++NumHoisted;
+    ++NumIntAssociationsHoisted;
+    return true;
+  }
+
   return false;
 }
 
diff --git a/llvm/test/Transforms/LICM/expr-reassociate-int.ll b/llvm/test/Transforms/LICM/expr-reassociate-int.ll
new file mode 100644
index 000000000000000..556bcfa4dbfd0bc
--- /dev/null
+++ b/llvm/test/Transforms/LICM/expr-reassociate-int.ll
@@ -0,0 +1,1078 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; RUN: opt -passes='reassociate' -S < %s | FileCheck %s --check-prefix=REASSOCIATE_ONLY
+; RUN: opt -passes='licm' -S < %s | FileCheck %s --check-prefix=LICM_ONLY
+; RUN: opt -passes='licm' -licm-max-num-int-reassociations=1 -S < %s | FileCheck %s --check-prefix=LICM_ONLY_CONSTRAINED
+; RUN: opt -passes='reassociate,loop-mssa(licm)' -S < %s | FileCheck %s --check-prefix=LICM_AFTER_REASSOCIATE
+; RUN: opt -passes='reassociate,loop-mssa(licm)' -licm-max-num-int-reassociations=1 -S < %s | FileCheck %s --check-prefix=LICM_AFTER_REASSOCIATE_CONSTRAINED
+
+;
+; A simple loop, should not get modified:
+;
+;  int j;
+;  const uint64_t d1d = d1 * delta;
+;
+;  for (j = 0; j <= i; j++)
+;    cells[j] = d1d * cells[j + 1];
+;
+
+define void @innermost_loop_1d(i32 %i, i64 %d1, i64 %delta, ptr %cells) {
+; REASSOCIATE_ONLY-LABEL: define void @innermost_loop_1d
+; REASSOCIATE_ONLY-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; REASSOCIATE_ONLY-NEXT:  entry:
+; REASSOCIATE_ONLY-NEXT:    [[FMUL_D1:%.*]] = mul i64 [[DELTA]], [[D1]]
+; REASSOCIATE_ONLY-NEXT:    br label [[FOR_COND:%.*]]
+; REASSOCIATE_ONLY:       for.cond:
+; REASSOCIATE_ONLY-NEXT:    [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; REASSOCIATE_ONLY-NEXT:    [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; REASSOCIATE_ONLY-NEXT:    br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; REASSOCIATE_ONLY:       for.body:
+; REASSOCIATE_ONLY-NEXT:    [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; REASSOCIATE_ONLY-NEXT:    [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; REASSOCIATE_ONLY-NEXT:    [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; REASSOCIATE_ONLY-NEXT:    [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; REASSOCIATE_ONLY-NEXT:    [[FMUL_1:%.*]] = mul i64 [[FMUL_D1]], [[CELL_1]]
+; REASSOCIATE_ONLY-NEXT:    [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; REASSOCIATE_ONLY-NEXT:    [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; REASSOCIATE_ONLY-NEXT:    store i64 [[FMUL_1]], ptr [[ARRAYIDX_J]], align 8
+; REASSOCIATE_ONLY-NEXT:    br label [[FOR_COND]]
+; REASSOCIATE_ONLY:       for.end:
+; REASSOCIATE_ONLY-NEXT:    ret void
+;
+; LICM_ONLY-LABEL: define void @innermost_loop_1d
+; LICM_ONLY-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_ONLY-NEXT:  entry:
+; LICM_ONLY-NEXT:    [[FMUL_D1:%.*]] = mul i64 [[D1]], [[DELTA]]
+; LICM_ONLY-NEXT:    br label [[FOR_COND:%.*]]
+; LICM_ONLY:       for.cond:
+; LICM_ONLY-NEXT:    [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; LICM_ONLY-NEXT:    [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; LICM_ONLY-NEXT:    br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; LICM_ONLY:       for.body:
+; LICM_ONLY-NEXT:    [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; LICM_ONLY-NEXT:    [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; LICM_ONLY-NEXT:    [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; LICM_ONLY-NEXT:    [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; LICM_ONLY-NEXT:    [[FMUL_1:%.*]] = mul i64 [[FMUL_D1]], [[CELL_1]]
+; LICM_ONLY-NEXT:    [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; LICM_ONLY-NEXT:    [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; LICM_ONLY-NEXT:    store i64 [[FMUL_1]], ptr [[ARRAYIDX_J]], align 8
+; LICM_ONLY-NEXT:    br label [[FOR_COND]]
+; LICM_ONLY:       for.end:
+; LICM_ONLY-NEXT:    ret void
+;
+; LICM_ONLY_CONSTRAINED-LABEL: define void @innermost_loop_1d
+; LICM_ONLY_CONSTRAINED-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_ONLY_CONSTRAINED-NEXT:  entry:
+; LICM_ONLY_CONSTRAINED-NEXT:    [[FMUL_D1:%.*]] = mul i64 [[D1]], [[DELTA]]
+; LICM_ONLY_CONSTRAINED-NEXT:    br label [[FOR_COND:%.*]]
+; LICM_ONLY_CONSTRAINED:       for.cond:
+; LICM_ONLY_CONSTRAINED-NEXT:    [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; LICM_ONLY_CONSTRAINED-NEXT:    [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; LICM_ONLY_CONSTRAINED-NEXT:    br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; LICM_ONLY_CONSTRAINED:       for.body:
+; LICM_ONLY_CONSTRAINED-NEXT:    [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; LICM_ONLY_CONSTRAINED-NEXT:    [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; LICM_ONLY_CONSTRAINED-NEXT:    [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; LICM_ONLY_CONSTRAINED-NEXT:    [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; LICM_ONLY_CONSTRAINED-NEXT:    [[FMUL_1:%.*]] = mul i64 [[FMUL_D1]], [[CELL_1]]
+; LICM_ONLY_CONSTRAINED-NEXT:    [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; LICM_ONLY_CONSTRAINED-NEXT:    [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; LICM_ONLY_CONSTRAINED-NEXT:    store i64 [[FMUL_1]], ptr [[ARRAYIDX_J]], align 8
+; LICM_ONLY_CONSTRAINED-NEXT:    br label [[FOR_COND]]
+; LICM_ONLY_CONSTRAINED:       for.end:
+; LICM_ONLY_CONSTRAINED-NEXT:    ret void
+;
+; LICM_AFTER_REASSOCIATE-LABEL: define void @innermost_loop_1d
+; LICM_AFTER_REASSOCIATE-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_AFTER_REASSOCIATE-NEXT:  entry:
+; LICM_AFTER_REASSOCIATE-NEXT:    [[FMUL_D1:%.*]] = mul i64 [[DELTA]], [[D1]]
+; LICM_AFTER_REASSOCIATE-NEXT:    br label [[FOR_COND:%.*]]
+; LICM_AFTER_REASSOCIATE:       for.cond:
+; LICM_AFTER_REASSOCIATE-NEXT:    [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; LICM_AFTER_REASSOCIATE-NEXT:    [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; LICM_AFTER_REASSOCIATE-NEXT:    br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; LICM_AFTER_REASSOCIATE:       for.body:
+; LICM_AFTER_REASSOCIATE-NEXT:    [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; LICM_AFTER_REASSOCIATE-NEXT:    [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; LICM_AFTER_REASSOCIATE-NEXT:    [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; LICM_AFTER_REASSOCIATE-NEXT:    [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; LICM_AFTER_REASSOCIATE-NEXT:    [[FMUL_1:%.*]] = mul i64 [[FMUL_D1]], [[CELL_1]]
+; LICM_AFTER_REASSOCIATE-NEXT:    [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; LICM_AFTER_REASSOCIATE-NEXT:    [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; LICM_AFTER_REASSOCIATE-NEXT:    store i64 [[FMUL_1]], ptr [[ARRAYIDX_J]], align 8
+; LICM_AFTER_REASSOCIATE-NEXT:    br label [[FOR_COND]]
+; LICM_AFTER_REASSOCIATE:       for.end:
+; LICM_AFTER_REASSOCIATE-NEXT:    ret void
+;
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-LABEL: define void @innermost_loop_1d
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:  entry:
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:    [[FMUL_D1:%.*]] = mul i64 [[DELTA]], [[D1]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:    br label [[FOR_COND:%.*]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED:       for.cond:
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:    [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:    [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:    br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED:       for.body:
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:    [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:    [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:    [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:    [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:    [[FMUL_1:%.*]] = mul i64 [[FMUL_D1]], [[CELL_1]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:    [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:    [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:    store i64 [[FMUL_1]], ptr [[ARRAYIDX_J]], align 8
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:    br label [[FOR_COND]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED:       for.end:
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT:    ret void
+;
+entry:
+  %fmul.d1 = mul i64 %d1, %delta
+  br label %for.cond
+
+for.cond:
+  %j = phi i32 [ 0, %entry ], [ %add.j.1, %for.body ]
+  %cmp.not = icmp sgt i32 %j, %i
+  br i1 %cmp.not, label %for.end, label %for.body
+
+for.body:
+  %add.j.1 = add nuw nsw i32 %j, 1
+  %idxprom.j.1 = zext i32 %add.j.1 to i64
+  %arrayidx.j.1 = getelementptr inbounds i64, ptr %cells, i64 %idxprom.j.1
+  %cell.1 = load i64, ptr %arrayidx.j.1, align 8
+  %fmul.1 = mul i64 %fmul.d1, %cell.1
+  %idxprom.j = zext i32 %j to i64
+  %arrayidx.j = getelementptr inbounds i64, ptr %cells, i64 %idxprom.j
+  store i64 %fmul.1, ptr %arrayidx.j, align 8
+  br label %for.cond
+
+for.end:
+  ret void
+}
+
+;
+; A simple loop:
+;
+;  int j;
+;
+;  for (j = 0; j <= i; j++)
+;    cells[j] = d1 * cells[j + 1] * delta;
+;
+; ...should be transformed by the LICM pass into this:
+;
+;  int j;
+;  const uint64_t d1d = d1 * delta;
+;
+;  for (j = 0; j <= i; j++)
+;    cells[j] = d1d * cells[j + 1];
+;
+
+define void @innermost_loop_1d_shouldhoist(i32 %i, i64 %d1, i64 %delta, ptr %cells) {
+; REASSOCIATE_ONLY-LABEL: define void @innermost_loop_1d_shouldhoist
+; REASSOCIATE_ONLY-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; REASSOCIATE_ONLY-NEXT:  entry:
+; REASSOCIATE_ONLY-NEXT:    br label [[FOR_COND:%.*]]
+; REASSOCIATE_ONLY:       for.cond:
+; REASSOCIATE_ONLY-NEXT:    [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; REASSOCIATE_ONLY-NEXT:    [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; REASSOCIATE_ONLY-NEXT:    br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; REASSOCIATE_ONLY:       for.body:
+; REASSOCIATE_ONLY-NEXT:    [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; REASSOCIATE_ONLY-NEXT:    [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; REASSOCIATE_ONLY-NEXT:    [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; REASSOCIATE_ONLY-NEXT:    [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; REASSOCIATE_ONLY-NEXT:    [[FMUL_1:%.*]] = mul i64 [[DELTA]], [[D1]]
+; REASSOCIATE_ONLY-NEXT:    [[FMUL_2:%.*]] = mul i64 [[FMUL_1]], [[CELL_1]]
+; REASSOCIATE_ONLY-NEXT:    [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; REASSOCIATE_ONLY-NEXT:    [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; REASSOCIATE_ONLY-NEXT:    store i64 [[FMUL_2]], ptr [[ARRAYIDX_J]], align 8
+; REASSOCIATE_ONLY-NEXT:    br label [[FOR_COND]]
+; REASSOCIATE_ONLY:       for.end:
+; REASSOCIATE_ONLY-NEXT:    ret void
+;
+; LICM_ONLY-LABEL: define void @innermost_loop_1d_shouldhoist
+; LICM_ONLY-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_ONLY-NEXT:  entry:
+; LICM_ONLY-NEXT:    [[FACTOR_OP_MUL:%.*]] = mul i64 [[D1]], [[DELTA]]
+; LICM_ONLY-NEXT:    br label [[FOR_COND:%.*]]
+; LICM_ONLY:       for.cond:
+; LICM_ONLY-NEXT:    [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; LICM_ONLY-NEXT:    [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; LICM_ONLY-NEXT:    br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; LICM_ONLY:       for.body:
+; LICM_ONLY-NEXT:    [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; LICM_ONLY-NEXT:    [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; LICM_ONLY-NEXT:    [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; LICM_ONLY-NEXT:    [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; LICM_ONLY-NEXT:    [[FMUL_1:%.*]] = mul i64 [[FACTOR_OP_MUL]], [[CELL_1]]
+; LICM_ONLY-NEXT:    [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; LICM_ONLY-NEXT:    [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; LICM_ONLY-NEXT:    store i64 [[FMUL_1]], ptr [[ARRAYIDX_J]], align 8
+; LICM_ONLY-NEXT:    br label [[FOR_COND]]
+; LICM_ONLY:       for.end:
+; LICM_ONLY-NEXT:    ret void
+;
+; LICM_ONLY_CONSTRAINED-LABEL: define void @innermost_loop_1d_shouldhoist
+; LICM_ONLY_CONSTRAINED-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_ONLY_CONSTRAINED-NEXT:  entry:
+; LICM_ONLY_CONSTRAINED-NEXT:    [[FACTOR_OP_MUL:%.*]] = mul i64 [[D1]], [[DELTA]]
+; LICM_ONLY_CONSTRAINED-NEXT:    br label [[FOR_COND:%.*]]
+; LICM_ONLY_CONSTRAINED:       for.cond:
+; LICM_ONLY_CONSTRAINED-NEXT:    [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; LICM_ONLY_CONSTRAINED-NEXT:    [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; LICM_ONLY_CONSTRAINED-NEXT:    br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; LICM_ONLY_CONSTRAINED:       for.body:
+; LICM_ONLY_CONSTRAINED-NEXT:    [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; LICM_ONLY_CONSTRAINED-NEXT:    [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; LICM_ONLY_CONSTRAINED-NEXT:    [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; LICM_ONLY_CONSTRAINED-NEXT:    [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; LICM_ONLY_CONSTRAINED-NEXT:    [[FMUL_1:%.*]] = mul i64 [[FACTOR_OP_MUL]], [[CELL_1]]
+; LICM_ONLY_CONSTRAINED-NEXT:    [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; LICM_ONLY_CONSTRAINED-NEXT:    [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; LICM_ONLY_CONSTRAINED-NEXT:    store i64 [[FMUL_1]], ptr [[ARRAYIDX_J]], align 8
+; LICM_ONLY_CONSTRAINED-NEXT:    br label [[FOR_COND]]
+; LICM_ONLY_CONSTRAINED:       for.end:
+; LICM_ONLY_CONSTRAINED-NEXT:    ret void
+;
+; LICM_AFTER_REASSOCIATE-LABEL: define void @innermost_loop_1d_shouldhoist
+; LICM_AFTER_REASSOCIATE-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_AFTER_REASSOCIATE-NEXT:  entry:
+; LICM_AFTER_REASSOCIATE-NEXT:    [[FMUL_1:%.*]] = mul i64 [[DELTA]], [[D1]]
+; LICM_AFTER_REASSOCIATE-NEXT:    br label [[FOR_COND:%.*]]
+; LICM_AFTER_REASSOCIATE:       for.cond:
+; LICM_AFTER_REASSOCIATE-NEXT:    [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; LICM_AFTER_REASSOCIATE-NEXT:    [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; LICM_AFTER_REASSOCIATE-NEXT:    br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; LICM_AFTER_REASSOCIATE:       for.body:
+; LICM_AFTER_REASSOCIATE-NEXT:    [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; LICM_AFTER_REASSOCIATE-NEXT:    [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; LICM_AFTER_REASSOCIATE-NEXT:    [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; LICM_AFTER_REASSOCIATE-NEXT:    [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; LICM_AFTER_REASSOCIATE-NEXT:    [[FMUL_2:%.*]] = mul i64 [[FMUL_1]], [[CELL_1]]
+; LICM_AFTER_REASSOCIATE-NEXT:    [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; LICM_AFTER_REASSOCIATE-NEXT:    [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; LICM_AFTER_REASSOCIATE-NEXT:    store i64 [[FMUL_2]], ptr [[ARRAYIDX_J]], align 8
+; LICM_AFTER_REASSOCIATE-NEXT:    br label [[FOR_COND]]
+; LICM_AFTER_REASSOCIATE:       for.end:
+; LICM_AFTER_REASSOCIATE-NEXT:    ret void
+;
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-LABEL: define void @innermost_loop_1d_shouldhoist
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_AF...
[truncated]

Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a high level, this makes exactly as much since to me as the FP variant does.

@@ -2719,6 +2728,65 @@ static bool hoistFPAssociation(Instruction &I, Loop &L,
return true;
}

static bool hoistIntAssociation(Instruction &I, Loop &L,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment as much on the existing code as the new code. We're going through the following steps:
0: ((A1 * B1) + (A2 * B2) + ...) * C where
1: ((A1 * C * B1) + (A2 * C * B2) + ...)
2: ((A1 * C) * B1 + (A2 * C) * B2 + ...)

Isn't the step going from 0 to 1 distribution, not associativity? If so, some comments need clarified.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, if I'm right here about there being a distribution step here, isn't the prior FP code unsound? At least according to wikipedia (https://en.wikipedia.org/wiki/Distributive_property), floating point multiplication does not distribute over addition, and I don't know hasAllowReassoc gives enough freedom. (Since this isn't re-association).

Note that 2s complement integer math doesn't have this problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be explicitly mentioned in LangRef, but we widely assume that reassoc also permits distribution.

}
if (BO->getOpcode() != Instruction::Mul || L.isLoopInvariant(BO))
return false;
Use &U0 = BO->getOperandUse(0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, we should match the entire mul reduce tree, and perform the transform if any of the operands are loop invariant. Again, this is really a generalization on the prior code as well.

@topperc topperc changed the title [LICM][WIP] Make an integer version of hoistFPAssociation. [LICM] Make an integer version of hoistFPAssociation. Feb 2, 2024
@topperc
Copy link
Collaborator Author

topperc commented Feb 2, 2024

@nikic @preames what do you think about the overall direction here?

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic @preames what do you think about the overall direction here?

Seems ok. Anything in particular you're concerned about?

; RUN: opt -passes='licm' -S < %s | FileCheck %s --check-prefix=LICM_ONLY
; RUN: opt -passes='licm' -licm-max-num-int-reassociations=1 -S < %s | FileCheck %s --check-prefix=LICM_ONLY_CONSTRAINED
; RUN: opt -passes='reassociate,loop-mssa(licm)' -S < %s | FileCheck %s --check-prefix=LICM_AFTER_REASSOCIATE
; RUN: opt -passes='reassociate,loop-mssa(licm)' -licm-max-num-int-reassociations=1 -S < %s | FileCheck %s --check-prefix=LICM_AFTER_REASSOCIATE_CONSTRAINED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make the tests LICM only.

@topperc
Copy link
Collaborator Author

topperc commented Feb 2, 2024

@nikic @preames what do you think about the overall direction here?

Seems ok. Anything in particular you're concerned about?

No. My only question now is should I merge integer and FP into the same function?

This is a naive copy/paste of the FP code. We could do more
integration to share code.

Posting for discussion.
llvm/lib/Transforms/Scalar/LICM.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/Scalar/LICM.cpp Outdated Show resolved Hide resolved
Comment on lines +141 to +146
cl::opt<unsigned> IntAssociationUpperLimit(
"licm-max-num-int-reassociations", cl::init(5U), cl::Hidden,
cl::desc(
"Set upper limit for the number of transformations performed "
"during a single round of hoisting the reassociated expressions."));

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the type relevant to the limit? I'm not against the change just wondering if this is a compile time reduction flag whether it's worth just renaming the current one.

@topperc topperc changed the title [LICM] Make an integer version of hoistFPAssociation. [LICM] Support integer mul/add in hoistFPAssociation. Feb 12, 2024
@topperc topperc merged commit 7ff5dfb into llvm:main Feb 12, 2024
3 of 4 checks passed
@topperc topperc deleted the pr/licm-int-reassociate branch February 12, 2024 22:59
@nico
Copy link
Contributor

nico commented Feb 13, 2024

Looks Like this might cause crashes on Mac: http://45.33.8.238/macm1/78525/step_11.txt

Please take a look and revert for now if it takes a while to fix.

@dyung
Copy link
Collaborator

dyung commented Feb 13, 2024

Looks Like this might cause crashes on Mac: http://45.33.8.238/macm1/78525/step_11.txt

Please take a look and revert for now if it takes a while to fix.

I also saw the same crashes on 2 mac bots:
https://lab.llvm.org/staging/#/builders/188/builds/50
https://lab.llvm.org/staging/#/builders/189/builds/378

topperc added a commit that referenced this pull request Feb 13, 2024
This reverts commit 7ff5dfb.

Causing crashes on Mac build bots.
topperc added a commit that referenced this pull request Feb 13, 2024
…)"

With a fix for build bot failure. I was accessing the type of a deleted
Instruction.

Original message:

The reassociation this is trying to repair can happen for integer types
too.

This patch adds support for integer mul/add to hoistFPAssociation. The
function has been renamed to hoistMulAddAssociation. I've used separate
statistics and limits for integer to allow tuning flexibility.
@mikaelholmen
Copy link
Collaborator

I wrote an issue #91957 about a miscompile that starts happening with this patch. I have no idea if this patch is really to blame though or if it just uncovers something already existing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants