Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ConstantHoisting] Add a TTI hook to prevent hoisting. #69004

Merged
merged 2 commits into from
Dec 13, 2023

Conversation

paulwalker-arm
Copy link
Collaborator

Code generation can sometimes simplify expensive operations when
an operand is constant. An example of this is divides on AArch64
where they can be rewritten using a cheaper sequence of multiplies
and subtracts. Doing this is often better than hoisting expensive
constants which are likely to be hoisted by MachineLICM anyway.

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 13, 2023

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-backend-aarch64

Author: Paul Walker (paulwalker-arm)

Changes

Code generation can sometimes simplify expensive operations when
an operand is constant. An example of this is divides on AArch64
where they can be rewritten using a cheaper sequence of multiplies
and subtracts. Doing this is often better than hoisting expensive
constants which are likely to be hoisted by MachineLICM anyway.


Full diff: https://github.com/llvm/llvm-project/pull/69004.diff

7 Files Affected:

  • (modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+18)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+5)
  • (modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+5)
  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+19)
  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h (+4)
  • (modified) llvm/lib/Transforms/Scalar/ConstantHoisting.cpp (+3-1)
  • (modified) llvm/test/Transforms/ConstantHoisting/AArch64/large-immediate.ll (+117-9)
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 5234ef8788d9e96..aa2b7618734d45c 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -986,6 +986,18 @@ class TargetTransformInfo {
   /// more beneficial constant hoisting is).
   InstructionCost getIntImmCodeSizeCost(unsigned Opc, unsigned Idx,
                                         const APInt &Imm, Type *Ty) const;
+
+  /// Returns false if the target prefers to keep an instruction's constant
+  /// operands attached. It is not required for this function to perform
+  /// detailed constant analysis and in fact returning true does not imply
+  /// the instruction has any constant operands. This hook exists soley to
+  /// disable constant hoisting for reasons beyond the cost of generating a
+  /// constant. The motivating example is divides whereby hoisting constants
+  /// prevents the code generator's ability to transform them into combinations
+  /// of simpler operations.
+  bool isCandidateForConstantHoisting(const Instruction &Inst,
+                                      const Function &Fn) const;
+
   /// @}
 
   /// \name Vector Target Information
@@ -1847,6 +1859,8 @@ class TargetTransformInfo::Concept {
   virtual InstructionCost getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx,
                                               const APInt &Imm, Type *Ty,
                                               TargetCostKind CostKind) = 0;
+  virtual bool isCandidateForConstantHoisting(const Instruction &Inst,
+                                              const Function &Fn) const = 0;
   virtual unsigned getNumberOfRegisters(unsigned ClassID) const = 0;
   virtual unsigned getRegisterClassForType(bool Vector,
                                            Type *Ty = nullptr) const = 0;
@@ -2399,6 +2413,10 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
                                       TargetCostKind CostKind) override {
     return Impl.getIntImmCostIntrin(IID, Idx, Imm, Ty, CostKind);
   }
+  bool isCandidateForConstantHoisting(const Instruction &Inst,
+                                      const Function &Fn) const override {
+    return Impl.isCandidateForConstantHoisting(Inst, Fn);
+  }
   unsigned getNumberOfRegisters(unsigned ClassID) const override {
     return Impl.getNumberOfRegisters(ClassID);
   }
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index c1ff314ae51c98b..0dfe9e8d0c96ac2 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -425,6 +425,11 @@ class TargetTransformInfoImplBase {
     return TTI::TCC_Free;
   }
 
+  bool isCandidateForConstantHoisting(const Instruction &Inst,
+                                      const Function &Fn) const {
+    return true;
+  }
+
   unsigned getNumberOfRegisters(unsigned ClassID) const { return 8; }
 
   unsigned getRegisterClassForType(bool Vector, Type *Ty = nullptr) const {
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index aad14f21d114619..1341589f5fbf809 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -678,6 +678,11 @@ TargetTransformInfo::getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx,
   return Cost;
 }
 
+bool TargetTransformInfo::isCandidateForConstantHoisting(
+    const Instruction &Inst, const Function &Fn) const {
+  return TTIImpl->isCandidateForConstantHoisting(Inst, Fn);
+}
+
 unsigned TargetTransformInfo::getNumberOfRegisters(unsigned ClassID) const {
   return TTIImpl->getNumberOfRegisters(ClassID);
 }
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index d8a0e68d7123759..83eef059b107988 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -382,6 +382,25 @@ AArch64TTIImpl::getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx,
   return AArch64TTIImpl::getIntImmCost(Imm, Ty, CostKind);
 }
 
+bool AArch64TTIImpl::isCandidateForConstantHoisting(const Instruction &Inst,
+                                                    const Function &Fn) const {
+  switch (Inst.getOpcode()) {
+  default:
+    break;
+  case Instruction::SDiv:
+  case Instruction::SRem:
+  case Instruction::UDiv:
+  case Instruction::URem: {
+    if (!isa<ConstantInt>(Inst.getOperand(1)))
+      return true;
+    EVT VT = TLI->getValueType(DL, Inst.getType());
+    return TLI->isIntDivCheap(VT, Fn.getAttributes());
+  }
+  };
+
+  return true;
+}
+
 TargetTransformInfo::PopcntSupportKind
 AArch64TTIImpl::getPopcntSupport(unsigned TyWidth) {
   assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2");
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
index a6baade412c77d2..a2ac4b6aa46f44c 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -91,6 +91,10 @@ class AArch64TTIImpl : public BasicTTIImplBase<AArch64TTIImpl> {
   InstructionCost getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx,
                                       const APInt &Imm, Type *Ty,
                                       TTI::TargetCostKind CostKind);
+
+  bool isCandidateForConstantHoisting(const Instruction &Inst,
+                                      const Function &Fn) const;
+
   TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth);
 
   /// @}
diff --git a/llvm/lib/Transforms/Scalar/ConstantHoisting.cpp b/llvm/lib/Transforms/Scalar/ConstantHoisting.cpp
index 3e5d979f11cc50a..233a1a7bf82ce15 100644
--- a/llvm/lib/Transforms/Scalar/ConstantHoisting.cpp
+++ b/llvm/lib/Transforms/Scalar/ConstantHoisting.cpp
@@ -523,7 +523,9 @@ void ConstantHoistingPass::collectConstantCandidates(Function &Fn) {
     if (!DT->isReachableFromEntry(&BB))
       continue;
     for (Instruction &Inst : BB)
-      collectConstantCandidates(ConstCandMap, &Inst);
+      // Skip instructions the target prefers constants remain attached.
+      if (TTI->isCandidateForConstantHoisting(Inst, Fn))
+        collectConstantCandidates(ConstCandMap, &Inst);
   }
 }
 
diff --git a/llvm/test/Transforms/ConstantHoisting/AArch64/large-immediate.ll b/llvm/test/Transforms/ConstantHoisting/AArch64/large-immediate.ll
index 015f52157b9e7c7..fe8c699e72d9980 100644
--- a/llvm/test/Transforms/ConstantHoisting/AArch64/large-immediate.ll
+++ b/llvm/test/Transforms/ConstantHoisting/AArch64/large-immediate.ll
@@ -1,27 +1,135 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
 ; RUN: opt -mtriple=arm64-darwin-unknown -S -passes=consthoist < %s | FileCheck %s
 
-define i128 @test1(i128 %a) nounwind {
-; CHECK-LABEL: test1
-; CHECK: %const = bitcast i128 12297829382473034410122878 to i128
+define i128 @test1(i128 %a) {
+; CHECK-LABEL: define i128 @test1(
+; CHECK-SAME: i128 [[A:%.*]]) {
+; CHECK-NEXT:    [[CONST:%.*]] = bitcast i128 12297829382473034410122878 to i128
+; CHECK-NEXT:    [[TMP1:%.*]] = add i128 [[A]], [[CONST]]
+; CHECK-NEXT:    [[TMP2:%.*]] = add i128 [[TMP1]], [[CONST]]
+; CHECK-NEXT:    ret i128 [[TMP2]]
+;
   %1 = add i128 %a, 12297829382473034410122878
   %2 = add i128 %1, 12297829382473034410122878
   ret i128 %2
 }
 
 ; Check that we don't hoist large, but cheap constants
-define i512 @test2(i512 %a) nounwind {
-; CHECK-LABEL: test2
-; CHECK-NOT: %const = bitcast i512 7 to i512
+define i512 @test2(i512 %a) {
+; CHECK-LABEL: define i512 @test2(
+; CHECK-SAME: i512 [[A:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = and i512 [[A]], 7
+; CHECK-NEXT:    [[TMP2:%.*]] = or i512 [[TMP1]], 7
+; CHECK-NEXT:    ret i512 [[TMP2]]
+;
   %1 = and i512 %a, 7
   %2 = or i512 %1, 7
   ret i512 %2
 }
 
 ; Check that we don't hoist the shift value of a shift instruction.
-define i512 @test3(i512 %a) nounwind {
-; CHECK-LABEL: test3
-; CHECK-NOT: %const = bitcast i512 504 to i512
+define i512 @test3(i512 %a) {
+; CHECK-LABEL: define i512 @test3(
+; CHECK-SAME: i512 [[A:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = shl i512 [[A]], 504
+; CHECK-NEXT:    [[TMP2:%.*]] = ashr i512 [[TMP1]], 504
+; CHECK-NEXT:    ret i512 [[TMP2]]
+;
   %1 = shl i512 %a, 504
   %2 = ashr i512 %1, 504
   ret i512 %2
 }
+
+; Ensure the code generator has the information necessary to simply sdiv.
+define i64 @sdiv(i64 %a) {
+; CHECK-LABEL: define i64 @sdiv(
+; CHECK-SAME: i64 [[A:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = sdiv i64 [[A]], 4294967087
+; CHECK-NEXT:    [[TMP2:%.*]] = add i64 [[TMP1]], 4294967087
+; CHECK-NEXT:    ret i64 [[TMP2]]
+;
+  %1 = sdiv i64 %a, 4294967087
+  %2 = add i64 %1, 4294967087
+  ret i64 %2
+}
+
+; Ensure the code generator has the information necessary to simply srem.
+define i64 @srem(i64 %a) {
+; CHECK-LABEL: define i64 @srem(
+; CHECK-SAME: i64 [[A:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = srem i64 [[A]], 4294967087
+; CHECK-NEXT:    [[TMP2:%.*]] = add i64 [[TMP1]], 4294967087
+; CHECK-NEXT:    ret i64 [[TMP2]]
+;
+  %1 = srem i64 %a, 4294967087
+  %2 = add i64 %1, 4294967087
+  ret i64 %2
+}
+
+; Ensure the code generator has the information necessary to simply udiv.
+define i64 @udiv(i64 %a) {
+; CHECK-LABEL: define i64 @udiv(
+; CHECK-SAME: i64 [[A:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = udiv i64 [[A]], 4294967087
+; CHECK-NEXT:    [[TMP2:%.*]] = add i64 [[TMP1]], 4294967087
+; CHECK-NEXT:    ret i64 [[TMP2]]
+;
+  %1 = udiv i64 %a, 4294967087
+  %2 = add i64 %1, 4294967087
+  ret i64 %2
+}
+
+; Ensure the code generator has the information necessary to simply urem.
+define i64 @urem(i64 %a) {
+; CHECK-LABEL: define i64 @urem(
+; CHECK-SAME: i64 [[A:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = urem i64 [[A]], 4294967087
+; CHECK-NEXT:    [[TMP2:%.*]] = add i64 [[TMP1]], 4294967087
+; CHECK-NEXT:    ret i64 [[TMP2]]
+;
+  %1 = urem i64 %a, 4294967087
+  %2 = add i64 %1, 4294967087
+  ret i64 %2
+}
+
+; Code generator will not decompose divide like operations when the divisor is
+; no a constant.
+define i64 @sdiv_non_const_divisor(i64 %a) {
+; CHECK-LABEL: define i64 @sdiv_non_const_divisor(
+; CHECK-SAME: i64 [[A:%.*]]) {
+; CHECK-NEXT:    [[CONST:%.*]] = bitcast i64 4294967087 to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = sdiv i64 [[CONST]], [[A]]
+; CHECK-NEXT:    [[TMP2:%.*]] = add i64 [[TMP1]], [[CONST]]
+; CHECK-NEXT:    ret i64 [[TMP2]]
+;
+  %1 = sdiv i64 4294967087, %a
+  %2 = add i64 %1, 4294967087
+  ret i64 %2
+}
+
+; Code generator emits divide instructions when optimising for size.
+define i64 @sdiv_minsize(i64 %a) minsize {
+; CHECK-LABEL: define i64 @sdiv_minsize(
+; CHECK-SAME: i64 [[A:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    [[CONST:%.*]] = bitcast i64 4294967087 to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = sdiv i64 [[A]], [[CONST]]
+; CHECK-NEXT:    [[TMP2:%.*]] = add i64 [[TMP1]], [[CONST]]
+; CHECK-NEXT:    ret i64 [[TMP2]]
+;
+  %1 = sdiv i64 %a, 4294967087
+  %2 = add i64 %1, 4294967087
+  ret i64 %2
+}
+
+; Code generator will not decompose vector divide like operations.
+define <2 x i64> @sdiv_v2i64(<2 x i64> %a) {
+; CHECK-LABEL: define <2 x i64> @sdiv_v2i64(
+; CHECK-SAME: <2 x i64> [[A:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = sdiv <2 x i64> [[A]], <i64 4294967087, i64 4294967087>
+; CHECK-NEXT:    [[TMP2:%.*]] = add <2 x i64> [[TMP1]], <i64 4294967087, i64 4294967087>
+; CHECK-NEXT:    ret <2 x i64> [[TMP2]]
+;
+  %1 = sdiv <2 x i64> %a, <i64 4294967087, i64 4294967087>
+  %2 = add <2 x i64> %1, <i64 4294967087, i64 4294967087>
+  ret <2 x i64> %2
+}

@nikic
Copy link
Contributor

nikic commented Oct 13, 2023

Does this need to be target-specific? Not hoisting divisors to allow efficient expansion seems like something all targets would benefit from.

@davemgreen
Copy link
Collaborator

This should probably be in getIntImmCostInst, similar to the Arm version:

InstructionCost ARMTTIImpl::getIntImmCostInst(unsigned Opcode, unsigned Idx,
                                              const APInt &Imm, Type *Ty,
                                              TTI::TargetCostKind CostKind,
                                              Instruction *Inst) {
  // Division by a constant can be turned into multiplication, but only if we
  // know it's constant. So it's not so much that the immediate is cheap (it's
  // not), but that the alternative is worse.
  // FIXME: this is probably unneeded with GlobalISel.
  if ((Opcode == Instruction::SDiv || Opcode == Instruction::UDiv ||
       Opcode == Instruction::SRem || Opcode == Instruction::URem) &&
      Idx == 1)
    return 0;

@paulwalker-arm
Copy link
Collaborator Author

Does this need to be target-specific? Not hoisting divisors to allow efficient expansion seems like something all targets would benefit from.

I guess not given the decision is derived from an existing TLI hook. I've moved the implementation into BaseTTI.

This should probably be in getIntImmCostInst, similar to the Arm version:

I originally considered this but it felt wrong because the code generation doesn't necessary result in cheap or free constants. For the cases I want to handle the resulting constants are just as expensive with the gain being the removal of the divides.

@paulwalker-arm
Copy link
Collaborator Author

review ping

@davemgreen
Copy link
Collaborator

I've always considered the cost returned from getIntImmCostInst as the cost(inst + imm) - cost(inst). So if we are producing a cheaper instruction, the cost of the imm can be cheap. That already happens in a few places, like saturating patterns under Arm and a few of the other patterns in other backends. Comparing a div vs multiple mul/adds is going to be a bit complex, but the immediate is also going to be completely different at any rate.

I'm not wanting to block this. If others disagree then feel free to go ahead, but this seems like the new interface might not really be needed.

@sjoerdmeijer
Copy link
Collaborator

I understood that hoisting and LICM were considered canonicalisation passes/transforms in LLVM. I have seen many attempts to restrict hoisting for one reason or another but I have seen them always being rejected because of the canonicalisation argument. Not sure this case is different, but perhaps @nikic who replied here can comment on that.

@nikic
Copy link
Contributor

nikic commented Oct 30, 2023

@sjoerdmeijer ConstaintHoisting is a backend pass, so it's fine to restrict it based on target-specific heuristics (in fact, the entire pass is already TTI-driven).

@paulwalker-arm
Copy link
Collaborator Author

I’ve looked to see if I can change the meaning of getIntImmCostInst to better reflect the intent of constant hoisting but I see it’s also used by ScalarEvolution when it calculates the cost of an expression tree. I don’t see what else I can do differently here because at the end of the day the cost of generating the immediate for sdiv i64 %a, 4294967087 is not free.

@paulwalker-arm
Copy link
Collaborator Author

Based on feedback from @david-arm I have renamed the TTI hook to preferToKeepConstantsAttached so that it doesn't give the false impression the instruction has anything to hoist.

Copy link
Contributor

@david-arm david-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! After looking at the code and trying to understand the comments, I think overall this is the best approach. It looks like changing the meaning of getIntImmInstCost() will likely break other existing code.

@@ -427,6 +427,11 @@ class TargetTransformInfoImplBase {
return TTI::TCC_Free;
}

bool preferToKeepConstantsAttached(const Instruction &Inst,
const Function &Fn) const {
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this default implementation return false rather than true?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Darn it. Thanks, I'll push an update.

Code generation can sometimes simplify expensive operations when
an operand is constant.  An example of this is divides on AArch64
where they can be rewritten using a cheaper sequence of multiplies
and subtracts.  Doing this is often better than hoisting expensive
constants which are likely to be hoisted by MachineLICM anyway.
@paulwalker-arm paulwalker-arm merged commit 930b5b5 into llvm:main Dec 13, 2023
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants