Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PowerPC] Inline callee if its target-features are a subset of the caller #67710

Closed
wants to merge 1 commit into from

Conversation

scui-ibm
Copy link
Contributor

Simliar to other plateforms (X86, ARM), it should be safe to inline callees if their target-features
are a subset of the caller. 

@scui-ibm scui-ibm self-assigned this Sep 28, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 28, 2023

@llvm/pr-subscribers-llvm-transforms

Changes

Simliar to other plateforms (X86, ARM), it should be safe to inline callees if their target-features
are a subset of the caller. 


Full diff: https://github.com/llvm/llvm-project/pull/67710.diff

3 Files Affected:

  • (modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp (+13)
  • (modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h (+2)
  • (added) llvm/test/Transforms/Inline/PowerPC/inline-target-features.ll (+37)
diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
index ca0f2c2e18af5f9..e378e5f671cb71e 100644
--- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
+++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
@@ -885,6 +885,19 @@ PPCTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
   return BaseT::getIntrinsicInstrCost(ICA, CostKind);
 }
 
+bool PPCTTIImpl::areInlineCompatible(const Function *Caller,
+                                     const Function *Callee) const {
+  // Allow inlining only when the Callee has a subset of the Caller's features.
+  const TargetMachine &TM = getTLI()->getTargetMachine();
+
+  const FeatureBitset &CallerBits =
+      TM.getSubtargetImpl(*Caller)->getFeatureBits();
+  const FeatureBitset &CalleeBits =
+      TM.getSubtargetImpl(*Callee)->getFeatureBits();
+
+  return (CallerBits & CalleeBits) == CalleeBits;
+}
+
 bool PPCTTIImpl::areTypesABICompatible(const Function *Caller,
                                        const Function *Callee,
                                        const ArrayRef<Type *> &Types) const {
diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h
index c3ade9968c336a0..bdc2f17d95c03c2 100644
--- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h
+++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h
@@ -138,6 +138,8 @@ class PPCTTIImpl : public BasicTTIImplBase<PPCTTIImpl> {
       bool UseMaskForCond = false, bool UseMaskForGaps = false);
   InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
                                         TTI::TargetCostKind CostKind);
+  bool areInlineCompatible(const Function *Caller,
+                           const Function *Callee) const;
   bool areTypesABICompatible(const Function *Caller, const Function *Callee,
                              const ArrayRef<Type *> &Types) const;
   bool hasActiveVectorLength(unsigned Opcode, Type *DataType,
diff --git a/llvm/test/Transforms/Inline/PowerPC/inline-target-features.ll b/llvm/test/Transforms/Inline/PowerPC/inline-target-features.ll
new file mode 100644
index 000000000000000..2b703cfaf87d4ac
--- /dev/null
+++ b/llvm/test/Transforms/Inline/PowerPC/inline-target-features.ll
@@ -0,0 +1,37 @@
+; RUN: opt < %s -mtriple=powerpc64le-unknown-linux-gnu -S -passes=inline | FileCheck %s
+; Check that we only inline when we have compatible target features.
+
+target datalayout = "e-m:e-Fn32-i64:64-n32:64-S128-v256:256:256-v512:512:512"
+target triple = "powerpc64le-unknown-linux-gnu"
+
+define i32 @f1() #0 {
+; CHECK-LABEL: define i32 @f1(
+; CHECK-NEXT:    [[CALL:%.*]] = call i32 (...) @f0()
+; CHECK-NEXT:    ret i32 [[CALL]]
+;
+  %call = call i32 (...) @f0()
+  ret i32 %call
+}
+
+define i32 @f2() #1 {
+; CHECK-LABEL: define i32 @f2(
+; CHECK-NEXT:    [[CALL_I:%.*]] = call i32 (...) @f0()
+; CHECK-NEXT:    ret i32 [[CALL_I]]
+;
+  %call = call i32 @f1()
+  ret i32 %call
+}
+
+define i32 @f3() #0 {
+; CHECK-LABEL: define i32 @f3(
+; CHECK-NEXT:    [[CALL:%.*]] = call i32 @f2()
+; CHECK-NEXT:    ret i32 [[CALL]]
+;
+  %call = call i32 @f2()
+  ret i32 %call
+}
+
+declare i32 @f0(...) #0
+
+attributes #0 = { "target-cpu"="pwr7" "target-features"="-crbits,-crypto,-direct-move,-isa-v207-instructions,-power8-vector" }
+attributes #1 = { "target-cpu"="pwr8" "target-features"="+crbits,+crypto,+direct-move,+isa-v207-instructions,+power8-vector" }

const FeatureBitset &CalleeBits =
TM.getSubtargetImpl(*Callee)->getFeatureBits();

return (CallerBits & CalleeBits) == CalleeBits;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the X86 equivalent function also checks for ABI compatibility. Wonder if that is also needed for PPC, for instance for vec-extabi.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
And except for the functionality issue, do we need to exclude the PPC features that should not impact the inline optimization, like fusion related features fuse-store, fuse-zeromove and other optimization related features like fast-MFLR, modern-aix-as, ppc-postra-sched... Inline optimization may introduce bigger improvement than breaking these optimization related features?

const FeatureBitset &CalleeBits =
TM.getSubtargetImpl(*Callee)->getFeatureBits();

return (CallerBits & CalleeBits) == CalleeBits;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
And except for the functionality issue, do we need to exclude the PPC features that should not impact the inline optimization, like fusion related features fuse-store, fuse-zeromove and other optimization related features like fast-MFLR, modern-aix-as, ppc-postra-sched... Inline optimization may introduce bigger improvement than breaking these optimization related features?

@@ -0,0 +1,37 @@
; RUN: opt < %s -mtriple=powerpc64le-unknown-linux-gnu -S -passes=inline | FileCheck %s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we use utils/update_test_checks.py to automatically generate the check lines? Seems the case is quite simple.

@@ -0,0 +1,37 @@
; RUN: opt < %s -mtriple=powerpc64le-unknown-linux-gnu -S -passes=inline | FileCheck %s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since the change is not PPC Linux specific, maybe we can add RUN lines for AIX as well?

; Check that we only inline when we have compatible target features.

target datalayout = "e-m:e-Fn32-i64:64-n32:64-S128-v256:256:256-v512:512:512"
target triple = "powerpc64le-unknown-linux-gnu"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these two target related assignments seem unnecessary.

@scui-ibm scui-ibm closed this Feb 28, 2024
@scui-ibm scui-ibm deleted the ppc_inline_compatible branch February 28, 2024 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants