-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PowerPC] Inline callee if its target-features are a subset of the caller #67710
Conversation
@llvm/pr-subscribers-llvm-transforms ChangesSimliar to other plateforms (X86, ARM), it should be safe to inline callees if their target-features Full diff: https://github.com/llvm/llvm-project/pull/67710.diff 3 Files Affected:
diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
index ca0f2c2e18af5f9..e378e5f671cb71e 100644
--- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
+++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
@@ -885,6 +885,19 @@ PPCTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
return BaseT::getIntrinsicInstrCost(ICA, CostKind);
}
+bool PPCTTIImpl::areInlineCompatible(const Function *Caller,
+ const Function *Callee) const {
+ // Allow inlining only when the Callee has a subset of the Caller's features.
+ const TargetMachine &TM = getTLI()->getTargetMachine();
+
+ const FeatureBitset &CallerBits =
+ TM.getSubtargetImpl(*Caller)->getFeatureBits();
+ const FeatureBitset &CalleeBits =
+ TM.getSubtargetImpl(*Callee)->getFeatureBits();
+
+ return (CallerBits & CalleeBits) == CalleeBits;
+}
+
bool PPCTTIImpl::areTypesABICompatible(const Function *Caller,
const Function *Callee,
const ArrayRef<Type *> &Types) const {
diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h
index c3ade9968c336a0..bdc2f17d95c03c2 100644
--- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h
+++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h
@@ -138,6 +138,8 @@ class PPCTTIImpl : public BasicTTIImplBase<PPCTTIImpl> {
bool UseMaskForCond = false, bool UseMaskForGaps = false);
InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
TTI::TargetCostKind CostKind);
+ bool areInlineCompatible(const Function *Caller,
+ const Function *Callee) const;
bool areTypesABICompatible(const Function *Caller, const Function *Callee,
const ArrayRef<Type *> &Types) const;
bool hasActiveVectorLength(unsigned Opcode, Type *DataType,
diff --git a/llvm/test/Transforms/Inline/PowerPC/inline-target-features.ll b/llvm/test/Transforms/Inline/PowerPC/inline-target-features.ll
new file mode 100644
index 000000000000000..2b703cfaf87d4ac
--- /dev/null
+++ b/llvm/test/Transforms/Inline/PowerPC/inline-target-features.ll
@@ -0,0 +1,37 @@
+; RUN: opt < %s -mtriple=powerpc64le-unknown-linux-gnu -S -passes=inline | FileCheck %s
+; Check that we only inline when we have compatible target features.
+
+target datalayout = "e-m:e-Fn32-i64:64-n32:64-S128-v256:256:256-v512:512:512"
+target triple = "powerpc64le-unknown-linux-gnu"
+
+define i32 @f1() #0 {
+; CHECK-LABEL: define i32 @f1(
+; CHECK-NEXT: [[CALL:%.*]] = call i32 (...) @f0()
+; CHECK-NEXT: ret i32 [[CALL]]
+;
+ %call = call i32 (...) @f0()
+ ret i32 %call
+}
+
+define i32 @f2() #1 {
+; CHECK-LABEL: define i32 @f2(
+; CHECK-NEXT: [[CALL_I:%.*]] = call i32 (...) @f0()
+; CHECK-NEXT: ret i32 [[CALL_I]]
+;
+ %call = call i32 @f1()
+ ret i32 %call
+}
+
+define i32 @f3() #0 {
+; CHECK-LABEL: define i32 @f3(
+; CHECK-NEXT: [[CALL:%.*]] = call i32 @f2()
+; CHECK-NEXT: ret i32 [[CALL]]
+;
+ %call = call i32 @f2()
+ ret i32 %call
+}
+
+declare i32 @f0(...) #0
+
+attributes #0 = { "target-cpu"="pwr7" "target-features"="-crbits,-crypto,-direct-move,-isa-v207-instructions,-power8-vector" }
+attributes #1 = { "target-cpu"="pwr8" "target-features"="+crbits,+crypto,+direct-move,+isa-v207-instructions,+power8-vector" }
|
const FeatureBitset &CalleeBits = | ||
TM.getSubtargetImpl(*Callee)->getFeatureBits(); | ||
|
||
return (CallerBits & CalleeBits) == CalleeBits; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the X86 equivalent function also checks for ABI compatibility. Wonder if that is also needed for PPC, for instance for vec-extabi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
And except for the functionality issue, do we need to exclude the PPC features that should not impact the inline optimization, like fusion related features fuse-store
, fuse-zeromove
and other optimization related features like fast-MFLR
, modern-aix-as
, ppc-postra-sched
... Inline optimization may introduce bigger improvement than breaking these optimization related features?
const FeatureBitset &CalleeBits = | ||
TM.getSubtargetImpl(*Callee)->getFeatureBits(); | ||
|
||
return (CallerBits & CalleeBits) == CalleeBits; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
And except for the functionality issue, do we need to exclude the PPC features that should not impact the inline optimization, like fusion related features fuse-store
, fuse-zeromove
and other optimization related features like fast-MFLR
, modern-aix-as
, ppc-postra-sched
... Inline optimization may introduce bigger improvement than breaking these optimization related features?
@@ -0,0 +1,37 @@ | |||
; RUN: opt < %s -mtriple=powerpc64le-unknown-linux-gnu -S -passes=inline | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we use utils/update_test_checks.py
to automatically generate the check lines? Seems the case is quite simple.
@@ -0,0 +1,37 @@ | |||
; RUN: opt < %s -mtriple=powerpc64le-unknown-linux-gnu -S -passes=inline | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since the change is not PPC Linux specific, maybe we can add RUN lines for AIX as well?
; Check that we only inline when we have compatible target features. | ||
|
||
target datalayout = "e-m:e-Fn32-i64:64-n32:64-S128-v256:256:256-v512:512:512" | ||
target triple = "powerpc64le-unknown-linux-gnu" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: these two target related assignments seem unnecessary.
Simliar to other plateforms (X86, ARM), it should be safe to inline callees if their target-features
are a subset of the caller.