[GlobalOpt] Add TTI interface useFastCCForInternalCall for FASTCC #164768
Conversation
Background: the X86 APX feature adds 16 registers within the same 64-bit mode. PR llvm#164638 is trying to extend FASTCC to use these registers. However, a blocker is that a calling convention cannot change depending on whether a feature is enabled or not. The solution is to disable FASTCC when APX is not available. This is an NFC change for the final code generation, because X86 doesn't define an alternative ABI for FASTCC in 64-bit mode. This patch solves the potential compatibility issue of llvm#164638.
@llvm/pr-subscribers-clang @llvm/pr-subscribers-backend-x86

Author: Phoebe Wang (phoebewang)

Changes: (same as the PR description above)

Full diff: https://github.com/llvm/llvm-project/pull/164768.diff

9 Files Affected:
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 5d3b233ed6b6a..f52fb448fc584 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -943,6 +943,10 @@ class TargetTransformInfo {
/// should use coldcc calling convention.
LLVM_ABI bool useColdCCForColdCall(Function &F) const;
+ /// Return true if the input function is internal, should use fastcc calling
+ /// convention.
+ LLVM_ABI bool useFastCCForInternalCall(Function &F) const;
+
LLVM_ABI bool isTargetIntrinsicTriviallyScalarizable(Intrinsic::ID ID) const;
/// Identifies if the vector form of the intrinsic has a scalar operand.
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 4cd607c0d0c8d..064e28c504af4 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -431,6 +431,8 @@ class TargetTransformInfoImplBase {
virtual bool useColdCCForColdCall(Function &F) const { return false; }
+ virtual bool useFastCCForInternalCall(Function &F) const { return true; }
+
virtual bool isTargetIntrinsicTriviallyScalarizable(Intrinsic::ID ID) const {
return false;
}
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index bf62623099a97..dd65d8375828c 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -609,6 +609,10 @@ bool TargetTransformInfo::useColdCCForColdCall(Function &F) const {
return TTIImpl->useColdCCForColdCall(F);
}
+bool TargetTransformInfo::useFastCCForInternalCall(Function &F) const {
+ return TTIImpl->useFastCCForInternalCall(F);
+}
+
bool TargetTransformInfo::isTargetIntrinsicTriviallyScalarizable(
Intrinsic::ID ID) const {
return TTIImpl->isTargetIntrinsicTriviallyScalarizable(ID);
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.h b/llvm/lib/Target/X86/X86TargetTransformInfo.h
index 133b3668a46c8..609861a53a0a0 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.h
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.h
@@ -319,6 +319,10 @@ class X86TTIImpl final : public BasicTTIImplBase<X86TTIImpl> {
unsigned getStoreMinimumVF(unsigned VF, Type *ScalarMemTy,
Type *ScalarValTy) const override;
+ bool useFastCCForInternalCall(Function &F) const override {
+ return !ST->is64Bit() || ST->hasEGPR();
+ }
+
private:
bool supportsGather() const;
InstructionCost getGSVectorCost(unsigned Opcode, TTI::TargetCostKind CostKind,
diff --git a/llvm/lib/Transforms/IPO/GlobalOpt.cpp b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
index 99c4982c58b47..1516a5bb7a6c2 100644
--- a/llvm/lib/Transforms/IPO/GlobalOpt.cpp
+++ b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
@@ -2018,12 +2018,15 @@ OptimizeFunctions(Module &M,
if (hasChangeableCC(&F, ChangeableCCCache)) {
// If this function has a calling convention worth changing, is not a
- // varargs function, and is only called directly, promote it to use the
- // Fast calling convention.
- F.setCallingConv(CallingConv::Fast);
- ChangeCalleesToFastCall(&F);
- ++NumFastCallFns;
- Changed = true;
+ // varargs function, is only called directly, and is supported by the
+ // target, promote it to use the Fast calling convention.
+ TargetTransformInfo &TTI = GetTTI(F);
+ if (TTI.useFastCCForInternalCall(F)) {
+ F.setCallingConv(CallingConv::Fast);
+ ChangeCalleesToFastCall(&F);
+ ++NumFastCallFns;
+ Changed = true;
+ }
}
if (F.getAttributes().hasAttrSomewhere(Attribute::Nest) &&
diff --git a/llvm/test/Transforms/GlobalOpt/null-check-is-use-pr35760.ll b/llvm/test/Transforms/GlobalOpt/null-check-is-use-pr35760.ll
index 70923c547940c..4a0c93f09c7df 100644
--- a/llvm/test/Transforms/GlobalOpt/null-check-is-use-pr35760.ll
+++ b/llvm/test/Transforms/GlobalOpt/null-check-is-use-pr35760.ll
@@ -12,7 +12,7 @@ define dso_local i32 @main() {
; CHECK-LABEL: define {{[^@]+}}@main() local_unnamed_addr {
; CHECK-NEXT: bb:
; CHECK-NEXT: store ptr null, ptr @_ZL3g_i, align 8
-; CHECK-NEXT: call fastcc void @_ZL13PutsSomethingv()
+; CHECK-NEXT: call void @_ZL13PutsSomethingv()
; CHECK-NEXT: ret i32 0
;
bb:
diff --git a/llvm/test/Transforms/GlobalOpt/null-check-not-use-pr35760.ll b/llvm/test/Transforms/GlobalOpt/null-check-not-use-pr35760.ll
index a499fe1e4ad92..2b92d856d1848 100644
--- a/llvm/test/Transforms/GlobalOpt/null-check-not-use-pr35760.ll
+++ b/llvm/test/Transforms/GlobalOpt/null-check-not-use-pr35760.ll
@@ -15,7 +15,7 @@ define dso_local i32 @main() {
; CHECK-LABEL: define {{[^@]+}}@main() local_unnamed_addr {
; CHECK-NEXT: bb:
; CHECK-NEXT: store ptr null, ptr @_ZL3g_i, align 8
-; CHECK-NEXT: call fastcc void @_ZL13PutsSomethingv()
+; CHECK-NEXT: call void @_ZL13PutsSomethingv()
; CHECK-NEXT: ret i32 0
;
bb:
diff --git a/llvm/test/tools/gold/X86/merge-functions.ll b/llvm/test/tools/gold/X86/merge-functions.ll
index d4a49b1c40b47..296e7aa3f76f7 100644
--- a/llvm/test/tools/gold/X86/merge-functions.ll
+++ b/llvm/test/tools/gold/X86/merge-functions.ll
@@ -11,8 +11,8 @@
; Check that we've merged foo and bar
; CHECK: define dso_local noundef i32 @main()
-; CHECK-NEXT: tail call fastcc void @bar()
-; CHECK-NEXT: tail call fastcc void @bar()
+; CHECK-NEXT: tail call void @bar()
+; CHECK-NEXT: tail call void @bar()
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
diff --git a/llvm/test/tools/gold/X86/unified-lto.ll b/llvm/test/tools/gold/X86/unified-lto.ll
index e5030e863a64a..24eb94a08de39 100644
--- a/llvm/test/tools/gold/X86/unified-lto.ll
+++ b/llvm/test/tools/gold/X86/unified-lto.ll
@@ -25,10 +25,10 @@
; Constant propagation is not supported by thin LTO.
; With full LTO we fold argument into constant 43
; CHECK: define dso_local noundef i32 @main()
-; CHECK-NEXT: tail call fastcc void @foo()
+; CHECK-NEXT: tail call void @foo()
; CHECK-NEXT: ret i32 43
-; CHECK: define internal fastcc void @foo()
+; CHECK: define internal void @foo()
; CHECK-NEXT: store i32 43, ptr @_g, align 4
; ThinLTO doesn't import foo, because the latter has noinline attribute
@@ -319,6 +319,10 @@ class X86TTIImpl final : public BasicTTIImplBase<X86TTIImpl> {
   unsigned getStoreMinimumVF(unsigned VF, Type *ScalarMemTy,
                              Type *ScalarValTy) const override;

+  bool useFastCCForInternalCall(Function &F) const override {
+    return !ST->is64Bit() || ST->hasEGPR();
+  }
Don't you need to check that both caller and callee have EGPR?
I think a single direction is enough. We can call a function without EGPR from an EGPR-enabled function, but not in the opposite direction.
Hm, I don't really follow.
If we have no-EGPR -> EGPR, then the EGPR function may expect arguments to be passed in EGPR registers, while the no-EGPR function will push them to the stack.
If we have EGPR -> no-EGPR, then the EGPR function may pass arguments in EGPR registers, while the no-EGPR function will expect them to be on the stack.
Am I missing something here?
Oh, I get the point. I was confused by the inlining logic.
Hm, I'm not sure I like this. Would it make sense to potentially have multiple fastcc calling conventions (e.g. fastcc_egpr) and have the hook select the best one? For this specific case (where fastcc doesn't do anything on the baseline target) this wouldn't make a difference, but it would also accommodate other cases in the future.
I don't think multiple fastcc calling conventions are useful, given that we cannot bind a calling convention to a single feature. Assuming we have multiple features in the future, the solution is still to disable fastcc rather than allow multiple calling conventions to coexist.
"fastcc" was originally invented in the context of targets with a single calling convention, but where that calling convention had significant limitations. Like, on 32-bit x86, the default "C" calling convention passes everything on the stack. So it's basically supposed to be a variation on the C calling convention: assuming the same underlying machine constraints, but ignoring bad decisions made when the ABI documents were originally written. The x86-64 ABI developers learned those lessons, so fastcc is just the C calling convention; we didn't need to modify it. If we have new architectural features, and we want a different calling convention for machines that have the feature, versus machines which don't have the feature, I think we need more specific names for those calling conventions. Having the C calling convention vary based on per-function target features is a constant source of problems for interprocedural optimizations. We don't want to extend those same problems to fastcc. I think, if you want to pass arguments using APX registers, you should introduce a new "APX" calling convention. Probably from there, instead of a boolean useFastCCForInternalCall, you want a function |
I don't fully agree with these points. If fastcc is just (and always) the C calling convention, its existence is meaningless. I agree with its original intention, and a well-designed ABI doesn't need it at all. But that is exactly the reason we are retargeting it for APX. We are solving (almost) the same problem: a single calling convention has design limitations. The difference is just that, for 32-bit, it was a bad design from the beginning; for APX, the design cannot fit the new feature. So, IMO, using fastcc for APX is the best choice. The way to use
I sort of understand the logic of "we need a calling convention, fastcc isn't useful on x86-64, let's repurpose it". But there's a significant benefit to giving a calling convention with unusual semantics (like, it only works on targets that have APX enabled) its own name; it's less confusing, and it cleanly allows future extensions for all targets. And introducing a new calling convention really isn't that much more work. It isn't likely the set of integer registers on x86 will change again, but floating-point/vector registers have been getting constant updates.
I'm afraid I disagree. Notice the minor difference between
The current floating-point/vector calling convention uses 8 registers for argument passing, and they are under less pressure than GPRs (since we use GPRs to pass pointers and/or large types like i128, etc.). So I don't think it's a problem in the short term. AVX512 extended the total register count from 16 to 32 more than a decade ago, and I haven't seen a request to change fastcc for it :) Nevertheless, I have moved the caller check logic to target code, and kept 64-bit generating fastcc as is. Besides, it keeps the ability to be forward compatible with newer calling conventions and a target-independent fastcc in the middle end. Please take another look. Thanks!