Fixes inlining issue in armv7 #169337
|
Thank you for submitting a Pull Request (PR) to the LLVM Project!
This PR will be automatically labeled and the relevant teams will be notified.
If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository, in which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.
If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment "Ping". The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.
If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
|
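@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-arm
Author: Croose (CrooseGit)
Changes
There is an issue on armv7 where a function won't be inlined due to mismatching target features between caller and callee. The caller has HasV8Ops and FeatureDotProd and the callee does not, but AFAIK this should not be a problem.
https://godbolt.org/z/f19h3zT66 is an example showing how the call is not inlined on armv7.
The expected asm output would be something like:
    .fnstart
    vsdot.s8 q0, q1, d4[0]
    bx lr
  .Lfunc_end0:
Thanks to @Amichaxx we managed to narrow it down and can now resolve this problem by adding ARM::FeatureDotProd, ARM::HasV8Ops to InlineFeaturesAllowed in llvm/lib/Target/ARM/ARMTargetTransformInfo.h, after which the inlining occurs successfully.
Whilst we're at it, we have also added some debugging to make it easier to tell why (or why not) a function is being inlined for ARM, and a couple of other features that seem to be missing from the list.
This patch was motivated by an issue experienced with Rust that was traced back to LLVM, and thus was designed to address that.
Full diff: https://github.com/llvm/llvm-project/pull/169337.diff
2 Files Affected: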
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index d12b802fe234f..89ebc3e715930 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -102,6 +102,50 @@ bool ARMTTIImpl::areInlineCompatible(const Function *Caller,
// the callers'.
bool MatchSubset = ((CallerBits & CalleeBits) & InlineFeaturesAllowed) ==
(CalleeBits & InlineFeaturesAllowed);
+
+ LLVM_DEBUG({
+ dbgs() << "=== Inline compatibility debug ===\n";
+ dbgs() << "Caller: " << Caller->getName() << "\n";
+ dbgs() << "Callee: " << Callee->getName() << "\n";
+
+ // Bit diffs
+ FeatureBitset MissingInCaller = CalleeBits & ~CallerBits; // callee-only
+ FeatureBitset ExtraInCaller = CallerBits & ~CalleeBits; // caller-only
+
+ // Counts
+ dbgs() << "Only-in-caller bit count: " << ExtraInCaller.count() << "\n";
+ dbgs() << "Only-in-callee bit count: " << MissingInCaller.count() << "\n";
+
+ dbgs() << "Only-in-caller feature indices [";
+ {
+ bool First = true;
+ for (size_t I = 0, E = ExtraInCaller.size(); I < E; ++I) {
+ if (ExtraInCaller.test(I)) {
+ if (!First) dbgs() << ", ";
+ dbgs() << I;
+ First = false;
+ }
+ }
+ }
+ dbgs() << "]\n";
+
+ dbgs() << "Only-in-callee feature indices [";
+ {
+ bool First = true;
+ for (size_t I = 0, E = MissingInCaller.size(); I < E; ++I) {
+ if (MissingInCaller.test(I)) {
+ if (!First) dbgs() << ", ";
+ dbgs() << I;
+ First = false;
+ }
+ }
+ }
+ dbgs() << "]\n";
+
+ // Indicies map to features as found in llvm-project/(your_build)/lib/Target/ARM/ARMGenSubtargetInfo.inc
+ dbgs() << "MatchExact=" << (MatchExact ? "true" : "false")
+ << " MatchSubset=" << (MatchSubset ? "true" : "false") << "\n";
+ });
return MatchExact && MatchSubset;
}
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
index 919a6fc9fd0b0..2ecfce0de9f55 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
@@ -70,6 +70,7 @@ class ARMTTIImpl final : public BasicTTIImplBase<ARMTTIImpl> {
// -thumb-mode in a caller with +thumb-mode, may cause the assembler to
// fail if the callee uses ARM only instructions, e.g. in inline asm.
const FeatureBitset InlineFeaturesAllowed = {
+ ARM::FeatureDotProd, ARM::HasV8Ops, ARM::FeatureBF16, ARM::FeatureSB,
ARM::FeatureVFP2, ARM::FeatureVFP3, ARM::FeatureNEON, ARM::FeatureThumb2,
ARM::FeatureFP16, ARM::FeatureVFP4, ARM::FeatureFPARMv8,
ARM::FeatureFullFP16, ARM::FeatureFP16FML, ARM::FeatureHWDivThumb,
|
|
Hi @davemgreen, I saw you edited |
|
The original review requested an allowlist, as missed optimisations are preferable to miscompilations.
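To make the allowlist trade-off concrete, here is a minimal, self-contained sketch of the two comparisons made in areInlineCompatible. It is a model, not the LLVM code: the feature names are hypothetical stand-ins packed into a 64-bit mask, and the MatchExact expression is an assumption about the upstream function (only MatchSubset is visible in the diff above).

```cpp
#include <cstdint>

// Hypothetical stand-ins for a handful of ARM subtarget feature bits.
enum Feature : std::uint64_t {
  NEON = 1u << 0,
  DotProd = 1u << 1,
  V8Ops = 1u << 2,
  ThumbMode = 1u << 3, // never allowlisted: must match exactly
};

// Model of the check: features outside the allowlist must be identical,
// and the callee's allowlisted features must be a subset of the caller's.
constexpr bool inlineCompatible(std::uint64_t Caller, std::uint64_t Callee,
                                std::uint64_t Allowed) {
  // Assumed form of MatchExact (not shown in the diff).
  bool MatchExact = (Caller & ~Allowed) == (Callee & ~Allowed);
  // Same shape as the MatchSubset expression in the diff.
  bool MatchSubset = ((Caller & Callee) & Allowed) == (Callee & Allowed);
  return MatchExact && MatchSubset;
}

// Allowlist before and after the patch, reduced to the bits modelled here.
constexpr std::uint64_t OldAllowed = NEON;
constexpr std::uint64_t NewAllowed = NEON | DotProd | V8Ops;

// The situation from the PR description: the caller has extra features.
constexpr std::uint64_t CallerBits = NEON | DotProd | V8Ops;
constexpr std::uint64_t CalleeBits = NEON;

// Without DotProd/V8Ops on the allowlist they fall under the exact-match
// rule, so the extra caller bits block inlining.
static_assert(!inlineCompatible(CallerBits, CalleeBits, OldAllowed),
              "extra non-allowlisted caller bits block inlining");
// Once allowlisted, only the subset rule applies and the check passes.
static_assert(inlineCompatible(CallerBits, CalleeBits, NewAllowed),
              "allowlisted caller-only bits are tolerated");
```

The model shows why an allowlist errs on the safe side: any feature that nobody has added to it is treated as "must match exactly", which can cost an inlining opportunity but cannot place instructions in a function that lacks the feature.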
You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions h,cpp -- llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp llvm/lib/Target/ARM/ARMTargetTransformInfo.h --diff_from_common_commit
View the diff from clang-format here.
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index 89ebc3e71..f0d378b66 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -110,18 +110,19 @@ bool ARMTTIImpl::areInlineCompatible(const Function *Caller,
// Bit diffs
FeatureBitset MissingInCaller = CalleeBits & ~CallerBits; // callee-only
- FeatureBitset ExtraInCaller = CallerBits & ~CalleeBits; // caller-only
+ FeatureBitset ExtraInCaller = CallerBits & ~CalleeBits; // caller-only
// Counts
dbgs() << "Only-in-caller bit count: " << ExtraInCaller.count() << "\n";
dbgs() << "Only-in-callee bit count: " << MissingInCaller.count() << "\n";
-
+
dbgs() << "Only-in-caller feature indices [";
{
bool First = true;
for (size_t I = 0, E = ExtraInCaller.size(); I < E; ++I) {
if (ExtraInCaller.test(I)) {
- if (!First) dbgs() << ", ";
+ if (!First)
+ dbgs() << ", ";
dbgs() << I;
First = false;
}
@@ -134,7 +135,8 @@ bool ARMTTIImpl::areInlineCompatible(const Function *Caller,
bool First = true;
for (size_t I = 0, E = MissingInCaller.size(); I < E; ++I) {
if (MissingInCaller.test(I)) {
- if (!First) dbgs() << ", ";
+ if (!First)
+ dbgs() << ", ";
dbgs() << I;
First = false;
}
@@ -142,10 +144,11 @@ bool ARMTTIImpl::areInlineCompatible(const Function *Caller,
}
dbgs() << "]\n";
- // Indicies map to features as found in llvm-project/(your_build)/lib/Target/ARM/ARMGenSubtargetInfo.inc
- dbgs() << "MatchExact=" << (MatchExact ? "true" : "false")
+ // Indicies map to features as found in
+ // llvm-project/(your_build)/lib/Target/ARM/ARMGenSubtargetInfo.inc
+ dbgs() << "MatchExact=" << (MatchExact ? "true" : "false")
<< " MatchSubset=" << (MatchSubset ? "true" : "false") << "\n";
- });
+ });
return MatchExact && MatchSubset;
}
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
index 2ecfce0de..87fee9a1b 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
@@ -70,32 +70,69 @@ class ARMTTIImpl final : public BasicTTIImplBase<ARMTTIImpl> {
// -thumb-mode in a caller with +thumb-mode, may cause the assembler to
// fail if the callee uses ARM only instructions, e.g. in inline asm.
const FeatureBitset InlineFeaturesAllowed = {
- ARM::FeatureDotProd, ARM::HasV8Ops, ARM::FeatureBF16, ARM::FeatureSB,
- ARM::FeatureVFP2, ARM::FeatureVFP3, ARM::FeatureNEON, ARM::FeatureThumb2,
- ARM::FeatureFP16, ARM::FeatureVFP4, ARM::FeatureFPARMv8,
- ARM::FeatureFullFP16, ARM::FeatureFP16FML, ARM::FeatureHWDivThumb,
- ARM::FeatureHWDivARM, ARM::FeatureDB, ARM::FeatureV7Clrex,
- ARM::FeatureAcquireRelease, ARM::FeatureSlowFPBrcc,
- ARM::FeaturePerfMon, ARM::FeatureTrustZone, ARM::Feature8MSecExt,
- ARM::FeatureCrypto, ARM::FeatureCRC, ARM::FeatureRAS,
- ARM::FeatureFPAO, ARM::FeatureFuseAES, ARM::FeatureZCZeroing,
- ARM::FeatureProfUnpredicate, ARM::FeatureSlowVGETLNi32,
- ARM::FeatureSlowVDUP32, ARM::FeaturePreferVMOVSR,
- ARM::FeaturePrefISHSTBarrier, ARM::FeatureMuxedUnits,
- ARM::FeatureSlowOddRegister, ARM::FeatureSlowLoadDSubreg,
- ARM::FeatureDontWidenVMOVS, ARM::FeatureExpandMLx,
- ARM::FeatureHasVMLxHazards, ARM::FeatureNEONForFPMovs,
- ARM::FeatureNEONForFP, ARM::FeatureCheckVLDnAlign,
- ARM::FeatureHasSlowFPVMLx, ARM::FeatureHasSlowFPVFMx,
- ARM::FeatureVMLxForwarding, ARM::FeaturePref32BitThumb,
- ARM::FeatureAvoidPartialCPSR, ARM::FeatureCheapPredicableCPSR,
- ARM::FeatureAvoidMOVsShOp, ARM::FeatureHasRetAddrStack,
- ARM::FeatureHasNoBranchPredictor, ARM::FeatureDSP, ARM::FeatureMP,
- ARM::FeatureVirtualization, ARM::FeatureMClass, ARM::FeatureRClass,
- ARM::FeatureAClass, ARM::FeatureStrictAlign, ARM::FeatureLongCalls,
- ARM::FeatureExecuteOnly, ARM::FeatureReserveR9, ARM::FeatureNoMovt,
- ARM::FeatureNoNegativeImmediates
- };
+ ARM::FeatureDotProd,
+ ARM::HasV8Ops,
+ ARM::FeatureBF16,
+ ARM::FeatureSB,
+ ARM::FeatureVFP2,
+ ARM::FeatureVFP3,
+ ARM::FeatureNEON,
+ ARM::FeatureThumb2,
+ ARM::FeatureFP16,
+ ARM::FeatureVFP4,
+ ARM::FeatureFPARMv8,
+ ARM::FeatureFullFP16,
+ ARM::FeatureFP16FML,
+ ARM::FeatureHWDivThumb,
+ ARM::FeatureHWDivARM,
+ ARM::FeatureDB,
+ ARM::FeatureV7Clrex,
+ ARM::FeatureAcquireRelease,
+ ARM::FeatureSlowFPBrcc,
+ ARM::FeaturePerfMon,
+ ARM::FeatureTrustZone,
+ ARM::Feature8MSecExt,
+ ARM::FeatureCrypto,
+ ARM::FeatureCRC,
+ ARM::FeatureRAS,
+ ARM::FeatureFPAO,
+ ARM::FeatureFuseAES,
+ ARM::FeatureZCZeroing,
+ ARM::FeatureProfUnpredicate,
+ ARM::FeatureSlowVGETLNi32,
+ ARM::FeatureSlowVDUP32,
+ ARM::FeaturePreferVMOVSR,
+ ARM::FeaturePrefISHSTBarrier,
+ ARM::FeatureMuxedUnits,
+ ARM::FeatureSlowOddRegister,
+ ARM::FeatureSlowLoadDSubreg,
+ ARM::FeatureDontWidenVMOVS,
+ ARM::FeatureExpandMLx,
+ ARM::FeatureHasVMLxHazards,
+ ARM::FeatureNEONForFPMovs,
+ ARM::FeatureNEONForFP,
+ ARM::FeatureCheckVLDnAlign,
+ ARM::FeatureHasSlowFPVMLx,
+ ARM::FeatureHasSlowFPVFMx,
+ ARM::FeatureVMLxForwarding,
+ ARM::FeaturePref32BitThumb,
+ ARM::FeatureAvoidPartialCPSR,
+ ARM::FeatureCheapPredicableCPSR,
+ ARM::FeatureAvoidMOVsShOp,
+ ARM::FeatureHasRetAddrStack,
+ ARM::FeatureHasNoBranchPredictor,
+ ARM::FeatureDSP,
+ ARM::FeatureMP,
+ ARM::FeatureVirtualization,
+ ARM::FeatureMClass,
+ ARM::FeatureRClass,
+ ARM::FeatureAClass,
+ ARM::FeatureStrictAlign,
+ ARM::FeatureLongCalls,
+ ARM::FeatureExecuteOnly,
+ ARM::FeatureReserveR9,
+ ARM::FeatureNoMovt,
+ ARM::FeatureNoNegativeImmediates};
const ARMSubtarget *getST() const { return ST; }
const ARMTargetLowering *getTLI() const { return TLI; }
|
Fixes an issue where functions are not inlined when the caller has these features but the callee does not.
This makes it easier to see why a function isn't getting inlined.
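The debug output in the patch repeats the same index-printing loop for the caller-only and callee-only sets. One possible way to factor that out is sketched below. It uses std::bitset as a stand-in for LLVM's FeatureBitset, and the names NumFeatures, formatSetIndices and describeMismatch are hypothetical, chosen only for this illustration.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Hypothetical feature count; the real FeatureBitset is sized by LLVM.
constexpr std::size_t NumFeatures = 16;
using FeatureBits = std::bitset<NumFeatures>;

// Formats the indices of all set bits as "[i, j, k]".
std::string formatSetIndices(const FeatureBits &Bits) {
  std::string Out = "[";
  bool First = true;
  for (std::size_t I = 0; I < Bits.size(); ++I) {
    if (!Bits.test(I))
      continue;
    if (!First)
      Out += ", ";
    Out += std::to_string(I);
    First = false;
  }
  return Out + "]";
}

// Builds the two "Only-in-..." lines that the patch prints via dbgs().
std::string describeMismatch(const FeatureBits &Caller,
                             const FeatureBits &Callee) {
  FeatureBits OnlyInCaller = Caller & ~Callee;
  FeatureBits OnlyInCallee = Callee & ~Caller;
  return "Only-in-caller feature indices " + formatSetIndices(OnlyInCaller) +
         "\n" + "Only-in-callee feature indices " +
         formatSetIndices(OnlyInCallee) + "\n";
}
```

In the real patch the result would be streamed to dbgs() inside LLVM_DEBUG rather than returned as a string; returning a string here just keeps the sketch independent of LLVM headers.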
|
@adamgemmell thanks, missed that. I suppose missing features will be added as they are spotted then (provided they are allowed to differ). |
Force-pushed: 45a393b to c053e07
davemgreen
left a comment
Hi - What makes a feature invalid for inlining? If it disables the use of some instructions/registers? It would seem that many more features (including FeatureFP64 and FeatureD32) could be added to the list. Why is HasV8Ops there if HasV8_1aOps isn't? Why is FeatureDotProd, but not FeatureAES?
Can you add a test? Both for inlining a function with dotprod and for inlining to a function with dotprod.
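The two directions requested for the test behave differently under the subset rule. The sketch below is a simplified model rather than an LLVM regression test: the feature bits are hypothetical, both are assumed to be on the allowlist, and only the MatchSubset comparison from the diff is modelled.

```cpp
#include <cstdint>

// Hypothetical feature bits, both assumed to be allowlisted.
constexpr std::uint64_t NEON = 1u << 0;
constexpr std::uint64_t DotProd = 1u << 1;
constexpr std::uint64_t Allowed = NEON | DotProd;

// Subset rule: every allowlisted callee feature must be present in the caller.
constexpr bool calleeFitsCaller(std::uint64_t Caller, std::uint64_t Callee) {
  return ((Caller & Callee) & Allowed) == (Callee & Allowed);
}

// Inlining *to* a function with dotprod: the callee needs nothing extra.
static_assert(calleeFitsCaller(NEON | DotProd, NEON),
              "plain callee fits a dotprod caller");
// Inlining *a* function with dotprod into a caller without it: the callee
// may contain dot-product instructions the caller cannot execute.
static_assert(!calleeFitsCaller(NEON, NEON | DotProd),
              "dotprod callee does not fit a plain caller");
```

Under this model, a test for the first direction would expect the call to disappear after inlining, and a test for the second would expect the call to remain.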
Removes FeatureD32 and FeatureFP64 from black list in comments as:
- In https://reviews.llvm.org/D34697#805590 D16 and VFPOnlySP were added to this allowlist because they do "the opposite of what you would expect".
- https://github.com/llvm/llvm-project/commit/760df47b778a530e9368a4b8706940ba103d57ba#diff-8165208908f69b3582d556451[…]6c4b474f2bf32c4ac7fec031cf2efd replaces the previous features with the inverse, but incorrectly keeps them in the allow list as the original reasoning no longer applies.
Some subtarget features provide different instructions depending on whether they are set or unset. These features are believed safe as *not* having these features present does not add instructions.
Force-pushed: c053e07 to 0f89803
There is an issue on armv7 where a function won't be inlined due to mismatching target features between caller and callee.
The caller has HasV8Ops and FeatureDotProd and the callee does not, but AFAIK this should not be a problem.
https://godbolt.org/z/f19h3zT66 is an example showing how the call is not inlined on armv7.
The expected asm output would be something like:
    .fnstart
    vsdot.s8 q0, q1, d4[0]
    bx lr
  .Lfunc_end0:
Thanks to @Amichaxx we managed to narrow it down and can now resolve this problem by adding ARM::FeatureDotProd, ARM::HasV8Ops to InlineFeaturesAllowed in llvm/lib/Target/ARM/ARMTargetTransformInfo.h, after which the inlining occurs successfully.
Whilst we're at it, we have also added some debugging to make it easier to tell why (or why not) a function is being inlined for ARM, and a couple of other features that seem to be missing from the list.
This patch was motivated by an issue experienced with Rust that was traced back to LLVM, and thus was designed to address that.