[AArch64] Tune unrolling prefs for more patterns on Apple CPUs #149358
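@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-aarch64

Author: Ahmad Yasin (ayasin-a)

Changes

Enhance the heuristics in `getAppleRuntimeUnrollPreferences`. Specifically, this patch adjusts two checks: the loop size budget, and the load/store dependency predicate, which now also covers immediate in-loop users of loaded values.

Full diff: https://github.com/llvm/llvm-project/pull/149358.diff

1 Files Affected: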
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 90d3d92d6bbf5..6d97ae7c8c5e7 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -4808,10 +4808,10 @@ getAppleRuntimeUnrollPreferences(Loop *L, ScalarEvolution &SE,
if (Header == L->getLoopLatch()) {
// Estimate the size of the loop.
unsigned Size;
- if (!isLoopSizeWithinBudget(L, TTI, 8, &Size))
+ if (!isLoopSizeWithinBudget(L, TTI, 9, &Size))
return;
- SmallPtrSet<Value *, 8> LoadedValues;
+ SmallPtrSet<Value *, 8> LoadedValuesPlus;
SmallVector<StoreInst *> Stores;
for (auto *BB : L->blocks()) {
for (auto &I : *BB) {
@@ -4821,9 +4821,16 @@ getAppleRuntimeUnrollPreferences(Loop *L, ScalarEvolution &SE,
const SCEV *PtrSCEV = SE.getSCEV(Ptr);
if (SE.isLoopInvariant(PtrSCEV, L))
continue;
- if (isa<LoadInst>(&I))
- LoadedValues.insert(&I);
- else
+ if (isa<LoadInst>(&I)) {
+ LoadedValuesPlus.insert(&I);
+ // Included 1st users of loaded values
+ for (auto *U : I.users()) {
+ auto *Inst = dyn_cast<Instruction>(U);
+ if (!Inst || Inst->getParent() != BB)
+ continue;
+ LoadedValuesPlus.insert(U);
+ }
+ } else
Stores.push_back(cast<StoreInst>(&I));
}
}
@@ -4846,8 +4853,8 @@ getAppleRuntimeUnrollPreferences(Loop *L, ScalarEvolution &SE,
UC++;
}
- if (BestUC == 1 || none_of(Stores, [&LoadedValues](StoreInst *SI) {
- return LoadedValues.contains(SI->getOperand(0));
+ if (BestUC == 1 || none_of(Stores, [&LoadedValuesPlus](StoreInst *SI) {
+ return LoadedValuesPlus.contains(SI->getOperand(0));
}))
return;
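The change to the store predicate in the diff above can be illustrated with a toy model that does not depend on LLVM. This is only a sketch: `Inst`, `Kind`, `storesLoadedValue`, and `incrementLoop` are hypothetical names, instructions are referred to by index instead of `Value *`, and the loop-invariant pointer filter from the real code is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy stand-in for an IR instruction; this is not LLVM's class hierarchy.
enum class Kind { Load, Store, Other };

struct Inst {
  Kind kind;
  int block;                 // id of the containing basic block
  std::vector<int> operands; // indices of the instructions this one uses
};

// Returns true if some store writes a value from the tracked set.
// withFirstUsers == false models the old check (loaded values only);
// withFirstUsers == true models the patched check (loaded values plus
// their direct users that live in the same block as the load).
bool storesLoadedValue(const std::vector<Inst> &loop, bool withFirstUsers) {
  const int n = static_cast<int>(loop.size());
  std::set<int> tracked;
  for (int i = 0; i < n; ++i) {
    if (loop[i].kind != Kind::Load)
      continue;
    tracked.insert(i);
    if (!withFirstUsers)
      continue;
    for (int u = 0; u < n; ++u) {
      const std::vector<int> &ops = loop[u].operands;
      bool usesLoad = std::find(ops.begin(), ops.end(), i) != ops.end();
      if (usesLoad && loop[u].block == loop[i].block)
        tracked.insert(u);
    }
  }
  // Operand 0 of a store is the stored value, mirroring SI->getOperand(0).
  return std::any_of(loop.begin(), loop.end(), [&](const Inst &I) {
    return I.kind == Kind::Store && !I.operands.empty() &&
           tracked.count(I.operands[0]) > 0;
  });
}

// Models dst[i] = src[i] + 1 in a single block:
//   0: load, 1: add using 0, 2: store of 1.
std::vector<Inst> incrementLoop() {
  return {{Kind::Load, 0, {}}, {Kind::Other, 0, {0}}, {Kind::Store, 0, {1}}};
}
```

For `incrementLoop()`, the stored value is the add rather than the load itself, so the loads-only variant finds no match while the variant that includes first users does. In the patch, a match means the function does not return early at this check.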
Suggest adjusting the title to be a little more descriptive. Maybe something like:
[aarch64] Tune unrolling prefs for ... on Apple CPUs
Co-authored-by: Florian Hahn <flo@fhahn.com>
✅ With the latest revision this PR passed the C/C++ code formatter.
LGTM, thanks
…CPUs (#149358)

Enhance the heuristics in `getAppleRuntimeUnrollPreferences` to let a few more loops be unrolled. Specifically, this patch adjusts two checks:
I. Tune the loop size budget from 8 to 10
II. Include immediate in-loop users of loaded values in the load/store dependency predicate

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>
PR: llvm/llvm-project#149358
Enhance the heuristics in `getAppleRuntimeUnrollPreferences` to let a few more loops be unrolled.
Specifically, this patch adjusts two checks:
I. Tune the loop size budget from 8 to 10
II. Include immediate in-loop users of loaded values in the load/store dependency predicate
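To make the first check concrete, here is a minimal sketch of a size-budget test, assuming the budget is an inclusive upper bound on the summed per-instruction cost. `sizeWithinBudget` and the flat cost vector are hypothetical and only approximate what `isLoopSizeWithinBudget` does with the target's cost model.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: adds up per-instruction costs and gives up as soon
// as the running total exceeds the budget. On success, the total is written
// to finalSize when the caller supplies a pointer.
bool sizeWithinBudget(const std::vector<unsigned> &costs, unsigned budget,
                      unsigned *finalSize = nullptr) {
  unsigned size = 0;
  for (unsigned cost : costs) {
    size += cost;
    if (size > budget)
      return false; // Over budget: the caller skips runtime unrolling.
  }
  if (finalSize)
    *finalSize = size;
  return true;
}
```

Under that assumption, a loop body whose costs sum to 9 is rejected with a budget of 8 and accepted with a budget of 10, which is the kind of loop the first change is meant to admit.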