Skip to content

Conversation

ayasin-a
Copy link
Contributor

@ayasin-a ayasin-a commented Jul 17, 2025

Enhance the heuristics in getAppleRuntimeUnrollPreferences to let a bit more loops to be unrolled.

Specifically, this patch adjusts two checks:
I. Tune the loop size budget from 8 to 10
II. Include immediate in-loop users of loaded values in the load/stores dependencies predicate

@llvmbot
Copy link
Member

llvmbot commented Jul 17, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-aarch64

Author: Ahmad Yasin (ayasin-a)

Changes

Enhance the heuristics in getAppleRuntimeUnrollPreferences to let a bit more loops to be unrolled.

Specifically, this patch adjusts two checks:
I. Tune the loop size budget from 8 to 9
II. Include immediate users of loaded values in the load/stores dependencies predicate


Full diff: https://github.com/llvm/llvm-project/pull/149358.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+14-7)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 90d3d92d6bbf5..6d97ae7c8c5e7 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -4808,10 +4808,10 @@ getAppleRuntimeUnrollPreferences(Loop *L, ScalarEvolution &SE,
   if (Header == L->getLoopLatch()) {
     // Estimate the size of the loop.
     unsigned Size;
-    if (!isLoopSizeWithinBudget(L, TTI, 8, &Size))
+    if (!isLoopSizeWithinBudget(L, TTI, 9, &Size))
       return;
 
-    SmallPtrSet<Value *, 8> LoadedValues;
+    SmallPtrSet<Value *, 8> LoadedValuesPlus;
     SmallVector<StoreInst *> Stores;
     for (auto *BB : L->blocks()) {
       for (auto &I : *BB) {
@@ -4821,9 +4821,16 @@ getAppleRuntimeUnrollPreferences(Loop *L, ScalarEvolution &SE,
         const SCEV *PtrSCEV = SE.getSCEV(Ptr);
         if (SE.isLoopInvariant(PtrSCEV, L))
           continue;
-        if (isa<LoadInst>(&I))
-          LoadedValues.insert(&I);
-        else
+        if (isa<LoadInst>(&I)) {
+          LoadedValuesPlus.insert(&I);
+          // Included 1st users of loaded values
+          for (auto *U : I.users()) {
+            auto *Inst = dyn_cast<Instruction>(U);
+            if (!Inst || Inst->getParent() != BB)
+              continue;
+            LoadedValuesPlus.insert(U);
+          }
+        } else
           Stores.push_back(cast<StoreInst>(&I));
       }
     }
@@ -4846,8 +4853,8 @@ getAppleRuntimeUnrollPreferences(Loop *L, ScalarEvolution &SE,
       UC++;
     }
 
-    if (BestUC == 1 || none_of(Stores, [&LoadedValues](StoreInst *SI) {
-          return LoadedValues.contains(SI->getOperand(0));
+    if (BestUC == 1 || none_of(Stores, [&LoadedValuesPlus](StoreInst *SI) {
+          return LoadedValuesPlus.contains(SI->getOperand(0));
         }))
       return;
 

Copy link
Contributor

@jroelofs jroelofs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adjusting the title to be a little more descriptive. Maybe something like:

[aarch64] Tune unrolling prefs for ... on Apple CPUs

@ayasin-a ayasin-a changed the title Unroll loops apple [AArch64] Tune unrolling prefs for more patterns on Apple CPUs Jul 23, 2025
Copy link

github-actions bot commented Aug 11, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@fhahn fhahn merged commit 1f2fb8e into llvm:main Aug 13, 2025
9 checks passed
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 13, 2025
…CPUs (#149358)

Enhance the heuristics in `getAppleRuntimeUnrollPreferences` to let a
bit more loops to be unrolled.

Specifically, this patch adjusts two checks:
I. Tune the loop size budget from 8 to 10
II. Include immediate in-loop users of loaded values in the load/stores
dependencies predicate

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>

PR: llvm/llvm-project#149358
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants