Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LoopRotation] Enable LoopRotation with -Oz if header folds #72842

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mohammed-nurulhoque
Copy link
Contributor

LoopRotation duplicates the header and can increase codesize. If header size is more than a threshold, rotation is disabled. The threshold is set to 0 with -Oz, so loop rotation is disabled.
But sometimes, the duplicated header folds, thus not increasing size. In that case, rotate the header even if > threshold header size.

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 20, 2023

@llvm/pr-subscribers-llvm-transforms

Author: None (mohammed-nurulhoque)

Changes

LoopRotation duplicates the header and can increase codesize. If header size is more than a threshold, rotation is disabled. The threshold is set to 0 with -Oz, so loop rotation is disabled.
But sometimes, the duplicated header folds, thus not increasing size. In that case, rotate the header even if > threshold header size.


Full diff: https://github.com/llvm/llvm-project/pull/72842.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/LoopRotationUtils.cpp (+67-7)
  • (added) llvm/test/Transforms/LoopRotate/rotate-oz-if-branch-fold.ll (+32)
diff --git a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
index 012aa5dbb9ca004..fab9dcdbab68145 100644
--- a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
@@ -365,6 +365,64 @@ static void updateBranchWeights(BranchInst &PreHeaderBI, BranchInst &LoopBI,
   }
 }
 
+// Check if duplicating the header to the preheader will fold all duplicated
+// instructions, terminating with an unconditional jump. So No code duplication
+// would be necessary.
+static bool canRotateWithoutDuplication(Loop *L,
+                                        llvm::LoopInfo *LI,
+                                        llvm::SimplifyQuery SQ) {
+  BasicBlock *OrigHeader = L->getHeader();
+  BasicBlock *OrigPreheader = L->getLoopPreheader();
+
+  BranchInst *T = dyn_cast<BranchInst>(OrigHeader->getTerminator());  
+  assert(T && T->isConditional() && "Header Terminator should be conditional!");
+
+  // a value map to holds the values incoming from preheader
+  ValueToValueMapTy ValueMap;
+  
+  // iterate over non-debug instructions and check which ones fold
+  for (Instruction &Inst : OrigHeader->instructionsWithoutDebug()) {
+    // For PHI nodes, the values in the cloned header are the values in the preheader
+    if (PHINode *PN = dyn_cast<PHINode>(&Inst)) {
+      InsertNewValueIntoMap(ValueMap, PN,
+                            PN->getIncomingValueForBlock(OrigPreheader));
+      continue;
+    }
+    if (Inst.mayHaveSideEffects())
+      return false;
+
+    Instruction *C = Inst.clone();
+    C->insertBefore(&Inst);
+
+    // Eagerly remap the operands of the instruction.
+    RemapInstruction(C, ValueMap, RF_NoModuleLevelChanges | RF_IgnoreMissingLocals);
+
+    // if we hit the terminator, return whether it folds
+    if (static_cast<BranchInst *>(&Inst) == T) {
+      if (isa<ConstantInt>(dyn_cast<BranchInst>(C)->getCondition())) {
+        LLVM_DEBUG(dbgs() << "Loop Rotation: bypassing header threshold "
+                             "because the header folds\n");
+        C->eraseFromParent();
+        return true;
+      } else {
+        LLVM_DEBUG(
+            dbgs() << "Loop Rotation: skipping. Header threshold exceeded "
+                      "and header doesn't fold\n");
+        C->eraseFromParent();
+        return false;
+      }
+    }
+
+    // other instructions, evaluate if possible
+    Value *V = simplifyInstruction(C, SQ);
+    if (V)
+      InsertNewValueIntoMap(ValueMap, &Inst, V);
+    C->eraseFromParent();
+  }
+
+  llvm_unreachable("OrigHeader didn't contain terminal branch!");  
+}
+
 /// Rotate loop LP. Return true if the loop is rotated.
 ///
 /// \param SimplifiedLatch is true if the latch was just folded into the final
@@ -438,13 +496,15 @@ bool LoopRotate::rotateLoop(Loop *L, bool SimplifiedLatch) {
         return Rotated;
       }
       if (Metrics.NumInsts > MaxHeaderSize) {
-        LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains "
-                          << Metrics.NumInsts
-                          << " instructions, which is more than the threshold ("
-                          << MaxHeaderSize << " instructions): ";
-                   L->dump());
-        ++NumNotRotatedDueToHeaderSize;
-        return Rotated;
+        if (!canRotateWithoutDuplication(L, LI, SQ)) {
+          LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains "
+                            << Metrics.NumInsts
+                            << " instructions, which is more than the threshold ("
+                            << MaxHeaderSize << " instructions): ";
+                     L->dump());
+          ++NumNotRotatedDueToHeaderSize;
+          return Rotated;
+        }
       }
 
       // When preparing for LTO, avoid rotating loops with calls that could be
diff --git a/llvm/test/Transforms/LoopRotate/rotate-oz-if-branch-fold.ll b/llvm/test/Transforms/LoopRotate/rotate-oz-if-branch-fold.ll
new file mode 100644
index 000000000000000..cca386ebc495b23
--- /dev/null
+++ b/llvm/test/Transforms/LoopRotate/rotate-oz-if-branch-fold.ll
@@ -0,0 +1,32 @@
+; RUN: opt < %s -S -Oz -debug -debug-only=loop-rotate 2>&1 | FileCheck %s -check-prefix=OZ
+
+; Loop should be rotated for -Oz if the duplicated branch in the header
+; can be folded.
+; Test adapted from gcc-c-torture/execute/pr90949.c
+
+; OZ: bypassing header threshold
+; OZ: rotating Loop at depth 1 containing: %tailrecurse.i<header><exiting>,%tailrecurse.backedge.i<latch>
+; OZ: into Loop at depth 1 containing: %tailrecurse.backedge.i<header><latch><exiting>
+
+%struct.Node = type { %struct.Node* }
+
+@space = dso_local global [2 x %struct.Node] zeroinitializer, align 8
+
+define dso_local signext i32 @main() local_unnamed_addr #2 {
+entry:
+  store %struct.Node* null, %struct.Node** getelementptr inbounds ([2 x %struct.Node], [2 x %struct.Node]* @space, i64 0, i64 0, i32 0), align 8
+  br label %tailrecurse.i
+
+tailrecurse.i:                                    ; preds = %tailrecurse.backedge.i, %entry
+  %module.tr.i = phi %struct.Node* [ getelementptr inbounds ([2 x %struct.Node], [2 x %struct.Node]* @space, i64 0, i64 0), %entry ], [ %module.tr.be.i, %tailrecurse.backedge.i ]
+  %cmp.i = icmp eq %struct.Node* %module.tr.i, null
+  br i1 %cmp.i, label %walk.exit, label %tailrecurse.backedge.i
+
+tailrecurse.backedge.i:                           ; preds = %tailrecurse.i
+  %module.tr.be.in.i = getelementptr inbounds %struct.Node, %struct.Node* %module.tr.i, i64 0, i32 0
+  %module.tr.be.i = load %struct.Node*, %struct.Node** %module.tr.be.in.i, align 8
+  br label %tailrecurse.i
+
+walk.exit:                                        ; preds = %tailrecurse.i
+  ret i32 0
+}

LoopRotation duplicates the header and can increase codesize. If
header size is more than a threshold, rotation is disabled. But
sometimes, the duplicated header folds, thus not increasing size.
In that case, rotate the header even if > threshold header size.
@OCHyams
Copy link
Contributor

OCHyams commented Dec 8, 2023

Hi @mohammed-nurulhoque. I'm not best placed to review this. I was going to suggest asking "recent" contributors to the file but now see @fhahn has already been added. Perhaps @MatzeB could help too or suggest a reviewer.

@OCHyams OCHyams removed their request for review December 8, 2023 10:14
Copy link
Collaborator

@dfukalov dfukalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to add this new optimization under an option, switched off by default.

// Check if duplicating the header to the preheader will fold all duplicated
// instructions, terminating with an unconditional jump. So No code duplication
// would be necessary.
static bool canRotateWithoutDuplication(Loop *L, llvm::LoopInfo *LI,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems LI is not used.

; OZ: rotating Loop at depth 1 containing: %tailrecurse.i<header><exiting>,%tailrecurse.backedge.i<latch>
; OZ: into Loop at depth 1 containing: %tailrecurse.backedge.i<header><latch><exiting>

%struct.Node = type { %struct.Node* }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to simplify the test to remove these complex types and auto-generated names with a lot of dots?
Also, it would be good idea to add tests for all return false; cases, as well as for llvm_unreachable().

BranchInst *T = dyn_cast<BranchInst>(OrigHeader->getTerminator());
assert(T && T->isConditional() && "Header Terminator should be conditional!");

// a value map to holds the values incoming from preheader
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please reformat this comment and some of them below, capitalizing first letter and period at the end of sentences.

return false;

Instruction *C = Inst.clone();
C->insertBefore(&Inst);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it necessary to actually clone and remap instructions for this? Can you simplifyInstructionWithOperands() instead?

@@ -0,0 +1,39 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
; RUN: opt < %s -S -Oz -debug -debug-only=loop-rotate 2>&1 | FileCheck %s -check-prefix=OZ
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to test -Oz, this must go into the PhaseOrdering directory -- and you also need dedicated tests for only LoopRotate. Also, do not test debug output.

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the heuristic determines we can rotate without duplication, the folding is guaranteed to happen right after rotate without anything interfering in between? cc @kyulee-com who rely on Oz and is very sensitive to any size increase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants