-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LoopRotation] Enable LoopRotation with -Oz if header folds #72842
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-transforms Author: None (mohammed-nurulhoque) ChangesLoopRotation duplicates the header and can increase codesize. If header size is more than a threshold, rotation is disabled. The threshold is set to 0 with -Oz, so loop rotation is disabled. Full diff: https://github.com/llvm/llvm-project/pull/72842.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
index 012aa5dbb9ca004..fab9dcdbab68145 100644
--- a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
@@ -365,6 +365,64 @@ static void updateBranchWeights(BranchInst &PreHeaderBI, BranchInst &LoopBI,
}
}
+// Check if duplicating the header to the preheader will fold all duplicated
+// instructions, terminating with an unconditional jump. So No code duplication
+// would be necessary.
+static bool canRotateWithoutDuplication(Loop *L,
+ llvm::LoopInfo *LI,
+ llvm::SimplifyQuery SQ) {
+ BasicBlock *OrigHeader = L->getHeader();
+ BasicBlock *OrigPreheader = L->getLoopPreheader();
+
+ BranchInst *T = dyn_cast<BranchInst>(OrigHeader->getTerminator());
+ assert(T && T->isConditional() && "Header Terminator should be conditional!");
+
+ // a value map to holds the values incoming from preheader
+ ValueToValueMapTy ValueMap;
+
+ // iterate over non-debug instructions and check which ones fold
+ for (Instruction &Inst : OrigHeader->instructionsWithoutDebug()) {
+ // For PHI nodes, the values in the cloned header are the values in the preheader
+ if (PHINode *PN = dyn_cast<PHINode>(&Inst)) {
+ InsertNewValueIntoMap(ValueMap, PN,
+ PN->getIncomingValueForBlock(OrigPreheader));
+ continue;
+ }
+ if (Inst.mayHaveSideEffects())
+ return false;
+
+ Instruction *C = Inst.clone();
+ C->insertBefore(&Inst);
+
+ // Eagerly remap the operands of the instruction.
+ RemapInstruction(C, ValueMap, RF_NoModuleLevelChanges | RF_IgnoreMissingLocals);
+
+ // if we hit the terminator, return whether it folds
+ if (static_cast<BranchInst *>(&Inst) == T) {
+ if (isa<ConstantInt>(dyn_cast<BranchInst>(C)->getCondition())) {
+ LLVM_DEBUG(dbgs() << "Loop Rotation: bypassing header threshold "
+ "because the header folds\n");
+ C->eraseFromParent();
+ return true;
+ } else {
+ LLVM_DEBUG(
+ dbgs() << "Loop Rotation: skipping. Header threshold exceeded "
+ "and header doesn't fold\n");
+ C->eraseFromParent();
+ return false;
+ }
+ }
+
+ // other instructions, evaluate if possible
+ Value *V = simplifyInstruction(C, SQ);
+ if (V)
+ InsertNewValueIntoMap(ValueMap, &Inst, V);
+ C->eraseFromParent();
+ }
+
+ llvm_unreachable("OrigHeader didn't contain terminal branch!");
+}
+
/// Rotate loop LP. Return true if the loop is rotated.
///
/// \param SimplifiedLatch is true if the latch was just folded into the final
@@ -438,13 +496,15 @@ bool LoopRotate::rotateLoop(Loop *L, bool SimplifiedLatch) {
return Rotated;
}
if (Metrics.NumInsts > MaxHeaderSize) {
- LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains "
- << Metrics.NumInsts
- << " instructions, which is more than the threshold ("
- << MaxHeaderSize << " instructions): ";
- L->dump());
- ++NumNotRotatedDueToHeaderSize;
- return Rotated;
+ if (!canRotateWithoutDuplication(L, LI, SQ)) {
+ LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains "
+ << Metrics.NumInsts
+ << " instructions, which is more than the threshold ("
+ << MaxHeaderSize << " instructions): ";
+ L->dump());
+ ++NumNotRotatedDueToHeaderSize;
+ return Rotated;
+ }
}
// When preparing for LTO, avoid rotating loops with calls that could be
diff --git a/llvm/test/Transforms/LoopRotate/rotate-oz-if-branch-fold.ll b/llvm/test/Transforms/LoopRotate/rotate-oz-if-branch-fold.ll
new file mode 100644
index 000000000000000..cca386ebc495b23
--- /dev/null
+++ b/llvm/test/Transforms/LoopRotate/rotate-oz-if-branch-fold.ll
@@ -0,0 +1,32 @@
+; RUN: opt < %s -S -Oz -debug -debug-only=loop-rotate 2>&1 | FileCheck %s -check-prefix=OZ
+
+; Loop should be rotated for -Oz if the duplicated branch in the header
+; can be folded.
+; Test adapted from gcc-c-torture/execute/pr90949.c
+
+; OZ: bypassing header threshold
+; OZ: rotating Loop at depth 1 containing: %tailrecurse.i<header><exiting>,%tailrecurse.backedge.i<latch>
+; OZ: into Loop at depth 1 containing: %tailrecurse.backedge.i<header><latch><exiting>
+
+%struct.Node = type { %struct.Node* }
+
+@space = dso_local global [2 x %struct.Node] zeroinitializer, align 8
+
+define dso_local signext i32 @main() local_unnamed_addr #2 {
+entry:
+ store %struct.Node* null, %struct.Node** getelementptr inbounds ([2 x %struct.Node], [2 x %struct.Node]* @space, i64 0, i64 0, i32 0), align 8
+ br label %tailrecurse.i
+
+tailrecurse.i: ; preds = %tailrecurse.backedge.i, %entry
+ %module.tr.i = phi %struct.Node* [ getelementptr inbounds ([2 x %struct.Node], [2 x %struct.Node]* @space, i64 0, i64 0), %entry ], [ %module.tr.be.i, %tailrecurse.backedge.i ]
+ %cmp.i = icmp eq %struct.Node* %module.tr.i, null
+ br i1 %cmp.i, label %walk.exit, label %tailrecurse.backedge.i
+
+tailrecurse.backedge.i: ; preds = %tailrecurse.i
+ %module.tr.be.in.i = getelementptr inbounds %struct.Node, %struct.Node* %module.tr.i, i64 0, i32 0
+ %module.tr.be.i = load %struct.Node*, %struct.Node** %module.tr.be.in.i, align 8
+ br label %tailrecurse.i
+
+walk.exit: ; preds = %tailrecurse.i
+ ret i32 0
+}
|
1acd178
to
e1a6994
Compare
LoopRotation duplicates the header and can increase codesize. If header size is more than a threshold, rotation is disabled. But sometimes, the duplicated header folds, thus not increasing size. In that case, rotate the header even if > threshold header size.
e1a6994
to
c71ca6b
Compare
Hi @mohammed-nurulhoque. I'm not best placed to review this. I was going to suggest asking "recent" contributors to the file but now see @fhahn has already been added. Perhaps @MatzeB could help too or suggest a reviewer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to add this new optimization under an option, switched off by default.
// Check if duplicating the header to the preheader will fold all duplicated | ||
// instructions, terminating with an unconditional jump. So No code duplication | ||
// would be necessary. | ||
static bool canRotateWithoutDuplication(Loop *L, llvm::LoopInfo *LI, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems LI
is not used.
; OZ: rotating Loop at depth 1 containing: %tailrecurse.i<header><exiting>,%tailrecurse.backedge.i<latch> | ||
; OZ: into Loop at depth 1 containing: %tailrecurse.backedge.i<header><latch><exiting> | ||
|
||
%struct.Node = type { %struct.Node* } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to simplify the test to remove these complex types and auto-generated names with a lot of dots?
Also, it would be good idea to add tests for all return false;
cases, as well as for llvm_unreachable()
.
BranchInst *T = dyn_cast<BranchInst>(OrigHeader->getTerminator()); | ||
assert(T && T->isConditional() && "Header Terminator should be conditional!"); | ||
|
||
// a value map to holds the values incoming from preheader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you please reformat this comment and some of them below, capitalizing first letter and period at the end of sentences.
return false; | ||
|
||
Instruction *C = Inst.clone(); | ||
C->insertBefore(&Inst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it necessary to actually clone and remap instructions for this? Can you simplifyInstructionWithOperands() instead?
@@ -0,0 +1,39 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4 | |||
; RUN: opt < %s -S -Oz -debug -debug-only=loop-rotate 2>&1 | FileCheck %s -check-prefix=OZ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to test -Oz
, this must go into the PhaseOrdering directory -- and you also need dedicated tests for only LoopRotate. Also, do not test debug output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if the heuristic determines we can rotate without duplication, the folding is guaranteed to happen right after rotate without anything interfering in between? cc @kyulee-com who rely on Oz and is very sensitive to any size increase.
LoopRotation duplicates the header and can increase codesize. If header size is more than a threshold, rotation is disabled. The threshold is set to 0 with -Oz, so loop rotation is disabled.
But sometimes, the duplicated header folds, thus not increasing size. In that case, rotate the header even if > threshold header size.