-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[SCEV] Preserve divisibility info when creating UMax/SMax expressions. #160012
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis Author: Florian Hahn (fhahn) ChangesCurrently we generate (S|U)Max(1, Op) for Op >= 1. This may discard divisibility info of Op. This patch rewrites such SMax/UMax expressions to use the lowest common multiplier for all non-constant operands. Full diff: https://github.com/llvm/llvm-project/pull/160012.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index b08399b381f34..ee1f92a4197e8 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -15850,12 +15850,17 @@ void ScalarEvolution::LoopGuards::collectFromBlock(
To = SE.getUMaxExpr(FromRewritten, RHS);
if (auto *UMin = dyn_cast<SCEVUMinExpr>(FromRewritten))
EnqueueOperands(UMin);
+ if (RHS->isOne())
+ ExprsToRewrite.push_back(From);
break;
case CmpInst::ICMP_SGT:
case CmpInst::ICMP_SGE:
To = SE.getSMaxExpr(FromRewritten, RHS);
- if (auto *SMin = dyn_cast<SCEVSMinExpr>(FromRewritten))
+ if (auto *SMin = dyn_cast<SCEVSMinExpr>(FromRewritten)) {
EnqueueOperands(SMin);
+ }
+ if (RHS->isOne())
+ ExprsToRewrite.push_back(From);
break;
case CmpInst::ICMP_EQ:
if (isa<SCEVConstant>(RHS))
@@ -15986,7 +15991,22 @@ void ScalarEvolution::LoopGuards::collectFromBlock(
for (const SCEV *Expr : ExprsToRewrite) {
const SCEV *RewriteTo = Guards.RewriteMap[Expr];
Guards.RewriteMap.erase(Expr);
- Guards.RewriteMap.insert({Expr, Guards.rewrite(RewriteTo)});
+ const SCEV *Rewritten = Guards.rewrite(RewriteTo);
+
+ // Try to strengthen divisibility of SMax/UMax expressions coming from >=
+ // 1 conditions.
+ if (auto *SMax = dyn_cast<SCEVSMaxExpr>(Rewritten)) {
+ unsigned MinTrailingZeros = SE.getMinTrailingZeros(SMax->getOperand(1));
+ for (const SCEV *Op : drop_begin(SMax->operands(), 2))
+ MinTrailingZeros =
+ std::min(MinTrailingZeros, SE.getMinTrailingZeros(Op));
+ if (MinTrailingZeros != 0)
+ Rewritten = SE.getSMaxExpr(
+ SE.getConstant(APInt(SMax->getType()->getScalarSizeInBits(), 1)
+ .shl(MinTrailingZeros)),
+ SMax);
+ }
+ Guards.RewriteMap.insert({Expr, Rewritten});
}
}
}
diff --git a/llvm/test/Analysis/ScalarEvolution/trip-count-minmax.ll b/llvm/test/Analysis/ScalarEvolution/trip-count-minmax.ll
index 8d091a00ed4b9..d38010403dad7 100644
--- a/llvm/test/Analysis/ScalarEvolution/trip-count-minmax.ll
+++ b/llvm/test/Analysis/ScalarEvolution/trip-count-minmax.ll
@@ -61,7 +61,7 @@ define void @umin(i32 noundef %a, i32 noundef %b) {
; CHECK-NEXT: Loop %for.body: backedge-taken count is (-1 + ((2 * %a) umin (4 * %b)))
; CHECK-NEXT: Loop %for.body: constant max backedge-taken count is i32 2147483646
; CHECK-NEXT: Loop %for.body: symbolic max backedge-taken count is (-1 + ((2 * %a) umin (4 * %b)))
-; CHECK-NEXT: Loop %for.body: Trip multiple is 1
+; CHECK-NEXT: Loop %for.body: Trip multiple is 2
;
; void umin(unsigned a, unsigned b) {
; a *= 2;
@@ -157,7 +157,7 @@ define void @smin(i32 noundef %a, i32 noundef %b) {
; CHECK-NEXT: Loop %for.body: backedge-taken count is (-1 + ((2 * %a)<nsw> smin (4 * %b)<nsw>))
; CHECK-NEXT: Loop %for.body: constant max backedge-taken count is i32 2147483646
; CHECK-NEXT: Loop %for.body: symbolic max backedge-taken count is (-1 + ((2 * %a)<nsw> smin (4 * %b)<nsw>))
-; CHECK-NEXT: Loop %for.body: Trip multiple is 1
+; CHECK-NEXT: Loop %for.body: Trip multiple is 2
;
; void smin(signed a, signed b) {
; a *= 2;
|
When re-writing SCEVAddExprs to apply information from guards, check if we have information for the expression itself. If so, apply it. When we have an expression of the form (Const + A), check if we have have guard info for (Const + 1 + A) and use it. This is needed to avoid regressions in a few cases, where we have BTCs with a subtracted constant. Rewriting expressions could cause regressions, e.g. when comparing 2 SCEV expressions where we are only able to rewrite one side, but I could not find any cases where this happens more with this patch in practice. Depends on llvm#160012 (included in PR) Proofs for some of the test changes: https://alive2.llvm.org/ce/z/RPX6t_
No differences on llvm-opt-benchmark (dtcxzyw/llvm-opt-benchmark#2846), but there are a few changes on large C/C++ corpus with unrolling and vectorization enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea makes sense to me, but TBH I'm pretty lost in the code structure here. Why are we fixing this up after the fact rather than creating the umax/umin with the larger value from the start?
Also would it make sense to do something more generic during SCEV construction here? Or do we expect this to only be useful for guards?
// Try to strengthen divisibility of SMax/UMax expressions coming from >= | ||
// 1 conditions. | ||
if (auto *SMax = dyn_cast<SCEVSMaxExpr>(Rewritten)) { | ||
unsigned MinTrailingZeros = SE.getMinTrailingZeros(SMax->getOperand(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use getConstantMultiple instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks. Also updated the code to actually support UMax and SMax here, not just SMax
Add test for SCEVUMaxExpr handling in #160012.
7d05774
to
941d620
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea makes sense to me, but TBH I'm pretty lost in the code structure here. Why are we fixing this up after the fact rather than creating the umax/umin with the larger value from the start?
The benefit of delaying is that we can delay the re-write until we collected information from all loop guards, making the code independent of the order of guards.
We could have 3 guards, establishing
- umax(%a, %b) > 0
- %a multiple of 2
- %b multiple of 2
When we construct umax(1, %a, %b)
for the first condition, we may not yet have the information available that %a and %b are multiple of 2.
But once we collected all information, we can rewrite umax(1, %a, %b)
to something like umax(1, 2 * %a / 2, 2* %b / 2)
and get the common multiple using the info from the guards.
Not sure if there's a nicer way to keep things independent of the guard order.
Also would it make sense to do something more generic during SCEV construction here? Or do we expect this to only be useful for guards?
Hmm, I think the current code relies on the fact that the UMax/SMax with the constant is coming from a compare w/o the constant part on the left side.
Are there any particluar cases you are thinking of on construction?
// Try to strengthen divisibility of SMax/UMax expressions coming from >= | ||
// 1 conditions. | ||
if (auto *SMax = dyn_cast<SCEVSMaxExpr>(Rewritten)) { | ||
unsigned MinTrailingZeros = SE.getMinTrailingZeros(SMax->getOperand(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks. Also updated the code to actually support UMax and SMax here, not just SMax
…nfo from guard Add test for SCEVUMaxExpr handling in llvm/llvm-project#160012.
What I meant is that we can generally fold |
Yep, I can try to see if this would also trigger in practice at construction, but we would still need the guard-specific logic w/o the != 0 check |
Your wording here triggered a thought. When phrased like this, this sounds a lot like SCEV construction under an assumption (or predicate) that x != 0. We have a bunch of logic of this variety in PredicatedScalarEvolution, and our assumption handling already, is there a possibility for code sharing here? (This may not be worth the work to actually do immediately. Not a blocking comment by any means.) |
Thanks, I need to think about this a bit more. With both PredicatedScalarEvolution and loop guards we rewrite SCEV expressions given extra information (runtime predicates and information from guards respectively), but currently there does't seem much overlap in the types of expressions we rewrite, with PSE mostly focused on extends and forced AddRecs. |
Currently we generate (S|U)Max(1, Op) for Op >= 1. This may discard divisibility info of Op. This patch rewrites such SMax/UMax expressions to use the lowest common multiplier for all non-constant operands.
941d620
to
59a8e0f
Compare
ping :) |
When collecting information from loop guards, use UMax(1, %b - %a) for ICMP NE %a, %b, if neither are constant. This improves results in some cases, and will be even more useful together with * llvm#160012 * llvm#159942 https://alive2.llvm.org/ce/z/YyBvoT
Currently we generate (S|U)Max(1, Op) for Op >= 1. This may discard divisibility info of Op. This patch rewrites such SMax/UMax expressions to use the lowest common multiplier for all non-constant operands.