Skip to content

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Sep 21, 2025

Currently we generate (S|U)Max(1, Op) for Op >= 1. This may discard divisibility info of Op. This patch rewrites such SMax/UMax expressions to use the lowest common multiplier for all non-constant operands.

@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Sep 21, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 21, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Florian Hahn (fhahn)

Changes

Currently we generate (S|U)Max(1, Op) for Op >= 1. This may discard divisibility info of Op. This patch rewrites such SMax/UMax expressions to use the lowest common multiplier for all non-constant operands.


Full diff: https://github.com/llvm/llvm-project/pull/160012.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+22-2)
  • (modified) llvm/test/Analysis/ScalarEvolution/trip-count-minmax.ll (+2-2)
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index b08399b381f34..ee1f92a4197e8 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -15850,12 +15850,17 @@ void ScalarEvolution::LoopGuards::collectFromBlock(
         To = SE.getUMaxExpr(FromRewritten, RHS);
         if (auto *UMin = dyn_cast<SCEVUMinExpr>(FromRewritten))
           EnqueueOperands(UMin);
+        if (RHS->isOne())
+          ExprsToRewrite.push_back(From);
         break;
       case CmpInst::ICMP_SGT:
       case CmpInst::ICMP_SGE:
         To = SE.getSMaxExpr(FromRewritten, RHS);
-        if (auto *SMin = dyn_cast<SCEVSMinExpr>(FromRewritten))
+        if (auto *SMin = dyn_cast<SCEVSMinExpr>(FromRewritten)) {
           EnqueueOperands(SMin);
+        }
+        if (RHS->isOne())
+          ExprsToRewrite.push_back(From);
         break;
       case CmpInst::ICMP_EQ:
         if (isa<SCEVConstant>(RHS))
@@ -15986,7 +15991,22 @@ void ScalarEvolution::LoopGuards::collectFromBlock(
     for (const SCEV *Expr : ExprsToRewrite) {
       const SCEV *RewriteTo = Guards.RewriteMap[Expr];
       Guards.RewriteMap.erase(Expr);
-      Guards.RewriteMap.insert({Expr, Guards.rewrite(RewriteTo)});
+      const SCEV *Rewritten = Guards.rewrite(RewriteTo);
+
+      // Try to strengthen divisibility of SMax/UMax expressions coming from >=
+      // 1 conditions.
+      if (auto *SMax = dyn_cast<SCEVSMaxExpr>(Rewritten)) {
+        unsigned MinTrailingZeros = SE.getMinTrailingZeros(SMax->getOperand(1));
+        for (const SCEV *Op : drop_begin(SMax->operands(), 2))
+          MinTrailingZeros =
+              std::min(MinTrailingZeros, SE.getMinTrailingZeros(Op));
+        if (MinTrailingZeros != 0)
+          Rewritten = SE.getSMaxExpr(
+              SE.getConstant(APInt(SMax->getType()->getScalarSizeInBits(), 1)
+                                 .shl(MinTrailingZeros)),
+              SMax);
+      }
+      Guards.RewriteMap.insert({Expr, Rewritten});
     }
   }
 }
diff --git a/llvm/test/Analysis/ScalarEvolution/trip-count-minmax.ll b/llvm/test/Analysis/ScalarEvolution/trip-count-minmax.ll
index 8d091a00ed4b9..d38010403dad7 100644
--- a/llvm/test/Analysis/ScalarEvolution/trip-count-minmax.ll
+++ b/llvm/test/Analysis/ScalarEvolution/trip-count-minmax.ll
@@ -61,7 +61,7 @@ define void @umin(i32 noundef %a, i32 noundef %b) {
 ; CHECK-NEXT:  Loop %for.body: backedge-taken count is (-1 + ((2 * %a) umin (4 * %b)))
 ; CHECK-NEXT:  Loop %for.body: constant max backedge-taken count is i32 2147483646
 ; CHECK-NEXT:  Loop %for.body: symbolic max backedge-taken count is (-1 + ((2 * %a) umin (4 * %b)))
-; CHECK-NEXT:  Loop %for.body: Trip multiple is 1
+; CHECK-NEXT:  Loop %for.body: Trip multiple is 2
 ;
 ; void umin(unsigned a, unsigned b) {
 ;   a *= 2;
@@ -157,7 +157,7 @@ define void @smin(i32 noundef %a, i32 noundef %b) {
 ; CHECK-NEXT:  Loop %for.body: backedge-taken count is (-1 + ((2 * %a)<nsw> smin (4 * %b)<nsw>))
 ; CHECK-NEXT:  Loop %for.body: constant max backedge-taken count is i32 2147483646
 ; CHECK-NEXT:  Loop %for.body: symbolic max backedge-taken count is (-1 + ((2 * %a)<nsw> smin (4 * %b)<nsw>))
-; CHECK-NEXT:  Loop %for.body: Trip multiple is 1
+; CHECK-NEXT:  Loop %for.body: Trip multiple is 2
 ;
 ; void smin(signed a, signed b) {
 ;   a *= 2;

fhahn added a commit to fhahn/llvm-project that referenced this pull request Sep 22, 2025
When re-writing SCEVAddExprs to apply information from guards, check if
we have information for the expression itself. If so, apply it.

When we have an expression of the form (Const + A),  check if we have
have guard info for (Const + 1 + A) and use it. This is needed to avoid
regressions in a few cases, where we have BTCs with a subtracted
constant.

Rewriting expressions could cause regressions, e.g. when comparing 2
SCEV expressions where we are only able to rewrite one side, but I could
not find any cases where this happens more with this patch in practice.

Depends on llvm#160012 (included in
PR)

Proofs for some of the test changes: https://alive2.llvm.org/ce/z/RPX6t_
@fhahn
Copy link
Contributor Author

fhahn commented Sep 22, 2025

No differences on llvm-opt-benchmark (dtcxzyw/llvm-opt-benchmark#2846), but there are a few changes on large C/C++ corpus with unrolling and vectorization enabled.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea makes sense to me, but TBH I'm pretty lost in the code structure here. Why are we fixing this up after the fact rather than creating the umax/umin with the larger value from the start?

Also would it make sense to do something more generic during SCEV construction here? Or do we expect this to only be useful for guards?

// Try to strengthen divisibility of SMax/UMax expressions coming from >=
// 1 conditions.
if (auto *SMax = dyn_cast<SCEVSMaxExpr>(Rewritten)) {
unsigned MinTrailingZeros = SE.getMinTrailingZeros(SMax->getOperand(1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use getConstantMultiple instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks. Also updated the code to actually support UMax and SMax here, not just SMax

fhahn added a commit that referenced this pull request Sep 22, 2025
@fhahn fhahn force-pushed the scev-guards-smax-umax-divisibility branch from 7d05774 to 941d620 Compare September 22, 2025 20:03
Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea makes sense to me, but TBH I'm pretty lost in the code structure here. Why are we fixing this up after the fact rather than creating the umax/umin with the larger value from the start?

The benefit of delaying is that we can delay the re-write until we collected information from all loop guards, making the code independent of the order of guards.

We could have 3 guards, establishing

  • umax(%a, %b) > 0
  • %a multiple of 2
  • %b multiple of 2

When we construct umax(1, %a, %b) for the first condition, we may not yet have the information available that %a and %b are multiple of 2.

But once we collected all information, we can rewrite umax(1, %a, %b) to something like umax(1, 2 * %a / 2, 2* %b / 2) and get the common multiple using the info from the guards.

Not sure if there's a nicer way to keep things independent of the guard order.

Also would it make sense to do something more generic during SCEV construction here? Or do we expect this to only be useful for guards?

Hmm, I think the current code relies on the fact that the UMax/SMax with the constant is coming from a compare w/o the constant part on the left side.

Are there any particluar cases you are thinking of on construction?

// Try to strengthen divisibility of SMax/UMax expressions coming from >=
// 1 conditions.
if (auto *SMax = dyn_cast<SCEVSMaxExpr>(Rewritten)) {
unsigned MinTrailingZeros = SE.getMinTrailingZeros(SMax->getOperand(1));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks. Also updated the code to actually support UMax and SMax here, not just SMax

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 22, 2025
@nikic
Copy link
Contributor

nikic commented Sep 22, 2025

Hmm, I think the current code relies on the fact that the UMax/SMax with the constant is coming from a compare w/o the constant part on the left side.

Are there any particluar cases you are thinking of on construction?

What I meant is that we can generally fold umax(C1, C2*x) with C2>C1 and x!=0 to umax(C2, C2*x). But I guess x!=0 is the critical part here -- we generally do not know that outside the guard context (or rather, can't ignore it outside the guard context).

@fhahn
Copy link
Contributor Author

fhahn commented Sep 23, 2025

Hmm, I think the current code relies on the fact that the UMax/SMax with the constant is coming from a compare w/o the constant part on the left side.
Are there any particluar cases you are thinking of on construction?

What I meant is that we can generally fold umax(C1, C2*x) with C2>C1 and x!=0 to umax(C2, C2*x). But I guess x!=0 is the critical part here -- we generally do not know that outside the guard context (or rather, can't ignore it outside the guard context).

Yep, I can try to see if this would also trigger in practice at construction, but we would still need the guard-specific logic w/o the != 0 check

@preames
Copy link
Collaborator

preames commented Sep 23, 2025

Yep, I can try to see if this would also trigger in practice at construction, but we would still need the guard-specific logic w/o the != 0 check

Your wording here triggered a thought. When phrased like this, this sounds a lot like SCEV construction under an assumption (or predicate) that x != 0. We have a bunch of logic of this variety in PredicatedScalarEvolution, and our assumption handling already, is there a possibility for code sharing here?

(This may not be worth the work to actually do immediately. Not a blocking comment by any means.)

@fhahn
Copy link
Contributor Author

fhahn commented Sep 24, 2025

Yep, I can try to see if this would also trigger in practice at construction, but we would still need the guard-specific logic w/o the != 0 check

Your wording here triggered a thought. When phrased like this, this sounds a lot like SCEV construction under an assumption (or predicate) that x != 0. We have a bunch of logic of this variety in PredicatedScalarEvolution, and our assumption handling already, is there a possibility for code sharing here?

(This may not be worth the work to actually do immediately. Not a blocking comment by any means.)

Thanks, I need to think about this a bit more. With both PredicatedScalarEvolution and loop guards we rewrite SCEV expressions given extra information (runtime predicates and information from guards respectively), but currently there does't seem much overlap in the types of expressions we rewrite, with PSE mostly focused on extends and forced AddRecs.

Currently we generate (S|U)Max(1, Op) for Op >= 1. This may discard
divisibility info of Op. This patch rewrites such SMax/UMax expressions
to use the lowest common multiplier for all non-constant operands.
@fhahn fhahn force-pushed the scev-guards-smax-umax-divisibility branch from 941d620 to 59a8e0f Compare September 29, 2025 18:26
@fhahn
Copy link
Contributor Author

fhahn commented Sep 29, 2025

ping :)

fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 1, 2025
When collecting information from loop guards, use UMax(1, %b - %a) for
ICMP NE %a, %b, if neither are constant.

This improves results in some cases, and will be even more useful
together with
 * llvm#160012
 * llvm#159942

https://alive2.llvm.org/ce/z/YyBvoT
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants