Conversation

Contributor

@kper kper commented Sep 9, 2025

Folds urem when the first modulus is redundant.

Alive: https://alive2.llvm.org/ce/z/b_GQJX

Fixes #157370

@kper kper requested a review from nikic as a code owner September 9, 2025 10:58
@llvmbot llvmbot added llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Sep 9, 2025
Member

llvmbot commented Sep 9, 2025

@llvm/pr-subscribers-llvm-transforms

Author: None (kper)

Changes

Folds urem when the first modulus is redundant.

Alive: https://alive2.llvm.org/ce/z/b_GQJX

Fixes #157370


Full diff: https://github.com/llvm/llvm-project/pull/157644.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp (+8)
  • (added) llvm/test/Transforms/InstCombine/urem.ll (+82)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
index a9aacc707cc20..5288be9c84383 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
@@ -2473,6 +2473,14 @@ Instruction *InstCombinerImpl::visitURem(BinaryOperator &I) {
     }
   }
 
+  Value *A;
+  Value *B;
+  // urem(urem(A, B), Op1) -> urem(A, Op1)
+  if (match(Op0, m_URem(m_Value(A), m_Value(B)))) {
+    Value *Fold = Builder.CreateURem(A, Op1);
+    return replaceInstUsesWith(I, Fold);
+  }
+
   return nullptr;
 }
 
diff --git a/llvm/test/Transforms/InstCombine/urem.ll b/llvm/test/Transforms/InstCombine/urem.ll
new file mode 100644
index 0000000000000..751dc4991907a
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/urem.ll
@@ -0,0 +1,82 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=instcombine -S | FileCheck %s
+
+declare void @use(i8)
+
+define i8 @fold_urem_constants(i64 %arg0) {
+; CHECK-LABEL: @fold_urem_constants(
+; CHECK-NEXT:    [[V0:%.*]] = trunc i64 [[ARG0:%.*]] to i8
+; CHECK-NEXT:    [[V2:%.*]] = urem i8 [[V0]], 5
+; CHECK-NEXT:    ret i8 [[V2]]
+;
+  %v0 = trunc i64 %arg0 to i8
+  %v1 = urem i8 %v0, 25
+  %v2 = urem i8 %v1, 5
+  ret i8 %v2
+}
+
+define i8 @fold_urem_variables(i64 %arg0, i8 %A, i8 %B) {
+; CHECK-LABEL: @fold_urem_variables(
+; CHECK-NEXT:    [[V0:%.*]] = trunc i64 [[ARG0:%.*]] to i8
+; CHECK-NEXT:    [[V2:%.*]] = urem i8 [[V0]], [[B:%.*]]
+; CHECK-NEXT:    ret i8 [[V2]]
+;
+  %v0 = trunc i64 %arg0 to i8
+  %v1 = urem i8 %v0, %A
+  %v2 = urem i8 %v1, %B
+  ret i8 %v2
+}
+
+define i8 @fold_urem_constants_multi_use(i64 %arg0) {
+; CHECK-LABEL: @fold_urem_constants_multi_use(
+; CHECK-NEXT:    [[V0:%.*]] = trunc i64 [[ARG0:%.*]] to i8
+; CHECK-NEXT:    [[V1:%.*]] = urem i8 [[V0]], 25
+; CHECK-NEXT:    call void @use(i8 [[V1]])
+; CHECK-NEXT:    [[V2:%.*]] = urem i8 [[V0]], 5
+; CHECK-NEXT:    ret i8 [[V2]]
+;
+  %v0 = trunc i64 %arg0 to i8
+  %v1 = urem i8 %v0, 25
+  call void @use(i8 %v1)
+  %v2 = urem i8 %v1, 5
+  ret i8 %v2
+}
+
+define i8 @fold_urem_variables_multi_use(i64 %arg0, i8 %A, i8 %B) {
+; CHECK-LABEL: @fold_urem_variables_multi_use(
+; CHECK-NEXT:    [[V0:%.*]] = trunc i64 [[ARG0:%.*]] to i8
+; CHECK-NEXT:    [[V1:%.*]] = urem i8 [[V0]], [[A:%.*]]
+; CHECK-NEXT:    call void @use(i8 [[V1]])
+; CHECK-NEXT:    [[V2:%.*]] = urem i8 [[V0]], [[B:%.*]]
+; CHECK-NEXT:    ret i8 [[V2]]
+;
+  %v0 = trunc i64 %arg0 to i8
+  %v1 = urem i8 %v0, %A
+  call void @use(i8 %v1)
+  %v2 = urem i8 %v1, %B
+  ret i8 %v2
+}
+
+define i8 @fold_urem_constants_negative_test(i64 %arg0) {
+; CHECK-LABEL: @fold_urem_constants_negative_test(
+; CHECK-NEXT:    [[V0:%.*]] = trunc i64 [[ARG0:%.*]] to i8
+; CHECK-NEXT:    [[V2:%.*]] = urem i8 [[V0]], 25
+; CHECK-NEXT:    ret i8 [[V2]]
+;
+  %v0 = trunc i64 %arg0 to i8
+  %v1 = urem i8 %v0, 25
+  %v2 = urem i8 %v0, 5 ; removed because it is unused
+  ret i8 %v1
+}
+
+define i8 @fold_urem_variables_negative_test(i64 %arg0, i8 %A, i8 %B) {
+; CHECK-LABEL: @fold_urem_variables_negative_test(
+; CHECK-NEXT:    [[V0:%.*]] = trunc i64 [[ARG0:%.*]] to i8
+; CHECK-NEXT:    [[V1:%.*]] = urem i8 [[V0]], [[A:%.*]]
+; CHECK-NEXT:    ret i8 [[V1]]
+;
+  %v0 = trunc i64 %arg0 to i8
+  %v1 = urem i8 %v0, %A
+  %v2 = urem i8 %v0, %B ; removed because it is unused
+  ret i8 %v1
+}
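The arithmetic behind the i8 tests above can be checked numerically. A minimal Python sketch (illustrative only, not part of the patch; the `urem8` helper is a hypothetical stand-in for LLVM's `urem` on `i8`): for the constant case, `(x urem 25) urem 5` equals `x urem 5` for every 8-bit value because 25 is a multiple of 5, while for arbitrary moduli the equivalence does not hold in general.

```python
# Illustrative sketch of the arithmetic behind the i8 tests above.
def urem8(a, b):
    """Unsigned remainder on 8-bit values, mirroring LLVM's urem on i8."""
    assert b % 256 != 0, "urem by zero is undefined"
    return (a % 256) % (b % 256)

# Constant case: (x urem 25) urem 5 == x urem 5, since 25 % 5 == 0.
for x in range(256):
    assert urem8(urem8(x, 25), 5) == urem8(x, 5)

# Arbitrary moduli: the equivalence can fail, e.g. B = 3, C = 5.
assert urem8(urem8(7, 3), 5) == 1  # (7 % 3) % 5
assert urem8(7, 5) == 2            # differs from the folded form
```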

@XChy XChy requested review from dtcxzyw and RKSimon September 9, 2025 11:04
@kper kper force-pushed the instcombine/157370-urem-fold branch from 7cabd0b to f648f02 Compare September 9, 2025 12:00
@kper kper force-pushed the instcombine/157370-urem-fold branch from f648f02 to b6529db Compare September 9, 2025 13:13
@kper kper force-pushed the instcombine/157370-urem-fold branch from b6529db to 5a7f62d Compare September 9, 2025 14:06
const APInt *Op1Cst, *BCst;
// urem(urem(A, BCst), Op1Cst) -> urem(A, Op1Cst)
// iff urem(BCst, Op1Cst) == 0
if (match(Op0, m_URem(m_Value(A), m_APInt(BCst))) &&
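The divisibility condition in this revision can be checked exhaustively over small unsigned values. A quick Python sketch (illustrative, independent of the patch): whenever the inner modulus is a multiple of the outer one, the inner `urem` is redundant, because `a mod b` differs from `a` by a multiple of `b`, and `c | b` makes that difference vanish modulo `c`.

```python
# If BCst % Op1Cst == 0, then urem(urem(a, b), c) == urem(a, c):
# a mod b = a - q*b for some q, and c divides b, so both sides
# agree modulo c. Verified exhaustively for small values.
for b in range(1, 64):
    for c in range(1, b + 1):
        if b % c != 0:
            continue
        for a in range(256):
            assert (a % b) % c == a % c
```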
Contributor

Looks like we need a one-use check on the inner urem. Otherwise we often end up breaking up div+rem patterns, and also performing the urem on a wider type.

Contributor Author

OK, can you rerun the benchmark? (I am not sure whether I need permissions for that.)

Contributor

It looks like with this change the fold never triggers anymore...

Contributor Author

@kper kper Sep 9, 2025

It seems it is always this div+rem pair from the pattern posted in the original issue (I actually found it only once):

https://github.com/dtcxzyw/llvm-opt-benchmark/blob/main/bench/ffmpeg/optimized/ac3dec_fixed.ll#L7580

Contributor Author

I am not sure what is left for me to do here.
Should I close this PR, since the optimisation seems to be unprofitable?

Contributor

Yeah, I think we should close this issue as "won't fix". It's a valid transform, but it doesn't look like it will be profitable in practice.

@kper kper force-pushed the instcombine/157370-urem-fold branch from b9559a7 to e6ed237 Compare September 9, 2025 15:14
Successfully merging this pull request may close these issues.

Missed Optimization: eliminate redundant modulus when divisor divides previous modulus