[x86] Optimize urem with a constant divisor to use multiply-by-reciprocal #92669

xgupta · 2024-05-18T19:19:04Z

The test case provided by Henning Thielemann implements a simple random number generator based on linear congruences.
It needs a division, and LLVM chooses to call __umoddi3, which is very slow. Since the denominator is a constant, the operation can be expanded into a multiply-by-reciprocal sequence.

Fix #6769
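
For orientation, the generator step exercised by the test file pr6769.ll in this PR can be written in C++ roughly as follows. This is a sketch inferred from the IR in the diff, not code from the original bug report; the names `kMultiplier`, `kModulus` and `rnd_sanity_check` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Constants taken from the IR in pr6769.ll; the names are illustrative.
constexpr std::uint64_t kMultiplier = 40692;
constexpr std::uint64_t kModulus = 2147483399;

// One generator step: x -> (40692 * x) mod 2147483399.
// The 64-bit urem by a constant is the operation the backend has to lower.
inline std::uint32_t rnd(std::uint32_t a0) {
  const std::uint64_t x = a0;               // zext i32 -> i64
  const std::uint64_t y = kMultiplier * x;  // mul i64
  const std::uint64_t z = y % kModulus;     // urem i64 by a constant
  return static_cast<std::uint32_t>(z);     // trunc i64 -> i32
}

// Small sanity check: a seed of 1 yields the multiplier itself.
inline void rnd_sanity_check() {
  assert(rnd(1) == 40692);
}
```

The product of a 32-bit seed and 40692 does not fit in 32 bits, which is why the remainder is taken on a 64-bit value.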


llvmbot · 2024-05-18T19:19:33Z

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-selectiondag

Author: Shivam Gupta (xgupta)

Full diff: https://github.com/llvm/llvm-project/pull/92669.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+26)
(added) llvm/test/CodeGen/X86/pr6769.ll (+20)

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 2b181cd3ab1db..e6a5370e0fdef 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -5019,6 +5019,32 @@ SDValue DAGCombiner::visitREM(SDNode *N) {
     }
   }
 
+  // Optimization for urem with a constant divisor
+  if (!isSigned && isa<ConstantSDNode>(N1)) {
+    uint64_t M = cast<ConstantSDNode>(N1)->getZExtValue();
+    uint64_t R = (1ULL << 63) / M + 1;
+
+    SDValue Reciprocal = DAG.getConstant(R, DL, MVT::i64);
+    SDValue N0Ext = DAG.getZExtOrTrunc(N0, DL, MVT::i64);
+
+    // Multiply by reciprocal
+    SDValue Mul = DAG.getNode(ISD::MUL, DL, MVT::i64, N0Ext, Reciprocal);
+
+    // Right shift by 63 to get the quotient
+    SDValue ShiftAmount = DAG.getConstant(63, DL, MVT::i64);
+    SDValue Quotient = DAG.getNode(ISD::SRL, DL, MVT::i64, Mul, ShiftAmount);
+
+    // Multiply quotient by M to get the product
+    SDValue Modulus = DAG.getConstant(M, DL, MVT::i64);
+    SDValue Product = DAG.getNode(ISD::MUL, DL, MVT::i64, Quotient, Modulus);
+
+    // Subtract product from the original dividend to get the remainder
+    SDValue Remainder = DAG.getNode(ISD::SUB, DL, MVT::i64, N0Ext, Product);
+
+    // Truncate the result to the original type
+    return DAG.getNode(ISD::TRUNCATE, DL, VT, Remainder);
+  }
+
   AttributeList Attr = DAG.getMachineFunction().getFunction().getAttributes();
 
   // If X/C can be simplified by the division-by-constant logic, lower
diff --git a/llvm/test/CodeGen/X86/pr6769.ll b/llvm/test/CodeGen/X86/pr6769.ll
new file mode 100644
index 0000000000000..328bbf0f594c7
--- /dev/null
+++ b/llvm/test/CodeGen/X86/pr6769.ll
@@ -0,0 +1,20 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s
+
+define i32 @_rnd(i32 %a0) {
+; CHECK-LABEL: _rnd:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    movl %edi, %ecx
+; CHECK-NEXT:    imull $40692, %edi, %eax # imm = 0x9EF4
+; CHECK-NEXT:    movabsq $174770829514140, %rdx # imm = 0x9EF40135D59C
+; CHECK-NEXT:    imulq %rcx, %rdx
+; CHECK-NEXT:    shrq $63, %rdx
+; CHECK-NEXT:    imull $2147483399, %edx, %ecx # imm = 0x7FFFFF07
+; CHECK-NEXT:    subl %ecx, %eax
+; CHECK-NEXT:    retq
+  %x = zext i32 %a0 to i64
+  %y = mul i64 40692, %x
+  %z = urem i64 %y, 2147483399
+  %r = trunc i64 %z to i32
+  ret i32 %r
+}
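
As a cross-check of the sequence built in the DAGCombiner.cpp hunk, the same arithmetic can be transcribed into plain C++. The sketch below is only a transcription under stated assumptions (the names `patch_urem`, `kModulus` and `show_mismatch` are hypothetical, and a non-zero divisor is assumed), but it illustrates one concern: a 64-bit ISD::MUL keeps only the low 64 bits of the product, so shifting that result right by 63 leaves a single bit, and the "quotient" can only be 0 or 1.

```cpp
#include <cassert>
#include <cstdint>

// Plain-integer transcription of the nodes built in the hunk above.
// Assumes m != 0; the patch itself does not check for a zero divisor.
inline std::uint64_t patch_urem(std::uint64_t n, std::uint64_t m) {
  const std::uint64_t r = (1ULL << 63) / m + 1;  // "Reciprocal"
  const std::uint64_t mul = n * r;    // i64 multiply: low 64 bits only
  const std::uint64_t q = mul >> 63;  // one bit survives, so q is 0 or 1
  return n - q * m;                   // "Remainder"
}

// Divisor taken from pr6769.ll.
constexpr std::uint64_t kModulus = 2147483399;

// 3 * kModulus is an exact multiple, so the true remainder is 0, but the
// transcription can subtract the divisor at most once.
inline void show_mismatch() {
  const std::uint64_t n = 3 * kModulus;
  assert(n % kModulus == 0);
  assert(patch_urem(n, kModulus) >= 2 * kModulus);
}
```

Any dividend of at least twice the divisor triggers the same mismatch, because the result can never drop below `n - m`.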

efriedma-quic

Please don't waste reviewers' time.

topperc · 2024-05-18T20:10:16Z

It only calls __umoddi3 when compiling for 32-bit x86. For x86-64 it already uses a multiply.
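
To illustrate what "uses a multiply" can look like, here is a minimal sketch of a 64-bit remainder computed from the high half of a 128-bit product. It is not the sequence LLVM emits; the existing division-by-constant logic chooses its own constants and shifts. The helper names `mulhi64`, `urem_by_reciprocal` and `check_against_urem` are hypothetical, and `unsigned __int128` is assumed to be available as a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstdint>

// High 64 bits of the 128-bit product a * b.
// unsigned __int128 is a GCC/Clang extension, assumed available here.
inline std::uint64_t mulhi64(std::uint64_t a, std::uint64_t b) {
  return static_cast<std::uint64_t>(
      (static_cast<unsigned __int128>(a) * b) >> 64);
}

// n % m for a non-zero m, using a reciprocal estimate of the quotient.
inline std::uint64_t urem_by_reciprocal(std::uint64_t n, std::uint64_t m) {
  // floor((2^64 - 1) / m); for a constant m this folds to a constant.
  const std::uint64_t recip = UINT64_MAX / m;
  // The estimate is either floor(n / m) or exactly one less.
  const std::uint64_t q = mulhi64(n, recip);
  std::uint64_t r = n - q * m;  // therefore 0 <= r < 2 * m
  if (r >= m) r -= m;           // a single correction step suffices
  return r;
}

// Compare against the built-in % for the divisor from pr6769.ll.
inline void check_against_urem() {
  const std::uint64_t m = 2147483399;
  const std::uint64_t n = 40692ULL * 4294967295ULL;
  assert(urem_by_reciprocal(n, m) == n % m);
}
```

The key difference from a plain 64-bit multiply is that the quotient estimate comes from the upper 64 bits of the full product, so it is not limited to a single bit.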

xgupta · 2024-05-18T20:26:14Z

> It only calls __umoddi3 when compiling for 32-bit x86. For x86-64 it already uses a multiply.

Thanks for the information. In that case these changes are not required, since there are not many 32-bit systems.

[x86] Optimize urem with a constant divisor to use multiply-by-recipr…

8731d01

…ocal Fix llvm#6769

llvmbot added backend:X86 llvm:SelectionDAG labels May 18, 2024

xgupta requested review from RKSimon, arsenm and dtcxzyw May 18, 2024 19:20

efriedma-quic requested changes May 18, 2024

xgupta closed this May 18, 2024

xgupta deleted the urem branch June 12, 2024 16:21
