-
Notifications
You must be signed in to change notification settings - Fork 11k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[x86] Optimize urem with a constant divisor to use multiply-by-reciprocal #92669
Conversation
@llvm/pr-subscribers-backend-x86 @llvm/pr-subscribers-llvm-selectiondag Author: Shivam Gupta (xgupta) ChangesThe testcase provided by Henning Thielemann implemented a simple random number generator based on linear concurrences. Fix #6769 Full diff: https://github.com/llvm/llvm-project/pull/92669.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 2b181cd3ab1db..e6a5370e0fdef 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -5019,6 +5019,32 @@ SDValue DAGCombiner::visitREM(SDNode *N) {
}
}
+ // Optimization for urem with a constant divisor
+ if (!isSigned && isa<ConstantSDNode>(N1)) {
+ uint64_t M = cast<ConstantSDNode>(N1)->getZExtValue();
+ uint64_t R = (1ULL << 63) / M + 1;
+
+ SDValue Reciprocal = DAG.getConstant(R, DL, MVT::i64);
+ SDValue N0Ext = DAG.getZExtOrTrunc(N0, DL, MVT::i64);
+
+ // Multiply by reciprocal
+ SDValue Mul = DAG.getNode(ISD::MUL, DL, MVT::i64, N0Ext, Reciprocal);
+
+ // Right shift by 63 to get the quotient
+ SDValue ShiftAmount = DAG.getConstant(63, DL, MVT::i64);
+ SDValue Quotient = DAG.getNode(ISD::SRL, DL, MVT::i64, Mul, ShiftAmount);
+
+ // Multiply quotient by M to get the product
+ SDValue Modulus = DAG.getConstant(M, DL, MVT::i64);
+ SDValue Product = DAG.getNode(ISD::MUL, DL, MVT::i64, Quotient, Modulus);
+
+ // Subtract product from the original dividend to get the remainder
+ SDValue Remainder = DAG.getNode(ISD::SUB, DL, MVT::i64, N0Ext, Product);
+
+ // Truncate the result to the original type
+ return DAG.getNode(ISD::TRUNCATE, DL, VT, Remainder);
+ }
+
AttributeList Attr = DAG.getMachineFunction().getFunction().getAttributes();
// If X/C can be simplified by the division-by-constant logic, lower
diff --git a/llvm/test/CodeGen/X86/pr6769.ll b/llvm/test/CodeGen/X86/pr6769.ll
new file mode 100644
index 0000000000000..328bbf0f594c7
--- /dev/null
+++ b/llvm/test/CodeGen/X86/pr6769.ll
@@ -0,0 +1,20 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s
+
+define i32 @_rnd(i32 %a0) {
+; CHECK-LABEL: _rnd:
+; CHECK: # %bb.0:
+; CHECK-NEXT: movl %edi, %ecx
+; CHECK-NEXT: imull $40692, %edi, %eax # imm = 0x9EF4
+; CHECK-NEXT: movabsq $174770829514140, %rdx # imm = 0x9EF40135D59C
+; CHECK-NEXT: imulq %rcx, %rdx
+; CHECK-NEXT: shrq $63, %rdx
+; CHECK-NEXT: imull $2147483399, %edx, %ecx # imm = 0x7FFFFF07
+; CHECK-NEXT: subl %ecx, %eax
+; CHECK-NEXT: retq
+ %x = zext i32 %a0 to i64
+ %y = mul i64 40692, %x
+ %z = urem i64 %y, 2147483399
+ %r = trunc i64 %z to i32
+ ret i32 %r
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't waste reviewers' time.
It only calls __umoddi3 when compiling for 32-bit x86. For x86-64 it already uses a multiply. |
Thanks for the information, in that case these changes are not required since there are not many 32 bit systems. |
The testcase provided by Henning Thielemann implemented a simple random number generator based on linear concurrences.
This needs a division and LLVM chooses to call __umoddi3 which is very slow since denominator is a constant, this can be expanded into a multiply-by-reciprocal sequence.
Fix #6769