[InstCombine] Fold `fmul X, -0.0` into `copysign(0.0, -X)` #85772

dtcxzyw · 2024-03-19T11:14:26Z

fneg + copysign is better than fmul for analysis/codegen.
godbolt: https://godbolt.org/z/eEs6dGd1G
Alive2: https://alive2.llvm.org/ce/z/K3M5BA

llvmbot · 2024-03-19T11:14:57Z

@llvm/pr-subscribers-llvm-transforms

Author: Yingwei Zheng (dtcxzyw)

Changes

fneg + copysign is better than fmul for analysis/codegen.
godbolt: https://godbolt.org/z/eEs6dGd1G
Alive2: https://alive2.llvm.org/ce/z/xzp4QD

Full diff: https://github.com/llvm/llvm-project/pull/85772.diff

4 Files Affected:

(modified) llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp (+11-2)
(modified) llvm/test/Transforms/InstCombine/binop-itofp.ll (+7-5)
(modified) llvm/test/Transforms/InstCombine/fmul.ll (+85-3)
(modified) llvm/test/Transforms/InstCombine/fpcast.ll (+1-4)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
index 9d4c271f990d19..9582084e23b8b1 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
@@ -814,8 +814,17 @@ Instruction *InstCombinerImpl::visitFMul(BinaryOperator &I) {
   if (match(Op1, m_SpecificFP(-1.0)))
     return UnaryOperator::CreateFNegFMF(Op0, &I);
 
-  // With no-nans: X * 0.0 --> copysign(0.0, X)
-  if (I.hasNoNaNs() && match(Op1, m_PosZeroFP())) {
+  // With no-nans:
+  // X * 0.0 --> copysign(0.0, X)
+  // X * -0.0 --> copysign(0.0, -X)
+  if (match(Op1, m_AnyZeroFP()) &&
+      isKnownNeverNaN(&I, /*Depth=*/0, SQ.getWithInstruction(&I))) {
+    if (match(Op1, m_NegZeroFP())) {
+      Op0 = Builder.CreateFNegFMF(Op0, &I);
+      // Canonicalize -0.0 to 0.0
+      Op1 = ConstantFoldUnaryOpOperand(Instruction::FNeg, cast<Constant>(Op1),
+                                       DL);
+    }
     CallInst *CopySign = Builder.CreateIntrinsic(Intrinsic::copysign,
                                                  {I.getType()}, {Op1, Op0}, &I);
     return replaceInstUsesWith(I, CopySign);
diff --git a/llvm/test/Transforms/InstCombine/binop-itofp.ll b/llvm/test/Transforms/InstCombine/binop-itofp.ll
index c72e4ac413a3ed..4a78e0058e4917 100644
--- a/llvm/test/Transforms/InstCombine/binop-itofp.ll
+++ b/llvm/test/Transforms/InstCombine/binop-itofp.ll
@@ -1012,7 +1012,7 @@ define float @missed_nonzero_check_on_constant_for_si_fmul(i1 %c, i1 %.b, ptr %g
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[C:%.*]], i32 65529, i32 53264
 ; CHECK-NEXT:    [[CONV_I:%.*]] = trunc i32 [[SEL]] to i16
 ; CHECK-NEXT:    [[CONV1_I:%.*]] = sitofp i16 [[CONV_I]] to float
-; CHECK-NEXT:    [[MUL3_I_I:%.*]] = fmul float [[CONV1_I]], 0.000000e+00
+; CHECK-NEXT:    [[MUL3_I_I:%.*]] = call float @llvm.copysign.f32(float 0.000000e+00, float [[CONV1_I]])
 ; CHECK-NEXT:    store i32 [[SEL]], ptr [[G_2345:%.*]], align 4
 ; CHECK-NEXT:    ret float [[MUL3_I_I]]
 ;
@@ -1031,7 +1031,7 @@ define <2 x float> @missed_nonzero_check_on_constant_for_si_fmul_vec(i1 %c, i1 %
 ; CHECK-NEXT:    [[CONV_I_V:%.*]] = insertelement <2 x i16> poison, i16 [[CONV_I_S]], i64 0
 ; CHECK-NEXT:    [[CONV_I:%.*]] = shufflevector <2 x i16> [[CONV_I_V]], <2 x i16> poison, <2 x i32> zeroinitializer
 ; CHECK-NEXT:    [[CONV1_I:%.*]] = sitofp <2 x i16> [[CONV_I]] to <2 x float>
-; CHECK-NEXT:    [[MUL3_I_I:%.*]] = fmul <2 x float> [[CONV1_I]], zeroinitializer
+; CHECK-NEXT:    [[MUL3_I_I:%.*]] = call <2 x float> @llvm.copysign.v2f32(<2 x float> zeroinitializer, <2 x float> [[CONV1_I]])
 ; CHECK-NEXT:    store i32 [[SEL]], ptr [[G_2345:%.*]], align 4
 ; CHECK-NEXT:    ret <2 x float> [[MUL3_I_I]]
 ;
@@ -1050,7 +1050,8 @@ define float @negzero_check_on_constant_for_si_fmul(i1 %c, i1 %.b, ptr %g_2345)
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[C:%.*]], i32 65529, i32 53264
 ; CHECK-NEXT:    [[CONV_I:%.*]] = trunc i32 [[SEL]] to i16
 ; CHECK-NEXT:    [[CONV1_I:%.*]] = sitofp i16 [[CONV_I]] to float
-; CHECK-NEXT:    [[MUL3_I_I:%.*]] = fmul float [[CONV1_I]], -0.000000e+00
+; CHECK-NEXT:    [[TMP1:%.*]] = fneg float [[CONV1_I]]
+; CHECK-NEXT:    [[MUL3_I_I:%.*]] = call float @llvm.copysign.f32(float 0.000000e+00, float [[TMP1]])
 ; CHECK-NEXT:    store i32 [[SEL]], ptr [[G_2345:%.*]], align 4
 ; CHECK-NEXT:    ret float [[MUL3_I_I]]
 ;
@@ -1069,7 +1070,7 @@ define <2 x float> @nonzero_check_on_constant_for_si_fmul_vec_w_undef(i1 %c, i1
 ; CHECK-NEXT:    [[CONV_I_V:%.*]] = insertelement <2 x i16> poison, i16 [[CONV_I_S]], i64 0
 ; CHECK-NEXT:    [[CONV_I:%.*]] = shufflevector <2 x i16> [[CONV_I_V]], <2 x i16> poison, <2 x i32> zeroinitializer
 ; CHECK-NEXT:    [[CONV1_I:%.*]] = sitofp <2 x i16> [[CONV_I]] to <2 x float>
-; CHECK-NEXT:    [[MUL3_I_I:%.*]] = fmul <2 x float> [[CONV1_I]], <float undef, float 0.000000e+00>
+; CHECK-NEXT:    [[MUL3_I_I:%.*]] = call <2 x float> @llvm.copysign.v2f32(<2 x float> <float undef, float 0.000000e+00>, <2 x float> [[CONV1_I]])
 ; CHECK-NEXT:    store i32 [[SEL]], ptr [[G_2345:%.*]], align 4
 ; CHECK-NEXT:    ret <2 x float> [[MUL3_I_I]]
 ;
@@ -1111,7 +1112,8 @@ define <2 x float> @nonzero_check_on_constant_for_si_fmul_negz_vec_w_undef(i1 %c
 ; CHECK-NEXT:    [[CONV_I_V:%.*]] = insertelement <2 x i16> poison, i16 [[CONV_I_S]], i64 0
 ; CHECK-NEXT:    [[CONV_I:%.*]] = shufflevector <2 x i16> [[CONV_I_V]], <2 x i16> poison, <2 x i32> zeroinitializer
 ; CHECK-NEXT:    [[CONV1_I:%.*]] = sitofp <2 x i16> [[CONV_I]] to <2 x float>
-; CHECK-NEXT:    [[MUL3_I_I:%.*]] = fmul <2 x float> [[CONV1_I]], <float undef, float -0.000000e+00>
+; CHECK-NEXT:    [[TMP1:%.*]] = fneg <2 x float> [[CONV1_I]]
+; CHECK-NEXT:    [[MUL3_I_I:%.*]] = call <2 x float> @llvm.copysign.v2f32(<2 x float> <float undef, float 0.000000e+00>, <2 x float> [[TMP1]])
 ; CHECK-NEXT:    store i32 [[SEL]], ptr [[G_2345:%.*]], align 4
 ; CHECK-NEXT:    ret <2 x float> [[MUL3_I_I]]
 ;
diff --git a/llvm/test/Transforms/InstCombine/fmul.ll b/llvm/test/Transforms/InstCombine/fmul.ll
index 7e7373e6ef5bdd..913837cb65eb48 100644
--- a/llvm/test/Transforms/InstCombine/fmul.ll
+++ b/llvm/test/Transforms/InstCombine/fmul.ll
@@ -1268,13 +1268,95 @@ define half @mul_zero(half %x) {
   ret half %r
 }
 
-; TODO: This could be fneg+copysign.
-
 define half @mul_negzero_nnan(half %x) {
 ; CHECK-LABEL: @mul_negzero_nnan(
-; CHECK-NEXT:    [[R:%.*]] = fmul nnan half [[X:%.*]], 0xH8000
+; CHECK-NEXT:    [[TMP1:%.*]] = fneg nnan half [[X:%.*]]
+; CHECK-NEXT:    [[R:%.*]] = call nnan half @llvm.copysign.f16(half 0xH0000, half [[TMP1]])
 ; CHECK-NEXT:    ret half [[R]]
 ;
   %r = fmul nnan half %x, -0.0
   ret half %r
 }
+
+define float @mul_pos_zero_nnan_ninf(float nofpclass(inf nan) %a) {
+; CHECK-LABEL: @mul_pos_zero_nnan_ninf(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.copysign.f32(float 0.000000e+00, float [[A:%.*]])
+; CHECK-NEXT:    ret float [[RET]]
+;
+entry:
+  %ret = fmul float %a, 0.000000e+00
+  ret float %ret
+}
+
+define float @mul_pos_zero_nnan(float nofpclass(nan) %a) {
+; CHECK-LABEL: @mul_pos_zero_nnan(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RET:%.*]] = fmul float [[A:%.*]], 0.000000e+00
+; CHECK-NEXT:    ret float [[RET]]
+;
+entry:
+  %ret = fmul float %a, 0.000000e+00
+  ret float %ret
+}
+
+define float @mul_pos_zero_nnan_ninf_fmf(float nofpclass(nan) %a) {
+; CHECK-LABEL: @mul_pos_zero_nnan_ninf_fmf(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RET:%.*]] = fmul ninf float [[A:%.*]], 0.000000e+00
+; CHECK-NEXT:    ret float [[RET]]
+;
+entry:
+  %ret = fmul ninf float %a, 0.000000e+00
+  ret float %ret
+}
+
+define float @mul_neg_zero_nnan_ninf(float nofpclass(inf nan) %a) {
+; CHECK-LABEL: @mul_neg_zero_nnan_ninf(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = fneg float [[A:%.*]]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.copysign.f32(float 0.000000e+00, float [[TMP0]])
+; CHECK-NEXT:    ret float [[RET]]
+;
+entry:
+  %ret = fmul float %a, -0.000000e+00
+  ret float %ret
+}
+
+define float @mul_neg_zero_nnan_fmf(float %a) {
+; CHECK-LABEL: @mul_neg_zero_nnan_fmf(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = fneg nnan float [[A:%.*]]
+; CHECK-NEXT:    [[RET:%.*]] = call nnan float @llvm.copysign.f32(float 0.000000e+00, float [[TMP0]])
+; CHECK-NEXT:    ret float [[RET]]
+;
+entry:
+  %ret = fmul nnan float %a, -0.000000e+00
+  ret float %ret
+}
+
+define float @mul_neg_zero_nnan_ninf_fmf(float nofpclass(inf nan) %a) {
+; CHECK-LABEL: @mul_neg_zero_nnan_ninf_fmf(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = fneg nnan ninf float [[A:%.*]]
+; CHECK-NEXT:    [[RET:%.*]] = call nnan ninf float @llvm.copysign.f32(float 0.000000e+00, float [[TMP0]])
+; CHECK-NEXT:    ret float [[RET]]
+;
+entry:
+  %ret = fmul nnan ninf float %a, -0.000000e+00
+  ret float %ret
+}
+
+; poison propagates through vector elements
+
+define <2 x float> @mul_neg_zero_nnan_ninf_vec(<2 x float> nofpclass(inf nan) %a) {
+; CHECK-LABEL: @mul_neg_zero_nnan_ninf_vec(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = fneg <2 x float> [[A:%.*]]
+; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.copysign.v2f32(<2 x float> <float 0.000000e+00, float poison>, <2 x float> [[TMP0]])
+; CHECK-NEXT:    ret <2 x float> [[RET]]
+;
+entry:
+  %ret = fmul <2 x float> %a, <float -0.0, float poison>
+  ret <2 x float> %ret
+}
diff --git a/llvm/test/Transforms/InstCombine/fpcast.ll b/llvm/test/Transforms/InstCombine/fpcast.ll
index 1f0bfbc40ac1b6..fee78ab315ea06 100644
--- a/llvm/test/Transforms/InstCombine/fpcast.ll
+++ b/llvm/test/Transforms/InstCombine/fpcast.ll
@@ -424,10 +424,7 @@ define i32 @fptosi_select(i1 %cond) {
 define i32 @mul_pos_zero_convert(i32 %a) {
 ; CHECK-LABEL: @mul_pos_zero_convert(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[FP:%.*]] = sitofp i32 [[A:%.*]] to float
-; CHECK-NEXT:    [[RET:%.*]] = fmul float [[FP]], 0.000000e+00
-; CHECK-NEXT:    [[CONV:%.*]] = fptosi float [[RET]] to i32
-; CHECK-NEXT:    ret i32 [[CONV]]
+; CHECK-NEXT:    ret i32 0
 ;
 entry:
   %fp = sitofp i32 %a to float

PR Link: llvm/llvm-project#85772

llvm/test/Transforms/InstCombine/fmul.ll

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp

llvm/test/Transforms/InstCombine/fmul.ll

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp

dtcxzyw · 2024-03-19T14:32:52Z

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp

+       isKnownNeverNaN(&I, /*Depth=*/0, SQ.getWithInstruction(&I)))) {
+    if (FPC->isNegative())
+      Op0 = Builder.CreateFNegFMF(Op0, &I);
+    Op1 = Constant::replaceUndefsWith(


Alive2: https://alive2.llvm.org/ce/z/GNc4gb

…85787) Alive2: https://alive2.llvm.org/ce/z/vFykcZ Address the comment #85772 (comment). Unfortunately, non-splat vector constants are not supported because we haven't implemented constant folding of fabs with vector operands.

…lvm#85787) Alive2: https://alive2.llvm.org/ce/z/vFykcZ Address the comment llvm#85772 (comment). Unfortunately, non-splat vector constants are not supported because we haven't implemented constant folding of fabs with vector operands.

`fneg + copysign` is better than fmul for analysis/codegen. godbolt: https://godbolt.org/z/eEs6dGd1G Alive2: https://alive2.llvm.org/ce/z/K3M5BA

dtcxzyw requested a review from arsenm March 19, 2024 11:14

dtcxzyw requested a review from nikic as a code owner March 19, 2024 11:14

llvmbot added the llvm:transforms label Mar 19, 2024

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Mar 19, 2024

pre-commit: test PR85772

7588859

PR Link: llvm/llvm-project#85772

dtcxzyw mentioned this pull request Mar 19, 2024

pre-commit: test PR85772 dtcxzyw/llvm-opt-benchmark#426

Closed

arsenm reviewed Mar 19, 2024

View reviewed changes

llvm/test/Transforms/InstCombine/fmul.ll Outdated Show resolved Hide resolved

llvm/test/Transforms/InstCombine/fmul.ll Show resolved Hide resolved

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp Outdated Show resolved Hide resolved

arsenm reviewed Mar 19, 2024

View reviewed changes

llvm/test/Transforms/InstCombine/fmul.ll Show resolved Hide resolved

dtcxzyw force-pushed the perf/mul-zero-nnan-ninf branch 2 times, most recently from 9466a62 to fd81698 Compare March 19, 2024 12:59

dtcxzyw mentioned this pull request Mar 19, 2024

[InstCombine] Clear sign-bit of the constant magnitude in copysign #85787

Merged

arsenm reviewed Mar 19, 2024

View reviewed changes

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp Outdated Show resolved Hide resolved

nikic added the floating-point Floating-point math label Mar 19, 2024

nikic removed their request for review March 19, 2024 14:20

dtcxzyw force-pushed the perf/mul-zero-nnan-ninf branch from fd81698 to f205f2c Compare March 19, 2024 14:31

dtcxzyw commented Mar 19, 2024

View reviewed changes

arsenm approved these changes Mar 19, 2024

View reviewed changes

dtcxzyw added 2 commits March 20, 2024 17:38

[InstCombine] Add pre-commit tests. NFC.

cdb0602

[InstCombine] Fold fmul X, -0.0 into copysign(0.0, -X)

86417bf

dtcxzyw force-pushed the perf/mul-zero-nnan-ninf branch from 95be94b to 86417bf Compare March 20, 2024 09:52

dtcxzyw merged commit 2bfa7d0 into llvm:main Mar 21, 2024
3 of 4 checks passed

dtcxzyw deleted the perf/mul-zero-nnan-ninf branch March 21, 2024 13:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[InstCombine] Fold `fmul X, -0.0` into `copysign(0.0, -X)` #85772

[InstCombine] Fold `fmul X, -0.0` into `copysign(0.0, -X)` #85772

dtcxzyw commented Mar 19, 2024 •

edited

llvmbot commented Mar 19, 2024

dtcxzyw Mar 19, 2024

[InstCombine] Fold fmul X, -0.0 into copysign(0.0, -X) #85772

[InstCombine] Fold fmul X, -0.0 into copysign(0.0, -X) #85772

Conversation

dtcxzyw commented Mar 19, 2024 • edited

llvmbot commented Mar 19, 2024

dtcxzyw Mar 19, 2024

Choose a reason for hiding this comment

[InstCombine] Fold `fmul X, -0.0` into `copysign(0.0, -X)` #85772

[InstCombine] Fold `fmul X, -0.0` into `copysign(0.0, -X)` #85772

dtcxzyw commented Mar 19, 2024 •

edited