[GlobalISel] replace right identity X * -1.0 with fneg(x) #80526

nickleus27 · 2024-02-03T04:27:35Z

follow up patch to #78673

@Pierre-vh @jayfoad @arsenm Could you review when you have a chance.

llvmbot · 2024-02-03T04:28:05Z

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Nick Anderson (nickleus27)

Changes

follow up patch to #78673

@Pierre-vh @jayfoad @arsenm Could you review when you have a chance.

Full diff: https://github.com/llvm/llvm-project/pull/80526.diff

7 Files Affected:

(modified) llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h (+2)
(modified) llvm/include/llvm/Target/GlobalISel/Combine.td (+9-1)
(modified) llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp (+7)
(added) llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fpneg-one-fneg.mir (+109)
(modified) llvm/test/CodeGen/AMDGPU/fmul-to-ldexp.ll (+20-140)
(modified) llvm/test/CodeGen/AMDGPU/rsq.f64.ll (+8-12)
(modified) llvm/test/CodeGen/X86/popcnt.ll (+1)

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index 90428a622b417..b6d40573812ef 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -468,6 +468,8 @@ class CombinerHelper {
   /// equal to \p C.
   bool matchConstantFPOp(const MachineOperand &MOP, double C);
 
+  void replaceNegOneWithFNeg(MachineInstr &MI) const;
+
   /// @brief Checks if constant at \p ConstIdx is larger than \p MI 's bitwidth
   /// @param ConstIdx Index of the constant
   bool matchConstantLargerBitWidth(MachineInstr &MI, unsigned ConstIdx);
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 6bda80681432a..21063263455a9 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -493,6 +493,13 @@ def right_identity_one_fp: GICombineRule<
   (apply (GIReplaceReg $dst, $x))
 >;
 
+def right_identity_neg_one_fp: GICombineRule<
+  (defs root:$dst),
+  (match (G_FMUL $dst, $x, $y):$root,
+    [{ return Helper.matchConstantFPOp(${y}, -1.0); }]),
+  (apply [{ Helper.replaceNegOneWithFNeg(*${root}); }])
+>;
+
 def right_identity_one : GICombineGroup<[right_identity_one_int, right_identity_one_fp]>;
 
 // Fold (x op x) - > x
@@ -1263,7 +1270,8 @@ def identity_combines : GICombineGroup<[select_same_val, right_identity_zero,
                                         trunc_buildvector_fold,
                                         trunc_lshr_buildvector_fold,
                                         bitcast_bitcast_fold, fptrunc_fpext_fold,
-                                        right_identity_neg_zero_fp]>;
+                                        right_identity_neg_zero_fp,
+                                        right_identity_neg_one_fp]>;
 
 def const_combines : GICombineGroup<[constant_fold_fp_ops, const_ptradd_to_i2p,
                                      overlapping_and, mulo_by_2, mulo_by_0,
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index 772229215e798..c725f9ce4b260 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -6651,3 +6651,10 @@ bool CombinerHelper::matchSelect(MachineInstr &MI, BuildFnTy &MatchInfo) {
 
   return false;
 }
+
+void CombinerHelper::replaceNegOneWithFNeg(MachineInstr &MI) const {
+  Register Dst = MI.getOperand(0).getReg();
+  Register Src = MI.getOperand(1).getReg();
+  Builder.buildFNeg(Dst, Src, MI.getFlags());
+  MI.eraseFromParent();
+}
\ No newline at end of file
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fpneg-one-fneg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fpneg-one-fneg.mir
new file mode 100644
index 0000000000000..45b63fcb58361
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fpneg-one-fneg.mir
@@ -0,0 +1,109 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
+# RUN: llc -march=amdgcn -mcpu=gfx1010 -run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s --check-prefixes=CHECK
+
+---
+name:            test_neg_one_f16
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $sgpr0
+
+    ; CHECK-LABEL: name: test_neg_one_f16
+    ; CHECK: liveins: $sgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+    ; CHECK-NEXT: %x:_(s16) = G_TRUNC [[COPY]](s32)
+    ; CHECK-NEXT: %d:_(s16) = G_FNEG %x
+    ; CHECK-NEXT: %ext:_(s32) = G_ANYEXT %d(s16)
+    ; CHECK-NEXT: $sgpr0 = COPY %ext(s32)
+    %0:_(s32) = COPY $sgpr0
+    %x:_(s16) = G_TRUNC %0:_(s32)
+    %y:_(s16) = G_FCONSTANT half -1.0
+    %d:_(s16) = G_FMUL %x, %y
+    %ext:_(s32) = G_ANYEXT %d:_(s16)
+    $sgpr0 = COPY %ext
+
+...
+
+---
+name:            test_neg_one_f32
+body:             |
+  bb.0:
+    liveins: $sgpr0
+
+    ; CHECK-LABEL: name: test_neg_one_f32
+    ; CHECK: liveins: $sgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[FNEG:%[0-9]+]]:_(s32) = G_FNEG [[COPY]]
+    ; CHECK-NEXT: $sgpr0 = COPY [[FNEG]](s32)
+    %0:_(s32) = COPY $sgpr0
+    %1:_(s32) = G_FCONSTANT float -1.0
+    %2:_(s32) = G_FMUL %0, %1
+    $sgpr0 = COPY %2(s32)
+
+...
+
+---
+name:            test_neg_one_f64
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $sgpr0
+
+    ; CHECK-LABEL: name: test_neg_one_f64
+    ; CHECK: liveins: $sgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+    ; CHECK-NEXT: %x:_(s64) = G_ANYEXT [[COPY]](s32)
+    ; CHECK-NEXT: %d:_(s64) = G_FNEG %x
+    ; CHECK-NEXT: %ext:_(s32) = G_TRUNC %d(s64)
+    ; CHECK-NEXT: $sgpr0 = COPY %ext(s32)
+    %0:_(s32) = COPY $sgpr0
+    %x:_(s64) = G_ANYEXT %0:_(s32)
+    %y:_(s64) = G_FCONSTANT double -1.0
+    %d:_(s64) = G_FMUL %x, %y
+    %ext:_(s32) = G_TRUNC %d:_(s64)
+    $sgpr0 = COPY %ext
+
+...
+
+---
+name:            test_neg_ten_f32
+body:             |
+  bb.0:
+    liveins: $sgpr0
+
+    ; CHECK-LABEL: name: test_neg_ten_f32
+    ; CHECK: liveins: $sgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float -1.000000e+01
+    ; CHECK-NEXT: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[COPY]], [[C]]
+    ; CHECK-NEXT: $sgpr0 = COPY [[FMUL]](s32)
+    %0:_(s32) = COPY $sgpr0
+    %1:_(s32) = G_FCONSTANT float -10.0
+    %2:_(s32) = G_FMUL %0, %1
+    $sgpr0 = COPY %2(s32)
+
+...
+
+---
+name:            test_neg_fract_f32
+body:             |
+  bb.0:
+    liveins: $sgpr0
+
+    ; CHECK-LABEL: name: test_neg_fract_f32
+    ; CHECK: liveins: $sgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float -5.000000e-01
+    ; CHECK-NEXT: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[COPY]], [[C]]
+    ; CHECK-NEXT: $sgpr0 = COPY [[FMUL]](s32)
+    %0:_(s32) = COPY $sgpr0
+    %1:_(s32) = G_FCONSTANT float -0.5
+    %2:_(s32) = G_FMUL %0, %1
+    $sgpr0 = COPY %2(s32)
+
+...
diff --git a/llvm/test/CodeGen/AMDGPU/fmul-to-ldexp.ll b/llvm/test/CodeGen/AMDGPU/fmul-to-ldexp.ll
index 3e658c6f38532..711a5fff1a063 100644
--- a/llvm/test/CodeGen/AMDGPU/fmul-to-ldexp.ll
+++ b/llvm/test/CodeGen/AMDGPU/fmul-to-ldexp.ll
@@ -218,41 +218,11 @@ define float @v_mul_neg2_f32(float %x) {
 }
 
 define float @v_mul_neg1_f32(float %x) {
-; GFX9-SDAG-LABEL: v_mul_neg1_f32:
-; GFX9-SDAG:       ; %bb.0:
-; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT:    v_xor_b32_e32 v0, 0x80000000, v0
-; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-GISEL-LABEL: v_mul_neg1_f32:
-; GFX9-GISEL:       ; %bb.0:
-; GFX9-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT:    v_mul_f32_e32 v0, -1.0, v0
-; GFX9-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX10-SDAG-LABEL: v_mul_neg1_f32:
-; GFX10-SDAG:       ; %bb.0:
-; GFX10-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-SDAG-NEXT:    v_xor_b32_e32 v0, 0x80000000, v0
-; GFX10-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX10-GISEL-LABEL: v_mul_neg1_f32:
-; GFX10-GISEL:       ; %bb.0:
-; GFX10-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-GISEL-NEXT:    v_mul_f32_e32 v0, -1.0, v0
-; GFX10-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX11-SDAG-LABEL: v_mul_neg1_f32:
-; GFX11-SDAG:       ; %bb.0:
-; GFX11-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_xor_b32_e32 v0, 0x80000000, v0
-; GFX11-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX11-GISEL-LABEL: v_mul_neg1_f32:
-; GFX11-GISEL:       ; %bb.0:
-; GFX11-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-GISEL-NEXT:    v_mul_f32_e32 v0, -1.0, v0
-; GFX11-GISEL-NEXT:    s_setpc_b64 s[30:31]
+; GCN-LABEL: v_mul_neg1_f32:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_xor_b32_e32 v0, 0x80000000, v0
+; GCN-NEXT:    s_setpc_b64 s[30:31]
   %mul = fmul float %x, -1.0
   ret float %mul
 }
@@ -1356,41 +1326,11 @@ define double @v_mul_0_f64(double %x) {
 }
 
 define double @v_mul_neg1_f64(double %x) {
-; GFX9-SDAG-LABEL: v_mul_neg1_f64:
-; GFX9-SDAG:       ; %bb.0:
-; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT:    v_xor_b32_e32 v1, 0x80000000, v1
-; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-GISEL-LABEL: v_mul_neg1_f64:
-; GFX9-GISEL:       ; %bb.0:
-; GFX9-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT:    v_mul_f64 v[0:1], v[0:1], -1.0
-; GFX9-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX10-SDAG-LABEL: v_mul_neg1_f64:
-; GFX10-SDAG:       ; %bb.0:
-; GFX10-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-SDAG-NEXT:    v_xor_b32_e32 v1, 0x80000000, v1
-; GFX10-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX10-GISEL-LABEL: v_mul_neg1_f64:
-; GFX10-GISEL:       ; %bb.0:
-; GFX10-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-GISEL-NEXT:    v_mul_f64 v[0:1], v[0:1], -1.0
-; GFX10-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX11-SDAG-LABEL: v_mul_neg1_f64:
-; GFX11-SDAG:       ; %bb.0:
-; GFX11-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_xor_b32_e32 v1, 0x80000000, v1
-; GFX11-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX11-GISEL-LABEL: v_mul_neg1_f64:
-; GFX11-GISEL:       ; %bb.0:
-; GFX11-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-GISEL-NEXT:    v_mul_f64 v[0:1], v[0:1], -1.0
-; GFX11-GISEL-NEXT:    s_setpc_b64 s[30:31]
+; GCN-LABEL: v_mul_neg1_f64:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_xor_b32_e32 v1, 0x80000000, v1
+; GCN-NEXT:    s_setpc_b64 s[30:31]
   %mul = fmul double %x, -1.0
   ret double %mul
 }
@@ -2848,41 +2788,11 @@ define half @v_mul_neg2_f16(half %x) {
 }
 
 define half @v_mul_neg1_f16(half %x) {
-; GFX9-SDAG-LABEL: v_mul_neg1_f16:
-; GFX9-SDAG:       ; %bb.0:
-; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT:    v_xor_b32_e32 v0, 0x8000, v0
-; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-GISEL-LABEL: v_mul_neg1_f16:
-; GFX9-GISEL:       ; %bb.0:
-; GFX9-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT:    v_mul_f16_e32 v0, -1.0, v0
-; GFX9-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX10-SDAG-LABEL: v_mul_neg1_f16:
-; GFX10-SDAG:       ; %bb.0:
-; GFX10-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-SDAG-NEXT:    v_xor_b32_e32 v0, 0x8000, v0
-; GFX10-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX10-GISEL-LABEL: v_mul_neg1_f16:
-; GFX10-GISEL:       ; %bb.0:
-; GFX10-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-GISEL-NEXT:    v_mul_f16_e32 v0, -1.0, v0
-; GFX10-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX11-SDAG-LABEL: v_mul_neg1_f16:
-; GFX11-SDAG:       ; %bb.0:
-; GFX11-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_xor_b32_e32 v0, 0x8000, v0
-; GFX11-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX11-GISEL-LABEL: v_mul_neg1_f16:
-; GFX11-GISEL:       ; %bb.0:
-; GFX11-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-GISEL-NEXT:    v_mul_f16_e32 v0, -1.0, v0
-; GFX11-GISEL-NEXT:    s_setpc_b64 s[30:31]
+; GCN-LABEL: v_mul_neg1_f16:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_xor_b32_e32 v0, 0x8000, v0
+; GCN-NEXT:    s_setpc_b64 s[30:31]
   %mul = fmul half %x, -1.0
   ret half %mul
 }
@@ -7112,41 +7022,11 @@ define double @v_mul_fabs_neg2_f64(double %x) {
 }
 
 define double @v_mul_fabs_neg1_f64(double %x) {
-; GFX9-SDAG-LABEL: v_mul_fabs_neg1_f64:
-; GFX9-SDAG:       ; %bb.0:
-; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT:    v_or_b32_e32 v1, 0x80000000, v1
-; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-GISEL-LABEL: v_mul_fabs_neg1_f64:
-; GFX9-GISEL:       ; %bb.0:
-; GFX9-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT:    v_mul_f64 v[0:1], |v[0:1]|, -1.0
-; GFX9-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX10-SDAG-LABEL: v_mul_fabs_neg1_f64:
-; GFX10-SDAG:       ; %bb.0:
-; GFX10-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-SDAG-NEXT:    v_or_b32_e32 v1, 0x80000000, v1
-; GFX10-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX10-GISEL-LABEL: v_mul_fabs_neg1_f64:
-; GFX10-GISEL:       ; %bb.0:
-; GFX10-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-GISEL-NEXT:    v_mul_f64 v[0:1], |v[0:1]|, -1.0
-; GFX10-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX11-SDAG-LABEL: v_mul_fabs_neg1_f64:
-; GFX11-SDAG:       ; %bb.0:
-; GFX11-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_or_b32_e32 v1, 0x80000000, v1
-; GFX11-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX11-GISEL-LABEL: v_mul_fabs_neg1_f64:
-; GFX11-GISEL:       ; %bb.0:
-; GFX11-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-GISEL-NEXT:    v_mul_f64 v[0:1], |v[0:1]|, -1.0
-; GFX11-GISEL-NEXT:    s_setpc_b64 s[30:31]
+; GCN-LABEL: v_mul_fabs_neg1_f64:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_or_b32_e32 v1, 0x80000000, v1
+; GCN-NEXT:    s_setpc_b64 s[30:31]
   %fabs.x = call double @llvm.fabs.f64(double %x)
   %mul = fmul double %fabs.x, -1.0
   ret double %mul
diff --git a/llvm/test/CodeGen/AMDGPU/rsq.f64.ll b/llvm/test/CodeGen/AMDGPU/rsq.f64.ll
index 90175298a99ac..bd6e1f54e636d 100644
--- a/llvm/test/CodeGen/AMDGPU/rsq.f64.ll
+++ b/llvm/test/CodeGen/AMDGPU/rsq.f64.ll
@@ -3431,9 +3431,8 @@ define double @v_neg_rsq_f64__afn(double %x) {
 ; SI-GISEL-NEXT:    v_fma_f64 v[2:3], v[4:5], v[2:3], v[2:3]
 ; SI-GISEL-NEXT:    v_fma_f64 v[4:5], -v[0:1], v[2:3], 1.0
 ; SI-GISEL-NEXT:    v_fma_f64 v[2:3], v[4:5], v[2:3], v[2:3]
-; SI-GISEL-NEXT:    v_mul_f64 v[4:5], v[2:3], -1.0
-; SI-GISEL-NEXT:    v_fma_f64 v[0:1], -v[0:1], v[4:5], -1.0
-; SI-GISEL-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], v[4:5]
+; SI-GISEL-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], -1.0
+; SI-GISEL-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], -v[2:3]
 ; SI-GISEL-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; VI-SDAG-LABEL: v_neg_rsq_f64__afn:
@@ -3503,9 +3502,8 @@ define double @v_neg_rsq_f64__afn(double %x) {
 ; VI-GISEL-NEXT:    v_fma_f64 v[2:3], v[4:5], v[2:3], v[2:3]
 ; VI-GISEL-NEXT:    v_fma_f64 v[4:5], -v[0:1], v[2:3], 1.0
 ; VI-GISEL-NEXT:    v_fma_f64 v[2:3], v[4:5], v[2:3], v[2:3]
-; VI-GISEL-NEXT:    v_mul_f64 v[4:5], v[2:3], -1.0
-; VI-GISEL-NEXT:    v_fma_f64 v[0:1], -v[0:1], v[4:5], -1.0
-; VI-GISEL-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], v[4:5]
+; VI-GISEL-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], -1.0
+; VI-GISEL-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], -v[2:3]
 ; VI-GISEL-NEXT:    s_setpc_b64 s[30:31]
   %sqrt = call contract afn double @llvm.sqrt.f64(double %x)
   %rsq = fdiv contract afn double -1.0, %sqrt
@@ -4015,9 +4013,8 @@ define double @v_neg_rsq_f64__afn_nnan_ninf(double %x) {
 ; SI-GISEL-NEXT:    v_fma_f64 v[2:3], v[4:5], v[2:3], v[2:3]
 ; SI-GISEL-NEXT:    v_fma_f64 v[4:5], -v[0:1], v[2:3], 1.0
 ; SI-GISEL-NEXT:    v_fma_f64 v[2:3], v[4:5], v[2:3], v[2:3]
-; SI-GISEL-NEXT:    v_mul_f64 v[4:5], v[2:3], -1.0
-; SI-GISEL-NEXT:    v_fma_f64 v[0:1], -v[0:1], v[4:5], -1.0
-; SI-GISEL-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], v[4:5]
+; SI-GISEL-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], -1.0
+; SI-GISEL-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], -v[2:3]
 ; SI-GISEL-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; VI-SDAG-LABEL: v_neg_rsq_f64__afn_nnan_ninf:
@@ -4087,9 +4084,8 @@ define double @v_neg_rsq_f64__afn_nnan_ninf(double %x) {
 ; VI-GISEL-NEXT:    v_fma_f64 v[2:3], v[4:5], v[2:3], v[2:3]
 ; VI-GISEL-NEXT:    v_fma_f64 v[4:5], -v[0:1], v[2:3], 1.0
 ; VI-GISEL-NEXT:    v_fma_f64 v[2:3], v[4:5], v[2:3], v[2:3]
-; VI-GISEL-NEXT:    v_mul_f64 v[4:5], v[2:3], -1.0
-; VI-GISEL-NEXT:    v_fma_f64 v[0:1], -v[0:1], v[4:5], -1.0
-; VI-GISEL-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], v[4:5]
+; VI-GISEL-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], -1.0
+; VI-GISEL-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], -v[2:3]
 ; VI-GISEL-NEXT:    s_setpc_b64 s[30:31]
   %sqrt = call contract afn nnan ninf double @llvm.sqrt.f64(double %x)
   %rsq = fdiv contract afn nnan ninf double -1.0, %sqrt
diff --git a/llvm/test/CodeGen/X86/popcnt.ll b/llvm/test/CodeGen/X86/popcnt.ll
index 37c7b051de7b1..77167df9834ef 100644
--- a/llvm/test/CodeGen/X86/popcnt.ll
+++ b/llvm/test/CodeGen/X86/popcnt.ll
@@ -43,6 +43,7 @@ define i8 @cnt8(i8 %x) nounwind readnone {
 ; X64-POPCNT-NEXT:    popcntl %eax, %eax
 ; X64-POPCNT-NEXT:    # kill: def $al killed $al killed $eax
 ; X64-POPCNT-NEXT:    retq
+
   %cnt = tail call i8 @llvm.ctpop.i8(i8 %x)
   ret i8 %cnt
 }

llvm/test/CodeGen/X86/popcnt.ll

jayfoad

I don't think this is safe since fmul is an arithmetic operation that may quieten nans and flush denormals, whereas fneg is defined to just flip the sign bit.

See https://reviews.llvm.org/D109446 for the DAG equivalent.

arsenm · 2024-02-03T15:45:23Z

I don't think this is safe since fmul is an arithmetic operation that may quieten nans and flush denormals, whereas fneg is defined to just flip the sign bit.

See https://reviews.llvm.org/D109446 for the DAG equivalent.

This is OK, we are allowed to drop canonicalizes for non-strictfp functions. The DAG is too strict and I've been meaning to remove those restrictions

tschuett · 2024-02-04T21:00:03Z

I believe that we should move to a visitOpcode model like the Dag combiner eventually. If this combine is registered before, constant_fold_fp_binop, then we get a fneg. Otherwise, we may have gotten a constant for fmul 1000,0 * -1.

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

llvm/include/llvm/Target/GlobalISel/Combine.td

nickleus27 · 2024-02-06T20:51:16Z

This is OK, we are allowed to drop canonicalizes for non-strictfp functions. The DAG is too strict and I've been meaning to remove those restrictions

Does this mean the matcher should be checking for FMafn flags or check that the function has UnsafeFPMath?

arsenm · 2024-02-08T10:25:57Z

This is OK, we are allowed to drop canonicalizes for non-strictfp functions. The DAG is too strict and I've been meaning to remove those restrictions

Does this mean the matcher should be checking for FMafn flags or check that the function has UnsafeFPMath?

Neither. It would be a question of checking the parent function's denormal mode (could maybe consider unsafe flags, but none are an exact match). I think we need to loosen the DAG code, and this is fine (for non-strictfp functions, in which case this would be G_STRICT_FMUL so this wouldn't fire anyway)

arsenm

InstCombine already does this, and the DAG should resume doing this

arsenm · 2024-02-08T10:24:49Z

llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fpneg-one-fneg.mir

+    %2:_(s32) = G_FMUL %0, %1
+    $sgpr0 = COPY %2(s32)
+
+...


Vector case wouldn't hurt

Pierre-vh · 2024-02-08T14:22:04Z

I believe that we should move to a visitOpcode model like the Dag combiner eventually. If this combine is registered before, constant_fold_fp_binop, then we get a fneg. Otherwise, we may have gotten a constant for fmul 1000,0 * -1.

No, a visitOpcode model would be a regression compared to what we have now, IMHO.

I agree we need a way to tell the combiner which combine goes first. Currently I think it's decided by the order in which the combines appear in the list, but I don't think it's actually enforced much (well, a bunch of tests check the match table as a whole so if its order changes, those tests will fail).

We can achieve ordering differently:

Use a system like AddedComplexity, maybe with a bit more syntactic sugar, to tell which combine goes first.
Allow a combine to explicitly mention some dependencies so it can't run unless something else has been checked first (probably overkill)
etc.

Is it a concern/issue currently? Do we have bugs due to combine ordering?
If so, please create an issue. I don't mind looking into it and coming up with more detail proposals

tschuett · 2024-02-08T15:56:12Z

A miscompile is caused by ordering issues:
#78383
Nonetheless, I would expect UNDEF and catching misbeviour be executed/registered before optimizations.

jayfoad · 2024-02-09T09:56:22Z

A miscompile is caused by ordering issues: #78383 Nonetheless, I would expect UNDEF and catching misbeviour be executed/registered before optimizations.

I think it's wrong to say that ordering issues cause that problem. One of the combines is broken - that's what causes the problem. Ordering issues just happen to hide the problem in some cases.

tschuett · 2024-02-09T10:03:19Z

Agreed that combine is broken, but detecting misbehaviour is executed/registered after the buggy optimisation.

jayfoad · 2024-02-21T09:28:45Z

No objection from me given @arsenm's comment about dropping canonicalizations. I thought this was already approved.

nickleus27 · 2024-02-21T09:35:46Z

If everyone approves could someone go ahead and commit for me. I do not have write access. Thank you.

replace right identity X * -1.0 with fneg(x)

f9251a1

llvmbot added backend:AMDGPU backend:X86 llvm:globalisel labels Feb 3, 2024

phoebewang reviewed Feb 3, 2024

View reviewed changes

llvm/test/CodeGen/X86/popcnt.ll Outdated Show resolved Hide resolved

jayfoad requested changes Feb 3, 2024

View reviewed changes

fixup! replace right identity X * -1.0 with fneg(x)

62cd963

arsenm reviewed Feb 5, 2024

View reviewed changes

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp Outdated Show resolved Hide resolved

llvm/include/llvm/Target/GlobalISel/Combine.td Show resolved Hide resolved

fixup! replace right identity X * -1.0 with fneg(x)

24082d3

arsenm approved these changes Feb 8, 2024

View reviewed changes

fixup! replace right identity X * -1.0 with fneg(x)

3d7b05f

nickleus27 requested review from arsenm and jayfoad February 21, 2024 08:48

jayfoad approved these changes Feb 21, 2024

View reviewed changes

jayfoad merged commit 5db49f7 into llvm:main Feb 21, 2024
4 checks passed

nickleus27 deleted the r_id_neg_one_fp branch February 21, 2024 17:47

nickleus27 mentioned this pull request Feb 21, 2024

GlobalISel needs fdiv 1 / sqrt(x) to rsq combine #78673

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[GlobalISel] replace right identity X * -1.0 with fneg(x) #80526

[GlobalISel] replace right identity X * -1.0 with fneg(x) #80526

nickleus27 commented Feb 3, 2024

llvmbot commented Feb 3, 2024 •

edited

jayfoad left a comment

arsenm commented Feb 3, 2024

tschuett commented Feb 4, 2024

nickleus27 commented Feb 6, 2024

arsenm commented Feb 8, 2024

arsenm left a comment

arsenm Feb 8, 2024

Pierre-vh commented Feb 8, 2024

tschuett commented Feb 8, 2024

jayfoad commented Feb 9, 2024

tschuett commented Feb 9, 2024

jayfoad commented Feb 21, 2024

nickleus27 commented Feb 21, 2024

[GlobalISel] replace right identity X * -1.0 with fneg(x) #80526

[GlobalISel] replace right identity X * -1.0 with fneg(x) #80526

Conversation

nickleus27 commented Feb 3, 2024

llvmbot commented Feb 3, 2024 • edited

jayfoad left a comment

Choose a reason for hiding this comment

arsenm commented Feb 3, 2024

tschuett commented Feb 4, 2024

nickleus27 commented Feb 6, 2024

arsenm commented Feb 8, 2024

arsenm left a comment

Choose a reason for hiding this comment

arsenm Feb 8, 2024

Choose a reason for hiding this comment

Pierre-vh commented Feb 8, 2024

tschuett commented Feb 8, 2024

jayfoad commented Feb 9, 2024

tschuett commented Feb 9, 2024

jayfoad commented Feb 21, 2024

nickleus27 commented Feb 21, 2024

llvmbot commented Feb 3, 2024 •

edited