Enable Custom Lowering for fabs.v8f16 on AVX #71730

david-xl · 2023-11-08T20:08:26Z

[X86]: Enable custom lowering for fabs.v8f16 on AVX

Currently, custom lowering of fabs.v8f16 requires AVX512FP16, which is too restrictive: v8f16 fabs lowering needs no AVX512FP16 instructions. Without this fix, very inefficient code is generated on targets that lack AVX512FP16. Note that InstCombine generates calls to the intrinsic @llvm.fabs.v8f16 when simplifying AND <8 x half> operations.
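For readers unfamiliar with why no FP16 arithmetic is needed: fabs on an IEEE-754 half only clears the sign bit (bit 15) of each 16-bit lane, so a plain bitwise AND is enough. The sketch below is a scalar illustration of that idea on raw bit patterns. The names `V8F16Bits`, `kAbsMask` and `fabs_v8f16_bits` are hypothetical and are not part of LLVM or of this patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Eight IEEE-754 binary16 values stored as raw 16-bit patterns,
// standing in for one <8 x half> vector. Hypothetical type for illustration.
using V8F16Bits = std::array<std::uint16_t, 8>;

// All bits set except bit 15, the binary16 sign bit.
constexpr std::uint16_t kAbsMask = 0x7FFF;

// Lane-wise absolute value: AND every lane with the mask.
// This mirrors, in scalar form, what a single vector AND with a
// constant mask does in the generated code.
V8F16Bits fabs_v8f16_bits(const V8F16Bits &v) {
  V8F16Bits out{};
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = static_cast<std::uint16_t>(v[i] & kAbsMask);
  return out;
}
```

For example, the bit pattern 0xBC00 (-1.0 in binary16) becomes 0x3C00 (1.0), and 0x8000 (-0.0) becomes 0x0000.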

llvmbot · 2023-11-08T20:08:58Z

@llvm/pr-subscribers-backend-x86

Author: David Li (david-xl)

Changes

[X86]: Enable custom lowering for fabs.v8f16 on AVX

Currently, custom lowering of fabs.v8f16 requires AVX512FP16, which is too restrictive: v8f16 fabs lowering needs no AVX512FP16 instructions. Without this fix, very inefficient code is generated on targets that lack AVX512FP16. Note that InstCombine generates calls to the intrinsic @llvm.fabs.v8f16 when simplifying AND <8 x half> operations.

Full diff: https://github.com/llvm/llvm-project/pull/71730.diff

2 Files Affected:

(modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+3)
(modified) llvm/test/CodeGen/X86/vec_fabs.ll (+41)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 22fba5601ccfd38..b3b5a0c1b68ec82 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2238,6 +2238,9 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
     }
   }
 
+  if (!Subtarget.useSoftFloat() && Subtarget.hasAVX())
+    setOperationAction(ISD::FABS, MVT::v8f16, Custom);
+
   if (!Subtarget.useSoftFloat() &&
       (Subtarget.hasAVXNECONVERT() || Subtarget.hasBF16())) {
     addRegisterClass(MVT::v8bf16, Subtarget.hasAVX512() ? &X86::VR128XRegClass
diff --git a/llvm/test/CodeGen/X86/vec_fabs.ll b/llvm/test/CodeGen/X86/vec_fabs.ll
index 982062d8907542a..08364449ab1a378 100644
--- a/llvm/test/CodeGen/X86/vec_fabs.ll
+++ b/llvm/test/CodeGen/X86/vec_fabs.ll
@@ -1,8 +1,10 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=X86 --check-prefix=X86-AVX
+; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefix=X86 --check-prefix=X86-AVX2
 ; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx512vl | FileCheck %s --check-prefix=X86 --check-prefix=X86-AVX512VL
 ; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx512dq,+avx512vl | FileCheck %s --check-prefix=X86 --check-prefix=X86-AVX512VLDQ
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX2
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX512VL
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512dq,+avx512vl | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX512VLDQ
 
@@ -111,6 +113,45 @@ define <4 x double> @fabs_v4f64(<4 x double> %p) {
 }
 declare <4 x double> @llvm.fabs.v4f64(<4 x double> %p)
 
+define <8 x half> @fabs_v8f16(ptr %p) {
+; X86-AVX-LABEL: fabs_v8f16:
+; X86-AVX:       # %bb.0:
+; X86-AVX-NEXT:    movl 4(%esp), [[ADDRREG:%.*]]
+; X86-AVX-NEXT:    vmovaps ([[ADDRREG]]), %xmm0
+; X86-AVX-NEXT:    vandps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
+; X86-AVX-NEXT:    retl
+
+; X86-AVX2-LABEL: fabs_v8f16:
+; X86-AVX2:       # %bb.0:
+; X86-AVX2-NEXT:    movl 4(%esp), [[REG:%.*]]
+; X86-AVX2-NEXT:    vpbroadcastw {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; X86-AVX2-NEXT:    vpand ([[REG]]), %xmm0, %xmm0
+; X86-AVX2-NEXT:    retl
+
+; X64-AVX512VL-LABEL: fabs_v8f16:
+; X64-AVX512VL:       # %bb.0:
+; X64-AVX512VL-NEXT:    vpbroadcastw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; X64-AVX512VL-NEXT:    vpand (%rdi), %xmm0, %xmm0
+; X64-AVX512VL-NEXT:    retq
+
+; X64-AVX-LABEL: fabs_v8f16:
+; X64-AVX:       # %bb.0:
+; X64-AVX-NEXT:    vmovaps (%rdi), %xmm0
+; X64-AVX-NEXT:    vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; X64-AVX-NEXT:    retq
+
+; X64-AVX2-LABEL: fabs_v8f16:
+; X64-AVX2:       # %bb.0:
+; X64-AVX2-NEXT:    vpbroadcastw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; X64-AVX2-NEXT:    vpand (%rdi), %xmm0, %xmm0
+; X64-AVX2-NEXT:    retq
+
+  %v = load <8 x half>, ptr %p, align 16
+  %nnv = call <8 x half> @llvm.fabs.v8f16(<8 x half> %v)
+  ret <8 x half> %nnv
+}
+declare <8 x half> @llvm.fabs.v8f16(<8 x half> %p)
+
 define <8 x float> @fabs_v8f32(<8 x float> %p) {
 ; X86-AVX-LABEL: fabs_v8f32:
 ; X86-AVX:       # %bb.0:

david-xl · 2023-11-12T00:23:38Z

This fix improves Eigen pcos performance by ~65% on Haswell.

RKSimon

@david-xl - please can you rebase this?

RKSimon · 2023-11-13T19:59:41Z

This looks like the merge has gone wrong?

david-xl · 2023-11-13T20:31:09Z

> This looks like the merge has gone wrong?

Right -- I messed it up (new to the workflow). Will fix.

david-xl · 2023-11-14T00:20:25Z

> This looks like the merge has gone wrong?

Fixed the bad merge. PTAL.

github-actions · 2023-11-15T20:08:32Z

✅ With the latest revision this PR passed the C/C++ code formatter.

david-xl · 2023-11-15T20:09:14Z

Addressed review comments. PTAL.

RKSimon

LGTM with one minor

RKSimon · 2023-11-16T09:51:37Z

llvm/test/CodeGen/X86/vec_fabs.ll

+  %v = load <16 x half>, ptr %p, align 32
+  %nnv = call <16 x half> @llvm.fabs.v16f16(<16 x half> %v)
+  ret <16 x half> %nnv
+}


Just let the script generate the codegen, even if it's really poor for pre-AVX512FP16

david-xl · 2023-11-16

Ok. I will have a follow-up patch to update the test.

david-xl · 2023-11-18

Thanks. Looks like you have already updated and cleaned up the test.

RKSimon · 2023-11-18

Yup, we're still doing a really bad job with v16f16/v32f16 on AVX targets - please can you take a look?

david-xl · 2023-11-19

Will take a look.

1. Move the code to the common place
2. Add test coverage for v16f16 and v32f16 FABS lowering

david-xl requested a review from RKSimon November 8, 2023 20:08

llvmbot added the backend:X86 label Nov 8, 2023

david-xl requested a review from topperc November 9, 2023 17:53

RKSimon reviewed Nov 13, 2023

david-xl requested review from JDevlieghere, nikic and a team as code owners November 13, 2023 19:28

nikic removed request for a team and nikic November 13, 2023 20:35

david-xl force-pushed the main branch 5 times, most recently from 785b6a3 to 04071fb Compare November 14, 2023 00:12

RKSimon reviewed Nov 14, 2023

RKSimon reviewed Nov 14, 2023

david-xl force-pushed the main branch from ba1f953 to 8aa2bcc Compare November 15, 2023 20:13

RKSimon approved these changes Nov 16, 2023

david-xl added 5 commits November 16, 2023 11:30

Enable Custom Lowering for fabs.v8f16 on AVX

8fc364a

modified: llvm/lib/Target/X86/X86ISelLowering.cpp modified: llvm/test/CodeGen/X86/vec_fabs.ll

Check softfloat setting for fabs.v8f16 custom lowering

adfb5b4

Enable Custom Lowering for fabs.v8f16 on AVX

7898ac7

Enable Custom Lowering for fabs.v8f16 on AVX

c5c6670

modified: llvm/test/CodeGen/X86/vec_fabs.ll

Enable Custom Lowering for fabs.v8f16 on AVX

205507f

modified: llvm/test/CodeGen/X86/vec_fabs.ll modified: llvm/test/CodeGen/X86/vec_fabs.ll modified: llvm/test/CodeGen/X86/vec_fabs.ll

david-xl force-pushed the main branch from 8aa2bcc to 3ae532b Compare November 16, 2023 20:59

david-xl merged commit ac3779e into llvm:main Nov 16, 2023
2 of 3 checks passed
