Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86] Lower mathlib call ldexp into scalef when avx512 is enabled #69710

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

huhu233
Copy link
Contributor

@huhu233 huhu233 commented Oct 20, 2023

No description provided.

@huhu233
Copy link
Contributor Author

huhu233 commented Oct 20, 2023

Similar to #67552

; CHECK-AVX2-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]
; CHECK-AVX2-NEXT: addq $40, %rsp
; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 8
; CHECK-AVX2-NEXT: retq
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing avx512 checks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supplement the check, thanks!

@@ -1705,6 +1705,8 @@ namespace llvm {
const SmallVectorImpl<SDValue> &OutVals,
const SDLoc &dl, SelectionDAG &DAG) const override;

SDValue LowerFLDEXP(SDValue Op, SelectionDAG &DAG) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to create a method - just make it static inside X86ISelLowering.cpp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

case MVT::f64:
XVT = MVT::v2f64;
ExpVT = MVT::v2f64;
IID = DAG.getConstant(Intrinsic::x86_avx512_mask_scalef_sd, DL, MVT::i64);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd be much better off adding a X86ISD::SCALEF nodetype rather than individual intrinsic lowering cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @RKSimon, sorry for the delay in replying your comments, I have made some changes to the patch, please have a check, thanks very much!

@huhu233
Copy link
Contributor Author

huhu233 commented Dec 13, 2023

  • Rebase the branch
  • Use X86ISD::SCALEFS instead of specific intrinsics
  • Transfrom the function into static version
  • update the test file

@huhu233 huhu233 requested a review from RKSimon December 13, 2023 12:14
@@ -31979,6 +32024,8 @@ SDValue X86TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::ADDRSPACECAST: return LowerADDRSPACECAST(Op, DAG);
case X86ISD::CVTPS2PH: return LowerCVTPS2PH(Op, DAG);
case ISD::PREFETCH: return LowerPREFETCH(Op, Subtarget, DAG);
case ISD::FLDEXP:
return LowerFLDEXP(Op, DAG);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the single line style like (most of) the previous cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that coding style check discourages this change ...

DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, XVT, DAG.getUNDEF(XVT), X, Zero);
SDValue VExp = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, ExpVT,
DAG.getUNDEF(ExpVT), Exp, Zero);
SDValue Scalef = DAG.getNode(X86ISD::SCALEFS, DL, XVT, VX, VExp, VX);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to vectorize to use SCALEFS? I thought SCALEFS was for scalar types and SCALEF was for vector types? (So it should be possible to add vector support here as well).

default:
return SDValue();
case MVT::f16:
X = DAG.getNode(ISD::FP_EXTEND, DL, MVT::f32, X);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use vscalefph if we have AVX512FP16?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

@huhu233 huhu233 Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, there may be risk of truncation, as the EXP operand of FLDEXP types i32, e.g., @llvm.ldexp.f16.i32(half, i32) ->@llvm.x86.avx512fp16.mask.scalef.sh(<8 x half>, <8 x half>, <8 x half>. I didn't know how to handle the issue elegantly, so I made an extension here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! LangRef doesn't give an example f16 case. Should we define it as @llvm.ldexp.f16.i16(half, i16)? i32 is a too large range to be useful for FP16.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use value tracking to check the bounds of the EXP operand?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Value tracking seems to make things very complicated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd settle for a TODO comment for now

Copy link

github-actions bot commented Dec 14, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff e34c35a21ccc215ce507a1e19b4ff2a1ce9906f3 4ac55f1aecf19b4e2ce981ad757c79b0daf46e5e -- llvm/lib/Target/X86/X86ISelLowering.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 19510bbba0..7ac39cea21 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2406,11 +2406,11 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
   }
 
   if (Subtarget.hasAVX512()) {
-    for (MVT VT : { MVT::f16, MVT::f32, MVT::f64, MVT::v4f32, MVT::v2f64 })
+    for (MVT VT : {MVT::f16, MVT::f32, MVT::f64, MVT::v4f32, MVT::v2f64})
       setOperationAction(ISD::FLDEXP, VT, Custom);
 
     if (Subtarget.hasVLX())
-      for (MVT VT : { MVT::v8f32, MVT::v4f64, MVT::v16f32, MVT::v8f64 })
+      for (MVT VT : {MVT::v8f32, MVT::v4f64, MVT::v16f32, MVT::v8f64})
         setOperationAction(ISD::FLDEXP, VT, Custom);
   }
 
@@ -32039,7 +32039,8 @@ SDValue X86TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
   case ISD::ADDRSPACECAST:      return LowerADDRSPACECAST(Op, DAG);
   case X86ISD::CVTPS2PH:        return LowerCVTPS2PH(Op, DAG);
   case ISD::PREFETCH:           return LowerPREFETCH(Op, Subtarget, DAG);
-  case ISD::FLDEXP:             return LowerFLDEXP(Op, Subtarget, DAG);
+  case ISD::FLDEXP:
+    return LowerFLDEXP(Op, Subtarget, DAG);
   }
 }
 

@huhu233
Copy link
Contributor Author

huhu233 commented Dec 14, 2023

  • Fix coding style issues
  • Support vector versions of ldexp, e.g., @llvm.ldexp.v8f32.v8i32 -- +avx512vl , @llvm.ldexp.v4f32.v4i32 -- +avx512f or +avx512vl, etc.

@huhu233
Copy link
Contributor Author

huhu233 commented Dec 15, 2023

  • add TODO for avx512fp16

@huhu233 huhu233 requested a review from RKSimon December 27, 2023 06:25
@@ -2405,6 +2405,15 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
setOperationAction(ISD::STRICT_UINT_TO_FP, MVT::i128, Custom);
}

if (Subtarget.hasAVX512()) {
for (MVT VT : { MVT::f16, MVT::f32, MVT::f64, MVT::v4f32, MVT::v2f64 })
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this is right? I'd expect MVT::v4f32, MVT::v2f64, MVT::v8f32, MVT::v4f64 to be the VLX cases.

; AVX512VL-NEXT: vpinsrw $0, %eax, %xmm0, %xmm0
; AVX512VL-NEXT: retq
entry:
%r = tail call fast half @llvm.ldexp.f16.i32(half %x, i32 %exp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop unnecessary fast flags?

@RKSimon
Copy link
Collaborator

RKSimon commented Apr 3, 2024

@huhu233 reverse-ping

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants