-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[X86] Lower mathlib call ldexp into scalef when avx512 is enabled #69710
base: main
Are you sure you want to change the base?
Conversation
Similar to #67552 |
; CHECK-AVX2-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0] | ||
; CHECK-AVX2-NEXT: addq $40, %rsp | ||
; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 8 | ||
; CHECK-AVX2-NEXT: retq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing avx512 checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supplement the check, thanks!
@@ -1705,6 +1705,8 @@ namespace llvm { | |||
const SmallVectorImpl<SDValue> &OutVals, | |||
const SDLoc &dl, SelectionDAG &DAG) const override; | |||
|
|||
SDValue LowerFLDEXP(SDValue Op, SelectionDAG &DAG) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to create a method - just make it static inside X86ISelLowering.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
case MVT::f64: | ||
XVT = MVT::v2f64; | ||
ExpVT = MVT::v2f64; | ||
IID = DAG.getConstant(Intrinsic::x86_avx512_mask_scalef_sd, DL, MVT::i64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd be much better off adding a X86ISD::SCALEF nodetype rather than individual intrinsic lowering cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @RKSimon, sorry for the delay in replying your comments, I have made some changes to the patch, please have a check, thanks very much!
7355b45
to
85f59ff
Compare
|
@@ -31979,6 +32024,8 @@ SDValue X86TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const { | |||
case ISD::ADDRSPACECAST: return LowerADDRSPACECAST(Op, DAG); | |||
case X86ISD::CVTPS2PH: return LowerCVTPS2PH(Op, DAG); | |||
case ISD::PREFETCH: return LowerPREFETCH(Op, Subtarget, DAG); | |||
case ISD::FLDEXP: | |||
return LowerFLDEXP(Op, DAG); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the single line style like (most of) the previous cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that coding style check discourages this change ...
DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, XVT, DAG.getUNDEF(XVT), X, Zero); | ||
SDValue VExp = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, ExpVT, | ||
DAG.getUNDEF(ExpVT), Exp, Zero); | ||
SDValue Scalef = DAG.getNode(X86ISD::SCALEFS, DL, XVT, VX, VExp, VX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to vectorize to use SCALEFS? I thought SCALEFS was for scalar types and SCALEF was for vector types? (So it should be possible to add vector support here as well).
default: | ||
return SDValue(); | ||
case MVT::f16: | ||
X = DAG.getNode(ISD::FP_EXTEND, DL, MVT::f32, X); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use vscalefph if we have AVX512FP16?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, there may be risk of truncation, as the EXP operand of FLDEXP
types i32
, e.g., @llvm.ldexp.f16.i32(half, i32)
->@llvm.x86.avx512fp16.mask.scalef.sh(<8 x half>, <8 x half>, <8 x half>
. I didn't know how to handle the issue elegantly, so I made an extension here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! LangRef doesn't give an example f16 case. Should we define it as @llvm.ldexp.f16.i16(half, i16)
? i32
is a too large range to be useful for FP16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use value tracking to check the bounds of the EXP operand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Value tracking seems to make things very complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd settle for a TODO comment for now
85f59ff
to
c365199
Compare
You can test this locally with the following command:git-clang-format --diff e34c35a21ccc215ce507a1e19b4ff2a1ce9906f3 4ac55f1aecf19b4e2ce981ad757c79b0daf46e5e -- llvm/lib/Target/X86/X86ISelLowering.cpp View the diff from clang-format here.diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 19510bbba0..7ac39cea21 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2406,11 +2406,11 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
}
if (Subtarget.hasAVX512()) {
- for (MVT VT : { MVT::f16, MVT::f32, MVT::f64, MVT::v4f32, MVT::v2f64 })
+ for (MVT VT : {MVT::f16, MVT::f32, MVT::f64, MVT::v4f32, MVT::v2f64})
setOperationAction(ISD::FLDEXP, VT, Custom);
if (Subtarget.hasVLX())
- for (MVT VT : { MVT::v8f32, MVT::v4f64, MVT::v16f32, MVT::v8f64 })
+ for (MVT VT : {MVT::v8f32, MVT::v4f64, MVT::v16f32, MVT::v8f64})
setOperationAction(ISD::FLDEXP, VT, Custom);
}
@@ -32039,7 +32039,8 @@ SDValue X86TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::ADDRSPACECAST: return LowerADDRSPACECAST(Op, DAG);
case X86ISD::CVTPS2PH: return LowerCVTPS2PH(Op, DAG);
case ISD::PREFETCH: return LowerPREFETCH(Op, Subtarget, DAG);
- case ISD::FLDEXP: return LowerFLDEXP(Op, Subtarget, DAG);
+ case ISD::FLDEXP:
+ return LowerFLDEXP(Op, Subtarget, DAG);
}
}
|
|
c365199
to
4ac55f1
Compare
|
@@ -2405,6 +2405,15 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM, | |||
setOperationAction(ISD::STRICT_UINT_TO_FP, MVT::i128, Custom); | |||
} | |||
|
|||
if (Subtarget.hasAVX512()) { | |||
for (MVT VT : { MVT::f16, MVT::f32, MVT::f64, MVT::v4f32, MVT::v2f64 }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure this is right? I'd expect MVT::v4f32, MVT::v2f64, MVT::v8f32, MVT::v4f64 to be the VLX cases.
; AVX512VL-NEXT: vpinsrw $0, %eax, %xmm0, %xmm0 | ||
; AVX512VL-NEXT: retq | ||
entry: | ||
%r = tail call fast half @llvm.ldexp.f16.i32(half %x, i32 %exp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop unnecessary fast flags?
@huhu233 reverse-ping |
No description provided.