[AArch64] Remove copy instruction between uaddlv with v8i16 and dup #66068
@llvm/pr-subscribers-backend-aarch64
Changes
If there are copy instructions between uaddlv with v8i16 and dup for transfer from gpr to fpr, try to remove them with duplane.
@@ -5329,7 +5329,8 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
   case Intrinsic::aarch64_neon_uaddlv: {
     EVT OpVT = Op.getOperand(1).getValueType();
     EVT ResVT = Op.getValueType();
-    if (ResVT == MVT::i32 && (OpVT == MVT::v8i8 || OpVT == MVT::v16i8)) {
+    if (ResVT == MVT::i32 &&
+        (OpVT == MVT::v8i8 || OpVT == MVT::v16i8 || OpVT == MVT::v8i16)) {
Should this add v4i16 too?
@@ -6077,6 +6077,8 @@ defm : DUPWithTruncPats<v16i8, v4i16, v8i16, i32, DUPv16i8lane, VecIndex_x2>;
 defm : DUPWithTruncPats<v16i8, v2i32, v4i32, i32, DUPv16i8lane, VecIndex_x4>;
 defm : DUPWithTruncPats<v8i16, v2i32, v4i32, i32, DUPv8i16lane, VecIndex_x2>;

+defm : DUPWithTruncPats<v4i32, v2i32, v4i32, i32, DUPv8i16lane, VecIndex_x2>;
There isn't really a trunc going on here, if I'm understanding this correctly. Can we add a DAG combine for
t38: i32 = extract_vector_elt t36, Constant:i64<0>
t35: v4i32 = AArch64ISD::DUP t38
We should be able to turn that into an AArch64ISD::DUPLANE, and I believe it should be generally useful to do so.
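To make the suggested fold concrete, here is a minimal standalone sketch of the value equivalence it relies on. It is not LLVM code and does not use the SelectionDAG API. The names `V4i32`, `extractElt`, `dupScalar`, and `dupLane` are hypothetical stand-ins that model `extract_vector_elt`, `AArch64ISD::DUP`, and `AArch64ISD::DUPLANE` on plain arrays. It assumes the source vector and the result share the same element type and lane count.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a v4i32 value; not an LLVM type.
using V4i32 = std::array<std::uint32_t, 4>;

// Models extract_vector_elt: read one lane out as a scalar.
inline std::uint32_t extractElt(const V4i32 &v, std::size_t lane) {
  assert(lane < v.size() && "lane index out of range");
  return v[lane];
}

// Models AArch64ISD::DUP: broadcast a scalar into every lane.
inline V4i32 dupScalar(std::uint32_t x) { return {x, x, x, x}; }

// Models AArch64ISD::DUPLANE: broadcast one lane of a vector directly,
// without going through a scalar first.
inline V4i32 dupLane(const V4i32 &v, std::size_t lane) {
  assert(lane < v.size() && "lane index out of range");
  return {v[lane], v[lane], v[lane], v[lane]};
}
```

Under this model, `dupScalar(extractElt(t36, 0))` and `dupLane(t36, 0)` always yield the same vector, which is why the extract followed by DUP can in principle be replaced by a single DUPLANE and the round trip through a general-purpose register dropped.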
; CHECK-LABEL: uaddlv_dup_v8i16:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: uaddlv s0, v0.8h
; CHECK-NEXT: dup v1.8h, v0.h[0]
I think the .8h might be incorrect here?
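One plausible reading of this concern, sketched as a standalone model rather than actual codegen: `uaddlv s0, v0.8h` produces a 32-bit sum, and that sum can need more than 16 bits, so a `dup` of `h[0]` would broadcast only its low half. The helpers `uaddlv` and `dupLowHalf` below are hypothetical models of the two instructions and assume little-endian lane numbering, where `h[0]` holds the low 16 bits of `s0`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a v8i16 value; not an LLVM type.
using V8i16 = std::array<std::uint16_t, 8>;

// Models uaddlv s0, v0.8h: widening unsigned sum across all eight lanes.
inline std::uint32_t uaddlv(const V8i16 &v) {
  std::uint32_t sum = 0;
  for (std::uint16_t lane : v)
    sum += lane;
  // Eight 16-bit lanes sum to at most 8 * 65535, which needs 19 bits.
  assert(sum <= 8u * 0xFFFFu);
  return sum;
}

// Models dup vD.8h, vN.h[0] applied to a register holding the 32-bit sum:
// only the low 16 bits of the sum reach the result lanes.
inline V8i16 dupLowHalf(std::uint32_t s) {
  V8i16 out;
  out.fill(static_cast<std::uint16_t>(s & 0xFFFFu));
  return out;
}
```

With every input lane set to 0xFFFF, the modeled sum is 524280 (0x7FFF8), while each lane produced by `dupLowHalf` holds 0xFFF8, so the upper bits of the sum are lost. The IR in the test splats an i32, which suggests a 32-bit lane arrangement is what the check lines should show.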
%vaddlv.i = tail call i32 @llvm.aarch64.neon.uaddlv.i32.v8i16(<8 x i16> %a)
%vecinit.i = insertelement <8 x i32> undef, i32 %vaddlv.i, i64 0
%vecinit7.i = shufflevector <8 x i32> %vecinit.i, <8 x i32> poison, <8 x i32> zeroinitializer
%vrshrn_n2 = tail call <8 x i16> @llvm.aarch64.neon.rshrn.v8i16(<8 x i32> %vecinit7.i, i32 3)
This isn't a valid NEON intrinsic: they need to have legal vector sizes for the inputs and outputs. I think it works in this case because it gets expanded to shifts and whatnot. Is there another instruction that could be used in its place for the test? Maybe just a simple shift?
@davemgreen sorry... It looks like I made a mistake... I did not get a notification for this pull request...
If there are copy instructions between uaddlv with v8i16 and dup for transfer from gpr to fpr, try to remove them with duplane.
It is a follow-up patch to https://reviews.llvm.org/D159267