Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64][SVE2] Do not emit RSHRNB for large shifts #66672

Merged
merged 3 commits into from
Sep 22, 2023

Conversation

MDevereau
Copy link
Contributor

rshrnb's shift amount operand must be between 1-EltSizeInBits. This patch stops RSHRNB ISD nodes being emitted in this case

rshrnb's shift amount operand must be between 1-EltSizeInBits.
This patch stops RSHRNB ISD nodes being emitted in this case
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 18, 2023

@llvm/pr-subscribers-backend-aarch64

Changes

rshrnb's shift amount operand must be between 1-EltSizeInBits. This patch stops RSHRNB ISD nodes being emitted in this case


Full diff: https://github.com/llvm/llvm-project/pull/66672.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+3)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-combine-rshrnb.ll (+46)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 5cc001c44e7a24f..4ef97e682b7bf64 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -20241,6 +20241,9 @@ static SDValue trySimplifySrlAddToRshrnb(SDValue Srl, SelectionDAG &DAG,
     return SDValue();
   unsigned ShiftValue = SrlOp1->getZExtValue();
 
+  if (ShiftValue > ResVT.getScalarSizeInBits())
+    return SDValue();
+
   SDValue Add = Srl->getOperand(0);
   if (Add->getOpcode() != ISD::ADD || !Add->hasOneUse())
     return SDValue();
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-combine-rshrnb.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-combine-rshrnb.ll
index f94daa45fb82a63..fe86a94e3035787 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-combine-rshrnb.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-combine-rshrnb.ll
@@ -142,6 +142,52 @@ define void @wide_add_shift_add_rshrnb_h(ptr %dest, i64 %index, <vscale x 8 x i3
   ret void
 }
 
+define void @wide_add_shift_add_rshrnb_d(ptr %dest, i64 %index, <vscale x 4 x i64> %arg1){
+; CHECK-LABEL: wide_add_shift_add_rshrnb_d:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    rshrnb z1.s, z1.d, #32
+; CHECK-NEXT:    rshrnb z0.s, z0.d, #32
+; CHECK-NEXT:    uzp1 z0.s, z0.s, z1.s
+; CHECK-NEXT:    ld1w { z1.s }, p0/z, [x0, x1, lsl #2]
+; CHECK-NEXT:    add z0.s, z1.s, z0.s
+; CHECK-NEXT:    st1w { z0.s }, p0, [x0, x1, lsl #2]
+; CHECK-NEXT:    ret
+  %1 = add <vscale x 4 x i64> %arg1, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 2147483648, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
+  %2 = lshr <vscale x 4 x i64> %1, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 32, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
+  %3 = getelementptr inbounds i32, ptr %dest, i64 %index
+  %load = load <vscale x 4 x i32>, ptr %3, align 4
+  %4 = trunc <vscale x 4 x i64> %2 to <vscale x 4 x i32>
+  %5 = add <vscale x 4 x i32> %load, %4
+  store <vscale x 4 x i32> %5, ptr %3, align 4
+  ret void
+}
+
+; Do not emit rshrnb if the shift amount is larger than the dest eltsize in bits
+define void @neg_wide_add_shift_add_rshrnb_d(ptr %dest, i64 %index, <vscale x 4 x i64> %arg1){
+; CHECK-LABEL: neg_wide_add_shift_add_rshrnb_d:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    mov z2.d, #0x800000000000
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    add z0.d, z0.d, z2.d
+; CHECK-NEXT:    add z1.d, z1.d, z2.d
+; CHECK-NEXT:    lsr z1.d, z1.d, #48
+; CHECK-NEXT:    lsr z0.d, z0.d, #48
+; CHECK-NEXT:    uzp1 z0.s, z0.s, z1.s
+; CHECK-NEXT:    ld1w { z1.s }, p0/z, [x0, x1, lsl #2]
+; CHECK-NEXT:    add z0.s, z1.s, z0.s
+; CHECK-NEXT:    st1w { z0.s }, p0, [x0, x1, lsl #2]
+; CHECK-NEXT:    ret
+  %1 = add <vscale x 4 x i64> %arg1, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 140737488355328, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
+  %2 = lshr <vscale x 4 x i64> %1, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 48, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
+  %3 = getelementptr inbounds i32, ptr %dest, i64 %index
+  %load = load <vscale x 4 x i32>, ptr %3, align 4
+  %4 = trunc <vscale x 4 x i64> %2 to <vscale x 4 x i32>
+  %5 = add <vscale x 4 x i32> %load, %4
+  store <vscale x 4 x i32> %5, ptr %3, align 4
+  ret void
+}
+
 define void @neg_trunc_lsr_add_op1_not_splat(ptr %ptr, ptr %dst, i64 %index, <vscale x 8 x i16> %add_op1){
 ; CHECK-LABEL: neg_trunc_lsr_add_op1_not_splat:
 ; CHECK:       // %bb.0:

@@ -20241,6 +20241,9 @@ static SDValue trySimplifySrlAddToRshrnb(SDValue Srl, SelectionDAG &DAG,
return SDValue();
unsigned ShiftValue = SrlOp1->getZExtValue();

if (ShiftValue > ResVT.getScalarSizeInBits())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's unlikely to happen, but is it worth being paranoid here and also test for the zero shift case? The instruction RSHRNB only permits shifts between 1 and the dest element bit width.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason the shift amount cannot be clamped to the maximum rather than not doing the transformation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@david-arm zero shifts get nuked out of existence really early on by other passes so I'm skeptical this isn't just slowing down the transform. That being said I'll add a test with a 0 shift just to assert it's being correctly removed. From what I can see as well, it isn't possible for SVE shifts to emit if the shift amount is 0. So even if this code bailed on a zero shift, it wouldn't be valid code anyway.

Copy link
Contributor Author

@MDevereau MDevereau Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paulwalker-arm I was trying to think of a case but since the add is checked for single use and the top half of the elements get narrowed I think this it's safe to clamp it to the maximum value.

rshrnb's shift amount operand must be between 1-EltSizeInBits.
This patch stops RSHRNB ISD nodes being emitted in this case
@@ -20241,6 +20241,10 @@ static SDValue trySimplifySrlAddToRshrnb(SDValue Srl, SelectionDAG &DAG,
return SDValue();
unsigned ShiftValue = SrlOp1->getZExtValue();

uint64_t EltSize = ResVT.getScalarSizeInBits();
if (ShiftValue > EltSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed this with @paulwalker-arm offline and we don't think this is right after all. I think we should just bail out of the optimisation if the ShiftValue exceeds the truncated element width. Also, it would be good to at least add an assert here that the shift value is zero. From what you're saying it should never happen, which makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain why you both came to the conclusion that this isn't correct?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry Matt this was my fault. My question arose from the typically behaviour of right shifts that are greater than the element bit length, whereby i32 >> #32+N == i32 >> #32.

Dave pointed out that in this case the element type in question is actually half sized because the combine is only expected to be called when a truncate is in play. That would imply i32 >> 16+N == i32 >> 16, which is clearly bogus.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I've now just gone with

  if (ShiftValue < 1 || ShiftValue > ResVT.getScalarSizeInBits())
    return SDValue();

rshrnb's shift amount operand must be between 1-EltSizeInBits.
This patch stops RSHRNB ISD nodes being emitted in this case
Copy link
Contributor

@david-arm david-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @MDevereau.

@MDevereau MDevereau merged commit 23ea98f into llvm:main Sep 22, 2023
2 checks passed
@MDevereau MDevereau deleted the rshrnb branch September 22, 2023 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants