[AArch64][SVE2] Do not emit RSHRNB for large shifts #66672

MDevereau · 2023-09-18T17:11:22Z

rshrnb's shift amount operand must be between 1-EltSizeInBits. This patch stops RSHRNB ISD nodes being emitted in this case

llvmbot · 2023-09-18T17:12:31Z

@llvm/pr-subscribers-backend-aarch64

Changes

rshrnb's shift amount operand must be between 1-EltSizeInBits. This patch stops RSHRNB ISD nodes being emitted in this case

Full diff: https://github.com/llvm/llvm-project/pull/66672.diff

2 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+3)
(modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-combine-rshrnb.ll (+46)

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 5cc001c44e7a24f..4ef97e682b7bf64 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -20241,6 +20241,9 @@ static SDValue trySimplifySrlAddToRshrnb(SDValue Srl, SelectionDAG &DAG,
     return SDValue();
   unsigned ShiftValue = SrlOp1->getZExtValue();
 
+  if (ShiftValue > ResVT.getScalarSizeInBits())
+    return SDValue();
+
   SDValue Add = Srl->getOperand(0);
   if (Add->getOpcode() != ISD::ADD || !Add->hasOneUse())
     return SDValue();
diff --git a/llvm/test/CodeGen/AArch64/sve2-intrinsics-combine-rshrnb.ll b/llvm/test/CodeGen/AArch64/sve2-intrinsics-combine-rshrnb.ll
index f94daa45fb82a63..fe86a94e3035787 100644
--- a/llvm/test/CodeGen/AArch64/sve2-intrinsics-combine-rshrnb.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-intrinsics-combine-rshrnb.ll
@@ -142,6 +142,52 @@ define void @wide_add_shift_add_rshrnb_h(ptr %dest, i64 %index, <vscale x 8 x i3
   ret void
 }
 
+define void @wide_add_shift_add_rshrnb_d(ptr %dest, i64 %index, <vscale x 4 x i64> %arg1){
+; CHECK-LABEL: wide_add_shift_add_rshrnb_d:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    rshrnb z1.s, z1.d, #32
+; CHECK-NEXT:    rshrnb z0.s, z0.d, #32
+; CHECK-NEXT:    uzp1 z0.s, z0.s, z1.s
+; CHECK-NEXT:    ld1w { z1.s }, p0/z, [x0, x1, lsl #2]
+; CHECK-NEXT:    add z0.s, z1.s, z0.s
+; CHECK-NEXT:    st1w { z0.s }, p0, [x0, x1, lsl #2]
+; CHECK-NEXT:    ret
+  %1 = add <vscale x 4 x i64> %arg1, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 2147483648, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
+  %2 = lshr <vscale x 4 x i64> %1, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 32, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
+  %3 = getelementptr inbounds i32, ptr %dest, i64 %index
+  %load = load <vscale x 4 x i32>, ptr %3, align 4
+  %4 = trunc <vscale x 4 x i64> %2 to <vscale x 4 x i32>
+  %5 = add <vscale x 4 x i32> %load, %4
+  store <vscale x 4 x i32> %5, ptr %3, align 4
+  ret void
+}
+
+; Do not emit rshrnb if the shift amount is larger than the dest eltsize in bits
+define void @neg_wide_add_shift_add_rshrnb_d(ptr %dest, i64 %index, <vscale x 4 x i64> %arg1){
+; CHECK-LABEL: neg_wide_add_shift_add_rshrnb_d:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    mov z2.d, #0x800000000000
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    add z0.d, z0.d, z2.d
+; CHECK-NEXT:    add z1.d, z1.d, z2.d
+; CHECK-NEXT:    lsr z1.d, z1.d, #48
+; CHECK-NEXT:    lsr z0.d, z0.d, #48
+; CHECK-NEXT:    uzp1 z0.s, z0.s, z1.s
+; CHECK-NEXT:    ld1w { z1.s }, p0/z, [x0, x1, lsl #2]
+; CHECK-NEXT:    add z0.s, z1.s, z0.s
+; CHECK-NEXT:    st1w { z0.s }, p0, [x0, x1, lsl #2]
+; CHECK-NEXT:    ret
+  %1 = add <vscale x 4 x i64> %arg1, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 140737488355328, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
+  %2 = lshr <vscale x 4 x i64> %1, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 48, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
+  %3 = getelementptr inbounds i32, ptr %dest, i64 %index
+  %load = load <vscale x 4 x i32>, ptr %3, align 4
+  %4 = trunc <vscale x 4 x i64> %2 to <vscale x 4 x i32>
+  %5 = add <vscale x 4 x i32> %load, %4
+  store <vscale x 4 x i32> %5, ptr %3, align 4
+  ret void
+}
+
 define void @neg_trunc_lsr_add_op1_not_splat(ptr %ptr, ptr %dst, i64 %index, <vscale x 8 x i16> %add_op1){
 ; CHECK-LABEL: neg_trunc_lsr_add_op1_not_splat:
 ; CHECK:       // %bb.0:

david-arm · 2023-09-20T09:50:41Z

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ -20241,6 +20241,9 @@ static SDValue trySimplifySrlAddToRshrnb(SDValue Srl, SelectionDAG &DAG,
    return SDValue();
  unsigned ShiftValue = SrlOp1->getZExtValue();

+  if (ShiftValue > ResVT.getScalarSizeInBits())


I think it's unlikely to happen, but is it worth being paranoid here and also test for the zero shift case? The instruction RSHRNB only permits shifts between 1 and the dest element bit width.

Is there a reason the shift amount cannot be clamped to the maximum rather than not doing the transformation?

@david-arm zero shifts get nuked out of existence really early on by other passes so I'm skeptical this isn't just slowing down the transform. That being said I'll add a test with a 0 shift just to assert it's being correctly removed. From what I can see as well, it isn't possible for SVE shifts to emit if the shift amount is 0. So even if this code bailed on a zero shift, it wouldn't be valid code anyway.

@paulwalker-arm I was trying to think of a case but since the add is checked for single use and the top half of the elements get narrowed I think this it's safe to clamp it to the maximum value.

rshrnb's shift amount operand must be between 1-EltSizeInBits. This patch stops RSHRNB ISD nodes being emitted in this case

david-arm · 2023-09-20T12:22:26Z

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ -20241,6 +20241,10 @@ static SDValue trySimplifySrlAddToRshrnb(SDValue Srl, SelectionDAG &DAG,
    return SDValue();
  unsigned ShiftValue = SrlOp1->getZExtValue();

+  uint64_t EltSize = ResVT.getScalarSizeInBits();
+  if (ShiftValue > EltSize)


I discussed this with @paulwalker-arm offline and we don't think this is right after all. I think we should just bail out of the optimisation if the ShiftValue exceeds the truncated element width. Also, it would be good to at least add an assert here that the shift value is zero. From what you're saying it should never happen, which makes sense.

Can you please explain why you both came to the conclusion that this isn't correct?

Sorry Matt this was my fault. My question arose from the typically behaviour of right shifts that are greater than the element bit length, whereby i32 >> #32+N == i32 >> #32.

Dave pointed out that in this case the element type in question is actually half sized because the combine is only expected to be called when a truncate is in play. That would imply i32 >> 16+N == i32 >> 16, which is clearly bogus.

Thanks. I've now just gone with

if (ShiftValue < 1 || ShiftValue > ResVT.getScalarSizeInBits()) return SDValue();

rshrnb's shift amount operand must be between 1-EltSizeInBits. This patch stops RSHRNB ISD nodes being emitted in this case

david-arm

LGTM! Thanks @MDevereau.

[AArch64][SVE2] Do not emit RSHRNB for large shifts

55851e0

rshrnb's shift amount operand must be between 1-EltSizeInBits. This patch stops RSHRNB ISD nodes being emitted in this case

MDevereau requested review from sdesmalen-arm and david-arm September 18, 2023 17:11

llvmbot added the backend:AArch64 label Sep 18, 2023

david-arm reviewed Sep 20, 2023

View reviewed changes

[AArch64][SVE2] Do not emit RSHRNB for large shifts

3573db3

rshrnb's shift amount operand must be between 1-EltSizeInBits. This patch stops RSHRNB ISD nodes being emitted in this case

david-arm reviewed Sep 20, 2023

View reviewed changes

[AArch64][SVE2] Do not emit RSHRNB for large shifts

3a8af1b

rshrnb's shift amount operand must be between 1-EltSizeInBits. This patch stops RSHRNB ISD nodes being emitted in this case

david-arm approved these changes Sep 21, 2023

View reviewed changes

MDevereau merged commit 23ea98f into llvm:main Sep 22, 2023
2 checks passed

MDevereau deleted the rshrnb branch September 22, 2023 09:37

kstoimenov mentioned this pull request Sep 22, 2023

Add memcpm test kstoimenov/llvm-project#13

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AArch64][SVE2] Do not emit RSHRNB for large shifts #66672

[AArch64][SVE2] Do not emit RSHRNB for large shifts #66672

MDevereau commented Sep 18, 2023

llvmbot commented Sep 18, 2023

david-arm Sep 20, 2023

paulwalker-arm Sep 20, 2023

MDevereau Sep 20, 2023

MDevereau Sep 20, 2023 •

edited

david-arm Sep 20, 2023

MDevereau Sep 20, 2023

paulwalker-arm Sep 20, 2023

MDevereau Sep 20, 2023

david-arm left a comment

[AArch64][SVE2] Do not emit RSHRNB for large shifts #66672

[AArch64][SVE2] Do not emit RSHRNB for large shifts #66672

Conversation

MDevereau commented Sep 18, 2023

llvmbot commented Sep 18, 2023

david-arm Sep 20, 2023

Choose a reason for hiding this comment

paulwalker-arm Sep 20, 2023

Choose a reason for hiding this comment

MDevereau Sep 20, 2023

Choose a reason for hiding this comment

MDevereau Sep 20, 2023 • edited

Choose a reason for hiding this comment

david-arm Sep 20, 2023

Choose a reason for hiding this comment

MDevereau Sep 20, 2023

Choose a reason for hiding this comment

paulwalker-arm Sep 20, 2023

Choose a reason for hiding this comment

MDevereau Sep 20, 2023

Choose a reason for hiding this comment

david-arm left a comment

Choose a reason for hiding this comment

MDevereau Sep 20, 2023 •

edited