[AArch64][SVE2] SVE2 NBSL instruction lowering. #89732

dtemirbulatov · 2024-04-23T10:15:18Z

Allow folding BSL/EOR instructions into an NBSL instruction for scalable vectors.

llvmbot · 2024-04-23T10:15:50Z

@llvm/pr-subscribers-backend-aarch64

Author: Dinar Temirbulatov (dtemirbulatov)

Changes

Allow folding BSL/CNOT instructions into an NBSL instruction for scalable vectors.

Full diff: https://github.com/llvm/llvm-project/pull/89732.diff

2 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+17)
(modified) llvm/test/CodeGen/AArch64/sve2-bsl.ll (+15)

diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 6972acd985cb9a..e291d8857bdcd5 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -3750,6 +3750,23 @@ let Predicates = [HasSVE2orSME] in {
 
   // SVE2 extract vector (immediate offset, constructive)
   def EXT_ZZI_B : sve2_int_perm_extract_i_cons<"ext">;
+
+  // zext(cmpeq(bsl(x, y, z), splat(0))) -> nbsl(x, y, z)
+  def : Pat<(nxv16i8 (zext (nxv16i1 (AArch64setcc_z (nxv16i1 (SVEAllActive)),
+            (nxv16i8 (AArch64bsp nxv16i8:$Op1, nxv16i8:$Op2, nxv16i8:$Op3)), (SVEDup0), SETEQ)))),
+            (NBSL_ZZZZ nxv16i8:$Op1, nxv16i8:$Op2, nxv16i8:$Op3)>;
+
+  def : Pat<(nxv8i16 (zext (nxv8i1 (AArch64setcc_z (nxv8i1 (SVEAllActive)),
+            (nxv8i16 (AArch64bsp nxv8i16:$Op1, nxv8i16:$Op2, nxv8i16:$Op3)), (SVEDup0), SETEQ)))),
+            (NBSL_ZZZZ nxv8i16:$Op1, nxv8i16:$Op2, nxv8i16:$Op3)>;
+
+  def : Pat<(nxv4i32 (zext (nxv4i1 (AArch64setcc_z (nxv4i1 (SVEAllActive)),
+            (nxv4i32 (AArch64bsp nxv4i32:$Op1, nxv4i32:$Op2, nxv4i32:$Op3)), (SVEDup0), SETEQ)))),
+            (NBSL_ZZZZ nxv4i32:$Op1, nxv4i32:$Op2, nxv4i32:$Op3)>;
+
+  def : Pat<(nxv2i64 (zext (nxv2i1 (AArch64setcc_z (nxv2i1 (SVEAllActive)),
+            (nxv2i64 (AArch64bsp nxv2i64:$Op1, nxv2i64:$Op2, nxv2i64:$Op3)), (SVEDup0), SETEQ)))),
+            (NBSL_ZZZZ nxv2i64:$Op1, nxv2i64:$Op2, nxv2i64:$Op3)>;
 } // End HasSVE2orSME
 
 let Predicates = [HasSVE2] in {
diff --git a/llvm/test/CodeGen/AArch64/sve2-bsl.ll b/llvm/test/CodeGen/AArch64/sve2-bsl.ll
index 23b2622f5f5863..a7edd944e399fe 100644
--- a/llvm/test/CodeGen/AArch64/sve2-bsl.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-bsl.ll
@@ -41,3 +41,18 @@ define <vscale x 4 x i32> @no_bsl_fold(<vscale x 4 x i32> %a, <vscale x 4 x i32>
   %c = or <vscale x 4 x i32> %1, %2
   ret <vscale x 4 x i32> %c
 }
+
+define <vscale x 4 x i32> @nbsl(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
+; CHECK-LABEL: nbsl:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    mov z2.s, #0x7fffffff
+; CHECK-NEXT:    nbsl z2.d, z2.d, z0.d, z1.d
+; CHECK-NEXT:    mov z0.d, z2.d
+; CHECK-NEXT:    ret
+  %1 = and <vscale x 4 x i32> %a, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 2147483647, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
+  %2 = and <vscale x 4 x i32> %b, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 -2147483648, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
+  %3 = or <vscale x 4 x i32> %1, %2
+  %4 = icmp eq <vscale x 4 x i32> %3, zeroinitializer
+  %5 = zext <vscale x 4 x i1> %4 to <vscale x 4 x i32>
+  ret <vscale x 4 x i32> %5
+}

davemgreen · 2024-04-23T14:12:37Z

llvm/test/CodeGen/AArch64/sve2-bsl.ll

@@ -41,3 +41,18 @@ define <vscale x 4 x i32> @no_bsl_fold(<vscale x 4 x i32> %a, <vscale x 4 x i32>
  %c = or <vscale x 4 x i32> %1, %2
  ret <vscale x 4 x i32> %c
 }
+
+define <vscale x 4 x i32> @nbsl(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {


It is probably worth having a test per type-size, if the patterns are different.

davemgreen · 2024-04-23T14:12:52Z

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

+  // zext(cmpeq(bsl(x, y, z), splat(0))) -> nbsl(x, y, z)
+  def : Pat<(nxv16i8 (zext (nxv16i1 (AArch64setcc_z (nxv16i1 (SVEAllActive)),
+            (nxv16i8 (AArch64bsp nxv16i8:$Op1, nxv16i8:$Op2, nxv16i8:$Op3)), (SVEDup0), SETEQ)))),
+            (NBSL_ZZZZ nxv16i8:$Op1, nxv16i8:$Op2, nxv16i8:$Op3)>;


I believe the operands need to be Op2, Op3, Op1. The order of the operands is weird.
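To illustrate why a reordering such as Op2, Op3, Op1 can be needed, here is a minimal Python sketch on 8-bit lanes. It assumes (this is not stated in the patch) that the `AArch64bsp` node takes the selector mask as its first operand while the instruction takes the selector last; `bsp_node`, `bsl_inst` and `reordering_matches` are hypothetical helper names used only for this illustration.

```python
# Sketch only: bit-level models on 8-bit lanes.
# ASSUMPTION: the operand conventions below are illustrative and are not
# taken from the patch; they encode "node takes the mask first,
# instruction takes the mask last".
MASK8 = 0xFF


def bsp_node(op1, op2, op3):
    """Assumed node semantics: op1 is the selector mask."""
    return ((op1 & op2) | (~op1 & op3)) & MASK8


def bsl_inst(zdn, zm, zk):
    """Assumed instruction semantics: zk (last operand) is the selector."""
    return ((zdn & zk) | (zm & ~zk)) & MASK8


def reordering_matches():
    """Check node(Op1, Op2, Op3) == inst(Op2, Op3, Op1) on sample values."""
    samples = [0x00, 0x0F, 0x35, 0xA5, 0xF0, 0xFF]
    return all(
        bsp_node(a, b, c) == bsl_inst(b, c, a)
        for a in samples
        for b in samples
        for c in samples
    )


if __name__ == "__main__":
    print(reordering_matches())
    # Passing the operands straight through gives a different value here.
    print(hex(bsp_node(0xF0, 0xAA, 0x55)), hex(bsl_inst(0xF0, 0xAA, 0x55)))
```

Under these assumptions, passing Op1, Op2, Op3 unchanged would select with the wrong operand as the mask, which is the kind of mismatch the comment above is pointing at.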

davemgreen · 2024-04-23T14:14:17Z

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

@@ -3750,6 +3750,23 @@ let Predicates = [HasSVE2orSME] in {

  // SVE2 extract vector (immediate offset, constructive)
  def EXT_ZZI_B : sve2_int_perm_extract_i_cons<"ext">;
+
+  // zext(cmpeq(bsl(x, y, z), splat(0))) -> nbsl(x, y, z)


The instruction looks like it should be not(or(and(x, z), and(y, not(z)))), which is not the same as a "cnot" instruction. I think you can use vnot in the tablegen pattern, and the IR will be xor "bsl", -1.

It may be possible to pass a pattern fragment to sve2_int_bitwise_ternary_op, like bsl already does.
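The difference between the two forms can be sketched in Python on 32-bit lanes. The formula for `nbsl` is the one quoted in the comment above; `zext_cmpeq_zero` models the `zext(cmpeq(v, splat(0)))` shape from the original pattern. All function names are hypothetical and exist only for this illustration.

```python
# Sketch only: models one 32-bit lane with plain integers.
MASK32 = 0xFFFFFFFF


def bsl(x, y, z):
    """Bitwise select: bits of x where z is set, bits of y elsewhere."""
    return ((x & z) | (y & ~z)) & MASK32


def nbsl(x, y, z):
    """Reviewer's formula: not(or(and(x, z), and(y, not(z))))."""
    return ~bsl(x, y, z) & MASK32


def zext_cmpeq_zero(v):
    """Shape matched by the original pattern: 1 if the lane is zero, else 0."""
    return 1 if v == 0 else 0


if __name__ == "__main__":
    z = 0x7FFFFFFF  # the constant used in the test from the diff
    for x, y in [(1, 0), (0, 0)]:
        selected = bsl(x, y, z)
        print(hex(nbsl(x, y, z)), zext_cmpeq_zero(selected))
```

For both sample inputs the bitwise-inverted select and the compare-with-zero result differ, so the original pattern would rewrite a "cnot"-like computation into an instruction that computes something else. Matching `xor` of the select with all-ones (via `vnot`) corresponds to the `nbsl` model here.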

davemgreen

Thanks. LGTM

davemgreen · 2024-04-25T14:24:50Z

llvm/lib/Target/AArch64/AArch64InstrInfo.td

@@ -733,6 +733,8 @@ def AArch64vsli : SDNode<"AArch64ISD::VSLI", SDT_AArch64vshiftinsert>;
 def AArch64vsri : SDNode<"AArch64ISD::VSRI", SDT_AArch64vshiftinsert>;

 def AArch64bsp: SDNode<"AArch64ISD::BSP", SDT_AArch64trivec>;
+def AArch64nbsl: PatFrag<(ops node:$Op1, node:$Op2, node:$Op3),
+                        (vnot (AArch64bsp node:$Op1, node:$Op2, node:$Op3))>;


Add an extra space, to line this up.

dtemirbulatov requested review from SamTebbs33, davemgreen, huntergr-arm, sdesmalen-arm, kmclaughlin-arm and david-arm April 23, 2024 10:15

llvmbot added the backend:AArch64 label Apr 23, 2024

davemgreen reviewed Apr 23, 2024

davemgreen approved these changes Apr 26, 2024

dtemirbulatov added 3 commits April 26, 2024 09:09

[AArch64][SVE2] SVE2 NBSL instruction lowering.

9705cee

Allow folding BSL/CNOT instructions into an NBSL instruction for scalable vectors.

Resolved remarks.

1889d8d

Formatting.

45ad4b1

dtemirbulatov force-pushed the nbsl branch from ab2d185 to 45ad4b1 Compare April 26, 2024 09:10

dtemirbulatov merged commit 37a92f9 into llvm:main Apr 26, 2024
3 of 4 checks passed

dtemirbulatov deleted the nbsl branch April 26, 2024 16:07

This was referenced Apr 29, 2024

main #90439

Closed

[AArch64] Add support for Cortex-R82AE and improve Cortex-R82 #90440

Merged
