
[RISCV] Use vnclip for scalable vector saturating truncation. #88648

Merged: 4 commits into llvm:main, Apr 18, 2024

Conversation


@sun-jacobi (Member) commented Apr 14, 2024

Similar to #75145, but for scalable vectors.

Specifically, this patch handles the following optimization case:

Source Code

define void @trunc_sat_i8i16_maxmin(ptr %x, ptr %y) {
  %1 = load <vscale x 4 x i16>, ptr %x, align 16
  %2 = tail call <vscale x 4 x i16> @llvm.smax.v4i16(<vscale x 4 x i16> %1, <vscale x 4 x i16> splat (i16 -128))
  %3 = tail call <vscale x 4 x i16> @llvm.smin.v4i16(<vscale x 4 x i16> %2, <vscale x 4 x i16> splat (i16 127))
  %4 = trunc <vscale x 4 x i16> %3 to <vscale x 4 x i8>
  store <vscale x 4 x i8> %4, ptr %y, align 8
  ret void
}
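
For intuition, the IR above is what a vectorizer typically produces from a plain clamp-and-truncate loop. A hypothetical scalar C equivalent (illustration only, not taken from the PR):

#include <stdint.h>

/* Clamp each 16-bit element to the signed 8-bit range, then truncate.
   The clamp becomes llvm.smax/llvm.smin with splat constants, and the
   cast becomes the trunc that the new patterns match. */
void trunc_sat_i8i16_maxmin(const int16_t *x, int8_t *y, int n) {
  for (int i = 0; i < n; i++) {
    int16_t v = x[i];
    if (v < -128) v = -128; /* smax with splat(-128) */
    if (v > 127)  v = 127;  /* smin with splat(127)  */
    y[i] = (int8_t)v;       /* trunc i16 -> i8       */
  }
}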

Before this patch

Compiler Explorer

trunc_sat_i8i16_maxmin:
        vl1re16.v       v8, (a0)
        li      a0, -128
        vsetvli a2, zero, e16, m1, ta, ma
        vmax.vx v8, v8, a0
        li      a0, 127
        vmin.vx v8, v8, a0
        vsetvli zero, zero, e8, mf2, ta, ma
        vnsrl.wi        v8, v8, 0
        vse8.v  v8, (a1)
        ret

After this patch

trunc_sat_i8i16_maxmin:
        vsetivli zero, 4, e8, mf4, ta, ma
        vle16.v v8, (a0)
        vnclip.wi v8, v8, 0
        vse8.v v8, (a1)
        ret
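
Why a single vnclip.wi suffices: vnclip performs a narrowing right shift with rounding, then signed saturation to the destination element width. With a shift immediate of 0 the shift and rounding contribute nothing, leaving exactly the clamp-and-truncate computed by the smax/smin/trunc chain. A rough per-element C model for SEW=8 (a sketch of the semantics, not the spec's pseudocode):

#include <stdint.h>

/* Model of "vnclip.wi vd, vs2, 0" on one element, i16 -> i8: with a
   zero shift amount only the signed saturation remains. (The real
   instruction also sets the vxsat flag when it saturates.) */
static int8_t vnclip_wi_0(int16_t wide) {
  if (wide < INT8_MIN) return INT8_MIN; /* saturate low  */
  if (wide > INT8_MAX) return INT8_MAX; /* saturate high */
  return (int8_t)wide;                  /* in range: plain truncation */
}

The unsigned variant, vnclipu.wi, does the same with unsigned saturation, which is why the unsigned test cases in this patch select it.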

@llvmbot (Collaborator) commented Apr 14, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Chia (sun-jacobi)

Changes

Similar to #75145, but for scalable vectors.

Patch is 23.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88648.diff

4 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoVSDPatterns.td (+50)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td (+15-17)
  • (renamed) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-trunc-sat-clip.ll
  • (added) llvm/test/CodeGen/RISCV/rvv/trunc-sat-clip-sdnode.ll (+379)
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoVSDPatterns.td b/llvm/lib/Target/RISCV/RISCVInstrInfoVSDPatterns.td
index 7c77449b4f6e1c..b88e855c50583e 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoVSDPatterns.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoVSDPatterns.td
@@ -1166,6 +1166,56 @@ defm : VPatBinarySDNode_VV_VX<usubsat, "PseudoVSSUBU">;
 defm : VPatAVGADD_VV_VX_RM<avgflooru, 0b10>;
 defm : VPatAVGADD_VV_VX_RM<avgceilu, 0b00>;
 
+// 12.5. Vector Narrowing Fixed-Point Clip Instructions
+class VPatTruncSatClipMaxMinSDNodeBase<VTypeInfo vti,
+                                       VTypeInfo wti,
+                                       SDPatternOperator op1,
+                                       int op1_value,
+                                       SDPatternOperator op2,
+                                       int op2_value> :
+  Pat<(vti.Vector (riscv_trunc_vector_vl
+        (wti.Vector (op1
+          (wti.Vector (op2 (wti.Vector wti.RegClass:$rs1),
+            (wti.Vector (riscv_vmv_v_x_vl (wti.Vector undef), op2_value, (XLenVT srcvalue))))),
+          (wti.Vector (riscv_vmv_v_x_vl (wti.Vector undef), op1_value, (XLenVT srcvalue))))),
+        (vti.Mask V0), VLOpFrag)),
+      (!cast<Instruction>("PseudoVNCLIP_WI_"#vti.LMul.MX#"_MASK")
+        (vti.Vector (IMPLICIT_DEF)), wti.RegClass:$rs1, 0,
+        (vti.Mask V0), 0, GPR:$vl, vti.Log2SEW, TA_MA)>;
+
+class VPatTruncSatClipUMinSDNode<VTypeInfo vti,
+                                 VTypeInfo wti,
+                                 int uminval> :
+  Pat<(vti.Vector (riscv_trunc_vector_vl
+        (wti.Vector (umin (wti.Vector wti.RegClass:$rs1),
+          (wti.Vector (riscv_vmv_v_x_vl (wti.Vector undef), uminval, (XLenVT srcvalue))))), (vti.Mask V0), VLOpFrag)),
+      (!cast<Instruction>("PseudoVNCLIPU_WI_"#vti.LMul.MX#"_MASK")
+        (vti.Vector (IMPLICIT_DEF)), wti.RegClass:$rs1, 0,
+        (vti.Mask V0), 0, GPR:$vl, vti.Log2SEW, TA_MA)>;
+
+multiclass VPatTruncSatClipMaxMinSDNode<VTypeInfo vti, VTypeInfo wti,
+  SDPatternOperator max, int maxval, SDPatternOperator min, int minval> {
+    def : VPatTruncSatClipMaxMinSDNodeBase<vti, wti, max, maxval, min, minval>;
+    def : VPatTruncSatClipMaxMinSDNodeBase<vti, wti, min, minval, max, maxval>;
+}
+
+multiclass VPatTruncSatClipSDNode<VTypeInfo vti, VTypeInfo wti> {
+  defvar sew = vti.SEW;
+  defvar uminval = !sub(!shl(1, sew), 1);
+  defvar sminval = !sub(!shl(1, !sub(sew, 1)), 1);
+  defvar smaxval = !sub(0, !shl(1, !sub(sew, 1)));
+
+  let Predicates = !listconcat(GetVTypePredicates<vti>.Predicates,
+                               GetVTypePredicates<wti>.Predicates) in {
+    defm : VPatTruncSatClipMaxMinSDNode<vti, wti, smin, sminval, smax, smaxval>;
+    def : VPatTruncSatClipUMinSDNode<vti, wti, uminval>;
+  }
+
+}
+
+foreach vtiToWti = AllWidenableIntVectors in
+  defm : VPatTruncSatClipSDNode<vtiToWti.Vti, vtiToWti.Wti>;
+
 // 15. Vector Mask Instructions
 
 // 15.1. Vector Mask-Register Logical Instructions
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td b/llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td
index 322c055306e86f..285e84551379b6 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td
@@ -2373,13 +2373,12 @@ defm : VPatAVGADDVL_VV_VX_RM<riscv_avgflooru_vl, 0b10>;
 defm : VPatAVGADDVL_VV_VX_RM<riscv_avgceilu_vl, 0b00>;
 
 // 12.5. Vector Narrowing Fixed-Point Clip Instructions
-class VPatTruncSatClipMaxMinBase<string inst,
-                                 VTypeInfo vti,
-                                 VTypeInfo wti,
-                                 SDPatternOperator op1,
-                                 int op1_value,
-                                 SDPatternOperator op2,
-                                 int op2_value> :
+class VPatTruncSatClipMaxMinVLBase<VTypeInfo vti,
+                                   VTypeInfo wti,
+                                   SDPatternOperator op1,
+                                   int op1_value,
+                                   SDPatternOperator op2,
+                                   int op2_value> :
   Pat<(vti.Vector (riscv_trunc_vector_vl
         (wti.Vector (op1
           (wti.Vector (op2
@@ -2389,11 +2388,11 @@ class VPatTruncSatClipMaxMinBase<string inst,
           (wti.Vector (riscv_vmv_v_x_vl (wti.Vector undef), op1_value, (XLenVT srcvalue))),
           (wti.Vector undef), (wti.Mask V0), VLOpFrag)),
         (vti.Mask V0), VLOpFrag)),
-      (!cast<Instruction>(inst#"_WI_"#vti.LMul.MX#"_MASK")
+      (!cast<Instruction>("PseudoVNCLIP_WI_"#vti.LMul.MX#"_MASK")
         (vti.Vector (IMPLICIT_DEF)), wti.RegClass:$rs1, 0,
         (vti.Mask V0), 0, GPR:$vl, vti.Log2SEW, TA_MA)>;
 
-class VPatTruncSatClipUMin<VTypeInfo vti,
+class VPatTruncSatClipUMinVL<VTypeInfo vti,
                            VTypeInfo wti,
                            int uminval> :
   Pat<(vti.Vector (riscv_trunc_vector_vl
@@ -2406,13 +2405,13 @@ class VPatTruncSatClipUMin<VTypeInfo vti,
         (vti.Vector (IMPLICIT_DEF)), wti.RegClass:$rs1, 0,
         (vti.Mask V0), 0, GPR:$vl, vti.Log2SEW, TA_MA)>;
 
-multiclass VPatTruncSatClipMaxMin<string inst, VTypeInfo vti, VTypeInfo wti,
+multiclass VPatTruncSatClipMaxMinVL<VTypeInfo vti, VTypeInfo wti,
   SDPatternOperator max, int maxval, SDPatternOperator min, int minval> {
-    def : VPatTruncSatClipMaxMinBase<inst, vti, wti, max, maxval, min, minval>;
-    def : VPatTruncSatClipMaxMinBase<inst, vti, wti, min, minval, max, maxval>;
+    def : VPatTruncSatClipMaxMinVLBase<vti, wti, max, maxval, min, minval>;
+    def : VPatTruncSatClipMaxMinVLBase<vti, wti, min, minval, max, maxval>;
 }
 
-multiclass VPatTruncSatClip<VTypeInfo vti, VTypeInfo wti> {
+multiclass VPatTruncSatClipVL<VTypeInfo vti, VTypeInfo wti> {
   defvar sew = vti.SEW;
   defvar uminval = !sub(!shl(1, sew), 1);
   defvar sminval = !sub(!shl(1, !sub(sew, 1)), 1);
@@ -2420,15 +2419,14 @@ multiclass VPatTruncSatClip<VTypeInfo vti, VTypeInfo wti> {
 
   let Predicates = !listconcat(GetVTypePredicates<vti>.Predicates,
                                GetVTypePredicates<wti>.Predicates) in {
-    defm : VPatTruncSatClipMaxMin<"PseudoVNCLIP", vti, wti, riscv_smin_vl,
-                                  sminval, riscv_smax_vl, smaxval>;
-    def : VPatTruncSatClipUMin<vti, wti, uminval>;
+    defm : VPatTruncSatClipMaxMinVL<vti, wti, riscv_smin_vl, sminval, riscv_smax_vl, smaxval>;
+    def : VPatTruncSatClipUMinVL<vti, wti, uminval>;
   }
 
 }
 
 foreach vtiToWti = AllWidenableIntVectors in
-  defm : VPatTruncSatClip<vtiToWti.Vti, vtiToWti.Wti>;
+  defm : VPatTruncSatClipVL<vtiToWti.Vti, vtiToWti.Wti>;
 
 // 13. Vector Floating-Point Instructions
 
diff --git a/llvm/test/CodeGen/RISCV/rvv/trunc-sat-clip.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-trunc-sat-clip.ll
similarity index 100%
rename from llvm/test/CodeGen/RISCV/rvv/trunc-sat-clip.ll
rename to llvm/test/CodeGen/RISCV/rvv/fixed-vectors-trunc-sat-clip.ll
diff --git a/llvm/test/CodeGen/RISCV/rvv/trunc-sat-clip-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/trunc-sat-clip-sdnode.ll
new file mode 100644
index 00000000000000..fae6abeed1b538
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/trunc-sat-clip-sdnode.ll
@@ -0,0 +1,379 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs < %s | FileCheck %s
+
+declare <vscale x 4 x i16> @llvm.smax.v4i16(<vscale x 4 x i16>, <vscale x 4 x i16>)
+declare <vscale x 4 x i16> @llvm.smin.v4i16(<vscale x 4 x i16>, <vscale x 4 x i16>)
+declare <vscale x 4 x i32> @llvm.smax.v4i32(<vscale x 4 x i32>, <vscale x 4 x i32>)
+declare <vscale x 4 x i32> @llvm.smin.v4i32(<vscale x 4 x i32>, <vscale x 4 x i32>)
+declare <vscale x 4 x i64> @llvm.smax.v4i64(<vscale x 4 x i64>, <vscale x 4 x i64>)
+declare <vscale x 4 x i64> @llvm.smin.v4i64(<vscale x 4 x i64>, <vscale x 4 x i64>)
+
+declare <vscale x 4 x i16> @llvm.umax.v4i16(<vscale x 4 x i16>, <vscale x 4 x i16>)
+declare <vscale x 4 x i16> @llvm.umin.v4i16(<vscale x 4 x i16>, <vscale x 4 x i16>)
+declare <vscale x 4 x i32> @llvm.umax.v4i32(<vscale x 4 x i32>, <vscale x 4 x i32>)
+declare <vscale x 4 x i32> @llvm.umin.v4i32(<vscale x 4 x i32>, <vscale x 4 x i32>)
+declare <vscale x 4 x i64> @llvm.umax.v4i64(<vscale x 4 x i64>, <vscale x 4 x i64>)
+declare <vscale x 4 x i64> @llvm.umin.v4i64(<vscale x 4 x i64>, <vscale x 4 x i64>)
+
+define void @trunc_sat_i8i16_maxmin(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_i8i16_maxmin:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl1re16.v v8, (a0)
+; CHECK-NEXT:    vsetvli a0, zero, e8, mf2, ta, ma
+; CHECK-NEXT:    vnclip.wi v8, v8, 0
+; CHECK-NEXT:    vse8.v v8, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i16>, ptr %x, align 16
+  %2 = tail call <vscale x 4 x i16> @llvm.smax.v4i16(<vscale x 4 x i16> %1, <vscale x 4 x i16> splat (i16 -128))
+  %3 = tail call <vscale x 4 x i16> @llvm.smin.v4i16(<vscale x 4 x i16> %2, <vscale x 4 x i16> splat (i16 127))
+  %4 = trunc <vscale x 4 x i16> %3 to <vscale x 4 x i8>
+  store <vscale x 4 x i8> %4, ptr %y, align 8
+  ret void
+}
+
+define void @trunc_sat_i8i16_minmax(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_i8i16_minmax:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl1re16.v v8, (a0)
+; CHECK-NEXT:    vsetvli a0, zero, e8, mf2, ta, ma
+; CHECK-NEXT:    vnclip.wi v8, v8, 0
+; CHECK-NEXT:    vse8.v v8, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i16>, ptr %x, align 16
+  %2 = tail call <vscale x 4 x i16> @llvm.smin.v4i16(<vscale x 4 x i16> %1, <vscale x 4 x i16> splat (i16 127))
+  %3 = tail call <vscale x 4 x i16> @llvm.smax.v4i16(<vscale x 4 x i16> %2, <vscale x 4 x i16> splat (i16 -128))
+  %4 = trunc <vscale x 4 x i16> %3 to <vscale x 4 x i8>
+  store <vscale x 4 x i8> %4, ptr %y, align 8
+  ret void
+}
+
+define void @trunc_sat_i8i16_notopt(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_i8i16_notopt:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl1re16.v v8, (a0)
+; CHECK-NEXT:    li a0, -127
+; CHECK-NEXT:    vsetvli a2, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vmax.vx v8, v8, a0
+; CHECK-NEXT:    li a0, 128
+; CHECK-NEXT:    vmin.vx v8, v8, a0
+; CHECK-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
+; CHECK-NEXT:    vnsrl.wi v8, v8, 0
+; CHECK-NEXT:    vse8.v v8, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i16>, ptr %x, align 16
+  %2 = tail call <vscale x 4 x i16> @llvm.smax.v4i16(<vscale x 4 x i16> %1, <vscale x 4 x i16> splat (i16 -127))
+  %3 = tail call <vscale x 4 x i16> @llvm.smin.v4i16(<vscale x 4 x i16> %2, <vscale x 4 x i16> splat (i16 128))
+  %4 = trunc <vscale x 4 x i16> %3 to <vscale x 4 x i8>
+  store <vscale x 4 x i8> %4, ptr %y, align 8
+  ret void
+}
+
+define void @trunc_sat_u8u16_min(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_u8u16_min:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl1re16.v v8, (a0)
+; CHECK-NEXT:    vsetvli a0, zero, e8, mf2, ta, ma
+; CHECK-NEXT:    vnclipu.wi v8, v8, 0
+; CHECK-NEXT:    vse8.v v8, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i16>, ptr %x, align 16
+  %2 = tail call <vscale x 4 x i16> @llvm.umin.v4i16(<vscale x 4 x i16> %1, <vscale x 4 x i16> splat (i16 255))
+  %3 = trunc <vscale x 4 x i16> %2 to <vscale x 4 x i8>
+  store <vscale x 4 x i8> %3, ptr %y, align 8
+  ret void
+}
+
+define void @trunc_sat_u8u16_notopt(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_u8u16_notopt:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl1re16.v v8, (a0)
+; CHECK-NEXT:    li a0, 127
+; CHECK-NEXT:    vsetvli a2, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vminu.vx v8, v8, a0
+; CHECK-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
+; CHECK-NEXT:    vnsrl.wi v8, v8, 0
+; CHECK-NEXT:    vse8.v v8, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i16>, ptr %x, align 16
+  %2 = tail call <vscale x 4 x i16> @llvm.umin.v4i16(<vscale x 4 x i16> %1, <vscale x 4 x i16> splat (i16 127))
+  %3 = trunc <vscale x 4 x i16> %2 to <vscale x 4 x i8>
+  store <vscale x 4 x i8> %3, ptr %y, align 8
+  ret void
+}
+
+define void @trunc_sat_u8u16_maxmin(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_u8u16_maxmin:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl1re16.v v8, (a0)
+; CHECK-NEXT:    vsetvli a0, zero, e8, mf2, ta, ma
+; CHECK-NEXT:    vnclipu.wi v8, v8, 0
+; CHECK-NEXT:    vse8.v v8, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i16>, ptr %x, align 16
+  %2 = tail call <vscale x 4 x i16> @llvm.umax.v4i16(<vscale x 4 x i16> %1, <vscale x 4 x i16> splat (i16 0))
+  %3 = tail call <vscale x 4 x i16> @llvm.umin.v4i16(<vscale x 4 x i16> %2, <vscale x 4 x i16> splat (i16 255))
+  %4 = trunc <vscale x 4 x i16> %3 to <vscale x 4 x i8>
+  store <vscale x 4 x i8> %4, ptr %y, align 8
+  ret void
+}
+
+define void @trunc_sat_u8u16_minmax(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_u8u16_minmax:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl1re16.v v8, (a0)
+; CHECK-NEXT:    vsetvli a0, zero, e8, mf2, ta, ma
+; CHECK-NEXT:    vnclipu.wi v8, v8, 0
+; CHECK-NEXT:    vse8.v v8, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i16>, ptr %x, align 16
+  %2 = tail call <vscale x 4 x i16> @llvm.umin.v4i16(<vscale x 4 x i16> %1, <vscale x 4 x i16> splat (i16 255))
+  %3 = tail call <vscale x 4 x i16> @llvm.umax.v4i16(<vscale x 4 x i16> %2, <vscale x 4 x i16> splat (i16 0))
+  %4 = trunc <vscale x 4 x i16> %3 to <vscale x 4 x i8>
+  store <vscale x 4 x i8> %4, ptr %y, align 8
+  ret void
+}
+
+
+define void @trunc_sat_i16i32_notopt(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_i16i32_notopt:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl2re32.v v8, (a0)
+; CHECK-NEXT:    lui a0, 1048568
+; CHECK-NEXT:    addi a0, a0, 1
+; CHECK-NEXT:    vsetvli a2, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vmax.vx v8, v8, a0
+; CHECK-NEXT:    lui a0, 8
+; CHECK-NEXT:    vmin.vx v8, v8, a0
+; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vnsrl.wi v10, v8, 0
+; CHECK-NEXT:    vs1r.v v10, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i32>, ptr %x, align 32
+  %2 = tail call <vscale x 4 x i32> @llvm.smax.v4i32(<vscale x 4 x i32> %1, <vscale x 4 x i32> splat (i32 -32767))
+  %3 = tail call <vscale x 4 x i32> @llvm.smin.v4i32(<vscale x 4 x i32> %2, <vscale x 4 x i32> splat (i32 32768))
+  %4 = trunc <vscale x 4 x i32> %3 to <vscale x 4 x i16>
+  store <vscale x 4 x i16> %4, ptr %y, align 16
+  ret void
+}
+
+define void @trunc_sat_i16i32_maxmin(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_i16i32_maxmin:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl2re32.v v8, (a0)
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vnclip.wi v10, v8, 0
+; CHECK-NEXT:    vs1r.v v10, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i32>, ptr %x, align 32
+  %2 = tail call <vscale x 4 x i32> @llvm.smax.v4i32(<vscale x 4 x i32> %1, <vscale x 4 x i32> splat (i32 -32768))
+  %3 = tail call <vscale x 4 x i32> @llvm.smin.v4i32(<vscale x 4 x i32> %2, <vscale x 4 x i32> splat (i32 32767))
+  %4 = trunc <vscale x 4 x i32> %3 to <vscale x 4 x i16>
+  store <vscale x 4 x i16> %4, ptr %y, align 16
+  ret void
+}
+
+define void @trunc_sat_i16i32_minmax(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_i16i32_minmax:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl2re32.v v8, (a0)
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vnclip.wi v10, v8, 0
+; CHECK-NEXT:    vs1r.v v10, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i32>, ptr %x, align 32
+  %2 = tail call <vscale x 4 x i32> @llvm.smin.v4i32(<vscale x 4 x i32> %1, <vscale x 4 x i32> splat (i32 32767))
+  %3 = tail call <vscale x 4 x i32> @llvm.smax.v4i32(<vscale x 4 x i32> %2, <vscale x 4 x i32> splat (i32 -32768))
+  %4 = trunc <vscale x 4 x i32> %3 to <vscale x 4 x i16>
+  store <vscale x 4 x i16> %4, ptr %y, align 16
+  ret void
+}
+
+define void @trunc_sat_u16u32_notopt(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_u16u32_notopt:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl2re32.v v8, (a0)
+; CHECK-NEXT:    lui a0, 8
+; CHECK-NEXT:    addi a0, a0, -1
+; CHECK-NEXT:    vsetvli a2, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vminu.vx v8, v8, a0
+; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vnsrl.wi v10, v8, 0
+; CHECK-NEXT:    vs1r.v v10, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i32>, ptr %x, align 32
+  %2 = tail call <vscale x 4 x i32> @llvm.umin.v4i32(<vscale x 4 x i32> %1, <vscale x 4 x i32> splat (i32 32767))
+  %3 = trunc <vscale x 4 x i32> %2 to <vscale x 4 x i16>
+  store <vscale x 4 x i16> %3, ptr %y, align 16
+  ret void
+}
+
+define void @trunc_sat_u16u32_min(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_u16u32_min:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl2re32.v v8, (a0)
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vnclipu.wi v10, v8, 0
+; CHECK-NEXT:    vs1r.v v10, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i32>, ptr %x, align 32
+  %2 = tail call <vscale x 4 x i32> @llvm.umin.v4i32(<vscale x 4 x i32> %1, <vscale x 4 x i32> splat (i32 65535))
+  %3 = trunc <vscale x 4 x i32> %2 to <vscale x 4 x i16>
+  store <vscale x 4 x i16> %3, ptr %y, align 16
+  ret void
+}
+
+define void @trunc_sat_u16u32_minmax(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_u16u32_minmax:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl2re32.v v8, (a0)
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vnclipu.wi v10, v8, 0
+; CHECK-NEXT:    vs1r.v v10, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i32>, ptr %x, align 32
+  %2 = tail call <vscale x 4 x i32> @llvm.umax.v4i32(<vscale x 4 x i32> %1, <vscale x 4 x i32> splat (i32 0))
+  %3 = tail call <vscale x 4 x i32> @llvm.umin.v4i32(<vscale x 4 x i32> %2, <vscale x 4 x i32> splat (i32 65535))
+  %4 = trunc <vscale x 4 x i32> %3 to <vscale x 4 x i16>
+  store <vscale x 4 x i16> %4, ptr %y, align 16
+  ret void
+}
+
+define void @trunc_sat_u16u32_maxmin(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_u16u32_maxmin:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl2re32.v v8, (a0)
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vnclipu.wi v10, v8, 0
+; CHECK-NEXT:    vs1r.v v10, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i32>, ptr %x, align 32
+  %2 = tail call <vscale x 4 x i32> @llvm.umin.v4i32(<vscale x 4 x i32> %1, <vscale x 4 x i32> splat (i32 65535))
+  %3 = tail call <vscale x 4 x i32> @llvm.umax.v4i32(<vscale x 4 x i32> %2, <vscale x 4 x i32> splat (i32 0))
+  %4 = trunc <vscale x 4 x i32> %3 to <vscale x 4 x i16>
+  store <vscale x 4 x i16> %4, ptr %y, align 16
+  ret void
+}
+
+
+define void @trunc_sat_i32i64_notopt(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_i32i64_notopt:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl4re64.v v8, (a0)
+; CHECK-NEXT:    lui a0, 524288
+; CHECK-NEXT:    addiw a0, a0, 1
+; CHECK-NEXT:    vsetvli a2, zero, e64, m4, ta, ma
+; CHECK-NEXT:    vmax.vx v8, v8, a0
+; CHECK-NEXT:    li a0, 1
+; CHECK-NEXT:    slli a0, a0, 31
+; CHECK-NEXT:    vmin.vx v8, v8, a0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vnsrl.wi v12, v8, 0
+; CHECK-NEXT:    vs2r.v v12, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i64>, ptr %x, align 64
+  %2 = tail call <vscale x 4 x i64> @llvm.smax.v4i64(<vscale x 4 x i64> %1, <vscale x 4 x i64> splat (i64 -2147483647))
+  %3 = tail call <vscale x 4 x i64> @llvm.smin.v4i64(<vscale x 4 x i64> %2, <vscale x 4 x i64> splat (i64 2147483648))
+  %4 = trunc <vscale x 4 x i64> %3 to <vscale x 4 x i32>
+  store <vscale x 4 x i32> %4, ptr %y, align 32
+  ret void
+}
+
+define void @trunc_sat_i32i64_maxmin(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_i32i64_maxmin:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl4re64.v v8, (a0)
+; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vnclip.wi v12, v8, 0
+; CHECK-NEXT:    vs2r.v v12, (a1)
+; CHECK-NEXT:    ret
+  %1 = load <vscale x 4 x i64>, ptr %x, align 64
+  %2 = tail call <vscale x 4 x i64> @llvm.smax.v4i64(<vscale x 4 x i64> %1, <vscale x 4 x i64> splat (i64 -2147483648))
+  %3 = tail call <vscale x 4 x i64> @llvm.smin.v4i64(<vscale x 4 x i64> %2, <vscale x 4 x i64> splat (i64 2147483647))
+  %4 = trunc <vscale x 4 x i64> %3 to <vscale x 4 x i32>
+  store <vscale x 4 x i32> %4, ptr %y, align 32
+  ret void
+}
+
+define void @trunc_sat_i32i64_minmax(ptr %x, ptr %y) {
+; CHECK-LABEL: trunc_sat_i32i64_minmax:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl4re64.v v8, (a0)
+; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, ma...
[truncated]
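
A note on the bounds the TableGen patterns check for: the defvar lines in VPatTruncSatClipSDNode compute the exact saturation limits of the narrow element type, and only clamps to precisely these values fold into vnclip/vnclipu (the *_notopt tests above use off-by-one bounds and therefore keep the generic vmax/vmin/vnsrl lowering). A C sketch of the same arithmetic (illustrative only, not part of the patch):

#include <stdio.h>

int main(void) {
  int sew = 8;                          /* narrow element width in bits   */
  long uminval = (1L << sew) - 1;       /* 255  -> unsigned clip, vnclipu */
  long sminval = (1L << (sew - 1)) - 1; /* 127  -> signed upper,  vnclip  */
  long smaxval = -(1L << (sew - 1));    /* -128 -> signed lower,  vnclip  */
  printf("%ld %ld %ld\n", uminval, sminval, smaxval);
  return 0;
}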

@lukel97 (Contributor) left a comment

LGTM, just some nits

; CHECK-NEXT: ret
%1 = load <vscale x 4 x i64>, ptr %x, align 64
%2 = tail call <vscale x 4 x i64> @llvm.umin.v4i64(<vscale x 4 x i64> %1, <vscale x 4 x i64> splat (i64 4294967295))
%3 = tail call <vscale x 4 x i64> @llvm.umax.v4i64(<vscale x 4 x i64> %2, <vscale x 4 x i64> splat (i64 0))

@lukel97 (Contributor) commented on the snippet above:

Nit, I presume this umax of 0 is combined away by DAGCombiner, do we still need it in the tests?

Comment on lines 1198 to 1199
def : VPatTruncSatClipMaxMinSDNodeBase<vti, wti, max, maxval, min, minval>;
def : VPatTruncSatClipMaxMinSDNodeBase<vti, wti, min, minval, max, maxval>;
def : VPatTruncSatClipMaxMinSDNodeBase<vti, wti, min, minval, max, maxval>;

@lukel97 (Contributor) commented on the lines above:

Nit, do we need this base multiclass? Would it be easier to just move these two defs inline beside the VPatTruncSatClipUMinSDNode def?

@sun-jacobi requested a review from lukel97 on April 15, 2024 at 13:55
@topperc (Collaborator) left a comment

LGTM

@sun-jacobi (Member, Author) commented:

Pending on #88496

@lukel97 (Contributor) commented Apr 17, 2024

> Pending on #88496

I think it should be fine to merge this before #88496? It should be easy enough to update the patterns added in this PR later.

@sun-jacobi merged commit 0afc884 into llvm:main on Apr 18, 2024
3 of 4 checks passed