[RISCV] Add processor definition and scheduling model for XiangShan-KunMingHu #90392

Bhe6669 · 2024-04-28T10:25:44Z

"XiangShan" is a high-performance open-source RISC-V processor project, and the "KunMingHu" architecture is its third generation. Official documentation can be found at: [documentation](https://xiangshan-doc.readthedocs.io/zh-cn/latest/).

Currently, the KunMingHu core supports "RV64IMAFDCV_zba_zbb_zbc_zbs_zbkb_zbkc_zbkx_zknd_zkne_zknh_zksed_zksh_svinval_zicbom_zicboz_zicsr_zifencei". The scheduling model covers the basic configuration and instruction latencies of the KunMingHu core. Other components will be submitted in subsequent patches.
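The ISA string above names the Zbk*/Zkn* extensions one by one, while the `XIANGSHAN_KUNMINGHU` definition in this patch lists the umbrella feature `FeatureStdExtZkn`. The following is a minimal Python sketch, not part of the patch, that checks the two spellings agree. The feature set is transcribed by hand from the `RISCVProcessors.td` hunk, the Zkn expansion follows the scalar cryptography specification, and the helper names (`parse_isa_string`, `expand`) are hypothetical.

```python
# Sketch: compare the advertised ISA string with the processor feature list.
# Both tables are hand-transcribed from this PR; helper names are hypothetical.

ISA_STRING = ("RV64IMAFDCV_zba_zbb_zbc_zbs_zbkb_zbkc_zbkx_zknd_zkne_zknh"
              "_zksed_zksh_svinval_zicbom_zicboz_zicsr_zifencei")

# Extensions named by the XIANGSHAN_KUNMINGHU feature list, lower-cased.
PROCESSOR_FEATURES = {
    "i", "zicsr", "zifencei", "m", "a", "f", "d", "c",
    "zba", "zbb", "zbc", "zbs", "zkn", "zksed", "zksh",
    "svinval", "zicbom", "zicboz", "v", "zvl128b",
}

# Zkn is shorthand for these six extensions.
UMBRELLAS = {"zkn": {"zbkb", "zbkc", "zbkx", "zkne", "zknd", "zknh"}}


def parse_isa_string(isa):
    """Split 'RV64IMAFDCV_zba_...' into a set of lower-case extension names."""
    base, *multi_letter = isa.lower().split("_")
    if not base.startswith(("rv32", "rv64")):
        raise ValueError("expected an rv32/rv64 prefix")
    single_letter = set(base[4:])  # one character per single-letter extension
    return single_letter | set(multi_letter)


def expand(features):
    """Add the members of every umbrella extension present in `features`."""
    expanded = set(features)
    for name in features:
        expanded |= UMBRELLAS.get(name, set())
    return expanded


advertised = parse_isa_string(ISA_STRING)
implemented = expand(PROCESSOR_FEATURES)
missing = advertised - implemented   # advertised but absent from the feature list
extra = implemented - advertised     # in the feature list but not advertised

print("missing:", sorted(missing))
print("extra:", sorted(extra))
```

Under these assumptions nothing is missing; the only extra names are the umbrella `zkn` itself and `zvl128b`, which the ISA string does not spell out.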

Co-authored-by:
Chen Jian <chenjian@bosc.ac.cn>
Lv Fang <lvfang@bosc.ac.cn>

This pull request adds definitions for the XiangShan-KunMingHu processor. "XiangShan" is a high-performance open-source RISC-V processor project, and the "KunMingHu" architecture is its third generation. Official documentation can be found at: [documentation](https://xiangshan-doc.readthedocs.io/zh-cn/latest/).

Currently, the KunMingHu core supports "RV64IMAFDCV_zba_zbb_zbc_zbs_zbkb_zbkc_zbkx_zknd_zkne_zknh_zksed_zksh_svinval_zicbom_zicboz_zicsr_zifencei". The scheduler model and other components will be submitted in subsequent patches.

Co-authored-by: Chen Jian <chenjian@bosc.ac.cn>
Co-authored-by: Lv Fang <lvfang@bosc.ac.cn>
Co-Authored-By: Khao7342 <167075369+Khao7342@users.noreply.github.com>
Co-Authored-By: huxuan0307 <39661208+huxuan0307@users.noreply.github.com>
Co-Authored-By: Ziyue-Zhang <46214232+Ziyue-Zhang@users.noreply.github.com>
Co-Authored-By: Lin Wang <38717023+MrLinWang@users.noreply.github.com>
Co-Authored-By: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-Authored-By: bdne159 <168184120+bdne159@users.noreply.github.com>
Co-Authored-By: Zhuke-bosc <168183309+Zhuke-bosc@users.noreply.github.com>
Co-Authored-By: 雷电霸王龙 <111375214+microft11@users.noreply.github.com>

"XiangShan" is a high-performance open-source RISC-V processor project, and the "KunMingHu" architecture is its third generation. Official documentation can be found at: [documentation](https://xiangshan-doc.readthedocs.io/zh-cn/latest/).

This pull request introduces the foundational scheduling model of the KunMingHu architecture. It covers the basic configuration and instruction latencies of the KunMingHu core. Other components will be submitted in subsequent patches.

Co-authored-by: Chen Jian <chenjian@bosc.ac.cn>
Co-authored-by: Lv Fang <lvfang@bosc.ac.cn>
Co-Authored-By: Khao7342 <167075369+Khao7342@users.noreply.github.com>
Co-Authored-By: huxuan0307 <39661208+huxuan0307@users.noreply.github.com>
Co-Authored-By: Ziyue-Zhang <46214232+Ziyue-Zhang@users.noreply.github.com>
Co-Authored-By: Lin Wang <38717023+MrLinWang@users.noreply.github.com>
Co-Authored-By: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-Authored-By: bdne159 <168184120+bdne159@users.noreply.github.com>
Co-Authored-By: Zhuke-bosc <168183309+Zhuke-bosc@users.noreply.github.com>
Co-Authored-By: 雷电霸王龙 <111375214+microft11@users.noreply.github.com>

github-actions · 2024-04-28T10:25:59Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2024-04-28T10:26:28Z

@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: None (Bhe6669)

Changes

"XiangShan" is a high-performance open-source RISC-V processor project, and the "KunMingHu" architecture is its third generation. Official documentation can be found at: [documentation](https://xiangshan-doc.readthedocs.io/zh-cn/latest/).

Currently, the KunMingHu core supports "RV64IMAFDCV_zba_zbb_zbc_zbs_zbkb_zbkc_zbkx_zknd_zkne_zknh_zksed_zksh_svinval_zicbom_zicboz_zicsr_zifencei". The scheduling model covers the basic configuration and instruction latencies of the KunMingHu core. Other components will be submitted in subsequent patches.

Co-authored-by:
Chen Jian <chenjian@bosc.ac.cn>
Lv Fang <lvfang@bosc.ac.cn>

Patch is 319.05 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/90392.diff

8 Files Affected:

(modified) clang/test/Driver/riscv-cpus.c (+37)
(modified) clang/test/Misc/target-invalid-cpu-note.c (+2-2)
(modified) llvm/lib/Target/RISCV/RISCV.td (+1)
(modified) llvm/lib/Target/RISCV/RISCVProcessors.td (+28)
(added) llvm/lib/Target/RISCV/RISCVSchedXiangShanKunMingHu.td (+1489)
(added) llvm/test/tools/llvm-mca/RISCV/XiangShan/gpr-bypass-kmh.s (+534)
(added) llvm/test/tools/llvm-mca/RISCV/XiangShan/no-sew-fp-8-16.s (+10)
(added) llvm/test/tools/llvm-mca/RISCV/XiangShan/vector-integer-arithmetic.s (+2271)

diff --git a/clang/test/Driver/riscv-cpus.c b/clang/test/Driver/riscv-cpus.c
index ff2bd6f7c8ba34..54c44a35c3e82e 100644
--- a/clang/test/Driver/riscv-cpus.c
+++ b/clang/test/Driver/riscv-cpus.c
@@ -31,6 +31,40 @@
 // MCPU-XIANGSHAN-NANHU-SAME: "-target-feature" "+zks" "-target-feature" "+zksed" "-target-feature" "+zksh" "-target-feature" "+svinval"
 // MCPU-XIANGSHAN-NANHU-SAME: "-target-abi" "lp64d"
 
+// RUN: %clang --target=riscv64 -### -c %s 2>&1 -mcpu=xiangshan-kunminghu | FileCheck -check-prefix=MCPU-XIANGSHAN-KUNMINGHU %s
+// MCPU-XIANGSHAN-KUNMINGHU: "-nostdsysteminc" "-target-cpu" "xiangshan-kunminghu"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+m"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+a"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+f"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+d"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+c"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+v"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zicbom" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zicboz" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zicsr" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zifencei"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zba" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zbb" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zbc"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zbkb" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zbkc" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zbkx" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zbs"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zkn" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zknd" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zkne" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zknh"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zve32f"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zve32x"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zve64d"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zve64f"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zve64x"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zvl128b"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zvl32b"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zvl64b"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-abi" "lp64d"
+
+
 // We cannot check much for -mcpu=native, but it should be replaced by a valid CPU string.
 // RUN: %clang --target=riscv64 -### -c %s -mcpu=native 2> %t.err || true
 // RUN: FileCheck --input-file=%t.err -check-prefix=MCPU-NATIVE %s
@@ -76,6 +110,9 @@
 // RUN: %clang --target=riscv64 -### -c %s 2>&1 -mtune=xiangshan-nanhu | FileCheck -check-prefix=MTUNE-XIANGSHAN-NANHU %s
 // MTUNE-XIANGSHAN-NANHU: "-tune-cpu" "xiangshan-nanhu"
 
+// RUN: %clang --target=riscv64 -### -c %s 2>&1 -mtune=xiangshan-kunminghu | FileCheck -check-prefix=MTUNE-XIANGSHAN-KUNMINGHU %s
+// MTUNE-XIANGSHAN-KUNMINGHU: "-tune-cpu" "xiangshan-kunminghu"
+
 // Check mtune alias CPU has resolved to the right CPU according XLEN.
 // RUN: %clang --target=riscv32 -### -c %s 2>&1 -mtune=generic | FileCheck -check-prefix=MTUNE-GENERIC-32 %s
 // MTUNE-GENERIC-32: "-tune-cpu" "generic"
diff --git a/clang/test/Misc/target-invalid-cpu-note.c b/clang/test/Misc/target-invalid-cpu-note.c
index 21d80b7134508f..a95170aa01abd2 100644
--- a/clang/test/Misc/target-invalid-cpu-note.c
+++ b/clang/test/Misc/target-invalid-cpu-note.c
@@ -85,7 +85,7 @@
 
 // RUN: not %clang_cc1 -triple riscv64 -target-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix RISCV64
 // RISCV64: error: unknown target CPU 'not-a-cpu'
-// RISCV64-NEXT: note: valid target CPU values are: generic-rv64, rocket-rv64, sifive-p450, sifive-p670, sifive-s21, sifive-s51, sifive-s54, sifive-s76, sifive-u54, sifive-u74, sifive-x280, veyron-v1, xiangshan-nanhu{{$}}
+// RISCV64-NEXT: note: valid target CPU values are: generic-rv64, rocket-rv64, sifive-p450, sifive-p670, sifive-s21, sifive-s51, sifive-s54, sifive-s76, sifive-u54, sifive-u74, sifive-x280, veyron-v1, xiangshan-kunminghu, xiangshan-nanhu{{$}}
 
 // RUN: not %clang_cc1 -triple riscv32 -tune-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix TUNE-RISCV32
 // TUNE-RISCV32: error: unknown target CPU 'not-a-cpu'
@@ -93,4 +93,4 @@
 
 // RUN: not %clang_cc1 -triple riscv64 -tune-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix TUNE-RISCV64
 // TUNE-RISCV64: error: unknown target CPU 'not-a-cpu'
-// TUNE-RISCV64-NEXT: note: valid target CPU values are: generic-rv64, rocket-rv64, sifive-p450, sifive-p670, sifive-s21, sifive-s51, sifive-s54, sifive-s76, sifive-u54, sifive-u74, sifive-x280, veyron-v1, xiangshan-nanhu, generic, rocket, sifive-7-series{{$}}
+// TUNE-RISCV64-NEXT: note: valid target CPU values are: generic-rv64, rocket-rv64, sifive-p450, sifive-p670, sifive-s21, sifive-s51, sifive-s54, sifive-s76, sifive-u54, sifive-u74, sifive-x280, veyron-v1, xiangshan-kunminghu, xiangshan-nanhu, generic, rocket, sifive-7-series{{$}}
diff --git a/llvm/lib/Target/RISCV/RISCV.td b/llvm/lib/Target/RISCV/RISCV.td
index 09f496574d64ae..b03a39a3d17502 100644
--- a/llvm/lib/Target/RISCV/RISCV.td
+++ b/llvm/lib/Target/RISCV/RISCV.td
@@ -52,6 +52,7 @@ include "RISCVSchedSiFiveP400.td"
 include "RISCVSchedSiFiveP600.td"
 include "RISCVSchedSyntacoreSCR1.td"
 include "RISCVSchedXiangShanNanHu.td"
+include "RISCVSchedXiangShanKunMingHu.td"
 
 //===----------------------------------------------------------------------===//
 // RISC-V processors supported.
diff --git a/llvm/lib/Target/RISCV/RISCVProcessors.td b/llvm/lib/Target/RISCV/RISCVProcessors.td
index a4a5d9e96c271a..6ede6fc21084e4 100644
--- a/llvm/lib/Target/RISCV/RISCVProcessors.td
+++ b/llvm/lib/Target/RISCV/RISCVProcessors.td
@@ -378,3 +378,31 @@ def XIANGSHAN_NANHU : RISCVProcessorModel<"xiangshan-nanhu",
                                             TuneZExtHFusion,
                                             TuneZExtWFusion,
                                             TuneShiftedZExtWFusion]>;
+                                                                                        
+def XIANGSHAN_KUNMINGHU : RISCVProcessorModel<"xiangshan-kunminghu",
+                                              XiangShanKunMingHuModel,
+                                              [Feature64Bit,
+                                               FeatureStdExtI,
+                                               FeatureStdExtZicsr,
+                                               FeatureStdExtZifencei,
+                                               FeatureStdExtM,
+                                               FeatureStdExtA,
+                                               FeatureStdExtF,
+                                               FeatureStdExtD,
+                                               FeatureStdExtC,
+                                               FeatureStdExtZba,
+                                               FeatureStdExtZbb,
+                                               FeatureStdExtZbc,
+                                               FeatureStdExtZbs,
+                                               FeatureStdExtZkn,
+                                               FeatureStdExtZksed,
+                                               FeatureStdExtZksh,
+                                               FeatureStdExtSvinval,
+                                               FeatureStdExtZicbom,
+                                               FeatureStdExtZicboz,
+                                               FeatureStdExtV,
+                                               FeatureStdExtZvl128b],
+                                               [TuneNoDefaultUnroll,
+                                                TuneZExtHFusion,
+                                                TuneZExtWFusion,
+                                                TuneShiftedZExtWFusion]>;
diff --git a/llvm/lib/Target/RISCV/RISCVSchedXiangShanKunMingHu.td b/llvm/lib/Target/RISCV/RISCVSchedXiangShanKunMingHu.td
new file mode 100644
index 00000000000000..e8460b8bfb05a3
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVSchedXiangShanKunMingHu.td
@@ -0,0 +1,1489 @@
+//==- RISCVSchedXiangShanKunMingHu.td - XiangShanKunMingHu Scheduling Defs -*- tablegen -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// The XiangShan is a high-performance open-source RISC-V processor project 
+// initiated by the Institute of Computing Technology(ICT), Chinese Academy of Sciences(CAS). 
+// The KunMingHu architecture is its third-generation derivative, 
+// developed by the Institute of Computing Technology, Chinese Academy of Sciences  
+// and the Beijing Institute of Open Source Chip (BOSC), 
+// with a focus on achieving higher performance.
+// Source: https://github.com/OpenXiangShan/XiangShan
+// Documentation: https://github.com/OpenXiangShan/XiangShan-doc
+
+//===----------------------------------------------------------------------===//
+// KunMingHu core supports "RV64IMAFDCV_zba_zbb_zbc_zbs_zbkb_zbkc_zbkx_zknd_zkne_zknh
+// _zksed_zksh_svinval_zicbom_zicboz_zicsr_zifencei"
+// then floating-point SEW can only be 64 and 32, not 16 and 8.
+class NoZvfhSchedSEWSet_rm8and16<string mx, bit isF = 0, bit isWidening = 0> {
+  defvar t = SchedSEWSet<mx, isF, isWidening>.val; 
+  defvar remove8and16 = !if(isF, !listremove(t, [8, 16]), t);
+  list<int> val = remove8and16;
+}
+
+class NoZvfhSmallestSEW<string mx, bit isF = 0, bit isWidening = 0> {
+  int r = !head(NoZvfhSchedSEWSet_rm8and16<mx, isF, isWidening>.val);
+}
+
+multiclass NoZvfh_LMULSEWReadAdvanceImpl<string name, int val, list<SchedWrite> writes = [],
+                                  list<string> MxList, bit isF = 0,
+                                  bit isWidening = 0> {
+  if !exists<SchedRead>(name # "_WorstCase") then
+    def : ReadAdvance<!cast<SchedRead>(name # "_WorstCase"), val, writes>;
+  foreach mx = MxList in {
+    foreach sew = NoZvfhSchedSEWSet_rm8and16<mx, isF, isWidening>.val in
+      if !exists<SchedRead>(name # "_" # mx # "_E" # sew) then
+        def : ReadAdvance<!cast<SchedRead>(name # "_" # mx # "_E" # sew), val, writes>;
+  }
+}
+
+multiclass LMULSEWReadAdvanceFnoZvfh<string name, int val, list<SchedWrite> writes = []>
+  : NoZvfh_LMULSEWReadAdvanceImpl<name, val, writes, SchedMxListF, isF=1,
+                           isWidening=0>;
+
+multiclass LMULSEWReadAdvanceFWnoZvfh<string name, int val, list<SchedWrite> writes = []>
+    : NoZvfh_LMULSEWReadAdvanceImpl<name, val, writes, SchedMxListFW, isF = 1,
+                             isWidening=1>;
+
+//===----------------------------------------------------------------------===//
+// If Zvfhmin and Zvfh are not supported, floating-point SEW can only be 32 or 64.
+class NoZvfhSchedSEWSet_rm32and64<string mx, bit isF = 0, bit isWidening = 0> {
+  defvar t = SchedSEWSet<mx, isF, isWidening>.val;
+  defvar remove32and64 = !if(isF, !listremove(t, [32, 64]), t);
+  list<int> val = remove32and64;
+}
+
+// Write-Impl
+multiclass NoZvfhLMULSEWWriteResImpl<string name, list<ProcResourceKind> resources,
+                               list<string> MxList, bit isF = 0,
+                               bit isWidening = 0> {
+  foreach mx = MxList in {
+    foreach sew = NoZvfhSchedSEWSet_rm32and64<mx, isF, isWidening>.val in
+      if !exists<SchedWrite>(name # "_" # mx # "_E" # sew) then
+        def : WriteRes<!cast<SchedWrite>(name # "_" # mx # "_E" # sew), resources>;
+  }
+}
+// Read-Impl
+multiclass NoZvfhLMULSEWReadAdvanceImpl<string name, int val, list<SchedWrite> writes = [],
+                                  list<string> MxList, bit isF = 0,
+                                  bit isWidening = 0> {
+  foreach mx = MxList in {
+    foreach sew = NoZvfhSchedSEWSet_rm32and64<mx, isF, isWidening>.val in
+      if !exists<SchedRead>(name # "_" # mx # "_E" # sew) then
+        def : ReadAdvance<!cast<SchedRead>(name # "_" # mx # "_E" # sew), val, writes>;
+  }
+}
+
+// Write
+multiclass NoZvfhLMULSEWWriteResF<string name, list<ProcResourceKind> resources>
+    : NoZvfhLMULSEWWriteResImpl<name, resources, SchedMxListF, isF=1>;
+
+multiclass NoZvfhLMULSEWWriteResFW<string name, list<ProcResourceKind> resources>
+    : NoZvfhLMULSEWWriteResImpl<name, resources, SchedMxListFW, isF=1, isWidening=1>;
+
+multiclass NoZvfhLMULSEWWriteResFWRed<string name, list<ProcResourceKind> resources>
+    : NoZvfhLMULSEWWriteResImpl<name, resources, SchedMxListFWRed, isF=1, isWidening=1>;
+
+// Read
+multiclass NoZvfhLMULSEWReadAdvanceF<string name, int val, list<SchedWrite> writes = []>
+  : NoZvfhLMULSEWReadAdvanceImpl<name, val, writes, SchedMxListF, isF=1>;
+multiclass
+    NoZvfhLMULSEWReadAdvanceFW<string name, int val, list<SchedWrite> writes = []>
+    : NoZvfhLMULSEWReadAdvanceImpl<name, val, writes, SchedMxListFW, isF=1,
+                             isWidening = 1>;
+
+multiclass UnsupportedSchedZvfh {
+let Unsupported = true in {
+// Write 
+// 13. Vector Floating-Point Instructions
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFALUV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFALUF", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFWALUV", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFWALUF", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFMulV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFMulF", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFDivV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFDivF", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFWMulV", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFWMulF", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFMulAddV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFMulAddF", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFWMulAddV", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFWMulAddF", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFSqrtV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFRecpV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFMinMaxV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFMinMaxF", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFSgnjV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFSgnjF", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFCvtIToFV", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFWCvtFToFV", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFNCvtIToFV", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFNCvtFToFV", []>;
+
+// 14. Vector Reduction Operations
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFRedV_From", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFRedOV_From", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFRedMinMaxV_From", []>;
+defm "" : NoZvfhLMULSEWWriteResFWRed<"WriteVFWRedV_From", []>;
+defm "" : NoZvfhLMULSEWWriteResFWRed<"WriteVFWRedOV_From", []>;
+
+// Read
+// 13. Vector Floating-Point Instructions
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFALUV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFALUF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFWALUV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFWALUF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFMulV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFMulF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFDivV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFDivF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFWMulV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFWMulF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFMulAddV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFMulAddF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFWMulAddV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFWMulAddF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFSqrtV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFRecpV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFMinMaxV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFMinMaxF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFSgnjV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFSgnjF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFCvtIToFV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFWCvtFToFV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFNCvtIToFV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFNCvtFToFV", 0>;
+
+} // Unsupported
+} // UnsupportedSchedZvfh
+
+//===----------------------------------------------------------------------===//
+
+class XSGetCyclesVIALU<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 2,
+    !eq(mx, "M2") : 4,
+    !eq(mx, "M4") : 8,
+    !eq(mx, "M8") : 16,
+    !eq(mx, "MF2") : 2,
+    !eq(mx, "MF4") : 2,
+    !eq(mx, "MF8") : 2
+  );
+}
+
+class XSGetCyclesVIMAC<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 3,
+    !eq(mx, "M2") : 6,
+    !eq(mx, "M4") : 12,
+    !eq(mx, "M8") : 24,
+    !eq(mx, "MF2") : 3,
+    !eq(mx, "MF4") : 3,
+    !eq(mx, "MF8") : 3
+  );
+}
+
+class XSGetCyclesVIDIV<string mx, int sew> {
+  int uop = !cond(
+    !eq(mx, "M1") : 1,
+    !eq(mx, "M2") : 2,
+    !eq(mx, "M4") : 4,
+    !eq(mx, "M8") : 8,
+    !eq(mx, "MF2") : 1,
+    !eq(mx, "MF4") : 1,
+    !eq(mx, "MF8") : 1
+  );
+  int cycles = !cond(
+    !eq(sew, 64) : 19,   // I64: 4-19
+    !eq(sew, 32) : 11,   // I32: 4-11
+    !eq(sew, 16) : 7,    // I16: 4-7
+    !eq(sew, 8) : 6      // I8: 6
+  );
+  int c = !mul(uop, cycles);
+}
+
+class XSGetCyclesVIPU<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 2,
+    !eq(mx, "M2") : 4,
+    !eq(mx, "M4") : 8,
+    !eq(mx, "M8") : 16,
+    !eq(mx, "MF2") : 2,
+    !eq(mx, "MF4") : 2,
+    !eq(mx, "MF8") : 2
+  );    
+}
+
+class XSGetCyclesVPPU<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 2,
+    !eq(mx, "M2") : 4,
+    !eq(mx, "M4") : 8,
+    !eq(mx, "M8") : 16,
+    !eq(mx, "MF2") : 2,
+    !eq(mx, "MF4") : 2,
+    !eq(mx, "MF8") : 2
+  );    
+}
+
+class XSGetCyclesVFALU<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 2,
+    !eq(mx, "M2") : 4,
+    !eq(mx, "M4") : 8,
+    !eq(mx, "M8") : 16,
+    !eq(mx, "MF2") : 2,
+    !eq(mx, "MF4") : 2,
+    !eq(mx, "MF8") : 2
+  );    
+}
+
+class XSGetCyclesVFMA<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 4,
+    !eq(mx, "M2") : 8,
+    !eq(mx, "M4") : 16,
+    !eq(mx, "M8") : 32,
+    !eq(mx, "MF2") : 4,
+    !eq(mx, "MF4") : 4,
+    !eq(mx, "MF8") : 4
+  );    
+}
+
+class XSGetCyclesVFDIV<string mx, int sew> {
+  assert !or(!eq(sew, 32), !eq(sew, 64)), "Floating-point SEW of KunMingHu can only be 32 or 64.";
+  int uop = !cond(
+    !eq(mx, "M1") : 1,
+    !eq(mx, "M2") : 2,
+    !eq(mx, "M4") : 4,
+    !eq(mx, "M8") : 8,
+    !eq(mx, "MF2") : 1,
+    !eq(mx, "MF4") : 1,
+    !eq(mx, "MF8") : 1
+  );
+  int cycles = !cond(
+    !eq(sew, 64) : 15,   // FP64: 15
+    !eq(sew, 32) : 10,   // FP32: 10
+  );
+  int c = !mul(uop, cycles);
+}
+
+class XSGetCyclesVFCVT<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 3,
+    !eq(mx, "M2") : 6,
+    !eq(mx, "M4") : 12,
+    !eq(mx, "M8") : 24,
+    !eq(mx, "MF2") : 3,
+    !eq(mx, "MF4") : 3,
+    !eq(mx, "MF8") : 3
+  );    
+}
+
+class XSGetCyclesVLDU<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 8,
+    !eq(mx, "M2") : 16,
+    !eq(mx, "M4") : 32,
+    !eq(mx, "M8") : 64,
+    !eq(mx, "MF2") : 8,
+    !eq(mx, "MF4") : 8,
+    !eq(mx, "MF8") : 8
+  );
+}
+
+class XSGetCyclesVSTU<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 7,
+    !eq(mx, "M2") : 14,
+    !eq(mx, "M4") : 28,
+    !eq(mx, "M8") : 56,
+    !eq(mx, "MF2") : 7,
+    !eq(mx, "MF4") : 7,
+    !eq(mx, "MF8") : 7
+  );
+}
+
+// If mx is the maximum LMUL in the MxList, then c is true, indicating the worst case.
+class XSIsWorstCaseMX<string mx, list<string> MxList> {
+  defvar LLMUL = LargestLMUL<MxList>.r;
+  bit c = !eq(mx, LLMUL);
+}
+
+// If mx is the maximum LMUL in the MxList, and sew is the minimum value 
+// when LMUL=mx, then c is true, indicating the worst case.
+class XSIsWorstCaseMXSEW<string mx, int sew, list<string> MxList,
+                               bit isF = 0> {
+  defvar LLMUL = LargestLMUL<MxList>.r;
+  defvar SSEW = NoZvfhSmallestSEW<mx, isF>.r;
+  bit c = !and(!eq(mx, LLMUL), !eq(sew, SSEW));
+}
+
+class XSLDUtoAnyBypass<SchedRead read, int cycles = 2>
+    : ReadAdvance<re...
[truncated]
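The `XSGetCycles*` helpers in the diff above share one pattern: a per-LMUL micro-op count (1 for M1 and every fractional LMUL, then 2, 4, 8) multiplied by a per-unit base cycle count, or by a per-SEW worst-case cycle count for the dividers. A minimal Python sketch of that pattern follows. The tables are transcribed by hand from the diff, the function names are hypothetical, and the sketch says nothing about whether these numbers match the hardware.

```python
# Sketch of the XSGetCycles* tables: cycles = uops(LMUL) * base cycles.
# Values are copied from the diff; function names are hypothetical.

UOPS_PER_LMUL = {"MF8": 1, "MF4": 1, "MF2": 1, "M1": 1, "M2": 2, "M4": 4, "M8": 8}

# Base cycles at LMUL <= 1 for units whose cost does not depend on SEW.
BASE_CYCLES = {
    "VIALU": 2, "VIMAC": 3, "VIPU": 2, "VPPU": 2,
    "VFALU": 2, "VFMA": 4, "VFCVT": 3, "VLDU": 8, "VSTU": 7,
}

# Dividers: the diff uses the upper bound of each documented range.
VIDIV_CYCLES = {8: 6, 16: 7, 32: 11, 64: 19}
VFDIV_CYCLES = {32: 10, 64: 15}


def unit_cycles(unit, mx):
    """Mirror XSGetCycles<unit><mx>.c for the SEW-independent units."""
    return BASE_CYCLES[unit] * UOPS_PER_LMUL[mx]


def vidiv_cycles(mx, sew):
    """Mirror XSGetCyclesVIDIV<mx, sew>.c."""
    return UOPS_PER_LMUL[mx] * VIDIV_CYCLES[sew]


def vfdiv_cycles(mx, sew):
    """Mirror XSGetCyclesVFDIV<mx, sew>.c, including its SEW assertion."""
    if sew not in VFDIV_CYCLES:
        raise ValueError("Floating-point SEW of KunMingHu can only be 32 or 64.")
    return UOPS_PER_LMUL[mx] * VFDIV_CYCLES[sew]


for mx in ("MF2", "M1", "M2", "M4", "M8"):
    print(mx, unit_cycles("VIALU", mx), vidiv_cycles(mx, 32), vfdiv_cycles(mx, 64))
```

Writing the tables this way makes the scaling assumption explicit: every unit is modeled as fully serialized across the LMUL micro-ops, which is the assumption a reviewer would need to confirm against the pipeline.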

dtcxzyw · 2024-04-28T10:37:43Z

llvm/lib/Target/RISCV/RISCVProcessors.td

+                                               FeatureStdExtZicboz,
+                                               FeatureStdExtV,
+                                               FeatureStdExtZvl128b],
+                                               [TuneNoDefaultUnroll,


See #89359 (comment)

dtcxzyw · 2024-04-28T10:46:41Z

llvm/lib/Target/RISCV/RISCVSchedXiangShanKunMingHu.td

+  let LoopMicroOpBufferSize = 48;  // Instruction queue size
+  let LoadLatency = 6;
+  let MispredictPenalty = 13; // Based on estimate of pipeline depth.
+  let PostRAScheduler = 1;


Can you share some performance data about this? IIRC it hurts the performance on Xiangshan-Nanhu.

dtcxzyw · 2024-04-28T10:56:13Z

llvm/lib/Target/RISCV/RISCVSchedXiangShanKunMingHu.td

+  let UnsupportedFeatures = [HasStdExtZcmt, HasStdExtZkr];
+}
+
+let SchedModel = XiangShanKunMingHuModel in {


Can you tell me where the documentation for XiangShan-KunMingHu is? The documentation is out of sync with your scheduling model.

dtcxzyw · 2024-04-28T11:05:05Z

llvm/lib/Target/RISCV/RISCVSchedXiangShanKunMingHu.td

+defm : UnsupportedSchedXsfvcp;
+defm : UnsupportedSchedZvfh;
+
+// Move Elimination


Please add an MCA test for this behavior. IIRC LLVM doesn't support RISC-V move idioms without changes.

See 59f6e22.

dtcxzyw · 2024-04-28T11:08:40Z

llvm/lib/Target/RISCV/RISCVSchedXiangShanKunMingHu.td

+def : ReadAdvance<ReadFMul32, 0>;
+def : ReadAdvance<ReadFMul64, 0>;
+def : ReadAdvance<ReadFMA32, 0>;
+def : ReadAdvance<ReadFMA32Addend, 0>;


Is cascade FMA deprecated on Kunminghu?
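For context on what the `ReadAdvance<..., 0>` entries above imply: in an LLVM scheduling model, the latency a consumer sees on an operand is the producer's write latency minus the `ReadAdvance` of the consuming read, clamped at zero. A cascade FMA would normally show up as a positive advance on the addend read. The Python sketch below illustrates that arithmetic only; the latency of 5 and the advance of 2 are hypothetical numbers chosen for illustration, not values taken from this patch or from the KunMingHu hardware.

```python
# Sketch of how ReadAdvance changes the latency seen by a dependent instruction.
# The numeric values used below are hypothetical, for illustration only.

def operand_latency(write_latency, read_advance=0):
    """Latency observed on one operand: write latency minus read advance.

    A positive advance larger than the write latency clamps to zero;
    a negative advance lengthens the observed latency.
    """
    if read_advance > 0 and read_advance > write_latency:
        return 0
    return write_latency - read_advance


def chain_latency(length, write_latency, addend_advance):
    """Total latency of `length` FMAs chained through the addend operand."""
    return length * operand_latency(write_latency, addend_advance)


FMA_LATENCY = 5          # hypothetical
CASCADE_ADVANCE = 2      # hypothetical advance on the addend read

print("no cascade :", chain_latency(4, FMA_LATENCY, 0))
print("cascade    :", chain_latency(4, FMA_LATENCY, CASCADE_ADVANCE))
```

With an advance of 0, as in the quoted lines, an accumulation chain pays the full FMA latency on every link, which is why the question about cascade FMA matters for the model.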

camel-cdr · 2024-04-29T10:03:16Z

The execution unit layout does not match the current XiangShan master. Has the final execution unit layout been decided on?

Earlier layout, which this PR seems to be based on:

VFEX0: VfaluCfg, VfmaCfg, VialuCfg, VimacCfg
VFEX1: VipuCfg, VppuCfg, VfcvtCfg, F2vCfg, F2fCfg, F2iCfg, VSetRvfWvfCfg
VFEX2: VfaluCfg, VfmaCfg, VialuCfg
VFEX3: VfdivCfg, VidivCfg

Current master layout:

VFEX0: VfmaCfg, VialuCfg, VimacCfg, VppuCfg
VFEX1: VfaluCfg, VfcvtCfg, VipuCfg, VSetRvfWvfCfg
VFEX2: VfmaCfg, VialuCfg, F2vCfg
VFEX3: VfaluCfg, VfcvtCfg
VFEX4: VfdivCfg, VidivCfg
VFEX5: VfdivCfg, VidivCfg

There is even an open branch that separates the VPU and FPU pipelines: https://github.com/OpenXiangShan/XiangShan/blob/fp-split/src/main/scala/xiangshan/Parameters.scala#L365

I'd love to have proper scheduling support for KunMingHu, but it currently doesn't look like a stable target.

BTW: Having two div, but only a single vppu seems like a bit of an odd choice. XuanTie C920 has two permutation execution units, and so do the more comparable ARM Neoverse N2 and AMD Zen1.
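The difference between the two layouts listed in this comment can be tallied mechanically. The following Python sketch copies both port lists from the comment and counts how many ports host each functional unit; the names `EARLIER`, `CURRENT_MASTER`, `copies`, and `layout_diff` are hypothetical, and the sketch makes no claim about which layout is final.

```python
# Sketch: count functional-unit copies per layout and report what changed.
# Port lists are copied from the comment above; helper names are hypothetical.
from collections import Counter

EARLIER = {
    "VFEX0": ["VfaluCfg", "VfmaCfg", "VialuCfg", "VimacCfg"],
    "VFEX1": ["VipuCfg", "VppuCfg", "VfcvtCfg", "F2vCfg", "F2fCfg", "F2iCfg",
              "VSetRvfWvfCfg"],
    "VFEX2": ["VfaluCfg", "VfmaCfg", "VialuCfg"],
    "VFEX3": ["VfdivCfg", "VidivCfg"],
}

CURRENT_MASTER = {
    "VFEX0": ["VfmaCfg", "VialuCfg", "VimacCfg", "VppuCfg"],
    "VFEX1": ["VfaluCfg", "VfcvtCfg", "VipuCfg", "VSetRvfWvfCfg"],
    "VFEX2": ["VfmaCfg", "VialuCfg", "F2vCfg"],
    "VFEX3": ["VfaluCfg", "VfcvtCfg"],
    "VFEX4": ["VfdivCfg", "VidivCfg"],
    "VFEX5": ["VfdivCfg", "VidivCfg"],
}


def copies(layout):
    """Number of issue ports that host each functional unit."""
    return Counter(fu for units in layout.values() for fu in units)


def layout_diff(old, new):
    """Map each unit whose copy count changed to (old count, new count)."""
    a, b = copies(old), copies(new)
    return {fu: (a[fu], b[fu]) for fu in sorted(set(a) | set(b)) if a[fu] != b[fu]}


for fu, (before, after) in layout_diff(EARLIER, CURRENT_MASTER).items():
    print(f"{fu}: {before} -> {after}")
print("VppuCfg copies on master:", copies(CURRENT_MASTER)["VppuCfg"])
```

The tally reflects the remark above: the master layout doubles the divide and convert units while the permutation unit stays at a single copy, so a scheduling model built on the earlier port list would assign the wrong resources to several instruction classes.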

wangpc-pp · 2024-04-29T10:51:05Z

I'd like to see the support of KunMingHu, but please hold this PR and wait for the finalization of KunMingHu's architecture.

camel-cdr · 2024-04-29T14:31:23Z

llvm/lib/Target/RISCV/RISCVSchedXiangShanKunMingHu.td

+// 15. Vector Mask Instructions
+// VIALU
+foreach mx = SchedMxList in {
+  defvar Cycles = XSGetCyclesVIALU<mx>.c;


Are you sure this is correct? Since masks always fit into a single LMUL=1 vector register, you'd expect that an LMUL=8 SEW=8 vmand.mm would have the same latency as an LMUL=1 vand.vv. Or does XiangShan use a different internal format for mask registers? See how the SiFiveP600 scheduler sets the latency of all mask instructions to 1.
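The claim that a mask always fits in one register follows from the RVV definitions: a mask holds one bit per element, and VLMAX = LMUL × VLEN / SEW is at most VLEN (reached at LMUL=8, SEW=8). The Python sketch below checks this for every LMUL/SEW pair and contrasts it with the LMUL-scaled `XSGetCyclesVIALU` values from the diff. It assumes VLEN=128, the minimum implied by `FeatureStdExtZvl128b`; the helper name `vlmax` is hypothetical.

```python
# Sketch: a mask needs VLMAX bits, which never exceeds one VLEN-bit register.
# VLEN=128 is an assumption taken from Zvl128b; `vlmax` is a hypothetical helper.
from fractions import Fraction

VLEN = 128

LMULS = {
    "MF8": Fraction(1, 8), "MF4": Fraction(1, 4), "MF2": Fraction(1, 2),
    "M1": Fraction(1), "M2": Fraction(2), "M4": Fraction(4), "M8": Fraction(8),
}
SEWS = (8, 16, 32, 64)

# Cycles the patch assigns to VIALU-class operations, copied from the diff.
XS_VIALU_CYCLES = {"MF8": 2, "MF4": 2, "MF2": 2, "M1": 2, "M2": 4, "M4": 8, "M8": 16}


def vlmax(lmul, sew, vlen=VLEN):
    """Elements per register group: floor(LMUL * VLEN / SEW)."""
    return lmul * vlen // sew


# One mask bit per element, so the mask width in bits equals VLMAX.
mask_bits = {(mx, sew): vlmax(lmul, sew) for mx, lmul in LMULS.items() for sew in SEWS}
widest = max(mask_bits, key=mask_bits.get)

print("widest mask:", widest, mask_bits[widest], "bits; VLEN =", VLEN)
print("patch cycles at M8 vs M1:", XS_VIALU_CYCLES["M8"], "vs", XS_VIALU_CYCLES["M1"])
```

Because the widest mask is exactly VLEN bits, a mask-to-mask operation touches one register regardless of LMUL, which is the basis for expecting a constant latency unless the hardware stores masks in some other format.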

Bhe6669 and others added 2 commits April 28, 2024 18:15

llvmbot added the clang, backend:RISC-V, and clang:driver labels Apr 28, 2024

dtcxzyw reviewed Apr 28, 2024

camel-cdr reviewed Apr 29, 2024
