
[RISCV] Add processor definition and scheduling model for XiangShan-KunMingHu #90392

Open: wants to merge 2 commits into main

Bhe6669 commented Apr 28, 2024

"XiangShan" is a high-performance open-source RISC-V processor project, and the "KunMingHu" architecture is its third generation. Official documentation can be found at: documentation (https://xiangshan-doc.readthedocs.io/zh-cn/latest/).

Currently, the KunMingHu core supports "RV64IMAFDCV_zba_zbb_zbc_zbs_zbkb_zbkc_zbkx_zknd_zkne_zknh_zksed_zksh_svinval_zicbom_zicboz_zicsr_zifencei". The scheduling model covers the basic configuration and instruction latencies of the KunMingHu core; other components will be submitted in subsequent patches.

Co-authored-by:
Chen Jian <chenjian@bosc.ac.cn>
Lv Fang <lvfang@bosc.ac.cn>
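The ISA string above follows the usual RISC-V naming scheme: `RV` plus XLEN, then single-letter extensions, then underscore-separated multi-letter extensions. As an illustration only, here is a minimal Python sketch that splits the string so it can be compared against the processor definition and the `-target-feature` list in the driver test. The helper `split_isa_string` is hypothetical and is not part of LLVM or XiangShan. It ignores version suffixes and the `G` shorthand.

```python
def split_isa_string(isa: str) -> tuple[int, list[str], list[str]]:
    """Split a RISC-V ISA string such as 'RV64IMAFDCV_zba_zbb' into
    (XLEN, single-letter extensions, multi-letter extensions).

    Simplified sketch: version suffixes (e.g. 'm2p0') and the 'G'
    shorthand are not handled.
    """
    isa = isa.lower()
    if not isa.startswith("rv"):
        raise ValueError(f"not a RISC-V ISA string: {isa!r}")
    # Everything before the first '_' is 'rv<xlen><single letters>'.
    head, *multi = isa.split("_")
    digits = ""
    for ch in head[2:]:
        if ch.isdigit():
            digits += ch
        else:
            break
    xlen = int(digits)
    single = list(head[2 + len(digits):])
    return xlen, single, multi


KUNMINGHU_ISA = ("RV64IMAFDCV_zba_zbb_zbc_zbs_zbkb_zbkc_zbkx_zknd_zkne_zknh"
                 "_zksed_zksh_svinval_zicbom_zicboz_zicsr_zifencei")

if __name__ == "__main__":
    xlen, single, multi = split_isa_string(KUNMINGHU_ISA)
    print(xlen, single)
    print(multi)
```

Implied extensions are not spelled out in the string. Examples include `zkn`, which the processor definition enables through `FeatureStdExtZkn`, and the `zve*`/`zvl*` features implied by `v` and `FeatureStdExtZvl128b`. These appear only in the driver test, so a comparison against that test has to expand implications first.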

Bhe6669 and others added 2 commits April 28, 2024 18:15
This pull request adds definitions for the XiangShan-KunMingHu processor. "XiangShan" is a high-performance open-source RISC-V processor project, and the "KunMingHu" architecture is its third generation. Official documentation can be found at: [documentation](https://xiangshan-doc.readthedocs.io/zh-cn/latest/).

Currently, the KunMingHu core supports "RV64IMAFDCV_zba_zbb_zbc_zbs_zbkb_zbkc_zbkx_zknd_zkne_zknh_zksed_zksh_svinval_zicbom_zicboz_zicsr_zifencei". The scheduling model and other components will be submitted in subsequent patches.

Co-authored-by:
Chen Jian <chenjian@bosc.ac.cn>
Lv Fang <lvfang@bosc.ac.cn>

Co-Authored-By: Khao7342 <167075369+Khao7342@users.noreply.github.com>
Co-Authored-By: huxuan0307 <39661208+huxuan0307@users.noreply.github.com>
Co-Authored-By: Ziyue-Zhang <46214232+Ziyue-Zhang@users.noreply.github.com>
Co-Authored-By: Lin Wang <38717023+MrLinWang@users.noreply.github.com>
Co-Authored-By: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-Authored-By: bdne159 <168184120+bdne159@users.noreply.github.com>
Co-Authored-By: Zhuke-bosc <168183309+Zhuke-bosc@users.noreply.github.com>
Co-Authored-By: 雷电霸王龙 <111375214+microft11@users.noreply.github.com>
"XiangShan" is a high-performance open-source RISC-V processor project, and the "KunMingHu" architecture is its third generation. Official documentation can be found at: [documentation](https://xiangshan-doc.readthedocs.io/zh-cn/latest/).

This Pull Request introduces the foundational scheduling model of the KunMingHu architecture. It encompasses the basic configurations and instruction latencies of the KunMingHu core. Other components will be submitted in subsequent patches.

Co-authored-by:
Chen Jian <chenjian@bosc.ac.cn>
Lv Fang <lvfang@bosc.ac.cn>

Co-Authored-By: Khao7342 <167075369+Khao7342@users.noreply.github.com>
Co-Authored-By: huxuan0307 <39661208+huxuan0307@users.noreply.github.com>
Co-Authored-By: Ziyue-Zhang <46214232+Ziyue-Zhang@users.noreply.github.com>
Co-Authored-By: Lin Wang <38717023+MrLinWang@users.noreply.github.com>
Co-Authored-By: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-Authored-By: bdne159 <168184120+bdne159@users.noreply.github.com>
Co-Authored-By: Zhuke-bosc <168183309+Zhuke-bosc@users.noreply.github.com>
Co-Authored-By: 雷电霸王龙 <111375214+microft11@users.noreply.github.com>

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot added the clang, backend:RISC-V, and clang:driver labels Apr 28, 2024

llvmbot (Collaborator) commented Apr 28, 2024

@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: None (Bhe6669)

Changes

Patch is 319.05 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/90392.diff

8 Files Affected:

  • (modified) clang/test/Driver/riscv-cpus.c (+37)
  • (modified) clang/test/Misc/target-invalid-cpu-note.c (+2-2)
  • (modified) llvm/lib/Target/RISCV/RISCV.td (+1)
  • (modified) llvm/lib/Target/RISCV/RISCVProcessors.td (+28)
  • (added) llvm/lib/Target/RISCV/RISCVSchedXiangShanKunMingHu.td (+1489)
  • (added) llvm/test/tools/llvm-mca/RISCV/XiangShan/gpr-bypass-kmh.s (+534)
  • (added) llvm/test/tools/llvm-mca/RISCV/XiangShan/no-sew-fp-8-16.s (+10)
  • (added) llvm/test/tools/llvm-mca/RISCV/XiangShan/vector-integer-arithmetic.s (+2271)
diff --git a/clang/test/Driver/riscv-cpus.c b/clang/test/Driver/riscv-cpus.c
index ff2bd6f7c8ba34..54c44a35c3e82e 100644
--- a/clang/test/Driver/riscv-cpus.c
+++ b/clang/test/Driver/riscv-cpus.c
@@ -31,6 +31,40 @@
 // MCPU-XIANGSHAN-NANHU-SAME: "-target-feature" "+zks" "-target-feature" "+zksed" "-target-feature" "+zksh" "-target-feature" "+svinval"
 // MCPU-XIANGSHAN-NANHU-SAME: "-target-abi" "lp64d"
 
+// RUN: %clang --target=riscv64 -### -c %s 2>&1 -mcpu=xiangshan-kunminghu | FileCheck -check-prefix=MCPU-XIANGSHAN-KUNMINGHU %s
+// MCPU-XIANGSHAN-KUNMINGHU: "-nostdsysteminc" "-target-cpu" "xiangshan-kunminghu"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+m"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+a"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+f"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+d"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+c"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+v"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zicbom" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zicboz" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zicsr" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zifencei"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zba" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zbb" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zbc"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zbkb" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zbkc" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zbkx" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zbs"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zkn" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zknd" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zkne" 
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zknh"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zve32f"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zve32x"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zve64d"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zve64f"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zve64x"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zvl128b"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zvl32b"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-feature" "+zvl64b"
+// MCPU-XIANGSHAN-KUNMINGHU-SAME: "-target-abi" "lp64d"
+
+
 // We cannot check much for -mcpu=native, but it should be replaced by a valid CPU string.
 // RUN: %clang --target=riscv64 -### -c %s -mcpu=native 2> %t.err || true
 // RUN: FileCheck --input-file=%t.err -check-prefix=MCPU-NATIVE %s
@@ -76,6 +110,9 @@
 // RUN: %clang --target=riscv64 -### -c %s 2>&1 -mtune=xiangshan-nanhu | FileCheck -check-prefix=MTUNE-XIANGSHAN-NANHU %s
 // MTUNE-XIANGSHAN-NANHU: "-tune-cpu" "xiangshan-nanhu"
 
+// RUN: %clang --target=riscv64 -### -c %s 2>&1 -mtune=xiangshan-kunminghu | FileCheck -check-prefix=MTUNE-XIANGSHAN-KUNMINGHU %s
+// MTUNE-XIANGSHAN-KUNMINGHU: "-tune-cpu" "xiangshan-kunminghu"
+
 // Check mtune alias CPU has resolved to the right CPU according XLEN.
 // RUN: %clang --target=riscv32 -### -c %s 2>&1 -mtune=generic | FileCheck -check-prefix=MTUNE-GENERIC-32 %s
 // MTUNE-GENERIC-32: "-tune-cpu" "generic"
diff --git a/clang/test/Misc/target-invalid-cpu-note.c b/clang/test/Misc/target-invalid-cpu-note.c
index 21d80b7134508f..a95170aa01abd2 100644
--- a/clang/test/Misc/target-invalid-cpu-note.c
+++ b/clang/test/Misc/target-invalid-cpu-note.c
@@ -85,7 +85,7 @@
 
 // RUN: not %clang_cc1 -triple riscv64 -target-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix RISCV64
 // RISCV64: error: unknown target CPU 'not-a-cpu'
-// RISCV64-NEXT: note: valid target CPU values are: generic-rv64, rocket-rv64, sifive-p450, sifive-p670, sifive-s21, sifive-s51, sifive-s54, sifive-s76, sifive-u54, sifive-u74, sifive-x280, veyron-v1, xiangshan-nanhu{{$}}
+// RISCV64-NEXT: note: valid target CPU values are: generic-rv64, rocket-rv64, sifive-p450, sifive-p670, sifive-s21, sifive-s51, sifive-s54, sifive-s76, sifive-u54, sifive-u74, sifive-x280, veyron-v1, xiangshan-kunminghu, xiangshan-nanhu{{$}}
 
 // RUN: not %clang_cc1 -triple riscv32 -tune-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix TUNE-RISCV32
 // TUNE-RISCV32: error: unknown target CPU 'not-a-cpu'
@@ -93,4 +93,4 @@
 
 // RUN: not %clang_cc1 -triple riscv64 -tune-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix TUNE-RISCV64
 // TUNE-RISCV64: error: unknown target CPU 'not-a-cpu'
-// TUNE-RISCV64-NEXT: note: valid target CPU values are: generic-rv64, rocket-rv64, sifive-p450, sifive-p670, sifive-s21, sifive-s51, sifive-s54, sifive-s76, sifive-u54, sifive-u74, sifive-x280, veyron-v1, xiangshan-nanhu, generic, rocket, sifive-7-series{{$}}
+// TUNE-RISCV64-NEXT: note: valid target CPU values are: generic-rv64, rocket-rv64, sifive-p450, sifive-p670, sifive-s21, sifive-s51, sifive-s54, sifive-s76, sifive-u54, sifive-u74, sifive-x280, veyron-v1, xiangshan-kunminghu, xiangshan-nanhu, generic, rocket, sifive-7-series{{$}}
diff --git a/llvm/lib/Target/RISCV/RISCV.td b/llvm/lib/Target/RISCV/RISCV.td
index 09f496574d64ae..b03a39a3d17502 100644
--- a/llvm/lib/Target/RISCV/RISCV.td
+++ b/llvm/lib/Target/RISCV/RISCV.td
@@ -52,6 +52,7 @@ include "RISCVSchedSiFiveP400.td"
 include "RISCVSchedSiFiveP600.td"
 include "RISCVSchedSyntacoreSCR1.td"
 include "RISCVSchedXiangShanNanHu.td"
+include "RISCVSchedXiangShanKunMingHu.td"
 
 //===----------------------------------------------------------------------===//
 // RISC-V processors supported.
diff --git a/llvm/lib/Target/RISCV/RISCVProcessors.td b/llvm/lib/Target/RISCV/RISCVProcessors.td
index a4a5d9e96c271a..6ede6fc21084e4 100644
--- a/llvm/lib/Target/RISCV/RISCVProcessors.td
+++ b/llvm/lib/Target/RISCV/RISCVProcessors.td
@@ -378,3 +378,31 @@ def XIANGSHAN_NANHU : RISCVProcessorModel<"xiangshan-nanhu",
                                             TuneZExtHFusion,
                                             TuneZExtWFusion,
                                             TuneShiftedZExtWFusion]>;
+                                                                                        
+def XIANGSHAN_KUNMINGHU : RISCVProcessorModel<"xiangshan-kunminghu",
+                                              XiangShanKunMingHuModel,
+                                              [Feature64Bit,
+                                               FeatureStdExtI,
+                                               FeatureStdExtZicsr,
+                                               FeatureStdExtZifencei,
+                                               FeatureStdExtM,
+                                               FeatureStdExtA,
+                                               FeatureStdExtF,
+                                               FeatureStdExtD,
+                                               FeatureStdExtC,
+                                               FeatureStdExtZba,
+                                               FeatureStdExtZbb,
+                                               FeatureStdExtZbc,
+                                               FeatureStdExtZbs,
+                                               FeatureStdExtZkn,
+                                               FeatureStdExtZksed,
+                                               FeatureStdExtZksh,
+                                               FeatureStdExtSvinval,
+                                               FeatureStdExtZicbom,
+                                               FeatureStdExtZicboz,
+                                               FeatureStdExtV,
+                                               FeatureStdExtZvl128b],
+                                               [TuneNoDefaultUnroll,
+                                                TuneZExtHFusion,
+                                                TuneZExtWFusion,
+                                                TuneShiftedZExtWFusion]>;
diff --git a/llvm/lib/Target/RISCV/RISCVSchedXiangShanKunMingHu.td b/llvm/lib/Target/RISCV/RISCVSchedXiangShanKunMingHu.td
new file mode 100644
index 00000000000000..e8460b8bfb05a3
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVSchedXiangShanKunMingHu.td
@@ -0,0 +1,1489 @@
+//==- RISCVSchedXiangShanKunMingHu.td - XiangShanKunMingHu Scheduling Defs -*- tablegen -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// XiangShan is a high-performance open-source RISC-V processor project
+// initiated by the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS).
+// The KunMingHu architecture is its third-generation derivative,
+// developed by the Institute of Computing Technology, Chinese Academy of Sciences,
+// and the Beijing Institute of Open Source Chip (BOSC),
+// with a focus on achieving higher performance.
+// Source: https://github.com/OpenXiangShan/XiangShan
+// Documentation: https://github.com/OpenXiangShan/XiangShan-doc
+
+//===----------------------------------------------------------------------===//
+// The KunMingHu core supports "RV64IMAFDCV_zba_zbb_zbc_zbs_zbkb_zbkc_zbkx_zknd_zkne_zknh
+// _zksed_zksh_svinval_zicbom_zicboz_zicsr_zifencei", which includes neither Zvfh nor
+// Zvfhmin, so vector floating-point SEW can only be 32 or 64, not 8 or 16.
+class NoZvfhSchedSEWSet_rm8and16<string mx, bit isF = 0, bit isWidening = 0> {
+  defvar t = SchedSEWSet<mx, isF, isWidening>.val; 
+  defvar remove8and16 = !if(isF, !listremove(t, [8, 16]), t);
+  list<int> val = remove8and16;
+}
+
+class NoZvfhSmallestSEW<string mx, bit isF = 0, bit isWidening = 0> {
+  int r = !head(NoZvfhSchedSEWSet_rm8and16<mx, isF, isWidening>.val);
+}
+
+multiclass NoZvfh_LMULSEWReadAdvanceImpl<string name, int val, list<SchedWrite> writes = [],
+                                  list<string> MxList, bit isF = 0,
+                                  bit isWidening = 0> {
+  if !exists<SchedRead>(name # "_WorstCase") then
+    def : ReadAdvance<!cast<SchedRead>(name # "_WorstCase"), val, writes>;
+  foreach mx = MxList in {
+    foreach sew = NoZvfhSchedSEWSet_rm8and16<mx, isF, isWidening>.val in
+      if !exists<SchedRead>(name # "_" # mx # "_E" # sew) then
+        def : ReadAdvance<!cast<SchedRead>(name # "_" # mx # "_E" # sew), val, writes>;
+  }
+}
+
+multiclass LMULSEWReadAdvanceFnoZvfh<string name, int val, list<SchedWrite> writes = []>
+  : NoZvfh_LMULSEWReadAdvanceImpl<name, val, writes, SchedMxListF, isF=1,
+                           isWidening=0>;
+
+multiclass LMULSEWReadAdvanceFWnoZvfh<string name, int val, list<SchedWrite> writes = []>
+    : NoZvfh_LMULSEWReadAdvanceImpl<name, val, writes, SchedMxListFW, isF = 1,
+                             isWidening=1>;
+
+//===----------------------------------------------------------------------===//
+// Without Zvfh/Zvfhmin, FP SEW can only be 32 or 64; removing 32 and 64 leaves the FP SEWs to mark Unsupported.
+class NoZvfhSchedSEWSet_rm32and64<string mx, bit isF = 0, bit isWidening = 0> {
+  defvar t = SchedSEWSet<mx, isF, isWidening>.val;
+  defvar remove32and64 = !if(isF, !listremove(t, [32, 64]), t);
+  list<int> val = remove32and64;
+}
+
+// Write-Impl
+multiclass NoZvfhLMULSEWWriteResImpl<string name, list<ProcResourceKind> resources,
+                               list<string> MxList, bit isF = 0,
+                               bit isWidening = 0> {
+  foreach mx = MxList in {
+    foreach sew = NoZvfhSchedSEWSet_rm32and64<mx, isF, isWidening>.val in
+      if !exists<SchedWrite>(name # "_" # mx # "_E" # sew) then
+        def : WriteRes<!cast<SchedWrite>(name # "_" # mx # "_E" # sew), resources>;
+  }
+}
+// Read-Impl
+multiclass NoZvfhLMULSEWReadAdvanceImpl<string name, int val, list<SchedWrite> writes = [],
+                                  list<string> MxList, bit isF = 0,
+                                  bit isWidening = 0> {
+  foreach mx = MxList in {
+    foreach sew = NoZvfhSchedSEWSet_rm32and64<mx, isF, isWidening>.val in
+      if !exists<SchedRead>(name # "_" # mx # "_E" # sew) then
+        def : ReadAdvance<!cast<SchedRead>(name # "_" # mx # "_E" # sew), val, writes>;
+  }
+}
+
+// Write
+multiclass NoZvfhLMULSEWWriteResF<string name, list<ProcResourceKind> resources>
+    : NoZvfhLMULSEWWriteResImpl<name, resources, SchedMxListF, isF=1>;
+
+multiclass NoZvfhLMULSEWWriteResFW<string name, list<ProcResourceKind> resources>
+    : NoZvfhLMULSEWWriteResImpl<name, resources, SchedMxListFW, isF=1, isWidening=1>;
+
+multiclass NoZvfhLMULSEWWriteResFWRed<string name, list<ProcResourceKind> resources>
+    : NoZvfhLMULSEWWriteResImpl<name, resources, SchedMxListFWRed, isF=1, isWidening=1>;
+
+// Read
+multiclass NoZvfhLMULSEWReadAdvanceF<string name, int val, list<SchedWrite> writes = []>
+  : NoZvfhLMULSEWReadAdvanceImpl<name, val, writes, SchedMxListF, isF=1>;
+multiclass
+    NoZvfhLMULSEWReadAdvanceFW<string name, int val, list<SchedWrite> writes = []>
+    : NoZvfhLMULSEWReadAdvanceImpl<name, val, writes, SchedMxListFW, isF=1,
+                             isWidening = 1>;
+
+multiclass UnsupportedSchedZvfh {
+let Unsupported = true in {
+// Write 
+// 13. Vector Floating-Point Instructions
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFALUV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFALUF", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFWALUV", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFWALUF", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFMulV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFMulF", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFDivV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFDivF", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFWMulV", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFWMulF", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFMulAddV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFMulAddF", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFWMulAddV", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFWMulAddF", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFSqrtV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFRecpV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFMinMaxV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFMinMaxF", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFSgnjV", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFSgnjF", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFCvtIToFV", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFWCvtFToFV", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFNCvtIToFV", []>;
+defm "" : NoZvfhLMULSEWWriteResFW<"WriteVFNCvtFToFV", []>;
+
+// 14. Vector Reduction Operations
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFRedV_From", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFRedOV_From", []>;
+defm "" : NoZvfhLMULSEWWriteResF<"WriteVFRedMinMaxV_From", []>;
+defm "" : NoZvfhLMULSEWWriteResFWRed<"WriteVFWRedV_From", []>;
+defm "" : NoZvfhLMULSEWWriteResFWRed<"WriteVFWRedOV_From", []>;
+
+// Read
+// 13. Vector Floating-Point Instructions
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFALUV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFALUF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFWALUV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFWALUF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFMulV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFMulF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFDivV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFDivF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFWMulV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFWMulF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFMulAddV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFMulAddF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFWMulAddV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFWMulAddF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFSqrtV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFRecpV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFMinMaxV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFMinMaxF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFSgnjV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFSgnjF", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceF<"ReadVFCvtIToFV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFWCvtFToFV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFNCvtIToFV", 0>;
+defm "" : NoZvfhLMULSEWReadAdvanceFW<"ReadVFNCvtFToFV", 0>;
+
+} // Unsupported
+} // UnsupportedSchedZvfh
+
+//===----------------------------------------------------------------------===//
+
+class XSGetCyclesVIALU<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 2,
+    !eq(mx, "M2") : 4,
+    !eq(mx, "M4") : 8,
+    !eq(mx, "M8") : 16,
+    !eq(mx, "MF2") : 2,
+    !eq(mx, "MF4") : 2,
+    !eq(mx, "MF8") : 2
+  );
+}
+
+class XSGetCyclesVIMAC<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 3,
+    !eq(mx, "M2") : 6,
+    !eq(mx, "M4") : 12,
+    !eq(mx, "M8") : 24,
+    !eq(mx, "MF2") : 3,
+    !eq(mx, "MF4") : 3,
+    !eq(mx, "MF8") : 3
+  );
+}
+
+class XSGetCyclesVIDIV<string mx, int sew> {
+  int uop = !cond(
+    !eq(mx, "M1") : 1,
+    !eq(mx, "M2") : 2,
+    !eq(mx, "M4") : 4,
+    !eq(mx, "M8") : 8,
+    !eq(mx, "MF2") : 1,
+    !eq(mx, "MF4") : 1,
+    !eq(mx, "MF8") : 1
+  );
+  int cycles = !cond(
+    !eq(sew, 64) : 19,   // I64: 4-19
+    !eq(sew, 32) : 11,   // I32: 4-11
+    !eq(sew, 16) : 7,    // I16: 4-7
+    !eq(sew, 8) : 6      // I8: 6
+  );
+  int c = !mul(uop, cycles);
+}
+
+class XSGetCyclesVIPU<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 2,
+    !eq(mx, "M2") : 4,
+    !eq(mx, "M4") : 8,
+    !eq(mx, "M8") : 16,
+    !eq(mx, "MF2") : 2,
+    !eq(mx, "MF4") : 2,
+    !eq(mx, "MF8") : 2
+  );    
+}
+
+class XSGetCyclesVPPU<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 2,
+    !eq(mx, "M2") : 4,
+    !eq(mx, "M4") : 8,
+    !eq(mx, "M8") : 16,
+    !eq(mx, "MF2") : 2,
+    !eq(mx, "MF4") : 2,
+    !eq(mx, "MF8") : 2
+  );    
+}
+
+class XSGetCyclesVFALU<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 2,
+    !eq(mx, "M2") : 4,
+    !eq(mx, "M4") : 8,
+    !eq(mx, "M8") : 16,
+    !eq(mx, "MF2") : 2,
+    !eq(mx, "MF4") : 2,
+    !eq(mx, "MF8") : 2
+  );    
+}
+
+class XSGetCyclesVFMA<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 4,
+    !eq(mx, "M2") : 8,
+    !eq(mx, "M4") : 16,
+    !eq(mx, "M8") : 32,
+    !eq(mx, "MF2") : 4,
+    !eq(mx, "MF4") : 4,
+    !eq(mx, "MF8") : 4
+  );    
+}
+
+class XSGetCyclesVFDIV<string mx, int sew> {
+  assert !or(!eq(sew, 32), !eq(sew, 64)), "Floating-point SEW of KunMingHu can only be 32 or 64.";
+  int uop = !cond(
+    !eq(mx, "M1") : 1,
+    !eq(mx, "M2") : 2,
+    !eq(mx, "M4") : 4,
+    !eq(mx, "M8") : 8,
+    !eq(mx, "MF2") : 1,
+    !eq(mx, "MF4") : 1,
+    !eq(mx, "MF8") : 1
+  );
+  int cycles = !cond(
+    !eq(sew, 64) : 15,   // FP64: 15
+    !eq(sew, 32) : 10,   // FP32: 10
+  );
+  int c = !mul(uop, cycles);
+}
+
+class XSGetCyclesVFCVT<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 3,
+    !eq(mx, "M2") : 6,
+    !eq(mx, "M4") : 12,
+    !eq(mx, "M8") : 24,
+    !eq(mx, "MF2") : 3,
+    !eq(mx, "MF4") : 3,
+    !eq(mx, "MF8") : 3
+  );    
+}
+
+class XSGetCyclesVLDU<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 8,
+    !eq(mx, "M2") : 16,
+    !eq(mx, "M4") : 32,
+    !eq(mx, "M8") : 64,
+    !eq(mx, "MF2") : 8,
+    !eq(mx, "MF4") : 8,
+    !eq(mx, "MF8") : 8
+  );
+}
+
+class XSGetCyclesVSTU<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 7,
+    !eq(mx, "M2") : 14,
+    !eq(mx, "M4") : 28,
+    !eq(mx, "M8") : 56,
+    !eq(mx, "MF2") : 7,
+    !eq(mx, "MF4") : 7,
+    !eq(mx, "MF8") : 7
+  );
+}
+
+// If mx is the maximum LMUL in the MxList, then c is true, indicating the worst case.
+class XSIsWorstCaseMX<string mx, list<string> MxList> {
+  defvar LLMUL = LargestLMUL<MxList>.r;
+  bit c = !eq(mx, LLMUL);
+}
+
+// If mx is the maximum LMUL in the MxList, and sew is the minimum value 
+// when LMUL=mx, then c is true, indicating the worst case.
+class XSIsWorstCaseMXSEW<string mx, int sew, list<string> MxList,
+                               bit isF = 0> {
+  defvar LLMUL = LargestLMUL<MxList>.r;
+  defvar SSEW = NoZvfhSmallestSEW<mx, isF>.r;
+  bit c = !and(!eq(mx, LLMUL), !eq(sew, SSEW));
+}
+
+class XSLDUtoAnyBypass<SchedRead read, int cycles = 2>
+    : ReadAdvance<re...
[truncated]
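The `XSGetCycles*` helpers in the visible part of the diff all follow one pattern. Fractional LMULs (MF2/MF4/MF8) cost the same as M1, and each doubling of LMUL doubles the latency. The divider helpers multiply a per-LMUL uop count by a per-SEW cycle count; for integer division, that count is the upper end of the range noted in the comments. The Python sketch below mirrors that arithmetic, which could help sanity-check the expected values in the llvm-mca tests. The function and table names are hypothetical, and the numbers are copied from the visible patch, not from XiangShan documentation.

```python
# Mirrors the arithmetic of the XSGetCycles* TableGen classes in the diff.
# Numbers are copied from the visible (truncated) patch; not authoritative.

# Number of uops per LMUL; fractional LMULs occupy a single register.
UOPS = {"MF8": 1, "MF4": 1, "MF2": 1, "M1": 1, "M2": 2, "M4": 4, "M8": 8}

# Latency at LMUL=1 for the non-divide units.
M1_LATENCY = {
    "VIALU": 2, "VIMAC": 3, "VIPU": 2, "VPPU": 2,
    "VFALU": 2, "VFMA": 4, "VFCVT": 3, "VLDU": 8, "VSTU": 7,
}

# Worst-case per-uop divide latency by SEW.
VIDIV_CYCLES = {64: 19, 32: 11, 16: 7, 8: 6}
VFDIV_CYCLES = {64: 15, 32: 10}  # FP SEW is only 32 or 64 (no Zvfh).


def unit_latency(unit: str, mx: str) -> int:
    """Latency scales linearly with the number of uops for this LMUL."""
    return UOPS[mx] * M1_LATENCY[unit]


def div_latency(mx: str, sew: int, fp: bool = False) -> int:
    """uops(LMUL) * cycles(SEW), like XSGetCyclesVIDIV / XSGetCyclesVFDIV."""
    table = VFDIV_CYCLES if fp else VIDIV_CYCLES
    if sew not in table:
        # Plays the role of the assert in XSGetCyclesVFDIV.
        raise ValueError(f"unsupported SEW {sew} for {'FP' if fp else 'integer'} divide")
    return UOPS[mx] * table[sew]


if __name__ == "__main__":
    for mx in ["MF2", "M1", "M2", "M4", "M8"]:
        print(mx, unit_latency("VIALU", mx), div_latency(mx, 64), div_latency(mx, 64, fp=True))
```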

FeatureStdExtZicboz,
FeatureStdExtV,
FeatureStdExtZvl128b],
[TuneNoDefaultUnroll,

Member

let LoopMicroOpBufferSize = 48; // Instruction queue size
let LoadLatency = 6;
let MispredictPenalty = 13; // Based on estimate of pipeline depth.
let PostRAScheduler = 1;

Member

Can you share some performance data about this? IIRC it hurts the performance on Xiangshan-Nanhu.

let UnsupportedFeatures = [HasStdExtZcmt, HasStdExtZkr];
}

let SchedModel = XiangShanKunMingHuModel in {

Member

Can you tell me where the documentation of Xiangshan-Kunminghu is? The documentation is out of sync with your scheduling model.

defm : UnsupportedSchedXsfvcp;
defm : UnsupportedSchedZvfh;

// Move Elimination

Member

Please add an MCA test for this behavior. IIRC, LLVM doesn't support RISC-V move idioms without changes.

See 59f6e22.

def : ReadAdvance<ReadFMul32, 0>;
def : ReadAdvance<ReadFMul64, 0>;
def : ReadAdvance<ReadFMA32, 0>;
def : ReadAdvance<ReadFMA32Addend, 0>;

Member

Is cascade FMA deprecated on Kunminghu?
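Some background for this question, based on LLVM's scheduling-model semantics as generally documented. A `ReadAdvance<SomeRead, N>` reduces the latency of a prior write by N cycles for that operand, and a negative N increases it. A cascade FMA usually lets a dependent FMA consume the previous result as its addend late in the pipeline. A model would typically express that with a positive advance on the addend reads, such as `ReadFMA32Addend`, whereas the patch uses 0. The Python sketch below shows how that choice changes the critical path of a chain of FMAs that depend on each other only through the accumulator. The FMA latency of 5 and advance of 3 used in the checks are illustrative values, not KunMingHu figures. The sketch also ignores issue width and throughput.

```python
def operand_latency(write_latency: int, read_advance: int) -> int:
    """Effective producer->consumer latency once a ReadAdvance is applied.

    Modeled as max(0, latency - advance); a negative advance adds latency.
    """
    return max(0, write_latency - read_advance)


def accumulate_chain_cycles(n_fmas: int, fma_latency: int, addend_advance: int) -> int:
    """Cycles until the last of n dependent FMAs (acc = a*b + acc) completes.

    The first FMA starts at cycle 0; each later FMA starts as soon as the
    previous result is readable through its addend operand. Issue width and
    throughput limits are ignored.
    """
    if n_fmas <= 0:
        return 0
    step = operand_latency(fma_latency, addend_advance)
    return step * (n_fmas - 1) + fma_latency


if __name__ == "__main__":
    # Hypothetical FMA latency of 5 cycles, chain of 4 accumulations.
    print(accumulate_chain_cycles(4, 5, 0), accumulate_chain_cycles(4, 5, 3))
```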

camel-cdr commented

The execution unit layout does not match the current XiangShan master. Has the final execution unit layout been decided on?

Earlier layout, which this PR seems to be based on:

VFEX0: VfaluCfg, VfmaCfg, VialuCfg, VimacCfg
VFEX1: VipuCfg, VppuCfg, VfcvtCfg, F2vCfg, F2fCfg, F2iCfg, VSetRvfWvfCfg
VFEX2: VfaluCfg, VfmaCfg, VialuCfg
VFEX3: VfdivCfg, VidivCfg

Current master layout:

VFEX0: VfmaCfg, VialuCfg, VimacCfg, VppuCfg
VFEX1: VfaluCfg, VfcvtCfg, VipuCfg, VSetRvfWvfCfg
VFEX2: VfmaCfg, VialuCfg, F2vCfg
VFEX3: VfaluCfg, VfcvtCfg
VFEX4: VfdivCfg, VidivCfg
VFEX5: VfdivCfg, VidivCfg

There is even an open branch that separates the VPU and FPU pipelines: https://github.com/OpenXiangShan/XiangShan/blob/fp-split/src/main/scala/xiangshan/Parameters.scala#L365

I'd love to have proper scheduling support for KunMingHu, but it currently doesn't look like a stable target.

BTW: having two div units but only a single VPPU seems like an odd choice. The XuanTie C920 has two permutation execution units, and so do the more comparable ARM Neoverse N2 and AMD Zen1.
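Tallying the two layouts quoted above makes the difference concrete. In the current master layout, Vfcvt, Vfdiv and Vidiv each appear on two pipes instead of one, and F2f/F2i no longer appear in this list. Vppu remains on a single pipe. The Python sketch below performs that tally. The layouts are transcribed from the comment, and the helper names are hypothetical. In a TableGen model, such per-unit counts would typically surface as the number of units of a `ProcResource` or as the membership of a `ProcResGroup`.

```python
from collections import Counter

# Vector/FP execution-unit layouts as quoted in the review comment.
EARLIER = {
    "VFEX0": ["VfaluCfg", "VfmaCfg", "VialuCfg", "VimacCfg"],
    "VFEX1": ["VipuCfg", "VppuCfg", "VfcvtCfg", "F2vCfg", "F2fCfg", "F2iCfg", "VSetRvfWvfCfg"],
    "VFEX2": ["VfaluCfg", "VfmaCfg", "VialuCfg"],
    "VFEX3": ["VfdivCfg", "VidivCfg"],
}
CURRENT = {
    "VFEX0": ["VfmaCfg", "VialuCfg", "VimacCfg", "VppuCfg"],
    "VFEX1": ["VfaluCfg", "VfcvtCfg", "VipuCfg", "VSetRvfWvfCfg"],
    "VFEX2": ["VfmaCfg", "VialuCfg", "F2vCfg"],
    "VFEX3": ["VfaluCfg", "VfcvtCfg"],
    "VFEX4": ["VfdivCfg", "VidivCfg"],
    "VFEX5": ["VfdivCfg", "VidivCfg"],
}


def units_per_kind(layout: dict[str, list[str]]) -> Counter:
    """Count how many pipes host each functional-unit config."""
    return Counter(cfg for cfgs in layout.values() for cfg in cfgs)


def layout_diff(old: dict[str, list[str]], new: dict[str, list[str]]) -> dict[str, tuple[int, int]]:
    """Return {config: (old_count, new_count)} for configs whose count changed."""
    a, b = units_per_kind(old), units_per_kind(new)
    # Counter returns 0 for missing keys, so removed/added configs show up too.
    return {k: (a[k], b[k]) for k in sorted(set(a) | set(b)) if a[k] != b[k]}


if __name__ == "__main__":
    for cfg, (before, after) in layout_diff(EARLIER, CURRENT).items():
        print(f"{cfg}: {before} -> {after}")
```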

wangpc-pp (Contributor) commented

I'd like to see the support of KunMingHu, but please hold this PR and wait for the finalization of KunMingHu's architecture.

// 15. Vector Mask Instructions
// VIALU
foreach mx = SchedMxList in {
defvar Cycles = XSGetCyclesVIALU<mx>.c;

camel-cdr commented Apr 29, 2024

Are you sure this is correct? Since masks always fit into a single LMUL=1 vector register, you'd expect an LMUL=8, SEW=8 vmand.mm to have the same latency as an LMUL=1 vand.vv. Or does XiangShan use a different internal format for mask registers? See how the SiFive P600 scheduler sets the latency of all mask instructions to 1.
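The premise that masks fit in one register follows from the RVV definition VLMAX = LMUL × VLEN / SEW. A mask holds one bit per element, so it needs at most VLMAX bits. Because LMUL ≤ 8 and SEW ≥ 8, that never exceeds VLEN. The Python sketch below checks every (LMUL, SEW) pair that is not excluded by the SEW ≤ LMUL × ELEN rule. It assumes VLEN = 128, the minimum guaranteed by `Zvl128b` in the processor definition; the actual KunMingHu VLEN is an assumption here. It also assumes ELEN = 64.

```python
from fractions import Fraction

VLEN = 128  # assumed; Zvl128b only guarantees VLEN >= 128
ELEN = 64   # assumed largest element width (V on RV64)
LMULS = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), 1, 2, 4, 8]
SEWS = [8, 16, 32, 64]


def vlmax(lmul, sew: int, vlen: int = VLEN) -> Fraction:
    """VLMAX = LMUL * VLEN / SEW (elements per register group)."""
    return Fraction(lmul) * vlen / sew


def mask_bits_needed(lmul, sew: int, vlen: int = VLEN) -> Fraction:
    # One mask bit per element, so a mask needs VLMAX bits.
    return vlmax(lmul, sew, vlen)


def legal_configs(elen: int = ELEN):
    # Skip settings with SEW > LMUL * ELEN, which implementations may reject.
    return [(l, s) for l in LMULS for s in SEWS if s <= Fraction(l) * elen]


if __name__ == "__main__":
    worst = max(mask_bits_needed(l, s) for l, s in legal_configs())
    print(f"largest mask: {worst} bits, VLEN = {VLEN}")
```

Under this reading, a mask-logical instruction such as `vmand.mm` reads and writes a single vector register. Scaling its latency per LMUL, as `XSGetCyclesVIALU` does (16 cycles at M8), would therefore assume work proportional to LMUL. That assumption holds only if the hardware stores masks in a different internal format, which is the reviewer's question.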
