Merged
1 change: 1 addition & 0 deletions llvm/lib/Target/RISCV/RISCV.td
@@ -52,6 +52,7 @@ include "RISCVSchedSiFive7.td"
include "RISCVSchedSiFiveP400.td"
include "RISCVSchedSiFiveP500.td"
include "RISCVSchedSiFiveP600.td"
+include "RISCVSchedSpacemitX60.td"
include "RISCVSchedSyntacoreSCR1.td"
include "RISCVSchedSyntacoreSCR345.td"
include "RISCVSchedSyntacoreSCR7.td"
2 changes: 1 addition & 1 deletion llvm/lib/Target/RISCV/RISCVProcessors.td
@@ -586,7 +586,7 @@ def XIANGSHAN_KUNMINGHU : RISCVProcessorModel<"xiangshan-kunminghu",
TuneShiftedZExtWFusion]>;

def SPACEMIT_X60 : RISCVProcessorModel<"spacemit-x60",
-NoSchedModel,
+SpacemitX60Model,
!listconcat(RVA22S64Features,
[FeatureStdExtV,
FeatureStdExtSscofpmf,
353 changes: 353 additions & 0 deletions llvm/lib/Target/RISCV/RISCVSchedSpacemitX60.td
@@ -0,0 +1,353 @@
//=- RISCVSchedSpacemitX60.td - Spacemit X60 Scheduling Defs -*- tablegen -*-=//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//
//
// Scheduler model for the SpacemiT-X60 processor based on documentation of the
// C908 and experiments on real hardware (bpi-f3).
//
//===----------------------------------------------------------------------===//

def SpacemitX60Model : SchedMachineModel {
let IssueWidth = 2; // dual-issue
let MicroOpBufferSize = 0; // in-order
let LoadLatency = 5; // worst case: >= 3
Contributor

Load latency is 3 or 4 cycles on a cache hit, but since LoadLatency = 5 actually performs best in tests, we can keep it until another configuration beats it.

Collaborator

As a follow-up (that is, not in this patch), please run another sweep of this parameter with the final model, and post a follow-up patch if it needs to be tweaked slightly.
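
A follow-up sweep would presumably vary the load latency in both places where this patch sets it to 5. A minimal sketch of one sweep point, assuming the cache-hit value of 4 mentioned above. The value is illustrative only and is not a measured result:

```tablegen
// Hypothetical sweep point, not part of this patch.
//
// 1) In `def SpacemitX60Model : SchedMachineModel { ... }`:
//      let LoadLatency = 4; // was 5; cache-hit latency reported as 3 or 4
//
// 2) Inside `let SchedModel = SpacemitX60Model in { ... }`, replacing the
//    existing `let Latency = 5` block for scalar and FP loads:
let Latency = 4 in {
def : WriteRes<WriteLDB, [SMX60_LS]>;
def : WriteRes<WriteLDH, [SMX60_LS]>;
def : WriteRes<WriteLDW, [SMX60_LS]>;
def : WriteRes<WriteLDD, [SMX60_LS]>;
def : WriteRes<WriteFLD16, [SMX60_LS]>;
def : WriteRes<WriteFLD32, [SMX60_LS]>;
def : WriteRes<WriteFLD64, [SMX60_LS]>;
}
```

Whether the atomic loads should move together with the plain loads is left open in this sketch.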

let MispredictPenalty = 9; // nine-stage
Contributor

On an L1 cache hit, the penalty is about 3-6 cycles.
However, we didn't test the performance impact of tuning this parameter. If a different value gives better test results, then just use it :)

Collaborator

As a follow-up (that is, not in this patch), please run another sweep of this parameter with the final model, and post a follow-up patch if it needs to be tweaked slightly.


let CompleteModel = 0;

let UnsupportedFeatures = [HasStdExtZknd, HasStdExtZkne, HasStdExtZknh,
HasStdExtZksed, HasStdExtZksh, HasStdExtZkr];
}

let SchedModel = SpacemitX60Model in {

//===----------------------------------------------------------------------===//
// Define processor resources for Spacemit-X60

// Information gathered from the C908 user manual:
let BufferSize = 0 in {
// The LSU supports dual issue for scalar store/load instructions
def SMX60_LS : ProcResource<2>;

// An IEU can decode and issue two instructions at the same time
def SMX60_IEUA : ProcResource<1>;
def SMX60_IEUB : ProcResource<1>;
def SMX60_IEU : ProcResGroup<[SMX60_IEUA, SMX60_IEUB]>;

// Although the X60 does appear to support multiple issue for at least some
// floating point instructions, this model assumes single issue because
// increasing it reduced the performance gains we measured.
def SMX60_FP : ProcResource<1>;
Collaborator

Add a comment here including the bit from your review description about why dual issue isn't used here.

Contributor

Some floating-point instructions can dual-issue, such as those using FALU and FMAU, but not, for example, FCVT.
Mikhail mentioned that a value of 1 would give better performance, so we can start with 1. We will continue to improve this model in the future.

Collaborator

Let's start with 1 here, and then see if we can split this in a follow-up patch.

Collaborator

As a follow-up (that is, not in this patch), it would be good to explore this further.
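
One possible shape for that follow-up, mirroring how SMX60_IEUA and SMX60_IEUB are grouped under SMX60_IEU in this patch. The names SMX60_FPA and SMX60_FPB are hypothetical, and the choice of pipe for FCVT is an assumption that nothing in this thread confirms:

```tablegen
// Hypothetical follow-up, not part of this patch.
// Replaces `def SMX60_FP : ProcResource<1>;` in the resource definitions.
let BufferSize = 0 in {
def SMX60_FPA : ProcResource<1>;
def SMX60_FPB : ProcResource<1>;
def SMX60_FP : ProcResGroup<[SMX60_FPA, SMX60_FPB]>;
}

// Ops reported to dual-issue (FALU/FMAU) may use either pipe via the group.
def : WriteRes<WriteFAdd32, [SMX60_FP]> { let Latency = 4; }
def : WriteRes<WriteFMA32, [SMX60_FP]> { let Latency = 5; }

// Ops reported not to dual-issue (for example FCVT) are pinned to one pipe.
// Pinning to SMX60_FPA specifically is an assumption.
def : WriteRes<WriteFCvtF32ToF64, [SMX60_FPA]> { let Latency = 4; }
```

With this layout, existing WriteRes entries that name SMX60_FP keep working unchanged, and only the non-dual-issue writes need to be retargeted. Whether it recovers the performance lost when the FP issue width was raised would need to be measured.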

}

//===----------------------------------------------------------------------===//

// Branching
def : WriteRes<WriteJmp, [SMX60_IEUA]>;
def : WriteRes<WriteJal, [SMX60_IEUA]>;
def : WriteRes<WriteJalr, [SMX60_IEUA]>;

// Integer arithmetic and logic
// Latency of ALU instructions is 1, but add.uw is 2
def : WriteRes<WriteIALU32, [SMX60_IEU]>;
def : WriteRes<WriteIALU, [SMX60_IEU]>;
def : WriteRes<WriteShiftImm32, [SMX60_IEU]>;
def : WriteRes<WriteShiftImm, [SMX60_IEU]>;
def : WriteRes<WriteShiftReg32, [SMX60_IEU]>;
def : WriteRes<WriteShiftReg, [SMX60_IEU]>;

// Integer multiplication
def : WriteRes<WriteIMul32, [SMX60_IEU]> { let Latency = 3; }

// The latency of mul is 5, while that of mulh, mulhsu, and mulhu is 6
// Worst case latency is used
def : WriteRes<WriteIMul, [SMX60_IEU]> { let Latency = 6; }

// Integer division/remainder
// TODO: Latency set based on C908 datasheet and hasn't been
// confirmed experimentally.
let Latency = 12, ReleaseAtCycles = [12] in {
def : WriteRes<WriteIDiv32, [SMX60_IEUA]>;
def : WriteRes<WriteIRem32, [SMX60_IEUA]>;
}
let Latency = 20, ReleaseAtCycles = [20] in {
def : WriteRes<WriteIDiv, [SMX60_IEUA]>;
def : WriteRes<WriteIRem, [SMX60_IEUA]>;
}

// Bitmanip
def : WriteRes<WriteRotateImm, [SMX60_IEU]>;
def : WriteRes<WriteRotateImm32, [SMX60_IEU]>;
def : WriteRes<WriteRotateReg, [SMX60_IEU]>;
def : WriteRes<WriteRotateReg32, [SMX60_IEU]>;

def : WriteRes<WriteCLZ, [SMX60_IEU]>;
def : WriteRes<WriteCLZ32, [SMX60_IEU]>;
def : WriteRes<WriteCTZ, [SMX60_IEU]>;
def : WriteRes<WriteCTZ32, [SMX60_IEU]>;

let Latency = 2 in {
def : WriteRes<WriteCPOP, [SMX60_IEU]>;
def : WriteRes<WriteCPOP32, [SMX60_IEU]>;
}

def : WriteRes<WriteORCB, [SMX60_IEU]>;
def : WriteRes<WriteIMinMax, [SMX60_IEU]>;
def : WriteRes<WriteREV8, [SMX60_IEU]>;

let Latency = 2 in {
Collaborator

As a follow-up (that is, not in this patch), it would be interesting to explore whether this is actually a two-cycle latency, or whether it is micro-coded as two uops, each with a latency of one. You could maybe see this in perf counters.
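
For reference, the two hypotheses would be modeled differently. A sketch of the alternatives for one of the affected writes, using the NumMicroOps field that WriteRes inherits from ProcWriteResources. Which option matches the hardware is unknown, and only one definition per write can exist in the model, so the alternative is shown commented out:

```tablegen
// Option A (what this patch does): one micro-op with a 2-cycle latency.
def : WriteRes<WriteSHXADD, [SMX60_IEU]> { let Latency = 2; }

// Option B (hypothetical alternative): two micro-ops. The result is still
// modeled as available after 2 cycles, but the instruction would count as
// two micro-ops against the model's IssueWidth of 2.
// def : WriteRes<WriteSHXADD, [SMX60_IEU]> {
//   let Latency = 2;
//   let NumMicroOps = 2;
// }
```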

def : WriteRes<WriteSHXADD, [SMX60_IEU]>;
def : WriteRes<WriteSHXADD32, [SMX60_IEU]>;
def : WriteRes<WriteCLMUL, [SMX60_IEU]>;
}

// Single-bit instructions
def : WriteRes<WriteSingleBit, [SMX60_IEU]>;
def : WriteRes<WriteSingleBitImm, [SMX60_IEU]>;
def : WriteRes<WriteBEXT, [SMX60_IEU]>;
def : WriteRes<WriteBEXTI, [SMX60_IEU]>;

// Memory/Atomic memory
let Latency = 3 in {
def : WriteRes<WriteSTB, [SMX60_LS]>;
def : WriteRes<WriteSTH, [SMX60_LS]>;
def : WriteRes<WriteSTW, [SMX60_LS]>;
def : WriteRes<WriteSTD, [SMX60_LS]>;
def : WriteRes<WriteFST16, [SMX60_LS]>;
def : WriteRes<WriteFST32, [SMX60_LS]>;
def : WriteRes<WriteFST64, [SMX60_LS]>;
def : WriteRes<WriteAtomicSTW, [SMX60_LS]>;
def : WriteRes<WriteAtomicSTD, [SMX60_LS]>;
}

let Latency = 5 in {
def : WriteRes<WriteLDB, [SMX60_LS]>;
def : WriteRes<WriteLDH, [SMX60_LS]>;
def : WriteRes<WriteLDW, [SMX60_LS]>;
def : WriteRes<WriteLDD, [SMX60_LS]>;
def : WriteRes<WriteFLD16, [SMX60_LS]>;
def : WriteRes<WriteFLD32, [SMX60_LS]>;
def : WriteRes<WriteFLD64, [SMX60_LS]>;
}

// Atomics
let Latency = 5 in {
def : WriteRes<WriteAtomicLDW, [SMX60_LS]>;
def : WriteRes<WriteAtomicLDD, [SMX60_LS]>;
def : WriteRes<WriteAtomicW, [SMX60_LS]>;
def : WriteRes<WriteAtomicD, [SMX60_LS]>;
}

// Floating point units Half precision
let Latency = 4 in {
def : WriteRes<WriteFAdd16, [SMX60_FP]>;
def : WriteRes<WriteFMul16, [SMX60_FP]>;
def : WriteRes<WriteFSGNJ16, [SMX60_FP]>;
def : WriteRes<WriteFMinMax16, [SMX60_FP]>;
}
def : WriteRes<WriteFMA16, [SMX60_FP]> { let Latency = 5; }

let Latency = 12, ReleaseAtCycles = [12] in {
def : WriteRes<WriteFDiv16, [SMX60_FP]>;
def : WriteRes<WriteFSqrt16, [SMX60_FP]>;
}

// Single precision
let Latency = 4 in {
def : WriteRes<WriteFAdd32, [SMX60_FP]>;
def : WriteRes<WriteFMul32, [SMX60_FP]>;
def : WriteRes<WriteFSGNJ32, [SMX60_FP]>;
def : WriteRes<WriteFMinMax32, [SMX60_FP]>;
}
def : WriteRes<WriteFMA32, [SMX60_FP]> { let Latency = 5; }

let Latency = 15, ReleaseAtCycles = [15] in {
def : WriteRes<WriteFDiv32, [SMX60_FP]>;
def : WriteRes<WriteFSqrt32, [SMX60_FP]>;
}

// Double precision
let Latency = 5 in {
def : WriteRes<WriteFAdd64, [SMX60_FP]>;
def : WriteRes<WriteFMul64, [SMX60_FP]>;
def : WriteRes<WriteFSGNJ64, [SMX60_FP]>;
}
def : WriteRes<WriteFMinMax64, [SMX60_FP]> { let Latency = 4; }
def : WriteRes<WriteFMA64, [SMX60_FP]> { let Latency = 6; }

let Latency = 22, ReleaseAtCycles = [22] in {
def : WriteRes<WriteFDiv64, [SMX60_FP]>;
def : WriteRes<WriteFSqrt64, [SMX60_FP]>;
}

// Conversions
let Latency = 6 in {
def : WriteRes<WriteFCvtF16ToI32, [SMX60_IEU]>;
def : WriteRes<WriteFCvtF32ToI32, [SMX60_IEU]>;
def : WriteRes<WriteFCvtF32ToI64, [SMX60_IEU]>;
def : WriteRes<WriteFCvtF64ToI64, [SMX60_IEU]>;
def : WriteRes<WriteFCvtF64ToI32, [SMX60_IEU]>;
def : WriteRes<WriteFCvtF16ToI64, [SMX60_IEU]>;
}

let Latency = 4 in {
def : WriteRes<WriteFCvtI32ToF16, [SMX60_IEU]>;
def : WriteRes<WriteFCvtI32ToF32, [SMX60_IEU]>;
def : WriteRes<WriteFCvtI32ToF64, [SMX60_IEU]>;
def : WriteRes<WriteFCvtI64ToF16, [SMX60_IEU]>;
def : WriteRes<WriteFCvtI64ToF32, [SMX60_IEU]>;
def : WriteRes<WriteFCvtI64ToF64, [SMX60_IEU]>;
def : WriteRes<WriteFCvtF16ToF32, [SMX60_FP]>;
def : WriteRes<WriteFCvtF16ToF64, [SMX60_FP]>;
def : WriteRes<WriteFCvtF32ToF16, [SMX60_FP]>;
def : WriteRes<WriteFCvtF32ToF64, [SMX60_FP]>;
def : WriteRes<WriteFCvtF64ToF16, [SMX60_FP]>;
def : WriteRes<WriteFCvtF64ToF32, [SMX60_FP]>;
}

let Latency = 6 in {
def : WriteRes<WriteFClass16, [SMX60_FP]>;
def : WriteRes<WriteFClass32, [SMX60_FP]>;
def : WriteRes<WriteFClass64, [SMX60_FP]>;

def : WriteRes<WriteFCmp16, [SMX60_FP]>;
def : WriteRes<WriteFCmp32, [SMX60_FP]>;
def : WriteRes<WriteFCmp64, [SMX60_FP]>;

def : WriteRes<WriteFMovF32ToI32, [SMX60_IEU]>;
def : WriteRes<WriteFMovF16ToI16, [SMX60_IEU]>;
}

let Latency = 4 in {
def : WriteRes<WriteFMovI16ToF16, [SMX60_IEU]>;
def : WriteRes<WriteFMovF64ToI64, [SMX60_IEU]>;
def : WriteRes<WriteFMovI64ToF64, [SMX60_IEU]>;
def : WriteRes<WriteFMovI32ToF32, [SMX60_IEU]>;
}

// Others
def : WriteRes<WriteCSR, [SMX60_IEU]>;
def : WriteRes<WriteNop, [SMX60_IEU]>;

//===----------------------------------------------------------------------===//
// Bypass and advance
def : ReadAdvance<ReadJmp, 0>;
def : ReadAdvance<ReadJalr, 0>;
def : ReadAdvance<ReadCSR, 0>;
def : ReadAdvance<ReadStoreData, 0>;
def : ReadAdvance<ReadMemBase, 0>;
def : ReadAdvance<ReadIALU, 0>;
def : ReadAdvance<ReadIALU32, 0>;
def : ReadAdvance<ReadShiftImm, 0>;
def : ReadAdvance<ReadShiftImm32, 0>;
def : ReadAdvance<ReadShiftReg, 0>;
def : ReadAdvance<ReadShiftReg32, 0>;
def : ReadAdvance<ReadIDiv, 0>;
def : ReadAdvance<ReadIDiv32, 0>;
def : ReadAdvance<ReadIRem, 0>;
def : ReadAdvance<ReadIRem32, 0>;
def : ReadAdvance<ReadIMul, 0>;
def : ReadAdvance<ReadIMul32, 0>;
def : ReadAdvance<ReadAtomicWA, 0>;
def : ReadAdvance<ReadAtomicWD, 0>;
def : ReadAdvance<ReadAtomicDA, 0>;
def : ReadAdvance<ReadAtomicDD, 0>;
def : ReadAdvance<ReadAtomicLDW, 0>;
def : ReadAdvance<ReadAtomicLDD, 0>;
def : ReadAdvance<ReadAtomicSTW, 0>;
def : ReadAdvance<ReadAtomicSTD, 0>;
def : ReadAdvance<ReadFStoreData, 0>;
def : ReadAdvance<ReadFMemBase, 0>;
def : ReadAdvance<ReadFAdd16, 0>;
def : ReadAdvance<ReadFAdd32, 0>;
def : ReadAdvance<ReadFAdd64, 0>;
def : ReadAdvance<ReadFMul16, 0>;
def : ReadAdvance<ReadFMA16, 0>;
def : ReadAdvance<ReadFMA16Addend, 0>;
def : ReadAdvance<ReadFMul32, 0>;
def : ReadAdvance<ReadFMul64, 0>;
def : ReadAdvance<ReadFMA32, 0>;
def : ReadAdvance<ReadFMA32Addend, 0>;
def : ReadAdvance<ReadFMA64, 0>;
def : ReadAdvance<ReadFMA64Addend, 0>;
def : ReadAdvance<ReadFDiv16, 0>;
def : ReadAdvance<ReadFDiv32, 0>;
def : ReadAdvance<ReadFDiv64, 0>;
def : ReadAdvance<ReadFSqrt16, 0>;
def : ReadAdvance<ReadFSqrt32, 0>;
def : ReadAdvance<ReadFSqrt64, 0>;
def : ReadAdvance<ReadFCmp16, 0>;
def : ReadAdvance<ReadFCmp32, 0>;
def : ReadAdvance<ReadFCmp64, 0>;
def : ReadAdvance<ReadFSGNJ16, 0>;
def : ReadAdvance<ReadFSGNJ32, 0>;
def : ReadAdvance<ReadFSGNJ64, 0>;
def : ReadAdvance<ReadFMinMax16, 0>;
def : ReadAdvance<ReadFMinMax32, 0>;
def : ReadAdvance<ReadFMinMax64, 0>;
def : ReadAdvance<ReadFCvtF16ToI32, 0>;
def : ReadAdvance<ReadFCvtF16ToI64, 0>;
def : ReadAdvance<ReadFCvtF32ToI32, 0>;
def : ReadAdvance<ReadFCvtF32ToI64, 0>;
def : ReadAdvance<ReadFCvtF64ToI32, 0>;
def : ReadAdvance<ReadFCvtF64ToI64, 0>;
def : ReadAdvance<ReadFCvtI32ToF16, 0>;
def : ReadAdvance<ReadFCvtI32ToF32, 0>;
def : ReadAdvance<ReadFCvtI32ToF64, 0>;
def : ReadAdvance<ReadFCvtI64ToF16, 0>;
def : ReadAdvance<ReadFCvtI64ToF32, 0>;
def : ReadAdvance<ReadFCvtI64ToF64, 0>;
def : ReadAdvance<ReadFCvtF32ToF64, 0>;
def : ReadAdvance<ReadFCvtF64ToF32, 0>;
def : ReadAdvance<ReadFCvtF16ToF32, 0>;
def : ReadAdvance<ReadFCvtF32ToF16, 0>;
def : ReadAdvance<ReadFCvtF16ToF64, 0>;
def : ReadAdvance<ReadFCvtF64ToF16, 0>;
def : ReadAdvance<ReadFMovF16ToI16, 0>;
def : ReadAdvance<ReadFMovI16ToF16, 0>;
def : ReadAdvance<ReadFMovF32ToI32, 0>;
def : ReadAdvance<ReadFMovI32ToF32, 0>;
def : ReadAdvance<ReadFMovF64ToI64, 0>;
def : ReadAdvance<ReadFMovI64ToF64, 0>;
def : ReadAdvance<ReadFClass16, 0>;
def : ReadAdvance<ReadFClass32, 0>;
def : ReadAdvance<ReadFClass64, 0>;

// Bitmanip
def : ReadAdvance<ReadRotateImm, 0>;
def : ReadAdvance<ReadRotateImm32, 0>;
def : ReadAdvance<ReadRotateReg, 0>;
def : ReadAdvance<ReadRotateReg32, 0>;
def : ReadAdvance<ReadCLZ, 0>;
def : ReadAdvance<ReadCLZ32, 0>;
def : ReadAdvance<ReadCTZ, 0>;
def : ReadAdvance<ReadCTZ32, 0>;
def : ReadAdvance<ReadCPOP, 0>;
def : ReadAdvance<ReadCPOP32, 0>;
def : ReadAdvance<ReadORCB, 0>;
def : ReadAdvance<ReadIMinMax, 0>;
def : ReadAdvance<ReadREV8, 0>;
def : ReadAdvance<ReadSHXADD, 0>;
def : ReadAdvance<ReadSHXADD32, 0>;
def : ReadAdvance<ReadCLMUL, 0>;
// Single-bit instructions
def : ReadAdvance<ReadSingleBit, 0>;
def : ReadAdvance<ReadSingleBitImm, 0>;

//===----------------------------------------------------------------------===//
// Unsupported extensions
defm : UnsupportedSchedV;
defm : UnsupportedSchedXsfvcp;
defm : UnsupportedSchedZabha;
defm : UnsupportedSchedZbkb;
defm : UnsupportedSchedZbkx;
defm : UnsupportedSchedZfa;
defm : UnsupportedSchedZvk;
defm : UnsupportedSchedSFB;
}