Add benchmark / resource usage information #9

Open
mithro opened this issue Jan 14, 2020 · 8 comments

Comments

@mithro
Contributor

mithro commented Jan 14, 2020

Would you mind adding some benchmark / resource usage information for the core?

The VexRiscv README has the following:

VexRiscv smallest (RV32I, 0.52 DMIPS/Mhz, no datapath bypass, no interrupt) ->
    Artix 7     -> 233 Mhz 494 LUT 505 FF
    Cyclone V   -> 193 Mhz 347 ALMs
    Cyclone IV  -> 179 Mhz 730 LUT 494 FF
    iCE40       -> 92 Mhz 1130 LC

....

VexRiscv full max dmips/mhz -> (RV32IM, 1.44 DMIPS/Mhz 2.70 Coremark/Mhz,, 16KB-I$,16KB-D$, single cycle barrel shifter, debug module, catch exceptions, dynamic branch prediction in the fetch stage, branch and shift operations done in the Execute stage) ->
    Artix 7     -> 140 Mhz 1767 LUT 1128 FF
    Cyclone V   -> 90 Mhz 1,089 ALMs
    Cyclone IV  -> 79 Mhz 2,336 LUT 1,048 FF

VexRiscv full with MMU (RV32IM, 1.24 DMIPS/Mhz 2.35 Coremark/Mhz, with cache trashing, 4KB-I$, 4KB-D$, single cycle barrel shifter, debug module, catch exceptions, dynamic branch, MMU) ->
    Artix 7     -> 161 Mhz 1985 LUT 1585 FF
    Cyclone V   -> 124 Mhz 1,319 ALMs
    Cyclone IV  -> 122 Mhz 2,710 LUT 1,501 FF

VexRiscv linux balanced (RV32IMA, 1.21 DMIPS/Mhz 2.27 Coremark/Mhz, with cache trashing, 4KB-I$, 4KB-D$, single cycle barrel shifter, catch exceptions, static branch, MMU, Supervisor, Compatible with mainstream linux) ->
    Artix 7     -> 170 Mhz 2530 LUT 2013 FF
    Cyclone V   -> 125 Mhz 1,618 ALMs
    Cyclone IV  -> 116 Mhz 3,314 LUT 2,016 FF

My general guidance on the performance of soft cores is listed in the following table:
[Image: Screenshot from 2020-01-14 14-50-02]

@shioyadan
Member

In the following paper, presented at FPT last year, we report resource usage and Dhrystone values for RSD and some other cores.

http://sv.rsg.ci.i.u-tokyo.ac.jp/pdfs/Mashimo-FPT'19.pdf

Also, we plan to integrate some benchmarks into our repository, and will add such information to the documentation as well.

@mithro
Contributor Author

mithro commented Jan 14, 2020

It looks like you only compared your CPU to BOOM and OPA? Any reason you didn't compare to in-order cores too?

Looks like you get O(2.04 DMIPS/MHz) @ O(90 MHz) using O(15k LUTs) and O(8k FF)? I'm currently assuming you are using an Artix-7 board?

This compares to VexRiscv linux balanced (an in-order core), which gets O(1.2 DMIPS/MHz) @ O(170 MHz) using O(2.5k LUTs) and O(2k FF).

I think it would be really interesting to add RSD to LiteX, which already supports VexRiscv (see #6), to give a fairer comparison on real-world benchmarks.
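
To make the comparison above concrete, here is a minimal sketch that turns the quoted figures into absolute DMIPS (DMIPS/MHz times clock frequency) and DMIPS per 1k LUTs. The RSD numbers are the rounded O(...) estimates from this comment, not exact values from the paper; the VexRiscv numbers are the "linux balanced" Artix 7 row quoted earlier. The `CoreResult` class and its field names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreResult:
    """One synthesis result for a soft core (hypothetical helper type)."""
    name: str
    dmips_per_mhz: float
    fmax_mhz: float
    luts: int
    ffs: int

    @property
    def dmips(self) -> float:
        # Absolute Dhrystone throughput at the reported clock frequency.
        return self.dmips_per_mhz * self.fmax_mhz

    @property
    def dmips_per_kluts(self) -> float:
        # Throughput normalised by area, in DMIPS per 1000 LUTs.
        return self.dmips / (self.luts / 1000)


# Rounded estimates from this thread (assumption: ~15k LUTs, ~8k FFs, ~90 MHz).
rsd = CoreResult("RSD (XC7Z020)", 2.04, 90, 15_000, 8_000)
# "VexRiscv linux balanced" on Artix 7, as quoted from the VexRiscv README.
vex = CoreResult("VexRiscv linux balanced (Artix 7)", 1.21, 170, 2530, 2013)

for core in (rsd, vex):
    print(
        f"{core.name}: {core.dmips:.1f} DMIPS, "
        f"{core.dmips_per_kluts:.1f} DMIPS per 1k LUTs"
    )
```

With these inputs the two cores land at roughly 184 DMIPS (RSD) and 206 DMIPS (VexRiscv), so the higher clock of the in-order core offsets its lower DMIPS/MHz. Dhrystone alone is a narrow benchmark, which is why a LiteX integration running real workloads would be a better basis for comparison.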

@shioyadan
Member

> It looks like you only compared your CPU to BOOM and OPA? Any reason you didn't compare to in-order cores too?

The main reason is that this paper focuses on an efficient implementation of an OoO processor on FPGAs.

(To be honest, another reason was the page limit for the conference paper, and there wasn't enough time before the deadline to perform a thorough evaluation that included a comparison with InO cores.)

Anyway, I would also like to evaluate RSD against other cores using more complex, real-world benchmarks.

> Looks like you get O(2.04 DMIPS/MHz) @ O(90 MHz) using O(15k LUTs) and O(8k FF)?

Yes.

> I'm currently assuming you are using an Artix-7 board?

We used a ZedBoard with an XC7Z020, whose FPGA part seems to correspond to an Artix-7 with 53K LUTs. I'm not an FPGA expert, but one of our team members is, and he may be able to provide additional information about the board.

@msmssm
Member

msmssm commented Jan 17, 2020

RSD currently supports only Zynq-based platforms because it depends on an ARM processor on Zynq to load a program binary for RSD into an external memory.
I believe that both Zynq-7000 and Artix-7 are based on the Xilinx 7-series FPGA architecture and the resource comparison is fair even though one uses Zynq-7000 and the other uses Artix-7.

@Dolu1990

> I believe that both Zynq-7000 and Artix-7 are based on the Xilinx 7-series FPGA

@msmssm I was quite surprised, but it seems that Zynq-7000 can be considerably faster than Artix-7 at the same speed grade.

Looking at their datasheets:
https://www.xilinx.com/support/documentation/data_sheets/ds181_Artix_7_Data_Sheet.pdf
https://www.xilinx.com/support/documentation/data_sheets/ds191-XC7Z030-XC7Z045-data-sheet.pdf

For instance, take TSHCKO in the CLB Distributed RAM Switching Characteristics.
For that timing, the slowest Zynq (speed grade -1) is faster than the fastest Artix-7 (speed grade -3).
The same seems true for LUTs.

For BRAM the gap seems smaller: Zynq -2 is roughly equal to Artix-7 -3.

Would need to test on a whole design to see what the average is ^^

@Dolu1990

Dolu1990 commented May 3, 2022

I just found out that within the same Zynq family there are two classes of devices with radically different timings:

https://docs.xilinx.com/v/u/en-US/ds187-XC7Z010-XC7Z020-Data-Sheet
vs
https://docs.xilinx.com/v/u/en-US/ds191-XC7Z030-XC7Z045-data-sheet

So the XC7Z020 devices used in the paper seem totally equivalent to Artix-7 ones :)
My bad!

@shioyadan
Member

msmssm is unable to respond to this issue for the reasons stated in the email.
Thank you again for your information!

@Dolu1990

@shioyadan Thanks :)
