Regarding the result of dhrystone with TCM #383
Hi, did you try with the vanilla GenFullNoMmuMaxPerf config?
Hi,
After downloading the bitstream to the FPGA and running the program in release mode, the benchmark result is 1.33 DMIPS/MHz.
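When comparing these figures, it can help to convert them to clock cycles per Dhrystone run. Below is a minimal Scala sketch, assuming the usual normalisation of 1757 Dhrystones/s = 1 DMIPS (VAX 11/780). The object and method names are hypothetical, and the run and cycle counts in `main` are made-up illustration values, not measurements from this board:

```scala
// Hypothetical helper for converting Dhrystone measurements, not part of VexRiscv.
object DmipsPerMhz {
  // Dhrystone results are conventionally normalised to the VAX 11/780,
  // which scored 1757 Dhrystones per second (= 1 DMIPS).
  val VaxDhrystonesPerSecond = 1757.0

  /** DMIPS/MHz from the number of Dhrystone runs and the clock cycles they took.
    * Dhrystones/s at f MHz is runs * f * 1e6 / cycles, so f cancels after dividing by f. */
  def fromCycles(runs: Long, cycles: Long): Double =
    runs.toDouble * 1e6 / (cycles.toDouble * VaxDhrystonesPerSecond)

  /** Clock cycles per Dhrystone run implied by a given DMIPS/MHz figure. */
  def cyclesPerRun(dmipsPerMhz: Double): Double =
    1e6 / (dmipsPerMhz * VaxDhrystonesPerSecond)

  def main(args: Array[String]): Unit = {
    // Hypothetical measurement, for illustration only
    println(f"${fromCycles(100000L, 42800000L)}%.2f DMIPS/MHz")
    // Cycles per run implied by the two figures quoted in this thread
    println(f"1.33 DMIPS/MHz -> ${cyclesPerRun(1.33)}%.1f cycles/run")
    println(f"1.38 DMIPS/MHz -> ${cyclesPerRun(1.38)}%.1f cycles/run")
  }
}
```

With this conversion, 1.33 DMIPS/MHz works out to about 428 cycles per run and 1.38 DMIPS/MHz to about 412, so the gap under discussion is roughly 15 to 16 cycles per run.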
Hi, I looked at the code, and I think I found the reason why:
Basically, the data cache has the advantage that writes are delayed until the writeback stage, while the tightly coupled dbus has the penalty that writes are scheduled early (in the execute stage), so it must ensure there is no risk of them being cancelled by a branch, an exception, or anything else. So the tightly coupled dbus will sometimes have to wait for the pipeline to empty itself (when doing a store).
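To make the store penalty concrete, here is a toy Scala timing model; it is not VexRiscv's actual hazard logic. It assumes one instruction per cycle, two stages (memory, writeback) after execute, and a hypothetical `mayCancel` flag on older instructions that could still flush a younger store. With `earlyStores = true` a store waits in execute until every such older instruction has left writeback; with `earlyStores = false` (write delayed to writeback, as with the data cache) it never stalls:

```scala
// Toy, hypothetical timing model: not VexRiscv's real hazard logic.
object StoreStallModel {
  /** One instruction; `mayCancel` marks an older instruction that could still
    * flush younger ones (exception, late branch, ...) after it leaves execute. */
  final case class Instr(name: String, isStore: Boolean = false, mayCancel: Boolean = false)

  // Assumption: two stages (memory, writeback) follow execute, one cycle each.
  val StagesAfterExecute = 2

  /** Cycle at which each instruction enters execute, assuming one instruction per cycle.
    * If `earlyStores` is true, a store issued from execute (tightly coupled dbus) waits
    * until every older `mayCancel` instruction has left writeback. If false, the write
    * is delayed to writeback (data cache case) and the store never stalls in execute. */
  def executeTimes(prog: Seq[Instr], earlyStores: Boolean): Vector[Int] =
    prog.zipWithIndex.foldLeft(Vector.empty[Int]) { case (times, (instr, i)) =>
      val inOrder = if (times.isEmpty) 0 else times.last + 1
      val t =
        if (earlyStores && instr.isStore) {
          // Earliest cycle at which each older cancellable instruction has retired
          val drained = (0 until i).collect {
            case j if prog(j).mayCancel => times(j) + StagesAfterExecute + 1
          }
          (inOrder +: drained).max
        } else inOrder
      times :+ t
    }

  /** Extra cycles caused by issuing stores early, versus delaying them to writeback. */
  def stallCycles(prog: Seq[Instr]): Int =
    if (prog.isEmpty) 0
    else executeTimes(prog, earlyStores = true).last - executeTimes(prog, earlyStores = false).last

  def main(args: Array[String]): Unit = {
    val prog = Seq(
      Instr("lw", mayCancel = true), // assumption: a load may still fault after execute
      Instr("sw", isStore = true),
      Instr("addi"),
      Instr("sw", isStore = true)
    )
    println(s"extra cycles: ${stallCycles(prog)}")
  }
}
```

Only stores that closely follow a `mayCancel` instruction pay the penalty, which matches the "sometimes" above: in this model a store right after a cancellable load stalls for two cycles, while a store after an `addi` with nothing cancellable in flight does not stall.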
Hi, thanks for the reply.
Hi, @Dolu1990, may I ask one more question? First, I changed the divider configuration to:
//new DivPlugin,
new MulDivIterativePlugin(genMul = false, genDiv = true, mulUnrollFactor = 1, divUnrollFactor = 2, dhrystoneOpt=true),
With this change, the benchmark result improves. Second, when I set genMul = true and mulUnrollFactor = 2 to replace MulPlugin:
//new MulPlugin,
//new DivPlugin,
new MulDivIterativePlugin(genMul = true, genDiv = true, mulUnrollFactor = 2, divUnrollFactor = 2, dhrystoneOpt=true),
the benchmark result decreases to 1.33 MIPS.
Thanks
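Here is a back-of-envelope way to look at the second result, under assumptions this thread does not confirm: that `MulPlugin` is a pipelined multiplier that does not stall on each multiply, and that the iterative unit resolves `mulUnrollFactor` (or `divUnrollFactor`) bits per cycle on 32-bit operands, ignoring any fixed setup cycles. The object and its methods are hypothetical, and the operation counts in `main` are illustrative only:

```scala
// Back-of-envelope sketch under stated assumptions, not measured VexRiscv timings.
object IterativeMulDivEstimate {
  // Assumption: 32-bit operands; the iterative unit resolves `unrollFactor` bits per cycle.
  val Xlen = 32

  /** Cycles spent iterating on one multiply or divide (fixed overhead ignored). */
  def iterationCycles(unrollFactor: Int): Int = {
    require(unrollFactor > 0, "unroll factor must be positive")
    (Xlen + unrollFactor - 1) / unrollFactor // ceiling division
  }

  /** Extra cycles per benchmark run caused by iterative multiplies and divides. */
  def extraCyclesPerRun(mulsPerRun: Int, mulUnroll: Int, divsPerRun: Int, divUnroll: Int): Int =
    mulsPerRun * iterationCycles(mulUnroll) + divsPerRun * iterationCycles(divUnroll)

  def main(args: Array[String]): Unit = {
    // Hypothetical operation count, for illustration only
    val perMul = extraCyclesPerRun(mulsPerRun = 1, mulUnroll = 2, divsPerRun = 0, divUnroll = 2)
    println(s"iterative multiply with mulUnrollFactor = 2: ~$perMul cycles per multiply")
  }
}
```

Under these assumptions each multiply with mulUnrollFactor = 2 occupies about 16 cycles, where a pipelined multiplier would not stall, which would be consistent with the score dropping when MulPlugin is replaced.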
I would say it's not really useful, as it only works for very small division operands.
Yes, at least in practice on FPGA.
Hi Dolu1990,
I generated an ITCM and a DTCM, each 16 KB in size.
This allows all of the program code and data to be loaded into the TCMs.
I want the whole test to run from TCM.
I modified only one line of code in the Dhrystone project, because CLK_TCK seems to be obsolete.
I used GenFullNoMmuMaxPerf.scala as the template for the VexRiscv configuration in this test.
Ryan.zip
After programming the bitstream into the FPGA (Artix-7) and running the Dhrystone project, the results were printed in the terminal.
The result does not seem to reach the figure given in the VexRiscv GitHub README.
Because the whole program runs from TCM, I expected the result to reach at least 1.38 DMIPS/MHz.
Could you give me any suggestions to improve the benchmark result?
Thanks