
Cache and BP Disable Possible Way #85

Closed
YuanPol opened this issue Mar 20, 2024 · 6 comments
YuanPol commented Mar 20, 2024

Hello, I want to ask if there is an easy way to disable the I/D cache and BP. It seems some parameters of the cache and BP can be adjusted in the configuration file, but they cannot be disabled entirely. Thank you so much :-)

@shioyadan (Member)

Hello,

Basically, it is difficult to disable the I-cache, D-cache, or branch predictor in a simple way, and manual modification is required.

Simply disabling the I-cache and D-cache will greatly reduce the speed of RSD, since each access will reach the main memory. It is possible to replace the cache with a simple, one-cycle-accessible memory with some manual modifications.

Disabling branch prediction is more complicated, because it is not obvious what "disabling" should mean. It is relatively easy to rewrite the predictor so that it always predicts not-taken. It is much harder to modify the fetcher so that it halts instruction fetching at every branch instruction until the branch resolves, and doing so causes significant performance degradation.

If you can tell me what your goal is, I may be able to suggest a better alternative.


YuanPol commented Mar 26, 2024

Hello,

Thanks for your reply. I am developing a performance simulator based on a timing database. Accurate cycle-latency information obtained with Verilator will be used to build this database, and the cycle counts estimated by my simulator will finally be compared against the Verilator results. Unfortunately, because of limited time, I have not been able to add a cache model to my simulator, so the final performance results are not comparable with the Verilator results.

Adding a memory and reconnecting some ports would be one possible way. I am writing to ask whether there is a configuration that can disable the cache simply, for example by writing some control registers. If so, less effort would be needed on my side.

Thank you so much! :-)

@shioyadan (Member)

In your case, a possible solution is to significantly increase the cache size. With such a setup, cache misses will not occur except on the first access to each line. In particular, benchmarks like CoreMark and Dhrystone, which repeatedly run the same loop, will almost always hit the cache from the second run onwards. By comparing the performance of running the loop twice with that of running it once, you should obtain results similar to a scenario where every access is a cache hit.
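The arithmetic behind this trick could be sketched as follows. The cycle counts here are hypothetical placeholders, not real RSD measurements:

```python
def warm_cycles(cycles_one_pass, cycles_two_passes):
    """Estimate the cycle count of a cache-warm pass of the benchmark loop.

    The first pass pays all compulsory misses into the (oversized) caches;
    the second pass, obtained as the difference between the two measured
    runs, hits the caches on almost every access.
    """
    return cycles_two_passes - cycles_one_pass

# Hypothetical measurements from two Verilator runs:
# one pass of the loop, then two passes back to back.
print(warm_cycles(12_500, 22_800))  # cycles of the cache-warm second pass
```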


YuanPol commented Mar 26, 2024

Thank you for your suggestion, but it doesn't work in my case, because the granularity of my timing database is instruction blocks of only hundreds of instructions. It does not depend on the real context of the full program execution. If you have any other ideas, please tell me. If not, I will close this issue :-)


YuanPol commented Apr 1, 2024

Thank you for your suggestions. Now I will close it :)

YuanPol closed this as completed Apr 1, 2024
@shioyadan (Member)

I'm sorry for the late reply.

Have you tried pipeline visualization with Konata? The logs for the pipeline visualizer contain most of the information for each cycle of the core pipeline.

If you want per-instruction-block statistics, analyzing the log may help you. From this log, you can see when each instruction was fetched and committed.

The following document describes the format of the log. RSD's "make kanata" will generate a log for Konata; see RSD's README. (Be careful: the make target is spelled "kanata".)
https://github.com/shioyadan/Konata/blob/master/docs/kanata-log-format.md

By running CoreMark more than once and extracting the information from the second run onwards, as I suggested above, you may be able to determine the number of execution cycles spent in each instruction block when most accesses hit the caches.

By the way, I think defining the number of cycles consumed by a fine-grained instruction block is itself not easy. For example, if you use the difference in commit cycles, the result may not reflect the effect of instruction cache misses.
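As a sketch of how such per-instruction statistics could be extracted, the snippet below parses a small subset of the Kanata log commands (`C=`, `C`, `S`, `R`) and reports, for each committed instruction, the cycle at which its fetch stage began and the cycle at which it retired. The field layout follows the format document linked above; the fetch stage name `F` is an assumption about how RSD labels its stages, so treat this as an illustration rather than a complete parser:

```python
def fetch_commit_cycles(lines):
    """Return {instruction_id: (fetch_start_cycle, commit_cycle)}.

    Handles only a subset of the Kanata log commands:
      C= <cycle>          set the current cycle
      C <n>               advance the current cycle by n
      S <id> <lane> <st>  stage start (stage name 'F' assumed to be fetch)
      R <id> <rid> <type> retirement (type 0 = commit)
    Fields are tab-separated; other commands (Kanata, I, L, E, W) are skipped.
    """
    cycle = 0
    fetch = {}   # id -> cycle at which the fetch stage began
    commit = {}  # id -> cycle at which the instruction committed
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if f[0] == "C=":
            cycle = int(f[1])
        elif f[0] == "C":
            cycle += int(f[1])
        elif f[0] == "S" and f[3] == "F":
            fetch[int(f[1])] = cycle
        elif f[0] == "R" and int(f[3]) == 0:
            commit[int(f[1])] = cycle
    # Keep only instructions observed both at fetch and at commit.
    return {i: (fetch[i], commit[i]) for i in commit if i in fetch}
```

From the resulting map, the commit-cycle difference between the first and last instruction of a block gives one (imperfect, as noted above) definition of the cycles consumed by that block.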
