Cache

Memory-Hierarchy Understanding Tools

These are a few programs that benchmark different memory access patterns and graph the results, to reveal the cache hierarchy of the machine they run on.

I first saw the technique used for generating these latency numbers in exercise 5.2 on page 476 of Computer Architecture: A Quantitative Approach.

The basic idea is that you write a program to walk a contiguous region of memory using different strides and time how long accessing the memory takes. The idea for this exercise comes from the Ph.D. dissertation of Rafael Héctor Saavedra‑Barrera, where he describes the following approach:

Assume that a machine has a cache capable of holding D 4‑byte words, a line size of b words, and an associativity a. The number of sets in the cache is given by D/ab. We also assume that the replacement algorithm is LRU, and that the lowest available address bits are used to select the cache set.

Each of our experiments consists of computing many times a simple floating-point function on a subset of elements taken from a one-dimensional array of N 4-byte elements. This subset is given by the following sequence:

1, s+1, 2s+1, ..., N-s+1

Thus, each experiment is characterized by a particular value of N and s. The stride s allows us to change the rate at which misses are generated, by controlling the number of consecutive accesses to the same cache line, cache set, etc. The magnitude of s varies from 1 to N/2 in powers of two.
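The experiment described above can be sketched in C roughly as follows. This is a minimal illustration, not the code in cache.c: the helper names `time_stride` and `sweep`, the 1024-element starting size, and the use of `clock()` for timing are all assumptions made for the example. Indices are 0-based, so the walk touches 0, s, 2s, ..., N-s, which corresponds to the 1-based sequence quoted above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not taken from cache.c): walk n 4-byte elements at
 * indices 0, s, 2s, ..., repeat the walk reps times, and return the average
 * time per access in nanoseconds. Requires s >= 1. */
static double time_stride(volatile float *a, size_t n, size_t s, int reps)
{
    size_t accesses = (size_t)reps * ((n + s - 1) / s);
    clock_t start = clock();           /* CPU time: coarse, but standard C */

    for (int r = 0; r < reps; r++) {
        for (size_t i = 0; i < n; i += s) {
            a[i] = a[i] + 1.0f;        /* the "simple floating-point function" */
        }
    }

    double ns = (double)(clock() - start) / CLOCKS_PER_SEC * 1e9;
    return accesses ? ns / (double)accesses : 0.0;
}

/* Hypothetical driver: for each array size N (doubling from 1024 elements up
 * to max_elems) and each stride s (powers of two from 1 to N/2), print
 * "array_bytes stride_bytes ns_per_access". Returns the number of points
 * measured, or -1 if the array cannot be allocated. */
static int sweep(FILE *out, size_t max_elems, int reps)
{
    float *a = calloc(max_elems, sizeof *a);
    int points = 0;

    if (a == NULL)
        return -1;
    for (size_t n = 1024; n <= max_elems; n *= 2) {
        for (size_t s = 1; s <= n / 2; s *= 2) {
            fprintf(out, "%zu %zu %.3f\n", n * sizeof *a, s * sizeof *a,
                    time_stride(a, n, s, reps));
            points++;
        }
    }
    free(a);
    return points;
}
```

Calling `sweep(stdout, 1u << 24, 50)` from a `main` function would cover arrays up to 64 MiB. The `volatile` qualifier keeps the compiler from optimizing the loads and stores away. The measured time includes loop overhead, so a more careful version would time an empty loop and subtract it.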

He goes on to note:

Depending on the magnitudes of N and s in a particular experiment, with respect to the size of the cache (D), the line size (b), and the associativity (a), there are four possible regimes of operations; each of these is characterized by the rate at which misses occur in the cache. A summary of the characteristics of the four regimes is given in table 5.1.

And table 5.1 helpfully summarizes the regimes.

Size of Array	Stride	Frequency of Misses	Time per Iteration
1 ≤ N ≤ D	1 ≤ s ≤ N/2	no misses	T
D < N	1 ≤ s < b	one miss every b/s elements	T + Ms/b
D < N	b ≤ s < N/a	one miss every element	T + M
D < N	N/a ≤ s ≤ N/2	no misses	T
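To make the regimes concrete, here is a small C sketch that classifies an (N, s) pair and predicts the time per iteration. The struct and function names are hypothetical, and T (time per iteration with no miss) and M (miss penalty) are supplied by the caller. For the second regime the sketch uses T + M·s/b, which follows from paying one miss penalty M every b/s elements. At s = b that expression equals T + M, so it makes no difference which regime the boundary is assigned to.

```c
#include <stddef.h>

/* Cache parameters in 4-byte words, following the quoted notation:
 * D = capacity, b = line size, a = associativity. */
struct cache_params {
    size_t D, b, a;
};

enum regime {
    REGIME_FITS = 1,       /* array fits in the cache: no misses        */
    REGIME_PARTIAL_MISS,   /* stride below line size: miss every b/s    */
    REGIME_ALL_MISS,       /* every access misses                       */
    REGIME_FEW_LINES       /* so few elements touched that they all fit */
};

/* Number of sets, D / (a * b). */
static size_t num_sets(struct cache_params c)
{
    return c.D / (c.a * c.b);
}

/* Map an experiment (N elements, stride s) onto one of the four regimes. */
static enum regime classify(struct cache_params c, size_t N, size_t s)
{
    if (N <= c.D)
        return REGIME_FITS;
    if (s < c.b)
        return REGIME_PARTIAL_MISS;
    if (s < N / c.a)
        return REGIME_ALL_MISS;
    return REGIME_FEW_LINES;
}

/* Predicted time per iteration, given no-miss time T and miss penalty M. */
static double predicted_time(struct cache_params c, size_t N, size_t s,
                             double T, double M)
{
    switch (classify(c, N, s)) {
    case REGIME_PARTIAL_MISS:
        return T + M * (double)s / (double)c.b;
    case REGIME_ALL_MISS:
        return T + M;
    default:
        return T;
    }
}
```

For example, a 32 KiB, 4-way cache with 64-byte lines would be `{ .D = 8192, .b = 16, .a = 4 }`, giving 128 sets. Comparing these predictions against measured times is one way to read D, b, and a off the plotted curves.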

I thought it would be fun to try it, so I wrote a program to do that.

I ran this program on my machine after rebooting in single-user mode, as Hennessy and Patterson suggest, so that virtual addresses track physical addresses more closely. With a little help from gnuplot, I got images like this:
