# ChampSim analysis: pointer chasing vs streaming

This notebook parses **local** ChampSim outputs in `../results/` (which are intentionally *not* tracked in git).

It produces:
- Cache hit/miss rates by hierarchy level (when available in output)
- A simple comparison dashboard across workloads
- Extra recommended metric: MPKI (misses per 1,000 instructions), plus notes on memory-level parallelism for builds whose output reports it

Plots are saved locally under `../results/plots/`.

## Import required libraries

Import the necessary libraries for parsing results and plotting.

> This repo writes ChampSim text outputs to `../results/<workload>/sim.txt` and saves plots to `../results/plots/` (all ignored by git).

In [None]:
from pathlib import Path
import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

ROOT = Path('..').resolve()
RESULTS_DIR = ROOT / 'results'
PLOTS_DIR = RESULTS_DIR / 'plots'
PLOTS_DIR.mkdir(parents=True, exist_ok=True)

print('RESULTS_DIR:', RESULTS_DIR)
print('Result subdirs:', [p.name for p in RESULTS_DIR.glob('*') if p.is_dir() and p.name != 'plots'])

## Parsing notes

ChampSim output formats can vary by version/config. This parser is **best-effort** and looks for common patterns like:
- `L1D` / `L2C` / `LLC` hit/miss and hit rate
- total instructions and misses to compute MPKI

If your `sim.txt` uses different labels, tweak the regex table in the next cell.

In [None]:
LEVELS = ['L1I', 'L1D', 'L2C', 'LLC']

# Best-effort regex patterns (ChampSim output varies by version/config).
patterns = {
    'hit_rate': [
        re.compile(r'^(?P<level>L1I|L1D|L2C|LLC)\s+HIT\s+RATE\s*[:=]\s*(?P<value>[0-9.]+)%', re.I),
        re.compile(r'^(?P<level>L1I|L1D|L2C|LLC).*hit\s*rate\s*[:=]\s*(?P<value>[0-9.]+)%', re.I),
    ],
    'hits': [
        # `hits?` accepts both `hits: N` and `HIT: N`, mirroring the `miss(?:es)?` pattern below
        re.compile(r'^(?P<level>L1I|L1D|L2C|LLC).*\bhits?\b\s*[:=]\s*(?P<value>\d+)', re.I),
    ],
    'misses': [
        re.compile(r'^(?P<level>L1I|L1D|L2C|LLC).*\bmiss(?:es)?\b\s*[:=]\s*(?P<value>\d+)', re.I),
    ],
    'instructions': [
        re.compile(r'\b(instructions|instrs)\b\s*[:=]\s*(?P<value>\d+)', re.I),
    ],
}

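def parse_sim_text(text: str) -> dict:
    """Best-effort parse of one sim.txt into a {(group, metric): value} dict.

    Unmatched metrics stay NaN. If several lines match the same metric,
    the last matching line wins.
    """
    out = {}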
    for lvl in LEVELS:
        out[(lvl, 'hit_rate_pct')] = np.nan
        out[(lvl, 'hits')] = np.nan
        out[(lvl, 'misses')] = np.nan
    out[('overall', 'instructions')] = np.nan

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue

        for rx in patterns['hit_rate']:
            m = rx.search(line)
            if m:
                out[(m.group('level').upper(), 'hit_rate_pct')] = float(m.group('value'))
                break

        for rx in patterns['hits']:
            m = rx.search(line)
            if m:
                out[(m.group('level').upper(), 'hits')] = float(m.group('value'))
                break

        for rx in patterns['misses']:
            m = rx.search(line)
            if m:
                out[(m.group('level').upper(), 'misses')] = float(m.group('value'))
                break

        for rx in patterns['instructions']:
            m = rx.search(line)
            if m:
                out[('overall', 'instructions')] = float(m.group('value'))
                break

    # derive hit rate from hits/misses if missing
    for lvl in LEVELS:
        hr = out[(lvl, 'hit_rate_pct')]
        hits = out[(lvl, 'hits')]
        misses = out[(lvl, 'misses')]
        if (np.isnan(hr) or hr == 0) and (not np.isnan(hits)) and (not np.isnan(misses)) and (hits + misses > 0):
            out[(lvl, 'hit_rate_pct')] = 100.0 * hits / (hits + misses)

    # MPKI: prefer deeper level misses if present
    instr = out[('overall', 'instructions')]
    miss_candidates = [out[(lvl, 'misses')] for lvl in ['LLC', 'L2C', 'L1D']]
    miss = next((m for m in miss_candidates if not np.isnan(m)), np.nan)
    out[('overall', 'mpki')] = (1000.0 * miss / instr) if (not np.isnan(instr) and instr > 0 and not np.isnan(miss)) else np.nan

    return out

def load_results(results_dir: Path) -> pd.DataFrame:
    rows = []
    for sub in sorted([p for p in results_dir.glob('*') if p.is_dir() and p.name != 'plots']):
        sim_txt = sub / 'sim.txt'
        if not sim_txt.exists():
            continue
        d = parse_sim_text(sim_txt.read_text(errors='replace'))
        d[('workload', 'name')] = sub.name
        rows.append(d)

    if not rows:
        return pd.DataFrame()

    df = pd.DataFrame(rows)
    df.columns = pd.MultiIndex.from_tuples(df.columns)
    return df

df = load_results(RESULTS_DIR)
df
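
If the table above comes back mostly NaN, the labels in your `sim.txt` probably differ from what the regex table expects. As a hedged sketch, assume your build prints per-level summary rows shaped like `LLC TOTAL ACCESS: … HIT: … MISS: …` followed by per-type rows (`LOAD`, `RFO`, …); this layout is an assumption, so check your own file first. The snippet reads only the `TOTAL` row for each level, so later per-type rows cannot overwrite it. `TOTAL_ROW`, `parse_total_rows`, and the sample text are hypothetical and made up for illustration.

```python
import re

# Assumed row layout (verify against your own sim.txt):
#   LLC TOTAL     ACCESS:       1000  HIT:        250  MISS:        750
# An optional prefix such as "cpu0_" or "cpu0->" before the level is tolerated.
TOTAL_ROW = re.compile(
    r'^(?:\S*?(?:_|->))?(?P<level>L1I|L1D|L2C|LLC)\s+TOTAL\s+'
    r'ACCESS:\s*(?P<access>\d+)\s+HIT:\s*(?P<hits>\d+)\s+MISS:\s*(?P<misses>\d+)',
    re.I,
)

def parse_total_rows(text: str) -> dict:
    """Return {level: {'access', 'hits', 'misses'}} using TOTAL rows only."""
    stats = {}
    for line in text.splitlines():
        m = TOTAL_ROW.match(line.strip())
        if m:
            level = m.group('level').upper()
            stats[level] = {k: int(m.group(k)) for k in ('access', 'hits', 'misses')}
    return stats

# Made-up sample text, not real simulator output.
sample = """
LLC TOTAL     ACCESS:       1000  HIT:        250  MISS:        750
LLC LOAD      ACCESS:        800  HIT:        200  MISS:        600
cpu0_L1D TOTAL     ACCESS:       5000  HIT:       4500  MISS:        500
"""
print(parse_total_rows(sample))
```

If your rows look different, only `TOTAL_ROW` should need to change; the loop stays the same.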

## Cache hit/miss rate plots

If ChampSim printed hit rates per level, we plot them. If it printed hits/misses only, we derive hit rate.

In [None]:
if df.empty:
    raise RuntimeError('No results found. Run scripts/run_traces.sh successfully first (results/*/sim.txt).')

# Flatten to tidy form for plotting
rows = []
for _, r in df.iterrows():
    w = r[('workload', 'name')]
    for lvl in LEVELS:
        hr = r.get((lvl, 'hit_rate_pct'), np.nan)
        hits = r.get((lvl, 'hits'), np.nan)
        misses = r.get((lvl, 'misses'), np.nan)
        if np.isnan(hr) and np.isnan(hits) and np.isnan(misses):
            continue
        rows.append({
            'workload': w,
            'level': lvl,
            'hit_rate_pct': hr,
            'hits': hits,
            'misses': misses,
        })
tidy = pd.DataFrame(rows)
tidy

In [None]:
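# Fail early with a readable message if the parser matched no cache statistics
if tidy.empty:
    raise RuntimeError('No cache statistics were parsed; adjust the regex table to match your sim.txt.')

sns.set_theme(style='whitegrid')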
plt.figure(figsize=(9, 4))
ax = sns.barplot(data=tidy, x='level', y='hit_rate_pct', hue='workload')
ax.set_ylabel('Hit rate (%)')
ax.set_xlabel('Cache level')
ax.set_title('Cache hit rate by level')
plt.legend(title='Workload', bbox_to_anchor=(1.02, 1), loc='upper left')
plt.tight_layout()
out = PLOTS_DIR / 'hit_rate_by_level.png'
plt.savefig(out, dpi=160)
out

## Recommended extra metric: MPKI

MPKI = (misses / instructions) × 1000. Because it is normalized to instructions rather than to cache accesses, it often spikes for pointer-chasing workloads even if the raw hit rate looks ‘okay’.
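
To see why hit rate and MPKI can disagree, here is a small worked sketch with made-up numbers (not measured results). Two hypothetical workloads have the same hit rate at some cache level, but one issues far more accesses per instruction, so its MPKI is much higher. `mpki_from_counts` and `hit_rate_from_counts` are illustrative helpers, not part of the parser above.

```python
def hit_rate_from_counts(hits: int, misses: int) -> float:
    """Hit rate in percent, computed per access."""
    return 100.0 * hits / (hits + misses)

def mpki_from_counts(misses: int, instructions: int) -> float:
    """Misses per 1,000 instructions."""
    return 1000.0 * misses / instructions

# Made-up counts for illustration; both workloads retire 1,000,000 instructions.
workloads = {
    'workload_a': {'hits': 45_000, 'misses': 5_000, 'instructions': 1_000_000},
    'workload_b': {'hits': 360_000, 'misses': 40_000, 'instructions': 1_000_000},
}

for name, c in workloads.items():
    hr = hit_rate_from_counts(c['hits'], c['misses'])
    m = mpki_from_counts(c['misses'], c['instructions'])
    print(f'{name}: hit rate = {hr:.1f}%, MPKI = {m:.1f}')
```

Both workloads show a 90% hit rate, yet `workload_b` has eight times the MPKI of `workload_a`, which is why the two metrics are worth plotting side by side.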

In [None]:
mpki = df[[('workload','name'), ('overall','mpki')]].copy()
mpki.columns = ['workload', 'mpki']
mpki = mpki.sort_values('workload')
mpki

In [None]:
plt.figure(figsize=(7, 3.5))
ax = sns.barplot(data=mpki, x='workload', y='mpki')
ax.set_ylabel('MPKI')
ax.set_xlabel('Workload')
ax.set_title('MPKI (estimated from deepest available misses)')
plt.xticks(rotation=20, ha='right')
plt.tight_layout()
out = PLOTS_DIR / 'mpki.png'
plt.savefig(out, dpi=160)
out

## About ‘parallel operations of add’

Your two C programs are single-threaded, so there is no thread-level parallelism in the software itself.

What we *can* analyze from the memory system perspective is **memory-level parallelism (MLP)**: multiple outstanding cache misses that overlap. Some ChampSim configurations report MSHR occupancy / average miss latency / outstanding misses; if your output includes these, we can add them here.
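
As a rough sketch of what such an estimate could look like, Little's law says the mean number of requests in flight equals the arrival rate times the mean time each request spends in the system. Applied to cache misses, that gives an average count of outstanding misses over the whole run. The helper name `estimate_outstanding_misses` and all inputs are hypothetical; in particular, the sketch assumes your output reports an average miss latency in cycles, which may not be the case. Note that this averages over every cycle, including cycles with no miss in flight, so it is a lower bound on MLP as conventionally defined (measured only over cycles with at least one miss outstanding).

```python
def estimate_outstanding_misses(misses: int,
                                avg_miss_latency_cycles: float,
                                total_cycles: int) -> float:
    """Little's law: mean outstanding misses = (misses / cycles) * mean miss latency."""
    if total_cycles <= 0:
        raise ValueError('total_cycles must be positive')
    return misses * avg_miss_latency_cycles / total_cycles

# Made-up inputs for illustration only.
overlapped = estimate_outstanding_misses(misses=20_000, avg_miss_latency_cycles=200.0,
                                         total_cycles=1_000_000)
serialized = estimate_outstanding_misses(misses=10_000, avg_miss_latency_cycles=200.0,
                                         total_cycles=2_000_000)
print('overlapped misses :', overlapped)
print('serialized misses :', serialized)
```

A value near 1 or below suggests misses are mostly serviced one at a time, as you would expect from a dependent pointer chain; a value well above 1 suggests independent misses are overlapping.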

If you want, I can extend the parser once we see the exact `sim.txt` format produced by your ChampSim build.