# PyChe MPI + Cython Benchmark Notebook

This notebook shows how to benchmark `backend='auto'` vs `backend='cython'` with MPI.

Run from the repository root (`PyChe/`).

## 1) Build Cython extensions (once per code change)

```bash
pip install cython
python setup.py build_ext --inplace
```

In [None]:
import os
import re
import subprocess
from statistics import median


In [None]:
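def run_case(backend: str, ranks: int = 8, endoftime: int = 13700, run_id: str = 'bench'):
    """Run one MPI benchmark case and return (timings, raw stdout).

    Timings are parsed from the 'timing profile (s)' line in the captured stdout.
    """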
    cmd = [
        'mpiexec', '-n', str(ranks), 'python', '-c',
        (
            "from pyche import GCEModel; "
            "m=GCEModel(); "
            f"m.GCE({endoftime},3000.0,50.0,0.3,0.0,10000,10000,"
            "use_mpi=True,show_progress=False,"
            f"backend='{backend}',"
            f"output_dir='benchmarks/{run_id}_{backend}',"
            "output_mode='dataframe',df_binary_format='pickle',df_write_csv=False,"
            "profile_timing=True)"
        )
    ]
    p = subprocess.run(cmd, capture_output=True, text=True, check=True)
    out = p.stdout
    m = re.search(r"timing profile \(s\): total=([0-9.]+), interp=([0-9.]+), mpi_reduce=([0-9.]+), death=([0-9.]+), wind=([0-9.]+), output=([0-9.]+), other=([0-9.]+)", out)
    if not m:
        raise RuntimeError('Could not parse timing profile from output\n' + out)
    keys = ['total','interp','mpi_reduce','death','wind','output','other']
    vals = {k: float(v) for k, v in zip(keys, m.groups())}
    return vals, out
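
The parsing step in `run_case` can be checked without launching MPI. The sketch below is a minimal, standalone version of that step. The helper name `parse_timing` is hypothetical, the log-line layout is inferred from the regular expression above, and the sample numbers are made up for illustration.

```python
import re

# Same field order as the pattern used in run_case.
KEYS = ['total', 'interp', 'mpi_reduce', 'death', 'wind', 'output', 'other']
PATTERN = re.compile(
    r"timing profile \(s\): " + ", ".join(rf"{k}=([0-9.]+)" for k in KEYS)
)


def parse_timing(text: str) -> dict:
    """Extract per-phase timings (seconds) from captured stdout."""
    m = PATTERN.search(text)
    if not m:
        raise ValueError('no timing profile line found')
    return dict(zip(KEYS, map(float, m.groups())))


# Made-up line for illustration only (not a measured result).
sample = ("timing profile (s): total=10.5, interp=4.0, mpi_reduce=1.5, "
          "death=2.0, wind=1.0, output=1.5, other=0.5")
parsed = parse_timing(sample)
```

Building the pattern from `KEYS` keeps the field names and the capture groups in one place, so a renamed phase only needs one change.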


In [None]:
def run_repeats(backend: str, repeats: int = 3, ranks: int = 8):
    """Run `repeats` cases for one backend; return (all samples, per-phase medians)."""
    samples = []
    for i in range(repeats):
        vals, _ = run_case(backend=backend, ranks=ranks, run_id=f'r{i+1}')
        samples.append(vals)
    med = {k: median([s[k] for s in samples]) for k in samples[0].keys()}
    return samples, med
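
A median hides how much the repeats disagree. As a rough complement to `run_repeats`, the sketch below reports the spread for one phase. `summarize` is a hypothetical helper, not part of PyChe, and it assumes the list-of-dicts shape that `run_repeats` returns as `samples`.

```python
from statistics import median


def summarize(samples: list, key: str = 'total') -> dict:
    """Median, min, max and relative range of one timing key across repeats."""
    vals = [s[key] for s in samples]
    med = median(vals)
    spread = max(vals) - min(vals)
    return {
        'median': med,
        'min': min(vals),
        'max': max(vals),
        # Range as a fraction of the median; NaN if the median is zero.
        'rel_range': spread / med if med else float('nan'),
    }


# Made-up samples for illustration only.
demo = summarize([{'total': 10.0}, {'total': 12.0}, {'total': 11.0}])
```

A large `rel_range` suggests the node was noisy and that more repeats may be needed before comparing backends.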


In [None]:
# Example: 3 repeats for auto and cython
auto_samples, auto_med = run_repeats('auto', repeats=3, ranks=8)
cy_samples, cy_med = run_repeats('cython', repeats=3, ranks=8)
auto_med, cy_med


In [None]:
speedup = auto_med['total'] / cy_med['total']
print('Median total auto:', auto_med['total'])
print('Median total cython:', cy_med['total'])
print('Speedup (auto time / cython time; >1 means cython is faster):', speedup)
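
The same ratio can be taken for every phase to see where the difference comes from. This is a minimal sketch. `phase_speedups` is a hypothetical helper that accepts two median dictionaries such as `auto_med` and `cy_med`, and the demo values are made up.

```python
def phase_speedups(baseline: dict, candidate: dict) -> dict:
    """Return baseline/candidate time ratio per phase; >1 means candidate is faster."""
    ratios = {}
    for key, base in baseline.items():
        cand = candidate.get(key)
        # Guard against a missing or zero-time phase.
        ratios[key] = base / cand if cand else float('nan')
    return ratios


# Made-up medians for illustration only.
demo_ratios = phase_speedups({'total': 10.0, 'interp': 4.0},
                             {'total': 5.0, 'interp': 1.0})
```

In the notebook this would be called as `phase_speedups(auto_med, cy_med)`.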


## Notes

- Keep `show_progress=False` for cleaner timings.
- Compare medians (not single runs) due to cluster jitter.
- For high-fidelity comparisons, pin CPU affinity and run on a quiet node.
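
One way to pin ranks is to pass a binding option to the MPI launcher. The sketch below builds the command list with optional extra launcher arguments. `build_mpi_cmd` is a hypothetical helper, and the binding flag is an assumption that depends on the MPI implementation: Open MPI accepts `--bind-to core`, while MPICH's Hydra launcher uses `-bind-to core`. Check your launcher's documentation before relying on either.

```python
def build_mpi_cmd(ranks: int, code: str, extra_mpi_args=()) -> list:
    """Build an mpiexec command; extra_mpi_args go before the program name."""
    return ['mpiexec', '-n', str(ranks), *extra_mpi_args, 'python', '-c', code]


# Assumed Open MPI style flag; adjust for your launcher.
cmd = build_mpi_cmd(4, 'pass', extra_mpi_args=['--bind-to', 'core'])
```

Keeping the launcher options in a separate argument makes it easy to benchmark with and without binding using otherwise identical commands.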