In [None]:
import math
import matplotlib.pyplot as plt
import numpy as np

Hands-on with the ring test model
=================================

This notebook provides the building blocks for exploring how various NEURON and CoreNEURON options affect performance, using the ring test model (`ringtest.py`).

This model uses a custom MOD file (`halfgap.mod`), so we must first build a `special` executable that includes it:

In [None]:
!nrnivmodl mod

Now we can run the `ringtest.py` script, passing any options we want to:

In [None]:
!x86_64/special -python ringtest.py -nt 1

The above command executed in around 0.2s on the author's machine.
**Is that good?** *The author has no idea...*

A single runtime says little in isolation. When examining the performance of a new model, an existing model on a new system, or a new software version, we need to look at trends and comparisons.

To illustrate this, we will run the same model using different numbers of CPU threads.
The thread count is controlled by the `-nt` option to `ringtest.py`.

In [None]:
def ringtest(*args, mpi=None, repeat=3):
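    """Run ringtest.py `repeat` times and return the measured runtimes.

    Positional arguments are passed on to ringtest.py; `mpi=N` launches N MPI ranks.
    TODO: update ringtest.py to write these somewhere and avoid regexing
    """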
    import re
    from subprocess import check_output, STDOUT

    def run():
        cmd = []
        if mpi is not None:
            cmd += ["mpiexec", "-n", str(mpi)]
        cmd.append("./x86_64/special")
        if mpi is not None:
            cmd.append("-mpi")
        cmd += ["-python", "ringtest.py"]
        cmd += [str(x) for x in args]
        out = check_output(
            cmd,
            shell=False,
            stderr=STDOUT,
            text=True,
        )
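        # Raw string avoids the invalid "\." escape warning; "." is literal inside [...]
        m = re.search(r"runtime=([0-9.]+)", out)
        assert m, "could not find 'runtime=...' in the ringtest.py output"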
        return {
            "runtime": float(m.group(1)),
        }

    # run the measurements `repeat` times, to get a basic uncertainty estimate
    data = [run() for _ in range(repeat)]
    return {k: np.array([d[k] for d in data]) for k in data[0].keys()}


def pows_of_2(max):
    """Given a power of 2 (e.g. 16) return all powers of 2 from 1 to there.

    e.g. pows_of_2(16) -> [1, 2, 4, 8, 16]."""
    return [2**n for n in range(int(math.log2(max)) + 1)]


# Save performance data for several different thread counts from 1 to 16
thread_data = {
    "data": {nt: ringtest("-nt", nt) for nt in pows_of_2(max=16)},
    "label": "Thread parallelism",
}

Now that we have gathered the simulation runtimes for different thread counts, we can plot them:

In [None]:
def scaling_plot(data_dicts, ideal_count=8):
    """
    Given a list of dicts containing scaling data, plot them on a common scale.
    Also include an idealised scaling curve for `ideal_count` processors.
    """
    plt.figure()
    plt.xscale("log", base=2)
    plt.xlabel("Thread / MPI rank count")
    plt.ylabel("Simulation runtime [s]")
    all_x, all_y0 = set(), set()
    for data_dict in data_dicts:
        # e.g. one data_dict for multi-threaded measurements
        xvals = sorted(data_dict["data"].keys())
        yvals, yerrs_low, yerrs_high = [], [], []
        for nt in xvals:
            runtime_measurements = data_dict["data"][nt]["runtime"]
            yvals.append(runtime_measurements.mean())
            yerrs_low.append(max(0, yvals[-1] - runtime_measurements.min()))
            yerrs_high.append(max(0, runtime_measurements.max() - yvals[-1]))
            all_x.add(nt)
            all_y0.add(yvals[-1])
        plt.errorbar(
            xvals, yvals, yerr=(yerrs_low, yerrs_high), label=data_dict["label"]
        )
    # Also draw an idealised perfect-scaling curve for a machine with `ideal_count` cores
    xvals = sorted(all_x)
    plt.plot(
        xvals,
        max(all_y0) / np.minimum(xvals, ideal_count),
        label="Ideal {} cores".format(ideal_count),
    )
    plt.legend()
    return plt.show()


scaling_plot([thread_data])

Based on this, **how many CPU cores do you think this machine has available?**
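
Reading the answer off the plot can be easier with numbers alongside it. The following is a minimal sketch that converts mean runtimes into speedup and parallel efficiency relative to the smallest worker count. The function name `speedup_and_efficiency` and the example runtimes are assumptions made for illustration; they are not part of `ringtest.py` and the numbers are not measurements.

```python
def speedup_and_efficiency(mean_runtimes):
    """Map each worker count to (speedup, efficiency) relative to the smallest count.

    `mean_runtimes` maps a thread / MPI rank count to a mean runtime in seconds.
    """
    counts = sorted(mean_runtimes)
    base_count = counts[0]
    base_runtime = mean_runtimes[base_count]
    result = {}
    for n in counts:
        speedup = base_runtime / mean_runtimes[n]
        # Perfect scaling would give speedup == n / base_count, i.e. efficiency 1.0
        efficiency = speedup / (n / base_count)
        result[n] = (speedup, efficiency)
    return result


# Made-up runtimes for illustration only; these are not measurements.
example_runtimes = {1: 0.20, 2: 0.10, 4: 0.05, 8: 0.05}
for n, (speedup, efficiency) in speedup_and_efficiency(example_runtimes).items():
    print(f"{n:2d} workers: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

For the measured data, the input could be built with `{nt: d["runtime"].mean() for nt, d in thread_data["data"].items()}`. A sharp drop in efficiency from one count to the next suggests the count has exceeded the cores available. To check your guess afterwards, `os.cpu_count()` reports the number of logical CPUs, which may differ from the number of physical cores.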

The `ringtest.py` script supports many other options:

In [None]:
!x86_64/special -python ringtest.py --help

As well as thread-based parallelism, we can also use process-based parallelism via MPI.
In this case, we need to run `ringtest.py` using `mpiexec`.

The `ringtest(...)` helper function defined above can handle this via the `mpi=X` keyword argument.
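
To make the effect of `mpi=X` concrete, the sketch below repeats the command-building logic from the `ringtest(...)` helper without running anything. The standalone function name `build_command` is hypothetical and exists only for this illustration.

```python
import shlex


def build_command(*args, mpi=None):
    """Return the argument list that the ringtest(...) helper would execute."""
    cmd = []
    if mpi is not None:
        # Launch `mpi` processes (ranks) through the MPI launcher
        cmd += ["mpiexec", "-n", str(mpi)]
    cmd.append("./x86_64/special")
    if mpi is not None:
        # The helper also passes -mpi to special when running under mpiexec
        cmd.append("-mpi")
    cmd += ["-python", "ringtest.py"]
    cmd += [str(x) for x in args]
    return cmd


print(shlex.join(build_command("-nt", 1)))
# ./x86_64/special -python ringtest.py -nt 1
print(shlex.join(build_command("-nt", 1, mpi=4)))
# mpiexec -n 4 ./x86_64/special -mpi -python ringtest.py -nt 1
```

With `-nt 1` and `mpi=4`, each of the four ranks runs a single thread, so the parallelism comes entirely from processes.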

In [None]:
mpi_data = {
    "data": {
        num_ranks: ringtest("-nt", 1, mpi=num_ranks) for num_ranks in pows_of_2(max=8)
    },
    "label": "MPI parallelism",
}

In [None]:
scaling_plot([thread_data, mpi_data])