# Running epidemic simulations under `epyc` in parallel

Having run simulations one at a time, in this notebook we run several together in parallel. We do this by making use of the feature in `epyc` that leverages multicore processors, letting us run simulations on several cores at once. We'll then compare the performance of the two approaches in terms of the "wallclock" times of the two sets of experiments.

When I present this notebook I'm using a laptop with only two cores, and so am limited to at most 2x speed-up. If you have access to a larger machine, change the parameters to use more cores: where and how to do this are clearly labelled. I typically do experimental work on a 16-core machine and use 12 of those cores for simulation, leaving 4 cores free for other things.

In [9]:
# numpy
import numpy

# data handling
import pandas
from pathlib import Path
datasets = Path("../../datasets")
datasets.mkdir(parents=True, exist_ok=True)

# simulation
from epyc import Experiment, Lab, ParallelLab, JSONLabNotebook
from epydemic import ERNetwork, SIR, StochasticDynamics

# plotting
import matplotlib
%matplotlib inline
%config InlineBackend.figure_format = 'png'
matplotlib.rcParams['font.size'] = 10
import matplotlib.pyplot as plt

As usual, we'll stick to SIR simulation.

## Setting up for comparison

We'll first create a notebook with two result sets, one for sequential and one for parallel processing.

In [10]:
# create a lab notebook backed by a JSON file
nb = JSONLabNotebook(Path(datasets, "08-05-sir-seq-par.json"), create=True)

# add result sets
nb.addResultSet("sir-seq", "Sequential simulations of SIR")
nb.addResultSet("sir-par", "Parallel (multicore) simulations of SIR")

<epyc.resultset.ResultSet at 0x13f1fce10>

## Sequential processing

We'll assume we've already done the sequential experiment, so we can just grab the lab notebook holding the results.

In [11]:
# load the sequential results
nbseq = JSONLabNotebook(Path(datasets, "06-05-sir-seq.json"))

Then we'll copy these results into the appropriate result set in our new notebook, so we only have to manage one.

In [12]:
# copy result set
nb.addResult(nbseq.results(), tag="sir-seq")

## Parallel processing

We now need a lab set up for parallel processing. This is simply a matter of instantiating the `ParallelLab` class and telling it how many cores it should use. There are three ways to do this:

- the default, which uses all the available cores
- specify a number of cores with a positive number, such as 12
- specify a number of cores *not* to use with a negative number, such as -4

The last two examples (12 and -4) amount to the same thing on a 16-core machine: both use 12 cores.
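
The arithmetic behind these three options can be sketched in a few lines. This is an illustration only and not `epyc`'s own code: the helper `resolve_cores` is hypothetical, and both treating 0 as "use everything" and never dropping below one core are assumptions made for the sketch.

```python
def resolve_cores(requested, available):
    """Hypothetical helper: turn a requested core count into a number of cores to use.

    Assumptions for this sketch (not taken from epyc): 0 means "use all
    cores", and a negative request never leaves fewer than one core.
    """
    if requested == 0:
        # default: use everything available
        return available
    if requested < 0:
        # negative: leave that many cores free
        return max(available + requested, 1)
    # positive: use exactly what was asked for, even if that over-claims
    return requested


# on a 16-core machine, asking for 12 cores or leaving 4 free is the same thing
assert resolve_cores(12, 16) == resolve_cores(-4, 16) == 12
```

Note that the positive case deliberately does no clamping, so over-claiming is possible in this sketch.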

It doesn't do any good to exaggerate the number of cores available, though: claiming to have 16 when you have 2 will actually make things slower! So first we should check.

In [13]:
from multiprocessing import cpu_count
print("Current system has {c} cores in total".format(c=cpu_count()))

Current system has 8 cores in total


This is the maximum number of cores it makes sense to use.

In [14]:
# pick the number of cores -- change this to change parallelism
nCores = 6

We now create the lab and populate its parameter space as usual.

In [15]:
# create the lab
lab = ParallelLab(notebook=nb, cores=nCores)

# make sure the results go into the right result set
nb.select("sir-par")

# set the disease parameter space
lab[SIR.P_INFECTED] = 0.01
lab[SIR.P_INFECT] = numpy.linspace(0.0, 1.0,
                                   num=20)
lab[SIR.P_REMOVE] = 0.002

# set the topology for the generated network
lab[ERNetwork.N] = int(1e4)
lab[ERNetwork.KMEAN] = 5

Then we run the experiment as usual, but this time in parallel.

In [16]:
# build the experiment
p = SIR()
g = ERNetwork()
e = StochasticDynamics(p, g)

# run the experiment
lab.runExperiment(e)

The results seemed to be generated faster that time. How *much* faster depends on the number of cores you have available. We can determine this by extracting the total elapsed time from the two result sets and comparing them.

In [19]:
# grab the two datasets
df_seq = nb.dataframe(tag="sir-seq")
df_par = nb.dataframe(tag="sir-par")

# compute the total wallclock time of each run, measured from the first
# experiment to finish to the last experiment to finish
wallclock_seq = (df_seq[Experiment.END_TIME].max() - df_seq[Experiment.END_TIME].min()).total_seconds()
wallclock_par = (df_par[Experiment.END_TIME].max() - df_par[Experiment.END_TIME].min()).total_seconds()

print(f"Sequential {wallclock_seq:.0f}s, Parallel {wallclock_par:.0f}s")
print("Speedup {s:.0f}x on {c} cores".format(s=wallclock_seq / wallclock_par,
                                             c=lab.numberOfCores()))

Sequential 23s, Parallel 6s
Speedup 4x on 6 cores
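
One back-of-envelope check on a figure like this is to ask what the best possible speed-up would be given how many experiments there are to share out. The sketch below assumes every experiment takes the same time, which real SIR runs will not satisfy exactly, and the helper `max_speedup` is hypothetical. With the 20 points in the parameter space above and 6 cores, the busiest core has to run 4 experiments one after another, so the bound is 5x rather than 6x.

```python
import math


def max_speedup(n_tasks, n_workers):
    """Upper bound on speed-up when n_tasks equal-length tasks share n_workers."""
    # the busiest worker has to run this many tasks back to back
    rounds = math.ceil(n_tasks / n_workers)
    return n_tasks / rounds


print(max_speedup(20, 6))   # 20 tasks in 4 rounds -> 5.0
```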


It's worth noting that you *never* get what's sometimes called *perfect speed-up*, that is to say, a program that runs 12 times as fast on 12 cores. There is always some sequential overhead that slows things down. Getting 10x speed-up on 12 cores is impressive.
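
A standard way to put numbers on this is Amdahl's law, which is not something `epyc` computes but a general model: if a fraction `p` of the work can be spread over `n` cores and the rest is strictly sequential, the predicted speed-up is `1 / ((1 - p) + p / n)`. The sketch below, with hypothetical function names, evaluates the law and inverts it to ask how much of the work must have run in parallel to reach a given speed-up. Under this model, 10x on 12 cores requires roughly 98% of the work to be parallel.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Predicted speed-up under Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def implied_parallel_fraction(speedup, cores):
    """Invert Amdahl's law: the parallel fraction needed for a given speed-up.

    Only meaningful for cores > 1 and 1 <= speedup <= cores.
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


p = implied_parallel_fraction(10, 12)
print(f"10x on 12 cores implies about {p:.1%} of the work ran in parallel")
```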

The thing to note about the code above is how little it changes between sequential and parallel evaluation. This is because of the assumptions we make about the system: that every experiment is independent of every other, and so can run in parallel with it and in any order. It's perfectly possible to build experiments where this assumption is violated (for a simple example, think of an `Experiment` class with a class variable that's updated depending on which experiment runs when), in which case things will get ... interesting. Such designs are best avoided. The kind of parallelism that `epyc` supports is called *task parallelism*, and is very efficient when its assumptions are respected.
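
To see why shared state is a problem, here is a minimal plain-Python sketch of the class-variable example. It deliberately does not use `epyc`: the class `OrderDependent` and the helper `run_all` are hypothetical stand-ins, and the "experiments" are run one after another in a single process, just in two different orders.

```python
class OrderDependent:
    """Stand-in for an experiment class that breaks the independence assumption."""

    runs_so_far = 0   # class variable shared by every instance in this process

    def do(self, params):
        # the result depends on how many experiments ran before this one
        result = params["x"] + OrderDependent.runs_so_far
        OrderDependent.runs_so_far += 1
        return result


def run_all(xs):
    """Run one 'experiment' per value in xs, in the order given."""
    OrderDependent.runs_so_far = 0   # start each batch from a clean state
    return {x: OrderDependent().do({"x": x}) for x in xs}


forward = run_all([1, 2, 3])
backward = run_all([3, 2, 1])
print(forward)    # result for each x when run in ascending order
print(backward)   # same parameters, different results
```

The same parameter values give different results depending purely on execution order, which is exactly what a parallel scheduler does not guarantee. With separate worker processes the picture is typically murkier still, since each process usually holds its own copy of the class variable.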