# Basic model construction

## Introduction
The objective of this notebook is to introduce the _summer_ interface 
that we will use throughout this series of notebooks,
use it to demonstrate the construction of a simple model,
and demonstrate several principles of infectious disease modelling.
Because this is the first notebook/chapter, 
there is a little more content in this notebook introducing the code packages we will be using through the series.
In the following example, we will create an SIR compartmental model for a general, unspecified, immunising infectious disease spreading through a fully susceptible population.

In this model there will be:

- Three compartments, named `susceptible`, `infectious` and `recovered`
- A starting population of 1000 people, with 10 of them infected (and infectious)
- An evaluation timespan from time zero to 20 days
- Inter-compartmental flows for infection, deaths and recovery

## Preliminaries
As with any Python script, we first have to import the objects we need,
plus we'll set the visualisation of pandas objects to be interactive.

In [None]:
# If we are running in google colab, pip install the required packages, 
# but do not modify local environments
# This will be present in all subsequent notebooks
try:
  import google.colab
  IN_COLAB = True
  %pip install summerepi2
except ImportError:
  IN_COLAB = False

In [None]:
import pandas as pd
import numpy as np
from typing import Dict

from summer2 import CompartmentalModel
from summer2.parameters import Parameter as param

pd.options.plotting.backend = "plotly"

## Model definition
First, let's create a function that gives us a base SIR model with the
basic compartments, starting populations and inter-compartmental flows implemented.
Note the steps in model construction included as comments through the following function.

In [None]:
def get_base_sir_model() -> CompartmentalModel:
    """
    Generate an instance of an SIR model with the compartments,
    starting population and parameter values hard-coded.
    
    Returns:
        The summer model object
    """
    
    # Name the base model compartments
    compartments = (
        "susceptible",
        "infectious",
        "recovered",
    )
    
    # Specify the evaluation times
    analysis_times = (0, 20)
    
    # Build the basic model with this information and specify which compartment is infectious
    model = CompartmentalModel(
        times=analysis_times,
        compartments=compartments,
        infectious_compartments=["infectious"],
    )
    
    # Assign the starting population
    model.set_initial_population(
        distribution={"susceptible": 990, "infectious": 10}
    )
    
    # Add a dynamic infection flow that transitions people from susceptible to infectious
    model.add_infection_frequency_flow(
        name="infection", 
        contact_rate=1.,
        source="susceptible", 
        dest="infectious",
    )
    
    # Add a constant flow that transitions people from infectious to recovered
    model.add_transition_flow(
        name="recovery", 
        fractional_rate=0.333,
        source="infectious", 
        dest="recovered",
    )
    
    # Add a constant death flow that transitions people from infectious out of the model
    model.add_death_flow(
        name="infection_death", 
        death_rate=0.05,
        source="infectious",
    )
    
    # Return the model object
    return model

### Ordinary differential equation for the infectious compartment
In developing `summer`, we have aimed to move away from notating our models in ODEs
and we will generally avoid ODE notation throughout this textbook.
This is because the `summer` API is intended to support code that is highly expressive,
such that the epidemiological intention can be easily gleaned from reading the code itself.
Therefore, there should be less need for an alternative form of notation.
Further, the ODEs used to notate epidemiological models
do not constitute the code used in simulation,
but rather the modeller's intention for the construction of the system.
Therefore, there is considerable potential for the ODEs to diverge from the underlying code,
which is very common in our experience.

Nevertheless, for readers who are familiar with ODE notation,
we present the following expression for the infectious compartment:
$$ \frac{dI}{dt}=\frac{\beta S I}{N}-(\gamma + \mu)I $$
where the `infectious` and `susceptible` compartments 
are represented by $I$ and $S$ respectively,
$N$ is the total population size,
and the contact rate, `recovery` rate and `infection_death` rate
are represented by $\beta$, $\gamma$ and $\mu$ respectively.
Note that the division by $N$ is handled behind the scenes by
_summer_ when the user requests frequency-dependent transmission.
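
For readers who would like to see how the infectious compartment behaves under frequency-dependent transmission, the following is a minimal sketch that integrates the three-compartment system with a fixed-step forward Euler scheme in plain Python. It does not use _summer_ and is not how _summer_ solves the model. The function name `euler_sir` and the step size are illustrative choices, and $N$ is assumed here to be the current living population.

```python
def euler_sir(beta=1.0, gamma=0.333, mu=0.05, s0=990.0, i0=10.0, end_time=20.0, dt=0.01):
    """Integrate the SIR system with forward Euler; returns a list of (t, S, I, R) tuples."""
    s, i, r = s0, i0, 0.0
    trajectory = [(0.0, s, i, r)]
    n_steps = int(round(end_time / dt))
    for step in range(1, n_steps + 1):
        n = s + i + r  # Assumption: N is the current living population
        infection = beta * s * i / n  # Frequency-dependent infection flow
        recovery = gamma * i
        death = mu * i
        s -= infection * dt
        i += (infection - recovery - death) * dt
        r += recovery * dt
        trajectory.append((step * dt, s, i, r))
    return trajectory


trajectory = euler_sir()
# Find the row with the largest infectious compartment size
peak_time, _, peak_infectious, _ = max(trajectory, key=lambda row: row[2])
print(f"Infectious peak of about {peak_infectious:.0f} people at around day {peak_time:.1f}")
```

The net change in `i` at each step is the infection flow minus the recovery and death flows, which mirrors the terms of the expression above.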

### Latent period versus incubation period
The latent period is the time from infection to the onset of infectiousness.
In this simple model, there is no delay between infection and infectiousness,
such that the latent period is zero.
By contrast, the incubation period is the time from infection to symptom onset.
Symptoms are not explicitly represented in this model
(and model structure to represent symptom status may only be necessary
if symptoms lead to some epidemiological change, 
such as through case isolation).

### Sojourn times
The average infectious period in this model is the reciprocal of the sum of the recovery rate and the death rate.
This is because, for a given person entering the `infectious` compartment,
there are two ways to exit the compartment,
so the total rate of leaving the compartment is the sum of these two flow rates.
In other words, when considering the `infectious` compartment
in the absence of any inward flows,
the _per capita_ rate of exiting the compartment is the sum of the two flow rates.
If these rates remain constant over time and in the absence of inward flows,
the proportion of the compartment's starting population that remains at time $t$ is given by:
$$ e ^{-outflows \times t} $$
and the average time in the compartment is given by:
$$ \int_0^\infty e ^{-outflows \times t} dt $$
which is equal to:
$$ \frac{1}{outflows} $$
This can also be termed the "sojourn time" of the compartment.
(Note that this does not imply that half of the population 
will have left the `infectious` category after one sojourn time.)
In this model the sojourn time for the `infectious` compartment is therefore:

In [None]:
print(f"The average sojourn time for the infectious compartment is {round(1. / (0.333 + 0.05), 2)} days.")
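
As a rough numerical check of the integral above, the following sketch approximates it with a midpoint Riemann sum and compares the result with the reciprocal of the total outflow rate. It also reports the proportion still in the compartment after one sojourn time. The step size and the upper integration limit are arbitrary choices made for illustration.

```python
import math

recovery_rate = 0.333
death_rate = 0.05
outflows = recovery_rate + death_rate

# Approximate the integral of exp(-outflows * t) with a midpoint Riemann sum
dt = 0.001
upper_limit = 100.0  # Assumption: far enough into the tail that the remainder is negligible
n_steps = int(round(upper_limit / dt))
numerical_sojourn = sum(math.exp(-outflows * (k + 0.5) * dt) * dt for k in range(n_steps))
analytical_sojourn = 1.0 / outflows

# Proportion still in the compartment after exactly one sojourn time
remaining = math.exp(-outflows * analytical_sojourn)

print(f"Numerical: {numerical_sojourn:.3f} days, analytical: {analytical_sojourn:.3f} days")
print(f"Proportion remaining after one sojourn time: {remaining:.3f}")
```

The proportion remaining after one sojourn time is $e^{-1}$, or roughly 37%, so well over half of the starting population has already left by then.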

### Risks from rates
Often in epidemiology,
we may want to think about the risk of an outcome for a person in a particular state.
For example, we may want to know what the risk of death is
for a person entering the `infectious` compartment in our simple model.
In this case, there are only two possible outcomes for any person entering this compartment 
(i.e. recovery and infection-related death),
and these flows "compete" with one another.
The risk of following each of the outflows from the compartment is
proportional to the rate of that flow.
In our example, 
the risk of death for a person entering the infectious compartment
is the rate of death divided by the total of all outflows.
(In this example, this could be thought of as the case fatality rate,
if we assume that all "infectious" persons are "cases".)

In [None]:
print(f"The risk of death for infectious persons is {round(0.05 / (0.333 + 0.05) * 100)}%.")
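
The competing-risks reasoning can also be checked by simulation. The sketch below is an illustration only, using Python's standard `random` module rather than anything from _summer_. It draws an exponentially distributed waiting time for each outflow for a large number of hypothetical people, records which event happens first, and compares the simulated proportions with the rate-based calculation. The sample size and random seed are arbitrary.

```python
import random

rates = {"recovery": 0.333, "infection_death": 0.05}
total_rate = sum(rates.values())

# Analytical risk of each outcome: its rate as a share of the total outflow rate
analytical_risks = {name: rate / total_rate for name, rate in rates.items()}

# Stochastic check: draw an exponential waiting time for each flow and
# record whichever event happens first for each simulated person
rng = random.Random(0)  # Fixed seed so the sketch is reproducible
n_people = 100_000
counts = {name: 0 for name in rates}
for _ in range(n_people):
    waiting_times = {name: rng.expovariate(rate) for name, rate in rates.items()}
    first_event = min(waiting_times, key=waiting_times.get)
    counts[first_event] += 1
simulated_risks = {name: count / n_people for name, count in counts.items()}

for name in rates:
    print(f"{name}: analytical {analytical_risks[name]:.3f}, simulated {simulated_risks[name]:.3f}")
```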

## Getting the model object and its outputs
Now we can use our model building function to get an instance of this model,
run it, and have a look at the compartment size progression over time.
Note that we use the plotting functions built into pandas objects to do this.
Pandas is a very widely used library for data processing, which we will use extensively in this series.
Because there may be considerable data wrangling necessary for our model outputs after we have run our model,
we prefer to use external libraries for this sort of processing.
This is because these analysis processes are not specific to infectious diseases modelling,
and so it is preferable to use well-curated external libraries.
Here we use pandas' easy integration with _plotly_ to create an interactive plot from the pandas
object in a single line of code.
(Note that this could easily be set to other output formats,
e.g. by setting the pandas plotting backend to _matplotlib_ instead.)

In [None]:
base_model = get_base_sir_model()  # Get the model object
base_model.run()  # Run it
compartment_values = base_model.get_outputs_df()  # Access the outputs as a pandas dataframe
compartment_values.plot()  # Plot

## A "parameter-aware" model
Although this works perfectly well,
we'd prefer to establish some good "habits" for later on in this series.
In particular, we can ask our model object to expect certain parameters,
which we define as parameter objects,
but don't give values yet.
While we're at it, we'll also make the times and starting
population values part of the inputs to this function,
even though they go into the model object just as values.

In [None]:
def get_param_aware_sir_model(
    parameters: Dict,
) -> CompartmentalModel:
    """
    Generate an instance of an SIR model that is expecting parameter values to be provided
    for the transition rates and some other features.
    
    Args:
        parameters: The parameter values to be used in running the model
    Returns:
        The summer model object
    """
    
    # Define the compartments
    compartments = (
        "susceptible",
        "infectious",
        "recovered",
    )
    infectious_compartment = [
        "infectious",
    ]
    analysis_times = (
        parameters["start_time"], 
        parameters["end_time"]
    )

    model = CompartmentalModel(
        times=analysis_times,
        compartments=compartments,
        infectious_compartments=infectious_compartment,
    )
    
    # Check and assign infectious seed
    pop = parameters["population"]
    seed = parameters["seed"]
    suscept_pop = pop - seed
    msg = "Seed larger than population"
    assert suscept_pop >= 0., msg
    
    model.set_initial_population(
        distribution={
            "susceptible": suscept_pop, 
            "infectious": seed}
    )
    
    # Add a frequency-dependent transmission flow
    model.add_infection_frequency_flow(
        name="infection", 
        contact_rate=param("contact_rate"), 
        source="susceptible", 
        dest="infectious",
    )
    
    # Add a constant recovery flow
    model.add_transition_flow(
        name="recovery", 
        fractional_rate=param("recovery_rate"), 
        source="infectious", 
        dest="recovered",
    )
    
    # Add a constant infection-related death rate
    model.add_death_flow(
        name="infection_death", 
        death_rate=param("death_rate"), 
        source="infectious",
    )
    return model

Now let's use that function, specifying our parameters all in one go
and feeding them into the model object when we're ready to run it.
We now have a short block of code that we can use to quickly run the model with any parameter values we like.
Have a play around with this cell and see what behaviours you get back
(we'll come back to this in future notebooks too).

In [None]:
parameters = {
    "contact_rate": 1.,
    "recovery_rate": 0.333,
    "death_rate": 0.05,
    "population": 1000.,
    "seed": 10.,
    "start_time": 0.,
    "end_time": 20.,
}

param_aware_model = get_param_aware_sir_model(parameters)
param_aware_model.run(parameters=parameters)
compartment_values = param_aware_model.get_outputs_df()
compartment_values.plot()

## Digging into the model object
Now that we have our `CompartmentalModel` object,
we can use this structure to inspect some aspects of what is going on under the surface,
for example, compartments, flows and other attributes.
This is **_highly recommended_**, 
to ensure that the model you have created is consistent with what you were wanting.
Try out using tab complete in this notebook to inspect the range of methods and
attributes that are available for a `CompartmentalModel` object.

In [None]:
print(param_aware_model.compartments)
print(param_aware_model.initial_population)
print(param_aware_model.times)
print(compartment_values)

## Epidemiological messages
This is clearly a very simple model of an epidemic caused by a short-lived pathogen that induces complete immunity in its host.
However, despite its simplicity, it does capture a surprising number of the actual features of an epidemic caused by an infection
of this type.
In general, models of infectious diseases transmission should be as complicated as they need to be,
which means that the additional complexity that we might need to inject into this model is highly dependent on the purpose that
we will be using it for - or the epidemiological question that we will be addressing through our analysis.
It may also be dependent on us having sufficient epidemiological understanding of the epidemic to be able to incorporate these
features - with a reasonable level of confidence that we are actually capturing the processes that we are interested in
(including empiric data to estimate the parameters that we need to build our more complicated model).

Let's think of some of the epidemiological features that this very simple model **_does_** capture:
- Very broadly, this model does give us the shape that epi curves often follow - looking vaguely like a bell
- There is an exponential growth phase when the population remains largely susceptible
- The growth in the epidemic decreases as the proportion of susceptibles decreases
- As the effective reproduction number crosses one, the epidemic peaks and begins to decline
- The epidemic dies out as the susceptibles decrease and the effective reproduction number drops below one
- As the epidemic ends and transmission declines to approximately zero, susceptibles are depleted, but not completely - a proportion of the population remains susceptible even after the epidemic
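
Several of these points can be illustrated with a short calculation. The following sketch is a plain-Python forward Euler approximation, not the _summer_ model itself. It tracks the effective reproduction number, taken here as $\frac{\beta}{\gamma + \mu} \times \frac{S}{N}$ with $N$ assumed to be the living population, and compares the time at which it crosses one with the time of the infectious peak. It also reports how many people remain susceptible at the end of the run.

```python
beta, gamma, mu = 1.0, 0.333, 0.05
s, i, r = 990.0, 10.0, 0.0
dt, end_time = 0.01, 20.0

times, susceptible, infectious, r_eff = [], [], [], []
for step in range(int(round(end_time / dt)) + 1):
    n = s + i + r  # Assumption: N is the current living population
    times.append(step * dt)
    susceptible.append(s)
    infectious.append(i)
    r_eff.append(beta / (gamma + mu) * s / n)
    # Advance one Euler step (the right-hand side uses the pre-step values)
    infection = beta * s * i / n
    s, i, r = (
        s - infection * dt,
        i + (infection - (gamma + mu) * i) * dt,
        r + gamma * i * dt,
    )

peak_index = max(range(len(infectious)), key=infectious.__getitem__)
crossing_index = next(k for k, value in enumerate(r_eff) if value <= 1.0)

print(f"Infectious compartment peaks at day {times[peak_index]:.2f}")
print(f"Effective reproduction number crosses one at day {times[crossing_index]:.2f}")
print(f"Susceptibles remaining at day {times[-1]:.0f}: {susceptible[-1]:.1f}")
```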

Let's think of some of the epidemiological features that this model **_does not_** capture:
- Any heterogeneity in the background population with regard to progression through infection states after exposure
- Any heterogeneity in transmission, such as greater transmission to people within the population with similar characteristics
- Any heterogeneity in the pathogen, such as multiple strains with different characteristics circulating through the population
- Any changes in how people transition through their stages over time that might be induced through changes external to the model
(i.e. other than those related to the changes in the population distribution across compartments resulting from transmission of this immunising pathogen)
- Tracking any outputs other than the sizes of the model's compartments

This simple model also incorporates a death rate, allowing for people to exit the model during the simulation.
This can be an important consideration, and one that we will come back to in later notebooks on population demographics.
The terminology for this type of model is that we are simulating an "open" population,
whereas if we had omitted the death flow (which would give quite similar epidemiological results)
we would have a fixed total population size over time, or a "closed" population.
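
To make the open versus closed distinction concrete, the sketch below runs a simple forward Euler approximation of the system twice, once with the infection-related death rate used above and once with it set to zero, and compares the living population at the end of each run. The helper `run_sir` is hypothetical and written only for this illustration. It is not part of _summer_.

```python
def run_sir(death_rate, beta=1.0, gamma=0.333, end_time=20.0, dt=0.01):
    """Forward Euler SIR sketch; returns (final living population, peak infectious)."""
    s, i, r = 990.0, 10.0, 0.0
    peak = i
    for _ in range(int(round(end_time / dt))):
        n = s + i + r  # Assumption: transmission scales with the living population
        infection = beta * s * i / n
        # The right-hand side is evaluated with the pre-step values
        s, i, r = (
            s - infection * dt,
            i + (infection - (gamma + death_rate) * i) * dt,
            r + gamma * i * dt,
        )
        peak = max(peak, i)
    return s + i + r, peak


open_total, open_peak = run_sir(death_rate=0.05)
closed_total, closed_peak = run_sir(death_rate=0.0)

print(f"Open population: {open_total:.1f} people alive at the end, peak infectious {open_peak:.1f}")
print(f"Closed population: {closed_total:.1f} people alive at the end, peak infectious {closed_peak:.1f}")
```

With the death rate set to zero, every flow moves people between compartments rather than out of the model, so the total stays at the starting population.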

We will return to these features and how to elaborate our base model to capture them over the following notebooks.

## Summary

To summarise, now we know how to:

- Create a model
- Assign the starting population
- Add flows (although we'll come back to what these mean)
- Run the model
- Access and visualise the outputs

A detailed API reference for the `CompartmentalModel` class can be found [here](http://summerepi.com/api/model.html).